* BFS vs. mainline scheduler benchmarks and measurements
@ 2009-09-06 20:59 Ingo Molnar
  2009-09-07  2:05 ` Frans Pop
                   ` (7 more replies)
  0 siblings, 8 replies; 216+ messages in thread
From: Ingo Molnar @ 2009-09-06 20:59 UTC (permalink / raw)
  To: Con Kolivas, linux-kernel; +Cc: Peter Zijlstra, Mike Galbraith

hi Con,

I've read your BFS announcement/FAQ with great interest:

    http://ck.kolivas.org/patches/bfs/bfs-faq.txt

First and foremost, let me say that i'm happy that you are hacking 
the Linux scheduler again. It's perhaps proof that hacking the 
scheduler is one of the most addictive things on the planet ;-)

I understand that BFS is still early code and that you are not 
targeting BFS for mainline inclusion - but BFS is an interesting 
and bold new approach, cutting a _lot_ of code out of 
kernel/sched*.c, so it raised my curiosity and interest :-)

In the announcement and on your webpage you have compared BFS to 
the mainline scheduler in various workloads - showing various 
improvements over it. I have tried and tested BFS and ran a set of 
benchmarks - this mail contains the results and my (quick) 
findings.

So ... to get to the numbers - i've tested both BFS and the tip of 
the latest upstream scheduler tree on a testbox of mine. I 
intentionally didnt test BFS on any really large box - because you 
described its upper limit like this in the announcement:

-----------------------
|
| How scalable is it?
|
| I don't own the sort of hardware that is likely to suffer from 
| using it, so I can't find the upper limit. Based on first 
| principles about the overhead of locking, and the way lookups 
| occur, I'd guess that a machine with more than 16 CPUS would 
| start to have less performance. BIG NUMA machines will probably 
| suck a lot with this because it pays no deference to locality of 
| the NUMA nodes when deciding what cpu to use. It just keeps them 
| all busy. The so-called "light NUMA" that constitutes commodity 
| hardware these days seems to really like BFS.
|
-----------------------

I generally agree with you that "light NUMA" is what a Linux 
scheduler needs to concentrate on (at most) in terms of 
scalability. Big NUMA with 4096 CPUs is not very common, and we tune 
the Linux scheduler mostly for desktop and small-server workloads.

So the testbox i picked fits into the upper portion of what i 
consider a sane range of systems to tune for - and should still fit 
into BFS's design bracket as well according to your description: 
it's a dual quad core system with hyperthreading. It has twice as 
many cores as the quad you tested on but it's not excessive and 
certainly does not have 4096 CPUs ;-)

Here are the benchmark results:

  kernel build performance:
     http://redhat.com/~mingo/misc/bfs-vs-tip-kbuild.jpg     

  pipe performance:
     http://redhat.com/~mingo/misc/bfs-vs-tip-pipe.jpg

  messaging performance (hackbench):
     http://redhat.com/~mingo/misc/bfs-vs-tip-messaging.jpg  

  OLTP performance (postgresql + sysbench)
     http://redhat.com/~mingo/misc/bfs-vs-tip-oltp.jpg

Alas, as can be seen in the graphs, i cannot see any BFS 
performance improvements on this box.

Here's a more detailed description of the results:

| Kernel build performance
---------------------------

  http://redhat.com/~mingo/misc/bfs-vs-tip-kbuild.jpg     

In the kbuild test BFS is showing significant weaknesses up to 16 
CPUs. On 8 CPUs utilized (half load) it's 27.6% slower. All results 
(-j1, -j2 ... -j15) are slower. The peak at 100% utilization (-j16) 
is slightly stronger under BFS, by 1.5%. The 'absolute best' result 
is sched-devel at -j64 with 46.65 seconds - the best BFS result is 
47.38 seconds (also at -j64), i.e. sched-devel's best is ~1.5% 
faster.

| Pipe performance
-------------------

  http://redhat.com/~mingo/misc/bfs-vs-tip-pipe.jpg

Pipe performance is a very simple test: two tasks message each 
other via pipes. I measured 1 million such messages:

   http://redhat.com/~mingo/cfs-scheduler/tools/pipe-test-1m.c

The pipe test ran a number of them in parallel:

   for ((i=0;i<$NR;i++)); do ~/sched-tests/pipe-test-1m & done; wait

and measured elapsed time. This tests two things: basic scheduler 
performance and also scheduler fairness. (If one of these parallel 
jobs is delayed unfairly then the whole test will finish later.)

[ see further below for a simpler pipe latency benchmark as well. ]
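For reference, here is a minimal sketch of what such a pipe 
ping-pong does (illustrative only - not necessarily identical to the 
pipe-test-1m.c linked above):

  /*
   * Two tasks bounce a single byte back and forth LOOPS times, so
   * the elapsed time is dominated by wakeup/context-switch cost.
   */
  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>
  #include <sys/wait.h>

  #define LOOPS 1000000

  int main(void)
  {
      int ptc[2], ctp[2];   /* parent->child and child->parent pipes */
      char c = 0;
      long i;

      if (pipe(ptc) || pipe(ctp)) {
          perror("pipe");
          exit(1);
      }
      if (fork() == 0) {
          /* child: wait for a byte, echo it back */
          for (i = 0; i < LOOPS; i++) {
              if (read(ptc[0], &c, 1) != 1 || write(ctp[1], &c, 1) != 1)
                  exit(1);
          }
          exit(0);
      }
      /* parent: send a byte, wait for the echo */
      for (i = 0; i < LOOPS; i++) {
          if (write(ptc[1], &c, 1) != 1 || read(ctp[0], &c, 1) != 1)
              exit(1);
      }
      wait(NULL);
      return 0;
  }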

As can be seen in the graph BFS performed very poorly in this test: 
at 8 pairs of tasks it had a runtime of 45.42 seconds - while 
sched-devel finished them in 3.8 seconds.

I saw really bad interactivity in the BFS test here - the system 
was starved for as long as the test ran. I stopped the tests at 8 
loops - the system was unusable and i was getting IO timeouts due 
to the scheduling lag:

 sd 0:0:0:0: [sda] Unhandled error code
 sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
 end_request: I/O error, dev sda, sector 81949243
 Aborting journal on device sda2.
 ext3_abort called.
 EXT3-fs error (device sda2): ext3_journal_start_sb: Detected aborted journal
 Remounting filesystem read-only

I measured interactivity during this test:

   $ time ssh aldebaran /bin/true
   real  2m17.968s
   user  0m0.009s
   sys   0m0.003s

A single command took more than 2 minutes.

| Messaging performance
------------------------

  http://redhat.com/~mingo/misc/bfs-vs-tip-messaging.jpg  

Hackbench fared better under BFS than the pipe test did - but 
mainline sched-devel is still significantly faster, for smaller and 
larger loads alike. With 20 groups mainline ran 61.5% faster.

| OLTP performance
--------------------

  http://redhat.com/~mingo/misc/bfs-vs-tip-oltp.jpg

As can be seen in the graph, for sysbench OLTP performance 
sched-devel outperforms BFS at each of the main stages:

   single client load   (   1 client  -   6.3% faster )
   half load            (   8 clients -  57.6% faster )
   peak performance     (  16 clients - 117.6% faster )
   overload             ( 512 clients - 288.3% faster )

| Other tests
--------------

I also tested a couple of other things, such as lat_tcp:

  BFS:          TCP latency using localhost: 16.5608 microseconds
  sched-devel:  TCP latency using localhost: 13.5528 microseconds [22.1% faster]

lat_pipe:

  BFS:          Pipe latency: 4.9703 microseconds
  sched-devel:  Pipe latency: 2.6137 microseconds [90.1% faster]

General interactivity of BFS seemed good to me - except for the 
pipe test, where there was significant lag of over a minute. I think 
it's some starvation bug, not an inherent design property of BFS, 
so i'm looking forward to re-testing it with the fix.

Test environment: i used the latest BFS (initially 205, then re-ran 
under 208 - all the numbers here are from 208), and the latest 
mainline scheduler development tree from:
scheduler development tree from:

  http://people.redhat.com/mingo/tip.git/README

Commit 840a065 in particular. It's on a .31-rc8 base while BFS is 
on a .30 base - i'll be able to test BFS on a .31 base as well once 
you release it. (but it doesnt matter much to the results - there 
werent any heavy core kernel changes impacting these workloads.)

The system had enough RAM to have the workloads cached, and i 
repeated all tests to make sure it's all representative. 
Nevertheless i'd like to encourage others to repeat these (or 
other) tests - the more testing the better.

I also tried to configure the kernel in a BFS friendly way, i used 
HZ=1000 as recommended, turned off all debug options, etc. The 
kernel config i used can be found here:

  http://redhat.com/~mingo/misc/config

( Let me know if you need any more info about any of the tests i
  conducted. )

Also, i'd like to outline that i agree with the general goals 
described by you in the BFS announcement - small desktop systems 
matter more than large systems. We find it critically important 
that the mainline Linux scheduler performs well on those systems 
too - and if you (or anyone else) can reproduce suboptimal behavior 
please let the scheduler folks know so that we can fix/improve it.

I hope to be able to work with you on this, please dont hesitate 
sending patches if you wish - and we'll also be following BFS for 
good ideas and code to adopt to mainline.

Thanks,

	Ingo


* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-06 20:59 BFS vs. mainline scheduler benchmarks and measurements Ingo Molnar
@ 2009-09-07  2:05 ` Frans Pop
  2009-09-07 12:16   ` [quad core results] " Ingo Molnar
  2009-09-07  3:38 ` Nikos Chantziaras
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 216+ messages in thread
From: Frans Pop @ 2009-09-07  2:05 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: kernel, linux-kernel, a.p.zijlstra, efault

Ingo Molnar wrote:
> So the testbox i picked fits into the upper portion of what i
> consider a sane range of systems to tune for - and should still fit
> into BFS's design bracket as well according to your description:
> it's a dual quad core system with hyperthreading.

Ingo,

Nice that you've looked into this.

Would it be possible for you to run the same tests on e.g. a dual core 
and/or a UP system (or maybe just offline some CPUs?)? It would be very 
interesting to see whether BFS does better in the lower portion of the 
range, or if the differences you show between the two schedulers are 
consistent across the range.

Cheers,
FJP


* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-06 20:59 BFS vs. mainline scheduler benchmarks and measurements Ingo Molnar
  2009-09-07  2:05 ` Frans Pop
@ 2009-09-07  3:38 ` Nikos Chantziaras
  2009-09-07 11:01   ` Frederic Weisbecker
                     ` (2 more replies)
  2009-09-07  3:50 ` Con Kolivas
                   ` (5 subsequent siblings)
  7 siblings, 3 replies; 216+ messages in thread
From: Nikos Chantziaras @ 2009-09-07  3:38 UTC (permalink / raw)
  To: linux-kernel

On 09/06/2009 11:59 PM, Ingo Molnar wrote:
>[...]
> Also, i'd like to outline that i agree with the general goals
> described by you in the BFS announcement - small desktop systems
> matter more than large systems. We find it critically important
> that the mainline Linux scheduler performs well on those systems
> too - and if you (or anyone else) can reproduce suboptimal behavior
> please let the scheduler folks know so that we can fix/improve it.

BFS improved behavior of many applications on my Intel Core 2 box in a 
way that can't be benchmarked.  Examples:

mplayer using the OpenGL renderer doesn't drop frames anymore when 
dragging and dropping the video window around in an OpenGL composited 
desktop (KDE 4.3.1).  (Previously: start moving the mplayer window 
around, then drop it.  At the moment the move starts and at the moment 
you drop the window back onto the desktop, there's a big frame skip as 
if mplayer was frozen for a bit - around 200 or 300ms.)

Composite desktop effects like zoom and fade out don't stall for 
sub-second periods of time while there's CPU load in the background.  In 
other words, the desktop is more fluid and less skippy even during heavy 
CPU load.  Moving windows around with CPU load in the background doesn't 
result in short skips.

LMMS (a tool utilizing real-time sound synthesis) does not produce 
"pops", "crackles" and drops in the sound during real-time playback due 
to buffer under-runs.  Those problems get worse when there's heavy CPU 
load in the background, while with BFS heavy load doesn't produce those 
artifacts (though LMMS makes itself run SCHED_ISO with BFS).  Also, 
hitting a key on the keyboard needs less time for the note to become 
audible when using BFS.  The same should hold true for other tools that 
traditionally benefit from the "-rt" kernel sources.

Games like Doom 3 and such don't "freeze" periodically for small amounts 
of time (again for sub-second amounts) when something in the background 
grabs CPU time (be it my mailer checking for new mail or a cron job, or 
whatever.)

And, the most drastic improvement here: with BFS I can do a "make -j2" 
in the kernel tree and the GUI stays fluid.  Without BFS, things start 
to lag, even with in-RAM builds (like having the whole kernel tree 
inside a tmpfs) and gcc running with nice 19 and ionice -c 3.

Unfortunately, I can't come up with any way to somehow benchmark all of 
this.  There's no benchmark for "fluidity" and "responsiveness". 
Running the Doom 3 benchmark, or any other benchmark, doesn't say 
anything about responsiveness; it only measures how many frames were 
calculated in a specific period of time.  How "stable" (with no stalls) 
those frames were making it to the screen is not measurable.

If BFS implied small drops in pure performance, counted in 
instructions per second, that would be a totally acceptable regression 
for desktop/multimedia/gaming PCs.  Not for server machines, of course. 
However, on my machine, BFS is faster in classic workloads.  When I 
run "make -j2" with BFS and the standard scheduler, BFS always finishes 
a bit faster.  Not by much, but still.  One thing I'm noticing here is 
that BFS produces 100% CPU load on each core with "make -j2" while the 
normal scheduler stays at about 90-95% with -j2 or higher in at least 
one of the cores.  There seems to be under-utilization of CPU time.

Also, searching around the net and through discussions on various 
mailing lists, there seems to be a trend: the problems for some reason 
seem to occur more often with Intel CPUs (Core 2 chips and lower; I 
can't say anything about Core i7) while people on AMD CPUs mostly seem 
unaffected by most or even all of the above.  (And because of this, 
flame wars often break out, with one party accusing the other of 
imagining things.)  Can the integrated memory controller on AMD chips 
have something to do with this?  Do AMD chips generally offer better 
"multithreading" behavior?  Unfortunately, you didn't mention what CPU 
you ran your tests on.  If it was AMD, it might be a good idea to run 
tests on Pentium and Core 2 CPUs.

For reference, my system is:

CPU: Intel Core 2 Duo E6600 (2.4GHz)
Mainboard: Asus P5E (Intel X38 chipset)
RAM: 6GB (2+2+1+1) dual channel DDR2 800
GPU: RV770 (Radeon HD4870).



* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-06 20:59 BFS vs. mainline scheduler benchmarks and measurements Ingo Molnar
  2009-09-07  2:05 ` Frans Pop
  2009-09-07  3:38 ` Nikos Chantziaras
@ 2009-09-07  3:50 ` Con Kolivas
  2009-09-07 18:20   ` Jerome Glisse
  2009-09-07  9:49 ` Jens Axboe
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 216+ messages in thread
From: Con Kolivas @ 2009-09-07  3:50 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Peter Zijlstra, Mike Galbraith

2009/9/7 Ingo Molnar <mingo@elte.hu>:
> hi Con,

Sigh..

Well hello there.

>
> I've read your BFS announcement/FAQ with great interest:
>
>    http://ck.kolivas.org/patches/bfs/bfs-faq.txt

> I understand that BFS is still early code and that you are not
> targeting BFS for mainline inclusion - but BFS is an interesting
> and bold new approach, cutting a _lot_ of code out of
> kernel/sched*.c, so it raised my curiosity and interest :-)

Hard to keep a project under wraps and get an audience at the same
time, it is. I do realise it was inevitable LKML would invade my
personal space no matter how much I didn't want it to, but it would be
rude of me to not respond.

> In the announcement and on your webpage you have compared BFS to
> the mainline scheduler in various workloads - showing various
> improvements over it. I have tried and tested BFS and ran a set of
> benchmarks - this mail contains the results and my (quick)
> findings.

/me sees Ingo run off to find the right combination of hardware and
benchmark to prove his point.

[snip lots of bullshit meaningless benchmarks showing how great cfs is
and/or how bad bfs is, along with telling people they should use these
artificial benchmarks to determine how good it is, demonstrating yet
again why benchmarks fail the desktop]

I'm not interested in a long protracted discussion about this since
I'm too busy to live linux the way full time developers do, so I'll
keep it short, and perhaps you'll understand my intent better if the
FAQ wasn't clear enough.


Do you know what a normal desktop PC looks like? No, a more realistic
question based on what you chose to benchmark to prove your point
would be: Do you know what normal people actually do on them?


Feel free to treat the question as rhetorical.

Regards,
-ck

/me checks on his distributed computing client's progress, fires up
his next H264 encode, changes music tracks and prepares to have his
arse whooped on quakelive.


* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-06 20:59 BFS vs. mainline scheduler benchmarks and measurements Ingo Molnar
                   ` (2 preceding siblings ...)
  2009-09-07  3:50 ` Con Kolivas
@ 2009-09-07  9:49 ` Jens Axboe
  2009-09-07 10:12   ` Nikos Chantziaras
                     ` (3 more replies)
  2009-09-07 15:16 ` BFS vs. mainline scheduler benchmarks and measurements Michael Buesch
                   ` (3 subsequent siblings)
  7 siblings, 4 replies; 216+ messages in thread
From: Jens Axboe @ 2009-09-07  9:49 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Con Kolivas, linux-kernel, Peter Zijlstra, Mike Galbraith

On Sun, Sep 06 2009, Ingo Molnar wrote:
> So ... to get to the numbers - i've tested both BFS and the tip of 
> the latest upstream scheduler tree on a testbox of mine. I 
> intentionally didnt test BFS on any really large box - because you 
> described its upper limit like this in the announcement:

I ran a simple test as well, since I was curious to see how it performed
wrt interactiveness. One of my pet peeves with the current scheduler is
that I have to nice compile jobs, or my X experience is just awful while
the compile is running.

Now, this test case is an attempt to quantify what interactivity
would feel like. It'll run a given command line while at the same
time logging delays. The delays are measured as follows:

- The app creates a pipe, and forks a child that blocks on reading from
  that pipe.
- The app sleeps for a random period of time, anywhere between 100ms
  and 2s. When it wakes up, it gets the current time and writes that to
  the pipe.
- The child then gets woken, checks the time on its own, and logs the
  difference between the two.

The idea here being that the delay between writing to the pipe and the
child reading the data and comparing should (in some way) be indicative
of how responsive the system would seem to a user.

The test app was quickly hacked up, so don't put too much into it. The
test run is a simple kernel compile, using -jX where X is the number of
threads in the system. The files are cache hot, so little IO is done.
The -x2 run uses twice as many processes as there are threads,
eg -j128 on a 64 thread box.
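
A rough reconstruction of that measurement loop, pieced together from
the description above (this is not the actual test app, and the part
that spawns the workload command line is left out):

  /*
   * Sketch: the child blocks on a pipe read; the parent sleeps a
   * random 100ms-2s, then writes the current time into the pipe;
   * the child timestamps the read and logs the wakeup delay.
   * Build with: gcc -o wakelat wakelat.c -lrt
   */
  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>
  #include <unistd.h>

  static long long now_ns(void)
  {
      struct timespec ts;

      clock_gettime(CLOCK_MONOTONIC, &ts);
      return ts.tv_sec * 1000000000LL + ts.tv_nsec;
  }

  int main(void)
  {
      int fds[2];
      long long sent;

      if (pipe(fds))
          return 1;

      if (fork() == 0) {
          /* child: block on the pipe, report the delay in msecs */
          while (read(fds[0], &sent, sizeof(sent)) == sizeof(sent))
              printf("wakeup delay: %lld msec\n",
                     (now_ns() - sent) / 1000000);
          return 0;
      }

      srand(getpid());
      for (;;) {
          /* sleep a random period between 100ms and 2s */
          struct timespec ts;
          long ms = 100 + rand() % 1901;

          ts.tv_sec = ms / 1000;
          ts.tv_nsec = (ms % 1000) * 1000000L;
          nanosleep(&ts, NULL);

          sent = now_ns();
          if (write(fds[1], &sent, sizeof(sent)) != sizeof(sent))
              break;
      }
      return 0;
  }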

And I have to apologize for using a large system to test this on, I
realize it's out of the scope of BFS, but it's just easier to fire one
of these beasts up than it is to sacrifice my notebook or desktop
machine... So it's a 64 thread box. CFS -jX runtime is the baseline at
100, lower number means faster and vice versa. The latency numbers are
in msecs.


Scheduler       Runtime         Max lat     Avg lat     Std dev
----------------------------------------------------------------
CFS             100             951         462         267
CFS-x2          100             983         484         308
BFS
BFS-x2

And unfortunately this is where it ends for now, since BFS doesn't boot
on the two boxes I tried. It hard hangs right after disk detection. But
the latency numbers look pretty appalling for CFS, so it's a bit of a
shame that I did not get to compare. I'll try again later with a newer
revision, when available.

-- 
Jens Axboe



* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-07  9:49 ` Jens Axboe
@ 2009-09-07 10:12   ` Nikos Chantziaras
  2009-09-07 10:41     ` Jens Axboe
  2009-09-07 11:57   ` Jens Axboe
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 216+ messages in thread
From: Nikos Chantziaras @ 2009-09-07 10:12 UTC (permalink / raw)
  To: linux-kernel

On 09/07/2009 12:49 PM, Jens Axboe wrote:
> [...]
> And I have to apologize for using a large system to test this on, I
> realize it's out of the scope of BFS, but it's just easier to fire one
> of these beasts up than it is to sacrifice my notebook or desktop
> machine...

How does a kernel rebuild constitute "sacrifice"?


> So it's a 64 thread box. CFS -jX runtime is the baseline at
> 100, lower number means faster and vice versa. The latency numbers are
> in msecs.
>
>
> Scheduler       Runtime         Max lat     Avg lat     Std dev
> ----------------------------------------------------------------
> CFS             100             951         462         267
> CFS-x2          100             983         484         308
> BFS
> BFS-x2
>
> And unfortunately this is where it ends for now, since BFS doesn't boot
> on the two boxes I tried.

Then why post this in the first place?



* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-07 10:12   ` Nikos Chantziaras
@ 2009-09-07 10:41     ` Jens Axboe
  0 siblings, 0 replies; 216+ messages in thread
From: Jens Axboe @ 2009-09-07 10:41 UTC (permalink / raw)
  To: Nikos Chantziaras; +Cc: linux-kernel

On Mon, Sep 07 2009, Nikos Chantziaras wrote:
> On 09/07/2009 12:49 PM, Jens Axboe wrote:
>> [...]
>> And I have to apologize for using a large system to test this on, I
>> realize it's out of the scope of BFS, but it's just easier to fire one
>> of these beasts up than it is to sacrifice my notebook or desktop
>> machine...
>
> How does a kernel rebuild constitute "sacrifice"?

It's more of a bother since I have to physically be at the notebook,
whereas the server type boxes usually have remote management. The
workstation is the one I'm currently working on, so it'd be very
disruptive to do it there. And as things are apparently very alpha on
the bfs side currently, it's easier to 'sacrifice' an idle test box.
That's the keyword, 'test' boxes. You know, machines used for testing.
Not production machines.

Plus the notebook is using btrfs, whose on-disk format isn't
compatible with 2.6.30.

Is there a point to this question?

>> So it's a 64 thread box. CFS -jX runtime is the baseline at
>> 100, lower number means faster and vice versa. The latency numbers are
>> in msecs.
>>
>>
>> Scheduler       Runtime         Max lat     Avg lat     Std dev
>> ----------------------------------------------------------------
>> CFS             100             951         462         267
>> CFS-x2          100             983         484         308
>> BFS
>> BFS-x2
>>
>> And unfortunately this is where it ends for now, since BFS doesn't boot
>> on the two boxes I tried.
>
> Then who post this in the first place?

You snipped the relevant part of the conclusion, the part where I make a
comment on the cfs latencies.

Don't bother replying to any of my emails if YOU continue writing emails
in this fashion. I have MUCH better things to do than entertain kiddies.
If you do get your act together and want to reply, follow lkml etiquette
and group reply.

-- 
Jens Axboe



* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-07  3:38 ` Nikos Chantziaras
@ 2009-09-07 11:01   ` Frederic Weisbecker
  2009-09-08 18:15     ` Nikos Chantziaras
  2009-09-07 14:40   ` Arjan van de Ven
  2009-09-07 23:54   ` Thomas Fjellstrom
  2 siblings, 1 reply; 216+ messages in thread
From: Frederic Weisbecker @ 2009-09-07 11:01 UTC (permalink / raw)
  To: Nikos Chantziaras; +Cc: linux-kernel, Jens Axboe, Ingo Molnar, Con Kolivas

On Mon, Sep 07, 2009 at 06:38:36AM +0300, Nikos Chantziaras wrote:
> Unfortunately, I can't come up with any way to somehow benchmark all of  
> this.  There's no benchmark for "fluidity" and "responsiveness". Running 
> the Doom 3 benchmark, or any other benchmark, doesn't say anything about 
> responsiveness, it only measures how many frames were calculated in a 
> specific period of time.  How "stable" (with no stalls) those frames were 
> making it to the screen is not measurable.



That actually looks benchmarkable. This is about latency.
For example, you could run high-load tasks in the
background and then launch a task that wakes up at middling/large
intervals to do something. You could measure the time it takes for
it to be woken up and to perform its work.

We have some events tracing infrastructure in the kernel that can
snapshot the wake up and sched switch events.

Having CONFIG_EVENT_TRACING=y should be sufficient for that.

You just need to mount a debugfs point, say in /debug.

Then you can activate these sched events by doing:

echo 0 > /debug/tracing/tracing_on
echo 1 > /debug/tracing/events/sched/sched_switch/enable
echo 1 > /debug/tracing/events/sched/sched_wakeup/enable

#Launch your tasks

echo 1 > /debug/tracing/tracing_on

#Wait for some time

echo 0 > /debug/tracing/tracing_on

That will require some parsing of the result in /debug/tracing/trace
to get the delays between wakeup events and switch-in events
for the task that periodically wakes up, and then producing some
statistics such as the average or the maximum latency.

That's a bit of a rough approach to measure such latencies but that
should work.
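
As a rough illustration of that post-processing step (assumptions:
the default trace output where the timestamp is the third
whitespace-separated column, and a caller-supplied substring - e.g. a
comm or a pid string, depending on your kernel's event format - that
identifies the task):

  /* wakeup-lat.c: delay between a sched_wakeup and the following
   * sched_switch for trace lines containing the given substring.
   * Note: a sched_switch line names both the previous and the next
   * task, so pick a substring that only matches the target task.
   * Usage: ./wakeup-lat '<task substring>' < /debug/tracing/trace
   */
  #include <stdio.h>
  #include <string.h>

  int main(int argc, char **argv)
  {
      char line[1024];
      double ts, t_wake = -1.0, lat, max = 0.0, sum = 0.0;
      long n = 0;

      if (argc < 2) {
          fprintf(stderr, "usage: %s <task substring>\n", argv[0]);
          return 1;
      }
      while (fgets(line, sizeof(line), stdin)) {
          /* third whitespace-separated field is the timestamp */
          if (sscanf(line, "%*s %*s %lf", &ts) != 1)
              continue;
          if (!strstr(line, argv[1]))
              continue;
          if (strstr(line, "sched_wakeup:")) {
              t_wake = ts;
          } else if (strstr(line, "sched_switch:") && t_wake >= 0.0) {
              lat = ts - t_wake;
              sum += lat;
              if (lat > max)
                  max = lat;
              n++;
              t_wake = -1.0;
          }
      }
      if (n)
          printf("samples=%ld avg=%f max=%f (seconds)\n",
                 n, sum / n, max);
      return 0;
  }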


> If BFS would imply small drops in pure performance counted in  
> instructions per seconds, that would be a totally acceptable regression  
> for desktop/multimedia/gaming PCs.  Not for server machines, of course.  
> However, on my machine, BFS is faster in classic workloads.  When I run 
> "make -j2" with BFS and the standard scheduler, BFS always finishes a bit 
> faster.  Not by much, but still.  One thing I'm noticing here is that BFS 
> produces 100% CPU load on each core with "make -j2" while the normal 
> scheduler stays at about 90-95% with -j2 or higher in at least one of the 
> cores.  There seems to be under-utilization of CPU time.



That could also be benchmarked by using the above sched events and
looking at the average time each cpu spends running the idle task.



* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-07  9:49 ` Jens Axboe
  2009-09-07 10:12   ` Nikos Chantziaras
@ 2009-09-07 11:57   ` Jens Axboe
  2009-09-07 14:14     ` Ingo Molnar
  2009-09-07 18:02   ` Avi Kivity
  2009-09-09  7:38   ` Pavel Machek
  3 siblings, 1 reply; 216+ messages in thread
From: Jens Axboe @ 2009-09-07 11:57 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Con Kolivas, linux-kernel, Peter Zijlstra, Mike Galbraith

On Mon, Sep 07 2009, Jens Axboe wrote:
> Scheduler       Runtime         Max lat     Avg lat     Std dev
> ----------------------------------------------------------------
> CFS             100             951         462         267
> CFS-x2          100             983         484         308
> BFS
> BFS-x2

Those numbers are buggy, btw, it's not nearly as bad. But responsiveness
under compile load IS bad though, the test app just didn't quantify it
correctly. I'll see if I can get it working properly.

-- 
Jens Axboe



* [quad core results] BFS vs. mainline scheduler benchmarks and measurements
  2009-09-07  2:05 ` Frans Pop
@ 2009-09-07 12:16   ` Ingo Molnar
  2009-09-07 12:36     ` Stefan Richter
                       ` (2 more replies)
  0 siblings, 3 replies; 216+ messages in thread
From: Ingo Molnar @ 2009-09-07 12:16 UTC (permalink / raw)
  To: Frans Pop; +Cc: kernel, linux-kernel, a.p.zijlstra, efault


* Frans Pop <elendil@planet.nl> wrote:

> Ingo Molnar wrote:
> > So the testbox i picked fits into the upper portion of what i
> > consider a sane range of systems to tune for - and should still fit
> > into BFS's design bracket as well according to your description:
> > it's a dual quad core system with hyperthreading.
> 
> Ingo,
> 
> Nice that you've looked into this.
> 
> Would it be possible for you to run the same tests on e.g. a dual 
> core and/or a UP system (or maybe just offline some CPUs?)? It 
> would be very interesting to see whether BFS does better in the 
> lower portion of the range, or if the differences you show between 
> the two schedulers are consistent across the range.

Sure!

Note that usually we can extrapolate ballpark-figure quad and dual 
socket results from 8 core results. Trends as drastic as the ones 
i reported do not get reversed as one shrinks the number of cores. 

[ This technique is not universal - for example borderline graphs
  cannot be extrapolated down reliably - but the graphs i 
  posted were far from borderline. ]

Con posted single-socket quad comparisons/graphs so to make it 100% 
apples to apples i re-tested with a single-socket (non-NUMA) quad as 
well, and have uploaded the new graphs/results to:

  kernel build performance on quad:
     http://redhat.com/~mingo/misc/bfs-vs-tip-kbuild-quad.jpg

  pipe performance on quad:
     http://redhat.com/~mingo/misc/bfs-vs-tip-pipe-quad.jpg

  messaging performance (hackbench) on quad:
     http://redhat.com/~mingo/misc/bfs-vs-tip-messaging-quad.jpg

  OLTP performance (postgresql + sysbench) on quad:
     http://redhat.com/~mingo/misc/bfs-vs-tip-oltp-quad.jpg

It shows similar curves and behavior to the 8-core results i posted 
- BFS is slower than mainline in virtually every measurement. The 
ratios are different for different parts of the graphs - but the 
trend is similar.

I also re-ran a few standalone kernel latency tests with a single 
quad:

lat_tcp:

  BFS:          TCP latency using localhost: 16.9926 microseconds
  sched-devel:  TCP latency using localhost: 12.4141 microseconds [36.8% faster]

  as a comparison, the 8 core lat_tcp result was:

  BFS:          TCP latency using localhost: 16.5608 microseconds
  sched-devel:  TCP latency using localhost: 13.5528 microseconds [22.1% faster]

lat_pipe quad result:

  BFS:          Pipe latency: 4.6978 microseconds
  sched-devel:  Pipe latency: 2.6860 microseconds [74.8% faster]

  as a comparison, the 8 core lat_pipe result was:

  BFS:          Pipe latency: 4.9703 microseconds
  sched-devel:  Pipe latency: 2.6137 microseconds [90.1% faster]

On the desktop interactivity front, i also still saw that bad 
starvation artifact with BFS with multiple copies of CPU-bound 
pipe-test-1m.c running in parallel:

   http://redhat.com/~mingo/cfs-scheduler/tools/pipe-test-1m.c

Start up a few copies of them like this:

  for ((i=0;i<32;i++)); do ./pipe-test-1m & done

and the quad eventually came to a halt here - until the tasks 
finished running.

I also tested a few key data points on dual core and it shows 
similar trends as well (as expected from the 8 and 4 core results).

But ... i'd really encourage everyone to test these things for 
themselves as well and not just take anyone's word for it. The more 
people provide numbers, the better. The latest BFS patch can be 
found at:

  http://ck.kolivas.org/patches/bfs/

The mainline sched-devel tree can be found at:

  http://people.redhat.com/mingo/tip.git/README

Thanks,

	Ingo


* Re: [quad core results] BFS vs. mainline scheduler benchmarks and measurements
  2009-09-07 12:16   ` [quad core results] " Ingo Molnar
@ 2009-09-07 12:36     ` Stefan Richter
  2009-09-07 13:41     ` Markus Tornqvist
  2009-09-07 15:34     ` Nikos Chantziaras
  2 siblings, 0 replies; 216+ messages in thread
From: Stefan Richter @ 2009-09-07 12:36 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frans Pop, kernel, linux-kernel, a.p.zijlstra, efault, Jens Axboe

Ingo Molnar wrote:
> i'd really encourage everyone to test these things yourself 
> as well and not take anyone's word on this as granted. The more 
> people provide numbers, the better.

Besides mean values from bandwidth and latency focused tests, standard 
deviations or variance, or e.g. 90th percentiles and perhaps maxima of 
latency focused tests might be of interest.  Or graphs with error bars.
-- 
Stefan Richter
-=====-==--= =--= --===
http://arcgraph.de/sr/


* Re: [quad core results] BFS vs. mainline scheduler benchmarks and measurements
  2009-09-07 12:16   ` [quad core results] " Ingo Molnar
  2009-09-07 12:36     ` Stefan Richter
@ 2009-09-07 13:41     ` Markus Tornqvist
  2009-09-07 13:59       ` Ingo Molnar
  2009-09-07 14:45       ` Arjan van de Ven
  2009-09-07 15:34     ` Nikos Chantziaras
  2 siblings, 2 replies; 216+ messages in thread
From: Markus Tornqvist @ 2009-09-07 13:41 UTC (permalink / raw)
  To: Ingo Molnar, mjt; +Cc: Frans Pop, kernel, linux-kernel, a.p.zijlstra, efault

Please Cc me as I'm not a subscriber.

(LKML bounced this message once already for 8-bit headers, I'm retrying
now - sorry if someone gets it twice)

On Mon, Sep 07, 2009 at 02:16:13PM +0200, Ingo Molnar wrote:
>
>Con posted single-socket quad comparisons/graphs so to make it 100% 
>apples to apples i re-tested with a single-socket (non-NUMA) quad as 
>well, and have uploaded the new graphs/results to:
>
>  kernel build performance on quad:
>     http://redhat.com/~mingo/misc/bfs-vs-tip-kbuild-quad.jpg
[...]
>
>It shows similar curves and behavior to the 8-core results i posted 
>- BFS is slower than mainline in virtually every measurement. The 
>ratios are different for different parts of the graphs - but the 
>trend is similar.

Dude, not cool.

1. Quad HT is not the same as a 4-core desktop, you're doing it with 8 cores
2. You just proved BFS is better on the job_count == core_count case, as BFS
   says it is, if you look at the graph
3. You're comparing an old version of BFS against an unreleased dev kernel

Also, you said on http://article.gmane.org/gmane.linux.kernel/886319
"I also tried to configure the kernel in a BFS friendly way, i used 
HZ=1000 as recommended, turned off all debug options, etc. The 
kernel config i used can be found here:
http://redhat.com/~mingo/misc/config
"

Quickly looking at the conf you have
CONFIG_HZ_250=y
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set

CONFIG_ARCH_WANT_FRAME_POINTERS=y
CONFIG_FRAME_POINTER=y

And other DEBUG.

-- 
mjt



* Re: [quad core results] BFS vs. mainline scheduler benchmarks and measurements
  2009-09-07 13:41     ` Markus Tornqvist
@ 2009-09-07 13:59       ` Ingo Molnar
  2009-09-09  5:54         ` Markus Tornqvist
  2009-09-07 14:45       ` Arjan van de Ven
  1 sibling, 1 reply; 216+ messages in thread
From: Ingo Molnar @ 2009-09-07 13:59 UTC (permalink / raw)
  To: Markus Törnqvist
  Cc: Frans Pop, kernel, linux-kernel, a.p.zijlstra, efault


* Markus Törnqvist <mjt@nysv.org> wrote:

> Please Cc me as I'm not a subscriber.
> 
> On Mon, Sep 07, 2009 at 02:16:13PM +0200, Ingo Molnar wrote:
> >
> >Con posted single-socket quad comparisons/graphs so to make it 100% 
> >apples to apples i re-tested with a single-socket (non-NUMA) quad as 
> >well, and have uploaded the new graphs/results to:
> >
> >  kernel build performance on quad:
> >     http://redhat.com/~mingo/misc/bfs-vs-tip-kbuild-quad.jpg
> [...]
> >
> >It shows similar curves and behavior to the 8-core results i posted 
> >- BFS is slower than mainline in virtually every measurement. The 
> >ratios are different for different parts of the graphs - but the 
> >trend is similar.
> 
> Dude, not cool.
> 
> 1. Quad HT is not the same as a 4-core desktop, you're doing it with 8 cores

No, it's 4 cores. HyperThreading gives each core two hardware 
'siblings' (threads), which are not full cores.

> 2. You just proved BFS is better on the job_count == core_count case, as BFS
>    says it is, if you look at the graph

I pointed that out too. I think the graphs speak for themselves:

     http://redhat.com/~mingo/misc/bfs-vs-tip-kbuild-quad.jpg
     http://redhat.com/~mingo/misc/bfs-vs-tip-kbuild.jpg

> 3. You're comparing an old version of BFS against an unreleased dev kernel

bfs-208 was 1 day old (and it is a 500K+ kernel patch) when i tested 
it against the 2 days old sched-devel tree. Btw., i initially 
measured 205 as well and spent one more day on acquiring and 
analyzing the 208 results.

There's bfs-209 out there today. These tests take 8+ hours to 
complete and validate. I'll re-test BFS in the future too, and as i 
said it in the first mail i'll test it on a .31 base as well once 
BFS has been ported to it:

> > It's on a .31-rc8 base while BFS is on a .30 base - will be able 
> > to test BFS on a .31 base as well once you release it. (but it 
> > doesnt matter much to the results - there werent any heavy core 
> > kernel changes impacting these workloads.)

> Also, you said on http://article.gmane.org/gmane.linux.kernel/886319
> "I also tried to configure the kernel in a BFS friendly way, i used 
> HZ=1000 as recommended, turned off all debug options, etc. The 
> kernel config i used can be found here:
> http://redhat.com/~mingo/misc/config
> "
> 
> Quickly looking at the conf you have
> CONFIG_HZ_250=y
> CONFIG_PREEMPT_NONE=y
> # CONFIG_PREEMPT_VOLUNTARY is not set
> # CONFIG_PREEMPT is not set

Indeed. HZ does not seem to matter according to what i see in my 
measurements. Can you measure such sensitivity?

> CONFIG_ARCH_WANT_FRAME_POINTERS=y
> CONFIG_FRAME_POINTER=y
> 
> And other DEBUG.

These are the defaults and they dont make a measurable difference to 
these results. What other debug options do you mean and do they make 
a difference?

	Ingo


* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-07 11:57   ` Jens Axboe
@ 2009-09-07 14:14     ` Ingo Molnar
  2009-09-07 17:38       ` Jens Axboe
  0 siblings, 1 reply; 216+ messages in thread
From: Ingo Molnar @ 2009-09-07 14:14 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Con Kolivas, linux-kernel, Peter Zijlstra, Mike Galbraith


* Jens Axboe <jens.axboe@oracle.com> wrote:

> On Mon, Sep 07 2009, Jens Axboe wrote:
> > Scheduler       Runtime         Max lat     Avg lat     Std dev
> > ----------------------------------------------------------------
> > CFS             100             951         462         267
> > CFS-x2          100             983         484         308
> > BFS
> > BFS-x2
> 
> Those numbers are buggy, btw, it's not nearly as bad. But 
> responsiveness under compile load IS bad though, the test app just 
> didn't quantify it correctly. I'll see if I can get it working 
> properly.

What's the default latency target on your box:

  cat /proc/sys/kernel/sched_latency_ns

?

And yes, it would be wonderful to get a test-app from you that would 
express the kind of pain you are seeing during compile jobs.

	Ingo


* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-07  3:38 ` Nikos Chantziaras
  2009-09-07 11:01   ` Frederic Weisbecker
@ 2009-09-07 14:40   ` Arjan van de Ven
  2009-09-08  7:19     ` Nikos Chantziaras
  2009-09-07 23:54   ` Thomas Fjellstrom
  2 siblings, 1 reply; 216+ messages in thread
From: Arjan van de Ven @ 2009-09-07 14:40 UTC (permalink / raw)
  To: Nikos Chantziaras; +Cc: linux-kernel

On Mon, 07 Sep 2009 06:38:36 +0300
Nikos Chantziaras <realnc@arcor.de> wrote:

> On 09/06/2009 11:59 PM, Ingo Molnar wrote:
> >[...]
> > Also, i'd like to outline that i agree with the general goals
> > described by you in the BFS announcement - small desktop systems
> > matter more than large systems. We find it critically important
> > that the mainline Linux scheduler performs well on those systems
> > too - and if you (or anyone else) can reproduce suboptimal behavior
> > please let the scheduler folks know so that we can fix/improve it.
> 
> BFS improved behavior of many applications on my Intel Core 2 box in
> a way that can't be benchmarked.  Examples:

Have you tried to see if latencytop catches such latencies ?


* Re: [quad core results] BFS vs. mainline scheduler benchmarks and measurements
  2009-09-07 13:41     ` Markus Tornqvist
  2009-09-07 13:59       ` Ingo Molnar
@ 2009-09-07 14:45       ` Arjan van de Ven
  2009-09-07 15:20         ` Frans Pop
  2009-09-07 15:24         ` Xavier Bestel
  1 sibling, 2 replies; 216+ messages in thread
From: Arjan van de Ven @ 2009-09-07 14:45 UTC (permalink / raw)
  To: Markus Tornqvist
  Cc: Ingo Molnar, mjt, Frans Pop, kernel, linux-kernel, a.p.zijlstra, efault

On Mon, 7 Sep 2009 16:41:51 +0300
> >It shows similar curves and behavior to the 8-core results i posted 
> >- BFS is slower than mainline in virtually every measurement. The 
> >ratios are different for different parts of the graphs - but the 
> >trend is similar.
> 
> Dude, not cool.
> 
> 1. Quad HT is not the same as a 4-core desktop, you're doing it with
> 8 cores 

4 cores, 8 threads. Which is basically the standard desktop cpu going
forward... (4 cores already is today, 8 threads is that any day now)



-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org


* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-06 20:59 BFS vs. mainline scheduler benchmarks and measurements Ingo Molnar
                   ` (3 preceding siblings ...)
  2009-09-07  9:49 ` Jens Axboe
@ 2009-09-07 15:16 ` Michael Buesch
  2009-09-07 18:26   ` Ingo Molnar
  2009-09-08 12:57 ` Epic regression in throughput since v2.6.23 Serge Belyshev
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 216+ messages in thread
From: Michael Buesch @ 2009-09-07 15:16 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Con Kolivas, linux-kernel, Peter Zijlstra, Mike Galbraith, Felix Fietkau

Here's a very simple test setup on an embedded singlecore bcm47xx
machine (WL500GPv2). It uses iperf for performance testing. The iperf
server is run on the embedded device. The device is so slow that the
iperf test is completely CPU bound. The network connection is 100MBit
on the device, connected via patch cable to a 1000MBit machine.

The kernel is openwrt-2.6.30.5.

Here are the results:



Mainline CFS scheduler:

mb@homer:~$ iperf -c 192.168.1.1
------------------------------------------------------------
Client connecting to 192.168.1.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.99 port 35793 connected with 192.168.1.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  27.4 MBytes  23.0 Mbits/sec
mb@homer:~$ iperf -c 192.168.1.1
------------------------------------------------------------
Client connecting to 192.168.1.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.99 port 35794 connected with 192.168.1.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  27.3 MBytes  22.9 Mbits/sec
mb@homer:~$ iperf -c 192.168.1.1
------------------------------------------------------------
Client connecting to 192.168.1.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.99 port 56147 connected with 192.168.1.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  27.3 MBytes  22.9 Mbits/sec


BFS scheduler:

mb@homer:~$ iperf -c 192.168.1.1
------------------------------------------------------------
Client connecting to 192.168.1.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.99 port 52489 connected with 192.168.1.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  38.2 MBytes  32.0 Mbits/sec
mb@homer:~$ iperf -c 192.168.1.1
------------------------------------------------------------
Client connecting to 192.168.1.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.99 port 52490 connected with 192.168.1.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  38.1 MBytes  31.9 Mbits/sec
mb@homer:~$ iperf -c 192.168.1.1
------------------------------------------------------------
Client connecting to 192.168.1.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.99 port 52491 connected with 192.168.1.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  38.1 MBytes  31.9 Mbits/sec


-- 
Greetings, Michael.


* Re: [quad core results] BFS vs. mainline scheduler benchmarks and measurements
  2009-09-07 14:45       ` Arjan van de Ven
@ 2009-09-07 15:20         ` Frans Pop
  2009-09-07 15:36           ` Arjan van de Ven
  2009-09-07 15:24         ` Xavier Bestel
  1 sibling, 1 reply; 216+ messages in thread
From: Frans Pop @ 2009-09-07 15:20 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Markus Tornqvist, Ingo Molnar, kernel, linux-kernel,
	a.p.zijlstra, efault

On Monday 07 September 2009, Arjan van de Ven wrote:
> 4 cores, 8 threads. Which is basically the standard desktop cpu going
> forward... (4 cores already is today, 8 threads is that any day now)

Despite that I'm personally more interested in what I have available here 
*now*. And that's various UP Pentium systems, one dual core Pentium D and 
Core Duo.

I've been running BFS on my laptop today while doing CPU intensive jobs 
(not disk intensive), and I must say that BFS does seem very responsive.
OTOH, I've also noticed some surprising things, such as processors 
staying at lower frequencies while doing CPU-intensive work.

It feels like I have fewer of the mouse cursor and typing freezes I'm 
used to with CFS, even when I'm *not* doing anything special. I've been 
blaming those on still running with ordered-mode ext3, but now I'm 
starting to wonder.

I'll try to do more structured testing, comparisons and measurements 
later. At the very least it's nice to have something to compare _with_.

Cheers,
FJP


* Re: [quad core results] BFS vs. mainline scheduler benchmarks and measurements
  2009-09-07 14:45       ` Arjan van de Ven
  2009-09-07 15:20         ` Frans Pop
@ 2009-09-07 15:24         ` Xavier Bestel
  2009-09-07 15:37           ` Arjan van de Ven
  2009-09-07 16:00           ` Diego Calleja
  1 sibling, 2 replies; 216+ messages in thread
From: Xavier Bestel @ 2009-09-07 15:24 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Markus Tornqvist, Ingo Molnar, Frans Pop, kernel, linux-kernel,
	a.p.zijlstra, efault


On Mon, 2009-09-07 at 07:45 -0700, Arjan van de Ven wrote:
> On Mon, 7 Sep 2009 16:41:51 +0300
> > >It shows similar curves and behavior to the 8-core results i posted 
> > >- BFS is slower than mainline in virtually every measurement. The 
> > >ratios are different for different parts of the graphs - but the 
> > >trend is similar.
> > 
> > Dude, not cool.
> > 
> > 1. Quad HT is not the same as a 4-core desktop, you're doing it with
> > 8 cores 
> 
> 4 cores, 8 threads. Which is basically the standard desktop cpu going
> forward... (4 cores already is today, 8 threads is that any day now)

Except on your typical smartphone, which will run linux and probably
vastly outnumber the number of "traditional" linux desktops.

	Xav





* Re: [quad core results] BFS vs. mainline scheduler benchmarks and measurements
  2009-09-07 12:16   ` [quad core results] " Ingo Molnar
  2009-09-07 12:36     ` Stefan Richter
  2009-09-07 13:41     ` Markus Tornqvist
@ 2009-09-07 15:34     ` Nikos Chantziaras
  2 siblings, 0 replies; 216+ messages in thread
From: Nikos Chantziaras @ 2009-09-07 15:34 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Frans Pop, kernel, linux-kernel, a.p.zijlstra, efault

On 09/07/2009 03:16 PM, Ingo Molnar wrote:
> [...]
> Note that usually we can extrapolate ballpark-figure quad and dual
> socket results from 8 core results. Trends as drastic as the ones
> i reported do not get reversed as one shrinks the number of cores.
>
> Con posted single-socket quad comparisons/graphs so to make it 100%
> apples to apples i re-tested with a single-socket (non-NUMA) quad as
> well, and have uploaded the new graphs/results to:
>
>    kernel build performance on quad:
>       http://redhat.com/~mingo/misc/bfs-vs-tip-kbuild-quad.jpg
>
>    pipe performance on quad:
>       http://redhat.com/~mingo/misc/bfs-vs-tip-pipe-quad.jpg
>
>    messaging performance (hackbench) on quad:
>       http://redhat.com/~mingo/misc/bfs-vs-tip-messaging-quad.jpg
>
>    OLTP performance (postgresql + sysbench) on quad:
>       http://redhat.com/~mingo/misc/bfs-vs-tip-oltp-quad.jpg
>
> It shows similar curves and behavior to the 8-core results i posted
> - BFS is slower than mainline in virtually every measurement.

Numbers aside, what's your *experience* with BFS when it comes to 
composited desktops + games + multimedia apps?  (Watching high 
definition videos, playing a recent 3D game, etc.)  I described the 
exact problems experienced with mainline in a previous reply.

Are you even using that stuff actually?  Because it would be hard to 
tell if your desktop consists mainly of Emacs and an xterm; you even 
seem to be using Mutt so I suspect your desktop probably doesn't look 
very Windows Vista/OS X/Compiz-like.  Usually, with "multimedia desktop 
PC" one doesn't mean:

   http://foss.math.aegean.gr/~realnc/pics/desktop2.png

but rather:

   http://foss.math.aegean.gr/~realnc/pics/desktop1.png

BFS probably wouldn't offer the former anything, while on the latter it 
does make a difference.  If your usage of the "desktop" bears a 
resemblance to the first example, I'd say you might not be the most 
qualified person to judge the "Linux desktop experience."  That is 
not meant to be offensive or patronizing, just an observation, and I 
might even be totally wrong about it.


* Re: [quad core results] BFS vs. mainline scheduler benchmarks and measurements
  2009-09-07 15:20         ` Frans Pop
@ 2009-09-07 15:36           ` Arjan van de Ven
  2009-09-07 15:47             ` Frans Pop
  0 siblings, 1 reply; 216+ messages in thread
From: Arjan van de Ven @ 2009-09-07 15:36 UTC (permalink / raw)
  To: Frans Pop
  Cc: Markus Tornqvist, Ingo Molnar, kernel, linux-kernel,
	a.p.zijlstra, efault

On Mon, 7 Sep 2009 17:20:33 +0200
Frans Pop <elendil@planet.nl> wrote:

> On Monday 07 September 2009, Arjan van de Ven wrote:
> > 4 cores, 8 threads. Which is basically the standard desktop cpu
> > going forward... (4 cores already is today, 8 threads is that any
> > day now)
> 
> Despite that I'm personally more interested in what I have available
> here *now*. And that's various UP Pentium systems, one dual core
> Pentium D and Core Duo.
> 
> I've been running BFS on my laptop today while doing CPU intensive
> jobs (not disk intensive), and I must say that BFS does seem very
> responsive. OTOH, I've also noticed some surprising things, such as
> processors staying on lower frequencies while doing CPU-intensive
> work.
> 
> I feels like I have less of the mouse cursor and typing freezes I'm
> used to with CFS, even when I'm *not* doing anything special. I've
> been blaming those on still running with ordered mode ext3, but now
> I'm starting to wonder.
> 
> I'll try to do more structured testing, comparisons and measurements 
> later. At the very least it's nice to have something to compare
> _with_.
> 

it's a shameless plug since I wrote it, but latencytop will be able to
tell you what your bottleneck is...
and that is very interesting to know, regardless of the "what scheduler
code" discussion;

-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org


* Re: [quad core results] BFS vs. mainline scheduler benchmarks and measurements
  2009-09-07 15:24         ` Xavier Bestel
@ 2009-09-07 15:37           ` Arjan van de Ven
  2009-09-07 16:00           ` Diego Calleja
  1 sibling, 0 replies; 216+ messages in thread
From: Arjan van de Ven @ 2009-09-07 15:37 UTC (permalink / raw)
  To: Xavier Bestel
  Cc: Markus Tornqvist, Ingo Molnar, Frans Pop, kernel, linux-kernel,
	a.p.zijlstra, efault

On Mon, 07 Sep 2009 17:24:29 +0200
Xavier Bestel <xavier.bestel@free.fr> wrote:

> 
> On Mon, 2009-09-07 at 07:45 -0700, Arjan van de Ven wrote:
> > On Mon, 7 Sep 2009 16:41:51 +0300
> > > >It shows similar curves and behavior to the 8-core results i
> > > >posted 
> > > >- BFS is slower than mainline in virtually every measurement.
> > > >The ratios are different for different parts of the graphs - but
> > > >the trend is similar.
> > > 
> > > Dude, not cool.
> > > 
> > > 1. Quad HT is not the same as a 4-core desktop, you're doing it
> > > with 8 cores 
> > 
> > 4 cores, 8 threads. Which is basically the standard desktop cpu
> > going forward... (4 cores already is today, 8 threads is that any
> > day now)
> 
> Except on your typical smartphone, which will run linux and probably
> vastly outnumber the number of "traditional" linux desktops.

yeah the trend in cellphones is only quad core without HT, not quad
core WITH ht ;-)



-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org


* Re: [quad core results] BFS vs. mainline scheduler benchmarks and measurements
  2009-09-07 15:36           ` Arjan van de Ven
@ 2009-09-07 15:47             ` Frans Pop
  0 siblings, 0 replies; 216+ messages in thread
From: Frans Pop @ 2009-09-07 15:47 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Markus Tornqvist, Ingo Molnar, kernel, linux-kernel,
	a.p.zijlstra, efault

On Monday 07 September 2009, Arjan van de Ven wrote:
> it's a shameless plug since I wrote it, but latencytop will be able to
> tell you what your bottleneck is...
> and that is very interesting to know, regardless of the "what scheduler
> code" discussion;

I'm very much aware of that and I've tried pinning it down a few times, 
but failed to come up with anything conclusive. I plan to make a new 
effort in this context as the freezes have increasingly been annoying me.

Unfortunately latencytop only shows a blank screen when used with BFS, but 
I guess that's not totally unexpected.

Cheers,
FJP


* Re: [quad core results] BFS vs. mainline scheduler benchmarks and measurements
  2009-09-07 15:24         ` Xavier Bestel
  2009-09-07 15:37           ` Arjan van de Ven
@ 2009-09-07 16:00           ` Diego Calleja
  1 sibling, 0 replies; 216+ messages in thread
From: Diego Calleja @ 2009-09-07 16:00 UTC (permalink / raw)
  To: Xavier Bestel
  Cc: Arjan van de Ven, Markus Tornqvist, Ingo Molnar, Frans Pop,
	kernel, linux-kernel, a.p.zijlstra, efault

On Monday 07 September 2009 17:24:29 Xavier Bestel wrote:
> Except on your typical smartphone, which will run linux and probably
> vastly outnumber the number of "traditional" linux desktops.

Smartphones will probably start using ARM dual-core cpus next year;
the embedded land is no longer SMP-free.


* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-07 14:14     ` Ingo Molnar
@ 2009-09-07 17:38       ` Jens Axboe
  2009-09-07 20:44         ` Jens Axboe
  0 siblings, 1 reply; 216+ messages in thread
From: Jens Axboe @ 2009-09-07 17:38 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Con Kolivas, linux-kernel, Peter Zijlstra, Mike Galbraith

On Mon, Sep 07 2009, Ingo Molnar wrote:
> 
> * Jens Axboe <jens.axboe@oracle.com> wrote:
> 
> > On Mon, Sep 07 2009, Jens Axboe wrote:
> > > Scheduler       Runtime         Max lat     Avg lat     Std dev
> > > ----------------------------------------------------------------
> > > CFS             100             951         462         267
> > > CFS-x2          100             983         484         308
> > > BFS
> > > BFS-x2
> > 
> > Those numbers are buggy, btw, it's not nearly as bad. But 
> > responsiveness under compile load IS bad though, the test app just 
> > didn't quantify it correctly. I'll see if I can get it working 
> > properly.
> 
> What's the default latency target on your box:
> 
>   cat /proc/sys/kernel/sched_latency_ns
> 
> ?

It's off right now, but it is set to whatever is the default. I don't
touch it.
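
A minimal sketch of reading the current target back - this assumes the
tunable is exposed at /proc/sys/kernel/sched_latency_ns (i.e. a
CONFIG_SCHED_DEBUG kernel); lowering it as root is the usual
interactivity experiment:

#include <stdio.h>

int main(void)
{
	unsigned long long ns;
	FILE *f = fopen("/proc/sys/kernel/sched_latency_ns", "r");

	if (!f || fscanf(f, "%llu", &ns) != 1) {
		perror("sched_latency_ns");
		return 1;
	}
	printf("latency target: %llu ns (%.1f ms)\n", ns, ns / 1e6);
	return 0;
}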

> And yes, it would be wonderful to get a test-app from you that would 
> express the kind of pain you are seeing during compile jobs.

I was hoping this one would, but it's not showing anything. I even added
support for doing the ping and wakeup over a socket, to see if the pipe
test was doing well because of the sync wakeup we do there. The net
latency is a little worse, but still good. So no luck in making that app
so far.
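
A minimal sketch of that socket variant (not the actual test app, just
a single timestamp bounce over an AF_UNIX socketpair, for comparison
against the pipe-based sync wakeup):

#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
	struct timespec t1, t2, echo;
	int sv[2];
	pid_t pid;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv))
		return 1;

	pid = fork();
	if (pid == 0) {
		/* child: block until the ping arrives, then echo it back */
		if (read(sv[1], &echo, sizeof(echo)) == sizeof(echo))
			write(sv[1], &echo, sizeof(echo));
		return 0;
	}

	clock_gettime(CLOCK_MONOTONIC, &t1);
	write(sv[0], &t1, sizeof(t1));		/* ping the child */
	read(sv[0], &echo, sizeof(echo));	/* wait to be woken by the echo */
	clock_gettime(CLOCK_MONOTONIC, &t2);

	printf("round trip: %ld usec\n",
	       (t2.tv_sec - t1.tv_sec) * 1000000L +
	       (t2.tv_nsec - t1.tv_nsec) / 1000L);
	waitpid(pid, NULL, 0);
	return 0;
}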

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-07  9:49 ` Jens Axboe
  2009-09-07 10:12   ` Nikos Chantziaras
  2009-09-07 11:57   ` Jens Axboe
@ 2009-09-07 18:02   ` Avi Kivity
  2009-09-07 18:46     ` Jens Axboe
  2009-09-09  7:38   ` Pavel Machek
  3 siblings, 1 reply; 216+ messages in thread
From: Avi Kivity @ 2009-09-07 18:02 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Ingo Molnar, Con Kolivas, linux-kernel, Peter Zijlstra, Mike Galbraith

On 09/07/2009 12:49 PM, Jens Axboe wrote:
>
> I ran a simple test as well, since I was curious to see how it performed
> wrt interactiveness. One of my pet peeves with the current scheduler is
> that I have to nice compile jobs, or my X experience is just awful while
> the compile is running.
>    

I think the problem is that CFS is optimizing for the wrong thing.  It's 
trying to be fair to tasks, but these are meaningless building blocks of 
jobs, which is what the user sees and measures.  Your make -j128 
dominates your interactive task by two orders of magnitude.  If the 
scheduler attempts to bridge this gap using heuristics, it will fail 
badly when it misdetects since it will starve the really important 
100-thread job for a task that was misdetected as interactive.

I think that bash (and the GUI shell) should put any new job (for bash, 
a pipeline; for the GUI, an application launch from the menu) in a 
scheduling group of its own.  This way it will have equal weight in the 
scheduler's eyes with interactive tasks; one will not dominate the 
other.  Of course if the cpu is free the compile job is welcome to use 
all 128 threads.

(similarly, different login sessions should be placed in different jobs 
to keep a heavily multithreaded screensaver from overwhelming ed).
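
A rough sketch of what that per-job grouping could look like from
userspace - it assumes the cpu cgroup controller is mounted at
/sys/fs/cgroup/cpu (the mount point, group name and program name are
illustrative only); a shell would run something like this in the child
before exec'ing each new pipeline:

#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a fresh group under the cpu controller and move the calling
 * task into it.  The mount point is an assumption - adjust to wherever
 * the cpu cgroup is actually mounted.
 */
static int enter_new_sched_group(const char *name)
{
	char path[256];
	FILE *f;

	snprintf(path, sizeof(path), "/sys/fs/cgroup/cpu/%s", name);
	if (mkdir(path, 0755) && errno != EEXIST)
		return -1;

	snprintf(path, sizeof(path), "/sys/fs/cgroup/cpu/%s/tasks", name);
	f = fopen(path, "w");
	if (!f)
		return -1;
	fprintf(f, "%d\n", (int) getpid());
	fclose(f);
	return 0;
}

int main(int argc, char *argv[])
{
	/* usage (illustrative): ./newgroup job-1234 make -j128 */
	if (argc < 3 || enter_new_sched_group(argv[1]))
		return 1;
	execvp(argv[2], &argv[2]);
	return 1;
}

Whether the shell, the GUI or the kernel itself does the grouping is a
separate design question; the effect on scheduler weighting is the same.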

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-07  3:50 ` Con Kolivas
@ 2009-09-07 18:20   ` Jerome Glisse
  0 siblings, 0 replies; 216+ messages in thread
From: Jerome Glisse @ 2009-09-07 18:20 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Ingo Molnar, linux-kernel, Peter Zijlstra, Mike Galbraith

On Mon, 2009-09-07 at 13:50 +1000, Con Kolivas wrote:

> /me checks on his distributed computing client's progress, fires up
> his next H264 encode, changes music tracks and prepares to have his
> arse whooped on quakelive.
> --

For such computer usage I would strongly suggest looking into GPU
driver development instead - there is a lot of performance to be won
in that area, and it covers exactly what you are doing: games ->
OpenGL (so GPU); H264 (encoding is harder to accelerate with a GPU,
but for decoding and displaying it you definitely want to involve the
GPU); and tons of other things you do on a Linux desktop would go
faster if the GPU were put to more use. A wild guess is that better
GPU drivers could buy a two- or even three-figure percentage
improvement. My point is that I don't think a Linux scheduler
improvement (compared to what we have now) will give a significant
boost to the Linux desktop, whereas even a slight improvement to the
GPU driver stack can. Put another way, there is no point in
prioritizing X or a desktop app if the CPU has to do all the drawing
by itself (the CPU is several orders of magnitude slower than the GPU
at that kind of task).

Regards,
Jerome Glisse


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-07 15:16 ` BFS vs. mainline scheduler benchmarks and measurements Michael Buesch
@ 2009-09-07 18:26   ` Ingo Molnar
  2009-09-07 18:47     ` Daniel Walker
                       ` (2 more replies)
  0 siblings, 3 replies; 216+ messages in thread
From: Ingo Molnar @ 2009-09-07 18:26 UTC (permalink / raw)
  To: Michael Buesch
  Cc: Con Kolivas, linux-kernel, Peter Zijlstra, Mike Galbraith, Felix Fietkau


* Michael Buesch <mb@bu3sch.de> wrote:

> Here's a very simple test setup on an embedded singlecore bcm47xx 
> machine (WL500GPv2) It uses iperf for performance testing. The 
> iperf server is run on the embedded device. The device is so slow 
> that the iperf test is completely CPU bound. The network 
> connection is a 100MBit on the device connected via patch cable to 
> a 1000MBit machine.
> 
> The kernel is openwrt-2.6.30.5.
> 
> Here are the results:
> 
> 
> 
> Mainline CFS scheduler:
> 
> mb@homer:~$ iperf -c 192.168.1.1
> ------------------------------------------------------------
> Client connecting to 192.168.1.1, TCP port 5001
> TCP window size: 16.0 KByte (default)
> ------------------------------------------------------------
> [  3] local 192.168.1.99 port 35793 connected with 192.168.1.1 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0-10.0 sec  27.4 MBytes  23.0 Mbits/sec
> mb@homer:~$ iperf -c 192.168.1.1
> ------------------------------------------------------------
> Client connecting to 192.168.1.1, TCP port 5001
> TCP window size: 16.0 KByte (default)
> ------------------------------------------------------------
> [  3] local 192.168.1.99 port 35794 connected with 192.168.1.1 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0-10.0 sec  27.3 MBytes  22.9 Mbits/sec
> mb@homer:~$ iperf -c 192.168.1.1
> ------------------------------------------------------------
> Client connecting to 192.168.1.1, TCP port 5001
> TCP window size: 16.0 KByte (default)
> ------------------------------------------------------------
> [  3] local 192.168.1.99 port 56147 connected with 192.168.1.1 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0-10.0 sec  27.3 MBytes  22.9 Mbits/sec
> 
> 
> BFS scheduler:
> 
> mb@homer:~$ iperf -c 192.168.1.1
> ------------------------------------------------------------
> Client connecting to 192.168.1.1, TCP port 5001
> TCP window size: 16.0 KByte (default)
> ------------------------------------------------------------
> [  3] local 192.168.1.99 port 52489 connected with 192.168.1.1 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0-10.0 sec  38.2 MBytes  32.0 Mbits/sec
> mb@homer:~$ iperf -c 192.168.1.1
> ------------------------------------------------------------
> Client connecting to 192.168.1.1, TCP port 5001
> TCP window size: 16.0 KByte (default)
> ------------------------------------------------------------
> [  3] local 192.168.1.99 port 52490 connected with 192.168.1.1 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0-10.0 sec  38.1 MBytes  31.9 Mbits/sec
> mb@homer:~$ iperf -c 192.168.1.1
> ------------------------------------------------------------
> Client connecting to 192.168.1.1, TCP port 5001
> TCP window size: 16.0 KByte (default)
> ------------------------------------------------------------
> [  3] local 192.168.1.99 port 52491 connected with 192.168.1.1 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0-10.0 sec  38.1 MBytes  31.9 Mbits/sec

That's interesting. I tried to reproduce it on x86, but the profile 
does not show any scheduler overhead at all on the server:

$ perf report

#
# Samples: 8369
#
# Overhead  Symbol
# ........  ......
#
     9.20%  [k] copy_user_generic_string
     3.80%  [k] e1000_clean
     3.58%  [k] ipt_do_table
     2.72%  [k] mwait_idle
     2.68%  [k] nf_iterate
     2.28%  [k] e1000_intr
     2.15%  [k] tcp_packet
     2.10%  [k] __hash_conntrack
     1.59%  [k] read_tsc
     1.52%  [k] _local_bh_enable_ip
     1.34%  [k] eth_type_trans
     1.29%  [k] __alloc_skb
     1.19%  [k] tcp_recvmsg
     1.19%  [k] ip_rcv
     1.17%  [k] e1000_clean_rx_irq
     1.12%  [k] apic_timer_interrupt
     0.99%  [k] vsnprintf
     0.96%  [k] nf_conntrack_in
     0.96%  [k] kmem_cache_free
     0.93%  [k] __kmalloc_track_caller


Could you profile it please? Also, what's the context-switch rate?

Below is the call-graph profile as well - all the overhead is in 
networking and SLAB.

	Ingo

 $ perf report --call-graph fractal,5

#
# Samples: 8947
#
# Overhead         Command                  Shared Object  Symbol
# ........  ..............  .............................  ......
#
     9.06%           iperf  [kernel]                       [k] copy_user_generic_string
                |          
                |--98.89%-- skb_copy_datagram_iovec
                |          |          
                |          |--77.18%-- tcp_recvmsg
                |          |          sock_common_recvmsg
                |          |          __sock_recvmsg
                |          |          sock_recvmsg
                |          |          sys_recvfrom
                |          |          system_call_fastpath
                |          |          __recv
                |          |          
                |           --22.82%-- tcp_rcv_established
                |                     tcp_v4_do_rcv
                |                     tcp_prequeue_process
                |                     tcp_recvmsg
                |                     sock_common_recvmsg
                |                     __sock_recvmsg
                |                     sock_recvmsg
                |                     sys_recvfrom
                |                     system_call_fastpath
                |                     __recv
                |          
                 --1.11%-- system_call_fastpath
                           __GI___libc_nanosleep

     3.62%          [init]  [kernel]                       [k] e1000_clean
     2.96%          [init]  [kernel]                       [k] ipt_do_table
     2.79%          [init]  [kernel]                       [k] mwait_idle
     2.22%          [init]  [kernel]                       [k] e1000_intr
     1.93%          [init]  [kernel]                       [k] nf_iterate
     1.65%          [init]  [kernel]                       [k] __hash_conntrack
     1.52%          [init]  [kernel]                       [k] tcp_packet
     1.29%          [init]  [kernel]                       [k] ip_rcv
     1.18%          [init]  [kernel]                       [k] __alloc_skb
     1.15%           iperf  [kernel]                       [k] tcp_recvmsg

     1.04%          [init]  [kernel]                       [k] _local_bh_enable_ip
     1.02%          [init]  [kernel]                       [k] apic_timer_interrupt
     1.02%          [init]  [kernel]                       [k] eth_type_trans
     1.01%          [init]  [kernel]                       [k] tcp_v4_rcv
     0.96%           iperf  [kernel]                       [k] kfree
                |          
                |--95.35%-- skb_release_data
                |          __kfree_skb
                |          |          
                |          |--79.27%-- tcp_recvmsg
                |          |          sock_common_recvmsg
                |          |          __sock_recvmsg
                |          |          sock_recvmsg
                |          |          sys_recvfrom
                |          |          system_call_fastpath
                |          |          __recv
                |          |          
                |           --20.73%-- tcp_rcv_established
                |                     tcp_v4_do_rcv
                |                     tcp_prequeue_process
                |                     tcp_recvmsg
                |                     sock_common_recvmsg
                |                     __sock_recvmsg
                |                     sock_recvmsg
                |                     sys_recvfrom
                |                     system_call_fastpath
                |                     __recv
                |          
                 --4.65%-- __kfree_skb
                           |          
                           |--75.00%-- tcp_rcv_established
                           |          tcp_v4_do_rcv
                           |          tcp_prequeue_process
                           |          tcp_recvmsg
                           |          sock_common_recvmsg
                           |          __sock_recvmsg
                           |          sock_recvmsg
                           |          sys_recvfrom
                           |          system_call_fastpath
                           |          __recv
                           |          
                            --25.00%-- tcp_recvmsg
                                      sock_common_recvmsg
                                      __sock_recvmsg
                                      sock_recvmsg
                                      sys_recvfrom
                                      system_call_fastpath
                                      __recv

     0.96%          [init]  [kernel]                       [k] read_tsc
     0.92%           iperf  [kernel]                       [k] tcp_v4_do_rcv
                |          
                |--95.12%-- tcp_prequeue_process
                |          tcp_recvmsg
                |          sock_common_recvmsg
                |          __sock_recvmsg
                |          sock_recvmsg
                |          sys_recvfrom
                |          system_call_fastpath
                |          __recv
                |          
                 --4.88%-- tcp_recvmsg
                           sock_common_recvmsg
                           __sock_recvmsg
                           sock_recvmsg
                           sys_recvfrom
                           system_call_fastpath
                           __recv

     0.92%          [init]  [kernel]                       [k] e1000_clean_rx_irq
     0.86%           iperf  [kernel]                       [k] tcp_rcv_established
                |          
                |--96.10%-- tcp_v4_do_rcv
                |          tcp_prequeue_process
                |          tcp_recvmsg
                |          sock_common_recvmsg
                |          __sock_recvmsg
                |          sock_recvmsg
                |          sys_recvfrom
                |          system_call_fastpath
                |          __recv
                |          
                 --3.90%-- tcp_prequeue_process
                           tcp_recvmsg
                           sock_common_recvmsg
                           __sock_recvmsg
                           sock_recvmsg
                           sys_recvfrom
                           system_call_fastpath
                           __recv

     0.84%           iperf  [kernel]                       [k] kmem_cache_free
                |          
                |--93.33%-- __kfree_skb
                |          |          
                |          |--71.43%-- tcp_recvmsg
                |          |          sock_common_recvmsg
                |          |          __sock_recvmsg
                |          |          sock_recvmsg
                |          |          sys_recvfrom
                |          |          system_call_fastpath
                |          |          __recv
                |          |          
                |           --28.57%-- tcp_rcv_established
                |                     tcp_v4_do_rcv
                |                     tcp_prequeue_process
                |                     tcp_recvmsg
                |                     sock_common_recvmsg
                |                     __sock_recvmsg
                |                     sock_recvmsg
                |                     sys_recvfrom
                |                     system_call_fastpath
                |                     __recv
                |          
                |--4.00%-- tcp_recvmsg
                |          sock_common_recvmsg
                |          __sock_recvmsg
                |          sock_recvmsg
                |          sys_recvfrom
                |          system_call_fastpath
                |          __recv
                |          
                 --2.67%-- tcp_rcv_established
                           tcp_v4_do_rcv
                           tcp_prequeue_process
                           tcp_recvmsg
                           sock_common_recvmsg
                           __sock_recvmsg
                           sock_recvmsg
                           sys_recvfrom
                           system_call_fastpath
                           __recv

     0.80%          [init]  [kernel]                       [k] netif_receive_skb
     0.79%           iperf  [kernel]                       [k] tcp_event_data_recv
                |          
                |--83.10%-- tcp_rcv_established
                |          tcp_v4_do_rcv
                |          tcp_prequeue_process
                |          tcp_recvmsg
                |          sock_common_recvmsg
                |          __sock_recvmsg
                |          sock_recvmsg
                |          sys_recvfrom
                |          system_call_fastpath
                |          __recv
                |          
                |--12.68%-- tcp_v4_do_rcv
                |          tcp_prequeue_process
                |          tcp_recvmsg
                |          sock_common_recvmsg
                |          __sock_recvmsg
                |          sock_recvmsg
                |          sys_recvfrom
                |          system_call_fastpath
                |          __recv
                |          
                 --4.23%-- tcp_data_queue
                           tcp_rcv_established
                           tcp_v4_do_rcv
                           tcp_prequeue_process
                           tcp_recvmsg
                           sock_common_recvmsg
                           __sock_recvmsg
                           sock_recvmsg
                           sys_recvfrom
                           system_call_fastpath
                           __recv

     0.67%            perf  [kernel]                       [k] format_decode
                |          
                |--91.67%-- vsnprintf
                |          seq_printf
                |          |          
                |          |--67.27%-- show_map_vma
                |          |          show_map
                |          |          seq_read
                |          |          vfs_read
                |          |          sys_read
                |          |          system_call_fastpath
                |          |          __GI_read
                |          |          
                |          |--23.64%-- render_sigset_t
                |          |          proc_pid_status
                |          |          proc_single_show
                |          |          seq_read
                |          |          vfs_read
                |          |          sys_read
                |          |          system_call_fastpath
                |          |          __GI_read
                |          |          
                |          |--7.27%-- proc_pid_status
                |          |          proc_single_show
                |          |          seq_read
                |          |          vfs_read
                |          |          sys_read
                |          |          system_call_fastpath
                |          |          __GI_read
                |          |          
                |           --1.82%-- cpuset_task_status_allowed
                |                     proc_pid_status
                |                     proc_single_show
                |                     seq_read
                |                     vfs_read
                |                     sys_read
                |                     system_call_fastpath
                |                     __GI_read
                |          
                 --8.33%-- seq_printf
                           |          
                           |--60.00%-- proc_pid_status
                           |          proc_single_show
                           |          seq_read
                           |          vfs_read
                           |          sys_read
                           |          system_call_fastpath
                           |          __GI_read
                           |          
                            --40.00%-- show_map_vma
                                      show_map
                                      seq_read
                                      vfs_read
                                      sys_read
                                      system_call_fastpath
                                      __GI_read

     0.65%          [init]  [kernel]                       [k] __kmalloc_track_caller
     0.63%          [init]  [kernel]                       [k] nf_conntrack_in
     0.63%          [init]  [kernel]                       [k] ip_route_input
     0.58%            perf  [kernel]                       [k] vsnprintf
                |          
                |--98.08%-- seq_printf
                |          |          
                |          |--60.78%-- show_map_vma
                |          |          show_map
                |          |          seq_read
                |          |          vfs_read
                |          |          sys_read
                |          |          system_call_fastpath
                |          |          __GI_read
                |          |          
                |          |--19.61%-- render_sigset_t
                |          |          proc_pid_status
                |          |          proc_single_show
                |          |          seq_read
                |          |          vfs_read
                |          |          sys_read
                |          |          system_call_fastpath
                |          |          __GI_read
                |          |          
                |          |--9.80%-- proc_pid_status
                |          |          proc_single_show
                |          |          seq_read
                |          |          vfs_read
                |          |          sys_read
                |          |          system_call_fastpath
                |          |          __GI_read
                |          |          
                |          |--3.92%-- task_mem
                |          |          proc_pid_status
                |          |          proc_single_show
                |          |          seq_read
                |          |          vfs_read
                |          |          sys_read
                |          |          system_call_fastpath
                |          |          __GI_read
                |          |          
                |          |--3.92%-- cpuset_task_status_allowed
                |          |          proc_pid_status
                |          |          proc_single_show
                |          |          seq_read
                |          |          vfs_read
                |          |          sys_read
                |          |          system_call_fastpath
                |          |          __GI_read
                |          |          
                |           --1.96%-- render_cap_t
                |                     proc_pid_status
                |                     proc_single_show
                |                     seq_read
                |                     vfs_read
                |                     sys_read
                |                     system_call_fastpath
                |                     __GI_read
                |          
                 --1.92%-- snprintf
                           proc_task_readdir
                           vfs_readdir
                           sys_getdents
                           system_call_fastpath
                           __getdents64
                           0x69706565000a3430

     0.57%          [init]  [kernel]                       [k] ktime_get
     0.57%          [init]  [kernel]                       [k] nf_nat_fn
     0.56%           iperf  [kernel]                       [k] tcp_packet
                |          
                |--68.00%-- __tcp_ack_snd_check
                |          tcp_rcv_established
                |          tcp_v4_do_rcv
                |          tcp_prequeue_process
                |          tcp_recvmsg
                |          sock_common_recvmsg
                |          __sock_recvmsg
                |          sock_recvmsg
                |          sys_recvfrom
                |          system_call_fastpath
                |          __recv
                |          
                 --32.00%-- tcp_cleanup_rbuf
                           tcp_recvmsg
                           sock_common_recvmsg
                           __sock_recvmsg
                           sock_recvmsg
                           sys_recvfrom
                           system_call_fastpath
                           __recv

     0.56%           iperf  /usr/bin/iperf                 [.] 0x000000000059f8
                |          
                |--8.00%-- 0x4059f8
                |          
                |--8.00%-- 0x405a16
                |          
                |--8.00%-- 0x4059fd
                |          
                |--4.00%-- 0x409d22
                |          
                |--4.00%-- 0x405871
                |          
                |--4.00%-- 0x406ee1
                |          
                |--4.00%-- 0x405726
                |          
                |--4.00%-- 0x4058db
                |          
                |--4.00%-- 0x406ee8
                |          
                |--2.00%-- 0x405b60
                |          
                |--2.00%-- 0x4058fd
                |          
                |--2.00%-- 0x4058d5
                |          
                |--2.00%-- 0x405490
                |          
                |--2.00%-- 0x4058bb
                |          
                |--2.00%-- 0x405b93
                |          
                |--2.00%-- 0x405b8e
                |          
                |--2.00%-- 0x405903
                |          
                |--2.00%-- 0x405ba8
                |          
                |--2.00%-- 0x406eae
                |          
                |--2.00%-- 0x405545
                |          
                |--2.00%-- 0x405870
                |          
                |--2.00%-- 0x405b67
                |          
                |--2.00%-- 0x4058ce
                |          
                |--2.00%-- 0x40570e
                |          
                |--2.00%-- 0x406ee4
                |          
                |--2.00%-- 0x405a02
                |          
                |--2.00%-- 0x406eec
                |          
                |--2.00%-- 0x405b82
                |          
                |--2.00%-- 0x40556a
                |          
                |--2.00%-- 0x405755
                |          
                |--2.00%-- 0x405a0a
                |          
                |--2.00%-- 0x405498
                |          
                |--2.00%-- 0x409d20
                |          
                |--2.00%-- 0x405b21
                |          
                 --2.00%-- 0x405a2c

     0.56%          [init]  [kernel]                       [k] kmem_cache_alloc
     0.56%          [init]  [kernel]                       [k] __inet_lookup_established
     0.55%            perf  [kernel]                       [k] number
                |          
                |--95.92%-- vsnprintf
                |          |          
                |          |--97.87%-- seq_printf
                |          |          |          
                |          |          |--56.52%-- show_map_vma
                |          |          |          show_map
                |          |          |          seq_read
                |          |          |          vfs_read
                |          |          |          sys_read
                |          |          |          system_call_fastpath
                |          |          |          __GI_read
                |          |          |          
                |          |          |--28.26%-- render_sigset_t
                |          |          |          proc_pid_status
                |          |          |          proc_single_show
                |          |          |          seq_read
                |          |          |          vfs_read
                |          |          |          sys_read
                |          |          |          system_call_fastpath
                |          |          |          __GI_read
                |          |          |          
                |          |          |--6.52%-- proc_pid_status
                |          |          |          proc_single_show
                |          |          |          seq_read
                |          |          |          vfs_read
                |          |          |          sys_read
                |          |          |          system_call_fastpath
                |          |          |          __GI_read
                |          |          |          
                |          |          |--4.35%-- render_cap_t
                |          |          |          proc_pid_status
                |          |          |          proc_single_show
                |          |          |          seq_read
                |          |          |          vfs_read
                |          |          |          sys_read
                |          |          |          system_call_fastpath
                |          |          |          __GI_read
                |          |          |          
                |          |           --4.35%-- task_mem
                |          |                     proc_pid_status
                |          |                     proc_single_show
                |          |                     seq_read
                |          |                     vfs_read
                |          |                     sys_read
                |          |                     system_call_fastpath
                |          |                     __GI_read
                |          |          
                |           --2.13%-- scnprintf
                |                     bitmap_scnlistprintf
                |                     seq_bitmap_list
                |                     cpuset_task_status_allowed
                |                     proc_pid_status
                |                     proc_single_show
                |                     seq_read
                |                     vfs_read
                |                     sys_read
                |                     system_call_fastpath
                |                     __GI_read
                |          
                 --4.08%-- seq_printf
                           |          
                           |--50.00%-- show_map_vma
                           |          show_map
                           |          seq_read
                           |          vfs_read
                           |          sys_read
                           |          system_call_fastpath
                           |          __GI_read
                           |          
                            --50.00%-- render_sigset_t
                                      proc_pid_status
                                      proc_single_show
                                      seq_read
                                      vfs_read
                                      sys_read
                                      system_call_fastpath
                                      __GI_read

     0.55%          [init]  [kernel]                       [k] native_sched_clock
     0.50%           iperf  [kernel]                       [k] e1000_xmit_frame
                |          
                |--71.11%-- __tcp_ack_snd_check
                |          tcp_rcv_established
                |          tcp_v4_do_rcv
                |          tcp_prequeue_process
                |          tcp_recvmsg
                |          sock_common_recvmsg
                |          __sock_recvmsg
                |          sock_recvmsg
                |          sys_recvfrom
                |          system_call_fastpath
                |          __recv
                |          
                 --28.89%-- tcp_cleanup_rbuf
                           tcp_recvmsg
                           sock_common_recvmsg
                           __sock_recvmsg
                           sock_recvmsg
                           sys_recvfrom
                           system_call_fastpath
                           __recv

     0.50%           iperf  [kernel]                       [k] ipt_do_table
                |          
                |--37.78%-- ipt_local_hook
                |          nf_iterate
                |          nf_hook_slow
                |          __ip_local_out
                |          ip_local_out
                |          ip_queue_xmit
                |          tcp_transmit_skb
                |          tcp_send_ack
                |          |          
                |          |--58.82%-- __tcp_ack_snd_check
                |          |          tcp_rcv_established
                |          |          tcp_v4_do_rcv
                |          |          tcp_prequeue_process
                |          |          tcp_recvmsg
                |          |          sock_common_recvmsg
                |          |          __sock_recvmsg
                |          |          sock_recvmsg
                |          |          sys_recvfrom
                |          |          system_call_fastpath
                |          |          __recv
                |          |          
                |           --41.18%-- tcp_cleanup_rbuf
                |                     tcp_recvmsg
                |                     sock_common_recvmsg
                |                     __sock_recvmsg
                |                     sock_recvmsg
                |                     sys_recvfrom
                |                     system_call_fastpath
                |                     __recv
                |          
                |--31.11%-- ipt_post_routing_hook
                |          nf_iterate
                |          nf_hook_slow
                |          ip_output
                |          ip_local_out
                |          ip_queue_xmit
                |          tcp_transmit_skb
                |          tcp_send_ack
                |          |          
                |          |--64.29%-- __tcp_ack_snd_check
                |          |          tcp_rcv_established
                |          |          tcp_v4_do_rcv
                |          |          tcp_prequeue_process
                |          |          tcp_recvmsg
                |          |          sock_common_recvmsg
                |          |          __sock_recvmsg
                |          |          sock_recvmsg
                |          |          sys_recvfrom
                |          |          system_call_fastpath
                |          |          __recv
                |          |          
                |           --35.71%-- tcp_cleanup_rbuf
                |                     tcp_recvmsg
                |                     sock_common_recvmsg
                |                     __sock_recvmsg
                |                     sock_recvmsg
                |                     sys_recvfrom
                |                     system_call_fastpath
                |                     __recv
                |          
                |--20.00%-- ipt_local_out_hook
                |          nf_iterate
                |          nf_hook_slow
                |          __ip_local_out
                |          ip_local_out
                |          ip_queue_xmit
                |          tcp_transmit_skb
                |          tcp_send_ack
                |          |          
                |          |--88.89%-- __tcp_ack_snd_check
                |          |          tcp_rcv_established
                |          |          tcp_v4_do_rcv
                |          |          tcp_prequeue_process
                |          |          tcp_recvmsg
                |          |          sock_common_recvmsg
                |          |          __sock_recvmsg
                |          |          sock_recvmsg
                |          |          sys_recvfrom
                |          |          system_call_fastpath
                |          |          __recv
                |          |          
                |           --11.11%-- tcp_cleanup_rbuf
                |                     tcp_recvmsg
                |                     sock_common_recvmsg
                |                     __sock_recvmsg
                |                     sock_recvmsg
                |                     sys_recvfrom
                |                     system_call_fastpath
                |                     __recv
                |          
                |--6.67%-- nf_iterate
                |          nf_hook_slow
                |          |          
                |          |--66.67%-- ip_output
                |          |          ip_local_out
                |          |          ip_queue_xmit
                |          |          tcp_transmit_skb
                |          |          tcp_send_ack
                |          |          tcp_cleanup_rbuf
                |          |          tcp_recvmsg
                |          |          sock_common_recvmsg
                |          |          __sock_recvmsg
                |          |          sock_recvmsg
                |          |          sys_recvfrom
                |          |          system_call_fastpath
                |          |          __recv
                |          |          
                |           --33.33%-- __ip_local_out
                |                     ip_local_out
                |                     ip_queue_xmit
                |                     tcp_transmit_skb
                |                     tcp_send_ack
                |                     __tcp_ack_snd_check
                |                     tcp_rcv_established
                |                     tcp_v4_do_rcv
                |                     tcp_prequeue_process
                |                     tcp_recvmsg
                |                     sock_common_recvmsg
                |                     __sock_recvmsg
                |                     sock_recvmsg
                |                     sys_recvfrom
                |                     system_call_fastpath
                |                     __recv
                |          
                |--2.22%-- ipt_local_in_hook
                |          nf_iterate
                |          nf_hook_slow
                |          ip_local_deliver
                |          ip_rcv_finish
                |          ip_rcv
                |          netif_receive_skb
                |          napi_skb_finish
                |          napi_gro_receive
                |          e1000_receive_skb
                |          e1000_clean_rx_irq
                |          e1000_clean
                |          net_rx_action
                |          __do_softirq
                |          call_softirq
                |          do_softirq
                |          irq_exit
                |          do_IRQ
                |          ret_from_intr
                |          vgettimeofday
                |          
                 --2.22%-- ipt_pre_routing_hook
                           nf_iterate
                           nf_hook_slow
                           ip_rcv
                           netif_receive_skb
                           napi_skb_finish
                           napi_gro_receive
                           e1000_receive_skb
                           e1000_clean_rx_irq
                           e1000_clean
                           net_rx_action
                           __do_softirq
                           call_softirq
                           do_softirq
                           irq_exit
                           do_IRQ
                           ret_from_intr
                           __GI___libc_nanosleep

     0.50%           iperf  [kernel]                       [k] schedule
                |          
                |--57.78%-- do_nanosleep
                |          hrtimer_nanosleep
                |          sys_nanosleep
                |          system_call_fastpath
                |          __GI___libc_nanosleep
                |          
                |--33.33%-- schedule_timeout
                |          sk_wait_data
                |          tcp_recvmsg
                |          sock_common_recvmsg
                |          __sock_recvmsg
                |          sock_recvmsg
                |          sys_recvfrom
                |          system_call_fastpath
                |          __recv
                |          
                |--6.67%-- hrtimer_nanosleep
                |          sys_nanosleep
                |          system_call_fastpath
                |          __GI___libc_nanosleep
                |          
                 --2.22%-- sk_wait_data
                           tcp_recvmsg
                           sock_common_recvmsg
                           __sock_recvmsg
                           sock_recvmsg
                           sys_recvfrom
                           system_call_fastpath
                           __recv

     0.49%           iperf  [kernel]                       [k] tcp_transmit_skb
                |          
                |--97.73%-- tcp_send_ack
                |          |          
                |          |--83.72%-- __tcp_ack_snd_check
                |          |          tcp_rcv_established
                |          |          tcp_v4_do_rcv
                |          |          |          
                |          |          |--97.22%-- tcp_prequeue_process
                |          |          |          tcp_recvmsg
                |          |          |          sock_common_recvmsg
                |          |          |          __sock_recvmsg
                |          |          |          sock_recvmsg
                |          |          |          sys_recvfrom
                |          |          |          system_call_fastpath
                |          |          |          __recv
                |          |          |          
                |          |           --2.78%-- release_sock
                |          |                     tcp_recvmsg
                |          |                     sock_common_recvmsg
                |          |                     __sock_recvmsg
                |          |                     sock_recvmsg
                |          |                     sys_recvfrom
                |          |                     system_call_fastpath
                |          |                     __recv
                |          |          
                |           --16.28%-- tcp_cleanup_rbuf
                |                     tcp_recvmsg
                |                     sock_common_recvmsg
                |                     __sock_recvmsg
                |                     sock_recvmsg
                |                     sys_recvfrom
                |                     system_call_fastpath
                |                     __recv
                |          
                 --2.27%-- __tcp_ack_snd_check
                           tcp_rcv_established
                           tcp_v4_do_rcv
                           tcp_prequeue_process
                           tcp_recvmsg
                           sock_common_recvmsg
                           __sock_recvmsg
                           sock_recvmsg
                           sys_recvfrom
                           system_call_fastpath
                           __recv

     0.49%          [init]  [kernel]                       [k] nf_hook_slow
     0.48%           iperf  [kernel]                       [k] virt_to_head_page
                |          
                |--53.49%-- kfree
                |          skb_release_data
                |          __kfree_skb
                |          |          
                |          |--65.22%-- tcp_recvmsg
                |          |          sock_common_recvmsg
                |          |          __sock_recvmsg
                |          |          sock_recvmsg
                |          |          sys_recvfrom
                |          |          system_call_fastpath
                |          |          __recv
                |          |          
                |           --34.78%-- tcp_rcv_established
                |                     tcp_v4_do_rcv
                |                     tcp_prequeue_process
                |                     tcp_recvmsg
                |                     sock_common_recvmsg
                |                     __sock_recvmsg
                |                     sock_recvmsg
                |                     sys_recvfrom
                |                     system_call_fastpath
                |                     __recv
                |          
                |--18.60%-- skb_release_data
                |          __kfree_skb
                |          |          
                |          |--62.50%-- tcp_rcv_established
                |          |          tcp_v4_do_rcv
                |          |          tcp_prequeue_process
                |          |          tcp_recvmsg
                |          |          sock_common_recvmsg
                |          |          __sock_recvmsg
                |          |          sock_recvmsg
                |          |          sys_recvfrom
                |          |          system_call_fastpath
                |          |          __recv
                |          |          
                |           --37.50%-- tcp_recvmsg
                |                     sock_common_recvmsg
                |                     __sock_recvmsg
                |                     sock_recvmsg
                |                     sys_recvfrom
                |                     system_call_fastpath
                |                     __recv
                |          
                |--18.60%-- kmem_cache_free
                |          __kfree_skb
                |          |          
                |          |--62.50%-- tcp_rcv_established
                |          |          tcp_v4_do_rcv
                |          |          tcp_prequeue_process
                |          |          tcp_recvmsg
                |          |          sock_common_recvmsg
                |          |          __sock_recvmsg
                |          |          sock_recvmsg
                |          |          sys_recvfrom
                |          |          system_call_fastpath
                |          |          __recv
                |          |          
                |           --37.50%-- tcp_recvmsg
                |                     sock_common_recvmsg
                |                     __sock_recvmsg
                |                     sock_recvmsg
                |                     sys_recvfrom
                |                     system_call_fastpath
                |                     __recv
                |          
                 --9.30%-- __kfree_skb
                           |          
                           |--75.00%-- tcp_rcv_established
                           |          tcp_v4_do_rcv
                           |          tcp_prequeue_process
                           |          tcp_recvmsg
                           |          sock_common_recvmsg
                           |          __sock_recvmsg
                           |          sock_recvmsg
                           |          sys_recvfrom
                           |          system_call_fastpath
                           |          __recv
                           |          
                            --25.00%-- tcp_recvmsg
                                      sock_common_recvmsg
                                      __sock_recvmsg
                                      sock_recvmsg
                                      sys_recvfrom
                                      system_call_fastpath
                                      __recv
 ...

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-07 18:02   ` Avi Kivity
@ 2009-09-07 18:46     ` Jens Axboe
  2009-09-07 20:36       ` Ingo Molnar
  0 siblings, 1 reply; 216+ messages in thread
From: Jens Axboe @ 2009-09-07 18:46 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Ingo Molnar, Con Kolivas, linux-kernel, Peter Zijlstra, Mike Galbraith

On Mon, Sep 07 2009, Avi Kivity wrote:
> On 09/07/2009 12:49 PM, Jens Axboe wrote:
>>
>> I ran a simple test as well, since I was curious to see how it performed
>> wrt interactiveness. One of my pet peeves with the current scheduler is
>> that I have to nice compile jobs, or my X experience is just awful while
>> the compile is running.
>>    
>
> I think the problem is that CFS is optimizing for the wrong thing.  It's  
> trying to be fair to tasks, but these are meaningless building blocks of  
> jobs, which is what the user sees and measures.  Your make -j128  
> dominates your interactive task by two orders of magnitude.  If the  
> scheduler attempts to bridge this gap using heuristics, it will fail  
> badly when it misdetects since it will starve the really important  
> 100-thread job for a task that was misdetected as interactive.

Agree, I was actually looking into doing joint latency for X number of
tasks for the test app. I'll try and do that and see if we can detect
something from that.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-07 18:26   ` Ingo Molnar
@ 2009-09-07 18:47     ` Daniel Walker
  2009-09-07 18:51     ` Michael Buesch
  2009-09-08  7:48     ` Ingo Molnar
  2 siblings, 0 replies; 216+ messages in thread
From: Daniel Walker @ 2009-09-07 18:47 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Michael Buesch, Con Kolivas, linux-kernel, Peter Zijlstra,
	Mike Galbraith, Felix Fietkau

On Mon, 2009-09-07 at 20:26 +0200, Ingo Molnar wrote:
> That's interesting. I tried to reproduce it on x86, but the profile 
> does not show any scheduler overhead at all on the server:

If the lower throughput is caused by the scheduler simply not running
the task, would that even show up in the profiling output?

Daniel


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-07 18:26   ` Ingo Molnar
  2009-09-07 18:47     ` Daniel Walker
@ 2009-09-07 18:51     ` Michael Buesch
  2009-09-07 20:57       ` Ingo Molnar
  2009-09-08  7:48     ` Ingo Molnar
  2 siblings, 1 reply; 216+ messages in thread
From: Michael Buesch @ 2009-09-07 18:51 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Con Kolivas, linux-kernel, Peter Zijlstra, Mike Galbraith, Felix Fietkau

On Monday 07 September 2009 20:26:29 Ingo Molnar wrote:
> Could you profile it please? Also, what's the context-switch rate?

As far as I can tell, the Broadcom MIPS architecture does not have profiling
support. It only has some proprietary profiling registers that nobody has
written kernel support for yet.
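
Where perf isn't available, the context-switch rate can still be read
from procfs; a minimal sketch, assuming the standard "ctxt" line in
/proc/stat:

#include <stdio.h>
#include <unistd.h>

/* Read the cumulative context-switch count from the "ctxt" line. */
static unsigned long long read_ctxt(void)
{
	char line[256];
	unsigned long long ctxt = 0;
	FILE *f = fopen("/proc/stat", "r");

	if (!f)
		return 0;
	while (fgets(line, sizeof(line), f))
		if (sscanf(line, "ctxt %llu", &ctxt) == 1)
			break;
	fclose(f);
	return ctxt;
}

int main(void)
{
	unsigned long long before = read_ctxt();

	sleep(10);	/* sample while the iperf run is in flight */
	printf("%llu context switches/sec\n", (read_ctxt() - before) / 10);
	return 0;
}

(vmstat's "cs" column reports the same counter per second, where
available.)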

-- 
Greetings, Michael.

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-07 18:46     ` Jens Axboe
@ 2009-09-07 20:36       ` Ingo Molnar
  2009-09-07 20:46         ` Jens Axboe
  0 siblings, 1 reply; 216+ messages in thread
From: Ingo Molnar @ 2009-09-07 20:36 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Avi Kivity, Con Kolivas, linux-kernel, Peter Zijlstra, Mike Galbraith


* Jens Axboe <jens.axboe@oracle.com> wrote:

> Agree, I was actually looking into doing joint latency for X 
> number of tasks for the test app. I'll try and do that and see if 
> we can detect something from that.

Could you please try latest -tip:

   http://people.redhat.com/mingo/tip.git/README

(c26f010 or later)

Does it get any better with make -j128 build jobs? Peter just fixed 
a bug in the SMP load-balancer that can cause interactivity problems 
on large CPU count systems.

	Ingo

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-07 17:38       ` Jens Axboe
@ 2009-09-07 20:44         ` Jens Axboe
  2009-09-08  9:13           ` Jens Axboe
  0 siblings, 1 reply; 216+ messages in thread
From: Jens Axboe @ 2009-09-07 20:44 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Con Kolivas, linux-kernel, Peter Zijlstra, Mike Galbraith

[-- Attachment #1: Type: text/plain, Size: 1454 bytes --]

On Mon, Sep 07 2009, Jens Axboe wrote:
> > And yes, it would be wonderful to get a test-app from you that would 
> > express the kind of pain you are seeing during compile jobs.
> 
> I was hoping this one would, but it's not showing anything. I even added
> support for doing the ping and wakeup over a socket, to see if the pipe
> test was doing well because of the sync wakeup we do there. The net
> latency is a little worse, but still good. So no luck in making that app
> so far.

Here's a version that bounces timestamps between a producer and a number
of consumers (clients). Not really tested much, but perhaps someone can
compare this on a box that boots BFS and see what happens.

To run it, use -cX where X is the number of children that you wait for a
response from. The max delay across these children is logged for each
wakeup. You can invoke it like so:

$ ./latt -c4 'make -j4'

and it'll dump the max/avg/stddev bounce time after make has completed,
or if you just want to play around, start the compile in one xterm and
do:

$ ./latt -c4 'sleep 5'

to just log for a small period of time. Vary the number of clients to
see how that changes the aggregated latency. 1 should be fast; adding
more clients quickly adds up.

Additionally, it has -f and -t options that control the window of
sleep time for the parent between each message. The numbers are in
msecs, and the defaults are a minimum of 100 msecs and a maximum of
500 msecs.

-- 
Jens Axboe


[-- Attachment #2: latt.c --]
[-- Type: text/x-csrc, Size: 5561 bytes --]

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <getopt.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/time.h>
#include <time.h>
#include <math.h>


/*
 * In msecs
 */
static unsigned int min_delay = 100;
static unsigned int max_delay = 500;
static unsigned int clients = 1;

#define MAX_CLIENTS		512

struct delays {
	unsigned long nr_delays;
	unsigned long mmap_entries;
	unsigned long max_delay;
	unsigned long delays[0];
};

static struct delays *delays;
static int pipes[MAX_CLIENTS][2];

static unsigned long avg;
static double stddev;

static pid_t app_pid;

#define CLOCKSOURCE		CLOCK_MONOTONIC

#define DEF_ENTRIES		1024

static int parse_options(int argc, char *argv[])
{
	struct option l_opts[] = {
		{ "min-delay", 	1, 	NULL,	'f' },
		{ "max-delay",	1,	NULL,	't' },
		{ "clients",	1,	NULL,	'c' },
		{ NULL,		0,	NULL,	0 }	/* getopt_long() wants a zeroed terminator */
	};
	int c, res, index = 0;

	while ((c = getopt_long(argc, argv, "f:t:c:", l_opts, &res)) != -1) {
		index++;
		switch (c) {
			case 'f':
				min_delay = atoi(optarg);
				break;
			case 't':
				max_delay = atoi(optarg);
				break;
			case 'c':
				clients = atoi(optarg);
				if (clients > MAX_CLIENTS)
					clients = MAX_CLIENTS;
				break;
		}
	}

	/* index of the first non-option argument (the app to run) */
	return optind;
}

static pid_t fork_off(const char *app)
{
	pid_t pid;

	pid = fork();
	if (pid)
		return pid;

	exit(system(app));
}

#define entries_to_size(n)	((n) * sizeof(unsigned long) + sizeof(struct delays))

static unsigned long usec_since(struct timespec *start, struct timespec *end)
{
	long secs, nsecs, delay;

	secs = end->tv_sec - start->tv_sec;
	nsecs = end->tv_nsec - start->tv_nsec;

	delay = secs * 1000000L;
	delay += (nsecs / 1000L);

	return delay;
}

static unsigned long usec_since_now(struct timespec *start)
{
	struct timespec e;

	clock_gettime(CLOCKSOURCE, &e);
	return usec_since(start, &e);
}

static void log_delay(unsigned long delay)
{
	if (delays->nr_delays == delays->mmap_entries) {
		unsigned long new_size;

		delays->mmap_entries <<= 1;
		new_size = entries_to_size(delays->mmap_entries);
		delays = realloc(delays, new_size);
	}

	delays->delays[delays->nr_delays++] = delay;

	if (delay > delays->max_delay)
		delays->max_delay = delay;
}

static void run_child(int *pipe)
{
	struct timespec ts;

	do {
		int ret;

		ret = read(pipe[0], &ts, sizeof(ts));
		if (ret <= 0)
			break;

		clock_gettime(CLOCKSOURCE, &ts);

		ret = write(pipe[1], &ts, sizeof(ts));
		if (ret <= 0)
			break;
	} while (1);
}

static void do_rand_sleep(void)
{
	unsigned int msecs;

	msecs = min_delay + ((float) (max_delay - min_delay) * (rand() / (RAND_MAX + 1.0)));
	usleep(msecs * 1000);
}

static void kill_connection(void)
{
	int i;

	for (i = 0; i < clients; i++) {
		if (pipes[i][0] != -1) {
			close(pipes[i][0]);
			pipes[i][0] = -1;
		}
		if (pipes[i][1] != -1) {
			close(pipes[i][1]);
			pipes[i][1] = -1;
		}
	}
}

static void run_parent(void)
{
	struct timespec *t1, t2;
	int status, ret, do_exit = 0, i;

	t1 = malloc(sizeof(struct timespec) * clients);

	srand(1234);

	do {
		unsigned long delay, max_delay = 0;

		do_rand_sleep();

		ret = waitpid(app_pid, &status, WNOHANG);
		if (ret < 0) {
			perror("waitpid");
			break;
		} else if (ret == app_pid &&
			   (WIFSIGNALED(status) || WIFEXITED(status))) {
			do_exit = 1;
			kill_connection();
		}
			
		for (i = 0; i < clients; i++) {
			clock_gettime(CLOCKSOURCE, &t1[i]);
			if (write(pipes[i][1], &t1[i], sizeof(t2)) != sizeof(t2)) {
				do_exit = 1;
				break;
			}
		}

		for (i = 0; i < clients; i++) {
			if (read(pipes[i][0], &t2, sizeof(t2)) != sizeof(t2)) {
				do_exit = 1;
				break;
			}
			delay = usec_since(&t1[i], &t2);
			if (delay > max_delay)
				max_delay = delay;
		}

		log_delay(max_delay);
	} while (!do_exit);

	kill_connection();
}

static void parent_setup_connection(void)
{
	int i;

	for (i = 0; i < clients; i++) {
		if (pipe(pipes[i])) {
			perror("pipe");
			return;
		}
	}
}

static void run_test(void)
{
	pid_t cpids[MAX_CLIENTS];
	int i, status;

	parent_setup_connection();

	for (i = 0; i < clients; i++) {
		cpids[i] = fork();
		if (cpids[i])
			continue;

		run_child(pipes[i]);
		exit(0);
	}

	run_parent();

	for (i = 0; i < clients; i++)
		kill(cpids[i], SIGQUIT);
	for (i = 0; i < clients; i++)
		waitpid(cpids[i], &status, 0);
}

static void setup_shared_area(void)
{
	delays = malloc(entries_to_size(DEF_ENTRIES));
	delays->nr_delays = 0;
	delays->mmap_entries = DEF_ENTRIES;
	delays->max_delay = 0;
}

static void calc_latencies(void)
{
	unsigned long long sum = 0;
	int i;

	if (!delays->nr_delays)
		return;

	for (i = 0; i < delays->nr_delays; i++)
		sum += delays->delays[i];

	avg = sum / delays->nr_delays;

	if (delays->nr_delays < 2)
		return;

	sum = 0;
	for (i = 0; i < delays->nr_delays; i++) {
		long diff;

		diff = delays->delays[i] - avg;
		sum += (diff * diff);
	}

	stddev = sqrt(sum / (delays->nr_delays - 1));
}

static void handle_sigint(int sig)
{
	kill(app_pid, SIGINT);
}

int main(int argc, char *argv[])
{
	int app_offset, off;
	char app[256];

	setup_shared_area();

	off = 0;
	app_offset = parse_options(argc, argv);
	while (app_offset < argc) {
		if (off) {
			app[off] = ' ';
			off++;
		}
		off += sprintf(app + off, "%s", argv[app_offset]);
		app_offset++;
	}

	signal(SIGINT, handle_sigint);
	app_pid = fork_off(app);
	run_test();

	calc_latencies();

	printf("Entries: %lu (clients=%d)\n", delays->nr_delays, clients);
	printf("\nAverages (in usecs)\n");
	printf("-------------------\n");
	printf("\tMax\t %lu\n", delays->max_delay);
	printf("\tAvg\t %lu\n", avg);
	printf("\tStdev\t %.0f\n", stddev);

	free(delays);
	return 0;
}

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-07 20:36       ` Ingo Molnar
@ 2009-09-07 20:46         ` Jens Axboe
  2009-09-07 21:03           ` Peter Zijlstra
  0 siblings, 1 reply; 216+ messages in thread
From: Jens Axboe @ 2009-09-07 20:46 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Con Kolivas, linux-kernel, Peter Zijlstra, Mike Galbraith

On Mon, Sep 07 2009, Ingo Molnar wrote:
> 
> * Jens Axboe <jens.axboe@oracle.com> wrote:
> 
> > Agree, I was actually looking into doing joint latency for X 
> > number of tasks for the test app. I'll try and do that and see if 
> > we can detect something from that.
> 
> Could you please try latest -tip:
> 
>    http://people.redhat.com/mingo/tip.git/README
> 
> (c26f010 or later)
> 
> Does it get any better with make -j128 build jobs? Peter just fixed 

The compile 'problem' is on my workstation, which is a dual core Intel
core 2. I use -j4 on that typically. On the bigger boxes, I don't notice
any interactivity problems, largely because I don't run anything latency
sensitive on those :-)

> a bug in the SMP load-balancer that can cause interactivity problems 
> on large CPU count systems.

Worth trying on the dual core box?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-07 18:51     ` Michael Buesch
@ 2009-09-07 20:57       ` Ingo Molnar
  2009-09-07 23:24         ` Pekka Pietikainen
  2009-09-08 15:45         ` Michael Buesch
  0 siblings, 2 replies; 216+ messages in thread
From: Ingo Molnar @ 2009-09-07 20:57 UTC (permalink / raw)
  To: Michael Buesch
  Cc: Con Kolivas, linux-kernel, Peter Zijlstra, Mike Galbraith, Felix Fietkau


* Michael Buesch <mb@bu3sch.de> wrote:

> On Monday 07 September 2009 20:26:29 Ingo Molnar wrote:
> > Could you profile it please? Also, what's the context-switch rate?
> 
> As far as I can tell, the broadcom mips architecture does not have 
> profiling support. It does only have some proprietary profiling 
> registers that nobody wrote kernel support for, yet.

Well, what does 'vmstat 1' show - how many context switches are 
there per second on the iperf server? In theory if it's a truly 
saturated box, there shouldnt be many - just a single iperf task 
running at 100% CPU utilization or so.
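
If vmstat isn't handy on that board, a minimal sketch like the one 
below (illustrative only) gives the same number by sampling the "ctxt" 
counter in /proc/stat once a second - roughly vmstat's "cs" column:

/*
 * Minimal illustrative sketch: print the system-wide context-switch
 * rate once per second, from the "ctxt" line in /proc/stat.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static unsigned long long read_ctxt(void)
{
	unsigned long long val = 0;
	char line[256];
	FILE *f;

	f = fopen("/proc/stat", "r");
	if (!f)
		return 0;
	while (fgets(line, sizeof(line), f)) {
		if (!strncmp(line, "ctxt ", 5)) {
			sscanf(line + 5, "%llu", &val);
			break;
		}
	}
	fclose(f);

	return val;
}

int main(void)
{
	unsigned long long prev, cur;

	prev = read_ctxt();
	for (;;) {
		sleep(1);
		cur = read_ctxt();
		printf("context switches/sec: %llu\n", cur - prev);
		prev = cur;
	}

	return 0;
}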

(Also, if there's hrtimer support for that board then perfcounters 
could be used to profile it.)

	Ingo

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-07 20:46         ` Jens Axboe
@ 2009-09-07 21:03           ` Peter Zijlstra
  2009-09-07 21:05             ` Jens Axboe
  0 siblings, 1 reply; 216+ messages in thread
From: Peter Zijlstra @ 2009-09-07 21:03 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Ingo Molnar, Avi Kivity, Con Kolivas, linux-kernel, Mike Galbraith

On Mon, 2009-09-07 at 22:46 +0200, Jens Axboe wrote:
> > a bug in the SMP load-balancer that can cause interactivity problems 
> > on large CPU count systems.
> 
> Worth trying on the dual core box?

I debugged the issue on a dual core :-)

It should be more pronounced on larger machines, but it's present on
dual-core too.


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-07 21:03           ` Peter Zijlstra
@ 2009-09-07 21:05             ` Jens Axboe
  2009-09-07 22:18               ` Ingo Molnar
  0 siblings, 1 reply; 216+ messages in thread
From: Jens Axboe @ 2009-09-07 21:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Avi Kivity, Con Kolivas, linux-kernel, Mike Galbraith

On Mon, Sep 07 2009, Peter Zijlstra wrote:
> On Mon, 2009-09-07 at 22:46 +0200, Jens Axboe wrote:
> > > a bug in the SMP load-balancer that can cause interactivity problems 
> > > on large CPU count systems.
> > 
> > Worth trying on the dual core box?
> 
> I debugged the issue on a dual core :-)
> 
> It should be more pronounced on larger machines, but its present on
> dual-core too.

Alright, I'll upgrade that box to -tip tomorrow and see if it makes
a noticeable difference. At -j4 or higher, I can literally see windows
slowly popping up when switching to a different virtual desktop.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-07 21:05             ` Jens Axboe
@ 2009-09-07 22:18               ` Ingo Molnar
  0 siblings, 0 replies; 216+ messages in thread
From: Ingo Molnar @ 2009-09-07 22:18 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Peter Zijlstra, Avi Kivity, Con Kolivas, linux-kernel, Mike Galbraith


* Jens Axboe <jens.axboe@oracle.com> wrote:

> On Mon, Sep 07 2009, Peter Zijlstra wrote:
> > On Mon, 2009-09-07 at 22:46 +0200, Jens Axboe wrote:
> > > > a bug in the SMP load-balancer that can cause interactivity problems 
> > > > on large CPU count systems.
> > > 
> > > Worth trying on the dual core box?
> > 
> > I debugged the issue on a dual core :-)
> > 
> > It should be more pronounced on larger machines, but its present on
> > dual-core too.
> 
> Alright, I'll upgrade that box to -tip tomorrow and see if it 
> makes a noticable difference. At -j4 or higher, I can literally 
> see windows slowly popping up when switching to a different 
> virtual desktop.

btw., if you run -tip and have these enabled:

  CONFIG_PERF_COUNTER=y 
  CONFIG_EVENT_TRACING=y

  cd tools/perf/
  make -j install

... then you can use a couple of new perfcounters features to 
measure scheduler latencies. For example:

  perf stat -e sched:sched_stat_wait -e task-clock ./hackbench 20

Will tell you how many times this workload got delayed by waiting 
for CPU time.

You can repeat the workload as well and see the statistical 
properties of those metrics:

 aldebaran:/home/mingo> perf stat --repeat 10 -e \
              sched:sched_stat_wait:r -e task-clock ./hackbench 20
 Time: 0.251
 Time: 0.214
 Time: 0.254
 Time: 0.278
 Time: 0.245
 Time: 0.308
 Time: 0.242
 Time: 0.222
 Time: 0.268
 Time: 0.244

 Performance counter stats for './hackbench 20' (10 runs):

          59826  sched:sched_stat_wait    #      0.026 M/sec   ( +-   5.540% )
    2280.099643  task-clock-msecs         #      7.525 CPUs    ( +-   1.620% )

    0.303013390  seconds time elapsed   ( +-   3.189% )

To get scheduling events, do:

 # perf list 2>&1 | grep sched:
  sched:sched_kthread_stop                   [Tracepoint event]
  sched:sched_kthread_stop_ret               [Tracepoint event]
  sched:sched_wait_task                      [Tracepoint event]
  sched:sched_wakeup                         [Tracepoint event]
  sched:sched_wakeup_new                     [Tracepoint event]
  sched:sched_switch                         [Tracepoint event]
  sched:sched_migrate_task                   [Tracepoint event]
  sched:sched_process_free                   [Tracepoint event]
  sched:sched_process_exit                   [Tracepoint event]
  sched:sched_process_wait                   [Tracepoint event]
  sched:sched_process_fork                   [Tracepoint event]
  sched:sched_signal_send                    [Tracepoint event]
  sched:sched_stat_wait                      [Tracepoint event]
  sched:sched_stat_sleep                     [Tracepoint event]
  sched:sched_stat_iowait                    [Tracepoint event]

stat_wait/sleep/iowait would be the interesting ones, for latency 
analysis.

Or, if you want to see all the specific delays and want to see 
min/max/avg, you can do:

  perf record -e sched:sched_stat_wait:r -f -R -c 1 ./hackbench 20
  perf trace

	Ingo

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-07 20:57       ` Ingo Molnar
@ 2009-09-07 23:24         ` Pekka Pietikainen
  2009-09-08  8:04           ` Ingo Molnar
  2009-09-08 15:45         ` Michael Buesch
  1 sibling, 1 reply; 216+ messages in thread
From: Pekka Pietikainen @ 2009-09-07 23:24 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Michael Buesch, Con Kolivas, linux-kernel, Peter Zijlstra,
	Mike Galbraith, Felix Fietkau

On Mon, Sep 07, 2009 at 10:57:01PM +0200, Ingo Molnar wrote:
> > > Could you profile it please? Also, what's the context-switch rate?
> > 
> > As far as I can tell, the broadcom mips architecture does not have 
> > profiling support. It does only have some proprietary profiling 
> > registers that nobody wrote kernel support for, yet.
> Well, what does 'vmstat 1' show - how many context switches are 
> there per second on the iperf server? In theory if it's a truly 
> saturated box, there shouldnt be many - just a single iperf task 
Yay, finally something that's measurable in this thread \o/

Gigabit Ethernet iperf on an Atom or so might be something that 
shows similar effects yet is debuggable. Anyone feel like taking a shot?

That beast doing iperf probably ends up making it go quite close to its
limits (IO, mem bw, cpu). IIRC the routing/bridging performance is
something like 40Mbps (depends a lot on the model, corresponds pretty
well with the MHz of the beast). 

Maybe not totally unlike what make -j16 does to a 1-4 core box? 



^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-07  3:38 ` Nikos Chantziaras
  2009-09-07 11:01   ` Frederic Weisbecker
  2009-09-07 14:40   ` Arjan van de Ven
@ 2009-09-07 23:54   ` Thomas Fjellstrom
  2009-09-08 11:30     ` Nikos Chantziaras
  2 siblings, 1 reply; 216+ messages in thread
From: Thomas Fjellstrom @ 2009-09-07 23:54 UTC (permalink / raw)
  To: linux-kernel

On Sun September 6 2009, Nikos Chantziaras wrote:
> On 09/06/2009 11:59 PM, Ingo Molnar wrote:
> >[...]
> > Also, i'd like to outline that i agree with the general goals
> > described by you in the BFS announcement - small desktop systems
> > matter more than large systems. We find it critically important
> > that the mainline Linux scheduler performs well on those systems
> > too - and if you (or anyone else) can reproduce suboptimal behavior
> > please let the scheduler folks know so that we can fix/improve it.
> 
> BFS improved behavior of many applications on my Intel Core 2 box in a
> way that can't be benchmarked.  Examples:
> 
> mplayer using OpenGL renderer doesn't drop frames anymore when dragging
> and dropping the video window around in an OpenGL composited desktop
> (KDE 4.3.1).  (Start moving the mplayer window around; then drop it. At
> the moment the move starts and at the moment you drop the window back to
> the desktop, there's a big frame skip as if mplayer was frozen for a
> bit; around 200 or 300ms.)
> 
> Composite desktop effects like zoom and fade out don't stall for
> sub-second periods of time while there's CPU load in the background.  In
> other words, the desktop is more fluid and less skippy even during heavy
> CPU load.  Moving windows around with CPU load in the background doesn't
> result in short skips.
> 
> LMMS (a tool utilizing real-time sound synthesis) does not produce
> "pops", "crackles" and drops in the sound during real-time playback due
> to buffer under-runs.  Those problems amplify when there's heavy CPU
> load in the background, while with BFS heavy load doesn't produce those
> artifacts (though LMMS makes itself run SCHED_ISO with BFS)  Also,
> hitting a key on the keyboard needs less time for the note to become
> audible when using BFS.  Same should hold true for other tools who
> traditionally benefit from the "-rt" kernel sources.
> 
> Games like Doom 3 and such don't "freeze" periodically for small amounts
> of time (again for sub-second amounts) when something in the background
> grabs CPU time (be it my mailer checking for new mail or a cron job, or
> whatever.)
> 
> And, the most drastic improvement here, with BFS I can do a "make -j2"
> in the kernel tree and the GUI stays fluid.  Without BFS, things start
> to lag, even with in-RAM builds (like having the whole kernel tree
> inside a tmpfs) and gcc running with nice 19 and ionice -c 3.
> 
> Unfortunately, I can't come up with any way to somehow benchmark all of
> this.  There's no benchmark for "fluidity" and "responsiveness".
> Running the Doom 3 benchmark, or any other benchmark, doesn't say
> anything about responsiveness, it only measures how many frames were
> calculated in a specific period of time.  How "stable" (with no stalls)
> those frames were making it to the screen is not measurable.
> 
> If BFS would imply small drops in pure performance counted in
> instructions per seconds, that would be a totally acceptable regression
> for desktop/multimedia/gaming PCs.  Not for server machines, of course.
>   However, on my machine, BFS is faster in classic workloads.  When I
> run "make -j2" with BFS and the standard scheduler, BFS always finishes
> a bit faster.  Not by much, but still.  One thing I'm noticing here is
> that BFS produces 100% CPU load on each core with "make -j2" while the
> normal scheduler stays at about 90-95% with -j2 or higher in at least
> one of the cores.  There seems to be under-utilization of CPU time.
> 
> Also, by searching around the net but also through discussions on
> various mailing lists, there seems to be a trend: the problems for some
> reason seem to occur more often with Intel CPUs (Core 2 chips and lower;
> I can't say anything about Core I7) while people on AMD CPUs mostly not
> being affected by most or even all of the above.  (And due to this flame
> wars often break out, with one party accusing the other of imagining
> things).  Can the integrated memory controller on AMD chips have
> something to do with this?  Do AMD chips generally offer better
> "multithrading" behavior?  Unfortunately, you didn't mention on what CPU
> you ran your tests.  If it was AMD, it might be a good idea to run tests
> on Pentium and Core 2 CPUs.
> 
> For reference, my system is:
> 
> CPU: Intel Core 2 Duo E6600 (2.4GHz)
> Mainboard: Asus P5E (Intel X38 chipset)
> RAM: 6GB (2+2+1+1) dual channel DDR2 800
> GPU: RV770 (Radeon HD4870).
> 

My Phenom 9550 (2.2GHz) whips the pants off my Intel Q6600 (2.6GHz). A 
friend of mine and I both get large amounts of stalling when doing a lot of IO. I 
haven't seen such horrible desktop interactivity since before the new 
schedulers and the -ck patchset came out for 2.4.x. It's a heck of a lot better 
on my AMD Phenoms, but some lag is noticeable these days, even when it wasn't 
a few kernel releases ago.

Intel Specs:
CPU: Intel Core 2 Quad Q6600 (2.6GHz)
Mainboard: ASUS P5K-SE (Intel P35 IIRC)
RAM: 4G 800MHz DDR2 dual channel (4x1G)
GPU: NVidia 8800GTS 320M

AMD Specs:
CPU: AMD Phenom I 9550 (2.2GHz)
Mainboard: Gigabyte MA78GM-S2H
RAM: 4G 800MHz DDR2 dual channel (2x2G)
GPU: Onboard Radeon 3200HD

AMD Specs x2:
CPU: AMD Phenom II 810 (2.6GHz)
Mainboard: Gigabyte MA790FXT-UD5P
RAM: 4G 1066MHz DDR3 dual channel (2x2G)
GPU: NVidia 8800GTS 320M (or currently a 8400GS)

Of course I get better performance out of the Phenom II than either other box, 
but it surprises me that I'd get more out of the budget AMD box than the not 
so budget Intel box.

-- 
Thomas Fjellstrom
tfjellstrom@shaw.ca

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-07 14:40   ` Arjan van de Ven
@ 2009-09-08  7:19     ` Nikos Chantziaras
  2009-09-08  8:31       ` Arjan van de Ven
  2009-09-08  8:38       ` Arjan van de Ven
  0 siblings, 2 replies; 216+ messages in thread
From: Nikos Chantziaras @ 2009-09-08  7:19 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: linux-kernel

On 09/07/2009 05:40 PM, Arjan van de Ven wrote:
> On Mon, 07 Sep 2009 06:38:36 +0300
> Nikos Chantziaras<realnc@arcor.de>  wrote:
>
>> On 09/06/2009 11:59 PM, Ingo Molnar wrote:
>>> [...]
>>> Also, i'd like to outline that i agree with the general goals
>>> described by you in the BFS announcement - small desktop systems
>>> matter more than large systems. We find it critically important
>>> that the mainline Linux scheduler performs well on those systems
>>> too - and if you (or anyone else) can reproduce suboptimal behavior
>>> please let the scheduler folks know so that we can fix/improve it.
>>
>> BFS improved behavior of many applications on my Intel Core 2 box in
>> a way that can't be benchmarked.  Examples:
>
> Have you tried to see if latencytop catches such latencies ?

I've just tried it.

I start latencytop and then mplayer on a video that doesn't max out the 
CPU (needs about 20-30% of a single core (out of 2 available)).  Then, 
while the video is playing, I press Alt+Tab repeatedly, which makes the 
desktop compositor kick in and stay active (it lays out all windows as a 
"flip-switch", similar to the Microsoft Vista Aero alt+tab effect). 
Repeatedly pressing alt+tab keeps the compositor (in this case KDE 
4.3.1) busy processing.  With the mainline scheduler, mplayer 
starts dropping frames and skipping sound like crazy for the whole 
duration of this exercise.

latencytop has this to say:

   http://foss.math.aegean.gr/~realnc/pics/latop1.png

Though I don't really understand what this tool is trying to tell me, I 
hope someone does.

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-07 18:26   ` Ingo Molnar
  2009-09-07 18:47     ` Daniel Walker
  2009-09-07 18:51     ` Michael Buesch
@ 2009-09-08  7:48     ` Ingo Molnar
  2009-09-08  9:50       ` Benjamin Herrenschmidt
  2009-09-08 14:45       ` Michael Buesch
  2 siblings, 2 replies; 216+ messages in thread
From: Ingo Molnar @ 2009-09-08  7:48 UTC (permalink / raw)
  To: Michael Buesch
  Cc: Con Kolivas, linux-kernel, Peter Zijlstra, Mike Galbraith, Felix Fietkau


* Ingo Molnar <mingo@elte.hu> wrote:

> That's interesting. I tried to reproduce it on x86, but the 
> profile does not show any scheduler overhead at all on the server:

I've now simulated a saturated iperf server by adding a 
udelay(3000) to e1000_intr(), via the patch below.

There's no idle time left that way:

 Cpu(s):  0.0%us,  2.6%sy,  0.0%ni,  0.0%id,  0.0%wa, 93.2%hi,  4.2%si,  0.0%st
 Mem:   1021044k total,    93400k used,   927644k free,     5068k buffers
 Swap:  8193140k total,        0k used,  8193140k free,    25404k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                    
 1604 mingo     20   0 38300  956  724 S 99.4  0.1   3:15.07 iperf                      
  727 root      15  -5     0    0    0 S  0.2  0.0   0:00.41 kondemand/0                
 1226 root      20   0  6452  336  240 S  0.2  0.0   0:00.06 irqbalance                 
 1387 mingo     20   0 78872 1988 1300 S  0.2  0.2   0:00.23 sshd                       
 1657 mingo     20   0 12752 1128  800 R  0.2  0.1   0:01.34 top                        
    1 root      20   0 10320  684  572 S  0.0  0.1   0:01.79 init                       
    2 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 kthreadd                   

And the server is only able to saturate half of the 1 gigabit 
bandwidth:

 Client connecting to t, TCP port 5001
 TCP window size: 16.0 KByte (default)
 ------------------------------------------------------------
 [  3] local 10.0.1.19 port 50836 connected with 10.0.1.14 port 5001
 [ ID] Interval       Transfer     Bandwidth
 [  3]  0.0-10.0 sec    504 MBytes    423 Mbits/sec
 ------------------------------------------------------------
 Client connecting to t, TCP port 5001
 TCP window size: 16.0 KByte (default)
 ------------------------------------------------------------
 [  3] local 10.0.1.19 port 50837 connected with 10.0.1.14 port 5001
 [ ID] Interval       Transfer     Bandwidth
 [  3]  0.0-10.0 sec    502 MBytes    420 Mbits/sec


perf top is showing:

------------------------------------------------------------------------------
   PerfTop:   28517 irqs/sec  kernel:99.4% [100000 cycles],  (all, 1 CPUs)
------------------------------------------------------------------------------

             samples    pcnt   kernel function
             _______   _____   _______________

           139553.00 - 93.2% : delay_tsc
             2098.00 -  1.4% : hmac_digest
              561.00 -  0.4% : ip_call_ra_chain
              335.00 -  0.2% : neigh_alloc
              279.00 -  0.2% : __hash_conntrack
              257.00 -  0.2% : dev_activate
              186.00 -  0.1% : proc_tcp_available_congestion_control
              178.00 -  0.1% : e1000_get_regs
              167.00 -  0.1% : tcp_event_data_recv

delay_tsc() dominates, as expected. Still zero scheduler overhead 
and the context-switch rate is well below 1000 per sec.

Then i booted v2.6.30 vanilla, added the udelay(3000) and got:

 [  5] local 10.0.1.14 port 5001 connected with 10.0.1.19 port 47026
 [  5]  0.0-10.0 sec    493 MBytes    412 Mbits/sec
 [  4] local 10.0.1.14 port 5001 connected with 10.0.1.19 port 47027
 [  4]  0.0-10.0 sec    520 MBytes    436 Mbits/sec
 [  5] local 10.0.1.14 port 5001 connected with 10.0.1.19 port 47028
 [  5]  0.0-10.0 sec    506 MBytes    424 Mbits/sec
 [  4] local 10.0.1.14 port 5001 connected with 10.0.1.19 port 47029
 [  4]  0.0-10.0 sec    496 MBytes    415 Mbits/sec

i.e. essentially the same throughput. (and this shows that using .30 
versus .31 did not materially impact iperf performance in this test, 
under these conditions and with this hardware)

Then i applied the BFS patch to v2.6.30, used the same 
udelay(3000) hack and got:

No measurable change in throughput.

Obviously, this test is not equivalent to your test - but it does 
show that even saturated iperf is getting scheduled just fine. (or, 
rather, does not get scheduled all that much.)

[  5] local 10.0.1.14 port 5001 connected with 10.0.1.19 port 38505
[  5]  0.0-10.1 sec    481 MBytes    401 Mbits/sec
[  4] local 10.0.1.14 port 5001 connected with 10.0.1.19 port 38506
[  4]  0.0-10.0 sec    505 MBytes    423 Mbits/sec
[  5] local 10.0.1.14 port 5001 connected with 10.0.1.19 port 38507
[  5]  0.0-10.0 sec    508 MBytes    426 Mbits/sec
[  4] local 10.0.1.14 port 5001 connected with 10.0.1.19 port 38508
[  4]  0.0-10.0 sec    486 MBytes    406 Mbits/sec

So either your MIPS system has some unexpected dependency on the 
scheduler, or there's something weird going on.

Mind poking on this one to figure out whether it's all repeatable 
and why that slowdown happens? Multiple attempts to reproduce it 
failed here for me.

	Ingo

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-07 23:24         ` Pekka Pietikainen
@ 2009-09-08  8:04           ` Ingo Molnar
  2009-09-08  8:13             ` Nikos Chantziaras
  0 siblings, 1 reply; 216+ messages in thread
From: Ingo Molnar @ 2009-09-08  8:04 UTC (permalink / raw)
  To: Pekka Pietikainen
  Cc: Michael Buesch, Con Kolivas, linux-kernel, Peter Zijlstra,
	Mike Galbraith, Felix Fietkau


* Pekka Pietikainen <pp@ee.oulu.fi> wrote:

> On Mon, Sep 07, 2009 at 10:57:01PM +0200, Ingo Molnar wrote:
> > > > Could you profile it please? Also, what's the context-switch rate?
> > > 
> > > As far as I can tell, the broadcom mips architecture does not have 
> > > profiling support. It does only have some proprietary profiling 
> > > registers that nobody wrote kernel support for, yet.
> > Well, what does 'vmstat 1' show - how many context switches are 
> > there per second on the iperf server? In theory if it's a truly 
> > saturated box, there shouldnt be many - just a single iperf task 
>
> Yay, finally something that's measurable in this thread \o/

My initial posting in this thread contains 6 separate types of 
measurements, rather extensive ones. Out of those, 4 measurements 
were latency oriented, two were throughput oriented. Plenty of data, 
plenty of results, and very good reproducibility.

> Gigabit Ethernet iperf on an Atom or so might be something that 
> shows similar effects yet is debuggable. Anyone feel like taking a 
> shot?

I tried iperf on x86 and simulated saturation and no, there's no BFS 
versus mainline performance difference that i can measure - simply 
because a saturated iperf server does not schedule much - it's busy 
handling all that networking workload.

I did notice that iperf is somewhat noisy: it can easily have weird 
outliers regardless of which scheduler is used. That could be an 
effect of queueing/timing: depending on precisely what order packets 
arrive in and how they get queued by the networking stack, a 
cache-effective pathway for the packets may open up - while with 
slightly different timings, that pathway closes and we get much worse 
queueing performance. I saw noise on the order of 10%, 
so iperf has to be measured carefully before drawing conclusions.

> That beast doing iperf probably ends up making it go quite close 
> to it's limits (IO, mem bw, cpu). IIRC the routing/bridging 
> performance is something like 40Mbps (depends a lot on the model, 
> corresponds pretty well with the Mhz of the beast).
> 
> Maybe not totally unlike what make -j16 does to a 1-4 core box?

No, a single iperf session is very different from kbuild make -j16. 

Firstly, iperf server is just a single long-lived task - so we 
context-switch between that and the idle thread, [and perhaps a 
kernel thread such as ksoftirqd]. The scheduler essentially has no 
leeway what task to schedule and for how long: if there's work going 
on the iperf server task will run - if there's none, the idle task 
runs. [modulo ksoftirqd - depending on the driver model and 
dependent on precise timings.]

kbuild -j16 on the other hand is a complex hierarchy and mixture of 
thousands of short-lived and long-lived tasks. The scheduler has a 
lot of leeway to decide what to schedule and for how long.

From a scheduler perspective the two workloads could not be any more 
different. Kbuild does test scheduler decisions in non-trivial ways 
- iperf server does not really.

	Ingo

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-08  8:04           ` Ingo Molnar
@ 2009-09-08  8:13             ` Nikos Chantziaras
  2009-09-08 10:12               ` Ingo Molnar
  0 siblings, 1 reply; 216+ messages in thread
From: Nikos Chantziaras @ 2009-09-08  8:13 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pekka Pietikainen, Michael Buesch, Con Kolivas, linux-kernel,
	Peter Zijlstra, Mike Galbraith, Felix Fietkau

On 09/08/2009 11:04 AM, Ingo Molnar wrote:
>
> * Pekka Pietikainen<pp@ee.oulu.fi>  wrote:
>
>> On Mon, Sep 07, 2009 at 10:57:01PM +0200, Ingo Molnar wrote:
>>>>> Could you profile it please? Also, what's the context-switch rate?
>>>>
>>>> As far as I can tell, the broadcom mips architecture does not have
>>>> profiling support. It does only have some proprietary profiling
>>>> registers that nobody wrote kernel support for, yet.
>>> Well, what does 'vmstat 1' show - how many context switches are
>>> there per second on the iperf server? In theory if it's a truly
>>> saturated box, there shouldnt be many - just a single iperf task
>>
>> Yay, finally something that's measurable in this thread \o/
>
> My initial posting in this thread contains 6 separate types of
> measurements, rather extensive ones. Out of those, 4 measurements
> were latency oriented, two were throughput oriented. Plenty of data,
> plenty of results, and very good reproducability.

None of which involve latency-prone GUI applications running on cheap 
commodity hardware though.  I listed examples where mainline seems to 
behave sub-optimally and ways to reproduce them, but this doesn't seem to 
be an area of interest.

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-08  7:19     ` Nikos Chantziaras
@ 2009-09-08  8:31       ` Arjan van de Ven
  2009-09-08 20:22         ` Frans Pop
  2009-09-08  8:38       ` Arjan van de Ven
  1 sibling, 1 reply; 216+ messages in thread
From: Arjan van de Ven @ 2009-09-08  8:31 UTC (permalink / raw)
  To: Nikos Chantziaras; +Cc: linux-kernel

On Tue, 08 Sep 2009 10:19:06 +0300
Nikos Chantziaras <realnc@arcor.de> wrote:

> latencytop has this to say:
> 
>    http://foss.math.aegean.gr/~realnc/pics/latop1.png
> 
> Though I don't really understand what this tool is trying to tell me,
> I hope someone does.

unfortunately this is both an older version of latencytop, and it's
incorrectly installed ;-(
Latencytop is supposed to translate those cryptic strings to English,
but due to not being correctly installed, it does not do this ;(

the latest version of latencytop also has a GUI (thanks to Ben) 

-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-08  7:19     ` Nikos Chantziaras
  2009-09-08  8:31       ` Arjan van de Ven
@ 2009-09-08  8:38       ` Arjan van de Ven
  2009-09-08 10:13         ` Nikos Chantziaras
  1 sibling, 1 reply; 216+ messages in thread
From: Arjan van de Ven @ 2009-09-08  8:38 UTC (permalink / raw)
  To: Nikos Chantziaras; +Cc: linux-kernel

On Tue, 08 Sep 2009 10:19:06 +0300
Nikos Chantziaras <realnc@arcor.de> wrote:

> latencytop has this to say:
> 
>    http://foss.math.aegean.gr/~realnc/pics/latop1.png
> 
> Though I don't really understand what this tool is trying to tell me,
> I hope someone does.

despite the untranslated content, it is clear that you have scheduler
delays (either due to scheduler bugs or cpu contention) of up to 68
msecs... Second in line is your binary AMD graphics driver that is
chewing up 14% of your total latency...


-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-07 20:44         ` Jens Axboe
@ 2009-09-08  9:13           ` Jens Axboe
  2009-09-08 15:23             ` Peter Zijlstra
  0 siblings, 1 reply; 216+ messages in thread
From: Jens Axboe @ 2009-09-08  9:13 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Con Kolivas, linux-kernel, Peter Zijlstra, Mike Galbraith

[-- Attachment #1: Type: text/plain, Size: 1068 bytes --]

On Mon, Sep 07 2009, Jens Axboe wrote:
> On Mon, Sep 07 2009, Jens Axboe wrote:
> > > And yes, it would be wonderful to get a test-app from you that would 
> > > express the kind of pain you are seeing during compile jobs.
> > 
> > I was hoping this one would, but it's not showing anything. I even added
> > support for doing the ping and wakeup over a socket, to see if the pipe
> > test was doing well because of the sync wakeup we do there. The net
> > latency is a little worse, but still good. So no luck in making that app
> > so far.
> 
> Here's a version that bounces timestamps between a producer and a number
> of consumers (clients). Not really tested much, but perhaps someone can
> compare this on a box that boots BFS and see what happens.

And here's a newer version. It ensures that clients are running before
sending a timestamp, and it drops the first and last log entry to
eliminate any weird effects there. Accuracy should also be improved.

On an idle box, it'll usually log all zeroes. Sometimes I see 3-4msec
latencies, weird.

-- 
Jens Axboe


[-- Attachment #2: latt.c --]
[-- Type: text/x-csrc, Size: 9792 bytes --]

/*
 * Simple latency tester that combines multiple processes.
 *
 * Compile: gcc -Wall -O2 -D_GNU_SOURCE -o latt latt.c -lrt -lm -lpthread
 *
 * Run with: latt -c8 'program --args'
 *
 * Options:
 *
 *	-cX	Use X number of clients
 *	-fX	Use X msec as the minimum sleep time for the parent
 *	-tX	Use X msec as the maximum sleep time for the parent
 *	-v	Print all delays as they are logged
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <getopt.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/time.h>
#include <sys/mman.h>
#include <time.h>
#include <math.h>
#include <poll.h>
#include <pthread.h>


/*
 * In msecs
 */
static unsigned int min_delay = 100;
static unsigned int max_delay = 500;
static unsigned int clients = 1;
static unsigned int verbose;

#define MAX_CLIENTS		512

struct delays {
	unsigned long nr_delays;
	unsigned long total_entries;
	unsigned long max_delay;
	unsigned long delays[0];
};

#define entries_to_size(n)	\
	((n) * sizeof(unsigned long) + sizeof(struct delays))

static struct delays *delays;
static int pipes[MAX_CLIENTS][2];

static unsigned long avg;
static double stddev;

static pid_t app_pid;

#define CLOCKSOURCE		CLOCK_MONOTONIC

#define DEF_ENTRIES		1024

struct mutex {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int value;
	int waiters;
};

static void init_mutex(struct mutex *mutex)
{
	pthread_mutexattr_t attr;
	pthread_condattr_t cond;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
	pthread_condattr_init(&cond);
	pthread_condattr_setpshared(&cond, PTHREAD_PROCESS_SHARED);
	pthread_cond_init(&mutex->cond, &cond);
	pthread_mutex_init(&mutex->lock, &attr);

	mutex->value = 0;
	mutex->waiters = 0;
}

static void mutex_down(struct mutex *mutex)
{
	pthread_mutex_lock(&mutex->lock);

	while (!mutex->value) {
		mutex->waiters++;
		pthread_cond_wait(&mutex->cond, &mutex->lock);
		mutex->waiters--;
	}

	mutex->value--;
	pthread_mutex_unlock(&mutex->lock);
}

static void mutex_up(struct mutex *mutex)
{
	pthread_mutex_lock(&mutex->lock);
	if (!mutex->value && mutex->waiters)
		pthread_cond_signal(&mutex->cond);
	mutex->value++;
	pthread_mutex_unlock(&mutex->lock);
}

static int parse_options(int argc, char *argv[])
{
	struct option l_opts[] = {
		{ "min-delay", 	1, 	NULL,	'f' },
		{ "max-delay",	1,	NULL,	't' },
		{ "clients",	1,	NULL,	'c' },
		{ "verbose",	0,	NULL,	'v' },
		{ NULL,		0,	NULL,	0 }	/* getopt_long() wants a zeroed terminator */
	};
	int c, res, index = 0;

	while ((c = getopt_long(argc, argv, "f:t:c:v", l_opts, &res)) != -1) {
		index++;
		switch (c) {
			case 'f':
				min_delay = atoi(optarg);
				break;
			case 't':
				max_delay = atoi(optarg);
				break;
			case 'c':
				clients = atoi(optarg);
				if (clients > MAX_CLIENTS)
					clients = MAX_CLIENTS;
				break;
			case 'v':
				verbose = 1;
				break;
		}
	}

	/* index of the first non-option argument (the app to run) */
	return optind;
}

static pid_t fork_off(const char *app)
{
	pid_t pid;

	pid = fork();
	if (pid)
		return pid;

	exit(system(app));
}

static unsigned long usec_since(struct timespec *start, struct timespec *end)
{
	long secs, nsecs;

	secs = end->tv_sec - start->tv_sec;
	nsecs = end->tv_nsec - start->tv_nsec;

	return secs * 1000000L + nsecs / 1000;
}

static void log_delay(unsigned long delay)
{
	if (verbose)
		printf("log delay %8lu usec\n", delay);

	if (delays->nr_delays == delays->total_entries) {
		unsigned long new_size;

		delays->total_entries <<= 1;
		new_size = entries_to_size(delays->total_entries);
		delays = realloc(delays, new_size);
	}

	delays->delays[delays->nr_delays++] = delay;
}

/*
 * Reads a timestamp (which is ignored, it's just a wakeup call), and replies
 * with the timestamp of when we saw it
 */
static void run_child(int *pipe, struct mutex *mutex)
{
	struct timespec ts;

	mutex_up(mutex);

	do {
		int ret;

		ret = read(pipe[0], &ts, sizeof(ts));
		if (ret <= 0)
			break;

		clock_gettime(CLOCKSOURCE, &ts);

		ret = write(pipe[1], &ts, sizeof(ts));
		if (ret <= 0)
			break;
	} while (1);
}

/*
 * Do a random sleep between min and max delay
 */
static void do_rand_sleep(void)
{
	unsigned int msecs;

	msecs = min_delay + ((float) (max_delay - min_delay) * (rand() / (RAND_MAX + 1.0)));
	usleep(msecs * 1000);
}

static void kill_connection(void)
{
	int i;

	for (i = 0; i < clients; i++) {
		if (pipes[i][0] != -1) {
			close(pipes[i][0]);
			pipes[i][0] = -1;
		}
		if (pipes[i][1] != -1) {
			close(pipes[i][1]);
			pipes[i][1] = -1;
		}
	}
}

static int __write_ts(int fd, struct timespec *ts)
{
	clock_gettime(CLOCKSOURCE, ts);

	return write(fd, ts, sizeof(*ts)) != sizeof(*ts);
}

static int write_ts(struct pollfd *pfd, unsigned int nr, struct timespec *ts)
{
	unsigned int i;

	for (i = 0; i < clients; i++) {
		if (pfd[i].revents & (POLLERR | POLLHUP | POLLNVAL))
			return 1;
		if (pfd[i].revents & POLLOUT) {
			pfd[i].events = 0;
			if (__write_ts(pfd[i].fd, &ts[i]))
				return 1;
			nr--;
		}
		if (!nr)
			break;
	}

	return 0;
}

static long __read_ts(int fd, struct timespec *ts)
{
	struct timespec t;

	if (read(fd, &t, sizeof(t)) != sizeof(t))
		return -1;

	return usec_since(ts, &t);
}

static long read_ts(struct pollfd *pfd, unsigned int nr, struct timespec *ts)
{
	long delay, max_delay = 0;
	unsigned int i;

	for (i = 0; i < clients; i++) {
		if (pfd[i].revents & (POLLERR | POLLHUP | POLLNVAL))
			return -1L;
		if (pfd[i].revents & POLLIN) {
			pfd[i].events = 0;
			delay = __read_ts(pfd[i].fd, &ts[i]);
			if (delay < 0)
				return -1L;
			else if (delay > max_delay)
				max_delay = delay;
			nr--;
		}
		if (!nr)
			break;
	}

	return max_delay;
}

static int app_has_exited(void)
{
	int ret, status;

	/*
	 * If our app has exited, stop
	 */
	ret = waitpid(app_pid, &status, WNOHANG);
	if (ret < 0) {
		perror("waitpid");
		return 1;
	} else if (ret == app_pid &&
		   (WIFSIGNALED(status) || WIFEXITED(status))) {
		return 1;
	}

	return 0;
}

/*
 * While our given app is running, send a timestamp to each client and
 * log the maximum latency for each of them to wakeup and reply
 */
static void run_parent(void)
{
	struct pollfd *ipfd, *opfd;
	int do_exit = 0, i;
	struct timespec *t1;

	t1 = malloc(sizeof(struct timespec) * clients);
	opfd = malloc(sizeof(struct pollfd) * clients);
	ipfd = malloc(sizeof(struct pollfd) * clients);

	srand(1234);

	do {
		long max_delay = 0;	/* signed: read_ts() returns -1 on error */
		unsigned pending_events;

		do_rand_sleep();

		for (i = 0; i < clients; i++) {
			ipfd[i].fd = pipes[i][0];
			ipfd[i].events = POLLIN;
			opfd[i].fd = pipes[i][1];
			opfd[i].events = POLLOUT;
		}

		/*
		 * Write wakeup calls
		 */
		pending_events = clients;
		while (pending_events) {
			int evts = poll(opfd, clients, 0);

			if (app_has_exited()) {
				do_exit = 1;
				break;
			}

			if (evts < 0) {
				do_exit = 1;
				break;
			} else if (!evts)
				continue;

			if (write_ts(opfd, evts, t1)) {
				do_exit = 1;
				break;
			}

			pending_events -= evts;
		}

		if (do_exit)
			break;

		/*
		 * Poll and read replies
		 */
		pending_events = clients;
		while (pending_events) {
			int evts = poll(ipfd, clients, 0);

			if (app_has_exited()) {
				do_exit = 1;
				break;
			}

			if (evts < 0) {
				do_exit = 1;
				break;
			} else if (!evts)
				continue;

			max_delay = read_ts(ipfd, evts, t1);
			if (max_delay < 0) {
				do_exit = 1;
				break;
			}

			pending_events -= evts;
		}
		log_delay(max_delay);
	} while (!do_exit);

	free(t1);
	free(ipfd);
	free(opfd);
	kill_connection();
}

static void run_test(void)
{
	struct mutex *mutex;
	pid_t *cpids;
	int i, status;

	mutex = mmap(NULL, sizeof(*mutex), PROT_READ|PROT_WRITE,
			MAP_SHARED | MAP_ANONYMOUS, 0, 0);
	if (mutex == MAP_FAILED) {
		perror("mmap");
		return;
	}

	init_mutex(mutex);

	for (i = 0; i < clients; i++) {
		if (pipe(pipes[i])) {
			perror("pipe");
			return;
		}
	}

	cpids = malloc(sizeof(pid_t) * clients);

	for (i = 0; i < clients; i++) {
		cpids[i] = fork();
		if (cpids[i]) {
			mutex_down(mutex);
			continue;
		}

		run_child(pipes[i], mutex);
		exit(0);
	}

	run_parent();

	for (i = 0; i < clients; i++)
		kill(cpids[i], SIGQUIT);
	for (i = 0; i < clients; i++)
		waitpid(cpids[i], &status, 0);

	free(cpids);
	munmap(mutex, sizeof(*mutex));
}

static void setup_log(void)
{
	delays = malloc(entries_to_size(DEF_ENTRIES));
	delays->nr_delays = 0;
	delays->total_entries = DEF_ENTRIES;
	delays->max_delay = 0;
}

/*
 * Calculate average and stddev for the entries in the log. Drop the
 * first and last entry.
 */
static int calc_latencies(void)
{
	unsigned long long sum = 0;
	int i;

	if (delays->nr_delays <= 2)
		return 1;

	for (i = 1; i < delays->nr_delays - 1; i++) {
		unsigned long delay = delays->delays[i];

		if (delay > delays->max_delay)
			delays->max_delay = delay;

		sum += delay;
	}

	avg = sum / (delays->nr_delays - 2);

	if (delays->nr_delays <= 3)
		return 0;

	sum = 0;
	for (i = 1; i < delays->nr_delays - 1; i++) {
		long diff;

		diff = delays->delays[i] - avg;
		sum += (diff * diff);
	}

	stddev = sqrt(sum / (delays->nr_delays - 3));
	return 0;
}

static void handle_sigint(int sig)
{
	kill(app_pid, SIGINT);
}

int main(int argc, char *argv[])
{
	int app_offset, off;
	char app[256];

	setup_log();

	off = 0;
	app_offset = parse_options(argc, argv);
	while (app_offset < argc) {
		if (off) {
			app[off] = ' ';
			off++;
		}
		off += sprintf(app + off, "%s", argv[app_offset]);
		app_offset++;
	}

	signal(SIGINT, handle_sigint);

	/*
	 * Start app and start logging latencies
	 */
	app_pid = fork_off(app);
	run_test();

	if (calc_latencies()) {
		printf("Runtime too short to render result\n");
		return 1;
	}

	printf("Entries: %lu (clients=%d)\n", delays->nr_delays, clients);
	printf("\nAverages:\n");
	printf("------------------------------\n");
	printf("\tMax\t %8lu usec\n", delays->max_delay);
	printf("\tAvg\t %8lu usec\n", avg);
	printf("\tStdev\t %8.0f usec\n", stddev);

	free(delays);
	return 0;
}

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-08  7:48     ` Ingo Molnar
@ 2009-09-08  9:50       ` Benjamin Herrenschmidt
  2009-09-08 13:09         ` Ralf Baechle
  2009-09-08 13:09         ` Felix Fietkau
  2009-09-08 14:45       ` Michael Buesch
  1 sibling, 2 replies; 216+ messages in thread
From: Benjamin Herrenschmidt @ 2009-09-08  9:50 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Michael Buesch, Con Kolivas, linux-kernel, Peter Zijlstra,
	Mike Galbraith, Felix Fietkau

On Tue, 2009-09-08 at 09:48 +0200, Ingo Molnar wrote:
> So either your MIPS system has some unexpected dependency on the 
> scheduler, or there's something weird going on.
> 
> Mind poking on this one to figure out whether it's all repeatable 
> and why that slowdown happens? Multiple attempts to reproduce it 
> failed here for me.

Could it be the scheduler using constructs that don't do well on MIPS ? 

I remember at some stage we spotted an expensive multiply in there,
maybe there's something similar, or some data structure that is unaligned
or not cache friendly vs. the MIPS cache line size, that sort of thing ...

Is this a SW loaded TLB ? Does it miss on kernel space ? There could
also be differences in how many pages are touched by each scheduler,
causing more TLB pressure. This will be mostly invisible on x86.

At this stage, it will be hard to tell without some profile data I
suppose. Maybe next week I can try on a small SW loaded TLB embedded PPC
see if I can reproduce some of that, but no promises here.

Cheers,
Ben.


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-08  8:13             ` Nikos Chantziaras
@ 2009-09-08 10:12               ` Ingo Molnar
  2009-09-08 10:40                 ` Nikos Chantziaras
  2009-09-08 12:00                 ` el_es
  0 siblings, 2 replies; 216+ messages in thread
From: Ingo Molnar @ 2009-09-08 10:12 UTC (permalink / raw)
  To: Nikos Chantziaras
  Cc: Pekka Pietikainen, Michael Buesch, Con Kolivas, linux-kernel,
	Peter Zijlstra, Mike Galbraith, Felix Fietkau


* Nikos Chantziaras <realnc@arcor.de> wrote:

> On 09/08/2009 11:04 AM, Ingo Molnar wrote:
>>
>> * Pekka Pietikainen<pp@ee.oulu.fi>  wrote:
>>
>>> On Mon, Sep 07, 2009 at 10:57:01PM +0200, Ingo Molnar wrote:
>>>>>> Could you profile it please? Also, what's the context-switch rate?
>>>>>
>>>>> As far as I can tell, the broadcom mips architecture does not have
>>>>> profiling support. It does only have some proprietary profiling
>>>>> registers that nobody wrote kernel support for, yet.
>>>> Well, what does 'vmstat 1' show - how many context switches are
>>>> there per second on the iperf server? In theory if it's a truly
>>>> saturated box, there shouldnt be many - just a single iperf task
>>>
>>> Yay, finally something that's measurable in this thread \o/
>>
>> My initial posting in this thread contains 6 separate types of 
>> measurements, rather extensive ones. Out of those, 4 measurements 
>> were latency oriented, two were throughput oriented. Plenty of 
>> data, plenty of results, and very good reproducability.
>
> None of which involve latency-prone GUI applications running on 
> cheap commodity hardware though. [...]

The lat_tcp, lat_pipe and pipe-test numbers are all benchmarks that 
characterise such workloads - they show the latency of context 
switches.
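
As a rough illustration of what such a benchmark measures (a minimal 
sketch only, not the actual lmbench or pipe-test source), two tasks 
bounce one byte over a pipe pair; each round trip costs two context 
switches plus the wakeup latency:

/*
 * Minimal illustrative pipe ping-pong (not the actual pipe-test or
 * lat_pipe code): parent and child bounce one byte back and forth,
 * so usecs per round trip approximates two context switches.
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/time.h>

#define ROUNDS	100000

int main(void)
{
	int ptc[2], ctp[2];		/* parent->child, child->parent */
	struct timeval t1, t2;
	long i, usecs;
	char c = 0;
	pid_t pid;

	if (pipe(ptc) || pipe(ctp))
		return 1;

	pid = fork();
	if (pid < 0)
		return 1;
	if (pid == 0) {
		/* child: echo every byte straight back */
		for (;;) {
			if (read(ptc[0], &c, 1) != 1)
				exit(0);
			if (write(ctp[1], &c, 1) != 1)
				exit(0);
		}
	}

	gettimeofday(&t1, NULL);
	for (i = 0; i < ROUNDS; i++) {
		if (write(ptc[1], &c, 1) != 1 || read(ctp[0], &c, 1) != 1)
			return 1;
	}
	gettimeofday(&t2, NULL);

	usecs = (t2.tv_sec - t1.tv_sec) * 1000000L + (t2.tv_usec - t1.tv_usec);
	printf("%.2f usecs per round trip\n", (double) usecs / ROUNDS);
	return 0;
}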

I also tested where Con posted numbers that BFS has an edge over 
mainline: kbuild performance. Should i not have done that?

Also note the interbench latency measurements that Con posted:

   http://ck.kolivas.org/patches/bfs/interbench-bfs-cfs.txt

--- Benchmarking simulated cpu of Audio in the presence of simulated ---
Load    Latency +/- SD (ms)  Max Latency   % Desired CPU  % Deadlines Met
None      0.004 +/- 0.00436    0.006             100            100
Video     0.008 +/- 0.00879    0.015             100            100
X         0.006 +/- 0.0067     0.014             100            100
Burn      0.005 +/- 0.00563    0.009             100            100
Write     0.005 +/- 0.00887     0.16             100            100
Read      0.006 +/- 0.00696    0.018             100            100
Compile   0.007 +/- 0.00751    0.019             100            100

Versus the mainline scheduler:

--- Benchmarking simulated cpu of Audio in the presence of simulated ---
Load    Latency +/- SD (ms)  Max Latency   % Desired CPU  % Deadlines Met
None      0.005 +/- 0.00562    0.007             100            100
Video     0.003 +/- 0.00333    0.009             100            100
X         0.003 +/- 0.00409     0.01             100            100
Burn      0.004 +/- 0.00415    0.006             100            100
Write     0.005 +/- 0.00592    0.021             100            100
Read      0.004 +/- 0.00463    0.009             100            100
Compile   0.003 +/- 0.00426    0.014             100            100

Look at those standard deviation numbers: their spread is way too 
high, often 50% or more - it's very hard to compare such noisy data. 

Furthermore, they happen to show the 2.6.30 mainline scheduler 
outperforming BFS in almost every interactivity metric.

Check it for yourself and compare the entries. I havent made those 
measurements, Con did.

For example 'Compile' latencies:

--- Benchmarking simulated cpu of Audio in the presence of simulated Load
                  Latency +/- SD (ms)  Max Latency   % Desired CPU  % Deadlines Met
 v2.6.30: Compile   0.003 +/- 0.00426    0.014             100            100
     BFS: Compile   0.007 +/- 0.00751    0.019             100            100

but ... with a near 100% standard deviation that's pretty hard to 
judge. The Max Latency went from 14 usecs under v2.6.30 to 19 usecs 
on BFS.

> [...]  I listed examples where mainline seems to behave 
> sub-optimal and ways to reproduce them but this doesn't seem to be 
> an area of interest.

It is an area of interest of course. That's how the interactivity 
results above became possible.

	Ingo

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-08  8:38       ` Arjan van de Ven
@ 2009-09-08 10:13         ` Nikos Chantziaras
  2009-09-08 11:32           ` Juergen Beisert
                             ` (2 more replies)
  0 siblings, 3 replies; 216+ messages in thread
From: Nikos Chantziaras @ 2009-09-08 10:13 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: linux-kernel

On 09/08/2009 11:38 AM, Arjan van de Ven wrote:
> On Tue, 08 Sep 2009 10:19:06 +0300
> Nikos Chantziaras<realnc@arcor.de>  wrote:
>
>> latencytop has this to say:
>>
>>     http://foss.math.aegean.gr/~realnc/pics/latop1.png
>>
>> Though I don't really understand what this tool is trying to tell me,
>> I hope someone does.
>
> despite the untranslated content, it is clear that you have scheduler
> delays (either due to scheduler bugs or cpu contention) of upto 68
> msecs... Second in line is your binary AMD graphics driver that is
> chewing up 14% of your total latency...

I've now used a correctly installed and up-to-date version of latencytop 
and repeated the test.  Also, I got rid of AMD's binary blob and used 
kernel DRM drivers for my graphics card to throw fglrx out of the 
equation (which btw didn't help; the exact same problems occur).

Here the result:

     http://foss.math.aegean.gr/~realnc/pics/latop2.png

Again: this is on an Intel Core 2 Duo CPU.

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-08 10:12               ` Ingo Molnar
@ 2009-09-08 10:40                 ` Nikos Chantziaras
  2009-09-08 11:35                   ` Ingo Molnar
  2009-09-08 12:00                 ` el_es
  1 sibling, 1 reply; 216+ messages in thread
From: Nikos Chantziaras @ 2009-09-08 10:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pekka Pietikainen, Michael Buesch, Con Kolivas, linux-kernel,
	Peter Zijlstra, Mike Galbraith, Felix Fietkau

On 09/08/2009 01:12 PM, Ingo Molnar wrote:
>
> * Nikos Chantziaras<realnc@arcor.de>  wrote:
>
>> On 09/08/2009 11:04 AM, Ingo Molnar wrote:
>>>
>>> * Pekka Pietikainen<pp@ee.oulu.fi>   wrote:
>>>
>>>> On Mon, Sep 07, 2009 at 10:57:01PM +0200, Ingo Molnar wrote:
>>>>>>> Could you profile it please? Also, what's the context-switch rate?
>>>>>>
>>>>>> As far as I can tell, the broadcom mips architecture does not have
>>>>>> profiling support. It does only have some proprietary profiling
>>>>>> registers that nobody wrote kernel support for, yet.
>>>>> Well, what does 'vmstat 1' show - how many context switches are
>>>>> there per second on the iperf server? In theory if it's a truly
>>>>> saturated box, there shouldnt be many - just a single iperf task
>>>>
>>>> Yay, finally something that's measurable in this thread \o/
>>>
>>> My initial posting in this thread contains 6 separate types of
>>> measurements, rather extensive ones. Out of those, 4 measurements
>>> were latency oriented, two were throughput oriented. Plenty of
>>> data, plenty of results, and very good reproducability.
>>
>> None of which involve latency-prone GUI applications running on
>> cheap commodity hardware though. [...]
>
> The lat_tcp, lat_pipe and pipe-test numbers are all benchmarks that
> characterise such workloads - they show the latency of context
> switches.
>
> I also tested where Con posted numbers that BFS has an edge over
> mainline: kbuild performance. Should i not have done that?

It's good that you did, of course.  However, when someone reports a 
problem/issue, the developer usually tries to reproduce the problem; he 
needs to see what the user sees.  This is how it's usually done, not 
only in most other development environments, but also here from I could 
gather by reading this list.  When getting reports about interactivity 
issues and with very specific examples of how to reproduce, I would have 
expected that most developers interested in identifying the issue would 
try to reproduce the same problem and work from there.  That would mean 
that you (or anyone else with an interest in tracking this down) would 
follow the examples given (by me and others, like enabling desktop 
compositing, firing up mplayer with a video and generally reproducing 
this using the quite detailed steps I posted as a recipe).

However, in this case, instead of the above, raw numbers are posted with 
batch jobs and benchmarks that aren't actually reproducing the issue as 
described by the reporter(s).  That way, the developer doesn't get to 
experience the issue firt-hand (and due to this possibly missing the 
real cause).  In most other bug reports or issues, the right thing seems 
to happen and the devs try to reproduce it exactly as described.  But 
not in this case.  I suspect this is due to most devs not using the 
software components on their machines that are necessary for this and 
therefore it would take too much time to reproduce the issue exactly as 
described?

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-07 23:54   ` Thomas Fjellstrom
@ 2009-09-08 11:30     ` Nikos Chantziaras
  0 siblings, 0 replies; 216+ messages in thread
From: Nikos Chantziaras @ 2009-09-08 11:30 UTC (permalink / raw)
  To: tfjellstrom; +Cc: linux-kernel

On 09/08/2009 02:54 AM, Thomas Fjellstrom wrote:
> On Sun September 6 2009, Nikos Chantziaras wrote:
>>  [...]
>> For reference, my system is:
>>
>> CPU: Intel Core 2 Duo E6600 (2.4GHz)
>> Mainboard: Asus P5E (Intel X38 chipset)
>> RAM: 6GB (2+2+1+1) dual channel DDR2 800
>> GPU: RV770 (Radeon HD4870).
>>
>
> My Phenom 9550 (2.2Ghz) whips the pants off my Intel Q6600 (2.6Ghz). I and a
> friend of mine both get large amounts of stalling when doing a lot of IO. I
> haven't seen such horrible desktop interactivity since before the new
> schedulers and the -ck patchset came out for 2.4.x. Its a heck of a lot better
> on my AMD Phenom's, but some lag is noticeable these days, even when it wasn't
> a few kernel releases ago.

It seems someone tried BFS on considerably slower hardware: Android.  According 
to the feedback, the device is much more responsive with BFS: 
http://twitter.com/cyanogen

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-08 10:13         ` Nikos Chantziaras
@ 2009-09-08 11:32           ` Juergen Beisert
  2009-09-08 22:00             ` Nikos Chantziaras
  2009-09-08 12:03           ` Theodore Tso
  2009-09-08 14:20           ` Arjan van de Ven
  2 siblings, 1 reply; 216+ messages in thread
From: Juergen Beisert @ 2009-09-08 11:32 UTC (permalink / raw)
  To: linux-kernel; +Cc: Nikos Chantziaras, Arjan van de Ven

On Dienstag, 8. September 2009, Nikos Chantziaras wrote:
> On 09/08/2009 11:38 AM, Arjan van de Ven wrote:
> > On Tue, 08 Sep 2009 10:19:06 +0300
> >
> > Nikos Chantziaras<realnc@arcor.de>  wrote:
> >> latencytop has this to say:
> >>
> >>     http://foss.math.aegean.gr/~realnc/pics/latop1.png
> >>
> >> Though I don't really understand what this tool is trying to tell me,
> >> I hope someone does.
> >
> > despite the untranslated content, it is clear that you have scheduler
> > delays (either due to scheduler bugs or cpu contention) of upto 68
> > msecs... Second in line is your binary AMD graphics driver that is
> > chewing up 14% of your total latency...
>
> I've now used a correctly installed and up-to-date version of latencytop
> and repeated the test.  Also, I got rid of AMD's binary blob and used
> kernel DRM drivers for my graphics card to throw fglrx out of the
> equation (which btw didn't help; the exact same problems occur).
>
> Here the result:
>
>      http://foss.math.aegean.gr/~realnc/pics/latop2.png
>
> Again: this is on an Intel Core 2 Duo CPU.

Just an idea: Maybe some system management code hits you?

jbe

-- 
Pengutronix e.K.                              | Juergen Beisert             |
Linux Solutions for Science and Industry      | Phone: +49-8766-939 228     |
Vertretung Sued/Muenchen, Germany             | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686              | http://www.pengutronix.de/  |

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-08 10:40                 ` Nikos Chantziaras
@ 2009-09-08 11:35                   ` Ingo Molnar
  2009-09-08 19:06                     ` Nikos Chantziaras
  0 siblings, 1 reply; 216+ messages in thread
From: Ingo Molnar @ 2009-09-08 11:35 UTC (permalink / raw)
  To: Nikos Chantziaras
  Cc: Pekka Pietikainen, Michael Buesch, Con Kolivas, linux-kernel,
	Peter Zijlstra, Mike Galbraith, Felix Fietkau


* Nikos Chantziaras <realnc@arcor.de> wrote:

> [...] That would mean that you (or anyone else with an interest of 
> tracking this down) would follow the examples given (by me and 
> others, like enabling desktop compositing, firing up mplayer with 
> a video and generally reproducing this using the quite detailed 
> steps I posted as a recipe).

Could you follow up on Frederic's detailed tracing suggestions that 
would give us the source of the latency?

( Also, as per lkml etiquette, please try to keep the Cc: list 
  intact when replying to emails. I missed your first reply
  that you un-Cc:-ed. )

A quick look at the latencytop output suggests a scheduling latency. 
Could you send me the kernel .config that you are using?

	Ingo

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-08 10:12               ` Ingo Molnar
  2009-09-08 10:40                 ` Nikos Chantziaras
@ 2009-09-08 12:00                 ` el_es
  1 sibling, 0 replies; 216+ messages in thread
From: el_es @ 2009-09-08 12:00 UTC (permalink / raw)
  To: linux-kernel

Ingo Molnar <mingo <at> elte.hu> writes:


> For example 'Compile' latencies:
> 
> --- Benchmarking simulated cpu of Audio in the presence of simulated Load
>                   Latency +/- SD (ms)  Max Latency   % Desired CPU  % Deadlines Met
>  v2.6.30: Compile   0.003 +/- 0.00426    0.014             100            100
>      BFS: Compile   0.007 +/- 0.00751    0.019             100            100
> 
> but ... with a near 100% standard deviation that's pretty hard to 
> judge. The Max Latency went from 14 usecs under v2.6.30 to 19 usecs 
> on BFS.
> 
[...]
> 	Ingo
> 

This just struck me: maybe what desktop users *feel* is exactly that: the
current approach is too fine-grained, trying to achieve the minimum latency
with the *most* reproducible result (least stddev) at all costs? And BFS just
doesn't care?  I know this sounds like heresy.

Lukasz




^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-08 10:13         ` Nikos Chantziaras
  2009-09-08 11:32           ` Juergen Beisert
@ 2009-09-08 12:03           ` Theodore Tso
  2009-09-08 21:28             ` Nikos Chantziaras
  2009-09-08 14:20           ` Arjan van de Ven
  2 siblings, 1 reply; 216+ messages in thread
From: Theodore Tso @ 2009-09-08 12:03 UTC (permalink / raw)
  To: Nikos Chantziaras; +Cc: Arjan van de Ven, linux-kernel

On Tue, Sep 08, 2009 at 01:13:34PM +0300, Nikos Chantziaras wrote:
>> despite the untranslated content, it is clear that you have scheduler
>> delays (either due to scheduler bugs or cpu contention) of upto 68
>> msecs... Second in line is your binary AMD graphics driver that is
>> chewing up 14% of your total latency...
>
> I've now used a correctly installed and up-to-date version of latencytop  
> and repeated the test.  Also, I got rid of AMD's binary blob and used  
> kernel DRM drivers for my graphics card to throw fglrx out of the  
> equation (which btw didn't help; the exact same problems occur).
>
> Here the result:
>
>     http://foss.math.aegean.gr/~realnc/pics/latop2.png

This was with an unmodified 2.6.31-rcX kernel?  Does Latencytop do
anything useful on a BFS-patched kernel?

							- Ted

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Epic regression in throughput since v2.6.23
  2009-09-06 20:59 BFS vs. mainline scheduler benchmarks and measurements Ingo Molnar
                   ` (4 preceding siblings ...)
  2009-09-07 15:16 ` BFS vs. mainline scheduler benchmarks and measurements Michael Buesch
@ 2009-09-08 12:57 ` Serge Belyshev
  2009-09-08 17:47   ` Jesse Brandeburg
                     ` (2 more replies)
  2009-09-10  7:43 ` [updated] BFS vs. mainline scheduler benchmarks and measurements Ingo Molnar
  2009-09-14  9:46 ` Phoronix CFS vs BFS bencharks Nikos Chantziaras
  7 siblings, 3 replies; 216+ messages in thread
From: Serge Belyshev @ 2009-09-08 12:57 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Con Kolivas, linux-kernel, Peter Zijlstra, Mike Galbraith


Hi. I've done measurements of the time taken by a "make -j4" kernel build
on a quadcore box.  The results are interesting: the mainline kernel
has regressed by more than 10% since the v2.6.23 release.

The following graph shows the time taken by "make -j4" (median over 9 runs)
versus kernel version.  The huge (10%) regression since v2.6.23 is
apparent.  Note that tip/master c26f010 is better than current mainline.
Also note that BFS is significantly better than both and shows the same
throughput as vanilla v2.6.23:

http://img403.imageshack.us/img403/7029/epicmakej4.png


The following plot is a detailed comparison of the time taken versus the
number of parallel jobs. Note that at "make -j4" (which equals the number
of hardware threads), BFS has the minimum (best performance)
and tip/master the maximum (worst).  I've also tested mainline v2.6.31
(not shown on the graph), which produces similar, albeit slightly slower,
results to tip/master.

http://img179.imageshack.us/img179/5335/epicbfstip.png


Conclusions are:
1) mainline has severely regressed since v2.6.23
2) BFS shows optimal performance at make -jN where N equals the number of
   h/w threads, while the current mainline scheduler's performance is far
   from optimal in this case.

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-08  9:50       ` Benjamin Herrenschmidt
@ 2009-09-08 13:09         ` Ralf Baechle
  2009-09-09  1:36           ` Felix Fietkau
  2009-09-08 13:09         ` Felix Fietkau
  1 sibling, 1 reply; 216+ messages in thread
From: Ralf Baechle @ 2009-09-08 13:09 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Ingo Molnar, Michael Buesch, Con Kolivas, linux-kernel,
	Peter Zijlstra, Mike Galbraith, Felix Fietkau

On Tue, Sep 08, 2009 at 07:50:00PM +1000, Benjamin Herrenschmidt wrote:

> On Tue, 2009-09-08 at 09:48 +0200, Ingo Molnar wrote:
> > So either your MIPS system has some unexpected dependency on the 
> > scheduler, or there's something weird going on.
> > 
> > Mind poking on this one to figure out whether it's all repeatable 
> > and why that slowdown happens? Multiple attempts to reproduce it 
> > failed here for me.
> 
> Could it be the scheduler using constructs that don't do well on MIPS ? 

It would surprise me.

I'm wondering if BFS has properties that make it perform better on a very
low memory system; I guess the BCM74xx system will have like 32MB or 64MB
only.

> I remember at some stage we spotted an expensive multiply in there,
> maybe there's something similar, or some unaligned or non-cache friendly
> vs. the MIPS cache line size data structure, that sort of thing ...
> 
> Is this a SW loaded TLB ? Does it misses on kernel space ? That could
> also be some differences in how many pages are touched by each scheduler
> causing more TLB pressure. This will be mostly invisible on x86.

Software refilled.  No misses ever for kernel space or low-mem; think of
it as low-mem and kernel executable living in a 512MB page that is mapped
by a mechanism outside the TLB.  Vmalloc ranges are TLB-mapped.  Ioremap
address ranges are TLB-mapped only if they are above physical address 512MB.

An emulated unaligned load/store is very expensive; one that is encoded
properly by GCC for __attribute__((packed)) is only 1 cycle and 1
instruction ( = 4 bytes) extra.
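
(As a small illustration of the packed case, just a sketch and not taken
from any in-tree code: because the field below may be unaligned, GCC
emits the lwl/lwr pair itself on MIPS, one extra instruction, instead of
a plain lw that would trap into the slow unaligned-access emulation:

struct hdr {
	unsigned char	type;
	unsigned int	seq;	/* offset 1 -> not naturally aligned */
} __attribute__((packed));

unsigned int get_seq(const struct hdr *h)
{
	/* known-unaligned access: compiled as lwl/lwr, no trap taken */
	return h->seq;
}
)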

> At this stage, it will be hard to tell without some profile data I
> suppose. Maybe next week I can try on a small SW loaded TLB embedded PPC
> see if I can reproduce some of that, but no promises here.

  Ralf

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-08  9:50       ` Benjamin Herrenschmidt
  2009-09-08 13:09         ` Ralf Baechle
@ 2009-09-08 13:09         ` Felix Fietkau
  2009-09-09  0:28           ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 216+ messages in thread
From: Felix Fietkau @ 2009-09-08 13:09 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Ingo Molnar, Michael Buesch, Con Kolivas, linux-kernel,
	Peter Zijlstra, Mike Galbraith

Benjamin Herrenschmidt wrote:
> On Tue, 2009-09-08 at 09:48 +0200, Ingo Molnar wrote:
>> So either your MIPS system has some unexpected dependency on the 
>> scheduler, or there's something weird going on.
>> 
>> Mind poking on this one to figure out whether it's all repeatable 
>> and why that slowdown happens? Multiple attempts to reproduce it 
>> failed here for me.
> 
> Could it be the scheduler using constructs that don't do well on MIPS ? 
> 
> I remember at some stage we spotted an expensive multiply in there,
> maybe there's something similar, or some unaligned or non-cache friendly
> vs. the MIPS cache line size data structure, that sort of thing ...
> 
> Is this a SW loaded TLB ? Does it misses on kernel space ? That could
> also be some differences in how many pages are touched by each scheduler
> causing more TLB pressure. This will be mostly invisible on x86.
The TLB is SW loaded, yes. However it should not do any misses on kernel
space, since the whole segment is in a wired TLB entry.

- Felix

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-08 10:13         ` Nikos Chantziaras
  2009-09-08 11:32           ` Juergen Beisert
  2009-09-08 12:03           ` Theodore Tso
@ 2009-09-08 14:20           ` Arjan van de Ven
  2009-09-08 22:53             ` Nikos Chantziaras
  2 siblings, 1 reply; 216+ messages in thread
From: Arjan van de Ven @ 2009-09-08 14:20 UTC (permalink / raw)
  To: Nikos Chantziaras; +Cc: linux-kernel, mingo

On Tue, 08 Sep 2009 13:13:34 +0300
Nikos Chantziaras <realnc@arcor.de> wrote:

> On 09/08/2009 11:38 AM, Arjan van de Ven wrote:
> > On Tue, 08 Sep 2009 10:19:06 +0300
> > Nikos Chantziaras<realnc@arcor.de>  wrote:
> >
> >> latencytop has this to say:
> >>
> >>     http://foss.math.aegean.gr/~realnc/pics/latop1.png
> >>
> >> Though I don't really understand what this tool is trying to tell
> >> me, I hope someone does.
> >
> > despite the untranslated content, it is clear that you have
> > scheduler delays (either due to scheduler bugs or cpu contention)
> > of upto 68 msecs... Second in line is your binary AMD graphics
> > driver that is chewing up 14% of your total latency...
> 
> I've now used a correctly installed and up-to-date version of
> latencytop and repeated the test.  Also, I got rid of AMD's binary
> blob and used kernel DRM drivers for my graphics card to throw fglrx
> out of the equation (which btw didn't help; the exact same problems
> occur).
> 
> Here the result:
> 
>      http://foss.math.aegean.gr/~realnc/pics/latop2.png
> 
> Again: this is on an Intel Core 2 Duo CPU.


so we finally have objective numbers!

now the interesting part is also WHERE the latency hits. Because
fundamentally, if you oversubscribe the CPU, you WILL get scheduling
latency... you simply have more to run than there is CPU.

Now the scheduler impacts this latency in two ways:
* Deciding how long apps run before someone else gets to take over
  ("time slicing")
* Deciding who gets to run first/more; e.g. priority between apps

The first one more or less controls the maximum, while the second one
controls which apps get to enjoy this maximum.

latencytop shows you both, but it is interesting to see how much latency
the apps you actually care about are getting....
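
(A rough back-of-envelope illustration, with made-up numbers: three
cpu-hog tasks sharing one core at equal priority, with an effective
slice of ~10 msec each, means a fourth task that wakes up can sit
runnable for up to ~30 msec before it gets the CPU, even with a
perfectly fair scheduler and ignoring wakeup preemption.  The slicing
sets that maximum; the priorities decide who gets to eat it.)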



-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-08  7:48     ` Ingo Molnar
  2009-09-08  9:50       ` Benjamin Herrenschmidt
@ 2009-09-08 14:45       ` Michael Buesch
  2009-09-18 11:24         ` Ingo Molnar
  1 sibling, 1 reply; 216+ messages in thread
From: Michael Buesch @ 2009-09-08 14:45 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Con Kolivas, linux-kernel, Peter Zijlstra, Mike Galbraith, Felix Fietkau

On Tuesday 08 September 2009 09:48:25 Ingo Molnar wrote:
> Mind poking on this one to figure out whether it's all repeatable 
> and why that slowdown happens?

I repeated the test several times, because I couldn't really believe that
there's such a big difference for me, but the results were the same.
I don't really know what's going on nor how to find out what's going on.

-- 
Greetings, Michael.

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-08  9:13           ` Jens Axboe
@ 2009-09-08 15:23             ` Peter Zijlstra
  2009-09-08 20:34               ` Jens Axboe
  2009-09-09 11:52               ` BFS vs. mainline scheduler benchmarks and measurements Nikos Chantziaras
  0 siblings, 2 replies; 216+ messages in thread
From: Peter Zijlstra @ 2009-09-08 15:23 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Ingo Molnar, Con Kolivas, linux-kernel, Mike Galbraith

[-- Attachment #1: Type: text/plain, Size: 1272 bytes --]

On Tue, 2009-09-08 at 11:13 +0200, Jens Axboe wrote:
> And here's a newer version.

I tinkered a bit with your proglet and finally found the problem.

You used a single pipe per child; this means the loop in run_child()
would consume what it had just written out until it got forcibly
preempted by the parent, which would also get woken.

This results in the child spinning for a while (its full quota) and only
reporting the last timestamp to the parent.

Since the consumer (parent) is a single thread, the program basically
measures the worst delay in a thundering-herd wakeup of N children.
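
(Concretely, in the fixed version each client now gets two pipes:
pipes[2*i] carries the parent's wakeup timestamps to child i, and
pipes[2*i+1] carries the replies back, so a child can no longer consume
its own wakeup.)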

The below version yields:

idle

[root@opteron sched]# ./latt -c8 sleep 30
Entries: 664 (clients=8)

Averages:
------------------------------
        Max           128 usec
        Avg            26 usec
        Stdev          16 usec


make -j4

[root@opteron sched]# ./latt -c8 sleep 30
Entries: 648 (clients=8)                 

Averages:
------------------------------
        Max         20861 usec
        Avg          3763 usec
        Stdev        4637 usec


Mike's patch, make -j4

[root@opteron sched]# ./latt -c8 sleep 30
Entries: 648 (clients=8)

Averages:
------------------------------
        Max         17854 usec
        Avg          6298 usec
        Stdev        4735 usec


[-- Attachment #2: latt.c --]
[-- Type: text/x-csrc, Size: 9214 bytes --]

/*
 * Simple latency tester that combines multiple processes.
 *
 * Compile: gcc -Wall -O2 -D_GNU_SOURCE -lrt -lm -o latt latt.c
 *
 * Run with: latt -c8 'program --args'
 *
 * Options:
 *
 *	-cX	Use X number of clients
 *	-fX	Use X msec as the minimum sleep time for the parent
 *	-tX	Use X msec as the maximum sleep time for the parent
 *	-v	Print all delays as they are logged
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <getopt.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/time.h>
#include <sys/mman.h>
#include <time.h>
#include <math.h>
#include <poll.h>
#include <pthread.h>
#include <signal.h>


/*
 * In msecs
 */
static unsigned int min_delay = 100;
static unsigned int max_delay = 500;
static unsigned int clients = 1;
static unsigned int verbose;

#define MAX_CLIENTS		512

struct stats
{
	double n, mean, M2, max;
};

/*
 * Welford's online algorithm: keep a running mean and sum of squared
 * deviations (M2) so the stddev can be computed in a single pass.
 */
static void update_stats(struct stats *stats, unsigned long long val)
{
	double delta, x = val;

	stats->n++;
	delta = x - stats->mean;
	stats->mean += delta / stats->n;
	stats->M2 += delta*(x - stats->mean);

	if (stats->max < x)
		stats->max = x;
}

static unsigned long nr_stats(struct stats *stats)
{
	return stats->n;
}

static double max_stats(struct stats *stats)
{
	return stats->max;
}

static double avg_stats(struct stats *stats)
{
	return stats->mean;
}

/*
 * http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
 *
 *       (\Sum n_i^2) - ((\Sum n_i)^2)/n
 * s^2 = -------------------------------
 *                  n - 1
 *
 * http://en.wikipedia.org/wiki/Stddev
 */
static double stddev_stats(struct stats *stats)
{
	double variance = stats->M2 / (stats->n - 1);

	return sqrt(variance);
}

/*
 * The std dev of the mean is related to the std dev by:
 *
 *             s
 * s_mean = -------
 *          sqrt(n)
 *
 */
static double stddev_mean_stats(struct stats *stats)
{
	double variance = stats->M2 / (stats->n - 1);
	double variance_mean = variance / stats->n;

	return sqrt(variance_mean);
}

struct stats delay_stats;

/*
 * Two pipes per client: pipes[2*i] carries the parent's wakeup
 * timestamp to child i, pipes[2*i+1] carries the child's reply back.
 */
static int pipes[MAX_CLIENTS*2][2];

static pid_t app_pid;

#define CLOCKSOURCE		CLOCK_MONOTONIC

/*
 * Minimal counting semaphore built on process-shared pthread primitives;
 * it lives in MAP_SHARED anonymous memory so the parent and the forked
 * children can synchronise on it.
 */
struct sem {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int value;
	int waiters;
};

static void init_sem(struct sem *sem)
{
	pthread_mutexattr_t attr;
	pthread_condattr_t cond;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
	pthread_condattr_init(&cond);
	pthread_condattr_setpshared(&cond, PTHREAD_PROCESS_SHARED);
	pthread_cond_init(&sem->cond, &cond);
	pthread_mutex_init(&sem->lock, &attr);

	sem->value = 0;
	sem->waiters = 0;
}

static void sem_down(struct sem *sem)
{
	pthread_mutex_lock(&sem->lock);

	while (!sem->value) {
		sem->waiters++;
		pthread_cond_wait(&sem->cond, &sem->lock);
		sem->waiters--;
	}

	sem->value--;
	pthread_mutex_unlock(&sem->lock);
}

static void sem_up(struct sem *sem)
{
	pthread_mutex_lock(&sem->lock);
	if (!sem->value && sem->waiters)
		pthread_cond_signal(&sem->cond);
	sem->value++;
	pthread_mutex_unlock(&sem->lock);
}

static int parse_options(int argc, char *argv[])
{
	struct option l_opts[] = {
		{ "min-delay", 	1, 	NULL,	'f' },
		{ "max-delay",	1,	NULL,	't' },
		{ "clients",	1,	NULL,	'c' },
		{ "verbose",	0,	NULL,	'v' },
		{ NULL,		0,	NULL,	0 },	/* getopt_long() needs a zero terminator */
	};
	int c, res, index = 0;

	while ((c = getopt_long(argc, argv, "f:t:c:v", l_opts, &res)) != -1) {
		index++;
		switch (c) {
			case 'f':
				min_delay = atoi(optarg);
				break;
			case 't':
				max_delay = atoi(optarg);
				break;
			case 'c':
				clients = atoi(optarg);
				if (clients > MAX_CLIENTS)
					clients = MAX_CLIENTS;
				break;
			case 'v':
				verbose = 1;
				break;
		}
	}

	return index + 1;
}

static pid_t fork_off(const char *app)
{
	pid_t pid;

	pid = fork();
	if (pid)
		return pid;

	exit(system(app));
}

static unsigned long usec_since(struct timespec *start, struct timespec *end)
{
	unsigned long long s, e;

	s = start->tv_sec * 1000000000ULL + start->tv_nsec;
	e =   end->tv_sec * 1000000000ULL +   end->tv_nsec;

	return (e - s) / 1000;
}

static void log_delay(unsigned long delay)
{
	if (verbose) {
		fprintf(stderr, "log delay %8lu usec\n", delay);
		fflush(stderr);
	}

	update_stats(&delay_stats, delay);
}

/*
 * Reads a timestamp (which is ignored, it's just a wakeup call), and replies
 * with the timestamp of when we saw it
 */
static void run_child(int *in, int *out, struct sem *sem)
{
	struct timespec ts;

	if (verbose) {
		fprintf(stderr, "present: %d\n", getpid());
		fflush(stderr);
	}

	sem_up(sem);

	do {
		int ret;

		ret = read(in[0], &ts, sizeof(ts));
		if (ret <= 0)
			break;

		if (ret != sizeof(ts))
			printf("bugger3\n");

		clock_gettime(CLOCKSOURCE, &ts);

		ret = write(out[1], &ts, sizeof(ts));
		if (ret <= 0)
			break;

		if (ret != sizeof(ts))
			printf("bugger4\n");

		if (verbose) {
			fprintf(stderr, "alife: %d\n", getpid());
			fflush(stderr);
		}
	} while (1);
}

/*
 * Do a random sleep between min and max delay
 */
static void do_rand_sleep(void)
{
	unsigned int msecs;

	msecs = min_delay + ((float) (max_delay - min_delay) * (rand() / (RAND_MAX + 1.0)));
	if (verbose) {
		fprintf(stderr, "sleeping for: %u msec\n", msecs);
		fflush(stderr);
	}
	usleep(msecs * 1000);
}

static void kill_connection(void)
{
	int i;

	for (i = 0; i < 2*clients; i++) {
		if (pipes[i][0] != -1) {
			close(pipes[i][0]);
			pipes[i][0] = -1;
		}
		if (pipes[i][1] != -1) {
			close(pipes[i][1]);
			pipes[i][1] = -1;
		}
	}
}

static int __write_ts(int i, struct timespec *ts)
{
	int fd = pipes[2*i][1];

	clock_gettime(CLOCKSOURCE, ts);

	return write(fd, ts, sizeof(*ts)) != sizeof(*ts);
}

static long __read_ts(int i, struct timespec *ts)
{
	int fd = pipes[2*i+1][0];
	struct timespec t;

	if (read(fd, &t, sizeof(t)) != sizeof(t))
		return -1;

	log_delay(usec_since(ts, &t));

	return 0;
}

static int read_ts(struct pollfd *pfd, unsigned int nr, struct timespec *ts)
{
	unsigned int i;

	for (i = 0; i < clients; i++) {
		if (pfd[i].revents & (POLLERR | POLLHUP | POLLNVAL))
			return -1L;
		if (pfd[i].revents & POLLIN) {
			pfd[i].events = 0;
			if (__read_ts(i, &ts[i]))
				return -1L;
			nr--;
		}
		if (!nr)
			break;
	}

	return 0;
}

static int app_has_exited(void)
{
	int ret, status;

	/*
	 * If our app has exited, stop
	 */
	ret = waitpid(app_pid, &status, WNOHANG);
	if (ret < 0) {
		perror("waitpid");
		return 1;
	} else if (ret == app_pid &&
		   (WIFSIGNALED(status) || WIFEXITED(status))) {
		return 1;
	}

	return 0;
}

/*
 * While our given app is running, send a timestamp to each client and
 * log the maximum latency for each of them to wakeup and reply
 */
static void run_parent(pid_t *cpids)
{
	struct pollfd *ipfd;
	int do_exit = 0, i;
	struct timespec *t1;

	t1 = malloc(sizeof(struct timespec) * clients);
	ipfd = malloc(sizeof(struct pollfd) * clients);

	srand(1234);

	do {
		unsigned long delay;
		unsigned pending_events;

		do_rand_sleep();

		if (app_has_exited())
			break;

		for (i = 0; i < clients; i++) {
			ipfd[i].fd = pipes[2*i+1][0];
			ipfd[i].events = POLLIN;
		}

		/*
		 * Write wakeup calls
		 */
		for (i = 0; i < clients; i++) {
			if (verbose) {
				fprintf(stderr, "waking: %d\n", cpids[i]);
				fflush(stderr);
			}

			if (__write_ts(i, t1+i)) {
				do_exit = 1;
				break;
			}
		}

		if (do_exit)
			break;

		/*
		 * Poll and read replies
		 */
		pending_events = clients;
		while (pending_events) {
			int evts = poll(ipfd, clients, 0);

			if (evts < 0) {
				do_exit = 1;
				break;
			} else if (!evts) {
				/* printf("bugger2\n"); */
				continue;
			}

			if (read_ts(ipfd, evts, t1)) {
				do_exit = 1;
				break;
			}

			pending_events -= evts;
		}
	} while (!do_exit);

	free(t1);
	free(ipfd);
	kill_connection();
}

static void run_test(void)
{
	struct sem *sem;
	pid_t *cpids;
	int i, status;

	sem = mmap(NULL, sizeof(*sem), PROT_READ|PROT_WRITE,
			MAP_SHARED | MAP_ANONYMOUS, 0, 0);
	if (sem == MAP_FAILED) {
		perror("mmap");
		return;
	}

	init_sem(sem);

	for (i = 0; i < 2*clients; i++) {
		if (pipe(pipes[i])) {
			perror("pipe");
			return;
		}
	}

	cpids = malloc(sizeof(pid_t) * clients);

	for (i = 0; i < clients; i++) {
		cpids[i] = fork();
		if (cpids[i]) {
			sem_down(sem);
			continue;
		}

		run_child(pipes[2*i], pipes[2*i+1], sem);
		exit(0);
	}

	run_parent(cpids);

	for (i = 0; i < clients; i++)
		kill(cpids[i], SIGQUIT);
	for (i = 0; i < clients; i++)
		waitpid(cpids[i], &status, 0);

	free(cpids);
	munmap(sem, sizeof(*sem));
}

static void handle_sigint(int sig)
{
	kill(app_pid, SIGINT);
}

int main(int argc, char *argv[])
{
	int app_offset, off;
	char app[256];

	off = 0;
	app_offset = parse_options(argc, argv);
	while (app_offset < argc) {
		if (off) {
			app[off] = ' ';
			off++;
		}
		off += sprintf(app + off, "%s", argv[app_offset]);
		app_offset++;
	}

	signal(SIGINT, handle_sigint);

	/*
	 * Start app and start logging latencies
	 */
	app_pid = fork_off(app);
	run_test();

	printf("Entries: %lu (clients=%d)\n", nr_stats(&delay_stats), clients);
	printf("\nAverages:\n");
	printf("------------------------------\n");
	printf("\tMax\t %8.0f usec\n", max_stats(&delay_stats));
	printf("\tAvg\t %8.0f usec\n", avg_stats(&delay_stats));
	printf("\tStdev\t %8.0f usec\n", stddev_stats(&delay_stats));

	return 0;
}

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-07 20:57       ` Ingo Molnar
  2009-09-07 23:24         ` Pekka Pietikainen
@ 2009-09-08 15:45         ` Michael Buesch
  1 sibling, 0 replies; 216+ messages in thread
From: Michael Buesch @ 2009-09-08 15:45 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Con Kolivas, linux-kernel, Peter Zijlstra, Mike Galbraith, Felix Fietkau

On Monday 07 September 2009 22:57:01 Ingo Molnar wrote:
> 
> * Michael Buesch <mb@bu3sch.de> wrote:
> 
> > On Monday 07 September 2009 20:26:29 Ingo Molnar wrote:
> > > Could you profile it please? Also, what's the context-switch rate?
> > 
> > As far as I can tell, the broadcom mips architecture does not have 
> > profiling support. It does only have some proprietary profiling 
> > registers that nobody wrote kernel support for, yet.
> 
> Well, what does 'vmstat 1' show - how many context switches are 
> there per second on the iperf server? In theory if it's a truly 
> saturated box, there shouldnt be many - just a single iperf task 
> running at 100% CPU utilization or so.
> 
> (Also, if there's hrtimer support for that board then perfcounters 
> could be used to profile it.)

CFS:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0      0  15892   1684   5868    0    0     0     0  268    6 31 69  0  0
 1  0      0  15892   1684   5868    0    0     0     0  266    2 34 66  0  0
 1  0      0  15892   1684   5868    0    0     0     0  266    6 33 67  0  0
 1  0      0  15892   1684   5868    0    0     0     0  267    4 37 63  0  0
 1  0      0  15892   1684   5868    0    0     0     0  267    6 34 66  0  0
[  4] local 192.168.1.1 port 5001 connected with 192.168.1.99 port 47278
 2  0      0  15756   1684   5868    0    0     0     0 1655   68 26 74  0  0
 2  0      0  15756   1684   5868    0    0     0     0 1945   88 20 80  0  0
 2  0      0  15756   1684   5868    0    0     0     0 1882   85 20 80  0  0
 2  0      0  15756   1684   5868    0    0     0     0 1923   86 18 82  0  0
 2  0      0  15756   1684   5868    0    0     0     0 1986   87 23 77  0  0
 2  0      0  15756   1684   5868    0    0     0     0 1923   87 17 83  0  0
 2  0      0  15756   1684   5868    0    0     0     0 1951   84 19 81  0  0
 2  0      0  15756   1684   5868    0    0     0     0 1970   87 18 82  0  0
 2  0      0  15756   1684   5868    0    0     0     0 1972   85 23 77  0  0
 2  0      0  15756   1684   5868    0    0     0     0 1961   87 18 82  0  0
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec  28.6 MBytes  23.9 Mbits/sec
 1  0      0  15752   1684   5868    0    0     0     0  599   22 22 78  0  0
 1  0      0  15752   1684   5868    0    0     0     0  269    4 32 68  0  0
 1  0      0  15752   1684   5868    0    0     0     0  266    4 29 71  0  0
 1  0      0  15764   1684   5868    0    0     0     0  267    6 37 63  0  0
 1  0      0  15764   1684   5868    0    0     0     0  267    4 31 69  0  0
 1  0      0  15768   1684   5868    0    0     0     0  266    4 51 49  0  0
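
(So roughly 85 context switches/sec while iperf runs, versus ~5 when
idle: not many, which fits the "truly saturated box" theory above.)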


I'm currently unable to test BFS, because the device throws strange flash errors.
Maybe the flash is broken :(

-- 
Greetings, Michael.

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: Epic regression in throughput since v2.6.23
  2009-09-08 12:57 ` Epic regression in throughput since v2.6.23 Serge Belyshev
@ 2009-09-08 17:47   ` Jesse Brandeburg
  2009-09-08 18:20     ` Nikos Chantziaras
  2009-09-08 19:00     ` Jeff Garzik
  2009-09-08 18:37   ` Nikos Chantziaras
  2009-09-08 22:15   ` Serge Belyshev
  2 siblings, 2 replies; 216+ messages in thread
From: Jesse Brandeburg @ 2009-09-08 17:47 UTC (permalink / raw)
  To: Serge Belyshev
  Cc: Ingo Molnar, Con Kolivas, linux-kernel, Peter Zijlstra, Mike Galbraith

On Tue, Sep 8, 2009 at 5:57 AM, Serge
Belyshev<belyshev@depni.sinp.msu.ru> wrote:
>
> Hi. I've done measurments of time taken by make -j4 kernel build
> on a quadcore box.  Results are interesting: mainline kernel
> has regressed since v2.6.23 release by more than 10%.

Is this related to why I now have to double the number of threads X I
pass to make -jX in order to use all my idle time for a kernel
compile?  I had noticed (without measuring exactly) that with each
kernel released in this series I had to increase my number of worker
threads; my common working model now is (cpus * 2) in order to get
zero idle time.

Sorry I haven't tested BFS yet, but am interested to see if it helps
interactivity when playing flash videos on my dual core laptop.

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-07 11:01   ` Frederic Weisbecker
@ 2009-09-08 18:15     ` Nikos Chantziaras
  2009-09-10 20:25       ` Frederic Weisbecker
  0 siblings, 1 reply; 216+ messages in thread
From: Nikos Chantziaras @ 2009-09-08 18:15 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: linux-kernel, Jens Axboe, Ingo Molnar, Con Kolivas

On 09/07/2009 02:01 PM, Frederic Weisbecker wrote:
> On Mon, Sep 07, 2009 at 06:38:36AM +0300, Nikos Chantziaras wrote:
>> Unfortunately, I can't come up with any way to somehow benchmark all of
>> this.  There's no benchmark for "fluidity" and "responsiveness". Running
>> the Doom 3 benchmark, or any other benchmark, doesn't say anything about
>> responsiveness, it only measures how many frames were calculated in a
>> specific period of time.  How "stable" (with no stalls) those frames were
>> making it to the screen is not measurable.
>
>
> That looks eventually benchmarkable. This is about latency.
> For example, you could try to run high load tasks in the
> background and then launch a task that wakes up in middle/large
> periods to do something. You could measure the time it takes to wake
> it up to perform what it wants.
>
> We have some events tracing infrastructure in the kernel that can
> snapshot the wake up and sched switch events.
>
> Having CONFIG_EVENT_TRACING=y should be sufficient for that.
>
> You just need to mount a debugfs point, say in /debug.
>
> Then you can activate these sched events by doing:
>
> echo 0 > /debug/tracing/tracing_on
> echo 1 > /debug/tracing/events/sched/sched_switch/enable
> echo 1 > /debug/tracing/events/sched/sched_wakeup/enable
>
> #Launch your tasks
>
> echo 1 > /debug/tracing/tracing_on
>
> #Wait for some time
>
> echo 0 > /debug/tracing/tracing_on
>
> That will require some parsing of the result in /debug/tracing/trace
> to get the delays between wake_up events and switch in events
> for the task that periodically wakes up and then produce some
> statistics such as the average or the maximum latency.
>
> That's a bit of a rough approach to measure such latencies but that
> should work.

I've tried this with 2.6.31-rc9 while running mplayer and alt+tabbing 
repeatedly to the point where mplayer starts to stall and drop frames. 
This produced a 4.1MB trace file (132k bzip2'ed):

     http://foss.math.aegean.gr/~realnc/kernel/trace1.bz2

Uncompressed for online viewing:

     http://foss.math.aegean.gr/~realnc/kernel/trace1

I must admit that I don't know what it is I'm looking at :P
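
(For reference, below is a rough sketch of how the wakeup -> switch-in
delays for one task could be pulled out of such a trace.  It is only a
sketch: the event field layout differs between kernel versions, so the
string matching may need adjusting, and the file name is made up.)

/*
 * wakelat.c -- rough sketch, not a polished tool.
 *
 * Reads ftrace output (e.g. /debug/tracing/trace) on stdin and reports
 * the delay between a sched_wakeup of the given comm and the next
 * sched_switch that switches that comm in.
 *
 * Build: gcc -Wall -O2 -o wakelat wakelat.c
 * Use:   ./wakelat mplayer < trace
 */
#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[])
{
	char line[1024];
	double wake_ts = -1.0, sum_us = 0.0, max_us = 0.0;
	unsigned long nr = 0;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <comm> < tracefile\n", argv[0]);
		return 1;
	}

	while (fgets(line, sizeof(line), stdin)) {
		char *p;
		double ts;

		if (line[0] == '#')		/* skip header lines */
			continue;

		/* the timestamp sits between the "[cpu]" field and a ':' */
		p = strchr(line, ']');
		if (!p || sscanf(p + 1, " %lf", &ts) != 1)
			continue;

		if (strstr(line, "sched_wakeup") && strstr(line, argv[1])) {
			/* remember the most recent wakeup involving our task */
			wake_ts = ts;
		} else if (wake_ts >= 0.0 && strstr(line, "sched_switch")) {
			char *next = strstr(line, "==>");

			/* only count switches *to* our task */
			if (next && strstr(next, argv[1])) {
				double us = (ts - wake_ts) * 1e6;

				sum_us += us;
				if (us > max_us)
					max_us = us;
				nr++;
				wake_ts = -1.0;
			}
		}
	}

	if (nr)
		printf("%lu wakeups, avg %.0f usec, max %.0f usec\n",
		       nr, sum_us / nr, max_us);
	return 0;
}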

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: Epic regression in throughput since v2.6.23
  2009-09-08 17:47   ` Jesse Brandeburg
@ 2009-09-08 18:20     ` Nikos Chantziaras
  2009-09-08 19:00     ` Jeff Garzik
  1 sibling, 0 replies; 216+ messages in thread
From: Nikos Chantziaras @ 2009-09-08 18:20 UTC (permalink / raw)
  To: Jesse Brandeburg
  Cc: Serge Belyshev, Ingo Molnar, Con Kolivas, linux-kernel,
	Peter Zijlstra, Mike Galbraith

On 09/08/2009 08:47 PM, Jesse Brandeburg wrote:
>[...]
> Sorry I haven't tested BFS yet, but am interested to see if it helps
> interactivity when playing flash videos on my dual core laptop.

Interactivity: yes (Flash will not result in the rest of the system 
lagging).

Flash videos: they will still play as badly as before.  BFS has no way to 
fix broken code inside Flash :P

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: Epic regression in throughput since v2.6.23
  2009-09-08 12:57 ` Epic regression in throughput since v2.6.23 Serge Belyshev
  2009-09-08 17:47   ` Jesse Brandeburg
@ 2009-09-08 18:37   ` Nikos Chantziaras
  2009-09-08 22:15   ` Serge Belyshev
  2 siblings, 0 replies; 216+ messages in thread
From: Nikos Chantziaras @ 2009-09-08 18:37 UTC (permalink / raw)
  To: Serge Belyshev
  Cc: Ingo Molnar, Con Kolivas, linux-kernel, Peter Zijlstra, Mike Galbraith

On 09/08/2009 03:57 PM, Serge Belyshev wrote:
>
> Hi. I've done measurments of time taken by make -j4 kernel build
> on a quadcore box.  Results are interesting: mainline kernel
> has regressed since v2.6.23 release by more than 10%.

It seems more people are starting to confirm this issue:

   http://foldingforum.org/viewtopic.php?f=44&t=11336

IMHO it's not *as* dramatic as some people there describe it ("Is it 
the holy grail?"), but if something makes your desktop "smooth as silk" 
just like that, it might seem like a holy grail ;)  In any case, there 
clearly seems to be a performance problem with the mainline scheduler on 
many people's desktops that is being solved by BFS.

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: Epic regression in throughput since v2.6.23
  2009-09-08 17:47   ` Jesse Brandeburg
  2009-09-08 18:20     ` Nikos Chantziaras
@ 2009-09-08 19:00     ` Jeff Garzik
  2009-09-08 19:20       ` Serge Belyshev
  1 sibling, 1 reply; 216+ messages in thread
From: Jeff Garzik @ 2009-09-08 19:00 UTC (permalink / raw)
  To: Jesse Brandeburg
  Cc: Serge Belyshev, Ingo Molnar, Con Kolivas, linux-kernel,
	Peter Zijlstra, Mike Galbraith

On 09/08/2009 01:47 PM, Jesse Brandeburg wrote:
> On Tue, Sep 8, 2009 at 5:57 AM, Serge
> Belyshev<belyshev@depni.sinp.msu.ru>  wrote:
>>
>> Hi. I've done measurments of time taken by make -j4 kernel build
>> on a quadcore box.  Results are interesting: mainline kernel
>> has regressed since v2.6.23 release by more than 10%.
>
> Is this related to why I now have to double the amount of threads X I
> pass to make -jX, in order to use all my idle time for a kernel
> compile?  I had noticed (without measuring exactly) that it seems with
> each kernel released in this series mentioned, I had to increase my
> number of worker threads, my common working model now is (cpus * 2) in
> order to get zero idle time.

You will almost certainly see idle CPUs/threads with "make -jN_CPUS" due 
to processes waiting for I/O.

If you're curious, there is also room for experimenting with make's "-l" 
argument, which caps the number of jobs based on load average rather 
than a static number of job slots.
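
(For example, "make -j16 -l8" on an 8-thread box allows up to 16 jobs
but stops spawning new ones while the load average is at or above 8.)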

	Jeff



^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-08 11:35                   ` Ingo Molnar
@ 2009-09-08 19:06                     ` Nikos Chantziaras
  0 siblings, 0 replies; 216+ messages in thread
From: Nikos Chantziaras @ 2009-09-08 19:06 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pekka Pietikainen, Michael Buesch, Con Kolivas, linux-kernel,
	Peter Zijlstra, Mike Galbraith, Felix Fietkau

On 09/08/2009 02:35 PM, Ingo Molnar wrote:
>
> * Nikos Chantziaras<realnc@arcor.de>  wrote:
>
>> [...] That would mean that you (or anyone else with an interest of
>> tracking this down) would follow the examples given (by me and
>> others, like enabling desktop compositing, firing up mplayer with
>> a video and generally reproducing this using the quite detailed
>> steps I posted as a recipe).
>
> Could you follow up on Frederic's detailed tracing suggestions that
> would give us the source of the latency?

I've set it up and ran the tests now.


> ( Also, as per lkml etiquette, please try to keep the Cc: list
>    intact when replying to emails. I missed your first reply
>    that you un-Cc:-ed. )

Sorry for that.


> A quick look at the latencytop output suggests a scheduling latency.
> Could you send me the kernel .config that you are using?

That would be this one:

     http://foss.math.aegean.gr/~realnc/kernel/config-2.6.31-rc9

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: Epic regression in throughput since v2.6.23
  2009-09-08 19:00     ` Jeff Garzik
@ 2009-09-08 19:20       ` Serge Belyshev
  2009-09-08 19:26         ` Jeff Garzik
  0 siblings, 1 reply; 216+ messages in thread
From: Serge Belyshev @ 2009-09-08 19:20 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Jesse Brandeburg, Ingo Molnar, Con Kolivas, linux-kernel,
	Peter Zijlstra, Mike Galbraith

Jeff Garzik <jeff@garzik.org> writes:

> You will almost certainly see idle CPUs/threads with "make -jN_CPUS"
> due to processes waiting for I/O.

Just to clarify: I have excluded all I/O effects in my plots by building
entirely from tmpfs. Also, before each actual measurement there was a
discarded "pre-caching" run.  And my box has 8GB of RAM.

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: Epic regression in throughput since v2.6.23
  2009-09-08 19:20       ` Serge Belyshev
@ 2009-09-08 19:26         ` Jeff Garzik
  0 siblings, 0 replies; 216+ messages in thread
From: Jeff Garzik @ 2009-09-08 19:26 UTC (permalink / raw)
  To: Serge Belyshev
  Cc: Jesse Brandeburg, Ingo Molnar, Con Kolivas, linux-kernel,
	Peter Zijlstra, Mike Galbraith

On 09/08/2009 03:20 PM, Serge Belyshev wrote:
> Jeff Garzik<jeff@garzik.org>  writes:
>
>> You will almost certainly see idle CPUs/threads with "make -jN_CPUS"
>> due to processes waiting for I/O.
>
> Just to clarify: I have excluded all I/O effects in my plots completely
> by building completely from tmpfs. Also, before each actual measurment
> there was a thrown-off "pre-caching" one.  And my box has 8GB of RAM.

You could always one-up that by using ramfs ;)

	Jeff




^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-08  8:31       ` Arjan van de Ven
@ 2009-09-08 20:22         ` Frans Pop
  2009-09-08 21:10           ` Michal Schmidt
                             ` (2 more replies)
  0 siblings, 3 replies; 216+ messages in thread
From: Frans Pop @ 2009-09-08 20:22 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: realnc, linux-kernel

Arjan van de Ven wrote:
> the latest version of latencytop also has a GUI (thanks to Ben)

That looks nice, but...

I kind of miss the split-screen feature where latencytop would show both 
the overall figures + the ones for the currently most affected task. 
The downside of that last view was that I never managed to keep the 
display on a specific task.

The graphical display also makes it impossible to simply copy and paste 
the results.

Having the freeze button is nice though.

Would it be possible to have a command line switch that allows to start 
the old textual mode?

Looks like the man page needs updating too :-)

Cheers,
FJP

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-08 15:23             ` Peter Zijlstra
@ 2009-09-08 20:34               ` Jens Axboe
  2009-09-09  6:13                 ` Ingo Molnar
  2009-09-09 11:52               ` BFS vs. mainline scheduler benchmarks and measurements Nikos Chantziaras
  1 sibling, 1 reply; 216+ messages in thread
From: Jens Axboe @ 2009-09-08 20:34 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, Con Kolivas, linux-kernel, Mike Galbraith

On Tue, Sep 08 2009, Peter Zijlstra wrote:
> On Tue, 2009-09-08 at 11:13 +0200, Jens Axboe wrote:
> > And here's a newer version.
> 
> I tinkered a bit with your proglet and finally found the problem.
> 
> You used a single pipe per child, this means the loop in run_child()
> would consume what it just wrote out until it got force preempted by the
> parent which would also get woken.
> 
> This results in the child spinning a while (its full quota) and only
> reporting the last timestamp to the parent.

Oh doh, that's not well thought out. Well it was a quick hack :-)
Thanks for the fixup, now it's at least usable to some degree.

> Since consumer (parent) is a single thread the program basically
> measures the worst delay in a thundering herd wakeup of N children.

Yes, it's really meant to measure how long it takes to wake a group of
processes, assuming that this is where things fall down on the 'box
loaded, switch desktop' case. Now whether that's useful or not or
whether this test app is worth the bits it takes up on the hard drive,
is another question.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-08 20:22         ` Frans Pop
@ 2009-09-08 21:10           ` Michal Schmidt
  2009-09-08 21:11           ` Frans Pop
  2009-09-09  9:53           ` Benjamin Herrenschmidt
  2 siblings, 0 replies; 216+ messages in thread
From: Michal Schmidt @ 2009-09-08 21:10 UTC (permalink / raw)
  To: Frans Pop; +Cc: Arjan van de Ven, realnc, linux-kernel

On Tue, 8 Sep 2009 22:22:43 +0200
Frans Pop <elendil@planet.nl> wrote:
> Would it be possible to have a command line switch that allows to
> start the old textual mode?

I use:
DISPLAY= latencytop

:-)
Michal

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-08 20:22         ` Frans Pop
  2009-09-08 21:10           ` Michal Schmidt
@ 2009-09-08 21:11           ` Frans Pop
  2009-09-08 21:40             ` GeunSik Lim
  2009-09-09  9:53           ` Benjamin Herrenschmidt
  2 siblings, 1 reply; 216+ messages in thread
From: Frans Pop @ 2009-09-08 21:11 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: realnc, linux-kernel

On Tuesday 08 September 2009, Frans Pop wrote:
> Arjan van de Ven wrote:
> > the latest version of latencytop also has a GUI (thanks to Ben)
>
> That looks nice, but...
>
> I kind of miss the split screen feature where latencytop would show
> both the overall figures + the ones for the currently most affected
> task. Downside of that last was that I never managed to keep the
> display on a specific task.
[...]
> Would it be possible to have a command line switch that allows to start
> the old textual mode?

I got a private reply suggesting that --nogui might work, and it does.
Thanks a lot Nikos!

> Looks like the man page needs updating too :-)

So this definitely needs attention :-P
Support of the standard -h and --help options would be great too.

Cheers,
FJP

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-08 12:03           ` Theodore Tso
@ 2009-09-08 21:28             ` Nikos Chantziaras
  0 siblings, 0 replies; 216+ messages in thread
From: Nikos Chantziaras @ 2009-09-08 21:28 UTC (permalink / raw)
  To: Theodore Tso, Arjan van de Ven, linux-kernel

On 09/08/2009 03:03 PM, Theodore Tso wrote:
> On Tue, Sep 08, 2009 at 01:13:34PM +0300, Nikos Chantziaras wrote:
>>> despite the untranslated content, it is clear that you have scheduler
>>> delays (either due to scheduler bugs or cpu contention) of upto 68
>>> msecs... Second in line is your binary AMD graphics driver that is
>>> chewing up 14% of your total latency...
>>
>> I've now used a correctly installed and up-to-date version of latencytop
>> and repeated the test.  Also, I got rid of AMD's binary blob and used
>> kernel DRM drivers for my graphics card to throw fglrx out of the
>> equation (which btw didn't help; the exact same problems occur).
>>
>> Here the result:
>>
>>      http://foss.math.aegean.gr/~realnc/pics/latop2.png
>
> This was with an unmodified 2.6.31-rcX kernel?

Yes (-rc9).  I also tested with 2.6.30.5 and got the same results.


> Does Latencytop do anything useful on a BFS-patched kernel?

Nope.  BFS does not support any form of tracing yet.  latencytop runs 
but only shows a blank list.  All I can say is that a BFS patched kernel 
with the same .config fixes all visible latency issues.

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-08 21:11           ` Frans Pop
@ 2009-09-08 21:40             ` GeunSik Lim
  2009-09-08 22:36               ` Frans Pop
  0 siblings, 1 reply; 216+ messages in thread
From: GeunSik Lim @ 2009-09-08 21:40 UTC (permalink / raw)
  To: Frans Pop; +Cc: Arjan van de Ven, realnc, linux-kernel

On Wed, Sep 9, 2009 at 6:11 AM, Frans Pop<elendil@planet.nl> wrote:
>> Would it be possible to have a command line switch that allows to start
>> the old textual mode?
> I got a private reply suggesting that --nogui might work, and it does.
Um, you mean that you tested with runlevel 3 (multiuser mode), is that right?
Frans, can you share which Linux distribution you used for this test?
I want to check under the same conditions (e.g. distribution such as
Fedora 11 or Ubuntu 9.04, runlevel, and so on).
> Thanks a lot Nikos!
>> Looks like the man page needs updating too :-)
> So this definitely needs attention :-P
> Support of the standard -h and --help options would be great too.
> Cheers,
> FJP
> --

Thanks,
GeunSik Lim.



-- 
Regards,
GeunSik Lim ( Samsung Electronics )
Blog : http://blog.naver.com/invain/
e-Mail: geunsik.lim@samsung.com
           leemgs@gmail.com , leemgs1@gmail.com

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-08 11:32           ` Juergen Beisert
@ 2009-09-08 22:00             ` Nikos Chantziaras
  2009-09-08 23:20               ` Jiri Kosina
  0 siblings, 1 reply; 216+ messages in thread
From: Nikos Chantziaras @ 2009-09-08 22:00 UTC (permalink / raw)
  To: Juergen Beisert; +Cc: linux-kernel, Arjan van de Ven

On 09/08/2009 02:32 PM, Juergen Beisert wrote:
> On Dienstag, 8. September 2009, Nikos Chantziaras wrote:
>> On 09/08/2009 11:38 AM, Arjan van de Ven wrote:
>>> On Tue, 08 Sep 2009 10:19:06 +0300
>>>
>>> Nikos Chantziaras<realnc@arcor.de>   wrote:
>>>> latencytop has this to say:
>>>>
>>>>      http://foss.math.aegean.gr/~realnc/pics/latop1.png
>>>>
>>>> Though I don't really understand what this tool is trying to tell me,
>>>> I hope someone does.
>>>
>>> despite the untranslated content, it is clear that you have scheduler
>>> delays (either due to scheduler bugs or cpu contention) of upto 68
>>> msecs... Second in line is your binary AMD graphics driver that is
>>> chewing up 14% of your total latency...
>>
>> I've now used a correctly installed and up-to-date version of latencytop
>> and repeated the test.  Also, I got rid of AMD's binary blob and used
>> kernel DRM drivers for my graphics card to throw fglrx out of the
>> equation (which btw didn't help; the exact same problems occur).
>>
>> Here the result:
>>
>>       http://foss.math.aegean.gr/~realnc/pics/latop2.png
>>
>> Again: this is on an Intel Core 2 Duo CPU.
>
> Just an idea: Maybe some system management code hits you?

I'm not sure what is meant by "system management code."

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: Epic regression in throughput since v2.6.23
  2009-09-08 12:57 ` Epic regression in throughput since v2.6.23 Serge Belyshev
  2009-09-08 17:47   ` Jesse Brandeburg
  2009-09-08 18:37   ` Nikos Chantziaras
@ 2009-09-08 22:15   ` Serge Belyshev
  2009-09-09 15:52     ` Ingo Molnar
  2 siblings, 1 reply; 216+ messages in thread
From: Serge Belyshev @ 2009-09-08 22:15 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Con Kolivas, linux-kernel, Peter Zijlstra, Mike Galbraith

Serge Belyshev <belyshev@depni.sinp.msu.ru> writes:
>[snip]

I've updated the graphs, added kernels 2.6.24..2.6.29:
http://img186.imageshack.us/img186/7029/epicmakej4.png

And added comparison with best-performing 2.6.23 kernel:
http://img34.imageshack.us/img34/7563/epicbfstips.png

>
> Conclusions are
> 1) mainline has severely regressed since v2.6.23
> 2) BFS shows optimal performance at make -jN where N equals number of
>    h/w threads, while current mainline scheduler performance is far from
>    optimal in this case.

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-08 21:40             ` GeunSik Lim
@ 2009-09-08 22:36               ` Frans Pop
  0 siblings, 0 replies; 216+ messages in thread
From: Frans Pop @ 2009-09-08 22:36 UTC (permalink / raw)
  To: GeunSik Lim; +Cc: Arjan van de Ven, realnc, linux-kernel

On Tuesday 08 September 2009, you wrote:
> On Wed, Sep 9, 2009 at 6:11 AM, Frans Pop<elendil@planet.nl> wrote:
> >> Would it be possible to have a command line switch that allows to
> >> start the old textual mode?
> >
> > I got a private reply suggesting that --nogui might work, and it
> > does.
>
> Um, you mean that you tested with runlevel 3 (multiuser mode), is that
> right? Frans, can you share which Linux distribution you used for this
> test? I want to check under the same conditions (e.g. distribution such
> as Fedora 11 or Ubuntu 9.04, runlevel, and so on).

I ran it from KDE's konsole by just entering 'sudo latencytop --nogui' at 
the command prompt.

Distro is Debian stable ("Lenny"), which does not have differences between 
runlevels: by default they all start a desktop environment (if a display 
manager like xdm/kdm/gdm is installed). But if you really want to know, 
the runlevel was 2 ;-)

Cheers,
FJP

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-08 14:20           ` Arjan van de Ven
@ 2009-09-08 22:53             ` Nikos Chantziaras
  0 siblings, 0 replies; 216+ messages in thread
From: Nikos Chantziaras @ 2009-09-08 22:53 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: linux-kernel, mingo

On 09/08/2009 05:20 PM, Arjan van de Ven wrote:
> On Tue, 08 Sep 2009 13:13:34 +0300
> Nikos Chantziaras<realnc@arcor.de>  wrote:
>
>> On 09/08/2009 11:38 AM, Arjan van de Ven wrote:
>>> On Tue, 08 Sep 2009 10:19:06 +0300
>>> Nikos Chantziaras<realnc@arcor.de>   wrote:
>>>
>>>> latencytop has this to say:
>>>>
>>>>      http://foss.math.aegean.gr/~realnc/pics/latop1.png
>>>>
>>>> Though I don't really understand what this tool is trying to tell
>>>> me, I hope someone does.
>>>
>>> despite the untranslated content, it is clear that you have
>>> scheduler delays (either due to scheduler bugs or cpu contention)
>>> of upto 68 msecs... Second in line is your binary AMD graphics
>>> driver that is chewing up 14% of your total latency...
>>
>> I've now used a correctly installed and up-to-date version of
>> latencytop and repeated the test.  Also, I got rid of AMD's binary
>> blob and used kernel DRM drivers for my graphics card to throw fglrx
>> out of the equation (which btw didn't help; the exact same problems
>> occur).
>>
>> Here the result:
>>
>>       http://foss.math.aegean.gr/~realnc/pics/latop2.png
>>
>> Again: this is on an Intel Core 2 Duo CPU.
>
>
> so we finally have objective numbers!
>
> now the interesting part is also WHERE the latency hits. Because
> fundamentally, if you oversubscribe the CPU, you WILL get scheduling
> latency.. simply you have more to run than there is CPU.

Sounds plausible.  However, with mainline this latency is very, very 
noticeable.  With BFS I need to look really hard to detect it or do 
outright silly things, like a "make -j50".  (At first I wrote "-j20" 
here but then went ahead and tested it just for kicks, and BFS would 
still let me use the GUI smoothly, LOL.  So then I corrected it to 
"-j50"...)

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-08 22:00             ` Nikos Chantziaras
@ 2009-09-08 23:20               ` Jiri Kosina
  2009-09-08 23:38                 ` Nikos Chantziaras
  0 siblings, 1 reply; 216+ messages in thread
From: Jiri Kosina @ 2009-09-08 23:20 UTC (permalink / raw)
  To: Nikos Chantziaras; +Cc: Juergen Beisert, linux-kernel, Arjan van de Ven

On Wed, 9 Sep 2009, Nikos Chantziaras wrote:

> > > Here the result:
> > > 
> > >       http://foss.math.aegean.gr/~realnc/pics/latop2.png
> > > 
> > > Again: this is on an Intel Core 2 Duo CPU.
> > 
> > Just an idea: Maybe some system management code hits you?
> 
> I'm not sure what is meant with "system management code."

A system management interrupt happens when firmware/BIOS/HW-debugger code 
is executed at a privilege level so high that even the OS can't do 
anything about it.

It is used in many situations, such as

- memory errors
- ACPI (mostly fan control)
- TPM

The OS has little to no ability to influence SMI/SMM. But if this were 
the cause, you should probably obtain completely different results on a 
different hardware configuration (as it is likely to have completely 
different SMM behavior).

-- 
Jiri Kosina
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-08 23:20               ` Jiri Kosina
@ 2009-09-08 23:38                 ` Nikos Chantziaras
  0 siblings, 0 replies; 216+ messages in thread
From: Nikos Chantziaras @ 2009-09-08 23:38 UTC (permalink / raw)
  To: Jiri Kosina; +Cc: Juergen Beisert, linux-kernel, Arjan van de Ven

On 09/09/2009 02:20 AM, Jiri Kosina wrote:
> On Wed, 9 Sep 2009, Nikos Chantziaras wrote:
>
>>>> Here the result:
>>>>
>>>>        http://foss.math.aegean.gr/~realnc/pics/latop2.png
>>>>
>>>> Again: this is on an Intel Core 2 Duo CPU.
>>>
>>> Just an idea: Maybe some system management code hits you?
>>
>> I'm not sure what is meant with "system management code."
>
> A system management interrupt happens when firmware/BIOS/HW-debugger code
> is executed at a privilege level so high that even the OS can't do
> anything about it.
>
> It is used in many situations, such as
>
> - memory errors
> - ACPI (mostly fan control)
> - TPM
>
> The OS has little to no ability to influence SMI/SMM. But if this were
> the cause, you should probably obtain completely different results on a
> different hardware configuration (as it is likely to have completely
> different SMM behavior).

Wouldn't that mean that a BFS-patched kernel would suffer from this too?

In any case, of the above, only fan control is active, and I've run with 
it disabled on occasion (on hot summer days I wanted to just keep the fan 
at max with no fan control) with no change.  As far as I can tell, the Asus P5E 
doesn't have a TPM (the "Deluxe" and "VM" models seem to have one.)  As 
for memory errors, I use unbuffered non-ECC RAM which passes a 
memtest86+ cycle cleanly (well, at least the last time I ran it through 
one, a few months ago.)

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-08 13:09         ` Felix Fietkau
@ 2009-09-09  0:28           ` Benjamin Herrenschmidt
  2009-09-09  0:37             ` David Miller
  0 siblings, 1 reply; 216+ messages in thread
From: Benjamin Herrenschmidt @ 2009-09-09  0:28 UTC (permalink / raw)
  To: Felix Fietkau
  Cc: Ingo Molnar, Michael Buesch, Con Kolivas, linux-kernel,
	Peter Zijlstra, Mike Galbraith

> The TLB is SW loaded, yes. However it should not do any misses on kernel
> space, since the whole segment is in a wired TLB entry.

Including vmalloc space ?

Ben.



^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-09  0:28           ` Benjamin Herrenschmidt
@ 2009-09-09  0:37             ` David Miller
  0 siblings, 0 replies; 216+ messages in thread
From: David Miller @ 2009-09-09  0:37 UTC (permalink / raw)
  To: benh; +Cc: nbd, mingo, mb, kernel, linux-kernel, a.p.zijlstra, efault

From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Date: Wed, 09 Sep 2009 10:28:22 +1000

>> The TLB is SW loaded, yes. However it should not do any misses on kernel
>> space, since the whole segment is in a wired TLB entry.
> 
> Including vmalloc space ?

No, MIPS does take SW tlb misses on vmalloc space. :-)


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-08 13:09         ` Ralf Baechle
@ 2009-09-09  1:36           ` Felix Fietkau
  0 siblings, 0 replies; 216+ messages in thread
From: Felix Fietkau @ 2009-09-09  1:36 UTC (permalink / raw)
  To: Ralf Baechle
  Cc: Benjamin Herrenschmidt, Ingo Molnar, Michael Buesch, Con Kolivas,
	linux-kernel, Peter Zijlstra, Mike Galbraith

Ralf Baechle wrote:
>> I remember at some stage we spotted an expensive multiply in there,
>> maybe there's something similar, or some unaligned or non-cache friendly
>> vs. the MIPS cache line size data structure, that sort of thing ...
>> 
>> Is this a SW loaded TLB ? Does it misses on kernel space ? That could
>> also be some differences in how many pages are touched by each scheduler
>> causing more TLB pressure. This will be mostly invisible on x86.
> 
> Software refilled.  No misses ever for kernel space or low-mem; think of
> it as low-mem and kernel executable living in a 512MB page that is mapped
> by a mechanism outside the TLB.  Vmalloc ranges are TLB mapped.  Ioremap
> address ranges only if above physical address 512MB.
> 
> An emulated unaligned load/store is very expensive; one that is encoded
> properly by GCC for __attribute__((packed)) is only 1 cycle and 1
> instruction ( = 4 bytes) extra.
CFS definitely isn't causing any emulated unaligned load/stores on these
devices; we've tested that.

- Felix

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: [quad core results] BFS vs. mainline scheduler benchmarks and measurements
  2009-09-07 13:59       ` Ingo Molnar
@ 2009-09-09  5:54         ` Markus Tornqvist
  0 siblings, 0 replies; 216+ messages in thread
From: Markus Tornqvist @ 2009-09-09  5:54 UTC (permalink / raw)
  To: linux-kernel

Let's test gmane's followup feature ;)

Ingo Molnar <mingo <at> elte.hu> writes:

> > Please Cc me as I'm not a subscriber.
> > >  kernel build performance on quad:
> > >     http://redhat.com/~mingo/misc/bfs-vs-tip-kbuild-quad.jpg
> > [...]
> > >
> > >It shows similar curves and behavior to the 8-core results i posted 
> > >- BFS is slower than mainline in virtually every measurement. The 
> > >ratios are different for different parts of the graphs - but the 
> > >trend is similar.
> > 
> > Dude, not cool.
> > 
> > 1. Quad HT is not the same as a 4-core desktop, you're doing it with 8 cores
> 
> No, it's 4 cores. HyperThreading adds two 'siblings' per core, which 
> are not 'cores'.

Like Serge Belyshev says in
http://article.gmane.org/gmane.linux.kernel/886881
and Con thanks you in the FAQ for confirming it:
"h/w threads" - My Sempron II X4 lists four of those, and it seems
to be a common setup.
 
> > 2. You just proved BFS is better on the job_count == core_count case, as BFS
> >    says it is, if you look at the graph
> 
> I pointed that out too. I think the graphs speak for themselves:
> 
>      http://redhat.com/~mingo/misc/bfs-vs-tip-kbuild-quad.jpg
>      http://redhat.com/~mingo/misc/bfs-vs-tip-kbuild.jpg

Those are in alignment with the FAQ, for the hardware threads.

Mr Belyshev's benchmarks are closer to a common desktop and they rock
over CFS.

That's also something that IMO "we" forgot here: it doesn't really matter!

BFS is not up for merging; it feels way better than CFS on the desktop,
and it does not scale.

This thread can be about improving CFS, I do not care personally, and
will stay out of that discussion.

> There's bfs-209 out there today. These tests take 8+ hours to 
> complete and validate. I'll re-test BFS in the future too, and as i 
> said it in the first mail i'll test it on a .31 base as well once 
> BFS has been ported to it:

Apropos your tests, under which circumstances would I have a million
piped messages on my desktop?

Would you care to comment on the relevance of your other tests from
a desktop point of view?

Fortunately you got help from the community as posted on the list.

> > Also, you said on http://article.gmane.org/gmane.linux.kernel/886319
> > "I also tried to configure the kernel in a BFS friendly way, i used 
> > HZ=1000 as recommended, turned off all debug options, etc. The 
> > kernel config i used can be found here:
> > http://redhat.com/~mingo/misc/config
> > "
> > 
> > Quickly looking at the conf you have
> > CONFIG_HZ_250=y
> > CONFIG_PREEMPT_NONE=y
> > # CONFIG_PREEMPT_VOLUNTARY is not set
> > # CONFIG_PREEMPT is not set
> 
> Indeed. HZ does not seem to matter according to what i see in my 
> measurements. Can you measure such sensitivity?

Hardly the point - you said one thing and got caught with something else,
which doesn't give a credible impression.

Can I measure it? IANAKH, and I think there are people here more passionate
about running benchmark scripts and endless analyses.

All I can "measure" is that my desktop experience isn't stuttery and jittery
with basic stuff like scrolling over Firefox tabs with my mouse wheel
while watching pr0n.
 
> > CONFIG_ARCH_WANT_FRAME_POINTERS=y
> > CONFIG_FRAME_POINTER=y
> > 
> > And other DEBUG.
> 
> These are the defaults and they dont make a measurable difference to 
> these results. What other debug options do you mean and do they make 
> a difference?

Don't care as long as your kernel comparisons truly were with equivalent
settings to each other.

Köszönöm. (Thank you.)

-- 
mjt



^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-08 20:34               ` Jens Axboe
@ 2009-09-09  6:13                 ` Ingo Molnar
  2009-09-09  8:34                   ` Nikos Chantziaras
  2009-09-09  8:52                   ` Mike Galbraith
  0 siblings, 2 replies; 216+ messages in thread
From: Ingo Molnar @ 2009-09-09  6:13 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Peter Zijlstra, Con Kolivas, linux-kernel, Mike Galbraith


* Jens Axboe <jens.axboe@oracle.com> wrote:

> On Tue, Sep 08 2009, Peter Zijlstra wrote:
> > On Tue, 2009-09-08 at 11:13 +0200, Jens Axboe wrote:
> > > And here's a newer version.
> > 
> > I tinkered a bit with your proglet and finally found the 
> > problem.
> > 
> > You used a single pipe per child, this means the loop in 
> > run_child() would consume what it just wrote out until it got 
> > force preempted by the parent which would also get woken.
> > 
> > This results in the child spinning a while (its full quota) and 
> > only reporting the last timestamp to the parent.
> 
> Oh doh, that's not well thought out. Well it was a quick hack :-) 
> Thanks for the fixup, now it's at least usable to some degree.

What kind of latencies does it report on your box?

Our vanilla scheduler default latency targets are:

  single-core: 20 msecs
    dual-core: 40 msecs
    quad-core: 60 msecs
    opto-core: 80 msecs

You can enable CONFIG_SCHED_DEBUG=y and set it directly as well via 
/proc/sys/kernel/sched_latency_ns:

   echo 10000000 > /proc/sys/kernel/sched_latency_ns

	Ingo

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-07  9:49 ` Jens Axboe
                     ` (2 preceding siblings ...)
  2009-09-07 18:02   ` Avi Kivity
@ 2009-09-09  7:38   ` Pavel Machek
  2009-09-10 12:19     ` latt location (Was Re: BFS vs. mainline scheduler benchmarks and measurements) Jens Axboe
  3 siblings, 1 reply; 216+ messages in thread
From: Pavel Machek @ 2009-09-09  7:38 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Ingo Molnar, Con Kolivas, linux-kernel, Peter Zijlstra, Mike Galbraith

Hi!

> > So ... to get to the numbers - i've tested both BFS and the tip of 
> > the latest upstream scheduler tree on a testbox of mine. I 
> > intentionally didnt test BFS on any really large box - because you 
> > described its upper limit like this in the announcement:
> 
> I ran a simple test as well, since I was curious to see how it performed
> wrt interactiveness. One of my pet peeves with the current scheduler is
> that I have to nice compile jobs, or my X experience is just awful while
> the compile is running.
> 
> Now, this test case is something that attempts to see what
> interactiveness would be like. It'll run a given command line while at
> the same time logging delays. The delays are measured as follows:
> 
> - The app creates a pipe, and forks a child that blocks on reading from
>   that pipe.
> - The app sleeps for a random period of time, anywhere between 100ms
>   and 2s. When it wakes up, it gets the current time and writes that to
>   the pipe.
> - The child then gets woken, checks the time on its own, and logs the
>   difference between the two.
> 
> The idea here being that the delay between writing to the pipe and the
> child reading the data and comparing should (in some way) be indicative
> of how responsive the system would seem to a user.
> 
> The test app was quickly hacked up, so don't put too much into it. The
> test run is a simple kernel compile, using -jX where X is the number of
> threads in the system. The files are cache hot, so little IO is done.
> The -x2 run is using the double number of processes as we have threads,
> eg -j128 on a 64 thread box.

Could you post the source? Someone else might get us
numbers... preferably on a dualcore box or something...
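
In the meantime, here is a rough sketch of what the description above
boils down to -- parent sleeps a random 100ms-2s, stamps the time into a
pipe, child logs the wakeup delay.  Purely illustrative, not Jens'
actual latt code, and the helper names are made up:

/* Purely illustrative sketch -- not Jens' latt tool.  The parent
 * sleeps a random 100ms-2s, writes a timestamp into a pipe, and the
 * child logs how long it took between that write and the child
 * actually getting to run. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>

static unsigned long long now_usec(void)
{
	struct timeval tv;

	gettimeofday(&tv, NULL);
	return tv.tv_sec * 1000000ULL + tv.tv_usec;
}

int main(void)
{
	unsigned long long stamp;
	int pfd[2], i;
	pid_t pid;

	if (pipe(pfd))
		return 1;

	pid = fork();
	if (!pid) {
		close(pfd[1]);	/* or read() below never sees EOF */
		while (read(pfd[0], &stamp, sizeof(stamp)) == sizeof(stamp))
			printf("wakeup delay: %llu usec\n", now_usec() - stamp);
		return 0;
	}

	close(pfd[0]);
	srand(time(NULL));
	for (i = 0; i < 10; i++) {
		struct timespec ts;
		long us = 100000 + rand() % 1900000;	/* 100ms - 2s */

		ts.tv_sec = us / 1000000;
		ts.tv_nsec = (us % 1000000) * 1000;
		nanosleep(&ts, NULL);

		stamp = now_usec();
		write(pfd[1], &stamp, sizeof(stamp));
	}
	close(pfd[1]);
	waitpid(pid, NULL, 0);
	return 0;
}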
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-09  6:13                 ` Ingo Molnar
@ 2009-09-09  8:34                   ` Nikos Chantziaras
  2009-09-09  8:52                   ` Mike Galbraith
  1 sibling, 0 replies; 216+ messages in thread
From: Nikos Chantziaras @ 2009-09-09  8:34 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jens Axboe, Peter Zijlstra, Con Kolivas, linux-kernel, Mike Galbraith

On 09/09/2009 09:13 AM, Ingo Molnar wrote:
>
> * Jens Axboe<jens.axboe@oracle.com>  wrote:
>
>> On Tue, Sep 08 2009, Peter Zijlstra wrote:
>>> On Tue, 2009-09-08 at 11:13 +0200, Jens Axboe wrote:
>>>> And here's a newer version.
>>>
>>> I tinkered a bit with your proglet and finally found the
>>> problem.
>>>
>>> You used a single pipe per child, this means the loop in
>>> run_child() would consume what it just wrote out until it got
>>> force preempted by the parent which would also get woken.
>>>
>>> This results in the child spinning a while (its full quota) and
>>> only reporting the last timestamp to the parent.
>>
>> Oh doh, that's not well thought out. Well it was a quick hack :-)
>> Thanks for the fixup, now it's at least usable to some degree.
>
> What kind of latencies does it report on your box?
>
> Our vanilla scheduler default latency targets are:
>
>    single-core: 20 msecs
>      dual-core: 40 msecs
>      quad-core: 60 msecs
>      opto-core: 80 msecs
>
> You can enable CONFIG_SCHED_DEBUG=y and set it directly as well via
> /proc/sys/kernel/sched_latency_ns:
>
>     echo 10000000>  /proc/sys/kernel/sched_latency_ns

I've tried values ranging from 10000000 down to 100000.  This results in 
the stalls/freezes being a bit shorter, but they are clearly still there.  
It does not eliminate them.

If there's anything else I can try/test, I would be happy to do so.

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-09  6:13                 ` Ingo Molnar
  2009-09-09  8:34                   ` Nikos Chantziaras
@ 2009-09-09  8:52                   ` Mike Galbraith
  2009-09-09  9:02                     ` Peter Zijlstra
                                       ` (5 more replies)
  1 sibling, 6 replies; 216+ messages in thread
From: Mike Galbraith @ 2009-09-09  8:52 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Jens Axboe, Peter Zijlstra, Con Kolivas, linux-kernel

On Wed, 2009-09-09 at 08:13 +0200, Ingo Molnar wrote:
> * Jens Axboe <jens.axboe@oracle.com> wrote:
> 
> > On Tue, Sep 08 2009, Peter Zijlstra wrote:
> > > On Tue, 2009-09-08 at 11:13 +0200, Jens Axboe wrote:
> > > > And here's a newer version.
> > > 
> > > I tinkered a bit with your proglet and finally found the 
> > > problem.
> > > 
> > > You used a single pipe per child, this means the loop in 
> > > run_child() would consume what it just wrote out until it got 
> > > force preempted by the parent which would also get woken.
> > > 
> > > This results in the child spinning a while (its full quota) and 
> > > only reporting the last timestamp to the parent.
> > 
> > Oh doh, that's not well thought out. Well it was a quick hack :-) 
> > Thanks for the fixup, now it's at least usable to some degree.
> 
> What kind of latencies does it report on your box?
> 
> Our vanilla scheduler default latency targets are:
> 
>   single-core: 20 msecs
>     dual-core: 40 msecs
>     quad-core: 60 msecs
>     opto-core: 80 msecs
> 
> You can enable CONFIG_SCHED_DEBUG=y and set it directly as well via 
> /proc/sys/kernel/sched_latency_ns:
> 
>    echo 10000000 > /proc/sys/kernel/sched_latency_ns

He would also need to lower min_granularity, otherwise, it'd be larger
than the whole latency target.

I'm testing right now, and one thing that is definitely a problem is the
amount of sleeper fairness we're giving.  A full latency is just too
much short term fairness in my testing.  While sleepers are catching up,
hogs languish.  That's the biggest issue going on.

I've also been doing some timings of make -j4 (looking at idle time),
and find that child_runs_first is mildly detrimental to fork/exec load,
as are buddies.

I'm running with the below at the moment.  (the kthread/workqueue thing
is just because I don't see any reason for it to exist, so consider it
to be a waste of perfectly good math;)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 6ec4643..a44210e 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -16,8 +16,6 @@
 #include <linux/mutex.h>
 #include <trace/events/sched.h>
 
-#define KTHREAD_NICE_LEVEL (-5)
-
 static DEFINE_SPINLOCK(kthread_create_lock);
 static LIST_HEAD(kthread_create_list);
 
@@ -150,7 +148,6 @@ struct task_struct *kthread_create(int (*threadfn)(void *data),
 		 * The kernel thread should not inherit these properties.
 		 */
 		sched_setscheduler_nocheck(create.result, SCHED_NORMAL, &param);
-		set_user_nice(create.result, KTHREAD_NICE_LEVEL);
 		set_cpus_allowed_ptr(create.result, cpu_all_mask);
 	}
 	return create.result;
@@ -226,7 +223,6 @@ int kthreadd(void *unused)
 	/* Setup a clean context for our children to inherit. */
 	set_task_comm(tsk, "kthreadd");
 	ignore_signals(tsk);
-	set_user_nice(tsk, KTHREAD_NICE_LEVEL);
 	set_cpus_allowed_ptr(tsk, cpu_all_mask);
 	set_mems_allowed(node_possible_map);
 
diff --git a/kernel/sched.c b/kernel/sched.c
index c512a02..e68c341 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -7124,33 +7124,6 @@ void __cpuinit init_idle(struct task_struct *idle, int cpu)
  */
 cpumask_var_t nohz_cpu_mask;
 
-/*
- * Increase the granularity value when there are more CPUs,
- * because with more CPUs the 'effective latency' as visible
- * to users decreases. But the relationship is not linear,
- * so pick a second-best guess by going with the log2 of the
- * number of CPUs.
- *
- * This idea comes from the SD scheduler of Con Kolivas:
- */
-static inline void sched_init_granularity(void)
-{
-	unsigned int factor = 1 + ilog2(num_online_cpus());
-	const unsigned long limit = 200000000;
-
-	sysctl_sched_min_granularity *= factor;
-	if (sysctl_sched_min_granularity > limit)
-		sysctl_sched_min_granularity = limit;
-
-	sysctl_sched_latency *= factor;
-	if (sysctl_sched_latency > limit)
-		sysctl_sched_latency = limit;
-
-	sysctl_sched_wakeup_granularity *= factor;
-
-	sysctl_sched_shares_ratelimit *= factor;
-}
-
 #ifdef CONFIG_SMP
 /*
  * This is how migration works:
@@ -9356,7 +9329,6 @@ void __init sched_init_smp(void)
 	/* Move init over to a non-isolated CPU */
 	if (set_cpus_allowed_ptr(current, non_isolated_cpus) < 0)
 		BUG();
-	sched_init_granularity();
 	free_cpumask_var(non_isolated_cpus);
 
 	alloc_cpumask_var(&fallback_doms, GFP_KERNEL);
@@ -9365,7 +9337,6 @@ void __init sched_init_smp(void)
 #else
 void __init sched_init_smp(void)
 {
-	sched_init_granularity();
 }
 #endif /* CONFIG_SMP */
 
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index e386e5d..ff7fec9 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -51,7 +51,7 @@ static unsigned int sched_nr_latency = 5;
  * After fork, child runs first. (default) If set to 0 then
  * parent will (try to) run first.
  */
-const_debug unsigned int sysctl_sched_child_runs_first = 1;
+const_debug unsigned int sysctl_sched_child_runs_first = 0;
 
 /*
  * sys_sched_yield() compat mode
@@ -713,7 +713,7 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
 	if (!initial) {
 		/* sleeps upto a single latency don't count. */
 		if (sched_feat(NEW_FAIR_SLEEPERS)) {
-			unsigned long thresh = sysctl_sched_latency;
+			unsigned long thresh = sysctl_sched_min_granularity;
 
 			/*
 			 * Convert the sleeper threshold into virtual time.
@@ -1502,7 +1502,8 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int sync)
 	 */
 	if (sched_feat(LAST_BUDDY) && likely(se->on_rq && curr != rq->idle))
 		set_last_buddy(se);
-	set_next_buddy(pse);
+	if (sched_feat(NEXT_BUDDY))
+		set_next_buddy(pse);
 
 	/*
 	 * We can come here with TIF_NEED_RESCHED already set from new task
diff --git a/kernel/sched_features.h b/kernel/sched_features.h
index 4569bfa..85d30d1 100644
--- a/kernel/sched_features.h
+++ b/kernel/sched_features.h
@@ -13,5 +13,6 @@ SCHED_FEAT(LB_BIAS, 1)
 SCHED_FEAT(LB_WAKEUP_UPDATE, 1)
 SCHED_FEAT(ASYM_EFF_LOAD, 1)
 SCHED_FEAT(WAKEUP_OVERLAP, 0)
-SCHED_FEAT(LAST_BUDDY, 1)
+SCHED_FEAT(LAST_BUDDY, 0)
+SCHED_FEAT(NEXT_BUDDY, 0)
 SCHED_FEAT(OWNER_SPIN, 1)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 3c44b56..addfe2d 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -317,8 +317,6 @@ static int worker_thread(void *__cwq)
 	if (cwq->wq->freezeable)
 		set_freezable();
 
-	set_user_nice(current, -5);
-
 	for (;;) {
 		prepare_to_wait(&cwq->more_work, &wait, TASK_INTERRUPTIBLE);
 		if (!freezing(current) &&



^ permalink raw reply related	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-09  8:52                   ` Mike Galbraith
@ 2009-09-09  9:02                     ` Peter Zijlstra
  2009-09-09  9:18                       ` Mike Galbraith
  2009-09-09  9:05                     ` Nikos Chantziaras
                                       ` (4 subsequent siblings)
  5 siblings, 1 reply; 216+ messages in thread
From: Peter Zijlstra @ 2009-09-09  9:02 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Ingo Molnar, Jens Axboe, Con Kolivas, linux-kernel

On Wed, 2009-09-09 at 10:52 +0200, Mike Galbraith wrote:
> @@ -1502,7 +1502,8 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int sync)
>          */
>         if (sched_feat(LAST_BUDDY) && likely(se->on_rq && curr != rq->idle))
>                 set_last_buddy(se);
> -       set_next_buddy(pse);
> +       if (sched_feat(NEXT_BUDDY))
> +               set_next_buddy(pse);
>  
>         /*
>          * We can come here with TIF_NEED_RESCHED already set from new task

You might want to test stuff like sysbench again, iirc we went on a
cache-trashing rampage without buddies.

Our goal is not to excel at any one load but to not suck at any one
load.


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-09  8:52                   ` Mike Galbraith
  2009-09-09  9:02                     ` Peter Zijlstra
@ 2009-09-09  9:05                     ` Nikos Chantziaras
  2009-09-09  9:17                       ` Peter Zijlstra
  2009-09-09  9:10                     ` Jens Axboe
                                       ` (3 subsequent siblings)
  5 siblings, 1 reply; 216+ messages in thread
From: Nikos Chantziaras @ 2009-09-09  9:05 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Ingo Molnar, Jens Axboe, Peter Zijlstra, Con Kolivas, linux-kernel

On 09/09/2009 11:52 AM, Mike Galbraith wrote:
> On Wed, 2009-09-09 at 08:13 +0200, Ingo Molnar wrote:
>> * Jens Axboe<jens.axboe@oracle.com>  wrote:
>>
>>> On Tue, Sep 08 2009, Peter Zijlstra wrote:
>>>> On Tue, 2009-09-08 at 11:13 +0200, Jens Axboe wrote:
>>>>> And here's a newer version.
>>>>
>>>> I tinkered a bit with your proglet and finally found the
>>>> problem.
>>>>
>>>> You used a single pipe per child, this means the loop in
>>>> run_child() would consume what it just wrote out until it got
>>>> force preempted by the parent which would also get woken.
>>>>
>>>> This results in the child spinning a while (its full quota) and
>>>> only reporting the last timestamp to the parent.
>>>
>>> Oh doh, that's not well thought out. Well it was a quick hack :-)
>>> Thanks for the fixup, now it's at least usable to some degree.
>>
>> What kind of latencies does it report on your box?
>>
>> Our vanilla scheduler default latency targets are:
>>
>>    single-core: 20 msecs
>>      dual-core: 40 msecs
>>      quad-core: 60 msecs
>>      opto-core: 80 msecs
>>
>> You can enable CONFIG_SCHED_DEBUG=y and set it directly as well via
>> /proc/sys/kernel/sched_latency_ns:
>>
>>     echo 10000000>  /proc/sys/kernel/sched_latency_ns
>
> He would also need to lower min_granularity, otherwise, it'd be larger
> than the whole latency target.

Thank you for mentioning min_granularity.  After:

   echo 10000000 > /proc/sys/kernel/sched_latency_ns
   echo 2000000 > /proc/sys/kernel/sched_min_granularity_ns

I can clearly see an improvement: animations that are supposed to be 
fluid "skip" much less now, and in one occasion (simply moving the video 
window around) have been eliminated completely.  However, there seems to 
be a side effect from having CONFIG_SCHED_DEBUG enabled; things seem to 
be generally a tad more "jerky" with that option enabled, even when not 
even touching the latency and granularity defaults.

I'll try the patch you posted and see if this further improves things.

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-09  8:52                   ` Mike Galbraith
  2009-09-09  9:02                     ` Peter Zijlstra
  2009-09-09  9:05                     ` Nikos Chantziaras
@ 2009-09-09  9:10                     ` Jens Axboe
  2009-09-09 11:54                       ` Jens Axboe
  2009-09-09 15:37                     ` [tip:sched/core] sched: Turn off child_runs_first tip-bot for Mike Galbraith
                                       ` (2 subsequent siblings)
  5 siblings, 1 reply; 216+ messages in thread
From: Jens Axboe @ 2009-09-09  9:10 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Ingo Molnar, Peter Zijlstra, Con Kolivas, linux-kernel

On Wed, Sep 09 2009, Mike Galbraith wrote:
> On Wed, 2009-09-09 at 08:13 +0200, Ingo Molnar wrote:
> > * Jens Axboe <jens.axboe@oracle.com> wrote:
> > 
> > > On Tue, Sep 08 2009, Peter Zijlstra wrote:
> > > > On Tue, 2009-09-08 at 11:13 +0200, Jens Axboe wrote:
> > > > > And here's a newer version.
> > > > 
> > > > I tinkered a bit with your proglet and finally found the 
> > > > problem.
> > > > 
> > > > You used a single pipe per child, this means the loop in 
> > > > run_child() would consume what it just wrote out until it got 
> > > > force preempted by the parent which would also get woken.
> > > > 
> > > > This results in the child spinning a while (its full quota) and 
> > > > only reporting the last timestamp to the parent.
> > > 
> > > Oh doh, that's not well thought out. Well it was a quick hack :-) 
> > > Thanks for the fixup, now it's at least usable to some degree.
> > 
> > What kind of latencies does it report on your box?
> > 
> > Our vanilla scheduler default latency targets are:
> > 
> >   single-core: 20 msecs
> >     dual-core: 40 msecs
> >     quad-core: 60 msecs
> >     opto-core: 80 msecs
> > 
> > You can enable CONFIG_SCHED_DEBUG=y and set it directly as well via 
> > /proc/sys/kernel/sched_latency_ns:
> > 
> >    echo 10000000 > /proc/sys/kernel/sched_latency_ns
> 
> He would also need to lower min_granularity, otherwise, it'd be larger
> than the whole latency target.
> 
> I'm testing right now, and one thing that is definitely a problem is the
> amount of sleeper fairness we're giving.  A full latency is just too
> much short term fairness in my testing.  While sleepers are catching up,
> hogs languish.  That's the biggest issue going on.
> 
> I've also been doing some timings of make -j4 (looking at idle time),
> and find that child_runs_first is mildly detrimental to fork/exec load,
> as are buddies.
> 
> I'm running with the below at the moment.  (the kthread/workqueue thing
> is just because I don't see any reason for it to exist, so consider it
> to be a waste of perfectly good math;)

Using latt, it seems better than -rc9. The below are entries logged
while running make -j128 on a 64 thread box. I did two runs on each, and
latt is using 8 clients.

-rc9
        Max                23772 usec
        Avg                 1129 usec
        Stdev               4328 usec
        Stdev mean           117 usec

        Max                32709 usec
        Avg                 1467 usec
        Stdev               5095 usec
        Stdev mean           136 usec

-rc9 + patch

        Max                11561 usec
        Avg                 1532 usec
        Stdev               1994 usec
        Stdev mean            48 usec

        Max                 9590 usec
        Avg                 1550 usec
        Stdev               2051 usec
        Stdev mean            50 usec

max latency is way down, and much smaller variation as well.


-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-09  9:05                     ` Nikos Chantziaras
@ 2009-09-09  9:17                       ` Peter Zijlstra
  2009-09-09  9:40                         ` Nikos Chantziaras
  2009-09-10 19:45                         ` Martin Steigerwald
  0 siblings, 2 replies; 216+ messages in thread
From: Peter Zijlstra @ 2009-09-09  9:17 UTC (permalink / raw)
  To: Nikos Chantziaras
  Cc: Mike Galbraith, Ingo Molnar, Jens Axboe, Con Kolivas, linux-kernel

On Wed, 2009-09-09 at 12:05 +0300, Nikos Chantziaras wrote:

> Thank you for mentioning min_granularity.  After:
> 
>    echo 10000000 > /proc/sys/kernel/sched_latency_ns
>    echo 2000000 > /proc/sys/kernel/sched_min_granularity_ns

You might also want to do:

     echo 2000000 > /proc/sys/kernel/sched_wakeup_granularity_ns

That affects when a newly woken task will preempt an already running
task.

> I can clearly see an improvement: animations that are supposed to be 
> fluid "skip" much less now, and in one occasion (simply moving the video 
> window around) have been eliminated completely.  However, there seems to 
> be a side effect from having CONFIG_SCHED_DEBUG enabled; things seem to 
> be generally a tad more "jerky" with that option enabled, even when not 
> even touching the latency and granularity defaults.

There's more code in the scheduler with that enabled but unless you've
got a terribly high ctx rate that really shouldn't affect things.

Anyway, you can always poke at these numbers in the code, and like Mike
did, kill sched_init_granularity().
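
To put the knobs mentioned so far in one place (values purely as an
example; these files only exist with CONFIG_SCHED_DEBUG=y):

  echo 10000000 > /proc/sys/kernel/sched_latency_ns            # overall latency target
  echo 2000000  > /proc/sys/kernel/sched_min_granularity_ns    # keep below the latency target
  echo 2000000  > /proc/sys/kernel/sched_wakeup_granularity_ns # wakeup preemption threshold
  grep . /proc/sys/kernel/sched_*_ns                           # read back what is set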




^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-09  9:02                     ` Peter Zijlstra
@ 2009-09-09  9:18                       ` Mike Galbraith
  0 siblings, 0 replies; 216+ messages in thread
From: Mike Galbraith @ 2009-09-09  9:18 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, Jens Axboe, Con Kolivas, linux-kernel

On Wed, 2009-09-09 at 11:02 +0200, Peter Zijlstra wrote:
> On Wed, 2009-09-09 at 10:52 +0200, Mike Galbraith wrote:
> > @@ -1502,7 +1502,8 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int sync)
> >          */
> >         if (sched_feat(LAST_BUDDY) && likely(se->on_rq && curr != rq->idle))
> >                 set_last_buddy(se);
> > -       set_next_buddy(pse);
> > +       if (sched_feat(NEXT_BUDDY))
> > +               set_next_buddy(pse);
> >  
> >         /*
> >          * We can come here with TIF_NEED_RESCHED already set from new task
> 
> You might want to test stuff like sysbench again, iirc we went on a
> cache-trashing rampage without buddies.
> 
> Our goal is not to excel at any one load but to not suck at any one
> load.

Oh absolutely.  I wouldn't want buddies disabled by default; I only
added the buddy knob to test effects on fork/exec.

I only posted the patch to give Jens something canned to try out.
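
(For anyone wanting to flip the new knob at runtime instead of
rebuilding: with CONFIG_SCHED_DEBUG=y the feature flags are exposed
through debugfs, so assuming debugfs is mounted, something like this
should do it:

  mount -t debugfs none /sys/kernel/debug                 # if not already mounted
  cat /sys/kernel/debug/sched_features                    # show the current flags
  echo NEXT_BUDDY > /sys/kernel/debug/sched_features      # turn the new buddy knob on
  echo NO_NEXT_BUDDY > /sys/kernel/debug/sched_features   # and off again
)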

	-Mike


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-09  9:17                       ` Peter Zijlstra
@ 2009-09-09  9:40                         ` Nikos Chantziaras
  2009-09-09 10:17                           ` Nikos Chantziaras
  2009-09-10 19:45                         ` Martin Steigerwald
  1 sibling, 1 reply; 216+ messages in thread
From: Nikos Chantziaras @ 2009-09-09  9:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mike Galbraith, Ingo Molnar, Jens Axboe, Con Kolivas, linux-kernel

On 09/09/2009 12:17 PM, Peter Zijlstra wrote:
> On Wed, 2009-09-09 at 12:05 +0300, Nikos Chantziaras wrote:
>
>> Thank you for mentioning min_granularity.  After:
>>
>>     echo 10000000>  /proc/sys/kernel/sched_latency_ns
>>     echo 2000000>  /proc/sys/kernel/sched_min_granularity_ns
>
> You might also want to do:
>
>       echo 2000000>  /proc/sys/kernel/sched_wakeup_granularity_ns
>
> That affects when a newly woken task will preempt an already running
> task.

Lowering wakeup_granularity seems to make things worse in an interesting 
way:

With low wakeup_granularity, the video itself will start skipping if I 
move the window around.  However, the window manager's effect of moving 
a window around is smooth.

With high wakeup_granularity, the video itself will not skip while 
moving the window around.  But this time, the window manager's effect of 
the window move is skippy.

(I should point out that only with the BFS-patched kernel can I have a 
smooth video *and* a smooth window-moving effect at the same time.) 
Mainline seems to prioritize one of the two according to whether 
wakeup_granularity is raised or lowered.  However, I have not tested 
Mike's patch yet (but will do so ASAP.)

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-08 20:22         ` Frans Pop
  2009-09-08 21:10           ` Michal Schmidt
  2009-09-08 21:11           ` Frans Pop
@ 2009-09-09  9:53           ` Benjamin Herrenschmidt
  2009-09-09 11:14             ` David Newall
  2009-09-09 11:55             ` Frans Pop
  2 siblings, 2 replies; 216+ messages in thread
From: Benjamin Herrenschmidt @ 2009-09-09  9:53 UTC (permalink / raw)
  To: Frans Pop; +Cc: Arjan van de Ven, realnc, linux-kernel

On Tue, 2009-09-08 at 22:22 +0200, Frans Pop wrote:
> Arjan van de Ven wrote:
> > the latest version of latencytop also has a GUI (thanks to Ben)
> 
> That looks nice, but...
> 
> I kind of miss the split screen feature where latencytop would show both 
> the overall figures + the ones for the currently most affected task. 
> Downside of that last was that I never managed to keep the display on a 
> specific task.

Any idea of how to present it ? I'm happy to spend 5mn improving the
GUI :-) 

> The graphical display also makes it impossible to simply copy and paste 
> the results.

Ah that's right. I'm not 100% sure how to do that (first experiments
with gtk). I suppose I could try to do some kind of "snapshot" feature
which saves the results in textual form.

> Having the freeze button is nice though.
> 
> Would it be possible to have a command line switch that allows to start 
> the old textual mode?

It's there iirc. --nogui :-)

Cheers,
Ben.

> Looks like the man page needs updating too :-)
> 
> Cheers,
> FJP
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-09  9:40                         ` Nikos Chantziaras
@ 2009-09-09 10:17                           ` Nikos Chantziaras
  0 siblings, 0 replies; 216+ messages in thread
From: Nikos Chantziaras @ 2009-09-09 10:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mike Galbraith, Ingo Molnar, Jens Axboe, Con Kolivas, linux-kernel

On 09/09/2009 12:40 PM, Nikos Chantziaras wrote:
> On 09/09/2009 12:17 PM, Peter Zijlstra wrote:
>> On Wed, 2009-09-09 at 12:05 +0300, Nikos Chantziaras wrote:
>>
>>> Thank you for mentioning min_granularity. After:
>>>
>>> echo 10000000> /proc/sys/kernel/sched_latency_ns
>>> echo 2000000> /proc/sys/kernel/sched_min_granularity_ns
>>
>> You might also want to do:
>>
>> echo 2000000> /proc/sys/kernel/sched_wakeup_granularity_ns
>>
>> That affects when a newly woken task will preempt an already running
>> task.
>
> Lowering wakeup_granularity seems to make things worse in an interesting
> way:
>
> With low wakeup_granularity, the video itself will start skipping if I
> move the window around. However, the window manager's effect of moving a
> window around is smooth.
>
> With high wakeup_granularity, the video itself will not skip while
> moving the window around. But this time, the window manager's effect of
> the window move is skippy.
>
> (I should point out that only with the BFS-patched kernel can I have a
> smooth video *and* a smooth window-moving effect at the same time.)
> Mainline seems to prioritize one of the two according to whether
> wakeup_granularity is raised or lowered. However, I have not tested
> Mike's patch yet (but will do so ASAP.)

I've tested Mike's patch and it achieves the same effect as raising 
sched_min_granularity.

To sum it up:

By testing various values for sched_latency_ns, sched_min_granularity_ns 
and sched_wakeup_granularity_ns, I can achieve three results:

   1. Fluid animations for the foreground app, skippy ones for
      the rest (video plays nicely, rest of the desktop lags.)

   2. Fluid animations for the background apps, a skippy one for
      the one in the foreground (dekstop behaves nicely, video lags.)

   3. Equally skippy/jerky behavior for all of them.

Unfortunately, a "4. Equally fluid behavior for all of them" cannot be 
achieved with mainline, unless I missed some other tweak.

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-09  9:53           ` Benjamin Herrenschmidt
@ 2009-09-09 11:14             ` David Newall
  2009-09-09 11:32               ` Benjamin Herrenschmidt
  2009-09-09 11:55             ` Frans Pop
  1 sibling, 1 reply; 216+ messages in thread
From: David Newall @ 2009-09-09 11:14 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Frans Pop, Arjan van de Ven, realnc, linux-kernel

Benjamin Herrenschmidt wrote:
> On Tue, 2009-09-08 at 22:22 +0200, Frans Pop wrote:
>   
>> Arjan van de Ven wrote:
>>     
>>> the latest version of latencytop also has a GUI (thanks to Ben)
>>>       
>> That looks nice, but...
>>
>> I kind of miss the split screen feature where latencytop would show both 
>> the overall figures + the ones for the currently most affected task. 
>> Downside of that last was that I never managed to keep the display on a 
>> specific task.
>>     
>
> Any idea of how to present it ? I'm happy to spend 5mn improving the
> GUI :-) 

Use a second window.

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-09 11:14             ` David Newall
@ 2009-09-09 11:32               ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 216+ messages in thread
From: Benjamin Herrenschmidt @ 2009-09-09 11:32 UTC (permalink / raw)
  To: David Newall; +Cc: Frans Pop, Arjan van de Ven, realnc, linux-kernel

On Wed, 2009-09-09 at 20:44 +0930, David Newall wrote:
> Benjamin Herrenschmidt wrote:
> > On Tue, 2009-09-08 at 22:22 +0200, Frans Pop wrote:
> >   
> >> Arjan van de Ven wrote:
> >>     
> >>> the latest version of latencytop also has a GUI (thanks to Ben)
> >>>       
> >> That looks nice, but...
> >>
> >> I kind of miss the split screen feature where latencytop would show both 
> >> the overall figures + the ones for the currently most affected task. 
> >> Downside of that last was that I never managed to keep the display on a 
> >> specific task.
> >>     
> >
> > Any idea of how to present it ? I'm happy to spend 5mn improving the
> > GUI :-) 
> 
> Use a second window.

I'm not too much of a fan of cluttering the screen with windows... I suppose
I could have a separate pane for the "global" view but I haven't found a
way to lay it out in a way that doesn't suck :-) I could have done a 3rd
column on the right with the overall view but it felt like using too
much screen real estate.

I'll experiment a bit, maybe 2 windows is indeed the solution. But then you
get into the problem of what to do if only one of them is closed? Do I
add a menu bar on each of them to re-open the "other" one if closed?
etc...

Don't get me wrong, I have a shitload of experience doing GUIs (back in
the old days when I was hacking on MacOS), though I'm relatively new to
GTK. But GUI design is rather hard in general :-)

Ben.



^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-08 15:23             ` Peter Zijlstra
  2009-09-08 20:34               ` Jens Axboe
@ 2009-09-09 11:52               ` Nikos Chantziaras
  1 sibling, 0 replies; 216+ messages in thread
From: Nikos Chantziaras @ 2009-09-09 11:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jens Axboe, Ingo Molnar, Con Kolivas, linux-kernel, Mike Galbraith

On 09/08/2009 06:23 PM, Peter Zijlstra wrote:
> On Tue, 2009-09-08 at 11:13 +0200, Jens Axboe wrote:
>> And here's a newer version.
>
> I tinkered a bit with your proglet and finally found the problem.
>
> You used a single pipe per child, this means the loop in run_child()
> would consume what it just wrote out until it got force preempted by the
> parent which would also get woken.
>
> This results in the child spinning a while (its full quota) and only
> reporting the last timestamp to the parent.
>
> Since consumer (parent) is a single thread the program basically
> measures the worst delay in a thundering herd wakeup of N children.
>
> The below version yields:
>
> idle
>
> [root@opteron sched]# ./latt -c8 sleep 30
> Entries: 664 (clients=8)
>
> Averages:
> ------------------------------
>          Max           128 usec
>          Avg            26 usec
>          Stdev          16 usec
>
>
> make -j4
>
> [root@opteron sched]# ./latt -c8 sleep 30
> Entries: 648 (clients=8)
>
> Averages:
> ------------------------------
>          Max         20861 usec
>          Avg          3763 usec
>          Stdev        4637 usec
>
>
> Mike's patch, make -j4
>
> [root@opteron sched]# ./latt -c8 sleep 30
> Entries: 648 (clients=8)
>
> Averages:
> ------------------------------
>          Max         17854 usec
>          Avg          6298 usec
>          Stdev        4735 usec

I've run two tests with this tool.  One with mainline (2.6.31-rc9) and 
one patched with 2.6.31-rc9-sched-bfs-210.patch.

Before running this test, I disabled the cron daemon in order not to 
have something pop-up in the background out of a sudden.

The test consisted of starting a "make -j2" in the kernel tree inside a 
3GB tmpfs mountpoint and then running 'latt "mplayer -vo gl2 -framedrop 
videofile.mkv"'  (mplayer in this case is a single-threaded 
application.)  Caches were warmed up first; the results below are from 
the second run of each test.

The kernel .config file used by the running kernels and also for "make 
-j2" is:

   http://foss.math.aegean.gr/~realnc/kernel/config-2.6.31-rc9-latt-test

The video file used for mplayer is:

   http://foss.math.aegean.gr/~realnc/vids/3DMark2000.mkv (100MB)
   (The reason this was used is that it's a 60FPS video,
   therefore very smooth and makes all skips stand out
   clearly.)


Results for mainline:

   Averages:
   ------------------------------
           Max         29930 usec
           Avg         11043 usec
           Stdev        5752 usec


Results for BFS:

   Averages:
   ------------------------------
           Max         14017 usec
           Avg            49 usec
           Stdev         697 usec


One thing that's worth noting is that with mainline, mplayer would 
occasionally spit this out:

    YOUR SYSTEM IS TOO SLOW TO PLAY THIS

which doesn't happen with BFS.

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-09  9:10                     ` Jens Axboe
@ 2009-09-09 11:54                       ` Jens Axboe
  2009-09-09 12:20                         ` Jens Axboe
  2009-09-09 12:48                         ` Mike Galbraith
  0 siblings, 2 replies; 216+ messages in thread
From: Jens Axboe @ 2009-09-09 11:54 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Ingo Molnar, Peter Zijlstra, Con Kolivas, linux-kernel

On Wed, Sep 09 2009, Jens Axboe wrote:
> On Wed, Sep 09 2009, Mike Galbraith wrote:
> > On Wed, 2009-09-09 at 08:13 +0200, Ingo Molnar wrote:
> > > * Jens Axboe <jens.axboe@oracle.com> wrote:
> > > 
> > > > On Tue, Sep 08 2009, Peter Zijlstra wrote:
> > > > > On Tue, 2009-09-08 at 11:13 +0200, Jens Axboe wrote:
> > > > > > And here's a newer version.
> > > > > 
> > > > > I tinkered a bit with your proglet and finally found the 
> > > > > problem.
> > > > > 
> > > > > You used a single pipe per child, this means the loop in 
> > > > > run_child() would consume what it just wrote out until it got 
> > > > > force preempted by the parent which would also get woken.
> > > > > 
> > > > > This results in the child spinning a while (its full quota) and 
> > > > > only reporting the last timestamp to the parent.
> > > > 
> > > > Oh doh, that's not well thought out. Well it was a quick hack :-) 
> > > > Thanks for the fixup, now it's at least usable to some degree.
> > > 
> > > What kind of latencies does it report on your box?
> > > 
> > > Our vanilla scheduler default latency targets are:
> > > 
> > >   single-core: 20 msecs
> > >     dual-core: 40 msecs
> > >     quad-core: 60 msecs
> > >     opto-core: 80 msecs
> > > 
> > > You can enable CONFIG_SCHED_DEBUG=y and set it directly as well via 
> > > /proc/sys/kernel/sched_latency_ns:
> > > 
> > >    echo 10000000 > /proc/sys/kernel/sched_latency_ns
> > 
> > He would also need to lower min_granularity, otherwise, it'd be larger
> > than the whole latency target.
> > 
> > I'm testing right now, and one thing that is definitely a problem is the
> > amount of sleeper fairness we're giving.  A full latency is just too
> > much short term fairness in my testing.  While sleepers are catching up,
> > hogs languish.  That's the biggest issue going on.
> > 
> > I've also been doing some timings of make -j4 (looking at idle time),
> > and find that child_runs_first is mildly detrimental to fork/exec load,
> > as are buddies.
> > 
> > I'm running with the below at the moment.  (the kthread/workqueue thing
> > is just because I don't see any reason for it to exist, so consider it
> > to be a waste of perfectly good math;)
> 
> Using latt, it seems better than -rc9. The below are entries logged
> while running make -j128 on a 64 thread box. I did two runs on each, and
> latt is using 8 clients.
> 
> -rc9
>         Max                23772 usec
>         Avg                 1129 usec
>         Stdev               4328 usec
>         Stdev mean           117 usec
> 
>         Max                32709 usec
>         Avg                 1467 usec
>         Stdev               5095 usec
>         Stdev mean           136 usec
> 
> -rc9 + patch
> 
>         Max                11561 usec
>         Avg                 1532 usec
>         Stdev               1994 usec
>         Stdev mean            48 usec
> 
>         Max                 9590 usec
>         Avg                 1550 usec
>         Stdev               2051 usec
>         Stdev mean            50 usec
> 
> max latency is way down, and much smaller variation as well.

Things are much better with this patch on the notebook! I cannot compare
with BFS as that still doesn't run anywhere I want it to run, but it's
way better than -rc9-git stock. latt numbers on the notebook have 1/3
the max latency, average is lower, and stddev is much smaller too.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-09  9:53           ` Benjamin Herrenschmidt
  2009-09-09 11:14             ` David Newall
@ 2009-09-09 11:55             ` Frans Pop
  2009-09-11  1:36               ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 216+ messages in thread
From: Frans Pop @ 2009-09-09 11:55 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Arjan van de Ven, realnc, linux-kernel

On Wednesday 09 September 2009, Benjamin Herrenschmidt wrote:
> On Tue, 2009-09-08 at 22:22 +0200, Frans Pop wrote:
> > Arjan van de Ven wrote:
> > > the latest version of latencytop also has a GUI (thanks to Ben)
> >
> > That looks nice, but...
> >
> > I kind of miss the split screen feature where latencytop would show
> > both the overall figures + the ones for the currently most affected
> > task. Downside of that last was that I never managed to keep the
> > display on a specific task.
>
> Any idea of how to present it ? I'm happy to spend 5mn improving the
> GUI :-)

I'd say add an extra horizontal split in the second column, so you'd get 
three areas in the right column:
- top for the global target (permanently)
- middle for current, either:
  - "current most lagging" if "Global" is selected in left column
  - selected process if a specific target is selected in left column
- bottom for backtrace

Maybe with that setup "Global" in the left column should be renamed to 
something like "Dynamic".

The backtrace area would show selection from either top or middle areas 
(so selecting a cause in top or middle area should unselect causes in the 
other).

Cheers,
FJP

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-09 11:54                       ` Jens Axboe
@ 2009-09-09 12:20                         ` Jens Axboe
  2009-09-09 18:04                           ` Ingo Molnar
  2009-09-10  6:55                           ` Peter Zijlstra
  2009-09-09 12:48                         ` Mike Galbraith
  1 sibling, 2 replies; 216+ messages in thread
From: Jens Axboe @ 2009-09-09 12:20 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Ingo Molnar, Peter Zijlstra, Con Kolivas, linux-kernel

On Wed, Sep 09 2009, Jens Axboe wrote:
> On Wed, Sep 09 2009, Jens Axboe wrote:
> > On Wed, Sep 09 2009, Mike Galbraith wrote:
> > > On Wed, 2009-09-09 at 08:13 +0200, Ingo Molnar wrote:
> > > > * Jens Axboe <jens.axboe@oracle.com> wrote:
> > > > 
> > > > > On Tue, Sep 08 2009, Peter Zijlstra wrote:
> > > > > > On Tue, 2009-09-08 at 11:13 +0200, Jens Axboe wrote:
> > > > > > > And here's a newer version.
> > > > > > 
> > > > > > I tinkered a bit with your proglet and finally found the 
> > > > > > problem.
> > > > > > 
> > > > > > You used a single pipe per child, this means the loop in 
> > > > > > run_child() would consume what it just wrote out until it got 
> > > > > > force preempted by the parent which would also get woken.
> > > > > > 
> > > > > > This results in the child spinning a while (its full quota) and 
> > > > > > only reporting the last timestamp to the parent.
> > > > > 
> > > > > Oh doh, that's not well thought out. Well it was a quick hack :-) 
> > > > > Thanks for the fixup, now it's at least usable to some degree.
> > > > 
> > > > What kind of latencies does it report on your box?
> > > > 
> > > > Our vanilla scheduler default latency targets are:
> > > > 
> > > >   single-core: 20 msecs
> > > >     dual-core: 40 msecs
> > > >     quad-core: 60 msecs
> > > >     opto-core: 80 msecs
> > > > 
> > > > You can enable CONFIG_SCHED_DEBUG=y and set it directly as well via 
> > > > /proc/sys/kernel/sched_latency_ns:
> > > > 
> > > >    echo 10000000 > /proc/sys/kernel/sched_latency_ns
> > > 
> > > He would also need to lower min_granularity, otherwise, it'd be larger
> > > than the whole latency target.
> > > 
> > > I'm testing right now, and one thing that is definitely a problem is the
> > > amount of sleeper fairness we're giving.  A full latency is just too
> > > much short term fairness in my testing.  While sleepers are catching up,
> > > hogs languish.  That's the biggest issue going on.
> > > 
> > > I've also been doing some timings of make -j4 (looking at idle time),
> > > and find that child_runs_first is mildly detrimental to fork/exec load,
> > > as are buddies.
> > > 
> > > I'm running with the below at the moment.  (the kthread/workqueue thing
> > > is just because I don't see any reason for it to exist, so consider it
> > > to be a waste of perfectly good math;)
> > 
> > Using latt, it seems better than -rc9. The below are entries logged
> > while running make -j128 on a 64 thread box. I did two runs on each, and
> > latt is using 8 clients.
> > 
> > -rc9
> >         Max                23772 usec
> >         Avg                 1129 usec
> >         Stdev               4328 usec
> >         Stdev mean           117 usec
> > 
> >         Max                32709 usec
> >         Avg                 1467 usec
> >         Stdev               5095 usec
> >         Stdev mean           136 usec
> > 
> > -rc9 + patch
> > 
> >         Max                11561 usec
> >         Avg                 1532 usec
> >         Stdev               1994 usec
> >         Stdev mean            48 usec
> > 
> >         Max                 9590 usec
> >         Avg                 1550 usec
> >         Stdev               2051 usec
> >         Stdev mean            50 usec
> > 
> > max latency is way down, and much smaller variation as well.
> 
> Things are much better with this patch on the notebook! I cannot compare
> with BFS as that still doesn't run anywhere I want it to run, but it's
> way better than -rc9-git stock. latt numbers on the notebook have 1/3
> the max latency, average is lower, and stddev is much smaller too.

BFS210 runs on the laptop (dual core intel core duo). With make -j4
running, I clock the following latt -c8 'sleep 10' latencies:

-rc9

        Max                17895 usec
        Avg                 8028 usec
        Stdev               5948 usec
        Stdev mean           405 usec

        Max                17896 usec
        Avg                 4951 usec
        Stdev               6278 usec
        Stdev mean           427 usec

        Max                17885 usec
        Avg                 5526 usec
        Stdev               6819 usec
        Stdev mean           464 usec

-rc9 + mike

        Max                 6061 usec
        Avg                 3797 usec
        Stdev               1726 usec
        Stdev mean           117 usec

        Max                 5122 usec
        Avg                 3958 usec
        Stdev               1697 usec
        Stdev mean           115 usec

        Max                 6691 usec
        Avg                 2130 usec
        Stdev               2165 usec
        Stdev mean           147 usec

-rc9 + bfs210

        Max                   92 usec
        Avg                   27 usec
        Stdev                 19 usec
        Stdev mean             1 usec

        Max                   80 usec
        Avg                   23 usec
        Stdev                 15 usec
        Stdev mean             1 usec

        Max                   97 usec
        Avg                   27 usec
        Stdev                 21 usec
        Stdev mean             1 usec

One thing I also noticed is that when I have logged in, I run xmodmap
manually to load some keymappings (I always tell myself to add this to
the login scripts, but I suspend/resume this laptop for weeks at a
time and forget before the next boot). With the stock kernel, xmodmap
will halt X updates and take forever to run. With BFS, it returns
instantly. As I would expect.

So the BFS design may be lacking in the scalability end (which is
obviously true, if you look at the code), but I can understand the
appeal of the scheduler for "normal" desktop people.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-09 11:54                       ` Jens Axboe
  2009-09-09 12:20                         ` Jens Axboe
@ 2009-09-09 12:48                         ` Mike Galbraith
  1 sibling, 0 replies; 216+ messages in thread
From: Mike Galbraith @ 2009-09-09 12:48 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Ingo Molnar, Peter Zijlstra, Con Kolivas, linux-kernel

On Wed, 2009-09-09 at 13:54 +0200, Jens Axboe wrote:

> Things are much better with this patch on the notebook! I cannot compare
> with BFS as that still doesn't run anywhere I want it to run, but it's
> way better than -rc9-git stock. latt numbers on the notebook have 1/3
> the max latency, average is lower, and stddev is much smaller too.

That patch has a bit of bustage in it.

We definitely want to turn down sched_latency though, and LAST_BUDDY
also wants some examination it seems.

taskset -c 3 ./xx 1 (a 100% CPU hog that measures 1 sec interval perturbation; overhead is the CPU time it is not getting)
xx says
2392.52 MHZ CPU
perturbation threshold 0.057 usecs.
...
'nuther terminal
taskset -c 3 make -j2 vmlinux

xx output

current (fixed breakage) patched tip tree
pert/s:      153 >18842.18us:       11 min:  0.50 max:36010.37 avg:4354.06 sum/s:666171us overhead:66.62%
pert/s:      160 >18767.18us:       12 min:  0.13 max:32011.66 avg:4172.69 sum/s:667631us overhead:66.66%
pert/s:      156 >18499.43us:        9 min:  0.13 max:27883.24 avg:4296.08 sum/s:670189us overhead:66.49%
pert/s:      146 >18480.71us:       10 min:  0.50 max:32009.38 avg:4615.19 sum/s:673818us overhead:67.26%
pert/s:      154 >18433.20us:       17 min:  0.14 max:31537.12 avg:4474.14 sum/s:689018us overhead:67.68%
pert/s:      158 >18520.11us:        9 min:  0.50 max:34328.86 avg:4275.66 sum/s:675554us overhead:66.76%
pert/s:      154 >18683.74us:       12 min:  0.51 max:35949.23 avg:4363.67 sum/s:672005us overhead:67.04%
pert/s:      154 >18745.53us:        8 min:  0.51 max:34203.43 avg:4399.72 sum/s:677556us overhead:67.03%

bfs209
pert/s:      124 >18681.88us:       17 min:  0.15 max:27274.74 avg:4627.36 sum/s:573793us overhead:56.70%
pert/s:      106 >18702.52us:       20 min:  0.55 max:32022.07 avg:5754.48 sum/s:609975us overhead:59.80%
pert/s:      116 >19082.42us:       17 min:  0.15 max:39835.34 avg:5167.69 sum/s:599452us overhead:59.95%
pert/s:      109 >19289.41us:       22 min:  0.14 max:36818.95 avg:5485.79 sum/s:597951us overhead:59.64%
pert/s:      108 >19238.97us:       19 min:  0.14 max:32026.74 avg:5543.17 sum/s:598662us overhead:59.87%
pert/s:      106 >19415.76us:       20 min:  0.54 max:36011.78 avg:6001.89 sum/s:636201us overhead:62.95%
pert/s:      115 >19341.89us:       16 min:  0.08 max:32040.83 avg:5313.45 sum/s:611047us overhead:59.98%
pert/s:      101 >19527.53us:       24 min:  0.14 max:36018.37 avg:6378.06 sum/s:644184us overhead:64.42%

stock tip (ouch ouch ouch)
pert/s:      153 >48453.23us:        5 min:  0.12 max:144009.85 avg:4688.90 sum/s:717401us overhead:70.89%
pert/s:      172 >47209.49us:        3 min:  0.48 max:68009.05 avg:4022.55 sum/s:691879us overhead:67.05%
pert/s:      148 >51139.18us:        5 min:  0.53 max:168094.76 avg:4918.14 sum/s:727885us overhead:71.65%
pert/s:      171 >51350.64us:        6 min:  0.12 max:102202.79 avg:4304.77 sum/s:736115us overhead:69.24%
pert/s:      153 >57686.54us:        5 min:  0.12 max:224019.85 avg:5399.31 sum/s:826094us overhead:74.50%
pert/s:      172 >55886.47us:        2 min:  0.11 max:75378.18 avg:3993.52 sum/s:686885us overhead:67.67%
pert/s:      157 >58819.31us:        3 min:  0.12 max:165976.63 avg:4453.16 sum/s:699146us overhead:69.91%
pert/s:      149 >58410.21us:        5 min:  0.12 max:104663.89 avg:4792.73 sum/s:714116us overhead:71.41%

sched_latency=20ms min_granularity=4ms
pert/s:      162 >30152.07us:        2 min:  0.49 max:60011.85 avg:4272.97 sum/s:692221us overhead:68.13%
pert/s:      147 >29705.33us:        8 min:  0.14 max:46577.27 avg:4792.03 sum/s:704428us overhead:70.44%
pert/s:      162 >29344.16us:        2 min:  0.49 max:48010.50 avg:4176.75 sum/s:676633us overhead:67.40%
pert/s:      155 >29109.69us:        2 min:  0.49 max:49575.08 avg:4423.87 sum/s:685700us overhead:68.30%
pert/s:      153 >30627.66us:        3 min:  0.13 max:84005.71 avg:4573.07 sum/s:699680us overhead:69.42%
pert/s:      142 >30652.47us:        5 min:  0.49 max:56760.06 avg:4991.61 sum/s:708808us overhead:70.88%
pert/s:      152 >30101.12us:        2 min:  0.49 max:45757.88 avg:4519.92 sum/s:687028us overhead:67.89%
pert/s:      161 >29303.50us:        3 min:  0.12 max:40011.73 avg:4238.15 sum/s:682342us overhead:67.43%

NO_LAST_BUDDY
pert/s:      154 >15257.87us:       28 min:  0.13 max:42004.05 avg:4590.99 sum/s:707013us overhead:70.41%
pert/s:      162 >15392.05us:       34 min:  0.12 max:29021.79 avg:4177.47 sum/s:676750us overhead:66.81%
pert/s:      162 >15665.11us:       33 min:  0.13 max:32008.34 avg:4237.10 sum/s:686410us overhead:67.90%
pert/s:      159 >15914.89us:       31 min:  0.56 max:32056.86 avg:4268.87 sum/s:678751us overhead:67.47%
pert/s:      166 >15858.94us:       26 min:  0.13 max:26655.84 avg:4055.02 sum/s:673134us overhead:66.65%
pert/s:      165 >15878.96us:       32 min:  0.13 max:28010.44 avg:4107.86 sum/s:677798us overhead:66.68%
pert/s:      164 >16213.55us:       29 min:  0.14 max:34263.04 avg:4186.64 sum/s:686610us overhead:68.04%
pert/s:      149 >16764.54us:       20 min:  0.13 max:38688.64 avg:4758.26 sum/s:708981us overhead:70.23%
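
The xx source isn't posted in this thread, but the gist of this kind of
perturbation measurement is simple: a pinned 100% hog timestamps itself in a
tight loop and logs every gap where it didn't get the CPU. A purely
illustrative sketch (not Mike's actual tool; threshold and output format are
made up):

/* compile: gcc -O2 -o pert pert.c  (add -lrt on older glibc) */
#include <stdio.h>
#include <time.h>

static double now_us(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1e6 + ts.tv_nsec / 1e3;
}

int main(void)
{
	double threshold_us = 1.0;	/* gaps above this count as stolen time */
	double start = now_us(), last = start, sum = 0.0, max = 0.0;
	unsigned long perts = 0;

	for (;;) {	/* run under taskset next to the load, ^C to stop */
		double t = now_us(), delta = t - last;

		if (delta > threshold_us) {
			perts++;
			sum += delta;
			if (delta > max)
				max = delta;
		}
		last = t;

		if (t - start >= 1e6) {	/* report once per second */
			printf("pert/s: %lu max: %.2f us overhead: %.2f%%\n",
			       perts, max, 100.0 * sum / (t - start));
			perts = 0;
			sum = max = 0.0;
			start = t;
		}
	}
}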


^ permalink raw reply	[flat|nested] 216+ messages in thread

* [tip:sched/core] sched: Turn off child_runs_first
  2009-09-09  8:52                   ` Mike Galbraith
                                       ` (2 preceding siblings ...)
  2009-09-09  9:10                     ` Jens Axboe
@ 2009-09-09 15:37                     ` tip-bot for Mike Galbraith
  2009-09-09 17:57                       ` Theodore Tso
  2009-09-09 15:37                     ` [tip:sched/core] sched: Re-tune the scheduler latency defaults to decrease worst-case latencies tip-bot for Mike Galbraith
  2009-09-09 15:37                     ` [tip:sched/core] sched: Keep kthreads at default priority tip-bot for Mike Galbraith
  5 siblings, 1 reply; 216+ messages in thread
From: tip-bot for Mike Galbraith @ 2009-09-09 15:37 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, a.p.zijlstra, efault, tglx, mingo

Commit-ID:  2bba22c50b06abe9fd0d23933b1e64d35b419262
Gitweb:     http://git.kernel.org/tip/2bba22c50b06abe9fd0d23933b1e64d35b419262
Author:     Mike Galbraith <efault@gmx.de>
AuthorDate: Wed, 9 Sep 2009 15:41:37 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Wed, 9 Sep 2009 17:30:05 +0200

sched: Turn off child_runs_first

Set child_runs_first default to off.

It hurts 'optimal' make -j<NR_CPUS> workloads as make jobs
get preempted by child tasks, reducing parallelism.

Note, this patch might make existing races in user
applications more prominent than before - so breakages
might be bisected to this commit.

Child-runs-first is broken on SMP to begin with, and we
already had it off briefly in v2.6.23 so most of the
offenders ought to be fixed. Would be nice not to revert
this commit but fix those apps finally ...
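
(The sysctl remains available after this change, so anyone hit by it can
flip it back at runtime with: echo 1 > /proc/sys/kernel/sched_child_runs_first)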

Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1252486344.28645.18.camel@marge.simson.net>
[ made the sysctl independent of CONFIG_SCHED_DEBUG, in case
  people want to work around broken apps. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>


---
 include/linux/sched.h |    2 +-
 kernel/sched_fair.c   |    4 ++--
 kernel/sysctl.c       |   16 ++++++++--------
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 3b7f43e..3a50e82 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1820,8 +1820,8 @@ extern unsigned int sysctl_sched_min_granularity;
 extern unsigned int sysctl_sched_wakeup_granularity;
 extern unsigned int sysctl_sched_shares_ratelimit;
 extern unsigned int sysctl_sched_shares_thresh;
-#ifdef CONFIG_SCHED_DEBUG
 extern unsigned int sysctl_sched_child_runs_first;
+#ifdef CONFIG_SCHED_DEBUG
 extern unsigned int sysctl_sched_features;
 extern unsigned int sysctl_sched_migration_cost;
 extern unsigned int sysctl_sched_nr_migrate;
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index e386e5d..af325a3 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -48,10 +48,10 @@ unsigned int sysctl_sched_min_granularity = 4000000ULL;
 static unsigned int sched_nr_latency = 5;
 
 /*
- * After fork, child runs first. (default) If set to 0 then
+ * After fork, child runs first. If set to 0 (default) then
  * parent will (try to) run first.
  */
-const_debug unsigned int sysctl_sched_child_runs_first = 1;
+unsigned int sysctl_sched_child_runs_first __read_mostly;
 
 /*
  * sys_sched_yield() compat mode
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 6c9836e..25d6bf3 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -246,6 +246,14 @@ static int max_wakeup_granularity_ns = NSEC_PER_SEC;	/* 1 second */
 #endif
 
 static struct ctl_table kern_table[] = {
+	{
+		.ctl_name	= CTL_UNNUMBERED,
+		.procname	= "sched_child_runs_first",
+		.data		= &sysctl_sched_child_runs_first,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec,
+	},
 #ifdef CONFIG_SCHED_DEBUG
 	{
 		.ctl_name	= CTL_UNNUMBERED,
@@ -300,14 +308,6 @@ static struct ctl_table kern_table[] = {
 	},
 	{
 		.ctl_name	= CTL_UNNUMBERED,
-		.procname	= "sched_child_runs_first",
-		.data		= &sysctl_sched_child_runs_first,
-		.maxlen		= sizeof(unsigned int),
-		.mode		= 0644,
-		.proc_handler	= &proc_dointvec,
-	},
-	{
-		.ctl_name	= CTL_UNNUMBERED,
 		.procname	= "sched_features",
 		.data		= &sysctl_sched_features,
 		.maxlen		= sizeof(unsigned int),

^ permalink raw reply related	[flat|nested] 216+ messages in thread

* [tip:sched/core] sched: Re-tune the scheduler latency defaults to decrease worst-case latencies
  2009-09-09  8:52                   ` Mike Galbraith
                                       ` (3 preceding siblings ...)
  2009-09-09 15:37                     ` [tip:sched/core] sched: Turn off child_runs_first tip-bot for Mike Galbraith
@ 2009-09-09 15:37                     ` tip-bot for Mike Galbraith
  2009-09-12 11:45                       ` Martin Steigerwald
  2009-09-09 15:37                     ` [tip:sched/core] sched: Keep kthreads at default priority tip-bot for Mike Galbraith
  5 siblings, 1 reply; 216+ messages in thread
From: tip-bot for Mike Galbraith @ 2009-09-09 15:37 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, a.p.zijlstra, efault, tglx, mingo

Commit-ID:  172e082a9111ea504ee34cbba26284a5ebdc53a7
Gitweb:     http://git.kernel.org/tip/172e082a9111ea504ee34cbba26284a5ebdc53a7
Author:     Mike Galbraith <efault@gmx.de>
AuthorDate: Wed, 9 Sep 2009 15:41:37 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Wed, 9 Sep 2009 17:30:06 +0200

sched: Re-tune the scheduler latency defaults to decrease worst-case latencies

Reduce the latency target from 20 msecs to 5 msecs.

Why? Larger latencies increase spread, which is good for scaling,
but bad for worst case latency.

We still have the ilog(nr_cpus) rule to scale up on bigger
server boxes.
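
(With that factor the effective targets work out to roughly 5/10/15/20 msecs
on 1/2/4/8-core boxes, down from 20/40/60/80 msecs with the old defaults.)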

Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1252486344.28645.18.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


---
 kernel/sched_fair.c |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index af325a3..26fadb4 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -24,7 +24,7 @@
 
 /*
  * Targeted preemption latency for CPU-bound tasks:
- * (default: 20ms * (1 + ilog(ncpus)), units: nanoseconds)
+ * (default: 5ms * (1 + ilog(ncpus)), units: nanoseconds)
  *
  * NOTE: this latency value is not the same as the concept of
  * 'timeslice length' - timeslices in CFS are of variable length
@@ -34,13 +34,13 @@
  * (to see the precise effective timeslice length of your workload,
  *  run vmstat and monitor the context-switches (cs) field)
  */
-unsigned int sysctl_sched_latency = 20000000ULL;
+unsigned int sysctl_sched_latency = 5000000ULL;
 
 /*
  * Minimal preemption granularity for CPU-bound tasks:
- * (default: 4 msec * (1 + ilog(ncpus)), units: nanoseconds)
+ * (default: 1 msec * (1 + ilog(ncpus)), units: nanoseconds)
  */
-unsigned int sysctl_sched_min_granularity = 4000000ULL;
+unsigned int sysctl_sched_min_granularity = 1000000ULL;
 
 /*
  * is kept at sysctl_sched_latency / sysctl_sched_min_granularity
@@ -63,13 +63,13 @@ unsigned int __read_mostly sysctl_sched_compat_yield;
 
 /*
  * SCHED_OTHER wake-up granularity.
- * (default: 5 msec * (1 + ilog(ncpus)), units: nanoseconds)
+ * (default: 1 msec * (1 + ilog(ncpus)), units: nanoseconds)
  *
  * This option delays the preemption effects of decoupled workloads
  * and reduces their over-scheduling. Synchronous workloads will still
  * have immediate wakeup/sleep latencies.
  */
-unsigned int sysctl_sched_wakeup_granularity = 5000000UL;
+unsigned int sysctl_sched_wakeup_granularity = 1000000UL;
 
 const_debug unsigned int sysctl_sched_migration_cost = 500000UL;
 

^ permalink raw reply related	[flat|nested] 216+ messages in thread

* [tip:sched/core] sched: Keep kthreads at default priority
  2009-09-09  8:52                   ` Mike Galbraith
                                       ` (4 preceding siblings ...)
  2009-09-09 15:37                     ` [tip:sched/core] sched: Re-tune the scheduler latency defaults to decrease worst-case latencies tip-bot for Mike Galbraith
@ 2009-09-09 15:37                     ` tip-bot for Mike Galbraith
  2009-09-09 16:55                       ` Dmitry Torokhov
  5 siblings, 1 reply; 216+ messages in thread
From: tip-bot for Mike Galbraith @ 2009-09-09 15:37 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, a.p.zijlstra, efault, tglx, mingo

Commit-ID:  61cbe54d9479ad98283b2dda686deae4c34b2d59
Gitweb:     http://git.kernel.org/tip/61cbe54d9479ad98283b2dda686deae4c34b2d59
Author:     Mike Galbraith <efault@gmx.de>
AuthorDate: Wed, 9 Sep 2009 15:41:37 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Wed, 9 Sep 2009 17:30:06 +0200

sched: Keep kthreads at default priority

Removes kthread/workqueue priority boost, they increase worst-case
desktop latencies.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1252486344.28645.18.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


---
 kernel/kthread.c   |    4 ----
 kernel/workqueue.c |    2 --
 2 files changed, 0 insertions(+), 6 deletions(-)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index eb8751a..5fe7099 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -16,8 +16,6 @@
 #include <linux/mutex.h>
 #include <trace/events/sched.h>
 
-#define KTHREAD_NICE_LEVEL (-5)
-
 static DEFINE_SPINLOCK(kthread_create_lock);
 static LIST_HEAD(kthread_create_list);
 struct task_struct *kthreadd_task;
@@ -145,7 +143,6 @@ struct task_struct *kthread_create(int (*threadfn)(void *data),
 		 * The kernel thread should not inherit these properties.
 		 */
 		sched_setscheduler_nocheck(create.result, SCHED_NORMAL, &param);
-		set_user_nice(create.result, KTHREAD_NICE_LEVEL);
 		set_cpus_allowed_ptr(create.result, cpu_all_mask);
 	}
 	return create.result;
@@ -221,7 +218,6 @@ int kthreadd(void *unused)
 	/* Setup a clean context for our children to inherit. */
 	set_task_comm(tsk, "kthreadd");
 	ignore_signals(tsk);
-	set_user_nice(tsk, KTHREAD_NICE_LEVEL);
 	set_cpus_allowed_ptr(tsk, cpu_all_mask);
 	set_mems_allowed(node_possible_map);
 
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 0668795..ea1b4e7 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -317,8 +317,6 @@ static int worker_thread(void *__cwq)
 	if (cwq->wq->freezeable)
 		set_freezable();
 
-	set_user_nice(current, -5);
-
 	for (;;) {
 		prepare_to_wait(&cwq->more_work, &wait, TASK_INTERRUPTIBLE);
 		if (!freezing(current) &&

^ permalink raw reply related	[flat|nested] 216+ messages in thread

* Re: Epic regression in throughput since v2.6.23
  2009-09-08 22:15   ` Serge Belyshev
@ 2009-09-09 15:52     ` Ingo Molnar
  2009-09-09 20:49       ` Serge Belyshev
  0 siblings, 1 reply; 216+ messages in thread
From: Ingo Molnar @ 2009-09-09 15:52 UTC (permalink / raw)
  To: Serge Belyshev; +Cc: Con Kolivas, linux-kernel, Peter Zijlstra, Mike Galbraith


* Serge Belyshev <belyshev@depni.sinp.msu.ru> wrote:

> Serge Belyshev <belyshev@depni.sinp.msu.ru> writes:
> >[snip]
> 
> I've updated the graphs, added kernels 2.6.24..2.6.29:
> http://img186.imageshack.us/img186/7029/epicmakej4.png
> 
> And added comparison with best-performing 2.6.23 kernel:
> http://img34.imageshack.us/img34/7563/epicbfstips.png

Thanks!

I think we found the reason for that regression - would you mind 
to re-test with latest -tip, e157986 or later?

If that works for you i'll describe our theory.

	Ingo

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: [tip:sched/core] sched: Keep kthreads at default priority
  2009-09-09 15:37                     ` [tip:sched/core] sched: Keep kthreads at default priority tip-bot for Mike Galbraith
@ 2009-09-09 16:55                       ` Dmitry Torokhov
  2009-09-09 17:06                         ` Peter Zijlstra
  0 siblings, 1 reply; 216+ messages in thread
From: Dmitry Torokhov @ 2009-09-09 16:55 UTC (permalink / raw)
  To: mingo, hpa, linux-kernel, a.p.zijlstra, efault, tglx, mingo
  Cc: linux-tip-commits

On Wed, Sep 09, 2009 at 03:37:34PM +0000, tip-bot for Mike Galbraith wrote:
> 
> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index eb8751a..5fe7099 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -16,8 +16,6 @@
>  #include <linux/mutex.h>
>  #include <trace/events/sched.h>
>  
> -#define KTHREAD_NICE_LEVEL (-5)
> -

Why don't we just redefine it to 0? We may find out later that we'd
still prefer to have kernel threads have boost.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: [tip:sched/core] sched: Keep kthreads at default priority
  2009-09-09 16:55                       ` Dmitry Torokhov
@ 2009-09-09 17:06                         ` Peter Zijlstra
  2009-09-09 17:34                           ` Mike Galbraith
  0 siblings, 1 reply; 216+ messages in thread
From: Peter Zijlstra @ 2009-09-09 17:06 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: mingo, hpa, linux-kernel, efault, tglx, mingo, linux-tip-commits

On Wed, 2009-09-09 at 09:55 -0700, Dmitry Torokhov wrote:
> On Wed, Sep 09, 2009 at 03:37:34PM +0000, tip-bot for Mike Galbraith wrote:
> > 
> > diff --git a/kernel/kthread.c b/kernel/kthread.c
> > index eb8751a..5fe7099 100644
> > --- a/kernel/kthread.c
> > +++ b/kernel/kthread.c
> > @@ -16,8 +16,6 @@
> >  #include <linux/mutex.h>
> >  #include <trace/events/sched.h>
> >  
> > -#define KTHREAD_NICE_LEVEL (-5)
> > -
> 
> Why don't we just redefine it to 0? We may find out later that we'd
> still prefer to have kernel threads have boost.

Seems sensible; also, the traditional reasoning behind this nice level is
that kernel threads do work on behalf of multiple tasks. It's a kind of
prio ceiling thing.


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: [tip:sched/core] sched: Keep kthreads at default priority
  2009-09-09 17:06                         ` Peter Zijlstra
@ 2009-09-09 17:34                           ` Mike Galbraith
  2009-09-12 11:48                             ` Martin Steigerwald
  0 siblings, 1 reply; 216+ messages in thread
From: Mike Galbraith @ 2009-09-09 17:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dmitry Torokhov, mingo, hpa, linux-kernel, tglx, mingo,
	linux-tip-commits

On Wed, 2009-09-09 at 19:06 +0200, Peter Zijlstra wrote:
> On Wed, 2009-09-09 at 09:55 -0700, Dmitry Torokhov wrote:
> > On Wed, Sep 09, 2009 at 03:37:34PM +0000, tip-bot for Mike Galbraith wrote:
> > > 
> > > diff --git a/kernel/kthread.c b/kernel/kthread.c
> > > index eb8751a..5fe7099 100644
> > > --- a/kernel/kthread.c
> > > +++ b/kernel/kthread.c
> > > @@ -16,8 +16,6 @@
> > >  #include <linux/mutex.h>
> > >  #include <trace/events/sched.h>
> > >  
> > > -#define KTHREAD_NICE_LEVEL (-5)
> > > -
> > 
> > Why don't we just redefine it to 0? We may find out later that we'd
> > still prefer to have kernel threads have boost.
> 
> Seems sensible; also, the traditional reasoning behind this nice level is
> that kernel threads do work on behalf of multiple tasks. It's a kind of
> prio ceiling thing.

True.  None of our current threads are heavy enough to matter much.

	-Mike


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: [tip:sched/core] sched: Turn off child_runs_first
  2009-09-09 15:37                     ` [tip:sched/core] sched: Turn off child_runs_first tip-bot for Mike Galbraith
@ 2009-09-09 17:57                       ` Theodore Tso
  2009-09-09 18:08                         ` Ingo Molnar
  0 siblings, 1 reply; 216+ messages in thread
From: Theodore Tso @ 2009-09-09 17:57 UTC (permalink / raw)
  To: mingo, hpa, linux-kernel, a.p.zijlstra, efault, tglx, mingo
  Cc: linux-tip-commits

On Wed, Sep 09, 2009 at 03:37:07PM +0000, tip-bot for Mike Galbraith wrote:
> Commit-ID:  2bba22c50b06abe9fd0d23933b1e64d35b419262
> Gitweb:     http://git.kernel.org/tip/2bba22c50b06abe9fd0d23933b1e64d35b419262
> Author:     Mike Galbraith <efault@gmx.de>
> AuthorDate: Wed, 9 Sep 2009 15:41:37 +0200
> Committer:  Ingo Molnar <mingo@elte.hu>
> CommitDate: Wed, 9 Sep 2009 17:30:05 +0200
> 
> sched: Turn off child_runs_first
> 
> Set child_runs_first default to off.
> 
> It hurts 'optimal' make -j<NR_CPUS> workloads as make jobs
> get preempted by child tasks, reducing parallelism.

Wasn't one of the reasons why we historically did child_runs_first was
so that for fork/exit workloads, the child has a chance to exec the
new process?  If the parent runs first, then more pages will probably
need to be COW'ed.

					- Ted

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-09 12:20                         ` Jens Axboe
@ 2009-09-09 18:04                           ` Ingo Molnar
  2009-09-09 20:12                             ` Nikos Chantziaras
  2009-09-10  9:48                             ` BFS vs. mainline scheduler benchmarks and measurements Jens Axboe
  2009-09-10  6:55                           ` Peter Zijlstra
  1 sibling, 2 replies; 216+ messages in thread
From: Ingo Molnar @ 2009-09-09 18:04 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Mike Galbraith, Peter Zijlstra, Con Kolivas, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 5521 bytes --]


* Jens Axboe <jens.axboe@oracle.com> wrote:

> On Wed, Sep 09 2009, Jens Axboe wrote:
> > On Wed, Sep 09 2009, Jens Axboe wrote:
> > > On Wed, Sep 09 2009, Mike Galbraith wrote:
> > > > On Wed, 2009-09-09 at 08:13 +0200, Ingo Molnar wrote:
> > > > > * Jens Axboe <jens.axboe@oracle.com> wrote:
> > > > > 
> > > > > > On Tue, Sep 08 2009, Peter Zijlstra wrote:
> > > > > > > On Tue, 2009-09-08 at 11:13 +0200, Jens Axboe wrote:
> > > > > > > > And here's a newer version.
> > > > > > > 
> > > > > > > I tinkered a bit with your proglet and finally found the 
> > > > > > > problem.
> > > > > > > 
> > > > > > > You used a single pipe per child, this means the loop in 
> > > > > > > run_child() would consume what it just wrote out until it got 
> > > > > > > force preempted by the parent which would also get woken.
> > > > > > > 
> > > > > > > This results in the child spinning a while (its full quota) and 
> > > > > > > only reporting the last timestamp to the parent.
> > > > > > 
> > > > > > Oh doh, that's not well thought out. Well it was a quick hack :-) 
> > > > > > Thanks for the fixup, now it's at least usable to some degree.
> > > > > 
> > > > > What kind of latencies does it report on your box?
> > > > > 
> > > > > Our vanilla scheduler default latency targets are:
> > > > > 
> > > > >   single-core: 20 msecs
> > > > >     dual-core: 40 msecs
> > > > >     quad-core: 60 msecs
> > > > >     opto-core: 80 msecs
> > > > > 
> > > > > You can enable CONFIG_SCHED_DEBUG=y and set it directly as well via 
> > > > > /proc/sys/kernel/sched_latency_ns:
> > > > > 
> > > > >    echo 10000000 > /proc/sys/kernel/sched_latency_ns
> > > > 
> > > > He would also need to lower min_granularity, otherwise, it'd be larger
> > > > than the whole latency target.
> > > > 
> > > > I'm testing right now, and one thing that is definitely a problem is the
> > > > amount of sleeper fairness we're giving.  A full latency is just too
> > > > much short term fairness in my testing.  While sleepers are catching up,
> > > > hogs languish.  That's the biggest issue going on.
> > > > 
> > > > I've also been doing some timings of make -j4 (looking at idle time),
> > > > and find that child_runs_first is mildly detrimental to fork/exec load,
> > > > as are buddies.
> > > > 
> > > > I'm running with the below at the moment.  (the kthread/workqueue thing
> > > > is just because I don't see any reason for it to exist, so consider it
> > > > to be a waste of perfectly good math;)
> > > 
> > > Using latt, it seems better than -rc9. The below are entries logged
> > > while running make -j128 on a 64 thread box. I did two runs on each, and
> > > latt is using 8 clients.
> > > 
> > > -rc9
> > >         Max                23772 usec
> > >         Avg                 1129 usec
> > >         Stdev               4328 usec
> > >         Stdev mean           117 usec
> > > 
> > >         Max                32709 usec
> > >         Avg                 1467 usec
> > >         Stdev               5095 usec
> > >         Stdev mean           136 usec
> > > 
> > > -rc9 + patch
> > > 
> > >         Max                11561 usec
> > >         Avg                 1532 usec
> > >         Stdev               1994 usec
> > >         Stdev mean            48 usec
> > > 
> > >         Max                 9590 usec
> > >         Avg                 1550 usec
> > >         Stdev               2051 usec
> > >         Stdev mean            50 usec
> > > 
> > > max latency is way down, and much smaller variation as well.
> > 
> > Things are much better with this patch on the notebook! I cannot compare
> > with BFS as that still doesn't run anywhere I want it to run, but it's
> > way better than -rc9-git stock. latt numbers on the notebook have 1/3
> > the max latency, average is lower, and stddev is much smaller too.
> 
> BFS210 runs on the laptop (dual core intel core duo). With make -j4
> running, I clock the following latt -c8 'sleep 10' latencies:
> 
> -rc9
> 
>         Max                17895 usec
>         Avg                 8028 usec
>         Stdev               5948 usec
>         Stdev mean           405 usec
> 
>         Max                17896 usec
>         Avg                 4951 usec
>         Stdev               6278 usec
>         Stdev mean           427 usec
> 
>         Max                17885 usec
>         Avg                 5526 usec
>         Stdev               6819 usec
>         Stdev mean           464 usec
> 
> -rc9 + mike
> 
>         Max                 6061 usec
>         Avg                 3797 usec
>         Stdev               1726 usec
>         Stdev mean           117 usec
> 
>         Max                 5122 usec
>         Avg                 3958 usec
>         Stdev               1697 usec
>         Stdev mean           115 usec
> 
>         Max                 6691 usec
>         Avg                 2130 usec
>         Stdev               2165 usec
>         Stdev mean           147 usec

At least in my tests these latencies were mainly due to a bug in 
latt.c - i've attached the fixed version.

The other reason was wakeup batching. If you do this:

   echo 0 > /proc/sys/kernel/sched_wakeup_granularity_ns 

... then you can switch on insta-wakeups on -tip too.

With a dual-core box and a make -j4 background job running, on 
latest -tip i get the following latencies:

 $ ./latt -c8 sleep 30
 Entries: 656 (clients=8)

 Averages:
 ------------------------------ 
 	Max	      158 usec 
	Avg	       12 usec
	Stdev	       10 usec

Thanks,

	Ingo

[-- Attachment #2: latt.c --]
[-- Type: text/plain, Size: 9067 bytes --]

/*
 * Simple latency tester that combines multiple processes.
 *
 * Compile: gcc -Wall -O2 -D_GNU_SOURCE -lrt -lm -o latt latt.c
 *
 * Run with: latt -c8 'program --args'
 *
 * Options:
 *
 *	-cX	Use X number of clients
 *	-fX	Use X msec as the minimum sleep time for the parent
 *	-tX	Use X msec as the maximum sleep time for the parent
 *	-v	Print all delays as they are logged
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <getopt.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/time.h>
#include <sys/mman.h>
#include <time.h>
#include <math.h>
#include <poll.h>
#include <pthread.h>


/*
 * In msecs
 */
static unsigned int min_delay = 100;
static unsigned int max_delay = 500;
static unsigned int clients = 1;
static unsigned int verbose;

#define MAX_CLIENTS		512

struct stats
{
	double n, mean, M2, max;
	int max_pid;
};
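
/*
 * Online (Welford) mean/variance update: mean and M2 are maintained
 * incrementally, so no per-sample array is needed; stddev_stats()
 * below derives the sample variance as M2 / (n - 1).
 */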

static void update_stats(struct stats *stats, unsigned long long val)
{
	double delta, x = val;

	stats->n++;
	delta = x - stats->mean;
	stats->mean += delta / stats->n;
	stats->M2 += delta*(x - stats->mean);

	if (stats->max < x)
		stats->max = x;
}

static unsigned long nr_stats(struct stats *stats)
{
	return stats->n;
}

static double max_stats(struct stats *stats)
{
	return stats->max;
}

static double avg_stats(struct stats *stats)
{
	return stats->mean;
}

/*
 * http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
 *
 *       (\Sum n_i^2) - ((\Sum n_i)^2)/n
 * s^2 = -------------------------------
 *                  n - 1
 *
 * http://en.wikipedia.org/wiki/Stddev
 */
static double stddev_stats(struct stats *stats)
{
	double variance = stats->M2 / (stats->n - 1);

	return sqrt(variance);
}

struct stats delay_stats;

static int pipes[MAX_CLIENTS*2][2];

static pid_t app_pid;

#define CLOCKSOURCE		CLOCK_MONOTONIC
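
/*
 * Counting semaphore that works across fork(): the mutex and condvar
 * are marked PTHREAD_PROCESS_SHARED and the struct itself lives in a
 * MAP_SHARED | MAP_ANONYMOUS mapping set up in run_test().
 */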

struct sem {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int value;
	int waiters;
};

static void init_sem(struct sem *sem)
{
	pthread_mutexattr_t attr;
	pthread_condattr_t cond;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
	pthread_condattr_init(&cond);
	pthread_condattr_setpshared(&cond, PTHREAD_PROCESS_SHARED);
	pthread_cond_init(&sem->cond, &cond);
	pthread_mutex_init(&sem->lock, &attr);

	sem->value = 0;
	sem->waiters = 0;
}

static void sem_down(struct sem *sem)
{
	pthread_mutex_lock(&sem->lock);

	while (!sem->value) {
		sem->waiters++;
		pthread_cond_wait(&sem->cond, &sem->lock);
		sem->waiters--;
	}

	sem->value--;
	pthread_mutex_unlock(&sem->lock);
}

static void sem_up(struct sem *sem)
{
	pthread_mutex_lock(&sem->lock);
	if (!sem->value && sem->waiters)
		pthread_cond_signal(&sem->cond);
	sem->value++;
	pthread_mutex_unlock(&sem->lock);
}

static int parse_options(int argc, char *argv[])
{
	struct option l_opts[] = {
		{ "min-delay", 	1, 	NULL,	'f' },
		{ "max-delay",	1,	NULL,	't' },
		{ "clients",	1,	NULL,	'c' },
		{ "verbose",	0,	NULL,	'v' },
		{ NULL,		0,	NULL,	 0  }	/* getopt_long() needs a zero terminator */
	};
	int c, res, index = 0;

	while ((c = getopt_long(argc, argv, "f:t:c:v", l_opts, &res)) != -1) {
		index++;
		switch (c) {
			case 'f':
				min_delay = atoi(optarg);
				break;
			case 't':
				max_delay = atoi(optarg);
				break;
			case 'c':
				clients = atoi(optarg);
				if (clients > MAX_CLIENTS)
					clients = MAX_CLIENTS;
				break;
			case 'v':
				verbose = 1;
				break;
		}
	}

	return index + 1;
}

static pid_t fork_off(const char *app)
{
	pid_t pid;

	pid = fork();
	if (pid)
		return pid;

	exit(system(app));
}

static unsigned long usec_since(struct timespec *start, struct timespec *end)
{
	unsigned long long s, e;

	s = start->tv_sec * 1000000000ULL + start->tv_nsec;
	e =   end->tv_sec * 1000000000ULL +   end->tv_nsec;

	return (e - s) / 1000;
}

static void log_delay(unsigned long delay)
{
	if (verbose) {
		fprintf(stderr, "log delay %8lu usec (pid %d)\n", delay, getpid());
		fflush(stderr);
	}

	update_stats(&delay_stats, delay);
}

/*
 * Reads a timestamp (which is ignored, it's just a wakeup call), and replies
 * with the timestamp of when we saw it
 */
static void run_child(int *in, int *out, struct sem *sem)
{
	struct timespec ts;

	if (verbose) {
		fprintf(stderr, "present: %d\n", getpid());
		fflush(stderr);
	}

	sem_up(sem);

	do {
		int ret;

		ret = read(in[0], &ts, sizeof(ts));
		if (ret <= 0)
			break;

		if (ret != sizeof(ts))
			printf("bugger3\n");

		clock_gettime(CLOCKSOURCE, &ts);

		ret = write(out[1], &ts, sizeof(ts));
		if (ret <= 0)
			break;

		if (ret != sizeof(ts))
			printf("bugger4\n");

		if (verbose) {
			fprintf(stderr, "alife: %d\n", getpid());
			fflush(stderr);
		}
	} while (1);
}

/*
 * Do a random sleep between min and max delay
 */
static void do_rand_sleep(void)
{
	unsigned int msecs;

	msecs = min_delay + ((float) (max_delay - min_delay) * (rand() / (RAND_MAX + 1.0)));
	if (verbose) {
		fprintf(stderr, "sleeping for: %u msec\n", msecs);
		fflush(stderr);
	}
	usleep(msecs * 1000);
}

static void kill_connection(void)
{
	int i;

	for (i = 0; i < 2*clients; i++) {
		if (pipes[i][0] != -1) {
			close(pipes[i][0]);
			pipes[i][0] = -1;
		}
		if (pipes[i][1] != -1) {
			close(pipes[i][1]);
			pipes[i][1] = -1;
		}
	}
}

static int __write_ts(int i, struct timespec *ts)
{
	int fd = pipes[2*i][1];

	clock_gettime(CLOCKSOURCE, ts);

	return write(fd, ts, sizeof(*ts)) != sizeof(*ts);
}

static long __read_ts(int i, struct timespec *ts, pid_t *cpids)
{
	int fd = pipes[2*i+1][0];
	struct timespec t;

	if (read(fd, &t, sizeof(t)) != sizeof(t))
		return -1;

	log_delay(usec_since(ts, &t));
	if (verbose)
		fprintf(stderr, "got delay %ld from child %d [pid %d]\n", usec_since(ts, &t), i, cpids[i]);

	return 0;
}

static int read_ts(struct pollfd *pfd, unsigned int nr, struct timespec *ts,
		   pid_t *cpids)
{
	unsigned int i;

	for (i = 0; i < clients; i++) {
		if (pfd[i].revents & (POLLERR | POLLHUP | POLLNVAL))
			return -1L;
		if (pfd[i].revents & POLLIN) {
			pfd[i].events = 0;
			if (__read_ts(i, &ts[i], cpids))
				return -1L;
			nr--;
		}
		if (!nr)
			break;
	}

	return 0;
}

static int app_has_exited(void)
{
	int ret, status;

	/*
	 * If our app has exited, stop
	 */
	ret = waitpid(app_pid, &status, WNOHANG);
	if (ret < 0) {
		perror("waitpid");
		return 1;
	} else if (ret == app_pid &&
		   (WIFSIGNALED(status) || WIFEXITED(status))) {
		return 1;
	}

	return 0;
}

/*
 * While our given app is running, send a timestamp to each client and
 * log the maximum latency for each of them to wakeup and reply
 */
static void run_parent(pid_t *cpids)
{
	struct pollfd *ipfd;
	int do_exit = 0, i;
	struct timespec *t1;

	t1 = malloc(sizeof(struct timespec) * clients);
	ipfd = malloc(sizeof(struct pollfd) * clients);

	srand(1234);

	do {
		unsigned pending_events;

		do_rand_sleep();

		if (app_has_exited())
			break;

		for (i = 0; i < clients; i++) {
			ipfd[i].fd = pipes[2*i+1][0];
			ipfd[i].events = POLLIN;
		}

		/*
		 * Write wakeup calls
		 */
		for (i = 0; i < clients; i++) {
			if (verbose) {
				fprintf(stderr, "waking: %d\n", cpids[i]);
				fflush(stderr);
			}

			if (__write_ts(i, t1+i)) {
				do_exit = 1;
				break;
			}
		}

		if (do_exit)
			break;

		/*
		 * Poll and read replies
		 */
		pending_events = clients;
		while (pending_events) {
			int evts = poll(ipfd, clients, -1);

			if (evts < 0) {
				do_exit = 1;
				break;
			} else if (!evts) {
				printf("bugger2\n");
				continue;
			}

			if (read_ts(ipfd, evts, t1, cpids)) {
				do_exit = 1;
				break;
			}

			pending_events -= evts;
		}
	} while (!do_exit);

	free(t1);
	free(ipfd);
	kill_connection();
}

static void run_test(void)
{
	struct sem *sem;
	pid_t *cpids;
	int i, status;

	sem = mmap(NULL, sizeof(*sem), PROT_READ|PROT_WRITE,
			MAP_SHARED | MAP_ANONYMOUS, 0, 0);
	if (sem == MAP_FAILED) {
		perror("mmap");
		return;
	}

	init_sem(sem);

	for (i = 0; i < 2*clients; i++) {
		if (pipe(pipes[i])) {
			perror("pipe");
			return;
		}
	}

	cpids = malloc(sizeof(pid_t) * clients);

	for (i = 0; i < clients; i++) {
		cpids[i] = fork();
		if (cpids[i]) {
			sem_down(sem);
			continue;
		}

		run_child(pipes[2*i], pipes[2*i+1], sem);
		exit(0);
	}

	run_parent(cpids);

	for (i = 0; i < clients; i++)
		kill(cpids[i], SIGQUIT);
	for (i = 0; i < clients; i++)
		waitpid(cpids[i], &status, 0);

	free(cpids);
	munmap(sem, sizeof(*sem));
}

static void handle_sigint(int sig)
{
	kill(app_pid, SIGINT);
}

int main(int argc, char *argv[])
{
	int app_offset, off;
	char app[256];

	off = 0;
	app_offset = parse_options(argc, argv);
	while (app_offset < argc) {
		if (off) {
			app[off] = ' ';
			off++;
		}
		off += sprintf(app + off, "%s", argv[app_offset]);
		app_offset++;
	}

	signal(SIGINT, handle_sigint);

	/*
	 * Start app and start logging latencies
	 */
	app_pid = fork_off(app);
	run_test();

	printf("Entries: %lu (clients=%d)\n", nr_stats(&delay_stats), clients);
	printf("\nAverages:\n");
	printf("------------------------------\n");
	printf("\tMax\t %8.0f usec\n", max_stats(&delay_stats));
	printf("\tAvg\t %8.0f usec\n", avg_stats(&delay_stats));
	printf("\tStdev\t %8.0f usec\n", stddev_stats(&delay_stats));

	return 0;
}

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: [tip:sched/core] sched: Turn off child_runs_first
  2009-09-09 17:57                       ` Theodore Tso
@ 2009-09-09 18:08                         ` Ingo Molnar
  2009-09-09 18:59                           ` Chris Friesen
  2009-09-09 19:48                           ` Pavel Machek
  0 siblings, 2 replies; 216+ messages in thread
From: Ingo Molnar @ 2009-09-09 18:08 UTC (permalink / raw)
  To: Theodore Tso, mingo, hpa, linux-kernel, a.p.zijlstra, efault,
	tglx, linux-tip-commits


* Theodore Tso <tytso@mit.edu> wrote:

> On Wed, Sep 09, 2009 at 03:37:07PM +0000, tip-bot for Mike Galbraith wrote:
> > Commit-ID:  2bba22c50b06abe9fd0d23933b1e64d35b419262
> > Gitweb:     http://git.kernel.org/tip/2bba22c50b06abe9fd0d23933b1e64d35b419262
> > Author:     Mike Galbraith <efault@gmx.de>
> > AuthorDate: Wed, 9 Sep 2009 15:41:37 +0200
> > Committer:  Ingo Molnar <mingo@elte.hu>
> > CommitDate: Wed, 9 Sep 2009 17:30:05 +0200
> > 
> > sched: Turn off child_runs_first
> > 
> > Set child_runs_first default to off.
> > 
> > It hurts 'optimal' make -j<NR_CPUS> workloads as make jobs
> > get preempted by child tasks, reducing parallelism.
> 
> Wasn't one of the reasons why we historically did child_runs_first 
> was so that for fork/exit workloads, the child has a chance to 
> exec the new process?  If the parent runs first, then more pages 
> will probably need to be COW'ed.

That kind of workload should be using vfork() anyway, and be even 
faster because it can avoid the fork overhead, right?
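
The pattern, roughly (a sketch, not from any patch in this thread):

/*
 * vfork(): the child borrows the parent's address space (no page-table
 * copy, no COW) and must do nothing but execvp() or _exit().
 */
#include <unistd.h>

static pid_t spawn(char *const argv[])
{
	pid_t pid = vfork();

	if (pid == 0) {
		execvp(argv[0], argv);
		_exit(127);	/* exec failed; must not return or call exit() */
	}
	return pid;	/* parent resumes once the child has exec'd */
}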

Also, on SMP we do that anyway - there's a good likelihood on an idle 
system that we wake the child on the other core straight away.

	Ingo

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: [tip:sched/core] sched: Turn off child_runs_first
  2009-09-09 18:08                         ` Ingo Molnar
@ 2009-09-09 18:59                           ` Chris Friesen
  2009-09-09 19:48                           ` Pavel Machek
  1 sibling, 0 replies; 216+ messages in thread
From: Chris Friesen @ 2009-09-09 18:59 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Theodore Tso, mingo, hpa, linux-kernel, a.p.zijlstra, efault,
	tglx, linux-tip-commits

On 09/09/2009 12:08 PM, Ingo Molnar wrote:
> 
> * Theodore Tso <tytso@mit.edu> wrote:

>> Wasn't one of the reasons why we historically did child_runs_first 
>> was so that for fork/exit workloads, the child has a chance to 
>> exec the new process?  If the parent runs first, then more pages 
>> will probably need to be COW'ed.
> 
> That kind of workload should be using vfork() anyway, and be even 
> faster because it can avoid the fork overhead, right?

According to my man page, POSIX.1-2008 removes the specification of
vfork().

Chris

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: [tip:sched/core] sched: Turn off child_runs_first
  2009-09-09 18:08                         ` Ingo Molnar
  2009-09-09 18:59                           ` Chris Friesen
@ 2009-09-09 19:48                           ` Pavel Machek
  1 sibling, 0 replies; 216+ messages in thread
From: Pavel Machek @ 2009-09-09 19:48 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Theodore Tso, mingo, hpa, linux-kernel, a.p.zijlstra, efault,
	tglx, linux-tip-commits

Hi!

> > > It hurts 'optimal' make -j<NR_CPUS> workloads as make jobs
> > > get preempted by child tasks, reducing parallelism.
> > 
> > Wasn't one of the reasons why we historically did child_runs_first 
> > was so that for fork/exit workloads, the child has a chance to 
> > exec the new process?  If the parent runs first, then more pages 
> > will probably need to be COW'ed.
> 
> That kind of workload should be using vfork() anyway, and be even 
> faster because it can avoid the fork overhead, right?

Well... one should not have to update userspace to keep
performance... and vfork is an extremely ugly interface.

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-09 18:04                           ` Ingo Molnar
@ 2009-09-09 20:12                             ` Nikos Chantziaras
  2009-09-09 20:50                               ` Jens Axboe
                                                 ` (2 more replies)
  2009-09-10  9:48                             ` BFS vs. mainline scheduler benchmarks and measurements Jens Axboe
  1 sibling, 3 replies; 216+ messages in thread
From: Nikos Chantziaras @ 2009-09-09 20:12 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jens Axboe, Mike Galbraith, Peter Zijlstra, Con Kolivas, linux-kernel

On 09/09/2009 09:04 PM, Ingo Molnar wrote:
> [...]
> * Jens Axboe<jens.axboe@oracle.com>  wrote:
>
>> On Wed, Sep 09 2009, Jens Axboe wrote:
>>  [...]
>> BFS210 runs on the laptop (dual core intel core duo). With make -j4
>> running, I clock the following latt -c8 'sleep 10' latencies:
>>
>> -rc9
>>
>>          Max                17895 usec
>>          Avg                 8028 usec
>>          Stdev               5948 usec
>>          Stdev mean           405 usec
>>
>>          Max                17896 usec
>>          Avg                 4951 usec
>>          Stdev               6278 usec
>>          Stdev mean           427 usec
>>
>>          Max                17885 usec
>>          Avg                 5526 usec
>>          Stdev               6819 usec
>>          Stdev mean           464 usec
>>
>> -rc9 + mike
>>
>>          Max                 6061 usec
>>          Avg                 3797 usec
>>          Stdev               1726 usec
>>          Stdev mean           117 usec
>>
>>          Max                 5122 usec
>>          Avg                 3958 usec
>>          Stdev               1697 usec
>>          Stdev mean           115 usec
>>
>>          Max                 6691 usec
>>          Avg                 2130 usec
>>          Stdev               2165 usec
>>          Stdev mean           147 usec
>
> At least in my tests these latencies were mainly due to a bug in
> latt.c - i've attached the fixed version.
>
> The other reason was wakeup batching. If you do this:
>
>     echo 0>  /proc/sys/kernel/sched_wakeup_granularity_ns
>
> ... then you can switch on insta-wakeups on -tip too.
>
> With a dual-core box and a make -j4 background job running, on
> latest -tip i get the following latencies:
>
>   $ ./latt -c8 sleep 30
>   Entries: 656 (clients=8)
>
>   Averages:
>   ------------------------------
>   	Max	      158 usec
> 	Avg	       12 usec
> 	Stdev	       10 usec

With your version of latt.c, I get these results with 2.6-tip vs 
2.6.31-rc9-bfs:


(mainline)
Averages:
------------------------------
         Max            50 usec
         Avg            12 usec
         Stdev           3 usec


(BFS)
Averages:
------------------------------
         Max           474 usec
         Avg            11 usec
         Stdev          16 usec


However, the interactivity problems still remain.  Does that mean it's 
not a latency issue?

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: Epic regression in throughput since v2.6.23
  2009-09-09 15:52     ` Ingo Molnar
@ 2009-09-09 20:49       ` Serge Belyshev
  2009-09-09 21:23         ` Cory Fields
  2009-09-10  6:53         ` Ingo Molnar
  0 siblings, 2 replies; 216+ messages in thread
From: Serge Belyshev @ 2009-09-09 20:49 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Con Kolivas, linux-kernel, Peter Zijlstra, Mike Galbraith

Ingo Molnar <mingo@elte.hu> writes:

> Thanks!
>
> I think we found the reason for that regression - would you mind 
> to re-test with latest -tip, e157986 or later?
>
> If that works for you i'll describe our theory.
>

Good job -- seems to work, thanks.  Regression is still about 3% though:
http://img3.imageshack.us/img3/5335/epicbfstip.png

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-09 20:12                             ` Nikos Chantziaras
@ 2009-09-09 20:50                               ` Jens Axboe
  2009-09-10  1:02                                 ` Con Kolivas
  2009-09-10  3:15                               ` Mike Galbraith
  2009-09-10  6:08                               ` Ingo Molnar
  2 siblings, 1 reply; 216+ messages in thread
From: Jens Axboe @ 2009-09-09 20:50 UTC (permalink / raw)
  To: Nikos Chantziaras
  Cc: Ingo Molnar, Mike Galbraith, Peter Zijlstra, Con Kolivas, linux-kernel

On Wed, Sep 09 2009, Nikos Chantziaras wrote:
> On 09/09/2009 09:04 PM, Ingo Molnar wrote:
>> [...]
>> * Jens Axboe<jens.axboe@oracle.com>  wrote:
>>
>>> On Wed, Sep 09 2009, Jens Axboe wrote:
>>>  [...]
>>> BFS210 runs on the laptop (dual core intel core duo). With make -j4
>>> running, I clock the following latt -c8 'sleep 10' latencies:
>>>
>>> -rc9
>>>
>>>          Max                17895 usec
>>>          Avg                 8028 usec
>>>          Stdev               5948 usec
>>>          Stdev mean           405 usec
>>>
>>>          Max                17896 usec
>>>          Avg                 4951 usec
>>>          Stdev               6278 usec
>>>          Stdev mean           427 usec
>>>
>>>          Max                17885 usec
>>>          Avg                 5526 usec
>>>          Stdev               6819 usec
>>>          Stdev mean           464 usec
>>>
>>> -rc9 + mike
>>>
>>>          Max                 6061 usec
>>>          Avg                 3797 usec
>>>          Stdev               1726 usec
>>>          Stdev mean           117 usec
>>>
>>>          Max                 5122 usec
>>>          Avg                 3958 usec
>>>          Stdev               1697 usec
>>>          Stdev mean           115 usec
>>>
>>>          Max                 6691 usec
>>>          Avg                 2130 usec
>>>          Stdev               2165 usec
>>>          Stdev mean           147 usec
>>
>> At least in my tests these latencies were mainly due to a bug in
>> latt.c - i've attached the fixed version.
>>
>> The other reason was wakeup batching. If you do this:
>>
>>     echo 0>  /proc/sys/kernel/sched_wakeup_granularity_ns
>>
>> ... then you can switch on insta-wakeups on -tip too.
>>
>> With a dual-core box and a make -j4 background job running, on
>> latest -tip i get the following latencies:
>>
>>   $ ./latt -c8 sleep 30
>>   Entries: 656 (clients=8)
>>
>>   Averages:
>>   ------------------------------
>>   	Max	      158 usec
>> 	Avg	       12 usec
>> 	Stdev	       10 usec
>
> With your version of latt.c, I get these results with 2.6-tip vs  
> 2.6.31-rc9-bfs:
>
>
> (mainline)
> Averages:
> ------------------------------
>         Max            50 usec
>         Avg            12 usec
>         Stdev           3 usec
>
>
> (BFS)
> Averages:
> ------------------------------
>         Max           474 usec
>         Avg            11 usec
>         Stdev          16 usec
>
>
> However, the interactivity problems still remain.  Does that mean it's  
> not a latency issue?

It probably just means that latt isn't a good measure of the problem.
Which isn't really too much of a surprise.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: Epic regression in throughput since v2.6.23
  2009-09-09 20:49       ` Serge Belyshev
@ 2009-09-09 21:23         ` Cory Fields
  2009-09-10  6:53         ` Ingo Molnar
  1 sibling, 0 replies; 216+ messages in thread
From: Cory Fields @ 2009-09-09 21:23 UTC (permalink / raw)
  To: linux-kernel; +Cc: mingo

I've noticed the same regression since around 2.6.23, mainly in
multi-core video decoding. A git bisect reveals the guilty commit to
be: 33b0c4217dcd67b788318c3192a2912b530e4eef

It is easily visible because with the guilty commit included, one core
of the cpu remains pegged while the other(s) are severely
underutilized.

Hope this helps

Cory Fields

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-09 20:50                               ` Jens Axboe
@ 2009-09-10  1:02                                 ` Con Kolivas
  2009-09-10 11:03                                   ` Jens Axboe
  0 siblings, 1 reply; 216+ messages in thread
From: Con Kolivas @ 2009-09-10  1:02 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Nikos Chantziaras, Ingo Molnar, Mike Galbraith, Peter Zijlstra,
	linux-kernel

On Thu, 10 Sep 2009 06:50:43 Jens Axboe wrote:
> On Wed, Sep 09 2009, Nikos Chantziaras wrote:
> > On 09/09/2009 09:04 PM, Ingo Molnar wrote:
> >> [...]
> >>
> >> * Jens Axboe<jens.axboe@oracle.com>  wrote:
> >>> On Wed, Sep 09 2009, Jens Axboe wrote:
> >>>  [...]
> >>> BFS210 runs on the laptop (dual core intel core duo). With make -j4
> >>> running, I clock the following latt -c8 'sleep 10' latencies:
> >>>
> >>> -rc9
> >>>
> >>>          Max                17895 usec
> >>>          Avg                 8028 usec
> >>>          Stdev               5948 usec
> >>>          Stdev mean           405 usec
> >>>
> >>>          Max                17896 usec
> >>>          Avg                 4951 usec
> >>>          Stdev               6278 usec
> >>>          Stdev mean           427 usec
> >>>
> >>>          Max                17885 usec
> >>>          Avg                 5526 usec
> >>>          Stdev               6819 usec
> >>>          Stdev mean           464 usec
> >>>
> >>> -rc9 + mike
> >>>
> >>>          Max                 6061 usec
> >>>          Avg                 3797 usec
> >>>          Stdev               1726 usec
> >>>          Stdev mean           117 usec
> >>>
> >>>          Max                 5122 usec
> >>>          Avg                 3958 usec
> >>>          Stdev               1697 usec
> >>>          Stdev mean           115 usec
> >>>
> >>>          Max                 6691 usec
> >>>          Avg                 2130 usec
> >>>          Stdev               2165 usec
> >>>          Stdev mean           147 usec
> >>
> >> At least in my tests these latencies were mainly due to a bug in
> >> latt.c - i've attached the fixed version.
> >>
> >> The other reason was wakeup batching. If you do this:
> >>
> >>     echo 0>  /proc/sys/kernel/sched_wakeup_granularity_ns
> >>
> >> ... then you can switch on insta-wakeups on -tip too.
> >>
> >> With a dual-core box and a make -j4 background job running, on
> >> latest -tip i get the following latencies:
> >>
> >>   $ ./latt -c8 sleep 30
> >>   Entries: 656 (clients=8)
> >>
> >>   Averages:
> >>   ------------------------------
> >>   	Max	      158 usec
> >> 	Avg	       12 usec
> >> 	Stdev	       10 usec
> >
> > With your version of latt.c, I get these results with 2.6-tip vs
> > 2.6.31-rc9-bfs:
> >
> >
> > (mainline)
> > Averages:
> > ------------------------------
> >         Max            50 usec
> >         Avg            12 usec
> >         Stdev           3 usec
> >
> >
> > (BFS)
> > Averages:
> > ------------------------------
> >         Max           474 usec
> >         Avg            11 usec
> >         Stdev          16 usec
> >
> >
> > However, the interactivity problems still remain.  Does that mean it's
> > not a latency issue?
>
> It probably just means that latt isn't a good measure of the problem.
> Which isn't really too much of a surprise.

And that's a real shame because this was one of the first really good attempts 
I've seen to actually measure the difference, and I thank you for your 
efforts, Jens. I believe the reason it's limited is that all you're measuring 
is the time from wakeup, and the test app isn't actually doing any work. The 
issue is more than just waking up as fast as possible; it's then doing some 
meaningful amount of work within a reasonable time frame as well. What the 
"meaningful amount of work" and "reasonable time frame" are remains a 
mystery, but I guess they could be added on to this testing app, along the 
lines of the sketch below.
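
A rough illustration of that idea (this is not part of latt.c; it just reuses 
its helpers, and the work_loops knob is made up): after each wakeup the child 
burns a fixed chunk of CPU before replying, so the parent's numbers cover 
wakeup latency plus one unit of real work.

/*
 * Sketch only: a latt-style child that does a fixed amount of work
 * after each wakeup before reporting back.
 */
static void run_child_with_work(int *in, int *out, struct sem *sem,
				unsigned long work_loops)
{
	struct timespec ts;
	volatile unsigned long spin;

	sem_up(sem);

	do {
		if (read(in[0], &ts, sizeof(ts)) != sizeof(ts))
			break;

		/* the "meaningful amount of work" */
		for (spin = 0; spin < work_loops; spin++)
			;

		clock_gettime(CLOCKSOURCE, &ts);

		if (write(out[1], &ts, sizeof(ts)) != sizeof(ts))
			break;
	} while (1);
}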

What does please me now, though, is that this message thread is finally 
concentrating on what BFS was all about. The fact that it doesn't scale is no 
mystery whatsoever. The fact that throughput and lack of scaling were 
what was given attention missed the point entirely. To point that out I 
used the bluntest response possible, because I know that works on lkml (does 
it not?). Unfortunately I was so blunt that I ended up writing it in another 
language: Troll. So for that, I apologise.

The unfortunate part is that BFS is still far from a working, complete state, 
yet word got out that I had "released" something, which I had not; but 
obviously there's no great distinction between putting something on a server 
for testing and a real release with an announcement.

BFS is a scheduling experiment to demonstrate what effect the cpu scheduler 
really has on the desktop and how it might be able to perform if we design 
the scheduler for that one purpose.

It pleases me immensely to see that it has already spurred on a flood of 
changes to the interactivity side of mainline development in its few days of 
existence, including some ideas that BFS uses itself. That in itself, to me, 
means it has already started to accomplish its goal, which ultimately, one 
way or another, is to improve what the CPU scheduler can do for the linux 
desktop. I can't track all the sensitive areas of the mainline kernel 
scheduler changes without getting involved more deeply than I care to, so it 
would be counterproductive of me to try and hack on mainline. I much prefer 
the quieter inbox.

If people want to use BFS for their own purposes or projects, or even better 
help hack on it, that would make me happy for different reasons. I will 
continue to work on my little project -in my own time- and hope that it 
continues to drive further development of the mainline kernel in its own way. 
We need more experiments like this to question what we currently have and 
accept. Other major kernel subsystems are no exception.

Regards,
-- 
-ck

<code before rhetoric>

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-09 20:12                             ` Nikos Chantziaras
  2009-09-09 20:50                               ` Jens Axboe
@ 2009-09-10  3:15                               ` Mike Galbraith
  2009-09-10  6:08                               ` Ingo Molnar
  2 siblings, 0 replies; 216+ messages in thread
From: Mike Galbraith @ 2009-09-10  3:15 UTC (permalink / raw)
  To: Nikos Chantziaras
  Cc: Ingo Molnar, Jens Axboe, Peter Zijlstra, Con Kolivas, linux-kernel

On Wed, 2009-09-09 at 23:12 +0300, Nikos Chantziaras wrote:

> With your version of latt.c, I get these results with 2.6-tip vs 
> 2.6.31-rc9-bfs:
> 
> 
> (mainline)
> Averages:
> ------------------------------
>          Max            50 usec
>          Avg            12 usec
>          Stdev           3 usec
> 
> 
> (BFS)
> Averages:
> ------------------------------
>          Max           474 usec
>          Avg            11 usec
>          Stdev          16 usec
> 
> 
> However, the interactivity problems still remain.  Does that mean it's 
> not a latency issue?

Could be a fairness issue.  If X+client needs more than its fair share
of CPU, there's nothing to do but use nice levels.  I'm stuck with
unaccelerated X (nvidia card), so if I want a good DVD watching or
whatever eye-candy experience while my box does a lot of other work, I
either have to use SCHED_IDLE/nice for the background stuff, or renice
X.  That's the down side of a fair scheduler.
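
(For reference, one way to do that - assuming util-linux's chrt on the box
has SCHED_IDLE support (-i/--idle); the make job below is just a stand-in
for whatever the background load actually is:)

  # run the background load under SCHED_IDLE ...
  chrt --idle 0 make -j4

  # ... or simply nice it down hard
  nice -n 19 make -j4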

There is another variant of latency-related interactivity issue for the
desktop though: too LOW latency.  If X and clients are switching too
fast, redraw can look nasty, sliced/diced.

	-Mike


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-09 20:12                             ` Nikos Chantziaras
  2009-09-09 20:50                               ` Jens Axboe
  2009-09-10  3:15                               ` Mike Galbraith
@ 2009-09-10  6:08                               ` Ingo Molnar
  2009-09-10  6:40                                 ` Ingo Molnar
                                                   ` (2 more replies)
  2 siblings, 3 replies; 216+ messages in thread
From: Ingo Molnar @ 2009-09-10  6:08 UTC (permalink / raw)
  To: Nikos Chantziaras
  Cc: Jens Axboe, Mike Galbraith, Peter Zijlstra, Con Kolivas, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 8360 bytes --]


* Nikos Chantziaras <realnc@arcor.de> wrote:

> On 09/09/2009 09:04 PM, Ingo Molnar wrote:
>> [...]
>> * Jens Axboe<jens.axboe@oracle.com>  wrote:
>>
>>> On Wed, Sep 09 2009, Jens Axboe wrote:
>>>  [...]
>>> BFS210 runs on the laptop (dual core intel core duo). With make -j4
>>> running, I clock the following latt -c8 'sleep 10' latencies:
>>>
>>> -rc9
>>>
>>>          Max                17895 usec
>>>          Avg                 8028 usec
>>>          Stdev               5948 usec
>>>          Stdev mean           405 usec
>>>
>>>          Max                17896 usec
>>>          Avg                 4951 usec
>>>          Stdev               6278 usec
>>>          Stdev mean           427 usec
>>>
>>>          Max                17885 usec
>>>          Avg                 5526 usec
>>>          Stdev               6819 usec
>>>          Stdev mean           464 usec
>>>
>>> -rc9 + mike
>>>
>>>          Max                 6061 usec
>>>          Avg                 3797 usec
>>>          Stdev               1726 usec
>>>          Stdev mean           117 usec
>>>
>>>          Max                 5122 usec
>>>          Avg                 3958 usec
>>>          Stdev               1697 usec
>>>          Stdev mean           115 usec
>>>
>>>          Max                 6691 usec
>>>          Avg                 2130 usec
>>>          Stdev               2165 usec
>>>          Stdev mean           147 usec
>>
>> At least in my tests these latencies were mainly due to a bug in
>> latt.c - i've attached the fixed version.
>>
>> The other reason was wakeup batching. If you do this:
>>
>>     echo 0>  /proc/sys/kernel/sched_wakeup_granularity_ns
>>
>> ... then you can switch on insta-wakeups on -tip too.
>>
>> With a dual-core box and a make -j4 background job running, on
>> latest -tip i get the following latencies:
>>
>>   $ ./latt -c8 sleep 30
>>   Entries: 656 (clients=8)
>>
>>   Averages:
>>   ------------------------------
>>   	Max	      158 usec
>> 	Avg	       12 usec
>> 	Stdev	       10 usec
>
> With your version of latt.c, I get these results with 2.6-tip vs  
> 2.6.31-rc9-bfs:
>
>
> (mainline)
> Averages:
> ------------------------------
>         Max            50 usec
>         Avg            12 usec
>         Stdev           3 usec
>
>
> (BFS)
> Averages:
> ------------------------------
>         Max           474 usec
>         Avg            11 usec
>         Stdev          16 usec
>
> However, the interactivity problems still remain.  Does that mean 
> it's not a latency issue?

It means that Jens's test-app, which demonstrated and helped us fix 
the issue for him, does not help us fix it for you just yet.

The "fluidity problem" you described might not be a classic latency 
issue per se (which latt.c measures), but a timeslicing / CPU time 
distribution problem.

A slight shift in CPU time allocation can change the flow of tasks 
to result in a 'choppier' system.

Have you tried, in addition to the granularity tweaks you've done, 
to renice mplayer either up or down? (or compiz and Xorg for that 
matter)

I'm not necessarily suggesting this as a 'real' solution (we really 
prefer kernels that just get it right) - but it's an additional 
parameter dimension along which you can tweak CPU time distribution 
on your box.

Here's my general rule of thumb: one nice level gives plus 5% CPU 
time to a task and takes away 5% CPU time from another task - 
i.e. it shifts the CPU allocation by 10%.
 
( this is modified by all sorts of dynamic conditions: by the number
  of tasks running and their wakeup patterns, so it's not a rule cast 
  in stone - but still a good ballpark figure for CPU-intense tasks. )
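
As a concrete (and purely illustrative) example of the above - the process 
names and nice values here are arbitrary, and negative nice values need 
root:

  # one nice level towards mplayer, i.e. roughly a 10% shift by the
  # rule of thumb above
  renice -n -1 -p $(pidof mplayer)

  # or push the background compile down instead
  nice -n 5 make -j4

  # undo the mplayer change later
  renice -n 0 -p $(pidof mplayer)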

Btw., i've read your descriptions about what you've tuned so far - 
have you seen/checked the wakeup_granularity tunable as well? 
Setting that to 0 will change the general balance of how CPU time is 
allocated between tasks too.

There's also a whole bunch of scheduler features you can turn on/off 
individually via /debug/sched_features. For example, to turn off 
NEW_FAIR_SLEEPERS, you can do:

  # cat /debug/sched_features 
  NEW_FAIR_SLEEPERS NO_NORMALIZED_SLEEPER ADAPTIVE_GRAN WAKEUP_PREEMPT 
  START_DEBIT AFFINE_WAKEUPS CACHE_HOT_BUDDY SYNC_WAKEUPS NO_HRTICK 
  NO_DOUBLE_TICK ASYM_GRAN LB_BIAS LB_WAKEUP_UPDATE ASYM_EFF_LOAD 
  NO_WAKEUP_OVERLAP LAST_BUDDY OWNER_SPIN 

  # echo NO_NEW_FAIR_SLEEPERS > /debug/sched_features

Btw., NO_NEW_FAIR_SLEEPERS is something that will turn the scheduler 
into a more classic fair scheduler (like BFS is too).

NO_START_DEBIT might be another thing that improves (or worsens :-/) 
make -j type of kernel build workloads.

Note, these flags are all runtime-tunable: the new settings take 
effect almost immediately (at the latest when a task has started 
up), and they are safe to change at runtime.

It basically gives us 32768 pluggable schedulers each with a 
slightly separate algorithm - each setting in essence creates a new 
scheduler. (this mechanism is how we introduce new scheduler 
features and allow their debugging / regression-testing.)

(okay, almost, so beware: turning on HRTICK might lock up your 
system.)
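
(A hypothetical way to experiment with a single feature - the feature 
picked and the observation window below are just examples, and nothing 
here is persistent across a reboot:)

  # turn the feature off, observe the desktop for a while ...
  echo NO_NEW_FAIR_SLEEPERS > /debug/sched_features
  sleep 60    # reproduce the mplayer/compiz choppiness here

  # ... then flip it back on
  echo NEW_FAIR_SLEEPERS > /debug/sched_features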

Plus, yet another dimension of tuning on SMP systems (such as 
dual-core) are the sched-domains tunable. There's a whole world of 
tuning in that area and BFS essentially implements a very agressive 
'always balance to other CPUs' policy.

I've attached my sched-tune-domains script which helps tune these 
parameters.

For example on a testbox of mine it outputs:

usage: tune-sched-domains <val>
{cpu0/domain0:SIBLING} SD flag: 239
+   1: SD_LOAD_BALANCE:          Do load balancing on this domain
+   2: SD_BALANCE_NEWIDLE:       Balance when about to become idle
+   4: SD_BALANCE_EXEC:          Balance on exec
+   8: SD_BALANCE_FORK:          Balance on fork, clone
-  16: SD_WAKE_IDLE:             Wake to idle CPU on task wakeup
+  32: SD_WAKE_AFFINE:           Wake task to waking CPU
+  64: SD_WAKE_BALANCE:          Perform balancing at task wakeup
+ 128: SD_SHARE_CPUPOWER:        Domain members share cpu power
- 256: SD_POWERSAVINGS_BALANCE:  Balance for power savings
- 512: SD_SHARE_PKG_RESOURCES:   Domain members share cpu pkg resources
-1024: SD_SERIALIZE:             Only a single load balancing instance
-2048: SD_WAKE_IDLE_FAR:         Gain latency sacrificing cache hit
-4096: SD_PREFER_SIBLING:        Prefer to place tasks in a sibling domain
{cpu0/domain1:MC} SD flag: 4735
+   1: SD_LOAD_BALANCE:          Do load balancing on this domain
+   2: SD_BALANCE_NEWIDLE:       Balance when about to become idle
+   4: SD_BALANCE_EXEC:          Balance on exec
+   8: SD_BALANCE_FORK:          Balance on fork, clone
+  16: SD_WAKE_IDLE:             Wake to idle CPU on task wakeup
+  32: SD_WAKE_AFFINE:           Wake task to waking CPU
+  64: SD_WAKE_BALANCE:          Perform balancing at task wakeup
- 128: SD_SHARE_CPUPOWER:        Domain members share cpu power
- 256: SD_POWERSAVINGS_BALANCE:  Balance for power savings
+ 512: SD_SHARE_PKG_RESOURCES:   Domain members share cpu pkg resources
-1024: SD_SERIALIZE:             Only a single load balancing instance
-2048: SD_WAKE_IDLE_FAR:         Gain latency sacrificing cache hit
+4096: SD_PREFER_SIBLING:        Prefer to place tasks in a sibling domain
{cpu0/domain2:NODE} SD flag: 3183
+   1: SD_LOAD_BALANCE:          Do load balancing on this domain
+   2: SD_BALANCE_NEWIDLE:       Balance when about to become idle
+   4: SD_BALANCE_EXEC:          Balance on exec
+   8: SD_BALANCE_FORK:          Balance on fork, clone
-  16: SD_WAKE_IDLE:             Wake to idle CPU on task wakeup
+  32: SD_WAKE_AFFINE:           Wake task to waking CPU
+  64: SD_WAKE_BALANCE:          Perform balancing at task wakeup
- 128: SD_SHARE_CPUPOWER:        Domain members share cpu power
- 256: SD_POWERSAVINGS_BALANCE:  Balance for power savings
- 512: SD_SHARE_PKG_RESOURCES:   Domain members share cpu pkg resources
+1024: SD_SERIALIZE:             Only a single load balancing instance
+2048: SD_WAKE_IDLE_FAR:         Gain latency sacrificing cache hit
-4096: SD_PREFER_SIBLING:        Prefer to place tasks in a sibling domain

The way i can turn on say SD_WAKE_IDLE for the NODE domain is to:

   tune-sched-domains 239 4735 $((3183+16))

( This is a pretty stone-age script i admit ;-)
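
( For a quick one-off experiment the script isn't strictly needed either - 
  e.g. something like this (hypothetical; domain2 being the NODE domain on 
  this particular box) ORs SD_WAKE_IDLE into the existing flags directly: )

  for d in /proc/sys/kernel/sched_domain/cpu*/domain2; do
          cur=$(cat $d/flags)
          echo $((cur | 16)) > $d/flags       # 16 == SD_WAKE_IDLE
  done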

Thanks for all your testing so far,

	Ingo

[-- Attachment #2: tune-sched-domains --]
[-- Type: text/plain, Size: 2152 bytes --]


#!/bin/bash

DIR=/proc/sys/kernel/sched_domain/

print_flags()
{
  flags[1]="SD_LOAD_BALANCE:          Do load balancing on this domain"
  flags[2]="SD_BALANCE_NEWIDLE:       Balance when about to become idle"
  flags[4]="SD_BALANCE_EXEC:          Balance on exec"
  flags[8]="SD_BALANCE_FORK:          Balance on fork, clone"
  flags[16]="SD_WAKE_IDLE:             Wake to idle CPU on task wakeup"
  flags[32]="SD_WAKE_AFFINE:           Wake task to waking CPU"
  flags[64]="SD_WAKE_BALANCE:          Perform balancing at task wakeup"
  flags[128]="SD_SHARE_CPUPOWER:        Domain members share cpu power"
  flags[256]="SD_POWERSAVINGS_BALANCE:  Balance for power savings"
  flags[512]="SD_SHARE_PKG_RESOURCES:   Domain members share cpu pkg resources"
  flags[1024]="SD_SERIALIZE:             Only a single load balancing instance"
  flags[2048]="SD_WAKE_IDLE_FAR:         Gain latency sacrificing cache hit"
  flags[4096]="SD_PREFER_SIBLING:        Prefer to place tasks in a sibling domain"

  DEC=$1
  CPU=$2
  DOM=$3

  [ -d $DIR/$CPU/$DOM ] || { exit 0; }

  NAME=$(cat $DIR/$CPU/$DOM/name)

  [ $DEC = "-1" ] && DEC=$(cat $DIR/$CPU/$DOM/flags)

  echo "{$CPU/$DOM:$NAME} SD flag: $DEC"

  for ((mask=1;mask<=4096;mask*=2)); do
   if [ "$[$mask & $DEC]" != "0" ]; then
      printf "+%4d: %s\n" $mask "${flags[$mask]}"
   else
      if [ $mask -le 4096 ]; then
        printf "%c%4d: %s\n" "-" $mask "${flags[$mask]}"
      fi
   fi
  done
}


[ $# -lt "1" ] && {
  echo 'usage: tune-sched-domains <val>'

  print_flags -1 cpu0 domain0
  print_flags -1 cpu0 domain1
  print_flags -1 cpu0 domain2
  print_flags -1 cpu0 domain3

  exit -1;
}

DOM0=$1
DOM1=${2:-$DOM0}
DOM2=${3:-$DOM1}
DOM3=${4:-$DOM2}

cd $DIR

for CPU in *; do
 cd $CPU
 for DOM in *; do

  FLAGS=$DOM/flags

  VAL=`cat $FLAGS`
  case $FLAGS in
     "domain0/flags") NEW_VAL=$DOM0 ;;
     "domain1/flags") NEW_VAL=$DOM1 ;;
     "domain2/flags") NEW_VAL=$DOM2 ;;
     "domain3/flags") NEW_VAL=$DOM3 ;;
     *) echo "error!" ;;
  esac

  echo $NEW_VAL > $FLAGS

  [ "$CPU" = "cpu0" ] && {
    echo "changed $FLAGS: $VAL => $NEW_VAL"
    print_flags $NEW_VAL $CPU $DOM
  }
 done
 cd ..
done



^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-10  6:08                               ` Ingo Molnar
@ 2009-09-10  6:40                                 ` Ingo Molnar
  2009-09-10  9:54                                   ` Jens Axboe
  2009-09-10 16:02                                 ` Bret Towe
  2009-09-10 17:53                                 ` Nikos Chantziaras
  2 siblings, 1 reply; 216+ messages in thread
From: Ingo Molnar @ 2009-09-10  6:40 UTC (permalink / raw)
  To: Nikos Chantziaras
  Cc: Jens Axboe, Mike Galbraith, Peter Zijlstra, Con Kolivas, linux-kernel


* Ingo Molnar <mingo@elte.hu> wrote:

> > However, the interactivity problems still remain.  Does that 
> > mean it's not a latency issue?
> 
> It means that Jens's test-app, which demonstrated and helped us 
> fix the issue for him does not help us fix it for you just yet.

Lemme qualify that by saying that Jens's issues are improved, not 
fixed [he has not re-run with the latest latt.c yet] - not all 
things are fully fixed yet. For example the xmodmap thing sounds 
interesting - could that be a child-runs-first effect?
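
(If someone wants to check that theory, the knob is runtime tunable - a 
quick, hypothetical test along these lines; the xmodmap file path is just 
an example:)

  cat /proc/sys/kernel/sched_child_runs_first
  echo 0 > /proc/sys/kernel/sched_child_runs_first
  time xmodmap ~/.Xmodmap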

	Ingo

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: Epic regression in throughput since v2.6.23
  2009-09-09 20:49       ` Serge Belyshev
  2009-09-09 21:23         ` Cory Fields
@ 2009-09-10  6:53         ` Ingo Molnar
  2009-09-10 23:23           ` Serge Belyshev
  1 sibling, 1 reply; 216+ messages in thread
From: Ingo Molnar @ 2009-09-10  6:53 UTC (permalink / raw)
  To: Serge Belyshev; +Cc: Con Kolivas, linux-kernel, Peter Zijlstra, Mike Galbraith


* Serge Belyshev <belyshev@depni.sinp.msu.ru> wrote:

> Ingo Molnar <mingo@elte.hu> writes:
> 
> > Thanks!
> >
> > I think we found the reason for that regression - would you mind 
> > to re-test with latest -tip, e157986 or later?
> >
> > If that works for you i'll describe our theory.
> >
> 
> Good job -- seems to work, thanks.  Regression is still about 3% 
> though: http://img3.imageshack.us/img3/5335/epicbfstip.png

Ok, thanks for the update. The problem is that i've run out of 
test systems that can reproduce this. So we need your help to debug 
this directly ...

A good start would be to post the -tip versus BFS "perf stat" 
measurement results:

   perf stat --repeat 3 make -j4 bzImage

And also the -j8 perf stat result, so that we can see what the 
difference is between -j4 and -j8.

Note: please check out latest tip and do:

  cd tools/perf/
  make -j install

To pick up the latest 'perf' tool. In particular the precision of 
--repeat has been improved recently, so you want that binary from 
-tip even if you measure vanilla .31 or .31-based BFS.
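
(Spelled out as one runnable sequence - purely a sketch, adjust the -j 
values to your box, and run it from the top of the kernel tree once on 
-tip and once on BFS:)

  for j in 4 8; do
          perf stat --repeat 3 make -j$j bzImage
  done 2>&1 | tee perf-stat-$(uname -r).txt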

Also, it would be nice if you could send me your kernel config - 
maybe it's some config detail that keeps me from being able to 
reproduce these results. I havent seen a link to a config in your 
mails (maybe i missed it - these threads are voluminous).

	Ingo

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-09 12:20                         ` Jens Axboe
  2009-09-09 18:04                           ` Ingo Molnar
@ 2009-09-10  6:55                           ` Peter Zijlstra
  2009-09-10  6:58                             ` Jens Axboe
  2009-09-10  6:59                             ` BFS vs. mainline scheduler benchmarks and measurements Ingo Molnar
  1 sibling, 2 replies; 216+ messages in thread
From: Peter Zijlstra @ 2009-09-10  6:55 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Mike Galbraith, Ingo Molnar, Con Kolivas, linux-kernel

On Wed, 2009-09-09 at 14:20 +0200, Jens Axboe wrote:
> 
> One thing I also noticed is that when I have logged in, I run xmodmap
> manually to load some keymappings (I always tell myself to add this to
> the log in scripts, but I suspend/resume this laptop for weeks at the
> time and forget before the next boot). With the stock kernel, xmodmap
> will halt X updates and take forever to run. With BFS, it returned
> instantly. As I would expect.

Can you provide a little more detail (I'm a xmodmap n00b), how does one
run xmodmap and maybe provide your xmodmap config?


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-10  6:55                           ` Peter Zijlstra
@ 2009-09-10  6:58                             ` Jens Axboe
  2009-09-10  7:04                               ` Ingo Molnar
  2009-09-10  7:33                               ` Jens Axboe
  2009-09-10  6:59                             ` BFS vs. mainline scheduler benchmarks and measurements Ingo Molnar
  1 sibling, 2 replies; 216+ messages in thread
From: Jens Axboe @ 2009-09-10  6:58 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Mike Galbraith, Ingo Molnar, Con Kolivas, linux-kernel

On Thu, Sep 10 2009, Peter Zijlstra wrote:
> On Wed, 2009-09-09 at 14:20 +0200, Jens Axboe wrote:
> > 
> > One thing I also noticed is that when I have logged in, I run xmodmap
> > manually to load some keymappings (I always tell myself to add this to
> > the log in scripts, but I suspend/resume this laptop for weeks at the
> > time and forget before the next boot). With the stock kernel, xmodmap
> > will halt X updates and take forever to run. With BFS, it returned
> > instantly. As I would expect.
> 
> Can you provide a little more detail (I'm a xmodmap n00b), how does one
> run xmodmap and maybe provide your xmodmap config?

Will do, let me get the notebook and strace time it on both bfs and
mainline.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-10  6:55                           ` Peter Zijlstra
  2009-09-10  6:58                             ` Jens Axboe
@ 2009-09-10  6:59                             ` Ingo Molnar
  1 sibling, 0 replies; 216+ messages in thread
From: Ingo Molnar @ 2009-09-10  6:59 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Jens Axboe, Mike Galbraith, Con Kolivas, linux-kernel


* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> On Wed, 2009-09-09 at 14:20 +0200, Jens Axboe wrote:
> > 
> > One thing I also noticed is that when I have logged in, I run xmodmap
> > manually to load some keymappings (I always tell myself to add this to
> > the log in scripts, but I suspend/resume this laptop for weeks at the
> > time and forget before the next boot). With the stock kernel, xmodmap
> > will halt X updates and take forever to run. With BFS, it returned
> > instantly. As I would expect.
> 
> Can you provide a little more detail (I'm a xmodmap n00b), how 
> does one run xmodmap and maybe provide your xmodmap config?

(and which version did you use, just in case it matters.)

	Ingo

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-10  6:58                             ` Jens Axboe
@ 2009-09-10  7:04                               ` Ingo Molnar
  2009-09-10  9:44                                 ` Jens Axboe
  2009-09-10  7:33                               ` Jens Axboe
  1 sibling, 1 reply; 216+ messages in thread
From: Ingo Molnar @ 2009-09-10  7:04 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Peter Zijlstra, Mike Galbraith, Con Kolivas, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1128 bytes --]


* Jens Axboe <jens.axboe@oracle.com> wrote:

> On Thu, Sep 10 2009, Peter Zijlstra wrote:
> > On Wed, 2009-09-09 at 14:20 +0200, Jens Axboe wrote:
> > > 
> > > One thing I also noticed is that when I have logged in, I run xmodmap
> > > manually to load some keymappings (I always tell myself to add this to
> > > the log in scripts, but I suspend/resume this laptop for weeks at the
> > > time and forget before the next boot). With the stock kernel, xmodmap
> > > will halt X updates and take forever to run. With BFS, it returned
> > > instantly. As I would expect.
> > 
> > Can you provide a little more detail (I'm a xmodmap n00b), how 
> > does one run xmodmap and maybe provide your xmodmap config?
> 
> Will do, let me get the notebook and strace time it on both bfs 
> and mainline.

A 'perf stat' comparison would be nice as well - that will show us 
events strace doesnt show, and it shows the basic scheduler 
behavior too.

A 'full' trace could be done as well via trace-cmd.c (attached), if 
you enable:

  CONFIG_CONTEXT_SWITCH_TRACER=y

and did something like:

  trace-cmd -s xmodmap ... > trace.txt

	Ingo

[-- Attachment #2: trace-cmd.c --]
[-- Type: text/plain, Size: 6541 bytes --]

/*
 * Copyright (C) 2008, Steven Rostedt <srostedt@redhat.com>
 *
 * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; version 2 of the License (not later!)
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
 *
 * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdarg.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>

#define VERSION "0.2"

#define _STR(x) #x
#define STR(x) _STR(x)
#define MAX_PATH 256

#define TRACE_CTRL	"tracing_enabled"
#define TRACE		"latency_trace"
#define AVAILABLE	"available_tracers"
#define CURRENT		"current_tracer"
#define ITER_CTRL	"iter_ctrl"
#define MAX_LATENCY	"tracing_max_latency"
#define THRESH		"tracing_thresh"

static void die(char *fmt, ...)
{
	va_list ap;
	int ret = errno;

	if (errno)
		perror("trace-cmd");
	else
		ret = -1;

	va_start(ap, fmt);
	fprintf(stderr, "  ");
	vfprintf(stderr, fmt, ap);
	va_end(ap);

	fprintf(stderr, "\n");
	exit(ret);
}

static int search_mounts(char *path, int size)
{
	FILE *fp;
	static char debugfs[MAX_PATH+1];
	static int debugfs_size;

	if (!debugfs_size) {
		char type[100];

		if ((fp = fopen("/proc/mounts","r")) == NULL)
			die("Can't open /proc/mounts for read");

		while (fscanf(fp, "%*s %"
			      STR(MAX_PATH)
			      "s %99s %*s %*d %*d\n",
			      debugfs, type) == 2) {
			if (strcmp(type, "debugfs") == 0)
				break;
		}
		fclose(fp);

		if (strcmp(type, "debugfs") != 0)
			die("debugfs not mounted, please mount");
	}

	debugfs_size = strlen(debugfs)+1;

	if (size > debugfs_size)
		size = debugfs_size;

	memcpy(path, debugfs, size);

	return size;
}

/*
 * Finds the path to the debugfs/tracing
 * Allocates the string and stores it.
 */
static int tracing_dir(char *path, int size)
{
	static char debugfs[MAX_PATH];
	static int debugfs_size;
	int ret;

	if (!debugfs_size) {
		ret = search_mounts(debugfs, MAX_PATH);
		if (ret < 0)
			return ret;
		debugfs_size = MAX_PATH - ret;
		strncat(debugfs, "/tracing", debugfs_size);
		debugfs_size = strlen(debugfs)+1;
	}

	if (size > debugfs_size)
		size = debugfs_size;

	memcpy(path, debugfs, size);

	return size;
}

static int tracing_type(char *path, const char *type, int size)
{
	int len = strlen(type) + 1;
	int ret;

	ret = tracing_dir(path, size);
	size -= ret;
	if (len > size)
		die ("debugfs path is too big!");
	strcat(path, "/");
	strcat(path, type);

	return ret + len;
}

static void write_trace(const char *file, const char *val)
{
	char path[MAX_PATH+1];
	int fd;

	tracing_type(path, file, MAX_PATH);

	fd = open(path, O_WRONLY);
	if (fd < 0)
		die("writng %s", path);
	write(fd, val, strlen(val));
	close(fd);
}

static int find_trace_type(const char *type)
{
	FILE *fp;
	char path[MAX_PATH+1];
	char scan[100];
	int ret;

	tracing_type(path, AVAILABLE, MAX_PATH);
	fp = fopen(path, "r");
	if (!fp)
		die("reading %s", path);
	do {
		ret = fscanf(fp, "%99s", scan);
		if (ret > 0 && strcmp(scan, type) == 0)
			break;
	} while (ret > 0);
	fclose(fp);

	return ret > 0;
}

static void set_ftrace(int set)
{
	int fd;
	char *val = set ? "1" : "0";

	fd = open("/proc/sys/kernel/ftrace_enabled", O_WRONLY);
	if (fd < 0)
		die ("Can't %s ftrace", set ? "enable" : "disable");

	write(fd, val, 1);
	close(fd);
}

void run_cmd(int argc, char **argv)
{
	int status;
	int pid;

	if ((pid = fork()) < 0)
		die("failed to fork");
	if (!pid) {
		/* child */
		if (execvp(argv[0], argv))
			exit(-1);
	}
	waitpid(pid, &status, 0);
}

static void usage(char **argv)
{
	char *arg = argv[0];
	char *p = arg+strlen(arg);

	while (p >= arg && *p != '/')
		p--;
	p++;

	printf("\n"
	       "%s version %s\n\n"
	       "usage: %s OPTION [-f] command ...\n"
	       "\n"
	       "  -s  set context switch trace\n"
	       "  -p  set preemption off trace\n"
	       "  -i  set interrupts off trace\n"
	       "  -b  set preempt and interrupts off trace\n"
	       "  -w  set wakeup tracing\n"
	       "  -e  set event tracing\n"
	       "  -f  set function trace\n"
	       "\n"
	       "  Note: only -f may be set with any other trace\n"
	       "\n", p, VERSION, p);
	exit(-1);
}

int main (int argc, char **argv)
{
	const char *type = NULL;
	const char *config;
	int function = 0;
	int type_set = 0;
	int max = 0;
	int c;

	while ((c = getopt(argc, argv, "+hspibfew")) >= 0) {
		switch (c) {
		case 'h':
			usage(argv);
			break;
		case 's':
			type = "sched_switch";
			config = "CONFIG_CONTEXT_SWITCH_TRACER";
			type_set++;
			break;
		case 'p':
			type = "preemptoff";
			config = "CONFIG_CRITICAL_PREEMPT_TIMING";
			max = 1;
			type_set++;
			break;
		case 'i':
			type = "irqsoff";
			config = "CONFIG_CRITICAL_IRQSOFF_TIMING";
			max = 1;
			type_set++;
			break;
		case 'b':
			type = "preemptirqsoff";
			config = "CONFIG_CRITICAL_IRQSOFF_TIMING and"
				" CONFIG_CRITICAL_PREEMPT_TIMING";
			max = 1;
			type_set++;
			break;
		case 'w':
			type = "wakeup";
			config = "CONFIG_WAKEUP_TRACER";
			max = 1;
			type_set++;
			break;
		case 'e':
			type = "events";
			config = "CONFIG_EVENT_TRACER";
			type_set++;
			break;
		case 'f':
			if (!type) {
				type = "ftrace";
				config = "CONFIG_FTRACE";
			}
			function = 1;
			break;
		default:
			/* yeah yeah, I know this is a dup! */
			usage(argv);
			break;
		}
	}

	if (type_set > 1)
		usage(argv);

	if (!(argc - optind))
		usage(argv);

	if (!type)
		usage(argv);

	if (!find_trace_type(type))
		die("Trace type %s not found.\n"
		    " Please configure the kernel with %s set\n",
		    type, config);

	write_trace(TRACE_CTRL, "0");
	if (function)
		set_ftrace(1);
	if (max)
		write_trace(MAX_LATENCY, "0");

	write_trace(CURRENT, type);
	write_trace(TRACE_CTRL, "1");

	run_cmd(argc - optind, &argv[optind]);

	write_trace(TRACE_CTRL, "0");
	if (function)
		set_ftrace(0);

	system("cat /debug/tracing/trace");

	return 0;
}


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-10  6:58                             ` Jens Axboe
  2009-09-10  7:04                               ` Ingo Molnar
@ 2009-09-10  7:33                               ` Jens Axboe
  2009-09-10  7:49                                 ` Ingo Molnar
  1 sibling, 1 reply; 216+ messages in thread
From: Jens Axboe @ 2009-09-10  7:33 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Mike Galbraith, Ingo Molnar, Con Kolivas, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2179 bytes --]

On Thu, Sep 10 2009, Jens Axboe wrote:
> On Thu, Sep 10 2009, Peter Zijlstra wrote:
> > On Wed, 2009-09-09 at 14:20 +0200, Jens Axboe wrote:
> > > 
> > > One thing I also noticed is that when I have logged in, I run xmodmap
> > > manually to load some keymappings (I always tell myself to add this to
> > > the log in scripts, but I suspend/resume this laptop for weeks at the
> > > time and forget before the next boot). With the stock kernel, xmodmap
> > > will halt X updates and take forever to run. With BFS, it returned
> > > instantly. As I would expect.
> > 
> > Can you provide a little more detail (I'm a xmodmap n00b), how does one
> > run xmodmap and maybe provide your xmodmap config?
> 
> Will do, let me get the notebook and strace time it on both bfs and
> mainline.

Here's the result of running perf stat xmodmap .xmodmap-carl on the
notebook. I have attached the .xmodmap-carl file, it's pretty simple. I
have also attached the output of strace -o foo -f -tt xmodmap
.xmodmap-carl when run on 2.6.31-rc9.

2.6.31-rc9-bfs210

 Performance counter stats for 'xmodmap .xmodmap-carl':

     153.994976  task-clock-msecs         #      0.990 CPUs   (scaled from 99.86%)
              0  context-switches         #      0.000 M/sec  (scaled from 99.86%)
              0  CPU-migrations           #      0.000 M/sec  (scaled from 99.86%)
            315  page-faults              #      0.002 M/sec  (scaled from 99.86%)
  <not counted>  cycles                  
  <not counted>  instructions            
  <not counted>  cache-references        
  <not counted>  cache-misses            

    0.155573406  seconds time elapsed

2.6.31-rc9

 Performance counter stats for 'xmodmap .xmodmap-carl':

       8.529265  task-clock-msecs         #      0.001 CPUs 
             23  context-switches         #      0.003 M/sec
              1  CPU-migrations           #      0.000 M/sec
            315  page-faults              #      0.037 M/sec
  <not counted>  cycles                  
  <not counted>  instructions            
  <not counted>  cache-references        
  <not counted>  cache-misses            

   11.804293482  seconds time elapsed


-- 
Jens Axboe


[-- Attachment #2: .xmodmap-carl --]
[-- Type: text/plain, Size: 1365 bytes --]

!
! This is an `xmodmap' input file for 
!   PC 105 key, wide Delete, tall Enter (XFree86; US) keyboards.
! Automatically generated on Sun Aug 31 20:11:39 2008 by axboe with
! XKeyCaps 2.47; Copyright (c) 1991-1999 Jamie Zawinski; 2005-2006 Christoph Berg.
! http://www.jwz.org/xkeycaps/
!
! This file presupposes that the keyboard is in the default state, and
! may malfunction if it is not.
!
remove Lock    = Caps_Lock
remove Mod4    = Meta_L Meta_R
remove Mod5    = Scroll_Lock

keycode 0x43 =	Escape	XF86_Switch_VT_1
keycode 0x44 =	F2	XF86_Switch_VT_2
keycode 0x45 =	F3	XF86_Switch_VT_3
keycode 0x46 =	F4	XF86_Switch_VT_4
keycode 0x47 =	F5	XF86_Switch_VT_5
keycode 0x48 =	F6	XF86_Switch_VT_6
keycode 0x49 =	F7	XF86_Switch_VT_7
keycode 0x4A =	F8	XF86_Switch_VT_8
keycode 0x4B =	F9	XF86_Switch_VT_9
keycode 0x4C =	F10	XF86_Switch_VT_10
keycode 0x5F =	F11	XF86_Switch_VT_11
keycode 0x60 =	F12	XF86_Switch_VT_12
keycode 0x16 =	BackSpace	Terminate_Server
keycode 0x3F =	KP_Multiply	XF86_ClearGrab
keycode 0x52 =	KP_Subtract	XF86_Prev_VMode
keycode 0x56 =	KP_Add	XF86_Next_VMode
keycode 0x42 =	Control_L
keycode 0x5E =	less	greater	bar	brokenbar	bar	brokenbar
keycode 0x40 =	Alt_L	Meta_L

add    Control = Control_L
add    Mod1    = 0x009C
add    Mod4    = 0x007F 0x0080
add    Mod5    = Mode_switch ISO_Level3_Shift

keycode 0xa6 = Page_Up
keycode 0xa7 = Page_Down

[-- Attachment #3: strace-xmodmap.txt --]
[-- Type: text/plain, Size: 20480 bytes --]

4659  09:32:29.464949 execve("/usr/bin/xmodmap", ["xmodmap", ".xmodmap-carl"], [/* 44 vars */]) = 0
4659  09:32:29.465551 brk(0)            = 0x87e2000
4659  09:32:29.465664 access("/etc/ld.so.nohwcap", F_OK) = 0
4659  09:32:29.465771 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x6ff8a000
4659  09:32:29.465875 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
4659  09:32:29.466003 open("/etc/ld.so.cache", O_RDONLY) = 3
4659  09:32:29.466100 fstat64(3, {st_mode=S_IFREG|0644, st_size=80093, ...}) = 0
4659  09:32:29.466243 mmap2(NULL, 80093, PROT_READ, MAP_PRIVATE, 3, 0) = 0x6ff76000
4659  09:32:29.466355 close(3)          = 0
4659  09:32:29.466438 access("/etc/ld.so.nohwcap", F_OK) = 0
4659  09:32:29.466551 open("/usr/lib/libX11.so.6", O_RDONLY) = 3
4659  09:32:29.466654 read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\2604\1\0004\0\0\0\234"..., 512) = 512
4659  09:32:29.466775 fstat64(3, {st_mode=S_IFREG|0644, st_size=971436, ...}) = 0
4659  09:32:29.466913 mmap2(NULL, 975508, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x6fe87000
4659  09:32:29.467000 mprotect(0x6ff71000, 4096, PROT_NONE) = 0
4659  09:32:29.467089 mmap2(0x6ff72000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xea) = 0x6ff72000
4659  09:32:29.467193 mmap2(0x6ff75000, 660, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x6ff75000
4659  09:32:29.467290 close(3)          = 0
4659  09:32:29.467404 access("/etc/ld.so.nohwcap", F_OK) = 0
4659  09:32:29.467504 open("/lib/libc.so.6", O_RDONLY) = 3
4659  09:32:29.467605 read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\320h\1\0004\0\0\0T"..., 512) = 512
4659  09:32:29.467722 fstat64(3, {st_mode=S_IFREG|0755, st_size=1331404, ...}) = 0
4659  09:32:29.467860 mmap2(NULL, 1336944, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x6fd40000
4659  09:32:29.467951 mmap2(0x6fe81000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x141) = 0x6fe81000
4659  09:32:29.468055 mmap2(0x6fe84000, 9840, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x6fe84000
4659  09:32:29.468152 close(3)          = 0
4659  09:32:29.468238 access("/etc/ld.so.nohwcap", F_OK) = 0
4659  09:32:29.468375 open("/usr/lib/libxcb.so.1", O_RDONLY) = 3
4659  09:32:29.468477 read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\200z\0\0004\0\0\0\250"..., 512) = 512
4659  09:32:29.468599 fstat64(3, {st_mode=S_IFREG|0644, st_size=99768, ...}) = 0
4659  09:32:29.468732 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x6fd3f000
4659  09:32:29.468829 mmap2(NULL, 102676, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x6fd25000
4659  09:32:29.468920 mmap2(0x6fd3d000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x17) = 0x6fd3d000
4659  09:32:29.469026 close(3)          = 0
4659  09:32:29.469112 access("/etc/ld.so.nohwcap", F_OK) = 0
4659  09:32:29.469213 open("/lib/libdl.so.2", O_RDONLY) = 3
4659  09:32:29.469324 read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0 \n\0\0004\0\0\0D"..., 512) = 512
4659  09:32:29.469453 fstat64(3, {st_mode=S_IFREG|0644, st_size=9676, ...}) = 0
4659  09:32:29.469594 mmap2(NULL, 12408, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x6fd21000
4659  09:32:29.469685 mmap2(0x6fd23000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1) = 0x6fd23000
4659  09:32:29.469790 close(3)          = 0
4659  09:32:29.469879 access("/etc/ld.so.nohwcap", F_OK) = 0
4659  09:32:29.469976 open("/usr/lib/libXau.so.6", O_RDONLY) = 3
4659  09:32:29.470077 read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0P\n\0\0004\0\0\0<"..., 512) = 512
4659  09:32:29.470192 fstat64(3, {st_mode=S_IFREG|0644, st_size=9508, ...}) = 0
4659  09:32:29.470341 mmap2(NULL, 12412, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x6fd1d000
4659  09:32:29.470445 mmap2(0x6fd1f000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1) = 0x6fd1f000
4659  09:32:29.470557 close(3)          = 0
4659  09:32:29.470655 access("/etc/ld.so.nohwcap", F_OK) = 0
4659  09:32:29.470753 open("/usr/lib/libXdmcp.so.6", O_RDONLY) = 3
4659  09:32:29.470855 read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0P\16\0\0004\0\0\0\f"..., 512) = 512
4659  09:32:29.470976 fstat64(3, {st_mode=S_IFREG|0644, st_size=16628, ...}) = 0
4659  09:32:29.471112 mmap2(NULL, 19520, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x6fd18000
4659  09:32:29.471204 mmap2(0x6fd1c000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3) = 0x6fd1c000
4659  09:32:29.471320 close(3)          = 0
4659  09:32:29.471450 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x6fd17000
4659  09:32:29.471555 set_thread_area({entry_number:-1 -> 6, base_addr:0x6fd17a00, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
4659  09:32:29.471658 open("/dev/urandom", O_RDONLY) = 3
4659  09:32:29.471755 read(3, "\6~R)"..., 4) = 4
4659  09:32:29.471844 close(3)          = 0
4659  09:32:29.471947 mprotect(0x6fd1f000, 4096, PROT_READ) = 0
4659  09:32:29.472055 mprotect(0x6fd23000, 4096, PROT_READ) = 0
4659  09:32:29.472153 mprotect(0x6fd3d000, 4096, PROT_READ) = 0
4659  09:32:29.472344 mprotect(0x6fe81000, 8192, PROT_READ) = 0
4659  09:32:29.472525 mprotect(0x6ff72000, 4096, PROT_READ) = 0
4659  09:32:29.472621 mprotect(0x804e000, 4096, PROT_READ) = 0
4659  09:32:29.472712 mprotect(0x6ffa7000, 4096, PROT_READ) = 0
4659  09:32:29.472794 munmap(0x6ff76000, 80093) = 0
4659  09:32:29.473047 brk(0)            = 0x87e2000
4659  09:32:29.473126 brk(0x8803000)    = 0x8803000
4659  09:32:29.473323 socket(PF_FILE, SOCK_STREAM, 0) = 3
4659  09:32:29.473445 connect(3, {sa_family=AF_FILE, path=@"/tmp/.X11-unix/X0"...}, 20) = 0
4659  09:32:29.473708 getpeername(3, {sa_family=AF_FILE, path=@"/tmp/.X11-unix/X0"...}, [20]) = 0
4659  09:32:29.473878 uname({sys="Linux", node="carl", ...}) = 0
4659  09:32:29.474146 access("/home/axboe/.Xauthority", R_OK) = 0
4659  09:32:29.474257 open("/home/axboe/.Xauthority", O_RDONLY) = 4
4659  09:32:29.474398 fstat64(4, {st_mode=S_IFREG|0600, st_size=115, ...}) = 0
4659  09:32:29.474536 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x6ff89000
4659  09:32:29.474629 read(4, "\1\0\0\4carl\0\0010\0\22MIT-MAGIC-COOKIE-1\0\20"..., 4096) = 115
4659  09:32:29.474756 read(4, ""..., 4096) = 0
4659  09:32:29.474839 close(4)          = 0
4659  09:32:29.474918 munmap(0x6ff89000, 4096) = 0
4659  09:32:29.475036 fcntl64(3, F_GETFL) = 0x2 (flags O_RDWR)
4659  09:32:29.475124 fcntl64(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
4659  09:32:29.475205 fcntl64(3, F_SETFD, FD_CLOEXEC) = 0
4659  09:32:29.475313 select(4, [3], [3], NULL, NULL) = 1 (out [3])
4659  09:32:29.475437 writev(3, [{"l\0\v\0\0\0\22\0\20\0\0\0"..., 12}, {""..., 0}, {"MIT-MAGIC-COOKIE-1"..., 18}, {"\0\0"..., 2}, {"\232\346\374\251t\31M}(\306K\276\272\16\354\376"..., 16}, {""..., 0}], 6) = 48
4659  09:32:29.475719 read(3, "\1\0\v\0\0\0\263\1"..., 8) = 8
4659  09:32:29.475823 read(3, "@\276\241\0\0\0`\2\377\377\37\0\0\1\0\0\24\0\377\377\1\7\0\0  \10\377\1\0\0\0T"..., 1740) = 1740
4659  09:32:29.476021 select(4, [3], [3], NULL, NULL) = 1 (out [3])
4659  09:32:29.476122 writev(3, [{"b\0\5\0\f\0\0\0BIG-REQUESTS"..., 20}], 1) = 20
4659  09:32:29.476314 select(4, [3], [], NULL, NULL) = 1 (in [3])
4659  09:32:29.476427 read(3, "\1\0\1\0\0\0\0\0\1\216\0\0\0\0\0\0\24\0\0\0\0\0\0\0p\343%\t\0\0\0\0"..., 4096) = 32
4659  09:32:29.476580 select(4, [3], [3], NULL, NULL) = 1 (out [3])
4659  09:32:29.476680 writev(3, [{"\216\0\1\0"..., 4}], 1) = 4
4659  09:32:29.476843 select(4, [3], [], NULL, NULL) = 1 (in [3])
4659  09:32:29.476940 read(3, "\1\0\2\0\0\0\0\0\377\377?\0\0\0\0\1\0\0\0\0p\343%\t\0\0\0\0\364_\36\10"..., 4096) = 32
4659  09:32:29.477087 read(3, 0x87e2850, 4096) = -1 EAGAIN (Resource temporarily unavailable)
4659  09:32:29.477193 select(4, [3], [3], NULL, NULL) = 1 (out [3])
4659  09:32:29.477306 writev(3, [{"7\0\5\0\0\0`\2#\1\0\0\10\0\0\0\377\377\377\0\24\0\6\0#\1\0\0\27\0\0\0\37"..., 44}, {NULL, 0}, {""..., 0}], 3) = 44
4659  09:32:29.477566 select(4, [3], [], NULL, NULL) = 1 (in [3])
4659  09:32:29.477665 read(3, "\1\10\4\0x\0\0\0\37\0\0\0\0\0\0\0\335\1\0\0``\37\10xs\310w9\2\26\10*"..., 4096) = 512
4659  09:32:29.477802 read(3, 0x87e2850, 4096) = -1 EAGAIN (Resource temporarily unavailable)
4659  09:32:29.477909 select(4, [3], [3], NULL, NULL) = 1 (out [3])
4659  09:32:29.478008 writev(3, [{"b\0\5\0\t\0`\2"..., 8}, {"XKEYBOARD"..., 9}, {"\0\0\0"..., 3}], 3) = 20
4659  09:32:29.478200 select(4, [3], [], NULL, NULL) = 1 (in [3])
4659  09:32:29.478310 read(3, "\1\0\5\0\0\0\0\0\1\220`\231\0\0\0\0\24\0\0\0\0\0\0\0p\343%\t\0\0\0\0"..., 4096) = 32
4659  09:32:29.478459 read(3, 0x87e2850, 4096) = -1 EAGAIN (Resource temporarily unavailable)
4659  09:32:29.478561 select(4, [3], [3], NULL, NULL) = 1 (out [3])
4659  09:32:29.478660 writev(3, [{"\220\0\2\0\1\0\0\0"..., 8}, {NULL, 0}, {""..., 0}], 3) = 8
4659  09:32:29.478841 select(4, [3], [], NULL, NULL) = 1 (in [3])
4659  09:32:29.478938 read(3, "\1\1\6\0\0\0\0\0\1\0\0\0\230\270\30\t\2\0\0\0\10\0\0\0p\343%\t\30\260\350\10"..., 4096) = 32
4659  09:32:29.479072 read(3, 0x87e2850, 4096) = -1 EAGAIN (Resource temporarily unavailable)
4659  09:32:29.479170 select(4, [3], [3], NULL, NULL) = 1 (out [3])
4659  09:32:29.479269 writev(3, [{"w\0\1\0"..., 4}, {NULL, 0}, {""..., 0}], 3) = 4
4659  09:32:29.479475 select(4, [3], [], NULL, NULL) = 1 (in [3])
4659  09:32:29.479577 read(3, "\1\3\7\0\6\0\0\0p\343%\t\0\0\0\0\4\0\0\0\0\0\0\0p\343%\t\0\0\0\0002"..., 4096) = 56
4659  09:32:29.479712 read(3, 0x87e2850, 4096) = -1 EAGAIN (Resource temporarily unavailable)
4659  09:32:29.479804 open(".xmodmap-carl", O_RDONLY) = 4
4659  09:32:29.479905 fstat64(4, {st_mode=S_IFREG|0644, st_size=1365, ...}) = 0
4659  09:32:29.480039 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x6ff89000
4659  09:32:29.480126 read(4, "!\n! This is an `xmodmap' input fi"..., 4096) = 1365
4659  09:32:29.480305 select(4, [3], [3], NULL, NULL) = 1 (out [3])
4659  09:32:29.480421 writev(3, [{"\220\10\7\0\0\1\7\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 28}, {NULL, 0}, {""..., 0}], 3) = 28
4659  09:32:29.480679 select(4, [3], [], NULL, NULL) = 1 (in [3])
4659  09:32:29.480778 read(3, "\1\1\10\0G\4\0\0\0\0\10\377\7\0\0\31\31\10(\1\370\0\0\0\0\0\0\0\0\0\0\10\370"..., 4096) = 4096
4659  09:32:29.480906 read(3, "\0\0\0\0\0\0\0\0\1\1\1\0006\377\10\20\0\0\0\0\0\0\0\0\0\0\0\0\1\1\1\0i"..., 316) = 316
4659  09:32:29.481043 read(3, 0x87e2850, 4096) = -1 EAGAIN (Resource temporarily unavailable)
4659  09:32:29.481734 open("/usr/share/X11/XKeysymDB", O_RDONLY) = 6
4659  09:32:29.481858 fstat64(6, {st_mode=S_IFREG|0644, st_size=8934, ...}) = 0
4659  09:32:29.481999 read(6, "! $Xorg: XKeysymDB,v 1.3 2000/08/"..., 8934) = 8934
4659  09:32:29.482113 close(6)          = 0
4659  09:32:29.482440 open("/usr/share/X11/locale/locale.alias", O_RDONLY) = 6
4659  09:32:29.482570 fstat64(6, {st_mode=S_IFREG|0644, st_size=80646, ...}) = 0
4659  09:32:29.482705 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x6ff88000
4659  09:32:29.482793 read(6, "#\t$XdotOrg: lib/X11/nls/locale.al"..., 4096) = 4096
4659  09:32:29.482975 read(6, "br_FR.iso88591\t\t\t\t\tbr_FR.ISO8859-"..., 4096) = 4096
4659  09:32:29.483150 read(6, "9-1\nde_DE.ISO_8859-1\t\t\t\tde_DE.ISO"..., 4096) = 4096
4659  09:32:29.483358 read(6, "-1\nen_ZW.ISO-8859-1\t\t\t\ten_ZW.ISO8"..., 4096) = 4096
4659  09:32:29.483553 read(6, "UTF-8\nfi\t\t\t\t\t\tfi_FI.ISO8859-15\nfi"..., 4096) = 4096
4659  09:32:29.483732 read(6, "1255\nhe_IL.MICROSOFT-CP1255\t\t\t\the"..., 4096) = 4096
4659  09:32:29.483832 read(6, "\tlt_LT.UTF-8\nlv\t\t\t\t\t\tlv_LV.ISO885"..., 4096) = 4096
4659  09:32:29.483930 read(6, "1\t\t\t\tpt_BR.ISO8859-1\npt_BR.ISO_88"..., 4096) = 4096
4659  09:32:29.484024 read(6, "O-8859-1\t\t\t\tsv_FI.ISO8859-1\nsv_FI"..., 4096) = 4096
4659  09:32:29.484124 read(6, "8\nZH_TW.UTF-8\t\t\t\t\tzh_TW.UTF-8\nzu\t"..., 4096) = 4096
4659  09:32:29.484222 read(6, "\tar_AE.ISO8859-6\nar_AE.utf8:\t\t\t\t\t"..., 4096) = 4096
4659  09:32:29.484340 read(6, "5\nca_ES.UTF-8@euro:\t\t\t\tca_ES.UTF-"..., 4096) = 4096
4659  09:32:29.484447 read(6, "_GR.ISO8859-7\nel_GR@euro:\t\t\t\t\tel_"..., 4096) = 4096
4659  09:32:29.484551 read(6, "\tes_DO.ISO8859-1\nes_DO.iso88591:\t"..., 4096) = 4096
4659  09:32:29.484652 read(6, "\t\t\t\tfr_BE.ISO8859-1\nfr_BE.88591.e"..., 4096) = 4096
4659  09:32:29.484751 read(6, "nesian (now id).  These lines sho"..., 4096) = 4096
4659  09:32:29.484853 read(6, "5\nmk_MK.cp1251:\t\t\t\t\tmk_MK.CP1251\n"..., 4096) = 4096
4659  09:32:29.484955 read(6, "15\npt_PT.ISO-8859-15@euro:\t\t\t\tpt_"..., 4096) = 4096
4659  09:32:29.485052 read(6, "E.ISO8859-1\nsv_SE.88591:\t\t\t\t\tsv_S"..., 4096) = 4096
4659  09:32:29.485156 read(6, "zu_ZA.utf8:\t\t\t\t\tzu_ZA.UTF-8\n\n# Th"..., 4096) = 2822
4659  09:32:29.485243 read(6, ""..., 4096) = 0
4659  09:32:29.485288 close(6)          = 0
4659  09:32:29.485349 munmap(0x6ff88000, 4096) = 0
4659  09:32:29.485406 open("/usr/share/X11/locale/locale.dir", O_RDONLY) = 6
4659  09:32:29.485466 fstat64(6, {st_mode=S_IFREG|0644, st_size=39195, ...}) = 0
4659  09:32:29.485539 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x6ff88000
4659  09:32:29.485590 read(6, "#\t$XdotOrg: lib/X11/nls/locale.di"..., 4096) = 4096
4659  09:32:29.485682 read(6, "ALE:\t\t\teo_EO.ISO8859-3\n# iso8859-"..., 4096) = 4096
4659  09:32:29.485773 read(6, "59-1\niso8859-1/XLC_LOCALE\t\t\tny_NO"..., 4096) = 4096
4659  09:32:29.485864 read(6, "\t\tar_TN.UTF-8\nen_US.UTF-8/XLC_LOC"..., 4096) = 4096
4659  09:32:29.485953 read(6, "/XLC_LOCALE\t\t\tmk_MK.UTF-8\nen_US.U"..., 4096) = 4096
4659  09:32:29.486035 close(6)          = 0
4659  09:32:29.486079 munmap(0x6ff88000, 4096) = 0
4659  09:32:29.486128 access("/usr/share/X11/locale/C/XLC_LOCALE", R_OK) = 0
4659  09:32:29.486190 open("/usr/share/X11/locale/C/XLC_LOCALE", O_RDONLY) = 6
4659  09:32:29.486251 fstat64(6, {st_mode=S_IFREG|0644, st_size=772, ...}) = 0
4659  09:32:29.486342 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x6ff88000
4659  09:32:29.486389 read(6, "#  $Xorg: C,v 1.3 2000/08/17 19:4"..., 4096) = 772
4659  09:32:29.486512 read(6, ""..., 4096) = 0
4659  09:32:29.486561 close(6)          = 0
4659  09:32:29.486605 munmap(0x6ff88000, 4096) = 0
4659  09:32:29.486720 open("/usr/share/X11/locale/locale.alias", O_RDONLY) = 6
4659  09:32:29.486780 fstat64(6, {st_mode=S_IFREG|0644, st_size=80646, ...}) = 0
4659  09:32:29.486853 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x6ff88000
4659  09:32:29.486901 read(6, "#\t$XdotOrg: lib/X11/nls/locale.al"..., 4096) = 4096
4659  09:32:29.487000 read(6, "br_FR.iso88591\t\t\t\t\tbr_FR.ISO8859-"..., 4096) = 4096
4659  09:32:29.487094 read(6, "9-1\nde_DE.ISO_8859-1\t\t\t\tde_DE.ISO"..., 4096) = 4096
4659  09:32:29.487189 read(6, "-1\nen_ZW.ISO-8859-1\t\t\t\ten_ZW.ISO8"..., 4096) = 4096
4659  09:32:29.487285 read(6, "UTF-8\nfi\t\t\t\t\t\tfi_FI.ISO8859-15\nfi"..., 4096) = 4096
4659  09:32:29.487398 read(6, "1255\nhe_IL.MICROSOFT-CP1255\t\t\t\the"..., 4096) = 4096
4659  09:32:29.487497 read(6, "\tlt_LT.UTF-8\nlv\t\t\t\t\t\tlv_LV.ISO885"..., 4096) = 4096
4659  09:32:29.487597 read(6, "1\t\t\t\tpt_BR.ISO8859-1\npt_BR.ISO_88"..., 4096) = 4096
4659  09:32:29.487690 read(6, "O-8859-1\t\t\t\tsv_FI.ISO8859-1\nsv_FI"..., 4096) = 4096
4659  09:32:29.487789 read(6, "8\nZH_TW.UTF-8\t\t\t\t\tzh_TW.UTF-8\nzu\t"..., 4096) = 4096
4659  09:32:29.487886 read(6, "\tar_AE.ISO8859-6\nar_AE.utf8:\t\t\t\t\t"..., 4096) = 4096
4659  09:32:29.487987 read(6, "5\nca_ES.UTF-8@euro:\t\t\t\tca_ES.UTF-"..., 4096) = 4096
4659  09:32:29.488085 read(6, "_GR.ISO8859-7\nel_GR@euro:\t\t\t\t\tel_"..., 4096) = 4096
4659  09:32:29.488185 read(6, "\tes_DO.ISO8859-1\nes_DO.iso88591:\t"..., 4096) = 4096
4659  09:32:29.488283 read(6, "\t\t\t\tfr_BE.ISO8859-1\nfr_BE.88591.e"..., 4096) = 4096
4659  09:32:29.488399 read(6, "nesian (now id).  These lines sho"..., 4096) = 4096
4659  09:32:29.488501 read(6, "5\nmk_MK.cp1251:\t\t\t\t\tmk_MK.CP1251\n"..., 4096) = 4096
4659  09:32:29.488604 read(6, "15\npt_PT.ISO-8859-15@euro:\t\t\t\tpt_"..., 4096) = 4096
4659  09:32:29.488709 read(6, "E.ISO8859-1\nsv_SE.88591:\t\t\t\t\tsv_S"..., 4096) = 4096
4659  09:32:29.488812 read(6, "zu_ZA.utf8:\t\t\t\t\tzu_ZA.UTF-8\n\n# Th"..., 4096) = 2822
4659  09:32:29.488899 read(6, ""..., 4096) = 0
4659  09:32:29.488943 close(6)          = 0
4659  09:32:29.488986 munmap(0x6ff88000, 4096) = 0
4659  09:32:29.489038 open("/usr/share/X11/locale/locale.dir", O_RDONLY) = 6
4659  09:32:29.489097 fstat64(6, {st_mode=S_IFREG|0644, st_size=39195, ...}) = 0
4659  09:32:29.489170 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x6ff88000
4659  09:32:29.489218 read(6, "#\t$XdotOrg: lib/X11/nls/locale.di"..., 4096) = 4096
4659  09:32:29.489318 read(6, "ALE:\t\t\teo_EO.ISO8859-3\n# iso8859-"..., 4096) = 4096
4659  09:32:29.489417 read(6, "59-1\niso8859-1/XLC_LOCALE\t\t\tny_NO"..., 4096) = 4096
4659  09:32:29.489507 read(6, "\t\tar_TN.UTF-8\nen_US.UTF-8/XLC_LOC"..., 4096) = 4096
4659  09:32:29.489599 read(6, "/XLC_LOCALE\t\t\tmk_MK.UTF-8\nen_US.U"..., 4096) = 4096
4659  09:32:29.489680 close(6)          = 0
4659  09:32:29.489723 munmap(0x6ff88000, 4096) = 0
4659  09:32:29.489771 access("/usr/share/X11/locale/C/XLC_LOCALE", R_OK) = 0
4659  09:32:29.489831 open("/usr/share/X11/locale/C/XLC_LOCALE", O_RDONLY) = 6
4659  09:32:29.489890 fstat64(6, {st_mode=S_IFREG|0644, st_size=772, ...}) = 0
4659  09:32:29.489963 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x6ff88000
4659  09:32:29.490010 read(6, "#  $Xorg: C,v 1.3 2000/08/17 19:4"..., 4096) = 772
4659  09:32:29.490129 read(6, ""..., 4096) = 0
4659  09:32:29.490173 close(6)          = 0
4659  09:32:29.490217 munmap(0x6ff88000, 4096) = 0
4659  09:32:29.490608 read(4, ""..., 4096) = 0
4659  09:32:29.490665 close(4)          = 0
4659  09:32:29.490709 munmap(0x6ff89000, 4096) = 0
4659  09:32:29.490767 select(4, [3], [3], NULL, NULL) = 1 (out [3])
4659  09:32:29.490823 writev(3, [{"\220\1\5\0\1\0\1\0\0\0\0\0\0\0\0\0\5\0\5\0\220\1\4\0\1\0\2\0\0\0\0\0\7"..., 356}, {NULL, 0}, {""..., 0}], 3) = 356
4659  09:32:29.496610 select(4, [3], [], NULL, NULL) = 1 (in [3])
4659  09:32:29.499184 read(3, "`\1\v\0\1\31\6\0\1\0\22\0\10\377\0\0C\1C\1\0\0\0\0\0\0\0\0\0\0\1C`"..., 4096) = 2464
4659  09:32:29.499699 read(3, 0x87e2850, 4096) = -1 EAGAIN (Resource temporarily unavailable)
4659  09:32:29.499929 read(3, 0x87e2850, 4096) = -1 EAGAIN (Resource temporarily unavailable)
4659  09:32:29.500036 read(3, 0x87e2850, 4096) = -1 EAGAIN (Resource temporarily unavailable)
4659  09:32:29.500150 select(4, [3], [3], NULL, NULL) = 1 (out [3])
4659  09:32:29.500339 writev(3, [{"\220\10\7\0\1\0\0\0\2\0\0\0\26K\0\0\0\0\0\0\0\0\0\0\0\0\2\0"..., 28}, {NULL, 0}, {""..., 0}], 3) = 28
4659  09:32:29.500496 select(4, [3], [], NULL, NULL) = 1 (in [3])
4659  09:32:29.504436 read(3, "\1\1\37\0+\1\0\0\0\0\10\377\2\0\0\0\31\26\223\0K\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 1228
4659  09:32:29.504621 read(3, 0x87e2850, 4096) = -1 EAGAIN (Resource temporarily unavailable)
4659  09:32:29.505186 select(4, [3], [3], NULL, NULL) = 1 (out [3])
4659  09:32:29.505244 writev(3, [{"d\1\3\0\246\1\0\0U\377\0\0d\1\3\0\247\1\0\0V\377\0\0v\3\7\0002>\0\0\0"..., 52}, {NULL, 0}, {""..., 0}], 3) = 52
4659  09:32:29.505331 select(4, [3], [], NULL, NULL) = 1 (in [3])
4659  09:32:35.629358 read(3, "`\1 \0\3720\6\0\1\0\22\0\10\377\0\0\246\1\246\1\0\0\0\0\0\0\0\0\0\0\1\246`"..., 4096) = 416
4659  09:32:35.629460 read(3, 0x87e2850, 4096) = -1 EAGAIN (Resource temporarily unavailable)
4659  09:32:35.629527 select(4, [3], [3], NULL, NULL) = 1 (out [3])
4659  09:32:35.629592 writev(3, [{"<\1\2\0\0\0`\2+\377\1\0"..., 12}, {NULL, 0}, {""..., 0}], 3) = 12
4659  09:32:35.629658 select(4, [3], [], NULL, NULL) = 1 (in [3])
4659  09:32:36.638211 read(3, "\1\1$\0\0\0\0\0$\0\300\1\0\0\0\0\4\0\0\0p\0\0\0p\343%\t\0\0\0\0"..., 4096) = 32
4659  09:32:36.638304 read(3, 0x87e2850, 4096) = -1 EAGAIN (Resource temporarily unavailable)
4659  09:32:36.638373 close(3)          = 0
4659  09:32:36.638473 exit_group(1)     = ?

^ permalink raw reply	[flat|nested] 216+ messages in thread

* [updated] BFS vs. mainline scheduler benchmarks and measurements
  2009-09-06 20:59 BFS vs. mainline scheduler benchmarks and measurements Ingo Molnar
                   ` (5 preceding siblings ...)
  2009-09-08 12:57 ` Epic regression in throughput since v2.6.23 Serge Belyshev
@ 2009-09-10  7:43 ` Ingo Molnar
  2009-09-14  9:46 ` Phoronix CFS vs BFS bencharks Nikos Chantziaras
  7 siblings, 0 replies; 216+ messages in thread
From: Ingo Molnar @ 2009-09-10  7:43 UTC (permalink / raw)
  To: linux-kernel; +Cc: Peter Zijlstra, Mike Galbraith, Con Kolivas


* Ingo Molnar <mingo@elte.hu> wrote:

>   OLTP performance (postgresql + sysbench)
>      http://redhat.com/~mingo/misc/bfs-vs-tip-oltp.jpg

To everyone who might care about this, i've updated the sysbench 
results to latest -tip:

    http://redhat.com/~mingo/misc/bfs-vs-tip-oltp-v2.jpg

This double-checks the effects of the various interactivity fixlets 
in the scheduler tree (whose interactivity effects were 
mentioned/documented in the various threads on lkml) in the 
throughput space too - and they also improved sysbench performance.

Con, i'd also like to thank you for raising general interest in 
scheduler latencies once more by posting the BFS patch. It gave us 
more bug reports upstream and gave us desktop users willing to test 
patches, which in turn helps us improve the code. When users choose 
to suffer in silence, that is never helpful.

BFS isnt particularly strong in this graph - from having looked at 
the workload under BFS my impression was that this is primarily due 
to you having cut out much of the sched-domains SMP load-balancer 
code. BFS 'insta-balances' very aggressively, which hurts 
cache-affine workloads rather visibly.

You might want to have a look at that design detail if you care - 
load-balancing is in significant parts orthogonal to the basic 
design of a fair scheduler.

For example we kept much of the existing load-balancer when we went 
to CFS in v2.6.23 - the fairness engine and the load-balancer are in 
large parts independent units of code and can be improved/tweaked 
separately.

There's interactions, but the concepts are largely separate.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-10  7:33                               ` Jens Axboe
@ 2009-09-10  7:49                                 ` Ingo Molnar
  2009-09-10  7:53                                   ` Jens Axboe
  0 siblings, 1 reply; 216+ messages in thread
From: Ingo Molnar @ 2009-09-10  7:49 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Peter Zijlstra, Mike Galbraith, Con Kolivas, linux-kernel


* Jens Axboe <jens.axboe@oracle.com> wrote:

> On Thu, Sep 10 2009, Jens Axboe wrote:
> > On Thu, Sep 10 2009, Peter Zijlstra wrote:
> > > On Wed, 2009-09-09 at 14:20 +0200, Jens Axboe wrote:
> > > > 
> > > > One thing I also noticed is that when I have logged in, I 
> > > > run xmodmap manually to load some keymappings (I always tell 
> > > > myself to add this to the log in scripts, but I 
> > > > suspend/resume this laptop for weeks at the time and forget 
> > > > before the next boot). With the stock kernel, xmodmap will 
> > > > halt X updates and take forever to run. With BFS, it 
> > > > returned instantly. As I would expect.
> > > 
> > > Can you provide a little more detail (I'm a xmodmap n00b), how does one
> > > run xmodmap and maybe provide your xmodmap config?
> > 
> > Will do, let me get the notebook and strace time it on both bfs and
> > mainline.
> 
> Here's the result of running perf stat xmodmap .xmodmap-carl on 
> the notebook. I have attached the .xmodmap-carl file, it's pretty 
> simple. I have also attached the output of strace -o foo -f -tt 
> xmodmap .xmodmap-carl when run on 2.6.31-rc9.
> 
> 2.6.31-rc9-bfs210
> 
>  Performance counter stats for 'xmodmap .xmodmap-carl':
> 
>      153.994976  task-clock-msecs         #      0.990 CPUs   (scaled from 99.86%)
>               0  context-switches         #      0.000 M/sec  (scaled from 99.86%)
>               0  CPU-migrations           #      0.000 M/sec  (scaled from 99.86%)
>             315  page-faults              #      0.002 M/sec  (scaled from 99.86%)
>   <not counted>  cycles                  
>   <not counted>  instructions            
>   <not counted>  cache-references        
>   <not counted>  cache-misses            
> 
>     0.155573406  seconds time elapsed

(Side question: what hardware is this - why are there no hw 
counters? Could you post the /proc/cpuinfo?)

> 2.6.31-rc9
> 
>  Performance counter stats for 'xmodmap .xmodmap-carl':
> 
>        8.529265  task-clock-msecs         #      0.001 CPUs 
>              23  context-switches         #      0.003 M/sec
>               1  CPU-migrations           #      0.000 M/sec
>             315  page-faults              #      0.037 M/sec
>   <not counted>  cycles                  
>   <not counted>  instructions            
>   <not counted>  cache-references        
>   <not counted>  cache-misses            
> 
>    11.804293482  seconds time elapsed

Thanks - so we context-switch 23 times - possibly to Xorg. But 11 
seconds is extremely long. Will try to reproduce it.

	Ingo

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-10  7:49                                 ` Ingo Molnar
@ 2009-09-10  7:53                                   ` Jens Axboe
  2009-09-10 10:02                                     ` Ingo Molnar
  0 siblings, 1 reply; 216+ messages in thread
From: Jens Axboe @ 2009-09-10  7:53 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Peter Zijlstra, Mike Galbraith, Con Kolivas, linux-kernel

On Thu, Sep 10 2009, Ingo Molnar wrote:
> 
> * Jens Axboe <jens.axboe@oracle.com> wrote:
> 
> > On Thu, Sep 10 2009, Jens Axboe wrote:
> > > On Thu, Sep 10 2009, Peter Zijlstra wrote:
> > > > On Wed, 2009-09-09 at 14:20 +0200, Jens Axboe wrote:
> > > > > 
> > > > > One thing I also noticed is that when I have logged in, I 
> > > > > run xmodmap manually to load some keymappings (I always tell 
> > > > > myself to add this to the log in scripts, but I 
> > > > > suspend/resume this laptop for weeks at the time and forget 
> > > > > before the next boot). With the stock kernel, xmodmap will 
> > > > > halt X updates and take forever to run. With BFS, it 
> > > > > returned instantly. As I would expect.
> > > > 
> > > > Can you provide a little more detail (I'm a xmodmap n00b), how does one
> > > > run xmodmap and maybe provide your xmodmap config?
> > > 
> > > Will do, let me get the notebook and strace time it on both bfs and
> > > mainline.
> > 
> > Here's the result of running perf stat xmodmap .xmodmap-carl on 
> > the notebook. I have attached the .xmodmap-carl file, it's pretty 
> > simple. I have also attached the output of strace -o foo -f -tt 
> > xmodmap .xmodmap-carl when run on 2.6.31-rc9.
> > 
> > 2.6.31-rc9-bfs210
> > 
> >  Performance counter stats for 'xmodmap .xmodmap-carl':
> > 
> >      153.994976  task-clock-msecs         #      0.990 CPUs   (scaled from 99.86%)
> >               0  context-switches         #      0.000 M/sec  (scaled from 99.86%)
> >               0  CPU-migrations           #      0.000 M/sec  (scaled from 99.86%)
> >             315  page-faults              #      0.002 M/sec  (scaled from 99.86%)
> >   <not counted>  cycles                  
> >   <not counted>  instructions            
> >   <not counted>  cache-references        
> >   <not counted>  cache-misses            
> > 
> >     0.155573406  seconds time elapsed
> 
> (Side question: what hardware is this - why are there no hw 
> counters? Could you post the /proc/cpuinfo?)

Sure, attached. It's a Thinkpad x60, core duo. Nothing fancy. The perf
may be a bit dated.

I went to try -tip btw, but it crashes on boot. Here's the backtrace,
typed manually, it's crashing in queue_work_on+0x28/0x60.

Call Trace:
        queue_work
        schedule_work
        clocksource_mark_unstable
        mark_tsc_unstable
        check_tsc_sync_source
        native_cpu_up
        relay_hotcpu_callback
        do_fork_idle
        _cpu_up
        cpu_up
        kernel_init
        kernel_thread_helper

> >  Performance counter stats for 'xmodmap .xmodmap-carl':
> > 
> >        8.529265  task-clock-msecs         #      0.001 CPUs 
> >              23  context-switches         #      0.003 M/sec
> >               1  CPU-migrations           #      0.000 M/sec
> >             315  page-faults              #      0.037 M/sec
> >   <not counted>  cycles                  
> >   <not counted>  instructions            
> >   <not counted>  cache-references        
> >   <not counted>  cache-misses            
> > 
> >    11.804293482  seconds time elapsed
> 
> Thanks - so we context-switch 23 times - possibly to Xorg. But 11 
> seconds is extremely long. Will try to reproduce it.

There's also the strace info with timings. Xorg is definitely involved,
during those 11s things stop updating completely.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-10  7:04                               ` Ingo Molnar
@ 2009-09-10  9:44                                 ` Jens Axboe
  2009-09-10  9:45                                   ` Jens Axboe
  2009-09-10 13:53                                   ` Steven Rostedt
  0 siblings, 2 replies; 216+ messages in thread
From: Jens Axboe @ 2009-09-10  9:44 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Mike Galbraith, Con Kolivas, linux-kernel, srostedt

On Thu, Sep 10 2009, Ingo Molnar wrote:
> 
> * Jens Axboe <jens.axboe@oracle.com> wrote:
> 
> > On Thu, Sep 10 2009, Peter Zijlstra wrote:
> > > On Wed, 2009-09-09 at 14:20 +0200, Jens Axboe wrote:
> > > > 
> > > > One thing I also noticed is that when I have logged in, I run xmodmap
> > > > manually to load some keymappings (I always tell myself to add this to
> > > > the log in scripts, but I suspend/resume this laptop for weeks at the
> > > > time and forget before the next boot). With the stock kernel, xmodmap
> > > > will halt X updates and take forever to run. With BFS, it returned
> > > > instantly. As I would expect.
> > > 
> > > Can you provide a little more detail (I'm a xmodmap n00b), how 
> > > does one run xmodmap and maybe provide your xmodmap config?
> > 
> > Will do, let me get the notebook and strace time it on both bfs 
> > and mainline.
> 
> A 'perf stat' comparison would be nice as well - that will show us 
> events strace doesnt show, and shows us the basic scheduler behavior 
> as well.
> 
> A 'full' trace could be done as well via trace-cmd.c (attached), if 
> you enable:
> 
>   CONFIG_CONTEXT_SWITCH_TRACER=y
> 
> and did something like:
> 
>   trace-cmd -s xmodmap ... > trace.txt

trace.txt attached. Steven, you seem to go through a lot of trouble to
find the debugfs path, yet at the very end do:

> 	system("cat /debug/tracing/trace");

which doesn't seem quite right :-)

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-10  9:44                                 ` Jens Axboe
@ 2009-09-10  9:45                                   ` Jens Axboe
  2009-09-10 13:53                                   ` Steven Rostedt
  1 sibling, 0 replies; 216+ messages in thread
From: Jens Axboe @ 2009-09-10  9:45 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Mike Galbraith, Con Kolivas, linux-kernel, srostedt

[-- Attachment #1: Type: text/plain, Size: 165 bytes --]

On Thu, Sep 10 2009, Jens Axboe wrote:
> trace.txt attached.

Now it really is, I very much need a more clever MUA to help me with
these things :-)

-- 
Jens Axboe


[-- Attachment #2: trace.txt.bz2 --]
[-- Type: application/octet-stream, Size: 241339 bytes --]

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-09 18:04                           ` Ingo Molnar
  2009-09-09 20:12                             ` Nikos Chantziaras
@ 2009-09-10  9:48                             ` Jens Axboe
  2009-09-10  9:59                               ` Ingo Molnar
  1 sibling, 1 reply; 216+ messages in thread
From: Jens Axboe @ 2009-09-10  9:48 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Mike Galbraith, Peter Zijlstra, Con Kolivas, linux-kernel

On Wed, Sep 09 2009, Ingo Molnar wrote:
> At least in my tests these latencies were mainly due to a bug in 
> latt.c - i've attached the fixed version.

What bug? I don't see any functional change between the version you
attach and the current one.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-10  6:40                                 ` Ingo Molnar
@ 2009-09-10  9:54                                   ` Jens Axboe
  2009-09-10 10:03                                     ` Ingo Molnar
  0 siblings, 1 reply; 216+ messages in thread
From: Jens Axboe @ 2009-09-10  9:54 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Nikos Chantziaras, Mike Galbraith, Peter Zijlstra, Con Kolivas,
	linux-kernel

On Thu, Sep 10 2009, Ingo Molnar wrote:
> 
> * Ingo Molnar <mingo@elte.hu> wrote:
> 
> > > However, the interactivity problems still remain.  Does that 
> > > mean it's not a latency issue?
> > 
> > It means that Jens's test-app, which demonstrated and helped us 
> > fix the issue for him does not help us fix it for you just yet.
> 
> Lemme qualify that by saying that Jens's issues are improved not 
> fixed [he has not re-run with latest latt.c yet] but not all things 
> are fully fixed yet. For example the xmodmap thing sounds 
> interesting - could that be a child-runs-first effect?

I thought so too, so when -tip failed to boot I pulled the patches from
Mike into 2.6.31. It doesn't change anything for xmodmap, though.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-10  9:48                             ` BFS vs. mainline scheduler benchmarks and measurements Jens Axboe
@ 2009-09-10  9:59                               ` Ingo Molnar
  2009-09-10 10:01                                 ` Jens Axboe
  0 siblings, 1 reply; 216+ messages in thread
From: Ingo Molnar @ 2009-09-10  9:59 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Mike Galbraith, Peter Zijlstra, Con Kolivas, linux-kernel


* Jens Axboe <jens.axboe@oracle.com> wrote:

> On Wed, Sep 09 2009, Ingo Molnar wrote:
> > At least in my tests these latencies were mainly due to a bug in 
> > latt.c - i've attached the fixed version.
> 
> What bug? I don't see any functional change between the version 
> you attach and the current one.

Here's the diff of what i fixed yesterday over the last latt.c 
version i found in this thread. The poll() thing is the significant 
one.

	Ingo

--- latt.c.orig
+++ latt.c
@@ -39,6 +39,7 @@ static unsigned int verbose;
 struct stats
 {
 	double n, mean, M2, max;
+	int max_pid;
 };
 
 static void update_stats(struct stats *stats, unsigned long long val)
@@ -85,22 +86,6 @@ static double stddev_stats(struct stats 
 	return sqrt(variance);
 }
 
-/*
- * The std dev of the mean is related to the std dev by:
- *
- *             s
- * s_mean = -------
- *          sqrt(n)
- *
- */
-static double stddev_mean_stats(struct stats *stats)
-{
-	double variance = stats->M2 / (stats->n - 1);
-	double variance_mean = variance / stats->n;
-
-	return sqrt(variance_mean);
-}
-
 struct stats delay_stats;
 
 static int pipes[MAX_CLIENTS*2][2];
@@ -212,7 +197,7 @@ static unsigned long usec_since(struct t
 static void log_delay(unsigned long delay)
 {
 	if (verbose) {
-		fprintf(stderr, "log delay %8lu usec\n", delay);
+		fprintf(stderr, "log delay %8lu usec (pid %d)\n", delay, getpid());
 		fflush(stderr);
 	}
 
@@ -300,7 +285,7 @@ static int __write_ts(int i, struct time
 	return write(fd, ts, sizeof(*ts)) != sizeof(*ts);
 }
 
-static long __read_ts(int i, struct timespec *ts)
+static long __read_ts(int i, struct timespec *ts, pid_t *cpids)
 {
 	int fd = pipes[2*i+1][0];
 	struct timespec t;
@@ -309,11 +294,14 @@ static long __read_ts(int i, struct time
 		return -1;
 
 	log_delay(usec_since(ts, &t));
+	if (verbose)
+		fprintf(stderr, "got delay %ld from child %d [pid %d]\n", usec_since(ts, &t), i, cpids[i]);
 
 	return 0;
 }
 
-static int read_ts(struct pollfd *pfd, unsigned int nr, struct timespec *ts)
+static int read_ts(struct pollfd *pfd, unsigned int nr, struct timespec *ts,
+		   pid_t *cpids)
 {
 	unsigned int i;
 
@@ -322,7 +310,7 @@ static int read_ts(struct pollfd *pfd, u
 			return -1L;
 		if (pfd[i].revents & POLLIN) {
 			pfd[i].events = 0;
-			if (__read_ts(i, &ts[i]))
+			if (__read_ts(i, &ts[i], cpids))
 				return -1L;
 			nr--;
 		}
@@ -368,7 +356,6 @@ static void run_parent(pid_t *cpids)
 	srand(1234);
 
 	do {
-		unsigned long delay;
 		unsigned pending_events;
 
 		do_rand_sleep();
@@ -404,17 +391,17 @@ static void run_parent(pid_t *cpids)
 		 */
 		pending_events = clients;
 		while (pending_events) {
-			int evts = poll(ipfd, clients, 0);
+			int evts = poll(ipfd, clients, -1);
 
 			if (evts < 0) {
 				do_exit = 1;
 				break;
 			} else if (!evts) {
-				/* printf("bugger2\n"); */
+				printf("bugger2\n");
 				continue;
 			}
 
-			if (read_ts(ipfd, evts, t1)) {
+			if (read_ts(ipfd, evts, t1, cpids)) {
 				do_exit = 1;
 				break;
 			}
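
For reference on why that one-character poll() change matters: the 
third argument to poll() is a timeout in milliseconds. 0 means 
"return immediately even if nothing is ready yet", so the 
reply-collection loop in run_parent() spun instead of sleeping while 
waiting for the children; -1 means "block until an event arrives". A 
tiny stand-alone demo of just that semantic, independent of latt.c:

/* Demo of poll() timeout semantics - nothing latt.c specific here. */
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int fds[2];
	struct pollfd pfd;

	if (pipe(fds))		/* nothing is ever written to fds[1] */
		return 1;

	pfd.fd = fds[0];
	pfd.events = POLLIN;

	/* timeout 0: returns 0 right away - a loop around this busy-spins */
	printf("poll(..., 0) -> %d\n", poll(&pfd, 1, 0));

	/*
	 * poll(&pfd, 1, -1) would block here until data shows up - which
	 * is what the fixed latt.c wants: sleep until a child replies.
	 */
	return 0;
}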

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-10  9:59                               ` Ingo Molnar
@ 2009-09-10 10:01                                 ` Jens Axboe
  0 siblings, 0 replies; 216+ messages in thread
From: Jens Axboe @ 2009-09-10 10:01 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Mike Galbraith, Peter Zijlstra, Con Kolivas, linux-kernel

On Thu, Sep 10 2009, Ingo Molnar wrote:
> 
> * Jens Axboe <jens.axboe@oracle.com> wrote:
> 
> > On Wed, Sep 09 2009, Ingo Molnar wrote:
> > > At least in my tests these latencies were mainly due to a bug in 
> > > latt.c - i've attached the fixed version.
> > 
> > What bug? I don't see any functional change between the version 
> > you attach and the current one.
> 
> Here's the diff of what i fixed yesterday over the last latt.c 
> version i found in this thread. The poll() thing is the significant 
> one.

Ah indeed, thanks Ingo! I'm tempted to add some actual work processing
into latt as well, to see if that helps improve it.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-10  7:53                                   ` Jens Axboe
@ 2009-09-10 10:02                                     ` Ingo Molnar
  2009-09-10 10:09                                       ` Jens Axboe
  2009-09-10 18:00                                       ` [crash, bisected] Re: clocksource: Resolve cpu hotplug dead lock with TSC unstable Ingo Molnar
  0 siblings, 2 replies; 216+ messages in thread
From: Ingo Molnar @ 2009-09-10 10:02 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Peter Zijlstra, Mike Galbraith, Con Kolivas, linux-kernel


* Jens Axboe <jens.axboe@oracle.com> wrote:

> I went to try -tip btw, but it crashes on boot. Here's the 
> backtrace, typed manually, it's crashing in 
> queue_work_on+0x28/0x60.
> 
> Call Trace:
>         queue_work
>         schedule_work
>         clocksource_mark_unstable
>         mark_tsc_unstable
>         check_tsc_sync_source
>         native_cpu_up
>         relay_hotcpu_callback
>         do_fork_idle
>         _cpu_up
>         cpu_up
>         kernel_init
>         kernel_thread_helper

hm, that looks like an old bug i fixed days ago via:

  00a3273: Revert "x86: Make tsc=reliable override boot time stability checks"

Have you tested tip:master - do you still know which sha1?

	Ingo

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-10  9:54                                   ` Jens Axboe
@ 2009-09-10 10:03                                     ` Ingo Molnar
  2009-09-10 10:11                                       ` Jens Axboe
  0 siblings, 1 reply; 216+ messages in thread
From: Ingo Molnar @ 2009-09-10 10:03 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Nikos Chantziaras, Mike Galbraith, Peter Zijlstra, Con Kolivas,
	linux-kernel


* Jens Axboe <jens.axboe@oracle.com> wrote:

> On Thu, Sep 10 2009, Ingo Molnar wrote:
> > 
> > * Ingo Molnar <mingo@elte.hu> wrote:
> > 
> > > > However, the interactivity problems still remain.  Does that 
> > > > mean it's not a latency issue?
> > > 
> > > It means that Jens's test-app, which demonstrated and helped us 
> > > fix the issue for him does not help us fix it for you just yet.
> > 
> > Lemme qualify that by saying that Jens's issues are improved not 
> > fixed [he has not re-run with latest latt.c yet] but not all things 
> > are fully fixed yet. For example the xmodmap thing sounds 
> > interesting - could that be a child-runs-first effect?
> 
> I thought so too, so when -tip failed to boot I pulled the patches 
> from Mike into 2.6.31. It doesn't change anything for xmodmap, 
> though.

Note, you can access just the pristine scheduler patches by checking 
out and testing tip:sched/core - no need to pull them out and apply.

Your crash looks like clocksource related - that's in a separate 
topic which you can thus isolate if you use sched/core.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-10 10:02                                     ` Ingo Molnar
@ 2009-09-10 10:09                                       ` Jens Axboe
  2009-09-10 18:00                                       ` [crash, bisected] Re: clocksource: Resolve cpu hotplug dead lock with TSC unstable Ingo Molnar
  1 sibling, 0 replies; 216+ messages in thread
From: Jens Axboe @ 2009-09-10 10:09 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Peter Zijlstra, Mike Galbraith, Con Kolivas, linux-kernel

On Thu, Sep 10 2009, Ingo Molnar wrote:
> 
> * Jens Axboe <jens.axboe@oracle.com> wrote:
> 
> > I went to try -tip btw, but it crashes on boot. Here's the 
> > backtrace, typed manually, it's crashing in 
> > queue_work_on+0x28/0x60.
> > 
> > Call Trace:
> >         queue_work
> >         schedule_work
> >         clocksource_mark_unstable
> >         mark_tsc_unstable
> >         check_tsc_sync_source
> >         native_cpu_up
> >         relay_hotcpu_callback
> >         do_fork_idle
> >         _cpu_up
> >         cpu_up
> >         kernel_init
> >         kernel_thread_helper
> 
> hm, that looks like an old bug i fixed days ago via:
> 
>   00a3273: Revert "x86: Make tsc=reliable override boot time stability checks"
> 
> Have you tested tip:master - do you still know which sha1?

It was -tip pulled this morning, 2-3 hours ago. I don't have the sha
anymore, but it was a fresh pull today.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-10 10:03                                     ` Ingo Molnar
@ 2009-09-10 10:11                                       ` Jens Axboe
  2009-09-10 10:28                                         ` Jens Axboe
  0 siblings, 1 reply; 216+ messages in thread
From: Jens Axboe @ 2009-09-10 10:11 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Nikos Chantziaras, Mike Galbraith, Peter Zijlstra, Con Kolivas,
	linux-kernel

On Thu, Sep 10 2009, Ingo Molnar wrote:
> 
> * Jens Axboe <jens.axboe@oracle.com> wrote:
> 
> > On Thu, Sep 10 2009, Ingo Molnar wrote:
> > > 
> > > * Ingo Molnar <mingo@elte.hu> wrote:
> > > 
> > > > > However, the interactivity problems still remain.  Does that 
> > > > > mean it's not a latency issue?
> > > > 
> > > > It means that Jens's test-app, which demonstrated and helped us 
> > > > fix the issue for him does not help us fix it for you just yet.
> > > 
> > > Lemme qualify that by saying that Jens's issues are improved not 
> > > fixed [he has not re-run with latest latt.c yet] but not all things 
> > > are fully fixed yet. For example the xmodmap thing sounds 
> > > interesting - could that be a child-runs-first effect?
> > 
> > I thought so too, so when -tip failed to boot I pulled the patches 
> > from Mike into 2.6.31. It doesn't change anything for xmodmap, 
> > though.
> 
> Note, you can access just the pristine scheduler patches by checking 
> out and testing tip:sched/core - no need to pull them out and apply.
> 
> Your crash looks like clocksource related - that's in a separate 
> topic which you can thus isolate if you use sched/core.

I'm building sched/core now and will run the xmodmap test there.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-10 10:11                                       ` Jens Axboe
@ 2009-09-10 10:28                                         ` Jens Axboe
  2009-09-10 10:57                                           ` Mike Galbraith
  0 siblings, 1 reply; 216+ messages in thread
From: Jens Axboe @ 2009-09-10 10:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Nikos Chantziaras, Mike Galbraith, Peter Zijlstra, Con Kolivas,
	linux-kernel

On Thu, Sep 10 2009, Jens Axboe wrote:
> On Thu, Sep 10 2009, Ingo Molnar wrote:
> > 
> > * Jens Axboe <jens.axboe@oracle.com> wrote:
> > 
> > > On Thu, Sep 10 2009, Ingo Molnar wrote:
> > > > 
> > > > * Ingo Molnar <mingo@elte.hu> wrote:
> > > > 
> > > > > > However, the interactivity problems still remain.  Does that 
> > > > > > mean it's not a latency issue?
> > > > > 
> > > > > It means that Jens's test-app, which demonstrated and helped us 
> > > > > fix the issue for him does not help us fix it for you just yet.
> > > > 
> > > > Lemme qualify that by saying that Jens's issues are improved not 
> > > > fixed [he has not re-run with latest latt.c yet] but not all things 
> > > > are fully fixed yet. For example the xmodmap thing sounds 
> > > > interesting - could that be a child-runs-first effect?
> > > 
> > > I thought so too, so when -tip failed to boot I pulled the patches 
> > > from Mike into 2.6.31. It doesn't change anything for xmodmap, 
> > > though.
> > 
> > Note, you can access just the pristine scheduler patches by checking 
> > out and testing tip:sched/core - no need to pull them out and apply.
> > 
> > Your crash looks like clocksource related - that's in a separate 
> > topic which you can thus isolate if you use sched/core.
> 
> I'm building sched/core now and will run the xmodmap test there.

No difference. Then I tried switching NO_NEW_FAIR_SLEEPERS on, and then
I get:

 Performance counter stats for 'xmodmap .xmodmap-carl':

       9.009137  task-clock-msecs         #      0.447 CPUs 
             18  context-switches         #      0.002 M/sec
              1  CPU-migrations           #      0.000 M/sec
            315  page-faults              #      0.035 M/sec
  <not counted>  cycles                  
  <not counted>  instructions            
  <not counted>  cache-references        
  <not counted>  cache-misses            

    0.020167093  seconds time elapsed

Woot!

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-10 10:28                                         ` Jens Axboe
@ 2009-09-10 10:57                                           ` Mike Galbraith
  2009-09-10 11:09                                             ` Jens Axboe
  0 siblings, 1 reply; 216+ messages in thread
From: Mike Galbraith @ 2009-09-10 10:57 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Ingo Molnar, Nikos Chantziaras, Peter Zijlstra, Con Kolivas,
	linux-kernel

On Thu, 2009-09-10 at 12:28 +0200, Jens Axboe wrote:

> No difference. Then I tried switching NO_NEW_FAIR_SLEEPERS on, and then
> I get:
> 
>  Performance counter stats for 'xmodmap .xmodmap-carl':
> 
>        9.009137  task-clock-msecs         #      0.447 CPUs 
>              18  context-switches         #      0.002 M/sec
>               1  CPU-migrations           #      0.000 M/sec
>             315  page-faults              #      0.035 M/sec
>   <not counted>  cycles                  
>   <not counted>  instructions            
>   <not counted>  cache-references        
>   <not counted>  cache-misses            
> 
>     0.020167093  seconds time elapsed
> 
> Woot!

Something is very seriously hosed on that box... clock?

Can you turn it back on, and do..
	while sleep .1; do cat /proc/sched_debug >> foo; done 
..on one core, and (quickly;) xmodmap .xmodmap-carl, then send me a few
seconds worth (gzipped up) to eyeball?

	-Mike


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-10  1:02                                 ` Con Kolivas
@ 2009-09-10 11:03                                   ` Jens Axboe
  0 siblings, 0 replies; 216+ messages in thread
From: Jens Axboe @ 2009-09-10 11:03 UTC (permalink / raw)
  To: Con Kolivas
  Cc: Nikos Chantziaras, Ingo Molnar, Mike Galbraith, Peter Zijlstra,
	linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1096 bytes --]

On Thu, Sep 10 2009, Con Kolivas wrote:
> > It probably just means that latt isn't a good measure of the problem.
> > Which isn't really too much of a surprise.
> 
> And that's a real shame because this was one of the first real good attempts 
> I've seen to actually measure the difference, and I thank you for your 
> efforts Jens. I believe the reason it's limited is because all you're 
> measuring is time from wakeup and the test app isn't actually doing any work. 
> The issue is more than just waking up as fast as possible, it's then doing 
> some meaningful amount of work within a reasonable time frame as well. What 
> the "meaningful amount of work" and "reasonable time frame" are, remains a 
> mystery, but I guess could be added on to this testing app.

Here's a quickie addition that adds some work to the threads. The
latency measure is now 'when did I wake up and complete my work'. The
default work is filling a buffer with pseudo random data and then
compressing it with zlib. Default is 64kb of data, can be adjusted with
-x. -x0 turns off work processing.

-- 
Jens Axboe


[-- Attachment #2: latt.c --]
[-- Type: text/x-csrc, Size: 11245 bytes --]

/*
 * Simple latency tester that combines multiple processes.
 *
 * Compile: gcc -Wall -O2 -D_GNU_SOURCE -lrt -lm -lz -o latt latt.c
 *
 * Run with: latt -c8 'program --args'
 *
 * Options:
 *
 *	-cX	Use X number of clients
 *	-sX	Use X msec as the minimum sleep time for the parent
 *	-SX	Use X msec as the maximum sleep time for the parent
 *	-xX	Use X kb as the work buffer to randomize and compress
 *	-v	Print all delays as they are logged
 *
 * (C) Jens Axboe <jens.axboe@oracle.com> 2009
 * Fixes from Peter Zijlstra <a.p.zijlstra@chello.nl> to actually make it
 * measure what it was intended to measure.
 * Fix from Ingo Molnar for poll() using 0 as timeout (should be negative).
 *
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <getopt.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/time.h>
#include <sys/mman.h>
#include <time.h>
#include <math.h>
#include <poll.h>
#include <pthread.h>
#include <zlib.h>
#include <string.h>
#include <signal.h>	/* signal(), kill(), SIGINT/SIGQUIT */

/*
 * In msecs
 */
static unsigned int min_delay = 100;
static unsigned int max_delay = 500;
static unsigned int clients = 1;
static unsigned int compress_size = 64 * 1024;
static unsigned int verbose;

#define print_if_verbose(args...)		\
	do {					\
	if (verbose) {				\
		fprintf(stderr, ##args);	\
		fflush(stderr);			\
	}					\
	} while (0)

#define MAX_CLIENTS		512

static int pipes[MAX_CLIENTS*2][2];
static pid_t app_pid;

#define CLOCKSOURCE		CLOCK_MONOTONIC

struct stats
{
	double n, mean, M2, max, max_client;
} delay_stats;


/*
 * Running mean and M2 (sum of squared deltas from the mean) via
 * Welford's online algorithm; stddev_stats() below turns M2 into a
 * variance.
 */
static void update_stats(struct stats *stats, unsigned long long val)
{
	double delta, x = val;

	stats->n++;
	delta = x - stats->mean;
	stats->mean += delta / stats->n;
	stats->M2 += delta*(x - stats->mean);

	if (stats->max < x)
		stats->max = x;
}

static unsigned long nr_stats(struct stats *stats)
{
	return stats->n;
}

static double max_stats(struct stats *stats)
{
	return stats->max;
}

static double avg_stats(struct stats *stats)
{
	return stats->mean;
}

/*
 * http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
 *
 *       (\Sum n_i^2) - ((\Sum n_i)^2)/n
 * s^2 = -------------------------------
 *                  n - 1
 *
 * http://en.wikipedia.org/wiki/Stddev
 */
static double stddev_stats(struct stats *stats)
{
	double variance = stats->M2 / (stats->n - 1);

	return sqrt(variance);
}

/*
 * The std dev of the mean is related to the std dev by:
 *
 *             s
 * s_mean = -------
 *          sqrt(n)
 *
 */
static double stddev_mean_stats(struct stats *stats)
{
	double variance = stats->M2 / (stats->n - 1);
	double variance_mean = variance / stats->n;

	return sqrt(variance_mean);
}

struct sem {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int value;
	int waiters;
};

static void init_sem(struct sem *sem)
{
	pthread_mutexattr_t attr;
	pthread_condattr_t cond;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
	pthread_condattr_init(&cond);
	pthread_condattr_setpshared(&cond, PTHREAD_PROCESS_SHARED);
	pthread_cond_init(&sem->cond, &cond);
	pthread_mutex_init(&sem->lock, &attr);

	sem->value = 0;
	sem->waiters = 0;
}

static void sem_down(struct sem *sem)
{
	pthread_mutex_lock(&sem->lock);

	while (!sem->value) {
		sem->waiters++;
		pthread_cond_wait(&sem->cond, &sem->lock);
		sem->waiters--;
	}

	sem->value--;
	pthread_mutex_unlock(&sem->lock);
}

static void sem_up(struct sem *sem)
{
	pthread_mutex_lock(&sem->lock);
	if (!sem->value && sem->waiters)
		pthread_cond_signal(&sem->cond);
	sem->value++;
	pthread_mutex_unlock(&sem->lock);
}

static int parse_options(int argc, char *argv[])
{
	struct option l_opts[] = {
		{ "min-delay (msec)", 	1, 	NULL,	's' },
		{ "max-delay (msec)",	1,	NULL,	'S' },
		{ "clients",		1,	NULL,	'c' },
		{ "compress-size (kb)",	1,	NULL,	'x' },
		{ "verbose",		0,	NULL,	'v' },
		{ "help",		0,	NULL,	'h' },
		{ NULL,					    },
	};
	int c, res, index = 0, i;

	while ((c = getopt_long(argc, argv, "s:S:c:vhx:", l_opts, &res)) != -1) {
		index++;
		switch (c) {
			case 's':
				min_delay = atoi(optarg);
				break;
			case 'S':
				max_delay = atoi(optarg);
				break;
			case 'c':
				clients = atoi(optarg);
				if (clients > MAX_CLIENTS) {
					clients = MAX_CLIENTS;
					printf("Capped clients to %d\n", clients);
				}
				break;
			case 'v':
				verbose = 1;
				break;
			case 'h':
				for (i = 0; l_opts[i].name; i++)
					printf("-%c: %s\n", l_opts[i].val, l_opts[i].name);
				break;
			case 'x':
				compress_size = atoi(optarg);
				compress_size *= 1024;
				break;
		}
	}

	return index + 1;
}

static pid_t fork_off(const char *app)
{
	pid_t pid;

	pid = fork();
	if (pid)
		return pid;

	exit(system(app));
}

static unsigned long usec_since(struct timespec *start, struct timespec *end)
{
	unsigned long long s, e;

	s = start->tv_sec * 1000000000ULL + start->tv_nsec;
	e =   end->tv_sec * 1000000000ULL +   end->tv_nsec;

	return (e - s) / 1000;
}

static void log_delay(unsigned long delay)
{
	print_if_verbose("log delay %8lu usec (pid=%d)\n", delay, getpid());

	update_stats(&delay_stats, delay);
}

static unsigned long pseed = 1;

static int get_rand(void)
{
	pseed = pseed * 1103515245 + 12345;
	return ((unsigned) (pseed / 65536) % 32768);
}

static void fill_random_data(int *buf, unsigned int nr_ints)
{
	int i;

	for (i = 0; i < nr_ints; i++)
		buf[i] = get_rand();
}

/*
 * Generate random data and compress it with zlib
 */
static void do_work(void)
{
	unsigned long work_size = compress_size;
	z_stream stream;
	int ret, bytes;
	void *zbuf;
	int *buf;

	memset(&stream, 0, sizeof(stream));
	deflateInit(&stream, 7);
	bytes = deflateBound(&stream, work_size);

	zbuf = malloc(bytes);
	buf = malloc(work_size);
	fill_random_data(buf, work_size / sizeof(int));

	stream.next_in = (void *) buf;
	stream.avail_in = work_size;
	stream.next_out = zbuf;
	stream.avail_out = bytes;

	do {
		ret = deflate(&stream, Z_FINISH);
	} while (ret == Z_OK);

	deflateEnd(&stream);
	free(buf);
	free(zbuf);
}

/*
 * Reads a timestamp (which is ignored, it's just a wakeup call), and replies
 * with the timestamp of when we saw it
 */
static void run_child(int *in, int *out, struct sem *sem)
{
	struct timespec ts;

	print_if_verbose("present: %d\n", getpid());

	sem_up(sem);

	do {
		int ret;

		ret = read(in[0], &ts, sizeof(ts));
		if (ret <= 0)
			break;
		else if (ret != sizeof(ts))
			break;

		if (compress_size)
			do_work();

		clock_gettime(CLOCKSOURCE, &ts);

		ret = write(out[1], &ts, sizeof(ts));
		if (ret <= 0)
			break;
		else if (ret != sizeof(ts))
			break;

		print_if_verbose("alive: %d\n", getpid());
	} while (1);
}

/*
 * Do a random sleep between min and max delay
 */
static void do_rand_sleep(void)
{
	unsigned int msecs;

	msecs = min_delay + ((float) max_delay * (rand() / (RAND_MAX + 1.0)));
	print_if_verbose("sleeping for: %u msec\n", msecs);
	usleep(msecs * 1000);
}

static void kill_connection(void)
{
	int i;

	for (i = 0; i < 2*clients; i++) {
		if (pipes[i][0] != -1) {
			close(pipes[i][0]);
			pipes[i][0] = -1;
		}
		if (pipes[i][1] != -1) {
			close(pipes[i][1]);
			pipes[i][1] = -1;
		}
	}
}

static int __write_ts(int i, struct timespec *ts)
{
	int fd = pipes[2*i][1];

	clock_gettime(CLOCKSOURCE, ts);

	return write(fd, ts, sizeof(*ts)) != sizeof(*ts);
}

static long __read_ts(int i, struct timespec *ts, pid_t *cpids)
{
	int fd = pipes[2*i+1][0];
	unsigned long delay;
	struct timespec t;

	if (read(fd, &t, sizeof(t)) != sizeof(t))
		return -1;

	delay = usec_since(ts, &t);
	log_delay(delay);
	print_if_verbose("got delay %ld from child %d [pid %d]\n", delay,
				i, cpids[i]);
	return 0;
}

static int read_ts(struct pollfd *pfd, unsigned int nr, struct timespec *ts,
		   pid_t *cpids)
{
	unsigned int i;

	for (i = 0; i < clients; i++) {
		if (pfd[i].revents & (POLLERR | POLLHUP | POLLNVAL))
			return -1L;
		if (pfd[i].revents & POLLIN) {
			pfd[i].events = 0;
			if (__read_ts(i, &ts[i], cpids))
				return -1L;
			nr--;
		}
		if (!nr)
			break;
	}

	return 0;
}

static int app_has_exited(void)
{
	int ret, status;

	/*
	 * If our app has exited, stop
	 */
	ret = waitpid(app_pid, &status, WNOHANG);
	if (ret < 0) {
		perror("waitpid");
		return 1;
	} else if (ret == app_pid &&
		   (WIFSIGNALED(status) || WIFEXITED(status))) {
		return 1;
	}

	return 0;
}

/*
 * While our given app is running, send a timestamp to each client and
 * log the maximum latency for each of them to wakeup and reply
 */
static void run_parent(pid_t *cpids)
{
	struct pollfd *ipfd;
	int do_exit = 0, i;
	struct timespec *t1;

	t1 = malloc(sizeof(struct timespec) * clients);
	ipfd = malloc(sizeof(struct pollfd) * clients);

	srand(1234);

	do {
		unsigned pending_events;

		do_rand_sleep();

		if (app_has_exited())
			break;

		for (i = 0; i < clients; i++) {
			ipfd[i].fd = pipes[2*i+1][0];
			ipfd[i].events = POLLIN;
		}

		/*
		 * Write wakeup calls
		 */
		for (i = 0; i < clients; i++) {
			print_if_verbose("waking: %d\n", cpids[i]);

			if (__write_ts(i, t1+i)) {
				do_exit = 1;
				break;
			}
		}

		if (do_exit)
			break;

		/*
		 * Poll and read replies
		 */
		pending_events = clients;
		while (pending_events) {
			int evts = poll(ipfd, clients, -1);

			if (evts < 0) {
				do_exit = 1;
				break;
			} else if (!evts)
				continue;

			if (read_ts(ipfd, evts, t1, cpids)) {
				do_exit = 1;
				break;
			}

			pending_events -= evts;
		}
	} while (!do_exit);

	kill_connection();
	free(t1);
	free(ipfd);
}

static void run_test(void)
{
	struct sem *sem;
	pid_t *cpids;
	int i, status;

	sem = mmap(NULL, sizeof(*sem), PROT_READ|PROT_WRITE,
			MAP_SHARED | MAP_ANONYMOUS, 0, 0);
	if (sem == MAP_FAILED) {
		perror("mmap");
		return;
	}

	init_sem(sem);

	for (i = 0; i < 2*clients; i++) {
		if (pipe(pipes[i])) {
			perror("pipe");
			return;
		}
	}

	cpids = malloc(sizeof(pid_t) * clients);

	for (i = 0; i < clients; i++) {
		cpids[i] = fork();
		if (cpids[i]) {
			sem_down(sem);
			continue;
		}

		run_child(pipes[2*i], pipes[2*i+1], sem);
		exit(0);
	}

	run_parent(cpids);

	for (i = 0; i < clients; i++)
		kill(cpids[i], SIGQUIT);
	for (i = 0; i < clients; i++)
		waitpid(cpids[i], &status, 0);

	free(cpids);
	munmap(sem, sizeof(*sem));
}

static void handle_sigint(int sig)
{
	kill(app_pid, SIGINT);
}

int main(int argc, char *argv[])
{
	int app_offset, off;
	char app[256];

	off = 0;
	app_offset = parse_options(argc, argv);
	if (app_offset >= argc) {
		printf("%s: [options] 'app'\n", argv[0]);
		return 1;
	}

	while (app_offset < argc) {
		if (off) {
			app[off] = ' ';
			off++;
		}
		off += sprintf(app + off, "%s", argv[app_offset]);
		app_offset++;
	}

	signal(SIGINT, handle_sigint);

	/*
	 * Start app and start logging latencies
	 */
	app_pid = fork_off(app);
	run_test();

	printf("\nParameters: min_wait=%ums, max_wait=%ums, clients=%u\n",
			min_delay, max_delay, clients);
	printf("Entries logged: %lu\n", nr_stats(&delay_stats));
	printf("\n             Averages\n");
	printf("-------------------------------------\n");
	printf("\tMax\t\t%8.0f usec\n", max_stats(&delay_stats));
	printf("\tAvg\t\t%8.0f usec\n", avg_stats(&delay_stats));
	printf("\tStdev\t\t%8.0f usec\n", stddev_stats(&delay_stats));
	printf("\tStdev mean\t%8.0f usec\n", stddev_mean_stats(&delay_stats));

	return 0;
}

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-10 10:57                                           ` Mike Galbraith
@ 2009-09-10 11:09                                             ` Jens Axboe
  2009-09-10 11:21                                               ` Mike Galbraith
  0 siblings, 1 reply; 216+ messages in thread
From: Jens Axboe @ 2009-09-10 11:09 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Ingo Molnar, Nikos Chantziaras, Peter Zijlstra, Con Kolivas,
	linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1228 bytes --]

On Thu, Sep 10 2009, Mike Galbraith wrote:
> On Thu, 2009-09-10 at 12:28 +0200, Jens Axboe wrote:
> 
> > No difference. Then I tried switching NO_NEW_FAIR_SLEEPERS on, and then
> > I get:
> > 
> >  Performance counter stats for 'xmodmap .xmodmap-carl':
> > 
> >        9.009137  task-clock-msecs         #      0.447 CPUs 
> >              18  context-switches         #      0.002 M/sec
> >               1  CPU-migrations           #      0.000 M/sec
> >             315  page-faults              #      0.035 M/sec
> >   <not counted>  cycles                  
> >   <not counted>  instructions            
> >   <not counted>  cache-references        
> >   <not counted>  cache-misses            
> > 
> >     0.020167093  seconds time elapsed
> > 
> > Woot!
> 
> Something is very seriously hosed on that box... clock?

model name      : Genuine Intel(R) CPU           T2400  @ 1.83GHz

Throttles down to 1.00GHz when idle.

> Can you turn it back on, and do..

I guess you mean turn NEW_FAIR_SLEEPERS back on, correct?

> 	while sleep .1; do cat /proc/sched_debug >> foo; done 
> ..on one core, and (quickly;) xmodmap .xmodmap-carl, then send me a few
> seconds worth (gzipped up) to eyeball?

Attached.

-- 
Jens Axboe


[-- Attachment #2: sched-debug-cat.bz2 --]
[-- Type: application/octet-stream, Size: 12209 bytes --]

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-10 11:09                                             ` Jens Axboe
@ 2009-09-10 11:21                                               ` Mike Galbraith
  2009-09-10 11:24                                                 ` Jens Axboe
  0 siblings, 1 reply; 216+ messages in thread
From: Mike Galbraith @ 2009-09-10 11:21 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Ingo Molnar, Nikos Chantziaras, Peter Zijlstra, Con Kolivas,
	linux-kernel

On Thu, 2009-09-10 at 13:09 +0200, Jens Axboe wrote:
> On Thu, Sep 10 2009, Mike Galbraith wrote:
> > On Thu, 2009-09-10 at 12:28 +0200, Jens Axboe wrote:
> > 
> > > No difference. Then I tried switching NO_NEW_FAIR_SLEEPERS on, and then
> > > I get:
> > > 
> > >  Performance counter stats for 'xmodmap .xmodmap-carl':
> > > 
> > >        9.009137  task-clock-msecs         #      0.447 CPUs 
> > >              18  context-switches         #      0.002 M/sec
> > >               1  CPU-migrations           #      0.000 M/sec
> > >             315  page-faults              #      0.035 M/sec
> > >   <not counted>  cycles                  
> > >   <not counted>  instructions            
> > >   <not counted>  cache-references        
> > >   <not counted>  cache-misses            
> > > 
> > >     0.020167093  seconds time elapsed
> > > 
> > > Woot!
> > 
> > Something is very seriously hosed on that box... clock?
> 
> model name      : Genuine Intel(R) CPU           T2400  @ 1.83GHz
> 
> Throttles down to 1.00GHz when idle.
> 
> > Can you turn it back on, and do..
> 
> I guess you mean turn NEW_FAIR_SLEEPERS back on, correct?
> 
> > 	while sleep .1; do cat /proc/sched_debug >> foo; done 
> > ..on one core, and (quickly;) xmodmap .xmodmap-carl, then send me a few
> > seconds worth (gzipped up) to eyeball?
> 
> Attached.

xmodmap doesn't seem to be running in this sample.

	-Mike


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-10 11:21                                               ` Mike Galbraith
@ 2009-09-10 11:24                                                 ` Jens Axboe
  2009-09-10 11:28                                                   ` Mike Galbraith
  0 siblings, 1 reply; 216+ messages in thread
From: Jens Axboe @ 2009-09-10 11:24 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Ingo Molnar, Nikos Chantziaras, Peter Zijlstra, Con Kolivas,
	linux-kernel

On Thu, Sep 10 2009, Mike Galbraith wrote:
> On Thu, 2009-09-10 at 13:09 +0200, Jens Axboe wrote:
> > On Thu, Sep 10 2009, Mike Galbraith wrote:
> > > On Thu, 2009-09-10 at 12:28 +0200, Jens Axboe wrote:
> > > 
> > > > No difference. Then I tried switching NO_NEW_FAIR_SLEEPERS on, and then
> > > > I get:
> > > > 
> > > >  Performance counter stats for 'xmodmap .xmodmap-carl':
> > > > 
> > > >        9.009137  task-clock-msecs         #      0.447 CPUs 
> > > >              18  context-switches         #      0.002 M/sec
> > > >               1  CPU-migrations           #      0.000 M/sec
> > > >             315  page-faults              #      0.035 M/sec
> > > >   <not counted>  cycles                  
> > > >   <not counted>  instructions            
> > > >   <not counted>  cache-references        
> > > >   <not counted>  cache-misses            
> > > > 
> > > >     0.020167093  seconds time elapsed
> > > > 
> > > > Woot!
> > > 
> > > Something is very seriously hosed on that box... clock?
> > 
> > model name      : Genuine Intel(R) CPU           T2400  @ 1.83GHz
> > 
> > Throttles down to 1.00GHz when idle.
> > 
> > > Can you turn it back on, and do..
> > 
> > I guess you mean turn NEW_FAIR_SLEEPERS back on, correct?
> > 
> > > 	while sleep .1; do cat /proc/sched_debug >> foo; done 
> > > ..on one core, and (quickly;) xmodmap .xmodmap-carl, then send me a few
> > > seconds worth (gzipped up) to eyeball?
> > 
> > Attached.
> 
> xmodmap doesn't seem to be running in this sample.

That's weird, it was definitely running. I did:

sleep 1; xmodmap .xmodmap-carl

in one xterm, and then switched to the other and ran the sched_debug
dump. I have to do it this way, as X will not move focus once xmodmap
starts running. It could be that xmodmap is mostly idle, and the real
work is done by Xorg and/or xfwm4 (my window manager).

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-10 11:24                                                 ` Jens Axboe
@ 2009-09-10 11:28                                                   ` Mike Galbraith
  2009-09-10 11:35                                                     ` Jens Axboe
  0 siblings, 1 reply; 216+ messages in thread
From: Mike Galbraith @ 2009-09-10 11:28 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Ingo Molnar, Nikos Chantziaras, Peter Zijlstra, Con Kolivas,
	linux-kernel

On Thu, 2009-09-10 at 13:24 +0200, Jens Axboe wrote:
> On Thu, Sep 10 2009, Mike Galbraith wrote:

> > xmodmap doesn't seem to be running in this sample.
> 
> That's weird, it was definitely running. I did:
> 
> sleep 1; xmodmap .xmodmap-carl
> 
> in one xterm, and then switched to the other and ran the sched_debug
> dump. I have to do it this way, as X will not move focus once xmodmap
> starts running. It could be that xmodmap is mostly idle, and the real
> work is done by Xorg and/or xfwm4 (my window manager).

Hm.  Ok, I'll crawl over it, see if anything falls out.

	-Mike


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-10 11:28                                                   ` Mike Galbraith
@ 2009-09-10 11:35                                                     ` Jens Axboe
  2009-09-10 11:42                                                       ` Mike Galbraith
  0 siblings, 1 reply; 216+ messages in thread
From: Jens Axboe @ 2009-09-10 11:35 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Ingo Molnar, Nikos Chantziaras, Peter Zijlstra, Con Kolivas,
	linux-kernel

On Thu, Sep 10 2009, Mike Galbraith wrote:
> On Thu, 2009-09-10 at 13:24 +0200, Jens Axboe wrote:
> > On Thu, Sep 10 2009, Mike Galbraith wrote:
> 
> > > xmodmap doesn't seem to be running in this sample.
> > 
> > That's weird, it was definitely running. I did:
> > 
> > sleep 1; xmodmap .xmodmap-carl
> > 
> > in one xterm, and then switched to the other and ran the sched_debug
> > dump. I have to do it this way, as X will not move focus once xmodmap
> > starts running. It could be that xmodmap is mostly idle, and the real
> > work is done by Xorg and/or xfwm4 (my window manager).
> 
> Hm.  Ok, I'll crawl over it, see if anything falls out.

That seems to be confirmed with the low context switch rate of the perf
stat of xmodmap. If I run perf stat -a to get a system wide collection
for xmodmap, I get:

 Performance counter stats for 'xmodmap .xmodmap-carl':

   20112.060925  task-clock-msecs         #      1.998 CPUs 
         629360  context-switches         #      0.031 M/sec
              8  CPU-migrations           #      0.000 M/sec
          13489  page-faults              #      0.001 M/sec
  <not counted>  cycles                  
  <not counted>  instructions            
  <not counted>  cache-references        
  <not counted>  cache-misses            

   10.067532449  seconds time elapsed

And again, system is idle while this is happening. Can't rule out that
this is some kind of user space bug of course.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-10 11:35                                                     ` Jens Axboe
@ 2009-09-10 11:42                                                       ` Mike Galbraith
  0 siblings, 0 replies; 216+ messages in thread
From: Mike Galbraith @ 2009-09-10 11:42 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Ingo Molnar, Nikos Chantziaras, Peter Zijlstra, Con Kolivas,
	linux-kernel

On Thu, 2009-09-10 at 13:35 +0200, Jens Axboe wrote:
> On Thu, Sep 10 2009, Mike Galbraith wrote:
> > On Thu, 2009-09-10 at 13:24 +0200, Jens Axboe wrote:
> > > On Thu, Sep 10 2009, Mike Galbraith wrote:
> > 
> > > > xmodmap doesn't seem to be running in this sample.
> > > 
> > > That's weird, it was definitely running. I did:
> > > 
> > > sleep 1; xmodmap .xmodmap-carl
> > > 
> > > in one xterm, and then switched to the other and ran the sched_debug
> > > dump. I have to do it this way, as X will not move focus once xmodmap
> > > starts running. It could be that xmodmap is mostly idle, and the real
> > > work is done by Xorg and/or xfwm4 (my window manager).
> > 
> > Hm.  Ok, I'll crawl over it, see if anything falls out.
> 
> That seems to be confirmed with the low context switch rate of the perf
> stat of xmodmap. If I run perf stat -a to get a system wide collection
> for xmodmap, I get:
> 
>  Performance counter stats for 'xmodmap .xmodmap-carl':
> 
>    20112.060925  task-clock-msecs         #      1.998 CPUs 
>          629360  context-switches         #      0.031 M/sec
>               8  CPU-migrations           #      0.000 M/sec
>           13489  page-faults              #      0.001 M/sec
>   <not counted>  cycles                  
>   <not counted>  instructions            
>   <not counted>  cache-references        
>   <not counted>  cache-misses            
> 
>    10.067532449  seconds time elapsed
> 
> And again, system is idle while this is happening. Can't rule out that
> this is some kind of user space bug of course.

All I'm seeing so far is massive CPU usage for a dinky job.

	-Mike


^ permalink raw reply	[flat|nested] 216+ messages in thread

* latt location (Was Re: BFS vs. mainline scheduler benchmarks and measurements)
  2009-09-09  7:38   ` Pavel Machek
@ 2009-09-10 12:19     ` Jens Axboe
  0 siblings, 0 replies; 216+ messages in thread
From: Jens Axboe @ 2009-09-10 12:19 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Ingo Molnar, Con Kolivas, linux-kernel, Peter Zijlstra, Mike Galbraith

On Wed, Sep 09 2009, Pavel Machek wrote:
> Could you post the source? Someone else might get us
> numbers... preferably on dualcore box or something...

Since it's posted in various places and by various people, I've put it
on the web now as well. Should always be the latest version.

http://kernel.dk/latt.c

Note that it requires zlib-devel packages to build now.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-10  9:44                                 ` Jens Axboe
  2009-09-10  9:45                                   ` Jens Axboe
@ 2009-09-10 13:53                                   ` Steven Rostedt
  1 sibling, 0 replies; 216+ messages in thread
From: Steven Rostedt @ 2009-09-10 13:53 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Ingo Molnar, Peter Zijlstra, Mike Galbraith, Con Kolivas, linux-kernel

On Thu, 2009-09-10 at 11:44 +0200, Jens Axboe wrote:
> On Thu, Sep 10 2009, Ingo Molnar wrote:

> trace.txt attached. Steven, you seem to go through a lot of trouble to
> find the debugfs path, yet at the very end do:
> 
> > 	system("cat /debug/tracing/trace");
> 
> which doesn't seem quite right :-)
> 

That's an older version of the tool. The newer version (still in alpha)
doesn't do that.

-- Steve



^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-10  6:08                               ` Ingo Molnar
  2009-09-10  6:40                                 ` Ingo Molnar
@ 2009-09-10 16:02                                 ` Bret Towe
  2009-09-10 16:05                                   ` Peter Zijlstra
  2009-09-10 17:53                                 ` Nikos Chantziaras
  2 siblings, 1 reply; 216+ messages in thread
From: Bret Towe @ 2009-09-10 16:02 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Nikos Chantziaras, Jens Axboe, Mike Galbraith, Peter Zijlstra,
	Con Kolivas, linux-kernel

On Wed, Sep 9, 2009 at 11:08 PM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Nikos Chantziaras <realnc@arcor.de> wrote:
>
>> On 09/09/2009 09:04 PM, Ingo Molnar wrote:
>>> [...]
>>> * Jens Axboe<jens.axboe@oracle.com>  wrote:
>>>
>>>> On Wed, Sep 09 2009, Jens Axboe wrote:
>>>>  [...]
>>>> BFS210 runs on the laptop (dual core intel core duo). With make -j4
>>>> running, I clock the following latt -c8 'sleep 10' latencies:
>>>>
>>>> -rc9
>>>>
>>>>          Max                17895 usec
>>>>          Avg                 8028 usec
>>>>          Stdev               5948 usec
>>>>          Stdev mean           405 usec
>>>>
>>>>          Max                17896 usec
>>>>          Avg                 4951 usec
>>>>          Stdev               6278 usec
>>>>          Stdev mean           427 usec
>>>>
>>>>          Max                17885 usec
>>>>          Avg                 5526 usec
>>>>          Stdev               6819 usec
>>>>          Stdev mean           464 usec
>>>>
>>>> -rc9 + mike
>>>>
>>>>          Max                 6061 usec
>>>>          Avg                 3797 usec
>>>>          Stdev               1726 usec
>>>>          Stdev mean           117 usec
>>>>
>>>>          Max                 5122 usec
>>>>          Avg                 3958 usec
>>>>          Stdev               1697 usec
>>>>          Stdev mean           115 usec
>>>>
>>>>          Max                 6691 usec
>>>>          Avg                 2130 usec
>>>>          Stdev               2165 usec
>>>>          Stdev mean           147 usec
>>>
>>> At least in my tests these latencies were mainly due to a bug in
>>> latt.c - i've attached the fixed version.
>>>
>>> The other reason was wakeup batching. If you do this:
>>>
>>>     echo 0>  /proc/sys/kernel/sched_wakeup_granularity_ns
>>>
>>> ... then you can switch on insta-wakeups on -tip too.
>>>
>>> With a dual-core box and a make -j4 background job running, on
>>> latest -tip i get the following latencies:
>>>
>>>   $ ./latt -c8 sleep 30
>>>   Entries: 656 (clients=8)
>>>
>>>   Averages:
>>>   ------------------------------
>>>      Max           158 usec
>>>      Avg            12 usec
>>>      Stdev          10 usec
>>
>> With your version of latt.c, I get these results with 2.6-tip vs
>> 2.6.31-rc9-bfs:
>>
>>
>> (mainline)
>> Averages:
>> ------------------------------
>>         Max            50 usec
>>         Avg            12 usec
>>         Stdev           3 usec
>>
>>
>> (BFS)
>> Averages:
>> ------------------------------
>>         Max           474 usec
>>         Avg            11 usec
>>         Stdev          16 usec
>>
>> However, the interactivity problems still remain.  Does that mean
>> it's not a latency issue?
>
> It means that Jens's test-app, which demonstrated and helped us fix
> the issue for him does not help us fix it for you just yet.
>
> The "fluidity problem" you described might not be a classic latency
> issue per se (which latt.c measures), but a timeslicing / CPU time
> distribution problem.
>
> A slight shift in CPU time allocation can change the flow of tasks
> to result in a 'choppier' system.
>
> Have you tried, in addition of the granularity tweaks you've done,
> to renice mplayer either up or down? (or compiz and Xorg for that
> matter)
>
> I'm not necessarily suggesting this as a 'real' solution (we really
> prefer kernels that just get it right) - but it's an additional
> parameter dimension along which you can tweak CPU time distribution
> on your box.
>
> Here's the general rule of thumb: mine one nice level gives plus 5%
> CPU time to a task and takes away 5% CPU time from another task -
> i.e. shifts the CPU allocation by 10%.
>
> ( this is modified by all sorts of dynamic conditions: by the number
>  of tasks running and their wakeup patters so not a rule cast into
>  stone - but still a good ballpark figure for CPU intense tasks. )
>
> Btw., i've read your descriptions about what you've tuned so far -
> have you seen/checked the wakeup_granularity tunable as well?
> Setting that to 0 will change the general balance of how CPU time is
> allocated between tasks too.
>
> There's also a whole bunch of scheduler features you can turn on/off
> individually via /debug/sched_features. For example, to turn off
> NEW_FAIR_SLEEPERS, you can do:
>
>  # cat /debug/sched_features
>  NEW_FAIR_SLEEPERS NO_NORMALIZED_SLEEPER ADAPTIVE_GRAN WAKEUP_PREEMPT
>  START_DEBIT AFFINE_WAKEUPS CACHE_HOT_BUDDY SYNC_WAKEUPS NO_HRTICK
>  NO_DOUBLE_TICK ASYM_GRAN LB_BIAS LB_WAKEUP_UPDATE ASYM_EFF_LOAD
>  NO_WAKEUP_OVERLAP LAST_BUDDY OWNER_SPIN
>
>  # echo NO_NEW_FAIR_SLEEPERS > /debug/sched_features
>
> Btw., NO_NEW_FAIR_SLEEPERS is something that will turn the scheduler
> into a more classic fair scheduler (like BFS is too).
>
> NO_START_DEBIT might be another thing that improves (or worsens :-/)
> make -j type of kernel build workloads.

Thanks to this thread and others I've seen several kernel tunables
that can affect how the scheduler performs/acts, but what I don't see
after a bit of looking is where all of these are documented. Perhaps
that's also part of the reason there are unhappy people with the
current code in the kernel - they just don't know how to tune it for
their workload.

> Note, these flags are all runtime tunables: the new settings take effect
> almost immediately (at the latest when a task has started up) and they
> are safe to change at runtime.
>
> It basically gives us 32768 pluggable schedulers each with a
> slightly separate algorithm - each setting in essence creates a new
> scheduler. (this mechanism is how we introduce new scheduler
> features and allow their debugging / regression-testing.)
>
> (okay, almost, so beware: turning on HRTICK might lock up your
> system.)
>
> Plus, yet another dimension of tuning on SMP systems (such as
> dual-core) are the sched-domains tunable. There's a whole world of
> tuning in that area and BFS essentially implements a very aggressive
> 'always balance to other CPUs' policy.
>
> I've attached my sched-tune-domains script which helps tune these
> parameters.
>
> For example on a testbox of mine it outputs:
>
> usage: tune-sched-domains <val>
> {cpu0/domain0:SIBLING} SD flag: 239
> +   1: SD_LOAD_BALANCE:          Do load balancing on this domain
> +   2: SD_BALANCE_NEWIDLE:       Balance when about to become idle
> +   4: SD_BALANCE_EXEC:          Balance on exec
> +   8: SD_BALANCE_FORK:          Balance on fork, clone
> -  16: SD_WAKE_IDLE:             Wake to idle CPU on task wakeup
> +  32: SD_WAKE_AFFINE:           Wake task to waking CPU
> +  64: SD_WAKE_BALANCE:          Perform balancing at task wakeup
> + 128: SD_SHARE_CPUPOWER:        Domain members share cpu power
> - 256: SD_POWERSAVINGS_BALANCE:  Balance for power savings
> - 512: SD_SHARE_PKG_RESOURCES:   Domain members share cpu pkg resources
> -1024: SD_SERIALIZE:             Only a single load balancing instance
> -2048: SD_WAKE_IDLE_FAR:         Gain latency sacrificing cache hit
> -4096: SD_PREFER_SIBLING:        Prefer to place tasks in a sibling domain
> {cpu0/domain1:MC} SD flag: 4735
> +   1: SD_LOAD_BALANCE:          Do load balancing on this domain
> +   2: SD_BALANCE_NEWIDLE:       Balance when about to become idle
> +   4: SD_BALANCE_EXEC:          Balance on exec
> +   8: SD_BALANCE_FORK:          Balance on fork, clone
> +  16: SD_WAKE_IDLE:             Wake to idle CPU on task wakeup
> +  32: SD_WAKE_AFFINE:           Wake task to waking CPU
> +  64: SD_WAKE_BALANCE:          Perform balancing at task wakeup
> - 128: SD_SHARE_CPUPOWER:        Domain members share cpu power
> - 256: SD_POWERSAVINGS_BALANCE:  Balance for power savings
> + 512: SD_SHARE_PKG_RESOURCES:   Domain members share cpu pkg resources
> -1024: SD_SERIALIZE:             Only a single load balancing instance
> -2048: SD_WAKE_IDLE_FAR:         Gain latency sacrificing cache hit
> +4096: SD_PREFER_SIBLING:        Prefer to place tasks in a sibling domain
> {cpu0/domain2:NODE} SD flag: 3183
> +   1: SD_LOAD_BALANCE:          Do load balancing on this domain
> +   2: SD_BALANCE_NEWIDLE:       Balance when about to become idle
> +   4: SD_BALANCE_EXEC:          Balance on exec
> +   8: SD_BALANCE_FORK:          Balance on fork, clone
> -  16: SD_WAKE_IDLE:             Wake to idle CPU on task wakeup
> +  32: SD_WAKE_AFFINE:           Wake task to waking CPU
> +  64: SD_WAKE_BALANCE:          Perform balancing at task wakeup
> - 128: SD_SHARE_CPUPOWER:        Domain members share cpu power
> - 256: SD_POWERSAVINGS_BALANCE:  Balance for power savings
> - 512: SD_SHARE_PKG_RESOURCES:   Domain members share cpu pkg resources
> +1024: SD_SERIALIZE:             Only a single load balancing instance
> +2048: SD_WAKE_IDLE_FAR:         Gain latency sacrificing cache hit
> -4096: SD_PREFER_SIBLING:        Prefer to place tasks in a sibling domain
>
> The way i can turn on say SD_WAKE_IDLE for the NODE domain is to:
>
>   tune-sched-domains 239 4735 $((3183+16))
>
> ( This is a pretty stone-age script i admit ;-)
>
> Thanks for all your testing so far,
>
>        Ingo
>

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-10 16:02                                 ` Bret Towe
@ 2009-09-10 16:05                                   ` Peter Zijlstra
  2009-09-10 16:12                                     ` Bret Towe
  0 siblings, 1 reply; 216+ messages in thread
From: Peter Zijlstra @ 2009-09-10 16:05 UTC (permalink / raw)
  To: Bret Towe
  Cc: Ingo Molnar, Nikos Chantziaras, Jens Axboe, Mike Galbraith,
	Con Kolivas, linux-kernel

On Thu, 2009-09-10 at 09:02 -0700, Bret Towe wrote:
> 
> thanks to this thread and others I've seen several kernel tunables
> that can effect how the scheduler performs/acts
> but what I don't see after a bit of looking is where all these are
> documented
> perhaps thats also part of the reason there are unhappy people with
> the current code in the kernel just because they don't know how
> to tune it for their workload

The thing is, ideally they should not need to poke at these. These knobs
are under CONFIG_SCHED_DEBUG, and that is exactly what they are for.
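
( Not from the original mail -- a quick way to check whether a given kernel
  was even built with these debug knobs is to look for CONFIG_SCHED_DEBUG in
  its config, e.g. depending on where the distro ships it:

    grep CONFIG_SCHED_DEBUG /boot/config-$(uname -r)
    zgrep CONFIG_SCHED_DEBUG /proc/config.gz
 )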


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-10 16:05                                   ` Peter Zijlstra
@ 2009-09-10 16:12                                     ` Bret Towe
  2009-09-10 16:26                                       ` Ingo Molnar
  0 siblings, 1 reply; 216+ messages in thread
From: Bret Towe @ 2009-09-10 16:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Nikos Chantziaras, Jens Axboe, Mike Galbraith,
	Con Kolivas, linux-kernel

On Thu, Sep 10, 2009 at 9:05 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> On Thu, 2009-09-10 at 09:02 -0700, Bret Towe wrote:
>>
>> thanks to this thread and others I've seen several kernel tunables
>> that can effect how the scheduler performs/acts
>> but what I don't see after a bit of looking is where all these are
>> documented
>> perhaps thats also part of the reason there are unhappy people with
>> the current code in the kernel just because they don't know how
>> to tune it for their workload
>
> The thing is, ideally they should not need to poke at these. These knobs
> are under CONFIG_SCHED_DEBUG, and that is exactly what they are for.

even then I would think they should be documented so people can find out
what item is hurting their workload so they can better report the bug no?


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-10 16:12                                     ` Bret Towe
@ 2009-09-10 16:26                                       ` Ingo Molnar
  2009-09-10 16:33                                         ` Bret Towe
  0 siblings, 1 reply; 216+ messages in thread
From: Ingo Molnar @ 2009-09-10 16:26 UTC (permalink / raw)
  To: Bret Towe
  Cc: Peter Zijlstra, Nikos Chantziaras, Jens Axboe, Mike Galbraith,
	Con Kolivas, linux-kernel


* Bret Towe <magnade@gmail.com> wrote:

> On Thu, Sep 10, 2009 at 9:05 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > On Thu, 2009-09-10 at 09:02 -0700, Bret Towe wrote:
> >>
> >> thanks to this thread and others I've seen several kernel tunables
> >> that can effect how the scheduler performs/acts
> >> but what I don't see after a bit of looking is where all these are
> >> documented
> >> perhaps thats also part of the reason there are unhappy people with
> >> the current code in the kernel just because they don't know how
> >> to tune it for their workload
> >
> > The thing is, ideally they should not need to poke at these. 
> > These knobs are under CONFIG_SCHED_DEBUG, and that is exactly 
> > what they are for.
> 
> even then I would think they should be documented so people can 
> find out what item is hurting their workload so they can better 
> report the bug no?

Would be happy to apply such documentation patches. You could also 
help start adding a 'scheduler performance' wiki portion to 
perf.wiki.kernel.org, if you have time for that.

	Ingo

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-10 16:26                                       ` Ingo Molnar
@ 2009-09-10 16:33                                         ` Bret Towe
  2009-09-10 17:03                                           ` Ingo Molnar
  0 siblings, 1 reply; 216+ messages in thread
From: Bret Towe @ 2009-09-10 16:33 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Nikos Chantziaras, Jens Axboe, Mike Galbraith,
	Con Kolivas, linux-kernel

On Thu, Sep 10, 2009 at 9:26 AM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Bret Towe <magnade@gmail.com> wrote:
>
>> On Thu, Sep 10, 2009 at 9:05 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>> > On Thu, 2009-09-10 at 09:02 -0700, Bret Towe wrote:
>> >>
>> >> thanks to this thread and others I've seen several kernel tunables
>> >> that can effect how the scheduler performs/acts
>> >> but what I don't see after a bit of looking is where all these are
>> >> documented
>> >> perhaps thats also part of the reason there are unhappy people with
>> >> the current code in the kernel just because they don't know how
>> >> to tune it for their workload
>> >
>> > The thing is, ideally they should not need to poke at these.
>> > These knobs are under CONFIG_SCHED_DEBUG, and that is exactly
>> > what they are for.
>>
>> even then I would think they should be documented so people can
>> find out what item is hurting their workload so they can better
>> report the bug no?
>
> Would be happy to apply such documentation patches. You could also
> help start adding a 'scheduler performance' wiki portion to
> perf.wiki.kernel.org, if you have time for that.

time isn't so much the issue but not having any clue as to what any
of the options do

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-10 16:33                                         ` Bret Towe
@ 2009-09-10 17:03                                           ` Ingo Molnar
  0 siblings, 0 replies; 216+ messages in thread
From: Ingo Molnar @ 2009-09-10 17:03 UTC (permalink / raw)
  To: Bret Towe
  Cc: Peter Zijlstra, Nikos Chantziaras, Jens Axboe, Mike Galbraith,
	Con Kolivas, linux-kernel


* Bret Towe <magnade@gmail.com> wrote:

> On Thu, Sep 10, 2009 at 9:26 AM, Ingo Molnar <mingo@elte.hu> wrote:
> >
> > * Bret Towe <magnade@gmail.com> wrote:
> >
> >> On Thu, Sep 10, 2009 at 9:05 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> >> > On Thu, 2009-09-10 at 09:02 -0700, Bret Towe wrote:
> >> >>
> >> >> thanks to this thread and others I've seen several kernel tunables
> >> >> that can effect how the scheduler performs/acts
> >> >> but what I don't see after a bit of looking is where all these are
> >> >> documented
> >> >> perhaps thats also part of the reason there are unhappy people with
> >> >> the current code in the kernel just because they don't know how
> >> >> to tune it for their workload
> >> >
> >> > The thing is, ideally they should not need to poke at these.
> >> > These knobs are under CONFIG_SCHED_DEBUG, and that is exactly
> >> > what they are for.
> >>
> >> even then I would think they should be documented so people can
> >> find out what item is hurting their workload so they can better
> >> report the bug no?
> >
> > Would be happy to apply such documentation patches. You could also
> > help start adding a 'scheduler performance' wiki portion to
> > perf.wiki.kernel.org, if you have time for that.
> 
> time isn't so much the issue but not having any clue as to what 
> any of the options do

One approach would be to list them in an email in this thread with 
question marks and let people here fill them in - then help by 
organizing and prettifying the result on the wiki.

Asking for clarifications when an explanation is unclear is also 
helpful - those who write this code are not the best people to judge 
whether technical descriptions are understandable enough.
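
( For instance, the raw list to seed such an email could be dumped with
  something like this -- assuming debugfs is mounted at /debug, as used
  elsewhere in this thread:

    ls /proc/sys/kernel/sched_*
    cat /debug/sched_features
 )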

	Ingo

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-10  6:08                               ` Ingo Molnar
  2009-09-10  6:40                                 ` Ingo Molnar
  2009-09-10 16:02                                 ` Bret Towe
@ 2009-09-10 17:53                                 ` Nikos Chantziaras
  2009-09-10 18:46                                   ` Ingo Molnar
                                                     ` (2 more replies)
  2 siblings, 3 replies; 216+ messages in thread
From: Nikos Chantziaras @ 2009-09-10 17:53 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jens Axboe, Mike Galbraith, Peter Zijlstra, Con Kolivas, linux-kernel

On 09/10/2009 09:08 AM, Ingo Molnar wrote:
>
> * Nikos Chantziaras<realnc@arcor.de>  wrote:
>>
>> With your version of latt.c, I get these results with 2.6-tip vs
>> 2.6.31-rc9-bfs:
>>
>>
>> (mainline)
>> Averages:
>> ------------------------------
>>          Max            50 usec
>>          Avg            12 usec
>>          Stdev           3 usec
>>
>>
>> (BFS)
>> Averages:
>> ------------------------------
>>          Max           474 usec
>>          Avg            11 usec
>>          Stdev          16 usec
>>
>> However, the interactivity problems still remain.  Does that mean
>> it's not a latency issue?
>
> It means that Jens's test-app, which demonstrated and helped us fix
> the issue for him does not help us fix it for you just yet.
>
> The "fluidity problem" you described might not be a classic latency
> issue per se (which latt.c measures), but a timeslicing / CPU time
> distribution problem.
>
> A slight shift in CPU time allocation can change the flow of tasks
> to result in a 'choppier' system.
>
> Have you tried, in addition of the granularity tweaks you've done,
> to renice mplayer either up or down? (or compiz and Xorg for that
> matter)

Yes.  It seems to do what one would expect, but only if two separate 
programs are competing for CPU time continuously.  For example, when 
running two glxgears instances, one with nice 0 the other with 19, the 
first will report ~5000 FPS, the other ~1000.  Renicing the second one 
from 19 to 0 will result in both reporting ~3000.  So nice values 
obviously work in distributing CPU time.  But the problem isn't the 
available CPU time, it seems, since even when running glxgears at nice -20, it 
will still freeze during various other interactive tasks (moving windows 
etc.)
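
(For reference, that test boils down to something like the following --
 a rough sketch, not from the original mail, with the second instance
 reniced back by hand afterwards:

    glxgears &
    nice -n 19 glxgears &
    sleep 30
    renice 0 -p $(pgrep -n glxgears)    # -n picks the newest, i.e. the nice-19 one

 each instance periodically prints its own FPS figure.)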


> [...]
>    # echo NO_NEW_FAIR_SLEEPERS>  /debug/sched_features
>
> Btw., NO_NEW_FAIR_SLEEPERS is something that will turn the scheduler
> into a more classic fair scheduler (like BFS is too).

Setting NO_NEW_FAIR_SLEEPERS (with everything else at default values) 
pretty much solves all issues I raised in all my other posts!  With this 
setting, I can do "nice -n 19 make -j20" and still have a very smooth 
desktop and watch a movie at the same time.  Various other annoyances 
(like the "logout/shutdown/restart" dialog of KDE not appearing at all 
until the background fade-out effect has finished) are also gone.  So 
this seems to be the single most important setting that vastly improves 
desktop behavior, at least here.

In fact, I liked this setting so much that I went to 
kernel/sched_features.h of kernel 2.6.30.5 (the kernel I use normally 
right now) and set SCHED_FEAT(NEW_FAIR_SLEEPERS, 0) (default is 1) with 
absolutely no other tweaks (like sched_latency_ns, 
sched_wakeup_granularity_ns, etc.).  It pretty much behaves like BFS now 
from an interactivity point of view.  But I've used it only for about an 
hour or so, so I don't know if any ill effects will appear later on.
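
(For reference, that source tweak is just a one-liner against the tree --
 merely restating the change described above:

    sed -i 's/SCHED_FEAT(NEW_FAIR_SLEEPERS, 1)/SCHED_FEAT(NEW_FAIR_SLEEPERS, 0)/' \
        kernel/sched_features.h
 )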


> NO_START_DEBIT might be another thing that improves (or worsens :-/)
> make -j type of kernel build workloads.

No effect with this one, at least not one I could observe.

I didn't have the opportunity yet to test and tweak all the other 
various settings you listed, but I will try to do so as soon as possible.

^ permalink raw reply	[flat|nested] 216+ messages in thread

* [crash, bisected] Re: clocksource: Resolve cpu hotplug dead lock with TSC unstable
  2009-09-10 10:02                                     ` Ingo Molnar
  2009-09-10 10:09                                       ` Jens Axboe
@ 2009-09-10 18:00                                       ` Ingo Molnar
  2009-09-11  7:37                                         ` Ingo Molnar
  1 sibling, 1 reply; 216+ messages in thread
From: Ingo Molnar @ 2009-09-10 18:00 UTC (permalink / raw)
  To: Jens Axboe, Martin Schwidefsky, John Stultz
  Cc: Peter Zijlstra, Mike Galbraith, Con Kolivas, linux-kernel


* Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Jens Axboe <jens.axboe@oracle.com> wrote:
> 
> > I went to try -tip btw, but it crashes on boot. Here's the 
> > backtrace, typed manually, it's crashing in 
> > queue_work_on+0x28/0x60.
> > 
> > Call Trace:
> >         queue_work
> >         schedule_work
> >         clocksource_mark_unstable
> >         mark_tsc_unstable
> >         check_tsc_sync_source
> >         native_cpu_up
> >         relay_hotcpu_callback
> >         do_fork_idle
> >         _cpu_up
> >         cpu_up
> >         kernel_init
> >         kernel_thread_helper
> 
> hm, that looks like an old bug i fixed days ago via:
> 
>   00a3273: Revert "x86: Make tsc=reliable override boot time stability checks"
> 
> Have you tested tip:master - do you still know which sha1?

Ok, i reproduced it on a testbox and bisected it, the crash is 
caused by:

 7285dd7fd375763bfb8ab1ac9cf3f1206f503c16 is first bad commit
 commit 7285dd7fd375763bfb8ab1ac9cf3f1206f503c16
 Author: Thomas Gleixner <tglx@linutronix.de>
 Date:   Fri Aug 28 20:25:24 2009 +0200

    clocksource: Resolve cpu hotplug dead lock with TSC unstable
    
    Martin Schwidefsky analyzed it:

I've reverted it in tip/master for now.

	Ingo

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-10 17:53                                 ` Nikos Chantziaras
@ 2009-09-10 18:46                                   ` Ingo Molnar
  2009-09-10 18:51                                   ` [tip:sched/core] sched: Disable NEW_FAIR_SLEEPERS for now tip-bot for Ingo Molnar
  2009-09-10 18:57                                   ` [tip:sched/core] sched: Fix sched::sched_stat_wait tracepoint field tip-bot for Ingo Molnar
  2 siblings, 0 replies; 216+ messages in thread
From: Ingo Molnar @ 2009-09-10 18:46 UTC (permalink / raw)
  To: Nikos Chantziaras
  Cc: Jens Axboe, Mike Galbraith, Peter Zijlstra, Con Kolivas, linux-kernel


* Nikos Chantziaras <realnc@arcor.de> wrote:

> On 09/10/2009 09:08 AM, Ingo Molnar wrote:
>>
>> * Nikos Chantziaras<realnc@arcor.de>  wrote:
>>>
>>> With your version of latt.c, I get these results with 2.6-tip vs
>>> 2.6.31-rc9-bfs:
>>>
>>>
>>> (mainline)
>>> Averages:
>>> ------------------------------
>>>          Max            50 usec
>>>          Avg            12 usec
>>>          Stdev           3 usec
>>>
>>>
>>> (BFS)
>>> Averages:
>>> ------------------------------
>>>          Max           474 usec
>>>          Avg            11 usec
>>>          Stdev          16 usec
>>>
>>> However, the interactivity problems still remain.  Does that mean
>>> it's not a latency issue?
>>
>> It means that Jens's test-app, which demonstrated and helped us fix
>> the issue for him does not help us fix it for you just yet.
>>
>> The "fluidity problem" you described might not be a classic latency
>> issue per se (which latt.c measures), but a timeslicing / CPU time
>> distribution problem.
>>
>> A slight shift in CPU time allocation can change the flow of tasks
>> to result in a 'choppier' system.
>>
>> Have you tried, in addition of the granularity tweaks you've done,
>> to renice mplayer either up or down? (or compiz and Xorg for that
>> matter)
>
> Yes.  It seems to do what one would expect, but only if two separate  
> programs are competing for CPU time continuously.  For example, when  
> running two glxgears instances, one with nice 0 the other with 19, the  
> first will report ~5000 FPS, the other ~1000.  Renicing the second one  
> from 19 to 0, will result in both reporting ~3000.  So nice values  
> obviously work in distributing CPU time.  But the problem isn't the  
> available CPU time it seems since even if running glxgears nice -20, it  
> will still freeze during various other interactive taks (moving windows  
> etc.)
>
>
>> [...]
>>    # echo NO_NEW_FAIR_SLEEPERS>  /debug/sched_features
>>
>> Btw., NO_NEW_FAIR_SLEEPERS is something that will turn the scheduler
>> into a more classic fair scheduler (like BFS is too).
>
> Setting NO_NEW_FAIR_SLEEPERS (with everything else at default 
> values) pretty much solves all issues I raised in all my other 
> posts!  With this setting, I can do "nice -n 19 make -j20" and 
> still have a very smooth desktop and watch a movie at the same 
> time.  Various other annoyances (like the 
> "logout/shutdown/restart" dialog of KDE not appearing at all until 
> the background fade-out effect has finished) are also gone.  So 
> this seems to be the single most important setting that vastly 
> improves desktop behavior, at least here.
>
> In fact, I liked this setting so much that I went to 
> kernel/sched_features.h of kernel 2.6.30.5 (the kernel I use 
> normally right now) and set SCHED_FEAT(NEW_FAIR_SLEEPERS, 0) 
> (default is 1) with absolutely no other tweaks (like 
> sched_latency_ns, sched_wakeup_granularity_ns, etc.).  It pretty 
> much behaves like BFS now from an interactivity point of view.  
> But I've used it only for about an hour or so, so I don't know if 
> any ill effects will appear later on.

ok, this is quite an important observation!

Either NEW_FAIR_SLEEPERS is broken, or if it works it's not what we 
want to do. Other measures in the scheduler protect us from fatal 
badness here, but all the finer wakeup behavior is out the window 
really.

Will check this. We'll probably start with a quick commit disabling 
it first - then re-enabling it if it's fixed (will Cc: you so that 
you can re-test with fixed-NEW_FAIR_SLEEPERS, if it's re-enabled).

Thanks a lot for the persistent testing!

	Ingo

^ permalink raw reply	[flat|nested] 216+ messages in thread

* [tip:sched/core] sched: Disable NEW_FAIR_SLEEPERS for now
  2009-09-10 17:53                                 ` Nikos Chantziaras
  2009-09-10 18:46                                   ` Ingo Molnar
@ 2009-09-10 18:51                                   ` tip-bot for Ingo Molnar
  2009-09-10 18:57                                   ` [tip:sched/core] sched: Fix sched::sched_stat_wait tracepoint field tip-bot for Ingo Molnar
  2 siblings, 0 replies; 216+ messages in thread
From: tip-bot for Ingo Molnar @ 2009-09-10 18:51 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, a.p.zijlstra, efault, jens.axboe,
	realnc, tglx, mingo

Commit-ID:  3f2aa307c4d26b4ed6509d0a79e8254c9e07e921
Gitweb:     http://git.kernel.org/tip/3f2aa307c4d26b4ed6509d0a79e8254c9e07e921
Author:     Ingo Molnar <mingo@elte.hu>
AuthorDate: Thu, 10 Sep 2009 20:34:48 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 10 Sep 2009 20:34:48 +0200

sched: Disable NEW_FAIR_SLEEPERS for now

Nikos Chantziaras and Jens Axboe reported that turning off
NEW_FAIR_SLEEPERS improves desktop interactivity visibly.

Nikos described his experiences the following way:

  " With this setting, I can do "nice -n 19 make -j20" and
    still have a very smooth desktop and watch a movie at
    the same time.  Various other annoyances (like the
    "logout/shutdown/restart" dialog of KDE not appearing
    at all until the background fade-out effect has finished)
    are also gone.  So this seems to be the single most
    important setting that vastly improves desktop behavior,
    at least here. "

Jens described it the following way, referring to a 10-seconds
xmodmap scheduling delay he was trying to debug:

  " Then I tried switching NO_NEW_FAIR_SLEEPERS on, and then
    I get:

    Performance counter stats for 'xmodmap .xmodmap-carl':

         9.009137  task-clock-msecs         #      0.447 CPUs
               18  context-switches         #      0.002 M/sec
                1  CPU-migrations           #      0.000 M/sec
              315  page-faults              #      0.035 M/sec

    0.020167093  seconds time elapsed

    Woot! "

So disable it for now. In perf trace output i can see weird
delta timestamps:

  cc1-9943  [001]  2802.059479616: sched_stat_wait: task: as:9944 wait: 2801938766276 [ns]

That nsec field is not supposed to be that large. More digging
is needed - but lets turn it off while the real bug is found.

Reported-by: Nikos Chantziaras <realnc@arcor.de>
Tested-by: Nikos Chantziaras <realnc@arcor.de>
Reported-by: Jens Axboe <jens.axboe@oracle.com>
Tested-by: Jens Axboe <jens.axboe@oracle.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <4AA93D34.8040500@arcor.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


---
 kernel/sched_features.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/sched_features.h b/kernel/sched_features.h
index 4569bfa..e2dc63a 100644
--- a/kernel/sched_features.h
+++ b/kernel/sched_features.h
@@ -1,4 +1,4 @@
-SCHED_FEAT(NEW_FAIR_SLEEPERS, 1)
+SCHED_FEAT(NEW_FAIR_SLEEPERS, 0)
 SCHED_FEAT(NORMALIZED_SLEEPER, 0)
 SCHED_FEAT(ADAPTIVE_GRAN, 1)
 SCHED_FEAT(WAKEUP_PREEMPT, 1)

^ permalink raw reply related	[flat|nested] 216+ messages in thread

* [tip:sched/core] sched: Fix sched::sched_stat_wait tracepoint field
  2009-09-10 17:53                                 ` Nikos Chantziaras
  2009-09-10 18:46                                   ` Ingo Molnar
  2009-09-10 18:51                                   ` [tip:sched/core] sched: Disable NEW_FAIR_SLEEPERS for now tip-bot for Ingo Molnar
@ 2009-09-10 18:57                                   ` tip-bot for Ingo Molnar
  2 siblings, 0 replies; 216+ messages in thread
From: tip-bot for Ingo Molnar @ 2009-09-10 18:57 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, a.p.zijlstra, efault, jens.axboe,
	realnc, tglx, mingo

Commit-ID:  e1f8450854d69f0291882804406ea1bab3ca44b4
Gitweb:     http://git.kernel.org/tip/e1f8450854d69f0291882804406ea1bab3ca44b4
Author:     Ingo Molnar <mingo@elte.hu>
AuthorDate: Thu, 10 Sep 2009 20:52:09 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 10 Sep 2009 20:52:54 +0200

sched: Fix sched::sched_stat_wait tracepoint field

This weird perf trace output:

  cc1-9943  [001]  2802.059479616: sched_stat_wait: task: as:9944 wait: 2801938766276 [ns]

Is caused by setting one component field of the delta to zero
a bit too early. Move it to later.

( Note, this does not affect the NEW_FAIR_SLEEPERS interactivity bug,
  it's just a reporting bug in essence. )

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Nikos Chantziaras <realnc@arcor.de>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <4AA93D34.8040500@arcor.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


---
 kernel/sched_fair.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 26fadb4..aa7f841 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -545,14 +545,13 @@ update_stats_wait_end(struct cfs_rq *cfs_rq, struct sched_entity *se)
 	schedstat_set(se->wait_count, se->wait_count + 1);
 	schedstat_set(se->wait_sum, se->wait_sum +
 			rq_of(cfs_rq)->clock - se->wait_start);
-	schedstat_set(se->wait_start, 0);
-
 #ifdef CONFIG_SCHEDSTATS
 	if (entity_is_task(se)) {
 		trace_sched_stat_wait(task_of(se),
 			rq_of(cfs_rq)->clock - se->wait_start);
 	}
 #endif
+	schedstat_set(se->wait_start, 0);
 }
 
 static inline void

^ permalink raw reply related	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-09  9:17                       ` Peter Zijlstra
  2009-09-09  9:40                         ` Nikos Chantziaras
@ 2009-09-10 19:45                         ` Martin Steigerwald
  2009-09-10 20:06                           ` Ingo Molnar
  1 sibling, 1 reply; 216+ messages in thread
From: Martin Steigerwald @ 2009-09-10 19:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Nikos Chantziaras, Mike Galbraith, Ingo Molnar,
	Jens Axboe, Con Kolivas

[-- Attachment #1: Type: Text/Plain, Size: 1239 bytes --]

Am Mittwoch 09 September 2009 schrieb Peter Zijlstra:
> On Wed, 2009-09-09 at 12:05 +0300, Nikos Chantziaras wrote:
> > Thank you for mentioning min_granularity.  After:
> >
> >    echo 10000000 > /proc/sys/kernel/sched_latency_ns
> >    echo 2000000 > /proc/sys/kernel/sched_min_granularity_ns
> 
> You might also want to do:
> 
>      echo 2000000 > /proc/sys/kernel/sched_wakeup_granularity_ns
> 
> That affects when a newly woken task will preempt an already running
> task.

Heh, that scheduler thing again... and unfortunately Con appearing to feel 
hurt, while I think that Ingo is honest in his offer of collaboration...

While it is fun playing with those numbers and indeed subjectively 
experiencing a more fluid desktop, how about just a

echo "This is a f* desktop!" > /proc/sys/kernel/sched_workload

Or to say it in other words: The Linux kernel should not require me to 
fine-tune three or more values to have the scheduler act in a way that 
matches my workload.

I am willing to test stuff on my work thinkpad and my Amarok thinkpad in 
order to help improve that.

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-10 19:45                         ` Martin Steigerwald
@ 2009-09-10 20:06                           ` Ingo Molnar
  2009-09-10 20:39                             ` Martin Steigerwald
  0 siblings, 1 reply; 216+ messages in thread
From: Ingo Molnar @ 2009-09-10 20:06 UTC (permalink / raw)
  To: Martin Steigerwald
  Cc: linux-kernel, Peter Zijlstra, Nikos Chantziaras, Mike Galbraith,
	Jens Axboe, Con Kolivas


* Martin Steigerwald <Martin@lichtvoll.de> wrote:

> Am Mittwoch 09 September 2009 schrieb Peter Zijlstra:
> > On Wed, 2009-09-09 at 12:05 +0300, Nikos Chantziaras wrote:
> > > Thank you for mentioning min_granularity.  After:
> > >
> > >    echo 10000000 > /proc/sys/kernel/sched_latency_ns
> > >    echo 2000000 > /proc/sys/kernel/sched_min_granularity_ns
> > 
> > You might also want to do:
> > 
> >      echo 2000000 > /proc/sys/kernel/sched_wakeup_granularity_ns
> > 
> > That affects when a newly woken task will preempt an already running
> > task.
> 
> Heh that scheduler thing again... and unfortunately Col appearing 
> to feel hurt while I am think that Ingo is honest on his offer on 
> collaboration...
> 
> While it makes fun playing with that numbers and indeed 
> experiencing subjectively a more fluid deskopt how about just a
> 
> echo "This is a f* desktop!" > /proc/sys/kernel/sched_workload

No need to do that, that's supposed to be the default :-) The knobs 
are really just there to help us make it even more so - i.e. you 
don't need to tune them. But it really relies on people helping us 
out and telling us which combinations work best ...

> Or to say it in other words: The Linux kernel should not require 
> me to fine-tune three or more values to have the scheduler act in 
> a way that matches my workload.
> 
> I am willing to test stuff on my work thinkpad and my Amarok 
> thinkpad in order to help improving with that.

It would be great if you could check latest -tip:

  http://people.redhat.com/mingo/tip.git/README

and compare it to vanilla .31?

Also, could you outline the interactivity problems/complaints you 
have?

	Ingo

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-08 18:15     ` Nikos Chantziaras
@ 2009-09-10 20:25       ` Frederic Weisbecker
  0 siblings, 0 replies; 216+ messages in thread
From: Frederic Weisbecker @ 2009-09-10 20:25 UTC (permalink / raw)
  To: Nikos Chantziaras; +Cc: linux-kernel, Jens Axboe, Ingo Molnar, Con Kolivas

On Tue, Sep 08, 2009 at 09:15:22PM +0300, Nikos Chantziaras wrote:
> On 09/07/2009 02:01 PM, Frederic Weisbecker wrote:
>> That looks eventually benchmarkable. This is about latency.
>> For example, you could try to run high load tasks in the
>> background and then launch a task that wakes up in middle/large
>> periods to do something. You could measure the time it takes to wake
>> it up to perform what it wants.
>>
>> We have some events tracing infrastructure in the kernel that can
>> snapshot the wake up and sched switch events.
>>
>> Having CONFIG_EVENT_TRACING=y should be sufficient for that.
>>
>> You just need to mount a debugfs point, say in /debug.
>>
>> Then you can activate these sched events by doing:
>>
>> echo 0>  /debug/tracing/tracing_on
>> echo 1>  /debug/tracing/events/sched/sched_switch/enable
>> echo 1>  /debug/tracing/events/sched/sched_wake_up/enable
>>
>> #Launch your tasks
>>
>> echo 1>  /debug/tracing/tracing_on
>>
>> #Wait for some time
>>
>> echo 0>  /debug/tracing/tracing_off
>>
>> That will require some parsing of the result in /debug/tracing/trace
>> to get the delays between wake_up events and switch in events
>> for the task that periodically wakes up and then produce some
>> statistics such as the average or the maximum latency.
>>
>> That's a bit of a rough approach to measure such latencies but that
>> should work.
>
> I've tried this with 2.6.31-rc9 while running mplayer and alt+tabbing  
> repeatedly to the point where mplayer starts to stall and drop frames.  
> This produced a 4.1MB trace file (132k bzip2'ed):
>
>     http://foss.math.aegean.gr/~realnc/kernel/trace1.bz2
>
> Uncompressed for online viewing:
>
>     http://foss.math.aegean.gr/~realnc/kernel/trace1
>
> I must admit that I don't know what it is I'm looking at :P


Hehe :-)

Basically you have samples of two kind of events:

- wake up (when thread A wakes up B)

The format is as follows:


	task-pid
	(the waker A)
	   |
	   |     cpu     timestamp   event-name        wakee(B)    prio    status
	   |      |          |           |               |          |        |
	X-11482 [001]  1023.219246: sched_wakeup: task kwin:11571 [120] success=1

Here X is awakening kwin.


- sched switch (when the scheduler stops A and launches B)

	                                            A, task                  B, task
	                                            that gets                that gets
	                                            sched                    sched
                                                     out                      in
	  A      cpu    timestamp     event-name       |       A prio          |        B prio
	  |       |         |             |            |         |             |          |
	X-11482 [001]  1023.219247: sched_switch: task X:11482 [120] (R) ==> kwin:11571 [120]
	                                                              |
	                                                              |
                                                                    State of A
For A state we can have either:

R: TASK_RUNNING, the task is not sleeping but it is rescheduled for later
   to let another task run

S: TASK_INTERRUPTIBLE, the task is sleeping, waiting for an event that may
   wake it up. The task can be woken by a signal

D: TASK_UNINTERRUPTIBLE, same as above but can't be woken by a signal.


Now what could be interesting is to measure the time between
such a pair of events:

	- t0: A wakes up B
	- t1: B is sched in

t1 - t0 would then be the scheduler latency, or at least part of it:

The scheduler latency may be an addition of several factors:

	- the time it takes to perform the actual wakeup (re-inserting
	  the task into a runqueue, which can be subject to the runqueue(s)
	  design, rebalancing if needed, etc...)

	- the time between when a task is woken up and when the scheduler
	  eventually decides to schedule it in.

	- the time it takes to perform the task switch, which is not only
	  in the scheduler's scope. The time it takes may also depend on a
	  rebalancing decision (cache cold, etc...)

Unfortunately we can only measure the second part with the above ftrace
events. But that's still an interesting abstraction of the scheduler that
accounts for a large part of the scheduler latency.

We could write a tiny parser that could walk through such ftrace traces
and produce some average, maximum, standard deviation numbers.
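
Something quick along these lines would do -- a rough, untested awk sketch
(not from this mail), assuming the trace format shown above and the /debug
mount point used earlier in the thread:

	awk '
	/sched_wakeup:/ {
		ts = $3; sub(/:$/, "", ts)         # "1023.219246:" -> timestamp
		n = split($6, a, ":"); pid = a[n]  # "kwin:11571"   -> wakee pid
		wake[pid] = ts
	}
	/sched_switch:/ {
		ts = $3; sub(/:$/, "", ts)
		wakee = ""
		for (i = 1; i < NF; i++)           # task after "==>" is the one
			if ($i == "==>")           # being switched in
				wakee = $(i + 1)
		n = split(wakee, a, ":"); pid = a[n]
		if (pid in wake) {                 # pair it with its wakeup
			d = (ts - wake[pid]) * 1000000
			delete wake[pid]
			cnt++; sum += d; sumsq += d * d
			if (d > max) max = d
		}
	}
	END {
		if (cnt) {
			avg = sum / cnt
			var = sumsq / cnt - avg * avg; if (var < 0) var = 0
			printf "%d wakeups: avg %.1f  max %.1f  stdev %.1f usec\n",
				cnt, avg, max, sqrt(var)
		}
	}' /debug/tracing/trace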

But we have userspace tools that can parse ftrace events (through perf
counters), so I'm trying to write something there; hopefully I can get
a relevant end result.

Thanks.
                                                                    


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-10 20:06                           ` Ingo Molnar
@ 2009-09-10 20:39                             ` Martin Steigerwald
  2009-09-10 20:42                               ` Ingo Molnar
  0 siblings, 1 reply; 216+ messages in thread
From: Martin Steigerwald @ 2009-09-10 20:39 UTC (permalink / raw)
  To: linux-kernel, Ingo Molnar, Peter Zijlstra, Nikos Chantziaras,
	Mike Galbraith, Jens Axboe, Con Kolivas

Am Donnerstag 10 September 2009 schrieb Ingo Molnar:
> * Martin Steigerwald <Martin@lichtvoll.de> wrote:
> > Am Mittwoch 09 September 2009 schrieb Peter Zijlstra:
> > > On Wed, 2009-09-09 at 12:05 +0300, Nikos Chantziaras wrote:
> > > > Thank you for mentioning min_granularity.  After:
> > > >
> > > >    echo 10000000 > /proc/sys/kernel/sched_latency_ns
> > > >    echo 2000000 > /proc/sys/kernel/sched_min_granularity_ns
> > >
> > > You might also want to do:
> > >
> > >      echo 2000000 > /proc/sys/kernel/sched_wakeup_granularity_ns
> > >
> > > That affects when a newly woken task will preempt an already
> > > running task.
> >
> > Heh that scheduler thing again... and unfortunately Col appearing
> > to feel hurt while I am think that Ingo is honest on his offer on
> > collaboration...
> >
> > While it makes fun playing with that numbers and indeed
> > experiencing subjectively a more fluid deskopt how about just a
> >
> > echo "This is a f* desktop!" > /proc/sys/kernel/sched_workload
> 
> No need to do that, that's supposed to be the default :-) The knobs
> are really just there to help us make it even more so - i.e. you
> dont need to tune them. But it really relies on people helping us
> out and tell us which combinations work best ...

Well currently I have:

shambhala:/proc/sys/kernel> grep "" sched_latency_ns 
sched_min_granularity_ns sched_wakeup_granularity_ns
sched_latency_ns:100000
sched_min_granularity_ns:200000
sched_wakeup_granularity_ns:0

And this gives me *a completely different* desktop experience.

I am using KDE 4.3.1 on a mixture of Debian Squeeze/Sid/Experimental, with 
compositing. And now when I flip desktops or open a window I can *actually 
see* the animation. Before I just saw two to five steps of the animation, 
now it's really a lot more fluid.

perceived latency--! Well, it's like opening the eyes again, cause I tended 
to take the jerky behavior as normal and possibly related to having KDE 
4.3.1 with compositing enabled on a ThinkPad T42 with

01:00.0 VGA compatible controller [0300]: ATI Technologies Inc RV350 
[Mobility Radeon 9600 M10] [1002:4e50]

which I consider to be low end for that workload. But then why actually? 
Next to me is a Sam440ep with a PPC 440 at 667 MHz and an even older Radeon 
M9 with AmigaOS 4.1 and some simple transparency effects with compositing. 
And well, this combo does feel like it wheel-spins cause the hardware is 
actually too fast for it.

> 
> > Or to say it in other words: The Linux kernel should not require
> > me to fine-tune three or more values to have the scheduler act in
> > a way that matches my workload.
> >
> > I am willing to test stuff on my work thinkpad and my Amarok
> > thinkpad in order to help improving with that.
> 
> It would be great if you could check latest -tip:
> 
>   http://people.redhat.com/mingo/tip.git/README
> 
> and compare it to vanilla .31?
> 
> Also, could you outline the interactivity problems/complaints you
> have?
> 
> 	Ingo


-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-10 20:39                             ` Martin Steigerwald
@ 2009-09-10 20:42                               ` Ingo Molnar
  2009-09-10 21:19                                 ` Martin Steigerwald
  0 siblings, 1 reply; 216+ messages in thread
From: Ingo Molnar @ 2009-09-10 20:42 UTC (permalink / raw)
  To: Martin Steigerwald
  Cc: linux-kernel, Peter Zijlstra, Nikos Chantziaras, Mike Galbraith,
	Jens Axboe, Con Kolivas


* Martin Steigerwald <Martin@lichtvoll.de> wrote:

> Am Donnerstag 10 September 2009 schrieb Ingo Molnar:
> > * Martin Steigerwald <Martin@lichtvoll.de> wrote:
> > > Am Mittwoch 09 September 2009 schrieb Peter Zijlstra:
> > > > On Wed, 2009-09-09 at 12:05 +0300, Nikos Chantziaras wrote:
> > > > > Thank you for mentioning min_granularity.  After:
> > > > >
> > > > >    echo 10000000 > /proc/sys/kernel/sched_latency_ns
> > > > >    echo 2000000 > /proc/sys/kernel/sched_min_granularity_ns
> > > >
> > > > You might also want to do:
> > > >
> > > >      echo 2000000 > /proc/sys/kernel/sched_wakeup_granularity_ns
> > > >
> > > > That affects when a newly woken task will preempt an already
> > > > running task.
> > >
> > > Heh that scheduler thing again... and unfortunately Col appearing
> > > to feel hurt while I am think that Ingo is honest on his offer on
> > > collaboration...
> > >
> > > While it makes fun playing with that numbers and indeed
> > > experiencing subjectively a more fluid deskopt how about just a
> > >
> > > echo "This is a f* desktop!" > /proc/sys/kernel/sched_workload
> > 
> > No need to do that, that's supposed to be the default :-) The knobs
> > are really just there to help us make it even more so - i.e. you
> > dont need to tune them. But it really relies on people helping us
> > out and tell us which combinations work best ...
> 
> Well currently I have:
> 
> shambhala:/proc/sys/kernel> grep "" sched_latency_ns 
> sched_min_granularity_ns sched_wakeup_granularity_ns
> sched_latency_ns:100000
> sched_min_granularity_ns:200000
> sched_wakeup_granularity_ns:0
> 
> And this give me *a completely different* desktop experience.

what is /debug/sched_features - is NO_NEW_FAIR_SLEEPERS set? If not 
set yet then try it:

  echo NO_NEW_FAIR_SLEEPERS > /debug/sched_features

that too might make things more fluid.

	Ingo

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-10 20:42                               ` Ingo Molnar
@ 2009-09-10 21:19                                 ` Martin Steigerwald
  2009-09-11  9:26                                   ` Mat
  0 siblings, 1 reply; 216+ messages in thread
From: Martin Steigerwald @ 2009-09-10 21:19 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: Text/Plain, Size: 2332 bytes --]

Am Donnerstag 10 September 2009 schrieb Ingo Molnar:
> * Martin Steigerwald <Martin@lichtvoll.de> wrote:
> > Am Donnerstag 10 September 2009 schrieb Ingo Molnar:
> > > * Martin Steigerwald <Martin@lichtvoll.de> wrote:
> > > > Am Mittwoch 09 September 2009 schrieb Peter Zijlstra:
> > > > > On Wed, 2009-09-09 at 12:05 +0300, Nikos Chantziaras wrote:
> > > > > > Thank you for mentioning min_granularity.  After:
> > > > > >
> > > > > >    echo 10000000 > /proc/sys/kernel/sched_latency_ns
> > > > > >    echo 2000000 > /proc/sys/kernel/sched_min_granularity_ns
> > > > >
> > > > > You might also want to do:
> > > > >
> > > > >      echo 2000000 >
> > > > > /proc/sys/kernel/sched_wakeup_granularity_ns
> > > > >
> > > > > That affects when a newly woken task will preempt an already
> > > > > running task.
> > > >
> > > > Heh that scheduler thing again... and unfortunately Col appearing
> > > > to feel hurt while I am think that Ingo is honest on his offer on
> > > > collaboration...
> > > >
> > > > While it makes fun playing with that numbers and indeed
> > > > experiencing subjectively a more fluid deskopt how about just a
> > > >
> > > > echo "This is a f* desktop!" > /proc/sys/kernel/sched_workload
> > >
> > > No need to do that, that's supposed to be the default :-) The knobs
> > > are really just there to help us make it even more so - i.e. you
> > > dont need to tune them. But it really relies on people helping us
> > > out and tell us which combinations work best ...
> >
> > Well currently I have:
> >
> > shambhala:/proc/sys/kernel> grep "" sched_latency_ns
> > sched_min_granularity_ns sched_wakeup_granularity_ns
> > sched_latency_ns:100000
> > sched_min_granularity_ns:200000
> > sched_wakeup_granularity_ns:0
> >
> > And this give me *a completely different* desktop experience.
> 
> what is /debug/sched_features - is NO_NEW_FAIR_SLEEPERS set? If not
> set yet then try it:
> 
>   echo NO_NEW_FAIR_SLEEPERS > /debug/sched_features
> 
> that too might make things more fluid.

Hmmm, need to mount that first. But not today, cause I have to dig out 
how to do it. Have to pack some things for tomorrow. And then sleep time.
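
(For reference -- not from the original mail -- debugfs is typically mounted 
 with something like this, using the /debug path assumed throughout this 
 thread:

    mkdir -p /debug
    mount -t debugfs nodev /debug
 )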

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: Epic regression in throughput since v2.6.23
  2009-09-10  6:53         ` Ingo Molnar
@ 2009-09-10 23:23           ` Serge Belyshev
  2009-09-11  6:10             ` Ingo Molnar
  0 siblings, 1 reply; 216+ messages in thread
From: Serge Belyshev @ 2009-09-10 23:23 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Con Kolivas, linux-kernel, Peter Zijlstra, Mike Galbraith

[-- Attachment #1: Type: text/plain, Size: 2114 bytes --]

Ingo Molnar <mingo@elte.hu> writes:

>    perf stat --repeat 3 make -j4 bzImage

BFS hangs here:

[  128.859000] BUG: soft lockup - CPU#2 stuck for 61s! [sh:7946]
[  128.859016] Modules linked in:
[  128.859016] CPU 2:
[  128.859016] Modules linked in:
[  128.859016] Pid: 7946, comm: sh Not tainted 2.6.31-bfs211-dirty #4 GA-MA790FX-DQ6
[  128.859016] RIP: 0010:[<ffffffff81055a52>]  [<ffffffff81055a52>] task_oncpu_function_call+0x22/0x40
[  128.859016] RSP: 0018:ffff880205967e18  EFLAGS: 00000246
[  128.859016] RAX: 0000000000000002 RBX: ffff880205964cc0 RCX: 000000000000dd00
[  128.859016] RDX: ffff880211138c00 RSI: ffffffff8108d3f0 RDI: ffff88022e42a100
[  128.859016] RBP: ffffffff8102d76e R08: ffff880028066000 R09: 0000000000000000
[  128.859016] R10: 0000000000000000 R11: 0000000000000058 R12: ffffffff8108d3f0
[  128.859016] R13: ffff880211138c00 R14: 0000000000000001 R15: 000000000000e260
[  128.859016] FS:  00002b9ba0924e00(0000) GS:ffff880028066000(0000) knlGS:0000000000000000
[  128.859016] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  128.859016] CR2: 00002b9ba091e4a8 CR3: 0000000001001000 CR4: 00000000000006e0
[  128.859016] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  128.859016] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  128.859016] Call Trace:
[  128.859016]  [<ffffffff8108ee3b>] ? perf_counter_remove_from_context+0x3b/0x90
[  128.859016]  [<ffffffff810904b4>] ? perf_counter_exit_task+0x114/0x340
[  128.859016]  [<ffffffff810c3f66>] ? filp_close+0x56/0x90
[  128.859016]  [<ffffffff8105d3ac>] ? do_exit+0x14c/0x6f0
[  128.859016]  [<ffffffff8105d991>] ? do_group_exit+0x41/0xb0
[  128.859016]  [<ffffffff8105da12>] ? sys_exit_group+0x12/0x20
[  128.859016]  [<ffffffff8102cceb>] ? system_call_fastpath+0x16/0x1b

So, got nothing to compare with.

> Also, it would be nice if you could send me your kernel config - 
> maybe it's some config detail that keeps me from being able to 
> reproduce these results. I haven't seen a link to a config in your 
> mails (maybe i missed it - these threads are voluminous).

Attached.

[-- Attachment #2: .config --]
[-- Type: text/plain, Size: 52628 bytes --]

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.31-rc9
# Wed Sep  9 18:35:46 2009
#
CONFIG_64BIT=y
# CONFIG_X86_32 is not set
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_OUTPUT_FORMAT="elf64-x86-64"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_FAST_CMPXCHG_LOCAL=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_GENERIC_SPINLOCK=y
# CONFIG_RWSEM_XCHGADD_ALGORITHM is not set
CONFIG_ARCH_HAS_CPU_IDLE_WAIT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_DEFAULT_IDLE=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_HAVE_DYNAMIC_PER_CPU_AREA=y
CONFIG_HAVE_CPUMASK_OF_CPU_MAP=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ZONE_DMA32=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_HAVE_INTEL_TXT=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_HARDIRQS_NO__DO_IRQ=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_USE_GENERIC_SMP_HELPERS=y
CONFIG_X86_64_SMP=y
CONFIG_X86_HT=y
CONFIG_X86_TRAMPOLINE=y
# CONFIG_KTIME_SCALAR is not set
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_CONSTRUCTORS=y

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
# CONFIG_TASKSTATS is not set
# CONFIG_AUDIT is not set

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
# CONFIG_TREE_PREEMPT_RCU is not set
# CONFIG_RCU_TRACE is not set
CONFIG_RCU_FANOUT=64
# CONFIG_RCU_FANOUT_EXACT is not set
# CONFIG_TREE_RCU_TRACE is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=20
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
# CONFIG_GROUP_SCHED is not set
# CONFIG_CGROUPS is not set
# CONFIG_SYSFS_DEPRECATED_V2 is not set
CONFIG_RELAY=y
CONFIG_NAMESPACES=y
# CONFIG_UTS_NS is not set
# CONFIG_IPC_NS is not set
# CONFIG_USER_NS is not set
# CONFIG_PID_NS is not set
# CONFIG_NET_NS is not set
# CONFIG_BLK_DEV_INITRD is not set
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
CONFIG_ANON_INODES=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_PCSPKR_PLATFORM=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_AIO=y
CONFIG_HAVE_PERF_COUNTERS=y

#
# Performance Counters
#
CONFIG_PERF_COUNTERS=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_PCI_QUIRKS=y
CONFIG_SLUB_DEBUG=y
# CONFIG_STRIP_ASM_SYMS is not set
# CONFIG_COMPAT_BRK is not set
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB is not set
CONFIG_PROFILING=y
# CONFIG_MARKERS is not set
# CONFIG_OPROFILE is not set
CONFIG_HAVE_OPROFILE=y
# CONFIG_KPROBES is not set
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
CONFIG_HAVE_IOREMAP_PROT=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_HAVE_DMA_ATTRS=y
CONFIG_HAVE_DMA_API_DEBUG=y
CONFIG_HAVE_HW_BREAKPOINT=y

#
# GCOV-based kernel profiling
#
# CONFIG_SLOW_WORK is not set
# CONFIG_HAVE_GENERIC_DMA_COHERENT is not set
CONFIG_SLABINFO=y
CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
# CONFIG_MODULE_FORCE_LOAD is not set
CONFIG_MODULE_UNLOAD=y
# CONFIG_MODULE_FORCE_UNLOAD is not set
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_STOP_MACHINE=y
CONFIG_BLOCK=y
CONFIG_BLK_DEV_BSG=y
# CONFIG_BLK_DEV_INTEGRITY is not set
CONFIG_BLOCK_COMPAT=y

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
# CONFIG_IOSCHED_AS is not set
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
# CONFIG_DEFAULT_AS is not set
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="cfq"
CONFIG_PREEMPT_NOTIFIERS=y
# CONFIG_FREEZER is not set

#
# Processor type and features
#
CONFIG_TICK_ONESHOT=y
# CONFIG_NO_HZ is not set
CONFIG_HIGH_RES_TIMERS=y
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
CONFIG_SMP=y
# CONFIG_SPARSE_IRQ is not set
CONFIG_X86_MPPARSE=y
# CONFIG_X86_EXTENDED_PLATFORM is not set
CONFIG_SCHED_OMIT_FRAME_POINTER=y
# CONFIG_PARAVIRT_GUEST is not set
CONFIG_MEMTEST=y
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUMM is not set
# CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
CONFIG_MK8=y
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MGEODEGX1 is not set
# CONFIG_MGEODE_LX is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_MVIAC7 is not set
# CONFIG_MPSC is not set
# CONFIG_MCORE2 is not set
# CONFIG_MATOM is not set
# CONFIG_GENERIC_CPU is not set
CONFIG_X86_CPU=y
CONFIG_X86_L1_CACHE_BYTES=64
CONFIG_X86_INTERNODE_CACHE_BYTES=64
CONFIG_X86_CMPXCHG=y
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_TSC=y
CONFIG_X86_CMPXCHG64=y
CONFIG_X86_CMOV=y
CONFIG_X86_MINIMUM_CPU_FAMILY=64
CONFIG_X86_DEBUGCTLMSR=y
CONFIG_CPU_SUP_INTEL=y
CONFIG_CPU_SUP_AMD=y
CONFIG_CPU_SUP_CENTAUR=y
# CONFIG_X86_DS is not set
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
CONFIG_DMI=y
CONFIG_GART_IOMMU=y
# CONFIG_CALGARY_IOMMU is not set
# CONFIG_AMD_IOMMU is not set
CONFIG_SWIOTLB=y
CONFIG_IOMMU_HELPER=y
CONFIG_IOMMU_API=y
# CONFIG_MAXSMP is not set
CONFIG_NR_CPUS=4
# CONFIG_SCHED_SMT is not set
# CONFIG_SCHED_MC is not set
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
# CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS is not set
CONFIG_X86_MCE=y
# CONFIG_X86_MCE_INTEL is not set
CONFIG_X86_MCE_AMD=y
CONFIG_X86_MCE_THRESHOLD=y
# CONFIG_X86_MCE_INJECT is not set
# CONFIG_I8K is not set
# CONFIG_MICROCODE is not set
CONFIG_X86_MSR=y
CONFIG_X86_CPUID=y
# CONFIG_X86_CPU_DEBUG is not set
CONFIG_ARCH_PHYS_ADDR_T_64BIT=y
CONFIG_DIRECT_GBPAGES=y
# CONFIG_NUMA is not set
CONFIG_ARCH_SPARSEMEM_DEFAULT=y
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SELECT_MEMORY_MODEL=y
CONFIG_ILLEGAL_POINTER_VALUE=0xdead000000000000
CONFIG_SELECT_MEMORY_MODEL=y
# CONFIG_FLATMEM_MANUAL is not set
# CONFIG_DISCONTIGMEM_MANUAL is not set
CONFIG_SPARSEMEM_MANUAL=y
CONFIG_SPARSEMEM=y
CONFIG_HAVE_MEMORY_PRESENT=y
CONFIG_SPARSEMEM_EXTREME=y
CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
CONFIG_SPARSEMEM_VMEMMAP=y
# CONFIG_MEMORY_HOTPLUG is not set
CONFIG_PAGEFLAGS_EXTENDED=y
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_PHYS_ADDR_T_64BIT=y
CONFIG_ZONE_DMA_FLAG=1
CONFIG_BOUNCE=y
CONFIG_VIRT_TO_BUS=y
CONFIG_HAVE_MLOCK=y
CONFIG_HAVE_MLOCKED_PAGE_BIT=y
CONFIG_MMU_NOTIFIER=y
CONFIG_DEFAULT_MMAP_MIN_ADDR=0
# CONFIG_X86_CHECK_BIOS_CORRUPTION is not set
CONFIG_X86_RESERVE_LOW_64K=y
CONFIG_MTRR=y
# CONFIG_MTRR_SANITIZER is not set
CONFIG_X86_PAT=y
CONFIG_ARCH_USES_PG_UNCACHED=y
# CONFIG_EFI is not set
# CONFIG_SECCOMP is not set
# CONFIG_CC_STACKPROTECTOR is not set
CONFIG_HZ_100=y
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=100
CONFIG_SCHED_HRTICK=y
CONFIG_KEXEC=y
CONFIG_CRASH_DUMP=y
CONFIG_PHYSICAL_START=0x200000
CONFIG_RELOCATABLE=y
CONFIG_PHYSICAL_ALIGN=0x1000000
# CONFIG_HOTPLUG_CPU is not set
# CONFIG_COMPAT_VDSO is not set
# CONFIG_CMDLINE_BOOL is not set
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y

#
# Power management and ACPI options
#
CONFIG_PM=y
# CONFIG_PM_DEBUG is not set
# CONFIG_SUSPEND is not set
# CONFIG_HIBERNATION is not set
CONFIG_ACPI=y
# CONFIG_ACPI_PROCFS is not set
# CONFIG_ACPI_PROCFS_POWER is not set
CONFIG_ACPI_SYSFS_POWER=y
# CONFIG_ACPI_PROC_EVENT is not set
# CONFIG_ACPI_AC is not set
# CONFIG_ACPI_BATTERY is not set
CONFIG_ACPI_BUTTON=y
# CONFIG_ACPI_FAN is not set
# CONFIG_ACPI_DOCK is not set
# CONFIG_ACPI_PROCESSOR is not set
# CONFIG_ACPI_CUSTOM_DSDT is not set
CONFIG_ACPI_BLACKLIST_YEAR=0
# CONFIG_ACPI_DEBUG is not set
# CONFIG_ACPI_PCI_SLOT is not set
CONFIG_X86_PM_TIMER=y
# CONFIG_ACPI_CONTAINER is not set
# CONFIG_ACPI_SBS is not set

#
# CPU Frequency scaling
#
# CONFIG_CPU_FREQ is not set
CONFIG_CPU_IDLE=y
CONFIG_CPU_IDLE_GOV_LADDER=y

#
# Memory power savings
#
# CONFIG_I7300_IDLE is not set

#
# Bus options (PCI etc.)
#
CONFIG_PCI=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_MMCONFIG=y
CONFIG_PCI_DOMAINS=y
CONFIG_DMAR=y
CONFIG_DMAR_DEFAULT_ON=y
# CONFIG_DMAR_BROKEN_GFX_WA is not set
CONFIG_DMAR_FLOPPY_WA=y
# CONFIG_INTR_REMAP is not set
CONFIG_PCIEPORTBUS=y
CONFIG_PCIEAER=y
# CONFIG_PCIE_ECRC is not set
# CONFIG_PCIEAER_INJECT is not set
CONFIG_PCIEASPM=y
# CONFIG_PCIEASPM_DEBUG is not set
CONFIG_ARCH_SUPPORTS_MSI=y
CONFIG_PCI_MSI=y
# CONFIG_PCI_LEGACY is not set
# CONFIG_PCI_DEBUG is not set
# CONFIG_PCI_STUB is not set
CONFIG_HT_IRQ=y
# CONFIG_PCI_IOV is not set
CONFIG_ISA_DMA_API=y
CONFIG_K8_NB=y
# CONFIG_PCCARD is not set
# CONFIG_HOTPLUG_PCI is not set

#
# Executable file formats / Emulations
#
CONFIG_BINFMT_ELF=y
CONFIG_COMPAT_BINFMT_ELF=y
# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
# CONFIG_HAVE_AOUT is not set
# CONFIG_BINFMT_MISC is not set
CONFIG_IA32_EMULATION=y
# CONFIG_IA32_AOUT is not set
CONFIG_COMPAT=y
CONFIG_COMPAT_FOR_U64_ALIGNMENT=y
CONFIG_SYSVIPC_COMPAT=y
CONFIG_NET=y

#
# Networking options
#
CONFIG_PACKET=y
CONFIG_PACKET_MMAP=y
CONFIG_UNIX=y
# CONFIG_NET_KEY is not set
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_ADVANCED_ROUTER=y
CONFIG_ASK_IP_FIB_HASH=y
# CONFIG_IP_FIB_TRIE is not set
CONFIG_IP_FIB_HASH=y
CONFIG_IP_MULTIPLE_TABLES=y
# CONFIG_IP_ROUTE_MULTIPATH is not set
CONFIG_IP_ROUTE_VERBOSE=y
# CONFIG_IP_PNP is not set
# CONFIG_NET_IPIP is not set
CONFIG_NET_IPGRE=y
# CONFIG_NET_IPGRE_BROADCAST is not set
# CONFIG_IP_MROUTE is not set
# CONFIG_ARPD is not set
# CONFIG_SYN_COOKIES is not set
# CONFIG_INET_AH is not set
# CONFIG_INET_ESP is not set
# CONFIG_INET_IPCOMP is not set
# CONFIG_INET_XFRM_TUNNEL is not set
CONFIG_INET_TUNNEL=y
# CONFIG_INET_XFRM_MODE_TRANSPORT is not set
# CONFIG_INET_XFRM_MODE_TUNNEL is not set
# CONFIG_INET_XFRM_MODE_BEET is not set
# CONFIG_INET_LRO is not set
# CONFIG_INET_DIAG is not set
# CONFIG_TCP_CONG_ADVANCED is not set
CONFIG_TCP_CONG_CUBIC=y
CONFIG_DEFAULT_TCP_CONG="cubic"
# CONFIG_TCP_MD5SIG is not set
CONFIG_IPV6=y
CONFIG_IPV6_PRIVACY=y
CONFIG_IPV6_ROUTER_PREF=y
CONFIG_IPV6_ROUTE_INFO=y
CONFIG_IPV6_OPTIMISTIC_DAD=y
# CONFIG_INET6_AH is not set
# CONFIG_INET6_ESP is not set
# CONFIG_INET6_IPCOMP is not set
# CONFIG_IPV6_MIP6 is not set
# CONFIG_INET6_XFRM_TUNNEL is not set
# CONFIG_INET6_TUNNEL is not set
# CONFIG_INET6_XFRM_MODE_TRANSPORT is not set
# CONFIG_INET6_XFRM_MODE_TUNNEL is not set
# CONFIG_INET6_XFRM_MODE_BEET is not set
# CONFIG_INET6_XFRM_MODE_ROUTEOPTIMIZATION is not set
CONFIG_IPV6_SIT=y
CONFIG_IPV6_NDISC_NODETYPE=y
# CONFIG_IPV6_TUNNEL is not set
# CONFIG_IPV6_MULTIPLE_TABLES is not set
# CONFIG_IPV6_MROUTE is not set
# CONFIG_NETWORK_SECMARK is not set
CONFIG_NETFILTER=y
# CONFIG_NETFILTER_DEBUG is not set
# CONFIG_NETFILTER_ADVANCED is not set

#
# Core Netfilter Configuration
#
CONFIG_NETFILTER_NETLINK=y
CONFIG_NETFILTER_NETLINK_LOG=y
CONFIG_NF_CONNTRACK=y
CONFIG_NF_CONNTRACK_FTP=y
CONFIG_NF_CONNTRACK_IRC=y
CONFIG_NF_CONNTRACK_SIP=y
CONFIG_NF_CT_NETLINK=y
CONFIG_NETFILTER_XTABLES=y
CONFIG_NETFILTER_XT_TARGET_MARK=y
CONFIG_NETFILTER_XT_TARGET_NFLOG=y
CONFIG_NETFILTER_XT_TARGET_TCPMSS=y
CONFIG_NETFILTER_XT_MATCH_CONNTRACK=y
CONFIG_NETFILTER_XT_MATCH_MARK=y
CONFIG_NETFILTER_XT_MATCH_STATE=y
# CONFIG_IP_VS is not set

#
# IP: Netfilter Configuration
#
CONFIG_NF_DEFRAG_IPV4=y
CONFIG_NF_CONNTRACK_IPV4=y
CONFIG_NF_CONNTRACK_PROC_COMPAT=y
CONFIG_IP_NF_IPTABLES=y
CONFIG_IP_NF_FILTER=y
CONFIG_IP_NF_TARGET_REJECT=y
CONFIG_IP_NF_TARGET_LOG=y
CONFIG_IP_NF_TARGET_ULOG=y
CONFIG_NF_NAT=y
CONFIG_NF_NAT_NEEDED=y
CONFIG_IP_NF_TARGET_MASQUERADE=y
CONFIG_NF_NAT_FTP=y
CONFIG_NF_NAT_IRC=y
# CONFIG_NF_NAT_TFTP is not set
# CONFIG_NF_NAT_AMANDA is not set
# CONFIG_NF_NAT_PPTP is not set
# CONFIG_NF_NAT_H323 is not set
CONFIG_NF_NAT_SIP=y
CONFIG_IP_NF_MANGLE=y

#
# IPv6: Netfilter Configuration
#
CONFIG_NF_CONNTRACK_IPV6=y
CONFIG_IP6_NF_IPTABLES=y
CONFIG_IP6_NF_MATCH_IPV6HEADER=y
CONFIG_IP6_NF_TARGET_LOG=y
CONFIG_IP6_NF_FILTER=y
CONFIG_IP6_NF_TARGET_REJECT=y
CONFIG_IP6_NF_MANGLE=y
# CONFIG_IP_DCCP is not set
# CONFIG_IP_SCTP is not set
# CONFIG_TIPC is not set
# CONFIG_ATM is not set
CONFIG_STP=y
CONFIG_BRIDGE=y
# CONFIG_NET_DSA is not set
CONFIG_VLAN_8021Q=y
# CONFIG_VLAN_8021Q_GVRP is not set
# CONFIG_DECNET is not set
CONFIG_LLC=y
# CONFIG_LLC2 is not set
# CONFIG_IPX is not set
# CONFIG_ATALK is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_ECONET is not set
# CONFIG_WAN_ROUTER is not set
# CONFIG_PHONET is not set
# CONFIG_IEEE802154 is not set
# CONFIG_NET_SCHED is not set
# CONFIG_DCB is not set

#
# Network testing
#
CONFIG_NET_PKTGEN=y
# CONFIG_HAMRADIO is not set
# CONFIG_CAN is not set
# CONFIG_IRDA is not set
# CONFIG_BT is not set
# CONFIG_AF_RXRPC is not set
CONFIG_FIB_RULES=y
# CONFIG_WIRELESS is not set
# CONFIG_WIMAX is not set
# CONFIG_RFKILL is not set

#
# Device Drivers
#

#
# Generic Driver Options
#
CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
CONFIG_FW_LOADER=y
CONFIG_FIRMWARE_IN_KERNEL=y
CONFIG_EXTRA_FIRMWARE=""
# CONFIG_DEBUG_DRIVER is not set
# CONFIG_DEBUG_DEVRES is not set
# CONFIG_SYS_HYPERVISOR is not set
CONFIG_CONNECTOR=m
# CONFIG_MTD is not set
# CONFIG_PARPORT is not set
CONFIG_PNP=y
CONFIG_PNP_DEBUG_MESSAGES=y

#
# Protocols
#
CONFIG_PNPACPI=y
CONFIG_BLK_DEV=y
# CONFIG_BLK_DEV_FD is not set
# CONFIG_BLK_CPQ_DA is not set
# CONFIG_BLK_CPQ_CISS_DA is not set
# CONFIG_BLK_DEV_DAC960 is not set
# CONFIG_BLK_DEV_UMEM is not set
# CONFIG_BLK_DEV_COW_COMMON is not set
CONFIG_BLK_DEV_LOOP=y
# CONFIG_BLK_DEV_CRYPTOLOOP is not set
CONFIG_BLK_DEV_NBD=y
# CONFIG_BLK_DEV_SX8 is not set
# CONFIG_BLK_DEV_UB is not set
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_COUNT=16
CONFIG_BLK_DEV_RAM_SIZE=4096
# CONFIG_BLK_DEV_XIP is not set
CONFIG_CDROM_PKTCDVD=y
CONFIG_CDROM_PKTCDVD_BUFFERS=64
CONFIG_CDROM_PKTCDVD_WCACHE=y
CONFIG_ATA_OVER_ETH=y
# CONFIG_VIRTIO_BLK is not set
# CONFIG_BLK_DEV_HD is not set
CONFIG_MISC_DEVICES=y
# CONFIG_IBM_ASM is not set
# CONFIG_PHANTOM is not set
# CONFIG_SGI_IOC4 is not set
# CONFIG_TIFM_CORE is not set
# CONFIG_ICS932S401 is not set
# CONFIG_ENCLOSURE_SERVICES is not set
# CONFIG_HP_ILO is not set
# CONFIG_ISL29003 is not set
# CONFIG_C2PORT is not set

#
# EEPROM support
#
# CONFIG_EEPROM_AT24 is not set
CONFIG_EEPROM_LEGACY=y
# CONFIG_EEPROM_MAX6875 is not set
# CONFIG_EEPROM_93CX6 is not set
# CONFIG_CB710_CORE is not set
CONFIG_HAVE_IDE=y
# CONFIG_IDE is not set

#
# SCSI device support
#
# CONFIG_RAID_ATTRS is not set
CONFIG_SCSI=y
CONFIG_SCSI_DMA=y
CONFIG_SCSI_TGT=y
# CONFIG_SCSI_NETLINK is not set
# CONFIG_SCSI_PROC_FS is not set

#
# SCSI support type (disk, tape, CD-ROM)
#
CONFIG_BLK_DEV_SD=y
# CONFIG_CHR_DEV_ST is not set
# CONFIG_CHR_DEV_OSST is not set
CONFIG_BLK_DEV_SR=y
# CONFIG_BLK_DEV_SR_VENDOR is not set
CONFIG_CHR_DEV_SG=y
# CONFIG_CHR_DEV_SCH is not set
# CONFIG_SCSI_MULTI_LUN is not set
# CONFIG_SCSI_CONSTANTS is not set
# CONFIG_SCSI_LOGGING is not set
# CONFIG_SCSI_SCAN_ASYNC is not set
CONFIG_SCSI_WAIT_SCAN=m

#
# SCSI Transports
#
# CONFIG_SCSI_SPI_ATTRS is not set
# CONFIG_SCSI_FC_ATTRS is not set
# CONFIG_SCSI_ISCSI_ATTRS is not set
CONFIG_SCSI_SAS_ATTRS=y
# CONFIG_SCSI_SAS_LIBSAS is not set
# CONFIG_SCSI_SRP_ATTRS is not set
# CONFIG_SCSI_LOWLEVEL is not set
# CONFIG_SCSI_DH is not set
# CONFIG_SCSI_OSD_INITIATOR is not set
CONFIG_ATA=y
# CONFIG_ATA_NONSTANDARD is not set
# CONFIG_ATA_ACPI is not set
CONFIG_SATA_PMP=y
CONFIG_SATA_AHCI=y
CONFIG_SATA_SIL24=y
CONFIG_ATA_SFF=y
# CONFIG_SATA_SVW is not set
# CONFIG_ATA_PIIX is not set
# CONFIG_SATA_MV is not set
# CONFIG_SATA_NV is not set
# CONFIG_PDC_ADMA is not set
# CONFIG_SATA_QSTOR is not set
# CONFIG_SATA_PROMISE is not set
# CONFIG_SATA_SX4 is not set
# CONFIG_SATA_SIL is not set
# CONFIG_SATA_SIS is not set
# CONFIG_SATA_ULI is not set
# CONFIG_SATA_VIA is not set
# CONFIG_SATA_VITESSE is not set
# CONFIG_SATA_INIC162X is not set
# CONFIG_PATA_ALI is not set
# CONFIG_PATA_AMD is not set
# CONFIG_PATA_ARTOP is not set
# CONFIG_PATA_ATIIXP is not set
# CONFIG_PATA_CMD640_PCI is not set
# CONFIG_PATA_CMD64X is not set
# CONFIG_PATA_CS5520 is not set
# CONFIG_PATA_CS5530 is not set
# CONFIG_PATA_CYPRESS is not set
# CONFIG_PATA_EFAR is not set
# CONFIG_ATA_GENERIC is not set
# CONFIG_PATA_HPT366 is not set
# CONFIG_PATA_HPT37X is not set
# CONFIG_PATA_HPT3X2N is not set
# CONFIG_PATA_HPT3X3 is not set
# CONFIG_PATA_IT821X is not set
# CONFIG_PATA_IT8213 is not set
CONFIG_PATA_JMICRON=y
# CONFIG_PATA_TRIFLEX is not set
# CONFIG_PATA_MARVELL is not set
# CONFIG_PATA_MPIIX is not set
# CONFIG_PATA_OLDPIIX is not set
# CONFIG_PATA_NETCELL is not set
# CONFIG_PATA_NINJA32 is not set
# CONFIG_PATA_NS87410 is not set
# CONFIG_PATA_NS87415 is not set
# CONFIG_PATA_OPTI is not set
# CONFIG_PATA_OPTIDMA is not set
# CONFIG_PATA_PDC_OLD is not set
# CONFIG_PATA_RADISYS is not set
# CONFIG_PATA_RZ1000 is not set
# CONFIG_PATA_SC1200 is not set
# CONFIG_PATA_SERVERWORKS is not set
# CONFIG_PATA_PDC2027X is not set
# CONFIG_PATA_SIL680 is not set
# CONFIG_PATA_SIS is not set
# CONFIG_PATA_VIA is not set
# CONFIG_PATA_WINBOND is not set
# CONFIG_PATA_SCH is not set
CONFIG_MD=y
CONFIG_BLK_DEV_MD=y
CONFIG_MD_AUTODETECT=y
# CONFIG_MD_LINEAR is not set
CONFIG_MD_RAID0=y
CONFIG_MD_RAID1=y
# CONFIG_MD_RAID10 is not set
# CONFIG_MD_RAID456 is not set
# CONFIG_MD_MULTIPATH is not set
CONFIG_MD_FAULTY=y
CONFIG_BLK_DEV_DM=y
# CONFIG_DM_DEBUG is not set
CONFIG_DM_CRYPT=y
# CONFIG_DM_SNAPSHOT is not set
# CONFIG_DM_MIRROR is not set
CONFIG_DM_ZERO=y
# CONFIG_DM_MULTIPATH is not set
# CONFIG_DM_DELAY is not set
# CONFIG_DM_UEVENT is not set
CONFIG_FUSION=y
# CONFIG_FUSION_SPI is not set
# CONFIG_FUSION_FC is not set
CONFIG_FUSION_SAS=y
CONFIG_FUSION_MAX_SGE=128
CONFIG_FUSION_CTL=m
CONFIG_FUSION_LOGGING=y

#
# IEEE 1394 (FireWire) support
#

#
# You can enable one or both FireWire driver stacks.
#

#
# See the help texts for more information.
#
# CONFIG_FIREWIRE is not set
# CONFIG_IEEE1394 is not set
# CONFIG_I2O is not set
# CONFIG_MACINTOSH_DRIVERS is not set
CONFIG_NETDEVICES=y
CONFIG_DUMMY=y
# CONFIG_BONDING is not set
CONFIG_MACVLAN=y
# CONFIG_EQUALIZER is not set
CONFIG_TUN=y
# CONFIG_VETH is not set
# CONFIG_NET_SB1000 is not set
# CONFIG_ARCNET is not set
# CONFIG_NET_ETHERNET is not set
CONFIG_MII=y
CONFIG_NETDEV_1000=y
# CONFIG_ACENIC is not set
# CONFIG_DL2K is not set
# CONFIG_E1000 is not set
# CONFIG_E1000E is not set
# CONFIG_IP1000 is not set
# CONFIG_IGB is not set
# CONFIG_IGBVF is not set
# CONFIG_NS83820 is not set
# CONFIG_HAMACHI is not set
# CONFIG_YELLOWFIN is not set
CONFIG_R8169=y
CONFIG_R8169_VLAN=y
# CONFIG_SIS190 is not set
# CONFIG_SKGE is not set
# CONFIG_SKY2 is not set
# CONFIG_VIA_VELOCITY is not set
# CONFIG_TIGON3 is not set
# CONFIG_BNX2 is not set
# CONFIG_QLA3XXX is not set
# CONFIG_ATL1 is not set
# CONFIG_ATL1E is not set
# CONFIG_ATL1C is not set
# CONFIG_JME is not set
# CONFIG_NETDEV_10000 is not set
# CONFIG_TR is not set

#
# Wireless LAN
#
# CONFIG_WLAN_PRE80211 is not set
# CONFIG_WLAN_80211 is not set

#
# Enable WiMAX (Networking options) to see the WiMAX drivers
#

#
# USB Network Adapters
#
# CONFIG_USB_CATC is not set
# CONFIG_USB_KAWETH is not set
# CONFIG_USB_PEGASUS is not set
# CONFIG_USB_RTL8150 is not set
# CONFIG_USB_USBNET is not set
# CONFIG_WAN is not set
# CONFIG_FDDI is not set
# CONFIG_HIPPI is not set
CONFIG_PPP=y
CONFIG_PPP_MULTILINK=y
CONFIG_PPP_FILTER=y
CONFIG_PPP_ASYNC=y
CONFIG_PPP_SYNC_TTY=y
CONFIG_PPP_DEFLATE=y
CONFIG_PPP_BSDCOMP=y
CONFIG_PPP_MPPE=y
CONFIG_PPPOE=y
CONFIG_PPPOL2TP=y
# CONFIG_SLIP is not set
CONFIG_SLHC=y
# CONFIG_NET_FC is not set
CONFIG_NETCONSOLE=y
CONFIG_NETCONSOLE_DYNAMIC=y
CONFIG_NETPOLL=y
CONFIG_NETPOLL_TRAP=y
CONFIG_NET_POLL_CONTROLLER=y
# CONFIG_VIRTIO_NET is not set
# CONFIG_ISDN is not set
# CONFIG_PHONE is not set

#
# Input device support
#
CONFIG_INPUT=y
# CONFIG_INPUT_FF_MEMLESS is not set
# CONFIG_INPUT_POLLDEV is not set

#
# Userland interfaces
#
CONFIG_INPUT_MOUSEDEV=y
CONFIG_INPUT_MOUSEDEV_PSAUX=y
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1280
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=800
# CONFIG_INPUT_JOYDEV is not set
# CONFIG_INPUT_EVDEV is not set
# CONFIG_INPUT_EVBUG is not set

#
# Input Device Drivers
#
CONFIG_INPUT_KEYBOARD=y
CONFIG_KEYBOARD_ATKBD=y
# CONFIG_KEYBOARD_LKKBD is not set
# CONFIG_KEYBOARD_NEWTON is not set
# CONFIG_KEYBOARD_STOWAWAY is not set
# CONFIG_KEYBOARD_SUNKBD is not set
# CONFIG_KEYBOARD_XTKBD is not set
CONFIG_INPUT_MOUSE=y
CONFIG_MOUSE_PS2=y
CONFIG_MOUSE_PS2_ALPS=y
CONFIG_MOUSE_PS2_LOGIPS2PP=y
CONFIG_MOUSE_PS2_SYNAPTICS=y
CONFIG_MOUSE_PS2_LIFEBOOK=y
CONFIG_MOUSE_PS2_TRACKPOINT=y
# CONFIG_MOUSE_PS2_ELANTECH is not set
# CONFIG_MOUSE_PS2_TOUCHKIT is not set
# CONFIG_MOUSE_SERIAL is not set
# CONFIG_MOUSE_APPLETOUCH is not set
# CONFIG_MOUSE_BCM5974 is not set
# CONFIG_MOUSE_VSXXXAA is not set
# CONFIG_MOUSE_SYNAPTICS_I2C is not set
# CONFIG_INPUT_JOYSTICK is not set
# CONFIG_INPUT_TABLET is not set
# CONFIG_INPUT_TOUCHSCREEN is not set
CONFIG_INPUT_MISC=y
CONFIG_INPUT_PCSPKR=y
# CONFIG_INPUT_ATLAS_BTNS is not set
# CONFIG_INPUT_ATI_REMOTE is not set
# CONFIG_INPUT_ATI_REMOTE2 is not set
# CONFIG_INPUT_KEYSPAN_REMOTE is not set
# CONFIG_INPUT_POWERMATE is not set
# CONFIG_INPUT_YEALINK is not set
# CONFIG_INPUT_CM109 is not set
# CONFIG_INPUT_UINPUT is not set

#
# Hardware I/O ports
#
CONFIG_SERIO=y
CONFIG_SERIO_I8042=y
# CONFIG_SERIO_SERPORT is not set
# CONFIG_SERIO_CT82C710 is not set
# CONFIG_SERIO_PCIPS2 is not set
CONFIG_SERIO_LIBPS2=y
# CONFIG_SERIO_RAW is not set
# CONFIG_GAMEPORT is not set

#
# Character devices
#
CONFIG_VT=y
CONFIG_CONSOLE_TRANSLATIONS=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
CONFIG_VT_HW_CONSOLE_BINDING=y
CONFIG_DEVKMEM=y
# CONFIG_SERIAL_NONSTANDARD is not set
# CONFIG_NOZOMI is not set

#
# Serial drivers
#
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_SERIAL_8250_PCI=y
CONFIG_SERIAL_8250_PNP=y
CONFIG_SERIAL_8250_NR_UARTS=4
CONFIG_SERIAL_8250_RUNTIME_UARTS=4
# CONFIG_SERIAL_8250_EXTENDED is not set

#
# Non-8250 serial port support
#
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
# CONFIG_SERIAL_JSM is not set
CONFIG_UNIX98_PTYS=y
# CONFIG_DEVPTS_MULTIPLE_INSTANCES is not set
# CONFIG_LEGACY_PTYS is not set
# CONFIG_VIRTIO_CONSOLE is not set
# CONFIG_IPMI_HANDLER is not set
CONFIG_HW_RANDOM=y
# CONFIG_HW_RANDOM_TIMERIOMEM is not set
# CONFIG_HW_RANDOM_INTEL is not set
CONFIG_HW_RANDOM_AMD=y
# CONFIG_HW_RANDOM_VIA is not set
# CONFIG_HW_RANDOM_VIRTIO is not set
# CONFIG_NVRAM is not set
CONFIG_RTC=y
# CONFIG_R3964 is not set
# CONFIG_APPLICOM is not set
# CONFIG_MWAVE is not set
# CONFIG_PC8736x_GPIO is not set
# CONFIG_RAW_DRIVER is not set
CONFIG_HPET=y
CONFIG_HPET_MMAP=y
# CONFIG_HANGCHECK_TIMER is not set
# CONFIG_TCG_TPM is not set
# CONFIG_TELCLOCK is not set
CONFIG_DEVPORT=y
CONFIG_I2C=y
CONFIG_I2C_BOARDINFO=y
CONFIG_I2C_CHARDEV=y
CONFIG_I2C_HELPER_AUTO=y

#
# I2C Hardware Bus support
#

#
# PC SMBus host controller drivers
#
# CONFIG_I2C_ALI1535 is not set
# CONFIG_I2C_ALI1563 is not set
# CONFIG_I2C_ALI15X3 is not set
# CONFIG_I2C_AMD756 is not set
# CONFIG_I2C_AMD8111 is not set
# CONFIG_I2C_I801 is not set
# CONFIG_I2C_ISCH is not set
CONFIG_I2C_PIIX4=y
# CONFIG_I2C_NFORCE2 is not set
# CONFIG_I2C_SIS5595 is not set
# CONFIG_I2C_SIS630 is not set
# CONFIG_I2C_SIS96X is not set
# CONFIG_I2C_VIA is not set
# CONFIG_I2C_VIAPRO is not set

#
# I2C system bus drivers (mostly embedded / system-on-chip)
#
# CONFIG_I2C_OCORES is not set
# CONFIG_I2C_SIMTEC is not set

#
# External I2C/SMBus adapter drivers
#
# CONFIG_I2C_PARPORT_LIGHT is not set
# CONFIG_I2C_TAOS_EVM is not set
# CONFIG_I2C_TINY_USB is not set

#
# Graphics adapter I2C/DDC channel drivers
#
# CONFIG_I2C_VOODOO3 is not set

#
# Other I2C/SMBus bus drivers
#
# CONFIG_I2C_PCA_PLATFORM is not set
# CONFIG_I2C_STUB is not set

#
# Miscellaneous I2C Chip support
#
# CONFIG_DS1682 is not set
# CONFIG_SENSORS_PCF8574 is not set
# CONFIG_PCF8575 is not set
# CONFIG_SENSORS_PCA9539 is not set
# CONFIG_SENSORS_TSL2550 is not set
# CONFIG_I2C_DEBUG_CORE is not set
# CONFIG_I2C_DEBUG_ALGO is not set
# CONFIG_I2C_DEBUG_BUS is not set
# CONFIG_I2C_DEBUG_CHIP is not set
# CONFIG_SPI is not set

#
# PPS support
#
# CONFIG_PPS is not set
CONFIG_ARCH_WANT_OPTIONAL_GPIOLIB=y
# CONFIG_GPIOLIB is not set
# CONFIG_W1 is not set
CONFIG_POWER_SUPPLY=y
# CONFIG_POWER_SUPPLY_DEBUG is not set
# CONFIG_PDA_POWER is not set
# CONFIG_BATTERY_DS2760 is not set
# CONFIG_BATTERY_DS2782 is not set
# CONFIG_BATTERY_BQ27x00 is not set
# CONFIG_BATTERY_MAX17040 is not set
CONFIG_HWMON=y
CONFIG_HWMON_VID=y
# CONFIG_SENSORS_ABITUGURU is not set
# CONFIG_SENSORS_ABITUGURU3 is not set
# CONFIG_SENSORS_AD7414 is not set
# CONFIG_SENSORS_AD7418 is not set
# CONFIG_SENSORS_ADM1021 is not set
# CONFIG_SENSORS_ADM1025 is not set
# CONFIG_SENSORS_ADM1026 is not set
# CONFIG_SENSORS_ADM1029 is not set
# CONFIG_SENSORS_ADM1031 is not set
# CONFIG_SENSORS_ADM9240 is not set
# CONFIG_SENSORS_ADT7462 is not set
# CONFIG_SENSORS_ADT7470 is not set
# CONFIG_SENSORS_ADT7473 is not set
# CONFIG_SENSORS_ADT7475 is not set
CONFIG_SENSORS_K8TEMP=y
# CONFIG_SENSORS_ASB100 is not set
# CONFIG_SENSORS_ATK0110 is not set
# CONFIG_SENSORS_ATXP1 is not set
# CONFIG_SENSORS_DS1621 is not set
# CONFIG_SENSORS_I5K_AMB is not set
# CONFIG_SENSORS_F71805F is not set
# CONFIG_SENSORS_F71882FG is not set
# CONFIG_SENSORS_F75375S is not set
# CONFIG_SENSORS_FSCHER is not set
# CONFIG_SENSORS_FSCPOS is not set
# CONFIG_SENSORS_FSCHMD is not set
# CONFIG_SENSORS_G760A is not set
# CONFIG_SENSORS_GL518SM is not set
# CONFIG_SENSORS_GL520SM is not set
# CONFIG_SENSORS_CORETEMP is not set
CONFIG_SENSORS_IT87=y
# CONFIG_SENSORS_LM63 is not set
# CONFIG_SENSORS_LM75 is not set
# CONFIG_SENSORS_LM77 is not set
# CONFIG_SENSORS_LM78 is not set
# CONFIG_SENSORS_LM80 is not set
# CONFIG_SENSORS_LM83 is not set
# CONFIG_SENSORS_LM85 is not set
# CONFIG_SENSORS_LM87 is not set
# CONFIG_SENSORS_LM90 is not set
# CONFIG_SENSORS_LM92 is not set
# CONFIG_SENSORS_LM93 is not set
# CONFIG_SENSORS_LTC4215 is not set
# CONFIG_SENSORS_LTC4245 is not set
# CONFIG_SENSORS_LM95241 is not set
# CONFIG_SENSORS_MAX1619 is not set
# CONFIG_SENSORS_MAX6650 is not set
# CONFIG_SENSORS_PC87360 is not set
# CONFIG_SENSORS_PC87427 is not set
# CONFIG_SENSORS_PCF8591 is not set
# CONFIG_SENSORS_SIS5595 is not set
# CONFIG_SENSORS_DME1737 is not set
# CONFIG_SENSORS_SMSC47M1 is not set
# CONFIG_SENSORS_SMSC47M192 is not set
# CONFIG_SENSORS_SMSC47B397 is not set
# CONFIG_SENSORS_ADS7828 is not set
# CONFIG_SENSORS_THMC50 is not set
# CONFIG_SENSORS_TMP401 is not set
# CONFIG_SENSORS_VIA686A is not set
# CONFIG_SENSORS_VT1211 is not set
# CONFIG_SENSORS_VT8231 is not set
# CONFIG_SENSORS_W83781D is not set
# CONFIG_SENSORS_W83791D is not set
# CONFIG_SENSORS_W83792D is not set
# CONFIG_SENSORS_W83793 is not set
# CONFIG_SENSORS_W83L785TS is not set
# CONFIG_SENSORS_W83L786NG is not set
# CONFIG_SENSORS_W83627HF is not set
# CONFIG_SENSORS_W83627EHF is not set
# CONFIG_SENSORS_HDAPS is not set
# CONFIG_SENSORS_LIS3LV02D is not set
# CONFIG_SENSORS_APPLESMC is not set
# CONFIG_HWMON_DEBUG_CHIP is not set
# CONFIG_THERMAL is not set
# CONFIG_THERMAL_HWMON is not set
CONFIG_WATCHDOG=y
# CONFIG_WATCHDOG_NOWAYOUT is not set

#
# Watchdog Device Drivers
#
# CONFIG_SOFT_WATCHDOG is not set
# CONFIG_ACQUIRE_WDT is not set
# CONFIG_ADVANTECH_WDT is not set
# CONFIG_ALIM1535_WDT is not set
# CONFIG_ALIM7101_WDT is not set
# CONFIG_SC520_WDT is not set
# CONFIG_EUROTECH_WDT is not set
# CONFIG_IB700_WDT is not set
# CONFIG_IBMASR is not set
# CONFIG_WAFER_WDT is not set
# CONFIG_I6300ESB_WDT is not set
# CONFIG_ITCO_WDT is not set
# CONFIG_IT8712F_WDT is not set
# CONFIG_IT87_WDT is not set
# CONFIG_HP_WATCHDOG is not set
# CONFIG_SC1200_WDT is not set
# CONFIG_PC87413_WDT is not set
# CONFIG_60XX_WDT is not set
# CONFIG_SBC8360_WDT is not set
# CONFIG_CPU5_WDT is not set
# CONFIG_SMSC_SCH311X_WDT is not set
# CONFIG_SMSC37B787_WDT is not set
# CONFIG_W83627HF_WDT is not set
# CONFIG_W83697HF_WDT is not set
# CONFIG_W83697UG_WDT is not set
# CONFIG_W83877F_WDT is not set
# CONFIG_W83977F_WDT is not set
# CONFIG_MACHZ_WDT is not set
# CONFIG_SBC_EPX_C3_WATCHDOG is not set

#
# PCI-based Watchdog Cards
#
# CONFIG_PCIPCWATCHDOG is not set
# CONFIG_WDTPCI is not set

#
# USB-based Watchdog Cards
#
# CONFIG_USBPCWATCHDOG is not set
CONFIG_SSB_POSSIBLE=y

#
# Sonics Silicon Backplane
#
# CONFIG_SSB is not set

#
# Multifunction device drivers
#
# CONFIG_MFD_CORE is not set
# CONFIG_MFD_SM501 is not set
# CONFIG_HTC_PASIC3 is not set
# CONFIG_TWL4030_CORE is not set
# CONFIG_MFD_TMIO is not set
# CONFIG_PMIC_DA903X is not set
# CONFIG_MFD_WM8400 is not set
# CONFIG_MFD_PCF50633 is not set
# CONFIG_AB3100_CORE is not set
# CONFIG_REGULATOR is not set
# CONFIG_MEDIA_SUPPORT is not set

#
# Graphics support
#
CONFIG_AGP=y
CONFIG_AGP_AMD64=y
# CONFIG_AGP_INTEL is not set
# CONFIG_AGP_SIS is not set
# CONFIG_AGP_VIA is not set
# CONFIG_DRM is not set
# CONFIG_VGASTATE is not set
# CONFIG_VIDEO_OUTPUT_CONTROL is not set
# CONFIG_FB is not set
# CONFIG_BACKLIGHT_LCD_SUPPORT is not set

#
# Display device support
#
# CONFIG_DISPLAY_SUPPORT is not set

#
# Console display driver support
#
CONFIG_VGA_CONSOLE=y
CONFIG_VGACON_SOFT_SCROLLBACK=y
CONFIG_VGACON_SOFT_SCROLLBACK_SIZE=2048
CONFIG_DUMMY_CONSOLE=y
CONFIG_SOUND=y
CONFIG_SOUND_OSS_CORE=y
CONFIG_SND=y
CONFIG_SND_TIMER=y
CONFIG_SND_PCM=y
CONFIG_SND_HWDEP=y
CONFIG_SND_RAWMIDI=y
CONFIG_SND_SEQUENCER=y
# CONFIG_SND_SEQ_DUMMY is not set
CONFIG_SND_OSSEMUL=y
CONFIG_SND_MIXER_OSS=y
CONFIG_SND_PCM_OSS=y
CONFIG_SND_PCM_OSS_PLUGINS=y
CONFIG_SND_SEQUENCER_OSS=y
# CONFIG_SND_HRTIMER is not set
# CONFIG_SND_RTCTIMER is not set
CONFIG_SND_DYNAMIC_MINORS=y
# CONFIG_SND_SUPPORT_OLD_API is not set
CONFIG_SND_VERBOSE_PROCFS=y
# CONFIG_SND_VERBOSE_PRINTK is not set
# CONFIG_SND_DEBUG is not set
CONFIG_SND_VMASTER=y
CONFIG_SND_RAWMIDI_SEQ=y
# CONFIG_SND_OPL3_LIB_SEQ is not set
# CONFIG_SND_OPL4_LIB_SEQ is not set
# CONFIG_SND_SBAWE_SEQ is not set
# CONFIG_SND_EMU10K1_SEQ is not set
CONFIG_SND_DRIVERS=y
# CONFIG_SND_PCSP is not set
# CONFIG_SND_DUMMY is not set
# CONFIG_SND_VIRMIDI is not set
# CONFIG_SND_SERIAL_U16550 is not set
# CONFIG_SND_MPU401 is not set
CONFIG_SND_PCI=y
# CONFIG_SND_AD1889 is not set
# CONFIG_SND_ALS300 is not set
# CONFIG_SND_ALS4000 is not set
# CONFIG_SND_ALI5451 is not set
# CONFIG_SND_ATIIXP is not set
# CONFIG_SND_ATIIXP_MODEM is not set
# CONFIG_SND_AU8810 is not set
# CONFIG_SND_AU8820 is not set
# CONFIG_SND_AU8830 is not set
# CONFIG_SND_AW2 is not set
# CONFIG_SND_AZT3328 is not set
# CONFIG_SND_BT87X is not set
# CONFIG_SND_CA0106 is not set
# CONFIG_SND_CMIPCI is not set
# CONFIG_SND_OXYGEN is not set
# CONFIG_SND_CS4281 is not set
# CONFIG_SND_CS46XX is not set
# CONFIG_SND_CS5530 is not set
# CONFIG_SND_CTXFI is not set
# CONFIG_SND_DARLA20 is not set
# CONFIG_SND_GINA20 is not set
# CONFIG_SND_LAYLA20 is not set
# CONFIG_SND_DARLA24 is not set
# CONFIG_SND_GINA24 is not set
# CONFIG_SND_LAYLA24 is not set
# CONFIG_SND_MONA is not set
# CONFIG_SND_MIA is not set
# CONFIG_SND_ECHO3G is not set
# CONFIG_SND_INDIGO is not set
# CONFIG_SND_INDIGOIO is not set
# CONFIG_SND_INDIGODJ is not set
# CONFIG_SND_INDIGOIOX is not set
# CONFIG_SND_INDIGODJX is not set
# CONFIG_SND_EMU10K1 is not set
# CONFIG_SND_EMU10K1X is not set
# CONFIG_SND_ENS1370 is not set
# CONFIG_SND_ENS1371 is not set
# CONFIG_SND_ES1938 is not set
# CONFIG_SND_ES1968 is not set
# CONFIG_SND_FM801 is not set
CONFIG_SND_HDA_INTEL=m
# CONFIG_SND_HDA_HWDEP is not set
# CONFIG_SND_HDA_INPUT_BEEP is not set
# CONFIG_SND_HDA_INPUT_JACK is not set
CONFIG_SND_HDA_CODEC_REALTEK=y
# CONFIG_SND_HDA_CODEC_ANALOG is not set
# CONFIG_SND_HDA_CODEC_SIGMATEL is not set
# CONFIG_SND_HDA_CODEC_VIA is not set
# CONFIG_SND_HDA_CODEC_ATIHDMI is not set
CONFIG_SND_HDA_CODEC_NVHDMI=y
CONFIG_SND_HDA_CODEC_INTELHDMI=y
CONFIG_SND_HDA_ELD=y
# CONFIG_SND_HDA_CODEC_CONEXANT is not set
CONFIG_SND_HDA_CODEC_CA0110=y
# CONFIG_SND_HDA_CODEC_CMEDIA is not set
# CONFIG_SND_HDA_CODEC_SI3054 is not set
CONFIG_SND_HDA_GENERIC=y
CONFIG_SND_HDA_POWER_SAVE=y
CONFIG_SND_HDA_POWER_SAVE_DEFAULT=0
# CONFIG_SND_HDSP is not set
# CONFIG_SND_HDSPM is not set
# CONFIG_SND_HIFIER is not set
# CONFIG_SND_ICE1712 is not set
# CONFIG_SND_ICE1724 is not set
# CONFIG_SND_INTEL8X0 is not set
# CONFIG_SND_INTEL8X0M is not set
# CONFIG_SND_KORG1212 is not set
# CONFIG_SND_LX6464ES is not set
# CONFIG_SND_MAESTRO3 is not set
# CONFIG_SND_MIXART is not set
# CONFIG_SND_NM256 is not set
# CONFIG_SND_PCXHR is not set
# CONFIG_SND_RIPTIDE is not set
# CONFIG_SND_RME32 is not set
# CONFIG_SND_RME96 is not set
# CONFIG_SND_RME9652 is not set
# CONFIG_SND_SONICVIBES is not set
# CONFIG_SND_TRIDENT is not set
# CONFIG_SND_VIA82XX is not set
# CONFIG_SND_VIA82XX_MODEM is not set
# CONFIG_SND_VIRTUOSO is not set
# CONFIG_SND_VX222 is not set
# CONFIG_SND_YMFPCI is not set
CONFIG_SND_USB=y
CONFIG_SND_USB_AUDIO=y
# CONFIG_SND_USB_USX2Y is not set
# CONFIG_SND_USB_CAIAQ is not set
# CONFIG_SND_USB_US122L is not set
# CONFIG_SND_SOC is not set
# CONFIG_SOUND_PRIME is not set
CONFIG_HID_SUPPORT=y
CONFIG_HID=y
# CONFIG_HID_DEBUG is not set
# CONFIG_HIDRAW is not set

#
# USB Input Devices
#
CONFIG_USB_HID=y
# CONFIG_HID_PID is not set
# CONFIG_USB_HIDDEV is not set

#
# Special HID drivers
#
CONFIG_HID_A4TECH=y
CONFIG_HID_APPLE=y
CONFIG_HID_BELKIN=y
CONFIG_HID_CHERRY=y
CONFIG_HID_CHICONY=y
CONFIG_HID_CYPRESS=y
CONFIG_HID_DRAGONRISE=y
# CONFIG_DRAGONRISE_FF is not set
CONFIG_HID_EZKEY=y
CONFIG_HID_KYE=y
CONFIG_HID_GYRATION=y
CONFIG_HID_KENSINGTON=y
CONFIG_HID_LOGITECH=y
# CONFIG_LOGITECH_FF is not set
# CONFIG_LOGIRUMBLEPAD2_FF is not set
CONFIG_HID_MICROSOFT=y
CONFIG_HID_MONTEREY=y
CONFIG_HID_NTRIG=y
CONFIG_HID_PANTHERLORD=y
# CONFIG_PANTHERLORD_FF is not set
CONFIG_HID_PETALYNX=y
CONFIG_HID_SAMSUNG=y
CONFIG_HID_SONY=y
CONFIG_HID_SUNPLUS=y
CONFIG_HID_GREENASIA=y
# CONFIG_GREENASIA_FF is not set
CONFIG_HID_SMARTJOYPLUS=y
# CONFIG_SMARTJOYPLUS_FF is not set
CONFIG_HID_TOPSEED=y
CONFIG_HID_THRUSTMASTER=y
# CONFIG_THRUSTMASTER_FF is not set
CONFIG_HID_ZEROPLUS=y
# CONFIG_ZEROPLUS_FF is not set
CONFIG_USB_SUPPORT=y
CONFIG_USB_ARCH_HAS_HCD=y
CONFIG_USB_ARCH_HAS_OHCI=y
CONFIG_USB_ARCH_HAS_EHCI=y
CONFIG_USB=y
# CONFIG_USB_DEBUG is not set
# CONFIG_USB_ANNOUNCE_NEW_DEVICES is not set

#
# Miscellaneous USB options
#
CONFIG_USB_DEVICEFS=y
# CONFIG_USB_DEVICE_CLASS is not set
CONFIG_USB_DYNAMIC_MINORS=y
# CONFIG_USB_SUSPEND is not set
# CONFIG_USB_OTG is not set
# CONFIG_USB_MON is not set
# CONFIG_USB_WUSB is not set
# CONFIG_USB_WUSB_CBAF is not set

#
# USB Host Controller Drivers
#
# CONFIG_USB_C67X00_HCD is not set
# CONFIG_USB_XHCI_HCD is not set
CONFIG_USB_EHCI_HCD=m
CONFIG_USB_EHCI_ROOT_HUB_TT=y
CONFIG_USB_EHCI_TT_NEWSCHED=y
# CONFIG_USB_OXU210HP_HCD is not set
# CONFIG_USB_ISP116X_HCD is not set
# CONFIG_USB_ISP1760_HCD is not set
CONFIG_USB_OHCI_HCD=m
# CONFIG_USB_OHCI_BIG_ENDIAN_DESC is not set
# CONFIG_USB_OHCI_BIG_ENDIAN_MMIO is not set
CONFIG_USB_OHCI_LITTLE_ENDIAN=y
CONFIG_USB_UHCI_HCD=m
# CONFIG_USB_SL811_HCD is not set
# CONFIG_USB_R8A66597_HCD is not set
# CONFIG_USB_HWA_HCD is not set

#
# USB Device Class drivers
#
CONFIG_USB_ACM=m
CONFIG_USB_PRINTER=m
# CONFIG_USB_WDM is not set
# CONFIG_USB_TMC is not set

#
# NOTE: USB_STORAGE depends on SCSI but BLK_DEV_SD may
#

#
# also be needed; see USB_STORAGE Help for more info
#
CONFIG_USB_STORAGE=m
# CONFIG_USB_STORAGE_DEBUG is not set
CONFIG_USB_STORAGE_DATAFAB=m
CONFIG_USB_STORAGE_FREECOM=m
CONFIG_USB_STORAGE_ISD200=m
CONFIG_USB_STORAGE_USBAT=m
CONFIG_USB_STORAGE_SDDR09=m
CONFIG_USB_STORAGE_SDDR55=m
CONFIG_USB_STORAGE_JUMPSHOT=m
CONFIG_USB_STORAGE_ALAUDA=m
CONFIG_USB_STORAGE_ONETOUCH=m
CONFIG_USB_STORAGE_KARMA=m
CONFIG_USB_STORAGE_CYPRESS_ATACB=m
# CONFIG_USB_LIBUSUAL is not set

#
# USB Imaging devices
#
# CONFIG_USB_MDC800 is not set
# CONFIG_USB_MICROTEK is not set

#
# USB port drivers
#
CONFIG_USB_SERIAL=m
# CONFIG_USB_EZUSB is not set
CONFIG_USB_SERIAL_GENERIC=y
# CONFIG_USB_SERIAL_AIRCABLE is not set
# CONFIG_USB_SERIAL_ARK3116 is not set
# CONFIG_USB_SERIAL_BELKIN is not set
# CONFIG_USB_SERIAL_CH341 is not set
# CONFIG_USB_SERIAL_WHITEHEAT is not set
# CONFIG_USB_SERIAL_DIGI_ACCELEPORT is not set
# CONFIG_USB_SERIAL_CP210X is not set
# CONFIG_USB_SERIAL_CYPRESS_M8 is not set
# CONFIG_USB_SERIAL_EMPEG is not set
# CONFIG_USB_SERIAL_FTDI_SIO is not set
# CONFIG_USB_SERIAL_FUNSOFT is not set
# CONFIG_USB_SERIAL_VISOR is not set
# CONFIG_USB_SERIAL_IPAQ is not set
# CONFIG_USB_SERIAL_IR is not set
# CONFIG_USB_SERIAL_EDGEPORT is not set
# CONFIG_USB_SERIAL_EDGEPORT_TI is not set
# CONFIG_USB_SERIAL_GARMIN is not set
# CONFIG_USB_SERIAL_IPW is not set
# CONFIG_USB_SERIAL_IUU is not set
# CONFIG_USB_SERIAL_KEYSPAN_PDA is not set
# CONFIG_USB_SERIAL_KEYSPAN is not set
# CONFIG_USB_SERIAL_KLSI is not set
# CONFIG_USB_SERIAL_KOBIL_SCT is not set
# CONFIG_USB_SERIAL_MCT_U232 is not set
# CONFIG_USB_SERIAL_MOS7720 is not set
# CONFIG_USB_SERIAL_MOS7840 is not set
# CONFIG_USB_SERIAL_MOTOROLA is not set
# CONFIG_USB_SERIAL_NAVMAN is not set
CONFIG_USB_SERIAL_PL2303=m
# CONFIG_USB_SERIAL_OTI6858 is not set
# CONFIG_USB_SERIAL_QUALCOMM is not set
# CONFIG_USB_SERIAL_SPCP8X5 is not set
# CONFIG_USB_SERIAL_HP4X is not set
# CONFIG_USB_SERIAL_SAFE is not set
# CONFIG_USB_SERIAL_SIEMENS_MPI is not set
# CONFIG_USB_SERIAL_SIERRAWIRELESS is not set
# CONFIG_USB_SERIAL_SYMBOL is not set
# CONFIG_USB_SERIAL_TI is not set
# CONFIG_USB_SERIAL_CYBERJACK is not set
# CONFIG_USB_SERIAL_XIRCOM is not set
# CONFIG_USB_SERIAL_OPTION is not set
# CONFIG_USB_SERIAL_OMNINET is not set
# CONFIG_USB_SERIAL_OPTICON is not set
# CONFIG_USB_SERIAL_DEBUG is not set

#
# USB Miscellaneous drivers
#
# CONFIG_USB_EMI62 is not set
# CONFIG_USB_EMI26 is not set
# CONFIG_USB_ADUTUX is not set
# CONFIG_USB_SEVSEG is not set
# CONFIG_USB_RIO500 is not set
# CONFIG_USB_LEGOTOWER is not set
# CONFIG_USB_LCD is not set
# CONFIG_USB_BERRY_CHARGE is not set
# CONFIG_USB_LED is not set
# CONFIG_USB_CYPRESS_CY7C63 is not set
# CONFIG_USB_CYTHERM is not set
# CONFIG_USB_IDMOUSE is not set
# CONFIG_USB_FTDI_ELAN is not set
# CONFIG_USB_APPLEDISPLAY is not set
# CONFIG_USB_SISUSBVGA is not set
# CONFIG_USB_LD is not set
# CONFIG_USB_TRANCEVIBRATOR is not set
# CONFIG_USB_IOWARRIOR is not set
# CONFIG_USB_TEST is not set
# CONFIG_USB_ISIGHTFW is not set
# CONFIG_USB_VST is not set

#
# OTG and related infrastructure
#
# CONFIG_NOP_USB_XCEIV is not set
# CONFIG_UWB is not set
# CONFIG_MMC is not set
# CONFIG_MEMSTICK is not set
# CONFIG_NEW_LEDS is not set
# CONFIG_ACCESSIBILITY is not set
# CONFIG_INFINIBAND is not set
CONFIG_EDAC=y

#
# Reporting subsystems
#
# CONFIG_EDAC_DEBUG is not set
CONFIG_EDAC_MM_EDAC=y
CONFIG_EDAC_AMD64=y
# CONFIG_EDAC_AMD64_ERROR_INJECTION is not set
# CONFIG_EDAC_E752X is not set
# CONFIG_EDAC_I82975X is not set
# CONFIG_EDAC_I3000 is not set
# CONFIG_EDAC_X38 is not set
# CONFIG_EDAC_I5400 is not set
# CONFIG_EDAC_I5000 is not set
# CONFIG_EDAC_I5100 is not set
# CONFIG_RTC_CLASS is not set
# CONFIG_DMADEVICES is not set
# CONFIG_AUXDISPLAY is not set
# CONFIG_UIO is not set

#
# TI VLYNQ
#
CONFIG_STAGING=y
# CONFIG_STAGING_EXCLUDE_BUILD is not set
# CONFIG_ET131X is not set
# CONFIG_SLICOSS is not set
# CONFIG_ME4000 is not set
# CONFIG_MEILHAUS is not set
# CONFIG_USB_IP_COMMON is not set
# CONFIG_ECHO is not set
# CONFIG_COMEDI is not set
# CONFIG_ASUS_OLED is not set
# CONFIG_ALTERA_PCIE_CHDMA is not set
# CONFIG_INPUT_MIMIO is not set
# CONFIG_TRANZPORT is not set
# CONFIG_EPL is not set

#
# Android
#
# CONFIG_ANDROID is not set
# CONFIG_DST is not set
CONFIG_POHMELFS=m
# CONFIG_POHMELFS_DEBUG is not set
# CONFIG_POHMELFS_CRYPTO is not set
# CONFIG_B3DFG is not set
# CONFIG_IDE_PHISON is not set
# CONFIG_PLAN9AUTH is not set
# CONFIG_HECI is not set
# CONFIG_LINE6_USB is not set
# CONFIG_USB_SERIAL_QUATECH2 is not set
# CONFIG_VT6655 is not set
# CONFIG_USB_CPC is not set
# CONFIG_RDC_17F3101X is not set
# CONFIG_X86_PLATFORM_DEVICES is not set

#
# Firmware Drivers
#
# CONFIG_EDD is not set
CONFIG_FIRMWARE_MEMMAP=y
# CONFIG_DELL_RBU is not set
# CONFIG_DCDBAS is not set
CONFIG_DMIID=y
# CONFIG_ISCSI_IBFT_FIND is not set

#
# File systems
#
CONFIG_EXT2_FS=m
CONFIG_EXT2_FS_XATTR=y
CONFIG_EXT2_FS_POSIX_ACL=y
CONFIG_EXT2_FS_SECURITY=y
# CONFIG_EXT2_FS_XIP is not set
CONFIG_EXT3_FS=m
CONFIG_EXT3_DEFAULTS_TO_ORDERED=y
CONFIG_EXT3_FS_XATTR=y
CONFIG_EXT3_FS_POSIX_ACL=y
CONFIG_EXT3_FS_SECURITY=y
CONFIG_EXT4_FS=y
CONFIG_EXT4DEV_COMPAT=y
CONFIG_EXT4_FS_XATTR=y
CONFIG_EXT4_FS_POSIX_ACL=y
CONFIG_EXT4_FS_SECURITY=y
CONFIG_JBD=m
# CONFIG_JBD_DEBUG is not set
CONFIG_JBD2=y
# CONFIG_JBD2_DEBUG is not set
CONFIG_FS_MBCACHE=y
# CONFIG_REISERFS_FS is not set
# CONFIG_JFS_FS is not set
CONFIG_FS_POSIX_ACL=y
# CONFIG_XFS_FS is not set
# CONFIG_GFS2_FS is not set
# CONFIG_OCFS2_FS is not set
CONFIG_BTRFS_FS=m
CONFIG_BTRFS_FS_POSIX_ACL=y
CONFIG_FILE_LOCKING=y
CONFIG_FSNOTIFY=y
CONFIG_DNOTIFY=y
CONFIG_INOTIFY=y
CONFIG_INOTIFY_USER=y
# CONFIG_QUOTA is not set
# CONFIG_AUTOFS_FS is not set
# CONFIG_AUTOFS4_FS is not set
CONFIG_FUSE_FS=y
# CONFIG_CUSE is not set

#
# Caches
#
# CONFIG_FSCACHE is not set

#
# CD-ROM/DVD Filesystems
#
CONFIG_ISO9660_FS=y
CONFIG_JOLIET=y
CONFIG_ZISOFS=y
CONFIG_UDF_FS=y
CONFIG_UDF_NLS=y

#
# DOS/FAT/NT Filesystems
#
CONFIG_FAT_FS=y
# CONFIG_MSDOS_FS is not set
CONFIG_VFAT_FS=y
CONFIG_FAT_DEFAULT_CODEPAGE=1251
CONFIG_FAT_DEFAULT_IOCHARSET="utf8"
# CONFIG_NTFS_FS is not set

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_PROC_VMCORE=y
CONFIG_PROC_SYSCTL=y
CONFIG_PROC_PAGE_MONITOR=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
# CONFIG_TMPFS_POSIX_ACL is not set
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y
CONFIG_CONFIGFS_FS=y
CONFIG_MISC_FILESYSTEMS=y
# CONFIG_ADFS_FS is not set
# CONFIG_AFFS_FS is not set
# CONFIG_HFS_FS is not set
# CONFIG_HFSPLUS_FS is not set
# CONFIG_BEFS_FS is not set
# CONFIG_BFS_FS is not set
# CONFIG_EFS_FS is not set
# CONFIG_CRAMFS is not set
# CONFIG_SQUASHFS is not set
# CONFIG_VXFS_FS is not set
# CONFIG_MINIX_FS is not set
# CONFIG_OMFS_FS is not set
# CONFIG_HPFS_FS is not set
# CONFIG_QNX4FS_FS is not set
# CONFIG_ROMFS_FS is not set
# CONFIG_SYSV_FS is not set
# CONFIG_UFS_FS is not set
# CONFIG_NILFS2_FS is not set
# CONFIG_NETWORK_FILESYSTEMS is not set

#
# Partition Types
#
CONFIG_PARTITION_ADVANCED=y
# CONFIG_ACORN_PARTITION is not set
CONFIG_OSF_PARTITION=y
# CONFIG_AMIGA_PARTITION is not set
# CONFIG_ATARI_PARTITION is not set
# CONFIG_MAC_PARTITION is not set
CONFIG_MSDOS_PARTITION=y
# CONFIG_BSD_DISKLABEL is not set
# CONFIG_MINIX_SUBPARTITION is not set
# CONFIG_SOLARIS_X86_PARTITION is not set
# CONFIG_UNIXWARE_DISKLABEL is not set
# CONFIG_LDM_PARTITION is not set
# CONFIG_SGI_PARTITION is not set
# CONFIG_ULTRIX_PARTITION is not set
# CONFIG_SUN_PARTITION is not set
# CONFIG_KARMA_PARTITION is not set
# CONFIG_EFI_PARTITION is not set
# CONFIG_SYSV68_PARTITION is not set
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="utf8"
# CONFIG_NLS_CODEPAGE_437 is not set
# CONFIG_NLS_CODEPAGE_737 is not set
# CONFIG_NLS_CODEPAGE_775 is not set
# CONFIG_NLS_CODEPAGE_850 is not set
# CONFIG_NLS_CODEPAGE_852 is not set
# CONFIG_NLS_CODEPAGE_855 is not set
# CONFIG_NLS_CODEPAGE_857 is not set
# CONFIG_NLS_CODEPAGE_860 is not set
# CONFIG_NLS_CODEPAGE_861 is not set
# CONFIG_NLS_CODEPAGE_862 is not set
# CONFIG_NLS_CODEPAGE_863 is not set
# CONFIG_NLS_CODEPAGE_864 is not set
# CONFIG_NLS_CODEPAGE_865 is not set
CONFIG_NLS_CODEPAGE_866=y
# CONFIG_NLS_CODEPAGE_869 is not set
# CONFIG_NLS_CODEPAGE_936 is not set
# CONFIG_NLS_CODEPAGE_950 is not set
# CONFIG_NLS_CODEPAGE_932 is not set
# CONFIG_NLS_CODEPAGE_949 is not set
# CONFIG_NLS_CODEPAGE_874 is not set
# CONFIG_NLS_ISO8859_8 is not set
# CONFIG_NLS_CODEPAGE_1250 is not set
CONFIG_NLS_CODEPAGE_1251=y
# CONFIG_NLS_ASCII is not set
# CONFIG_NLS_ISO8859_1 is not set
# CONFIG_NLS_ISO8859_2 is not set
# CONFIG_NLS_ISO8859_3 is not set
# CONFIG_NLS_ISO8859_4 is not set
CONFIG_NLS_ISO8859_5=y
# CONFIG_NLS_ISO8859_6 is not set
# CONFIG_NLS_ISO8859_7 is not set
# CONFIG_NLS_ISO8859_9 is not set
# CONFIG_NLS_ISO8859_13 is not set
# CONFIG_NLS_ISO8859_14 is not set
# CONFIG_NLS_ISO8859_15 is not set
CONFIG_NLS_KOI8_R=y
CONFIG_NLS_KOI8_U=y
CONFIG_NLS_UTF8=y
# CONFIG_DLM is not set

#
# Kernel hacking
#
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_PRINTK_TIME=y
CONFIG_ALLOW_WARNINGS=y
CONFIG_ENABLE_WARN_DEPRECATED=y
CONFIG_ENABLE_MUST_CHECK=y
CONFIG_FRAME_WARN=2048
CONFIG_MAGIC_SYSRQ=y
# CONFIG_UNUSED_SYMBOLS is not set
CONFIG_DEBUG_FS=y
# CONFIG_HEADERS_CHECK is not set
# CONFIG_DEBUG_SECTION_MISMATCH is not set
CONFIG_DEBUG_KERNEL=y
# CONFIG_DEBUG_SHIRQ is not set
CONFIG_DETECT_SOFTLOCKUP=y
# CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC is not set
CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE=0
CONFIG_DETECT_HUNG_TASK=y
# CONFIG_BOOTPARAM_HUNG_TASK_PANIC is not set
CONFIG_BOOTPARAM_HUNG_TASK_PANIC_VALUE=0
# CONFIG_SCHED_DEBUG is not set
# CONFIG_SCHEDSTATS is not set
# CONFIG_TIMER_STATS is not set
# CONFIG_DEBUG_OBJECTS is not set
# CONFIG_SLUB_DEBUG_ON is not set
# CONFIG_SLUB_STATS is not set
# CONFIG_DEBUG_KMEMLEAK is not set
# CONFIG_DEBUG_RT_MUTEXES is not set
# CONFIG_RT_MUTEX_TESTER is not set
# CONFIG_DEBUG_SPINLOCK is not set
# CONFIG_DEBUG_MUTEXES is not set
# CONFIG_DEBUG_LOCK_ALLOC is not set
# CONFIG_PROVE_LOCKING is not set
# CONFIG_LOCK_STAT is not set
# CONFIG_DEBUG_SPINLOCK_SLEEP is not set
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
# CONFIG_DEBUG_KOBJECT is not set
CONFIG_DEBUG_BUGVERBOSE=y
CONFIG_DEBUG_INFO=y
# CONFIG_DEBUG_VM is not set
# CONFIG_DEBUG_VIRTUAL is not set
# CONFIG_DEBUG_WRITECOUNT is not set
CONFIG_DEBUG_MEMORY_INIT=y
# CONFIG_DEBUG_LIST is not set
# CONFIG_DEBUG_SG is not set
# CONFIG_DEBUG_NOTIFIERS is not set
CONFIG_ARCH_WANT_FRAME_POINTERS=y
# CONFIG_FRAME_POINTER is not set
# CONFIG_BOOT_PRINTK_DELAY is not set
# CONFIG_RCU_TORTURE_TEST is not set
# CONFIG_RCU_CPU_STALL_DETECTOR is not set
# CONFIG_BACKTRACE_SELF_TEST is not set
# CONFIG_DEBUG_BLOCK_EXT_DEVT is not set
# CONFIG_FAULT_INJECTION is not set
# CONFIG_LATENCYTOP is not set
CONFIG_SYSCTL_SYSCALL_CHECK=y
# CONFIG_DEBUG_PAGEALLOC is not set
CONFIG_USER_STACKTRACE_SUPPORT=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST=y
CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
CONFIG_TRACING_SUPPORT=y
CONFIG_FTRACE=y
# CONFIG_FUNCTION_TRACER is not set
# CONFIG_IRQSOFF_TRACER is not set
# CONFIG_SYSPROF_TRACER is not set
# CONFIG_SCHED_TRACER is not set
# CONFIG_ENABLE_DEFAULT_TRACERS is not set
# CONFIG_FTRACE_SYSCALLS is not set
# CONFIG_BOOT_TRACER is not set
CONFIG_BRANCH_PROFILE_NONE=y
# CONFIG_PROFILE_ANNOTATED_BRANCHES is not set
# CONFIG_PROFILE_ALL_BRANCHES is not set
# CONFIG_POWER_TRACER is not set
# CONFIG_KSYM_TRACER is not set
# CONFIG_STACK_TRACER is not set
# CONFIG_KMEMTRACE is not set
# CONFIG_WORKQUEUE_TRACER is not set
# CONFIG_BLK_DEV_IO_TRACE is not set
# CONFIG_MMIOTRACE is not set
# CONFIG_PROVIDE_OHCI1394_DMA_INIT is not set
# CONFIG_DYNAMIC_DEBUG is not set
# CONFIG_DMA_API_DEBUG is not set
# CONFIG_SAMPLES is not set
CONFIG_HAVE_ARCH_KGDB=y
# CONFIG_KGDB is not set
CONFIG_HAVE_ARCH_KMEMCHECK=y
# CONFIG_KMEMCHECK is not set
# CONFIG_STRICT_DEVMEM is not set
CONFIG_X86_VERBOSE_BOOTUP=y
CONFIG_EARLY_PRINTK=y
# CONFIG_EARLY_PRINTK_DBGP is not set
# CONFIG_DEBUG_STACKOVERFLOW is not set
# CONFIG_DEBUG_STACK_USAGE is not set
# CONFIG_DEBUG_PER_CPU_MAPS is not set
# CONFIG_X86_PTDUMP is not set
# CONFIG_DEBUG_RODATA is not set
# CONFIG_DEBUG_NX_TEST is not set
# CONFIG_IOMMU_DEBUG is not set
# CONFIG_IOMMU_STRESS is not set
CONFIG_HAVE_MMIOTRACE_SUPPORT=y
CONFIG_IO_DELAY_TYPE_0X80=0
CONFIG_IO_DELAY_TYPE_0XED=1
CONFIG_IO_DELAY_TYPE_UDELAY=2
CONFIG_IO_DELAY_TYPE_NONE=3
CONFIG_IO_DELAY_0X80=y
# CONFIG_IO_DELAY_0XED is not set
# CONFIG_IO_DELAY_UDELAY is not set
# CONFIG_IO_DELAY_NONE is not set
CONFIG_DEFAULT_IO_DELAY_TYPE=0
# CONFIG_DEBUG_BOOT_PARAMS is not set
# CONFIG_CPA_DEBUG is not set
# CONFIG_OPTIMIZE_INLINING is not set

#
# Security options
#
# CONFIG_KEYS is not set
# CONFIG_SECURITY is not set
# CONFIG_SECURITYFS is not set
# CONFIG_SECURITY_FILE_CAPABILITIES is not set
# CONFIG_INTEL_TXT is not set
# CONFIG_IMA is not set
CONFIG_CRYPTO=y

#
# Crypto core or helper
#
# CONFIG_CRYPTO_FIPS is not set
CONFIG_CRYPTO_ALGAPI=y
CONFIG_CRYPTO_ALGAPI2=y
CONFIG_CRYPTO_AEAD2=y
CONFIG_CRYPTO_BLKCIPHER=y
CONFIG_CRYPTO_BLKCIPHER2=y
CONFIG_CRYPTO_HASH=y
CONFIG_CRYPTO_HASH2=y
CONFIG_CRYPTO_RNG2=y
CONFIG_CRYPTO_PCOMP=y
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_MANAGER2=y
CONFIG_CRYPTO_GF128MUL=y
# CONFIG_CRYPTO_NULL is not set
CONFIG_CRYPTO_WORKQUEUE=y
CONFIG_CRYPTO_CRYPTD=y
# CONFIG_CRYPTO_AUTHENC is not set
# CONFIG_CRYPTO_TEST is not set

#
# Authenticated Encryption with Associated Data
#
# CONFIG_CRYPTO_CCM is not set
# CONFIG_CRYPTO_GCM is not set
# CONFIG_CRYPTO_SEQIV is not set

#
# Block modes
#
CONFIG_CRYPTO_CBC=y
# CONFIG_CRYPTO_CTR is not set
# CONFIG_CRYPTO_CTS is not set
CONFIG_CRYPTO_ECB=y
CONFIG_CRYPTO_LRW=y
# CONFIG_CRYPTO_PCBC is not set
CONFIG_CRYPTO_XTS=y

#
# Hash modes
#
CONFIG_CRYPTO_HMAC=m
# CONFIG_CRYPTO_XCBC is not set

#
# Digest
#
CONFIG_CRYPTO_CRC32C=y
# CONFIG_CRYPTO_CRC32C_INTEL is not set
# CONFIG_CRYPTO_MD4 is not set
CONFIG_CRYPTO_MD5=y
CONFIG_CRYPTO_MICHAEL_MIC=y
# CONFIG_CRYPTO_RMD128 is not set
# CONFIG_CRYPTO_RMD160 is not set
# CONFIG_CRYPTO_RMD256 is not set
# CONFIG_CRYPTO_RMD320 is not set
CONFIG_CRYPTO_SHA1=y
# CONFIG_CRYPTO_SHA256 is not set
# CONFIG_CRYPTO_SHA512 is not set
# CONFIG_CRYPTO_TGR192 is not set
CONFIG_CRYPTO_WP512=y

#
# Ciphers
#
CONFIG_CRYPTO_AES=y
CONFIG_CRYPTO_AES_X86_64=y
# CONFIG_CRYPTO_AES_NI_INTEL is not set
# CONFIG_CRYPTO_ANUBIS is not set
CONFIG_CRYPTO_ARC4=y
# CONFIG_CRYPTO_BLOWFISH is not set
# CONFIG_CRYPTO_CAMELLIA is not set
# CONFIG_CRYPTO_CAST5 is not set
# CONFIG_CRYPTO_CAST6 is not set
CONFIG_CRYPTO_DES=y
# CONFIG_CRYPTO_FCRYPT is not set
# CONFIG_CRYPTO_KHAZAD is not set
# CONFIG_CRYPTO_SALSA20 is not set
# CONFIG_CRYPTO_SALSA20_X86_64 is not set
# CONFIG_CRYPTO_SEED is not set
# CONFIG_CRYPTO_SERPENT is not set
# CONFIG_CRYPTO_TEA is not set
CONFIG_CRYPTO_TWOFISH=y
CONFIG_CRYPTO_TWOFISH_COMMON=y
CONFIG_CRYPTO_TWOFISH_X86_64=y

#
# Compression
#
# CONFIG_CRYPTO_DEFLATE is not set
# CONFIG_CRYPTO_ZLIB is not set
# CONFIG_CRYPTO_LZO is not set

#
# Random Number Generation
#
# CONFIG_CRYPTO_ANSI_CPRNG is not set
# CONFIG_CRYPTO_HW is not set
CONFIG_HAVE_KVM=y
CONFIG_HAVE_KVM_IRQCHIP=y
CONFIG_VIRTUALIZATION=y
CONFIG_KVM=y
# CONFIG_KVM_INTEL is not set
CONFIG_KVM_AMD=y
# CONFIG_KVM_TRACE is not set
CONFIG_VIRTIO=y
CONFIG_VIRTIO_RING=y
CONFIG_VIRTIO_PCI=y
CONFIG_VIRTIO_BALLOON=y
# CONFIG_BINARY_PRINTF is not set

#
# Library routines
#
CONFIG_BITREVERSE=y
CONFIG_GENERIC_FIND_FIRST_BIT=y
CONFIG_GENERIC_FIND_NEXT_BIT=y
CONFIG_GENERIC_FIND_LAST_BIT=y
CONFIG_CRC_CCITT=y
CONFIG_CRC16=y
# CONFIG_CRC_T10DIF is not set
CONFIG_CRC_ITU_T=y
CONFIG_CRC32=y
# CONFIG_CRC7 is not set
CONFIG_LIBCRC32C=y
CONFIG_ZLIB_INFLATE=y
CONFIG_ZLIB_DEFLATE=y
CONFIG_HAS_IOMEM=y
CONFIG_HAS_IOPORT=y
CONFIG_HAS_DMA=y
CONFIG_NLATTR=y

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-09 11:55             ` Frans Pop
@ 2009-09-11  1:36               ` Benjamin Herrenschmidt
  2009-09-16 18:27                 ` Frans Pop
  0 siblings, 1 reply; 216+ messages in thread
From: Benjamin Herrenschmidt @ 2009-09-11  1:36 UTC (permalink / raw)
  To: Frans Pop; +Cc: Arjan van de Ven, realnc, linux-kernel


> I'd say add an extra horizontal split in the second column, so you'd get 
> three areas in the right column:
> - top for the global target (permanently)
> - middle for current, either:
>   - "current most lagging" if "Global" is selected in left column
>   - selected process if a specific target is selected in left column
> - bottom for backtrace
> 
> Maybe with that setup "Global" in the left column should be renamed to 
> something like "Dynamic".
> 
> The backtrace area would show selection from either top or middle areas 
> (so selecting a cause in top or middle area should unselect causes in the 
> other).

I'll have a look after the merge window madness. Multiple windows are
also still an option, I suppose, even if I don't like them that much: we
could support double-clicking on an app or "global" in the left list,
making that pop up a new window with the same content as the right pane
for that app (or global), updating at the same time as the rest.

Somebody ping me if I seem to have forgotten about it in 2 weeks :-)

Ben.




^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: Epic regression in throughput since v2.6.23
  2009-09-10 23:23           ` Serge Belyshev
@ 2009-09-11  6:10             ` Ingo Molnar
  2009-09-11  8:55               ` Serge Belyshev
  2009-09-13 15:27               ` Serge Belyshev
  0 siblings, 2 replies; 216+ messages in thread
From: Ingo Molnar @ 2009-09-11  6:10 UTC (permalink / raw)
  To: Serge Belyshev; +Cc: Con Kolivas, linux-kernel, Peter Zijlstra, Mike Galbraith


* Serge Belyshev <belyshev@depni.sinp.msu.ru> wrote:

> Ingo Molnar <mingo@elte.hu> writes:
> 
> >    perf stat --repeat 3 make -j4 bzImage
> 
> BFS hangs here:
> 
> [  128.859000] BUG: soft lockup - CPU#2 stuck for 61s! [sh:7946]
> [  128.859016] Modules linked in:
> [  128.859016] CPU 2:
> [  128.859016] Modules linked in:
> [  128.859016] Pid: 7946, comm: sh Not tainted 2.6.31-bfs211-dirty #4 GA-MA790FX-DQ6
> [  128.859016] RIP: 0010:[<ffffffff81055a52>]  [<ffffffff81055a52>] task_oncpu_function_call+0x22/0x40
> [  128.859016] RSP: 0018:ffff880205967e18  EFLAGS: 00000246
> [  128.859016] RAX: 0000000000000002 RBX: ffff880205964cc0 RCX: 000000000000dd00
> [  128.859016] RDX: ffff880211138c00 RSI: ffffffff8108d3f0 RDI: ffff88022e42a100
> [  128.859016] RBP: ffffffff8102d76e R08: ffff880028066000 R09: 0000000000000000
> [  128.859016] R10: 0000000000000000 R11: 0000000000000058 R12: ffffffff8108d3f0
> [  128.859016] R13: ffff880211138c00 R14: 0000000000000001 R15: 000000000000e260
> [  128.859016] FS:  00002b9ba0924e00(0000) GS:ffff880028066000(0000) knlGS:0000000000000000
> [  128.859016] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [  128.859016] CR2: 00002b9ba091e4a8 CR3: 0000000001001000 CR4: 00000000000006e0
> [  128.859016] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  128.859016] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [  128.859016] Call Trace:
> [  128.859016]  [<ffffffff8108ee3b>] ? perf_counter_remove_from_context+0x3b/0x90
> [  128.859016]  [<ffffffff810904b4>] ? perf_counter_exit_task+0x114/0x340
> [  128.859016]  [<ffffffff810c3f66>] ? filp_close+0x56/0x90
> [  128.859016]  [<ffffffff8105d3ac>] ? do_exit+0x14c/0x6f0
> [  128.859016]  [<ffffffff8105d991>] ? do_group_exit+0x41/0xb0
> [  128.859016]  [<ffffffff8105da12>] ? sys_exit_group+0x12/0x20
> [  128.859016]  [<ffffffff8102cceb>] ? system_call_fastpath+0x16/0x1b
> 
> So, got nothing to compare with.

Could still compare -j5 to -j4 on -tip, to see why -j4 is 3% short 
of -j5's throughput.

(Plus maybe the NEW_FAIR_SLEEPERS change in -tip fixes the 3% drop.)

	Ingo

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: [crash, bisected] Re: clocksource: Resolve cpu hotplug dead lock with TSC unstable
  2009-09-10 18:00                                       ` [crash, bisected] Re: clocksource: Resolve cpu hotplug dead lock with TSC unstable Ingo Molnar
@ 2009-09-11  7:37                                         ` Ingo Molnar
  2009-09-11  7:48                                           ` Martin Schwidefsky
  2009-09-11 13:33                                           ` Martin Schwidefsky
  0 siblings, 2 replies; 216+ messages in thread
From: Ingo Molnar @ 2009-09-11  7:37 UTC (permalink / raw)
  To: Jens Axboe, Martin Schwidefsky, John Stultz
  Cc: Peter Zijlstra, Mike Galbraith, Con Kolivas, linux-kernel


* Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Ingo Molnar <mingo@elte.hu> wrote:
> 
> > 
> > * Jens Axboe <jens.axboe@oracle.com> wrote:
> > 
> > > I went to try -tip btw, but it crashes on boot. Here's the 
> > > backtrace, typed manually, it's crashing in 
> > > queue_work_on+0x28/0x60.
> > > 
> > > Call Trace:
> > >         queue_work
> > >         schedule_work
> > >         clocksource_mark_unstable
> > >         mark_tsc_unstable
> > >         check_tsc_sync_source
> > >         native_cpu_up
> > >         relay_hotcpu_callback
> > >         do_fork_idle
> > >         _cpu_up
> > >         cpu_up
> > >         kernel_init
> > >         kernel_thread_helper
> > 
> > hm, that looks like an old bug i fixed days ago via:
> > 
> >   00a3273: Revert "x86: Make tsc=reliable override boot time stability checks"
> > 
> > Have you tested tip:master - do you still know which sha1?
> 
> Ok, i reproduced it on a testbox and bisected it, the crash is 
> caused by:
> 
>  7285dd7fd375763bfb8ab1ac9cf3f1206f503c16 is first bad commit
>  commit 7285dd7fd375763bfb8ab1ac9cf3f1206f503c16
>  Author: Thomas Gleixner <tglx@linutronix.de>
>  Date:   Fri Aug 28 20:25:24 2009 +0200
> 
>     clocksource: Resolve cpu hotplug dead lock with TSC unstable
>     
>     Martin Schwidefsky analyzed it:
> 
> I've reverted it in tip/master for now.

and that uncovers the circular locking bug that this commit was 
supposed to fix ...

Martin?

	Ingo

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: [crash, bisected] Re: clocksource: Resolve cpu hotplug dead lock with TSC unstable
  2009-09-11  7:37                                         ` Ingo Molnar
@ 2009-09-11  7:48                                           ` Martin Schwidefsky
  2009-09-11 13:33                                           ` Martin Schwidefsky
  1 sibling, 0 replies; 216+ messages in thread
From: Martin Schwidefsky @ 2009-09-11  7:48 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jens Axboe, John Stultz, Peter Zijlstra, Mike Galbraith,
	Con Kolivas, linux-kernel

On Fri, 11 Sep 2009 09:37:47 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Ingo Molnar <mingo@elte.hu> wrote:
> 
> > 
> > * Ingo Molnar <mingo@elte.hu> wrote:
> > 
> > > 
> > > * Jens Axboe <jens.axboe@oracle.com> wrote:
> > > 
> > > > I went to try -tip btw, but it crashes on boot. Here's the 
> > > > backtrace, typed manually, it's crashing in 
> > > > queue_work_on+0x28/0x60.
> > > > 
> > > > Call Trace:
> > > >         queue_work
> > > >         schedule_work
> > > >         clocksource_mark_unstable
> > > >         mark_tsc_unstable
> > > >         check_tsc_sync_source
> > > >         native_cpu_up
> > > >         relay_hotcpu_callback
> > > >         do_fork_idle
> > > >         _cpu_up
> > > >         cpu_up
> > > >         kernel_init
> > > >         kernel_thread_helper
> > > 
> > > hm, that looks like an old bug i fixed days ago via:
> > > 
> > >   00a3273: Revert "x86: Make tsc=reliable override boot time stability checks"
> > > 
> > > Have you tested tip:master - do you still know which sha1?
> > 
> > Ok, i reproduced it on a testbox and bisected it, the crash is 
> > caused by:
> > 
> >  7285dd7fd375763bfb8ab1ac9cf3f1206f503c16 is first bad commit
> >  commit 7285dd7fd375763bfb8ab1ac9cf3f1206f503c16
> >  Author: Thomas Gleixner <tglx@linutronix.de>
> >  Date:   Fri Aug 28 20:25:24 2009 +0200
> > 
> >     clocksource: Resolve cpu hotplug dead lock with TSC unstable
> >     
> >     Martin Schwidefsky analyzed it:
> > 
> > I've reverted it in tip/master for now.
> 
> and that uncovers the circular locking bug that this commit was 
> supposed to fix ...
> 
> Martin?

Damn, back to running around in circles ..

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: Epic regression in throughput since v2.6.23
  2009-09-11  6:10             ` Ingo Molnar
@ 2009-09-11  8:55               ` Serge Belyshev
  2009-09-13 15:27               ` Serge Belyshev
  1 sibling, 0 replies; 216+ messages in thread
From: Serge Belyshev @ 2009-09-11  8:55 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Con Kolivas, linux-kernel, Peter Zijlstra, Mike Galbraith

Ingo Molnar <mingo@elte.hu> writes:

> Could still compare -j5 to -j4 on -tip, to see why -j4 is 3% short 
> of -j5's throughput.
>
> (Plus maybe the NEW_FAIR_SLEEPERS change in -tip fixes the 3% drop.)

Will do in about 12 hours or so.

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-10 21:19                                 ` Martin Steigerwald
@ 2009-09-11  9:26                                   ` Mat
  2009-09-12 11:26                                     ` Martin Steigerwald
  0 siblings, 1 reply; 216+ messages in thread
From: Mat @ 2009-09-11  9:26 UTC (permalink / raw)
  To: linux-kernel

Martin Steigerwald <Martin <at> lichtvoll.de> writes:

> 
> On Thursday 10 September 2009, Ingo Molnar wrote:

[snip]

> > what is /debug/sched_features - is NO_NEW_FAIR_SLEEPERS set? If not
> > set yet then try it:
> > 
> >   echo NO_NEW_FAIR_SLEEPERS > /debug/sched_features
> > 
> > that too might make things more fluid.


Hi Martin,

it made a tremendous difference, which still has to be tested out :)

Hi Ingo,

what adverse effect could

cat /proc/sys/kernel/sched_wakeup_granularity_ns 
0

have on the throughput?


Concerning that "NO_NEW_FAIR_SLEEPERS" switch - isn't it as easy as doing
the following? (I'm not sure if there's supposed to be another debugfs
mount point)

echo NO_NEW_FAIR_SLEEPERS > /sys/kernel/debug/sched_features

which after the change says:

cat /sys/kernel/debug/sched_features 
NO_NEW_FAIR_SLEEPERS NO_NORMALIZED_SLEEPER ADAPTIVE_GRAN WAKEUP_PREEMPT
START_DEBIT AFFINE_WAKEUPS CACHE_HOT_BUDDY SYNC_WAKEUPS NO_HRTICK NO_DOUBLE_TICK
ASYM_GRAN LB_BIAS LB_WAKEUP_UPDATE ASYM_EFF_LOAD NO_WAKEUP_OVERLAP LAST_BUDDY
OWNER_SPIN

I hope that's the correct switch ^^

Greetings, and please keep on improving the scheduler (especially with regard
to the desktop crowd)

Regards

Mat



^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: [crash, bisected] Re: clocksource: Resolve cpu hotplug dead lock with TSC unstable
  2009-09-11  7:37                                         ` Ingo Molnar
  2009-09-11  7:48                                           ` Martin Schwidefsky
@ 2009-09-11 13:33                                           ` Martin Schwidefsky
  2009-09-11 18:22                                             ` [tip:timers/core] clocksource: Resolve cpu hotplug dead lock with TSC unstable, fix crash tip-bot for Martin Schwidefsky
  2009-09-14 15:19                                             ` [crash, bisected] Re: clocksource: Resolve cpu hotplug dead lock with TSC unstable Ingo Molnar
  1 sibling, 2 replies; 216+ messages in thread
From: Martin Schwidefsky @ 2009-09-11 13:33 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jens Axboe, John Stultz, Peter Zijlstra, Mike Galbraith,
	Con Kolivas, linux-kernel

On Fri, 11 Sep 2009 09:37:47 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Ingo Molnar <mingo@elte.hu> wrote:
> 
> > 
> > * Ingo Molnar <mingo@elte.hu> wrote:
> > 
> > > 
> > > * Jens Axboe <jens.axboe@oracle.com> wrote:
> > > 
> > > > I went to try -tip btw, but it crashes on boot. Here's the 
> > > > backtrace, typed manually, it's crashing in 
> > > > queue_work_on+0x28/0x60.
> > > > 
> > > > Call Trace:
> > > >         queue_work
> > > >         schedule_work
> > > >         clocksource_mark_unstable
> > > >         mark_tsc_unstable
> > > >         check_tsc_sync_source
> > > >         native_cpu_up
> > > >         relay_hotcpu_callback
> > > >         do_fork_idle
> > > >         _cpu_up
> > > >         cpu_up
> > > >         kernel_init
> > > >         kernel_thread_helper
> > > 
> > > hm, that looks like an old bug i fixed days ago via:
> > > 
> > >   00a3273: Revert "x86: Make tsc=reliable override boot time stability checks"
> > > 
> > > Have you tested tip:master - do you still know which sha1?
> > 
> > Ok, i reproduced it on a testbox and bisected it, the crash is 
> > caused by:
> > 
> >  7285dd7fd375763bfb8ab1ac9cf3f1206f503c16 is first bad commit
> >  commit 7285dd7fd375763bfb8ab1ac9cf3f1206f503c16
> >  Author: Thomas Gleixner <tglx@linutronix.de>
> >  Date:   Fri Aug 28 20:25:24 2009 +0200
> > 
> >     clocksource: Resolve cpu hotplug dead lock with TSC unstable
> >     
> >     Martin Schwidefsky analyzed it:
> > 
> > I've reverted it in tip/master for now.
> 
> and that uncovers the circular locking bug that this commit was 
> supposed to fix ...
> 
> Martin?

This patch should fix the obvious problem that the watchdog_work
structure is not yet initialized if the clocksource watchdog is not
running yet.
--
Subject: [PATCH] clocksource: statically initialize watchdog workqueue

From: Martin Schwidefsky <schwidefsky@de.ibm.com>

The watchdog timer is started after the watchdog clocksource and at least
one watched clocksource have been registered. The clocksource work element
watchdog_work is initialized just before the clocksource timer is started.
This is too late for the clocksource_mark_unstable call from native_cpu_up.
To fix this use a static initializer for watchdog_work.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---
 kernel/time/clocksource.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Index: linux-2.6/kernel/time/clocksource.c
===================================================================
--- linux-2.6.orig/kernel/time/clocksource.c
+++ linux-2.6/kernel/time/clocksource.c
@@ -123,10 +123,12 @@ static DEFINE_MUTEX(clocksource_mutex);
 static char override_name[32];
 
 #ifdef CONFIG_CLOCKSOURCE_WATCHDOG
+static void clocksource_watchdog_work(struct work_struct *work);
+
 static LIST_HEAD(watchdog_list);
 static struct clocksource *watchdog;
 static struct timer_list watchdog_timer;
-static struct work_struct watchdog_work;
+static DECLARE_WORK(watchdog_work, clocksource_watchdog_work);
 static DEFINE_SPINLOCK(watchdog_lock);
 static cycle_t watchdog_last;
 static int watchdog_running;
@@ -230,7 +232,6 @@ static inline void clocksource_start_wat
 {
 	if (watchdog_running || !watchdog || list_empty(&watchdog_list))
 		return;
-	INIT_WORK(&watchdog_work, clocksource_watchdog_work);
 	init_timer(&watchdog_timer);
 	watchdog_timer.function = clocksource_watchdog;
 	watchdog_last = watchdog->read(watchdog);

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.


^ permalink raw reply	[flat|nested] 216+ messages in thread

* [tip:timers/core] clocksource: Resolve cpu hotplug dead lock with TSC unstable, fix crash
  2009-09-11 13:33                                           ` Martin Schwidefsky
@ 2009-09-11 18:22                                             ` tip-bot for Martin Schwidefsky
  2009-09-14 15:19                                             ` [crash, bisected] Re: clocksource: Resolve cpu hotplug dead lock with TSC unstable Ingo Molnar
  1 sibling, 0 replies; 216+ messages in thread
From: tip-bot for Martin Schwidefsky @ 2009-09-11 18:22 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, johnstul, schwidefsky, jens.axboe, tglx, mingo

Commit-ID:  f79e0258ea1f04d63db499479b5fb855dff6dbc5
Gitweb:     http://git.kernel.org/tip/f79e0258ea1f04d63db499479b5fb855dff6dbc5
Author:     Martin Schwidefsky <schwidefsky@de.ibm.com>
AuthorDate: Fri, 11 Sep 2009 15:33:05 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 11 Sep 2009 20:17:18 +0200

clocksource: Resolve cpu hotplug dead lock with TSC unstable, fix crash

The watchdog timer is started after the watchdog clocksource
and at least one watched clocksource have been registered. The
clocksource work element watchdog_work is initialized just
before the clocksource timer is started. This is too late for
the clocksource_mark_unstable call from native_cpu_up. To fix
this use a static initializer for watchdog_work.

This resolves a boot crash reported by multiple people.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <20090911153305.3fe9a361@skybase>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


---
 kernel/time/clocksource.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index a0af4ff..5697155 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -123,10 +123,12 @@ static DEFINE_MUTEX(clocksource_mutex);
 static char override_name[32];
 
 #ifdef CONFIG_CLOCKSOURCE_WATCHDOG
+static void clocksource_watchdog_work(struct work_struct *work);
+
 static LIST_HEAD(watchdog_list);
 static struct clocksource *watchdog;
 static struct timer_list watchdog_timer;
-static struct work_struct watchdog_work;
+static DECLARE_WORK(watchdog_work, clocksource_watchdog_work);
 static DEFINE_SPINLOCK(watchdog_lock);
 static cycle_t watchdog_last;
 static int watchdog_running;
@@ -257,7 +259,6 @@ static inline void clocksource_start_watchdog(void)
 {
 	if (watchdog_running || !watchdog || list_empty(&watchdog_list))
 		return;
-	INIT_WORK(&watchdog_work, clocksource_watchdog_work);
 	init_timer(&watchdog_timer);
 	watchdog_timer.function = clocksource_watchdog;
 	watchdog_last = watchdog->read(watchdog);

^ permalink raw reply related	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-11  9:26                                   ` Mat
@ 2009-09-12 11:26                                     ` Martin Steigerwald
  0 siblings, 0 replies; 216+ messages in thread
From: Martin Steigerwald @ 2009-09-12 11:26 UTC (permalink / raw)
  To: linux-kernel, Ingo Molnar, Jens Axboe, Mike Galbraith,
	Peter Zijlstra, Con Kolivas
  Cc: Mat

[-- Attachment #1: Type: Text/Plain, Size: 1706 bytes --]

On Friday 11 September 2009, Mat wrote:
> Martin Steigerwald <Martin <at> lichtvoll.de> writes:
> > On Thursday 10 September 2009, Ingo Molnar wrote:
> 
> [snip]
> 
> > > what is /debug/sched_features - is NO_NEW_FAIR_SLEEPERS set? If not
> > > set yet then try it:
> > >
> > >   echo NO_NEW_FAIR_SLEEPERS > /debug/sched_features
> > >
> > > that too might make things more fluid.
> 
> Hi Martin,

Hi Mat,

> it made a tremendous difference, which still has to be tested out :)

[...]

> Concerning that "NO_NEW_FAIR_SLEEPERS" switch - isn't it as easy as
> 
> doing the following? (I'm not sure if there's supposed to be another
>  debug)
> 
> echo NO_NEW_FAIR_SLEEPERS > /sys/kernel/debug/sched_features
> 
> which after the change says:
> 
> cat /sys/kernel/debug/sched_features
> NO_NEW_FAIR_SLEEPERS NO_NORMALIZED_SLEEPER ADAPTIVE_GRAN WAKEUP_PREEMPT
> START_DEBIT AFFINE_WAKEUPS CACHE_HOT_BUDDY SYNC_WAKEUPS NO_HRTICK
>  NO_DOUBLE_TICK ASYM_GRAN LB_BIAS LB_WAKEUP_UPDATE ASYM_EFF_LOAD
>  NO_WAKEUP_OVERLAP LAST_BUDDY OWNER_SPIN
> 
> I hope that's the correct switch ^^

Thanks. Appears to work here nicely ;-). I thought this might be a debug 
fs that I need to mount separately, but it's already there. I will see 
how it works out.

I wondered whether it might be a good idea to have a

echo default >  /sys/kernel/kernel-tuning-knob

that will reset it to the compiled-in factory defaults. Would be a nice 
way to go back to safe settings again once you got carried away too far 
with trying those tuning knobs.
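
Until something like that exists, a rough userspace workaround is to snapshot
the values right after boot and write them back later. A minimal sketch (the
snapshot file name is only an example):

  cd /proc/sys/kernel
  grep "" *sched* > /root/sched-defaults     # snapshot the boot-time values

  # later: write the snapshot back
  while IFS=: read -r knob value; do
          echo "$value" > "/proc/sys/kernel/$knob"
  done < /root/sched-defaults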

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: [tip:sched/core] sched: Re-tune the scheduler latency defaults to decrease worst-case latencies
  2009-09-09 15:37                     ` [tip:sched/core] sched: Re-tune the scheduler latency defaults to decrease worst-case latencies tip-bot for Mike Galbraith
@ 2009-09-12 11:45                       ` Martin Steigerwald
  0 siblings, 0 replies; 216+ messages in thread
From: Martin Steigerwald @ 2009-09-12 11:45 UTC (permalink / raw)
  To: linux-kernel, mingo, hpa, a.p.zijlstra, efault, tglx, mingo
  Cc: linux-tip-commits

[-- Attachment #1: Type: Text/Plain, Size: 4623 bytes --]

On Wednesday 09 September 2009, tip-bot for Mike Galbraith wrote:
> Commit-ID:  172e082a9111ea504ee34cbba26284a5ebdc53a7
> Gitweb:    
>  http://git.kernel.org/tip/172e082a9111ea504ee34cbba26284a5ebdc53a7
>  Author:     Mike Galbraith <efault@gmx.de>
> AuthorDate: Wed, 9 Sep 2009 15:41:37 +0200
> Committer:  Ingo Molnar <mingo@elte.hu>
> CommitDate: Wed, 9 Sep 2009 17:30:06 +0200
> 
> sched: Re-tune the scheduler latency defaults to decrease worst-case
>  latencies
> 
> Reduce the latency target from 20 msecs to 5 msecs.
> 
> Why? Larger latencies increase spread, which is good for scaling,
> but bad for worst case latency.
> 
> We still have the ilog(nr_cpus) rule to scale up on bigger
> server boxes.
> 
> Signed-off-by: Mike Galbraith <efault@gmx.de>
> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> LKML-Reference: <1252486344.28645.18.camel@marge.simson.net>
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> 
> 
> ---
>  kernel/sched_fair.c |   12 ++++++------
>  1 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index af325a3..26fadb4 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -24,7 +24,7 @@
> 
>  /*
>   * Targeted preemption latency for CPU-bound tasks:
> - * (default: 20ms * (1 + ilog(ncpus)), units: nanoseconds)
> + * (default: 5ms * (1 + ilog(ncpus)), units: nanoseconds)
>   *
>   * NOTE: this latency value is not the same as the concept of
>   * 'timeslice length' - timeslices in CFS are of variable length
> @@ -34,13 +34,13 @@
>   * (to see the precise effective timeslice length of your workload,
>   *  run vmstat and monitor the context-switches (cs) field)
>   */
> -unsigned int sysctl_sched_latency = 20000000ULL;
> +unsigned int sysctl_sched_latency = 5000000ULL;
> 
>  /*
>   * Minimal preemption granularity for CPU-bound tasks:
> - * (default: 4 msec * (1 + ilog(ncpus)), units: nanoseconds)
> + * (default: 1 msec * (1 + ilog(ncpus)), units: nanoseconds)
>   */
> -unsigned int sysctl_sched_min_granularity = 4000000ULL;
> +unsigned int sysctl_sched_min_granularity = 1000000ULL;

Needs to be lower for a fluid desktop experience here:

shambhala:/proc/sys/kernel> cat sched_min_granularity_ns
100000

> 
>  /*
>   * is kept at sysctl_sched_latency / sysctl_sched_min_granularity
> @@ -63,13 +63,13 @@ unsigned int __read_mostly
>  sysctl_sched_compat_yield;
> 
>  /*
>   * SCHED_OTHER wake-up granularity.
> - * (default: 5 msec * (1 + ilog(ncpus)), units: nanoseconds)
> + * (default: 1 msec * (1 + ilog(ncpus)), units: nanoseconds)
>   *
>   * This option delays the preemption effects of decoupled workloads
>   * and reduces their over-scheduling. Synchronous workloads will still
>   * have immediate wakeup/sleep latencies.
>   */
> -unsigned int sysctl_sched_wakeup_granularity = 5000000UL;
> +unsigned int sysctl_sched_wakeup_granularity = 1000000UL;

Ditto:

shambhala:/proc/sys/kernel> cat sched_wakeup_granularity_ns
100000

With

shambhala:~> cat /proc/version
Linux version 2.6.31-rc7-tp42-toi-3.0.1-04741-g57e61c0 (martin@shambhala) 
(gcc version 4.3.3 (Debian 4.3.3-10) ) #6 PREEMPT Sun Aug 23 10:51:32 CEST 
2009

on my ThinkPad T42.

Otherwise compositing animations like switching desktops and zooming in 
newly opening windows still appear jerky. Even with:

shambhala:/sys/kernel/debug> cat sched_features
NO_NEW_FAIR_SLEEPERS NO_NORMALIZED_SLEEPER ADAPTIVE_GRAN WAKEUP_PREEMPT 
START_DEBIT AFFINE_WAKEUPS CACHE_HOT_BUDDY SYNC_WAKEUPS NO_HRTICK 
NO_DOUBLE_TICK ASYM_GRAN LB_BIAS LB_WAKEUP_UPDATE ASYM_EFF_LOAD 
NO_WAKEUP_OVERLAP LAST_BUDDY OWNER_SPIN

But NO_NEW_FAIR_SLEEPERS also gives a benefit. It makes those animations 
even more fluid.

All in all I am quite happy with

shambhala:/proc/sys/kernel> grep "" *sched*
sched_child_runs_first:0
sched_compat_yield:0
sched_features:113916
sched_latency_ns:5000000
sched_migration_cost:500000
sched_min_granularity_ns:100000
sched_nr_migrate:32
sched_rt_period_us:1000000
sched_rt_runtime_us:950000
sched_shares_ratelimit:250000
sched_shares_thresh:4
sched_wakeup_granularity_ns:100000

for now.

It really makes a *lot* of difference. But it appears that both 
sched_min_granularity_ns and sched_wakeup_granularity_ns have to be lower 
on my ThinkPad for best effect.

I would still prefer some autotuning, where I say "desktop!" or nothing at 
all. And that's it.

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: [tip:sched/core] sched: Keep kthreads at default priority
  2009-09-09 17:34                           ` Mike Galbraith
@ 2009-09-12 11:48                             ` Martin Steigerwald
  2009-09-12 12:19                               ` Mike Galbraith
  0 siblings, 1 reply; 216+ messages in thread
From: Martin Steigerwald @ 2009-09-12 11:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mike Galbraith, Peter Zijlstra, Dmitry Torokhov, mingo, hpa,
	tglx, mingo, linux-tip-commits

[-- Attachment #1: Type: Text/Plain, Size: 1251 bytes --]

On Wednesday 09 September 2009, Mike Galbraith wrote:
> On Wed, 2009-09-09 at 19:06 +0200, Peter Zijlstra wrote:
> > On Wed, 2009-09-09 at 09:55 -0700, Dmitry Torokhov wrote:
> > > On Wed, Sep 09, 2009 at 03:37:34PM +0000, tip-bot for Mike Galbraith 
wrote:
> > > > diff --git a/kernel/kthread.c b/kernel/kthread.c
> > > > index eb8751a..5fe7099 100644
> > > > --- a/kernel/kthread.c
> > > > +++ b/kernel/kthread.c
> > > > @@ -16,8 +16,6 @@
> > > >  #include <linux/mutex.h>
> > > >  #include <trace/events/sched.h>
> > > >
> > > > -#define KTHREAD_NICE_LEVEL (-5)
> > > > -
> > >
> > > Why don't we just redefine it to 0? We may find out later that we'd
> > > still prefer to have kernel threads have boost.
> >
> > Seems sensible, also the traditional reasoning behind this nice level
> > is that kernel threads do work on behalf of multiple tasks. Its a
> > kind of prio ceiling thing.
> 
> True.  None of our current threads are heavy enough to matter much.

Does it make sense to have this as a tunable? Where does it matter? Server 
workloads?

(Oh no, not another tunable, I can hear you yell ;-).

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: [tip:sched/core] sched: Keep kthreads at default priority
  2009-09-12 11:48                             ` Martin Steigerwald
@ 2009-09-12 12:19                               ` Mike Galbraith
  0 siblings, 0 replies; 216+ messages in thread
From: Mike Galbraith @ 2009-09-12 12:19 UTC (permalink / raw)
  To: Martin Steigerwald
  Cc: linux-kernel, Peter Zijlstra, Dmitry Torokhov, mingo, hpa, tglx,
	mingo, linux-tip-commits

On Sat, 2009-09-12 at 13:48 +0200, Martin Steigerwald wrote:
> On Wednesday 09 September 2009, Mike Galbraith wrote:
> > On Wed, 2009-09-09 at 19:06 +0200, Peter Zijlstra wrote:
> > > On Wed, 2009-09-09 at 09:55 -0700, Dmitry Torokhov wrote:
> > > > On Wed, Sep 09, 2009 at 03:37:34PM +0000, tip-bot for Mike Galbraith 
> wrote:
> > > > > diff --git a/kernel/kthread.c b/kernel/kthread.c
> > > > > index eb8751a..5fe7099 100644
> > > > > --- a/kernel/kthread.c
> > > > > +++ b/kernel/kthread.c
> > > > > @@ -16,8 +16,6 @@
> > > > >  #include <linux/mutex.h>
> > > > >  #include <trace/events/sched.h>
> > > > >
> > > > > -#define KTHREAD_NICE_LEVEL (-5)
> > > > > -
> > > >
> > > > Why don't we just redefine it to 0? We may find out later that we'd
> > > > still prefer to have kernel threads have boost.
> > >
> > > Seems sensible, also the traditional reasoning behind this nice level
> > > is that kernel threads do work on behalf of multiple tasks. Its a
> > > kind of prio ceiling thing.
> > 
> > True.  None of our current threads are heavy enough to matter much.
> 
> Does it make sense to have this as a tunable? Where does it matter? Server 
> workloads?

I don't think it should be a knob.  It only makes a difference to
kthreads that are heavy CPU users.  If one pops up as a performance
problem, IMHO, it should be tweaked separately.  Running at default
weight saves a bit of unnecessary math for the common case.
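
For a quick experiment such a per-thread tweak can also be applied from
userspace rather than in kernel code, e.g. (the kthread name is only an
example):

  renice -5 -p $(pgrep -x 'ksoftirqd/0')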

	-Mike


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: Epic regression in throughput since v2.6.23
  2009-09-11  6:10             ` Ingo Molnar
  2009-09-11  8:55               ` Serge Belyshev
@ 2009-09-13 15:27               ` Serge Belyshev
  2009-09-13 15:47                 ` Ingo Molnar
  2009-09-16 19:45                 ` Ingo Molnar
  1 sibling, 2 replies; 216+ messages in thread
From: Serge Belyshev @ 2009-09-13 15:27 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Con Kolivas, linux-kernel, Peter Zijlstra, Mike Galbraith

[-- Attachment #1: Type: text/plain, Size: 5259 bytes --]

Ingo Molnar <mingo@elte.hu> writes:

> Could still compare -j5 to -j4 on -tip, to see why -j4 is 3% short 
> of -j5's throughput.
>
> (Plus maybe the NEW_FAIR_SLEEPERS change in -tip fixes the 3% drop.)

I've hacked the "perf" tool a bit (see the attached patch) and obtained
the following results with "perf stat -r9 make -j4 -- make clean":

--------------------------------------------------------------------------------
2.6.31-tip-01381-gf4c92b6:

 Performance counter stats for 'make -j4' (9 runs):

   94730.871493  task-clock-msecs         #      3.503 CPUs    ( +-   0.022% )
          17499  context-switches         #      0.000 M/sec   ( +-   0.270% )
           1127  CPU-migrations           #      0.000 M/sec   ( +-   1.159% )
        4601950  page-faults              #      0.049 M/sec   ( +-   0.001% )
   237853044260  cycles                   #   2510.829 M/sec   ( +-   0.022% )
   200554338937  instructions             #      0.843 IPC     ( +-   0.000% )
    70615226021  cache-references         #    745.430 M/sec   ( +-   0.002% )
     2268068445  cache-misses             #     23.942 M/sec   ( +-   0.020% )

   27.046390652  seconds time elapsed   ( +-   0.085% )


 Performance counter stats for 'make -j5' (9 runs):

   95283.766024  task-clock-msecs         #      3.541 CPUs    ( +-   0.024% )
          20612  context-switches         #      0.000 M/sec   ( +-   0.297% )
           1644  CPU-migrations           #      0.000 M/sec   ( +-   0.711% )
        4601733  page-faults              #      0.048 M/sec   ( +-   0.001% )
   239234709890  cycles                   #   2510.760 M/sec   ( +-   0.024% )
   200562340756  instructions             #      0.838 IPC     ( +-   0.000% )
    70575215911  cache-references         #    740.685 M/sec   ( +-   0.004% )
     2266361085  cache-misses             #     23.785 M/sec   ( +-   0.017% )

   26.909772228  seconds time elapsed   ( +-   0.041% )


 Performance counter stats for 'make -j6' (9 runs):

   95664.586804  task-clock-msecs         #      3.568 CPUs    ( +-   0.020% )
          22358  context-switches         #      0.000 M/sec   ( +-   0.194% )
           1835  CPU-migrations           #      0.000 M/sec   ( +-   0.659% )
        4601777  page-faults              #      0.048 M/sec   ( +-   0.001% )
   240188102235  cycles                   #   2510.732 M/sec   ( +-   0.020% )
   200569036210  instructions             #      0.835 IPC     ( +-   0.000% )
    70565619805  cache-references         #    737.636 M/sec   ( +-   0.006% )
     2265714056  cache-misses             #     23.684 M/sec   ( +-   0.017% )

   26.808072908  seconds time elapsed   ( +-   0.063% )
--------------------------------------------------------------------------------
2.6.31-tip-01385-g1ca1afc:

 Performance counter stats for 'make -j4' (9 runs):

   94873.128287  task-clock-msecs         #      3.422 CPUs    ( +-   0.020% )
          13196  context-switches         #      0.000 M/sec   ( +-   0.181% )
           1777  CPU-migrations           #      0.000 M/sec   ( +-   0.664% )
        4601784  page-faults              #      0.049 M/sec   ( +-   0.001% )
   238126976192  cycles                   #   2509.952 M/sec   ( +-   0.020% )
   200545291785  instructions             #      0.842 IPC     ( +-   0.000% )
    70607104279  cache-references         #    744.227 M/sec   ( +-   0.005% )
     2266980390  cache-misses             #     23.895 M/sec   ( +-   0.022% )

   27.723375828  seconds time elapsed   ( +-   0.134% )


 Performance counter stats for 'make -j5' (9 runs):

   95363.595017  task-clock-msecs         #      3.519 CPUs    ( +-   0.015% )
          15003  context-switches         #      0.000 M/sec   ( +-   0.294% )
           1880  CPU-migrations           #      0.000 M/sec   ( +-   1.017% )
        4601648  page-faults              #      0.048 M/sec   ( +-   0.001% )
   239351549481  cycles                   #   2509.884 M/sec   ( +-   0.015% )
   200552379266  instructions             #      0.838 IPC     ( +-   0.000% )
    70576977633  cache-references         #    740.083 M/sec   ( +-   0.006% )
     2265294365  cache-misses             #     23.754 M/sec   ( +-   0.012% )

   27.096710006  seconds time elapsed   ( +-   0.074% )


 Performance counter stats for 'make -j6' (9 runs):

   95739.941233  task-clock-msecs         #      3.568 CPUs    ( +-   0.024% )
          16327  context-switches         #      0.000 M/sec   ( +-   0.230% )
           1863  CPU-migrations           #      0.000 M/sec   ( +-   0.623% )
        4601715  page-faults              #      0.048 M/sec   ( +-   0.002% )
   240292969273  cycles                   #   2509.851 M/sec   ( +-   0.024% )
   200554772612  instructions             #      0.835 IPC     ( +-   0.000% )
    70566092747  cache-references         #    737.060 M/sec   ( +-   0.004% )
     2264738533  cache-misses             #     23.655 M/sec   ( +-   0.021% )

   26.836474777  seconds time elapsed   ( +-   0.181% )
--------------------------------------------------------------------------------

Note that disabling NEW_FAIR_SLEEPERS doesn't fix the 3% regression from
v2.6.23, but instead makes "make -j4" runtime another 2% worse (27.05 -> 27.72).


[-- Attachment #2: perf cleanup-command patch --]
[-- Type: text/plain, Size: 1716 bytes --]

---
 tools/perf/builtin-stat.c |   18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

Index: linux/tools/perf/builtin-stat.c
===================================================================
--- linux.orig/tools/perf/builtin-stat.c
+++ linux/tools/perf/builtin-stat.c
@@ -44,6 +44,7 @@
 #include "util/parse-events.h"
 #include "util/event.h"
 #include "util/debug.h"
+#include "util/run-command.h"
 
 #include <sys/prctl.h>
 #include <math.h>
@@ -458,7 +459,7 @@ static const struct option options[] = {
 
 int cmd_stat(int argc, const char **argv, const char *prefix __used)
 {
-	int status;
+	int status, j, cleanup, cleanup_argc;
 
 	argc = parse_options(argc, argv, options, stat_usage,
 		PARSE_OPT_STOP_AT_NON_OPTION);
@@ -467,6 +468,19 @@ int cmd_stat(int argc, const char **argv
 	if (run_count <= 0)
 		usage_with_options(stat_usage, options);
 
+	// quick ugly hack: if a "--" appears in the command, treat it as
+	// a delimiter and use remaining part as a "cleanup command",
+	// not affecting performance counters.
+	cleanup = cleanup_argc = 0;
+	for (j = 1; j < (argc-1); j ++) {
+		if (!strcmp (argv[j], "--")) {
+			cleanup = j + 1;
+			cleanup_argc = argc - j - 1;
+			argv[j] = NULL;
+			argc = j;
+		}
+	}
+
 	/* Set attrs and nr_counters if no event is selected and !null_run */
 	if (!null_run && !nr_counters) {
 		memcpy(attrs, default_attrs, sizeof(default_attrs));
@@ -493,6 +507,8 @@ int cmd_stat(int argc, const char **argv
 		if (run_count != 1 && verbose)
 			fprintf(stderr, "[ perf stat: executing run #%d ... ]\n", run_idx + 1);
 		status = run_perf_stat(argc, argv);
+		if (cleanup)
+			run_command_v_opt (&argv [cleanup], 0);
 	}
 
 	print_stat(argc, argv);

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: Epic regression in throughput since v2.6.23
  2009-09-13 15:27               ` Serge Belyshev
@ 2009-09-13 15:47                 ` Ingo Molnar
  2009-09-13 19:17                   ` Mike Galbraith
  2009-09-16 19:45                 ` Ingo Molnar
  1 sibling, 1 reply; 216+ messages in thread
From: Ingo Molnar @ 2009-09-13 15:47 UTC (permalink / raw)
  To: Serge Belyshev; +Cc: Con Kolivas, linux-kernel, Peter Zijlstra, Mike Galbraith


* Serge Belyshev <belyshev@depni.sinp.msu.ru> wrote:

> Note that disabling NEW_FAIR_SLEEPERS doesn't fix the 3% 
> regression from v2.6.23, but instead makes "make -j4" runtime 
> another 2% worse (27.05 -> 27.72).

ok - thanks for the numbers, will have a look.

> ---
>  tools/perf/builtin-stat.c |   18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)

> +	// quick ugly hack: if a "--" appears in the command, treat it as
> +	// a delimiter and use remaining part as a "cleanup command",
> +	// not affecting performance counters.
> +	cleanup = cleanup_argc = 0;
> +	for (j = 1; j < (argc-1); j ++) {
> +		if (!strcmp (argv[j], "--")) {
> +			cleanup = j + 1;
> +			cleanup_argc = argc - j - 1;
> +			argv[j] = NULL;
> +			argc = j;
> +		}
> +	}

Nice feature!

How about doing it a bit cleaner, as '--repeat-prepare' and 
'--repeat-cleanup' options, to allow both pre-repeat and post-repeat 
cleanup ops to be done outside of the measured period?

	Ingo

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: Epic regression in throughput since v2.6.23
  2009-09-13 15:47                 ` Ingo Molnar
@ 2009-09-13 19:17                   ` Mike Galbraith
  2009-09-14  6:15                     ` Mike Galbraith
  0 siblings, 1 reply; 216+ messages in thread
From: Mike Galbraith @ 2009-09-13 19:17 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Serge Belyshev, Con Kolivas, linux-kernel, Peter Zijlstra

On Sun, 2009-09-13 at 17:47 +0200, Ingo Molnar wrote:
> * Serge Belyshev <belyshev@depni.sinp.msu.ru> wrote:
> 
> > Note that disabling NEW_FAIR_SLEEPERS doesn't fix the 3% 
> > regression from v2.6.23, but instead makes "make -j4" runtime 
> > another 2% worse (27.05 -> 27.72).
> 
> ok - thanks for the numbers, will have a look.

Seems NEXT_BUDDY is hurting the -j4 build.

LAST_BUDDY helps, which makes some sense.. if a task has heated up
cache, and is wakeup preempted by a fast mover (kthread, make..), it can
get the CPU back with still toasty data.  Hm.  If NEXT_BUDDY is on, that
benefit would likely be frequently destroyed too, because NEXT_BUDDY is
preferred over LAST_BUDDY.

Anyway, I'm thinking of tracking forks/sec as a means of detecting the
fork/exec load.  Or, maybe just enable it when there's > 1 buddy pair
running.. or something.  After all, NEXT_BUDDY is about scalability, and
make -j4 on a quad surely doesn't need any scalability help :)

 Performance counter stats for 'make -j4 vmlinux':

stock
  111.625198810  seconds time elapsed  avg 112.120  1.00
  112.209501685  seconds time elapsed
  112.528258240  seconds time elapsed

NO_NEXT_BUDDY NO_LAST_BUDDY
  109.405064078  seconds time elapsed  avg 109.351  .975
  108.708076118  seconds time elapsed
  109.942346026  seconds time elapsed

NO_NEXT_BUDDY
  108.005756718  seconds time elapsed  avg 108.064  .963
  107.689862679  seconds time elapsed
  108.497117555  seconds time elapsed

NO_LAST_BUDDY
  110.208717063  seconds time elapsed  avg 110.120  .982
  110.362412902  seconds time elapsed
  109.791359601  seconds time elapsed


diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index aa7f841..7cfea64 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -1501,7 +1501,8 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int sync)
 	 */
 	if (sched_feat(LAST_BUDDY) && likely(se->on_rq && curr != rq->idle))
 		set_last_buddy(se);
-	set_next_buddy(pse);
+	if (sched_feat(NEXT_BUDDY))
+		set_next_buddy(pse);
 
 	/*
 	 * We can come here with TIF_NEED_RESCHED already set from new task
diff --git a/kernel/sched_features.h b/kernel/sched_features.h
index e2dc63a..6e7070b 100644
--- a/kernel/sched_features.h
+++ b/kernel/sched_features.h
@@ -13,5 +13,6 @@ SCHED_FEAT(LB_BIAS, 1)
 SCHED_FEAT(LB_WAKEUP_UPDATE, 1)
 SCHED_FEAT(ASYM_EFF_LOAD, 1)
 SCHED_FEAT(WAKEUP_OVERLAP, 0)
+SCHED_FEAT(NEXT_BUDDY, 1)
 SCHED_FEAT(LAST_BUDDY, 1)
 SCHED_FEAT(OWNER_SPIN, 1)



^ permalink raw reply related	[flat|nested] 216+ messages in thread

* Re: Epic regression in throughput since v2.6.23
  2009-09-13 19:17                   ` Mike Galbraith
@ 2009-09-14  6:15                     ` Mike Galbraith
  0 siblings, 0 replies; 216+ messages in thread
From: Mike Galbraith @ 2009-09-14  6:15 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Serge Belyshev, Con Kolivas, linux-kernel, Peter Zijlstra

On Sun, 2009-09-13 at 21:17 +0200, Mike Galbraith wrote:

> Anyway, I'm thinking of tracking forks/sec as a means of detecting the
> fork/exec load.  Or, maybe just enable it when there's > 1 buddy pair
> running.. or something.  After all, NEXT_BUDDY is about scalability, and
> make -j4 on a quad surely doesn't need any scalability help :)

But, this buddy vs fork/exec thing is not at all cut and dried.  Even
with fork/exec load being the primary CPU consumer, there are genuine
buddies to worry about when you've got a GUI running; next/last buddy
can reduce the chances that an oinker slips in between X and client.

Ponder...

(oil for rusty old ponder machine welcome, gears grinding)

	-Mike


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Phoronix CFS vs BFS benchmarks
  2009-09-06 20:59 BFS vs. mainline scheduler benchmarks and measurements Ingo Molnar
                   ` (6 preceding siblings ...)
  2009-09-10  7:43 ` [updated] BFS vs. mainline scheduler benchmarks and measurements Ingo Molnar
@ 2009-09-14  9:46 ` Nikos Chantziaras
  2009-09-14 11:35   ` Mike Galbraith
  7 siblings, 1 reply; 216+ messages in thread
From: Nikos Chantziaras @ 2009-09-14  9:46 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Con Kolivas, linux-kernel, Peter Zijlstra, Mike Galbraith

Phoronix has published some benchmarks, including some "non-synthetic" 
real-life applications:

   http://www.phoronix.com/vr.php?view=14179

The benchmarks are:

  * World of Padman
  * Timed Apache Compilation
  * Timed PHP Compilation
  * 7-Zip Compression
  * GraphicsMagick
  * Apache Benchmark
  * Threaded I/O Tester
  * PostMark

The test was performed on an Ubuntu 9.10 daily snapshot from 2009-09-10 
with the GNOME 2.27.91 desktop, X Server 1.6.3, NVIDIA 190.32 display 
driver, GCC 4.4.1, and an EXT4 file-system.

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: Phoronix CFS vs BFS benchmarks
  2009-09-14  9:46 ` Phoronix CFS vs BFS benchmarks Nikos Chantziaras
@ 2009-09-14 11:35   ` Mike Galbraith
       [not found]     ` <f42384a10909140727k463ff460q3859892dcb79bcc5@mail.gmail.com>
  0 siblings, 1 reply; 216+ messages in thread
From: Mike Galbraith @ 2009-09-14 11:35 UTC (permalink / raw)
  To: Nikos Chantziaras; +Cc: Ingo Molnar, Con Kolivas, linux-kernel, Peter Zijlstra

On Mon, 2009-09-14 at 12:46 +0300, Nikos Chantziaras wrote:
> Phoronix has published some benchmarks, including some "non-synthetic" 
> real-life applications:
> 
>    http://www.phoronix.com/vr.php?view=14179
> 
> The benchmarks are:
> 
>   * World of Padman
>   * Timed Apache Compilation
>   * Timed PHP Compilation
>   * 7-Zip Compression
>   * GraphicsMagick
>   * Apache Benchmark
>   * Threaded I/O Tester
>   * PostMark
> 
> The test was performed on an Ubuntu 9.10 daily snapshot from 2009-09-10 
> with the GNOME 2.27.91 desktop, X Server 1.6.3, NVIDIA 190.32 display 
> driver, GCC 4.4.1, and an EXT4 file-system.

Interesting results.

It'd be nice to see what difference the changes since .31 have made to
these comparisons.  In particular, child_runs_first was found to have a
substantial negative impact on parallel compiles, and has been turned
off.  The reduction of sched_latency has a rather large effect on worst
case latency for CPU hogs, so will likely affect some results markedly.
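
For anyone who wants to approximate those post-.31 defaults on an existing
kernel, the knobs involved are runtime tunables. A rough sketch (base values
taken from the re-tune commit quoted earlier in this thread; note that the
exported sysctls already include the ilog(ncpus) factor, so this is only an
approximation):

  echo 0       > /proc/sys/kernel/sched_child_runs_first
  echo 5000000 > /proc/sys/kernel/sched_latency_ns
  echo 1000000 > /proc/sys/kernel/sched_min_granularity_ns
  echo 1000000 > /proc/sys/kernel/sched_wakeup_granularity_ns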

Hohum, back to the grindstone.

	-Mike


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: [crash, bisected] Re: clocksource: Resolve cpu hotplug dead lock with TSC unstable
  2009-09-11 13:33                                           ` Martin Schwidefsky
  2009-09-11 18:22                                             ` [tip:timers/core] clocksource: Resolve cpu hotplug dead lock with TSC unstable, fix crash tip-bot for Martin Schwidefsky
@ 2009-09-14 15:19                                             ` Ingo Molnar
  2009-09-14 15:37                                               ` Martin Schwidefsky
  2009-09-14 17:59                                               ` Martin Schwidefsky
  1 sibling, 2 replies; 216+ messages in thread
From: Ingo Molnar @ 2009-09-14 15:19 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: Jens Axboe, John Stultz, Peter Zijlstra, Mike Galbraith,
	Con Kolivas, linux-kernel


* Martin Schwidefsky <schwidefsky@de.ibm.com> wrote:

> On Fri, 11 Sep 2009 09:37:47 +0200
> Ingo Molnar <mingo@elte.hu> wrote:
> 
> > 
> > * Ingo Molnar <mingo@elte.hu> wrote:
> > 
> > > 
> > > * Ingo Molnar <mingo@elte.hu> wrote:
> > > 
> > > > 
> > > > * Jens Axboe <jens.axboe@oracle.com> wrote:
> > > > 
> > > > > I went to try -tip btw, but it crashes on boot. Here's the 
> > > > > backtrace, typed manually, it's crashing in 
> > > > > queue_work_on+0x28/0x60.
> > > > > 
> > > > > Call Trace:
> > > > >         queue_work
> > > > >         schedule_work
> > > > >         clocksource_mark_unstable
> > > > >         mark_tsc_unstable
> > > > >         check_tsc_sync_source
> > > > >         native_cpu_up
> > > > >         relay_hotcpu_callback
> > > > >         do_fork_idle
> > > > >         _cpu_up
> > > > >         cpu_up
> > > > >         kernel_init
> > > > >         kernel_thread_helper
> > > > 
> > > > hm, that looks like an old bug i fixed days ago via:
> > > > 
> > > >   00a3273: Revert "x86: Make tsc=reliable override boot time stability checks"
> > > > 
> > > > Have you tested tip:master - do you still know which sha1?
> > > 
> > > Ok, i reproduced it on a testbox and bisected it, the crash is 
> > > caused by:
> > > 
> > >  7285dd7fd375763bfb8ab1ac9cf3f1206f503c16 is first bad commit
> > >  commit 7285dd7fd375763bfb8ab1ac9cf3f1206f503c16
> > >  Author: Thomas Gleixner <tglx@linutronix.de>
> > >  Date:   Fri Aug 28 20:25:24 2009 +0200
> > > 
> > >     clocksource: Resolve cpu hotplug dead lock with TSC unstable
> > >     
> > >     Martin Schwidefsky analyzed it:
> > > 
> > > I've reverted it in tip/master for now.
> > 
> > and that uncovers the circular locking bug that this commit was 
> > supposed to fix ...
> > 
> > Martin?
> 
> This patch should fix the obvious problem that the watchdog_work
> structure is not yet initialized if the clocksource watchdog is not
> running yet.
> --
> Subject: [PATCH] clocksource: statically initialize watchdog workqueue
> 
> From: Martin Schwidefsky <schwidefsky@de.ibm.com>
> 
> The watchdog timer is started after the watchdog clocksource and at least
> one watched clocksource have been registered. The clocksource work element
> watchdog_work is initialized just before the clocksource timer is started.
> This is too late for the clocksource_mark_unstable call from native_cpu_up.
> To fix this use a static initializer for watchdog_work.
> 
> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
> ---
>  kernel/time/clocksource.c |    5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> Index: linux-2.6/kernel/time/clocksource.c
> ===================================================================
> --- linux-2.6.orig/kernel/time/clocksource.c
> +++ linux-2.6/kernel/time/clocksource.c
> @@ -123,10 +123,12 @@ static DEFINE_MUTEX(clocksource_mutex);
>  static char override_name[32];
>  
>  #ifdef CONFIG_CLOCKSOURCE_WATCHDOG
> +static void clocksource_watchdog_work(struct work_struct *work);
> +
>  static LIST_HEAD(watchdog_list);
>  static struct clocksource *watchdog;
>  static struct timer_list watchdog_timer;
> -static struct work_struct watchdog_work;
> +static DECLARE_WORK(watchdog_work, clocksource_watchdog_work);
>  static DEFINE_SPINLOCK(watchdog_lock);
>  static cycle_t watchdog_last;
>  static int watchdog_running;
> @@ -230,7 +232,6 @@ static inline void clocksource_start_wat
>  {
>  	if (watchdog_running || !watchdog || list_empty(&watchdog_list))
>  		return;
> -	INIT_WORK(&watchdog_work, clocksource_watchdog_work);
>  	init_timer(&watchdog_timer);
>  	watchdog_timer.function = clocksource_watchdog;
>  	watchdog_last = watchdog->read(watchdog);

Now another box crashes during bootup. Reverting these two:

 f79e025: clocksource: Resolve cpu hotplug dead lock with TSC unstable, fix crash
 7285dd7: clocksource: Resolve cpu hotplug dead lock with TSC unstable

allows me to boot it.

plain 32-bit defconfig.

	Ingo

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: Phoronix CFS vs BFS benchmarks
       [not found]     ` <f42384a10909140727k463ff460q3859892dcb79bcc5@mail.gmail.com>
@ 2009-09-14 15:32       ` Mike Galbraith
  2009-09-14 19:14         ` Marcin Letyns
  0 siblings, 1 reply; 216+ messages in thread
From: Mike Galbraith @ 2009-09-14 15:32 UTC (permalink / raw)
  To: Marcin Letyns
  Cc: Nikos Chantziaras, Ingo Molnar, Con Kolivas, linux-kernel,
	Peter Zijlstra

On Mon, 2009-09-14 at 16:27 +0200, Marcin Letyns wrote:
> Hello,
> 
> Disabling NEW_FAIR_SLEEPERS makes a lot of difference here in the
> Apache benchmark:
> 
> 2.6.30.6-bfs: 7311.05
> 
> 2.6.30.6-cfs-fair_sl_disabled: 8249.17
> 
> 2.6.30.6-cfs-fair_sl_enabled: 4894.99

Wow.

Some loads like wakeup preemption (mysql+oltp), and some hate it.  This
load appears to REALLY hate it (as does volanomark, but that thing is
extremely overloaded).  How many threads does that benchmark run
concurrently?

In any case, it's currently disabled in tip.  Time will tell which
benchmarks gain, and which lose.  With it disabled, anything light loses
when competing with hog(s).  There _are_ one heck of a lot of hogs out
there though, so maybe it _should_ be disabled by default.  Dunno.

	-Mike


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: [crash, bisected] Re: clocksource: Resolve cpu hotplug dead lock with TSC unstable
  2009-09-14 15:19                                             ` [crash, bisected] Re: clocksource: Resolve cpu hotplug dead lock with TSC unstable Ingo Molnar
@ 2009-09-14 15:37                                               ` Martin Schwidefsky
  2009-09-14 17:59                                               ` Martin Schwidefsky
  1 sibling, 0 replies; 216+ messages in thread
From: Martin Schwidefsky @ 2009-09-14 15:37 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jens Axboe, John Stultz, Peter Zijlstra, Mike Galbraith,
	Con Kolivas, linux-kernel

On Mon, 14 Sep 2009 17:19:58 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> Now another box crashes during bootup. Reverting these two:
> 
>  f79e025: clocksource: Resolve cpu hotplug dead lock with TSC unstable, fix crash
>  7285dd7: clocksource: Resolve cpu hotplug dead lock with TSC unstable
> 
> allows me to boot it.
> 
> plain 32-bit defconfig.

I've seen the bug report. init_workqueues comes after smp_init.
The idea I'm currently playing with is a simple check in the tsc
code to see whether the tsc clocksource has already been registered.
When smp_init is called the tsc is not yet registered, so we could
just set the rating to zero.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: [crash, bisected] Re: clocksource: Resolve cpu hotplug dead lock with TSC unstable
  2009-09-14 15:19                                             ` [crash, bisected] Re: clocksource: Resolve cpu hotplug dead lock with TSC unstable Ingo Molnar
  2009-09-14 15:37                                               ` Martin Schwidefsky
@ 2009-09-14 17:59                                               ` Martin Schwidefsky
  1 sibling, 0 replies; 216+ messages in thread
From: Martin Schwidefsky @ 2009-09-14 17:59 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jens Axboe, John Stultz, Peter Zijlstra, Mike Galbraith,
	Con Kolivas, linux-kernel

On Mon, 14 Sep 2009 17:19:58 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> Now another box crashes during bootup. Reverting these two:
> 
>  f79e025: clocksource: Resolve cpu hotplug dead lock with TSC unstable, fix crash
>  7285dd7: clocksource: Resolve cpu hotplug dead lock with TSC unstable
> 
> allows me to boot it.
> 
> plain 32-bit defconfig.

Ok, I forced the situation where the bad thing happens. With the patch below
the crash goes away.

[    0.152056] checking TSC synchronization [CPU#0 -> CPU#1]:
[    0.156001] Measured 0 cycles TSC warp between CPUs, turning off TSC clock.
[    0.156001] Marking TSC unstable due to check_tsc_sync_source failed

Is there a reason why we need the TSC as a clocksource early in the boot
process?

--
Subject: clocksource: delay tsc clocksource registration

From: Martin Schwidefsky <schwidefsky@de.ibm.com>

Until the tsc clocksource has been registered it can be
downgraded by setting the CLOCK_SOURCE_UNSTABLE bit and the
rating to zero. Once the tsc clocksource is registered a
work queue is needed to change the rating.

Delay the registration of the tsc clocksource to a point in
the boot process after the work queues have been initialized.

This hopefully finally resolves the boot crash due to the
tsc downgrade.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: John Stultz <johnstul@us.ibm.com>
---

Index: linux-2.6-tip/arch/x86/kernel/tsc.c
===================================================================
--- linux-2.6-tip.orig/arch/x86/kernel/tsc.c	2009-09-14 19:25:02.000000000 +0200
+++ linux-2.6-tip/arch/x86/kernel/tsc.c	2009-09-14 19:30:13.000000000 +0200
@@ -853,9 +853,16 @@
 		clocksource_tsc.rating = 0;
 		clocksource_tsc.flags &= ~CLOCK_SOURCE_IS_CONTINUOUS;
 	}
+}
+
+static int __init register_tsc_clocksource(void)
+{
 	clocksource_register(&clocksource_tsc);
+	return 0;
 }
 
+core_initcall(register_tsc_clocksource);
+
 #ifdef CONFIG_X86_64
 /*
  * calibrate_cpu is used on systems with fixed rate TSCs to determine
-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: Phoronix CFS vs BFS benchmarks
  2009-09-14 15:32       ` Mike Galbraith
@ 2009-09-14 19:14         ` Marcin Letyns
  2009-09-14 20:49           ` Willy Tarreau
  0 siblings, 1 reply; 216+ messages in thread
From: Marcin Letyns @ 2009-09-14 19:14 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Nikos Chantziaras, Ingo Molnar, Con Kolivas, linux-kernel,
	Peter Zijlstra

2009/9/14 Mike Galbraith <efault@gmx.de>
>
> On Mon, 2009-09-14 at 16:27 +0200, Marcin Letyns wrote:
> > Hello,
> >
> > Disabling NEW_FAIR_SLEEPERS makes a lot of difference here in the
> > Apache benchmark:
> >
> > 2.6.30.6-bfs: 7311.05
> >
> > 2.6.30.6-cfs-fair_sl_disabled: 8249.17
> >
> > 2.6.30.6-cfs-fair_sl_enabled: 4894.99
>
> Wow.
>
> Some loads like wakeup preemption (mysql+oltp), and some hate it.  This
> load appears to REALLY hate it (as does volanomark, but that thing is
> extremely overloaded).  How many threads does that benchmark run
> concurrently?

From the benchmark description:

This is a test of ab, which is the Apache Benchmark program. This test
profile measures how many requests per second a given system can
sustain when carrying out 500,000 requests with 100 requests being
carried out concurrently.

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: Phoronix CFS vs BFS benchmarks
  2009-09-14 19:14         ` Marcin Letyns
@ 2009-09-14 20:49           ` Willy Tarreau
  2009-09-15  8:37             ` Mike Galbraith
  0 siblings, 1 reply; 216+ messages in thread
From: Willy Tarreau @ 2009-09-14 20:49 UTC (permalink / raw)
  To: Marcin Letyns
  Cc: Mike Galbraith, Nikos Chantziaras, Ingo Molnar, Con Kolivas,
	linux-kernel, Peter Zijlstra

On Mon, Sep 14, 2009 at 09:14:35PM +0200, Marcin Letyns wrote:
> 2009/9/14 Mike Galbraith <efault@gmx.de>
> >
> > On Mon, 2009-09-14 at 16:27 +0200, Marcin Letyns wrote:
> > > Hello,
> > >
> > > Disabling NEW_FAIR_SLEEPERS makes a lot of difference here in the
> > > Apache benchmark:
> > >
> > > 2.6.30.6-bfs: 7311.05
> > >
> > > 2.6.30.6-cfs-fair_sl_disabled: 8249.17
> > >
> > > 2.6.30.6-cfs-fair_sl_enabled: 4894.99
> >
> > Wow.
> >
> > Some loads like wakeup preemption (mysql+oltp), and some hate it.  This
> > load appears to REALLY hate it (as does volanomark, but that thing is
> > extremely overloaded).  How many threads does that benchmark run
> > concurrently?
> 
> From the benchmark description:
> 
> This is a test of ab, which is the Apache Benchmark program. This test
> profile measures how many requests per second a given system can
> sustain when carrying out 500,000 requests with 100 requests being
> carried out concurrently.

Be careful not to run ab on the same machine as you run apache, otherwise
the numerous apache processes can limit ab's throughput. This is the same
reason why I educate people not to run a single-process proxy in front of
a multi-process/multi-threaded web server. Apparently it's not obvious to
everyone.
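
A trivial way to follow that advice is to drive the benchmark from a second
machine entirely (hostname and document here are placeholders):

  ab -n 500000 -c 100 http://webserver.example.org/index.html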

Regards,
Willy


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: Phoronix CFS vs BFS benchmarks
  2009-09-14 20:49           ` Willy Tarreau
@ 2009-09-15  8:37             ` Mike Galbraith
  0 siblings, 0 replies; 216+ messages in thread
From: Mike Galbraith @ 2009-09-15  8:37 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Marcin Letyns, Nikos Chantziaras, Ingo Molnar, Con Kolivas,
	linux-kernel, Peter Zijlstra

On Mon, 2009-09-14 at 22:49 +0200, Willy Tarreau wrote:
> On Mon, Sep 14, 2009 at 09:14:35PM +0200, Marcin Letyns wrote:
> > 2009/9/14 Mike Galbraith <efault@gmx.de>
> > >
> > > On Mon, 2009-09-14 at 16:27 +0200, Marcin Letyns wrote:
> > > > Hello,
> > > >
> > > > Disabling NEW_FAIR_SLEEPERS makes a lot of difference here in the
> > > > Apache benchmark:
> > > >
> > > > 2.6.30.6-bfs: 7311.05
> > > >
> > > > 2.6.30.6-cfs-fair_sl_disabled: 8249.17
> > > >
> > > > 2.6.30.6-cfs-fair_sl_enabled: 4894.99
> > >
> > > Wow.
> > >
> > > Some loads like wakeup preemption (mysql+oltp), and some hate it.  This
> > > load appears to REALLY hate it (as does volanomark, but that thing is
> > > extremely overloaded).  How many threads does that benchmark run
> > > concurrently?
> > 
> > From the benchmark description:
> > 
> > This is a test of ab, which is the Apache Benchmark program. This test
> > profile measures how many requests per second a given system can
> > sustain when carrying out 500,000 requests with 100 requests being
> > carried out concurrently.
> 
> Be careful not to run ab on the same machine as you run apache, otherwise
> the numerous apache processes can limit ab's throughput. This is the same
> reason as why I educate people so that they don't run a single-process
> proxy in front of a multi-process/multi-thread web server. Apparently
> it's not obvious to everyone.

I turned on apache, and played with ab a bit, and yup, ab is a hog, so
any fairness hurts it badly.  Ergo, running ab on the same box as
apache suffers with CFS when NEW_FAIR_SLEEPERS is turned on.  Issuing
ab bandwidth to match its 1:N pig nature brings throughput right back.
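
(The "ab at nice -15" runs further down presumably amount to something along
these lines; the exact invocation is a guess, the URL matches the one used in
the results below.)

  nice -n -15 ab -n 500000 -c 100 http://localhost/openSUSE.org.html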

(In all the comparison testing I've done, BFS favors hogs, and with
NEW_FAIR_SLEEPERS turned off, so does CFS, though not as much.)

Running apache on one core and ab on another (with shared cache tho),
something went south with BFS. I would have expected it to be much
closer (shrug).  

Some likely not very uninteresting numbers below.  I wasted a lot more
of my time generating them than anyone will spend downloading them :)

ab -n 500000 -c 100 http://localhost/openSUSE.org.html

2.6.31-bfs221-smp
Concurrency Level:      100
Time taken for tests:   43.556 seconds
Complete requests:      500000
Failed requests:        0
Write errors:           0
Total transferred:      7158558404 bytes
HTML transferred:       7027047358 bytes
Requests per second:    11479.50 [#/sec] (mean)
Time per request:       8.711 [ms] (mean)
Time per request:       0.087 [ms] (mean, across all concurrent requests)
Transfer rate:          160501.38 [Kbytes/sec] received

2.6.32-tip-smp NO_NEW_FAIR_SLEEPERS
Concurrency Level:      100
Time taken for tests:   42.834 seconds
Complete requests:      500000
Failed requests:        0
Write errors:           0
Total transferred:      7158429480 bytes
HTML transferred:       7026921590 bytes
Requests per second:    11672.84 [#/sec] (mean)
Time per request:       8.567 [ms] (mean)
Time per request:       0.086 [ms] (mean, across all concurrent requests)
Transfer rate:          163201.63 [Kbytes/sec] received

2.6.32-tip-smp NEW_FAIR_SLEEPERS
Concurrency Level:      100
Time taken for tests:   68.221 seconds
Complete requests:      500000
Failed requests:        0
Write errors:           0
Total transferred:      7158357900 bytes
HTML transferred:       7026851325 bytes
Requests per second:    7329.12 [#/sec] (mean)
Time per request:       13.644 [ms] (mean)
Time per request:       0.136 [ms] (mean, across all concurrent requests)
Transfer rate:          102469.65 [Kbytes/sec] received

2.6.32-tip-smp NEW_FAIR_SLEEPERS + ab at nice -15
Concurrency Level:      100
Time taken for tests:   42.824 seconds
Complete requests:      500000
Failed requests:        0
Write errors:           0
Total transferred:      7158451988 bytes
HTML transferred:       7026943572 bytes
Requests per second:    11675.68 [#/sec] (mean)
Time per request:       8.565 [ms] (mean)
Time per request:       0.086 [ms] (mean, across all concurrent requests)
Transfer rate:          163241.78 [Kbytes/sec] received

taskset -c 2 /etc/init.d/apache2 restart
taskset -c 3 ab -n 500000 -c 100 http://localhost/openSUSE.org.html

2.6.31-bfs221-smp
Concurrency Level:      100
Time taken for tests:   86.590 seconds
Complete requests:      500000
Failed requests:        0
Write errors:           0
Total transferred:      7158000000 bytes
HTML transferred:       7026500000 bytes
Requests per second:    5774.37 [#/sec] (mean)
Time per request:       17.318 [ms] (mean)
Time per request:       0.173 [ms] (mean, across all concurrent requests)
Transfer rate:          80728.41 [Kbytes/sec] received

2.6.32-tip-smp
Concurrency Level:      100
Time taken for tests:   48.640 seconds
Complete requests:      500000
Failed requests:        0
Write errors:           0
Total transferred:      7158000000 bytes
HTML transferred:       7026500000 bytes
Requests per second:    10279.71 [#/sec] (mean)
Time per request:       9.728 [ms] (mean)
Time per request:       0.097 [ms] (mean, across all concurrent requests)
Transfer rate:          143715.15 [Kbytes/sec] received


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-11  1:36               ` Benjamin Herrenschmidt
@ 2009-09-16 18:27                 ` Frans Pop
  2009-09-17  1:29                   ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 216+ messages in thread
From: Frans Pop @ 2009-09-16 18:27 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: arjan, realnc, linux-kernel

Benjamin Herrenschmidt wrote:
> I'll have a look after the merge window madness. Multiple windows is
> also still an option I suppose even if i don't like it that much: we
> could support double-click on an app or "global" in the left list,
> making that pop a new window with the same content as the right pane for
> that app (or global) that updates at the same time as the rest.
 
I have another request. If I select a specific application to watch (say a 
mail client) but it is idle for a while and thus has no latencies, it will 
get dropped from the list and thus my selection of it will be lost.

It would be nice if in that case a selected application would stay visible 
and selected, or maybe get reselected automatically when it appears again.

Thanks,
FJP

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: Epic regression in throughput since v2.6.23
  2009-09-13 15:27               ` Serge Belyshev
  2009-09-13 15:47                 ` Ingo Molnar
@ 2009-09-16 19:45                 ` Ingo Molnar
  2009-09-16 23:18                   ` Serge Belyshev
  1 sibling, 1 reply; 216+ messages in thread
From: Ingo Molnar @ 2009-09-16 19:45 UTC (permalink / raw)
  To: Serge Belyshev; +Cc: linux-kernel


* Serge Belyshev <belyshev@depni.sinp.msu.ru> wrote:

> Note that disabling NEW_FAIR_SLEEPERS doesn't fix the 3% regression 
> from v2.6.23, but instead makes "make -j4" runtime another 2% worse 
> (27.05 -> 27.72).

Ok, i think we've got a handle on that finally - mind checking latest 
-tip?

	Ingo

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: Epic regression in throughput since v2.6.23
  2009-09-16 19:45                 ` Ingo Molnar
@ 2009-09-16 23:18                   ` Serge Belyshev
  2009-09-17  4:55                     ` [patchlet] " Mike Galbraith
  0 siblings, 1 reply; 216+ messages in thread
From: Serge Belyshev @ 2009-09-16 23:18 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

Ingo Molnar <mingo@elte.hu> writes:

> Ok, i think we've got a handle on that finally - mind checking latest 
> -tip?

Kernel build benchmark:
http://img11.imageshack.us/img11/4544/makej20090916.png

I have also repeated video encode benchmarks described here:
http://article.gmane.org/gmane.linux.kernel/889444

"x264 --preset ultrafast":
http://img11.imageshack.us/img11/9020/ultrafast20090916.png

"x264 --preset medium":
http://img11.imageshack.us/img11/7729/medium20090916.png

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-16 18:27                 ` Frans Pop
@ 2009-09-17  1:29                   ` Benjamin Herrenschmidt
  2009-10-01  9:36                     ` Frans Pop
  0 siblings, 1 reply; 216+ messages in thread
From: Benjamin Herrenschmidt @ 2009-09-17  1:29 UTC (permalink / raw)
  To: Frans Pop; +Cc: arjan, realnc, linux-kernel

On Wed, 2009-09-16 at 20:27 +0200, Frans Pop wrote:
> Benjamin Herrenschmidt wrote:
> > I'll have a look after the merge window madness. Multiple windows is
> > also still an option I suppose even if i don't like it that much: we
> > could support double-click on an app or "global" in the left list,
> > making that pop a new window with the same content as the right pane for
> > that app (or global) that updates at the same time as the rest.
>  
> I have another request. If I select a specific application to watch (say a 
> mail client) but it is idle for a while and thus has no latencies, it will 
> get dropped from the list and thus my selection of it will be lost.
> 
> It would be nice if in that case a selected application would stay visible 
> and selected, or maybe get reselected automatically when it appears again.

Hrm... I thought I forced the selected app to remain ... or maybe I
wanted to do that and failed :-) Ok. On the list. Please ping me next
week if nothing happens.

Ben.



^ permalink raw reply	[flat|nested] 216+ messages in thread

* [patchlet] Re: Epic regression in throughput since v2.6.23
  2009-09-16 23:18                   ` Serge Belyshev
@ 2009-09-17  4:55                     ` Mike Galbraith
  2009-09-17  5:06                       ` Mike Galbraith
  0 siblings, 1 reply; 216+ messages in thread
From: Mike Galbraith @ 2009-09-17  4:55 UTC (permalink / raw)
  To: Serge Belyshev; +Cc: Ingo Molnar, linux-kernel

On Wed, 2009-09-16 at 23:18 +0000, Serge Belyshev wrote:
> Ingo Molnar <mingo@elte.hu> writes:
> 
> > Ok, i think we've got a handle on that finally - mind checking latest 
> > -tip?
> 
> Kernel build benchmark:
> http://img11.imageshack.us/img11/4544/makej20090916.png
> 
> I have also repeated video encode benchmarks described here:
> http://article.gmane.org/gmane.linux.kernel/889444
> 
> "x264 --preset ultrafast":
> http://img11.imageshack.us/img11/9020/ultrafast20090916.png
> 
> "x264 --preset medium":
> http://img11.imageshack.us/img11/7729/medium20090916.png

Pre-ramble..
Most of the performance differences I've examined in all these CFS vs
BFS threads boil down to fair scheduler vs unfair scheduler.  If you
favor hogs, naturally, hogs getting more bandwidth perform better than
hogs getting their fair share.  That's wonderful for hogs, somewhat less
than wonderful for their competition.  That fairness is not necessarily
the best thing for throughput is well known.  If you've got a single
dissimilar task load running alone, favoring hogs may perform better..
or not.  What about mixed loads though?  Is the throughput of frequent
switchers less important than hog throughput?

Moving right along..

That x264 thing uncovered an interesting issue within CFS.  That load is
a frequent clone() customer, and when it has to compete against a not so
fork/clone happy load, it suffers mightily.  Even when running solo, ie
only competing against its own siblings, IFF sleeper fairness is
enabled, the pain of thread startup latency is quite visible.  With
concurrent loads, it is agonizingly painful.

concurrent load test
tbench 8 vs
x264 --preset ultrafast --no-scenecut --sync-lookahead 0 --qp 20 -o /dev/null --threads 8 soccer_4cif.y4m

(i can turn knobs and get whatever numbers i want, including
outperforming bfs, concurrent or solo.. not the point)
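
A minimal sketch of running the two loads side by side (the commands are the
ones listed above; a locally running tbench_srv and the soccer_4cif.y4m input
file are assumptions, not part of the original report):

  tbench_srv &        # tbench needs a server instance; localhost is the default target
  tbench 8 &          # 8 tbench clients in the background
  x264 --preset ultrafast --no-scenecut --sync-lookahead 0 --qp 20 \
       -o /dev/null --threads 8 soccer_4cif.y4m

Each load reports its own numbers: x264 prints the fps line at the end of the
encode, tbench prints its per-second throughput lines.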

START_DEBIT
encoded 600 frames, 44.29 fps, 22096.60 kb/s
encoded 600 frames, 43.59 fps, 22096.60 kb/s
encoded 600 frames, 43.78 fps, 22096.60 kb/s
encoded 600 frames, 43.77 fps, 22096.60 kb/s
encoded 600 frames, 45.67 fps, 22096.60 kb/s

8   1068214   672.35 MB/sec  execute  57 sec
8   1083785   672.16 MB/sec  execute  58 sec
8   1099188   672.18 MB/sec  execute  59 sec
8   1114626   672.00 MB/sec  cleanup  60 sec
8   1114626   671.96 MB/sec  cleanup  60 sec

NO_START_DEBIT
encoded 600 frames, 123.19 fps, 22096.60 kb/s
encoded 600 frames, 123.85 fps, 22096.60 kb/s
encoded 600 frames, 120.05 fps, 22096.60 kb/s
encoded 600 frames, 123.43 fps, 22096.60 kb/s
encoded 600 frames, 121.27 fps, 22096.60 kb/s

8    848135   533.79 MB/sec  execute  57 sec
8    860829   534.08 MB/sec  execute  58 sec
8    872840   533.74 MB/sec  execute  59 sec
8    885036   533.66 MB/sec  cleanup  60 sec
8    885036   533.64 MB/sec  cleanup  60 sec

2.6.31-bfs221-smp
encoded 600 frames, 169.00 fps, 22096.60 kb/s
encoded 600 frames, 163.85 fps, 22096.60 kb/s
encoded 600 frames, 161.00 fps, 22096.60 kb/s
encoded 600 frames, 155.57 fps, 22096.60 kb/s
encoded 600 frames, 162.01 fps, 22096.60 kb/s

8    458328   287.67 MB/sec  execute  57 sec
8    464442   288.68 MB/sec  execute  58 sec
8    471129   288.71 MB/sec  execute  59 sec
8    477643   288.61 MB/sec  cleanup  60 sec
8    477643   288.60 MB/sec  cleanup  60 sec

patchlet:

sched: disable START_DEBIT.

START_DEBIT induces unfairness to loads which fork/clone frequently when they
must compete against loads which do not.


Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>

 kernel/sched_features.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/sched_features.h b/kernel/sched_features.h
index d5059fd..2fc94a0 100644
--- a/kernel/sched_features.h
+++ b/kernel/sched_features.h
@@ -23,7 +23,7 @@ SCHED_FEAT(NORMALIZED_SLEEPER, 0)
  * Place new tasks ahead so that they do not starve already running
  * tasks
  */
-SCHED_FEAT(START_DEBIT, 1)
+SCHED_FEAT(START_DEBIT, 0)
 
 /*
  * Should wakeups try to preempt running tasks.
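
For experimentation the same feature can also be flipped at run time, without
rebuilding, through the scheduler's debugfs interface. A sketch, assuming the
kernel was built with CONFIG_SCHED_DEBUG and debugfs is available:

  mount -t debugfs none /sys/kernel/debug 2>/dev/null
  cat /sys/kernel/debug/sched_features                    # show current flags
  echo NO_START_DEBIT > /sys/kernel/debug/sched_features  # disable at runtime
  echo START_DEBIT    > /sys/kernel/debug/sched_features  # re-enable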



^ permalink raw reply related	[flat|nested] 216+ messages in thread

* Re: [patchlet] Re: Epic regression in throughput since v2.6.23
  2009-09-17  4:55                     ` [patchlet] " Mike Galbraith
@ 2009-09-17  5:06                       ` Mike Galbraith
  2009-09-17  7:21                         ` Ingo Molnar
  0 siblings, 1 reply; 216+ messages in thread
From: Mike Galbraith @ 2009-09-17  5:06 UTC (permalink / raw)
  To: Serge Belyshev; +Cc: Ingo Molnar, linux-kernel, Peter Zijlstra

Aw poo, forgot to add Peter to CC list before poking xmit.

On Thu, 2009-09-17 at 06:55 +0200, Mike Galbraith wrote:
> On Wed, 2009-09-16 at 23:18 +0000, Serge Belyshev wrote:
> > Ingo Molnar <mingo@elte.hu> writes:
> > 
> > > Ok, i think we've got a handle on that finally - mind checking latest 
> > > -tip?
> > 
> > Kernel build benchmark:
> > http://img11.imageshack.us/img11/4544/makej20090916.png
> > 
> > I have also repeated video encode benchmarks described here:
> > http://article.gmane.org/gmane.linux.kernel/889444
> > 
> > "x264 --preset ultrafast":
> > http://img11.imageshack.us/img11/9020/ultrafast20090916.png
> > 
> > "x264 --preset medium":
> > http://img11.imageshack.us/img11/7729/medium20090916.png
> 
> Pre-ramble..
> Most of the performance differences I've examined in all these CFS vs
> BFS threads boil down to fair scheduler vs unfair scheduler.  If you
> favor hogs, naturally, hogs getting more bandwidth perform better than
> hogs getting their fair share.  That's wonderful for hogs, somewhat less
> than wonderful for their competition.  That fairness is not necessarily
> the best thing for throughput is well known.  If you've got a single
> dissimilar task load running alone, favoring hogs may perform better..
> or not.  What about mixed loads though?  Is the throughput of frequent
> switchers less important than hog throughput?
> 
> Moving right along..
> 
> That x264 thing uncovered an interesting issue within CFS.  That load is
> a frequent clone() customer, and when it has to compete against a not so
> fork/clone happy load, it suffers mightily.  Even when running solo, ie
> only competing against its own siblings, IFF sleeper fairness is
> enabled, the pain of thread startup latency is quite visible.  With
> concurrent loads, it is agonizingly painful.
> 
> concurrent load test
> tbench 8 vs
> x264 --preset ultrafast --no-scenecut --sync-lookahead 0 --qp 20 -o /dev/null --threads 8 soccer_4cif.y4m
> 
> (i can turn knobs and get whatever numbers i want, including
> outperforming bfs, concurrent or solo.. not the point)
> 
> START_DEBIT
> encoded 600 frames, 44.29 fps, 22096.60 kb/s
> encoded 600 frames, 43.59 fps, 22096.60 kb/s
> encoded 600 frames, 43.78 fps, 22096.60 kb/s
> encoded 600 frames, 43.77 fps, 22096.60 kb/s
> encoded 600 frames, 45.67 fps, 22096.60 kb/s
> 
> 8   1068214   672.35 MB/sec  execute  57 sec
> 8   1083785   672.16 MB/sec  execute  58 sec
> 8   1099188   672.18 MB/sec  execute  59 sec
> 8   1114626   672.00 MB/sec  cleanup  60 sec
> 8   1114626   671.96 MB/sec  cleanup  60 sec
> 
> NO_START_DEBIT
> encoded 600 frames, 123.19 fps, 22096.60 kb/s
> encoded 600 frames, 123.85 fps, 22096.60 kb/s
> encoded 600 frames, 120.05 fps, 22096.60 kb/s
> encoded 600 frames, 123.43 fps, 22096.60 kb/s
> encoded 600 frames, 121.27 fps, 22096.60 kb/s
> 
> 8    848135   533.79 MB/sec  execute  57 sec
> 8    860829   534.08 MB/sec  execute  58 sec
> 8    872840   533.74 MB/sec  execute  59 sec
> 8    885036   533.66 MB/sec  cleanup  60 sec
> 8    885036   533.64 MB/sec  cleanup  60 sec
> 
> 2.6.31-bfs221-smp
> encoded 600 frames, 169.00 fps, 22096.60 kb/s
> encoded 600 frames, 163.85 fps, 22096.60 kb/s
> encoded 600 frames, 161.00 fps, 22096.60 kb/s
> encoded 600 frames, 155.57 fps, 22096.60 kb/s
> encoded 600 frames, 162.01 fps, 22096.60 kb/s
> 
> 8    458328   287.67 MB/sec  execute  57 sec
> 8    464442   288.68 MB/sec  execute  58 sec
> 8    471129   288.71 MB/sec  execute  59 sec
> 8    477643   288.61 MB/sec  cleanup  60 sec
> 8    477643   288.60 MB/sec  cleanup  60 sec
> 
> patchlet:
> 
> sched: disable START_DEBIT.
> 
> START_DEBIT induces unfairness to loads which fork/clone frequently when they
> must compete against loads which do not.
> 
> 
> Signed-off-by: Mike Galbraith <efault@gmx.de>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> LKML-Reference: <new-submission>
> 
>  kernel/sched_features.h |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/sched_features.h b/kernel/sched_features.h
> index d5059fd..2fc94a0 100644
> --- a/kernel/sched_features.h
> +++ b/kernel/sched_features.h
> @@ -23,7 +23,7 @@ SCHED_FEAT(NORMALIZED_SLEEPER, 0)
>   * Place new tasks ahead so that they do not starve already running
>   * tasks
>   */
> -SCHED_FEAT(START_DEBIT, 1)
> +SCHED_FEAT(START_DEBIT, 0)
>  
>  /*
>   * Should wakeups try to preempt running tasks.
> 


^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: [patchlet] Re: Epic regression in throughput since v2.6.23
  2009-09-17  5:06                       ` Mike Galbraith
@ 2009-09-17  7:21                         ` Ingo Molnar
  0 siblings, 0 replies; 216+ messages in thread
From: Ingo Molnar @ 2009-09-17  7:21 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Serge Belyshev, linux-kernel, Peter Zijlstra


here's some start-debit versus non-start-debit numbers.

The workload: on a dual-core box start and kill 10 loops, once every 
second. PID 23137 is a shell doing interactive stuff. (running a loop of 
usleep 100000 and echo)
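
A rough sketch of that kind of load (the exact loop bodies are assumptions,
not taken from the run above):

  # the "interactive" shell whose wakeup latency is being measured:
  while true; do usleep 100000; echo -n .; done

  # in a second shell: start 10 busy loops, kill them, repeat every second:
  while true; do
          for i in $(seq 10); do ( while :; do :; done ) & done
          sleep 1
          kill $(jobs -p)
  done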

START_DEBIT:

europe:~> perf sched lat | grep 23137
  bash:23137            |     34.380 ms |      187 | avg:    0.005 ms | max:    0.017 ms |
  bash:23137            |     36.410 ms |      188 | avg:    0.005 ms | max:    0.011 ms |
  bash:23137            |     36.680 ms |      183 | avg:    0.007 ms | max:    0.333 ms |

NO_START_DEBIT:

europe:~> perf sched lat | grep 23137
  bash:23137            |     35.531 ms |      183 | avg:    0.005 ms | max:    0.019 ms |
  bash:23137            |     35.511 ms |      188 | avg:    0.007 ms | max:    0.334 ms |
  bash:23137            |     35.774 ms |      185 | avg:    0.005 ms | max:    0.019 ms |

Seems very similar at first sight.

	Ingo

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-08 14:45       ` Michael Buesch
@ 2009-09-18 11:24         ` Ingo Molnar
  2009-09-18 14:46           ` Felix Fietkau
  0 siblings, 1 reply; 216+ messages in thread
From: Ingo Molnar @ 2009-09-18 11:24 UTC (permalink / raw)
  To: Michael Buesch
  Cc: Con Kolivas, linux-kernel, Peter Zijlstra, Mike Galbraith, Felix Fietkau


* Michael Buesch <mb@bu3sch.de> wrote:

> On Tuesday 08 September 2009 09:48:25 Ingo Molnar wrote:
> > Mind poking on this one to figure out whether it's all repeatable 
> > and why that slowdown happens?
> 
> I repeated the test several times, because I couldn't really believe 
> that there's such a big difference for me, but the results were the 
> same. I don't really know what's going on nor how to find out what's 
> going on.

Well that's a really memory constrained MIPS device with like 16 MB of 
RAM or so? So having effects from small things like changing details in 
a kernel image is entirely plausible.

	Ingo

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-18 11:24         ` Ingo Molnar
@ 2009-09-18 14:46           ` Felix Fietkau
  2009-09-19 18:01             ` Ingo Molnar
  0 siblings, 1 reply; 216+ messages in thread
From: Felix Fietkau @ 2009-09-18 14:46 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Michael Buesch, Con Kolivas, linux-kernel, Peter Zijlstra,
	Mike Galbraith

Ingo Molnar wrote:
> * Michael Buesch <mb@bu3sch.de> wrote:
> 
>> On Tuesday 08 September 2009 09:48:25 Ingo Molnar wrote:
>> > Mind poking on this one to figure out whether it's all repeatable 
>> > and why that slowdown happens?
>> 
>> I repeated the test several times, because I couldn't really believe 
>> that there's such a big difference for me, but the results were the 
>> same. I don't really know what's going on nor how to find out what's 
>> going on.
> 
> Well that's a really memory constrained MIPS device with like 16 MB of 
> RAM or so? So having effects from small things like changing details in 
> a kernel image is entirely plausible.
Normally changing small details doesn't have much of an effect. While 16
MB is indeed not that much, we do usually have around 8 MB free with a
full user space running. Changes to other subsystems normally produce
consistent and repeatable differences that seem entirely unrelated to
memory use, so any measurable difference related to scheduler changes is
unlikely to be related to the low amount of RAM.
By the way, we do frequently also test the same software with devices
that have more RAM, e.g. 32 or 64 MB and it usually behaves in a very
similar way.

- Felix

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-18 14:46           ` Felix Fietkau
@ 2009-09-19 18:01             ` Ingo Molnar
  2009-09-19 18:43               ` Felix Fietkau
  0 siblings, 1 reply; 216+ messages in thread
From: Ingo Molnar @ 2009-09-19 18:01 UTC (permalink / raw)
  To: Felix Fietkau
  Cc: Michael Buesch, Con Kolivas, linux-kernel, Peter Zijlstra,
	Mike Galbraith


* Felix Fietkau <nbd@openwrt.org> wrote:

> Ingo Molnar wrote:
> > * Michael Buesch <mb@bu3sch.de> wrote:
> > 
> >> On Tuesday 08 September 2009 09:48:25 Ingo Molnar wrote:
> >> > Mind poking on this one to figure out whether it's all repeatable 
> >> > and why that slowdown happens?
> >> 
> >> I repeated the test several times, because I couldn't really believe 
> >> that there's such a big difference for me, but the results were the 
> >> same. I don't really know what's going on nor how to find out what's 
> >> going on.
> > 
> > Well that's a really memory constrained MIPS device with like 16 MB of 
> > RAM or so? So having effects from small things like changing details in 
> > a kernel image is entirely plausible.
>
> Normally changing small details doesn't have much of an effect. While 
> 16 MB is indeed not that much, we do usually have around 8 MB free 
> with a full user space running. Changes to other subsystems normally 
> produce consistent and repeatable differences that seem entirely 
> unrelated to memory use, so any measurable difference related to 
> scheduler changes is unlikely to be related to the low amount of RAM. 
> By the way, we do frequently also test the same software with devices 
> that have more RAM, e.g. 32 or 64 MB and it usually behaves in a very 
> similar way.

Well, Michael Buesch posted vmstat results, and they show what i have 
found with my x86 simulated reproducer as well (these are Michael's 
numbers):

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0	   0  15892   1684   5868    0    0     0     0  268    6 31 69  0  0
 1  0	   0  15892   1684   5868    0    0     0     0  266    2 34 66  0  0
 1  0	   0  15892   1684   5868    0    0     0     0  266    6 33 67  0  0
 1  0	   0  15892   1684   5868    0    0     0     0  267    4 37 63  0  0
 1  0	   0  15892   1684   5868    0    0     0     0  267    6 34 66  0  0

on average 4 context switches _per second_. The scheduler is not a 
factor on this box.

Furthermore:

 | I'm currently unable to test BFS, because the device throws strange 
 | flash errors. Maybe the flash is broken :(

So maybe those flash errors somehow impacted the measurements as well?

	Ingo

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-19 18:01             ` Ingo Molnar
@ 2009-09-19 18:43               ` Felix Fietkau
  2009-09-19 19:39                 ` Ingo Molnar
  0 siblings, 1 reply; 216+ messages in thread
From: Felix Fietkau @ 2009-09-19 18:43 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Michael Buesch, Con Kolivas, linux-kernel, Peter Zijlstra,
	Mike Galbraith

Ingo Molnar wrote:
> * Felix Fietkau <nbd@openwrt.org> wrote:
> 
>> Ingo Molnar wrote:
>> > Well that's a really memory constrained MIPS device with like 16 MB of 
>> > RAM or so? So having effects from small things like changing details in 
>> > a kernel image is entirely plausible.
>>
>> Normally changing small details doesn't have much of an effect. While 
>> 16 MB is indeed not that much, we do usually have around 8 MB free 
>> with a full user space running. Changes to other subsystems normally 
>> produce consistent and repeatable differences that seem entirely 
>> unrelated to memory use, so any measurable difference related to 
>> scheduler changes is unlikely to be related to the low amount of RAM. 
>> By the way, we do frequently also test the same software with devices 
>> that have more RAM, e.g. 32 or 64 MB and it usually behaves in a very 
>> similar way.
> 
> Well, Michael Buesch posted vmstat results, and they show what i have 
> found with my x86 simulated reproducer as well (these are Michael's 
> numbers):
> 
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>  1  0	   0  15892   1684   5868    0    0     0     0  268    6 31 69  0  0
>  1  0	   0  15892   1684   5868    0    0     0     0  266    2 34 66  0  0
>  1  0	   0  15892   1684   5868    0    0     0     0  266    6 33 67  0  0
>  1  0	   0  15892   1684   5868    0    0     0     0  267    4 37 63  0  0
>  1  0	   0  15892   1684   5868    0    0     0     0  267    6 34 66  0  0
> 
> on average 4 context switches _per second_. The scheduler is not a 
> factor on this box.
> 
> Furthermore:
> 
>  | I'm currently unable to test BFS, because the device throws strange 
>  | flash errors. Maybe the flash is broken :(
> 
> So maybe those flash errors somehow impacted the measurements as well?
I did some tests with BFS v230 vs CFS on Linux 2.6.30 on a different
MIPS device (Atheros AR2317) with 180 MHz and 16 MB RAM. When running
iperf tests, I consistently get the following results when running the
transfer from the device to my laptop:

CFS: [  5]  0.0-60.0 sec    107 MBytes  15.0 Mbits/sec
BFS: [  5]  0.0-60.0 sec    119 MBytes  16.6 Mbits/sec

The transfer speed from my laptop to the device is the same with BFS
and CFS. I repeated the tests a few times just to be sure, and I will
check vmstat later.
The difference here cannot be flash related, as I ran a kernel image
with the whole userland contained in initramfs. No on-flash filesystem
was mounted or accessed.
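
For reference, the exact iperf invocation is not shown above; a typical
60-second TCP run producing output like the lines quoted would be (an
assumption, not necessarily the command actually used):

  # on the laptop (receiver):
  iperf -s
  # on the device (sender), 60 second run:
  iperf -c <laptop-ip> -t 60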

- Felix

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-19 18:43               ` Felix Fietkau
@ 2009-09-19 19:39                 ` Ingo Molnar
  2009-09-19 20:15                   ` Felix Fietkau
  0 siblings, 1 reply; 216+ messages in thread
From: Ingo Molnar @ 2009-09-19 19:39 UTC (permalink / raw)
  To: Felix Fietkau
  Cc: Michael Buesch, Con Kolivas, linux-kernel, Peter Zijlstra,
	Mike Galbraith


* Felix Fietkau <nbd@openwrt.org> wrote:

> Ingo Molnar wrote:
> > * Felix Fietkau <nbd@openwrt.org> wrote:
> > 
> >> Ingo Molnar wrote:
> >> > Well that's a really memory constrained MIPS device with like 16 MB of 
> >> > RAM or so? So having effects from small things like changing details in 
> >> > a kernel image is entirely plausible.
> >>
> >> Normally changing small details doesn't have much of an effect. While 
> >> 16 MB is indeed not that much, we do usually have around 8 MB free 
> >> with a full user space running. Changes to other subsystems normally 
> >> produce consistent and repeatable differences that seem entirely 
> >> unrelated to memory use, so any measurable difference related to 
> >> scheduler changes is unlikely to be related to the low amount of RAM. 
> >> By the way, we do frequently also test the same software with devices 
> >> that have more RAM, e.g. 32 or 64 MB and it usually behaves in a very 
> >> similar way.
> > 
> > Well, Michael Buesch posted vmstat results, and they show what i have 
> > found with my x86 simulated reproducer as well (these are Michael's 
> > numbers):
> > 
> > procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> >  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
> >  1  0	   0  15892   1684   5868    0    0     0     0  268    6 31 69  0  0
> >  1  0	   0  15892   1684   5868    0    0     0     0  266    2 34 66  0  0
> >  1  0	   0  15892   1684   5868    0    0     0     0  266    6 33 67  0  0
> >  1  0	   0  15892   1684   5868    0    0     0     0  267    4 37 63  0  0
> >  1  0	   0  15892   1684   5868    0    0     0     0  267    6 34 66  0  0
> > 
> > on average 4 context switches _per second_. The scheduler is not a 
> > factor on this box.
> > 
> > Furthermore:
> > 
> >  | I'm currently unable to test BFS, because the device throws strange 
> >  | flash errors. Maybe the flash is broken :(
> > 
> > So maybe those flash errors somehow impacted the measurements as well?
> I did some tests with BFS v230 vs CFS on Linux 2.6.30 on a different
> MIPS device (Atheros AR2317) with 180 MHz and 16 MB RAM. When running
> iperf tests, I consistently get the following results when running the
> transfer from the device to my laptop:
> 
> CFS: [  5]  0.0-60.0 sec    107 MBytes  15.0 Mbits/sec
> BFS: [  5]  0.0-60.0 sec    119 MBytes  16.6 Mbits/sec
> 
> The transfer speed from my laptop to the device is the same with BFS
> and CFS. I repeated the tests a few times just to be sure, and I will
> check vmstat later.

Which exact mainline kernel have you tried? For anything performance 
related running latest upstream -git (currently at 202c467) would be 
recommended.

	Ingo

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-19 19:39                 ` Ingo Molnar
@ 2009-09-19 20:15                   ` Felix Fietkau
  2009-09-19 20:22                     ` Ingo Molnar
  0 siblings, 1 reply; 216+ messages in thread
From: Felix Fietkau @ 2009-09-19 20:15 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Michael Buesch, Con Kolivas, linux-kernel, Peter Zijlstra,
	Mike Galbraith

Ingo Molnar wrote:
> * Felix Fietkau <nbd@openwrt.org> wrote:
>> I did some tests with BFS v230 vs CFS on Linux 2.6.30 on a different
>> MIPS device (Atheros AR2317) with 180 MHz and 16 MB RAM. When running
>> iperf tests, I consistently get the following results when running the
>> transfer from the device to my laptop:
>> 
>> CFS: [  5]  0.0-60.0 sec    107 MBytes  15.0 Mbits/sec
>> BFS: [  5]  0.0-60.0 sec    119 MBytes  16.6 Mbits/sec
>> 
>> The transfer speed from my laptop to the device is the same with BFS
>> and CFS. I repeated the tests a few times just to be sure, and I will
>> check vmstat later.
> 
> Which exact mainline kernel have you tried? For anything performance 
> related running latest upstream -git (currently at 202c467) would be 
> recommended.
I used the OpenWrt-patched 2.6.30. Support for the hardware that I
tested with hasn't been merged upstream yet. Do you think that the
scheduler related changes after 2.6.30 are relevant for non-SMP
performance as well? If so, I'll work on a test with latest upstream
-git with the necessary patches when I have time for it.

- Felix

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-19 20:15                   ` Felix Fietkau
@ 2009-09-19 20:22                     ` Ingo Molnar
  2009-09-19 20:33                       ` Felix Fietkau
  0 siblings, 1 reply; 216+ messages in thread
From: Ingo Molnar @ 2009-09-19 20:22 UTC (permalink / raw)
  To: Felix Fietkau
  Cc: Michael Buesch, Con Kolivas, linux-kernel, Peter Zijlstra,
	Mike Galbraith


* Felix Fietkau <nbd@openwrt.org> wrote:

> Ingo Molnar wrote:
> > * Felix Fietkau <nbd@openwrt.org> wrote:
> >> I did some tests with BFS v230 vs CFS on Linux 2.6.30 on a different
> >> MIPS device (Atheros AR2317) with 180 MHz and 16 MB RAM. When running
> >> iperf tests, I consistently get the following results when running the
> >> transfer from the device to my laptop:
> >> 
> >> CFS: [  5]  0.0-60.0 sec    107 MBytes  15.0 Mbits/sec
> >> BFS: [  5]  0.0-60.0 sec    119 MBytes  16.6 Mbits/sec
> >> 
> >> The transfer speed from my laptop to the device is the same with BFS
> >> and CFS. I repeated the tests a few times just to be sure, and I will
> >> check vmstat later.
> > 
> > Which exact mainline kernel have you tried? For anything performance 
> > related running latest upstream -git (currently at 202c467) would be 
> > recommended.
>
> I used the OpenWrt-patched 2.6.30. Support for the hardware that I 
> tested with hasn't been merged upstream yet. Do you think that the 
> scheduler related changes after 2.6.30 are relevant for non-SMP 
> performance as well? If so, I'll work on a test with latest upstream 
> -git with the necessary patches when I have time for it.

Dont know - it's hard to tell what happens without basic analysis tools. 
Is there _any_ way to profile what happens on that system? (Do hrtimers 
work on it that could be used to profile it?)

	Ingo

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-19 20:22                     ` Ingo Molnar
@ 2009-09-19 20:33                       ` Felix Fietkau
  2009-09-20 18:10                         ` Ingo Molnar
  0 siblings, 1 reply; 216+ messages in thread
From: Felix Fietkau @ 2009-09-19 20:33 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Michael Buesch, Con Kolivas, linux-kernel, Peter Zijlstra,
	Mike Galbraith

Ingo Molnar wrote:
> * Felix Fietkau <nbd@openwrt.org> wrote:
> 
>> Ingo Molnar wrote:
>> > * Felix Fietkau <nbd@openwrt.org> wrote:
>> >> I did some tests with BFS v230 vs CFS on Linux 2.6.30 on a different
>> >> MIPS device (Atheros AR2317) with 180 MHz and 16 MB RAM. When running
>> >> iperf tests, I consistently get the following results when running the
>> >> transfer from the device to my laptop:
>> >> 
>> >> CFS: [  5]  0.0-60.0 sec    107 MBytes  15.0 Mbits/sec
>> >> BFS: [  5]  0.0-60.0 sec    119 MBytes  16.6 Mbits/sec
>> >> 
>> >> The transfer speed from my laptop to the device is the same with BFS
>> >> and CFS. I repeated the tests a few times just to be sure, and I will
>> >> check vmstat later.
>> > 
>> > Which exact mainline kernel have you tried? For anything performance 
>> > related running latest upstream -git (currently at 202c467) would be 
>> > recommended.
>>
>> I used the OpenWrt-patched 2.6.30. Support for the hardware that I 
>> tested with hasn't been merged upstream yet. Do you think that the 
>> scheduler related changes after 2.6.30 are relevant for non-SMP 
>> performance as well? If so, I'll work on a test with latest upstream 
>> -git with the necessary patches when I have time for it.
> 
> Dont know - it's hard to tell what happens without basic analysis tools. 
> Is there _any_ way to profile what happens on that system? (Do hrtimers 
> work on it that could be used to profile it?)
oprofile doesn't have any support for it (mips r4k, no generic
perfcounters), the only usable clock source is a simple cpu cycle
counter (which is also used for the timer interrupt).

- Felix

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-19 20:33                       ` Felix Fietkau
@ 2009-09-20 18:10                         ` Ingo Molnar
  0 siblings, 0 replies; 216+ messages in thread
From: Ingo Molnar @ 2009-09-20 18:10 UTC (permalink / raw)
  To: Felix Fietkau
  Cc: Michael Buesch, Con Kolivas, linux-kernel, Peter Zijlstra,
	Mike Galbraith


* Felix Fietkau <nbd@openwrt.org> wrote:

> Ingo Molnar wrote:
> > * Felix Fietkau <nbd@openwrt.org> wrote:
> > 
> >> Ingo Molnar wrote:
> >> > * Felix Fietkau <nbd@openwrt.org> wrote:
> >> >> I did some tests with BFS v230 vs CFS on Linux 2.6.30 on a different
> >> >> MIPS device (Atheros AR2317) with 180 MHz and 16 MB RAM. When running
> >> >> iperf tests, I consistently get the following results when running the
> >> >> transfer from the device to my laptop:
> >> >> 
> >> >> CFS: [  5]  0.0-60.0 sec    107 MBytes  15.0 Mbits/sec
> >> >> BFS: [  5]  0.0-60.0 sec    119 MBytes  16.6 Mbits/sec
> >> >> 
> >> >> The transfer speed from my laptop to the device is the same with BFS
> >> >> and CFS. I repeated the tests a few times just to be sure, and I will
> >> >> check vmstat later.
> >> > 
> >> > Which exact mainline kernel have you tried? For anything performance 
> >> > related running latest upstream -git (currently at 202c467) would be 
> >> > recommended.
> >>
> >> I used the OpenWrt-patched 2.6.30. Support for the hardware that I 
> >> tested with hasn't been merged upstream yet. Do you think that the 
> >> scheduler related changes after 2.6.30 are relevant for non-SMP 
> >> performance as well? If so, I'll work on a test with latest upstream 
> >> -git with the necessary patches when I have time for it.
> > 
> > Dont know - it's hard to tell what happens without basic analysis tools. 
> > Is there _any_ way to profile what happens on that system? (Do hrtimers 
> > work on it that could be used to profile it?)
>
> oprofile doesn't have any support for it (mips r4k, no generic 
> perfcounters), the only usable clock source is a simple cpu cycle 
> counter (which is also used for the timer interrupt).

A simple cpu cycle counter ought to be enough to get pretty good 
perfcounters support going on that box.

It takes a surprisingly small amount of code to do that, and a large 
portion of the perf tooling should then work out of the box. Here's a few 
example commits of minimal perfcounters support, on other architectures:

 310d6b6: [S390] wire up sys_perf_counter_open
 2d4618d: parisc: perf: wire up sys_perf_counter_open
 19470e1: sh: Wire up sys_perf_counter_open.

Takes about 15 well placed lines of code, if there are no other 
complications on MIPS ;-)

	Ingo

^ permalink raw reply	[flat|nested] 216+ messages in thread

* Re: BFS vs. mainline scheduler benchmarks and measurements
  2009-09-17  1:29                   ` Benjamin Herrenschmidt
@ 2009-10-01  9:36                     ` Frans Pop
  0 siblings, 0 replies; 216+ messages in thread
From: Frans Pop @ 2009-10-01  9:36 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: arjan, realnc, linux-kernel

Benjamin Herrenschmidt wrote:
> On Wed, 2009-09-16 at 20:27 +0200, Frans Pop wrote:
>> Benjamin Herrenschmidt wrote:
>> > I'll have a look after the merge window madness. Multiple windows is
>> > also still an option I suppose even if i don't like it that much: we
>> > could support double-click on an app or "global" in the left list,
>> > making that pop a new window with the same content as the right pane
>> > for that app (or global) that updates at the same time as the rest.
>>  
>> I have another request. If I select a specific application to watch (say
>> a mail client) but it is idle for a while and thus has no latencies, it
>> will get dropped from the list and thus my selection of it will be lost.
>> 
>> It would be nice if in that case a selected application would stay
>> visible and selected, or maybe get reselected automatically when it
>> appears again.
> 
> Hrm... I thought I forced the selected app to remain ... or maybe I
> wanted to do that and failed :-) Ok. On the list. Please ping me next
> week if nothing happens.

As requested: ping?

And while I'm writing anyway, one more suggestion.
I find the fact that the buttons jump twice every 30 seconds (because of a 
change in the timer between <10 and >=10 seconds) slightly annoying.
Any chance of making the position of the buttons fixed? One option could be 
moving the timer to the left side of the bottom bar.

Cheers,
FJP

^ permalink raw reply	[flat|nested] 216+ messages in thread

end of thread, other threads:[~2009-10-01  9:36 UTC | newest]

Thread overview: 216+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-09-06 20:59 BFS vs. mainline scheduler benchmarks and measurements Ingo Molnar
2009-09-07  2:05 ` Frans Pop
2009-09-07 12:16   ` [quad core results] " Ingo Molnar
2009-09-07 12:36     ` Stefan Richter
2009-09-07 13:41     ` Markus Tornqvist
2009-09-07 13:59       ` Ingo Molnar
2009-09-09  5:54         ` Markus Tornqvist
2009-09-07 14:45       ` Arjan van de Ven
2009-09-07 15:20         ` Frans Pop
2009-09-07 15:36           ` Arjan van de Ven
2009-09-07 15:47             ` Frans Pop
2009-09-07 15:24         ` Xavier Bestel
2009-09-07 15:37           ` Arjan van de Ven
2009-09-07 16:00           ` Diego Calleja
2009-09-07 15:34     ` Nikos Chantziaras
2009-09-07  3:38 ` Nikos Chantziaras
2009-09-07 11:01   ` Frederic Weisbecker
2009-09-08 18:15     ` Nikos Chantziaras
2009-09-10 20:25       ` Frederic Weisbecker
2009-09-07 14:40   ` Arjan van de Ven
2009-09-08  7:19     ` Nikos Chantziaras
2009-09-08  8:31       ` Arjan van de Ven
2009-09-08 20:22         ` Frans Pop
2009-09-08 21:10           ` Michal Schmidt
2009-09-08 21:11           ` Frans Pop
2009-09-08 21:40             ` GeunSik Lim
2009-09-08 22:36               ` Frans Pop
2009-09-09  9:53           ` Benjamin Herrenschmidt
2009-09-09 11:14             ` David Newall
2009-09-09 11:32               ` Benjamin Herrenschmidt
2009-09-09 11:55             ` Frans Pop
2009-09-11  1:36               ` Benjamin Herrenschmidt
2009-09-16 18:27                 ` Frans Pop
2009-09-17  1:29                   ` Benjamin Herrenschmidt
2009-10-01  9:36                     ` Frans Pop
2009-09-08  8:38       ` Arjan van de Ven
2009-09-08 10:13         ` Nikos Chantziaras
2009-09-08 11:32           ` Juergen Beisert
2009-09-08 22:00             ` Nikos Chantziaras
2009-09-08 23:20               ` Jiri Kosina
2009-09-08 23:38                 ` Nikos Chantziaras
2009-09-08 12:03           ` Theodore Tso
2009-09-08 21:28             ` Nikos Chantziaras
2009-09-08 14:20           ` Arjan van de Ven
2009-09-08 22:53             ` Nikos Chantziaras
2009-09-07 23:54   ` Thomas Fjellstrom
2009-09-08 11:30     ` Nikos Chantziaras
2009-09-07  3:50 ` Con Kolivas
2009-09-07 18:20   ` Jerome Glisse
2009-09-07  9:49 ` Jens Axboe
2009-09-07 10:12   ` Nikos Chantziaras
2009-09-07 10:41     ` Jens Axboe
2009-09-07 11:57   ` Jens Axboe
2009-09-07 14:14     ` Ingo Molnar
2009-09-07 17:38       ` Jens Axboe
2009-09-07 20:44         ` Jens Axboe
2009-09-08  9:13           ` Jens Axboe
2009-09-08 15:23             ` Peter Zijlstra
2009-09-08 20:34               ` Jens Axboe
2009-09-09  6:13                 ` Ingo Molnar
2009-09-09  8:34                   ` Nikos Chantziaras
2009-09-09  8:52                   ` Mike Galbraith
2009-09-09  9:02                     ` Peter Zijlstra
2009-09-09  9:18                       ` Mike Galbraith
2009-09-09  9:05                     ` Nikos Chantziaras
2009-09-09  9:17                       ` Peter Zijlstra
2009-09-09  9:40                         ` Nikos Chantziaras
2009-09-09 10:17                           ` Nikos Chantziaras
2009-09-10 19:45                         ` Martin Steigerwald
2009-09-10 20:06                           ` Ingo Molnar
2009-09-10 20:39                             ` Martin Steigerwald
2009-09-10 20:42                               ` Ingo Molnar
2009-09-10 21:19                                 ` Martin Steigerwald
2009-09-11  9:26                                   ` Mat
2009-09-12 11:26                                     ` Martin Steigerwald
2009-09-09  9:10                     ` Jens Axboe
2009-09-09 11:54                       ` Jens Axboe
2009-09-09 12:20                         ` Jens Axboe
2009-09-09 18:04                           ` Ingo Molnar
2009-09-09 20:12                             ` Nikos Chantziaras
2009-09-09 20:50                               ` Jens Axboe
2009-09-10  1:02                                 ` Con Kolivas
2009-09-10 11:03                                   ` Jens Axboe
2009-09-10  3:15                               ` Mike Galbraith
2009-09-10  6:08                               ` Ingo Molnar
2009-09-10  6:40                                 ` Ingo Molnar
2009-09-10  9:54                                   ` Jens Axboe
2009-09-10 10:03                                     ` Ingo Molnar
2009-09-10 10:11                                       ` Jens Axboe
2009-09-10 10:28                                         ` Jens Axboe
2009-09-10 10:57                                           ` Mike Galbraith
2009-09-10 11:09                                             ` Jens Axboe
2009-09-10 11:21                                               ` Mike Galbraith
2009-09-10 11:24                                                 ` Jens Axboe
2009-09-10 11:28                                                   ` Mike Galbraith
2009-09-10 11:35                                                     ` Jens Axboe
2009-09-10 11:42                                                       ` Mike Galbraith
2009-09-10 16:02                                 ` Bret Towe
2009-09-10 16:05                                   ` Peter Zijlstra
2009-09-10 16:12                                     ` Bret Towe
2009-09-10 16:26                                       ` Ingo Molnar
2009-09-10 16:33                                         ` Bret Towe
2009-09-10 17:03                                           ` Ingo Molnar
2009-09-10 17:53                                 ` Nikos Chantziaras
2009-09-10 18:46                                   ` Ingo Molnar
2009-09-10 18:51                                   ` [tip:sched/core] sched: Disable NEW_FAIR_SLEEPERS for now tip-bot for Ingo Molnar
2009-09-10 18:57                                   ` [tip:sched/core] sched: Fix sched::sched_stat_wait tracepoint field tip-bot for Ingo Molnar
2009-09-10  9:48                             ` BFS vs. mainline scheduler benchmarks and measurements Jens Axboe
2009-09-10  9:59                               ` Ingo Molnar
2009-09-10 10:01                                 ` Jens Axboe
2009-09-10  6:55                           ` Peter Zijlstra
2009-09-10  6:58                             ` Jens Axboe
2009-09-10  7:04                               ` Ingo Molnar
2009-09-10  9:44                                 ` Jens Axboe
2009-09-10  9:45                                   ` Jens Axboe
2009-09-10 13:53                                   ` Steven Rostedt
2009-09-10  7:33                               ` Jens Axboe
2009-09-10  7:49                                 ` Ingo Molnar
2009-09-10  7:53                                   ` Jens Axboe
2009-09-10 10:02                                     ` Ingo Molnar
2009-09-10 10:09                                       ` Jens Axboe
2009-09-10 18:00                                       ` [crash, bisected] Re: clocksource: Resolve cpu hotplug dead lock with TSC unstable Ingo Molnar
2009-09-11  7:37                                         ` Ingo Molnar
2009-09-11  7:48                                           ` Martin Schwidefsky
2009-09-11 13:33                                           ` Martin Schwidefsky
2009-09-11 18:22                                             ` [tip:timers/core] clocksource: Resolve cpu hotplug dead lock with TSC unstable, fix crash tip-bot for Martin Schwidefsky
2009-09-14 15:19                                             ` [crash, bisected] Re: clocksource: Resolve cpu hotplug dead lock with TSC unstable Ingo Molnar
2009-09-14 15:37                                               ` Martin Schwidefsky
2009-09-14 17:59                                               ` Martin Schwidefsky
2009-09-10  6:59                             ` BFS vs. mainline scheduler benchmarks and measurements Ingo Molnar
2009-09-09 12:48                         ` Mike Galbraith
2009-09-09 15:37                     ` [tip:sched/core] sched: Turn off child_runs_first tip-bot for Mike Galbraith
2009-09-09 17:57                       ` Theodore Tso
2009-09-09 18:08                         ` Ingo Molnar
2009-09-09 18:59                           ` Chris Friesen
2009-09-09 19:48                           ` Pavel Machek
2009-09-09 15:37                     ` [tip:sched/core] sched: Re-tune the scheduler latency defaults to decrease worst-case latencies tip-bot for Mike Galbraith
2009-09-12 11:45                       ` Martin Steigerwald
2009-09-09 15:37                     ` [tip:sched/core] sched: Keep kthreads at default priority tip-bot for Mike Galbraith
2009-09-09 16:55                       ` Dmitry Torokhov
2009-09-09 17:06                         ` Peter Zijlstra
2009-09-09 17:34                           ` Mike Galbraith
2009-09-12 11:48                             ` Martin Steigerwald
2009-09-12 12:19                               ` Mike Galbraith
2009-09-09 11:52               ` BFS vs. mainline scheduler benchmarks and measurements Nikos Chantziaras
2009-09-07 18:02   ` Avi Kivity
2009-09-07 18:46     ` Jens Axboe
2009-09-07 20:36       ` Ingo Molnar
2009-09-07 20:46         ` Jens Axboe
2009-09-07 21:03           ` Peter Zijlstra
2009-09-07 21:05             ` Jens Axboe
2009-09-07 22:18               ` Ingo Molnar
2009-09-09  7:38   ` Pavel Machek
2009-09-10 12:19     ` latt location (Was Re: BFS vs. mainline scheduler benchmarks and measurements) Jens Axboe
2009-09-07 15:16 ` BFS vs. mainline scheduler benchmarks and measurements Michael Buesch
2009-09-07 18:26   ` Ingo Molnar
2009-09-07 18:47     ` Daniel Walker
2009-09-07 18:51     ` Michael Buesch
2009-09-07 20:57       ` Ingo Molnar
2009-09-07 23:24         ` Pekka Pietikainen
2009-09-08  8:04           ` Ingo Molnar
2009-09-08  8:13             ` Nikos Chantziaras
2009-09-08 10:12               ` Ingo Molnar
2009-09-08 10:40                 ` Nikos Chantziaras
2009-09-08 11:35                   ` Ingo Molnar
2009-09-08 19:06                     ` Nikos Chantziaras
2009-09-08 12:00                 ` el_es
2009-09-08 15:45         ` Michael Buesch
2009-09-08  7:48     ` Ingo Molnar
2009-09-08  9:50       ` Benjamin Herrenschmidt
2009-09-08 13:09         ` Ralf Baechle
2009-09-09  1:36           ` Felix Fietkau
2009-09-08 13:09         ` Felix Fietkau
2009-09-09  0:28           ` Benjamin Herrenschmidt
2009-09-09  0:37             ` David Miller
2009-09-08 14:45       ` Michael Buesch
2009-09-18 11:24         ` Ingo Molnar
2009-09-18 14:46           ` Felix Fietkau
2009-09-19 18:01             ` Ingo Molnar
2009-09-19 18:43               ` Felix Fietkau
2009-09-19 19:39                 ` Ingo Molnar
2009-09-19 20:15                   ` Felix Fietkau
2009-09-19 20:22                     ` Ingo Molnar
2009-09-19 20:33                       ` Felix Fietkau
2009-09-20 18:10                         ` Ingo Molnar
2009-09-08 12:57 ` Epic regression in throughput since v2.6.23 Serge Belyshev
2009-09-08 17:47   ` Jesse Brandeburg
2009-09-08 18:20     ` Nikos Chantziaras
2009-09-08 19:00     ` Jeff Garzik
2009-09-08 19:20       ` Serge Belyshev
2009-09-08 19:26         ` Jeff Garzik
2009-09-08 18:37   ` Nikos Chantziaras
2009-09-08 22:15   ` Serge Belyshev
2009-09-09 15:52     ` Ingo Molnar
2009-09-09 20:49       ` Serge Belyshev
2009-09-09 21:23         ` Cory Fields
2009-09-10  6:53         ` Ingo Molnar
2009-09-10 23:23           ` Serge Belyshev
2009-09-11  6:10             ` Ingo Molnar
2009-09-11  8:55               ` Serge Belyshev
2009-09-13 15:27               ` Serge Belyshev
2009-09-13 15:47                 ` Ingo Molnar
2009-09-13 19:17                   ` Mike Galbraith
2009-09-14  6:15                     ` Mike Galbraith
2009-09-16 19:45                 ` Ingo Molnar
2009-09-16 23:18                   ` Serge Belyshev
2009-09-17  4:55                     ` [patchlet] " Mike Galbraith
2009-09-17  5:06                       ` Mike Galbraith
2009-09-17  7:21                         ` Ingo Molnar
2009-09-10  7:43 ` [updated] BFS vs. mainline scheduler benchmarks and measurements Ingo Molnar
2009-09-14  9:46 ` Phoronix CFS vs BFS bencharks Nikos Chantziaras
2009-09-14 11:35   ` Mike Galbraith
     [not found]     ` <f42384a10909140727k463ff460q3859892dcb79bcc5@mail.gmail.com>
2009-09-14 15:32       ` Mike Galbraith
2009-09-14 19:14         ` Marcin Letyns
2009-09-14 20:49           ` Willy Tarreau
2009-09-15  8:37             ` Mike Galbraith
