* unpredictability in scheduler test results
From: Chris Friesen @ 2008-09-18 22:45 UTC
To: Peter Zijlstra, Ingo Molnar, linux-kernel
I was running some tests with the "fairtest" testcase and noticed that
successive runs could give wildly different results.
I was originally using the tip/master tree as of Sep 16, but I also
confirmed the behaviour with Linus' tree as of Sep 14 (with the
__load_balance_iterator() fix applied). The same behaviour is present
in both cases.
I'm using the test config listed at the bottom. It's pretty
straightforward.
The first run gave the following results. As expected, the system
picked a static task distribution and didn't migrate tasks during the test.
group  actual(%)            expected(%)  avg latency(ms)  max_latency(ms)
1      33.31(33.33/33.2     30.00        23/23            37/37
2      36.29                40.00        5                25
3      30.40(27.40/33.40)   30.00        22/23            60/40
On the second run, the task distribution is almost perfect, but the
system was only using one of the two cpus as seen by the difference
between actual and expected cpu time.
Warning, actual cpu time different than expected. actual: 10033.011108,
expected: 20000.000000
group  actual(%)            expected(%)  avg latency(ms)  max_latency(ms)
1      30.24(30.59/29.88)   30.00        26/27            68/58
2      39.87                40.00        20               36
3      29.89(29.87/29.91)   30.00        28/27            47/60
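The warning can be read as a CPU-count estimate. The following is a minimal sketch, assuming the reported times are in milliseconds (an inference from 20000 = 2 cpus x 10 s x 1000, not something the test output states); the function name is hypothetical.

```python
def cpus_in_use(actual_ms: float, duration_s: float) -> float:
    """Average number of CPUs kept busy over the run.

    Assumes actual_ms is total CPU time in milliseconds summed over all
    CPUs, and duration_s is the wall-clock test duration in seconds.
    """
    return actual_ms / (duration_s * 1000.0)


# Figures from the warning above; duration from "#duration (secs)" in the config.
actual_ms = 10033.011108
expected_ms = 20000.0
duration_s = 10

print("cpus busy on average:", round(cpus_in_use(actual_ms, duration_s), 2))
print("fraction of expected:", round(actual_ms / expected_ms, 2))
```

Under that assumption the run kept roughly one CPU busy, about half of what was expected on a two-cpu machine.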
Any ideas what's going on?
Chris
test config file:
#delay (secs)
1
#duration (secs)
10
#groupname,share,numhogs
1,750,n
2,1000,1
3,750,n
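For reference, the expected(%) column follows from the shares in this config. Here is a minimal sketch of that arithmetic, assuming the expected percentage is simply each group's share divided by the sum of all shares; the parser and its name are hypothetical and are not part of fairtest.

```python
CONFIG = """\
#delay (secs)
1
#duration (secs)
10
#groupname,share,numhogs
1,750,n
2,1000,1
3,750,n
"""


def expected_shares(config_text):
    """Return {group: expected CPU %} from 'groupname,share,numhogs' lines."""
    shares = {}
    for line in config_text.splitlines():
        line = line.strip()
        # Skip comments and the scalar delay/duration lines.
        if not line or line.startswith("#") or "," not in line:
            continue
        name, share, _numhogs = line.split(",")
        shares[name] = int(share)
    total = sum(shares.values())
    return {name: 100.0 * share / total for name, share in shares.items()}


print(expected_shares(CONFIG))
```

With shares of 750, 1000 and 750 the total is 2500, which gives 30%, 40% and 30%, matching the expected(%) column in the results.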
* Re: unpredictability in scheduler test results -- still present
From: Chris Friesen @ 2008-09-24 15:19 UTC
To: Peter Zijlstra, Ingo Molnar, linux-kernel
Chris Friesen wrote:
> I'm using the test config listed at the bottom. It's pretty
> straightforward.
> On the second run, the task distribution is almost perfect, but the
> system was only using one of the two cpus as seen by the difference
> between actual and expected cpu time.
>
> Warning, actual cpu time different than expected. actual: 10033.011108,
> expected: 20000.000000
> group  actual(%)            expected(%)  avg latency(ms)  max_latency(ms)
> 1      30.24(30.59/29.88)   30.00        26/27            68/58
> 2      39.87                40.00        20               36
> 3      29.89(29.87/29.91)   30.00        28/27            47/60
This behaviour (that load balancing is messed up) is now almost
continuous with both current tip/master and current Linus git. On the
first test after booting, it seems to work okay (although there are
still issues with fairness). On every subsequent test, fairness is good
but it only uses one of the two cpus.
Also, building a kernel with "-j10" results in one cpu being mostly idle
while the other one is 100% busy. It used to be both 100% busy--if I get
time today I may try bisecting it.
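One way to quantify the "one cpu idle, one cpu busy" observation is to sample per-cpu counters from /proc/stat twice and compare. The sketch below assumes the usual row layout (cpuN user nice system idle iowait irq softirq steal ...); the function names and the sample numbers are made up for illustration.

```python
def parse_cpu_times(stat_text):
    """Map 'cpuN' -> (busy_ticks, total_ticks) from /proc/stat-style text."""
    times = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Keep only per-CPU rows ("cpu0", "cpu1", ...); skip the aggregate
        # "cpu" row and unrelated rows such as "intr" or "ctxt".
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        ticks = [int(x) for x in fields[1:9]]  # user .. steal
        idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)  # idle + iowait
        total = sum(ticks)
        times[fields[0]] = (total - idle, total)
    return times


def busy_fraction(before, after):
    """Per-CPU busy fraction between two parse_cpu_times() samples."""
    result = {}
    for cpu, (busy1, total1) in after.items():
        busy0, total0 = before[cpu]
        elapsed = total1 - total0
        result[cpu] = (busy1 - busy0) / elapsed if elapsed else 0.0
    return result


# Made-up samples showing the imbalance described above.
BEFORE = "cpu  200 0 100 1700 0 0 0 0\ncpu0 100 0 50 850 0 0 0 0\ncpu1 100 0 50 850 0 0 0 0\n"
AFTER = "cpu  1160 0 150 2690 0 0 0 0\ncpu0 1050 0 100 850 0 0 0 0\ncpu1 110 0 50 1840 0 0 0 0\n"

print(busy_fraction(parse_cpu_times(BEFORE), parse_cpu_times(AFTER)))
```

On a real system the two samples would come from reading /proc/stat a few seconds apart while the build is running.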
Chris
* Re: unpredictability in scheduler test results -- still present
From: Chris Friesen @ 2008-09-24 23:37 UTC
To: Peter Zijlstra, Ingo Molnar, linux-kernel
Chris Friesen wrote:
> This behaviour (that load balancing is messed up) is now almost
> continuous with both current tip/master and current Linus git. On the
> first test after booting, it seems to work okay (although there are
> still issues with fairness). On every subsequent test, fairness is good
> but it only uses one of the two cpus.
>
> Also, building a kernel with "-j10" results in one cpu being mostly idle
> while the other one is 100% busy. It used to be both 100% busy--if I get
> time today I may try bisecting it.
It turns out that disabling CONFIG_DYNAMIC_FTRACE makes the load
balancing problem go away and causes all cpus to be used.
With this option enabled, the problem seems to be present as far back as
2.6.27-rc2. (2.6.27-rc1 doesn't compile on my machine, and 2.6.26
doesn't have ftrace).
I have no idea why turning on dynamic ftrace would affect load balancing
behaviour, but it's very repeatable. The very first test run after
booting works fine, and all successive runs fail to balance properly.
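When comparing kernels it helps to confirm which builds actually have the option set. A minimal sketch, assuming the standard kernel .config line formats ("CONFIG_X=y" and "# CONFIG_X is not set"); the helper name and the sample text are hypothetical.

```python
def config_option_state(config_text, option):
    """Return the option's value ('y', 'm', ...) or None if it is not set."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {option} is not set":
            return None
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None  # option not mentioned at all


# Illustrative fragments, not taken from the configs used in this thread.
WITH_DFTRACE = "CONFIG_SMP=y\nCONFIG_DYNAMIC_FTRACE=y\n"
WITHOUT_DFTRACE = "CONFIG_SMP=y\n# CONFIG_DYNAMIC_FTRACE is not set\n"

print(config_option_state(WITH_DFTRACE, "CONFIG_DYNAMIC_FTRACE"))
print(config_option_state(WITHOUT_DFTRACE, "CONFIG_DYNAMIC_FTRACE"))
```

To check a real build, pass the contents of the .config file from the build tree as config_text.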
Chris
* Re: unpredictability in scheduler test results -- still present
From: Ingo Molnar @ 2008-09-27 20:04 UTC
To: Chris Friesen; +Cc: Peter Zijlstra, linux-kernel, Steven Rostedt
* Chris Friesen <cfriesen@nortel.com> wrote:
> Chris Friesen wrote:
>
>> This behaviour (that load balancing is messed up) is now almost
>> continuous with both current tip/master and current Linus git. On the
>> first test after booting, it seems to work okay (although there are
>> still issues with fairness). On every subsequent test, fairness is
>> good but it only uses one of the two cpus.
>>
>> Also, building a kernel with "-j10" results in one cpu being mostly
>> idle while the other one is 100% busy. It used to be both 100% busy--if
>> I get time today I may try bisecting it.
>
> It turns out that disabling CONFIG_DYNAMIC_FTRACE makes the load
> balancing problem go away and causes all cpus to be used.
>
> With this option enabled, the problem seems to be present as far back
> as 2.6.27-rc2. (2.6.27-rc1 doesn't compile on my machine, and 2.6.26
> doesn't have ftrace).
>
> I have no idea why turning on dynamic ftrace would affect load
> balancing behaviour, but it's very repeatable. The very first test
> run after booting works fine, and all successive runs fail to balance
> properly.
Very weird. It would be very nice to figure it out.
And in tip/master we don't have the 'ftraced' kernel-patching kernel
thread anymore, so ftrace should be passive by all means.
OTOH, what does 'turning on dftrace' exactly mean? Just enabling it in
the .config, or also activating it via /debug/tracing/current_tracer?
Ingo
* Re: unpredictability in scheduler test results -- still present
From: Chris Friesen @ 2008-09-29 15:43 UTC
To: Ingo Molnar; +Cc: Peter Zijlstra, linux-kernel, Steven Rostedt
Ingo Molnar wrote:
> * Chris Friesen <cfriesen@nortel.com> wrote:
>> It turns out that disabling CONFIG_DYNAMIC_FTRACE makes the load
>> balancing problem go away and causes all cpus to be used.
>>
>> With this option enabled, the problem seems to be present as far back
>> as 2.6.27-rc2. (2.6.27-rc1 doesn't compile on my machine, and 2.6.26
>> doesn't have ftrace).
>>
>> I have no idea why turning on dynamic ftrace would affect load
>> balancing behaviour, but it's very repeatable. The very first test
>> run after booting works fine, and all successive runs fail to balance
>> properly.
> OTOH, what does 'turning on dftrace' exactly mean? Just enabling it in
> the .config, or also activating it via /debug/tracing/current_tracer?
Just enabling it in the .config is enough to trigger the behaviour
change. I'm not explicitly activating any traces.
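A quick way to confirm that no tracer is active is to read the current_tracer file mentioned above. This is a minimal sketch; the set of names treated as "no tracer" is an assumption (kernels have reported either "none" or "nop" depending on version), and the helper names are hypothetical.

```python
from pathlib import Path

# Assumption: these are the values the file reports when no tracer is active.
IDLE_TRACERS = {"nop", "none"}


def tracer_is_idle(current_tracer_text):
    """True if the contents of current_tracer name a no-op tracer."""
    return current_tracer_text.strip() in IDLE_TRACERS


def read_current_tracer(path="/debug/tracing/current_tracer"):
    """Return the file contents, or None if it is missing or unreadable.

    The default path is the one quoted in this thread; debugfs may be
    mounted elsewhere on other systems.
    """
    try:
        return Path(path).read_text()
    except OSError:
        return None


print(tracer_is_idle("nop\n"))
print(tracer_is_idle("function\n"))
```

On a test machine, the text returned by read_current_tracer() would be passed to tracer_is_idle(), after checking that it is not None.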
Chris
* Re: unpredictability in scheduler test results -- still present
From: Ingo Molnar @ 2008-09-30 11:12 UTC
To: Chris Friesen; +Cc: Peter Zijlstra, linux-kernel, Steven Rostedt
* Chris Friesen <cfriesen@nortel.com> wrote:
> Ingo Molnar wrote:
>> * Chris Friesen <cfriesen@nortel.com> wrote:
>
>>> It turns out that disabling CONFIG_DYNAMIC_FTRACE makes the load
>>> balancing problem go away and causes all cpus to be used.
>>>
>>> With this option enabled, the problem seems to be present as far back
>>> as 2.6.27-rc2. (2.6.27-rc1 doesn't compile on my machine, and 2.6.26
>>> doesn't have ftrace).
>>>
>>> I have no idea why turning on dynamic ftrace would affect load
>>> balancing behaviour, but it's very repeatable. The very first test
>>> run after booting works fine, and all successive runs fail to balance
>>> properly.
>
>> OTOH, what does 'turning on dftrace' exactly mean? Just enabling it in
>> the .config, or also activating it via /debug/tracing/current_tracer?
>
> Just enabling it in the .config is enough to trigger the behaviour
> change. I'm not explicitly activating any traces.
OK, that would be a clear ftrace bug, I guess?
Ingo
* Re: unpredictability in scheduler test results -- still present
From: Chris Friesen @ 2008-09-30 21:14 UTC
To: Ingo Molnar; +Cc: Peter Zijlstra, linux-kernel, Steven Rostedt
Ingo Molnar wrote:
> * Chris Friesen <cfriesen@nortel.com> wrote:
>> Ingo Molnar wrote:
>>> OTOH, what does 'turning on dftrace' exactly mean? Just enabling it in
>>> the .config, or also activating it via /debug/tracing/current_tracer?
>> Just enabling it in the .config is enough to trigger the behaviour
>> change. I'm not explicitly activating any traces.
> OK, that would be a clear ftrace bug, I guess?
It's either an ftrace bug or a fragile load balancer bug. I wonder if
it's related somehow to the stop_machine() call in ftrace_dynamic_init()?
Chris