linux-kernel.vger.kernel.org archive mirror
* unpredictability in scheduler test results
@ 2008-09-18 22:45 Chris Friesen
  2008-09-24 15:19 ` unpredictability in scheduler test results -- still present Chris Friesen
  0 siblings, 1 reply; 7+ messages in thread
From: Chris Friesen @ 2008-09-18 22:45 UTC
  To: Peter Zijlstra, Ingo Molnar, linux-kernel

I was running some tests with the "fairtest" testcase and noticed that 
successive runs could give wildly different results.

I was originally using the tip/master tree as of Sep 16, but I also 
confirmed the behaviour with Linus' tree as of Sep 14 (with the 
__load_balance_iterator() fix applied).  The same behaviour is present 
in both cases.

I'm using the test config listed at the bottom.  It's pretty 
straightforward.

The first run gave the following results.  As expected, the system 
picked a static task distribution and didn't migrate tasks during the test.

group       actual(%)     expected(%)  avg latency(ms) max_latency(ms)
     1   33.31(33.33/33.2    30.00      23/23            37/37
     2        36.29          40.00       5               25
     3   30.40(27.40/33.40)  30.00      22/23            60/40



On the second run, the task distribution is almost perfect, but the 
system was only using one of the two cpus as seen by the difference 
between actual and expected cpu time.

Warning, actual cpu time different than expected. actual: 10033.011108, 
expected: 20000.000000
group       actual(%)     expected(%)  avg latency(ms) max_latency(ms)
     1  30.24(30.59/29.88)    30.00      26/27             68/58
     2       39.87            40.00       20                36
     3   29.89(29.87/29.91)   30.00      28/27             47/60
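
The warning above can be read as a rough count of how many cpus were kept busy. A minimal Python sketch of that arithmetic follows, assuming the reported times are in milliseconds and that the expected value is the test duration multiplied by the number of cpus (10 s on 2 cpus gives 20000). The function name is hypothetical and is not part of fairtest.

```python
def cpus_effectively_used(actual_ms, duration_s):
    """Return how many cpus' worth of time was consumed during the test."""
    # Assumption: times are in milliseconds, so one fully busy cpu
    # accumulates duration_s * 1000 ms over the run.
    return actual_ms / (duration_s * 1000.0)

actual_ms = 10033.011108   # "actual" from the warning above
expected_ms = 20000.0      # "expected" from the warning above

print(f"fraction of expected cpu time: {actual_ms / expected_ms:.2f}")
print(f"cpus effectively used: {cpus_effectively_used(actual_ms, 10):.2f}")
```

A result close to 1 on a two-cpu machine is consistent with the observation that only one cpu was doing work.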


Any ideas what's going on?

Chris



test config file:
#delay (secs)
1

#duration (secs)
10

#groupname,share,numhogs
1,750,n
2,1000,1
3,750,n
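
The expected(%) column in the result tables appears to follow from the share values in this config: each group's share divided by the sum of all shares. A minimal Python sketch of that calculation follows, assuming this is how the expectation is derived. The function name is hypothetical and the sketch does not reproduce fairtest itself.

```python
def expected_percentages(shares):
    """Map group name -> expected cpu percentage, proportional to its share."""
    total = sum(shares.values())
    return {group: 100.0 * share / total for group, share in shares.items()}

# Share values copied from the config above (groupname -> share).
shares = {"1": 750, "2": 1000, "3": 750}

for group, pct in expected_percentages(shares).items():
    print(f"group {group}: {pct:.2f}%")
```

With shares of 750, 1000 and 750 the total is 2500, which matches the 30/40/30 split in the expected(%) column.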




* Re: unpredictability in scheduler test results -- still present
  2008-09-18 22:45 unpredictability in scheduler test results Chris Friesen
@ 2008-09-24 15:19 ` Chris Friesen
  2008-09-24 23:37   ` Chris Friesen
  0 siblings, 1 reply; 7+ messages in thread
From: Chris Friesen @ 2008-09-24 15:19 UTC
  To: Peter Zijlstra, Ingo Molnar, linux-kernel

Chris Friesen wrote:

> I'm using the test config listed at the bottom.  It's pretty 
> straightforward.

> On the second run, the task distribution is almost perfect, but the 
> system was only using one of the two cpus as seen by the difference 
> between actual and expected cpu time.
> 
> Warning, actual cpu time different than expected. actual: 10033.011108, 
> expected: 20000.000000
> group       actual(%)     expected(%)  avg latency(ms) max_latency(ms)
>      1  30.24(30.59/29.88)    30.00      26/27             68/58
>      2       39.87            40.00       20                36
>      3   29.89(29.87/29.91)   30.00      28/27             47/60

This behaviour (that load balancing is messed up) is now almost 
continuous with both current tip/master and current Linus git.  On the 
first test after booting, it seems to work okay (although there are 
still issues with fairness).  On every subsequent test, fairness is good 
but it only uses one of the two cpus.

Also, building a kernel with "-j10" results in one cpu being mostly idle 
while the other one is 100% busy. It used to be both 100% busy--if I get 
time today I may try bisecting it.
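
One way to put numbers on "one cpu mostly idle, the other 100% busy" is to compare two snapshots of /proc/stat taken during the build. The Python sketch below is illustrative only: the helper names are hypothetical, the sample snapshots are made-up values rather than measurements from this machine, and idle plus iowait ticks are treated as idle time.

```python
def parse_cpu_times(stat_text):
    """Parse /proc/stat text into {cpu_name: (busy_ticks, total_ticks)}."""
    times = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Per-cpu lines look like "cpu0 user nice system idle iowait ...";
        # skip the aggregate "cpu" line and all non-cpu lines.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # Use at most the first eight counters (user .. steal) so that
        # guest time, already included in user, is not counted twice.
        ticks = [int(v) for v in fields[1:9]]
        idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)
        total = sum(ticks)
        times[fields[0]] = (total - idle, total)
    return times

def busy_fraction(before, after):
    """Busy fraction per cpu between two parsed snapshots."""
    result = {}
    for cpu, (busy1, total1) in after.items():
        busy0, total0 = before[cpu]
        elapsed = total1 - total0
        result[cpu] = (busy1 - busy0) / elapsed if elapsed else 0.0
    return result

# Made-up snapshots for illustration; in practice read /proc/stat twice,
# a second or more apart, while the build is running.
snap1 = "cpu 200 0 100 1700 0 0 0 0\ncpu0 100 0 50 850 0 0 0 0\ncpu1 100 0 50 850 0 0 0 0\n"
snap2 = "cpu 291 0 110 1799 0 0 0 0\ncpu0 190 0 60 850 0 0 0 0\ncpu1 101 0 50 949 0 0 0 0\n"

usage = busy_fraction(parse_cpu_times(snap1), parse_cpu_times(snap2))
for cpu, frac in sorted(usage.items()):
    print(f"{cpu}: {frac:.0%} busy")
```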

Chris


* Re: unpredictability in scheduler test results -- still present
  2008-09-24 15:19 ` unpredictability in scheduler test results -- still present Chris Friesen
@ 2008-09-24 23:37   ` Chris Friesen
  2008-09-27 20:04     ` Ingo Molnar
  0 siblings, 1 reply; 7+ messages in thread
From: Chris Friesen @ 2008-09-24 23:37 UTC
  To: Peter Zijlstra, Ingo Molnar, linux-kernel

Chris Friesen wrote:

> This behaviour (that load balancing is messed up) is now almost 
> continuous with both current tip/master and current Linus git.  On the 
> first test after booting, it seems to work okay (although there are 
> still issues with fairness).  On every subsequent test, fairness is good 
> but it only uses one of the two cpus.
> 
> Also, building a kernel with "-j10" results in one cpu being mostly idle 
> while the other one is 100% busy. It used to be both 100% busy--if I get 
> time today I may try bisecting it.

It turns out that disabling CONFIG_DYNAMIC_FTRACE makes the load 
balancing problem go away and causes all cpus to be used.

With this option enabled, the problem seems to be present as far back as 
2.6.27-rc2.  (2.6.27-rc1 doesn't compile on my machine, and 2.6.26 
doesn't have ftrace).

I have no idea why turning on dynamic ftrace would affect load balancing 
behaviour, but it's very repeatable.  The very first test run after 
booting works fine, and all successive runs fail to balance properly.
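
For anyone comparing kernels while trying to reproduce this, it helps to confirm which builds actually have the option set. The Python sketch below checks a kernel .config for CONFIG_DYNAMIC_FTRACE. It relies on the usual .config conventions ("CONFIG_FOO=y" for enabled, "# CONFIG_FOO is not set" for disabled); the function name and the sample fragments are hypothetical.

```python
def config_option_state(config_text, option):
    """Return the option's value (e.g. 'y' or 'm'), or None if it is unset."""
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
        if line == f"# {option} is not set":
            return None
    # Option not mentioned at all: treat it as unset.
    return None

# Hypothetical .config fragments for the two kernels being compared.
with_dftrace = "CONFIG_DYNAMIC_FTRACE=y\n"
without_dftrace = "# CONFIG_DYNAMIC_FTRACE is not set\n"

for label, text in (("kernel A", with_dftrace), ("kernel B", without_dftrace)):
    state = config_option_state(text, "CONFIG_DYNAMIC_FTRACE")
    print(f"{label}: CONFIG_DYNAMIC_FTRACE = {state}")
```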

Chris


* Re: unpredictability in scheduler test results -- still present
  2008-09-24 23:37   ` Chris Friesen
@ 2008-09-27 20:04     ` Ingo Molnar
  2008-09-29 15:43       ` Chris Friesen
  0 siblings, 1 reply; 7+ messages in thread
From: Ingo Molnar @ 2008-09-27 20:04 UTC
  To: Chris Friesen; +Cc: Peter Zijlstra, linux-kernel, Steven Rostedt


* Chris Friesen <cfriesen@nortel.com> wrote:

> Chris Friesen wrote:
>
>> This behaviour (that load balancing is messed up) is now almost  
>> continuous with both current tip/master and current Linus git.  On the  
>> first test after booting, it seems to work okay (although there are  
>> still issues with fairness).  On every subsequent test, fairness is 
>> good but it only uses one of the two cpus.
>>
>> Also, building a kernel with "-j10" results in one cpu being mostly 
>> idle while the other one is 100% busy. It used to be both 100% busy--if 
>> I get time today I may try bisecting it.
>
> It turns out that disabling CONFIG_DYNAMIC_FTRACE makes the load 
> balancing problem go away and causes all cpus to be used.
>
> With this option enabled, the problem seems to be present as far back 
> as 2.6.27-rc2.  (2.6.27-rc1 doesn't compile on my machine, and 2.6.26 
> doesn't have ftrace).
>
> I have no idea why turning on dynamic ftrace would affect load 
> balancing behaviour, but it's very repeatable.  The very first test 
> run after booting works fine, and all successive runs fail to balance 
> properly.

very weird. Would be very nice to figure it out.

and in tip/master we don't have the 'ftraced' kernel-patching kernel 
thread anymore, so ftrace should be passive by all means.

OTOH, what does 'turning on dftrace' exactly mean? Just enabling it in 
the .config, or also activating it via /debug/tracing/current_tracer?

	Ingo


* Re: unpredictability in scheduler test results -- still present
  2008-09-27 20:04     ` Ingo Molnar
@ 2008-09-29 15:43       ` Chris Friesen
  2008-09-30 11:12         ` Ingo Molnar
  0 siblings, 1 reply; 7+ messages in thread
From: Chris Friesen @ 2008-09-29 15:43 UTC
  To: Ingo Molnar; +Cc: Peter Zijlstra, linux-kernel, Steven Rostedt

Ingo Molnar wrote:
> * Chris Friesen <cfriesen@nortel.com> wrote:

>> It turns out that disabling CONFIG_DYNAMIC_FTRACE makes the load 
>> balancing problem go away and causes all cpus to be used.
>>
>> With this option enabled, the problem seems to be present as far back 
>> as 2.6.27-rc2.  (2.6.27-rc1 doesn't compile on my machine, and 2.6.26 
>> doesn't have ftrace).
>>
>> I have no idea why turning on dynamic ftrace would affect load 
>> balancing behaviour, but it's very repeatable.  The very first test 
>> run after booting works fine, and all successive runs fail to balance 
>> properly.

> OTOH, what does 'turning on dftrace' exactly mean? Just enabling it in 
> the .config, or also activating it via /debug/tracing/current_tracer?

Just enabling it in the .config is enough to trigger the behaviour 
change.  I'm not explicitly activating any traces.
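
To double-check that no tracer has been activated, the current_tracer file mentioned above can be read directly. The Python sketch below is a hedged illustration: the default path is the one quoted in this thread and depends on where debugfs is mounted, the set of names meaning "no tracer active" is an assumption, and the helper names are hypothetical. It runs against a stand-in temporary file so that it works without debugfs.

```python
import os
import tempfile

# Assumption: tracer names that mean "nothing is being traced".
IDLE_TRACERS = {"nop", "none"}

def read_current_tracer(path="/debug/tracing/current_tracer"):
    """Return the tracer name stored in the given current_tracer file."""
    with open(path) as f:
        return f.read().strip()

def tracer_is_active(name):
    """True if the name is not one of the assumed idle placeholders."""
    return name not in IDLE_TRACERS

# Stand-in file so the sketch runs on a machine without debugfs mounted.
with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
    tmp.write("nop\n")

name = read_current_tracer(tmp.name)
print(f"current tracer: {name}, active: {tracer_is_active(name)}")
os.unlink(tmp.name)
```

On a real system the call would be read_current_tracer() with the path adjusted to the local debugfs mount point.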

Chris


* Re: unpredictability in scheduler test results -- still present
  2008-09-29 15:43       ` Chris Friesen
@ 2008-09-30 11:12         ` Ingo Molnar
  2008-09-30 21:14           ` Chris Friesen
  0 siblings, 1 reply; 7+ messages in thread
From: Ingo Molnar @ 2008-09-30 11:12 UTC
  To: Chris Friesen; +Cc: Peter Zijlstra, linux-kernel, Steven Rostedt


* Chris Friesen <cfriesen@nortel.com> wrote:

> Ingo Molnar wrote:
>> * Chris Friesen <cfriesen@nortel.com> wrote:
>
>>> It turns out that disabling CONFIG_DYNAMIC_FTRACE makes the load  
>>> balancing problem go away and causes all cpus to be used.
>>>
>>> With this option enabled, the problem seems to be present as far back 
>>> as 2.6.27-rc2.  (2.6.27-rc1 doesn't compile on my machine, and 2.6.26 
>>> doesn't have ftrace).
>>>
>>> I have no idea why turning on dynamic ftrace would affect load  
>>> balancing behaviour, but it's very repeatable.  The very first test  
>>> run after booting works fine, and all successive runs fail to balance 
>>> properly.
>
>> OTOH, what does 'turning on dftrace' exactly mean? Just enabling it in  
>> the .config, or also activating it via /debug/tracing/current_tracer?
>
> Just enabling it in the .config is enough to trigger the behaviour 
> change.  I'm not explicitly activating any traces.

ok, that would be a clear ftrace bug i guess?

	Ingo


* Re: unpredictability in scheduler test results -- still present
  2008-09-30 11:12         ` Ingo Molnar
@ 2008-09-30 21:14           ` Chris Friesen
  0 siblings, 0 replies; 7+ messages in thread
From: Chris Friesen @ 2008-09-30 21:14 UTC
  To: Ingo Molnar; +Cc: Peter Zijlstra, linux-kernel, Steven Rostedt

Ingo Molnar wrote:
> * Chris Friesen <cfriesen@nortel.com> wrote:
>> Ingo Molnar wrote:

>>> OTOH, what does 'turning on dftrace' exactly mean? Just enabling it in  
>>> the .config, or also activating it via /debug/tracing/current_tracer?

>> Just enabling it in the .config is enough to trigger the behaviour 
>> change.  I'm not explicitly activating any traces.

> ok, that would be a clear ftrace bug i guess?

It's either an ftrace bug or a fragile load balancer bug.  I wonder if 
it's related somehow to the stop_machine() call in ftrace_dynamic_init()?

Chris
