linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* -mm seems significanty slower than mainline on kernbench
@ 2006-01-11  1:14 Martin Bligh
  2006-01-11  1:31 ` Andrew Morton
  0 siblings, 1 reply; 59+ messages in thread
From: Martin Bligh @ 2006-01-11  1:14 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton

OK, I fixed the graphs so you can actually read them now ;-)

http://test.kernel.org/perf/kernbench.elm3b6.png (x86_64 4x)
http://test.kernel.org/perf/kernbench.moe.png (NUMA-Q)
http://test.kernel.org/perf/kernbench.elm3b132.png (4x SMP ia32)

Both seems significantly slower on -mm (mm is green line)

If I look at diffprofile between 2.6.15 and 2.6.15-mm1, it just looks
like we have lots more idle time. You got strange scheduler changes in
there, that you've been carrying for a long time (2.6.14-mm1 at least)? 
or HZ piddling? See to be mainly getting much more idle time.

Diffprofile:

       1278    39.6% default_idle
       1261    10.1% total
        243  1518.8% find_get_page
        220     0.0% copy_user_generic
        100   357.1% free_hot_cold_page
...
       -106   -93.0% __pagevec_free
       -125  -100.0% __free_pages_ok
       -239  -100.0% copy_user_generic_c
       -242  -100.0% find_trylock_page

Original profiles:
http://test.kernel.org/19657/002.kernbench.test/profiling/profile.text 
(2.6.15)
http://test.kernel.org/19794/002.kernbench.test/profiling/profile.text
(2.6.15-mm1)


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-11  1:14 -mm seems significanty slower than mainline on kernbench Martin Bligh
@ 2006-01-11  1:31 ` Andrew Morton
  2006-01-11  1:41   ` Martin Bligh
  0 siblings, 1 reply; 59+ messages in thread
From: Andrew Morton @ 2006-01-11  1:31 UTC (permalink / raw)
  To: Martin Bligh; +Cc: linux-kernel, Ingo Molnar

Martin Bligh <mbligh@google.com> wrote:
>
> OK, I fixed the graphs so you can actually read them now ;-)

They're cute.

> http://test.kernel.org/perf/kernbench.elm3b6.png (x86_64 4x)
> http://test.kernel.org/perf/kernbench.moe.png (NUMA-Q)
> http://test.kernel.org/perf/kernbench.elm3b132.png (4x SMP ia32)
> 
> Both seems significantly slower on -mm (mm is green line)

Well, 1% slower.  -mm has permanent not-for-linus debug things, some of
which are expected to have a performance impact.  I don't know whether
they'd have a 1% impact though.

> If I look at diffprofile between 2.6.15 and 2.6.15-mm1, it just looks
> like we have lots more idle time.

Yes, we do.   It'd be useful to test -git7..

> You got strange scheduler changes in
> there, that you've been carrying for a long time (2.6.14-mm1 at least)? 
> or HZ piddling? See to be mainly getting much more idle time.

Yes, there are CPU scheduler changes, although much fewer than usual. 
Ingo, any suggestions as to a culprit?


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-11  1:31 ` Andrew Morton
@ 2006-01-11  1:41   ` Martin Bligh
  2006-01-11  1:48     ` Andrew Morton
                       ` (2 more replies)
  0 siblings, 3 replies; 59+ messages in thread
From: Martin Bligh @ 2006-01-11  1:41 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Ingo Molnar

Andrew Morton wrote:
> Martin Bligh <mbligh@google.com> wrote:
> 
>>OK, I fixed the graphs so you can actually read them now ;-)
> 
> They're cute.

Thanks. Hopefully we can also make them useful ;-)

Have finally got some time to play, and am working on a "push" 
comparison model that'll send you email pro-actively.

>>http://test.kernel.org/perf/kernbench.elm3b6.png (x86_64 4x)
>>http://test.kernel.org/perf/kernbench.moe.png (NUMA-Q)
>>http://test.kernel.org/perf/kernbench.elm3b132.png (4x SMP ia32)
>>
>>Both seems significantly slower on -mm (mm is green line)
> 
> 
> Well, 1% slower.  -mm has permanent not-for-linus debug things, some of
> which are expected to have a performance impact.  I don't know whether
> they'd have a 1% impact though.

OK, fair enough. Can I turn them off somehow to check? it's 10% on 
NUMA-Q. The good news is that it's stayed in -mm for long time, so ...
am praying.

Is cool to have extra debug stuff. It does make it harder to check perf 
though ... if we can do runs both with and without, it'd be ideal, I 
guess. I'd like to be able to spot perf degredations before they hit 
mainline.

>>If I look at diffprofile between 2.6.15 and 2.6.15-mm1, it just looks
>>like we have lots more idle time.
> 
> 
> Yes, we do.   It'd be useful to test -git7..

Will do. it does all of them.

>>You got strange scheduler changes in
>>there, that you've been carrying for a long time (2.6.14-mm1 at least)? 
>>or HZ piddling? See to be mainly getting much more idle time.
> 
> Yes, there are CPU scheduler changes, although much fewer than usual. 
> Ingo, any suggestions as to a culprit?

I'd truncated all -mm info in the filtering before 2.6.14 .. am putting 
it back so we can see clearly ... done. Look again.

Seems to have gone wrong between 2.6.14-rc1-mm1 and 2.6.14-rc2-mm1 ?
See http://test.kernel.org/perf/kernbench.moe.png for clearest effect.

M.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-11  1:41   ` Martin Bligh
@ 2006-01-11  1:48     ` Andrew Morton
  2006-01-11  1:49     ` Con Kolivas
  2006-01-11  1:52     ` Andrew Morton
  2 siblings, 0 replies; 59+ messages in thread
From: Andrew Morton @ 2006-01-11  1:48 UTC (permalink / raw)
  To: Martin Bligh; +Cc: linux-kernel, mingo

Martin Bligh <mbligh@google.com> wrote:
>
> Seems to have gone wrong between 2.6.14-rc1-mm1 and 2.6.14-rc2-mm1 ?
> See http://test.kernel.org/perf/kernbench.moe.png for clearest effect.

akpm:/usr/src/25> diff -u pc/2.6.14-rc1-mm1-series pc/2.6.14-rc2-mm1-series| grep sched
+move-tasklist-walk-from-cfq-iosched-to-elevatorc.patch
 sched-consider-migration-thread-with-smp-nice.patch
 sched-better-wake-balancing-3.patch
+sched-modified-nice-support-for-smp-load-balancing.patch
 #sched-rationalise-resched-and-cpu_idle.patch
 #fixup-resched-and-arch-cpu_idle.patch

Try reverting that sucker?

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-11  1:41   ` Martin Bligh
  2006-01-11  1:48     ` Andrew Morton
@ 2006-01-11  1:49     ` Con Kolivas
  2006-01-11  2:38       ` Peter Williams
  2006-01-11  1:52     ` Andrew Morton
  2 siblings, 1 reply; 59+ messages in thread
From: Con Kolivas @ 2006-01-11  1:49 UTC (permalink / raw)
  To: Martin Bligh; +Cc: Andrew Morton, linux-kernel, Ingo Molnar, Peter Williams

On Wed, 11 Jan 2006 12:41 pm, Martin Bligh wrote:
> Seems to have gone wrong between 2.6.14-rc1-mm1 and 2.6.14-rc2-mm1 ?
> See http://test.kernel.org/perf/kernbench.moe.png for clearest effect.

The only new scheduler patch at that time was this:
+sched-modified-nice-support-for-smp-load-balancing.patch

which was Peter's modifications to my smp nice support. cc'ed Peter

I guess we need to check whether reversing this patch helps.

Con

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-11  1:41   ` Martin Bligh
  2006-01-11  1:48     ` Andrew Morton
  2006-01-11  1:49     ` Con Kolivas
@ 2006-01-11  1:52     ` Andrew Morton
  2 siblings, 0 replies; 59+ messages in thread
From: Andrew Morton @ 2006-01-11  1:52 UTC (permalink / raw)
  To: Martin Bligh; +Cc: linux-kernel, mingo

Martin Bligh <mbligh@google.com> wrote:
>
> > Well, 1% slower.  -mm has permanent not-for-linus debug things, some of
> > which are expected to have a performance impact.  I don't know whether
> > they'd have a 1% impact though.
> 
> OK, fair enough. Can I turn them off somehow to check? it's 10% on 
> NUMA-Q. The good news is that it's stayed in -mm for long time, so ...
> am praying.

There are quite a lot of them.

list_del-debug.patch
page-owner-tracking-leak-detector.patch
page-owner-tracking-leak-detector-fix.patch
unplug-can-sleep.patch
firestream-warnings.patch
#periodically-scan-redzone-entries-and-slab-control-structures.patch
slab-leak-detector.patch
releasing-resources-with-children.patch
nr_blockdev_pages-in_interrupt-warning.patch
nmi-lockup-and-altsysrq-p-dumping-calltraces-on-_all_-cpus.patch
detect-atomic-counter-underflows.patch
sysfs-crash-debugging.patch
device-suspend-debug.patch
slab-cache-shrinker-statistics.patch
slab-cache-shrinker-statistics-fix.patch
mm-debug-dump-pageframes-on-bad_page.patch
debug-warn-if-we-sleep-in-an-irq-for-a-long-time.patch
debug-shared-irqs.patch
debug-shared-irqs-fix.patch
make-frame_pointer-default=y.patch


list_del-debug.patch, detect-atomic-counter-underflows.patch,
slab-cache-shrinker-statistics.patch will all have a small impact.

But not much.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-11  1:49     ` Con Kolivas
@ 2006-01-11  2:38       ` Peter Williams
  2006-01-11  3:07         ` Con Kolivas
  0 siblings, 1 reply; 59+ messages in thread
From: Peter Williams @ 2006-01-11  2:38 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Martin Bligh, Andrew Morton, linux-kernel, Ingo Molnar

Con Kolivas wrote:
> On Wed, 11 Jan 2006 12:41 pm, Martin Bligh wrote:
> 
>>Seems to have gone wrong between 2.6.14-rc1-mm1 and 2.6.14-rc2-mm1 ?
>>See http://test.kernel.org/perf/kernbench.moe.png for clearest effect.
> 
> 
> The only new scheduler patch at that time was this:
> +sched-modified-nice-support-for-smp-load-balancing.patch
> 
> which was Peter's modifications to my smp nice support. cc'ed Peter
> 

This patch will probably have overhead implications as it will skip some 
tasks during load balancing looking for ones whose bias_prio is small 
enough to fit within the amount of bias to be moved.  Because the 
candidate tasks are ordered in the array by dynamic priority and not 
nice (or bias_prio) the search is exhaustive.  I need to think about 
whether this can be made a little smarter e.g. skip to the next idx as 
soon as a task whose bias_prio is too large is encountered.  This would 
have the effect of missing some tasks that could have been moved due but 
these will generally be tasks with interactive bonuses causing them to 
be in a lower slot in the queue than would otherwise be the case and as 
they generally do small CPU runs not moving them probably won't make 
much difference.

 > I guess we need to check whether reversing this patch helps.

It would be interesting to see if it does.

If it does we probably have to wear the cost (and try to reduce it) as 
without this change smp nice support is fairly ineffective due to the 
fact that it moves exactly the same tasks as would be moved without it. 
  At the most it changes the frequency at which load balancing occurs.

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-11  2:38       ` Peter Williams
@ 2006-01-11  3:07         ` Con Kolivas
  2006-01-11  3:12           ` Martin Bligh
  2006-01-11  3:40           ` Peter Williams
  0 siblings, 2 replies; 59+ messages in thread
From: Con Kolivas @ 2006-01-11  3:07 UTC (permalink / raw)
  To: Peter Williams; +Cc: Martin Bligh, Andrew Morton, linux-kernel, Ingo Molnar

On Wed, 11 Jan 2006 01:38 pm, Peter Williams wrote:
> Con Kolivas wrote:
>  > I guess we need to check whether reversing this patch helps.
>
> It would be interesting to see if it does.
>
> If it does we probably have to wear the cost (and try to reduce it) as
> without this change smp nice support is fairly ineffective due to the
> fact that it moves exactly the same tasks as would be moved without it.
>   At the most it changes the frequency at which load balancing occurs.

I disagree. I think the current implementation changes the balancing according 
to nice much more effectively than previously where by their very nature, low 
priority tasks were balanced more frequently and ended up getting their own 
cpu. No it does not provide firm 'nice' handling that we can achieve on UP 
configurations but it is also free in throughput terms and miles better than 
without it. I would like to see your more robust (and nicer code) solution 
incorporated but I also want to see it cost us as little as possible. We 
haven't confirmed anything just yet...

Cheers,
Con

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-11  3:07         ` Con Kolivas
@ 2006-01-11  3:12           ` Martin Bligh
  2006-01-11  3:40           ` Peter Williams
  1 sibling, 0 replies; 59+ messages in thread
From: Martin Bligh @ 2006-01-11  3:12 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Peter Williams, Andrew Morton, linux-kernel, Ingo Molnar

Con Kolivas wrote:
> On Wed, 11 Jan 2006 01:38 pm, Peter Williams wrote:
> 
>>Con Kolivas wrote:
>> > I guess we need to check whether reversing this patch helps.
>>
>>It would be interesting to see if it does.
>>
>>If it does we probably have to wear the cost (and try to reduce it) as
>>without this change smp nice support is fairly ineffective due to the
>>fact that it moves exactly the same tasks as would be moved without it.
>>  At the most it changes the frequency at which load balancing occurs.
> 
> 
> I disagree. I think the current implementation changes the balancing according 
> to nice much more effectively than previously where by their very nature, low 
> priority tasks were balanced more frequently and ended up getting their own 
> cpu. No it does not provide firm 'nice' handling that we can achieve on UP 
> configurations but it is also free in throughput terms and miles better than 
> without it. I would like to see your more robust (and nicer code) solution 
> incorporated but I also want to see it cost us as little as possible. We 
> haven't confirmed anything just yet...

Whether it turns out to be that or not ... 10% is a BIG frigging hit to 
take doing something basic like kernel compilation. Hell, 1% is a big 
hit to take, given the lengths we go to to buy that sort of benefit.

M.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-11  3:07         ` Con Kolivas
  2006-01-11  3:12           ` Martin Bligh
@ 2006-01-11  3:40           ` Peter Williams
  2006-01-11  3:49             ` Con Kolivas
  2006-01-11  5:14             ` Peter Williams
  1 sibling, 2 replies; 59+ messages in thread
From: Peter Williams @ 2006-01-11  3:40 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Martin Bligh, Andrew Morton, linux-kernel, Ingo Molnar

Con Kolivas wrote:
> On Wed, 11 Jan 2006 01:38 pm, Peter Williams wrote:
> 
>>Con Kolivas wrote:
>> > I guess we need to check whether reversing this patch helps.
>>
>>It would be interesting to see if it does.
>>
>>If it does we probably have to wear the cost (and try to reduce it) as
>>without this change smp nice support is fairly ineffective due to the
>>fact that it moves exactly the same tasks as would be moved without it.
>>  At the most it changes the frequency at which load balancing occurs.
> 
> 
> I disagree. I think the current implementation changes the balancing according 
> to nice much more effectively than previously where by their very nature, low 
> priority tasks were balanced more frequently and ended up getting their own 
> cpu.

I can't follow the logic here and I certainly don't see much difference 
in practice.

> No it does not provide firm 'nice' handling that we can achieve on UP 
> configurations but it is also free in throughput terms and miles better than 
> without it. I would like to see your more robust (and nicer code) solution 
> incorporated but I also want to see it cost us as little as possible. We 
> haven't confirmed anything just yet...

Yes, that's true.  I must admit that I wouldn't have expected the 
increased overhead to be very big.  In general, the "system" CPU time in 
a kernbench is only about 1% of the total CPU time and the extra 
overhead should be just a fraction of that.

It's possible that better distribution of niceness across CPU leads to 
more preemption within a run queue (i.e. if there all the same priority 
they won't preempt each other much) leading to more context switches. 
But you wouldn't expect that to show up in kernbench as the tasks are 
all the same niceness and usually end up with the same dynamic priority.

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-11  3:40           ` Peter Williams
@ 2006-01-11  3:49             ` Con Kolivas
  2006-01-11  4:33               ` Peter Williams
  2006-01-11  5:14             ` Peter Williams
  1 sibling, 1 reply; 59+ messages in thread
From: Con Kolivas @ 2006-01-11  3:49 UTC (permalink / raw)
  To: Peter Williams; +Cc: Martin Bligh, Andrew Morton, linux-kernel, Ingo Molnar

On Wed, 11 Jan 2006 02:40 pm, Peter Williams wrote:
> Con Kolivas wrote:
> > I disagree. I think the current implementation changes the balancing
> > according to nice much more effectively than previously where by their
> > very nature, low priority tasks were balanced more frequently and ended
> > up getting their own cpu.
>
> I can't follow the logic here 

cpu bound non interactive tasks have long timeslices. Tasks that have short 
timeslices like interactive ones or cpu bound ones at nice 19 have short 
timeslices. If a nice 0 and nice 19 task are running on the same cpu, the 
nice 19 one is going to be spending most of its time waiting in the runqueue. 
As soon as an idle cpu appears it will only pull a task that is waiting in a 
runqueue... and that is going to be the low priority tasks. 

> Yes, that's true.  I must admit that I wouldn't have expected the
> increased overhead to be very big.  In general, the "system" CPU time in
> a kernbench is only about 1% of the total CPU time and the extra
> overhead should be just a fraction of that.

Doesn't appear to be system time. Extra idle time does suggest poor balancing 
though. Remember the effort I went to to avoid altering the delicate idle 
balancing...

> It's possible that better distribution of niceness across CPU leads to
> more preemption within a run queue (i.e. if there all the same priority
> they won't preempt each other much) leading to more context switches.

Can't see how that is for what you say below.

> But you wouldn't expect that to show up in kernbench as the tasks are
> all the same niceness and usually end up with the same dynamic priority.

The whole of userspace on a kernbench run is going to be nice 0. 

Let's wait till we confirm or deny this patch is responsible before 
postulating further.

Cheers,
Con

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-11  3:49             ` Con Kolivas
@ 2006-01-11  4:33               ` Peter Williams
  0 siblings, 0 replies; 59+ messages in thread
From: Peter Williams @ 2006-01-11  4:33 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Martin Bligh, Andrew Morton, linux-kernel, Ingo Molnar

Con Kolivas wrote:
> On Wed, 11 Jan 2006 02:40 pm, Peter Williams wrote:
> 
>>Con Kolivas wrote:
>>
>>>I disagree. I think the current implementation changes the balancing
>>>according to nice much more effectively than previously where by their
>>>very nature, low priority tasks were balanced more frequently and ended
>>>up getting their own cpu.
>>
>>I can't follow the logic here 
> 
> 
> cpu bound non interactive tasks have long timeslices. Tasks that have short 
> timeslices like interactive ones or cpu bound ones at nice 19 have short 
> timeslices.

Time slice size is dependent on nice (via static_prio) only, gets bigger 
as static_prio gets smaller and only really effects the switching of 
tasks from the active array to the expired array.  This means that 
programs with high nice values will tend to spend more time on the 
expired array than the active array.  Since the expired queue is always 
examined first this makes them the most likely to be moved regardless of 
the smp nice patch.  This is independent of the amount of CPU a task 
uses each time it gets onto the CPU which is what I think you were 
alluding to.

> If a nice 0 and nice 19 task are running on the same cpu, the 
> nice 19 one is going to be spending most of its time waiting in the runqueue. 
> As soon as an idle cpu appears it will only pull a task that is waiting in a 
> runqueue... and that is going to be the low priority tasks. 

Because they're more likely to be on the expired array.

So the patch works by reducing the chance of any tasks being moved 
during an idle rebalance.  Surely this is likely to increase the amount 
of idle time.

Maybe the idle balance should check the active arrays before it checks 
the expired arrays?  This will increase the chances of getting a high 
priority task.  The down side is that tasks on the active array are more 
likely to be "cache warm" which is why the expired array is checked 
first (hence the suggestion that this only apply to idle balancing).

But, as you say, let's wait and see what happens with the patch backed 
out before we get too carried away.

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-11  3:40           ` Peter Williams
  2006-01-11  3:49             ` Con Kolivas
@ 2006-01-11  5:14             ` Peter Williams
  2006-01-11  6:21               ` Martin J. Bligh
  1 sibling, 1 reply; 59+ messages in thread
From: Peter Williams @ 2006-01-11  5:14 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Martin Bligh, Andrew Morton, linux-kernel, Ingo Molnar

Peter Williams wrote:
> Con Kolivas wrote:
> 
>> On Wed, 11 Jan 2006 01:38 pm, Peter Williams wrote:
>>
>>> Con Kolivas wrote:
>>> > I guess we need to check whether reversing this patch helps.
>>>
>>> It would be interesting to see if it does.
>>>
>>> If it does we probably have to wear the cost (and try to reduce it) as
>>> without this change smp nice support is fairly ineffective due to the
>>> fact that it moves exactly the same tasks as would be moved without it.
>>>  At the most it changes the frequency at which load balancing occurs.
>>
>>
>>
>> I disagree. I think the current implementation changes the balancing 
>> according to nice much more effectively than previously where by their 
>> very nature, low priority tasks were balanced more frequently and 
>> ended up getting their own cpu.
> 
> 
> I can't follow the logic here and I certainly don't see much difference 
> in practice.

I think I've figured out why I'm not seeing much difference in practice. 
  I'm only testing on 2 CPU systems and it seems to me that the main 
difference that the SMP nice patch will have is in selecting which CPU 
to steal tasks from (grabbing the one with the highest priority tasks) 
and this is a non issue on a 2 CPU system.  :-(

So I should revise my statement to say that it doesn't make much 
difference if there's only 2 CPUs.

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-11  5:14             ` Peter Williams
@ 2006-01-11  6:21               ` Martin J. Bligh
  2006-01-11 12:24                 ` Peter Williams
  0 siblings, 1 reply; 59+ messages in thread
From: Martin J. Bligh @ 2006-01-11  6:21 UTC (permalink / raw)
  To: Peter Williams; +Cc: Con Kolivas, Andrew Morton, linux-kernel, Ingo Molnar

Peter Williams wrote:

> Peter Williams wrote:
>
>> Con Kolivas wrote:
>>
>>> On Wed, 11 Jan 2006 01:38 pm, Peter Williams wrote:
>>>
>>>> Con Kolivas wrote:
>>>> > I guess we need to check whether reversing this patch helps.
>>>>
>>>> It would be interesting to see if it does.
>>>>
>>>> If it does we probably have to wear the cost (and try to reduce it) as
>>>> without this change smp nice support is fairly ineffective due to the
>>>> fact that it moves exactly the same tasks as would be moved without 
>>>> it.
>>>>  At the most it changes the frequency at which load balancing occurs.
>>>
>>>
>>>
>>>
>>> I disagree. I think the current implementation changes the balancing 
>>> according to nice much more effectively than previously where by 
>>> their very nature, low priority tasks were balanced more frequently 
>>> and ended up getting their own cpu.
>>
>>
>>
>> I can't follow the logic here and I certainly don't see much 
>> difference in practice.
>
>
> I think I've figured out why I'm not seeing much difference in 
> practice.  I'm only testing on 2 CPU systems and it seems to me that 
> the main difference that the SMP nice patch will have is in selecting 
> which CPU to steal tasks from (grabbing the one with the highest 
> priority tasks) and this is a non issue on a 2 CPU system.  :-(
>
> So I should revise my statement to say that it doesn't make much 
> difference if there's only 2 CPUs.
>

If nothing's niced, why would it be affecting scheduling decisions at all?
That seems broken to me ?

M.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-11  6:21               ` Martin J. Bligh
@ 2006-01-11 12:24                 ` Peter Williams
  2006-01-11 14:29                   ` Con Kolivas
  0 siblings, 1 reply; 59+ messages in thread
From: Peter Williams @ 2006-01-11 12:24 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: Con Kolivas, Andrew Morton, linux-kernel, Ingo Molnar

Martin J. Bligh wrote:
> Peter Williams wrote:
> 
>> Peter Williams wrote:
>>
>>> Con Kolivas wrote:
>>>
>>>> On Wed, 11 Jan 2006 01:38 pm, Peter Williams wrote:
>>>>
>>>>> Con Kolivas wrote:
>>>>> > I guess we need to check whether reversing this patch helps.
>>>>>
>>>>> It would be interesting to see if it does.
>>>>>
>>>>> If it does we probably have to wear the cost (and try to reduce it) as
>>>>> without this change smp nice support is fairly ineffective due to the
>>>>> fact that it moves exactly the same tasks as would be moved without 
>>>>> it.
>>>>>  At the most it changes the frequency at which load balancing occurs.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> I disagree. I think the current implementation changes the balancing 
>>>> according to nice much more effectively than previously where by 
>>>> their very nature, low priority tasks were balanced more frequently 
>>>> and ended up getting their own cpu.
>>>
>>>
>>>
>>>
>>> I can't follow the logic here and I certainly don't see much 
>>> difference in practice.
>>
>>
>>
>> I think I've figured out why I'm not seeing much difference in 
>> practice.  I'm only testing on 2 CPU systems and it seems to me that 
>> the main difference that the SMP nice patch will have is in selecting 
>> which CPU to steal tasks from (grabbing the one with the highest 
>> priority tasks) and this is a non issue on a 2 CPU system.  :-(
>>
>> So I should revise my statement to say that it doesn't make much 
>> difference if there's only 2 CPUs.
>>
> 
> If nothing's niced, why would it be affecting scheduling decisions at all?

Load balancing decisions.

> That seems broken to me ?

But, yes, given that the problem goes away when the patch is removed 
(which we're still waiting to see) it's broken.  I think the problem is 
probably due to the changed metric (i.e. biased load instead of simple 
load) causing idle_balance() to fail more often (i.e. it decides to not 
bother moving any tasks more often than it otherwise would) which would 
explain the increased idle time being seen.  This means that the fix 
would be to review the criteria for deciding whether to move tasks in 
idle_balance().

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-11 12:24                 ` Peter Williams
@ 2006-01-11 14:29                   ` Con Kolivas
  2006-01-11 22:05                     ` Peter Williams
  0 siblings, 1 reply; 59+ messages in thread
From: Con Kolivas @ 2006-01-11 14:29 UTC (permalink / raw)
  To: Peter Williams; +Cc: Martin J. Bligh, Andrew Morton, linux-kernel, Ingo Molnar

On Wednesday 11 January 2006 23:24, Peter Williams wrote:
> Martin J. Bligh wrote:
> > That seems broken to me ?
>
> But, yes, given that the problem goes away when the patch is removed
> (which we're still waiting to see) it's broken.  I think the problem is
> probably due to the changed metric (i.e. biased load instead of simple
> load) causing idle_balance() to fail more often (i.e. it decides to not
> bother moving any tasks more often than it otherwise would) which would
> explain the increased idle time being seen.  This means that the fix
> would be to review the criteria for deciding whether to move tasks in
> idle_balance().

Look back on my implementation. The problem as I saw it was that one task 
alone with a biased load would suddenly make a runqueue look much busier than 
it was supposed to so I special cased the runqueue that had precisely one 
task.

Cheers,
Con

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-11 14:29                   ` Con Kolivas
@ 2006-01-11 22:05                     ` Peter Williams
  2006-01-12  0:54                       ` Peter Williams
  2006-01-12  1:25                       ` Peter Williams
  0 siblings, 2 replies; 59+ messages in thread
From: Peter Williams @ 2006-01-11 22:05 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Martin J. Bligh, Andrew Morton, linux-kernel, Ingo Molnar

Con Kolivas wrote:
> On Wednesday 11 January 2006 23:24, Peter Williams wrote:
> 
>>Martin J. Bligh wrote:
>>
>>>That seems broken to me ?
>>
>>But, yes, given that the problem goes away when the patch is removed
>>(which we're still waiting to see) it's broken.  I think the problem is
>>probably due to the changed metric (i.e. biased load instead of simple
>>load) causing idle_balance() to fail more often (i.e. it decides to not
>>bother moving any tasks more often than it otherwise would) which would
>>explain the increased idle time being seen.  This means that the fix
>>would be to review the criteria for deciding whether to move tasks in
>>idle_balance().
> 
> 
> Look back on my implementation. The problem as I saw it was that one task 
> alone with a biased load would suddenly make a runqueue look much busier than 
> it was supposed to so I special cased the runqueue that had precisely one 
> task.

OK.  I'll look at that.

But I was thinking more about the code that (in the original) handled 
the case where the number of tasks to be moved was less than 1 but more 
than 0 (i.e. the cases where "imbalance" would have been reduced to zero 
when divided by SCHED_LOAD_SCALE).  I think that I got that part wrong 
and you can end up with a bias load to be moved which is less than any 
of the bias_prio values for any queued tasks (in circumstances where the 
original code would have rounded up to 1 and caused a move).  I think 
that the way to handle this problem is to replace 1 with "average bias 
prio" within that logic.  This would guarantee at least one task with a 
bias_prio small enough to be moved.

I think that this analysis is a strong argument for my original patch 
being the cause of the problem so I'll go ahead and generate a fix. 
I'll try to have a patch available later this morning.

Peter
PS
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-11 22:05                     ` Peter Williams
@ 2006-01-12  0:54                       ` Peter Williams
  2006-01-12  1:18                         ` Con Kolivas
  2006-01-12 18:39                         ` Martin Bligh
  2006-01-12  1:25                       ` Peter Williams
  1 sibling, 2 replies; 59+ messages in thread
From: Peter Williams @ 2006-01-12  0:54 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Martin J. Bligh, Andrew Morton, linux-kernel, Ingo Molnar

[-- Attachment #1: Type: text/plain, Size: 2569 bytes --]

Peter Williams wrote:
> Con Kolivas wrote:
> 
>> On Wednesday 11 January 2006 23:24, Peter Williams wrote:
>>
>>> Martin J. Bligh wrote:
>>>
>>>> That seems broken to me ?
>>>
>>>
>>> But, yes, given that the problem goes away when the patch is removed
>>> (which we're still waiting to see) it's broken.  I think the problem is
>>> probably due to the changed metric (i.e. biased load instead of simple
>>> load) causing idle_balance() to fail more often (i.e. it decides to not
>>> bother moving any tasks more often than it otherwise would) which would
>>> explain the increased idle time being seen.  This means that the fix
>>> would be to review the criteria for deciding whether to move tasks in
>>> idle_balance().
>>
>>
>>
>> Look back on my implementation. The problem as I saw it was that one 
>> task alone with a biased load would suddenly make a runqueue look much 
>> busier than it was supposed to so I special cased the runqueue that 
>> had precisely one task.
> 
> 
> OK.  I'll look at that.

Addressed in a separate e-mail.

> 
> But I was thinking more about the code that (in the original) handled 
> the case where the number of tasks to be moved was less than 1 but more 
> than 0 (i.e. the cases where "imbalance" would have been reduced to zero 
> when divided by SCHED_LOAD_SCALE).  I think that I got that part wrong 
> and you can end up with a bias load to be moved which is less than any 
> of the bias_prio values for any queued tasks (in circumstances where the 
> original code would have rounded up to 1 and caused a move).  I think 
> that the way to handle this problem is to replace 1 with "average bias 
> prio" within that logic.  This would guarantee at least one task with a 
> bias_prio small enough to be moved.
> 
> I think that this analysis is a strong argument for my original patch 
> being the cause of the problem so I'll go ahead and generate a fix. I'll 
> try to have a patch available later this morning.

Attached is a patch that addresses this problem.  Unlike the description 
above it does not use "average bias prio" as that solution would be very 
complicated.  Instead it makes the assumption that NICE_TO_BIAS_PRIO(0) 
is a "good enough" for this purpose as this is highly likely to be the 
median bias prio and the median is probably better for this purpose than 
the average.

Signed-off-by: Peter Williams <pwil3058@bigpond.com.au>

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce

[-- Attachment #2: fix-excessive-idling-with-smp-move-load-not-tasks --]
[-- Type: text/plain, Size: 732 bytes --]

Index: MM-2.6.X/kernel/sched.c
===================================================================
--- MM-2.6.X.orig/kernel/sched.c	2006-01-12 09:23:38.000000000 +1100
+++ MM-2.6.X/kernel/sched.c	2006-01-12 10:44:50.000000000 +1100
@@ -2116,11 +2116,11 @@ find_busiest_group(struct sched_domain *
 				(avg_load - this_load) * this->cpu_power)
 			/ SCHED_LOAD_SCALE;
 
-	if (*imbalance < SCHED_LOAD_SCALE) {
+	if (*imbalance < NICE_TO_BIAS_PRIO(0) * SCHED_LOAD_SCALE) {
 		unsigned long pwr_now = 0, pwr_move = 0;
 		unsigned long tmp;
 
-		if (max_load - this_load >= SCHED_LOAD_SCALE*2) {
+		if (max_load - this_load >= NICE_TO_BIAS_PRIO(0) * SCHED_LOAD_SCALE*2) {
 			*imbalance = NICE_TO_BIAS_PRIO(0);
 			return busiest;
 		}

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-12  0:54                       ` Peter Williams
@ 2006-01-12  1:18                         ` Con Kolivas
  2006-01-12  1:29                           ` Peter Williams
  2006-01-12  2:04                           ` Martin Bligh
  2006-01-12 18:39                         ` Martin Bligh
  1 sibling, 2 replies; 59+ messages in thread
From: Con Kolivas @ 2006-01-12  1:18 UTC (permalink / raw)
  To: Peter Williams; +Cc: Martin J. Bligh, Andrew Morton, linux-kernel, Ingo Molnar

On Thu, 12 Jan 2006 11:54 am, Peter Williams wrote:
> Peter Williams wrote:
> > Con Kolivas wrote:
> >> On Wednesday 11 January 2006 23:24, Peter Williams wrote:
> >>> Martin J. Bligh wrote:
> >>>> That seems broken to me ?
> >>>
> >>> But, yes, given that the problem goes away when the patch is removed
> >>> (which we're still waiting to see) it's broken.  I think the problem is
> >>> probably due to the changed metric (i.e. biased load instead of simple
> >>> load) causing idle_balance() to fail more often (i.e. it decides to not
> >>> bother moving any tasks more often than it otherwise would) which would
> >>> explain the increased idle time being seen.  This means that the fix
> >>> would be to review the criteria for deciding whether to move tasks in
> >>> idle_balance().
> >>
> >> Look back on my implementation. The problem as I saw it was that one
> >> task alone with a biased load would suddenly make a runqueue look much
> >> busier than it was supposed to so I special cased the runqueue that
> >> had precisely one task.
> >
> > OK.  I'll look at that.
>
> Addressed in a separate e-mail.
>
> > But I was thinking more about the code that (in the original) handled
> > the case where the number of tasks to be moved was less than 1 but more
> > than 0 (i.e. the cases where "imbalance" would have been reduced to zero
> > when divided by SCHED_LOAD_SCALE).  I think that I got that part wrong
> > and you can end up with a bias load to be moved which is less than any
> > of the bias_prio values for any queued tasks (in circumstances where the
> > original code would have rounded up to 1 and caused a move).  I think
> > that the way to handle this problem is to replace 1 with "average bias
> > prio" within that logic.  This would guarantee at least one task with a
> > bias_prio small enough to be moved.
> >
> > I think that this analysis is a strong argument for my original patch
> > being the cause of the problem so I'll go ahead and generate a fix. I'll
> > try to have a patch available later this morning.
>
> Attached is a patch that addresses this problem.  Unlike the description
> above it does not use "average bias prio" as that solution would be very
> complicated.  Instead it makes the assumption that NICE_TO_BIAS_PRIO(0)
> is a "good enough" for this purpose as this is highly likely to be the
> median bias prio and the median is probably better for this purpose than
> the average.

This is a shot in the dark. We haven't confirmed 1. there is a problem 2. that 
this is the problem nor 3. that this patch will fix the problem. I say we 
wait for the results of 1. If the improved smp nice handling patch ends up 
being responsible then it should not be merged upstream, and then this patch 
can be tested on top.

Martin I know your work move has made it not your responsibility to test 
backing out this change, but are you aware of anything being done to test 
this hypothesis?

Con

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-11 22:05                     ` Peter Williams
  2006-01-12  0:54                       ` Peter Williams
@ 2006-01-12  1:25                       ` Peter Williams
  1 sibling, 0 replies; 59+ messages in thread
From: Peter Williams @ 2006-01-12  1:25 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Martin J. Bligh, Andrew Morton, linux-kernel, Ingo Molnar

[-- Attachment #1: Type: text/plain, Size: 2423 bytes --]

Peter Williams wrote:
> Con Kolivas wrote:
> 
>> On Wednesday 11 January 2006 23:24, Peter Williams wrote:
>>
>>> Martin J. Bligh wrote:
>>>
>>>> That seems broken to me ?
>>>
>>>
>>> But, yes, given that the problem goes away when the patch is removed
>>> (which we're still waiting to see) it's broken.  I think the problem is
>>> probably due to the changed metric (i.e. biased load instead of simple
>>> load) causing idle_balance() to fail more often (i.e. it decides to not
>>> bother moving any tasks more often than it otherwise would) which would
>>> explain the increased idle time being seen.  This means that the fix
>>> would be to review the criteria for deciding whether to move tasks in
>>> idle_balance().
>>
>>
>>
>> Look back on my implementation. The problem as I saw it was that one 
>> task alone with a biased load would suddenly make a runqueue look much 
>> busier than it was supposed to so I special cased the runqueue that 
>> had precisely one task.
> 
> 
> OK.  I'll look at that.

OK.  I agree that this mechanism increases the chances that a queue with 
only one runnable task is selected as the target for stealing tasks 
from.  The attached patch addresses this issue in two ways:

1. in find_busiest_group(), only groups that have at least one queue 
with more than one task running are considered, and
2. in find_busiest_queue(), only queues with more than one runnable task 
are considered.

As I see it, this patch is a bit iffy as it is effected by race 
conditions in two ways:

1. just because there's more than one task runnable when these checks 
are made there's no guarantee that this will be the case when you try to 
move some of them, and
2. just because there's only one task runnable when these checks are 
made it's possible that there will be more than one when you attempt the 
move.

I don't think that this patch makes case 1 any worse than it already is 
but case 2 could cause potential moves to be missed that otherwise 
wouldn't be and I assume this is the reason why there is no similar code 
in the original.  Whether the increased probability of choosing queues 
with only one runnable tasks changes this reasoning is up for debate.

Signed-off-by: Peter Williams <pwil3058@bigpond.com.au>

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce

[-- Attachment #2: consider-nr_running-when-finding-busiest --]
[-- Type: text/plain, Size: 1330 bytes --]

Index: MM-2.6.X/kernel/sched.c
===================================================================
--- MM-2.6.X.orig/kernel/sched.c	2006-01-12 10:44:50.000000000 +1100
+++ MM-2.6.X/kernel/sched.c	2006-01-12 10:47:01.000000000 +1100
@@ -2052,6 +2052,7 @@ find_busiest_group(struct sched_domain *
 		unsigned long load;
 		int local_group;
 		int i;
+		unsigned int eligible_qs = 0;
 
 		local_group = cpu_isset(this_cpu, group->cpumask);
 
@@ -2065,8 +2066,11 @@ find_busiest_group(struct sched_domain *
 			/* Bias balancing toward cpus of our domain */
 			if (local_group)
 				load = target_load(i, load_idx);
-			else
+			else {
 				load = source_load(i, load_idx);
+				if (cpu_rq(i)->nr_running > 1)
+					++eligible_qs;
+			}
 
 			avg_load += load;
 		}
@@ -2080,7 +2084,7 @@ find_busiest_group(struct sched_domain *
 		if (local_group) {
 			this_load = avg_load;
 			this = group;
-		} else if (avg_load > max_load) {
+		} else if (avg_load > max_load && eligible_qs) {
 			max_load = avg_load;
 			busiest = group;
 		}
@@ -2181,8 +2185,12 @@ static runqueue_t *find_busiest_queue(st
 		load = source_load(i, 0);
 
 		if (load > max_load) {
-			max_load = load;
-			busiest = cpu_rq(i);
+			runqueue_t *tmprq = cpu_rq(i);
+
+			if (tmprq->nr_running > 1) {
+				max_load = load;
+				busiest = tmprq;
+			}
 		}
 	}
 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-12  1:18                         ` Con Kolivas
@ 2006-01-12  1:29                           ` Peter Williams
  2006-01-12  1:36                             ` Con Kolivas
  2006-01-12  2:04                           ` Martin Bligh
  1 sibling, 1 reply; 59+ messages in thread
From: Peter Williams @ 2006-01-12  1:29 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Martin J. Bligh, Andrew Morton, linux-kernel, Ingo Molnar

Con Kolivas wrote:
> On Thu, 12 Jan 2006 11:54 am, Peter Williams wrote:
> 
>>Peter Williams wrote:
>>
>>>Con Kolivas wrote:
>>>
>>>>On Wednesday 11 January 2006 23:24, Peter Williams wrote:
>>>>
>>>>>Martin J. Bligh wrote:
>>>>>
>>>>>>That seems broken to me ?
>>>>>
>>>>>But, yes, given that the problem goes away when the patch is removed
>>>>>(which we're still waiting to see) it's broken.  I think the problem is
>>>>>probably due to the changed metric (i.e. biased load instead of simple
>>>>>load) causing idle_balance() to fail more often (i.e. it decides to not
>>>>>bother moving any tasks more often than it otherwise would) which would
>>>>>explain the increased idle time being seen.  This means that the fix
>>>>>would be to review the criteria for deciding whether to move tasks in
>>>>>idle_balance().
>>>>
>>>>Look back on my implementation. The problem as I saw it was that one
>>>>task alone with a biased load would suddenly make a runqueue look much
>>>>busier than it was supposed to so I special cased the runqueue that
>>>>had precisely one task.
>>>
>>>OK.  I'll look at that.
>>
>>Addressed in a separate e-mail.
>>
>>
>>>But I was thinking more about the code that (in the original) handled
>>>the case where the number of tasks to be moved was less than 1 but more
>>>than 0 (i.e. the cases where "imbalance" would have been reduced to zero
>>>when divided by SCHED_LOAD_SCALE).  I think that I got that part wrong
>>>and you can end up with a bias load to be moved which is less than any
>>>of the bias_prio values for any queued tasks (in circumstances where the
>>>original code would have rounded up to 1 and caused a move).  I think
>>>that the way to handle this problem is to replace 1 with "average bias
>>>prio" within that logic.  This would guarantee at least one task with a
>>>bias_prio small enough to be moved.
>>>
>>>I think that this analysis is a strong argument for my original patch
>>>being the cause of the problem so I'll go ahead and generate a fix. I'll
>>>try to have a patch available later this morning.
>>
>>Attached is a patch that addresses this problem.  Unlike the description
>>above it does not use "average bias prio" as that solution would be very
>>complicated.  Instead it makes the assumption that NICE_TO_BIAS_PRIO(0)
>>is a "good enough" for this purpose as this is highly likely to be the
>>median bias prio and the median is probably better for this purpose than
>>the average.
> 
> 
> This is a shot in the dark. We haven't confirmed 1. there is a problem 2. that 
> this is the problem nor 3. that this patch will fix the problem.

I disagree.  I think that there is a clear mistake in my original patch 
that this patch fixes.

> I say we 
> wait for the results of 1. If the improved smp nice handling patch ends up 
> being responsible then it should not be merged upstream, and then this patch 
> can be tested on top.
> 
> Martin I know your work move has made it not your responsibility to test 
> backing out this change, but are you aware of anything being done to test 
> this hypothesis?

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-12  1:29                           ` Peter Williams
@ 2006-01-12  1:36                             ` Con Kolivas
  2006-01-12  2:23                               ` Peter Williams
  0 siblings, 1 reply; 59+ messages in thread
From: Con Kolivas @ 2006-01-12  1:36 UTC (permalink / raw)
  To: Peter Williams; +Cc: Martin J. Bligh, Andrew Morton, linux-kernel, Ingo Molnar

On Thu, 12 Jan 2006 12:29 pm, Peter Williams wrote:
> Con Kolivas wrote:
> > This is a shot in the dark. We haven't confirmed 1. there is a problem 2.
> > that this is the problem nor 3. that this patch will fix the problem.
>
> I disagree.  I think that there is a clear mistake in my original patch
> that this patch fixes.

I agree with you on that. The real concern is that we were just about to push 
it upstream. So where does this leave us? I propose we delay merging the 
"improved smp nice handling" patch into mainline pending your further 
changes.

Cheers,
Con

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-12  1:18                         ` Con Kolivas
  2006-01-12  1:29                           ` Peter Williams
@ 2006-01-12  2:04                           ` Martin Bligh
  2006-01-12  6:35                             ` Martin J. Bligh
  1 sibling, 1 reply; 59+ messages in thread
From: Martin Bligh @ 2006-01-12  2:04 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Peter Williams, Andrew Morton, linux-kernel, Ingo Molnar


> This is a shot in the dark. We haven't confirmed 1. there is a problem 2. that 
> this is the problem nor 3. that this patch will fix the problem. I say we 
> wait for the results of 1. If the improved smp nice handling patch ends up 
> being responsible then it should not be merged upstream, and then this patch 
> can be tested on top.
> 
> Martin I know your work move has made it not your responsibility to test 
> backing out this change, but are you aware of anything being done to test 
> this hypothesis?

Yup, Andy queued the job ... just waiting for it to pop out the system

M.


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-12  1:36                             ` Con Kolivas
@ 2006-01-12  2:23                               ` Peter Williams
  2006-01-12  2:26                                 ` Martin Bligh
  2006-01-12  2:27                                 ` Con Kolivas
  0 siblings, 2 replies; 59+ messages in thread
From: Peter Williams @ 2006-01-12  2:23 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Martin J. Bligh, Andrew Morton, linux-kernel, Ingo Molnar

Con Kolivas wrote:
> On Thu, 12 Jan 2006 12:29 pm, Peter Williams wrote:
> 
>>Con Kolivas wrote:
>>
>>>This is a shot in the dark. We haven't confirmed 1. there is a problem 2.
>>>that this is the problem nor 3. that this patch will fix the problem.
>>
>>I disagree.  I think that there is a clear mistake in my original patch
>>that this patch fixes.
> 
> 
> I agree with you on that. The real concern is that we were just about to push 
> it upstream. So where does this leave us?  I propose we delay merging the
> "improved smp nice handling" patch into mainline pending your further 
> changes.

I think that they're already in 2.6.15 minus my "move load not tasks" 
modification which I was expecting to go into 2.6.16.  Is that what you 
meant?

If so I think this is a small and obvious fix that shouldn't delay the 
merging of "move load not tasks" into the mainline.  But it's not my call.

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-12  2:23                               ` Peter Williams
@ 2006-01-12  2:26                                 ` Martin Bligh
  2006-01-12  6:39                                   ` Con Kolivas
  2006-01-12  2:27                                 ` Con Kolivas
  1 sibling, 1 reply; 59+ messages in thread
From: Martin Bligh @ 2006-01-12  2:26 UTC (permalink / raw)
  To: Peter Williams; +Cc: Con Kolivas, Andrew Morton, linux-kernel, Ingo Molnar

Peter Williams wrote:
> Con Kolivas wrote:
> 
>> On Thu, 12 Jan 2006 12:29 pm, Peter Williams wrote:
>>
>>> Con Kolivas wrote:
>>>
>>>> This is a shot in the dark. We haven't confirmed 1. there is a 
>>>> problem 2.
>>>> that this is the problem nor 3. that this patch will fix the problem.
>>>
>>>
>>> I disagree.  I think that there is a clear mistake in my original patch
>>> that this patch fixes.
>>
>>
>>
>> I agree with you on that. The real concern is that we were just about 
>> to push it upstream. So where does this leave us?  I propose we delay 
>> merging the
>> "improved smp nice handling" patch into mainline pending your further 
>> changes.
> 
> 
> I think that they're already in 2.6.15 minus my "move load not tasks" 
> modification which I was expecting to go into 2.6.16.  Is that what you 
> meant?
> 
> If so I think this is a small and obvious fix that shouldn't delay the 
> merging of "move load not tasks" into the mainline.  But it's not my call.

BTW, in reference to a question from last night ... -git7 seems fine. So
if you merged it already, it wasn't that ;-)

M.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-12  2:23                               ` Peter Williams
  2006-01-12  2:26                                 ` Martin Bligh
@ 2006-01-12  2:27                                 ` Con Kolivas
  1 sibling, 0 replies; 59+ messages in thread
From: Con Kolivas @ 2006-01-12  2:27 UTC (permalink / raw)
  To: Peter Williams; +Cc: Martin J. Bligh, Andrew Morton, linux-kernel, Ingo Molnar

On Thu, 12 Jan 2006 01:23 pm, Peter Williams wrote:
> Con Kolivas wrote:
> > On Thu, 12 Jan 2006 12:29 pm, Peter Williams wrote:
> >>Con Kolivas wrote:
> >>>This is a shot in the dark. We haven't confirmed 1. there is a problem
> >>> 2. that this is the problem nor 3. that this patch will fix the
> >>> problem.
> >>
> >>I disagree.  I think that there is a clear mistake in my original patch
> >>that this patch fixes.
> >
> > I agree with you on that. The real concern is that we were just about to
> > push it upstream. So where does this leave us?  I propose we delay
> > merging the "improved smp nice handling" patch into mainline pending your
> > further changes.
>
> I think that they're already in 2.6.15 minus my "move load not tasks"
> modification which I was expecting to go into 2.6.16.  Is that what you
> meant?

Yes, akpm included your patch under the name of something like "improved smp 
nice handling". My smp nice patches went into mainline 2.6.15 and were 
extensively tested for performance regressions. This is about your move load 
not tasks modification which is the patch in question that might be affecting 
performance.

Con

> If so I think this is a small and obvious fix that shouldn't delay the
> merging of "move load not tasks" into the mainline.  But it's not my call.
>
> Peter

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-12  2:04                           ` Martin Bligh
@ 2006-01-12  6:35                             ` Martin J. Bligh
  2006-01-12  6:41                               ` Con Kolivas
  0 siblings, 1 reply; 59+ messages in thread
From: Martin J. Bligh @ 2006-01-12  6:35 UTC (permalink / raw)
  To: Martin Bligh
  Cc: Con Kolivas, Peter Williams, Andrew Morton, linux-kernel,
	Ingo Molnar, apw

Martin Bligh wrote:

>
>> This is a shot in the dark. We haven't confirmed 1. there is a 
>> problem 2. that this is the problem nor 3. that this patch will fix 
>> the problem. I say we wait for the results of 1. If the improved smp 
>> nice handling patch ends up being responsible then it should not be 
>> merged upstream, and then this patch can be tested on top.
>>
>> Martin I know your work move has made it not your responsibility to 
>> test backing out this change, but are you aware of anything being 
>> done to test this hypothesis?
>
>
OK, backing out that patch seems to fix it. Thanks Andy ;-)

M.



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-12  2:26                                 ` Martin Bligh
@ 2006-01-12  6:39                                   ` Con Kolivas
  2006-01-23 19:28                                     ` Martin Bligh
  0 siblings, 1 reply; 59+ messages in thread
From: Con Kolivas @ 2006-01-12  6:39 UTC (permalink / raw)
  To: Martin Bligh; +Cc: Peter Williams, Andrew Morton, linux-kernel, Ingo Molnar

On Thu, 12 Jan 2006 01:26 pm, Martin Bligh wrote:
> Peter Williams wrote:
> > Con Kolivas wrote:
> >> On Thu, 12 Jan 2006 12:29 pm, Peter Williams wrote:
> >>> Con Kolivas wrote:
> >>>> This is a shot in the dark. We haven't confirmed 1. there is a
> >>>> problem 2.
> >>>> that this is the problem nor 3. that this patch will fix the problem.
> >>>
> >>> I disagree.  I think that there is a clear mistake in my original patch
> >>> that this patch fixes.
> >>
> >> I agree with you on that. The real concern is that we were just about
> >> to push it upstream. So where does this leave us?  I propose we delay
> >> merging the
> >> "improved smp nice handling" patch into mainline pending your further
> >> changes.
> >
> > I think that they're already in 2.6.15 minus my "move load not tasks"
> > modification which I was expecting to go into 2.6.16.  Is that what you
> > meant?
> >
> > If so I think this is a small and obvious fix that shouldn't delay the
> > merging of "move load not tasks" into the mainline.  But it's not my
> > call.
>
> BTW, in reference to a question from last night ... -git7 seems fine. So
> if you merged it already, it wasn't that ;-)

Thanks and looks like the results are in from 2.6.14-rc2-mm1 with the patch 
backed out.

Drumroll....

http://test.kernel.org/perf/kernbench.moe.png

The performance goes back to a range similar to 2.6.14-rc1-mm1 (see 
2.6.14-rc2-mm1 + 20328 in blue). Unfortunately this does implicate this 
patch. Can we put it back into -mm only and allow Peter's tweaks/fixes to go 
on top and have it tested some more before going upstream?

Peter, Ingo?

Cheers,
Con

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-12  6:35                             ` Martin J. Bligh
@ 2006-01-12  6:41                               ` Con Kolivas
  2006-01-12  6:54                                 ` Peter Williams
  0 siblings, 1 reply; 59+ messages in thread
From: Con Kolivas @ 2006-01-12  6:41 UTC (permalink / raw)
  To: Martin J. Bligh
  Cc: Martin Bligh, Peter Williams, Andrew Morton, linux-kernel,
	Ingo Molnar, apw

On Thu, 12 Jan 2006 05:35 pm, Martin J. Bligh wrote:
> Martin Bligh wrote:
> >> This is a shot in the dark. We haven't confirmed 1. there is a
> >> problem 2. that this is the problem nor 3. that this patch will fix
> >> the problem. I say we wait for the results of 1. If the improved smp
> >> nice handling patch ends up being responsible then it should not be
> >> merged upstream, and then this patch can be tested on top.
> >>
> >> Martin I know your work move has made it not your responsibility to
> >> test backing out this change, but are you aware of anything being
> >> done to test this hypothesis?
>
> OK, backing out that patch seems to fix it. Thanks Andy ;-)

Yes thanks!

I just noticed the result myself and emailed on another branch of this email 
thread. Gah

Con

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-12  6:41                               ` Con Kolivas
@ 2006-01-12  6:54                                 ` Peter Williams
  0 siblings, 0 replies; 59+ messages in thread
From: Peter Williams @ 2006-01-12  6:54 UTC (permalink / raw)
  To: Con Kolivas
  Cc: Martin J. Bligh, Martin Bligh, Andrew Morton, linux-kernel,
	Ingo Molnar, apw

Con Kolivas wrote:
> On Thu, 12 Jan 2006 05:35 pm, Martin J. Bligh wrote:
> 
>>Martin Bligh wrote:
>>
>>>>This is a shot in the dark. We haven't confirmed 1. there is a
>>>>problem 2. that this is the problem nor 3. that this patch will fix
>>>>the problem. I say we wait for the results of 1. If the improved smp
>>>>nice handling patch ends up being responsible then it should not be
>>>>merged upstream, and then this patch can be tested on top.
>>>>
>>>>Martin I know your work move has made it not your responsibility to
>>>>test backing out this change, but are you aware of anything being
>>>>done to test this hypothesis?
>>
>>OK, backing out that patch seems to fix it. Thanks Andy ;-)
> 
> 
> Yes thanks!
> 
> I just noticed the result myself and emailed on another branch of this email 
> thread. Gah

It would nice to get a run with the patch back in and my fix applied.

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-12  0:54                       ` Peter Williams
  2006-01-12  1:18                         ` Con Kolivas
@ 2006-01-12 18:39                         ` Martin Bligh
  2006-01-12 20:03                           ` Peter Williams
  1 sibling, 1 reply; 59+ messages in thread
From: Martin Bligh @ 2006-01-12 18:39 UTC (permalink / raw)
  To: Peter Williams
  Cc: Con Kolivas, Andrew Morton, linux-kernel, Ingo Molnar, Andy Whitcroft


>>
>> But I was thinking more about the code that (in the original) handled 
>> the case where the number of tasks to be moved was less than 1 but 
>> more than 0 (i.e. the cases where "imbalance" would have been reduced 
>> to zero when divided by SCHED_LOAD_SCALE).  I think that I got that 
>> part wrong and you can end up with a bias load to be moved which is 
>> less than any of the bias_prio values for any queued tasks (in 
>> circumstances where the original code would have rounded up to 1 and 
>> caused a move).  I think that the way to handle this problem is to 
>> replace 1 with "average bias prio" within that logic.  This would 
>> guarantee at least one task with a bias_prio small enough to be moved.
>>
>> I think that this analysis is a strong argument for my original patch 
>> being the cause of the problem so I'll go ahead and generate a fix. 
>> I'll try to have a patch available later this morning.
> 
> 
> Attached is a patch that addresses this problem.  Unlike the description 
> above it does not use "average bias prio" as that solution would be very 
> complicated.  Instead it makes the assumption that NICE_TO_BIAS_PRIO(0) 
> is a "good enough" for this purpose as this is highly likely to be the 
> median bias prio and the median is probably better for this purpose than 
> the average.
> 
> Signed-off-by: Peter Williams <pwil3058@bigpond.com.au>

Doesn't fix the perf issue.

M.


> Peter
> 
> 
> ------------------------------------------------------------------------
> 
> Index: MM-2.6.X/kernel/sched.c
> ===================================================================
> --- MM-2.6.X.orig/kernel/sched.c	2006-01-12 09:23:38.000000000 +1100
> +++ MM-2.6.X/kernel/sched.c	2006-01-12 10:44:50.000000000 +1100
> @@ -2116,11 +2116,11 @@ find_busiest_group(struct sched_domain *
>  				(avg_load - this_load) * this->cpu_power)
>  			/ SCHED_LOAD_SCALE;
>  
> -	if (*imbalance < SCHED_LOAD_SCALE) {
> +	if (*imbalance < NICE_TO_BIAS_PRIO(0) * SCHED_LOAD_SCALE) {
>  		unsigned long pwr_now = 0, pwr_move = 0;
>  		unsigned long tmp;
>  
> -		if (max_load - this_load >= SCHED_LOAD_SCALE*2) {
> +		if (max_load - this_load >= NICE_TO_BIAS_PRIO(0) * SCHED_LOAD_SCALE*2) {
>  			*imbalance = NICE_TO_BIAS_PRIO(0);
>  			return busiest;
>  		}


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-12 18:39                         ` Martin Bligh
@ 2006-01-12 20:03                           ` Peter Williams
  2006-01-12 22:20                             ` Peter Williams
  0 siblings, 1 reply; 59+ messages in thread
From: Peter Williams @ 2006-01-12 20:03 UTC (permalink / raw)
  To: Martin Bligh
  Cc: Con Kolivas, Andrew Morton, linux-kernel, Ingo Molnar, Andy Whitcroft

Martin Bligh wrote:
> 
>>>
>>> But I was thinking more about the code that (in the original) handled 
>>> the case where the number of tasks to be moved was less than 1 but 
>>> more than 0 (i.e. the cases where "imbalance" would have been reduced 
>>> to zero when divided by SCHED_LOAD_SCALE).  I think that I got that 
>>> part wrong and you can end up with a bias load to be moved which is 
>>> less than any of the bias_prio values for any queued tasks (in 
>>> circumstances where the original code would have rounded up to 1 and 
>>> caused a move).  I think that the way to handle this problem is to 
>>> replace 1 with "average bias prio" within that logic.  This would 
>>> guarantee at least one task with a bias_prio small enough to be moved.
>>>
>>> I think that this analysis is a strong argument for my original patch 
>>> being the cause of the problem so I'll go ahead and generate a fix. 
>>> I'll try to have a patch available later this morning.
>>
>>
>>
>> Attached is a patch that addresses this problem.  Unlike the 
>> description above it does not use "average bias prio" as that solution 
>> would be very complicated.  Instead it makes the assumption that 
>> NICE_TO_BIAS_PRIO(0) is a "good enough" for this purpose as this is 
>> highly likely to be the median bias prio and the median is probably 
>> better for this purpose than the average.
>>
>> Signed-off-by: Peter Williams <pwil3058@bigpond.com.au>
> 
> 
> Doesn't fix the perf issue.

OK, thanks.  I think there's a few more places where SCHED_LOAD_SCALE 
needs to be multiplied by NICE_TO_BIAS_PRIO(0).  Basically, anywhere 
that it's added to, subtracted from or compared to a load.  In those 
cases it's being used as a scaled version of 1 and we need a scaled 
version of NICE_TO_BIAS_PRIO(0).  I'll have another patch later today.

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-12 20:03                           ` Peter Williams
@ 2006-01-12 22:20                             ` Peter Williams
  2006-01-13  7:06                               ` Peter Williams
  0 siblings, 1 reply; 59+ messages in thread
From: Peter Williams @ 2006-01-12 22:20 UTC (permalink / raw)
  To: Martin Bligh
  Cc: Con Kolivas, Andrew Morton, linux-kernel, Ingo Molnar, Andy Whitcroft

Peter Williams wrote:
> Martin Bligh wrote:
> 
>>
>>>>
>>>> But I was thinking more about the code that (in the original) 
>>>> handled the case where the number of tasks to be moved was less than 
>>>> 1 but more than 0 (i.e. the cases where "imbalance" would have been 
>>>> reduced to zero when divided by SCHED_LOAD_SCALE).  I think that I 
>>>> got that part wrong and you can end up with a bias load to be moved 
>>>> which is less than any of the bias_prio values for any queued tasks 
>>>> (in circumstances where the original code would have rounded up to 1 
>>>> and caused a move).  I think that the way to handle this problem is 
>>>> to replace 1 with "average bias prio" within that logic.  This would 
>>>> guarantee at least one task with a bias_prio small enough to be moved.
>>>>
>>>> I think that this analysis is a strong argument for my original 
>>>> patch being the cause of the problem so I'll go ahead and generate a 
>>>> fix. I'll try to have a patch available later this morning.
>>>
>>>
>>>
>>>
>>> Attached is a patch that addresses this problem.  Unlike the 
>>> description above it does not use "average bias prio" as that 
>>> solution would be very complicated.  Instead it makes the assumption 
>>> that NICE_TO_BIAS_PRIO(0) is a "good enough" for this purpose as this 
>>> is highly likely to be the median bias prio and the median is 
>>> probably better for this purpose than the average.
>>>
>>> Signed-off-by: Peter Williams <pwil3058@bigpond.com.au>
>>
>>
>>
>> Doesn't fix the perf issue.
> 
> 
> OK, thanks.  I think there's a few more places where SCHED_LOAD_SCALE 
> needs to be multiplied by NICE_TO_BIAS_PRIO(0).  Basically, anywhere 
> that it's added to, subtracted from or compared to a load.  In those 
> cases it's being used as a scaled version of 1 and we need a scaled 

This would have been better said as "the load generated by 1 task" 
rather than just "a scaled version of 1".  Numerically, they're the same 
but one is clearer than the other and makes it more obvious why we need 
NICE_TO_BIAS_PRIO(0) * SCHED_LOAD_SCALE and where we need it.

> version of NICE_TO_BIAS_PRIO(0).  I'll have another patch later today.

I'm just testing this at the moment.

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-12 22:20                             ` Peter Williams
@ 2006-01-13  7:06                               ` Peter Williams
  2006-01-13 12:00                                 ` Peter Williams
                                                   ` (3 more replies)
  0 siblings, 4 replies; 59+ messages in thread
From: Peter Williams @ 2006-01-13  7:06 UTC (permalink / raw)
  To: Con Kolivas
  Cc: Martin Bligh, Andrew Morton, linux-kernel, Ingo Molnar, Andy Whitcroft

[-- Attachment #1: Type: text/plain, Size: 4046 bytes --]

Peter Williams wrote:
> Peter Williams wrote:
> 
>> Martin Bligh wrote:
>>
>>>
>>>>>
>>>>> But I was thinking more about the code that (in the original) 
>>>>> handled the case where the number of tasks to be moved was less 
>>>>> than 1 but more than 0 (i.e. the cases where "imbalance" would have 
>>>>> been reduced to zero when divided by SCHED_LOAD_SCALE).  I think 
>>>>> that I got that part wrong and you can end up with a bias load to 
>>>>> be moved which is less than any of the bias_prio values for any 
>>>>> queued tasks (in circumstances where the original code would have 
>>>>> rounded up to 1 and caused a move).  I think that the way to handle 
>>>>> this problem is to replace 1 with "average bias prio" within that 
>>>>> logic.  This would guarantee at least one task with a bias_prio 
>>>>> small enough to be moved.
>>>>>
>>>>> I think that this analysis is a strong argument for my original 
>>>>> patch being the cause of the problem so I'll go ahead and generate 
>>>>> a fix. I'll try to have a patch available later this morning.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Attached is a patch that addresses this problem.  Unlike the 
>>>> description above it does not use "average bias prio" as that 
>>>> solution would be very complicated.  Instead it makes the assumption 
>>>> that NICE_TO_BIAS_PRIO(0) is a "good enough" for this purpose as 
>>>> this is highly likely to be the median bias prio and the median is 
>>>> probably better for this purpose than the average.
>>>>
>>>> Signed-off-by: Peter Williams <pwil3058@bigpond.com.au>
>>>
>>>
>>>
>>>
>>> Doesn't fix the perf issue.
>>
>>
>>
>> OK, thanks.  I think there's a few more places where SCHED_LOAD_SCALE 
>> needs to be multiplied by NICE_TO_BIAS_PRIO(0).  Basically, anywhere 
>> that it's added to, subtracted from or compared to a load.  In those 
>> cases it's being used as a scaled version of 1 and we need a scaled 
> 
> 
> This would have been better said as "the load generated by 1 task" 
> rather than just "a scaled version of 1".  Numerically, they're the same 
> but one is clearer than the other and makes it more obvious why we need 
> NICE_TO_BIAS_PRIO(0) * SCHED_LOAD_SCALE and where we need it.
> 
>> version of NICE_TO_BIAS_PRIO(0).  I'll have another patch later today.
> 
> 
> I'm just testing this at the moment.

Attached is a new patch to fix the excessive idle problem.  This patch 
takes a new approach to the problem as it was becoming obvious that 
trying to alter the load balancing code to cope with biased load was 
harder than it seemed.

This approach reverts to the old load values but weights them according 
to tasks' bias_prio values.  This means that any assumptions by the load 
balancing code that the load generated by a single task is 
SCHED_LOAD_SCALE will still hold.  Then, in find_busiest_group(), the 
imbalance is scaled back up to bias_prio scale so that move_tasks() can 
move biased load rather than tasks.

One advantage of this is that when there are no non zero niced tasks the 
processing will be mathematically the same as the original code. 
Kernbench results from a 2 CPU Celeron 550Mhz system are:

Average Optimal -j 8 Load Run:
Elapsed Time 1056.16 (0.831102)
User Time 1906.54 (1.38447)
System Time 182.086 (0.973386)
Percent CPU 197 (0)
Context Switches 48727.2 (249.351)
Sleeps 27623.4 (413.913)

This indicates that, on average, 98.9% of the total available CPU was 
used by the build.

Signed-off-by: Peter Williams <pwil3058@bigpond.com.au>

BTW I think that we need to think about a slightly more complex nice to 
bias mapping function.  The current one gives a nice==19 1/20 of the 
bias of a nice=0 task but only gives nice=-20 tasks twice the bias of a 
nice=0 task.  I don't think this is a big problem as the majority of non 
nice==0 tasks will have positive nice but should be looked at for a 
future enhancement.

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce

[-- Attachment #2: different-approach-to-smp-nice-problem --]
[-- Type: text/plain, Size: 1832 bytes --]

Index: MM-2.6.X/kernel/sched.c
===================================================================
--- MM-2.6.X.orig/kernel/sched.c	2006-01-13 14:53:34.000000000 +1100
+++ MM-2.6.X/kernel/sched.c	2006-01-13 15:11:19.000000000 +1100
@@ -1042,7 +1042,8 @@ void kick_process(task_t *p)
 static unsigned long source_load(int cpu, int type)
 {
 	runqueue_t *rq = cpu_rq(cpu);
-	unsigned long load_now = rq->prio_bias * SCHED_LOAD_SCALE;
+	unsigned long load_now = (rq->prio_bias * SCHED_LOAD_SCALE) /
+		NICE_TO_BIAS_PRIO(0);
 
 	if (type == 0)
 		return load_now;
@@ -1056,7 +1057,8 @@ static unsigned long source_load(int cpu
 static inline unsigned long target_load(int cpu, int type)
 {
 	runqueue_t *rq = cpu_rq(cpu);
-	unsigned long load_now = rq->prio_bias * SCHED_LOAD_SCALE;
+	unsigned long load_now = (rq->prio_bias * SCHED_LOAD_SCALE) /
+		NICE_TO_BIAS_PRIO(0);
 
 	if (type == 0)
 		return load_now;
@@ -1322,7 +1324,8 @@ static int try_to_wake_up(task_t *p, uns
 			 * of the current CPU:
 			 */
 			if (sync)
-				tl -= p->bias_prio * SCHED_LOAD_SCALE;
+				tl -= (p->bias_prio * SCHED_LOAD_SCALE) /
+					NICE_TO_BIAS_PRIO(0);
 
 			if ((tl <= load &&
 				tl + target_load(cpu, idx) <= SCHED_LOAD_SCALE) ||
@@ -2159,7 +2162,7 @@ find_busiest_group(struct sched_domain *
 	}
 
 	/* Get rid of the scaling factor, rounding down as we divide */
-	*imbalance = *imbalance / SCHED_LOAD_SCALE;
+	*imbalance = (*imbalance * NICE_TO_BIAS_PRIO(0)) / SCHED_LOAD_SCALE;
 	return busiest;
 
 out_balanced:
@@ -2472,7 +2475,8 @@ static void rebalance_tick(int this_cpu,
 	struct sched_domain *sd;
 	int i;
 
-	this_load = this_rq->prio_bias * SCHED_LOAD_SCALE;
+	this_load = (this_rq->prio_bias * SCHED_LOAD_SCALE) /
+		NICE_TO_BIAS_PRIO(0);
 	/* Update our load */
 	for (i = 0; i < 3; i++) {
 		unsigned long new_load = this_load;

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-13  7:06                               ` Peter Williams
@ 2006-01-13 12:00                                 ` Peter Williams
  2006-01-13 16:15                                 ` Martin J. Bligh
                                                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 59+ messages in thread
From: Peter Williams @ 2006-01-13 12:00 UTC (permalink / raw)
  To: Peter Williams
  Cc: Con Kolivas, Martin Bligh, Andrew Morton, linux-kernel,
	Ingo Molnar, Andy Whitcroft

Peter Williams wrote:
> Peter Williams wrote:
> 
>> Peter Williams wrote:
>>
>>> Martin Bligh wrote:
>>>
>>>>
>>>>>>
>>>>>> But I was thinking more about the code that (in the original) 
>>>>>> handled the case where the number of tasks to be moved was less 
>>>>>> than 1 but more than 0 (i.e. the cases where "imbalance" would 
>>>>>> have been reduced to zero when divided by SCHED_LOAD_SCALE).  I 
>>>>>> think that I got that part wrong and you can end up with a bias 
>>>>>> load to be moved which is less than any of the bias_prio values 
>>>>>> for any queued tasks (in circumstances where the original code 
>>>>>> would have rounded up to 1 and caused a move).  I think that the 
>>>>>> way to handle this problem is to replace 1 with "average bias 
>>>>>> prio" within that logic.  This would guarantee at least one task 
>>>>>> with a bias_prio small enough to be moved.
>>>>>>
>>>>>> I think that this analysis is a strong argument for my original 
>>>>>> patch being the cause of the problem so I'll go ahead and generate 
>>>>>> a fix. I'll try to have a patch available later this morning.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Attached is a patch that addresses this problem.  Unlike the 
>>>>> description above it does not use "average bias prio" as that 
>>>>> solution would be very complicated.  Instead it makes the 
>>>>> assumption that NICE_TO_BIAS_PRIO(0) is a "good enough" for this 
>>>>> purpose as this is highly likely to be the median bias prio and the 
>>>>> median is probably better for this purpose than the average.
>>>>>
>>>>> Signed-off-by: Peter Williams <pwil3058@bigpond.com.au>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Doesn't fix the perf issue.
>>>
>>>
>>>
>>>
>>> OK, thanks.  I think there's a few more places where SCHED_LOAD_SCALE 
>>> needs to be multiplied by NICE_TO_BIAS_PRIO(0).  Basically, anywhere 
>>> that it's added to, subtracted from or compared to a load.  In those 
>>> cases it's being used as a scaled version of 1 and we need a scaled 
>>
>>
>>
>> This would have been better said as "the load generated by 1 task" 
>> rather than just "a scaled version of 1".  Numerically, they're the 
>> same but one is clearer than the other and makes it more obvious why 
>> we need NICE_TO_BIAS_PRIO(0) * SCHED_LOAD_SCALE and where we need it.
>>
>>> version of NICE_TO_BIAS_PRIO(0).  I'll have another patch later today.
>>
>>
>>
>> I'm just testing this at the moment.
> 
> 
> Attached is a new patch to fix the excessive idle problem.  This patch 
> takes a new approach to the problem as it was becoming obvious that 
> trying to alter the load balancing code to cope with biased load was 
> harder than it seemed.
> 
> This approach reverts to the old load values but weights them according 
> to tasks' bias_prio values.  This means that any assumptions by the load 
> balancing code that the load generated by a single task is 
> SCHED_LOAD_SCALE will still hold.  Then, in find_busiest_group(), the 
> imbalance is scaled back up to bias_prio scale so that move_tasks() can 
> move biased load rather than tasks.
> 
> One advantage of this is that when there are no non zero niced tasks the 
> processing will be mathematically the same as the original code. 
> Kernbench results from a 2 CPU Celeron 550Mhz system are:
> 
> Average Optimal -j 8 Load Run:
> Elapsed Time 1056.16 (0.831102)
> User Time 1906.54 (1.38447)
> System Time 182.086 (0.973386)
> Percent CPU 197 (0)
> Context Switches 48727.2 (249.351)
> Sleeps 27623.4 (413.913)
> 
> This indicates that, on average, 98.9% of the total available CPU was 
> used by the build.

Here's the numbers for the same machine with the "improved smp nice 
handling" completely removed i.e. back to 2.6.15 version.

Average Optimal -j 8 Load Run:
Elapsed Time 1059.95 (1.19324)
User Time 1914.94 (1.11102)
System Time 181.846 (0.916695)
Percent CPU 197.4 (0.547723)
Context Switches 40917.4 (469.184)
Sleeps 26654 (320.824)

> 
> Signed-off-by: Peter Williams <pwil3058@bigpond.com.au>
> 
> BTW I think that we need to think about a slightly more complex nice to 
> bias mapping function.  The current one gives a nice==19 1/20 of the 
> bias of a nice=0 task but only gives nice=-20 tasks twice the bias of a 
> nice=0 task.  I don't think this is a big problem as the majority of non 
> nice==0 tasks will have positive nice but should be looked at for a 
> future enhancement.
> 
> Peter
> 
> 
> ------------------------------------------------------------------------
> 
> Index: MM-2.6.X/kernel/sched.c
> ===================================================================
> --- MM-2.6.X.orig/kernel/sched.c	2006-01-13 14:53:34.000000000 +1100
> +++ MM-2.6.X/kernel/sched.c	2006-01-13 15:11:19.000000000 +1100
> @@ -1042,7 +1042,8 @@ void kick_process(task_t *p)
>  static unsigned long source_load(int cpu, int type)
>  {
>  	runqueue_t *rq = cpu_rq(cpu);
> -	unsigned long load_now = rq->prio_bias * SCHED_LOAD_SCALE;
> +	unsigned long load_now = (rq->prio_bias * SCHED_LOAD_SCALE) /
> +		NICE_TO_BIAS_PRIO(0);
>  
>  	if (type == 0)
>  		return load_now;
> @@ -1056,7 +1057,8 @@ static unsigned long source_load(int cpu
>  static inline unsigned long target_load(int cpu, int type)
>  {
>  	runqueue_t *rq = cpu_rq(cpu);
> -	unsigned long load_now = rq->prio_bias * SCHED_LOAD_SCALE;
> +	unsigned long load_now = (rq->prio_bias * SCHED_LOAD_SCALE) /
> +		NICE_TO_BIAS_PRIO(0);
>  
>  	if (type == 0)
>  		return load_now;
> @@ -1322,7 +1324,8 @@ static int try_to_wake_up(task_t *p, uns
>  			 * of the current CPU:
>  			 */
>  			if (sync)
> -				tl -= p->bias_prio * SCHED_LOAD_SCALE;
> +				tl -= (p->bias_prio * SCHED_LOAD_SCALE) /
> +					NICE_TO_BIAS_PRIO(0);
>  
>  			if ((tl <= load &&
>  				tl + target_load(cpu, idx) <= SCHED_LOAD_SCALE) ||
> @@ -2159,7 +2162,7 @@ find_busiest_group(struct sched_domain *
>  	}
>  
>  	/* Get rid of the scaling factor, rounding down as we divide */
> -	*imbalance = *imbalance / SCHED_LOAD_SCALE;
> +	*imbalance = (*imbalance * NICE_TO_BIAS_PRIO(0)) / SCHED_LOAD_SCALE;
>  	return busiest;
>  
>  out_balanced:
> @@ -2472,7 +2475,8 @@ static void rebalance_tick(int this_cpu,
>  	struct sched_domain *sd;
>  	int i;
>  
> -	this_load = this_rq->prio_bias * SCHED_LOAD_SCALE;
> +	this_load = (this_rq->prio_bias * SCHED_LOAD_SCALE) /
> +		NICE_TO_BIAS_PRIO(0);
>  	/* Update our load */
>  	for (i = 0; i < 3; i++) {
>  		unsigned long new_load = this_load;


-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-13  7:06                               ` Peter Williams
  2006-01-13 12:00                                 ` Peter Williams
@ 2006-01-13 16:15                                 ` Martin J. Bligh
  2006-01-13 16:26                                 ` Andy Whitcroft
  2006-01-14 18:48                                 ` Martin J. Bligh
  3 siblings, 0 replies; 59+ messages in thread
From: Martin J. Bligh @ 2006-01-13 16:15 UTC (permalink / raw)
  To: Peter Williams
  Cc: Con Kolivas, Andrew Morton, linux-kernel, Ingo Molnar, Andy Whitcroft


>
> Attached is a new patch to fix the excessive idle problem.  This patch 
> takes a new approach to the problem as it was becoming obvious that 
> trying to alter the load balancing code to cope with biased load was 
> harder than it seemed.
>

OK, well the last one didn't work.
(peterw-fix-excessive-idling-with-smp-move-load-not-tasks)

If Andy is feeling very kind, perhaps he will kick this one too.
This one is for real, right? ;-)

M.


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-13  7:06                               ` Peter Williams
  2006-01-13 12:00                                 ` Peter Williams
  2006-01-13 16:15                                 ` Martin J. Bligh
@ 2006-01-13 16:26                                 ` Andy Whitcroft
  2006-01-13 17:54                                   ` Andy Whitcroft
  2006-01-14 18:48                                 ` Martin J. Bligh
  3 siblings, 1 reply; 59+ messages in thread
From: Andy Whitcroft @ 2006-01-13 16:26 UTC (permalink / raw)
  To: Peter Williams
  Cc: Con Kolivas, Martin Bligh, Andrew Morton, linux-kernel, Ingo Molnar

Peter Williams wrote:

> Attached is a new patch to fix the excessive idle problem.  This patch
> takes a new approach to the problem as it was becoming obvious that
> trying to alter the load balancing code to cope with biased load was
> harder than it seemed.

Ok.  Tried testing different-approach-to-smp-nice-problem against the
transition release 2.6.14-rc2-mm1 but it doesn't apply.  Am testing
against 2.6.15-mm3 right now.  Will let you know.

-apw

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-13 16:26                                 ` Andy Whitcroft
@ 2006-01-13 17:54                                   ` Andy Whitcroft
  2006-01-13 20:41                                     ` Martin Bligh
  2006-01-13 22:59                                     ` Peter Williams
  0 siblings, 2 replies; 59+ messages in thread
From: Andy Whitcroft @ 2006-01-13 17:54 UTC (permalink / raw)
  To: Peter Williams
  Cc: Andy Whitcroft, Con Kolivas, Martin Bligh, Andrew Morton,
	linux-kernel, Ingo Molnar

Andy Whitcroft wrote:
> Peter Williams wrote:
> 
> 
>>Attached is a new patch to fix the excessive idle problem.  This patch
>>takes a new approach to the problem as it was becoming obvious that
>>trying to alter the load balancing code to cope with biased load was
>>harder than it seemed.
> 
> 
> Ok.  Tried testing different-approach-to-smp-nice-problem against the
> transition release 2.6.14-rc2-mm1 but it doesn't apply.  Am testing
> against 2.6.15-mm3 right now.  Will let you know.

Doesn't appear to help if I am analysing the graphs right.  Martin?

Also tried to back out the original patch against the 2.6.15-mm3 tree
but that also won't apply (2 rejects).  Peter?

-apw

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-13 17:54                                   ` Andy Whitcroft
@ 2006-01-13 20:41                                     ` Martin Bligh
  2006-01-14  0:23                                       ` Peter Williams
  2006-01-13 22:59                                     ` Peter Williams
  1 sibling, 1 reply; 59+ messages in thread
From: Martin Bligh @ 2006-01-13 20:41 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: Peter Williams, Con Kolivas, Andrew Morton, linux-kernel, Ingo Molnar

Andy Whitcroft wrote:
> Andy Whitcroft wrote:
> 
>>Peter Williams wrote:
>>
>>
>>
>>>Attached is a new patch to fix the excessive idle problem.  This patch
>>>takes a new approach to the problem as it was becoming obvious that
>>>trying to alter the load balancing code to cope with biased load was
>>>harder than it seemed.
>>
>>
>>Ok.  Tried testing different-approach-to-smp-nice-problem against the
>>transition release 2.6.14-rc2-mm1 but it doesn't apply.  Am testing
>>against 2.6.15-mm3 right now.  Will let you know.
> 
> 
> Doesn't appear to help if I am analysing the graphs right.  Martin?

Nope. still broken.

> Also tried to back out the original patch against the 2.6.15-mm3 tree
> but that also won't apply (2 rejects).  Peter?
> 
> -apw


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-13 17:54                                   ` Andy Whitcroft
  2006-01-13 20:41                                     ` Martin Bligh
@ 2006-01-13 22:59                                     ` Peter Williams
  1 sibling, 0 replies; 59+ messages in thread
From: Peter Williams @ 2006-01-13 22:59 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: Con Kolivas, Martin Bligh, Andrew Morton, linux-kernel, Ingo Molnar

[-- Attachment #1: Type: text/plain, Size: 963 bytes --]

Andy Whitcroft wrote:
> Andy Whitcroft wrote:
> 
>>Peter Williams wrote:
>>
>>
>>
>>>Attached is a new patch to fix the excessive idle problem.  This patch
>>>takes a new approach to the problem as it was becoming obvious that
>>>trying to alter the load balancing code to cope with biased load was
>>>harder than it seemed.
>>
>>
>>Ok.  Tried testing different-approach-to-smp-nice-problem against the
>>transition release 2.6.14-rc2-mm1 but it doesn't apply.  Am testing
>>against 2.6.15-mm3 right now.  Will let you know.
> 
> 
> Doesn't appear to help if I am analysing the graphs right.  Martin?
> 
> Also tried to back out the original patch against the 2.6.15-mm3 tree
> but that also won't apply (2 rejects).  Peter?

Attached is a patch to back the original patch out of 2.6.15-mm3.

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce

[-- Attachment #2: remove-smp-nice-imp-patch --]
[-- Type: text/plain, Size: 11133 bytes --]

Index: MM-2.6.X/kernel/sched.c
===================================================================
--- MM-2.6.X.orig/kernel/sched.c	2006-01-13 18:09:49.000000000 +1100
+++ MM-2.6.X/kernel/sched.c	2006-01-13 18:21:38.000000000 +1100
@@ -63,16 +63,6 @@
 #define PRIO_TO_NICE(prio)	((prio) - MAX_RT_PRIO - 20)
 #define TASK_NICE(p)		PRIO_TO_NICE((p)->static_prio)
 
-#ifdef CONFIG_SMP
-/*
- * Priority bias for load balancing ranges from 1 (nice==19) to 139 (RT
- * priority of 100).
- */
-#define NICE_TO_BIAS_PRIO(nice)	(20 - (nice))
-#define PRIO_TO_BIAS_PRIO(prio)	NICE_TO_BIAS_PRIO(PRIO_TO_NICE(prio))
-#define RTPRIO_TO_BIAS_PRIO(rp)	(40 + (rp))
-#endif
-
 /*
  * 'User priority' is the nice value converted to something we
  * can work with better when scaling various scheduler parameters,
@@ -695,53 +685,46 @@ static int effective_prio(task_t *p)
 }
 
 #ifdef CONFIG_SMP
-static inline void set_bias_prio(task_t *p)
-{
-	if (rt_task(p)) {
-		if (p == task_rq(p)->migration_thread)
-			/*
-			 * The migration thread does the actual balancing. Do
-			 * not bias by its priority as the ultra high priority
-			 * will skew balancing adversely.
-			 */
-			p->bias_prio = 0;
-		else
-			p->bias_prio = RTPRIO_TO_BIAS_PRIO(p->rt_priority);
-	} else
-		p->bias_prio = PRIO_TO_BIAS_PRIO(p->static_prio);
-}
-
-static inline void inc_prio_bias(runqueue_t *rq, const task_t *p)
+static inline void inc_prio_bias(runqueue_t *rq, int prio)
 {
-	rq->prio_bias += p->bias_prio;
+	rq->prio_bias += MAX_PRIO - prio;
 }
 
-static inline void dec_prio_bias(runqueue_t *rq, const task_t *p)
+static inline void dec_prio_bias(runqueue_t *rq, int prio)
 {
-	rq->prio_bias -= p->bias_prio;
+	rq->prio_bias -= MAX_PRIO - prio;
 }
 
 static inline void inc_nr_running(task_t *p, runqueue_t *rq)
 {
 	rq->nr_running++;
-	inc_prio_bias(rq, p);
+	if (rt_task(p)) {
+		if (p != rq->migration_thread)
+			/*
+			 * The migration thread does the actual balancing. Do
+			 * not bias by its priority as the ultra high priority
+			 * will skew balancing adversely.
+			 */
+			inc_prio_bias(rq, p->prio);
+	} else
+		inc_prio_bias(rq, p->static_prio);
 }
 
 static inline void dec_nr_running(task_t *p, runqueue_t *rq)
 {
 	rq->nr_running--;
-	dec_prio_bias(rq, p);
+	if (rt_task(p)) {
+		if (p != rq->migration_thread)
+			dec_prio_bias(rq, p->prio);
+	} else
+		dec_prio_bias(rq, p->static_prio);
 }
 #else
-static inline void set_bias_prio(task_t *p)
-{
-}
-
-static inline void inc_prio_bias(runqueue_t *rq, const task_t *p)
+static inline void inc_prio_bias(runqueue_t *rq, int prio)
 {
 }
 
-static inline void dec_prio_bias(runqueue_t *rq, const task_t *p)
+static inline void dec_prio_bias(runqueue_t *rq, int prio)
 {
 }
 
@@ -1039,29 +1022,61 @@ void kick_process(task_t *p)
  * We want to under-estimate the load of migration sources, to
  * balance conservatively.
  */
-static unsigned long source_load(int cpu, int type)
+static inline unsigned long __source_load(int cpu, int type, enum idle_type idle)
 {
 	runqueue_t *rq = cpu_rq(cpu);
-	unsigned long load_now = rq->prio_bias * SCHED_LOAD_SCALE;
+	unsigned long running = rq->nr_running;
+	unsigned long source_load, cpu_load = rq->cpu_load[type-1],
+		load_now = running * SCHED_LOAD_SCALE;
 
 	if (type == 0)
-		return load_now;
+		source_load = load_now;
+	else
+		source_load = min(cpu_load, load_now);
+
+	if (running > 1 || (idle == NOT_IDLE && running))
+		/*
+		 * If we are busy rebalancing the load is biased by
+		 * priority to create 'nice' support across cpus. When
+		 * idle rebalancing we should only bias the source_load if
+		 * there is more than one task running on that queue to
+		 * prevent idle rebalance from trying to pull tasks from a
+		 * queue with only one running task.
+		 */
+		source_load = source_load * rq->prio_bias / running;
+
+	return source_load;
+}
 
-	return min(rq->cpu_load[type-1], load_now);
+static inline unsigned long source_load(int cpu, int type)
+{
+	return __source_load(cpu, type, NOT_IDLE);
 }
 
 /*
  * Return a high guess at the load of a migration-target cpu
  */
-static inline unsigned long target_load(int cpu, int type)
+static inline unsigned long __target_load(int cpu, int type, enum idle_type idle)
 {
 	runqueue_t *rq = cpu_rq(cpu);
-	unsigned long load_now = rq->prio_bias * SCHED_LOAD_SCALE;
+	unsigned long running = rq->nr_running;
+	unsigned long target_load, cpu_load = rq->cpu_load[type-1],
+		load_now = running * SCHED_LOAD_SCALE;
 
 	if (type == 0)
-		return load_now;
+		target_load = load_now;
+	else
+		target_load = max(cpu_load, load_now);
+
+	if (running > 1 || (idle == NOT_IDLE && running))
+		target_load = target_load * rq->prio_bias / running;
+
+	return target_load;
+}
 
-	return max(rq->cpu_load[type-1], load_now);
+static inline unsigned long target_load(int cpu, int type)
+{
+	return __target_load(cpu, type, NOT_IDLE);
 }
 
 /*
@@ -1322,7 +1337,7 @@ static int try_to_wake_up(task_t *p, uns
 			 * of the current CPU:
 			 */
 			if (sync)
-				tl -= p->bias_prio * SCHED_LOAD_SCALE;
+				tl -= SCHED_LOAD_SCALE;
 
 			if ((tl <= load &&
 				tl + target_load(cpu, idx) <= SCHED_LOAD_SCALE) ||
@@ -1934,16 +1949,15 @@ int can_migrate_task(task_t *p, runqueue
  * Called with both runqueues locked.
  */
 static int move_tasks(runqueue_t *this_rq, int this_cpu, runqueue_t *busiest,
-		      unsigned long max_nr_move, long max_bias_move,
-		      struct sched_domain *sd, enum idle_type idle,
-		      int *all_pinned)
+		      unsigned long max_nr_move, struct sched_domain *sd,
+		      enum idle_type idle, int *all_pinned)
 {
 	prio_array_t *array, *dst_array;
 	struct list_head *head, *curr;
 	int idx, pulled = 0, pinned = 0;
 	task_t *tmp;
 
-	if (max_nr_move == 0 || max_bias_move == 0)
+	if (max_nr_move == 0)
 		goto out;
 
 	pinned = 1;
@@ -1986,8 +2000,7 @@ skip_queue:
 
 	curr = curr->prev;
 
-	if (tmp->bias_prio > max_bias_move ||
-	    !can_migrate_task(tmp, busiest, this_cpu, sd, idle, &pinned)) {
+	if (!can_migrate_task(tmp, busiest, this_cpu, sd, idle, &pinned)) {
 		if (curr != head)
 			goto skip_queue;
 		idx++;
@@ -2001,13 +2014,9 @@ skip_queue:
 
 	pull_task(busiest, array, tmp, this_rq, dst_array, this_cpu);
 	pulled++;
-	max_bias_move -= tmp->bias_prio;
 
-	/*
-	 * We only want to steal up to the prescribed number of tasks
-	 * and the prescribed amount of biased load.
-	 */
-	if (pulled < max_nr_move && max_bias_move > 0) {
+	/* We only want to steal up to the prescribed number of tasks. */
+	if (pulled < max_nr_move) {
 		if (curr != head)
 			goto skip_queue;
 		idx++;
@@ -2028,7 +2037,7 @@ out:
 
 /*
  * find_busiest_group finds and returns the busiest CPU group within the
- * domain. It calculates and returns the amount of biased load which should be
+ * domain. It calculates and returns the number of tasks which should be
  * moved to restore balance via the imbalance parameter.
  */
 static struct sched_group *
@@ -2064,9 +2073,9 @@ find_busiest_group(struct sched_domain *
 
 			/* Bias balancing toward cpus of our domain */
 			if (local_group)
-				load = target_load(i, load_idx);
+				load = __target_load(i, load_idx, idle);
 			else
-				load = source_load(i, load_idx);
+				load = __source_load(i, load_idx, idle);
 
 			avg_load += load;
 		}
@@ -2121,7 +2130,7 @@ find_busiest_group(struct sched_domain *
 		unsigned long tmp;
 
 		if (max_load - this_load >= SCHED_LOAD_SCALE*2) {
-			*imbalance = NICE_TO_BIAS_PRIO(0);
+			*imbalance = 1;
 			return busiest;
 		}
 
@@ -2154,7 +2163,7 @@ find_busiest_group(struct sched_domain *
 		if (pwr_move <= pwr_now)
 			goto out_balanced;
 
-		*imbalance = NICE_TO_BIAS_PRIO(0);
+		*imbalance = 1;
 		return busiest;
 	}
 
@@ -2171,14 +2180,15 @@ out_balanced:
 /*
  * find_busiest_queue - find the busiest runqueue among the cpus in group.
  */
-static runqueue_t *find_busiest_queue(struct sched_group *group)
+static runqueue_t *find_busiest_queue(struct sched_group *group,
+	enum idle_type idle)
 {
 	unsigned long load, max_load = 0;
 	runqueue_t *busiest = NULL;
 	int i;
 
 	for_each_cpu_mask(i, group->cpumask) {
-		load = source_load(i, 0);
+		load = __source_load(i, 0, idle);
 
 		if (load > max_load) {
 			max_load = load;
@@ -2195,7 +2205,6 @@ static runqueue_t *find_busiest_queue(st
  */
 #define MAX_PINNED_INTERVAL	512
 
-#define minus_1_or_zero(n) ((n) > 0 ? (n) - 1 : 0)
 /*
  * Check this_cpu to ensure it is balanced within domain. Attempt to move
  * tasks if there is an imbalance.
@@ -2223,7 +2232,7 @@ static int load_balance(int this_cpu, ru
 		goto out_balanced;
 	}
 
-	busiest = find_busiest_queue(group);
+	busiest = find_busiest_queue(group, idle);
 	if (!busiest) {
 		schedstat_inc(sd, lb_nobusyq[idle]);
 		goto out_balanced;
@@ -2243,7 +2252,6 @@ static int load_balance(int this_cpu, ru
 		 */
 		double_rq_lock(this_rq, busiest);
 		nr_moved = move_tasks(this_rq, this_cpu, busiest,
-					minus_1_or_zero(busiest->nr_running),
 					imbalance, sd, idle, &all_pinned);
 		double_rq_unlock(this_rq, busiest);
 
@@ -2347,7 +2355,7 @@ static int load_balance_newidle(int this
 		goto out_balanced;
 	}
 
-	busiest = find_busiest_queue(group);
+	busiest = find_busiest_queue(group, NEWLY_IDLE);
 	if (!busiest) {
 		schedstat_inc(sd, lb_nobusyq[NEWLY_IDLE]);
 		goto out_balanced;
@@ -2362,7 +2370,6 @@ static int load_balance_newidle(int this
 		/* Attempt to move tasks */
 		double_lock_balance(this_rq, busiest);
 		nr_moved = move_tasks(this_rq, this_cpu, busiest,
-					minus_1_or_zero(busiest->nr_running),
 					imbalance, sd, NEWLY_IDLE, NULL);
 		spin_unlock(&busiest->lock);
 	}
@@ -2443,8 +2450,7 @@ static void active_load_balance(runqueue
 
 	schedstat_inc(sd, alb_cnt);
 
-	if (move_tasks(target_rq, target_cpu, busiest_rq, 1,
-			RTPRIO_TO_BIAS_PRIO(100), sd, SCHED_IDLE, NULL))
+	if (move_tasks(target_rq, target_cpu, busiest_rq, 1, sd, SCHED_IDLE, NULL))
 		schedstat_inc(sd, alb_pushed);
 	else
 		schedstat_inc(sd, alb_failed);
@@ -2472,7 +2478,7 @@ static void rebalance_tick(int this_cpu,
 	struct sched_domain *sd;
 	int i;
 
-	this_load = this_rq->prio_bias * SCHED_LOAD_SCALE;
+	this_load = this_rq->nr_running * SCHED_LOAD_SCALE;
 	/* Update our load */
 	for (i = 0; i < 3; i++) {
 		unsigned long new_load = this_load;
@@ -3612,19 +3618,18 @@ void set_user_nice(task_t *p, long nice)
 	array = p->array;
 	if (array) {
 		dequeue_task(p, array);
-		dec_prio_bias(rq, p);
+		dec_prio_bias(rq, p->static_prio);
 	}
 
 	old_prio = p->prio;
 	new_prio = NICE_TO_PRIO(nice);
 	delta = new_prio - old_prio;
 	p->static_prio = NICE_TO_PRIO(nice);
-	set_bias_prio(p);
 	p->prio += delta;
 
 	if (array) {
 		enqueue_task(p, array);
-		inc_prio_bias(rq, p);
+		inc_prio_bias(rq, p->static_prio);
 		/*
 		 * If the task increased its priority or is running and
 		 * lowered its priority, then reschedule its CPU:
@@ -3767,7 +3772,6 @@ static void __setscheduler(struct task_s
 		if (policy == SCHED_BATCH)
 			p->sleep_avg = 0;
 	}
-	set_bias_prio(p);
 }
 
 /**
@@ -6309,7 +6313,6 @@ void __init sched_init(void)
 		}
 	}
 
-	set_bias_prio(&init_task);
 	/*
 	 * The boot idle thread does lazy MMU switching as well:
 	 */

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-13 20:41                                     ` Martin Bligh
@ 2006-01-14  0:23                                       ` Peter Williams
  2006-01-14  5:03                                         ` Nick Piggin
  0 siblings, 1 reply; 59+ messages in thread
From: Peter Williams @ 2006-01-14  0:23 UTC (permalink / raw)
  To: Martin Bligh
  Cc: Andy Whitcroft, Con Kolivas, Andrew Morton, linux-kernel, Ingo Molnar

Martin Bligh wrote:
> Andy Whitcroft wrote:
> 
>> Andy Whitcroft wrote:
>>
>>> Peter Williams wrote:
>>>
>>>
>>>
>>>> Attached is a new patch to fix the excessive idle problem.  This patch
>>>> takes a new approach to the problem as it was becoming obvious that
>>>> trying to alter the load balancing code to cope with biased load was
>>>> harder than it seemed.
>>>
>>>
>>>
>>> Ok.  Tried testing different-approach-to-smp-nice-problem against the
>>> transition release 2.6.14-rc2-mm1 but it doesn't apply.  Am testing
>>> against 2.6.15-mm3 right now.  Will let you know.
>>
>>
>>
>> Doesn't appear to help if I am analysing the graphs right.  Martin?
> 
> 
> Nope. still broken.

Interesting.  The only real difference between this and Con's original 
patch is the stuff that he did in source_load() and target_load() to 
nobble the bias when nr_running is 1 or less.  With this new model it 
should be possible to do something similar in those functions but I'll 
hold off doing anything until a comparison against 2.6.15-mm3 with the 
patch removed is available (as there are other scheduler changes in -mm3).

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-14  0:23                                       ` Peter Williams
@ 2006-01-14  5:03                                         ` Nick Piggin
  2006-01-14  5:40                                           ` Con Kolivas
  2006-01-14  5:53                                           ` Peter Williams
  0 siblings, 2 replies; 59+ messages in thread
From: Nick Piggin @ 2006-01-14  5:03 UTC (permalink / raw)
  To: Peter Williams
  Cc: Martin Bligh, Andy Whitcroft, Con Kolivas, Andrew Morton,
	linux-kernel, Ingo Molnar

Peter Williams wrote:
> Martin Bligh wrote:
> 
>> Andy Whitcroft wrote:
>>
>>> Andy Whitcroft wrote:
>>>
>>>> Peter Williams wrote:
>>>>
>>>>
>>>>
>>>>> Attached is a new patch to fix the excessive idle problem.  This patch
>>>>> takes a new approach to the problem as it was becoming obvious that
>>>>> trying to alter the load balancing code to cope with biased load was
>>>>> harder than it seemed.
>>>>
>>>>
>>>>
>>>>
>>>> Ok.  Tried testing different-approach-to-smp-nice-problem against the
>>>> transition release 2.6.14-rc2-mm1 but it doesn't apply.  Am testing
>>>> against 2.6.15-mm3 right now.  Will let you know.
>>>
>>>
>>>
>>>
>>> Doesn't appear to help if I am analysing the graphs right.  Martin?
>>
>>
>>
>> Nope. still broken.
> 
> 
> Interesting.  The only real difference between this and Con's original 
> patch is the stuff that he did in source_load() and target_load() to 
> nobble the bias when nr_running is 1 or less.  With this new model it 
> should be possible to do something similar in those functions but I'll 
> hold off doing anything until a comparison against 2.6.15-mm3 with the 
> patch removed is available (as there are other scheduler changes in -mm3).
> 

Ideally, balancing should be completely unaffected when all tasks are
of priority 0 which is what I thought yours did, and why I think the
current system is not great.

I'll probably end up taking a look at it one day, if it doesn't get fixed.
I think your patch is pretty close but I didn't quite look close enough to
work out what's going wrong.

-- 
SUSE Labs, Novell Inc.

Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-14  5:03                                         ` Nick Piggin
@ 2006-01-14  5:40                                           ` Con Kolivas
  2006-01-14  6:05                                             ` Nick Piggin
  2006-01-14  5:53                                           ` Peter Williams
  1 sibling, 1 reply; 59+ messages in thread
From: Con Kolivas @ 2006-01-14  5:40 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Peter Williams, Martin Bligh, Andy Whitcroft, Andrew Morton,
	linux-kernel, Ingo Molnar

On Saturday 14 January 2006 16:03, Nick Piggin wrote:
> Ideally, balancing should be completely unaffected when all tasks are
> of priority 0 which is what I thought yours did, and why I think the
> current system is not great.

The current smp nice in mainline 2.6.15 is performance neutral when all are 
the same priority. It's only this improved version in -mm that is not.

Con

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-14  5:03                                         ` Nick Piggin
  2006-01-14  5:40                                           ` Con Kolivas
@ 2006-01-14  5:53                                           ` Peter Williams
  2006-01-14  6:13                                             ` Nick Piggin
  1 sibling, 1 reply; 59+ messages in thread
From: Peter Williams @ 2006-01-14  5:53 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Martin Bligh, Andy Whitcroft, Con Kolivas, Andrew Morton,
	linux-kernel, Ingo Molnar

Nick Piggin wrote:
> Peter Williams wrote:
> 
>> Martin Bligh wrote:
>>
>>> Andy Whitcroft wrote:
>>>
>>>> Andy Whitcroft wrote:
>>>>
>>>>> Peter Williams wrote:
>>>>>
>>>>>
>>>>>
>>>>>> Attached is a new patch to fix the excessive idle problem.  This 
>>>>>> patch
>>>>>> takes a new approach to the problem as it was becoming obvious that
>>>>>> trying to alter the load balancing code to cope with biased load was
>>>>>> harder than it seemed.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Ok.  Tried testing different-approach-to-smp-nice-problem against the
>>>>> transition release 2.6.14-rc2-mm1 but it doesn't apply.  Am testing
>>>>> against 2.6.15-mm3 right now.  Will let you know.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Doesn't appear to help if I am analysing the graphs right.  Martin?
>>>
>>>
>>>
>>>
>>> Nope. still broken.
>>
>>
>>
>> Interesting.  The only real difference between this and Con's original 
>> patch is the stuff that he did in source_load() and target_load() to 
>> nobble the bias when nr_running is 1 or less.  With this new model it 
>> should be possible to do something similar in those functions but I'll 
>> hold off doing anything until a comparison against 2.6.15-mm3 with the 
>> patch removed is available (as there are other scheduler changes in 
>> -mm3).
>>
> 
> Ideally, balancing should be completely unaffected when all tasks are
> of priority 0 which is what I thought yours did, and why I think the
> current system is not great.

This latest change should make that happen as the weights for nice=0 
tasks is unity.

> 
> I'll probably end up taking a look at it one day, if it doesn't get fixed.
> I think your patch is pretty close

I thought that it was there :-).

I figured using the weights (which go away for nice=0 tasks) would make 
it behave nicely with the rest of the load balancing code.

> but I didn't quite look close enough to
> work out what's going wrong.

My testing (albeit on an old 2 cpu Celeron) showed no statistically 
significant difference between with the patch and without.  If you 
ignored the standard deviations and statistical practice and just looked 
at the raw numbers you'd think it was better with the patch than without 
it.  :-)

I assume that Andy Whitcroft is doing a kernbench with the patch removed 
from 2.6.15-mm3 (otherwise why would he ask for a patch to do that) and 
I'm waiting to see how that compares with the run he did with it in. 
There were other scheduling changes in 2.6.15-mm3 so I think this 
comparison is needed in order to be sure that any degradation is still 
due to my patch.

Peter
PS As load balancing maintainer, is the value 128 set in cement for 
SCHED_LOAD_SCALE?  The reason I ask is that if it was changed to be a 
multiple of NICE_TO_BIAS_PRIO(0) (i.e. 20) my modification could be made 
slightly more efficient.
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-14  5:40                                           ` Con Kolivas
@ 2006-01-14  6:05                                             ` Nick Piggin
  0 siblings, 0 replies; 59+ messages in thread
From: Nick Piggin @ 2006-01-14  6:05 UTC (permalink / raw)
  To: Con Kolivas
  Cc: Peter Williams, Martin Bligh, Andy Whitcroft, Andrew Morton,
	linux-kernel, Ingo Molnar

Con Kolivas wrote:
> On Saturday 14 January 2006 16:03, Nick Piggin wrote:
> 
>>Ideally, balancing should be completely unaffected when all tasks are
>>of priority 0 which is what I thought yours did, and why I think the
>>current system is not great.
> 
> 
> The current smp nice in mainline 2.6.15 is performance neutral when all are 
> the same priority. It's only this improved version in -mm that is not.
> 

Oh sure. I wasn't talking about performance though.

-- 
SUSE Labs, Novell Inc.

Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-14  5:53                                           ` Peter Williams
@ 2006-01-14  6:13                                             ` Nick Piggin
  0 siblings, 0 replies; 59+ messages in thread
From: Nick Piggin @ 2006-01-14  6:13 UTC (permalink / raw)
  To: Peter Williams
  Cc: Martin Bligh, Andy Whitcroft, Con Kolivas, Andrew Morton,
	linux-kernel, Ingo Molnar

Peter Williams wrote:
> Nick Piggin wrote:
> 

> I figured using the weights (which go away for nice=0 tasks) would make 
> it behave nicely with the rest of the load balancing code.
> 

OK, if you keep working on it that would be great.

>> but I didn't quite look close enough to
>> work out what's going wrong.
> 
> 
> My testing (albeit on an old 2 cpu Celeron) showed no statistically 
> significant difference between with the patch and without.  If you 
> ignored the standard deviations and statistical practice and just looked 
> at the raw numbers you'd think it was better with the patch than without 
> it.  :-)
> 
> I assume that Andy Whitcroft is doing a kernbench with the patch removed 
> from 2.6.15-mm3 (otherwise why would he ask for a patch to do that) and 
> I'm waiting to see how that compares with the run he did with it in. 
> There were other scheduling changes in 2.6.15-mm3 so I think this 
> comparison is needed in order to be sure that any degradation is still 
> due to my patch.
> 
> Peter
> PS As load balancing maintainer, is the value 128 set in cement for 
> SCHED_LOAD_SCALE?  The reason I ask is that if it was changed to be a 
> multiple of NICE_TO_BIAS_PRIO(0) (i.e. 20) my modification could be made 
> slightly more efficient.

Not set in stone but it should really be a power of two because there are
quite a lot of multiplies and divides done with it.

-- 
SUSE Labs, Novell Inc.

Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-13  7:06                               ` Peter Williams
                                                   ` (2 preceding siblings ...)
  2006-01-13 16:26                                 ` Andy Whitcroft
@ 2006-01-14 18:48                                 ` Martin J. Bligh
  2006-01-15  0:05                                   ` Peter Williams
  3 siblings, 1 reply; 59+ messages in thread
From: Martin J. Bligh @ 2006-01-14 18:48 UTC (permalink / raw)
  To: Peter Williams
  Cc: Con Kolivas, Andrew Morton, linux-kernel, Ingo Molnar, Andy Whitcroft


>
> Attached is a new patch to fix the excessive idle problem.  This patch 
> takes a new approach to the problem as it was becoming obvious that 
> trying to alter the load balancing code to cope with biased load was 
> harder than it seemed.
>
> This approach reverts to the old load values but weights them 
> according to tasks' bias_prio values.  This means that any assumptions 
> by the load balancing code that the load generated by a single task is 
> SCHED_LOAD_SCALE will still hold.  Then, in find_busiest_group(), the 
> imbalance is scaled back up to bias_prio scale so that move_tasks() 
> can move biased load rather than tasks.
>
OK, this one seems to fix the issue that I had, AFAICS. Congrats, and 
thanks,

M.

> One advantage of this is that when there are no non zero niced tasks 
> the processing will be mathematically the same as the original code. 
> Kernbench results from a 2 CPU Celeron 550Mhz system are:
>
> Average Optimal -j 8 Load Run:
> Elapsed Time 1056.16 (0.831102)
> User Time 1906.54 (1.38447)
> System Time 182.086 (0.973386)
> Percent CPU 197 (0)
> Context Switches 48727.2 (249.351)
> Sleeps 27623.4 (413.913)
>
> This indicates that, on average, 98.9% of the total available CPU was 
> used by the build.
>
> Signed-off-by: Peter Williams <pwil3058@bigpond.com.au>
>
> BTW I think that we need to think about a slightly more complex nice 
> to bias mapping function.  The current one gives a nice==19 1/20 of 
> the bias of a nice=0 task but only gives nice=-20 tasks twice the bias 
> of a nice=0 task.  I don't think this is a big problem as the majority 
> of non nice==0 tasks will have positive nice but should be looked at 
> for a future enhancement.
>
> Peter
>
>------------------------------------------------------------------------
>
>Index: MM-2.6.X/kernel/sched.c
>===================================================================
>--- MM-2.6.X.orig/kernel/sched.c	2006-01-13 14:53:34.000000000 +1100
>+++ MM-2.6.X/kernel/sched.c	2006-01-13 15:11:19.000000000 +1100
>@@ -1042,7 +1042,8 @@ void kick_process(task_t *p)
> static unsigned long source_load(int cpu, int type)
> {
> 	runqueue_t *rq = cpu_rq(cpu);
>-	unsigned long load_now = rq->prio_bias * SCHED_LOAD_SCALE;
>+	unsigned long load_now = (rq->prio_bias * SCHED_LOAD_SCALE) /
>+		NICE_TO_BIAS_PRIO(0);
> 
> 	if (type == 0)
> 		return load_now;
>@@ -1056,7 +1057,8 @@ static unsigned long source_load(int cpu
> static inline unsigned long target_load(int cpu, int type)
> {
> 	runqueue_t *rq = cpu_rq(cpu);
>-	unsigned long load_now = rq->prio_bias * SCHED_LOAD_SCALE;
>+	unsigned long load_now = (rq->prio_bias * SCHED_LOAD_SCALE) /
>+		NICE_TO_BIAS_PRIO(0);
> 
> 	if (type == 0)
> 		return load_now;
>@@ -1322,7 +1324,8 @@ static int try_to_wake_up(task_t *p, uns
> 			 * of the current CPU:
> 			 */
> 			if (sync)
>-				tl -= p->bias_prio * SCHED_LOAD_SCALE;
>+				tl -= (p->bias_prio * SCHED_LOAD_SCALE) /
>+					NICE_TO_BIAS_PRIO(0);
> 
> 			if ((tl <= load &&
> 				tl + target_load(cpu, idx) <= SCHED_LOAD_SCALE) ||
>@@ -2159,7 +2162,7 @@ find_busiest_group(struct sched_domain *
> 	}
> 
> 	/* Get rid of the scaling factor, rounding down as we divide */
>-	*imbalance = *imbalance / SCHED_LOAD_SCALE;
>+	*imbalance = (*imbalance * NICE_TO_BIAS_PRIO(0)) / SCHED_LOAD_SCALE;
> 	return busiest;
> 
> out_balanced:
>@@ -2472,7 +2475,8 @@ static void rebalance_tick(int this_cpu,
> 	struct sched_domain *sd;
> 	int i;
> 
>-	this_load = this_rq->prio_bias * SCHED_LOAD_SCALE;
>+	this_load = (this_rq->prio_bias * SCHED_LOAD_SCALE) /
>+		NICE_TO_BIAS_PRIO(0);
> 	/* Update our load */
> 	for (i = 0; i < 3; i++) {
> 		unsigned long new_load = this_load;
>  
>


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-14 18:48                                 ` Martin J. Bligh
@ 2006-01-15  0:05                                   ` Peter Williams
  2006-01-15  2:04                                     ` Con Kolivas
                                                       ` (2 more replies)
  0 siblings, 3 replies; 59+ messages in thread
From: Peter Williams @ 2006-01-15  0:05 UTC (permalink / raw)
  To: Con Kolivas
  Cc: Martin J. Bligh, Andrew Morton, linux-kernel, Ingo Molnar,
	Andy Whitcroft, Nick Piggin

[-- Attachment #1: Type: text/plain, Size: 1142 bytes --]

Martin J. Bligh wrote:
> 
>>
>> Attached is a new patch to fix the excessive idle problem.  This patch 
>> takes a new approach to the problem as it was becoming obvious that 
>> trying to alter the load balancing code to cope with biased load was 
>> harder than it seemed.
>>
>> This approach reverts to the old load values but weights them 
>> according to tasks' bias_prio values.  This means that any assumptions 
>> by the load balancing code that the load generated by a single task is 
>> SCHED_LOAD_SCALE will still hold.  Then, in find_busiest_group(), the 
>> imbalance is scaled back up to bias_prio scale so that move_tasks() 
>> can move biased load rather than tasks.
>>
> OK, this one seems to fix the issue that I had, AFAICS. Congrats, and 
> thanks,

Terrific, thanks for testing.

Con,
	Attached is a cleaned up version of this patch against 2.6.15-mm4 with 
some (hopefully helpful) comments added.

Signed-off-by: Peter Williams <pwil3058@bigpond.com.au>

-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce

[-- Attachment #2: different-approach-to-smp-nice-problem-v2 --]
[-- Type: text/plain, Size: 4476 bytes --]

Index: MM-2.6.X/kernel/sched.c
===================================================================
--- MM-2.6.X.orig/kernel/sched.c	2006-01-15 09:54:27.000000000 +1100
+++ MM-2.6.X/kernel/sched.c	2006-01-15 10:50:11.000000000 +1100
@@ -681,6 +681,13 @@ static int effective_prio(task_t *p)
 }
 
 #ifdef CONFIG_SMP
+/*
+ * To aid in avoiding the subversion of "niceness" due to uneven distribution
+ * of tasks with abnormal "nice" values accross CPUs the contribution that
+ * each task makes to its run queue's load is weighted according to its
+ * scheduling class and "nice" value.  The bias_prio field holds the value
+ * used to calculate the weight for each task.
+ */
 static inline void set_bias_prio(task_t *p)
 {
 	if (rt_task(p)) {
@@ -718,6 +725,18 @@ static inline void dec_nr_running(task_t
 	rq->nr_running--;
 	dec_prio_bias(rq, p);
 }
+
+/* convert biased priority to scaled weighted load */
+static inline unsigned long weighted_load(unsigned long bias)
+{
+	return (bias * SCHED_LOAD_SCALE) / NICE_TO_BIAS_PRIO(0);
+}
+
+/* convert scaled weighted load to unscaled biased load */
+static inline unsigned long biased_load(unsigned long wload)
+{
+	return (wload * NICE_TO_BIAS_PRIO(0)) / SCHED_LOAD_SCALE;
+}
 #else
 static inline void set_bias_prio(task_t *p)
 {
@@ -1011,7 +1030,8 @@ void kick_process(task_t *p)
 }
 
 /*
- * Return a low guess at the load of a migration-source cpu.
+ * Return a low guess at the load of a migration-source cpu weighted
+ * according to the scheduling class and "nice" value.
  *
  * We want to under-estimate the load of migration sources, to
  * balance conservatively.
@@ -1019,7 +1039,7 @@ void kick_process(task_t *p)
 static unsigned long source_load(int cpu, int type)
 {
 	runqueue_t *rq = cpu_rq(cpu);
-	unsigned long load_now = rq->prio_bias * SCHED_LOAD_SCALE;
+	unsigned long load_now = weighted_load(rq->prio_bias);
 
 	if (type == 0)
 		return load_now;
@@ -1028,12 +1048,13 @@ static unsigned long source_load(int cpu
 }
 
 /*
- * Return a high guess at the load of a migration-target cpu
+ * Return a high guess at the load of a migration-target cpu weighted
+ * according to the scheduling class and "nice" value.
  */
 static inline unsigned long target_load(int cpu, int type)
 {
 	runqueue_t *rq = cpu_rq(cpu);
-	unsigned long load_now = rq->prio_bias * SCHED_LOAD_SCALE;
+	unsigned long load_now = weighted_load(rq->prio_bias);
 
 	if (type == 0)
 		return load_now;
@@ -1299,7 +1320,7 @@ static int try_to_wake_up(task_t *p, uns
 			 * of the current CPU:
 			 */
 			if (sync)
-				tl -= p->bias_prio * SCHED_LOAD_SCALE;
+				tl -= weighted_load(p->bias_prio);
 
 			if ((tl <= load &&
 				tl + target_load(cpu, idx) <= SCHED_LOAD_SCALE) ||
@@ -1903,9 +1924,9 @@ int can_migrate_task(task_t *p, runqueue
 }
 
 /*
- * move_tasks tries to move up to max_nr_move tasks from busiest to this_rq,
- * as part of a balancing operation within "domain". Returns the number of
- * tasks moved.
+ * move_tasks tries to move up to max_nr_move tasks and max_bias_move biased
+ * load from busiest to this_rq, as part of a balancing operation within
+ * "domain". Returns the number of tasks moved.
  *
  * Called with both runqueues locked.
  */
@@ -2134,8 +2155,11 @@ find_busiest_group(struct sched_domain *
 		return busiest;
 	}
 
-	/* Get rid of the scaling factor, rounding down as we divide */
-	*imbalance = *imbalance / SCHED_LOAD_SCALE;
+	/*
+	 * Get rid of the scaling factor, rounding down as we divide and
+	 * converting to biased load for use by move_tasks()
+	 */
+	*imbalance = biased_load(*imbalance);
 	return busiest;
 
 out_balanced:
@@ -2448,7 +2472,8 @@ static void rebalance_tick(int this_cpu,
 	struct sched_domain *sd;
 	int i;
 
-	this_load = this_rq->prio_bias * SCHED_LOAD_SCALE;
+	/* weight load according to scheduling class and "nice" value */
+	this_load = weighted_load(this_rq->prio_bias);
 	/* Update our load */
 	for (i = 0; i < 3; i++) {
 		unsigned long new_load = this_load;
Index: MM-2.6.X/include/linux/sched.h
===================================================================
--- MM-2.6.X.orig/include/linux/sched.h	2006-01-15 09:54:27.000000000 +1100
+++ MM-2.6.X/include/linux/sched.h	2006-01-15 10:14:42.000000000 +1100
@@ -714,7 +714,7 @@ struct task_struct {
 #endif
 	int prio, static_prio;
 #ifdef CONFIG_SMP
-	int bias_prio;
+	int bias_prio;	/* load "weight" factor for load balancing purposes */
 #endif
 	struct list_head run_list;
 	prio_array_t *array;

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-15  0:05                                   ` Peter Williams
@ 2006-01-15  2:04                                     ` Con Kolivas
  2006-01-15  2:09                                     ` [PATCH] sched - remove unnecessary smpnice ifdefs Con Kolivas
  2006-01-15  3:50                                     ` -mm seems significanty slower than mainline on kernbench Ingo Molnar
  2 siblings, 0 replies; 59+ messages in thread
From: Con Kolivas @ 2006-01-15  2:04 UTC (permalink / raw)
  To: Peter Williams
  Cc: Martin J. Bligh, Andrew Morton, linux-kernel, Ingo Molnar,
	Andy Whitcroft, Nick Piggin

On Sunday 15 January 2006 11:05, Peter Williams wrote:
> Martin J. Bligh wrote:
> >> Attached is a new patch to fix the excessive idle problem.  This patch
> >> takes a new approach to the problem as it was becoming obvious that
> >> trying to alter the load balancing code to cope with biased load was
> >> harder than it seemed.
> >>
> >> This approach reverts to the old load values but weights them
> >> according to tasks' bias_prio values.  This means that any assumptions
> >> by the load balancing code that the load generated by a single task is
> >> SCHED_LOAD_SCALE will still hold.  Then, in find_busiest_group(), the
> >> imbalance is scaled back up to bias_prio scale so that move_tasks()
> >> can move biased load rather than tasks.
> >
> > OK, this one seems to fix the issue that I had, AFAICS. Congrats, and
> > thanks,
>
> Terrific, thanks for testing.
>
> Con,
> 	Attached is a cleaned up version of this patch against 2.6.15-mm4 with
> some (hopefully helpful) comments added.

Great! Well done.

> Signed-off-by: Peter Williams <pwil3058@bigpond.com.au>

Acked-by: Con Kolivas <kernel@kolivas.org>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH] sched - remove unnecessary smpnice ifdefs
  2006-01-15  0:05                                   ` Peter Williams
  2006-01-15  2:04                                     ` Con Kolivas
@ 2006-01-15  2:09                                     ` Con Kolivas
  2006-01-15  3:50                                     ` -mm seems significanty slower than mainline on kernbench Ingo Molnar
  2 siblings, 0 replies; 59+ messages in thread
From: Con Kolivas @ 2006-01-15  2:09 UTC (permalink / raw)
  To: Peter Williams
  Cc: Martin J. Bligh, Andrew Morton, linux-kernel, Ingo Molnar,
	Andy Whitcroft, Nick Piggin

[-- Attachment #1: Type: text/plain, Size: 1988 bytes --]

On Sunday 15 January 2006 11:05, Peter Williams wrote:
> Martin J. Bligh wrote:
> >> Attached is a new patch to fix the excessive idle problem.  This patch
> >> takes a new approach to the problem as it was becoming obvious that
> >> trying to alter the load balancing code to cope with biased load was
> >> harder than it seemed.
> >>
> >> This approach reverts to the old load values but weights them
> >> according to tasks' bias_prio values.  This means that any assumptions
> >> by the load balancing code that the load generated by a single task is
> >> SCHED_LOAD_SCALE will still hold.  Then, in find_busiest_group(), the
> >> imbalance is scaled back up to bias_prio scale so that move_tasks()
> >> can move biased load rather than tasks.
> >
> > OK, this one seems to fix the issue that I had, AFAICS. Congrats, and
> > thanks,
>
> Terrific, thanks for testing.
>
> Con,
> 	Attached is a cleaned up version of this patch against 2.6.15-mm4 with
> some (hopefully helpful) comments added.

One minor quibble.

Cheers,
Con

---
These #defines have no cost on !CONFIG_SMP without the ifdefs.
Remove the unnecessary ifdefs.

Signed-off-by: Con Kolivas <kernel@kolivas.org>

 kernel/sched.c |    2 --
 1 files changed, 2 deletions(-)

Index: linux-2.6.15-mm4/kernel/sched.c
===================================================================
--- linux-2.6.15-mm4.orig/kernel/sched.c
+++ linux-2.6.15-mm4/kernel/sched.c
@@ -63,7 +63,6 @@
 #define PRIO_TO_NICE(prio)	((prio) - MAX_RT_PRIO - 20)
 #define TASK_NICE(p)		PRIO_TO_NICE((p)->static_prio)
 
-#ifdef CONFIG_SMP
 /*
  * Priority bias for load balancing ranges from 1 (nice==19) to 139 (RT
  * priority of 100).
@@ -71,7 +70,6 @@
 #define NICE_TO_BIAS_PRIO(nice)	(20 - (nice))
 #define PRIO_TO_BIAS_PRIO(prio)	NICE_TO_BIAS_PRIO(PRIO_TO_NICE(prio))
 #define RTPRIO_TO_BIAS_PRIO(rp)	(40 + (rp))
-#endif
 
 /*
  * 'User priority' is the nice value converted to something we

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-15  0:05                                   ` Peter Williams
  2006-01-15  2:04                                     ` Con Kolivas
  2006-01-15  2:09                                     ` [PATCH] sched - remove unnecessary smpnice ifdefs Con Kolivas
@ 2006-01-15  3:50                                     ` Ingo Molnar
  2 siblings, 0 replies; 59+ messages in thread
From: Ingo Molnar @ 2006-01-15  3:50 UTC (permalink / raw)
  To: Peter Williams
  Cc: Con Kolivas, Martin J. Bligh, Andrew Morton, linux-kernel,
	Andy Whitcroft, Nick Piggin

* Peter Williams <pwil3058@bigpond.net.au> wrote:

> Signed-off-by: Peter Williams <pwil3058@bigpond.com.au>

Acked-by: Ingo Molnar <mingo@elte.hu>

nice work! There's a small typo here:

> + * To aid in avoiding the subversion of "niceness" due to uneven distribution
> + * of tasks with abnormal "nice" values accross CPUs the contribution that

s/accross/across

	Ingo

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-12  6:39                                   ` Con Kolivas
@ 2006-01-23 19:28                                     ` Martin Bligh
  2006-01-24  1:25                                       ` Peter Williams
  0 siblings, 1 reply; 59+ messages in thread
From: Martin Bligh @ 2006-01-23 19:28 UTC (permalink / raw)
  To: Con Kolivas
  Cc: Peter Williams, Andrew Morton, linux-kernel, Ingo Molnar, Andy Whitcroft


> Thanks and looks like the results are in from 2.6.14-rc2-mm1 with the patch 
> backed out.
> 
> Drumroll....
> 
> http://test.kernel.org/perf/kernbench.moe.png
> 
> The performance goes back to a range similar to 2.6.14-rc1-mm1 (see 
> 2.6.14-rc2-mm1 + 20328 in blue). Unfortunately this does implicate this 
> patch. Can we put it back into -mm only and allow Peter's tweaks/fixes to go 
> on top and have it tested some more before going upstream?

Hmm. Looks like we didn't get this as fixed up as I thought. Moe seems 
to be fixed (16x NUMA-Q), but elm3b132 is not (it's 4x, flat SMP ia32).
Look at the latest graphs ....

Is it possible it only got fixed for NUMA boxes?

M.


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-23 19:28                                     ` Martin Bligh
@ 2006-01-24  1:25                                       ` Peter Williams
  2006-01-24  3:50                                         ` Peter Williams
  0 siblings, 1 reply; 59+ messages in thread
From: Peter Williams @ 2006-01-24  1:25 UTC (permalink / raw)
  To: Martin Bligh
  Cc: Con Kolivas, Andrew Morton, linux-kernel, Ingo Molnar, Andy Whitcroft

Martin Bligh wrote:
> 
>> Thanks and looks like the results are in from 2.6.14-rc2-mm1 with the 
>> patch backed out.
>>
>> Drumroll....
>>
>> http://test.kernel.org/perf/kernbench.moe.png
>>
>> The performance goes back to a range similar to 2.6.14-rc1-mm1 (see 
>> 2.6.14-rc2-mm1 + 20328 in blue). Unfortunately this does implicate 
>> this patch. Can we put it back into -mm only and allow Peter's 
>> tweaks/fixes to go on top and have it tested some more before going 
>> upstream?
> 
> 
> Hmm. Looks like we didn't get this as fixed up as I thought. Moe seems 
> to be fixed (16x NUMA-Q), but elm3b132 is not (it's 4x, flat SMP ia32).
> Look at the latest graphs ....
> 
> Is it possible it only got fixed for NUMA boxes?

It should be totally independent.

But anyhow, I can't see the problem.  The only numbers that I can see 
for 2.6.16-rc1-mm[1|2] (which are the ones with the latest fix) on this 
graph are approx. 101 which is much the same as the best of the rest. 
Or have I missed something?

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-24  1:25                                       ` Peter Williams
@ 2006-01-24  3:50                                         ` Peter Williams
  2006-01-24  4:41                                           ` Martin J. Bligh
  0 siblings, 1 reply; 59+ messages in thread
From: Peter Williams @ 2006-01-24  3:50 UTC (permalink / raw)
  To: Martin Bligh
  Cc: Con Kolivas, Andrew Morton, linux-kernel, Ingo Molnar, Andy Whitcroft

Peter Williams wrote:
> Martin Bligh wrote:
> 
>>
>>> Thanks and looks like the results are in from 2.6.14-rc2-mm1 with the 
>>> patch backed out.
>>>
>>> Drumroll....
>>>
>>> http://test.kernel.org/perf/kernbench.moe.png
>>>
>>> The performance goes back to a range similar to 2.6.14-rc1-mm1 (see 
>>> 2.6.14-rc2-mm1 + 20328 in blue). Unfortunately this does implicate 
>>> this patch. Can we put it back into -mm only and allow Peter's 
>>> tweaks/fixes to go on top and have it tested some more before going 
>>> upstream?
>>
>>
>>
>> Hmm. Looks like we didn't get this as fixed up as I thought. Moe seems 
>> to be fixed (16x NUMA-Q), but elm3b132 is not (it's 4x, flat SMP ia32).
>> Look at the latest graphs ....
>>
>> Is it possible it only got fixed for NUMA boxes?
> 
> 
> It should be totally independent.
> 
> But anyhow, I can't see the problem.  The only numbers that I can see 
> for 2.6.16-rc1-mm[1|2] (which are the ones with the latest fix) on this 
> graph are approx. 101 which is much the same as the best of the rest. Or 
> have I missed something?

Oops.  I was looking at the graphs for Moe but 
<http://test.kernel.org/perf/dbench.elm3b132.png> doesn't appear to be 
demonstrating a problem either.  Given the fluctuation in the 2.6.16-rc1 
results (235, 234, 211, 228.5 and 237.5), the results for 2.6.16-rc1-mm1 
(229) and 2.6.16-mm2 (219) aren't significantly different.

Peter
PS I have a modification for kernbench that calculates and displays the 
standard deviations for the various averages if you're interested.  This 
would enable you to display 95% (say) confidence bars on the graphed 
results which in turn makes it easier to spot significant differences.
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-24  3:50                                         ` Peter Williams
@ 2006-01-24  4:41                                           ` Martin J. Bligh
  2006-01-24  6:22                                             ` Peter Williams
  0 siblings, 1 reply; 59+ messages in thread
From: Martin J. Bligh @ 2006-01-24  4:41 UTC (permalink / raw)
  To: Peter Williams
  Cc: Con Kolivas, Andrew Morton, linux-kernel, Ingo Molnar, Andy Whitcroft


> Oops.  I was looking at the graphs for Moe but 
> <http://test.kernel.org/perf/dbench.elm3b132.png> doesn't appear to be 
> demonstrating a problem either.  Given the fluctuation in the 
> 2.6.16-rc1 results (235, 234, 211, 228.5 and 237.5), the results for 
> 2.6.16-rc1-mm1 (229) and 2.6.16-mm2 (219) aren't significantly different.

I disagree. Look at the graph. mm results are consistent and stable, and 
significantly worse than mainline.

> Peter
> PS I have a modification for kernbench that calculates and displays 
> the standard deviations for the various averages if you're 
> interested.  This would enable you to display 95% (say) confidence 
> bars on the graphed results which in turn makes it easier to spot 
> significant differences.

Thanks, but I have that. What do you think those vertical bars on the 
graph are for? ;-) They're deviation of 5 runs. I throw away the best 
and worst first. If it was just random noise, then you'd get the same 
variance *between* runs inside the -mm train and inside mainline. You 
don't see that ...

Use the visuals in the graph .. it's very telling. -mm is *broken*.
It may well not be the same issue as last time though, I shouldn't
have jumped to that conclusion.

M.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-24  4:41                                           ` Martin J. Bligh
@ 2006-01-24  6:22                                             ` Peter Williams
  2006-01-24  6:42                                               ` Martin J. Bligh
  0 siblings, 1 reply; 59+ messages in thread
From: Peter Williams @ 2006-01-24  6:22 UTC (permalink / raw)
  To: Martin J. Bligh
  Cc: Con Kolivas, Andrew Morton, linux-kernel, Ingo Molnar, Andy Whitcroft

Martin J. Bligh wrote:
> 
>> Oops.  I was looking at the graphs for Moe but 
>> <http://test.kernel.org/perf/dbench.elm3b132.png> doesn't appear to be 
>> demonstrating a problem either.  Given the fluctuation in the 
>> 2.6.16-rc1 results (235, 234, 211, 228.5 and 237.5), the results for 
>> 2.6.16-rc1-mm1 (229) and 2.6.16-mm2 (219) aren't significantly different.
> 
> 
> I disagree. Look at the graph. mm results are consistent and stable, and 
> significantly worse than mainline.
> 
>> Peter
>> PS I have a modification for kernbench that calculates and displays 
>> the standard deviations for the various averages if you're 
>> interested.  This would enable you to display 95% (say) confidence 
>> bars on the graphed results which in turn makes it easier to spot 
>> significant differences.
> 
> 
> Thanks, but I have that. What do you think those vertical bars on the 
> graph are for? ;-) They're deviation of 5 runs. I throw away the best 
> and worst first.

Not very good scientific practice :-)

> If it was just random noise, then you'd get the same 
> variance *between* runs inside the -mm train and inside mainline. You 
> don't see that ...

I was still looking at the wrong graph this time the dbench instead of 
kernbench :-( (explains why I didn't see the error bars).  Now I see it.

Looking at the other 6 kernbench graphs, I see that it also occurs for 
elm3b70 but no others (including elm3b6 and elm3b67).  Are there any 
differences between the various elm3b systems that could explain this?

> Use the visuals in the graph .. it's very telling. -mm is *broken*.
> It may well not be the same issue as last time though, I shouldn't
> have jumped to that conclusion.

It's very hard to understand how it could be an issue on a system that 
doesn't have a lot of abnormally niced (i.e. non zero) tasks that are 
fairly active as it's now mathematically equivalent to the original in 
the absence of such tasks.  Do these two systems have many such tasks 
running?

Would it be possible to get a run with the following patches backed out:

+sched-modified-nice-support-for-smp-load-balancing-fix.patch
+sched-modified-nice-support-for-smp-load-balancing-fix-fix.patch

Thanks
Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-24  6:22                                             ` Peter Williams
@ 2006-01-24  6:42                                               ` Martin J. Bligh
  2006-01-28 23:20                                                 ` Peter Williams
  0 siblings, 1 reply; 59+ messages in thread
From: Martin J. Bligh @ 2006-01-24  6:42 UTC (permalink / raw)
  To: Peter Williams
  Cc: Con Kolivas, Andrew Morton, linux-kernel, Ingo Molnar, Andy Whitcroft


>> Thanks, but I have that. What do you think those vertical bars on the 
>> graph are for? ;-) They're deviation of 5 runs. I throw away the best 
>> and worst first.
>
>
> Not very good scientific practice :-)


Bollocks to scientific practice ;-) It works, it produces very stable 
results, has done for a few years now. Things like cron or sunspots can 
kick in. Yes, yes, I did stats at University ... but the real world 
doesn't work like that. The visuals in the graph speak for it.

>
> Looking at the other 6 kernbench graphs, I see that it also occurs for 
> elm3b70 but no others (including elm3b6 and elm3b67).  Are there any 
> differences between the various elm3b systems that could explain this?
>
Yes. They're all completely different architectures - there's a brief 
description at the top of the main page. elm3b67 should be ignored, nay 
thrown out of the window. It's an unstable POS that randomly loses 
processors. I've removed it from the pages.

elm3b70 is PPC64 (8 cpu)
elm3b6 is x86_64.
elm3b132 is a 4x SMP ia32 Pentium 3.
moe is 16x NUMA-Q (ia32).
gekko-lp1 is a 2x PPC blade.

>> Use the visuals in the graph .. it's very telling. -mm is *broken*.
>> It may well not be the same issue as last time though, I shouldn't
>> have jumped to that conclusion.
>
>
> It's very hard to understand how it could be an issue on a system that 
> doesn't have a lot of abnormally niced (i.e. non zero) tasks that are 
> fairly active as it's now mathematically equivalent to the original in 
> the absence of such tasks.  Do these two systems have many such tasks 
> running?
>
> Would it be possible to get a run with the following patches backed out:
>
> +sched-modified-nice-support-for-smp-load-balancing-fix.patch
> +sched-modified-nice-support-for-smp-load-balancing-fix-fix.patch


Yup, that should prove or disprove it. It's probably something 
completely un-scheduler-related ;-)

M.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-24  6:42                                               ` Martin J. Bligh
@ 2006-01-28 23:20                                                 ` Peter Williams
  2006-01-29  0:52                                                   ` Martin J. Bligh
  0 siblings, 1 reply; 59+ messages in thread
From: Peter Williams @ 2006-01-28 23:20 UTC (permalink / raw)
  To: Martin J. Bligh
  Cc: Con Kolivas, Andrew Morton, linux-kernel, Ingo Molnar, Andy Whitcroft

Martin J. Bligh wrote:
> 
>>> Thanks, but I have that. What do you think those vertical bars on the 
>>> graph are for? ;-) They're deviation of 5 runs. I throw away the best 
>>> and worst first.
>>
>>
>>
>> Not very good scientific practice :-)
> 
> 
> 
> Bollocks to scientific practice ;-) It works, it produces very stable 
> results, has done for a few years now. Things like cron or sunspots can 
> kick in. Yes, yes, I did stats at University ... but the real world 
> doesn't work like that. The visuals in the graph speak for it.
> 
>>
>> Looking at the other 6 kernbench graphs, I see that it also occurs for 
>> elm3b70 but no others (including elm3b6 and elm3b67).  Are there any 
>> differences between the various elm3b systems that could explain this?
>>
> Yes. They're all completely different architectures - there's a brief 
> description at the top of the main page. elm3b67 should be ignored, nay 
> thrown out of the window. It's an unstable POS that randomly loses 
> processors. I've removed it from the pages.
> 
> elm3b70 is PPC64 (8 cpu)
> elm3b6 is x86_64.
> elm3b132 is a 4x SMP ia32 Pentium 3.
> moe is 16x NUMA-Q (ia32).
> gekko-lp1 is a 2x PPC blade.
> 
>>> Use the visuals in the graph .. it's very telling. -mm is *broken*.
>>> It may well not be the same issue as last time though, I shouldn't
>>> have jumped to that conclusion.
>>
>>
>>
>> It's very hard to understand how it could be an issue on a system that 
>> doesn't have a lot of abnormally niced (i.e. non zero) tasks that are 
>> fairly active as it's now mathematically equivalent to the original in 
>> the absence of such tasks.  Do these two systems have many such tasks 
>> running?
>>
>> Would it be possible to get a run with the following patches backed out:
>>
>> +sched-modified-nice-support-for-smp-load-balancing-fix.patch
>> +sched-modified-nice-support-for-smp-load-balancing-fix-fix.patch
> 
> 
> 
> Yup, that should prove or disprove it. It's probably something 
> completely un-scheduler-related ;-)
> 
> M.

Looking at the latest results for 2.6.16-rc1-mm3, it appears to me that 
this is no longer an issue.  Do you agree?

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: -mm seems significanty slower than mainline on kernbench
  2006-01-28 23:20                                                 ` Peter Williams
@ 2006-01-29  0:52                                                   ` Martin J. Bligh
  0 siblings, 0 replies; 59+ messages in thread
From: Martin J. Bligh @ 2006-01-29  0:52 UTC (permalink / raw)
  To: Peter Williams
  Cc: Con Kolivas, Andrew Morton, linux-kernel, Ingo Molnar, Andy Whitcroft


>
> Looking at the latest results for 2.6.16-rc1-mm3, it appears to me 
> that this is no longer an issue.  Do you agree?
>
> Peter

Yes, -mm3 looks OK ... not sure what happened, but as long as it works 
... ;-)

M.

^ permalink raw reply	[flat|nested] 59+ messages in thread

end of thread, other threads:[~2006-01-29  0:53 UTC | newest]

Thread overview: 59+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-01-11  1:14 -mm seems significanty slower than mainline on kernbench Martin Bligh
2006-01-11  1:31 ` Andrew Morton
2006-01-11  1:41   ` Martin Bligh
2006-01-11  1:48     ` Andrew Morton
2006-01-11  1:49     ` Con Kolivas
2006-01-11  2:38       ` Peter Williams
2006-01-11  3:07         ` Con Kolivas
2006-01-11  3:12           ` Martin Bligh
2006-01-11  3:40           ` Peter Williams
2006-01-11  3:49             ` Con Kolivas
2006-01-11  4:33               ` Peter Williams
2006-01-11  5:14             ` Peter Williams
2006-01-11  6:21               ` Martin J. Bligh
2006-01-11 12:24                 ` Peter Williams
2006-01-11 14:29                   ` Con Kolivas
2006-01-11 22:05                     ` Peter Williams
2006-01-12  0:54                       ` Peter Williams
2006-01-12  1:18                         ` Con Kolivas
2006-01-12  1:29                           ` Peter Williams
2006-01-12  1:36                             ` Con Kolivas
2006-01-12  2:23                               ` Peter Williams
2006-01-12  2:26                                 ` Martin Bligh
2006-01-12  6:39                                   ` Con Kolivas
2006-01-23 19:28                                     ` Martin Bligh
2006-01-24  1:25                                       ` Peter Williams
2006-01-24  3:50                                         ` Peter Williams
2006-01-24  4:41                                           ` Martin J. Bligh
2006-01-24  6:22                                             ` Peter Williams
2006-01-24  6:42                                               ` Martin J. Bligh
2006-01-28 23:20                                                 ` Peter Williams
2006-01-29  0:52                                                   ` Martin J. Bligh
2006-01-12  2:27                                 ` Con Kolivas
2006-01-12  2:04                           ` Martin Bligh
2006-01-12  6:35                             ` Martin J. Bligh
2006-01-12  6:41                               ` Con Kolivas
2006-01-12  6:54                                 ` Peter Williams
2006-01-12 18:39                         ` Martin Bligh
2006-01-12 20:03                           ` Peter Williams
2006-01-12 22:20                             ` Peter Williams
2006-01-13  7:06                               ` Peter Williams
2006-01-13 12:00                                 ` Peter Williams
2006-01-13 16:15                                 ` Martin J. Bligh
2006-01-13 16:26                                 ` Andy Whitcroft
2006-01-13 17:54                                   ` Andy Whitcroft
2006-01-13 20:41                                     ` Martin Bligh
2006-01-14  0:23                                       ` Peter Williams
2006-01-14  5:03                                         ` Nick Piggin
2006-01-14  5:40                                           ` Con Kolivas
2006-01-14  6:05                                             ` Nick Piggin
2006-01-14  5:53                                           ` Peter Williams
2006-01-14  6:13                                             ` Nick Piggin
2006-01-13 22:59                                     ` Peter Williams
2006-01-14 18:48                                 ` Martin J. Bligh
2006-01-15  0:05                                   ` Peter Williams
2006-01-15  2:04                                     ` Con Kolivas
2006-01-15  2:09                                     ` [PATCH] sched - remove unnecessary smpnice ifdefs Con Kolivas
2006-01-15  3:50                                     ` -mm seems significanty slower than mainline on kernbench Ingo Molnar
2006-01-12  1:25                       ` Peter Williams
2006-01-11  1:52     ` Andrew Morton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).