* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
[not found] <20040811010116.GL11200@holomorphy.com>
@ 2004-08-11 2:21 ` spaminos-ker
2004-08-11 2:23 ` William Lee Irwin III
2004-08-11 3:09 ` Con Kolivas
0 siblings, 2 replies; 21+ messages in thread
From: spaminos-ker @ 2004-08-11 2:21 UTC (permalink / raw)
To: linux-kernel; +Cc: William Lee Irwin III
--- William Lee Irwin III <wli@holomorphy.com> wrote:
>
> Wakeup bonuses etc. are starving tasks. Could you try Peter Williams'
> SPA patches with the do_promotions() function? I suspect these should
> pass your tests.
>
>
> -- wli
>
I tried the patch-2.6.7-spa_hydra_FULL-v4.0 patch.
I only changed the value of /proc/sys/kernel/cpusched/mode to switch between
the different modes.
The 2-thread test passes successfully (an improvement over stock 2.6.7), but none
of the modes passed the 20-thread test:
eb
Tue Aug 10 19:10:48 PDT 2004
>>>>>>> delta = 6
Tue Aug 10 19:11:03 PDT 2004
>>>>>>> delta = 16
Tue Aug 10 19:11:13 PDT 2004
>>>>>>> delta = 9
Tue Aug 10 19:11:24 PDT 2004
>>>>>>> delta = 11
Tue Aug 10 19:11:34 PDT 2004
>>>>>>> delta = 10
Tue Aug 10 19:11:45 PDT 2004
>>>>>>> delta = 11
Tue Aug 10 19:11:56 PDT 2004
>>>>>>> delta = 11
Tue Aug 10 19:12:06 PDT 2004
>>>>>>> delta = 10
pb
Tue Aug 10 19:07:52 PDT 2004
>>>>>>> delta = 3
Tue Aug 10 19:07:55 PDT 2004
>>>>>>> delta = 3
Tue Aug 10 19:07:59 PDT 2004
>>>>>>> delta = 4
Tue Aug 10 19:08:02 PDT 2004
>>>>>>> delta = 3
Tue Aug 10 19:08:05 PDT 2004
>>>>>>> delta = 3
sc
Tue Aug 10 19:08:28 PDT 2004
>>>>>>> delta = 3
Tue Aug 10 19:09:08 PDT 2004
>>>>>>> delta = 3
Tue Aug 10 19:09:17 PDT 2004
>>>>>>> delta = 3
Tue Aug 10 19:09:23 PDT 2004
>>>>>>> delta = 3
Tue Aug 10 19:09:49 PDT 2004
>>>>>>> delta = 3
Tue Aug 10 19:09:53 PDT 2004
>>>>>>> delta = 3
Tue Aug 10 19:09:55 PDT 2004
>>>>>>> delta = 3
eb seemed to be the worst of the bunch, with quite long system hangs on this
particular test.
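For context, the watchdog producing this output is roughly the following sketch (the real testdelay script may differ in its details):

```python
import time

def watchdog(threshold=2.0, interval=1.0, iterations=None):
    """Sleep `interval` seconds per loop; if the wall-clock gap between
    wakeups exceeds `threshold`, the scheduler delayed us -- report the
    delta, mimicking the '>>>>>>> delta = N' lines above."""
    prev = time.time()
    while iterations is None or iterations > 0:
        time.sleep(interval)
        now = time.time()
        delta = now - prev
        if delta > threshold:
            print(time.ctime(now))
            print(">>>>>>> delta = %d" % round(delta))
        prev = now
        if iterations is not None:
            iterations -= 1

# e.g. watchdog()  # run forever, reporting any gap over 2 seconds
```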
With the default settings of:
base_promotion_interval 255
compute 0
cpu_hog_threshold 900
ia_threshold 900
initial_ia_bonus 1
interactive 0
log_at_exit 0
max_ia_bonus 9
max_tpt_bonus 4
sched_batch_time_slice_multiplier 10
sched_iso_threshold 50
sched_rr_time_slice 100
time_slice 100
I am not very familiar with all the parameters, so I just kept the defaults.
Anything else I could try?
Nicolas
=====
------------------------------------------------------------
video meliora proboque deteriora sequor
------------------------------------------------------------
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
2004-08-11 2:21 ` Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others) spaminos-ker
@ 2004-08-11 2:23 ` William Lee Irwin III
2004-08-11 2:45 ` Peter Williams
2004-08-11 3:09 ` Con Kolivas
1 sibling, 1 reply; 21+ messages in thread
From: William Lee Irwin III @ 2004-08-11 2:23 UTC (permalink / raw)
To: spaminos-ker; +Cc: linux-kernel
On Tue, Aug 10, 2004 at 07:21:43PM -0700, spaminos-ker@yahoo.com wrote:
> I am not very familiar with all the parameters, so I just kept the defaults
> Anything else I could try?
> Nicolas
No. It appeared that the SPA bits had sufficient fairness in them to
pass this test, but apparently not quite enough.
-- wli
* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
2004-08-11 2:23 ` William Lee Irwin III
@ 2004-08-11 2:45 ` Peter Williams
2004-08-11 2:47 ` Peter Williams
0 siblings, 1 reply; 21+ messages in thread
From: Peter Williams @ 2004-08-11 2:45 UTC (permalink / raw)
To: spaminos-ker; +Cc: William Lee Irwin III, linux-kernel
William Lee Irwin III wrote:
> On Tue, Aug 10, 2004 at 07:21:43PM -0700, spaminos-ker@yahoo.com wrote:
>
>>I am not very familiar with all the parameters, so I just kept the defaults
>>Anything else I could try?
>>Nicolas
>
>
> No. It appeared that the SPA bits had sufficient fairness in them to
> pass this test but apparently not quite enough.
>
The interactive bonus may interfere with fairness (the throughput bonus
should actually help it for tasks with equal nice), so you could try
setting max_ia_bonus to zero (and possibly increasing max_tpt_bonus).
With "eb" mode this should still give good interactive response;
expect interactive response to suffer a little in "pb" mode, however,
where renicing the X server to a negative value should help.
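Concretely, that could be applied along these lines (a sketch: the cpusched paths exist only on a patched kernel, and the max_tpt_bonus value of 6 is only an illustration; dry_run prints the equivalent shell commands instead of writing):

```python
import os

CPUSCHED = "/proc/sys/kernel/cpusched"  # present only on spa_hydra kernels

def set_param(name, value, dry_run=True):
    """Write one scheduler parameter; with dry_run, just print the
    equivalent shell command instead of touching /proc."""
    path = os.path.join(CPUSCHED, name)
    if dry_run:
        print("echo %s > %s" % (value, path))
    else:
        with open(path, "w") as f:
            f.write(str(value))

set_param("max_ia_bonus", 0)   # disable the interactive bonus
set_param("max_tpt_bonus", 6)  # 6 is an arbitrary, illustrative increase
```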
Peter
PS There's a primitive GUI available for setting the scheduler
parameters at
<http://prdownloads.sourceforge.net/cpuse/gcpuctl_hydra-1.3.tar.gz?download>
This is just a Python script with a Glade XML file (gcpuctl_hydra.glade),
which needs to be in the same directory that you run the script from.
--
Peter Williams pwil3058@bigpond.net.au
"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce
* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
2004-08-11 2:45 ` Peter Williams
@ 2004-08-11 2:47 ` Peter Williams
2004-08-11 3:23 ` Peter Williams
0 siblings, 1 reply; 21+ messages in thread
From: Peter Williams @ 2004-08-11 2:47 UTC (permalink / raw)
To: spaminos-ker; +Cc: William Lee Irwin III, linux-kernel
Peter Williams wrote:
> William Lee Irwin III wrote:
>
>> On Tue, Aug 10, 2004 at 07:21:43PM -0700, spaminos-ker@yahoo.com wrote:
>>
>>> I am not very familiar with all the parameters, so I just kept the
>>> defaults
>>> Anything else I could try?
>>> Nicolas
>>
>>
>>
>> No. It appeared that the SPA bits had sufficient fairness in them to
>> pass this test but apparently not quite enough.
>>
>
> The interactive bonus may interfere with fairness (the throughput bonus
> should actually help it for tasks with equal nice) so you could try
> setting max_ia_bonus to zero (and possibly increasing max_tpt_bonus).
> With "eb" mode this should still give good interactive response but
> expect interactive response to suffer a little in "pb" mode however
> renicing the X server to a negative value should help.
I should also have mentioned that fiddling with the promotion interval
may help.
Peter
--
Peter Williams pwil3058@bigpond.net.au
"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce
* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
2004-08-11 2:21 ` Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others) spaminos-ker
2004-08-11 2:23 ` William Lee Irwin III
@ 2004-08-11 3:09 ` Con Kolivas
2004-08-11 10:24 ` Prakash K. Cheemplavam
` (2 more replies)
1 sibling, 3 replies; 21+ messages in thread
From: Con Kolivas @ 2004-08-11 3:09 UTC (permalink / raw)
To: spaminos-ker; +Cc: linux-kernel, William Lee Irwin III
spaminos-ker@yahoo.com writes:
> --- William Lee Irwin III <wli@holomorphy.com> wrote:
>>
>> Wakeup bonuses etc. are starving tasks. Could you try Peter Williams'
>> SPA patches with the do_promotions() function? I suspect these should
>> pass your tests.
>>
>>
>> -- wli
>>
>
> I tried the patch-2.6.7-spa_hydra_FULL-v4.0 patch
>
> I only changed the value of /proc/sys/kernel/cpusched/mode to switch between
> different patches.
>
> The 2 threads test passes successfully (improvement over stock 2.6.7) but none
> passed the 20 threads test:
Hi
I tried this on the latest staircase patch (7.I) and am not getting any
output from your script when tested up to 60 threads on my hardware. Can you
try this version of staircase please?
There are 7.I patches against 2.6.8-rc4 and 2.6.8-rc4-mm1
http://ck.kolivas.org/patches/2.6/2.6.8/
Cheers,
Con
* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
2004-08-11 2:47 ` Peter Williams
@ 2004-08-11 3:23 ` Peter Williams
2004-08-11 3:31 ` Con Kolivas
2004-08-11 3:44 ` Peter Williams
0 siblings, 2 replies; 21+ messages in thread
From: Peter Williams @ 2004-08-11 3:23 UTC (permalink / raw)
To: spaminos-ker; +Cc: William Lee Irwin III, linux-kernel
Peter Williams wrote:
> Peter Williams wrote:
>
>> William Lee Irwin III wrote:
>>
>>> On Tue, Aug 10, 2004 at 07:21:43PM -0700, spaminos-ker@yahoo.com wrote:
>>>
>>>> I am not very familiar with all the parameters, so I just kept the
>>>> defaults
>>>> Anything else I could try?
>>>> Nicolas
>>>
>>>
>>>
>>>
>>> No. It appeared that the SPA bits had sufficient fairness in them to
>>> pass this test but apparently not quite enough.
>>>
>>
>> The interactive bonus may interfere with fairness (the throughput
>> bonus should actually help it for tasks with equal nice) so you could
>> try setting max_ia_bonus to zero (and possibly increasing
>> max_tpt_bonus). With "eb" mode this should still give good interactive
>> response but expect interactive response to suffer a little in "pb"
>> mode however renicing the X server to a negative value should help.
>
>
> I should also have mentioned that fiddling with the promotion interval
> may help.
Having reread your original e-mail, I think this problem is probably
being caused by the interactive bonus mechanism classifying the httpd
server threads as "interactive" and giving them a bonus, while for
some reason the daemon is not identified as "interactive", meaning
that it is given a lower priority. In this situation, if there is a
large number of httpd threads, it could take quite a while (even with
promotion) for the daemon to get a look-in. Without promotion, total
starvation is even a possibility.
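As a toy illustration of that starvation/promotion dynamic (a model only, not the actual SPA code; the task names and numbers are made up):

```python
def run_toy(tasks, ticks, promote_every=None):
    """Toy strict-priority scheduler.  `tasks` maps name -> static
    priority (lower runs first).  Each tick the best-priority task runs
    and is reset to its static priority; with `promote_every`, waiting
    tasks are periodically promoted one level, so a low-priority task
    eventually gets a turn instead of starving."""
    prio = dict(tasks)
    ran = {name: 0 for name in tasks}
    for t in range(1, ticks + 1):
        if promote_every and t % promote_every == 0:
            for name in prio:
                prio[name] = max(0, prio[name] - 1)
        best = min(prio, key=lambda n: (prio[n], n))
        ran[best] += 1
        prio[best] = tasks[best]  # the task that ran loses its promotions
    return ran

# Three "interactive"-bonused httpd threads vs one unbonused daemon:
tasks = {"httpd%d" % i: 0 for i in range(3)}
tasks["daemon"] = 5
starved = run_toy(tasks, 100)                     # daemon never runs
promoted = run_toy(tasks, 100, promote_every=10)  # daemon gets a look-in
```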
Peter
PS For both "eb" and "pb" modes, max_ia_bonus should be set to zero on
servers (where interactive responsiveness isn't an issue).
PPS For "sc" mode, try setting "interactive" to zero and "compute" to 1.
--
Peter Williams pwil3058@bigpond.net.au
"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce
* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
2004-08-11 3:23 ` Peter Williams
@ 2004-08-11 3:31 ` Con Kolivas
2004-08-11 3:46 ` Peter Williams
2004-08-11 3:44 ` Peter Williams
1 sibling, 1 reply; 21+ messages in thread
From: Con Kolivas @ 2004-08-11 3:31 UTC (permalink / raw)
To: Peter Williams; +Cc: spaminos-ker, William Lee Irwin III, linux-kernel
Peter Williams writes:
> Peter Williams wrote:
>> Peter Williams wrote:
>>
>>> William Lee Irwin III wrote:
>>>
>>>> On Tue, Aug 10, 2004 at 07:21:43PM -0700, spaminos-ker@yahoo.com wrote:
>>>>
>>>>> I am not very familiar with all the parameters, so I just kept the
>>>>> defaults
>>>>> Anything else I could try?
>>>>> Nicolas
>>>>
>>>>
>>>>
>>>>
>>>> No. It appeared that the SPA bits had sufficient fairness in them to
>>>> pass this test but apparently not quite enough.
>>>>
>>>
>>> The interactive bonus may interfere with fairness (the throughput
>>> bonus should actually help it for tasks with equal nice) so you could
>>> try setting max_ia_bonus to zero (and possibly increasing
>>> max_tpt_bonus). With "eb" mode this should still give good interactive
>>> response but expect interactive response to suffer a little in "pb"
>>> mode however renicing the X server to a negative value should help.
>>
>>
>> I should also have mentioned that fiddling with the promotion interval
>> may help.
>
> Having reread your original e-mail I think that this problem is probably
> being caused by the interactive bonus mechanism classifying the httpd
> server threads as "interactive" threads and giving them a bonus. But
> for some reason the daemon is not identified as "interactive" meaning
> that it gets given a lower priority. In this situation if there's a
> large number of httpd threads (even with promotion) it could take quite
> a while for the daemon to get a look in. Without promotion total
> starvation is even a possibility.
>
> Peter
> PS For both "eb" and "pb" modes, max_io_bonus should be set to zero on
> servers (where interactive responsiveness isn't an issue).
> PPS For "sc" mode, try setting "interactive" to zero and "compute" to 1.
No, compute should not be set to 1 for a server. It is reserved only for
computational nodes, not regular servers. "Compute" will increase latency,
which is undesirable.
Cheers,
Con
* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
2004-08-11 3:23 ` Peter Williams
2004-08-11 3:31 ` Con Kolivas
@ 2004-08-11 3:44 ` Peter Williams
2004-08-13 0:13 ` spaminos-ker
1 sibling, 1 reply; 21+ messages in thread
From: Peter Williams @ 2004-08-11 3:44 UTC (permalink / raw)
To: spaminos-ker; +Cc: Peter Williams, William Lee Irwin III, linux-kernel
Peter Williams wrote:
> Peter Williams wrote:
>
>> Peter Williams wrote:
>>
>>> William Lee Irwin III wrote:
>>>
>>>> On Tue, Aug 10, 2004 at 07:21:43PM -0700, spaminos-ker@yahoo.com wrote:
>>>>
>>>>> I am not very familiar with all the parameters, so I just kept the
>>>>> defaults
>>>>> Anything else I could try?
>>>>> Nicolas
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> No. It appeared that the SPA bits had sufficient fairness in them to
>>>> pass this test but apparently not quite enough.
>>>>
>>>
>>> The interactive bonus may interfere with fairness (the throughput
>>> bonus should actually help it for tasks with equal nice) so you could
>>> try setting max_ia_bonus to zero (and possibly increasing
>>> max_tpt_bonus). With "eb" mode this should still give good
>>> interactive response but expect interactive response to suffer a
>>> little in "pb" mode however renicing the X server to a negative value
>>> should help.
>>
>>
>>
>> I should also have mentioned that fiddling with the promotion interval
>> may help.
>
>
> Having reread your original e-mail I think that this problem is probably
> being caused by the interactive bonus mechanism classifying the httpd
> server threads as "interactive" threads and giving them a bonus. But
> for some reason the daemon is not identified as "interactive" meaning
> that it gets given a lower priority. In this situation if there's a
> large number of httpd threads (even with promotion) it could take quite
> a while for the daemon to get a look in. Without promotion total
> starvation is even a possibility.
>
> Peter
> PS For both "eb" and "pb" modes, max_io_bonus should be set to zero on
> servers (where interactive responsiveness isn't an issue).
> PPS For "sc" mode, try setting "interactive" to zero and "compute" to 1.
I've just run your tests on my desktop and, with max_ia_bonus at its
default value, I see the "delta = 3" messages with 20 threads BUT when I
set max_ia_bonus to zero they stop (in both "eb" and "pb" modes). So I
then reran the tests with 60 threads and zero max_ia_bonus, and no output
was generated by your testdelay script in either "eb" or "pb" mode. I
didn't try "sc" mode, as I have a ZAPHOD kernel loaded (not HYDRA), but
Con has reported that the problem is absent in his latest patches, so
I'll update the "sc" mode in HYDRA to those patches.
Peter
--
Peter Williams pwil3058@bigpond.net.au
"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce
* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
2004-08-11 3:31 ` Con Kolivas
@ 2004-08-11 3:46 ` Peter Williams
0 siblings, 0 replies; 21+ messages in thread
From: Peter Williams @ 2004-08-11 3:46 UTC (permalink / raw)
To: Con Kolivas; +Cc: spaminos-ker, William Lee Irwin III, linux-kernel
Con Kolivas wrote:
> Peter Williams writes:
>
>> Peter Williams wrote:
>>
>>> Peter Williams wrote:
>>>
>>>> William Lee Irwin III wrote:
>>>>
>>>>> On Tue, Aug 10, 2004 at 07:21:43PM -0700, spaminos-ker@yahoo.com
>>>>> wrote:
>>>>>
>>>>>> I am not very familiar with all the parameters, so I just kept the
>>>>>> defaults
>>>>>> Anything else I could try?
>>>>>> Nicolas
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> No. It appeared that the SPA bits had sufficient fairness in them to
>>>>> pass this test but apparently not quite enough.
>>>>>
>>>>
>>>> The interactive bonus may interfere with fairness (the throughput
>>>> bonus should actually help it for tasks with equal nice) so you
>>>> could try setting max_ia_bonus to zero (and possibly increasing
>>>> max_tpt_bonus). With "eb" mode this should still give good
>>>> interactive response but expect interactive response to suffer a
>>>> little in "pb" mode however renicing the X server to a negative
>>>> value should help.
>>>
>>>
>>>
>>> I should also have mentioned that fiddling with the promotion
>>> interval may help.
>>
>>
>> Having reread your original e-mail I think that this problem is
>> probably being caused by the interactive bonus mechanism classifying
>> the httpd server threads as "interactive" threads and giving them a
>> bonus. But for some reason the daemon is not identified as
>> "interactive" meaning that it gets given a lower priority. In this
>> situation if there's a large number of httpd threads (even with
>> promotion) it could take quite a while for the daemon to get a look
>> in. Without promotion total starvation is even a possibility.
>>
>> Peter
>> PS For both "eb" and "pb" modes, max_io_bonus should be set to zero on
>> servers (where interactive responsiveness isn't an issue).
>> PPS For "sc" mode, try setting "interactive" to zero and "compute" to 1.
>
>
> No, compute should not be set to 1 for a server. It is reserved only for
> computational nodes, not regular servers. "Compute" will increase
> latency which is undesirable.
Sorry, my misunderstanding.
Peter
--
Peter Williams pwil3058@bigpond.net.au
"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce
* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
2004-08-11 3:09 ` Con Kolivas
@ 2004-08-11 10:24 ` Prakash K. Cheemplavam
2004-08-11 11:26 ` Scheduler fairness problem on 2.6 series Con Kolivas
2004-08-12 2:04 ` Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others) spaminos-ker
2004-08-12 2:24 ` spaminos-ker
2 siblings, 1 reply; 21+ messages in thread
From: Prakash K. Cheemplavam @ 2004-08-11 10:24 UTC (permalink / raw)
To: Con Kolivas; +Cc: spaminos-ker, linux-kernel, William Lee Irwin III
Con Kolivas wrote:
| I tried this on the latest staircase patch (7.I) and am not getting any
| output from your script when tested up to 60 threads on my hardware. Can
| you try this version of staircase please?
|
| There are 7.I patches against 2.6.8-rc4 and 2.6.8-rc4-mm1
|
| http://ck.kolivas.org/patches/2.6/2.6.8/
Hi,
I just updated to 2.6.8-rc4-ck2 and tried the two options interactive
and compute. Is the compute stuff functional? I tried setting it to 1
within X and after that X wasn't usable anymore (it looked locked up;
even the mouse cursor was frozen/gone). I managed to switch back to the
console and set it to 0, and all was OK again.
Setting interactive to 0 helped me with running multiple processes
locally using mpi. Nevertheless (only with interactive 1 is there a
regression relative to the vanilla scheduler; otherwise it is the same),
can't this be enhanced?
Details: I am working on a load balancing class using mpi. For testing
purposes I am running multiple processes on my machine. So for a given
problem I can say it needs x time to solve. Using more processes on a
single machine, this time (apart from communication and balancing
overhead) shouldn't be much larger. Unfortunately this happens: e.g. a
given problem using two processes needs about 20 seconds to finish, but
using 8 it already needs 47s (55s with interactive set to 1). And no, my
balancing framework is quite good: on a real cluster (small, but tested
even larger, up to 128 nodes) the overhead is as low as 3% to 5%, i.e.
it scales quite linearly.
Any idea how to tweak the staircase to get near the 20 seconds with more
processes? Or is this rather a problem of mpich used locally?
If you like I can send you my code to test (beware it is not that small).
Cheers,
Prakash
* Re: Scheduler fairness problem on 2.6 series
2004-08-11 10:24 ` Prakash K. Cheemplavam
@ 2004-08-11 11:26 ` Con Kolivas
2004-08-11 12:05 ` Prakash K. Cheemplavam
0 siblings, 1 reply; 21+ messages in thread
From: Con Kolivas @ 2004-08-11 11:26 UTC (permalink / raw)
To: Prakash K. Cheemplavam; +Cc: linux kernel mailing list
Prakash K. Cheemplavam wrote:
> Con Kolivas wrote:
> | I tried this on the latest staircase patch (7.I) and am not getting any
> | output from your script when tested up to 60 threads on my hardware. Can
> | you try this version of staircase please?
> |
> | There are 7.I patches against 2.6.8-rc4 and 2.6.8-rc4-mm1
> |
> | http://ck.kolivas.org/patches/2.6/2.6.8/
>
> Hi,
>
> I just updated to 2.6.8-rc4-ck2 and tried the two options interactive
> and compute. Is the compute stuff functional? I tried setting it to 1
> within X and after that X wasn't usable anymore (meaning it looked like
> locked up, frozen/gone mouse cursor even). I managed to switch back to
> console and set it to 0 and all was OK again.
Compute is very functional. However, it isn't remotely meant to be run on
a desktop because of very large scheduling latencies (on purpose).
> The interactive to 0 setting helped me with running locally multiple
> processes using mpi. Nevertheless (only with interactive 1 regression to
> vanilla scheduler, else same) can't this be enhanced?
I don't understand your question. Can what be enhanced?
> Details: I am working on a load balancing class using mpi. For testing
> purposes I am running multiple processes on my machine. So for a given
> problem I can say, it needs x time to solve. Using more processes on a
> single machine, this time (except communication and balancing overhead)
> shouldn't be much larger. Unfortunately this happens. Eg. a given
> problem using two processes needs about 20 seconds to finish. But using
> 8 it already needs 47s (55s with interactive set to 1). No, my balancing
> framework is quite good. On a real (small, even larger till 128 nodes
> tested) cluster overhead is just as low as 3% to 5%, ie. it scales quite
> linearly.
Once again I don't quite understand you. Are you saying that there is
more than 50% cpu overhead when running 8 processes? Or that the cpu is
distributed unfairly such that the longest will run for 47s?
> Any idea how to tweak the staircase to get near the 20 seconds with more
> processes? Or is this rather a problem of mpich used locally?
Compute mode is by far the most scalable mode in staircase for purely
computational tasks. The cost is that of interactivity; it is bad on
purpose since it is a no-compromise maximum cpu cache utilisation policy.
> If you like I can send you my code to test (beware it is not that small).
>
> Cheers,
>
> Prakash
Cheers,
Con
* Re: Scheduler fairness problem on 2.6 series
2004-08-11 11:26 ` Scheduler fairness problem on 2.6 series Con Kolivas
@ 2004-08-11 12:05 ` Prakash K. Cheemplavam
2004-08-11 19:22 ` Prakash K. Cheemplavam
0 siblings, 1 reply; 21+ messages in thread
From: Prakash K. Cheemplavam @ 2004-08-11 12:05 UTC (permalink / raw)
To: Con Kolivas; +Cc: linux kernel mailing list
Con Kolivas wrote:
| Prakash K. Cheemplavam wrote:
|
|> Con Kolivas wrote:
|> | I tried this on the latest staircase patch (7.I) and am not getting any
|> | output from your script when tested up to 60 threads on my hardware.
|> Can
|> | you try this version of staircase please?
|> |
|> | There are 7.I patches against 2.6.8-rc4 and 2.6.8-rc4-mm1
|> |
|> | http://ck.kolivas.org/patches/2.6/2.6.8/
|>
|> Hi,
|>
|> I just updated to 2.6.8-rc4-ck2 and tried the two options interactive
|> and compute. Is the compute stuff functional? I tried setting it to 1
|> within X and after that X wasn't usable anymore (meaning it looked like
|> locked up, frozen/gone mouse cursor even). I managed to switch back to
|> console and set it to 0 and all was OK again.
|
|
| Compute is very functional. However it isn't remotely meant to be run on
| a desktop because of very large scheduling latencies (on purpose).
Uhm, OK, I didn't know it would have such a drastic effect. Perhaps you
should add a warning that this setting shouldn't be used under X. :-)
|
|> The interactive to 0 setting helped me with running locally multiple
|> processes using mpi. Nevertheless (only with interactive 1 regression to
|> vanilla scheduler, else same) can't this be enhanced?
|
|
| I don't understand your question. Can what be enhanced?
|
|> Details: I am working on a load balancing class using mpi. For testing
|> purposes I am running multiple processes on my machine. So for a given
|> problem I can say, it needs x time to solve. Using more processes on a
|> single machine, this time (except communication and balancing overhead)
|> shouldn't be much larger. Unfortunately this happens. Eg. a given
|> problem using two processes needs about 20 seconds to finish. But using
|> 8 it already needs 47s (55s with interactive set to 1). No, my balancing
|> framework is quite good. On a real (small, even larger till 128 nodes
|> tested) cluster overhead is just as low as 3% to 5%, ie. it scales quite
|> linearly.
|
|
| Once again I dont quite understand you. Are you saying that there is
| more than 50% cpu overhead when running 8 processes? Or that the cpu is
| distributed unfairly such that the longest will run for 47s?
I don't think it is the overhead. I rather think the way the kernel
scheduler gives mpich and the cpu-bound program resources is unfair.
Or is the timeslice too big? The 8 processes in my test usually do a
load-balancing step after 1 second of work. During this second, all of
those processes should be using the CPU at the same time. I rather have
the impression that the processes get CPU time one after the other, which
fools the load balancer into thinking the cpu is fast (the job is done in
"regular" time, but the overhead seems to be big, as each process, after
having finished, now waits for the next one to finish and communicate
with it).
Or to put it more graphically (with 4 processes, each consisting of 3
parts plus a final communication, just to make it clear):
What is done now (xy, x: process, y: part or communication):
11 12 13 1c 21 22 23 2c 31 32 33 3c 41 42 43 4c
What the scheduler should rather do:
11 21 31 41 12 22 32 42 13 23 33 43 1c 2c 3c 4c
So the balancer would instead find the CPU to be slower by a factor of
the number of processes used, rather than thinking the overhead is big.
(I am not sure whether this really explains the steep increase in time
wasted as more processes are used. Perhaps it really is mpich, though I
don't understand why it would use up so much time. Any way for me to
find out? Via profiling?)
This is just a guess at what I think goes wrong. (Is the timeslice the
scheduler gives each process simply too big?)
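To make the two orderings above concrete, here is a toy sketch (illustrative only; the tick counts are made up):

```python
def finish_times(procs, parts, order):
    """Tick-based toy of the two schedules above: 'serial' runs each
    process to completion in turn, 'rr' interleaves one part per process
    per round.  Returns {process: tick at which its last part finishes}."""
    done, t = {}, 0
    if order == "serial":
        for p in range(procs):
            t += parts
            done[p] = t
    else:  # round-robin
        for part in range(parts):
            for p in range(procs):
                t += 1
                if part == parts - 1:
                    done[p] = t
    return done

serial = finish_times(4, 3, "serial")  # {0: 3, 1: 6, 2: 9, 3: 12}
rr = finish_times(4, 3, "rr")          # {0: 9, 1: 10, 2: 11, 3: 12}
# Total makespan is identical, but under 'serial' process 0 finishes at
# tick 3 and then idles at the communication barrier until tick 12,
# which a per-round balancer misreads as huge overhead.
```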
hth,
Prakash
* Re: Scheduler fairness problem on 2.6 series
2004-08-11 12:05 ` Prakash K. Cheemplavam
@ 2004-08-11 19:22 ` Prakash K. Cheemplavam
2004-08-11 23:42 ` Con Kolivas
0 siblings, 1 reply; 21+ messages in thread
From: Prakash K. Cheemplavam @ 2004-08-11 19:22 UTC (permalink / raw)
Cc: Con Kolivas, linux kernel mailing list
|
| I don't think it is the overhead. I rather think the way the kernel
| scheduler gives mpich and the cpu bound program resources is unfair.
Well, I don't know whether it helps, but I ran a profiler and these are
the functions that waste the most CPU cycles when running 16 processes
of my example with mpich:
124910 9.8170 vmlinux tcp_poll
123356 9.6949 vmlinux sys_select
85634 6.7302 vmlinux do_select
71858 5.6475 vmlinux sysenter_past_esp
62093 4.8801 vmlinux kfree
51658 4.0600 vmlinux __copy_to_user_ll
37495 2.9468 vmlinux max_select_fd
36949 2.9039 vmlinux __kmalloc
22700 1.7841 vmlinux __copy_from_user_ll
14587 1.1464 vmlinux do_gettimeofday
Is anything scheduler related?
bye,
Prakash
* Re: Scheduler fairness problem on 2.6 series
2004-08-11 19:22 ` Prakash K. Cheemplavam
@ 2004-08-11 23:42 ` Con Kolivas
2004-08-12 8:08 ` Prakash K. Cheemplavam
2004-08-12 18:18 ` Bill Davidsen
0 siblings, 2 replies; 21+ messages in thread
From: Con Kolivas @ 2004-08-11 23:42 UTC (permalink / raw)
To: Prakash K. Cheemplavam; +Cc: linux kernel mailing list
Prakash K. Cheemplavam wrote:
> |
> | I don't think it is the overhead. I rather think the way the kernel
> | scheduler gives mpich and the cpu bound program resources is unfair.
>
> Well, I don't know whether it helps, but I ran a profiler and these are
> the functions which cause so much wasted CPU cycles when running 16
> processes of my example with mpich:
>
> 124910 9.8170 vmlinux tcp_poll
> 123356 9.6949 vmlinux sys_select
> 85634 6.7302 vmlinux do_select
> 71858 5.6475 vmlinux sysenter_past_esp
> 62093 4.8801 vmlinux kfree
> 51658 4.0600 vmlinux __copy_to_user_ll
> 37495 2.9468 vmlinux max_select_fd
> 36949 2.9039 vmlinux __kmalloc
> 22700 1.7841 vmlinux __copy_from_user_ll
> 14587 1.1464 vmlinux do_gettimeofday
>
> Is anything scheduler related?
No.
It looks like your select timeouts are too short, and when the cpu load
goes up they repeatedly time out, wasting cpu cycles.
I quote from `man select_tut` under the section SELECT LAW:
1. You should always try use select without a timeout. Your program
should have nothing to do if there is no data available. Code
that depends on timeouts is not usually portable and difficult
to debug.
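As an illustration of that difference (a sketch, not mpich's actual code):

```python
import os
import select

# Polling select with a zero timeout spins fruitlessly while no data is
# ready, whereas select with no timeout blocks once until the fd becomes
# readable.
r, w = os.pipe()

wakeups = 0
for _ in range(1000):
    ready, _, _ = select.select([r], [], [], 0)  # 0s timeout: poll
    wakeups += 1
    if ready:
        break
# All 1000 wakeups timed out with nothing to read -- pure wasted cpu,
# which is what the tcp_poll/sys_select/do_select profile entries suggest.

os.write(w, b"x")
ready, _, _ = select.select([r], [], [])  # no timeout: just block
os.close(r)
os.close(w)
```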
Cheers,
Con
* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
2004-08-11 3:09 ` Con Kolivas
2004-08-11 10:24 ` Prakash K. Cheemplavam
@ 2004-08-12 2:04 ` spaminos-ker
2004-08-12 2:24 ` spaminos-ker
2 siblings, 0 replies; 21+ messages in thread
From: spaminos-ker @ 2004-08-12 2:04 UTC (permalink / raw)
To: Con Kolivas; +Cc: linux-kernel, William Lee Irwin III
--- Con Kolivas <kernel@kolivas.org> wrote:
> Hi
>
> I tried this on the latest staircase patch (7.I) and am not getting any
> output from your script when tested up to 60 threads on my hardware. Can you
> try this version of staircase please?
>
> There are 7.I patches against 2.6.8-rc4 and 2.6.8-rc4-mm1
>
> http://ck.kolivas.org/patches/2.6/2.6.8/
>
> Cheers,
> Con
>
Just tried on my machine:
2.6.8-rc4 fails all tests (did the test just to be sure)
2.6.8-rc4 with the "from_2.6.8-rc4_to_staircase7.I" patch looks
pretty good:
on my hardware I could run 60 threads too, my shells are still very
responsive, etc., and I get no slowdowns with my watchdog script.
A few strange things happened though (with 60 threads):
* after a few minutes, I got one message
Wed Aug 11 18:06:11 PDT 2004
>>>>>>> delta = 57
57 seconds!?! Very surprising.
* shortly after that, I tried to run top or ps, and they all got stuck. I
waited a couple of minutes and they were still stuck. I opened a few shells;
I could do anything except run commands that enumerate the process list.
After a while, I killed the cputest program (Ctrl-C'd it), and the stuck
ps/top resumed execution.
I could not reproduce those problems; I even rebooted the machine, but still
only get a delta = 3 message every 30 minutes or so.
Nicolas
* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
2004-08-11 3:09 ` Con Kolivas
2004-08-11 10:24 ` Prakash K. Cheemplavam
2004-08-12 2:04 ` Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others) spaminos-ker
@ 2004-08-12 2:24 ` spaminos-ker
2004-08-12 2:53 ` Con Kolivas
2 siblings, 1 reply; 21+ messages in thread
From: spaminos-ker @ 2004-08-12 2:24 UTC (permalink / raw)
To: Con Kolivas; +Cc: linux-kernel, William Lee Irwin III
--- Con Kolivas <kernel@kolivas.org> wrote:
>
> Hi
>
> I tried this on the latest staircase patch (7.I) and am not getting any
> output from your script when tested up to 60 threads on my hardware. Can you
> try this version of staircase please?
>
> There are 7.I patches against 2.6.8-rc4 and 2.6.8-rc4-mm1
>
> http://ck.kolivas.org/patches/2.6/2.6.8/
>
> Cheers,
> Con
>
>
One thing to note is that I do get a lot of output from the script if I set
interactive to 0 (delays between 3 and 13 seconds with 60 threads).
Nicolas
* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
2004-08-12 2:24 ` spaminos-ker
@ 2004-08-12 2:53 ` Con Kolivas
0 siblings, 0 replies; 21+ messages in thread
From: Con Kolivas @ 2004-08-12 2:53 UTC (permalink / raw)
To: spaminos-ker; +Cc: linux-kernel, William Lee Irwin III
spaminos-ker@yahoo.com writes:
> --- Con Kolivas <kernel@kolivas.org> wrote:
>>
>> Hi
>>
>> I tried this on the latest staircase patch (7.I) and am not getting any
>> output from your script when tested up to 60 threads on my hardware. Can you
>> try this version of staircase please?
>>
>> There are 7.I patches against 2.6.8-rc4 and 2.6.8-rc4-mm1
>>
>> http://ck.kolivas.org/patches/2.6/2.6.8/
>>
>> Cheers,
>> Con
>>
>>
>
> One thing to note is that I do get a lot of output from the script if I set
> interactive to 0 (delays between 3 and 13 seconds with 60 threads).
Sounds fair.
With interactive==0 it will penalise tasks during their bursts of cpu usage
in the interest of fairness, and your script is effectively bash doing a
burst of cpu, so 3-13 second delays when the load is effectively >60 are
pretty good.
Cheers,
Con
* Re: Scheduler fairness problem on 2.6 series
2004-08-11 23:42 ` Con Kolivas
@ 2004-08-12 8:08 ` Prakash K. Cheemplavam
2004-08-12 18:18 ` Bill Davidsen
1 sibling, 0 replies; 21+ messages in thread
From: Prakash K. Cheemplavam @ 2004-08-12 8:08 UTC (permalink / raw)
To: Con Kolivas; +Cc: linux kernel mailing list
Con Kolivas wrote:
| Prakash K. Cheemplavam wrote:
|
|> 124910 9.8170 vmlinux tcp_poll
|> 123356 9.6949 vmlinux sys_select
|> 85634 6.7302 vmlinux do_select
|> 71858 5.6475 vmlinux sysenter_past_esp
|> 62093 4.8801 vmlinux kfree
|> 51658 4.0600 vmlinux __copy_to_user_ll
|> 37495 2.9468 vmlinux max_select_fd
|> 36949 2.9039 vmlinux __kmalloc
|> 22700 1.7841 vmlinux __copy_from_user_ll
|> 14587 1.1464 vmlinux do_gettimeofday
|>
| It looks like your select timeouts are too short, and when the cpu load
| goes up they repeatedly time out, wasting cpu cycles.
| I quote from `man select_tut` under the section SELECT LAW:
|
| 1. You should always try to use select without a timeout. Your program
| should have nothing to do if there is no data available. Code
| that depends on timeouts is not usually portable and difficult
| to debug.
|
Thanks for your explanation. I cannot do anything about it myself, as it is
mpich related, so I'll ask them if they could change its behaviour a bit
so that it eats less CPU on a single-CPU machine.
Cheers,
Prakash
* Re: Scheduler fairness problem on 2.6 series
2004-08-11 23:42 ` Con Kolivas
2004-08-12 8:08 ` Prakash K. Cheemplavam
@ 2004-08-12 18:18 ` Bill Davidsen
1 sibling, 0 replies; 21+ messages in thread
From: Bill Davidsen @ 2004-08-12 18:18 UTC (permalink / raw)
To: linux-kernel
Con Kolivas wrote:
> Prakash K. Cheemplavam wrote:
>
>> |
>> | I don't think it is the overhead. I rather think the way the kernel
>> | schedulers gives mpich and the cpu bound program resources is unfair.
>>
>> Well, I don't know whether it helps, but I ran a profiler and these are
>> the functions which cause so much wasted CPU cycles when running 16
>> processes of my example with mpich:
>>
>> 124910 9.8170 vmlinux tcp_poll
>> 123356 9.6949 vmlinux sys_select
>> 85634 6.7302 vmlinux do_select
>> 71858 5.6475 vmlinux sysenter_past_esp
>> 62093 4.8801 vmlinux kfree
>> 51658 4.0600 vmlinux __copy_to_user_ll
>> 37495 2.9468 vmlinux max_select_fd
>> 36949 2.9039 vmlinux __kmalloc
>> 22700 1.7841 vmlinux __copy_from_user_ll
>> 14587 1.1464 vmlinux do_gettimeofday
>>
>> Is anything scheduler related?
>
>
> No
>
> It looks like your select timeouts are too short, and when the cpu load
> goes up they repeatedly time out, wasting cpu cycles.
> I quote from `man select_tut` under the section SELECT LAW:
>
> 1. You should always try to use select without a timeout. Your program
> should have nothing to do if there is no data available. Code
> that depends on timeouts is not usually portable and difficult
> to debug.
There's a generalization that's sure to confuse novice users... correctly
used, a timeout IS a debugging technique, useful for detecting when a peer
has gone walkabout, as a common example.
It sounds as if the timeout is way too low here, however. Perhaps they are
using it as poorly-done polling? In any case, this is not kernel misbehaviour.
--
-bill davidsen (davidsen@tmr.com)
"The secret to procrastination is to put things off until the
last possible moment - but no longer" -me
* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
2004-08-11 3:44 ` Peter Williams
@ 2004-08-13 0:13 ` spaminos-ker
2004-08-13 1:44 ` Peter Williams
0 siblings, 1 reply; 21+ messages in thread
From: spaminos-ker @ 2004-08-13 0:13 UTC (permalink / raw)
To: linux-kernel; +Cc: Peter Williams, William Lee Irwin III
--- Peter Williams <pwil3058@bigpond.net.au> wrote:
> I've just run your tests on my desktop and with max_ia_bonus at its
> default value I see the "delta = 3" with 20 threads BUT when I set
> max_ia_bonus to zero they stop (in both "eb" and "pb" mode). So I then
> reran the tests with 60 threads and zero max_ia_bonus and no output was
> generated by your testdelay script in either "eb" or "pb" modes. I
> didn't try "sc" mode as I have a ZAPHOD kernel loaded (not HYDRA) but
> Con has reported that the problem is absent in his latest patches so
> I'll update the "sc" mode in HYDRA to those patches.
>
I just tried the same test on spa-zaphod-linux 4.1 over 2.6.8-rc4
I also get "delta = 3" messages with 20 threads; they go away when I set
max_ia_bonus to 0 (and stay off with 60 threads too) in "pb" mode.
But, unlike on your desktop, the "eb" mode doesn't seem to get better when I
set max_ia_bonus to 0 on my machine; maybe I need to tweak something else?
(Even so, the idea of tweaking for a given workload doesn't sound very good
to me.)
The "pb" mode is very responsive with the system under heavy load, I like it :)
I will run some tests over the weekend with the actual server to see the
effect of this patch on a more complex system.
Nicolas
PS: the machine I am using is a pure server, only accessible through ssh, so
I cannot really tell how it behaves under X.
* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
2004-08-13 0:13 ` spaminos-ker
@ 2004-08-13 1:44 ` Peter Williams
0 siblings, 0 replies; 21+ messages in thread
From: Peter Williams @ 2004-08-13 1:44 UTC (permalink / raw)
To: spaminos-ker; +Cc: linux-kernel, William Lee Irwin III
spaminos-ker@yahoo.com wrote:
> --- Peter Williams <pwil3058@bigpond.net.au> wrote:
>
>>I've just run your tests on my desktop and with max_ia_bonus at its
>>default value I see the "delta = 3" with 20 threads BUT when I set
>>max_ia_bonus to zero they stop (in both "eb" and "pb" mode). So I then
>>reran the tests with 60 threads and zero max_ia_bonus and no output was
>>generated by your testdelay script in either "eb" or "pb" modes. I
>>didn't try "sc" mode as I have a ZAPHOD kernel loaded (not HYDRA) but
>>Con has reported that the problem is absent in his latest patches so
>>I'll update the "sc" mode in HYDRA to those patches.
>>
>
>
> I just tried the same test on spa-zaphod-linux 4.1 over 2.6.8-rc4
>
> I also get "delta = 3" messages with 20 threads; they go away when I set
> max_ia_bonus to 0 (and stay off with 60 threads too) in "pb" mode.
I'm going to do some experiments to measure the relationship between the
size of max_ia_bonus and the observed delays to see if there's a value
that gives acceptable performance without turning bonuses off completely.
> But, unlike on your desktop, the "eb" mode doesn't seem to get better when I
> set max_ia_bonus to 0 on my machine; maybe I need to tweak something else?
> (Even so, the idea of tweaking for a given workload doesn't sound very good
> to me.)
You could try increasing "base_promotion_interval". When I have a
better idea of the best values (for each mode) for the various
parameters I'll reset their values when the mode is changed.
>
> The "pb" mode is very responsive with the system under heavy load, I like it :)
That's good to hear.
If you have time, I'd appreciate it if you could try a few different values
of max_ia_bonus to determine the minimum value that still gives good
responsiveness on your system. I'm trying to get a feel for how much
this varies from system to system.
>
> I will run some tests over the week end with the actual server to see the
> effect of this patch on a more complex system.
>
> Nicolas
>
> PS: the machine I am using is a pure server, only accessible through ssh, so
> I cannot really tell how it behaves under X.
If it's a pure server I imagine that it's not running X. On a pure
server I'd recommend setting max_ia_bonus to zero.
Thanks
Peter
--
Peter Williams pwil3058@bigpond.net.au
"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce
end of thread, other threads:[~2004-08-13 1:44 UTC | newest]
Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <20040811010116.GL11200@holomorphy.com>
2004-08-11 2:21 ` Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others) spaminos-ker
2004-08-11 2:23 ` William Lee Irwin III
2004-08-11 2:45 ` Peter Williams
2004-08-11 2:47 ` Peter Williams
2004-08-11 3:23 ` Peter Williams
2004-08-11 3:31 ` Con Kolivas
2004-08-11 3:46 ` Peter Williams
2004-08-11 3:44 ` Peter Williams
2004-08-13 0:13 ` spaminos-ker
2004-08-13 1:44 ` Peter Williams
2004-08-11 3:09 ` Con Kolivas
2004-08-11 10:24 ` Prakash K. Cheemplavam
2004-08-11 11:26 ` Scheduler fairness problem on 2.6 series Con Kolivas
2004-08-11 12:05 ` Prakash K. Cheemplavam
2004-08-11 19:22 ` Prakash K. Cheemplavam
2004-08-11 23:42 ` Con Kolivas
2004-08-12 8:08 ` Prakash K. Cheemplavam
2004-08-12 18:18 ` Bill Davidsen
2004-08-12 2:04 ` Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others) spaminos-ker
2004-08-12 2:24 ` spaminos-ker
2004-08-12 2:53 ` Con Kolivas