* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
       [not found] <20040811010116.GL11200@holomorphy.com>
@ 2004-08-11  2:21 ` spaminos-ker
  2004-08-11  2:23   ` William Lee Irwin III
  2004-08-11  3:09   ` Con Kolivas
  0 siblings, 2 replies; 43+ messages in thread
From: spaminos-ker @ 2004-08-11  2:21 UTC (permalink / raw)
  To: linux-kernel; +Cc: William Lee Irwin III

--- William Lee Irwin III <wli@holomorphy.com> wrote:
> 
> Wakeup bonuses etc. are starving tasks. Could you try Peter Williams'
> SPA patches with the do_promotions() function? I suspect these should
> pass your tests.
> 
> 
> -- wli
> 

I tried the patch-2.6.7-spa_hydra_FULL-v4.0 patch

I only changed the value of /proc/sys/kernel/cpusched/mode to switch between
the different scheduler modes.
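
For reference, the switching is just a write to that file; something like
this (a sketch; eb/pb/sc are assumed to be the exact strings the mode file
accepts):

    cat /proc/sys/kernel/cpusched/mode        # show the current mode
    echo eb > /proc/sys/kernel/cpusched/mode  # select the "eb" scheduler
    echo pb > /proc/sys/kernel/cpusched/mode  # select the "pb" scheduler
    echo sc > /proc/sys/kernel/cpusched/mode  # select "sc" (staircase)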

The 2 threads test passes successfully (an improvement over stock 2.6.7) but none
of the modes passed the 20 threads test:

eb

Tue Aug 10 19:10:48 PDT 2004
>>>>>>> delta = 6
Tue Aug 10 19:11:03 PDT 2004
>>>>>>> delta = 16
Tue Aug 10 19:11:13 PDT 2004
>>>>>>> delta = 9
Tue Aug 10 19:11:24 PDT 2004
>>>>>>> delta = 11
Tue Aug 10 19:11:34 PDT 2004
>>>>>>> delta = 10
Tue Aug 10 19:11:45 PDT 2004
>>>>>>> delta = 11
Tue Aug 10 19:11:56 PDT 2004
>>>>>>> delta = 11
Tue Aug 10 19:12:06 PDT 2004
>>>>>>> delta = 10



pb

Tue Aug 10 19:07:52 PDT 2004
>>>>>>> delta = 3
Tue Aug 10 19:07:55 PDT 2004
>>>>>>> delta = 3
Tue Aug 10 19:07:59 PDT 2004
>>>>>>> delta = 4
Tue Aug 10 19:08:02 PDT 2004
>>>>>>> delta = 3
Tue Aug 10 19:08:05 PDT 2004
>>>>>>> delta = 3

sc

Tue Aug 10 19:08:28 PDT 2004
>>>>>>> delta = 3
Tue Aug 10 19:09:08 PDT 2004
>>>>>>> delta = 3
Tue Aug 10 19:09:17 PDT 2004
>>>>>>> delta = 3
Tue Aug 10 19:09:23 PDT 2004
>>>>>>> delta = 3
Tue Aug 10 19:09:49 PDT 2004
>>>>>>> delta = 3
Tue Aug 10 19:09:53 PDT 2004
>>>>>>> delta = 3
Tue Aug 10 19:09:55 PDT 2004
>>>>>>> delta = 3


eb seemed to be the worst of the bunch with quite long system hangs on this
particular test.
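
For anyone wanting to reproduce this: the test is essentially a set of pure
CPU hogs plus a watchdog that reports whenever it gets starved of CPU.  A
rough sketch of such a testdelay script (the exact scripts aren't
reproduced in this thread):

    #!/bin/bash
    # Start 20 pure CPU hogs (the real test reportedly uses 20 threads in
    # one process; separate busy-loop subshells are the shell equivalent).
    for i in $(seq 1 20); do
        while :; do :; done &
    done

    # Watchdog: wake every second and report whenever the wall clock
    # jumped by 3 seconds or more, i.e. whenever this shell was starved.
    prev=$(date +%s)
    while :; do
        sleep 1
        now=$(date +%s)
        delta=$((now - prev))
        if [ "$delta" -ge 3 ]; then
            date
            echo ">>>>>>> delta = $delta"
        fi
        prev=$now
    done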
With the default settings of:

base_promotion_interval 255
compute 0
cpu_hog_threshold 900
ia_threshold 900
initial_ia_bonus 1
interactive 0
log_at_exit 0
max_ia_bonus 9
max_tpt_bonus 4
sched_batch_time_slice_multiplier 10
sched_iso_threshold 50
sched_rr_time_slice 100
time_slice 100
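
For reference, a listing like the above can be regenerated with a small
loop, assuming all the knobs live under the one directory:

    cd /proc/sys/kernel/cpusched
    for f in *; do echo "$f $(cat $f)"; done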

I am not very familiar with all the parameters, so I just kept the defaults

Anything else I could try?

Nicolas


=====
------------------------------------------------------------
video meliora proboque deteriora sequor
("I see the better and approve it, but I follow the worse" -- Ovid)
------------------------------------------------------------


* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
  2004-08-11  2:21 ` Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others) spaminos-ker
@ 2004-08-11  2:23   ` William Lee Irwin III
  2004-08-11  2:45     ` Peter Williams
  2004-08-11  3:09   ` Con Kolivas
  1 sibling, 1 reply; 43+ messages in thread
From: William Lee Irwin III @ 2004-08-11  2:23 UTC (permalink / raw)
  To: spaminos-ker; +Cc: linux-kernel

On Tue, Aug 10, 2004 at 07:21:43PM -0700, spaminos-ker@yahoo.com wrote:
> I am not very familiar with all the parameters, so I just kept the defaults
> Anything else I could try?
> Nicolas

No. It appeared that the SPA bits had sufficient fairness in them to
pass this test but apparently not quite enough.


-- wli


* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
  2004-08-11  2:23   ` William Lee Irwin III
@ 2004-08-11  2:45     ` Peter Williams
  2004-08-11  2:47       ` Peter Williams
  0 siblings, 1 reply; 43+ messages in thread
From: Peter Williams @ 2004-08-11  2:45 UTC (permalink / raw)
  To: spaminos-ker; +Cc: William Lee Irwin III, linux-kernel

William Lee Irwin III wrote:
> On Tue, Aug 10, 2004 at 07:21:43PM -0700, spaminos-ker@yahoo.com wrote:
> 
>>I am not very familiar with all the parameters, so I just kept the defaults
>>Anything else I could try?
>>Nicolas
> 
> 
> No. It appeared that the SPA bits had sufficient fairness in them to
> pass this test but apparently not quite enough.
> 

The interactive bonus may interfere with fairness (the throughput bonus 
should actually help it for tasks with equal nice), so you could try 
setting max_ia_bonus to zero (and possibly increasing max_tpt_bonus). 
With "eb" mode this should still give good interactive response, but 
expect interactive response to suffer a little in "pb" mode; however, 
renicing the X server to a negative value should help.
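
Concretely, that amounts to something like this (a sketch; the paths are 
from the defaults listing earlier in the thread, and 8 is just an example 
value for the increased throughput bonus):

    echo 0 > /proc/sys/kernel/cpusched/max_ia_bonus
    echo 8 > /proc/sys/kernel/cpusched/max_tpt_bonus  # e.g. double the default of 4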

Peter
PS There's a primitive GUI available for setting the scheduler 
parameters at 
<http://prdownloads.sourceforge.net/cpuse/gcpuctl_hydra-1.3.tar.gz?download>
It is just a Python script with a Glade XML file (gcpuctl_hydra.glade), 
which needs to be in the same directory that you run the script from.
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce



* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
  2004-08-11  2:45     ` Peter Williams
@ 2004-08-11  2:47       ` Peter Williams
  2004-08-11  3:23         ` Peter Williams
  0 siblings, 1 reply; 43+ messages in thread
From: Peter Williams @ 2004-08-11  2:47 UTC (permalink / raw)
  To: spaminos-ker; +Cc: William Lee Irwin III, linux-kernel

Peter Williams wrote:
> William Lee Irwin III wrote:
> 
>> On Tue, Aug 10, 2004 at 07:21:43PM -0700, spaminos-ker@yahoo.com wrote:
>>
>>> I am not very familiar with all the parameters, so I just kept the 
>>> defaults
>>> Anything else I could try?
>>> Nicolas
>>
>>
>>
>> No. It appeared that the SPA bits had sufficient fairness in them to
>> pass this test but apparently not quite enough.
>>
> 
> The interactive bonus may interfere with fairness (the throughput bonus 
> should actually help it for tasks with equal nice) so you could try 
> setting max_ia_bonus to zero (and possibly increasing max_tpt_bonus). 
> With "eb" mode this should still give good interactive response but 
> expect interactive response to suffer a little in "pb" mode however 
> renicing the X server to a negative value should help.

I should also have mentioned that fiddling with the promotion interval 
may help.

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce



* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
  2004-08-11  2:21 ` Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others) spaminos-ker
  2004-08-11  2:23   ` William Lee Irwin III
@ 2004-08-11  3:09   ` Con Kolivas
  2004-08-11 10:24     ` Prakash K. Cheemplavam
                       ` (2 more replies)
  1 sibling, 3 replies; 43+ messages in thread
From: Con Kolivas @ 2004-08-11  3:09 UTC (permalink / raw)
  To: spaminos-ker; +Cc: linux-kernel, William Lee Irwin III

spaminos-ker@yahoo.com writes:

> --- William Lee Irwin III <wli@holomorphy.com> wrote:
>> 
>> Wakeup bonuses etc. are starving tasks. Could you try Peter Williams'
>> SPA patches with the do_promotions() function? I suspect these should
>> pass your tests.
>> 
>> 
>> -- wli
>> 
> 
> I tried the patch-2.6.7-spa_hydra_FULL-v4.0 patch
> 
> I only changed the value of /proc/sys/kernel/cpusched/mode to switch between
> the different scheduler modes.
> 
> The 2 threads test passes successfully (an improvement over stock 2.6.7) but none
> of the modes passed the 20 threads test:

Hi

I tried this on the latest staircase patch (7.I) and am not getting any 
output from your script when tested up to 60 threads on my hardware. Can you 
try this version of staircase please?

There are 7.I patches against 2.6.8-rc4 and 2.6.8-rc4-mm1

http://ck.kolivas.org/patches/2.6/2.6.8/

Cheers,
Con



* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
  2004-08-11  2:47       ` Peter Williams
@ 2004-08-11  3:23         ` Peter Williams
  2004-08-11  3:31           ` Con Kolivas
  2004-08-11  3:44           ` Peter Williams
  0 siblings, 2 replies; 43+ messages in thread
From: Peter Williams @ 2004-08-11  3:23 UTC (permalink / raw)
  To: spaminos-ker; +Cc: William Lee Irwin III, linux-kernel

Peter Williams wrote:
> Peter Williams wrote:
> 
>> William Lee Irwin III wrote:
>>
>>> On Tue, Aug 10, 2004 at 07:21:43PM -0700, spaminos-ker@yahoo.com wrote:
>>>
>>>> I am not very familiar with all the parameters, so I just kept the 
>>>> defaults
>>>> Anything else I could try?
>>>> Nicolas
>>>
>>>
>>>
>>>
>>> No. It appeared that the SPA bits had sufficient fairness in them to
>>> pass this test but apparently not quite enough.
>>>
>>
>> The interactive bonus may interfere with fairness (the throughput 
>> bonus should actually help it for tasks with equal nice) so you could 
>> try setting max_ia_bonus to zero (and possibly increasing 
>> max_tpt_bonus). With "eb" mode this should still give good interactive 
>> response but expect interactive response to suffer a little in "pb" 
>> mode however renicing the X server to a negative value should help.
> 
> 
> I should also have mentioned that fiddling with the promotion interval 
> may help.

Having reread your original e-mail I think that this problem is probably 
being caused by the interactive bonus mechanism classifying the httpd 
server threads as "interactive" threads and giving them a bonus.  But 
for some reason the daemon is not identified as "interactive" meaning 
that it gets given a lower priority.  In this situation if there's a 
large number of httpd threads (even with promotion) it could take quite 
a while for the daemon to get a look in.  Without promotion total 
starvation is even a possibility.

Peter
PS For both "eb" and "pb" modes, max_ia_bonus should be set to zero on 
servers (where interactive responsiveness isn't an issue).
PPS For "sc" mode, try setting "interactive" to zero and "compute" to 1.
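
In /proc terms that is (a sketch, assuming the same paths as the defaults 
listing above):

    # "eb" and "pb" modes, on servers:
    echo 0 > /proc/sys/kernel/cpusched/max_ia_bonus
    # "sc" mode:
    echo 0 > /proc/sys/kernel/cpusched/interactive
    echo 1 > /proc/sys/kernel/cpusched/compute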
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce



* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
  2004-08-11  3:23         ` Peter Williams
@ 2004-08-11  3:31           ` Con Kolivas
  2004-08-11  3:46             ` Peter Williams
  2004-08-11  3:44           ` Peter Williams
  1 sibling, 1 reply; 43+ messages in thread
From: Con Kolivas @ 2004-08-11  3:31 UTC (permalink / raw)
  To: Peter Williams; +Cc: spaminos-ker, William Lee Irwin III, linux-kernel

Peter Williams writes:

> Peter Williams wrote:
>> Peter Williams wrote:
>> 
>>> William Lee Irwin III wrote:
>>>
>>>> On Tue, Aug 10, 2004 at 07:21:43PM -0700, spaminos-ker@yahoo.com wrote:
>>>>
>>>>> I am not very familiar with all the parameters, so I just kept the 
>>>>> defaults
>>>>> Anything else I could try?
>>>>> Nicolas
>>>>
>>>>
>>>>
>>>>
>>>> No. It appeared that the SPA bits had sufficient fairness in them to
>>>> pass this test but apparently not quite enough.
>>>>
>>>
>>> The interactive bonus may interfere with fairness (the throughput 
>>> bonus should actually help it for tasks with equal nice) so you could 
>>> try setting max_ia_bonus to zero (and possibly increasing 
>>> max_tpt_bonus). With "eb" mode this should still give good interactive 
>>> response but expect interactive response to suffer a little in "pb" 
>>> mode however renicing the X server to a negative value should help.
>> 
>> 
>> I should also have mentioned that fiddling with the promotion interval 
>> may help.
> 
> Having reread your original e-mail I think that this problem is probably 
> being caused by the interactive bonus mechanism classifying the httpd 
> server threads as "interactive" threads and giving them a bonus.  But 
> for some reason the daemon is not identified as "interactive" meaning 
> that it gets given a lower priority.  In this situation if there's a 
> large number of httpd threads (even with promotion) it could take quite 
> a while for the daemon to get a look in.  Without promotion total 
> starvation is even a possibility.
> 
> Peter
> PS For both "eb" and "pb" modes, max_ia_bonus should be set to zero on 
> servers (where interactive responsiveness isn't an issue).
> PPS For "sc" mode, try setting "interactive" to zero and "compute" to 1.

No, compute should not be set to 1 for a server. It is reserved only for 
computational nodes, not regular servers. "Compute" will increase latency, 
which is undesirable.

Cheers,
Con



* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
  2004-08-11  3:23         ` Peter Williams
  2004-08-11  3:31           ` Con Kolivas
@ 2004-08-11  3:44           ` Peter Williams
  2004-08-13  0:13             ` spaminos-ker
  1 sibling, 1 reply; 43+ messages in thread
From: Peter Williams @ 2004-08-11  3:44 UTC (permalink / raw)
  To: spaminos-ker; +Cc: Peter Williams, William Lee Irwin III, linux-kernel

Peter Williams wrote:
> Peter Williams wrote:
> 
>> Peter Williams wrote:
>>
>>> William Lee Irwin III wrote:
>>>
>>>> On Tue, Aug 10, 2004 at 07:21:43PM -0700, spaminos-ker@yahoo.com wrote:
>>>>
>>>>> I am not very familiar with all the parameters, so I just kept the 
>>>>> defaults
>>>>> Anything else I could try?
>>>>> Nicolas
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> No. It appeared that the SPA bits had sufficient fairness in them to
>>>> pass this test but apparently not quite enough.
>>>>
>>>
>>> The interactive bonus may interfere with fairness (the throughput 
>>> bonus should actually help it for tasks with equal nice) so you could 
>>> try setting max_ia_bonus to zero (and possibly increasing 
>>> max_tpt_bonus). With "eb" mode this should still give good 
>>> interactive response but expect interactive response to suffer a 
>>> little in "pb" mode however renicing the X server to a negative value 
>>> should help.
>>
>>
>>
>> I should also have mentioned that fiddling with the promotion interval 
>> may help.
> 
> 
> Having reread your original e-mail I think that this problem is probably 
> being caused by the interactive bonus mechanism classifying the httpd 
> server threads as "interactive" threads and giving them a bonus.  But 
> for some reason the daemon is not identified as "interactive" meaning 
> that it gets given a lower priority.  In this situation if there's a 
> large number of httpd threads (even with promotion) it could take quite 
> a while for the daemon to get a look in.  Without promotion total 
> starvation is even a possibility.
> 
> Peter
> PS For both "eb" and "pb" modes, max_ia_bonus should be set to zero on 
> servers (where interactive responsiveness isn't an issue).
> PPS For "sc" mode, try setting "interactive" to zero and "compute" to 1.

I've just run your tests on my desktop and with max_ia_bonus at its 
default value I see the "delta = 3" messages with 20 threads, BUT when I set 
max_ia_bonus to zero they stop (in both "eb" and "pb" mode).  So I then 
reran the tests with 60 threads and zero max_ia_bonus and no output was 
generated by your testdelay script in either "eb" or "pb" modes.  I 
didn't try "sc" mode as I have a ZAPHOD kernel loaded (not HYDRA) but 
Con has reported that the problem is absent in his latest patches so 
I'll update the "sc" mode in HYDRA to those patches.

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce



* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
  2004-08-11  3:31           ` Con Kolivas
@ 2004-08-11  3:46             ` Peter Williams
  0 siblings, 0 replies; 43+ messages in thread
From: Peter Williams @ 2004-08-11  3:46 UTC (permalink / raw)
  To: Con Kolivas; +Cc: spaminos-ker, William Lee Irwin III, linux-kernel

Con Kolivas wrote:
> Peter Williams writes:
> 
>> Peter Williams wrote:
>>
>>> Peter Williams wrote:
>>>
>>>> William Lee Irwin III wrote:
>>>>
>>>>> On Tue, Aug 10, 2004 at 07:21:43PM -0700, spaminos-ker@yahoo.com 
>>>>> wrote:
>>>>>
>>>>>> I am not very familiar with all the parameters, so I just kept the 
>>>>>> defaults
>>>>>> Anything else I could try?
>>>>>> Nicolas
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> No. It appeared that the SPA bits had sufficient fairness in them to
>>>>> pass this test but apparently not quite enough.
>>>>>
>>>>
>>>> The interactive bonus may interfere with fairness (the throughput 
>>>> bonus should actually help it for tasks with equal nice) so you 
>>>> could try setting max_ia_bonus to zero (and possibly increasing 
>>>> max_tpt_bonus). With "eb" mode this should still give good 
>>>> interactive response but expect interactive response to suffer a 
>>>> little in "pb" mode however renicing the X server to a negative 
>>>> value should help.
>>>
>>>
>>>
>>> I should also have mentioned that fiddling with the promotion 
>>> interval may help.
>>
>>
>> Having reread your original e-mail I think that this problem is 
>> probably being caused by the interactive bonus mechanism classifying 
>> the httpd server threads as "interactive" threads and giving them a 
>> bonus.  But for some reason the daemon is not identified as 
>> "interactive" meaning that it gets given a lower priority.  In this 
>> situation if there's a large number of httpd threads (even with 
>> promotion) it could take quite a while for the daemon to get a look 
>> in.  Without promotion total starvation is even a possibility.
>>
>> Peter
>> PS For both "eb" and "pb" modes, max_ia_bonus should be set to zero on 
>> servers (where interactive responsiveness isn't an issue).
>> PPS For "sc" mode, try setting "interactive" to zero and "compute" to 1.
> 
> 
> No, compute should not be set to 1 for a server. It is reserved only for 
> computational nodes, not regular servers. "Compute" will increase 
> latency, which is undesirable.

Sorry, my misunderstanding.

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce



* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
  2004-08-11  3:09   ` Con Kolivas
@ 2004-08-11 10:24     ` Prakash K. Cheemplavam
  2004-08-11 11:26       ` Scheduler fairness problem on 2.6 series Con Kolivas
  2004-08-12  2:04     ` Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others) spaminos-ker
  2004-08-12  2:24     ` spaminos-ker
  2 siblings, 1 reply; 43+ messages in thread
From: Prakash K. Cheemplavam @ 2004-08-11 10:24 UTC (permalink / raw)
  To: Con Kolivas; +Cc: spaminos-ker, linux-kernel, William Lee Irwin III


Con Kolivas wrote:
| I tried this on the latest staircase patch (7.I) and am not getting any
| output from your script when tested up to 60 threads on my hardware. Can
| you try this version of staircase please?
|
| There are 7.I patches against 2.6.8-rc4 and 2.6.8-rc4-mm1
|
| http://ck.kolivas.org/patches/2.6/2.6.8/

Hi,

I just updated to 2.6.8-rc4-ck2 and tried the two options interactive
and compute. Is the compute stuff functional? I tried setting it to 1
within X and after that X wasn't usable anymore (meaning it looked like
locked up, frozen/gone mouse cursor even). I managed to switch back to
console and set it to 0 and all was OK again.

The interactive to 0 setting helped me with running locally multiple
processes using mpi. Nevertheless (only with interactive 1 regression to
vanilla scheduler, else same) can't this be enhanced?

Details: I am working on a load balancing class using mpi. For testing
purposes I am running multiple processes on my machine. So for a given
problem I can say, it needs x time to solve. Using more processes on a
single machine, this time (except communication and balancing overhead)
shouldn't be much larger. Unfortunately this happens. Eg. a given
problem using two processes needs about 20 seconds to finish. But using
8 it already needs 47s (55s with interactive set to 1). No, my balancing
framework is quite good. On a real (small, even larger till 128 nodes
tested) cluster overhead is just as low as 3% to 5%, ie. it scales quite
linearly.

Any idea how to tweak the staircase to get near the 20 seconds with more
processes? Or is this rather a problem of mpich used locally?

If you like I can send you my code to test (beware it is not that small).

Cheers,

Prakash


* Re: Scheduler fairness problem on 2.6 series
  2004-08-11 10:24     ` Prakash K. Cheemplavam
@ 2004-08-11 11:26       ` Con Kolivas
  2004-08-11 12:05         ` Prakash K. Cheemplavam
  0 siblings, 1 reply; 43+ messages in thread
From: Con Kolivas @ 2004-08-11 11:26 UTC (permalink / raw)
  To: Prakash K. Cheemplavam; +Cc: linux kernel mailing list


Prakash K. Cheemplavam wrote:
> Con Kolivas wrote:
> | I tried this on the latest staircase patch (7.I) and am not getting any
> | output from your script when tested up to 60 threads on my hardware. Can
> | you try this version of staircase please?
> |
> | There are 7.I patches against 2.6.8-rc4 and 2.6.8-rc4-mm1
> |
> | http://ck.kolivas.org/patches/2.6/2.6.8/
> 
> Hi,
> 
> I just updated to 2.6.8-rc4-ck2 and tried the two options interactive
> and compute. Is the compute stuff functional? I tried setting it to 1
> within X and after that X wasn't usable anymore (meaning it looked like
> locked up, frozen/gone mouse cursor even). I managed to switch back to
> console and set it to 0 and all was OK again.

Compute is very functional. However it isn't remotely meant to be run on 
a desktop because of very large scheduling latencies (on purpose).

> The interactive to 0 setting helped me with running locally multiple
> processes using mpi. Nevertheless (only with interactive 1 regression to
> vanilla scheduler, else same) can't this be enhanced?

I don't understand your question. Can what be enhanced?

> Details: I am working on a load balancing class using mpi. For testing
> purposes I am running multiple processes on my machine. So for a given
> problem I can say, it needs x time to solve. Using more processes on a
> single machine, this time (except communication and balancing overhead)
> shouldn't be much larger. Unfortunately this happens. Eg. a given
> problem using two processes needs about 20 seconds to finish. But using
> 8 it already needs 47s (55s with interactive set to 1). No, my balancing
> framework is quite good. On a real (small, even larger till 128 nodes
> tested) cluster overhead is just as low as 3% to 5%, ie. it scales quite
> linearly.

Once again I don't quite understand you. Are you saying that there is 
more than 50% cpu overhead when running 8 processes? Or that the cpu is 
distributed unfairly such that the longest will run for 47s?

> Any idea how to tweak the staircase to get near the 20 seconds with more
> processes? Or is this rather a problem of mpich used locally?

Compute mode is by far the most scalable mode in staircase for purely 
computational tasks. The cost is that of interactivity; it is bad on 
purpose since it is a no-compromise maximum cpu cache utilisation policy.

> If you like I can send you my code to test (beware it is not that small).
> 
> Cheers,
> 
> Prakash

Cheers,
Con



* Re: Scheduler fairness problem on 2.6 series
  2004-08-11 11:26       ` Scheduler fairness problem on 2.6 series Con Kolivas
@ 2004-08-11 12:05         ` Prakash K. Cheemplavam
  2004-08-11 19:22           ` Prakash K. Cheemplavam
  0 siblings, 1 reply; 43+ messages in thread
From: Prakash K. Cheemplavam @ 2004-08-11 12:05 UTC (permalink / raw)
  To: Con Kolivas; +Cc: linux kernel mailing list


Con Kolivas wrote:
| Prakash K. Cheemplavam wrote:
|
|> Con Kolivas wrote:
|> | I tried this on the latest staircase patch (7.I) and am not getting any
|> | output from your script when tested up to 60 threads on my hardware.
|> Can
|> | you try this version of staircase please?
|> |
|> | There are 7.I patches against 2.6.8-rc4 and 2.6.8-rc4-mm1
|> |
|> | http://ck.kolivas.org/patches/2.6/2.6.8/
|>
|> Hi,
|>
|> I just updated to 2.6.8-rc4-ck2 and tried the two options interactive
|> and compute. Is the compute stuff functional? I tried setting it to 1
|> within X and after that X wasn't usable anymore (meaning it looked like
|> locked up, frozen/gone mouse cursor even). I managed to switch back to
|> console and set it to 0 and all was OK again.
|
|
| Compute is very functional. However it isn't remotely meant to be run on
| a desktop because of very large scheduling latencies (on purpose).

Uhm, OK, I didn't know it would have such a drastic effect. Perhaps you
should add a warning that this setting shouldn't be used on X. :-)

|
|> The interactive to 0 setting helped me with running locally multiple
|> processes using mpi. Nevertheless (only with interactive 1 regression to
|> vanilla scheduler, else same) can't this be enhanced?
|
|
| I don't understand your question. Can what be enhanced?
|
|> Details: I am working on a load balancing class using mpi. For testing
|> purposes I am running multiple processes on my machine. So for a given
|> problem I can say, it needs x time to solve. Using more processes on a
|> single machine, this time (except communication and balancing overhead)
|> shouldn't be much larger. Unfortunately this happens. Eg. a given
|> problem using two processes needs about 20 seconds to finish. But using
|> 8 it already needs 47s (55s with interactive set to 1). No, my balancing
|> framework is quite good. On a real (small, even larger till 128 nodes
|> tested) cluster overhead is just as low as 3% to 5%, ie. it scales quite
|> linearly.
|
|
| Once again I don't quite understand you. Are you saying that there is
| more than 50% cpu overhead when running 8 processes? Or that the cpu is
| distributed unfairly such that the longest will run for 47s?

I don't think it is the overhead. I rather think the way the kernel
scheduler gives mpich and the cpu bound program resources is unfair.
Or is the timeslice too big? Those 8 processes in my test usually do a
load-balancing after 1 second of work. In this second all of those
processes should use the CPU at the same time. I rather have the
impression that the processes get CPU time one after the other, so it
fools the load balancer into thinking the cpu is fast (the job is done in
"regular" time but the overhead seems to be big, as each process after
having finished now waits for the next one to finish and communicate
with it).

Or to put it more graphically (with 4 processes, each consisting of 3 parts
plus a final communication, just to make it clear):

What is done now (xy, x: process, y:part or communication):

11 12 13 1c 21 22 23 2c 31 32 33 3c 41 42 43 4c

What the scheduler should rather do:

11 21 31 41 12 22 32 42 13 23 33 43 1c 2c 3c 4c

So the balancer would rather find the CPU to be slower by the factor of
used processes instead of thinking the overhead is big. (I am not sure
whether this really explains the steep increase of time wasted with more
processes used. Perhaps it really is mpich, though I don't understand
why it would use up so much time. Any way for me to find out? Via
profiling?)

This is just a guess of what I think goes wrong. (Is the timeslice
the scheduler gives each process simply too big?)

hth,

Prakash


* Re: Scheduler fairness problem on 2.6 series
  2004-08-11 12:05         ` Prakash K. Cheemplavam
@ 2004-08-11 19:22           ` Prakash K. Cheemplavam
  2004-08-11 23:42             ` Con Kolivas
  0 siblings, 1 reply; 43+ messages in thread
From: Prakash K. Cheemplavam @ 2004-08-11 19:22 UTC (permalink / raw)
  Cc: Con Kolivas, linux kernel mailing list


|
| I don't think it is the overhead. I rather think the way the kernel
| scheduler gives mpich and the cpu bound program resources is unfair.

Well, I don't know whether it helps, but I ran a profiler and these are
the functions which cause so much wasted CPU cycles when running 16
processes of my example with mpich:

124910    9.8170  vmlinux                  tcp_poll
123356    9.6949  vmlinux                  sys_select
85634     6.7302  vmlinux                  do_select
71858     5.6475  vmlinux                  sysenter_past_esp
62093     4.8801  vmlinux                  kfree
51658     4.0600  vmlinux                  __copy_to_user_ll
37495     2.9468  vmlinux                  max_select_fd
36949     2.9039  vmlinux                  __kmalloc
22700     1.7841  vmlinux                  __copy_from_user_ll
14587     1.1464  vmlinux                  do_gettimeofday

Is anything scheduler related?
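
(The table looks like oprofile opreport output; for reference, a typical 
way to collect such a profile is roughly:

    opcontrol --vmlinux=/usr/src/linux/vmlinux   # point oprofile at the kernel image
    opcontrol --start
    # ... run the 16-process mpich test ...
    opcontrol --shutdown
    opreport --symbols | head

though the exact invocation may differ.)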

bye,

Prakash


* Re: Scheduler fairness problem on 2.6 series
  2004-08-11 19:22           ` Prakash K. Cheemplavam
@ 2004-08-11 23:42             ` Con Kolivas
  2004-08-12  8:08               ` Prakash K. Cheemplavam
  2004-08-12 18:18               ` Bill Davidsen
  0 siblings, 2 replies; 43+ messages in thread
From: Con Kolivas @ 2004-08-11 23:42 UTC (permalink / raw)
  To: Prakash K. Cheemplavam; +Cc: linux kernel mailing list


Prakash K. Cheemplavam wrote:
> |
> | I don't think it is the overhead. I rather think the way the kernel
> | scheduler gives mpich and the cpu bound program resources is unfair.
> 
> Well, I don't know whether it helps, but I ran a profiler and these are
> the functions which cause so much wasted CPU cycles when running 16
> processes of my example with mpich:
> 
> 124910    9.8170  vmlinux                  tcp_poll
> 123356    9.6949  vmlinux                  sys_select
> 85634     6.7302  vmlinux                  do_select
> 71858     5.6475  vmlinux                  sysenter_past_esp
> 62093     4.8801  vmlinux                  kfree
> 51658     4.0600  vmlinux                  __copy_to_user_ll
> 37495     2.9468  vmlinux                  max_select_fd
> 36949     2.9039  vmlinux                  __kmalloc
> 22700     1.7841  vmlinux                  __copy_from_user_ll
> 14587     1.1464  vmlinux                  do_gettimeofday
> 
> Is anything scheduler related?

No

It looks like your select timeouts are too short and when the cpu load 
goes up they repeatedly time out, wasting cpu cycles.
I quote from `man select_tut` under the section SELECT LAW:

1. You should always try to use select without a timeout. Your program
  should have nothing to do if there is no data available. Code
  that depends on timeouts is not usually portable and difficult
  to debug.

Cheers,
Con



* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
  2004-08-11  3:09   ` Con Kolivas
  2004-08-11 10:24     ` Prakash K. Cheemplavam
@ 2004-08-12  2:04     ` spaminos-ker
  2004-08-12  2:24     ` spaminos-ker
  2 siblings, 0 replies; 43+ messages in thread
From: spaminos-ker @ 2004-08-12  2:04 UTC (permalink / raw)
  To: Con Kolivas; +Cc: linux-kernel, William Lee Irwin III

--- Con Kolivas <kernel@kolivas.org> wrote:
> Hi
> 
> I tried this on the latest staircase patch (7.I) and am not getting any 
> output from your script when tested up to 60 threads on my hardware. Can you 
> try this version of staircase please?
> 
> There are 7.I patches against 2.6.8-rc4 and 2.6.8-rc4-mm1
> 
> http://ck.kolivas.org/patches/2.6/2.6.8/
> 
> Cheers,
> Con
> 

Just tried on my machine:
2.6.8-rc4 fails all tests (did the test just to be sure)

2.6.8-rc4 with the "from_2.6.8-rc4_to_staircase7.I" patch applied, and things look
pretty good:
on my hardware, I could put 60 threads too, and my shells are still very
responsive etc., and I get no slowdowns with my watchdog script.

A few  strange things happened though (with 60 threads):
* after a few minutes, I got one message
Wed Aug 11 18:06:11 PDT 2004
>>>>>>> delta = 57
57 seconds !?! very surprising
* shortly after that, I tried to run top, or ps, and they all got stuck, I
waited a couple minutes and they were still stuck. I opened a few shells, I
could do anything but commands that enumerate the process list. After a while,
I killed the cputest program (ctrld c it), and the stucked ps/top continued
their execution.

I could not reproduce those problems ; I even rebooted the machine, but only
got one message delta of 3 every 30 minutes or so.

Nicolas



* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and         others)
  2004-08-11  3:09   ` Con Kolivas
  2004-08-11 10:24     ` Prakash K. Cheemplavam
  2004-08-12  2:04     ` Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others) spaminos-ker
@ 2004-08-12  2:24     ` spaminos-ker
  2004-08-12  2:53       ` Con Kolivas
  2 siblings, 1 reply; 43+ messages in thread
From: spaminos-ker @ 2004-08-12  2:24 UTC (permalink / raw)
  To: Con Kolivas; +Cc: linux-kernel, William Lee Irwin III

--- Con Kolivas <kernel@kolivas.org> wrote:
> 
> Hi
> 
> I tried this on the latest staircase patch (7.I) and am not getting any 
> output from your script when tested up to 60 threads on my hardware. Can you 
> try this version of staircase please?
> 
> There are 7.I patches against 2.6.8-rc4 and 2.6.8-rc4-mm1
> 
> http://ck.kolivas.org/patches/2.6/2.6.8/
> 
> Cheers,
> Con
> 
> 

One thing to note is that I do get a lot of output from the script if I set
interactive to 0 (delays between 3 and 13 seconds with 60 threads).

Nicolas



* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
  2004-08-12  2:24     ` spaminos-ker
@ 2004-08-12  2:53       ` Con Kolivas
  0 siblings, 0 replies; 43+ messages in thread
From: Con Kolivas @ 2004-08-12  2:53 UTC (permalink / raw)
  To: spaminos-ker; +Cc: linux-kernel, William Lee Irwin III

spaminos-ker@yahoo.com writes:

> --- Con Kolivas <kernel@kolivas.org> wrote:
>> 
>> Hi
>> 
>> I tried this on the latest staircase patch (7.I) and am not getting any 
>> output from your script when tested up to 60 threads on my hardware. Can you 
>> try this version of staircase please?
>> 
>> There are 7.I patches against 2.6.8-rc4 and 2.6.8-rc4-mm1
>> 
>> http://ck.kolivas.org/patches/2.6/2.6.8/
>> 
>> Cheers,
>> Con
>> 
>> 
> 
> One thing to note is that I do get a lot of output from the script if I set
> interactive to 0 (delays between 3 and 13 seconds with 60 threads).

Sounds fair. 

With interactive==0 it will penalise tasks during their bursts of cpu usage 
in the interest of fairness, and your script is effectively BASH doing a 
burst of cpu, so 3-13 second delays when the load is effectively >60 is 
pretty good.

Cheers,
Con



* Re: Scheduler fairness problem on 2.6 series
  2004-08-11 23:42             ` Con Kolivas
@ 2004-08-12  8:08               ` Prakash K. Cheemplavam
  2004-08-12 18:18               ` Bill Davidsen
  1 sibling, 0 replies; 43+ messages in thread
From: Prakash K. Cheemplavam @ 2004-08-12  8:08 UTC (permalink / raw)
  To: Con Kolivas; +Cc: linux kernel mailing list


Con Kolivas wrote:
| Prakash K. Cheemplavam wrote:
|
|> 124910    9.8170  vmlinux                  tcp_poll
|> 123356    9.6949  vmlinux                  sys_select
|> 85634     6.7302  vmlinux                  do_select
|> 71858     5.6475  vmlinux                  sysenter_past_esp
|> 62093     4.8801  vmlinux                  kfree
|> 51658     4.0600  vmlinux                  __copy_to_user_ll
|> 37495     2.9468  vmlinux                  max_select_fd
|> 36949     2.9039  vmlinux                  __kmalloc
|> 22700     1.7841  vmlinux                  __copy_from_user_ll
|> 14587     1.1464  vmlinux                  do_gettimeofday
|>
| It looks like your select timeouts are too short and when the cpu load
| goes up they repeatedly time out, wasting cpu cycles.
| I quote from `man select_tut` under the section SELECT LAW:
|
| 1. You should always try to use select without a timeout. Your program
|  should have nothing to do if there is no  data  available.  Code
|  that  depends  on timeouts is not usually portable and difficult
|  to debug.
|

Thanks for your explanation. I cannot do anything about it, as it is
mpich related. So I'll ask them if they could change its behaviour a bit
so that it eats less CPU on a single CPU machine.

Cheers,

Prakash


* Re: Scheduler fairness problem on 2.6 series
  2004-08-11 23:42             ` Con Kolivas
  2004-08-12  8:08               ` Prakash K. Cheemplavam
@ 2004-08-12 18:18               ` Bill Davidsen
  1 sibling, 0 replies; 43+ messages in thread
From: Bill Davidsen @ 2004-08-12 18:18 UTC (permalink / raw)
  To: linux-kernel

Con Kolivas wrote:
> Prakash K. Cheemplavam wrote:
> 
>> |
>> | I don't think it is the overhead. I rather think the way the kernel
>> | scheduler gives mpich and the cpu bound program resources is unfair.
>>
>> Well, I don't know whether it helps, but I ran a profiler and these are
>> the functions which cause so much wasted CPU cycles when running 16
>> processes of my example with mpich:
>>
>> 124910    9.8170  vmlinux                  tcp_poll
>> 123356    9.6949  vmlinux                  sys_select
>> 85634     6.7302  vmlinux                  do_select
>> 71858     5.6475  vmlinux                  sysenter_past_esp
>> 62093     4.8801  vmlinux                  kfree
>> 51658     4.0600  vmlinux                  __copy_to_user_ll
>> 37495     2.9468  vmlinux                  max_select_fd
>> 36949     2.9039  vmlinux                  __kmalloc
>> 22700     1.7841  vmlinux                  __copy_from_user_ll
>> 14587     1.1464  vmlinux                  do_gettimeofday
>>
>> Is anything scheduler related?
> 
> 
> No
> 
> It looks like your select timeouts are too short and when the cpu load 
> goes up they repeatedly time out, wasting cpu cycles.
> I quote from `man select_tut` under the section SELECT LAW:
> 
> 1. You should always try to use select without a timeout. Your program
>  should have nothing to do if there is no  data  available.  Code
>  that  depends  on timeouts is not usually portable and difficult
>  to debug.

There's a generalization which could confuse novice users... correctly 
used, a timeout IS a debugging technique. Useful to detect when a peer 
has gone walkabout, as a common example.

Sounds as if the timeout is way too low here, however. Perhaps they are 
using it as poorly-done polling? In any case, not kernel misbehaviour.

-- 
    -bill davidsen (davidsen@tmr.com)
"The secret to procrastination is to put things off until the
  last possible moment - but no longer"  -me


* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
  2004-08-11  3:44           ` Peter Williams
@ 2004-08-13  0:13             ` spaminos-ker
  2004-08-13  1:44               ` Peter Williams
  0 siblings, 1 reply; 43+ messages in thread
From: spaminos-ker @ 2004-08-13  0:13 UTC (permalink / raw)
  To: linux-kernel; +Cc: Peter Williams, William Lee Irwin III


--- Peter Williams <pwil3058@bigpond.net.au> wrote:
> I've just run your tests on my desktop and with max_ia_bonus at its 
> default value I see the "delta = 3" with 20 threads BUT when I set 
> max_ia_bonus to zero they stop (in both "eb" and "pb" mode).  So I then 
> reran the tests with 60 threads and zero max_ia_bonus and no output was 
> generated by your testdelay script in either "eb" or "pb" modes.  I 
> didn't try "sc" mode as I have a ZAPHOD kernel loaded (not HYDRA) but 
> Con has reported that the problem is absent in his latest patches so 
> I'll update the "sc" mode in HYDRA to those patches.
> 

I just tried the same test on spa-zaphod-linux 4.1 over 2.6.8-rc4

I also have messages with 20 threads "delta = 3" that go away when I set
max_ia_bonus to 0 (and stay off with 60 threads too) in "pb" mode.
But, unlike your desktop, the "eb" mode doesn't seem to get better by setting
max_ia_bonus to 0 on my machine; maybe I need to tweak something else? (Even
though the idea of tweaking for a given workload doesn't sound very good to
me.)

The "pb" mode is very responsive with the system under heavy load, I like it :)

I will run some tests over the weekend with the actual server to see the
effect of this patch on a more complex system.

Nicolas

PS: the machine I am using is a pure server, only accessible through ssh, so I
cannot really tell the behavior under X.



* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
  2004-08-13  0:13             ` spaminos-ker
@ 2004-08-13  1:44               ` Peter Williams
  0 siblings, 0 replies; 43+ messages in thread
From: Peter Williams @ 2004-08-13  1:44 UTC (permalink / raw)
  To: spaminos-ker; +Cc: linux-kernel, William Lee Irwin III

spaminos-ker@yahoo.com wrote:
> --- Peter Williams <pwil3058@bigpond.net.au> wrote:
> 
>>I've just run your tests on my desktop and with max_ia_bonus at its 
>>default value I see the "delta = 3" with 20 threads BUT when I set 
>>max_ia_bonus to zero they stop (in both "eb" and "pb" mode).  So I then 
>>reran the tests with 60 threads and zero max_ia_bonus and no output was 
>>generated by your testdelay script in either "eb" or "pb" modes.  I 
>>didn't try "sc" mode as I have a ZAPHOD kernel loaded (not HYDRA) but 
>>Con has reported that the problem is absent in his latest patches so 
>>I'll update the "sc" mode in HYDRA to those patches.
>>
> 
> 
> I just tried the same test on spa-zaphod-linux 4.1 over 2.6.8-rc4
> 
> I also have messages with 20 threads "delta = 3" that go away when I set
> max_ia_bonus to 0 (and stay off with 60 threads too) in "pb" mode.

I'm going to do some experiments to measure the relationship between the 
size of max_ia_bonus and the observed delays to see if there's a value 
that gives acceptable performance without turning bonuses off completely.

> But, unlike your desktop, the "eb" mode doesn't seem to get better by setting
> max_ia_bonus to 0 on my machine, maybe I need to tweak something else? (even
> though, the idea of tweaking for a given workload doesn't sound very good to
> me).

You could try increasing "base_promotion_interval".  When I have a 
better idea of the best values (for each mode) for the various 
parameters I'll reset their values when the mode is changed.

> 
> The "pb" mode is very responsive with the system under heavy load, I like it :)

That's good to hear.

If you have time, I'd appreciate it if you could try a few different values 
of max_ia_bonus to determine the minimum value that still gives good 
responsiveness for your system. I'm trying to get a feel for how much 
this varies from system to system.

> 
> I will run some tests over the weekend with the actual server to see the
> effect of this patch on a more complex system.
> 
> Nicolas
> 
> PS: the machine I am using is a pure server, only accessible through ssh, so I
> cannot really tell the behavior under X.

If it's a pure server I imagine that it's not running X.  On a pure 
server I'd recommend setting max_ia_bonus to zero.

Thanks
Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce



* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
  2004-08-29  1:31                         ` Peter Williams
@ 2004-09-13 20:09                           ` spaminos-ker
  0 siblings, 0 replies; 43+ messages in thread
From: spaminos-ker @ 2004-09-13 20:09 UTC (permalink / raw)
  To: Peter Williams; +Cc: Lee Revell, linux-kernel

--- Peter Williams <pwil3058@bigpond.net.au> wrote:

>> Nicolas,
>> 	I'll generate a combined patch and let you know when it's ready.  In 
>> the mean time, could you try increasing the "base_promotion_interval" to 
>> about twice the time slice size?
> 

>A patch for the ZAPHOD scheduler on top of the R5 voluntary preemption 
>patches is available at:
>
><http://prdownloads.sourceforge.net/cpuse/patch-2.6.9-rc1-vp-R5-zaphod-v5.0.1?download>
>
>Due to the fact that the R5 patch requires the bk12 patch to be applied 
>to 2.6.9-rc1 before it is applied, generating a combined patch resulted 
>in a very large patch (and lots of duplicated effort) so this patch is 
>not a combined patch but is relative to a 2.6.9-rc1 kernel with bk12 and 
> voluntary preempt R5 patches already applied.

I have been running for several days with this patched kernel, with
0 for max_ia_bonus
and 0 for max_tpt_bonus
in "pb" mode

and there are no slowdowns at all: my system is running very steadily now!

So it seems that we are getting somewhere!!

Nicolas



* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
  2004-08-29  2:28                         ` spaminos-ker
@ 2004-08-29  4:53                           ` Peter Williams
  0 siblings, 0 replies; 43+ messages in thread
From: Peter Williams @ 2004-08-29  4:53 UTC (permalink / raw)
  To: spaminos-ker; +Cc: Lee Revell, linux-kernel

spaminos-ker@yahoo.com wrote:
> --- Peter Williams <pwil3058@bigpond.net.au> wrote:
> 
>>The mode in which the scheduler was being used had all priority fiddling 
>>(except promotion) turned off so the tasks should have been just round 
>>robinning with each other.  Also, the time outs are fairly rare (every 
>>few hours according to Nicolas's e-mail) and happen with several 
>>different schedulers (with ZAPHOD (the one being used by Nicolas) and 
>>Con's staircase schedulers having less problem than the vanilla 
>>scheduler) which is why I thought it might be something outside the 
>>scheduler.  Perhaps it's something outside the kernel?
>>
> 
> 
> I can add to this that this problem occurred on a variety of systems, single CPU
> Pentium IIIs and 4s, Athlon, dual PIIIs;
> the one thing in common is that everything works fine on all those machines
> with 2.4, but breaks with 2.5 (or redhat 2.4 kernel with some backported code).

I don't suppose you know what the backported code was?  If you could 
provide a patch of the backport it might provide some clues.

> When I do the tests, the only thing I switch is the kernel and reboot.
> 
> It's true that it could be something broken outside of the scheduling code
> (like the way IRQ events are handled maybe, or the way signals are delivered).
> 
> The one difference between the artificial test (from the original post) and the
> real-life test I do now is that the real test combines disk I/O, network I/O
> (TCP/IP and UDP) and several multithreaded processes.
> Where things are kind of bad is that I am far from saturating the machine (the
> load average is less than 2), but still some processes get those annoying
> timeouts.

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce



* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
  2004-08-29  2:03                       ` Peter Williams
@ 2004-08-29  2:28                         ` spaminos-ker
  2004-08-29  4:53                           ` Peter Williams
  0 siblings, 1 reply; 43+ messages in thread
From: spaminos-ker @ 2004-08-29  2:28 UTC (permalink / raw)
  To: Peter Williams, Lee Revell; +Cc: linux-kernel

--- Peter Williams <pwil3058@bigpond.net.au> wrote:
> The mode in which the scheduler was being used had all priority fiddling 
> (except promotion) turned off so the tasks should have been just round 
> robinning with each other.  Also, the time outs are fairly rare (every 
> few hours according to Nicolas's e-mail) and happen with several 
> different schedulers (with ZAPHOD (the one being used by Nicolas) and 
> Con's staircase schedulers having less problem than the vanilla 
> scheduler) which is why I thought it might be something outside the 
> scheduler.  Perhaps it's something outside the kernel?
> 

I can add to this that this problem occurred on a variety of systems, single CPU
Pentium IIIs and 4s, Athlon, dual PIIIs;
the one thing in common is that everything works fine on all those machines
with 2.4, but breaks with 2.5 (or redhat 2.4 kernel with some backported code).
When I do the tests, the only thing I switch is the kernel and reboot.

It's true that it could be something broken outside of the scheduling code
(like the way IRQ events are handled maybe, or the way signals are delivered).

The one difference between the artificial test (from the original post) and the
real-life test I do now is that the real test combines disk I/O, network I/O
(TCP/IP and UDP) and several multithreaded processes.
Where things are kind of bad is that I am far from saturating the machine (the
load average is less than 2), but still some processes get those annoying
timeouts.

Nicolas



* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
  2004-08-29  1:19                     ` spaminos-ker
  2004-08-29  1:22                       ` Lee Revell
@ 2004-08-29  2:20                       ` Lee Revell
  1 sibling, 0 replies; 43+ messages in thread
From: Lee Revell @ 2004-08-29  2:20 UTC (permalink / raw)
  To: spaminos-ker; +Cc: Peter Williams, linux-kernel

On Sat, 2004-08-28 at 21:19, spaminos-ker@yahoo.com wrote:
> --- Lee Revell <rlrevell@joe-job.com> wrote:
> > Is this an SMP machine?  There were problems with that version of the
> > voluntary preemption patches on SMP.  The latest version, Q3, should fix
> > these.
> > 
> No, it's a single CPU Athlon 1800+; the kernel is compiled with support for
> SMP systems, but that should not have any impact.
> 

It shouldn't, but it can.  For example, taking a spinlock just disables
preemption with a UP kernel, but with an SMP kernel I believe you can
actually end up spinning.  You would have to have hit a locking bug or
race condition for this to happen.  Just to be certain, can you
reproduce the problem with a UP kernel?

Lee



* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
  2004-08-29  0:45                     ` Lee Revell
@ 2004-08-29  2:03                       ` Peter Williams
  2004-08-29  2:28                         ` spaminos-ker
  0 siblings, 1 reply; 43+ messages in thread
From: Peter Williams @ 2004-08-29  2:03 UTC (permalink / raw)
  To: Lee Revell; +Cc: spaminos-ker, linux-kernel

Lee Revell wrote:
> On Sat, 2004-08-28 at 20:25, Lee Revell wrote:
> 
>>On Sat, 2004-08-28 at 20:21, Peter Williams wrote:
>>
>>>spaminos-ker@yahoo.com wrote:
>>>
>>>>--- Peter Williams <pwil3058@bigpond.net.au> wrote:
>>
>>>>    -----------------
>>>> => started at: kernel_fpu_begin+0x21/0x60
>>>> => ended at:   _mmx_memcpy+0x131/0x180
>>>>=======>
>>>>00000001 0.000ms (+0.000ms): kernel_fpu_begin (_mmx_memcpy)
>>>>00000001 0.730ms (+0.730ms): sub_preempt_count (_mmx_memcpy)
>>>>00000001 0.730ms (+0.000ms): _mmx_memcpy (check_preempt_timing)
>>>>00000001 0.730ms (+0.000ms): kernel_fpu_begin (_mmx_memcpy)
>>>>
>>>
>>>As far as I can see sub_preempt_count() is part of the latency measuring 
>>>component of the voluntary preempt patch so, like you, I'm not sure 
>>>whether this report makes any sense.
>>
>>Is this an SMP machine?  There were problems with that version of the
>>voluntary preemption patches on SMP.  The latest version, Q3, should fix
>>these.
>>
> 
> 
> Hmm, after rereading the entire thread, I am not sure that voluntary
> preemption will help you here.  Voluntary preemption (and preemption in
> general) deals with the situation in which you have a high priority
> task, often the highest priority task on the system, that spends most of
> its time sleeping on some resource, and this task needs to run as soon
> as possible once it becomes runnable.  In that situation the scheduler
doesn't have a very difficult decision; there is no question that it
> should run this task ASAP.  How long 'ASAP' is depends on how long it
> takes whatever task was running when our high priority task became
> runnable to yield the processor.  The scheduler has a very easy job
here; there is only one right thing to do.  Also the intervals involved
> are very small, usually less than 1ms, whereas you are talking about a
> variance of several seconds.

My understanding is that it's a very occasional (every few hours) time 
out of several seconds, not something with a variance of several seconds. 
So it seems to fit the characteristics of the type of problem that you 
are hunting, i.e. every now and then going through a (rarely exercised) 
code path that hogs the CPU for too long.  But, as you say, the time 
scales are radically different.

> 
> In the situation you describe, you have two tasks running at the same
> base priority, and the scheduler does not seem to be doing a good job
> balancing them.  This is a different situation, and much more dependent
> on the scheduling policy.

The mode in which the scheduler was being used had all priority fiddling 
(except promotion) turned off so the tasks should have been just round 
robinning with each other.  Also, the time outs are fairly rare (every 
few hours according to Nicolas's e-mail) and happen with several 
different schedulers (with ZAPHOD (the one being used by Nicolas) and 
Con's staircase schedulers having less problem than the vanilla 
scheduler) which is why I thought it might be something outside the 
scheduler.  Perhaps it's something outside the kernel?

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce



* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
  2004-08-29  1:22                       ` Lee Revell
@ 2004-08-29  1:31                         ` Peter Williams
  2004-09-13 20:09                           ` spaminos-ker
  0 siblings, 1 reply; 43+ messages in thread
From: Peter Williams @ 2004-08-29  1:31 UTC (permalink / raw)
  To: spaminos-ker; +Cc: Lee Revell, linux-kernel

Lee Revell wrote:
> On Sat, 2004-08-28 at 21:19, spaminos-ker@yahoo.com wrote:
> 
>>--- Lee Revell <rlrevell@joe-job.com> wrote:
>>
>>>Is this an SMP machine?  There were problems with that version of the
>>>voluntary preemption patches on SMP.  The latest version, Q3, should fix
>>>these.
>>>
>>
>>No, it's a single-CPU Athlon 1800+; the kernel is compiled with SMP
>>support, but that should not have any impact.
>>
> 
> 
> I believe people were also having problems running SMP kernels with the
> VP patches on UP.  Try the latest version, Q3 as of this writing.

Nicolas,
	I'll generate a combined patch and let you know when it's ready.  In 
the meantime, could you try increasing the "base_promotion_interval" to 
about twice the time slice size?
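
Something like this should do it, assuming the ZAPHOD sysctls are 
exposed under /proc/sys/kernel/cpusched/ and both values are in the 
same units:

ts=$(cat /proc/sys/kernel/cpusched/time_slice)
echo $((2 * $ts)) > /proc/sys/kernel/cpusched/base_promotion_interval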

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
  2004-08-29  1:19                     ` spaminos-ker
@ 2004-08-29  1:22                       ` Lee Revell
  2004-08-29  1:31                         ` Peter Williams
  2004-08-29  2:20                       ` Lee Revell
  1 sibling, 1 reply; 43+ messages in thread
From: Lee Revell @ 2004-08-29  1:22 UTC (permalink / raw)
  To: spaminos-ker; +Cc: Peter Williams, linux-kernel

On Sat, 2004-08-28 at 21:19, spaminos-ker@yahoo.com wrote:
> --- Lee Revell <rlrevell@joe-job.com> wrote:
> > Is this an SMP machine?  There were problems with that version of the
> > voluntary preemption patches on SMP.  The latest version, Q3, should fix
> > these.
> > 
> No, it's a single-CPU Athlon 1800+; the kernel is compiled with SMP
> support, but that should not have any impact.
> 

I believe people were also having problems running SMP kernels with the
VP patches on UP.  Try the latest version, Q3 as of this writing.

Lee


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
  2004-08-29  0:25                   ` Lee Revell
  2004-08-29  0:45                     ` Lee Revell
@ 2004-08-29  1:19                     ` spaminos-ker
  2004-08-29  1:22                       ` Lee Revell
  2004-08-29  2:20                       ` Lee Revell
  1 sibling, 2 replies; 43+ messages in thread
From: spaminos-ker @ 2004-08-29  1:19 UTC (permalink / raw)
  To: Lee Revell, Peter Williams; +Cc: linux-kernel

--- Lee Revell <rlrevell@joe-job.com> wrote:
> Is this an SMP machine?  There were problems with that version of the
> voluntary preemption patches on SMP.  The latest version, Q3, should fix
> these.
> 
No, it's a single-CPU Athlon 1800+; the kernel is compiled with SMP
support, but that should not have any impact.

Nicolas


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
  2004-08-29  0:25                   ` Lee Revell
@ 2004-08-29  0:45                     ` Lee Revell
  2004-08-29  2:03                       ` Peter Williams
  2004-08-29  1:19                     ` spaminos-ker
  1 sibling, 1 reply; 43+ messages in thread
From: Lee Revell @ 2004-08-29  0:45 UTC (permalink / raw)
  To: Peter Williams; +Cc: spaminos-ker, linux-kernel

On Sat, 2004-08-28 at 20:25, Lee Revell wrote:
> On Sat, 2004-08-28 at 20:21, Peter Williams wrote:
> > spaminos-ker@yahoo.com wrote:
> > > --- Peter Williams <pwil3058@bigpond.net.au> wrote:
> 
> > >     -----------------
> > >  => started at: kernel_fpu_begin+0x21/0x60
> > >  => ended at:   _mmx_memcpy+0x131/0x180
> > > =======>
> > > 00000001 0.000ms (+0.000ms): kernel_fpu_begin (_mmx_memcpy)
> > > 00000001 0.730ms (+0.730ms): sub_preempt_count (_mmx_memcpy)
> > > 00000001 0.730ms (+0.000ms): _mmx_memcpy (check_preempt_timing)
> > > 00000001 0.730ms (+0.000ms): kernel_fpu_begin (_mmx_memcpy)
> > > 
> > 
> > As far as I can see sub_preempt_count() is part of the latency measuring 
> > component of the voluntary preempt patch so, like you, I'm not sure 
> > whether this report makes any sense.
> 
> Is this an SMP machine?  There were problems with that version of the
> voluntary preemption patches on SMP.  The latest version, Q3, should fix
> these.
> 

Hmm, after rereading the entire thread, I am not sure that voluntary
preemption will help you here.  Voluntary preemption (and preemption in
general) deals with the situation in which you have a high-priority
task, often the highest-priority task on the system, that spends most of
its time sleeping on some resource, and this task needs to run as soon
as possible once it becomes runnable.  In that situation the scheduler
doesn't have a very difficult decision; there is no question that it
should run this task ASAP.  How long 'ASAP' is depends on how long it
takes whatever task was running when our high-priority task became
runnable to yield the processor.  The scheduler has a very easy job
here; there is only one right thing to do.  Also, the intervals involved
are very small, usually less than 1ms, whereas you are talking about a
variance of several seconds.

In the situation you describe, you have two tasks running at the same
base priority, and the scheduler does not seem to be doing a good job
balancing them.  This is a different situation, and much more dependent
on the scheduling policy.

Lee


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
  2004-08-29  0:21                 ` Peter Williams
@ 2004-08-29  0:25                   ` Lee Revell
  2004-08-29  0:45                     ` Lee Revell
  2004-08-29  1:19                     ` spaminos-ker
  0 siblings, 2 replies; 43+ messages in thread
From: Lee Revell @ 2004-08-29  0:25 UTC (permalink / raw)
  To: Peter Williams; +Cc: spaminos-ker, linux-kernel

On Sat, 2004-08-28 at 20:21, Peter Williams wrote:
> spaminos-ker@yahoo.com wrote:
> > --- Peter Williams <pwil3058@bigpond.net.au> wrote:

> >     -----------------
> >  => started at: kernel_fpu_begin+0x21/0x60
> >  => ended at:   _mmx_memcpy+0x131/0x180
> > =======>
> > 00000001 0.000ms (+0.000ms): kernel_fpu_begin (_mmx_memcpy)
> > 00000001 0.730ms (+0.730ms): sub_preempt_count (_mmx_memcpy)
> > 00000001 0.730ms (+0.000ms): _mmx_memcpy (check_preempt_timing)
> > 00000001 0.730ms (+0.000ms): kernel_fpu_begin (_mmx_memcpy)
> > 
> 
> As far as I can see sub_preempt_count() is part of the latency measuring 
> component of the voluntary preempt patch so, like you, I'm not sure 
> whether this report makes any sense.

Is this an SMP machine?  There were problems with that version of the
voluntary preemption patches on SMP.  The latest version, Q3, should fix
these.

Lee


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
  2004-08-28  1:59               ` spaminos-ker
@ 2004-08-29  0:21                 ` Peter Williams
  2004-08-29  0:25                   ` Lee Revell
  0 siblings, 1 reply; 43+ messages in thread
From: Peter Williams @ 2004-08-29  0:21 UTC (permalink / raw)
  To: spaminos-ker; +Cc: linux-kernel

spaminos-ker@yahoo.com wrote:
> --- Peter Williams <pwil3058@bigpond.net.au> wrote:
> 
>>A (gzipped) combined ZAPHOD and P9 voluntary preempt patch for 2.6.8.1 
>>is available at:
>>
>><http://prdownloads.sourceforge.net/cpuse/patch-2.6.8.1-zaphod-vp-v5.0.1.gz?download>
>>
>>This patch has had minimal testing so use with care and please let me 
>>know if there are any problems.
>>
> 
> 
> I tried this patch, and I get a pretty high latency in "sub_preempt_count":
> 00000001 0.730ms (+0.730ms): sub_preempt_count (_mmx_memcpy)
> 
> I am not sure whether that makes sense or what it means.
> 
> Nicolas
> 
> 
> Here are the full messages:
> 
> Aug 27 18:42:11 localhost kernel: (events/0/4): new 730 us maximum-latency
> critical section.
> Aug 27 18:42:11 localhost kernel:  => started at: <kernel_fpu_begin+0x21/0x60>
> Aug 27 18:42:11 localhost kernel:  => ended at:   <_mmx_memcpy+0x131/0x180>
> Aug 27 18:42:11 localhost kernel:  [<c014106a>]
> check_preempt_timing+0x1aa/0x240
> Aug 27 18:42:11 localhost kernel:  [<c0225751>] _mmx_memcpy+0x131/0x180
> Aug 27 18:42:11 localhost kernel:  [<c0225751>] _mmx_memcpy+0x131/0x180
> Aug 27 18:42:11 localhost kernel:  [<c0141244>] sub_preempt_count+0x54/0x60
> Aug 27 18:42:11 localhost kernel:  [<c0141244>] sub_preempt_count+0x54/0x60
> Aug 27 18:42:11 localhost kernel:  [<c0225751>] _mmx_memcpy+0x131/0x180
> Aug 27 18:42:11 localhost kernel:  [<c02dd9fe>] vgacon_save_screen+0x7e/0x80
> Aug 27 18:42:11 localhost kernel:  [<c0267d32>] do_blank_screen+0x182/0x2b0
> Aug 27 18:42:11 localhost kernel:  [<c0122fa4>] acquire_console_sem+0x44/0x70
> Aug 27 18:42:11 localhost kernel:  [<c0266ab2>] console_callback+0x72/0xf0
> Aug 27 18:42:11 localhost kernel:  [<c0134dcb>] worker_thread+0x1eb/0x2d0
> Aug 27 18:42:11 localhost kernel:  [<c0266a40>] console_callback+0x0/0xf0
> Aug 27 18:42:11 localhost kernel:  [<c011c000>] default_wake_function+0x0/0x20
> Aug 27 18:42:11 localhost kernel:  [<c011c000>] default_wake_function+0x0/0x20
> Aug 27 18:42:11 localhost kernel:  [<c013963c>] kthread+0xbc/0xd0
> Aug 27 18:42:11 localhost kernel:  [<c0134be0>] worker_thread+0x0/0x2d0
> Aug 27 18:42:11 localhost kernel:  [<c0139580>] kthread+0x0/0xd0
> Aug 27 18:42:11 localhost kernel:  [<c0104389>] kernel_thread_helper+0x5/0xc
> 
> preemption latency trace v1.0.2
> -------------------------------
>  latency: 730 us, entries: 4 (4)
>     -----------------
>     | task: events/0/4, uid:0 nice:-10 policy:0 rt_prio:0
>     -----------------
>  => started at: kernel_fpu_begin+0x21/0x60
>  => ended at:   _mmx_memcpy+0x131/0x180
> =======>
> 00000001 0.000ms (+0.000ms): kernel_fpu_begin (_mmx_memcpy)
> 00000001 0.730ms (+0.730ms): sub_preempt_count (_mmx_memcpy)
> 00000001 0.730ms (+0.000ms): _mmx_memcpy (check_preempt_timing)
> 00000001 0.730ms (+0.000ms): kernel_fpu_begin (_mmx_memcpy)
> 

As far as I can see sub_preempt_count() is part of the latency measuring 
component of the voluntary preempt patch so, like you, I'm not sure 
whether this report makes any sense.

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
  2004-08-26  8:39             ` Peter Williams
@ 2004-08-28  1:59               ` spaminos-ker
  2004-08-29  0:21                 ` Peter Williams
  0 siblings, 1 reply; 43+ messages in thread
From: spaminos-ker @ 2004-08-28  1:59 UTC (permalink / raw)
  To: Peter Williams; +Cc: linux-kernel

--- Peter Williams <pwil3058@bigpond.net.au> wrote:
> A (gzipped) combined ZAPHOD and P9 voluntary preempt patch for 2.6.8.1 
> is available at:
> 
> <http://prdownloads.sourceforge.net/cpuse/patch-2.6.8.1-zaphod-vp-v5.0.1.gz?download>
> 
> This patch has had minimal testing so use with care and please let me 
> know if there are any problems.
> 

I tried this patch, and I get a pretty high latency in "sub_preempt_count":
00000001 0.730ms (+0.730ms): sub_preempt_count (_mmx_memcpy)

I am not sure whether that makes sense or what it means.

Nicolas


Here are the full messages:

Aug 27 18:42:11 localhost kernel: (events/0/4): new 730 us maximum-latency
critical section.
Aug 27 18:42:11 localhost kernel:  => started at: <kernel_fpu_begin+0x21/0x60>
Aug 27 18:42:11 localhost kernel:  => ended at:   <_mmx_memcpy+0x131/0x180>
Aug 27 18:42:11 localhost kernel:  [<c014106a>]
check_preempt_timing+0x1aa/0x240
Aug 27 18:42:11 localhost kernel:  [<c0225751>] _mmx_memcpy+0x131/0x180
Aug 27 18:42:11 localhost kernel:  [<c0225751>] _mmx_memcpy+0x131/0x180
Aug 27 18:42:11 localhost kernel:  [<c0141244>] sub_preempt_count+0x54/0x60
Aug 27 18:42:11 localhost kernel:  [<c0141244>] sub_preempt_count+0x54/0x60
Aug 27 18:42:11 localhost kernel:  [<c0225751>] _mmx_memcpy+0x131/0x180
Aug 27 18:42:11 localhost kernel:  [<c02dd9fe>] vgacon_save_screen+0x7e/0x80
Aug 27 18:42:11 localhost kernel:  [<c0267d32>] do_blank_screen+0x182/0x2b0
Aug 27 18:42:11 localhost kernel:  [<c0122fa4>] acquire_console_sem+0x44/0x70
Aug 27 18:42:11 localhost kernel:  [<c0266ab2>] console_callback+0x72/0xf0
Aug 27 18:42:11 localhost kernel:  [<c0134dcb>] worker_thread+0x1eb/0x2d0
Aug 27 18:42:11 localhost kernel:  [<c0266a40>] console_callback+0x0/0xf0
Aug 27 18:42:11 localhost kernel:  [<c011c000>] default_wake_function+0x0/0x20
Aug 27 18:42:11 localhost kernel:  [<c011c000>] default_wake_function+0x0/0x20
Aug 27 18:42:11 localhost kernel:  [<c013963c>] kthread+0xbc/0xd0
Aug 27 18:42:11 localhost kernel:  [<c0134be0>] worker_thread+0x0/0x2d0
Aug 27 18:42:11 localhost kernel:  [<c0139580>] kthread+0x0/0xd0
Aug 27 18:42:11 localhost kernel:  [<c0104389>] kernel_thread_helper+0x5/0xc

preemption latency trace v1.0.2
-------------------------------
 latency: 730 us, entries: 4 (4)
    -----------------
    | task: events/0/4, uid:0 nice:-10 policy:0 rt_prio:0
    -----------------
 => started at: kernel_fpu_begin+0x21/0x60
 => ended at:   _mmx_memcpy+0x131/0x180
=======>
00000001 0.000ms (+0.000ms): kernel_fpu_begin (_mmx_memcpy)
00000001 0.730ms (+0.730ms): sub_preempt_count (_mmx_memcpy)
00000001 0.730ms (+0.000ms): _mmx_memcpy (check_preempt_timing)
00000001 0.730ms (+0.000ms): kernel_fpu_begin (_mmx_memcpy)


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
  2004-08-26  2:42           ` Peter Williams
@ 2004-08-26  8:39             ` Peter Williams
  2004-08-28  1:59               ` spaminos-ker
  0 siblings, 1 reply; 43+ messages in thread
From: Peter Williams @ 2004-08-26  8:39 UTC (permalink / raw)
  To: spaminos-ker; +Cc: linux-kernel

Peter Williams wrote:
> spaminos-ker@yahoo.com wrote:
> 
>> --- Peter Williams <pwil3058@bigpond.net.au> wrote:
>>
>>> You could try Lee Revell's (rlrevell@joe-job.com) latency measuring 
>>> patches and also try applying Ingo Molnar's (mingo@elte.hu) 
>>> voluntary-preempt patches.
>>>
>>> Peter
>>
>>
>>
>> I tried 2.6.8.1 with voluntary-preempt-2.6.8.1-P9 and I am getting 
>> latency messages; they trigger at around the same time I get 
>> "delta = 3" messages.
>>
>> I guess that there is no way to have the latency reporting work with 
>> the ZAPHOD patch?
> 
> 
> I'll see what I can do and let you know.

A (gzipped) combined ZAPHOD and P9 voluntary preempt patch for 2.6.8.1 
is available at:

<http://prdownloads.sourceforge.net/cpuse/patch-2.6.8.1-zaphod-vp-v5.0.1.gz?download>

This patch has had minimal testing so use with care and please let me 
know if there are any problems.

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
  2004-08-26  2:30         ` spaminos-ker
@ 2004-08-26  2:42           ` Peter Williams
  2004-08-26  8:39             ` Peter Williams
  0 siblings, 1 reply; 43+ messages in thread
From: Peter Williams @ 2004-08-26  2:42 UTC (permalink / raw)
  To: spaminos-ker; +Cc: linux-kernel

spaminos-ker@yahoo.com wrote:
> --- Peter Williams <pwil3058@bigpond.net.au> wrote:
> 
>>You could try Lee Revell's (rlrevell@joe-job.com) latency measuring 
>>patches and also try applying Ingo Molnar's (mingo@elte.hu) 
>>voluntary-preempt patches.
>>
>>Peter
> 
> 
> I tried 2.6.8.1 with voluntary-preempt-2.6.8.1-P9 and I am getting latency
> messages; they trigger at around the same time I get "delta = 3" messages.
> 
> I guess that there is no way to have the latency reporting work with the
> ZAPHOD patch?

I'll see what I can do and let you know.

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
  2004-08-24 23:04       ` Peter Williams
  2004-08-24 23:22         ` Lee Revell
@ 2004-08-26  2:30         ` spaminos-ker
  2004-08-26  2:42           ` Peter Williams
  1 sibling, 1 reply; 43+ messages in thread
From: spaminos-ker @ 2004-08-26  2:30 UTC (permalink / raw)
  To: Peter Williams; +Cc: linux-kernel

--- Peter Williams <pwil3058@bigpond.net.au> wrote:
> You could try Lee Revell's (rlrevell@joe-job.com) latency measuring 
> patches and also try applying Ingo Molnar's (mingo@elte.hu) 
> voluntary-preempt patches.
> 
> Peter

I tried 2.6.8.1 with voluntary-preempt-2.6.8.1-P9 and I am getting latency
messages; they trigger at around the same time I get "delta = 3" messages.

I guess that there is no way to have the latency reporting work with the
ZAPHOD patch?

I hope those messages give a clue about what is going on on this box...

Are these kinds of latencies normal (reminder: this is an Athlon XP 1800+,
i.e. 1.53 GHz)?
00010006 0.144ms (+0.000ms): task_rq_lock (try_to_wake_up)
00010007 0.144ms (+0.000ms): activate_task (try_to_wake_up)
00010007 0.144ms (+0.000ms): sched_clock (activate_task)
00010007 0.144ms (+0.000ms): recalc_task_prio (activate_task)
00010007 0.144ms (+0.000ms): effective_prio (recalc_task_prio)
00010007 0.144ms (+0.000ms): enqueue_task (activate_task)

Anyway, here are the full messages:
Aug 25 19:10:01 localhost kernel: (ksoftirqd/0/3): new 163 us maximum-latency
critical section.
Aug 25 19:10:01 localhost kernel:  => started at:
<netif_receive_skb+0x81/0x1e0>
Aug 25 19:10:01 localhost kernel:  => ended at:  
<netif_receive_skb+0x167/0x1e0>
Aug 25 19:10:01 localhost kernel:  [<c013eec7>]
check_preempt_timing+0x1c7/0x230
Aug 25 19:10:01 localhost kernel:  [<c031ae57>] netif_receive_skb+0x167/0x1e0
Aug 25 19:10:01 localhost kernel:  [<c031ae57>] netif_receive_skb+0x167/0x1e0
Aug 25 19:10:01 localhost kernel:  [<c013f074>] sub_preempt_count+0x54/0x60
Aug 25 19:10:01 localhost kernel:  [<c013f074>] sub_preempt_count+0x54/0x60
Aug 25 19:10:01 localhost kernel:  [<c031ae57>] netif_receive_skb+0x167/0x1e0
Aug 25 19:10:01 localhost kernel:  [<c0310008>] sock_release+0xf8/0x110
Aug 25 19:10:01 localhost kernel:  [<c031af5a>] process_backlog+0x8a/0x150
Aug 25 19:10:01 localhost kernel:  [<c031b0ae>] net_rx_action+0x8e/0x120
Aug 25 19:10:01 localhost kernel:  [<c01248a2>] ___do_softirq+0xa2/0xb0
Aug 25 19:10:01 localhost kernel:  [<c012492e>] _do_softirq+0xe/0x20
Aug 25 19:10:01 localhost kernel:  [<c0124eb5>] ksoftirqd+0xa5/0x110
Aug 25 19:10:01 localhost kernel:  [<c013723c>] kthread+0xbc/0xd0
Aug 25 19:10:01 localhost kernel:  [<c0124e10>] ksoftirqd+0x0/0x110
Aug 25 19:10:01 localhost kernel:  [<c0137180>] kthread+0x0/0xd0
Aug 25 19:10:01 localhost kernel:  [<c01044b9>] kernel_thread_helper+0x5/0xc


preemption latency trace v1.0.2
-------------------------------
 latency: 163 us, entries: 194 (194)
    -----------------
    | task: ksoftirqd/0/3, uid:0 nice:-10 policy:0 rt_prio:0
    -----------------
 => started at: netif_receive_skb+0x81/0x1e0
 => ended at:   netif_receive_skb+0x167/0x1e0
=======>
00000001 0.000ms (+0.000ms): netif_receive_skb (process_backlog)
00000001 0.001ms (+0.001ms): ip_rcv (netif_receive_skb)
00000001 0.005ms (+0.003ms): nf_hook_slow (ip_rcv)
00000002 0.006ms (+0.001ms): nf_iterate (nf_hook_slow)
00000002 0.007ms (+0.001ms): ip_conntrack_defrag (nf_iterate)
00000002 0.008ms (+0.000ms): ip_conntrack_in (nf_iterate)
00000002 0.008ms (+0.000ms): ip_ct_find_proto (ip_conntrack_in)
00000103 0.009ms (+0.000ms): __ip_ct_find_proto (ip_ct_find_proto)
00000102 0.009ms (+0.000ms): local_bh_enable (ip_ct_find_proto)
00000002 0.010ms (+0.000ms): get_tuple (ip_conntrack_in)
00000002 0.010ms (+0.000ms): tcp_pkt_to_tuple (get_tuple)
00000002 0.011ms (+0.000ms): skb_copy_bits (tcp_pkt_to_tuple)
00000002 0.013ms (+0.001ms): ip_conntrack_find_get (ip_conntrack_in)
00000103 0.013ms (+0.000ms): __ip_conntrack_find (ip_conntrack_find_get)
00000103 0.013ms (+0.000ms): hash_conntrack (__ip_conntrack_find)
00000102 0.015ms (+0.001ms): local_bh_enable (ip_conntrack_find_get)
00000002 0.015ms (+0.000ms): tcp_packet (ip_conntrack_in)
00000002 0.015ms (+0.000ms): skb_copy_bits (tcp_packet)
00000103 0.016ms (+0.000ms): get_conntrack_index (tcp_packet)
00000102 0.017ms (+0.000ms): local_bh_enable (tcp_packet)
00000002 0.017ms (+0.000ms): ip_ct_refresh (tcp_packet)
00000103 0.018ms (+0.000ms): del_timer (ip_ct_refresh)
00000103 0.019ms (+0.001ms): __mod_timer (ip_ct_refresh)
00000105 0.019ms (+0.000ms): internal_add_timer (__mod_timer)
00000102 0.019ms (+0.000ms): local_bh_enable (tcp_packet)
00000002 0.020ms (+0.000ms): ipt_route_hook (nf_iterate)
00000002 0.020ms (+0.000ms): ipt_do_table (ipt_route_hook)
00000102 0.024ms (+0.003ms): local_bh_enable (ipt_do_table)
00000002 0.024ms (+0.000ms): ip_nat_fn (nf_iterate)
00000002 0.024ms (+0.000ms): ip_conntrack_get (ip_nat_fn)
00000002 0.026ms (+0.001ms): do_bindings (ip_nat_fn)
00000102 0.027ms (+0.001ms): local_bh_enable (do_bindings)
00000002 0.029ms (+0.001ms): ip_rcv_finish (nf_hook_slow)
00000002 0.030ms (+0.001ms): ip_route_input (ip_rcv_finish)
00000002 0.030ms (+0.000ms): rt_hash_code (ip_route_input)
00000002 0.035ms (+0.004ms): ip_local_deliver (ip_rcv_finish)
00000002 0.036ms (+0.000ms): nf_hook_slow (ip_local_deliver)
00000003 0.036ms (+0.000ms): nf_iterate (nf_hook_slow)
00000003 0.036ms (+0.000ms): ipt_route_hook (nf_iterate)
00000003 0.036ms (+0.000ms): ipt_do_table (ipt_route_hook)
00000103 0.037ms (+0.001ms): local_bh_enable (ipt_do_table)
00000003 0.038ms (+0.000ms): ipt_hook (nf_iterate)
00000003 0.038ms (+0.000ms): ipt_do_table (ipt_hook)
00000103 0.039ms (+0.001ms): local_bh_enable (ipt_do_table)
00000003 0.040ms (+0.000ms): ip_confirm (nf_iterate)
00000003 0.040ms (+0.000ms): ip_local_deliver_finish (nf_hook_slow)
00000004 0.042ms (+0.002ms): tcp_v4_rcv (ip_local_deliver_finish)
00000004 0.043ms (+0.001ms): tcp_v4_checksum_init (tcp_v4_rcv)
00000004 0.045ms (+0.001ms): skb_checksum (tcp_v4_checksum_init)
00000005 0.053ms (+0.008ms): tcp_v4_do_rcv (tcp_v4_rcv)
00000005 0.054ms (+0.000ms): tcp_rcv_state_process (tcp_v4_do_rcv)
00000005 0.057ms (+0.002ms): tcp_ack (tcp_rcv_state_process)
00000005 0.058ms (+0.001ms): tcp_ack_update_window (tcp_ack)
00000005 0.061ms (+0.002ms): tcp_clean_rtx_queue (tcp_ack)
00000005 0.063ms (+0.002ms): __kfree_skb (tcp_clean_rtx_queue)
00000005 0.064ms (+0.000ms): kfree_skbmem (__kfree_skb)
00000005 0.064ms (+0.000ms): skb_release_data (kfree_skbmem)
00000005 0.066ms (+0.001ms): kfree (kfree_skbmem)
00000005 0.066ms (+0.000ms): kmem_cache_free (kfree_skbmem)
00000005 0.067ms (+0.000ms): tcp_ack_no_tstamp (tcp_clean_rtx_queue)
00000005 0.067ms (+0.000ms): tcp_rtt_estimator (tcp_ack_no_tstamp)
00000005 0.074ms (+0.006ms): tcp_reset_keepalive_timer (tcp_rcv_state_process)
00000005 0.075ms (+0.000ms): sk_reset_timer (tcp_reset_keepalive_timer)
00000005 0.075ms (+0.000ms): mod_timer (sk_reset_timer)
00000005 0.075ms (+0.000ms): __mod_timer (sk_reset_timer)
00000007 0.076ms (+0.000ms): internal_add_timer (__mod_timer)
00000005 0.077ms (+0.000ms): tcp_urg (tcp_rcv_state_process)
00000005 0.077ms (+0.000ms): tcp_data_queue (tcp_rcv_state_process)
00000005 0.080ms (+0.002ms): sk_stream_mem_schedule (tcp_data_queue)
00000005 0.081ms (+0.001ms): tcp_fin (tcp_data_queue)
00000005 0.083ms (+0.001ms): tcp_send_ack (tcp_fin)
00000005 0.083ms (+0.000ms): alloc_skb (tcp_send_ack)
00000005 0.083ms (+0.000ms): kmem_cache_alloc (alloc_skb)
00000005 0.083ms (+0.000ms): __kmalloc (alloc_skb)
00000005 0.084ms (+0.001ms): tcp_transmit_skb (tcp_send_ack)
00000005 0.087ms (+0.003ms): __tcp_select_window (tcp_transmit_skb)
00000005 0.090ms (+0.003ms): tcp_v4_send_check (tcp_transmit_skb)
00000005 0.092ms (+0.001ms): ip_queue_xmit (tcp_transmit_skb)
00000005 0.097ms (+0.005ms): nf_hook_slow (ip_queue_xmit)
00000006 0.097ms (+0.000ms): nf_iterate (nf_hook_slow)
00000006 0.098ms (+0.000ms): ip_conntrack_defrag (nf_iterate)
00000006 0.098ms (+0.000ms): ip_conntrack_local (nf_iterate)
00000006 0.098ms (+0.000ms): ip_conntrack_in (nf_iterate)
00000006 0.099ms (+0.000ms): ip_ct_find_proto (ip_conntrack_in)
00000107 0.099ms (+0.000ms): __ip_ct_find_proto (ip_ct_find_proto)
00000106 0.099ms (+0.000ms): local_bh_enable (ip_ct_find_proto)
00000006 0.099ms (+0.000ms): get_tuple (ip_conntrack_in)
00000006 0.100ms (+0.000ms): tcp_pkt_to_tuple (get_tuple)
00000006 0.100ms (+0.000ms): skb_copy_bits (tcp_pkt_to_tuple)
00000006 0.100ms (+0.000ms): ip_conntrack_find_get (ip_conntrack_in)
00000107 0.100ms (+0.000ms): __ip_conntrack_find (ip_conntrack_find_get)
00000107 0.100ms (+0.000ms): hash_conntrack (__ip_conntrack_find)
00000106 0.101ms (+0.000ms): local_bh_enable (ip_conntrack_find_get)
00000006 0.101ms (+0.000ms): tcp_packet (ip_conntrack_in)
00000006 0.101ms (+0.000ms): skb_copy_bits (tcp_packet)
00000107 0.101ms (+0.000ms): get_conntrack_index (tcp_packet)
00000106 0.101ms (+0.000ms): local_bh_enable (tcp_packet)
00000006 0.102ms (+0.000ms): ip_ct_refresh (tcp_packet)
00000107 0.102ms (+0.000ms): del_timer (ip_ct_refresh)
00000107 0.102ms (+0.000ms): __mod_timer (ip_ct_refresh)
00000109 0.102ms (+0.000ms): internal_add_timer (__mod_timer)
00000106 0.102ms (+0.000ms): local_bh_enable (tcp_packet)
00000006 0.103ms (+0.000ms): ipt_local_hook (nf_iterate)
00000006 0.103ms (+0.000ms): ipt_do_table (ipt_local_hook)
00000106 0.104ms (+0.001ms): local_bh_enable (ipt_do_table)
00000006 0.105ms (+0.000ms): ipt_local_out_hook (nf_iterate)
00000006 0.105ms (+0.000ms): ipt_do_table (ipt_local_out_hook)
00000106 0.107ms (+0.001ms): local_bh_enable (ipt_do_table)
00000006 0.107ms (+0.000ms): dst_output (nf_hook_slow)
00000006 0.108ms (+0.000ms): ip_output (dst_output)
00000006 0.108ms (+0.000ms): ip_finish_output (dst_output)
00000006 0.109ms (+0.000ms): nf_hook_slow (ip_finish_output)
00000007 0.109ms (+0.000ms): nf_iterate (nf_hook_slow)
00000007 0.109ms (+0.000ms): ipt_route_hook (nf_iterate)
00000007 0.109ms (+0.000ms): ipt_do_table (ipt_route_hook)
00000107 0.110ms (+0.000ms): local_bh_enable (ipt_do_table)
00000007 0.110ms (+0.000ms): ip_nat_out (nf_iterate)
00000007 0.111ms (+0.000ms): ip_nat_fn (nf_iterate)
00000007 0.111ms (+0.000ms): ip_conntrack_get (ip_nat_fn)
00000007 0.111ms (+0.000ms): do_bindings (ip_nat_fn)
00000107 0.112ms (+0.000ms): local_bh_enable (do_bindings)
00000007 0.112ms (+0.000ms): ip_refrag (nf_iterate)
00000007 0.112ms (+0.000ms): ip_confirm (ip_refrag)
00000007 0.112ms (+0.000ms): ip_finish_output2 (nf_hook_slow)
00000107 0.114ms (+0.001ms): local_bh_enable (ip_finish_output2)
00000007 0.115ms (+0.000ms): dev_queue_xmit (ip_finish_output2)
00000109 0.116ms (+0.001ms): pfifo_fast_enqueue (dev_queue_xmit)
00000109 0.117ms (+0.000ms): qdisc_restart (dev_queue_xmit)
00000109 0.118ms (+0.000ms): pfifo_fast_dequeue (qdisc_restart)
00000109 0.120ms (+0.001ms): speedo_start_xmit (qdisc_restart)
0000010a 0.122ms (+0.002ms): __const_udelay (speedo_start_xmit)
0000010a 0.123ms (+0.000ms): __delay (speedo_start_xmit)
0000010a 0.123ms (+0.000ms): delay_tsc (__delay)
00000109 0.126ms (+0.002ms): qdisc_restart (dev_queue_xmit)
00000109 0.126ms (+0.000ms): pfifo_fast_dequeue (qdisc_restart)
00000108 0.126ms (+0.000ms): local_bh_enable (dev_queue_xmit)
00000005 0.128ms (+0.001ms): tcp_time_wait (tcp_fin)
00000005 0.129ms (+0.000ms): kmem_cache_alloc (tcp_time_wait)
00000005 0.130ms (+0.001ms): __tcp_tw_hashdance (tcp_time_wait)
00000005 0.134ms (+0.003ms): tcp_tw_schedule (tcp_time_wait)
00000005 0.136ms (+0.001ms): tcp_update_metrics (tcp_time_wait)
00010005 0.138ms (+0.001ms): do_IRQ (tcp_update_metrics)
00010006 0.138ms (+0.000ms): mask_and_ack_8259A (do_IRQ)
00010006 0.143ms (+0.005ms): generic_redirect_hardirq (do_IRQ)
00010006 0.144ms (+0.000ms): wake_up_process (generic_redirect_hardirq)
00010006 0.144ms (+0.000ms): try_to_wake_up (wake_up_process)
00010006 0.144ms (+0.000ms): task_rq_lock (try_to_wake_up)
00010007 0.144ms (+0.000ms): activate_task (try_to_wake_up)
00010007 0.144ms (+0.000ms): sched_clock (activate_task)
00010007 0.144ms (+0.000ms): recalc_task_prio (activate_task)
00010007 0.144ms (+0.000ms): effective_prio (recalc_task_prio)
00010007 0.144ms (+0.000ms): enqueue_task (activate_task)
00000005 0.145ms (+0.000ms): smp_apic_timer_interrupt (tcp_update_metrics)
00010005 0.145ms (+0.000ms): update_process_times (smp_apic_timer_interrupt)
00010005 0.146ms (+0.000ms): update_one_process (update_process_times)
00010005 0.146ms (+0.000ms): run_local_timers (update_process_times)
00010005 0.147ms (+0.000ms): raise_softirq (update_process_times)
00010005 0.147ms (+0.000ms): scheduler_tick (update_process_times)
00010005 0.147ms (+0.000ms): sched_clock (scheduler_tick)
00010006 0.148ms (+0.000ms): __bitmap_weight (scheduler_tick)
00010006 0.148ms (+0.000ms): task_timeslice (scheduler_tick)
00010005 0.148ms (+0.000ms): rebalance_tick (scheduler_tick)
00000006 0.148ms (+0.000ms): do_softirq (smp_apic_timer_interrupt)
00000006 0.148ms (+0.000ms): __do_softirq (do_softirq)
00000005 0.150ms (+0.001ms): tcp_unhash (tcp_time_wait)
00000005 0.151ms (+0.001ms): tcp_put_port (tcp_time_wait)
00000105 0.152ms (+0.000ms): __tcp_put_port (tcp_put_port)
00000106 0.152ms (+0.000ms): tcp_bucket_destroy (__tcp_put_port)
00000105 0.153ms (+0.000ms): local_bh_enable (tcp_time_wait)
00000005 0.153ms (+0.000ms): do_softirq (local_bh_enable)
00000005 0.153ms (+0.000ms): __do_softirq (do_softirq)
00000005 0.153ms (+0.000ms): tcp_clear_xmit_timers (tcp_time_wait)
00000005 0.154ms (+0.000ms): sk_stop_timer (tcp_clear_xmit_timers)
00000005 0.154ms (+0.000ms): del_timer (sk_stop_timer)
00000005 0.155ms (+0.001ms): sk_stop_timer (tcp_clear_xmit_timers)
00000005 0.155ms (+0.000ms): sk_stop_timer (tcp_clear_xmit_timers)
00000005 0.155ms (+0.000ms): del_timer (sk_stop_timer)
00000005 0.156ms (+0.000ms): tcp_destroy_sock (tcp_fin)
00000005 0.157ms (+0.001ms): tcp_v4_destroy_sock (tcp_destroy_sock)
00000005 0.157ms (+0.000ms): tcp_clear_xmit_timers (tcp_v4_destroy_sock)
00000005 0.157ms (+0.000ms): sk_stop_timer (tcp_clear_xmit_timers)
00000005 0.157ms (+0.000ms): sk_stop_timer (tcp_clear_xmit_timers)
00000005 0.158ms (+0.000ms): sk_stop_timer (tcp_clear_xmit_timers)
00000005 0.159ms (+0.001ms): sk_stream_kill_queues (tcp_destroy_sock)
00000005 0.159ms (+0.000ms): __kfree_skb (sk_stream_kill_queues)
00000005 0.159ms (+0.000ms): sk_stream_rfree (__kfree_skb)
00000005 0.160ms (+0.000ms): kfree_skbmem (__kfree_skb)
00000005 0.160ms (+0.000ms): skb_release_data (kfree_skbmem)
00000005 0.160ms (+0.000ms): kfree (kfree_skbmem)
00000005 0.160ms (+0.000ms): kmem_cache_free (kfree_skbmem)
00000005 0.161ms (+0.001ms): __sk_stream_mem_reclaim (sk_stream_kill_queues)
00000001 0.164ms (+0.002ms): sub_preempt_count (netif_receive_skb)
00000001 0.164ms (+0.000ms): _mmx_memcpy (check_preempt_timing)
00000001 0.165ms (+0.000ms): kernel_fpu_begin (_mmx_memcpy)


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
  2004-08-24 23:04       ` Peter Williams
@ 2004-08-24 23:22         ` Lee Revell
  2004-08-26  2:30         ` spaminos-ker
  1 sibling, 0 replies; 43+ messages in thread
From: Lee Revell @ 2004-08-24 23:22 UTC (permalink / raw)
  To: Peter Williams; +Cc: spaminos-ker, linux-kernel

On Tue, 2004-08-24 at 19:04, Peter Williams wrote:
> spaminos-ker@yahoo.com wrote:
> > 
> > Could I do something more useful than just displaying those deltas? Maybe I
> > could dump the process list in some way, or enable some debugging code in the
> > kernel to find out what is going on?
> 
> You could try Lee Revell's (rlrevell@joe-job.com) latency measuring 
> patches and also try applying Ingo Molnar's (mingo@elte.hu) 
> voluntary-preempt patches.
> 

Most of the tools I am using are probably too specific to the audio
subsystem to be of much use to you.  Just use Ingo's voluntary
preemption patch; if this is a scheduler/preemption problem, then it
will definitely show up in the traces.
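
The one-line "maximum-latency critical section" reports go to the kernel
log, and, if I remember the interface right, the full trace of the worst
section seen so far can be read back from /proc, so roughly:

dmesg | grep maximum-latency
cat /proc/latency_trace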

Lee


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
  2004-08-24 21:11     ` spaminos-ker
@ 2004-08-24 23:04       ` Peter Williams
  2004-08-24 23:22         ` Lee Revell
  2004-08-26  2:30         ` spaminos-ker
  0 siblings, 2 replies; 43+ messages in thread
From: Peter Williams @ 2004-08-24 23:04 UTC (permalink / raw)
  To: spaminos-ker; +Cc: linux-kernel

spaminos-ker@yahoo.com wrote:
> --- Peter Williams <pwil3058@bigpond.net.au> wrote:
> 
> 
>>Could you try it in "pb" mode with both max_ia_bonus and max_tpt_bonus 
>>set to zero?  That will disable all "priority" fiddling and tasks should 
>>just round robin at a priority determined solely by their "nice" value, 
>>and since (according to your earlier mail) all the daemons have the same 
>>"nice" value, they should just round robin with each other.
> 
> 
> 
> Hi, I tried the latest V-5.0 patch over 2.6.8.1 in these conditions with the
> actual server subsystem, and I get component timeouts :(
> I also ran the watchdog script on the box while running the test, and saw
> deltas of around 3 seconds every few hours:
> 
> Tue Aug 24 03:02:13 PDT 2004
> >>>>>>> delta = 3
> Tue Aug 24 05:50:14 PDT 2004
> >>>>>>> delta = 3
> Tue Aug 24 09:05:24 PDT 2004
> >>>>>>> delta = 4
> Tue Aug 24 09:06:20 PDT 2004
> >>>>>>> delta = 4
> Tue Aug 24 09:36:22 PDT 2004
> >>>>>>> delta = 3
> Tue Aug 24 10:20:16 PDT 2004
> >>>>>>> delta = 3
> Tue Aug 24 13:28:19 PDT 2004
> >>>>>>> delta = 3
> 
> 
> Could I do something more useful than just displaying those deltas? Maybe I
> could dump the process list in some way, or enable some debugging code in the
> kernel to find out what is going on?

You could try Lee Revell's (rlrevell@joe-job.com) latency measuring 
patches and also try applying Ingo Molnar's (mingo@elte.hu) 
voluntary-preempt patches.
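
Another cheap option that needs no patches: make your watchdog script 
dump all task states via magic SysRq when it sees a spike, then dig the 
blocked tasks out of the kernel log afterwards.  A rough sketch 
(assuming CONFIG_MAGIC_SYSRQ is enabled and the script runs as root):

echo 1 > /proc/sys/kernel/sysrq
A=$(date +%s)
while true ; do
        sleep 1s
        B=$(date +%s) ; D=$(($B-$A))
        if [ $D -ge 3 ] ; then
                date ; echo ">>>>>>> delta = $D"
                # 't' dumps every task's state and stack to the kernel log
                echo t > /proc/sysrq-trigger
        fi
        A=$B
done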

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
  2004-08-18  0:12   ` Peter Williams
@ 2004-08-24 21:11     ` spaminos-ker
  2004-08-24 23:04       ` Peter Williams
  0 siblings, 1 reply; 43+ messages in thread
From: spaminos-ker @ 2004-08-24 21:11 UTC (permalink / raw)
  To: Peter Williams; +Cc: linux-kernel

--- Peter Williams <pwil3058@bigpond.net.au> wrote:

> Could you try it in "pb" mode with both max_ia_bonus and max_tpt_bonus 
> set to zero?  That will disable all "priority" fiddling and tasks should 
> just round robin at a priority determined solely by their "nice" value, 
> and since (according to your earlier mail) all the daemons have the same 
> "nice" value, they should just round robin with each other.


Hi, I tried the latest V-5.0 patch over 2.6.8.1 in these conditions with the
actual server subsystem, and I get component timeouts :(
I also ran the watchdog script on the box while running the test, and saw
deltas of around 3 seconds every few hours:

Tue Aug 24 03:02:13 PDT 2004
>>>>>>> delta = 3
Tue Aug 24 05:50:14 PDT 2004
>>>>>>> delta = 3
Tue Aug 24 09:05:24 PDT 2004
>>>>>>> delta = 4
Tue Aug 24 09:06:20 PDT 2004
>>>>>>> delta = 4
Tue Aug 24 09:36:22 PDT 2004
>>>>>>> delta = 3
Tue Aug 24 10:20:16 PDT 2004
>>>>>>> delta = 3
Tue Aug 24 13:28:19 PDT 2004
>>>>>>> delta = 3

Could I do something more useful than just displaying those deltas? Maybe I
could dump the process list in some way, or enable some debugging code in the
kernel to find out what is going on?

Thanks

Nicolas

=====
------------------------------------------------------------
video meliora proboque deteriora sequor
------------------------------------------------------------

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
  2004-08-17 23:19 ` spaminos-ker
@ 2004-08-18  0:12   ` Peter Williams
  2004-08-24 21:11     ` spaminos-ker
  0 siblings, 1 reply; 43+ messages in thread
From: Peter Williams @ 2004-08-18  0:12 UTC (permalink / raw)
  To: spaminos-ker; +Cc: linux-kernel

spaminos-ker@yahoo.com wrote:
> I tried the actual server with a stress test, and I do eventually get timeouts.
> 
> I tried with what seemed to be the best setup earlier:
> pb mode
> max_ia_bonus set to 0
> 
> I tried several values for base_promotion_interval, but the system eventually
> times out after a few hours (it's still better than it used to be: with a stock
> kernel, it times out in less than an hour).

Could you try it in "pb" mode with both max_ia_bonus and max_tpt_bonus 
set to zero?  That will disable all "priority" fiddling and tasks should 
just round robin at a priority determined solely by their "nice" value, 
and since (according to your earlier mail) all the daemons have the same 
"nice" value, they should just round robin with each other.

Thanks
Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
       [not found] <411D50AE.5020005@bigpond.net.au>
@ 2004-08-17 23:19 ` spaminos-ker
  2004-08-18  0:12   ` Peter Williams
  0 siblings, 1 reply; 43+ messages in thread
From: spaminos-ker @ 2004-08-17 23:19 UTC (permalink / raw)
  To: Peter Williams; +Cc: linux-kernel

I tried the actual server with a stress test, and I do eventually get timeouts.

I tried with what seemed to be the best setup earlier:
pb mode
max_ia_bonus set to 0

I tried several values for base_promotion_interval, but the system eventually
times out after a few hours (it's still better than it used to be: with a stock
kernel, it times out in less than an hour).

Nicolas


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
       [not found] <20040811093945.GA10667@elte.hu>
@ 2004-08-17 23:08 ` spaminos-ker
  0 siblings, 0 replies; 43+ messages in thread
From: spaminos-ker @ 2004-08-17 23:08 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

--- Ingo Molnar <mingo@elte.hu> wrote:

> 
> could you also try the 2.6.8-rc4-mm1 kernel? It has the array-switch
> disabled which _could_ lead to smoother timeslice distribution. It still
> has wakeup bonuses though.
> 
> 	Ingo

Sorry I didn't have the chance to try this test before: I didn't try it on
2.6.8.1-mm1 as I saw that the patch related to the array switching may have
been dropped.

Anyway, I tried the test on 2.6.8-rc4-mm1 and it fails with both 2 and 20
threads (delays of about 3 seconds with 2 threads, and 5 seconds with 20).

Nicolas


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others)
@ 2004-08-07 21:53 spaminos-ker
  0 siblings, 0 replies; 43+ messages in thread
From: spaminos-ker @ 2004-08-07 21:53 UTC (permalink / raw)
  To: linux-kernel

Hey guys

I ran into this problem a few months ago while trying to migrate to the 2.6
series; at the time it was 2.6.1.  I figured maybe it was fixed in the
latest kernel, so I tried 2.6.7, but it shows the same signs.

I am running into an interesting problem with what seems to be the
scheduler.  I managed to create a test program to reproduce it.

The test system I use is an Athlon XP 1800+ with 256 MB RAM and
Red Hat 9 installed
(but I could reproduce it the same way on dual-CPU systems).

I can reproduce this problem on 2.6.7 (and before that, I could reproduce
it on a Red Hat kernel, linux-2.4.20-19.9).

I compiled the kernel with and without CONFIG_PREEMPT, and with SMP on
or off, and saw no difference.

Everything is fine on my regular (non-Red Hat) 2.4 kernel (same box).

Basically, here is what's happening on my server:
* I have a daemon running, serving HTTP requests via worker threads.
* I have another daemon running that does some sanity checking on the
system, basically to monitor the box (and it mostly sleeps).

Under load, the sanity-check daemon stops running properly. It seems
like all the CPU is given to the HTTP server, even though both are
started the same way. This leads to all sorts of timeouts, which is not
good for a production server.

So how to reproduce the problem:

I run in one shell a script to check for slowdowns:
A=$(date +%s)
while true ; do
        sleep 1s
        B=$(date +%s) ; D=$(($B-$A))
        if [ $D -ge 3 ] ; then date ; echo ">>>>>>> delta = $D" ; fi
        A=$B
done

and in another, my test program (see below for source code):
./cputest 2 10000 1000 512 &

note: I am running everything from ssh shells, not from X (I don't know
if this has any effect on this kind of test).

And after a few seconds, I get this:

Mon Aug 2 14:40:27 PDT 2004
>>>>>>> delta = 3
Mon Aug 2 14:41:37 PDT 2004
>>>>>>> delta = 3
Mon Aug 2 14:41:51 PDT 2004
>>>>>>> delta = 3
Mon Aug 2 14:41:57 PDT 2004
>>>>>>> delta = 3
Mon Aug 2 14:42:12 PDT 2004
>>>>>>> delta = 3

One iteration of the script's loop takes 3 seconds to execute?!

That means that the script, which sleeps and doesn't use a lot of CPU,
stops getting cycles for some reason.

On 2.4.x series kernels (from kernel.org, not Red Hat), I don't get any
output from the script, meaning that fairness between processes is good
(even if I start it with 60 worker threads instead of 2).

If I increase the number of threads to 60 on 2.6.7, the script
gets stuck for very long periods of time (I've seen 90 seconds).
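
For the 60-thread case that's the same invocation as above with just the
thread count bumped (assuming the other parameters stay the same):

./cputest 60 10000 1000 512 &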

I can post the .config file I used for compiling the 2.6.7 kernel, if
needed, but I figured this post was already big enough :)

After seeing a discussion on scheduling problems, I also tried different
patches, in particular the effect of an alternate scheduler from Nick
Piggin:
2.6.7 -> fails with 2 threads
2.6.7-bk20 -> fails with 2 threads
2.6.7-bk20-np8 -> works fine with 2 threads, fails with 20 threads
2.6.7-mm7 -> fails with 2 threads
2.6.7-mm7-np8 -> works fine with 2 threads, fails with 20 threads

Any clue?

Nicolas

compile the code with:

gcc -pthread -D_REENTRANT -o cputest cputest.c -lm
---------------------- cputest.c -----------------------------

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <time.h>
#include <math.h>

int nbthreads = 0;
int iterations = 0;               /* outer-loop passes, divided per thread */
volatile sig_atomic_t run = 1;
int iterations2 = 0;              /* sqrt() rounds per outer-loop pass */

#define DEFAULTBUFSIZE 1
int globalpipe[2];

int buflen = DEFAULTBUFSIZE;      /* bytes read from the pipe per pass */

void exithandler(int s) {
        (void)s;
        if (run) {
                run = 0;
                /* closing the pipe wakes up any thread blocked in read() */
                close(globalpipe[0]);
                close(globalpipe[1]);
        }
}

void *thread_code(void *arg) {
        time_t lasttime = time(NULL);
        int jj = (int)(long)arg;
        int localiter = iterations / jj;
        char *buf;

        buf = malloc(buflen);
        if (buf == NULL) {
                printf("Failed to allocate!\n");
                exit(-1);
        }

        do {
                int i;
                time_t newtime;

                for (i = 0; i < localiter && run; i++) {
                        int j;
                        double k = 1.0;

                        /* burn some CPU ... */
                        for (j = 0; j < iterations2 && run; j++) {
                                k += j * k;
                                k = (1 - k) * (1 + k);
                                k = sqrt(k);
                        }
                        /* ... then block on the shared pipe */
                        if (read(globalpipe[0], buf, buflen) != buflen) {
                                printf("%d, aborted read\n", jj);
                        }
                }
                sleep(1);
                newtime = time(NULL);
                printf("%d    delta = %d\n", jj, (int)(newtime - lasttime));
                lasttime = newtime;
        } while (run);
        free(buf);
        return NULL;
}

int main(int argc, char **argv) {
        int i;
        pthread_t *mythreads;
        void *res;
        char *buf;
        int localbuflen;

        if (argc < 5)
                return -1;

        nbthreads = atoi(argv[1]);
        iterations = atoi(argv[2]);
        iterations2 = atoi(argv[3]);
        buflen = atoi(argv[4]);

        signal(SIGINT, exithandler);

        if (pipe(globalpipe) != 0) {
                return -2;
        }

        mythreads = (pthread_t *)calloc(nbthreads, sizeof(pthread_t));
        if (mythreads == NULL) {
                return -3;
        }

        for (i = 0; i < nbthreads; i++) {
                if (pthread_create(&mythreads[i], NULL, thread_code,
                                   (void *)(long)(i + 1)) != 0) {
                        return -5;
                }
                printf("Started %d\n", i);
        }

        localbuflen = buflen * nbthreads;

        buf = malloc(localbuflen);
        if (buf == NULL) {
                printf("Failed to allocate!\n");
                run = 0;
                exit(-1);
        }

        sleep(3);

        /* keep feeding the pipe so the workers' reads complete */
        do {
                write(globalpipe[1], buf, localbuflen);
        } while (run);

        for (i = 0; i < nbthreads; i++) {
                printf("Waiting %d\n", i);
                pthread_join(mythreads[i], &res);
        }
        free(buf);
        free(mythreads);
        return 0;
}

-------------------- end cputest.c -------------------
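
For reference, the arguments are <nbthreads> <iterations> <iterations2>
<buflen>: worker thread number n makes iterations/n passes of its outer
loop, and each pass does iterations2 rounds of sqrt() followed by one
buflen-byte read from the shared pipe.  So the "./cputest 2 10000 1000 512"
run above uses 2 threads with 1000 sqrt() rounds and one 512-byte pipe
read per pass.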



^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2004-09-13 20:11 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20040811010116.GL11200@holomorphy.com>
2004-08-11  2:21 ` Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others) spaminos-ker
2004-08-11  2:23   ` William Lee Irwin III
2004-08-11  2:45     ` Peter Williams
2004-08-11  2:47       ` Peter Williams
2004-08-11  3:23         ` Peter Williams
2004-08-11  3:31           ` Con Kolivas
2004-08-11  3:46             ` Peter Williams
2004-08-11  3:44           ` Peter Williams
2004-08-13  0:13             ` spaminos-ker
2004-08-13  1:44               ` Peter Williams
2004-08-11  3:09   ` Con Kolivas
2004-08-11 10:24     ` Prakash K. Cheemplavam
2004-08-11 11:26       ` Scheduler fairness problem on 2.6 series Con Kolivas
2004-08-11 12:05         ` Prakash K. Cheemplavam
2004-08-11 19:22           ` Prakash K. Cheemplavam
2004-08-11 23:42             ` Con Kolivas
2004-08-12  8:08               ` Prakash K. Cheemplavam
2004-08-12 18:18               ` Bill Davidsen
2004-08-12  2:04     ` Scheduler fairness problem on 2.6 series (Attn: Nick Piggin and others) spaminos-ker
2004-08-12  2:24     ` spaminos-ker
2004-08-12  2:53       ` Con Kolivas
     [not found] <411D50AE.5020005@bigpond.net.au>
2004-08-17 23:19 ` spaminos-ker
2004-08-18  0:12   ` Peter Williams
2004-08-24 21:11     ` spaminos-ker
2004-08-24 23:04       ` Peter Williams
2004-08-24 23:22         ` Lee Revell
2004-08-26  2:30         ` spaminos-ker
2004-08-26  2:42           ` Peter Williams
2004-08-26  8:39             ` Peter Williams
2004-08-28  1:59               ` spaminos-ker
2004-08-29  0:21                 ` Peter Williams
2004-08-29  0:25                   ` Lee Revell
2004-08-29  0:45                     ` Lee Revell
2004-08-29  2:03                       ` Peter Williams
2004-08-29  2:28                         ` spaminos-ker
2004-08-29  4:53                           ` Peter Williams
2004-08-29  1:19                     ` spaminos-ker
2004-08-29  1:22                       ` Lee Revell
2004-08-29  1:31                         ` Peter Williams
2004-09-13 20:09                           ` spaminos-ker
2004-08-29  2:20                       ` Lee Revell
     [not found] <20040811093945.GA10667@elte.hu>
2004-08-17 23:08 ` spaminos-ker
2004-08-07 21:53 spaminos-ker
