linux-kernel.vger.kernel.org archive mirror
* Problem with the O(1) scheduler in 2.4.19
@ 2002-09-01 21:53 Tobias Ringstrom
  2002-09-02 13:07 ` Alan Cox
  2002-09-02 13:36 ` Ingo Molnar
  0 siblings, 2 replies; 24+ messages in thread
From: Tobias Ringstrom @ 2002-09-01 21:53 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Kernel Mailing List

While the O(1) scheduler has performed very well for me in most
situations, I have one big problem with it.  When running a Counter-Strike
game server on Linux 2.4.19 with the sched-2.4.19-rc2-A4 patch applied,
the server process is niced from the default value of 15 (interactive) to
25 (background).  This means that every time crond wakes up or a mail
arrives the game latency becomes extremely bad and the users experience
lag.

The process takes around 70% CPU on these occasions, so I'm surprised that
the task is not considered to be interactive.

This does not happen with stock 2.4.19.  Do you have any ideas why this
regression is happening?

/Tobias



* Re: Problem with the O(1) scheduler in 2.4.19
  2002-09-01 21:53 Problem with the O(1) scheduler in 2.4.19 Tobias Ringstrom
@ 2002-09-02 13:07 ` Alan Cox
  2002-09-02 13:42   ` Tobias Ringstrom
  2002-09-02 13:36 ` Ingo Molnar
  1 sibling, 1 reply; 24+ messages in thread
From: Alan Cox @ 2002-09-02 13:07 UTC (permalink / raw)
  To: Tobias Ringstrom; +Cc: Ingo Molnar, Kernel Mailing List

On Sun, 2002-09-01 at 22:53, Tobias Ringstrom wrote:
> While the O(1) scheduler has performed very well for me in most
> situations, I have one big problem with it.  When running a Counter-Strike
> game server on Linux 2.4.19 with the sched-2.4.19-rc2-A4 patch applied,
> the server process is niced from the default value of 15 (interactive) to
> 25 (background).  This means that every time crond wakes up or a mail
> arrives the game latency becomes extremely bad and the users experience
> lag.
> 
> The process takes around 70% CPU on these occasions, so I'm surprised that
> the task is not considered to be interactive.
> 
> This does not happen with stock 2.4.19.  Do you have any ideas why this
> regression is happening?

It isn't a regression, it's a bug fix. The nice value is now being
honoured properly.



* Re: Problem with the O(1) scheduler in 2.4.19
  2002-09-01 21:53 Problem with the O(1) scheduler in 2.4.19 Tobias Ringstrom
  2002-09-02 13:07 ` Alan Cox
@ 2002-09-02 13:36 ` Ingo Molnar
  2002-09-02 13:54   ` Tobias Ringstrom
  1 sibling, 1 reply; 24+ messages in thread
From: Ingo Molnar @ 2002-09-02 13:36 UTC (permalink / raw)
  To: Tobias Ringstrom; +Cc: Kernel Mailing List


On Sun, 1 Sep 2002, Tobias Ringstrom wrote:

> While the O(1) scheduler has performed very well for me in most
> situations, I have one big problem with it.  When running a
> Counter-Strike game server on Linux 2.4.19 with the sched-2.4.19-rc2-A4
> patch applied, the server process is niced from the default value of 15
> (interactive) to 25 (background).  This means that every time crond
> wakes up or a mail arrives the game latency becomes extremely bad and
> the users experience lag.

does the same problem happen if you renice the game server to -10 or -15?

	Ingo



* Re: Problem with the O(1) scheduler in 2.4.19
  2002-09-02 13:07 ` Alan Cox
@ 2002-09-02 13:42   ` Tobias Ringstrom
  2002-09-02 21:44     ` Tobias Ringstrom
  0 siblings, 1 reply; 24+ messages in thread
From: Tobias Ringstrom @ 2002-09-02 13:42 UTC (permalink / raw)
  To: Alan Cox; +Cc: Ingo Molnar, Kernel Mailing List

On 2 Sep 2002, Alan Cox wrote:

> It isn't a regression, it's a bug fix. The nice value is now being
> honoured properly.

The problem is that the kernel decided to nice the process (by changing
the priority, not the nice value) as if it was a background task, but it's
not a background task.  On the contrary, it's highly interactive.

/Tobias



* Re: Problem with the O(1) scheduler in 2.4.19
  2002-09-02 13:36 ` Ingo Molnar
@ 2002-09-02 13:54   ` Tobias Ringstrom
  0 siblings, 0 replies; 24+ messages in thread
From: Tobias Ringstrom @ 2002-09-02 13:54 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Kernel Mailing List

On Mon, 2 Sep 2002, Ingo Molnar wrote:

> On Sun, 1 Sep 2002, Tobias Ringstrom wrote:
> 
> > While the O(1) scheduler has performed very well for me in most
> > situations, I have one big problem with it.  When running a
> > Counter-Strike game server on Linux 2.4.19 with the sched-2.4.19-rc2-A4
> > patch applied, the server process is niced from the default value of 15
> > (interactive) to 25 (background).  This means that every time crond
> > wakes up or a mail arrives the game latency becomes extremely bad and
> > the users experience lag.
> 
> does the same problem happen if you renice the game server to -10 or -15?

The process was at nice level 0, which I think corresponds to prio 15-25
for interactive to background tasks if I understand things correctly.  
When I used top to renice the process to -10, the prio became 15, i.e. it
was still considered non-interactive.  I even tried -20 (or maybe -19),
and it was still at the non-interactive prio.

In other words:  For all nice values I tried (-20, -10, 0), the prio was
20+nice+5.  When the server is lightly loaded, the prio is 20+nice-5.

Note that even when the server was loaded, it only used 70% CPU, which I
suppose must mean that it does not use up the time slices, which I thought
should make the kernel treat the process as interactive.  Is there a
description of the criteria somewhere (other than in the source code)?

/Tobias



* Re: Problem with the O(1) scheduler in 2.4.19
  2002-09-02 13:42   ` Tobias Ringstrom
@ 2002-09-02 21:44     ` Tobias Ringstrom
  2002-09-03  5:54       ` Ingo Molnar
  0 siblings, 1 reply; 24+ messages in thread
From: Tobias Ringstrom @ 2002-09-02 21:44 UTC (permalink / raw)
  To: Alan Cox; +Cc: Ingo Molnar, Kernel Mailing List

On Mon, 2 Sep 2002, Tobias Ringstrom wrote:

> On 2 Sep 2002, Alan Cox wrote:
> 
> > It isn't a regression, it's a bug fix. The nice value is now being
> > honoured properly.
> 
> The problem is that the kernel decided to nice the process (by changing
> the priority, not the nice value) as if it was a background task, but it's
> not a background task.  On the contrary, it's highly interactive.

I think I will have to take this back.  It looks like even the old kernel
treats the game server as a background process, but as you said, it does
not make such a big difference.  Another change is that the prio value 
varies very quickly over time (as seen in top).  I do not recall seeing 
that using the O(1)-scheduler.

But I still do not understand why the process is classified as
non-interactive...  Around 20 times per second it does a nanosleep for
1 ms which takes around 40 ms in reality.  (Seeing this makes me believe 
that I should try to increase HZ, but that is a separate issue.)

/Tobias




* Re: Problem with the O(1) scheduler in 2.4.19
  2002-09-02 21:44     ` Tobias Ringstrom
@ 2002-09-03  5:54       ` Ingo Molnar
  2002-09-03 10:13         ` Tobias Ringstrom
  0 siblings, 1 reply; 24+ messages in thread
From: Ingo Molnar @ 2002-09-03  5:54 UTC (permalink / raw)
  To: Tobias Ringstrom; +Cc: Alan Cox, Kernel Mailing List


On Mon, 2 Sep 2002, Tobias Ringstrom wrote:

> But I still do not understand why the process is classified as
> non-interactive...  Around 20 times per second it does a nanosleep for 1
> ms which takes around 40 ms in reality.  (Seeing this makes me believe
> that I should try to increase HZ, but that is a separate issue.)

what CPU usage does it have? 70% CPU usage is not interactive.

well, even 70% CPU usage can be interactive if you lower its priority to
-20. But with the default nice value a task will lose its interactivity
much quicker.

also, could you increase HZ to 1000 (in asm/param.h, full recompile of the
kernel is needed), does it make a difference?

	Ingo



* Re: Problem with the O(1) scheduler in 2.4.19
  2002-09-03  5:54       ` Ingo Molnar
@ 2002-09-03 10:13         ` Tobias Ringstrom
  2002-09-03 10:28           ` Ingo Molnar
  0 siblings, 1 reply; 24+ messages in thread
From: Tobias Ringstrom @ 2002-09-03 10:13 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Alan Cox, Kernel Mailing List

On Tue, 3 Sep 2002, Ingo Molnar wrote:

> On Mon, 2 Sep 2002, Tobias Ringstrom wrote:
> 
> > But I still do not understand why the process is classified as
> > non-interactive...  Around 20 times per second it does a nanosleep for 1
> > ms which takes around 40 ms in reality.  (Seeing this makes me believe
> > that I should try to increase HZ, but that is a separate issue.)
> 
> what CPU usage does it have? 70% CPU usage is not interactive.
> 
> well, even 70% CPU usage can be interactive if you lower its priority to
> -20. But with the default nice value a task will lose its interactivity
> much quicker.

If I understand the code in sched.c correctly, the dynamic prio [-5...5]
is calculated using sleep_avg, but the name is deceptive: it behaves more
like a knife edge.  If a process is sleeping, its sleep_avg is
incremented by one per timer tick, and if it is running it is decremented
by one per timer tick.  This means (for a periodic task) that if it sleeps
for less than 50% of the timer ticks, it will get a sleep_avg of zero
(dynamic prio +5), and if it is sleeping for more than 50%, it will get a 
sleep_avg of MAX_SLEEP_AVG (dynamic prio -5).

For the case of a game server, this means that when the CPU utilization 
gets above 50% (roughly), it will switch from -5 to +5 in dynamic priority 
in a few seconds and stay there until the CPU utilization drops under 50%.

Is my analysis correct, and is this what we want?

Have you experimented with other averaging algorithms?

> also, could you increase HZ to 1000 (in asm/param.h, full recompile of the
> kernel is needed), does it make a difference?

I tried that yesterday (without the O(1) scheduler), and it does wonders
for the in-game latency (i.e. ping).  I suppose that the dynamic prio will
still be +5 at 70% CPU utilization even with a HZ of 1000 using the O(1)  
scheduler.  Why would it make a difference?

/Tobias



* Re: Problem with the O(1) scheduler in 2.4.19
  2002-09-03 10:13         ` Tobias Ringstrom
@ 2002-09-03 10:28           ` Ingo Molnar
  2002-09-03 12:23             ` Tobias Ringstrom
  2002-09-03 16:46             ` Problem with the O(1) scheduler in 2.4.19 John Alvord
  0 siblings, 2 replies; 24+ messages in thread
From: Ingo Molnar @ 2002-09-03 10:28 UTC (permalink / raw)
  To: Tobias Ringstrom; +Cc: Alan Cox, Kernel Mailing List


On Tue, 3 Sep 2002, Tobias Ringstrom wrote:

> For the case of a game server, this means that when the CPU utilization
> gets above 50% (roughly), it will switch from -5 to +5 in dynamic
> priority in a few seconds and stay there until the CPU utilization drops
> under 50%.
> 
> Is my analysis correct, and is this what we want?

do you expect a task that uses up 50% CPU time over an extended period of
time to be rated 'interactive'?

we might make the '50%' rule to be '100% / nr_running_avg', so that if
your task is the only one in the system then it gets rated interactive -
but i suspect it will still be rated a CPU hog if it keeps trying to use
up 50% of CPU time even during busier periods. I have tried the
(1/nr_running) rule in earlier incarnations of the scheduler, and it didn't
make much difference, but we obviously need a boundary case like yours to
see the differences.

> I tried that yesterday (without the O(1) scheduler), and it does wonders
> for the in-game latency (i.e. ping).  I suppose that the dynamic prio
> will still be +5 at 70% CPU utilization even with a HZ of 1000 using the
> O(1)  scheduler.  Why would it make a difference?

(it could in theory make a difference in some rare cases, in which the
frequency of sampling resonates with internal timings of the application -
i asked for this only to make sure there are no interactions.)

	Ingo



* Re: Problem with the O(1) scheduler in 2.4.19
  2002-09-03 10:28           ` Ingo Molnar
@ 2002-09-03 12:23             ` Tobias Ringstrom
  2002-09-03 15:58               ` Mark Mielke
                                 ` (2 more replies)
  2002-09-03 16:46             ` Problem with the O(1) scheduler in 2.4.19 John Alvord
  1 sibling, 3 replies; 24+ messages in thread
From: Tobias Ringstrom @ 2002-09-03 12:23 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Alan Cox, Kernel Mailing List

On Tue, 3 Sep 2002, Ingo Molnar wrote:

> do you expect a task that uses up 50% CPU time over an extended period of
> time to be rated 'interactive'?

Interactive is not the best word, but I would not expect a process like
the one I described to be considered a CPU hog.  It's a deadline driven
semi realtime process.

> we might make the '50%' rule to be '100% / nr_running_avg', so that if
> your task is the only one in the system then it gets rated interactive -
> but i suspect it will still be rated a CPU hog if it keeps trying to use
> up 50% of CPU time even during busier periods. I have tried the
> (1/nr_running) rule in earlier incarnations of the scheduler, and it didn't
> make much difference, but we obviously need a boundary case like yours to
> see the differences.

I think the problem I have (that I lose a lot of performance to processes
such as crond, httpd, etc.) is common to the whole class of semi-realtime
processes, at least if they use >50% CPU.  This means that CPU intensive
audio and video (e.g. DVD) playback programs might have the same problem.

I see three simple ways to solve the problem without changing the
scheduler.  Either run the process with nice -20, use SCHED_RR, or use a
dedicated server with no other processes (such as crond, httpd, etc).  
The first two might be OK, but you need root privileges to run renice and
to change the scheduler policy.  The third one is not an option for all
users, and definitely not for the video playback case.

A problem is that this new scheduler behaviour will hit people running
semi realtime processes as a regression when they switch to 2.6.  It would
be nice to avoid that.

One solution might be to teach the scheduler how to detect these deadline
driven semi-realtime processes, and not punish them.  It is not obvious to
me how to do that.

Another much simpler solution that might work just as well is to change
the CPU utilization threshold from 50% to 90%.

You're the expert of course.  I'm only fumbling in the dark...  :-)

> (it could in theory make a difference in some rare cases, in which the
> frequency of sampling resonates with internal timings of the application -
> i asked for this only to make sure there are no interactions.)

I'll try it out and let you know if it does make a difference.

/Tobias



* Re: Problem with the O(1) scheduler in 2.4.19
  2002-09-03 12:23             ` Tobias Ringstrom
@ 2002-09-03 15:58               ` Mark Mielke
  2002-09-03 16:58                 ` Tobias Ringstrom
  2002-09-03 16:51               ` Ingo Molnar
  2002-09-04  0:34               ` [SOURCE] RT monitor (Was: Re: Problem with the O(1) scheduler in 2.4.19) Roger Larsson
  2 siblings, 1 reply; 24+ messages in thread
From: Mark Mielke @ 2002-09-03 15:58 UTC (permalink / raw)
  To: Tobias Ringstrom; +Cc: Ingo Molnar, Alan Cox, Kernel Mailing List

I wonder if it does not make sense to just give the process real time
priority? No scheduler will be excellent in all situations. I would not
consider a game, or game server, to be a standard application.

mark


On Tue, Sep 03, 2002 at 02:23:49PM +0200, Tobias Ringstrom wrote:
> On Tue, 3 Sep 2002, Ingo Molnar wrote:
> 
> > do you expect a task that uses up 50% CPU time over an extended period of
> > time to be rated 'interactive'?
> 
> Interactive is not the best word, but I would not expect a process like
> the one I described to be considered a CPU hog.  It's a deadline driven
> semi realtime process.
> 
> > we might make the '50%' rule to be '100% / nr_running_avg', so that if
> > your task is the only one in the system then it gets rated interactive -
> > but i suspect it will still be rated a CPU hog if it keeps trying to use
> > up 50% of CPU time even during busier periods. I have tried the
> > (1/nr_running) rule in earlier incarnations of the scheduler, and it didn't
> > make much difference, but we obviously need a boundary case like yours to
> > see the differences.
> 
> I think the problem I have (that I lose a lot of performance to processes
> such as crond, httpd, etc.) is common to the whole class of semi-realtime
> processes, at least if they use >50% CPU.  This means that CPU intensive
> audio and video (e.g. DVD) playback programs might have the same problem.
> 
> I see three simple ways to solve the problem without changing the
> scheduler.  Either run the process with nice -20, use SCHED_RR, or use a
> dedicated server with no other processes (such as crond, httpd, etc).  
> The first two might be OK, but you need root privileges to run renice and
> to change the scheduler policy.  The third one is not an option for all
> users, and definitely not for the video playback case.
> 
> A problem is that this new scheduler behaviour will hit people running
> semi realtime processes as a regression when they switch to 2.6.  It would
> be nice to avoid that.
> 
> One solution might be to teach the scheduler how to detect these deadline
> driven semi-realtime processes, and not punish them.  It is not obvious to
> me how to do that.
> 
> Another much simpler solution that might work just as well is to change
> the CPU utilization threshold from 50% to 90%.
> 
> You're the expert of course.  I'm only fumbling in the dark...  :-)
> 
> > (it could in theory make a difference in some rare cases, in which the
> > frequency of sampling resonates with internal timings of the application -
> > i asked for this only to make sure there are no interactions.)
> 
> I'll try it out and let you know if it does make a difference.
> 
> /Tobias
> 

-- 
mark@mielke.cc/markm@ncf.ca/markm@nortelnetworks.com __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/



* Re: Problem with the O(1) scheduler in 2.4.19
  2002-09-03 10:28           ` Ingo Molnar
  2002-09-03 12:23             ` Tobias Ringstrom
@ 2002-09-03 16:46             ` John Alvord
  2002-09-03 17:00               ` Ingo Molnar
  1 sibling, 1 reply; 24+ messages in thread
From: John Alvord @ 2002-09-03 16:46 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Tobias Ringstrom, Alan Cox, Kernel Mailing List

On Tue, 3 Sep 2002 12:28:18 +0200 (CEST), Ingo Molnar <mingo@elte.hu>
wrote:

>
>On Tue, 3 Sep 2002, Tobias Ringstrom wrote:
>
>> For the case of a game server, this means that when the CPU utilization
>> gets above 50% (roughly), it will switch from -5 to +5 in dynamic
>> priority in a few seconds and stay there until the CPU utilization drops
>> under 50%.
>> 
>> Is my analysis correct, and is this what we want?
>
>do you expect a task that uses up 50% CPU time over an extended period of
>time to be rated 'interactive'?
>
>we might make the '50%' rule to be '100% / nr_running_avg', so that if
>your task is the only one in the system then it gets rated interactive -
>but i suspect it will still be rated a CPU hog if it keeps trying to use
>up 50% of CPU time even during busier periods. I have tried the
>(1/nr_running) rule in earlier incarnations of the scheduler, and it didn't
>make much difference, but we obviously need a boundary case like yours to
>see the differences.
>
>> I tried that yesterday (without the O(1) scheduler), and it does wonders
>> for the in-game latency (i.e. ping).  I suppose that the dynamic prio
>> will still be +5 at 70% CPU utilization even with a HZ of 1000 using the
>> O(1)  scheduler.  Why would it make a difference?
>
>(it could in theory make a difference in some rare cases, in which the
>frequency of sampling resonates with internal timings of the application -
>i asked for this only to make sure there are no interactions.)
>
It seems to me that this condition could arise for any server process
which is used by many interactive processes. Imagine 300 users and a
server process which needs 70% to do the work. This could be a
database server as well as the current game server.

john


* Re: Problem with the O(1) scheduler in 2.4.19
  2002-09-03 12:23             ` Tobias Ringstrom
  2002-09-03 15:58               ` Mark Mielke
@ 2002-09-03 16:51               ` Ingo Molnar
  2002-09-03 17:55                 ` Tobias Ringstrom
  2002-09-04  0:34               ` [SOURCE] RT monitor (Was: Re: Problem with the O(1) scheduler in 2.4.19) Roger Larsson
  2 siblings, 1 reply; 24+ messages in thread
From: Ingo Molnar @ 2002-09-03 16:51 UTC (permalink / raw)
  To: Tobias Ringstrom; +Cc: Alan Cox, Kernel Mailing List


On Tue, 3 Sep 2002, Tobias Ringstrom wrote:

> [...] It's a deadline driven semi realtime process.

> [...] I see three simple ways to solve the problem without changing the
> scheduler.  Either run the process with nice -20, use SCHED_RR, or use a
> dedicated server with no other processes (such as crond, httpd, etc).  
> The first two might be OK, but you need root privileges to run renice
> and to change the scheduler policy.  The third one is not an option for
> all users, and definitely not for the video playback case.

do you see the conflict between your two statements?

if it's a "semi-realtime" process that needs more CPU time and needs it
sooner than other 'unimportant' processes in the system like httpd or
remote shells, then give it a higher priority.

under the O(1) scheduler this will now do something meaningful. Yes, this
needs root privileges, otherwise it could be abused to lift priority and
effectively lock out eg. the root shell.

under the old scheduler the nice levels were just a rough mechanism to
determine how CPU hogs use the CPU - interactiveness-wise it did not make
a big difference.

but, i have a spare plan for this, mentioned previously: to enable
unprivileged processes to lower their priority to -5 if they want to.
Could you please test your game server, does it feel interactive enough at
-5?

(allowing -10 might be too much of a stretch.)

	Ingo



* Re: Problem with the O(1) scheduler in 2.4.19
  2002-09-03 15:58               ` Mark Mielke
@ 2002-09-03 16:58                 ` Tobias Ringstrom
  0 siblings, 0 replies; 24+ messages in thread
From: Tobias Ringstrom @ 2002-09-03 16:58 UTC (permalink / raw)
  To: Mark Mielke; +Cc: Ingo Molnar, Alan Cox, Kernel Mailing List

On Tue, 3 Sep 2002, Mark Mielke wrote:

> I wonder if it does not make sense to just give the process real time
> priority? No scheduler will be excellent in all situations. I would not
> consider a game, or game server, to be a standard application.

If you are talking about SCHED_RR, I think it would lock up the server
since it only sleeps 1 ms which is done as a busy sleep for SCHED_RR
tasks.  The game server would have to be designed to use SCHED_RR in a
sensible way, in that case.  The source code is not available... :-(

/Tobias



* Re: Problem with the O(1) scheduler in 2.4.19
  2002-09-03 16:46             ` Problem with the O(1) scheduler in 2.4.19 John Alvord
@ 2002-09-03 17:00               ` Ingo Molnar
  2002-09-04 20:14                 ` Bill Davidsen
  0 siblings, 1 reply; 24+ messages in thread
From: Ingo Molnar @ 2002-09-03 17:00 UTC (permalink / raw)
  To: John Alvord; +Cc: Tobias Ringstrom, Alan Cox, Kernel Mailing List


On Tue, 3 Sep 2002, John Alvord wrote:

> It seems to me that this condition could arise for any server process
> which is used by many interactive processes. Imagine 300 users and a
> server process which needs 70% to do the work. This could be a database
> server as well as the current game server.

well, if there is enough CPU power around then there is no problem -
everyone gets enough CPU time.

if CPU power becomes scarce then the kernel will do like it does for every
other resource: it starts to partition the resource, and no-one will get
the absolute maximum it has asked for.

the 2.5 scheduler adds another thing to the mix: if a task behaves in an
'interactive' way then it will get more CPU time than what it got in 2.4 -
if it behaves like a 'CPU hog' then it will get less CPU time than what it
used to get in 2.4.

the penalty is at most +-5 priority levels, so you can always offset (much
of) this effect by moving the task 10 priority levels lower. (Hence the
magic '-10' priority level i keep suggesting, and hence the magic -5
priority level i'd like to allow ordinary tasks to lower their priority to.)

[the scheduler also has other code to ensure fairness in highly loaded
situations, it makes sure that no task waits CPU-less for more than 3
seconds due to the interactiveness bonuses. This effect does not play in
this current situation, it needs a couple of tens of currently running
aggressive tasks to trigger on most normal boxes.]

those tasks that need a disproportionate amount of CPU time need to be
reniced, so that the penalty for being an 'unfair' CPU user is offset.  
There is no way the scheduler could figure out how important a task is -
some people would give a game server a higher priority, other people would
give httpd (or remote shells) a higher priority. Since this information is
only available in the administrator's head, it needs help from the
administrator to handle the situation. The kernel has a good default, but
it cannot work in every case, this is why we have the ability to renice
tasks.

	Ingo



* Re: Problem with the O(1) scheduler in 2.4.19
  2002-09-03 16:51               ` Ingo Molnar
@ 2002-09-03 17:55                 ` Tobias Ringstrom
  2002-09-03 18:05                   ` Ingo Molnar
  0 siblings, 1 reply; 24+ messages in thread
From: Tobias Ringstrom @ 2002-09-03 17:55 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Alan Cox, Kernel Mailing List

On Tue, 3 Sep 2002, Ingo Molnar wrote:

> On Tue, 3 Sep 2002, Tobias Ringstrom wrote:
> 
> > [...] It's a deadline driven semi realtime process.
> 
> > [...] I see three simple ways to solve the problem without changing the
> > scheduler.  Either run the process with nice -20, use SCHED_RR, or use a
> > dedicated server with no other processes (such as crond, httpd, etc).  
> > The first two might be OK, but you need root privileges to run renice
> > and to change the scheduler policy.  The third one is not an option for
> > all users, and definitely not for the video playback case.
> 
> do you see the conflict between your two statements?

Certainly, it's very hard for the kernel to do the right thing.  Perhaps 
the only viable solution is for the user to solve the problem.

Would it really be so unfair to give the user a way to state that a
process is interactive?  The kernel obviously makes mistakes.  The system
is not fair for users anyway.  If a user wants to compete with other
users, he can create more processes to get more CPU.

I'm really concerned about the video decompression/playback situation,
which is quite similar, and can easily take >50% CPU.  It is also very
inconvenient to have to have superuser support to get good frame rate
stability.  A way to define a process as interactive is one way to solve
that problem.  Another solution is to let ordinary users use negative nice
values, as you mention below.

> but, i have a spare plan for this, mentioned previously: to enable
> unprivileged processes to lower their priority to -5 if they want to.
> Could you please test your game server, does it feel interactive enough at
> -5?

It helps a little, but the problem is still very visible.

> (allowing -10 might be too much of a stretch.)

Why?  If it's using more than 50% CPU, the prio will be the same as a 
zero-niced interactive process.

The minimum user nice value might be a good candidate for a new rlimit...

/Tobias



* Re: Problem with the O(1) scheduler in 2.4.19
  2002-09-03 17:55                 ` Tobias Ringstrom
@ 2002-09-03 18:05                   ` Ingo Molnar
  2002-09-10 22:58                     ` Tobias Ringstrom
  0 siblings, 1 reply; 24+ messages in thread
From: Ingo Molnar @ 2002-09-03 18:05 UTC (permalink / raw)
  To: Tobias Ringstrom; +Cc: Alan Cox, Kernel Mailing List


On Tue, 3 Sep 2002, Tobias Ringstrom wrote:

> > (allowing -10 might be too much of a stretch.)
> 
> Why?  If it's using more than 50% CPU, the prio will be the same as a
> zero-niced interactive process.

well, perhaps -10 could also be allowed.

does -10 make it equivalent to the 2.4 behavior? Could you somehow measure
the priority where it's still acceptable? Ie. -8 or -9?

> The minimum user nice value might be a good candidate for a new
> rlimit...

yes.

	Ingo



* [SOURCE] RT monitor (Was: Re: Problem with the O(1) scheduler in 2.4.19)
  2002-09-03 12:23             ` Tobias Ringstrom
  2002-09-03 15:58               ` Mark Mielke
  2002-09-03 16:51               ` Ingo Molnar
@ 2002-09-04  0:34               ` Roger Larsson
  2 siblings, 0 replies; 24+ messages in thread
From: Roger Larsson @ 2002-09-04  0:34 UTC (permalink / raw)
  To: Tobias Ringstrom; +Cc: Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 1521 bytes --]

On Tuesday 03 September 2002 14.23, Tobias Ringstrom wrote:
> I see three simple ways to solve the problem without changing the
> scheduler.  Either run the process with nice -20, use SCHED_RR, or use a
> dedicated server with no other processes (such as crond, httpd, etc).  
> The first two might be OK, but you need root privileges to run renice and
> to change the scheduler policy.  The third one is not an option for all
> users, and definitely not for the video playback case.
> 

Here is some code that works as an RT requester/monitor, and
a small utility to try it out.

With this monitor any process can request RT priorities.
If those (or other) processes overload the system,
all will be returned to normal priorities.

Note:
* this code is still experimental. I had a situation where
   a previous monitor reduced its own priority... (rendering it useless)
* It probably does not work on SMP - I have not given that
   much thought yet...

compile the source:
        gcc -Wall rt.c -o rt
        gcc -Wall rt_monitor.c -o rt_monitor

then as root:
        mkfifo -m 622 /var/named/rt-request
        ./rt_monitor

start another shell (as a normal user - not root)
to check the function of the monitor (sleeps 3 s, then busy-loops;
the monitor should reduce the priority in about 4 seconds).
Note that -c takes the number of busy-wait loops as its argument:
        ./rt -c <loops>

to set RT priority on any process do
(note: this should be quite safe since the monitor does the raising
so it has to be running :-)
	./rt -p anypid


/RogerL
-- 
Roger Larsson
Skellefteå
Sweden

[-- Attachment #2: rt.c --]
[-- Type: text/x-csrc, Size: 2603 bytes --]

/* RT user.

        Copyright (c) 2002 Roger Larsson <roger.larsson@norran.net>

    This program is free software; you can redistribute it and/or
    modify it under the terms of version 2 of the GNU General Public
    License as published by the Free Software Foundation.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program; if not, write to the Free Software
    Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

    Thanks to the author of KSysGuard, Chris Schlaeger, for the borrowed code...
*/

#include <sys/types.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <stdlib.h>

struct request
{
	pid_t pid;
	char _filler[32];
};


int main(int argc, char *argv[])
{
	struct request request;
	FILE *reqf;
	int done=0, loops = 0;

	request.pid = getpid();
	while (!done) {
	    switch (getopt(argc, argv, "?c:p:" )) {
		case 'c':
		    loops = atoi(optarg);
		    break;
		case 'p':
		    request.pid = atoi(optarg);
		    printf("pid %d\n", request.pid);
		    break;
		case '?':
		    printf("%s: [-c loops|-p pid]\n", argv[0]);
		    printf("\t-c loops\tcheck monitor function by looping\n");
		    printf("\t-p pid\trequest on behalf of other process\n");
		    return 1;
		case -1:
		    // No more options
		    done = 1;
		    break;
	    }
	}

	printf("As long as no monitor runs, execution will sleep here...\n");
	reqf = fopen("/var/named/rt-request", "w");
	if (reqf == NULL) {
	    perror("fopen");
	    return errno;
	}

	printf("policy %d\n", sched_getscheduler(request.pid));


	fwrite(&request, 32, 1, reqf);
	fclose(reqf); // important! (maybe flush?)

	printf("policy %d\n", sched_getscheduler(request.pid));

	// well behaved
	if (request.pid == getpid() && loops > 0) {
	    // Wait until RT prio raised
	    while (sched_getscheduler(request.pid) == 0) {
	    }

	    printf("\nsleep for 3 seconds then start with a\n");
	    printf("busy wait for %d loops (or until prio reduced)\n", loops);
	    printf(" move your mouse!\n");
	    sleep(3);

	    while (--loops > 0 && sched_getscheduler(request.pid) != 0) {
		// someone did listen to my request...
		// assume monitor is running
	    }

	    if (loops == 0)
		printf(" - normal loop finish, too short loop?\n");
	    else
		printf(" - monitor works! (priority got reduced)\n");
	}

	return 0;
}

[-- Attachment #3: rt_monitor.c --]
[-- Type: text/x-csrc, Size: 9047 bytes --]

/* RT monitor.

        Copyright (c) 2002 Roger Larsson <roger.larsson@norran.net>

    This program is free software; you can redistribute it and/or
    modify it under the terms of version 2 of the GNU General Public
    License as published by the Free Software Foundation.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program; if not, write to the Free Software
    Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

    Thanks to the author of KSysGuard, Chris Schlaeger, for the borrowed code...
*/

#include <sys/types.h>
#include <sched.h>
#include <stdio.h>
#include <dirent.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <ctype.h>
#include <sys/time.h>
#include <stdlib.h>
#include <string.h>	/* memset() */


int set_normal_priority(pid_t pid);



 int isRT(pid_t pid)
 {
     int sched_class = sched_getscheduler( pid);
     if (sched_class == -1) {
	 fprintf(stderr, "Pid %d Exited?\n", pid);
	 return 0;
     }

     return sched_class != SCHED_OTHER;
 }

struct rt_process_info
{
    /* This flag is set for all found processes at the beginning of the
     * process list update. Processes that do not have this flag set will
     * be assumed dead and removed from the list. The flag is cleared after
     * each list update. */
    int alive;
    int centStamp;


    pid_t pid;
    pid_t ppid;
    gid_t gid;

    unsigned int userTime;
    unsigned int sysTime;
    unsigned int vmSize; // enough?
    unsigned int vmRss; // enough?

    float sysLoad;
    float userLoad;
    float cpu_usage;
};

#define MAX_RT_PROCESSES 200
struct rt_process_info rt_process[MAX_RT_PROCESSES]; /* pid & alive == 0 */

struct rt_process_info *find_process(pid_t pid)
{
    unsigned ix;
    for (ix = 0; ix < MAX_RT_PROCESSES; ix++)
    {
	if (rt_process[ix].pid == pid) {
	    rt_process[ix].alive = 1;
	    return &rt_process[ix];
	}
    }

    return NULL;
}

struct rt_process_info *new_process(pid_t pid)
{
    unsigned ix;
    for (ix = 0; ix < MAX_RT_PROCESSES; ix++)
    {
	if (rt_process[ix].pid == 0) {
	    rt_process[ix].pid = pid;
	    rt_process[ix].alive = 2;
	    return &rt_process[ix];
	}
    }

    return NULL;
}

float cpu_usage(struct rt_process_info *ps)
{
#define BUFSIZE 1024
    char buf[BUFSIZE];
    FILE *fd;
    char status;
    unsigned int userTime, sysTime;

    snprintf(buf, BUFSIZE - 1, "/proc/%d/stat", ps->pid);
    buf[BUFSIZE - 1] = '\0';
    if ((fd = fopen(buf, "r")) == 0)
	return (-1);

    if (fscanf(fd, "%*d %*s %c %d %d %*d %*d %*d %*u %*u %*u %*u %*u %d %d"
	       "%*d %*d %*d %*d %*u %*u %*d %u %u",
	       &status, (int*) &ps->ppid, (int*) &ps->gid,
	       &userTime, &sysTime, &ps->vmSize,
	       &ps->vmRss) != 7) {
	fclose(fd);
	return (-1);
    }

    if (fclose(fd))
	return (-1);

    {
	unsigned int newCentStamp;
	int timeDiff, userDiff, sysDiff;
	struct timeval tv;

	gettimeofday(&tv, 0);
	newCentStamp = tv.tv_sec * 100 + tv.tv_usec / 10000;

	// calculate load
	if (ps->alive == 2)
	    ps->sysLoad = ps->userLoad = 0.0f; /* can't give a reliable number at the moment... */
	else {
	    timeDiff = (int)(newCentStamp - ps->centStamp);
	    userDiff = userTime - ps->userTime;
	    sysDiff = sysTime - ps->sysTime;

			
	    if ((timeDiff > 0) && (userDiff >= 0) && (sysDiff >= 0)) /* protect from bad data */
	    {
		ps->userLoad = ((double) userDiff / timeDiff) * 100.0;
		ps->sysLoad = ((double) sysDiff / timeDiff) * 100.0;
	    }
	    else
		ps->sysLoad = ps->userLoad = 0.0;
	}

	// update fields
	ps->centStamp = newCentStamp;
	ps->userTime = userTime;
	ps->sysTime = sysTime;
    }
	
    ps->cpu_usage = ps->userLoad + ps->sysLoad;

    return ps->cpu_usage;
}

float process_cpu_usage(pid_t pid)
{
    struct rt_process_info *process = find_process(pid);
    float cpu_use;

    if (process == NULL)
    {
	process = new_process(pid);
	if (process == NULL) {
	    // too many RT processes!
	    //  this process is new - assume a DOS attack
	    printf("Out of RT process info space - "
		   "assume DOS attack\n");
	    set_normal_priority(pid);
	    return 0.0f; /* no slot to track it in; do not dereference NULL */
	}

	process->alive = 2; /* mark process new */
    }

    cpu_use = cpu_usage(process);

    process->alive = 1;

    return cpu_use;
}

/* process reading code from ksysguard */
float cpu_rt_usage(struct rt_process_info **rt_list_head)
{
    // Watch out for SMP effects...
	
    float result = 0.0f;
    pid_t myself = getpid();
    DIR* dir;
    struct dirent* entry;

    /* read in current process list via the /proc filesystem entry */
    if ((dir = opendir("/proc")) == NULL)
    {
	perror("Cannot open directory \'/proc\'!\n"
	       "The kernel needs to be compiled with support\n"
	       "for /proc filesystem enabled!\n");
	return 0;
    }
	
    // for all processes
    while ((entry = readdir(dir)))
    {
	if (isdigit(entry->d_name[0]))
	{
	    pid_t pid;

	    pid = atoi(entry->d_name);
			
	    if (pid != myself && isRT(pid)) {
		result += process_cpu_usage(pid);
	    }
	}
    }
    closedir(dir);
	
    return result;
}


void gc_rt_processes()
{
    unsigned ix;
    for (ix = 0; ix < MAX_RT_PROCESSES; ix++)
    {
	struct rt_process_info *rt_examine = &rt_process[ix];

	if (rt_examine->alive)
	{
	    rt_examine->alive = 0;
	}
	else
	{
	    rt_examine->pid = 0; /* delete it! */
	}
    }
}

int set_me_realtime(void)
{
struct sched_param schp;
	/*
	 * set the process to realtime privs
	 */
        memset(&schp, 0, sizeof(schp));
	schp.sched_priority = sched_get_priority_max(SCHED_FIFO);

	if (sched_setscheduler(0, SCHED_FIFO, &schp) != 0) {
		perror("sched_setscheduler");
		return -1;
	}

	if(mlockall(MCL_CURRENT|MCL_FUTURE))
	{
	    perror("mlockall() failed, exiting. mlock");
	    return -1;
	}

	return 0;

}

int set_realtime_priority(pid_t pid)
{
	struct sched_param schp;
	/*
	 * set the process to realtime privs
	 */

	printf("Attempt to set realtime for pid %d ", pid);

	if (pid == 0 || pid == getpid()) {
	    printf("- ignored! (that is me)\n");
	    return -1;
	}


        memset(&schp, 0, sizeof(schp));
	schp.sched_priority = sched_get_priority_min(SCHED_FIFO);

	if (sched_setscheduler(pid, SCHED_FIFO, &schp) != 0) {
		printf("- failed!\n");
		perror("sched_setscheduler");
		return -1;
	}
	printf("- done!\n");

	(void)process_cpu_usage(pid);

	return 0;

}

int set_normal_priority(pid_t pid)
{
struct sched_param schp;
	/*
	 * return the process to the normal (SCHED_OTHER) scheduling class
	 */
        memset(&schp, 0, sizeof(schp));
	schp.sched_priority = 0;

	printf("Attempt to reduce scheduling class for pid %d ", pid);

	if (pid == 0 || pid == getpid()) {
	    printf("- ignored! (that is me)\n");
	    return -1;
	}

	if (sched_setscheduler(pid, SCHED_OTHER, &schp) != 0) {
		printf("- failed!\n");
		perror("sched_setscheduler");
		return -1;
	}
	printf("- done!\n");

	return 0;
}

void set_normal_priority_all()
{
    unsigned ix;
    for (ix = 0; ix < MAX_RT_PROCESSES; ix++)
    {
	struct rt_process_info *process_info = &rt_process[ix];
	if (process_info->pid)
	    set_normal_priority(process_info->pid);
    }
}

struct request
{
	pid_t pid;
	char	_filler[32];
};
	
#define REQUEST_SIZE 32

int poll_request(int reqfd)
{
    // Be VERY careful not to
    // * block here...
    // * get buffer overruns...

	static int remaining = REQUEST_SIZE;
	static struct request request;
	char *next = ((char *)&request + REQUEST_SIZE - remaining);

	int ret = read(reqfd, (void *)next, remaining);
	if (ret == -1) {
		perror("read");
		return 0;
	}
	remaining -= ret;

	if (remaining == 0) {
		remaining = REQUEST_SIZE;

		if (request.pid == 0 || request.pid == getpid()) {
			fputs("attempt to forge the monitor\n", stderr);
			return 0;
		}

		set_realtime_priority(request.pid);

		return 1;
	}

	return 0;
}

#define RTREQUEST_FILE "/var/named/rt-request" 
int main(int argc, char * argv[])
{
    struct rt_process_info *rt_list = NULL;
    int reqfd = open(RTREQUEST_FILE,  O_RDONLY | O_NONBLOCK | O_NDELAY);

    if (reqfd == -1) {
	perror("open " RTREQUEST_FILE);
	fputs("have you created it? use 'mkfifo -m 622 " RTREQUEST_FILE "'\n", stderr);
	exit(1);
    }
      
    // monitor process runs with realtime prio
    set_me_realtime();

 #define MIN_IDLE 10
 #define MAX_RT_USAGE 70

    while (1) {
	poll_request(reqfd);

		
	if (cpu_rt_usage(&rt_list) > MAX_RT_USAGE) {
	    printf("Total CPU RT usage above MAX_RT_USAGE\n");
	    
	    gc_rt_processes();

	    // build process trees from rt_list
	    // decide which tree to reduce to normal prio class
	    //   (assume only one for simplicity...)
	    
	    // reduce all processes in that tree
	    set_normal_priority_all();

	    //   (may use nice to simulate prio levels)
	    // log a message

	}
	sleep(2);
    }
    
    // process exiting - free elements on rt_list...
    //free_rt_list(rt_list);
}


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Problem with the O(1) scheduler in 2.4.19
  2002-09-03 17:00               ` Ingo Molnar
@ 2002-09-04 20:14                 ` Bill Davidsen
  0 siblings, 0 replies; 24+ messages in thread
From: Bill Davidsen @ 2002-09-04 20:14 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Linux-Kernel Mailing List

On Tue, 3 Sep 2002, Ingo Molnar wrote:

> 
> On Tue, 3 Sep 2002, John Alvord wrote:
> 
> > It seems to me that this condition could arise for any server process
> > which is used by many interactive processes. Imagine 300 users and a
> > server process which needs 70% to do the work. This could be a database
> > server as well as the current game server.

As I see it, there are two possibilities here. One is that the game server is
being run on dedicated server hardware, or at least with the blessing of
the root user. In that case root can adjust nice appropriately. The other
possibility is that root is counting on the scheduler to protect the other
processes in the system from being starved. In that case it's working
nicely.

> the 2.5 scheduler adds another thing to the mix: if a task behaves in an
> 'interactive' way then it will get more CPU time than what it got in 2.4 -
> if it behaves like a 'CPU hog' then it will get less CPU time than what it
> used to get in 2.4.

Yes, and it works really well! Job mixes which used to result in poor
response now work just fine, nice actually does something, and processes
which are intended to get resources can be given negative nice (nasty?)
to make them run well.
 
> the penalty is at most +-5 priority levels, so you can always offset (much
> of) this effect by moving the task 10 priority levels lower. (Hence the
> magic '-10' priority level i keep suggesting, and hence the magic -5
> priority levels i'd like to allow ordinary tasks to lower their priority.)

Seems to defeat all the wonderful work which went into this. On any shared
system there will be people who know how to trick the scheduler into
running their jobs faster. Used to do that myself when machines were really
slow ;-) Actually if I understand the way the scheduler works, and I think
I do at the high level, if this server was a well-behaved threaded app
individual threads would show as interactive, they could have various
priority depending on the behaviour of the threads, and things would run
pretty well. If that server is doing a huge select or poll of 300 users I
bet all the CPU is in the system call anyway.
 
> [the scheduler also has other code to ensure fairness in highly loaded
> situations, it makes sure that no task waits CPU-less for more than 3
> seconds due to the interactiveness bonuses. This effect does not play in
> this current situation, it needs a couple of tens of currently running
> aggressive tasks to trigger on most normal boxes.]
> 
> those tasks that need a disproportionate amount of CPU time need to be
> reniced, so that the penalty for being an 'unfair' CPU user is offset.  
> There is no way the scheduler could figure out how important a task is -
> some people would give a game server higher priority, other people would
> give httpd (or remote shells) a higher priority. Since this information is
> only available in the administrator's head, it needs help from the
> administrator to handle the situation. The kernel has a good default, but
> it cannot work in every case, this is why we have the ability to renice
> tasks.

And I suspect that if users can push their own jobs, they will. I really
don't think the scheduler is doing the wrong thing, and there is a well
defined way to make the process have higher priority.

This isn't a kernel issue, it's an administration issue.

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Problem with the O(1) scheduler in 2.4.19
  2002-09-03 18:05                   ` Ingo Molnar
@ 2002-09-10 22:58                     ` Tobias Ringstrom
  2002-09-11 21:14                       ` Tobias Ringstrom
  0 siblings, 1 reply; 24+ messages in thread
From: Tobias Ringstrom @ 2002-09-10 22:58 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Alan Cox, Kernel Mailing List

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2032 bytes --]

On Tue, 3 Sep 2002, Ingo Molnar wrote:

> does -10 make it equivalent to the 2.4 behavior? Could you somehow measure
> the priority where it's still acceptable? Ie. -8 or -9?

I've done some more experimenting, and I've found something interesting.  
I've attached two very simple CPU hog programs.

The program latency runs in a tight loop calling gettimeofday, and prints
the loop time if it exceeds 8 ms.  This program simulates a game server,
video decoding program or whatever.

The program hog sleeps for five seconds, and then runs in a tight loop.  
This program simulates a cron job.  This program is always run at the
default nice level (0).

I will now run the latency program at three different nice levels: -20
(high prio), 0 (normal) and 19 (low prio).  A few seconds after latency is
started, hog is started.  Note that there is no visible latency when the hog
program is started; the latency comes from its busy loop, which begins five
seconds after the start:

[root@boris Prog]# nice -n -20 ./latency
00:22:16: dt = 608.864 ms
00:22:17: dt = 150.978 ms
00:22:18: dt = 150.983 ms
00:22:19: dt = 150.979 ms
00:22:20: dt = 150.981 ms

[root@boris Prog]# nice -n 0 ./latency
00:22:49: dt = 604.865 ms
00:22:50: dt = 150.966 ms
00:22:50: dt = 150.964 ms
00:22:51: dt = 150.963 ms
00:22:51: dt = 152.981 ms

[root@boris Prog]# nice -n 19 ./latency
00:23:44: dt = 678.848 ms
00:23:44: dt = 150.964 ms
00:23:44: dt = 150.978 ms
00:23:44: dt = 150.978 ms
00:23:45: dt = 150.978 ms

Here we can see that the time slice for hog is stabilized at 150 ms, and
that as the latency program is niced, the hog program gets its time slices
more often.  I think this is what's supposed to happen, but the problem is
the >600 ms timeslice that hog gets when it starts to run.  Comments?

One could also argue that 150 ms is a bit too much.  For video playback at
25 FPS, that means three lost frames.  I do understand the benefits of
long timeslices, of course.  It's a hard choice...

This is on a HZ=1000 2.4.19+sched-2.4.19-rc2-A4 kernel.

/Tobias

[-- Attachment #2: Type: TEXT/PLAIN, Size: 67 bytes --]

#include <unistd.h>
int main()
{
	sleep(5);
	for (;;)
		;
}

[-- Attachment #3: Type: TEXT/PLAIN, Size: 557 bytes --]

#include <stdio.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

double now(void)
{
	struct timeval t;
	gettimeofday(&t, NULL);
	return t.tv_sec + t.tv_usec * 1e-6;
}

int main()
{
	double t0, t, dt, max_dt = 0.0;
	char tbuf[100];
	time_t utc;

	t0 = now();
	for (;;)
	{
		t = now();
		dt = t - t0;
		if (dt > 0.008)
		{
			max_dt = dt;
			time(&utc);
			strftime(tbuf, sizeof(tbuf), "%T", localtime(&utc));
			printf("%s: dt = %.3f ms\n", tbuf, max_dt * 1e3);
			t = now();
		}
		t0 = t;
	}

	return 0;
}

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Problem with the O(1) scheduler in 2.4.19
  2002-09-10 22:58                     ` Tobias Ringstrom
@ 2002-09-11 21:14                       ` Tobias Ringstrom
  2002-09-12  8:06                         ` Ingo Molnar
  0 siblings, 1 reply; 24+ messages in thread
From: Tobias Ringstrom @ 2002-09-11 21:14 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Alan Cox, Kernel Mailing List

On Wed, 11 Sep 2002, Tobias Ringstrom wrote:

> On Tue, 3 Sep 2002, Ingo Molnar wrote:
> 
> > does -10 make it equivalent to the 2.4 behavior? Could you somehow measure
> > the priority where it's still acceptable? Ie. -8 or -9?
> 
> I've done some more experimenting, and I've found something interesting.  
> I've attached two very simple CPU hog programs.

...and now I've done some code study.  I think the following is what 
happens:

1. hog is sleeping, and is interactive
2. latency is running and is non-interactive
3. hog becomes runnable
4. latency is preempted and put on the expired list
5. hog runs and uses up its timeslice (151 ms), but since
   it is interactive it stays on the active list and
   continues to run.
6. after 4/11*2 s = 0.7 s (and a few expired timeslices)
   hog is no longer interactive and is moved to the
   expired list
7. latency runs after a 0.7 s break.

Do you agree?

In other words:  Any nice-0 task that has been sleeping for two seconds or
more will be able to monopolize the CPU for up to 0.7 seconds.  Do you
agree that this is a problem, or am I being too narrow-minded?  :-)

/Tobias


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Problem with the O(1) scheduler in 2.4.19
  2002-09-11 21:14                       ` Tobias Ringstrom
@ 2002-09-12  8:06                         ` Ingo Molnar
  2002-09-12  9:03                           ` Tobias Ringstrom
  0 siblings, 1 reply; 24+ messages in thread
From: Ingo Molnar @ 2002-09-12  8:06 UTC (permalink / raw)
  To: Tobias Ringstrom; +Cc: Alan Cox, Kernel Mailing List


On Wed, 11 Sep 2002, Tobias Ringstrom wrote:

> In other words:  Any nice-0 task that has been sleeping for two seconds
> or more will be able to monopolize the CPU for up to 0.7 seconds.  Do
> you agree that this is a problem, or am I being too narrow-minded?  :-)

well, 'monopolize' the CPU from CPU-hogs - yes. Take the CPU from other
interactive tasks: no.

	Ingo


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Problem with the O(1) scheduler in 2.4.19
  2002-09-12  8:06                         ` Ingo Molnar
@ 2002-09-12  9:03                           ` Tobias Ringstrom
  2002-09-13 12:01                             ` Bill Davidsen
  0 siblings, 1 reply; 24+ messages in thread
From: Tobias Ringstrom @ 2002-09-12  9:03 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Alan Cox, Kernel Mailing List

On Thu, 12 Sep 2002, Ingo Molnar wrote:

> On Wed, 11 Sep 2002, Tobias Ringstrom wrote:
> 
> > In other words:  Any nice-0 task that has been sleeping for two seconds
> > or more will be able to monopolize the CPU for up to 0.7 seconds.  Do
> > you agree that this is a problem, or am I being too narrow-minded?  :-)
> 
> well, 'monopolize' the CPU from CPU-hogs - yes. Take the CPU from other
> interactive tasks: no.

(Thanks Ingo for your quick answers!)

I don't mind that interactive processes can take the CPU from CPU hogs,
but I do think that there is room for classification improvements.

A few observations (with suggested solutions):

1. The nice levels are not symmetric.  Compared to a nice 0 process, a
   nice 19 process will get 6% CPU, but compared to a nice -20 process, a
   nice 0 process will get 33 % CPU.  This can be solved by scaling the
   conversion from nice level to priority in a different way.  The
   drawback of this is shorter time slices for nice 0 processes.

2. Nice -20 is really impotent.  In addition to the point above, the
   interactive classification stuff is what makes it really impotent.
   That a nice -20 process loses 0.7 seconds to a nice 0 task says it all.  
   How about making -20 processes interactive unconditionally?

3. More than 90% of all tasks in a system are classified as interactive at
   any given time (since they are sleeping).  For example all cron jobs
   are classified as interactive, which sounds really strange.  IMHO, it's
   a good example of a non-interactive background job.  (I'll run my crond
   at nice 19 for now.)

   I'm curious, why are you using the process average sleep time to
   determine interactiveness and not the presence of prematurely abandoned
   timeslices?

4. Using SCHED_RR is one way out, but I suspect that the busy-loop
   nanosleep implementation for "realtime" processes will lock up the
   machine in my case.  I suggest that the 2 ms limit is removed.  It can
   be done in userspace as a gettimeofday loop for applications which
   care.

I'll continue thinking about this to see if I can come up with something
constructive, but it would be extremely valuable to get your view since
you are the expert and you have been working on this for a long time.

/Tobias


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Problem with the O(1) scheduler in 2.4.19
  2002-09-12  9:03                           ` Tobias Ringstrom
@ 2002-09-13 12:01                             ` Bill Davidsen
  0 siblings, 0 replies; 24+ messages in thread
From: Bill Davidsen @ 2002-09-13 12:01 UTC (permalink / raw)
  To: Tobias Ringstrom; +Cc: Ingo Molnar, Alan Cox, Kernel Mailing List

On Thu, 12 Sep 2002, Tobias Ringstrom wrote:

> 3. More than 90% of all tasks in a system are classified as interactive at
>    any given time (since they are sleeping).  For example all cron jobs
>    are classified as interactive, which sounds really strange.  IMHO, it's
>    a good example of a non-interactive background job.  (I'll run my crond
>    at nice 19 for now.)
> 
>    I'm curious, why are you using the process average sleep time to
>    determine interactiveness and not the presence of prematurely abandoned
>    timeslices?

I'll ask that, too. Not because I doubt you have a good reason, but
because it doesn't jump out at me. I would like the CPU to go to the
process most likely to start an i/o and block, so the CPU hog can run
while the i/o takes place, because that seems to get the highest overlap
of CPU and i/o. I assume the current scheduler has that as one of its goals,
though clearly not the only one.

A few words of clarification would be educational.

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2002-09-13 12:04 UTC | newest]

Thread overview: 24+ messages
2002-09-01 21:53 Problem with the O(1) scheduler in 2.4.19 Tobias Ringstrom
2002-09-02 13:07 ` Alan Cox
2002-09-02 13:42   ` Tobias Ringstrom
2002-09-02 21:44     ` Tobias Ringstrom
2002-09-03  5:54       ` Ingo Molnar
2002-09-03 10:13         ` Tobias Ringstrom
2002-09-03 10:28           ` Ingo Molnar
2002-09-03 12:23             ` Tobias Ringstrom
2002-09-03 15:58               ` Mark Mielke
2002-09-03 16:58                 ` Tobias Ringstrom
2002-09-03 16:51               ` Ingo Molnar
2002-09-03 17:55                 ` Tobias Ringstrom
2002-09-03 18:05                   ` Ingo Molnar
2002-09-10 22:58                     ` Tobias Ringstrom
2002-09-11 21:14                       ` Tobias Ringstrom
2002-09-12  8:06                         ` Ingo Molnar
2002-09-12  9:03                           ` Tobias Ringstrom
2002-09-13 12:01                             ` Bill Davidsen
2002-09-04  0:34               ` [SOURCE] RT monitor (Was: Re: Problem with the O(1) scheduler in 2.4.19) Roger Larsson
2002-09-03 16:46             ` Problem with the O(1) scheduler in 2.4.19 John Alvord
2002-09-03 17:00               ` Ingo Molnar
2002-09-04 20:14                 ` Bill Davidsen
2002-09-02 13:36 ` Ingo Molnar
2002-09-02 13:54   ` Tobias Ringstrom
