* Re: [PATCH]O14int
2003-08-10 8:48 ` [PATCH]O14int Simon Kirby
@ 2003-08-10 9:06 ` Con Kolivas
2003-08-12 17:56 ` [PATCH]O14int Simon Kirby
2003-08-10 10:08 ` [PATCH]O14int William Lee Irwin III
2003-08-10 11:17 ` [PATCH]O14int Mike Galbraith
2 siblings, 1 reply; 16+ messages in thread
From: Con Kolivas @ 2003-08-10 9:06 UTC (permalink / raw)
To: Simon Kirby; +Cc: linux-kernel
On Sun, 10 Aug 2003 18:48, Simon Kirby wrote:
> On Sat, Aug 09, 2003 at 10:36:17AM +1000, Con Kolivas wrote:
> > On Sat, 9 Aug 2003 06:08, Voluspa wrote:
> > > On 2003-08-08 15:49:25 Con Kolivas wrote:
> > > > More duck tape interactivity tweaks
> > >
> > > Do you have a premonition... Game-test goes down in flames. Volatile to
> > > the extent where I can't catch head or tail. It can behave like in
> > > A3-O12.2 or as an unpatched 2.6.0-test2. Trigger badness by switching
> > > to a text console.
> >
> > Ah. There's the answer. You've totally changed the behaviour of the
> > application in question by moving to the text console. No longer is it
> > the sizable cpu hog that it is when it's in the foreground on X, so
> > you've totally changed its behaviour and how it is treated.
>
> I haven't been following this as closely as I would have liked to
> (recent vacation and all), but I am definitely seeing issues with the
> recent 2.5.x, 2.6.x-testx scheduler code and have been looking over
> these threads.
>
> I don't really understand why these changes were made at all to the
> scheduler. As I understand it, the 2.2.x and older 2.4.x scheduler was
> simple in that it allowed any process to wake up if it had available
> ticks, and would switch to that process if any new event occurred and
> woke it up. The rest was just limiting the ticks based on nice value
> and remembering to switch when the ticks run out.
>
> It seems that newer schedulers are now temporarily postponing the
> waking up of other processes when the running process is running with
> "preemptive" ticks, and that there's all sorts of hacks involved in
> trying to hide the bad effects of this decision.
>
> If this is indeed what is going on, what is the reasoning behind it?
> I didn't really see any problems before with the simple scheduler, so
> it seems to me like this may just be a hack to make poorly-written
> applications seem to be a bit "faster" by starving other processes of
> CPU when the poorly-written applications decide they want to do
> something (such as rendering a page with a large table in Mozilla
> -- grr). Is this really making a large enough difference to be worth
> all of this trouble?
>
> To me it would seem the best algorithm would be what we had before all
> of this started. Isn't it best to switch to a task as soon as an event
> (such as disk I/O finishing or a mouse move waking up X to read mouse
> input) occurs for both latency and cache reasons (queued in LIFO
> order)? DMA may make some of this more complicated, I don't know.
>
> I am seeing similar starvation problems that others are seeing in these
> threads. At first it was whenever I clicked a link in Mozilla -- xmms
> would stop, sometimes for a second or so, on a Celeron 466 MHz machine.
> More recently I found that loading a web page consisting of several
> large animated gif images (a security camera web page) caused
> absolutely horrible jerking of mouse and keyboard input in all other
> windows, even when the browser window was minimized or hidden. What's
> worse is the jerking tends to subside if I do a lot of typing or move
> the mouse a lot, probably because I'm changing the scheduler's idea of
> what "kind" of processes are running (which makes this stuff even
> harder to debug).
Is this with or without my changes? The old scheduler was not very scalable;
that's why we moved. The new one has other intrinsic issues that I (and
others) have been trying to address, but is much much more scalable. It was
not possible to make the old one more scalable, but it is possible to make
this one more interactive.
Con
* Re: [PATCH]O14int
2003-08-10 9:06 ` [PATCH]O14int Con Kolivas
@ 2003-08-12 17:56 ` Simon Kirby
2003-08-12 21:21 ` [PATCH]O14int Con Kolivas
0 siblings, 1 reply; 16+ messages in thread
From: Simon Kirby @ 2003-08-12 17:56 UTC (permalink / raw)
To: Con Kolivas; +Cc: linux-kernel
On Sun, Aug 10, 2003 at 07:06:34PM +1000, Con Kolivas wrote:
> Is this with or without my changes? The old scheduler was not very scalable;
> that's why we moved. The new one has other intrinsic issues that I (and
> others) have been trying to address, but is much much more scalable. It was
> not possible to make the old one more scalable, but it is possible to make
> this one more interactive.
Without your changes. Are you changing the design or just tuning certain
cases? I was talking more about the theory behind the scheduling
decisions and not about particular cases.
The O(1) scheduler changes definitely help scalability and I don't have
any problem with that change (unless it introduced the behavior I'm
talking about).
Simon-
* Re: [PATCH]O14int
2003-08-12 17:56 ` [PATCH]O14int Simon Kirby
@ 2003-08-12 21:21 ` Con Kolivas
0 siblings, 0 replies; 16+ messages in thread
From: Con Kolivas @ 2003-08-12 21:21 UTC (permalink / raw)
To: Simon Kirby; +Cc: linux-kernel
On Wed, 13 Aug 2003 03:56, Simon Kirby wrote:
> On Sun, Aug 10, 2003 at 07:06:34PM +1000, Con Kolivas wrote:
> > Is this with or without my changes? The old scheduler was not very
> > scalable; that's why we moved. The new one has other intrinsic issues
> > that I (and others) have been trying to address, but is much much more
> > scalable. It was not possible to make the old one more scalable, but it
> > is possible to make this one more interactive.
>
> Without your changes. Are you changing the design or just tuning certain
> cases? I was talking more about the theory behind the scheduling
> decisions and not about particular cases.
I'm just changing the algorithm that gives priority boost or penalty, and
creating code to further feedback into that algorithm.
> The O(1) scheduler changes definitely help scalability and I don't have
> any problem with that change (unless it introduced the behavior I'm
> talking about).
* Re: [PATCH]O14int
2003-08-10 8:48 ` [PATCH]O14int Simon Kirby
2003-08-10 9:06 ` [PATCH]O14int Con Kolivas
@ 2003-08-10 10:08 ` William Lee Irwin III
2003-08-12 18:36 ` [PATCH]O14int Simon Kirby
2003-08-10 11:17 ` [PATCH]O14int Mike Galbraith
2 siblings, 1 reply; 16+ messages in thread
From: William Lee Irwin III @ 2003-08-10 10:08 UTC (permalink / raw)
To: Simon Kirby; +Cc: Con Kolivas, linux-kernel
On Sun, Aug 10, 2003 at 01:48:27AM -0700, Simon Kirby wrote:
> I haven't been following this as closely as I would have liked to
> (recent vacation and all), but I am definitely seeing issues with the
> recent 2.5.x, 2.6.x-testx scheduler code and have been looking over
> these threads.
> I don't really understand why these changes were made at all to the
> scheduler. As I understand it, the 2.2.x and older 2.4.x scheduler was
> simple in that it allowed any process to wake up if it had available
> ticks, and would switch to that process if any new event occurred and
> woke it up. The rest was just limiting the ticks based on nice value
> and remembering to switch when the ticks run out.
Most of this isn't of much concern; most of the 2.4.x semantics have
largely been carried over to 2.6.x with algorithmic improvements, apart
from the same-mm heuristic (which was of dubious value anyway). Even
epochs are still there in the form of the duelling arrays, which
renders the thing vaguely timeout-based like 2.4.x.
On Sun, Aug 10, 2003 at 01:48:27AM -0700, Simon Kirby wrote:
> It seems that newer schedulers are now temporarily postponing the
> waking up of other processes when the running process is running with
> "preemptive" ticks, and that there's all sorts of hacks involved in
> trying to hide the bad effects of this decision.
If this were deliberate it would be a "selfish" scheduling algorithm,
where the delay in preemptively capturing the cpu is a number of ticks
equal to whatever the value of beta/alpha was chosen to be, and some
raw scheduling algorithm is used otherwise unaltered for those tasks in
the service box. I see no evidence of such an organization (it'd be
really obvious, as a queue box and service box would need to exist),
hence this is probably just something in need of a performance tweak
if it's a real problem.
On Sun, Aug 10, 2003 at 01:48:27AM -0700, Simon Kirby wrote:
> If this is indeed what is going on, what is the reasoning behind it?
> I didn't really see any problems before with the simple scheduler, so
> it seems to me like this may just be a hack to make poorly-written
> applications seem to be a bit "faster" by starving other processes of
> CPU when the poorly-written applications decide they want to do
> something (such as rendering a page with a large table in Mozilla
> -- grr). Is this really making a large enough difference to be worth
> all of this trouble?
Yes. The SMP issues addressed by the algorithmic improvements in the
scheduler are performance issues so severe, they may safely be called
functional issues.
On Sun, Aug 10, 2003 at 01:48:27AM -0700, Simon Kirby wrote:
> To me it would seem the best algorithm would be what we had before all
> of this started. Isn't it best to switch to a task as soon as an event
> (such as disk I/O finishing or a mouse move waking up X to read mouse
> input) occurs for both latency and cache reasons (queued in LIFO
> order)? DMA may make some of this more complicated, I don't know.
This sounds like either LCFS or FB. FB's not usable out of the box for
long-running tasks, as its context switch rates are excessive there.
LCFS has some rather undesirable properties that render it unsuitable
for general purpose operating systems. Something like multilevel
processor sharing would be a much better alternative, as long-running
tasks can be classified and scheduled according to a more appropriate
discipline with a lower context switch rate while maintaining the
(essentially infinitely) strong preference for short-running tasks.
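To illustrate the context-switch concern, here is a minimal sketch. It assumes
"FB" means foreground-background (least-attained-service-first) scheduling with
a one-tick quantum; that reading of the abbreviation, the function name and the
tick model are all assumptions made for illustration, not kernel code.

```python
def fb_context_switches(work_ticks):
    """Count context switches under a toy FB (least-attained-service) policy.

    work_ticks[i] is the total CPU demand of task i, in ticks. On every tick
    the runnable task with the least attained service runs; ties go to the
    lowest task index. This is an illustrative model only.
    """
    attained = [0] * len(work_ticks)
    remaining = list(work_ticks)
    switches = 0
    current = None
    while any(remaining):
        runnable = [i for i, r in enumerate(remaining) if r > 0]
        nxt = min(runnable, key=lambda i: (attained[i], i))
        if current is not None and nxt != current:
            switches += 1
        current = nxt
        attained[nxt] += 1
        remaining[nxt] -= 1
    return switches
```

In this model two equally long CPU-bound tasks alternate on every tick, so the
switch count grows with the amount of work, whereas running each task to
completion would need a single switch.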
On Sun, Aug 10, 2003 at 01:48:27AM -0700, Simon Kirby wrote:
> I am seeing similar starvation problems that others are seeing in these
> threads. At first it was whenever I clicked a link in Mozilla -- xmms
> would stop, sometimes for a second or so, on a Celeron 466 MHz machine.
> More recently I found that loading a web page consisting of several
> large animated gif images (a security camera web page) caused
> absolutely horrible jerking of mouse and keyboard input in all other
> windows, even when the browser window was minimized or hidden. What's
> worse is the jerking tends to subside if I do a lot of typing or move
> the mouse a lot, probably because I'm changing the scheduler's idea of
> what "kind" of processes are running (which makes this stuff even
> harder to debug).
One problem with these kinds of reports is that they aren't coming with
enough information to determine if the scheduler truly is the cause of
the problem, and worse yet, assuming the scheduler did cause these
problems, this isn't enough actual information to address it. We're
going to need proper instrumentation at some point here.
Until then, when you deliver these reports, could you do the following:
(a) vmstat 1 | cat -n | tee -a vmstat.log
(b) run top under script
(c) regularly snapshot profiles with
	n=1
	while true
	do
		readprofile -n -m /boot/System.map-`uname -r` \
			| sort -k 2,2 > prof.$n
		n=`expr $n + 1`
		sleep 1
	done
while running interactivity tests?
(a) will give some moderately useful information about how much io is
going on and interrupt and context switch rates.
(b) will report dynamic priorities and other general conditions so the
scheduler's decisions can be examined.
(c) will determine if the issue is due to in-kernel algorithms consuming
excessive amounts of cpu and causing application-level latency
issues via cpu burn
Also, send in bootlogs (dmesg), so that general information about the
system can be communicated.
-- wli
* Re: [PATCH]O14int
2003-08-10 10:08 ` [PATCH]O14int William Lee Irwin III
@ 2003-08-12 18:36 ` Simon Kirby
0 siblings, 0 replies; 16+ messages in thread
From: Simon Kirby @ 2003-08-12 18:36 UTC (permalink / raw)
To: William Lee Irwin III, Con Kolivas, linux-kernel
On Sun, Aug 10, 2003 at 03:08:36AM -0700, William Lee Irwin III wrote:
> Most of this isn't of much concern; most of the 2.4.x semantics have
> largely been carried over to 2.6.x with algorithmic improvements, apart
> from the same-mm heuristic (which was of dubious value anyway). Even
> epochs are still there in the form of the duelling arrays, which
> renders the thing vaguely timeout-based like 2.4.x.
Hmm. I admit I haven't read the code enough to understand really what is
going on -- I'm just guessing how it is working (and how it did work)
based on experiences I've had with it over the years.
> On Sun, Aug 10, 2003 at 01:48:27AM -0700, Simon Kirby wrote:
> > It seems that newer schedulers are now temporarily postponing the
> > waking up of other processes when the running process is running with
> > "preemptive" ticks, and that there's all sorts of hacks involved in
> > trying to hide the bad effects of this decision.
>
> If this were deliberate it would be a "selfish" scheduling algorithm,
> where the delay in preemptively capturing the cpu is a number of ticks
> equal to whatever the value of beta/alpha was chosen to be, and some
> raw scheduling algorithm is used otherwise unaltered for those tasks in
> the service box. I see no evidence of such an organization (it'd be
> really obvious, as a queue box and service box would need to exist),
> hence this is probably just something in need of a performance tweak
> if it's a real problem.
Perhaps I should read the code to see what is actually going on (though
it is now fairly complex), but it definitely feels like this is
happening. Why else would my keystrokes to an otherwise-idle rxvt be
delayed while my browser is rendering a page? I suppose there may be
interactions with X. This never used to happen, however.
The simple question: Does the scheduler ever intend to delay a context
switch to a process (which has been idle long enough to rebuild its
maximum timeslice) when a wake up event occurs? If so, what is the
reasoning for this?
> > If this is indeed what is going on, what is the reasoning behind it?
> > I didn't really see any problems before with the simple scheduler, so
> > it seems to me like this may just be a hack to make poorly-written
> > applications seem to be a bit "faster" by starving other processes of
> > CPU when the poorly-written applications decide they want to do
> > something (such as rendering a page with a large table in Mozilla
> > -- grr). Is this really making a large enough difference to be worth
> > all of this trouble?
>
> Yes. The SMP issues addressed by the algorithmic improvements in the
> scheduler are performance issues so severe, they may safely be called
> functional issues.
Obviously the scheduler O(1) changes and other scalability improvements
are worthwhile, but I don't think (unless I'm missing something) they
explain the problem I'm seeing.
> On Sun, Aug 10, 2003 at 01:48:27AM -0700, Simon Kirby wrote:
> > To me it would seem the best algorithm would be what we had before all
> > of this started. Isn't it best to switch to a task as soon as an event
> > (such as disk I/O finishing or a mouse move waking up X to read mouse
> > input) occurs for both latency and cache reasons (queued in LIFO
> > order)? DMA may make some of this more complicated, I don't know.
>
> This sounds like either LCFS or FB. FB's not usable out of the box for
> long-running tasks, as its context switch rates are excessive there.
> LCFS has some rather undesirable properties that render it unsuitable
> for general purpose operating systems. Something like multilevel
> processor sharing would be a much better alternative, as long-running
> tasks can be classified and scheduled according to a more appropriate
> discipline with a lower context switch rate while maintaining the
> (essentially infinitely) strong preference for short-running tasks.
What makes the context switches excessive? As far as I can see, the
only things that can initiate a context switch are a process sleeping or
finishing, a timer tick and the scheduler deciding to switch, or a device
causing a wake up event. I was also wondering: Isn't it best to always
switch to the process which has just had an event for cache coherency?
> > I am seeing similar starvation problems that others are seeing in these
> > threads. At first it was whenever I clicked a link in Mozilla -- xmms
> > would stop, sometimes for a second or so, on a Celeron 466 MHz machine.
> > More recently I found that loading a web page consisting of several
> > large animated gif images (a security camera web page) caused
> > absolutely horrible jerking of mouse and keyboard input in all other
> > windows, even when the browser window was minimized or hidden. What's
> > worse is the jerking tends to subside if I do a lot of typing or move
> > the mouse a lot, probably because I'm changing the scheduler's idea of
> > what "kind" of processes are running (which makes this stuff even
> > harder to debug).
>
> One problem with these kinds of reports is that they aren't coming with
> enough information to determine if the scheduler truly is the cause of
> the problem, and worse yet, assuming the scheduler did cause these
> problems, this isn't enough actual information to address it. We're
> going to need proper instrumentation at some point here.
I can do this, but I'm not seeing inefficiency, I'm seeing large decision
problems. If the context switches were up in the hundreds of thousands
or higher, I would understand, but they're in the low hundreds. Isn't
top far too slow to figure out what is actually going on? Also, kernel
time is less than 10 percent, so I don't think kernel profiles will help.
Maybe I'm dreaming, but shouldn't the scheduler be simple enough so that
it can be considered "obviously correct"? ...Or close to that? :)
Simon-
* Re: [PATCH]O14int
2003-08-10 8:48 ` [PATCH]O14int Simon Kirby
2003-08-10 9:06 ` [PATCH]O14int Con Kolivas
2003-08-10 10:08 ` [PATCH]O14int William Lee Irwin III
@ 2003-08-10 11:17 ` Mike Galbraith
2003-08-11 18:19 ` [PATCH]O14int [SCHED_SOFTRR please] Roger Larsson
[not found] ` <200308120226.35580.roger.larsson@skelleftea.mail.telia.com >
2 siblings, 2 replies; 16+ messages in thread
From: Mike Galbraith @ 2003-08-10 11:17 UTC (permalink / raw)
To: Simon Kirby; +Cc: Con Kolivas, linux-kernel
At 01:48 AM 8/10/2003 -0700, Simon Kirby wrote:
>On Sat, Aug 09, 2003 at 10:36:17AM +1000, Con Kolivas wrote:
>
> > On Sat, 9 Aug 2003 06:08, Voluspa wrote:
> > > On 2003-08-08 15:49:25 Con Kolivas wrote:
> > > > More duck tape interactivity tweaks
> > >
> > > Do you have a premonition... Game-test goes down in flames. Volatile to
> > > the extent where I can't catch head or tail. It can behave like in
> > > A3-O12.2 or as an unpatched 2.6.0-test2. Trigger badness by switching to
> > > a text console.
> >
> > Ah. There's the answer. You've totally changed the behaviour of the
> > application in question by moving to the text console. No longer is it the
> > sizable cpu hog that it is when it's in the foreground on X, so you've
> > totally changed its behaviour and how it is treated.
>
>I haven't been following this as closely as I would have liked to
>(recent vacation and all), but I am definitely seeing issues with the
>recent 2.5.x, 2.6.x-testx scheduler code and have been looking over
>these threads.
>
>I don't really understand why these changes were made at all to the
>scheduler. As I understand it, the 2.2.x and older 2.4.x scheduler was
>simple in that it allowed any process to wake up if it had available
>ticks, and would switch to that process if any new event occurred and
>woke it up. The rest was just limiting the ticks based on nice value
>and remembering to switch when the ticks run out.
>
>It seems that newer schedulers are now temporarily postponing the
>waking up of other processes when the running process is running with
>"preemptive" ticks, and that there's all sorts of hacks involved in
>trying to hide the bad effects of this decision.
I don't see this as a bad decision at all, it's just that there are some
annoying cases where the deliberate starvation which works nicely in my
favor for both interactivity and throughput in most cases can and does kick
my ass in others. This is nothing new. I have no memory of the scheduler
ever being perfect (0.96->today). This scheduler is very nice to me; it's
very simple, it's generally highly effective, and it's easily
tweakable. It just has some irritating rough edges.
>If this is indeed what is going on, what is the reasoning behind it?
>I didn't really see any problems before with the simple scheduler, so
>it seems to me like this may just be a hack to make poorly-written
>applications seem to be a bit "faster" by starving other processes of
>CPU when the poorly-written applications decide they want to do
>something (such as rendering a page with a large table in Mozilla
>-- grr). Is this really making a large enough difference to be worth
>all of this trouble?
>
>To me it would seem the best algorithm would be what we had before all
>of this started. Isn't it best to switch to a task as soon as an event
>(such as disk I/O finishing or a mouse move waking up X to read mouse
>input) occurs for both latency and cache reasons (queued in LIFO
>order)? DMA may make some of this more complicated, I don't know.
Hmm. If a mouse event happened to be queued but not yet run when a slew of
disk events arrived, LIFO would immediately suck. LIFO may be good for the
cache, but it doesn't seem like it could be good for average
latency. Other than that, what you describe is generally what
happens. Tasks which are waiting for hardware a lot rapidly attain a very
high priority, and preempt whoever happened to service the interrupt
(waker) almost instantly. I'd have to look closer at the old scheduler to
be sure, but I don't think there's anything much different between old/new
handling.
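The LIFO objection can be put in numbers with a small sketch. It assumes all
wakeups are already queued before any of them runs and that each handler runs
to completion; the function name and the millisecond figures are hypothetical.

```python
def wait_of_oldest_event(service_ms, lifo):
    """Return how long the oldest queued wakeup waits before it starts to run.

    service_ms[0] is the oldest event (say, the mouse); later entries arrived
    afterwards (say, a burst of disk completions). All events are assumed to
    be queued before any of them runs, and each runs to completion.
    """
    if not service_ms:
        raise ValueError("need at least one event")
    if not lifo:
        return 0  # FIFO: the oldest event is served first
    # LIFO: every later arrival is served before the oldest one.
    return sum(service_ms[1:])
```

With one mouse event followed by four disk events of 5 ms each, the mouse
waits 20 ms under LIFO and not at all under FIFO.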
>I am seeing similar starvation problems that others are seeing in these
>threads. At first it was whenever I clicked a link in Mozilla -- xmms
>would stop, sometimes for a second or so, on a Celeron 466 MHz machine.
Do you see this with test-X and Ingo's latest changes too? I can only
imagine one scenario off the top of my head where this could happen; if
xmms exhausted a slice while STARVATION_LIMIT is exceeded, it could land in
the expired array and remain unserviced for the period of time it takes for
all tasks remaining in the active array to exhaust their slices. Seems
like that should be pretty rare though.
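A rough sketch of that scenario, as a toy model rather than kernel code: it
ignores priorities, assumes every task left in the active array is a CPU hog
that never sleeps, and assumes each of them is moved to the expired array when
its slice runs out. The function and task names are made up for illustration.

```python
def time_until_serviced(victim, active_remaining_ms):
    """Return ms until `victim`, parked in the expired array, can run again.

    active_remaining_ms maps task name -> remaining timeslice in ms for the
    tasks still in the active array. The arrays are swapped only once the
    active array is empty, so the victim waits for every remaining slice.
    """
    if victim in active_remaining_ms:
        raise ValueError("victim is assumed to be in the expired array")
    elapsed = 0
    for name, remaining in active_remaining_ms.items():
        elapsed += remaining  # this task burns its whole remaining slice
    # Active array is now empty: the arrays swap and the victim is runnable.
    return elapsed
```

For example, with 150 ms, 200 ms and 50 ms of slices left in the active array,
the expired task would sit unserviced for 400 ms in this model.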
-Mike
* Re: [PATCH]O14int [SCHED_SOFTRR please]
2003-08-10 11:17 ` [PATCH]O14int Mike Galbraith
@ 2003-08-11 18:19 ` Roger Larsson
2003-08-11 21:53 ` Con Kolivas
[not found] ` <200308120226.35580.roger.larsson@skelleftea.mail.telia.com >
1 sibling, 1 reply; 16+ messages in thread
From: Roger Larsson @ 2003-08-11 18:19 UTC (permalink / raw)
To: linux-kernel
On Sunday 10 August 2003 13.17, Mike Galbraith wrote:
> At 01:48 AM 8/10/2003 -0700, Simon Kirby wrote:
> >I am seeing similar starvation problems that others are seeing in these
> >threads. At first it was whenever I clicked a link in Mozilla -- xmms
> >would stop, sometimes for a second or so, on a Celeron 466 MHz machine.
>
> Do you see this with test-X and Ingo's latest changes too? I can only
> imagine one scenario off the top of my head where this could happen; if
> xmms exhausted a slice while STARVATION_LIMIT is exceeded, it could land in
> the expired array and remain unserviced for the period of time it takes for
> all tasks remaining in the active array to exhaust their slices. Seems
> like that should be pretty rare though.
>
xmms is a RT process - it does not really have interactivity problems...
It will be extremely hard to fix this in a generic scheduler, instead
let xmms be the RT process it is with SCHED_SOFTRR (or whatever
it will be named).
Do this for arts, and other audio/video path applications.
Then start the race for interactivity tuning
(X, X applications, console, login, etc)
interactivity = two-way
http://www.m-w.com/cgi-bin/dictionary?va=interactive
Listening to music is not interactive.
Changing equalization on a media playback needs to be interactive in
two ways.
1) The slider should move in the GUI.
2) The volume should change, but the big buffers needed in today's audio path
will delay the audible changes...
Note: audio path starvation is not one of them...
/RogerL
--
Roger Larsson
Skellefteå
Sweden
[parent not found: <200308120226.35580.roger.larsson@skelleftea.mail.telia.com >]
* Re: What is interactivity? Re: [PATCH]O14int [SCHED_SOFTRR please]
[not found] ` <200308120226.35580.roger.larsson@skelleftea.mail.telia.com >
@ 2003-08-12 5:40 ` Mike Galbraith
2003-08-12 15:29 ` Timothy Miller
2003-08-13 1:43 ` Rob Landley
0 siblings, 2 replies; 16+ messages in thread
From: Mike Galbraith @ 2003-08-12 5:40 UTC (permalink / raw)
To: Roger Larsson; +Cc: linux-kernel
At 02:26 AM 8/12/2003 +0200, Roger Larsson wrote:
>On Monday 11 August 2003 21.46, Mike Galbraith wrote:
> > At 08:19 PM 8/11/2003 +0200, Roger Larsson wrote:
> > >On Sunday 10 August 2003 13.17, Mike Galbraith wrote:
> > > > At 01:48 AM 8/10/2003 -0700, Simon Kirby wrote:
> > > > >I am seeing similar starvation problems that others are seeing in
> > > > > these threads. At first it was whenever I clicked a link in Mozilla
> > > > > -- xmms would stop, sometimes for a second or so, on a Celeron 466
> > > > > MHz machine.
> > > >
> > > > Do you see this with test-X and Ingo's latest changes too? I can only
> > > > imagine one scenario off the top of my head where this could happen; if
> > > > xmms exhausted a slice while STARVATION_LIMIT is exceeded, it could
> > > > land in the expired array and remain unserviced for the period of time
> > > > it takes for all tasks remaining in the active array to exhaust their
> > > > slices. Seems like that should be pretty rare though.
> > >
> > >xmms is a RT process - it does not really have interactivity problems...
> > >It will be extremely hard to fix this in a generic scheduler, instead
> > >let xmms be the RT process it is with SCHED_SOFTRR (or whatever
> > >it will be named).
> > >Do this for arts, and other audio/video path applications.
> >
> > (For the scenario described, it doesn't matter what scheduler policy is
> > used)
>
>It matters if the SOFTRR processes are well behaved, they will get their share
>as long as _they_ do not overuse CPU.
>
>Suppose you have xmms running SOFTRR. Whatever you do that is not SOFTRR
>(or higher SCHED_FIFO, SCHED_RR) can't touch it scheduler-wise.
>It will remain SOFTRR and will not run out of its timeslice unless it uses
>too much CPU - its timeslice is refilled immediately whenever it gets empty
>(it is put last on the SOFTRR run queue - not in the expired array...)
Yup, brainfart on my part. Realtime tasks are immune.
>But if the SOFTRR processes have used too much CPU there are no guarantees.
>
> >
> > >Then start the race for interactivity tuning
> > > (X, X applications, console, login, etc)
> > >
> > >interactivity = two-way
> > > http://www.m-w.com/cgi-bin/dictionary?va=interactive
> > >
> > >Listening to music is not interactive.
> >
> > ?!? <tilt> What makes you say that? What in the world am I doing when I
> > fire up xmms?
> > --- snip ---
>
>You expect sound to start soon - that is the interactive behaviour.
>
>Suppose xmms starts after four seconds and then won't miss a beat.
>Compare with if it starts after ten seconds and then won't miss a beat.
>If you relate each frame to the start action then you will see that _every_
>frame in the first case is one second late, and in the second case ten
>seconds late. (Best possible interactivity would be an immediate start -
>don't you agree?)
>
>xmms is interactive if you see the audioboard as the second part.
>But I think that if we could concentrate on human users the problem will
>become easier. If I leave home while compiling KDE and playing audio with
>xmms - is xmms still interactive? (this will be hard to fix but it is not
>impossible, someone (on a Mac I think) has done an application that logged in
>when you arrived with your bluetooth device and logged off when you left)
If I leave the room, or even become distracted enough, xmms ceases to be
interactive.
>"make all" - interactive? It depends on my expectations, my expectations
>depends on how big the _total task_ is.
If you're watching it, I'd call it interactive. I see no difference
between watching a movie and watching compiler output scroll by.
>* If it is run from a shell script - like the kde-build I have in the
> background right now. No way!
Agreed. If you're not watching the output scroll by, it's not interactive.
>* If it is my kdeveloper test project ("Hello world" for remote debugging).
> Yes it is! I am waiting for it and expect it to be ready NOW.
>
>make bzImage - total rebuild, Not interactive - I expect to be able to get a
>cup of coffee while waiting.
>make bzImage - one .c file changed, interactive
Well, interactivity can certainly be viewed like one of those tricky
philosophy questions (bears farting in the woods, trees falling over etc;),
but I consider any task which is connected to a human via any of our senses
to be interactive. Perhaps it's not a 100% accurate use of the term, but
for lack of a better term...
>I think that the work done this far is great. It is great that the scheduler
>almost can handle xmms under all kinds of loads - but enough is enough.
I don't care if xmms skips or my mouse pointer stalls while I'm testing at
the heavy end of the load scale, you flat can't have low latency and max
throughput at the same time. If xmms skips and the mouse becomes sticky at
less than "heavy" though, something is wrong (defining heavy is one of
those tricky judgement calls). It's the mozilla loading a webpage type of
reports that I worry about.
-Mike
* Re: What is interactivity? Re: [PATCH]O14int [SCHED_SOFTRR please]
2003-08-12 5:40 ` What is interactivity? " Mike Galbraith
@ 2003-08-12 15:29 ` Timothy Miller
2003-08-13 1:43 ` Rob Landley
1 sibling, 0 replies; 16+ messages in thread
From: Timothy Miller @ 2003-08-12 15:29 UTC (permalink / raw)
To: Mike Galbraith; +Cc: Roger Larsson, linux-kernel
Mike Galbraith wrote:
>
> If I leave the room, or even become distracted enough, xmms ceases to be
> interactive.
>
xmms skipping can be very distracting. A small skip in the background
brain stimulus can cause a major skip in the foreground concentration.
* Re: What is interactivity? Re: [PATCH]O14int [SCHED_SOFTRR please]
2003-08-12 5:40 ` What is interactivity? " Mike Galbraith
2003-08-12 15:29 ` Timothy Miller
@ 2003-08-13 1:43 ` Rob Landley
1 sibling, 0 replies; 16+ messages in thread
From: Rob Landley @ 2003-08-13 1:43 UTC (permalink / raw)
To: Mike Galbraith, Roger Larsson; +Cc: linux-kernel
On Tuesday 12 August 2003 01:40, Mike Galbraith wrote:
> Well, interactivity can certainly be viewed like one of those tricky
> philosophy questions (bears farting in the woods, trees falling over etc;),
> but I consider any task which is connected to a human via any of our senses
> to be interactive. Perhaps it's not a 100% accurate use of the term, but
> for lack of a better term...
"Interactivity" is being used as a proxy for at least two different
conditions: smooth spooling and snappy response to (possibly repeated)
asynchronous wakeups.
The smooth spooler problem is where you're trying to input or output stuff at
a constant rate, somewhere below your theoretical maximum capacity. Sound
output is like this. Whether you're listening or not, the tree in the forest
still falls. A skip is a skip; the output could be being recorded to tape or
who knows what. Correctness here is empirical; if it skips, something went
wrong.
Sound is just one example, and a relatively easy one since the CPU
requirements are so low on modern machines. Personal Video Recorders ala
Tivo are a more demanding application (often coming perilously close to your
memory or disk bandwidth capacity), and skips or dropouts are saved for
posterity there. A human doesn't even have to be in the room, that task is
still "interactive".
Repeated asynchronous wakeups come from typing on the keyboard and wiggling
the mouse. If your mouse is dragging a window, the asynchronous wakeups
could provoke a lot of CPU activity.
The difference between these two is that they are different types of waits.
Smooth spooling involves waiting for a known period of time, and being woken
up by a timer. Asynchronous wakeups come out of the blue; the application
has no way of knowing the mouse is about to move or a key is about to be
pressed until it happens.
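A rough user-space sketch of the two kinds of wait, assuming nothing about
how any real player or X client is written. The names spool_ticks and
wait_for_event are made up for illustration, and the injectable clock and
sleep parameters exist only so the loop can be exercised without real delays.

```python
import select
import socket
import time


def spool_ticks(period_s, n_ticks, clock=time.monotonic, sleep=time.sleep):
    """Smooth spooler: the task knows in advance when it must run next.

    Sleeps until each absolute deadline (start + i * period) and records
    when it actually woke, relative to the start.
    """
    start = clock()
    wakeups = []
    for i in range(1, n_ticks + 1):
        deadline = start + i * period_s
        remaining = deadline - clock()
        if remaining > 0:
            sleep(remaining)  # timer wait: known duration
        wakeups.append(clock() - start)
    return wakeups


def wait_for_event(sock, timeout_s=None):
    """Asynchronous wakeup: block until input shows up, whenever that is.

    Returns True if the socket became readable, False on timeout.
    """
    readable, _, _ = select.select([sock], [], [], timeout_s)
    return bool(readable)
```

In the first function the wake time is computed before the sleep starts; in
the second the task cannot say when, or whether, it will be woken.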
(Some things combine these behaviors. First person shooters do both: 30
frames per second, plus responding to the joystick NOW. But that kind of thing
could also collapse into the smooth spooler case if the frame rate's high
enough and polling for input is cheap...)
True CPU hogs do block, but they only block when they're requesting more work.
Any read or write to a block device is a "request more work" type of block,
for example. If the block device gets faster, the app runs faster.
With a CPU hog, there is no system so powerful that this thing won't try to
speed to completion as fast as it can. With an "interactive" task, the speed
of the system is not the limiting factor (or at least shouldn't be).
Now there's a lot of fuzzy bits where you can't tell what kind of block you're
doing. Blocking on the network, blocking on pipes, etc. Could be anything.
But I think it's pretty safe to say that a timer is always an interactive
wait, and a block device never is. (And considering that the I/O scheduler
and the CPU scheduler may have to work together in the future to make things
like the anticipatory scheduler work properly, it shouldn't be TOO much of a
stretch to distinguish between waiting on a block device and waiting on
something else...)
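The rule of thumb above could be sketched like this. It is purely
illustrative: WaitKind and is_interactive_wait are hypothetical names, not
anything in the kernel, and the "don't know" answer for networks and pipes is
an assumption that simply mirrors the fuzzy cases mentioned above.

```python
from enum import Enum, auto
from typing import Optional


class WaitKind(Enum):
    """Hypothetical labels for what a task was blocked on."""
    TIMER = auto()
    BLOCK_DEVICE = auto()
    NETWORK = auto()
    PIPE = auto()


def is_interactive_wait(kind: WaitKind) -> Optional[bool]:
    """Classify a wait: True = interactive, False = not, None = can't tell."""
    if kind is WaitKind.TIMER:
        return True   # timer sleeps are always treated as interactive
    if kind is WaitKind.BLOCK_DEVICE:
        return False  # "request more work" block; a faster disk runs faster
    return None       # network, pipes, etc. could be anything
```

Returning None rather than guessing keeps the two safe cases separate from
the ones where some other heuristic would have to decide.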
> >I think that the work done this far is great. It is great that the
> > scheduler almost can handle xmms under all kinds of loads - but enough is
> > enough.
>
> I don't care if xmms skips or my mouse pointer stalls while I'm testing at
> the heavy end of the load scale,
I do. I believe you're in the minority here.
> you flat can't have low latency and max
> throughput at the same time.
If you're talking about keeping your cache hot, I agree. But a lot of times,
minimizing latency DOES help throughput. (Anticipatory scheduler, case in
point. :)
What you're saying is that you want your CPU hog loads to complete as quickly
as possible at the expense of smooth mouse movement. This is what "nice" is
for, isn't it? (If you've got a dedicated, throughput-optimized server
running X in the first place, you have more fundamental problems.)
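For what it's worth, a CPU hog can lower its own priority from user space. A
minimal sketch, assuming a Unix system where Python's os.nice is available;
lower_priority is a made-up helper name, and the default increment of 10 is
arbitrary.

```python
import os


def lower_priority(increment=10):
    """Add `increment` to this process's nice value; return the new value.

    Unprivileged processes may raise their niceness (lower their priority)
    but normally cannot lower it again afterwards.
    """
    return os.nice(increment)
```

Calling lower_priority() at the start of a batch job asks the scheduler to
favor everything else; the same effect is available from the shell with the
nice command.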
And your uber-optimized configuration is still going to lose out to an
unoptimized configuration running on hardware that's three months newer... :)
The linux-kernel gurus focused their optimizations almost exclusively on
throughput for almost the first full decade of kernel development.
Interactive latency started explicitly showing up as a concern in 2.4, and
has only really become a priority in 2.5. There are a few tradeoffs, but
some of them are a bit overdue if you ask me.
If you can document a throughput degradation and give a repeatable benchmark,
I'm sure Con and Ingo will be thrilled to address it. A lot of contest is
about throughput, you know. They're trying very hard to avoid regressions...
> If xmms skips and the mouse becomes sticky at
> less than "heavy" though, something is wrong (defining heavy is one of
> those tricky judgement calls).
You know, I used to beat OS/2 to DEATH, and the mouse never went funky on me.
(Of course the mouse was updated directly from an interrupt routine in kernel
memory that never swapped out. But still... :)
> It's the mozilla loading a webpage type of reports that I worry about.
It could be worse. It could be OpenOffice. :)
> -Mike
Rob