linux-kernel.vger.kernel.org archive mirror
* Re: [ANNOUNCE] RSDL completely fair starvation free interactive  cpu scheduler
@ 2007-03-05  9:45 Nicolas Mailhot
  2007-03-05  9:53 ` Gene Heskett
  0 siblings, 1 reply; 69+ messages in thread
From: Nicolas Mailhot @ 2007-03-05  9:45 UTC (permalink / raw)
  To: linux-kernel

This looks like -mm stuff if you want it in 2.6.22

-- 
Nicolas Mailhot


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
  2007-03-05  9:45 [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler Nicolas Mailhot
@ 2007-03-05  9:53 ` Gene Heskett
  2007-03-05 10:00   ` Nicolas Mailhot
                     ` (3 more replies)
  0 siblings, 4 replies; 69+ messages in thread
From: Gene Heskett @ 2007-03-05  9:53 UTC (permalink / raw)
  To: linux-kernel; +Cc: Nicolas Mailhot

On Monday 05 March 2007, Nicolas Mailhot wrote:
>This looks like -mm stuff if you want it in 2.6.22

This needs to get to 2.6.21, it really is that big an improvement.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive  cpu scheduler
  2007-03-05  9:53 ` Gene Heskett
@ 2007-03-05 10:00   ` Nicolas Mailhot
  2007-03-05 15:22   ` Paolo Ciarrocchi
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 69+ messages in thread
From: Nicolas Mailhot @ 2007-03-05 10:00 UTC (permalink / raw)
  To: Gene Heskett; +Cc: linux-kernel


Le Lun 5 mars 2007 10:53, Gene Heskett a écrit :
> On Monday 05 March 2007, Nicolas Mailhot wrote:
>>This looks like -mm stuff if you want it in 2.6.22
>
> This needs to get to 2.6.21, it really is that big an improvement.

One can dream...
I suspect Linus will disagree, especially since it was never in -mm first.

-- 
Nicolas Mailhot


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
  2007-03-05  9:53 ` Gene Heskett
  2007-03-05 10:00   ` Nicolas Mailhot
@ 2007-03-05 15:22   ` Paolo Ciarrocchi
  2007-03-05 18:37     ` Gene Heskett
  2007-03-05 18:20   ` Lee Revell
  2007-03-06 17:50   ` Bill Davidsen
  3 siblings, 1 reply; 69+ messages in thread
From: Paolo Ciarrocchi @ 2007-03-05 15:22 UTC (permalink / raw)
  To: Gene Heskett, Andrew Morton; +Cc: linux-kernel, Nicolas Mailhot

On 3/5/07, Gene Heskett <gene.heskett@gmail.com> wrote:
> On Monday 05 March 2007, Nicolas Mailhot wrote:
> >This looks like -mm stuff if you want it in 2.6.22
>
> This needs to get to 2.6.21, it really is that big an improvement.

I didn't try the new cpu scheduler but 2.6.21 is in bug fix only mode
so I don't see how this stuff can go in.

Maybe Andrew can comment about having it in -mm?

Ciao,
-- 
Paolo

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
  2007-03-05  9:53 ` Gene Heskett
  2007-03-05 10:00   ` Nicolas Mailhot
  2007-03-05 15:22   ` Paolo Ciarrocchi
@ 2007-03-05 18:20   ` Lee Revell
  2007-03-05 19:19     ` Gene Heskett
  2007-03-06 17:50   ` Bill Davidsen
  3 siblings, 1 reply; 69+ messages in thread
From: Lee Revell @ 2007-03-05 18:20 UTC (permalink / raw)
  To: Gene Heskett; +Cc: linux-kernel, Nicolas Mailhot

On 3/5/07, Gene Heskett <gene.heskett@gmail.com> wrote:
> On Monday 05 March 2007, Nicolas Mailhot wrote:
> >This looks like -mm stuff if you want it in 2.6.22
>
> This needs to get to 2.6.21, it really is that big an improvement.

You can probably speed things up by regression testing against a wide
range of non-desktop workloads and posting your numbers...

Lee

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
  2007-03-05 15:22   ` Paolo Ciarrocchi
@ 2007-03-05 18:37     ` Gene Heskett
  0 siblings, 0 replies; 69+ messages in thread
From: Gene Heskett @ 2007-03-05 18:37 UTC (permalink / raw)
  To: linux-kernel; +Cc: Paolo Ciarrocchi, Andrew Morton, Nicolas Mailhot

On Monday 05 March 2007, Paolo Ciarrocchi wrote:
>On 3/5/07, Gene Heskett <gene.heskett@gmail.com> wrote:
>> On Monday 05 March 2007, Nicolas Mailhot wrote:
>> >This looks like -mm stuff if you want it in 2.6.22
>>
>> This needs to get to 2.6.21, it really is that big an improvement.
>
>I didn't try the new cpu scheduler but 2.6.21 is in bug fix only mode
>so I don't see how this stuff can go in.
>
>Maybe Andrew can comment about having it in -mm?
>
>Ciao,

In that case, I hope that Con can update the patch to apply to 2.6.21 when 
final is out.  In the meantime I've not had usable luck with -rc1 
or -rc2: -rc1 didn't want to boot after uncompressing, and -rc2 is about 
as fast as a 3-day-old carcass.  And beating on it doesn't help; the 
stalls are 30+ seconds at a time.  And nothing in the logs.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
  2007-03-05 18:20   ` Lee Revell
@ 2007-03-05 19:19     ` Gene Heskett
  2007-03-05 22:40       ` Andrew Morton
  2007-03-06  2:23       ` Ed Tomlinson
  0 siblings, 2 replies; 69+ messages in thread
From: Gene Heskett @ 2007-03-05 19:19 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton, Linus Torvalds; +Cc: Lee Revell, Nicolas Mailhot

On Monday 05 March 2007, Lee Revell wrote:
>On 3/5/07, Gene Heskett <gene.heskett@gmail.com> wrote:
>> On Monday 05 March 2007, Nicolas Mailhot wrote:
>> >This looks like -mm stuff if you want it in 2.6.22
>>
>> This needs to get to 2.6.21, it really is that big an improvement.
>
>You can probably speed things up by regression testing against a wide
>range of non-desktop workloads and posting your numbers...
>
>Lee

I'm not THAT much of a tester other than seeing how it does my personal 
workload, such as working over an image in Gimp, and then printing it, 
while I'm reading and replying to email, browsing the web or playing 
solitaire.  All of this marches along rather nicely, and generally 
unaffected by the fetchmail/procmail/spamc loop every 90 seconds in the 
background.  glxgears runs about 1190 fps here, and loses maybe 4 or 5 
frames when the system beeps at me to indicate a new mail has arrived.  
Strangely, the wah wah wah wah it plays for a spam sent to trash doesn't 
bother glxgears any more than the 1/3 second beep.

I might add that it's a bit puzzling to me that when these pregnant pauses 
occur without the patch, gkrellm cpu usage, at an update granularity of 1 
second, doesn't show ANY change, or so little that it can't 
possibly explain why I'm sitting here for 10 seconds or more, waiting for 
what I've typed to actually show up on screen.  I should see a spike in 
one of the three cpu usage windows, but the whole thing is sitting at 2% 
total cpu while my application is starved and frozen for 10+ seconds at a 
time.

That in itself tells this unwashed user that the scheduler is spinning its 
wheels somewhere as it exists today without this patch.  With this patch, 
those pauses are very small fractions of a second.  Just barely 
detectable.  And, system or user activity now properly registers in the 
gkrellm display, going up to about 15% user if I stand on the left arrow 
to backspace the cursor here in kmail's composer screen.

To me, this plays great in Peoria; put it in ASAP and the users will bow 
at your feet to pay homage to Con for submitting it.

I am having a hard time figuring out why Con has thrown patches that might 
have helped over the fence only to have them go splat in the night.  When 
Con gets frustrated enough to post about it, everybody seems to turn a 
deaf ear, like he's committed some cardinal sin and I don't understand 
why or what.

This list is about development, supposedly in an orderly fashion, but 
occasionally a patch comes along that's so obviously correct it deserves 
to be fast-tracked, and this is one of those.  To me this is as important 
as the filesystem corruption that was chased all the way through from around 
2.6.17 to the middle of the 20-rc stuff.  Nobody wasted any time getting 
that into the next rc when it was finally found.  Fortunately I didn't 
get bit, possibly due to my habit of restarting Azureus to seed, and I've 
seen it re-suck a few pieces to correct itself more than once.

Andrew, please, get this one in ASAP, but promise me an -mm won't trash 
half my filesystems like one I tried 2-3 years ago did.  It's a shame when 
Con submits a patch, and only 2 people post their experiences from using 
it.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
  2007-03-05 19:19     ` Gene Heskett
@ 2007-03-05 22:40       ` Andrew Morton
  2007-03-05 23:19         ` Gene Heskett
  2007-03-06  2:23       ` Ed Tomlinson
  1 sibling, 1 reply; 69+ messages in thread
From: Andrew Morton @ 2007-03-05 22:40 UTC (permalink / raw)
  To: Gene Heskett; +Cc: linux-kernel, Linus Torvalds, Lee Revell, Nicolas Mailhot

On Mon, 05 Mar 2007 14:19:25 -0500
Gene Heskett <gene.heskett@gmail.com> wrote:

> Andrew, please, get this one in ASAP,

I'm presently nearly 1000 messages behind on my lkml reading.  We'll get
there.

> but promise me an -mm won't trash 
> half my filesystems like one I tried 2-3 years ago did.

I can't.  -mm is crap at present.  Well.  Mainline is crap at present, and
-mm is crap^2.  I think I might be about to throw vast amounts of code
overboard.



^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
  2007-03-05 22:40       ` Andrew Morton
@ 2007-03-05 23:19         ` Gene Heskett
  0 siblings, 0 replies; 69+ messages in thread
From: Gene Heskett @ 2007-03-05 23:19 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andrew Morton, Linus Torvalds, Lee Revell, Nicolas Mailhot

On Monday 05 March 2007, Andrew Morton wrote:
>On Mon, 05 Mar 2007 14:19:25 -0500
>
>Gene Heskett <gene.heskett@gmail.com> wrote:
>> Andrew, please, get this one in ASAP,
>
>I'm presently nearly 1000 messages behind on my lkml reading.  We'll get
>there.
>
>> but promise me an -mm won't trash
>> half my filesystems like one I tried 2-3 years ago did.
>
>I can't.  -mm is crap at present.  Well.  Mainline is crap at present,
> and -mm is crap^2.  I think I might be about to throw vast amounts of
> code overboard.

Chuckle, yes, I've been there and done that in a past life writing ARexx 
code for an Amiga.  Thanks for taking the time to reply.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
  2007-03-05 19:19     ` Gene Heskett
  2007-03-05 22:40       ` Andrew Morton
@ 2007-03-06  2:23       ` Ed Tomlinson
  2007-03-06  2:54         ` Linus Torvalds
  1 sibling, 1 reply; 69+ messages in thread
From: Ed Tomlinson @ 2007-03-06  2:23 UTC (permalink / raw)
  To: Gene Heskett
  Cc: linux-kernel, Andrew Morton, Linus Torvalds, Lee Revell, Nicolas Mailhot

Hi,

I have had this in for about 24 hours.  So far so good.  I am running on UP amd64 with
'Voluntary Kernel Preemption' enabled (preemptible kernels seem to lock up solid
switching between 32- and 64-bit apps - no oops and nothing on the serial console...)

The patch _does_ make a difference.  For instance reading mail with freenet working 
hard  (threaded java application) and gentoo's emerge triggering compiles to update the 
box is much smoother.

Think this scheduler needs serious looking at.  

Ed Tomlinson

On Monday 05 March 2007 14:19, Gene Heskett wrote:
> On Monday 05 March 2007, Lee Revell wrote:
> >On 3/5/07, Gene Heskett <gene.heskett@gmail.com> wrote
> >> On Monday 05 March 2007, Nicolas Mailhot wrote:
> >> >This looks like -mm stuff if you want it in 2.6.22
> >>
> >> This needs to get to 2.6.21, it really is that big an improvement.
> >
> >You can probably speed things up by regression testing against a wide
> >range of non-desktop workloads and posting your numbers...
> >
> >Lee
> 
> I'm not THAT much of a tester other than seeing how it does my personal 
> workload, such as working over an image in Gimp, and then printing it, 
> while I'm reading and replying to email, browsing the web or playing 
> solitaire.  All of this marches along rather nicely, and generally 
> unaffected by the fetchmail/procmail/spamc loop every 90 seconds in the 
> background.  glxgears runs about 1190 fps here, and loses maybe 4 or 5 
> frames when the system beeps at me to indicate a new mail has arrived.  
> Strangely, the wah wah wah wah it plays for a spam sent to trash doesn't 
> bother glxgears any more than the 1/3 second beep.
> 
> I might add that it's a bit puzzling to me that when these pregnant pauses 
> occur without the patch, gkrellm cpu usage, at an update granularity of 1 
> second, doesn't show ANY change, or so little that it can't 
> possibly explain why I'm sitting here for 10 seconds or more, waiting for 
> what I've typed to actually show up on screen.  I should see a spike in 
> one of the three cpu usage windows, but the whole thing is sitting at 2% 
> total cpu while my application is starved and frozen for 10+ seconds at a 
> time.
> 
> That in itself tells this unwashed user that the scheduler is spinning its 
> wheels somewhere as it exists today without this patch.  With this patch, 
> those pauses are very small fractions of a second.  Just barely 
> detectable.  And, system or user activity now properly registers in the 
> gkrellm display, going up to about 15% user if I stand on the left arrow 
> to backspace the cursor here in kmail's composer screen.
> 
> To me, this plays great in Peoria; put it in ASAP and the users will bow 
> at your feet to pay homage to Con for submitting it.
> 
> I am having a hard time figuring out why Con has thrown patches that might 
> have helped over the fence only to have them go splat in the night.  When 
> Con gets frustrated enough to post about it, everybody seems to turn a 
> deaf ear, like he's committed some cardinal sin and I don't understand 
> why or what.
> 
> This list is about development, supposedly in an orderly fashion, but 
> occasionally a patch comes along that's so obviously correct it deserves 
> to be fast-tracked, and this is one of those.  To me this is as important 
> as the filesystem corruption that was chased all the way through from around 
> 2.6.17 to the middle of the 20-rc stuff.  Nobody wasted any time getting 
> that into the next rc when it was finally found.  Fortunately I didn't 
> get bit, possibly due to my habit of restarting Azureus to seed, and I've 
> seen it re-suck a few pieces to correct itself more than once.
> 
> Andrew, please, get this one in ASAP, but promise me an -mm won't trash 
> half my filesystems like one I tried 2-3 years ago did.  It's a shame when 
> Con submits a patch, and only 2 people post their experiences from using 
> it.
> 

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
  2007-03-06  2:23       ` Ed Tomlinson
@ 2007-03-06  2:54         ` Linus Torvalds
  2007-03-06  3:36           ` Gene Heskett
  2007-03-09  4:04           ` Bill Davidsen
  0 siblings, 2 replies; 69+ messages in thread
From: Linus Torvalds @ 2007-03-06  2:54 UTC (permalink / raw)
  To: Ed Tomlinson
  Cc: Gene Heskett, linux-kernel, Andrew Morton, Lee Revell, Nicolas Mailhot



On Mon, 5 Mar 2007, Ed Tomlinson wrote:
> 
> The patch _does_ make a difference.  For instance reading mail with freenet working 
> hard  (threaded java application) and gentoo's emerge triggering compiles to update the 
> box is much smoother.
> 
> Think this scheduler needs serious looking at.  

I agree, partly because it's obviously been getting rave reviews so far, 
but mainly because it looks like you can think about behaviour a lot 
better, something that was always very hard with the interactivity 
boosters with process state history.

I'm not at all opposed to this, but we do need:
 - to not do it at this stage in the stable kernel
 - to let it sit in -mm for at least a short while
 - and generally more people testing more loads.

I don't actually worry too much about switching out a CPU scheduler: those 
things are places where you *can* largely read the source code and get an 
idea for them (although with the kind of history state that we currently 
have, it's really really hard). But at the very least they aren't likely 
to have subtle bugs that show up elsewhere, so...

So as long as the generic concerns above are under control, I'll happily 
try something like this if it can be merged early in a merge window..

			Linus

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
  2007-03-06  2:54         ` Linus Torvalds
@ 2007-03-06  3:36           ` Gene Heskett
  2007-03-09  4:04           ` Bill Davidsen
  1 sibling, 0 replies; 69+ messages in thread
From: Gene Heskett @ 2007-03-06  3:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Ed Tomlinson, Andrew Morton, Lee Revell, Nicolas Mailhot

On Monday 05 March 2007, Linus Torvalds wrote:
>On Mon, 5 Mar 2007, Ed Tomlinson wrote:
>> The patch _does_ make a difference.  For instance reading mail with
>> freenet working hard  (threaded java application) and gentoo's emerge
>> triggering compiles to update the box is much smoother.
>>
>> Think this scheduler needs serious looking at.
>
>I agree, partly because it's obviously been getting rave reviews so far,
>but mainly because it looks like you can think about behaviour a lot
>better, something that was always very hard with the interactivity
>boosters with process state history.
>
>I'm not at all opposed to this, but we do need:
> - to not do it at this stage in the stable kernel
> - to let it sit in -mm for at least a short while
> - and generally more people testing more loads.
>
>I don't actually worry too much about switching out a CPU scheduler:
> those things are places where you *can* largely read the source code
> and get an idea for them (although with the kind of history state that
> we currently have, it's really really hard). But at the very least they
> aren't likely to have subtle bugs that show up elsewhere, so...
>
>So as long as the generic concerns above are under control, I'll happily
>try something like this if it can be merged early in a merge window..
>
>			Linus

Thanks Linus, now I feel we're getting somewhere.

Cheers, Gene
-- 
Many hands make light work.
		-- John Heywood

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
  2007-03-05  9:53 ` Gene Heskett
                     ` (2 preceding siblings ...)
  2007-03-05 18:20   ` Lee Revell
@ 2007-03-06 17:50   ` Bill Davidsen
  2007-03-06 20:06     ` Con Kolivas
  3 siblings, 1 reply; 69+ messages in thread
From: Bill Davidsen @ 2007-03-06 17:50 UTC (permalink / raw)
  To: Gene Heskett; +Cc: linux-kernel, Nicolas Mailhot

Gene Heskett wrote:
> On Monday 05 March 2007, Nicolas Mailhot wrote:
>> This looks like -mm stuff if you want it in 2.6.22
> 
> This needs to get to 2.6.21, it really is that big an improvement.
> 
As Con pointed out, for some workloads and desired behaviour this is not 
as good as the existing scheduler. Therefore it should go in -mm and 
hopefully give the user an option to select which is appropriate.

With luck I'll get to shake out that patch in combination with kvm later 
today.

-- 
Bill Davidsen <davidsen@tmr.com>
   "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu   scheduler
  2007-03-06 17:50   ` Bill Davidsen
@ 2007-03-06 20:06     ` Con Kolivas
  2007-03-09  4:21       ` Bill Davidsen
  0 siblings, 1 reply; 69+ messages in thread
From: Con Kolivas @ 2007-03-06 20:06 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Gene Heskett, linux-kernel, Nicolas Mailhot

On Wednesday 07 March 2007 04:50, Bill Davidsen wrote:
> Gene Heskett wrote:
> > On Monday 05 March 2007, Nicolas Mailhot wrote:
> >> This looks like -mm stuff if you want it in 2.6.22
> >
> > This needs to get to 2.6.21, it really is that big an improvement.
>
> As Con pointed out, for some workloads and desired behaviour this is not
> as good as the existing scheduler. Therefore it should go in -mm and
> hopefully give the user an option to select which is appropriate.

Actually I wasn't saying that for some workloads mainline will be better. What 
I was saying was there will be some bizarre scenarios where the intrinsic 
unfairness in mainline towards certain interactive tasks will make them 
appear to run better. After fiddling with scheduler code for the last few 
years I've come to believe that that may _appear to look better_, but is 
worse since that behaviour can be exploited and leads to scheduling delays 
elsewhere.
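
(For illustration, the kind of exploit being alluded to: an interactivity
estimator that rewards tasks which appear to sleep a lot can be gamed by
sleeping briefly and then burning CPU while keeping the boosted priority.
A minimal sketch of that pattern, not code from this thread; the timings
are arbitrary:)

/* Sketch of gaming a sleep-average based interactivity bonus: sleep just
 * long enough to look "interactive", then hog the CPU.  Illustrative only. */
#include <time.h>

int main(void)
{
	struct timespec nap = { 0, 5 * 1000 * 1000 };	/* 5 ms */

	for (;;) {
		volatile unsigned long spin;

		nanosleep(&nap, NULL);			/* earn "sleep credit" */
		for (spin = 0; spin < 50000000UL; spin++)
			;				/* then burn the CPU */
	}
	return 0;
}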

> With luck I'll get to shake out that patch in combination with kvm later
> today.

Great, thanks! I've appreciated all the feedback so far.

-- 
-ck

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
  2007-03-06  2:54         ` Linus Torvalds
  2007-03-06  3:36           ` Gene Heskett
@ 2007-03-09  4:04           ` Bill Davidsen
  2007-03-09  6:31             ` Linus Torvalds
  1 sibling, 1 reply; 69+ messages in thread
From: Bill Davidsen @ 2007-03-09  4:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ed Tomlinson, Gene Heskett, linux-kernel, Andrew Morton,
	Lee Revell, Nicolas Mailhot

Linus Torvalds wrote:
> 
> On Mon, 5 Mar 2007, Ed Tomlinson wrote:
>> The patch _does_ make a difference.  For instance reading mail with freenet working 
>> hard  (threaded java application) and gentoo's emerge triggering compiles to update the 
>> box is much smoother.
>>
>> Think this scheduler needs serious looking at.  
> 
> I agree, partly because it's obviously been getting rave reviews so far, 
> but mainly because it looks like you can think about behaviour a lot 
> better, something that was always very hard with the interactivity 
> boosters with process state history.
> 
> I'm not at all opposed to this, but we do need:
>  - to not do it at this stage in the stable kernel
>  - to let it sit in -mm for at least a short while
>  - and generally more people testing more loads.
> 
Please, could you now rethink pluggable schedulers as well? Even if one 
had to be chosen at boot time and couldn't be changed thereafter, it 
would still allow a few new thoughts to be included.

> I don't actually worry too much about switching out a CPU scheduler: those 
> things are places where you *can* largely read the source code and get an 
> idea for them (although with the kind of history state that we currently 
> have, it's really really hard). But at the very least they aren't likely 
> to have subtle bugs that show up elsewhere, so...
> 
I confess that the default scheduler works for me most of the time; I/O 
tuning is more productive. I want to test with a kvm load, but 
2.6.21-rc3-git3 doesn't want to run kvm at all, and I'm looking to see what 
I broke, since nbd doesn't work, either.

I'm collecting oopses now; I will forward them when I have a few more.

> So as long as the generic concerns above are under control, I'll happily 
> try something like this if it can be merged early in a merge window..
> 
> 			Linus


-- 
Bill Davidsen <davidsen@tmr.com>
   "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
  2007-03-06 20:06     ` Con Kolivas
@ 2007-03-09  4:21       ` Bill Davidsen
  0 siblings, 0 replies; 69+ messages in thread
From: Bill Davidsen @ 2007-03-09  4:21 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Gene Heskett, linux-kernel, Nicolas Mailhot

Con Kolivas wrote:
> On Wednesday 07 March 2007 04:50, Bill Davidsen wrote:

>> With luck I'll get to shake out that patch in combination with kvm later
>> today.
> 
> Great thanks!. I've appreciated all the feedback so far.
> 
I did try, but 2.6.21-rc3-git3 doesn't want to run kvm for me, and your 
patch may not be doing what it should. I'm falling back to 2.6.20 and 
will retest after I document my kvm issues.

-- 
Bill Davidsen <davidsen@tmr.com>
   "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
  2007-03-09  4:04           ` Bill Davidsen
@ 2007-03-09  6:31             ` Linus Torvalds
  2007-03-09  7:04               ` Bill Huey
                                 ` (2 more replies)
  0 siblings, 3 replies; 69+ messages in thread
From: Linus Torvalds @ 2007-03-09  6:31 UTC (permalink / raw)
  To: Bill Davidsen
  Cc: Ed Tomlinson, Gene Heskett, linux-kernel, Andrew Morton,
	Lee Revell, Nicolas Mailhot



On Thu, 8 Mar 2007, Bill Davidsen wrote:
>
> Please, could you now rethink pluggable schedulers as well? Even if one had to
> be chosen at boot time and couldn't be changed thereafter, it would still allow
> a few new thoughts to be included.

No. Really.

I absolutely *detest* pluggable schedulers. They have a huge downside: 
they allow people to think that it's ok to make special-case schedulers. 
And I simply very fundamentally disagree.

If you want to play with a scheduler of your own, go wild. It's easy 
(well, you'll find out that getting good results isn't, but that's a 
different thing). But actual pluggable schedulers just cause people to 
think that "oh, the scheduler performs badly under circumstance X, so 
let's tell people to use special scheduler Y for that case".

And CPU scheduling really isn't that complicated. It's *way* simpler than 
IO scheduling. There simply is *no*excuse* for not trying to do it well 
enough for all cases, or for having special-case stuff.

But even IO scheduling actually ends up being largely the same. Yes, we 
have pluggable schedulers, and we even allow switching them, but in the 
end, we don't want people to actually do it. It's much better to have a 
scheduler that is "good enough" than it is to have five that are "perfect" 
for five particular cases.
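
(For reference, switching an IO scheduler at runtime is a one-line write to
sysfs.  A minimal sketch; the disk name and the "deadline" choice are purely
illustrative, and it needs root:)

/* Read the available elevators for sda and switch to another one.  The
 * programmatic equivalent of
 *   cat /sys/block/sda/queue/scheduler
 *   echo deadline > /sys/block/sda/queue/scheduler
 */
#include <stdio.h>

int main(void)
{
	const char *path = "/sys/block/sda/queue/scheduler";
	char line[256];
	FILE *f = fopen(path, "r");

	if (!f)
		return 1;
	if (fgets(line, sizeof(line), f))
		printf("available (current in []): %s", line);
	fclose(f);

	f = fopen(path, "w");		/* requires root */
	if (!f)
		return 1;
	fputs("deadline\n", f);		/* pick a different elevator */
	fclose(f);
	return 0;
}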

		Linus

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
  2007-03-09  6:31             ` Linus Torvalds
@ 2007-03-09  7:04               ` Bill Huey
  2007-03-09 10:54               ` William Lee Irwin III
  2007-03-09 14:54               ` Bill Davidsen
  2 siblings, 0 replies; 69+ messages in thread
From: Bill Huey @ 2007-03-09  7:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Bill Davidsen, Ed Tomlinson, Gene Heskett, linux-kernel,
	Andrew Morton, Lee Revell, Nicolas Mailhot, Bill Huey (hui)

On Thu, Mar 08, 2007 at 10:31:48PM -0800, Linus Torvalds wrote:
> On Thu, 8 Mar 2007, Bill Davidsen wrote:
> > Please, could you now rethink pluggable schedulers as well? Even if one had to
> > be chosen at boot time and couldn't be changed thereafter, it would still allow
> > a few new thoughts to be included.
> 
> No. Really.
> 
> I absolutely *detest* pluggable schedulers. They have a huge downside: 
> they allow people to think that it's ok to make special-case schedulers. 
> And I simply very fundamentally disagree.

Linus,

This is where I have to respectfully disagree. There are types of loads
that aren't covered in SCHED_OTHER. They are typically certain real time
loads and those folks (regardless of -rt patch) would benefit greatly
from having something like that in place. Those scheduler developers can
plug in (at compile time) their work without having to track and forward
port their code constantly so that non-SCHED_OTHER policies can be
experimented with easily.

This is especially so with rate monotonic influenced schedulers that are
in the works by real time folks, stock kernel or not. This is about
making Linux generally accessible to those folks and not folks doing
SCHED_OTHER work. They are orthogonal.
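
(As a concrete example of a non-SCHED_OTHER policy, this is roughly how a
task opts into SCHED_FIFO; the priority value is arbitrary and the snippet
is only illustrative:)

/* Put the calling task under SCHED_FIFO at priority 10 (values are
 * illustrative).  Needs root / CAP_SYS_NICE. */
#include <sched.h>
#include <stdio.h>

int main(void)
{
	struct sched_param sp = { .sched_priority = 10 };

	if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1) {
		perror("sched_setscheduler");
		return 1;
	}
	printf("now running SCHED_FIFO\n");
	return 0;
}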

bill


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
  2007-03-09  6:31             ` Linus Torvalds
  2007-03-09  7:04               ` Bill Huey
@ 2007-03-09 10:54               ` William Lee Irwin III
  2007-03-09 14:54               ` Bill Davidsen
  2 siblings, 0 replies; 69+ messages in thread
From: William Lee Irwin III @ 2007-03-09 10:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Bill Davidsen, Ed Tomlinson, Gene Heskett, linux-kernel,
	Andrew Morton, Lee Revell, Nicolas Mailhot

On Thu, Mar 08, 2007 at 10:31:48PM -0800, Linus Torvalds wrote:
> No. Really.
> I absolutely *detest* pluggable schedulers. They have a huge downside: 
> they allow people to think that it's ok to make special-case schedulers. 
> And I simply very fundamentally disagree.
> If you want to play with a scheduler of your own, go wild. It's easy 
> (well, you'll find out that getting good results isn't, but that's a 
> different thing). But actual pluggable schedulers just cause people to 
> think that "oh, the scheduler performs badly under circumstance X, so 
> let's tell people to use special scheduler Y for that case".
> And CPU scheduling really isn't that complicated. It's *way* simpler than 
> IO scheduling. There simply is *no*excuse* for not trying to do it well 
> enough for all cases, or for having special-case stuff.
> But even IO scheduling actually ends up being largely the same. Yes, we 
> have pluggable schedulers, and we even allow switching them, but in the 
> end, we don't want people to actually do it. It's much better to have a 
> scheduler that is "good enough" than it is to have five that are "perfect" 
> for five particular cases.

For the most part I was trying to assist development, but ran out of
patience and interest before getting much of anywhere. The basic idea
was to be able to fork over a kernel to a benchmark team and have them
run head-to-head comparisons, switching schedulers on the fly,
particularly on machines that took a very long time to boot. The
concept ideally involved making observations and loading fresh
schedulers based on them as kernel modules on the fly. I was more
interested in rapid incremental changes than total rewrites, though I
considered total rewrites to be tests of adequacy, since somewhere in
the back of my mind I had thoughts about experimenting with gang
scheduling policies on those machines taking very long times to boot.

What actually got written, the result of it being picked up by others,
and how it's getting used are all rather far from what I had in mind,
not that I'm offended in the least by any of it. I also had little or
no interest in mainline for it. The intention was more on the order of
an elaborate instrumentation patch for systems where the time required
to reboot is prohibitive and the duration of access strictly limited.
(In fact, downward-revised estimates of the likelihood of such access
also factored into the abandonment of the codebase.)

I consider policy issues to be hopeless political quagmires and
therefore stick to mechanism. So even though I may have started the
code in question, I have little or nothing to say about that sort of
use for it.

There's my longwinded excuse for having originated that tidbit of code.


-- wli

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
  2007-03-09  6:31             ` Linus Torvalds
  2007-03-09  7:04               ` Bill Huey
  2007-03-09 10:54               ` William Lee Irwin III
@ 2007-03-09 14:54               ` Bill Davidsen
  2007-03-09 18:11                 ` Linus Torvalds
  2 siblings, 1 reply; 69+ messages in thread
From: Bill Davidsen @ 2007-03-09 14:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ed Tomlinson, Gene Heskett, linux-kernel, Andrew Morton,
	Lee Revell, Nicolas Mailhot

Linus Torvalds wrote:
> On Thu, 8 Mar 2007, Bill Davidsen wrote:
>   
>> Please, could you now rethink pluggable schedulers as well? Even if one had to
>> be chosen at boot time and couldn't be changed thereafter, it would still allow
>> a few new thoughts to be included.
>>     
>
> No. Really.
>
> I absolutely *detest* pluggable schedulers. They have a huge downside: 
> they allow people to think that it's ok to make special-case schedulers. 
>   
But it IS okay for people to make special-case schedulers. Because it's 
MY machine, and how it behaves under mixed load is not a technical 
issue, it's a POLICY issue, and therefore the only way you can allow the 
admin to implement that policy is to either provide several schedulers 
or to provide all sorts of tunable knobs. And by having a few schedulers 
which have been heavily tested and reviewed, you can define the policy 
the scheduler implements and document it. Instead of people writing 
their own, or hacking the code, they could have a few well-tested 
choices, with known policy goals.
> And I simply very fundamentally disagree.
>
> If you want to play with a scheduler of your own, go wild. It's easy 
> (well, you'll find out that getting good results isn't, but that's a 
> different thing). But actual pluggable schedulers just cause people to 
> think that "oh, the scheduler performs badly under circumstance X, so 
> let's tell people to use special scheduler Y for that case".
>   
And has that been a problem with io schedulers? I don't see any vast 
proliferation of them, I don't see contentious exchanges on LKML, or 
people asking how to get yet another into mainline. In fact, I would say 
that the io scheduler situation is as right as anything can be: choices 
for special cases, and a lack of requests for something else.
> And CPU scheduling really isn't that complicated. It's *way* simpler than 
> IO scheduling. There simply is *no*excuse* for not trying to do it well 
> enough for all cases, or for having special-case stuff.
>   
This supposes that the desired behavior, the policy, is the same on all 
machines or that there is currently a way to set the target. If I want 
interactive response with no consideration to batch (and can't trust 
users to use nice), I want one policy. If I want a compromise, the 
current scheduler or RSDL are candidates, but they do different things.
> But even IO scheduling actually ends up being largely the same. Yes, we 
> have pluggable schedulers, and we even allow switching them, but in the 
> end, we don't want people to actually do it. It's much better to have a 
> scheduler that is "good enough" than it is to have five that are "perfect" 
> for five particular cases.
>   
We not only have multiple io schedulers, we have many tunable io 
parameters, all of which allow people to make their system behave the 
way they think is best. It isn't causing complaint, confusion, or 
instability. We have many people requesting a different scheduler, so 
obviously what we have isn't "good enough" and I doubt any one scheduler 
can be, given that the target behavior is driven by non-technical choices.

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
  2007-03-09 14:54               ` Bill Davidsen
@ 2007-03-09 18:11                 ` Linus Torvalds
  0 siblings, 0 replies; 69+ messages in thread
From: Linus Torvalds @ 2007-03-09 18:11 UTC (permalink / raw)
  To: Bill Davidsen
  Cc: Ed Tomlinson, Gene Heskett, linux-kernel, Andrew Morton,
	Lee Revell, Nicolas Mailhot



On Fri, 9 Mar 2007, Bill Davidsen wrote:
>
> But it IS okay for people to make special-case schedulers. Because it's MY
> machine,

Sure.

Go wild. It's what open-source is all about.

I'm not stopping you.

I'm just not merging code that makes the scheduler unreadable, even hard 
to understand, and slows things down. I'm also not merging code that sets 
some scheduler policy limits by having specific "pluggable scheduler 
interfaces".

Different schedulers tend to need different data structures in some *very* 
core data, like the per-cpu run-queues, in "struct task_struct", in 
"struct thread_struct" etc etc. Those are some of *the* most low-level 
structures in the kernel. And those are things that get set up to have as 
little cache footprint a possible etc.

IO schedulers have basically none of those issues. Once you need to do IO, 
you'll happibly use a few indirect pointers, it's not going to show up 
anywhere. But in the scheduler, 10 cycles here and there will be a big 
deal.
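
(To make that concrete: a pluggable interface generally means an ops table
of indirect calls touched on every context switch, roughly like the sketch
below.  The structure and names are invented for illustration, not taken
from any proposed patch:)

/* Invented sketch of what a pluggable CPU-scheduler interface tends to look
 * like: every hot-path decision goes through a function pointer, and each
 * scheduler wants its own per-task and per-runqueue state. */
struct task_struct;
struct rq;

struct sched_ops {
	void (*enqueue_task)(struct rq *rq, struct task_struct *p);
	void (*dequeue_task)(struct rq *rq, struct task_struct *p);
	struct task_struct *(*pick_next_task)(struct rq *rq);
	void (*task_tick)(struct rq *rq, struct task_struct *p);
	/* ... plus private per-rq and per-task data, enlarging core structs */
};

static struct sched_ops *active_sched;	/* chosen at boot in this model */

static struct task_struct *schedule_pick(struct rq *rq)
{
	/* the indirect call (and possibly cache-cold ops struct) on every
	 * schedule() is the hot-path cost being objected to */
	return active_sched->pick_next_task(rq);
}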

And hey, you can try to prove me wrong. Code talks. So far, nobody has 
really ever come close.

So go and code it up, and show the end result. So far, nobody who actually 
*does* CPU schedulers have really wanted to do it, because they all want 
to muck around with their own private versions of the data structures.

		Linus

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
  2007-03-12 12:52                 ` Con Kolivas
  2007-03-12 14:14                   ` Al Boldi
@ 2007-03-18  1:30                   ` Bill Davidsen
  1 sibling, 0 replies; 69+ messages in thread
From: Bill Davidsen @ 2007-03-18  1:30 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Al Boldi, ck list, linux-kernel

Con Kolivas wrote:
> On Monday 12 March 2007 22:26, Al Boldi wrote:
>> Con Kolivas wrote:
>>> On Monday 12 March 2007 15:42, Al Boldi wrote:
>>>> Con Kolivas wrote:
>>>>> On Monday 12 March 2007 08:52, Con Kolivas wrote:
>>>>>> And thank you! I think I know what's going on now. I think each
>>>>>> rotation is followed by another rotation before the higher priority
>>>>>> task is getting a look in in schedule() to even get quota and add
>>>>>> it to the runqueue quota. I'll try a simple change to see if that
>>>>>> helps. Patch coming up shortly.
>>>>> Can you try the following patch and see if it helps. There's also one
>>>>> minor preemption logic fix in there that I'm planning on including.
>>>>> Thanks!
>>>> Applied on top of v0.28 mainline, and there is no difference.
>>>>
>>>> What's it look like on your machine?
>>> The higher priority one always gets 6-7ms whereas the lower priority one
>>> runs 6-7ms and then one larger perfectly bound expiration amount.
>>> Basically exactly as I'd expect. The higher priority task gets precisely
>>> RR_INTERVAL maximum latency whereas the lower priority task gets
>>> RR_INTERVAL min and full expiration (according to the virtual deadline)
>>> as a maximum. That's exactly how I intend it to work. Yes I realise that
>>> the max latency ends up being longer intermittently on the niced task but
>>> that's -in my opinion- perfectly fine as a compromise to ensure the nice
>>> 0 one always gets low latency.
>> I think, it should be possible to spread this max expiration latency across
>> the rotation, should it not?
> 
> There is a way that I toyed with of creating maps of slots to use for each 
> different priority, but it broke the O(1) nature of the virtual deadline 
> management. Minimising algorithmic complexity seemed more important to 
> maintain than getting slightly better latency spreads for niced tasks. It 
> also appeared to be less cache friendly in design. I could certainly try and 
> implement it but how much importance are we to place on latency of niced 
> tasks? Are you aware of any usage scenario where latency sensitive tasks are 
> ever significantly niced in the real world?
> 
It depends on how you reconcile "completely fair" and "order of 
magnitude blips in latency." It looks (from the results, not the code) 
as if nice is implemented by round-robin scheduling followed by once in 
a while just not giving the CPU to the nice task for a while. Given the 
smooth nature of the performance otherwise, it's more obvious than if 
you weren't doing such a good job most of the time.

Ugly stands out more on something beautiful!

-- 
Bill Davidsen <davidsen@tmr.com>
   "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
  2007-03-12 14:14                   ` Al Boldi
  2007-03-12 14:58                     ` [ck] " jos poortvliet
@ 2007-03-12 18:05                     ` Con Kolivas
  1 sibling, 0 replies; 69+ messages in thread
From: Con Kolivas @ 2007-03-12 18:05 UTC (permalink / raw)
  To: Al Boldi; +Cc: ck list, linux-kernel

On Tuesday 13 March 2007 01:14, Al Boldi wrote:
> Con Kolivas wrote:
> > > > The higher priority one always gets 6-7ms whereas the lower priority
> > > > one runs 6-7ms and then one larger perfectly bound expiration amount.
> > > > Basically exactly as I'd expect. The higher priority task gets
> > > > precisely RR_INTERVAL maximum latency whereas the lower priority task
> > > > gets RR_INTERVAL min and full expiration (according to the virtual
> > > > deadline) as a maximum. That's exactly how I intend it to work. Yes I
> > > > realise that the max latency ends up being longer intermittently on
> > > > the niced task but that's -in my opinion- perfectly fine as a
> > > > compromise to ensure the nice 0 one always gets low latency.
> > >
> > > I think, it should be possible to spread this max expiration latency
> > > across the rotation, should it not?
> >
> > There is a way that I toyed with of creating maps of slots to use for
> > each different priority, but it broke the O(1) nature of the virtual
> > deadline management. Minimising algorithmic complexity seemed more
> > important to maintain than getting slightly better latency spreads for
> > niced tasks. It also appeared to be less cache friendly in design. I
> > could certainly try and implement it but how much importance are we to
> > place on latency of niced tasks? Are you aware of any usage scenario
> > where latency sensitive tasks are ever significantly niced in the real
> > world?
>
> It only takes one negatively nice'd proc to affect X adversely.

I have an idea. Give me some time to code it up. Lack of sleep is making 
me very unpleasant.

-- 
-ck

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
  2007-03-12 14:58                     ` [ck] " jos poortvliet
@ 2007-03-12 17:41                       ` Al Boldi
  0 siblings, 0 replies; 69+ messages in thread
From: Al Boldi @ 2007-03-12 17:41 UTC (permalink / raw)
  To: jos poortvliet, ck; +Cc: Con Kolivas, linux-kernel

jos poortvliet wrote:
> > It only takes one negatively nice'd proc to affect X adversely.
>
> Then, maybe, we should start nicing X again, like we did/had to do until a
> few years ago? Or should we just wait until X gets fixed (after all,
> development goes faster than ever)? Or is this really the scheduler's
> fault?

It's not enough to renice X.  You would have to renice it, and any app that 
needed fixed latency, to the same nice of the negatively nice'd proc, which 
defeats the purpose...


Thanks!

--
Al


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
  2007-03-12 12:52                 ` Con Kolivas
@ 2007-03-12 14:14                   ` Al Boldi
  2007-03-12 14:58                     ` [ck] " jos poortvliet
  2007-03-12 18:05                     ` Con Kolivas
  2007-03-18  1:30                   ` Bill Davidsen
  1 sibling, 2 replies; 69+ messages in thread
From: Al Boldi @ 2007-03-12 14:14 UTC (permalink / raw)
  To: Con Kolivas; +Cc: ck list, linux-kernel

Con Kolivas wrote:
> > > The higher priority one always gets 6-7ms whereas the lower priority
> > > one runs 6-7ms and then one larger perfectly bound expiration amount.
> > > Basically exactly as I'd expect. The higher priority task gets
> > > precisely RR_INTERVAL maximum latency whereas the lower priority task
> > > gets RR_INTERVAL min and full expiration (according to the virtual
> > > deadline) as a maximum. That's exactly how I intend it to work. Yes I
> > > realise that the max latency ends up being longer intermittently on
> > > the niced task but that's -in my opinion- perfectly fine as a
> > > compromise to ensure the nice 0 one always gets low latency.
> >
> > I think, it should be possible to spread this max expiration latency
> > across the rotation, should it not?
>
> There is a way that I toyed with of creating maps of slots to use for each
> different priority, but it broke the O(1) nature of the virtual deadline
> management. Minimising algorithmic complexity seemed more important to
> maintain than getting slightly better latency spreads for niced tasks. It
> also appeared to be less cache friendly in design. I could certainly try
> and implement it but how much importance are we to place on latency of
> niced tasks? Are you aware of any usage scenario where latency sensitive
> tasks are ever significantly niced in the real world?

It only takes one negatively nice'd proc to affect X adversely.


Thanks!

--
Al


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
  2007-03-12 11:26               ` Al Boldi
@ 2007-03-12 12:52                 ` Con Kolivas
  2007-03-12 14:14                   ` Al Boldi
  2007-03-18  1:30                   ` Bill Davidsen
  0 siblings, 2 replies; 69+ messages in thread
From: Con Kolivas @ 2007-03-12 12:52 UTC (permalink / raw)
  To: Al Boldi; +Cc: ck list, linux-kernel

On Monday 12 March 2007 22:26, Al Boldi wrote:
> Con Kolivas wrote:
> > On Monday 12 March 2007 15:42, Al Boldi wrote:
> > > Con Kolivas wrote:
> > > > On Monday 12 March 2007 08:52, Con Kolivas wrote:
> > > > > And thank you! I think I know what's going on now. I think each
> > > > > rotation is followed by another rotation before the higher priority
> > > > > task is getting a look in in schedule() to even get quota and add
> > > > > it to the runqueue quota. I'll try a simple change to see if that
> > > > > helps. Patch coming up shortly.
> > > >
> > > > Can you try the following patch and see if it helps. There's also one
> > > > minor preemption logic fix in there that I'm planning on including.
> > > > Thanks!
> > >
> > > Applied on top of v0.28 mainline, and there is no difference.
> > >
> > > What's it look like on your machine?
> >
> > The higher priority one always gets 6-7ms whereas the lower priority one
> > runs 6-7ms and then one larger perfectly bound expiration amount.
> > Basically exactly as I'd expect. The higher priority task gets precisely
> > RR_INTERVAL maximum latency whereas the lower priority task gets
> > RR_INTERVAL min and full expiration (according to the virtual deadline)
> > as a maximum. That's exactly how I intend it to work. Yes I realise that
> > the max latency ends up being longer intermittently on the niced task but
> > that's -in my opinion- perfectly fine as a compromise to ensure the nice
> > 0 one always gets low latency.
>
> I think, it should be possible to spread this max expiration latency across
> the rotation, should it not?

There is a way that I toyed with of creating maps of slots to use for each 
different priority, but it broke the O(1) nature of the virtual deadline 
management. Minimising algorithmic complexity seemed more important to 
maintain than getting slightly better latency spreads for niced tasks. It 
also appeared to be less cache friendly in design. I could certainly try and 
implement it but how much importance are we to place on latency of niced 
tasks? Are you aware of any usage scenario where latency sensitive tasks are 
ever significantly niced in the real world?
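
(A guess at what such a slot map might look like, just to make the idea
concrete: per nice level, precompute which of the epoch's minor slots a task
may use, so its quota is spread out rather than clumped at one end.  This is
not RSDL code and the numbers are arbitrary:)

/* Toy slot map: spread `share` slices evenly over SLOTS minor slots.
 * Purely illustrative; not how RSDL assigns quota. */
#include <stdio.h>

#define SLOTS 16

static unsigned int make_slot_map(int share)
{
	unsigned int map = 0;
	int i;

	for (i = 0; i < share; i++)
		map |= 1u << (i * SLOTS / share);
	return map;
}

int main(void)
{
	unsigned int nice0  = make_slot_map(16);	/* every slot */
	unsigned int nice10 = make_slot_map(4);		/* every 4th slot */
	int slot;

	for (slot = 0; slot < SLOTS; slot++)
		printf("slot %2d: nice0=%d nice10=%d\n", slot,
		       !!(nice0 & (1u << slot)), !!(nice10 & (1u << slot)));
	return 0;
}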

-- 
-ck

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
  2007-03-12  4:53             ` Con Kolivas
@ 2007-03-12 11:26               ` Al Boldi
  2007-03-12 12:52                 ` Con Kolivas
  0 siblings, 1 reply; 69+ messages in thread
From: Al Boldi @ 2007-03-12 11:26 UTC (permalink / raw)
  To: Con Kolivas; +Cc: ck list, linux-kernel

Con Kolivas wrote:
> On Monday 12 March 2007 15:42, Al Boldi wrote:
> > Con Kolivas wrote:
> > > On Monday 12 March 2007 08:52, Con Kolivas wrote:
> > > > And thank you! I think I know what's going on now. I think each
> > > > rotation is followed by another rotation before the higher priority
> > > > task is getting a look in in schedule() to even get quota and add it
> > > > to the runqueue quota. I'll try a simple change to see if that
> > > > helps. Patch coming up shortly.
> > >
> > > Can you try the following patch and see if it helps. There's also one
> > > minor preemption logic fix in there that I'm planning on including.
> > > Thanks!
> >
> > Applied on top of v0.28 mainline, and there is no difference.
> >
> > What's it look like on your machine?
>
> The higher priority one always gets 6-7ms whereas the lower priority one
> runs 6-7ms and then one larger perfectly bound expiration amount.
> Basically exactly as I'd expect. The higher priority task gets precisely
> RR_INTERVAL maximum latency whereas the lower priority task gets
> RR_INTERVAL min and full expiration (according to the virtual deadline) as
> a maximum. That's exactly how I intend it to work. Yes I realise that the
> max latency ends up being longer intermittently on the niced task but
> that's -in my opinion- perfectly fine as a compromise to ensure the nice 0
> one always gets low latency.

I think, it should be possible to spread this max expiration latency across 
the rotation, should it not?


Thanks!

--
Al


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
  2007-03-12  4:42           ` Al Boldi
@ 2007-03-12  4:53             ` Con Kolivas
  2007-03-12 11:26               ` Al Boldi
  0 siblings, 1 reply; 69+ messages in thread
From: Con Kolivas @ 2007-03-12  4:53 UTC (permalink / raw)
  To: Al Boldi; +Cc: ck list, linux-kernel

On Monday 12 March 2007 15:42, Al Boldi wrote:
> Con Kolivas wrote:
> > On Monday 12 March 2007 08:52, Con Kolivas wrote:
> > > And thank you! I think I know what's going on now. I think each
> > > rotation is followed by another rotation before the higher priority
> > > task is getting a look in in schedule() to even get quota and add it to
> > > the runqueue quota. I'll try a simple change to see if that helps.
> > > Patch coming up shortly.
> >
> > Can you try the following patch and see if it helps. There's also one
> > minor preemption logic fix in there that I'm planning on including.
> > Thanks!
>
> Applied on top of v0.28 mainline, and there is no difference.
>
> What's it look like on your machine?

The higher priority one always gets 6-7ms whereas the lower priority one runs 
6-7ms and then one larger perfectly bound expiration amount. Basically 
exactly as I'd expect. The higher priority task gets precisely RR_INTERVAL 
maximum latency whereas the lower priority task gets RR_INTERVAL min and full 
expiration (according to the virtual deadline) as a maximum. That's exactly 
how I intend it to work. Yes I realise that the max latency ends up being 
longer intermittently on the niced task but that's -in my opinion- perfectly 
fine as a compromise to ensure the nice 0 one always gets low latency.

Eg:
nice 0 vs nice 10

nice 0:
pid 6288, prio   0, out for    7 ms
pid 6288, prio   0, out for    6 ms
pid 6288, prio   0, out for    6 ms
pid 6288, prio   0, out for    6 ms
pid 6288, prio   0, out for    6 ms
pid 6288, prio   0, out for    6 ms
pid 6288, prio   0, out for    6 ms
pid 6288, prio   0, out for    6 ms
pid 6288, prio   0, out for    6 ms
pid 6288, prio   0, out for    6 ms
pid 6288, prio   0, out for    6 ms
pid 6288, prio   0, out for    6 ms
pid 6288, prio   0, out for    6 ms

nice 10:
pid 6290, prio  10, out for    6 ms
pid 6290, prio  10, out for    6 ms
pid 6290, prio  10, out for    6 ms
pid 6290, prio  10, out for    6 ms
pid 6290, prio  10, out for    6 ms
pid 6290, prio  10, out for    6 ms
pid 6290, prio  10, out for    6 ms
pid 6290, prio  10, out for    6 ms
pid 6290, prio  10, out for    6 ms
pid 6290, prio  10, out for   66 ms
pid 6290, prio  10, out for    6 ms
pid 6290, prio  10, out for    6 ms
pid 6290, prio  10, out for    6 ms

exactly as I'd expect. If you want fixed latencies _of niced tasks_ in the 
presence of less niced tasks you will not get them with this scheduler. What 
you will get, though, is a perfectly bounded relationship, where you know 
exactly what the maximum latency will ever be.
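
(A toy model of why the niced task sees one long gap per expiration, using
made-up quotas of 20 and 10 slices per epoch.  It is not the real RSDL
accounting, but it reproduces the shape of the numbers above:)

/* Toy model (not RSDL's actual code): one CPU, a nice 0 task with 20 slices
 * per epoch and a nice 10 task with 10, each slice RR_INTERVAL = 6 ms.  They
 * alternate while both have quota; the leftover nice 0 slices then run back
 * to back before the epoch resets, which is where the ~66 ms gap comes from.
 * The quota numbers are invented for illustration. */
#include <stdio.h>

#define RR_INTERVAL 6

int main(void)
{
	int epoch, nice10_wait = 0;

	for (epoch = 0; epoch < 2; epoch++) {
		int nice0_left = 20, nice10_left = 10;

		while (nice0_left || nice10_left) {
			if (nice0_left) {		/* nice 0 runs a slice */
				nice0_left--;
				nice10_wait += RR_INTERVAL;
			}
			if (nice10_left) {		/* nice 10 gets a turn */
				nice10_left--;
				printf("nice 10 out for %4d ms\n", nice10_wait);
				nice10_wait = 0;
			}
		}
	}
	return 0;
}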

Thanks for the test case. It's interesting and nice that it confirms this 
scheduler works as I expect it to.

-- 
-ck

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
  2007-03-11 22:12         ` Con Kolivas
@ 2007-03-12  4:42           ` Al Boldi
  2007-03-12  4:53             ` Con Kolivas
  0 siblings, 1 reply; 69+ messages in thread
From: Al Boldi @ 2007-03-12  4:42 UTC (permalink / raw)
  To: Con Kolivas; +Cc: ck list, linux-kernel

Con Kolivas wrote:
> On Monday 12 March 2007 08:52, Con Kolivas wrote:
> > And thank you! I think I know what's going on now. I think each rotation
> > is followed by another rotation before the higher priority task is
> > getting a look in in schedule() to even get quota and add it to the
> > runqueue quota. I'll try a simple change to see if that helps. Patch
> > coming up shortly.
>
> Can you try the following patch and see if it helps. There's also one
> minor preemption logic fix in there that I'm planning on including.
> Thanks!

Applied on top of v0.28 mainline, and there is no difference.

What's it look like on your machine?


Thanks!

--
Al


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
  2007-03-11 22:29 bert hubert
@ 2007-03-11 22:57 ` Con Kolivas
  0 siblings, 0 replies; 69+ messages in thread
From: Con Kolivas @ 2007-03-11 22:57 UTC (permalink / raw)
  To: bert hubert; +Cc: linux-kernel

On Monday 12 March 2007 09:29, bert hubert wrote:
> Con,
>
> Recent kernel versions have real problems for me on the interactivity
> front, with even a simple 'make' of my C++ program (PowerDNS) causing
> Firefox to slow down to a crawl.
>
> RSDL fixed all that, the system is noticeably snappier.
>
> As a case in point, I used to notice when a compile was done because the
> system stopped being sluggish.
>
> Today, a few times, I only noticed 'make' was done because the fans of my
> computer slowed down.
>
> Thanks for the good work! I'm on 2.6.21-rc3-rsdl-0.29.

You're most welcome, and thank you for the report :)

> 	Bert

-- 
-ck

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu   scheduler
@ 2007-03-11 22:29 bert hubert
  2007-03-11 22:57 ` Con Kolivas
  0 siblings, 1 reply; 69+ messages in thread
From: bert hubert @ 2007-03-11 22:29 UTC (permalink / raw)
  To: Con Kolivas; +Cc: linux-kernel

Con,

Recent kernel versions have real problems for me on the interactivity front,
with even a simple 'make' of my C++ program (PowerDNS) causing Firefox to
slow down to a crawl.

RSDL fixed all that, the system is noticeably snappier.

As a case in point, I used to notice when a compile was done because the
system stopped being sluggish.

Today, a few times, I only noticed 'make' was done because the fans of my
computer slowed down.

Thanks for the good work! I'm on 2.6.21-rc3-rsdl-0.29.

	Bert

-- 
http://www.PowerDNS.com      Open source, database driven DNS Software 
http://netherlabs.nl              Open and Closed source services

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
  2007-03-11 21:52       ` Con Kolivas
@ 2007-03-11 22:12         ` Con Kolivas
  2007-03-12  4:42           ` Al Boldi
  0 siblings, 1 reply; 69+ messages in thread
From: Con Kolivas @ 2007-03-11 22:12 UTC (permalink / raw)
  To: Al Boldi; +Cc: ck list, linux-kernel

On Monday 12 March 2007 08:52, Con Kolivas wrote:
> And thank you! I think I know what's going on now. I think each rotation is
> followed by another rotation before the higher priority task is getting a
> look-in in schedule() to even get quota and add it to the runqueue quota.
> I'll try a simple change to see if that helps. Patch coming up shortly.

Can you try the following patch and see if it helps. There's also one minor
preemption logic fix in there that I'm planning on including. Thanks!

---
 kernel/sched.c |   24 +++++++++++-------------
 1 file changed, 11 insertions(+), 13 deletions(-)

Index: linux-2.6.21-rc3-mm2/kernel/sched.c
===================================================================
--- linux-2.6.21-rc3-mm2.orig/kernel/sched.c	2007-03-12 08:47:43.000000000 +1100
+++ linux-2.6.21-rc3-mm2/kernel/sched.c	2007-03-12 09:10:33.000000000 +1100
@@ -96,10 +96,9 @@ unsigned long long __attribute__((weak))
  * provided it is not a realtime comparison.
  */
 #define TASK_PREEMPTS_CURR(p, curr) \
-	(((p)->prio < (curr)->prio) || (((p)->prio == (curr)->prio) && \
+	(((p)->prio < (curr)->prio) || (!rt_task(p) && \
 		((p)->static_prio < (curr)->static_prio && \
-			((curr)->static_prio > (curr)->prio)) && \
-				!rt_task(p)))
+			((curr)->static_prio > (curr)->prio))))
 
 /*
  * This is the time all tasks within the same priority round robin.
@@ -3323,7 +3322,7 @@ static inline void major_prio_rotation(s
  */
 static inline void rotate_runqueue_priority(struct rq *rq)
 {
-	int new_prio_level, remaining_quota;
+	int new_prio_level;
 	struct prio_array *array;
 
 	/*
@@ -3334,7 +3333,6 @@ static inline void rotate_runqueue_prior
 	if (unlikely(sched_find_first_bit(rq->dyn_bitmap) < rq->prio_level))
 		return;
 
-	remaining_quota = rq_quota(rq, rq->prio_level);
 	array = rq->active;
 	if (rq->prio_level > MAX_PRIO - 2) {
 		/* Major rotation required */
@@ -3368,10 +3366,11 @@ static inline void rotate_runqueue_prior
 	}
 	rq->prio_level = new_prio_level;
 	/*
-	 * While we usually rotate with the rq quota being 0, it is possible
-	 * to be negative so we subtract any deficit from the new level.
+	 * As we are merging to a prio_level that may not have anything in
+	 * its quota we add 1 to ensure the tasks get to run in schedule() to
+	 * add their quota to it.
 	 */
-	rq_quota(rq, new_prio_level) += remaining_quota;
+	rq_quota(rq, new_prio_level) += 1;
 }
 
 static void task_running_tick(struct rq *rq, struct task_struct *p)
@@ -3397,12 +3396,11 @@ static void task_running_tick(struct rq 
 	if (!--p->time_slice)
 		task_expired_entitlement(rq, p);
 	/*
-	 * The rq quota can become negative due to a task being queued in
-	 * scheduler without any quota left at that priority level. It is
-	 * cheaper to allow it to run till this scheduler tick and then
-	 * subtract it from the quota of the merged queues.
+	 * We only employ the deadline mechanism if we run over the quota.
+	 * It allows aliasing problems around the scheduler_tick to be
+	 * less harmful.
 	 */
-	if (!rt_task(p) && --rq_quota(rq, rq->prio_level) <= 0) {
+	if (!rt_task(p) && --rq_quota(rq, rq->prio_level) < 0) {
 		if (unlikely(p->first_time_slice))
 			p->first_time_slice = 0;
 		rotate_runqueue_priority(rq);


-- 
-ck

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
  2007-03-11 18:11     ` Al Boldi
@ 2007-03-11 21:52       ` Con Kolivas
  2007-03-11 22:12         ` Con Kolivas
  0 siblings, 1 reply; 69+ messages in thread
From: Con Kolivas @ 2007-03-11 21:52 UTC (permalink / raw)
  To: Al Boldi; +Cc: ck list, linux-kernel

On Monday 12 March 2007 05:11, Al Boldi wrote:
> Al Boldi wrote:
> > BTW, another way to show these hiccups would be through some kind of a
> > cpu/proc timing-tracer.  Do we have something like that?
>
> Here is something like a tracer.
>
> Original idea by Chris Friesen, thanks, from this post:
> http://marc.theaimsgroup.com/?l=linux-kernel&m=117331003029329&w=4
>
> Try attached chew.c like this:
> Boot into /bin/sh.
> Run chew in one console.
> Run nice chew in another console.
> Watch timings.
>
> Console 1: ./chew

> Console 2: nice -10 ./chew

> pid 669, prio  10, out for    5 ms
> pid 669, prio  10, out for   65 ms

One full expiration

> pid 669, prio  10, out for    6 ms
> pid 669, prio  10, out for   65 ms

again

> Console 2: nice -15 ./chew
> pid 673, prio  15, out for    6 ms
> pid 673, prio  15, out for   95 ms

again and so on..

> OTOH, mainline is completely smooth, albeit with huge drop-outs.

Heh. That's not much good either, is it.

> Thanks!

And thank you! I think I know what's going on now. I think each rotation is 
followed by another rotation before the higher priority task is getting a 
look-in in schedule() to even get quota and add it to the runqueue quota. 
I'll try a simple change to see if that helps. Patch coming up shortly.
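
As a toy, user-space illustration of the accounting change this leads to
(heavily simplified, and the helper names below are made up, not kernel
code), compare rotating as soon as the quota merely reaches zero with
rotating only once it goes negative, after the merged level has been seeded
with one tick:

#include <stdio.h>

/* one priority level's tick quota, as a toy model */
static int quota;

static int rotates_old(void) { return --quota <= 0; }  /* pre-patch test  */
static int rotates_new(void) { return --quota <  0; }  /* post-patch test */

int main(void)
{
        quota = 0;      /* freshly merged level, no quota added yet */
        printf("old test: rotate on first tick? %s\n",
               rotates_old() ? "yes" : "no");

        quota = 1;      /* the patch seeds the merged level with 1 tick */
        printf("new test: rotate on first tick? %s\n",
               rotates_new() ? "yes" : "no");
        return 0;
}

With the old test a merged level rotates again on its very first tick; with
the new one, schedule() gets that tick to add the woken task's quota first.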

-- 
-ck

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
  2007-03-06 15:15   ` Al Boldi
@ 2007-03-11 18:11     ` Al Boldi
  2007-03-11 21:52       ` Con Kolivas
  0 siblings, 1 reply; 69+ messages in thread
From: Al Boldi @ 2007-03-11 18:11 UTC (permalink / raw)
  To: Con Kolivas; +Cc: ck list, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 6035 bytes --]

Al Boldi wrote:
> BTW, another way to show these hiccups would be through some kind of a
> cpu/proc timing-tracer.  Do we have something like that?

Here is something like a tracer.

Original idea by Chris Friesen, thanks, from this post:
http://marc.theaimsgroup.com/?l=linux-kernel&m=117331003029329&w=4

Try attached chew.c like this:
Boot into /bin/sh.
Run chew in one console.
Run nice chew in another console.
Watch timings.

Console 1: ./chew
pid 655, prio   0, out for    6 ms
pid 655, prio   0, out for    6 ms
pid 655, prio   0, out for    6 ms
pid 655, prio   0, out for    5 ms
pid 655, prio   0, out for    6 ms
pid 655, prio   0, out for    6 ms
pid 655, prio   0, out for    5 ms
pid 655, prio   0, out for    5 ms
pid 655, prio   0, out for    5 ms
pid 655, prio   0, out for    6 ms
pid 655, prio   0, out for    5 ms
pid 655, prio   0, out for    5 ms
pid 655, prio   0, out for    5 ms
pid 655, prio   0, out for    6 ms
pid 655, prio   0, out for    5 ms
pid 655, prio   0, out for    6 ms
pid 655, prio   0, out for    6 ms
pid 655, prio   0, out for    6 ms
pid 655, prio   0, out for    5 ms
pid 655, prio   0, out for    6 ms
pid 655, prio   0, out for    6 ms
pid 655, prio   0, out for    5 ms

Console 2: nice -10 ./chew
pid 669, prio  10, out for    6 ms
pid 669, prio  10, out for    6 ms
pid 669, prio  10, out for    6 ms
pid 669, prio  10, out for    5 ms
pid 669, prio  10, out for   65 ms
pid 669, prio  10, out for    6 ms
pid 669, prio  10, out for    6 ms
pid 669, prio  10, out for    6 ms
pid 669, prio  10, out for    5 ms
pid 669, prio  10, out for    6 ms
pid 669, prio  10, out for    6 ms
pid 669, prio  10, out for    5 ms
pid 669, prio  10, out for    6 ms
pid 669, prio  10, out for    6 ms
pid 669, prio  10, out for   65 ms
pid 669, prio  10, out for    6 ms
pid 669, prio  10, out for    6 ms
pid 669, prio  10, out for    6 ms
pid 669, prio  10, out for    6 ms
pid 669, prio  10, out for    6 ms

Console 2: nice -15 ./chew
pid 673, prio  15, out for    5 ms
pid 673, prio  15, out for    6 ms
pid 673, prio  15, out for   95 ms
pid 673, prio  15, out for    5 ms
pid 673, prio  15, out for    5 ms
pid 673, prio  15, out for    6 ms
pid 673, prio  15, out for    5 ms
pid 673, prio  15, out for   95 ms
pid 673, prio  15, out for    5 ms
pid 673, prio  15, out for    5 ms
pid 673, prio  15, out for    6 ms
pid 673, prio  15, out for    6 ms
pid 673, prio  15, out for   95 ms
pid 673, prio  15, out for    6 ms
pid 673, prio  15, out for    6 ms
pid 673, prio  15, out for    6 ms
pid 673, prio  15, out for    6 ms
pid 673, prio  15, out for   95 ms
pid 673, prio  15, out for    5 ms
pid 673, prio  15, out for    5 ms
pid 673, prio  15, out for    6 ms
pid 673, prio  15, out for    5 ms

Console 2: nice -18 ./chew
pid 677, prio  18, out for  113 ms
pid 677, prio  18, out for    6 ms
pid 677, prio  18, out for  113 ms
pid 677, prio  18, out for    6 ms
pid 677, prio  18, out for  113 ms
pid 677, prio  18, out for    5 ms
pid 677, prio  18, out for  113 ms
pid 677, prio  18, out for    6 ms
pid 677, prio  18, out for  113 ms
pid 677, prio  18, out for    5 ms
pid 677, prio  18, out for  113 ms
pid 677, prio  18, out for    5 ms
pid 677, prio  18, out for  113 ms
pid 677, prio  18, out for    5 ms
pid 677, prio  18, out for  113 ms
pid 677, prio  18, out for    6 ms
pid 677, prio  18, out for  113 ms
pid 677, prio  18, out for    6 ms
pid 677, prio  18, out for  113 ms
pid 677, prio  18, out for    5 ms
pid 677, prio  18, out for  113 ms
pid 677, prio  18, out for    5 ms

Console 2: nice -19 ./chew
pid 679, prio  19, out for  119 ms
pid 679, prio  19, out for  119 ms
pid 679, prio  19, out for  119 ms
pid 679, prio  19, out for  119 ms
pid 679, prio  19, out for  119 ms
pid 679, prio  19, out for  119 ms
pid 679, prio  19, out for  119 ms
pid 679, prio  19, out for  119 ms
pid 679, prio  19, out for  119 ms
pid 679, prio  19, out for  119 ms
pid 679, prio  19, out for  119 ms
pid 679, prio  19, out for  119 ms
pid 679, prio  19, out for  119 ms
pid 679, prio  19, out for  119 ms
pid 679, prio  19, out for  119 ms
pid 679, prio  19, out for  119 ms
pid 679, prio  19, out for  119 ms
pid 679, prio  19, out for  119 ms
pid 679, prio  19, out for  119 ms
pid 679, prio  19, out for  119 ms
pid 679, prio  19, out for  119 ms
pid 679, prio  19, out for  119 ms

Now with negative nice:
Console 1: ./chew
pid 674, prio   0, out for    6 ms
pid 674, prio   0, out for  125 ms
pid 674, prio   0, out for    6 ms
pid 674, prio   0, out for    5 ms
pid 674, prio   0, out for    6 ms
pid 674, prio   0, out for    6 ms
pid 674, prio   0, out for    5 ms
pid 674, prio   0, out for    6 ms
pid 674, prio   0, out for    6 ms
pid 674, prio   0, out for    6 ms
pid 674, prio   0, out for    5 ms
pid 674, prio   0, out for    5 ms
pid 674, prio   0, out for    5 ms
pid 674, prio   0, out for    5 ms
pid 674, prio   0, out for    5 ms
pid 674, prio   0, out for    5 ms
pid 674, prio   0, out for    5 ms
pid 674, prio   0, out for    6 ms
pid 674, prio   0, out for    5 ms
pid 674, prio   0, out for    6 ms
pid 674, prio   0, out for    5 ms
pid 674, prio   0, out for  110 ms

Console 2: nice --20 ./chew
pid 687, prio -20, out for    5 ms
pid 687, prio -20, out for    6 ms
pid 687, prio -20, out for    6 ms
pid 687, prio -20, out for    5 ms
pid 687, prio -20, out for    5 ms
pid 687, prio -20, out for    6 ms
pid 687, prio -20, out for    6 ms
pid 687, prio -20, out for    6 ms
pid 687, prio -20, out for    5 ms
pid 687, prio -20, out for    6 ms
pid 687, prio -20, out for    6 ms
pid 687, prio -20, out for    6 ms
pid 687, prio -20, out for    6 ms
pid 687, prio -20, out for    6 ms
pid 687, prio -20, out for    6 ms
pid 687, prio -20, out for    5 ms
pid 687, prio -20, out for    6 ms
pid 687, prio -20, out for    6 ms
pid 687, prio -20, out for    5 ms
pid 687, prio -20, out for    5 ms
pid 687, prio -20, out for    6 ms
pid 687, prio -20, out for    6 ms


OTOH, mainline is completely smooth, albeit with huge drop-outs.


Thanks!

--
Al



[-- Attachment #2: chew.c --]
[-- Type: text/x-csrc, Size: 1027 bytes --]

/*
 * original idea by Chris Friesen.  Thanks.
 *
 * Busy-loops and reports every gap longer than THRESHOLD_USEC during
 * which this process was not running, i.e. how long the scheduler kept
 * it off the cpu.
 */

#include <stdio.h>
#include <unistd.h>
#include <time.h>
#include <sched.h>
#include <sys/time.h>
#include <sys/resource.h>

#define THRESHOLD_USEC 2000

/* current time in microseconds */
unsigned long long stamp()
{
        struct timeval tv;
        gettimeofday(&tv, 0);
        return (unsigned long long) tv.tv_usec + ((unsigned long long) tv.tv_sec)*1000000;
}

int main()
{
        unsigned long long thresh_ticks = THRESHOLD_USEC;       /* threshold in usec */
        unsigned long long cur,last;
        struct timespec ts;

        sched_rr_get_interval(0, &ts);
        printf("pid %d, prio %3d, interval of %ld nsec\n", getpid(), getpriority(PRIO_PROCESS, 0), ts.tv_nsec);

        last = stamp();
        while(1) {
                cur = stamp();
                unsigned long long delta = cur-last;
                if (delta > thresh_ticks) {
                        printf("pid %d, prio %3d, out for %4llu ms\n", getpid(), getpriority(PRIO_PROCESS, 0), delta/1000);
                        /* re-read the clock so time spent in printf is not counted */
                        cur = stamp();
                }
                last = cur;
        }

        return 0;
}

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
  2007-03-08 20:57   ` Con Kolivas
@ 2007-03-08 21:31     ` Fabio Comolli
  0 siblings, 0 replies; 69+ messages in thread
From: Fabio Comolli @ 2007-03-08 21:31 UTC (permalink / raw)
  To: Con Kolivas; +Cc: linux kernel mailing list, ck list

Well, downloaded - compiled - booted: initng measures 17.369 seconds
to complete the boot process; without the patch the same kernel booted
in 21.553 seconds.

Very impressive.
Many thanks for your work.

Fabio






On 3/8/07, Con Kolivas <kernel@kolivas.org> wrote:
> On Friday 09 March 2007 07:25, Fabio Comolli wrote:
> > Hi Con
> > It would be nice if you could rebase this patch to latest git or at
> > least to 2.6.21-rc3.
> > Regards,
>
> Check in http://ck.kolivas.org/patches/staircase-deadline/
> There's an -rc3 patch there.
>
> --
> -ck
>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
  2007-03-08 20:25 ` Fabio Comolli
@ 2007-03-08 20:57   ` Con Kolivas
  2007-03-08 21:31     ` Fabio Comolli
  0 siblings, 1 reply; 69+ messages in thread
From: Con Kolivas @ 2007-03-08 20:57 UTC (permalink / raw)
  To: Fabio Comolli; +Cc: linux kernel mailing list, ck list

On Friday 09 March 2007 07:25, Fabio Comolli wrote:
> Hi Con
> It would be nice if you could rebase this patch to latest git or at
> least to 2.6.21-rc3.
> Regards,

Check in http://ck.kolivas.org/patches/staircase-deadline/
There's an -rc3 patch there.

-- 
-ck

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
  2007-03-04  7:00 Con Kolivas
                   ` (3 preceding siblings ...)
  2007-03-08  8:53 ` Ingo Molnar
@ 2007-03-08 20:25 ` Fabio Comolli
  2007-03-08 20:57   ` Con Kolivas
  4 siblings, 1 reply; 69+ messages in thread
From: Fabio Comolli @ 2007-03-08 20:25 UTC (permalink / raw)
  To: Con Kolivas; +Cc: linux kernel mailing list, ck list

Hi Con
It would be nice if you could rebase this patch to latest git or at
least to 2.6.21-rc3.
Regards,
Fabio




On 3/4/07, Con Kolivas <kernel@kolivas.org> wrote:
> This message is to announce the first general public release of the "Rotating
> Staircase DeadLine" cpu scheduler.
>
> Based on previous work from the staircase cpu scheduler I set out to design,
> from scratch, a new scheduling policy design which satisfies every
> requirement for SCHED_NORMAL (otherwise known as SCHED_OTHER) task management.
>
> Available for download are:
>
>  A full rollup of the patch for 2.6.20:
> http://ck.kolivas.org/patches/staircase-deadline/sched-rsdl-0.26.patch
>
>  Split patches for 2.6.20(which will follow this email):
> http://ck.kolivas.org/patches/staircase-deadline/split-out/
>
>  The readme (which will also constitute the rest of this email):
> http://ck.kolivas.org/patches/staircase-deadline/rsdl_scheduler.readme
>
>
> The following readme is also included as documentation in
> Documentation/sched-design.txt
>
>
> Rotating Staircase Deadline cpu scheduler policy
> ================================================
>
> Design summary
> ==============
>
> A novel design which incorporates a foreground-background descending priority
> system (the staircase) with runqueue managed minor and major epochs (rotation
> and deadline).
>
>
> Features
> ========
>
> A starvation free, strict fairness O(1) scalable design with interactivity
> as good as the above restrictions can provide. There is no interactivity
> estimator, no sleep/run measurements and only simple fixed accounting.
> The design has strict enough accounting that task behaviour
> can be modelled and maximum scheduling latencies can be predicted by
> the virtual deadline mechanism that manages runqueues. The prime concern
> in this design is to maintain fairness at all costs determined by nice level,
> yet to maintain as good interactivity as can be allowed within the
> constraints of strict fairness.
>
>
> Design description
> ==================
>
> RSDL works off the principle of providing each task a quota of runtime that
> it is allowed to run at each priority level equal to its static priority
> (ie. its nice level) and every priority below that. When each task is queued,
> the cpu that it is queued onto also keeps a record of that quota. If the
> task uses up its quota it is decremented one priority level. Also, if the cpu
> notices a full quota has been used for that priority level, it pushes
> everything remaining at that priority level to the next lowest priority
> level. Once the runtime quota of every priority level has been consumed,
> a task is queued on the "expired" array. When no other tasks exist with
> quota, the expired array is activated and fresh quotas are handed out. This
> is all done in O(1).
>
>
> Design details
> ==============
>
> Each cpu has its own runqueue which micromanages its own epochs, and each
> task keeps a record of its own entitlement of cpu time. Most of the rest
> of these details apply to non-realtime tasks as rt task management is
> straight forward.
>
> Each runqueue keeps a record of what major epoch it is up to in the
> rq->prio_rotation field which is incremented on each major epoch. It also
> keeps a record of quota available to each priority value valid for that
> major epoch in rq->prio_quota[].
>
> Each task keeps a record of what major runqueue epoch it was last running
> on in p->rotation. It also keeps a record of what priority levels it has
> already been allocated quota from during this epoch in a bitmap p->bitmap.
>
> The only tunable that determines all other details is the RR_INTERVAL. This
> is set to 6ms (minimum on 1000HZ, higher at different HZ values).
>
> All tasks are initially given a quota based on RR_INTERVAL. This is equal to
> RR_INTERVAL between nice values of 0 and 19, and progressively larger for
> nice values from -1 to -20. This is assigned to p->quota and only changes
> with changes in nice level.
>
> As a task is first queued, it checks in recalc_task_prio to see if it has
> run at this runqueue's current priority rotation. If it has not, it will
> have its p->prio level set to equal its p->static_prio (nice level) and will
> be given a p->time_slice equal to the p->quota, and has its allocation
> bitmap bit set in p->bitmap for its static priority (nice value). This
> quota is then also added to the current runqueue's rq->prio_quota[p->prio].
> It is then queued on the current active priority array.
>
> If a task has already been running during this major epoch, if it has
> p->time_slice left and the rq->prio_quota for the task's p->prio still
> has quota, it will be placed back on the active array, but no more quota
> will be added to either the task or the runqueue quota.
>
> If a task has been running during this major epoch, but does not have
> p->time_slice left or the runqueue's prio_quota for this task's p->prio
> does not have quota, it will find the next lowest priority in its bitmap
> that it has not been allocated quota from. It then gets a full quota
> in p->time_slice and adds that to the quota value for the relevant priority
> rq->prio_quota. It is then queued on the current active priority array at
> the newly determined lower priority.
>
> If a task has been running during this major epoch, and does not have
> any entitlement left in p->bitmap and no time_slice left, it will have its
> bitmap cleared, and be queued at its p->static_prio again, but on the expired
> priority array. No quota will be allocated until this task is scheduled.
>
> When a task is queued, it has its static_prio bit set in the current
> runqueue's rq->static_bitmap, and the relevant bit in the rq->dyn_bitmap.
> In order to minimise the number of bitmap lookups, the bitmap of queued
> tasks on the expired array is at the end of the same bitmap as the active
> array. The number of tasks queued at the current static_prio is kept in
> rq->prio_queued[].
>
> During a scheduler_tick where a task is running, the p->time_slice is
> decremented, and if it reaches zero then the recalc_task_prio is readjusted
> and the task rescheduled.
>
> During a task running tick, the runqueue prio_quota is also decremented. If
> it empties then a priority rotation occurs (a major or minor epoch). If the
> current runqueue's priority level is better than that of nice 19 tasks, a
> minor rotation is performed, otherwise a major rotation will occur.
>
> A minor rotation takes the remaining tasks at this priority level queue and
> merges them with a list_splice_tail with the queue from the next lowest
> priority level. At this time, any tasks that have been merged will now
> have invalid values in p->prio so this must be considered when dequeueing
> the task, and for testing for preemption.
>
> A major rotation takes the remaining tasks at this priority level queue and
> merges them with a list_splice_tail with the best priority task running on
> the expired array, and swaps the priority arrays. The priority quotas are
> reset at this time. Any tasks that have been merged will now have invalid
> values in p->array and possibly p->prio so this must be considered. The
> rq->prio_rotation is incremented at this time.
>
> When a task is dequeued, the dyn_bitmap bit is unset only after testing
> that the relevant queue is actually empty since p->prio may be inaccurate
> and no hard accounting of the number of tasks at that level is possible.
>
> When selecting a new task for scheduling, after the first dynamic bit is
> found on the dyn_bitmap, it is checked to see that a task is really queued
> at that priority or if it is a false positive due to the task being
> dequeued at a time when its p->prio does not match which queue it is on
> after some form of priority rotation. This is a rare occurrence as it tends
> to only occur if a task that is already waiting on a runqueue gets dequeued.
> If the bitmap value is in the expired array range, a major priority rotation
> is performed. If the chosen task has not been running during this major or
> minor rotation it has new quota allocated at this time, and added to the
> runqueue's quota.
>
>
> Modelling deadline behaviour
> ============================
>
> As the accounting in this design is hard and not modified by sleep average
> calculations or interactivity modifiers, it is possible to accurately
> predict the maximum latency that a task may experience under different
> conditions. This is a virtual deadline mechanism enforced by mandatory
> runqueue epochs, and not by trying to keep complicated accounting of each
> task.
>
> The maximum duration a task can run during one major epoch is determined
> by its nice value. Nice 0 tasks can run at 19 different priority levels
> for RR_INTERVAL duration during each epoch (the equivalent of nice 0 to nice
> 19). Nice 10 tasks can run at 9 priority levels for each epoch, and so on.
>
> Therefore the maximum duration a runqueue epoch can take is determined by
> the number of tasks running, and their nice level. After that, the maximum
> duration a task can wait before it gets scheduled is
> determined by the difference between its nice value and the nice value of
> the highest priority task queued.
>
> In the following examples, these are _worst case scenarios_ and would rarely
> occur, but can be modelled nonetheless to determine the maximum possible
> latency.
>
> So for example, if two nice 0 tasks are running, and one has just expired as
> another is activated for the first time receiving a full quota for this
> runqueue rotation, the first task will wait:
>
> nr_tasks * max_duration + nice_difference * rr_interval
> 1 * 19 * RR_INTERVAL + 0 = 114ms
>
> In the presence of a nice 10 task, a nice 0 task would wait a maximum of
> 1 * 10 * RR_INTERVAL + 0 = 60ms
>
> In the presence of a nice 0 task, a nice 10 task would wait a maximum of
> 1 * 19 * RR_INTERVAL + 9 * RR_INTERVAL = 168ms
>
> Using a more complicated example, if there are 4 tasks running fully cpu
> bound, one each at nice -20, nice 0, nice 10 and nice 19, we can calculate
> the maximum latency possible for the nice 10 task. Note that -20 tasks are
> heavily favoured, so this will be a long time, but it can still be modelled.
>
> The nice -20 task has quota = RR_INTERVAL + 20*RR_INTERVAL = 21*RR_INTERVAL.
> It can run at 39 priority levels so its maximum duration =
> 39 * 21 * RR_INTERVAL.
> The nice 0 task works out to
> 19 * RR_INTERVAL
> The nice 19 task works out to
> RR_INTERVAL.
>
> So a major epoch can take a maximum of
> 39 * 21 * RR_INTERVAL + 19 * RR_INTERVAL + RR_INTERVAL = 1229 * RR_INTERVAL;
>
> Then before the nice 10 task will run, the nice -20 and nice 0 task will
> run for 28 * 21 * RR_INTERVAL and 9 * RR_INTERVAL respectively for a total
> of 597 * RR_INTERVAL.
>
> This means the maximum duration a nice 10 task can wait in the presence of
> these other tasks is 1826*RR_INTERVAL. This is a long time of course and is
> heavily penalised by the presence of nice -20 tasks which would not be part
> of a normal environment.
>
> While this section describes the maximum latency a task can have, latencies
> of this size will only be seen by fully cpu bound tasks.
>
>
> Achieving interactivity
> =======================
>
> A requirement of this scheduler design was to achieve good interactivity
> despite being a completely fair deadline based design. The disadvantage of
> designs that try to achieve interactivity is that they usually do so at
> the expense of maintaining fairness. As cpu speeds increase, the requirement
> for some sort of metered unfairness towards interactive tasks becomes a less
> desirable phenomenon, but low latency and fairness remains mandatory to
> good interactive performance.
>
> This design relies on the fact that interactive tasks, by their nature,
> sleep often. Most fair scheduling designs end up penalising such tasks
> indirectly giving them less than their fair possible share because of the
> sleep, and have to use a mechanism of bonusing their priority to offset
> this based on the duration they sleep. This becomes increasingly inaccurate
> as the number of running tasks rises and more tasks spend time waiting on
> runqueues rather than sleeping, and it is impossible to tell whether the
> task that's waiting on a runqueue only intends to run for a short period and
> then sleep again after that runqueue wait. Furthermore, all such designs rely
> on a period of time to pass to accumulate some form of statistic on the task
> before deciding on how much to give them preference. The shorter this period,
> the more rapidly bursts of cpu ruin the interactive task's behaviour. The
> longer this period, the longer it takes for interactive tasks to get low
> scheduling latencies and fair cpu.
>
> This design does not measure sleep time at all. Interactive tasks that sleep
> often will wake up having consumed very little if any of their quota for
> the current major priority rotation. The longer they have slept, the less
> likely they are to even be on the current major priority rotation. Once
> woken up, though, they get to use up their full quota for that epoch,
> whether part of a quota remains or a full quota. Overall, however, they
> can still only run as much cpu time for that epoch as any other task of the
> same nice level. This means that two tasks behaving completely differently
> from fully cpu bound to waking/sleeping extremely frequently will still
> get the same quota of cpu, but the latter will be using its quota for that
> epoch in bursts rather than continuously. This guarantees that interactive
> tasks get the same amount of cpu as cpu bound ones.
>
> The other requirement of interactive tasks is also to obtain low latencies
> for when they are scheduled. Unlike fully cpu bound tasks and the maximum
> latencies possible described in the modelling deadline behaviour section
> above, tasks that sleep will wake up with quota available usually at the
> current runqueue's priority_level or better. This means that the most latency
> they are likely to see is one RR_INTERVAL, and often they will preempt the
> current task if it is not of a sleeping nature. This then guarantees very
> low latency for interactive tasks, and the lowest latencies for the least
> cpu bound tasks.
>
> Sunday, 4th March 2007
> Con Kolivas
>
> --
> -ck
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>
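
To make the quota and priority-demotion bookkeeping described in the quoted
readme concrete, here is a minimal user-space sketch (the structure, field
names and the cut-off at the nice 19 level are illustrative assumptions, not
the kernel's actual data structures); run alone, a nice 0 toy task reproduces
the 19 levels * 6ms = 114ms per-epoch figure the readme works out:

#include <stdio.h>

#define RR_INTERVAL_MS 6
#define WORST_LEVEL    19               /* the nice 19 priority level */

struct toy_task {
        int static_prio;                /* nice value, 0..19 in this toy */
        int prio;                       /* current dynamic level */
        int time_slice_ms;              /* quota left at the current level */
};

/*
 * Called when the task has used up its quota at the current level: demote
 * it one level with a fresh quota, or expire it back to its static priority
 * once it has consumed quota at every level better than nice 19.
 */
static int expire_level(struct toy_task *t)
{
        if (t->prio + 1 >= WORST_LEVEL) {
                t->prio = t->static_prio;       /* onto the expired array */
                t->time_slice_ms = RR_INTERVAL_MS;
                return 1;                       /* one major epoch consumed */
        }
        t->prio++;
        t->time_slice_ms = RR_INTERVAL_MS;
        return 0;
}

int main(void)
{
        struct toy_task t = { .static_prio = 0, .prio = 0,
                              .time_slice_ms = RR_INTERVAL_MS };
        int ran_ms = 0, levels = 0;

        do {
                ran_ms += t.time_slice_ms;      /* runs out its slice */
                levels++;
        } while (!expire_level(&t));

        printf("nice 0 toy task: %d levels, %d ms per major epoch\n",
               levels, ran_ms);
        return 0;
}

Compiled and run, it prints 19 levels and 114 ms, matching the readme's
nice 0 example.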

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
@ 2007-03-08 14:27 Tim Tassonis
  0 siblings, 0 replies; 69+ messages in thread
From: Tim Tassonis @ 2007-03-08 14:27 UTC (permalink / raw)
  To: linux-kernel

Hi Con

Just also wanted to throw in my less than two cents: I applied the patch 
and also have the very strong subjective impression that my system 
"feels" much more responsive than with stock 2.6.20.

Thanks for the great work.

Bye
Tim

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
  2007-03-08  8:53 ` Ingo Molnar
@ 2007-03-08 10:07   ` Con Kolivas
  0 siblings, 0 replies; 69+ messages in thread
From: Con Kolivas @ 2007-03-08 10:07 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux kernel mailing list, ck list, Andrew Morton, Linus Torvalds

On Thursday 08 March 2007 19:53, Ingo Molnar wrote:
> * Con Kolivas <kernel@kolivas.org> wrote:
> > This message is to announce the first general public release of the
> > "Rotating Staircase DeadLine" cpu scheduler.
> >
> > Based on previous work from the staircase cpu scheduler I set out to
> > design, from scratch, a new scheduling policy design which satisfies
> > every requirement for SCHED_NORMAL (otherwise known as SCHED_OTHER)
> > task management.
>
> cool! I like this even more than i liked your original staircase
> scheduler from 2 years ago :) Lets try what we did back then: put it
> into -mm and see what breaks (if anything). But in general, it is
> becoming increasingly clear that the interactivity estimator is a more
> fragile concept than the built-in quota mechanism of the staircase
> scheduler, so if it works in practice i'm quite in favor of it, even if
> it regresses /some/ workloads.

Great! Thanks for your support. 

After futzing around for all that time I've become sure that an approach 
without an interactive estimator is our only way forward. So far the 
throughput benchmarks are encouraging too so I suspect the estimator may be 
causing harm there too.

I will likely need help ensuring the different arches and cpuidle work 
properly, though, so I'd appreciate any help from people if they see 
something obvious and can get a grip on my code.

Thanks!

-- 
-ck

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
  2007-03-04  7:00 Con Kolivas
                   ` (2 preceding siblings ...)
  2007-03-05 21:52 ` Con Kolivas
@ 2007-03-08  8:53 ` Ingo Molnar
  2007-03-08 10:07   ` Con Kolivas
  2007-03-08 20:25 ` Fabio Comolli
  4 siblings, 1 reply; 69+ messages in thread
From: Ingo Molnar @ 2007-03-08  8:53 UTC (permalink / raw)
  To: Con Kolivas
  Cc: linux kernel mailing list, ck list, Andrew Morton, Linus Torvalds


* Con Kolivas <kernel@kolivas.org> wrote:

> This message is to announce the first general public release of the 
> "Rotating Staircase DeadLine" cpu scheduler.
> 
> Based on previous work from the staircase cpu scheduler I set out to 
> design, from scratch, a new scheduling policy design which satisfies 
> every requirement for SCHED_NORMAL (otherwise known as SCHED_OTHER) 
> task management.

cool! I like this even more than i liked your original staircase 
scheduler from 2 years ago :) Let's try what we did back then: put it 
into -mm and see what breaks (if anything). But in general, it is 
becoming increasingly clear that the interactivity estimator is a more 
fragile concept than the built-in quota mechanism of the staircase 
scheduler, so if it works in practice i'm quite in favor of it, even if 
it regresses /some/ workloads.

	Ingo

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free   interactive cpu scheduler
  2007-03-06 21:37                       ` Bill Davidsen
@ 2007-03-06 21:54                         ` Willy Tarreau
  0 siblings, 0 replies; 69+ messages in thread
From: Willy Tarreau @ 2007-03-06 21:54 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Con Kolivas, jos poortvliet, ck, Gene Heskett, linux-kernel

Hi Bill,

On Tue, Mar 06, 2007 at 04:37:37PM -0500, Bill Davidsen wrote:
(...)
> The point is that no one CPU scheduler will satisfy the policy needs of 
> all users, any more than one i/o scheduler does so. We have realtime 
> scheduling, preempt both voluntary and involuntary, so why should we not 
> have multiple CPU schedulers? If Linus has an objection to plugable 
> schedulers, then let's identify what the problem is and address it. If 
> that means one scheduler or the other must be compiled in, or all 
> compiled in and selected, so be it.

I'm not in Linus' head, but I think that he wanted the recurrent scheduler
problems to be addressed first for most users before going further. Too
much choice is often dangerous for quality. For instance, look at all the
netfilter modules. Many of them were completely bogus in their early stages,
and some of them even do mostly the same jobs, and many of them have never
left the "extra" stage. Choice is good to detect users' needs, it's good
for global evolution, but it's not as good when you want to have something
good enough for most people.

> >Then, when we have a generic, good enough scheduler for most situations, I
> >think that it could be good to get the plugsched for very specific usages.
> >People working in HPC may prefer to allocate ressource differently for
> >instance. There may also be people refusing to mix tasks from different 
> >users
> >on two different siblings of one CPU for security reasons, etc... All those
> >would justify a plugable scheduler. But it should not be an excuse to 
> >provide
> >a set of bad schedulers and no good one.
> >
> >  
> Unless you force the definition of "good" to "what the default 
> scheduler does," there can be no "one" good one. Choice is good, no one 
> is calling for bizarre niche implementations, but we have at minimum 
> three CPU schedulers which rank as "best" for a large number of users 
> (the current default, and Con's fair and interactive flavors, before you ask).

By "good", I mean a scheduler that is not trivially DoSable, and which
does not cause unexpected long pauses to some processes without any reason
(processes which cannot get any time slice for tens of seconds, or ssh
daemons which freeze under system load, to the point of totally preventing
remote administration past 50% CPU usage on some systems).

> >The CPU scheduler is often compared to the I/O schedulers while in fact 
> >this
> >is a completely different story. The I/O schedulers are needed because the
> >hardware and filesystems may lead to very different behaviours, and the
> >workload may vary a lot (eg: news server, ftp server, cache, desktop, real
> >time streaming, ...). But at least, the default I/O scheduler was good 
> >enough
> >for most usages, and alternative ones are here to provide optimal solutions
> >to specific needs.
> And multiple schedulers are needed because the type of load, mix of 
> loads, and admin preference all require decisions at the policy level which 
> can't be covered by a single solution. Or at least none of the existing 
> solutions, and I think letting people tune the guts of scheduler policy 
> is more dangerous than giving a selection of solutions. Linux has been 
> about choice all along, I hope it's nearly time for a solution better 
> than patches to be presented.

There's a difference between CPU and I/O scheduler though. With the CPU
scheduler, you've always had the choice to assign per-process priorities 
with "nice". Don't get me wrong, I'm all for pluggable schedulers, as I'm
an ever unsatisfied optimizer. It's just that I think it has been good to
encourage people to focus on real issues before dispersing efforts on
different needs. I hope that Con's work will eventually get merged and
that the door will be opened towards pluggable schedulers.

Best regards,
Willy


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free   interactive cpu scheduler
  2007-03-06  4:41                     ` Willy Tarreau
  2007-03-06  5:39                       ` Nicholas Miell
  2007-03-06 19:04                       ` jos poortvliet
@ 2007-03-06 21:37                       ` Bill Davidsen
  2007-03-06 21:54                         ` Willy Tarreau
  2 siblings, 1 reply; 69+ messages in thread
From: Bill Davidsen @ 2007-03-06 21:37 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: Con Kolivas, jos poortvliet, ck, Gene Heskett, linux-kernel

Willy Tarreau wrote:
> On Tue, Mar 06, 2007 at 11:18:44AM +1100, Con Kolivas wrote:
>   
>> On Tuesday 06 March 2007 10:05, Bill Davidsen wrote:
>>     
>>> jos poortvliet wrote:
>>>       
>>>> Well, imho his current staircase scheduler already does a better job
>>>> compared to mainline, but it won't make it in (or at least, it's not
>>>> likely). So we can hope this WILL make it into mainline, but I wouldn't
>>>> count on it.
>>>>         
>>> Wrong problem, what is really needed is to get CPU scheduler choice into
>>> mainline, just as i/o scheduler finally did. Con has noted that for some
>>> loads this will present suboptimal performance, as will his -ck patches,
>>> as will the default scheduler. Instead of trying to make ANY one size
>>> fit all, we should have a means to select, at runtime, between any of
>>> the schedulers, and preferably to define an interface by which a user
>>> can insert a new scheduler in the kernel (compile in, I don't mean
>>> plugable) with clear and well defined rules for how that can be done.
>>>       
>> Been there, done that. Wli wrote the infrastructure for plugsched; I took his 
>> code and got it booting and ported 3 or so different scheduler designs. It 
>> allowed you to build as few or as many different schedulers into the kernel 
>> and either boot the only one you built into your kernel, or choose a 
>> scheduler at boot time. That code got permavetoed by both Ingo and Linus. 
>> After that I gave up on that code and handed it over to Peter Williams who 
>> still maintains it. So please note that I pushed the plugsched barrow 
>> previously and still don't think it's a bad idea, but the maintainers think 
>> it's the wrong approach.
>>     
>
> In a way, I think they are right. Let me explain. Pluggable schedulers are
> useful when you want to switch away from the default one. This is very useful
> during development of a new scheduler, as well as when you're not satisfied
> with the default scheduler. Having this feature will encourage many people to
> develop their own scheduler for their very specific workload, and nothing
> generic. It's a bit what happened after all : you, Peter, Nick, and Mike
> have worked a lot trying to provide alternative solutions.
>
> But when you think about it, there are other OSes which have only one scheduler
> and which behave very well with tens of thousands of tasks and scale very well
> with lots of CPUs (eg: solaris). So there is a real challenge here to try to
> provide something at least as good and universal because we know that it can
> exist. And this is what you finally did : work on a scheduler which ought to be
> good with any workload.
>
>   
The problem is not with "any workload," because that's not the issue; 
the issue is the definition of "good" matching the administrator's 
policy. And that's where the problem comes in. We have the default 
scheduler, which favors interactive jobs. We have Con's staircase 
scheduler which is part of an interactivity package. We have the 
absolutely fair scheduler which is, well... fair, and keeps things 
smooth and, under reasonable load, crisp.

There are other schedulers in the pluggable package, I did a doorknob 
scheduler for 2.2 (everybody gets a turn, special case of round-robin). 
I'm sure people have quietly hacked many more, which have never been 
presented to public view.

The point is that no one CPU scheduler will satisfy the policy needs of 
all users, any more than one i/o scheduler does so. We have realtime 
scheduling, preempt both voluntary and involuntary, so why should we not 
have multiple CPU schedulers? If Linus has an objection to plugable 
schedulers, then let's identify what the problem is and address it. If 
that means one scheduler or the other must be compiled in, or all 
compiled in and selected, so be it.

> Then, when we have a generic, good enough scheduler for most situations, I
> think that it could be good to get the plugsched for very specific usages.
> People working in HPC may prefer to allocate ressource differently for
> instance. There may also be people refusing to mix tasks from different users
> on two different siblings of one CPU for security reasons, etc... All those
> would justify a plugable scheduler. But it should not be an excuse to provide
> a set of bad schedulers and no good one.
>
>   
Unless you force the definition of "good" to "what the default 
scheduler does," there can be no "one" good one. Choice is good, no one 
is calling for bizarre niche implementations, but we have at minimum 
three CPU schedulers which rank as "best" for a large number of users 
(the current default, and Con's fair and interactive flavors, before you ask).
> The CPU scheduler is often compared to the I/O schedulers while in fact this
> is a completely different story. The I/O schedulers are needed because the
> hardware and filesystems may lead to very different behaviours, and the
> workload may vary a lot (eg: news server, ftp server, cache, desktop, real
> time streaming, ...). But at least, the default I/O scheduler was good enough
> for most usages, and alternative ones are here to provide optimal solutions
> to specific needs.
And multiple schedulers are needed because the type of load, mix of 
loads, and admin preference all require decisions at the policy level which 
can't be covered by a single solution. Or at least none of the existing 
solutions, and I think letting people tune the guts of scheduler policy 
is more dangerous than giving a selection of solutions. Linux has been 
about choice all along, I hope it's nearly time for a solution better 
than patches to be presented.

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free   interactive cpu scheduler
  2007-03-06  4:41                     ` Willy Tarreau
  2007-03-06  5:39                       ` Nicholas Miell
@ 2007-03-06 19:04                       ` jos poortvliet
  2007-03-06 21:37                       ` Bill Davidsen
  2 siblings, 0 replies; 69+ messages in thread
From: jos poortvliet @ 2007-03-06 19:04 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: Con Kolivas, Bill Davidsen, ck, Gene Heskett, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3615 bytes --]

On Tuesday 06 March 2007, Willy Tarreau wrote:
> In a way, I think they are right. Let me explain. Pluggable schedulers are
> useful when you want to switch away from the default one. This is very
> useful during development of a new scheduler, as well as when you're not
> satisfied with the default scheduler. Having this feature will encourage
> many people to develop their own scheduler for their very specific
> workload, and nothing generic. It's a bit what happened after all : you,
> Peter, Nick, and Mike have worked a lot trying to provide alternative
> solutions.

Did that happen for I/O? There are a few schedulers, eg some for servers, 
others more for desktop or throughput. But not 10 or something...

> But when you think about it, there are other OSes which have only one
> scheduler and which behave very well with tens of thousands of tasks and
> scale very well with lots of CPUs (eg: solaris). So there is a real
> challenge here to try to provide something at least as good and universal
> because we know that it can exist. And this is what you finally did : work
> on a scheduler which ought to be good with any workload.

I can imagine a desktop works optimally with a different scheduler than a tiny 
embedded OS in a phone, an 8-core system serving a website, or a 
distributed 512-core system doing heavy scientific calculations?!?

Optimizing for all of them at the same time involves some compromises, and thus 
limits performance in certain scenarios, right?

> Then, when we have a generic, good enough scheduler for most situations, I
> think that it could be good to get the plugsched for very specific usages.
> People working in HPC may prefer to allocate ressource differently for
> instance. There may also be people refusing to mix tasks from different
> users on two different siblings of one CPU for security reasons, etc... All
> those would justify a plugable scheduler. But it should not be an excuse to
> provide a set of bad schedulers and no good one.

CFQ does pretty well at most workloads; that's why it's the default, right? But 
there is choice, which is a good thing. And the current mainline CPU 
scheduler isn't bad at all, so having 'no good one' won't happen anyway.

> The CPU scheduler is often compared to the I/O schedulers while in fact
> this is a completely different story. The I/O schedulers are needed because
> the hardware and filesystems may lead to very different behaviours, and the
> workload may vary a lot (eg: news server, ftp server, cache, desktop, real
> time streaming, ...). But at least, the default I/O scheduler was good
> enough for most usages, and alternative ones are here to provide optimal
> solutions to specific needs.

Ok, for I/O, the diff could be pretty big. But still, there are workloads 
which could be improved by a certain scheduler, right?

And wouldn't it make sense then to have a choice in the default kernel at 
boottime? If that wouldn't hurt performance, it would be an improvement for 
desktop distributions like (K)ubuntu, which could set staircase by default, and 
server distros offering RSDL...

At least having a desktop/interactivity optimized scheduler like staircase and 
a fair, throughput-optimized scheduler like RSDL sounds sane. RSDL does 
better at the msql testcase, staircase is better on the desktop... We're not 
talking about huge amounts of code, or 10 schedulers, and a difference of a few 
percent plus better scaling on many cpus and processes versus better 
interactivity on the desktop sounds like it's worth it.

> Regards,
> Willy

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free   interactive cpu scheduler
  2007-03-06  4:41                     ` Willy Tarreau
@ 2007-03-06  5:39                       ` Nicholas Miell
  2007-03-06 19:04                       ` jos poortvliet
  2007-03-06 21:37                       ` Bill Davidsen
  2 siblings, 0 replies; 69+ messages in thread
From: Nicholas Miell @ 2007-03-06  5:39 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Con Kolivas, Bill Davidsen, jos poortvliet, ck, Gene Heskett,
	linux-kernel

On Tue, 2007-03-06 at 05:41 +0100, Willy Tarreau wrote:
> On Tue, Mar 06, 2007 at 11:18:44AM +1100, Con Kolivas wrote:
> > On Tuesday 06 March 2007 10:05, Bill Davidsen wrote:
> > > jos poortvliet wrote:
> > > > Well, imho his current staircase scheduler already does a better job
> > > > compared to mainline, but it won't make it in (or at least, it's not
> > > > likely). So we can hope this WILL make it into mainline, but I wouldn't
> > > > count on it.
> > >
> > > Wrong problem, what is really needed is to get CPU scheduler choice into
> > > mainline, just as i/o scheduler finally did. Con has noted that for some
> > > loads this will present suboptimal performance, as will his -ck patches,
> > > as will the default scheduler. Instead of trying to make ANY one size
> > > fit all, we should have a means to select, at runtime, between any of
> > > the schedulers, and preferably to define an interface by which a user
> > > can insert a new scheduler in the kernel (compile in, I don't mean
> > > plugable) with clear and well defined rules for how that can be done.
> > 
> > Been there, done that. Wli wrote the infrastructure for plugsched; I took his 
> > code and got it booting and ported 3 or so different scheduler designs. It 
> > allowed you to build as few or as many different schedulers into the kernel 
> > and either boot the only one you built into your kernel, or choose a 
> > scheduler at boot time. That code got permavetoed by both Ingo and Linus. 
> > After that I gave up on that code and handed it over to Peter Williams who 
> > still maintains it. So please note that I pushed the plugsched barrow 
> > previously and still don't think it's a bad idea, but the maintainers think 
> > it's the wrong approach.
> 
> In a way, I think they are right. Let me explain. Pluggable schedulers are
> useful when you want to switch away from the default one. This is very useful
> during development of a new scheduler, as well as when you're not satisfied
> with the default scheduler. Having this feature will encourage many people to
> develop their own scheduler for their very specific workload, and nothing
> generic. It's a bit what happened after all : you, Peter, Nick, and Mike
> have worked a lot trying to provide alternative solutions.
> 
> But when you think about it, there are other OSes which have only one scheduler
> and which behave very well with tens of thousands of tasks and scale very well
> with lots of CPUs (eg: solaris). So there is a real challenge here to try to
> provide something at least as good and universal because we know that it can
> exist. And this is what you finally did : work on a scheduler which ought to be
> good with any workload.

Solaris has a pluggable scheduler framework (each policy --
OTHER/FIFO/RR/etc. -- is its own separate component).

-- 
Nicholas Miell <nmiell@comcast.net>


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
@ 2007-03-06  4:57 Shawn Starr
  0 siblings, 0 replies; 69+ messages in thread
From: Shawn Starr @ 2007-03-06  4:57 UTC (permalink / raw)
  To: linux-kernel; +Cc: Con Kolivas, Willy Tarreau

On Monday 05 March 2007 10:13, Willy Tarreau wrote:
> Con,
>
> I've now given it a try with HZ=250 on my dual-athlon. It works
> beautifully. I also quickly checked that playing mp3 doesn't skip during
> make -j4, and that gears runs fairly smoothly, since those are the
> references people often use.
>
> But with real work, it's excellent too. When I saturate my CPUs by
> injecting HTTP traffic on haproxy, the load is stable and the command line
> perfectly responsive, while in the past the load would oscillate and the
> command line sometimes stopped responding for a few seconds.
>
> I've also launched my scheddos program (you may remember, the one we did a
> few experiments with). I could not cause any freeze at all. Plain 2.6.20
> had already improved a lot in this area, but above 4 processes/CPU,
> occasional short freezes did still occur. This time, even at 100 processes,
> the system was rather slow (of course!) but just as expected, and nothing
> more.
>
> I also tried the good old "dd if=/dev/zero bs=1|...|dd bs=1 of=/dev/null"
> and it did not cause any trouble.
>
> I will boot 2.6 slightly more often to test the code under various
> conditions, and I will recommend it to a few people I know who tend to
> switch back to 2.4 after one day full of 2.6 jerkiness.
>
> Overall, you have done a great job !
>
> I hope that more people will give it a try, first to help find possible
> remaining bugs, and to pronounce in favour of its inclusion in mainline.

Hi Con/Willy, 

Just to chime in on this thread, I've been testing Con's scheduler patches since 
he mentioned them to me. I have to say that for desktop performance all running 
processes remain responsive (with more resource-hungry processes things get 
slower but stay responsive, which is expected).  I have also been testing his 
scheduler on my work power workstation running 4 VMs with 4GB of ram on a 
dual-core EM64T box; again RSDL performs rather well balancing VM resources 
even while I have xmms, xchat, beryl and a bunch of other stuff running.  
Every process remains responsive.

Thanks, 
Shawn.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free   interactive cpu scheduler
  2007-03-06  0:18                   ` Con Kolivas
@ 2007-03-06  4:41                     ` Willy Tarreau
  2007-03-06  5:39                       ` Nicholas Miell
                                         ` (2 more replies)
  0 siblings, 3 replies; 69+ messages in thread
From: Willy Tarreau @ 2007-03-06  4:41 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Bill Davidsen, jos poortvliet, ck, Gene Heskett, linux-kernel

On Tue, Mar 06, 2007 at 11:18:44AM +1100, Con Kolivas wrote:
> On Tuesday 06 March 2007 10:05, Bill Davidsen wrote:
> > jos poortvliet wrote:
> > > Well, imho his current staircase scheduler already does a better job
> > > compared to mainline, but it won't make it in (or at least, it's not
> > > likely). So we can hope this WILL make it into mainline, but I wouldn't
> > > count on it.
> >
> > Wrong problem, what is really needed is to get CPU scheduler choice into
> > mainline, just as i/o scheduler finally did. Con has noted that for some
> > loads this will present suboptimal performance, as will his -ck patches,
> > as will the default scheduler. Instead of trying to make ANY one size
> > fit all, we should have a means to select, at runtime, between any of
> > the schedulers, and preferably to define an interface by which a user
> > can insert a new scheduler in the kernel (compile in, I don't mean
> > plugable) with clear and well defined rules for how that can be done.
> 
> Been there, done that. Wli wrote the infrastructure for plugsched; I took his 
> code and got it booting and ported 3 or so different scheduler designs. It 
> allowed you to build as few or as many different schedulers into the kernel 
> and either boot the only one you built into your kernel, or choose a 
> scheduler at boot time. That code got permavetoed by both Ingo and Linus. 
> After that I gave up on that code and handed it over to Peter Williams who 
> still maintains it. So please note that I pushed the plugsched barrow 
> previously and still don't think it's a bad idea, but the maintainers think 
> it's the wrong approach.

In a way, I think they are right. Let me explain. Pluggable schedulers are
useful when you want to switch away from the default one. This is very useful
during development of a new scheduler, as well as when you're not satisfied
with the default scheduler. Having this feature will encourage many people to
develop their own scheduler for their very specific workload, and nothing
generic. That is a bit of what happened, after all: you, Peter, Nick, and Mike
have worked a lot trying to provide alternative solutions.

But when you think about it, there are other OSes which have only one scheduler
and which behave very well with tens of thousands of tasks and scale very well
with lots of CPUs (eg: solaris). So there is a real challenge here to try to
provide something at least as good and universal because we know that it can
exist. And this is what you finally did: work on a scheduler which ought to be
good with any workload.

Then, when we have a generic, good enough scheduler for most situations, I
think that it could be good to get the plugsched for very specific usages.
People working in HPC may prefer to allocate resources differently, for
instance. There may also be people refusing to mix tasks from different users
on two siblings of one CPU for security reasons, etc. All of those
would justify a pluggable scheduler. But it should not be an excuse to provide
a set of bad schedulers and no good one.

The CPU scheduler is often compared to the I/O schedulers while in fact this
is a completely different story. The I/O schedulers are needed because the
hardware and filesystems may lead to very different behaviours, and the
workload may vary a lot (eg: news server, ftp server, cache, desktop, real
time streaming, ...). But at least, the default I/O scheduler was good enough
for most usages, and alternative ones are here to provide optimal solutions
to specific needs.

Regards,
Willy


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free   interactive cpu scheduler
  2007-03-05 23:05                 ` Bill Davidsen
@ 2007-03-06  0:18                   ` Con Kolivas
  2007-03-06  4:41                     ` Willy Tarreau
  0 siblings, 1 reply; 69+ messages in thread
From: Con Kolivas @ 2007-03-06  0:18 UTC (permalink / raw)
  To: Bill Davidsen
  Cc: jos poortvliet, ck, Gene Heskett, Willy Tarreau, linux-kernel

On Tuesday 06 March 2007 10:05, Bill Davidsen wrote:
> jos poortvliet wrote:
> > Well, imho his current staircase scheduler already does a better job
> > compared to mainline, but it won't make it in (or at least, it's not
> > likely). So we can hope this WILL make it into mainline, but I wouldn't
> > count on it.
>
> Wrong problem, what is really needed is to get CPU scheduler choice into
> mainline, just as i/o scheduler finally did. Con has noted that for some
> loads this will present suboptimal performance, as will his -ck patches,
> as will the default scheduler. Instead of trying to make ANY one size
> fit all, we should have a means to select, at runtime, between any of
> the schedulers, and preferably to define an interface by which a user
> can insert a new scheduler in the kernel (compile in, I don't mean
> plugable) with clear and well defined rules for how that can be done.

Been there, done that. Wli wrote the infrastructure for plugsched; I took his 
code and got it booting and ported 3 or so different scheduler designs. It 
allowed you to build as few or as many different schedulers into the kernel 
and either boot the only one you built into your kernel, or choose a 
scheduler at boot time. That code got permavetoed by both Ingo and Linus. 
After that I gave up on that code and handed it over to Peter Williams who 
still maintains it. So please note that I pushed the plugsched barrow 
previously and still don't think it's a bad idea, but the maintainers think 
it's the wrong approach.

RSDL is my attempt to create a cpu scheduler that tries to do everything well 
instead of some things perfectly, so that it is the best fit when shoehorned 
into any environment. 

-- 
-ck

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free   interactive cpu scheduler
  2007-03-04 16:08               ` [ck] " jos poortvliet
@ 2007-03-05 23:05                 ` Bill Davidsen
  2007-03-06  0:18                   ` Con Kolivas
  0 siblings, 1 reply; 69+ messages in thread
From: Bill Davidsen @ 2007-03-05 23:05 UTC (permalink / raw)
  To: jos poortvliet; +Cc: ck, Gene Heskett, Willy Tarreau, linux-kernel

jos poortvliet wrote:
> Op Sunday 04 March 2007, schreef Willy Tarreau:
>> Hi Con !
>>> This was designed to be robust for any application since linux demands a
>>> general purpose scheduler design, while preserving interactivity, instead
>>> of optimising for one particular end use.
>> Well, I haven't tested it yet, but your design choices please me. As you
>> know, I've been one of those encountering big starvation problems with
>> the original scheduler, making 2.6 unusable for me in many situations. I
>> welcome your work and want to thank you for the time you spend trying to
>> fix it.
>>
>> Keep up the good work,
>> Willy
>>
>> PS: I've looked at your graphs, I hope you're on the way to something
>> really better than the 21 first 2.6 releases !
> Well, imho his current staircase scheduler already does a better job compared 
> to mainline, but it won't make it in (or at least, it's not likely). So we 
> can hope this WILL make it into mainline, but I wouldn't count on it.
> 
Wrong problem; what is really needed is to get CPU scheduler choice into 
mainline, just as I/O scheduler choice finally did. Con has noted that for some 
loads this will present suboptimal performance, as will his -ck patches, 
as will the default scheduler. Instead of trying to make ANY one size 
fit all, we should have a means to select, at runtime, between any of 
the schedulers, and preferably to define an interface by which a user 
can insert a new scheduler in the kernel (compiled in, I don't mean 
pluggable) with clear and well defined rules for how that can be done.

-- 
Bill Davidsen <davidsen@tmr.com>
   "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive  cpu scheduler
  2007-03-04  7:00 Con Kolivas
  2007-03-04  7:45 ` [ck] " Con Kolivas
  2007-03-04 11:08 ` Gene Heskett
@ 2007-03-05 21:52 ` Con Kolivas
  2007-03-08  8:53 ` Ingo Molnar
  2007-03-08 20:25 ` Fabio Comolli
  4 siblings, 0 replies; 69+ messages in thread
From: Con Kolivas @ 2007-03-05 21:52 UTC (permalink / raw)
  To: ck; +Cc: linux kernel mailing list

On Sunday 04 March 2007 18:00, Con Kolivas wrote:
> This message is to announce the first general public release of the
> "Rotating Staircase DeadLine" cpu scheduler.

>  A full rollup of the patch for 2.6.20:
> http://ck.kolivas.org/patches/staircase-deadline/sched-rsdl-0.26.patch

This patch has been queued at test.kernel.org (thanks Andy Whitcroft).

Here is a summary of the first few benchmarks from those tests, rsdl 0.26 vs 
mainline 2.6.20, on amd64 4x (for however much you value these particular 
benchmarks...):


Kernbench (lower elapsed is better);
=========
2.6.20:
Elapsed: 103.058s User: 351.974s System: 36.474s CPU: 376.6%
2.6.20-rsdl:
Elapsed: 102.848s User: 359.186s System: 36.666s CPU: 384.6%


tbench (higher is better):
======
2.6.20:
Throughput 165.5 MB/sec 1 procs
2.6.20-rsdl:
Throughput 261.418 MB/sec 1 procs


reaim (higher is better):
=====
2.6.20:
Max Jobs per Minute 500727.27
2.6.20-rsdl:
Max Jobs per Minute 612000.00

-- 
-ck

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu   scheduler
  2007-03-05 18:29       ` Simon Arlott
@ 2007-03-05 21:36         ` Con Kolivas
  0 siblings, 0 replies; 69+ messages in thread
From: Con Kolivas @ 2007-03-05 21:36 UTC (permalink / raw)
  To: Simon Arlott; +Cc: ck list, linux-kernel

On Tuesday 06 March 2007 05:29, Simon Arlott wrote:
> On 04/03/07 22:27, Con Kolivas wrote:
> > On Monday 05 March 2007 09:19, Simon Arlott wrote:
> >> If I run glxgears, thunderbird/firefox become really slow to
> >> respond/display and cpu usage isn't even at 100%. I had thunderbird
> >> lagging on keyboard character repeat earlier but can't reproduce that
> >> now even with glxgears - however firefox lags really badly on keyboard
> >> input with glxgears running.
> >
> > Hi Simon
> >
> > You're hitting a nasty udev bug here that is unrelated to the cpu
> > scheduler
>
> Yes, I've already reported this.
>
> > and almost certainly responsible for your bad behaviour.
>
> What? How do you make the leap from lockdep output on boot to my experience
> of slow interactive response?

In general I've found that once the kernel does something funny on boot, not 
much beyond that point can be trusted. However, a brief moment of alarm from 
lockdep, _assuming it was transient and did not happen later_ as you say, 
would not cause problems.

> > Also this patch was actually for 2.6.20 and you seem to have applied it
> > to 2.6.21-rc2. I haven't even checked that it cleanly applies to that
> > kernel.
>
> It doesn't, a one line fix from Ingo conflicts (and a comment that was
> changed). 7355690ead6d61f6344072ae61060f985060da29 [PATCH] sched: fix SMT
> scheduler bug 72fd4a35a824331d7a0f4168d7576502d95d34b3 [PATCH] Numerous
> fixes to kernel-doc info in source files.

Ok, but I was very reluctant to release this code as a patch to -rc2 because 
the potential for all sorts of weird and wonderful fireworks and explosions 
from the -pre release makes it a lousy testing ground for something else. 
It's often hard to tell whether a behaviour regression comes from the unstable 
kernel or from the patch you're applying. I was even somewhat reluctant to 
release this patch for 2.6.20 since people are mumbling (offlist) that they're 
getting weird stalls on that kernel - however they're afraid to report them 
to lkml since their issues are fuzzy and they know they'll be asked to do git 
bisect on an intermittent problem; git bisect, as we know, is rocket science 
(or brain surgery if you happen to be a rocket scientist).

> I have reverted your patch and the laggy behaviour remains when running
> glxgears (in fact it seems a bit worse), I don't know about the thunderbird
> lag while typing since I can't easily reproduce it but I don't recall it
> ever happening before. So, your patch isn't the cause of a serious slowdown
> when running glxgears in the background (although it doesn't improve the
> situation much). It looks like it just slows down X in general rather than
> using much CPU.

Which goes back to my original comment about glxgears, only I should have made 
my advice stronger; glxgears should not be used as a benchmarking or test 
tool of any sort, period. It displays pretty gears and that's all. On your 
machine, for example, it is not cpu bound from the looks of your description, 
and either loads up the bus or drowns the gpu in ways that delay other gpu 
activity. On other hardware combinations, it does something completely 
different. Given that, it's easy to see why it is impossible to use it as a 
cpu scheduling test of any sort.

-- 
-ck

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
  2007-03-04 22:27     ` Con Kolivas
@ 2007-03-05 18:29       ` Simon Arlott
  2007-03-05 21:36         ` Con Kolivas
  0 siblings, 1 reply; 69+ messages in thread
From: Simon Arlott @ 2007-03-05 18:29 UTC (permalink / raw)
  To: Con Kolivas; +Cc: ck list, linux-kernel

On 04/03/07 22:27, Con Kolivas wrote:
> On Monday 05 March 2007 09:19, Simon Arlott wrote:
>> If I run glxgears, thunderbird/firefox become really slow to
>> respond/display and cpu usage isn't even at 100%. I had thunderbird lagging
>> on keyboard character repeat earlier but can't reproduce that now even with
>> glxgears - however firefox lags really badly on keyboard input with
>> glxgears running.
> 
> Hi Simon
> 
> You're hitting a nasty udev bug here that is unrelated to the cpu scheduler 
Yes, I've already reported this.

> and almost certainly responsible for your bad behaviour. 
What? How do you make the leap from lockdep output on boot to my experience of 
slow interactive response?

> Also this patch was actually for 2.6.20 and you seem to have applied it to 
> 2.6.21-rc2. I haven't even checked that it cleanly applies to that kernel.

It doesn't; a one-line fix from Ingo conflicts (as does a comment that was changed).
7355690ead6d61f6344072ae61060f985060da29 [PATCH] sched: fix SMT scheduler bug
72fd4a35a824331d7a0f4168d7576502d95d34b3 [PATCH] Numerous fixes to kernel-doc info in source files.

I have reverted your patch and the laggy behaviour remains when running 
glxgears (in fact it seems a bit worse), I don't know about the thunderbird 
lag while typing since I can't easily reproduce it but I don't recall it ever 
happening before. So, your patch isn't the cause of a serious slowdown when 
running glxgears in the background (although it doesn't improve the situation 
much). It looks like it just slows down X in general rather than using much 
CPU.

-- 
Simon Arlott


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
  2007-03-05  2:31               ` Zwane Mwaikambo
@ 2007-03-05  3:16                 ` Gene Heskett
  0 siblings, 0 replies; 69+ messages in thread
From: Gene Heskett @ 2007-03-05  3:16 UTC (permalink / raw)
  To: linux-kernel; +Cc: Zwane Mwaikambo, Con Kolivas, ck list

On Sunday 04 March 2007, Zwane Mwaikambo wrote:
>On Sun, 4 Mar 2007, Gene Heskett wrote:
>> >There will be times when the mainline scheduler feels more
>> > interactive than this scheduler, and that is because it has
>> > significant unfairness granted towards interactive tasks. This
>> > degree of unfairness in an effort to maintain interactivity has been
>> > criticised and causes problems in certain environments with both
>> > loss of fairness, relative starvation and is not entirely
>> > predictable.
>>
>> Well, in 20 minutes of playing, I am so far VERY impressed, the kmail
>> composer is typing to the screen in unison to my keystrokes, where
>> with the older version it often went away for 10 or more seconds at a
>> time, then displayed the last sentence I had typed in one (or 2
>> sometimes) swell foop.  Now I can back up and correct a typo in real
>> time whereas before, it was often faster if the typo was half a line
>> back, and the key repeat so darned slow it was often over a second per
>> character cell moved, to go grab the mouse, position the cursor on the
>> typo, click, wait for the indicator to move, and then fix it.  Typing
>> is now pleasurable again.  Key repeats seem to remain at the set in
>> the bios key repeat speed.  Amazing.  I do believe you have given me
>> back my machine.
>
>What are the specs on your hardware?

XP-2800 Athlon on a biostar board whose model number I've forgotten. 1 GB 
of dual channel 400 mhz ram running at 333 because this athlon falls over 
at 400mhz fsb settings.  480GB of drives in 3 pieces.

Running a jaton 3dForce 6200-256 video card with nvidia driver, which does 
the opengl stuff better than the ati 9200se I took out a week ago, but 
this nvidia card is several times harder on the cpu when tvtime is 
running than the ati was.  There are other cards in this box, 1394, nic, 
usb expander, but that is the main list of power burners.

Its dmesg clock is just short of 2100, and:
Calibrating delay using timer specific routine.. 4177.82 BogoMIPS 
(lpj=2088910)

In comparison to an XP-1400 I had before, this is nowhere near 2x faster.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
  2007-03-04 14:11             ` Gene Heskett
@ 2007-03-05  2:31               ` Zwane Mwaikambo
  2007-03-05  3:16                 ` Gene Heskett
  0 siblings, 1 reply; 69+ messages in thread
From: Zwane Mwaikambo @ 2007-03-05  2:31 UTC (permalink / raw)
  To: Gene Heskett; +Cc: linux-kernel, Con Kolivas, ck list

On Sun, 4 Mar 2007, Gene Heskett wrote:

> >There will be times when the mainline scheduler feels more interactive
> > than this scheduler, and that is because it has significant unfairness
> > granted towards interactive tasks. This degree of unfairness in an
> > effort to maintain interactivity has been criticised and causes
> > problems in certain environments with both loss of fairness, relative
> > starvation and is not entirely predictable.
> 
> Well, in 20 minutes of playing, I am so far VERY impressed, the kmail 
> composer is typing to the screen in unison to my keystrokes, where with 
> the older version it often went away for 10 or more seconds at a time, 
> then displayed the last sentence I had typed in one (or 2 sometimes) 
> swell foop.  Now I can back up and correct a typo in real time whereas 
> before, it was often faster if the typo was half a line back, and the key 
> repeat so darned slow it was often over a second per character cell 
> moved, to go grab the mouse, position the cursor on the typo, click, wait 
> for the indicator to move, and then fix it.  Typing is now pleasurable 
> again.  Key repeats seem to remain at the set in the bios key repeat 
> speed.  Amazing.  I do believe you have given me back my machine.

What are the specs on your hardware?


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
  2007-03-04 23:13   ` Willy Tarreau
  2007-03-04 23:58     ` Con Kolivas
@ 2007-03-05  1:09     ` Gene Heskett
  1 sibling, 0 replies; 69+ messages in thread
From: Gene Heskett @ 2007-03-05  1:09 UTC (permalink / raw)
  To: linux-kernel; +Cc: Willy Tarreau, Con Kolivas, Al Boldi, ck list

On Sunday 04 March 2007, Willy Tarreau wrote:
>On Mon, Mar 05, 2007 at 08:49:29AM +1100, Con Kolivas wrote:
>(...)
>
>> > That's just what it did, but when you "nice make -j4", things
>> > (gears) start to stutter.  Is that due to the staircase?
>>
>> gears isn't an interactive task. Apart from using it as a background
>> load to check for starvation because it loads up the cpu fully (which
>> a gpu intensive but otherwise simple app like this should _not_ do)
>> graphics card drivers and interrupts and so on, I wouldn't put much
>> credence on gears as anything else. However I suspect that gears will
>> still get a fair share of the cpu on RSDL which almost never happens
>> on any other scheduler.
>
>Con,
>
>I've now given it a try with HZ=250 on my dual-athlon. It works
> beautifully. I also quickly checked that playing mp3 doesn't skip
> during make -j4, and that gears runs fairly smoothly, since those are
> the references people often use.
>
>But with real work, it's excellent too. When I saturate my CPUs by
> injecting HTTP traffic on haproxy, the load is stable and the command
> line perfectly responsive, while in the past the load would oscillate
> and the command line sometimes stopped to respond for a few seconds.
>
>I've also launched my scheddos program (you may remember, the one we did
> a few experiments with). I could not cause any freeze at all. Plain
> 2.6.20 had already improved a lot in this area, but above 4
> processes/CPU, occasional short freezes did still occur. This time,
> even at 100 processes, the system was rather slow (of course!) but just
> as expected, and nothing more.
>
>I also tried the good old "dd if=/dev/zero bs=1|...|dd bs=1
> of=/dev/null" and it did not cause any trouble.
>
>I will boot 2.6 slightly more often to test the code under various
> conditions, and I will recommend it to a few people I know who tend to
> switch back to 2.4 after one day full of 2.6 jerkiness.
>
>Overall, you have done a great job !
>
>I hope that more people will give it a try, first to help find possible
>remaining bugs, and to pronounce in favour of its inclusion in mainline.
>
>Cheers,
>Willy
>
+1 for mainline, this is a HUGE improvement in usability.  End of 
discussion AFAIK.



-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
  2007-03-04 23:13   ` Willy Tarreau
@ 2007-03-04 23:58     ` Con Kolivas
  2007-03-05  1:09     ` Gene Heskett
  1 sibling, 0 replies; 69+ messages in thread
From: Con Kolivas @ 2007-03-04 23:58 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: Al Boldi, ck list, linux-kernel

On Monday 05 March 2007 10:13, Willy Tarreau wrote:
> I've now given it a try with HZ=250 on my dual-athlon.

Great, thanks. The HZ should make very little difference, except for slightly 
lower latencies as you increase the HZ.

> It works 
> beautifully. I also quickly checked that playing mp3 doesn't skip during
> make -j4, and that gears runs fairly smoothly, since those are the
> references people often use.

Good. I obviously need to ensure that these remain well managed since I, for 
one, never want linux to suck on the desktop. In your case, I was more 
interested in the real world problems you were having, so the rest of your 
report is even more important to me.

> But with real work, it's excellent too. When I saturate my CPUs by
> injecting HTTP traffic on haproxy, the load is stable and the command line
> perfectly responsive, while in the past the load would oscillate and the
> command line sometimes stopped to respond for a few seconds.
>
> I've also launched my scheddos program (you may remember, the one we did a
> few experiments with). I could not cause any freeze at all. Plain 2.6.20
> had already improved a lot in this area, but above 4 processes/CPU,
> occasional short freezes did still occur. This time, even at 100 processes,
> the system was rather slow (of course!) but just as expected, and nothing
> more.
>
> I also tried the good old "dd if=/dev/zero bs=1|...|dd bs=1 of=/dev/null"
> and it did not cause any trouble.
>
> I will boot 2.6 slightly more often to test the code under various
> conditions, and I will recommend it to a few people I know who tend to
> switch back to 2.4 after one day full of 2.6 jerkiness.
>
> Overall, you have done a great job !

Thank you very much. I think you have nicely pointed out some of the real 
world advantages of this design. Those testcases are good for testing average 
latency, fairness, and the consistency of both under various loads, so 
they are excellent real world examples. If you extrapolate those facts to 
server loads you can understand how the fairness and latency can translate 
into advantages anywhere.

> I hope that more people will give it a try, first to help find possible
> remaining bugs, and to pronounce in favour of its inclusion in mainline.

Thanks. I've already had a hundred or so people testing this code (thanks to 
the -ck mailing list people!) without any recent bugs prior to announcing it 
to lkml so I believe it is quite safe for testing.

-- 
-ck

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
  2007-03-04 21:49 ` Con Kolivas
       [not found]   ` <45EB45F7.3050208@simon.arlott.org.uk>
@ 2007-03-04 23:13   ` Willy Tarreau
  2007-03-04 23:58     ` Con Kolivas
  2007-03-05  1:09     ` Gene Heskett
  1 sibling, 2 replies; 69+ messages in thread
From: Willy Tarreau @ 2007-03-04 23:13 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Al Boldi, ck list, linux-kernel

On Mon, Mar 05, 2007 at 08:49:29AM +1100, Con Kolivas wrote:
(...)
> > That's just what it did, but when you "nice make -j4", things (gears) start
> > to stutter.  Is that due to the staircase?
> 
> gears isn't an interactive task. Apart from using it as a background load to 
> check for starvation because it loads up the cpu fully (which a gpu intensive 
> but otherwise simple app like this should _not_ do) graphics card drivers and 
> interrupts and so on, I wouldn't put much credence on gears as anything else. 
> However I suspect that gears will still get a fair share of the cpu on RSDL 
> which almost never happens on any other scheduler.

Con,

I've now given it a try with HZ=250 on my dual-athlon. It works beautifully.
I also quickly checked that playing mp3 doesn't skip during make -j4, and
that gears runs fairly smoothly, since those are the references people often
use.

But with real work, it's excellent too. When I saturate my CPUs by injecting
HTTP traffic on haproxy, the load is stable and the command line perfectly
responsive, while in the past the load would oscillate and the command line
sometimes stopped responding for a few seconds.

I've also launched my scheddos program (you may remember, the one we did a
few experiments with). I could not cause any freeze at all. Plain 2.6.20 had
already improved a lot in this area, but above 4 processes/CPU, occasional
short freezes did still occur. This time, even at 100 processes, the system
was rather slow (of course!) but just as expected, and nothing more.

I also tried the good old "dd if=/dev/zero bs=1|...|dd bs=1 of=/dev/null" and
it did not cause any trouble.

I will boot 2.6 slightly more often to test the code under various conditions,
and I will recommend it to a few people I know who tend to switch back to 2.4
after one day full of 2.6 jerkiness.

Overall, you have done a great job !

I hope that more people will give it a try, first to help find possible
remaining bugs, and then to speak in favour of its inclusion in mainline.

Cheers,
Willy


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu  scheduler
       [not found]   ` <45EB45F7.3050208@simon.arlott.org.uk>
@ 2007-03-04 22:27     ` Con Kolivas
  2007-03-05 18:29       ` Simon Arlott
  0 siblings, 1 reply; 69+ messages in thread
From: Con Kolivas @ 2007-03-04 22:27 UTC (permalink / raw)
  To: Simon Arlott; +Cc: ck list, linux-kernel

On Monday 05 March 2007 09:19, Simon Arlott wrote:
> On 04/03/07 21:49, Con Kolivas wrote:
> > On Monday 05 March 2007 07:35, Al Boldi wrote:
> >> Con Kolivas wrote:
> >>> This means that if you heavily load up your machine without the use of
> >>> 'nice' then your interactive tasks _will_ slow down proportionately to
> >>> the amount of cpu you use. So doing make -j4 for example will make any
> >>> other task started in that presence run precisely 1/5th speed, but they
> >>> will still be responsive, have low latency (and audio shouldn't skip
> >>> for example).
> >>
> >> That's just what it did, but when you "nice make -j4", things (gears)
> >> start to stutter.  Is that due to the staircase?
> >
> > gears isn't an interactive task. Apart from using it as a background load
> > to check for starvation because it loads up the cpu fully (which a gpu
> > intensive but otherwise simple app like this should _not_ do) graphics
> > card drivers and interrupts and so on, I wouldn't put much credence on
> > gears as anything else. However I suspect that gears will still get a
> > fair share of the cpu on RSDL which almost never happens on any other
> > scheduler.
>
> If I run glxgears, thunderbird/firefox become really slow to
> respond/display and cpu usage isn't even at 100%. I had thunderbird lagging
> on keyboard character repeat earlier but can't reproduce that now even with
> glxgears - however firefox lags really badly on keyboard input with
> glxgears running.

Hi Simon

You're hitting a nasty udev bug here that is unrelated to the cpu scheduler 
and almost certainly responsible for your bad behaviour. 

[   39.314953] =================================
[   39.315068] [ INFO: inconsistent lock state ]
[   39.315120] 2.6.21-rc2-git #161
[   39.315167] ---------------------------------
[   39.315217] inconsistent {softirq-on-R} -> {in-softirq-W} usage.
[   39.315271] udevd/1424 [HC0[0]:SC1[2]:HE1:SE0] takes:
[   39.315323]  (&ndev->lock){-+-?}, at: [<b04a57c4>] 
ipv6_add_addr+0x164/0x1e0
[   39.315525] {softirq-on-R} state was registered at:
[   39.315576]   [<b013f8d2>] __lock_acquire+0x622/0xbb0
[   39.315731]   [<b01402e2>] lock_acquire+0x62/0x80
[   39.315883]   [<b050fc55>] _read_lock+0x35/0x50
[   39.316043]   [<b050c1e0>] sctp_v6_copy_addrlist+0x30/0xc0
[   39.316197]   [<b04f9cd2>] sctp_get_local_addr_list+0x32/0x60
[   39.316358]   [<b06de1f0>] sctp_init+0x2c0/0x550
[   39.316516]   [<b06bcc55>] do_initcalls+0x35/0x120
[   39.316687]   [<b06bcd5c>] do_basic_setup+0x1c/0x30
[   39.316840]   [<b06bcdbf>] init+0x3f/0x90
[   39.316994]   [<b0104d87>] kernel_thread_helper+0x7/0x10
[   39.317150]   [<ffffffff>] 0xffffffff
[   39.317300] irq event stamp: 1976
[   39.317348] hardirqs last  enabled at (1976): [<b014087e>] 
debug_check_no_locks_freed+0xbe/0xd0
[   39.317481] hardirqs last disabled at (1975): [<b01407ea>] 
debug_check_no_locks_freed+0x2a/0xd0
[   39.317618] softirqs last  enabled at (1808): [<b0435557>] 
release_sock+0x57/0xb0
[   39.317751] softirqs last disabled at (1853): [<b0125277>] 
do_softirq+0x47/0x50
[   39.317884] 
[   39.317885] other info that might help us debug this:
[   39.317978] 3 locks held by udevd/1424:
[   39.318026]  #0:  (&mm->mmap_sem){----}, at: [<b0116ae6>] 
do_page_fault+0xb6/0x5a0
[   39.318263]  #1:  (&npinfo->poll_lock){-+..}, at: [<b043cf28>] 
net_rx_action+0x158/0x1a0
[   39.318500]  #2:  (&tp->rx_lock){-+..}, at: [<b034ef7e>] 
rtl8139_poll+0x3e/0x120
[   39.318742] 
[   39.318743] stack backtrace:
[   39.318833]  [<b0104f4a>] show_trace_log_lvl+0x1a/0x30
[   39.318924]  [<b0104f72>] show_trace+0x12/0x20
[   39.319009]  [<b0105086>] dump_stack+0x16/0x20
[   39.319094]  [<b013e4d1>] print_usage_bug+0x151/0x160
[   39.319180]  [<b013e9a3>] mark_lock+0x4c3/0x580
[   39.319265]  [<b013f8b0>] __lock_acquire+0x600/0xbb0
[   39.319354]  [<b01402e2>] lock_acquire+0x62/0x80
[   39.319439]  [<b050fff5>] _write_lock+0x35/0x50
[   39.319526]  [<b04a57c4>] ipv6_add_addr+0x164/0x1e0
[   39.319612]  [<b04a76a2>] addrconf_prefix_rcv+0x2b2/0x560
[   39.319707]  [<b04b39b1>] ndisc_router_discovery+0x431/0x600
[   39.319798]  [<b04b4552>] ndisc_rcv+0x92/0xc0
[   39.319883]  [<b04b98b0>] icmpv6_rcv+0x2b0/0x2d0
[   39.319971]  [<b04a4975>] ip6_input+0x1a5/0x3c0
[   39.320057]  [<b04a4f3a>] ip6_mc_input+0x3a/0x60
[   39.320145]  [<b04a45ab>] ipv6_rcv+0x15b/0x380
[   39.320231]  [<b043cc31>] netif_receive_skb+0x271/0x2f0
[   39.320318]  [<b034ebb2>] rtl8139_rx+0x142/0x340
[   39.320405]  [<b034ef99>] rtl8139_poll+0x59/0x120
[   39.320494]  [<b043ce59>] net_rx_action+0x89/0x1a0
[   39.320580]  [<b01251d2>] __do_softirq+0x52/0xb0
[   39.320666]  [<b0125277>] do_softirq+0x47/0x50
[   39.320754]  [<b0125335>] irq_exit+0x75/0x80
[   39.320843]  [<b0106931>] do_IRQ+0x41/0x80
[   39.320929]  [<b0104be2>] common_interrupt+0x2e/0x34
[   39.321016]  [<b0510634>] error_code+0x74/0x7c
[   39.321103]  =======================

Also this patch was actually for 2.6.20 and you seem to have applied it to 
2.6.21-rc2. I haven't even checked that it cleanly applies to that kernel.

-- 
-ck

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
  2007-03-04 20:35 Al Boldi
@ 2007-03-04 21:49 ` Con Kolivas
       [not found]   ` <45EB45F7.3050208@simon.arlott.org.uk>
  2007-03-04 23:13   ` Willy Tarreau
  2007-03-06  8:42 ` [ck] " Xavier Bestel
  1 sibling, 2 replies; 69+ messages in thread
From: Con Kolivas @ 2007-03-04 21:49 UTC (permalink / raw)
  To: Al Boldi, ck list; +Cc: linux-kernel

On Monday 05 March 2007 07:35, Al Boldi wrote:
> Con Kolivas wrote:
> > > >> >> >This message is to announce the first general public release of
> > > >> >> > the "Rotating Staircase DeadLine" cpu scheduler.
>
> Thanks a lot!

You're welcome.
>
> > Just to make it clear. The purpose of this scheduler is at all costs to
> > maintain absolute fairness no matter what type of load it is put under.
>
> Great!
>
> > This means that if you heavily load up your machine without the use of
> > 'nice' then your interactive tasks _will_ slow down proportionately to
> > the amount of cpu you use. So doing make -j4 for example will make any
> > other task started in that presence run precisely 1/5th speed, but they
> > will still be responsive, have low latency (and audio shouldn't skip for
> > example).
>
> That's just what it did, but when you "nice make -j4", things (gears) start
> to stutter.  Is that due to the staircase?

gears isn't an interactive task. Apart from using it as a background load to 
check for starvation, because it fully loads up the cpu (which a gpu intensive 
but otherwise simple app like this should _not_ do), the graphics card drivers, 
interrupts and so on, I wouldn't put much credence in gears as anything else. 
However I suspect that gears will still get a fair share of the cpu on RSDL, 
which almost never happens on any other scheduler.

P.S. Al, please don't drop ccs from the original thread.

-- 
-ck

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
@ 2007-03-04 20:35 Al Boldi
  2007-03-04 21:49 ` Con Kolivas
  2007-03-06  8:42 ` [ck] " Xavier Bestel
  0 siblings, 2 replies; 69+ messages in thread
From: Al Boldi @ 2007-03-04 20:35 UTC (permalink / raw)
  To: linux-kernel

Con Kolivas wrote:
> > >> >> >This message is to announce the first general public release of
> > >> >> > the "Rotating Staircase DeadLine" cpu scheduler.

Thanks a lot!

> Just to make it clear. The purpose of this scheduler is at all costs to
> maintain absolute fairness no matter what type of load it is put under.

Great!

> This means that if you heavily load up your machine without the use of
> 'nice' then your interactive tasks _will_ slow down proportionately to the
> amount of cpu you use. So doing make -j4 for example will make any other
> task started in that presence run precisely 1/5th speed, but they will
> still be responsive, have low latency (and audio shouldn't skip for
> example).

That's just what it did, but when you "nice make -j4", things (gears) start 
to stutter.  Is that due to the staircase?


Thanks!

--
Al


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu  scheduler
  2007-03-04 13:49           ` Con Kolivas
  2007-03-04 14:11             ` Gene Heskett
@ 2007-03-04 14:36             ` Willy Tarreau
  2007-03-04 16:08               ` [ck] " jos poortvliet
  1 sibling, 1 reply; 69+ messages in thread
From: Willy Tarreau @ 2007-03-04 14:36 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Gene Heskett, linux-kernel, ck list

Hi Con !

On Mon, Mar 05, 2007 at 12:49:49AM +1100, Con Kolivas wrote:
> On Monday 05 March 2007 00:25, Gene Heskett wrote:
> > On Sunday 04 March 2007, Con Kolivas wrote:
> > >On Sunday 04 March 2007 23:24, Gene Heskett wrote:
> > >> On Sunday 04 March 2007, Con Kolivas wrote:
> > >> >On Sunday 04 March 2007 22:08, Gene Heskett wrote:
> > >> >> On Sunday 04 March 2007, Con Kolivas wrote:
> > >> >> >This message is to announce the first general public release of
> > >> >> > the "Rotating Staircase DeadLine" cpu scheduler.
> > >> >>
> > >> >> I assume to test this, we select the deadline scheduler?
> > >> >
> > >> >No, only the "deadline" in the name is shared. This is a cpu process
> > >> > scheduler whereas the deadline scheduler you're thinking of is an
> > >> > I/O scheduler. To test this you just patch it in and it replaces the
> > >> > current mainline cpu scheduler (the same way the staircase cpu
> > >> > scheduler in -ck replaces it).
> > >>
> > >> Oh, then I tried to put the -ck1 patch on top of that, and blew the
> > >> tree up.  I'd built it the first time with the deadline scheduler as
> > >> the default while I was waiting on your reply.
> > >
> > >Yes, sorry. This is mutually exclusive with the -ck1 patch. It is a
> > > standalone piece of code.
> >
> > I just this instant got it booted, what with building a driver for nvidia
> > and all.  I'll let you know what I think in a few hours after I've gotten
> > a feel for it.
> 
> Great, thanks.
> 
> Just to make it clear. The purpose of this scheduler is at all costs to 
> maintain absolute fairness no matter what type of load it is put under. This 
> means that if you heavily load up your machine without the use of 'nice' then 
> your interactive tasks _will_ slow down proportionately to the amount of cpu 
> you use. So doing make -j4 for example will make any other task started in 
> that presence run precisely 1/5th speed, but they will still be responsive,
> have low latency (and audio shouldn't skip for example). 
> 
> There will be times when the mainline scheduler feels more interactive than 
> this scheduler, and that is because it has significant unfairness granted 
> towards interactive tasks. This degree of unfairness in an effort to maintain 
> interactivity has been criticised and causes problems in certain environments 
> with both loss of fairness, relative starvation and is not entirely 
> predictable. 
> 
> This was designed to be robust for any application since linux demands a 
> general purpose scheduler design, while preserving interactivity, instead of 
> optimising for one particular end use.

Well, I haven't tested it yet, but your design choices please me. As you
know, I've been one of those encountering big starvation problems with
the original scheduler, making 2.6 unusable for me in many situations. I
welcome your work and want to thank you for the time you spend trying to
fix it.

Keep up the good work,
Willy

PS: I've looked at your graphs, I hope you're on the way to something really
    better than the 21 first 2.6 releases !


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
  2007-03-04 13:49           ` Con Kolivas
@ 2007-03-04 14:11             ` Gene Heskett
  2007-03-05  2:31               ` Zwane Mwaikambo
  2007-03-04 14:36             ` Willy Tarreau
  1 sibling, 1 reply; 69+ messages in thread
From: Gene Heskett @ 2007-03-04 14:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: Con Kolivas, ck list

On Sunday 04 March 2007, Con Kolivas wrote:
>On Monday 05 March 2007 00:25, Gene Heskett wrote:
>> On Sunday 04 March 2007, Con Kolivas wrote:
>> >On Sunday 04 March 2007 23:24, Gene Heskett wrote:
>> >> On Sunday 04 March 2007, Con Kolivas wrote:
>> >> >On Sunday 04 March 2007 22:08, Gene Heskett wrote:
>> >> >> On Sunday 04 March 2007, Con Kolivas wrote:
>> >> >> >This message is to announce the first general public release of
>> >> >> > the "Rotating Staircase DeadLine" cpu scheduler.
>> >> >>
>> >> >> I assume to test this, we select the deadline scheduler?
>> >> >
>> >> >No, only the "deadline" in the name is shared. This is a cpu
>> >> > process scheduler whereas the deadline scheduler you're thinking
>> >> > of is an I/O scheduler. To test this you just patch it in and it
>> >> > replaces the current mainline cpu scheduler (the same way the
>> >> > staircase cpu scheduler in -ck replaces it).
>> >>
>> >> Oh, then I tried to put the -ck1 patch on top of that, and blew the
>> >> tree up.  I'd built it the first time with the deadline scheduler
>> >> as the default while I was waiting on your reply.
>> >
>> >Yes, sorry. This is mutually exclusive with the -ck1 patch. It is a
>> > standalone piece of code.
>>
>> I just this instant got it booted, what with building a driver for
>> nvidia and all.  I'll let you know what I think in a few hours after
>> I've gotten a feel for it.
>
>Great, thanks.
>
>Just to make it clear. The purpose of this scheduler is at all costs to
>maintain absolute fairness no matter what type of load it is put under.
> This means that if you heavily load up your machine without the use of
> 'nice' then your interactive tasks _will_ slow down proportionately to
> the amount of cpu you use. So doing make -j4 for example will make any
> other task started in that presence run precisely 1/5th speed, but they 
> will still be responsive, have low latency (and audio shouldn't skip
> for example).
>
>There will be times when the mainline scheduler feels more interactive
> than this scheduler, and that is because it has significant unfairness
> granted towards interactive tasks. This degree of unfairness in an
> effort to maintain interactivity has been criticised and causes
> problems in certain environments with both loss of fairness, relative
> starvation and is not entirely predictable.

Well, in 20 minutes of playing, I am so far VERY impressed, the kmail 
composer is typing to the screen in unison to my keystrokes, where with 
the older version it often went away for 10 or more seconds at a time, 
then displayed the last sentence I had typed in one (or 2 sometimes) 
swell foop.  Now I can back up and correct a typo in real time whereas 
before, it was often faster if the typo was half a line back, and the key 
repeat so darned slow it was often over a second per character cell 
moved, to go grab the mouse, position the cursor on the typo, click, wait 
for the indicator to move, and then fix it.  Typing is now pleasurable 
again.  Key repeats seem to remain at the key repeat speed set in the bios.  
Amazing.  I do believe you have given me back my machine.

I may find something to squawk about in due time (I think that Murphy guy's 
laws may have a corollary on that subject), but it sure feels good right 
now.

>This was designed to be robust for any application since linux demands a
>general purpose scheduler design, while preserving interactivity,
> instead of optimising for one particular end use.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive  cpu scheduler
  2007-03-04  7:45 ` [ck] " Con Kolivas
@ 2007-03-04 14:04   ` Con Kolivas
  0 siblings, 0 replies; 69+ messages in thread
From: Con Kolivas @ 2007-03-04 14:04 UTC (permalink / raw)
  To: ck; +Cc: linux kernel mailing list

[-- Attachment #1: Type: text/plain, Size: 1612 bytes --]

On Sunday 04 March 2007 18:45, Con Kolivas wrote:
> On Sunday 04 March 2007 18:00, Con Kolivas wrote:
> > This message is to announce the first general public release of the
> > "Rotating Staircase DeadLine" cpu scheduler.
> >
> > Based on previous work from the staircase cpu scheduler I set out to
> > design, from scratch, a new scheduling policy design which satisfies
> > every requirement for SCHED_NORMAL (otherwise known as SCHED_OTHER) task
> > management.
> >
> > Available for download are:
> >
> >  A full rollup of the patch for 2.6.20:
> > http://ck.kolivas.org/patches/staircase-deadline/sched-rsdl-0.26.patch
>
> It's probably worth mentioning that this scheduler shows a not
> insignificant improvement in the mysql test case that recently received a
> lot of publicity. Why exactly that's the case I'm not sure but it may help
> track down what is actually responsible for the performance drop off as
> well. Note that the SMP balancing of this cpu scheduler is essentially
> unchanged from the mainline one.

Only preliminary additional benchmarking has been done so far on RSDL, and so 
far the results are equal to or slightly better than mainline on scalability 
grounds.

These are the sysbench graphs, included for completeness, that Nishanth 
Aravamudan produced. We were actually trying to track down the cause of the 
mysql performance problem and I suggested trying different cpu schedulers as 
well. Needless to say I'm not unhappy with the results; they haven't told us 
exactly where the problem lies, but they certainly add evidence that it may 
be a scheduling issue.

-- 
-ck

[-- Attachment #2: transactions.png --]
[-- Type: image/png, Size: 7950 bytes --]

[-- Attachment #3: idle.png --]
[-- Type: image/png, Size: 6317 bytes --]

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu  scheduler
  2007-03-04 13:25         ` Gene Heskett
@ 2007-03-04 13:49           ` Con Kolivas
  2007-03-04 14:11             ` Gene Heskett
  2007-03-04 14:36             ` Willy Tarreau
  0 siblings, 2 replies; 69+ messages in thread
From: Con Kolivas @ 2007-03-04 13:49 UTC (permalink / raw)
  To: Gene Heskett; +Cc: linux-kernel, ck list

On Monday 05 March 2007 00:25, Gene Heskett wrote:
> On Sunday 04 March 2007, Con Kolivas wrote:
> >On Sunday 04 March 2007 23:24, Gene Heskett wrote:
> >> On Sunday 04 March 2007, Con Kolivas wrote:
> >> >On Sunday 04 March 2007 22:08, Gene Heskett wrote:
> >> >> On Sunday 04 March 2007, Con Kolivas wrote:
> >> >> >This message is to announce the first general public release of
> >> >> > the "Rotating Staircase DeadLine" cpu scheduler.
> >> >>
> >> >> I assume to test this, we select the deadline scheduler?
> >> >
> >> >No, only the "deadline" in the name is shared. This is a cpu process
> >> > scheduler whereas the deadline scheduler you're thinking of is an
> >> > I/O scheduler. To test this you just patch it in and it replaces the
> >> > current mainline cpu scheduler (the same way the staircase cpu
> >> > scheduler in -ck replaces it).
> >>
> >> Oh, then I tried to put the -ck1 patch on top of that, and blew the
> >> tree up.  I'd built it the first time with the deadline scheduler as
> >> the default while I was waiting on your reply.
> >
> >Yes, sorry. This is mutually exclusive with the -ck1 patch. It is a
> > standalone piece of code.
>
> I just this instant got it booted, what with building a driver for nvidia
> and all.  I'll let you know what I think in a few hours after I've gotten
> a feel for it.

Great, thanks.

Just to make it clear. The purpose of this scheduler is at all costs to 
maintain absolute fairness no matter what type of load it is put under. This 
means that if you heavily load up your machine without the use of 'nice' then 
your interactive tasks _will_ slow down proportionately to the amount of cpu 
you use. So doing make -j4 for example will make any other task started in 
that presence run precisely 1/5th speed, but they will still be responsive, 
have low latency (and audio shouldn't skip for example). 

There will be times when the mainline scheduler feels more interactive than 
this scheduler, and that is because it has significant unfairness granted 
towards interactive tasks. This degree of unfairness in an effort to maintain 
interactivity has been criticised and causes problems in certain environments 
with both loss of fairness, relative starvation and is not entirely 
predictable. 

This was designed to be robust for any application since linux demands a 
general purpose scheduler design, while preserving interactivity, instead of 
optimising for one particular end use.

-- 
-ck

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
  2007-03-04 12:46       ` Con Kolivas
@ 2007-03-04 13:25         ` Gene Heskett
  2007-03-04 13:49           ` Con Kolivas
  0 siblings, 1 reply; 69+ messages in thread
From: Gene Heskett @ 2007-03-04 13:25 UTC (permalink / raw)
  To: linux-kernel; +Cc: Con Kolivas, ck list

On Sunday 04 March 2007, Con Kolivas wrote:
>On Sunday 04 March 2007 23:24, Gene Heskett wrote:
>> On Sunday 04 March 2007, Con Kolivas wrote:
>> >On Sunday 04 March 2007 22:08, Gene Heskett wrote:
>> >> On Sunday 04 March 2007, Con Kolivas wrote:
>> >> >This message is to announce the first general public release of
>> >> > the "Rotating Staircase DeadLine" cpu scheduler.
>> >>
>> >> I assume to test this, we select the deadline scheduler?
>> >
>> >No, only the "deadline" in the name is shared. This is a cpu process
>> > scheduler whereas the deadline scheduler you're thinking of is an
>> > I/O scheduler. To test this you just patch it in and it replaces the
>> > current mainline cpu scheduler (the same way the staircase cpu
>> > scheduler in -ck replaces it).
>>
>> Oh, then I tried to put the -ck1 patch on top of that, and blew the
>> tree up.  I'd built it the first time with the deadline scheduler as
>> the default while I was waiting on your reply.
>
>Yes, sorry. This is mutually exclusive with the -ck1 patch. It is a
> standalone piece of code.

I just this instant got it booted, what with building a driver for nvidia 
and all.  I'll let you know what I think in a few hours after I've gotten 
a feel for it.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu  scheduler
  2007-03-04 12:24     ` Gene Heskett
@ 2007-03-04 12:46       ` Con Kolivas
  2007-03-04 13:25         ` Gene Heskett
  0 siblings, 1 reply; 69+ messages in thread
From: Con Kolivas @ 2007-03-04 12:46 UTC (permalink / raw)
  To: Gene Heskett; +Cc: linux-kernel, ck list

On Sunday 04 March 2007 23:24, Gene Heskett wrote:
> On Sunday 04 March 2007, Con Kolivas wrote:
> >On Sunday 04 March 2007 22:08, Gene Heskett wrote:
> >> On Sunday 04 March 2007, Con Kolivas wrote:
> >> >This message is to announce the first general public release of the
> >> > "Rotating Staircase DeadLine" cpu scheduler.
> >>
> >> I assume to test this, we select the deadline scheduler?
> >
> >No, only the "deadline" in the name is shared. This is a cpu process
> > scheduler whereas the deadline scheduler you're thinking of is an I/O
> > scheduler. To test this you just patch it in and it replaces the
> > current mainline cpu scheduler (the same way the staircase cpu
> > scheduler in -ck replaces it).
>
> Oh, then I tried to put the -ck1 patch on top of that, and blew the tree
> up.  I'd built it the first time with the deadline scheduler as the
> default while I was waiting on your reply.

Yes, sorry. This is mutually exclusive with the -ck1 patch. It is a standalone 
piece of code.

-- 
-ck

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
  2007-03-04 11:47   ` Con Kolivas
@ 2007-03-04 12:24     ` Gene Heskett
  2007-03-04 12:46       ` Con Kolivas
  0 siblings, 1 reply; 69+ messages in thread
From: Gene Heskett @ 2007-03-04 12:24 UTC (permalink / raw)
  To: linux-kernel; +Cc: Con Kolivas, ck list

On Sunday 04 March 2007, Con Kolivas wrote:
>On Sunday 04 March 2007 22:08, Gene Heskett wrote:
>> On Sunday 04 March 2007, Con Kolivas wrote:
>> >This message is to announce the first general public release of the
>> > "Rotating Staircase DeadLine" cpu scheduler.
>>
>> I assume to test this, we select the deadline scheduler?
>
>No, only the "deadline" in the name is shared. This is a cpu process
> scheduler whereas the deadline scheduler you're thinking of is an I/O
> scheduler. To test this you just patch it in and it replaces the
> current mainline cpu scheduler (the same way the staircase cpu
> scheduler in -ck replaces it).

Oh, then I tried to put the -ck1 patch on top of that, and blew the tree 
up.  I'd built it the first time with the deadline scheduler as the 
default while I was waiting on your reply.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu  scheduler
  2007-03-04 11:08 ` Gene Heskett
@ 2007-03-04 11:47   ` Con Kolivas
  2007-03-04 12:24     ` Gene Heskett
  0 siblings, 1 reply; 69+ messages in thread
From: Con Kolivas @ 2007-03-04 11:47 UTC (permalink / raw)
  To: Gene Heskett; +Cc: linux-kernel, ck list

On Sunday 04 March 2007 22:08, Gene Heskett wrote:
> On Sunday 04 March 2007, Con Kolivas wrote:
> >This message is to announce the first general public release of the
> > "Rotating Staircase DeadLine" cpu scheduler.
>
> I assume to test this, we select the deadline scheduler?

No, only the "deadline" in the name is shared. This is a cpu process scheduler 
whereas the deadline scheduler you're thinking of is an I/O scheduler. To 
test this you just patch it in and it replaces the current mainline cpu 
scheduler (the same way the staircase cpu scheduler in -ck replaces it).

-- 
-ck

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
  2007-03-04  7:00 Con Kolivas
  2007-03-04  7:45 ` [ck] " Con Kolivas
@ 2007-03-04 11:08 ` Gene Heskett
  2007-03-04 11:47   ` Con Kolivas
  2007-03-05 21:52 ` Con Kolivas
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 69+ messages in thread
From: Gene Heskett @ 2007-03-04 11:08 UTC (permalink / raw)
  To: linux-kernel; +Cc: Con Kolivas, ck list

On Sunday 04 March 2007, Con Kolivas wrote:
>This message is to announce the first general public release of the
> "Rotating Staircase DeadLine" cpu scheduler.

I assume to test this, we select the deadline scheduler?

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
@ 2007-03-04  7:00 Con Kolivas
  2007-03-04  7:45 ` [ck] " Con Kolivas
                   ` (4 more replies)
  0 siblings, 5 replies; 69+ messages in thread
From: Con Kolivas @ 2007-03-04  7:00 UTC (permalink / raw)
  To: linux kernel mailing list, ck list

This message is to announce the first general public release of the "Rotating 
Staircase DeadLine" cpu scheduler. 

Based on previous work from the staircase cpu scheduler I set out to design, 
from scratch, a new scheduling policy design which satisfies every 
requirement for SCHED_NORMAL (otherwise known as SCHED_OTHER) task management.

Available for download are:

 A full rollup of the patch for 2.6.20:
http://ck.kolivas.org/patches/staircase-deadline/sched-rsdl-0.26.patch

 Split patches for 2.6.20(which will follow this email):
http://ck.kolivas.org/patches/staircase-deadline/split-out/

 The readme (which will also constitute the rest of this email):
http://ck.kolivas.org/patches/staircase-deadline/rsdl_scheduler.readme


The following readme is also included as documentation in 
Documentation/sched-design.txt


Rotating Staircase Deadline cpu scheduler policy
================================================

Design summary
==============

A novel design which incorporates a foreground-background descending priority
system (the staircase) with runqueue managed minor and major epochs (rotation
and deadline).


Features
========

A starvation free, strict fairness O(1) scalable design with interactivity
as good as the above restrictions can provide. There is no interactivity
estimator, no sleep/run measurements and only simple fixed accounting.
The design and accounting are strict enough that task behaviour
can be modelled and maximum scheduling latencies can be predicted by
the virtual deadline mechanism that manages runqueues. The prime concern
in this design is to maintain fairness at all costs, as determined by nice
level, yet to provide as good interactivity as can be allowed within the
constraints of strict fairness.


Design description
==================

RSDL works on the principle of giving each task a quota of runtime at its
static priority level (ie. its nice level) and at every priority level below
that. When a task is queued, the cpu it is queued onto also keeps a record
of that quota. If the task uses up its quota it is demoted one priority
level. Likewise, if the cpu notices that a full quota has been used up at a
given priority level, it pushes everything remaining at that priority level
down to the next lowest priority level. Once a task has consumed its runtime
quota at every priority level, it is queued on the "expired" array. When no
tasks with quota remain, the expired array is activated and fresh quotas are
handed out. This is all done in O(1).
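
As a rough illustration of that descent (a toy model, not code from the
patch), a single nice 0 task's progress through one major epoch looks
something like this:

/* Toy model of one task descending the staircase during a major epoch.
 * Purely illustrative -- names and the level count come from this
 * readme, not from the patch itself. */
#include <stdio.h>

#define RR_INTERVAL 6			/* ms */

int main(void)
{
	int static_prio = 0;		/* a nice 0 task */
	int levels = 19 - static_prio;	/* levels it may run at per epoch */
	int level;

	for (level = 0; level < levels; level++)
		printf("run up to %dms at nice %d, then demote\n",
		       RR_INTERVAL, static_prio + level);
	printf("quota used at every level: requeue on the expired array\n");
	return 0;
}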


Design details
==============

Each cpu has its own runqueue which micromanages its own epochs, and each
task keeps a record of its own entitlement of cpu time. Most of the rest
of these details apply to non-realtime tasks, as rt task management is
straightforward.

Each runqueue keeps a record of what major epoch it is up to in the
rq->prio_rotation field which is incremented on each major epoch. It also
keeps a record of quota available to each priority value valid for that
major epoch in rq->prio_quota[].

Each task keeps a record of what major runqueue epoch it was last running
on in p->rotation. It also keeps a record of what priority levels it has
already been allocated quota from during this epoch in a bitmap p->bitmap.

The only tunable, from which all other details are derived, is RR_INTERVAL.
This is set to 6ms (the minimum at 1000HZ; it is higher at other HZ values).

All tasks are initially given a quota based on RR_INTERVAL. This is equal to
RR_INTERVAL between nice values of 0 and 19, and progressively larger for
nice values from -1 to -20. This is assigned to p->quota and only changes
with changes in nice level.
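
As an illustration, a quota assignment consistent with this description and
with the nice -20 example further down (21 * RR_INTERVAL) could look like
the following; the exact scaling in the patch may differ:

/* Toy p->quota assignment inferred from this readme: RR_INTERVAL for
 * nice 0..19, progressively larger below nice 0 so that nice -20 gets
 * 21 * RR_INTERVAL.  Not taken from the actual patch. */
#define RR_INTERVAL_MS 6

static unsigned int task_quota_ms(int nice)
{
	if (nice >= 0)
		return RR_INTERVAL_MS;
	return (1 - nice) * RR_INTERVAL_MS;	/* nice -1 -> 12ms ... nice -20 -> 126ms */
}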

When a task is first queued, recalc_task_prio checks whether it has already
run during this runqueue's current priority rotation. If it has not, its
p->prio level is set equal to its p->static_prio (nice level), it is given
a p->time_slice equal to p->quota, and the allocation bit for its static
priority (nice value) is set in p->bitmap. This quota is then also added to
the current runqueue's rq->prio_quota[p->prio]. The task is then queued on
the current active priority array.

If a task has already been running during this major epoch, and it has
p->time_slice left and the rq->prio_quota for the task's p->prio still has
quota, it is placed back on the active array, but no more quota is added to
either the task or the runqueue.

If a task has been running during this major epoch, but either has no
p->time_slice left or the runqueue's prio_quota for this task's p->prio is
exhausted, it finds the next lowest priority in its bitmap that it has not
yet been allocated quota from. It then gets a full quota in p->time_slice
and adds that quota to the relevant priority's rq->prio_quota. It is then
queued on the current active priority array at the newly determined lower
priority.

If a task has been running during this major epoch, and has no entitlement
left in p->bitmap and no time_slice left, its bitmap is cleared and it is
queued at its p->static_prio again, but on the expired priority array. No
quota is allocated until this task is next scheduled.
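
Taking the four cases above together, the requeue decision can be sketched
roughly as follows (simplified types and names, illustrative only -- this is
not the patch's recalc_task_prio):

/* Rough sketch of the requeue decision described above.  Field names
 * and the bitmap width are simplified; this is not the patch code. */
struct toy_task {
	int static_prio;		/* nice level */
	int prio;			/* current dynamic priority */
	int time_slice;			/* remaining personal quota */
	int quota;			/* full quota for one level */
	unsigned long long bitmap;	/* levels already drawn from */
	unsigned long rotation;		/* last major epoch seen */
};

struct toy_rq {
	unsigned long prio_rotation;
	int prio_quota[40];		/* runqueue quota per level */
};

static void toy_recalc_task_prio(struct toy_task *p, struct toy_rq *rq)
{
	if (p->rotation != rq->prio_rotation) {
		/* first time queued during this major epoch */
		p->rotation = rq->prio_rotation;
		p->prio = p->static_prio;
		p->bitmap = 1ULL << p->prio;
		p->time_slice = p->quota;
		rq->prio_quota[p->prio] += p->quota;
		/* -> queue on the active array at p->prio */
	} else if (p->time_slice > 0 && rq->prio_quota[p->prio] > 0) {
		/* still entitled at this level; no new quota handed out */
		/* -> back on the active array at the same p->prio */
	} else if (p->prio < 39) {
		/* descend to the next lowest level not yet drawn from
		 * (the real design scans p->bitmap to find it) */
		p->prio++;
		p->bitmap |= 1ULL << p->prio;
		p->time_slice = p->quota;
		rq->prio_quota[p->prio] += p->quota;
		/* -> active array at the newly determined lower p->prio */
	} else {
		/* all entitlement used: wait for the next major epoch */
		p->bitmap = 0;
		p->prio = p->static_prio;
		/* -> expired array; no quota until rescheduled */
	}
}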

When a task is queued, it has its static_prio bit set in the current
runqueue's rq->static_bitmap, and the relevant bit in the rq->dyn_bitmap.
In order to minimise the number of bitmap lookups, the bitmap of queued
tasks on the expired array is at the end of the same bitmap as the active
array. The number of tasks queued at the current static_prio is kept in
rq->prio_queued[].
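
Laid out as data, that could look roughly like this (sizes and names are
guesses for illustration; the point is only that the expired half lives at
the end of the same bitmap as the active half, so one find-first-bit scan
covers both):

/* Illustrative layout only -- not the patch's actual declarations. */
#define PRIO_LEVELS	 40			/* nice -20 .. nice 19 */
#define TOTAL_PRIO_SLOTS (2 * PRIO_LEVELS)	/* active half + expired half */
#define LONGS(bits)	 (((bits) + 8 * sizeof(long) - 1) / (8 * sizeof(long)))

struct toy_rq_bitmaps {
	unsigned long dyn_bitmap[LONGS(TOTAL_PRIO_SLOTS)];
	unsigned long static_bitmap[LONGS(PRIO_LEVELS)];
	int prio_queued[PRIO_LEVELS];	/* tasks queued at each static_prio */
};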

On a scheduler_tick where a task is running, p->time_slice is decremented,
and if it reaches zero the task's priority is readjusted via
recalc_task_prio and the task is rescheduled.

On the same tick, the runqueue's prio_quota for the current priority level
is also decremented. If it empties, a priority rotation occurs (a minor or
major epoch). If the current runqueue priority level is better than that of
nice 19 tasks, a minor rotation is performed; otherwise a major rotation
occurs.

A minor rotation takes the tasks remaining in this priority level's queue
and merges them, with a list_splice_tail, into the queue of the next lowest
priority level. Any tasks that have been merged this way now have invalid
values in p->prio, which must be taken into account when dequeueing a task
and when testing for preemption.

A major rotation takes the tasks remaining in this priority level's queue
and merges them, with a list_splice_tail, into the best priority queue on
the expired array, and then swaps the priority arrays. The priority quotas
are reset at this point. Any tasks that have been merged now have invalid
values in p->array and possibly p->prio, which must be taken into account.
rq->prio_rotation is incremented at this time.
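
A condensed sketch of the tick and rotation behaviour just described, using
per-level counters instead of the real task lists and list_splice_tail
(illustrative only):

/* Condensed, illustrative sketch of the tick and rotation behaviour
 * described above.  Counters stand in for the real task lists; this is
 * not the patch code. */
#define LEVELS 40			/* nice -20 .. nice 19 */

struct toy_epoch_rq {
	unsigned long prio_rotation;
	int prio_quota[LEVELS];		/* remaining runqueue quota per level */
	int active[LEVELS];		/* tasks queued per active level */
	int expired[LEVELS];		/* tasks queued per expired level */
};

static void toy_minor_rotation(struct toy_epoch_rq *rq, int level)
{
	/* push what is left at this level down to the next lowest one
	 * (the real code splices the task list with list_splice_tail) */
	rq->active[level + 1] += rq->active[level];
	rq->active[level] = 0;
}

static void toy_major_rotation(struct toy_epoch_rq *rq)
{
	int i;

	/* in the real code the leftover tasks are first spliced onto the
	 * expired array; then the arrays are swapped and quotas reset */
	for (i = 0; i < LEVELS; i++) {
		int tmp = rq->active[i];

		rq->active[i] = rq->expired[i];
		rq->expired[i] = tmp;
		rq->prio_quota[i] = 0;	/* refilled as tasks are requeued */
	}
	rq->prio_rotation++;
}

static void toy_task_running_tick(struct toy_epoch_rq *rq, int curr_prio,
				  int *curr_time_slice)
{
	if (--(*curr_time_slice) <= 0) {
		/* recalc_task_prio() demotes or expires the task when it
		 * is requeued, and the task is rescheduled */
	}
	if (--rq->prio_quota[curr_prio] <= 0) {
		if (curr_prio < LEVELS - 1)	/* better than the nice 19 level */
			toy_minor_rotation(rq, curr_prio);
		else
			toy_major_rotation(rq);
	}
}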

When a task is dequeued, the dyn_bitmap bit is unset only after testing
that the relevant queue is actually empty since p->prio may be inaccurate
and no hard accounting of the number of tasks at that level is possible.

When selecting a new task to schedule, once the first set bit is found in
the dyn_bitmap, it is checked whether a task is really queued at that
priority or whether the bit is a false positive, left behind when a task
was dequeued while its p->prio no longer matched the queue it was on
following some form of priority rotation. This is rare, as it only tends to
happen when a task that is already waiting on a runqueue gets dequeued. If
the bitmap value lies in the expired array range, a major priority rotation
is performed. If the chosen task has not run during this major or minor
rotation, it is allocated new quota at this time, which is also added to
the runqueue's quota.
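
Expressed as a toy stand-in (illustrative only; the real code works on the
combined dyn_bitmap with a single find-first-bit), next-task selection with
the false positive check might look like:

/* Toy stand-in for next-task selection with the false positive check
 * described above.  Not the patch code. */
#define TOY_SLOTS 80			/* active half + expired half */

struct toy_pick_rq {
	unsigned char bit_set[TOY_SLOTS];	/* stands in for dyn_bitmap */
	int nr_queued[TOY_SLOTS];		/* length of each priority queue */
};

static int toy_pick_next_level(struct toy_pick_rq *rq)
{
	int idx;

	for (idx = 0; idx < TOY_SLOTS; idx++) {
		if (!rq->bit_set[idx])
			continue;
		if (rq->nr_queued[idx] == 0) {
			/* false positive left behind after a rotation */
			rq->bit_set[idx] = 0;
			continue;
		}
		if (idx >= TOY_SLOTS / 2) {
			/* chosen level is on the expired half: a major
			 * priority rotation happens before running it */
		}
		return idx;
	}
	return -1;			/* nothing runnable at any level */
}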


Modelling deadline behaviour
============================

As the accounting in this design is strict and is not modified by sleep
average calculations or interactivity modifiers, it is possible to
accurately predict the maximum latency a task may experience under
different conditions. This is a virtual deadline mechanism enforced by
mandatory runqueue epochs, rather than by keeping complicated per-task
accounting.

The maximum duration a task can run during one major epoch is determined
by its nice value. Nice 0 tasks can run at 19 different priority levels
for RR_INTERVAL duration during each epoch (the equivalent of nice 0 to nice
19). Nice 10 tasks can run at 9 priority levels for each epoch, and so on.

Therefore the maximum duration a runqueue epoch can take is determined by
the number of tasks running and their nice levels. Beyond that, the maximum
time a task can wait before it gets scheduled is determined by the
difference between its nice value and the nice value of the highest
priority task queued.

The following examples are _worst case scenarios_; they would rarely occur,
but they can nonetheless be modelled to determine the maximum possible
latency.

So, for example, if two nice 0 tasks are running, and one has just expired
as the other is activated for the first time and receives a full quota for
this runqueue rotation, the first task will wait:

nr_tasks * max_duration + nice_difference * rr_interval
1 * 19 * RR_INTERVAL + 0 = 114ms

In the presence of a nice 10 task, a nice 0 task would wait a maximum of
1 * 10 * RR_INTERVAL + 0 = 60ms

In the presence of a nice 0 task, a nice 10 task would wait a maximum of
1 * 19 * RR_INTERVAL + 9 * RR_INTERVAL = 168ms
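
Those three figures can be checked with a couple of lines of arithmetic
(throwaway snippet, illustrative only):

/* Reproduces the three worst case figures quoted above.  Illustrative
 * arithmetic only. */
#include <stdio.h>

#define RR_INTERVAL 6	/* ms */

int main(void)
{
	/* two nice 0 tasks: the expired one waits for the fresh one */
	printf("nice 0  waiting on nice 0 : %d ms\n", 1 * 19 * RR_INTERVAL + 0);
	/* a nice 0 task waiting on a nice 10 task */
	printf("nice 0  waiting on nice 10: %d ms\n", 1 * 10 * RR_INTERVAL + 0);
	/* a nice 10 task waiting on a nice 0 task */
	printf("nice 10 waiting on nice 0 : %d ms\n",
	       1 * 19 * RR_INTERVAL + 9 * RR_INTERVAL);
	return 0;
}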

For a more complicated example, if there are 4 fully cpu bound tasks
running, one each at nice -20, nice 0, nice 10 and nice 19, we can
calculate the maximum latency possible for the nice 10 task. Note that
nice -20 tasks are heavily favoured, so this will be a long time, but it
can still be modelled.

The nice -20 task has quota = RR_INTERVAL + 20*RR_INTERVAL = 21*RR_INTERVAL.
It can run at 39 priority levels so its maximum duration =
39 * 21 * RR_INTERVAL.
The nice 0 task works out to
19 * RR_INTERVAL
The nice 19 task works out to
RR_INTERVAL.

So a major epoch can take a maximum of
39 * 21 * RR_INTERVAL + 19 * RR_INTERVAL + RR_INTERVAL = 1229 * RR_INTERVAL;

Then, before the nice 10 task runs, the nice -20 and nice 0 tasks will run
for 28 * 21 * RR_INTERVAL and 9 * RR_INTERVAL respectively, for a total of
597 * RR_INTERVAL.

This means the maximum time a nice 10 task can wait in the presence of
these other tasks is 1826 * RR_INTERVAL. This is of course a long time, and
it is heavily inflated by the presence of the nice -20 task, which would
not be part of a normal environment.

While this section describes the maximum latency a task can experience,
latencies of this size will only be seen by fully cpu bound tasks.


Achieving interactivity
=======================

A requirement of this scheduler design was to achieve good interactivity
despite being a completely fair, deadline based design. The disadvantage of
designs that try to achieve interactivity is that they usually do so at the
expense of fairness. As cpu speeds increase, metered unfairness towards
interactive tasks becomes less and less desirable, but low latency and
fairness remain mandatory for good interactive performance.

This design relies on the fact that interactive tasks, by their nature,
sleep often. Most fair scheduling designs end up indirectly penalising such
tasks, giving them less than their possible fair share because of the
sleep, and have to compensate by bonusing their priority based on how long
they slept. This becomes increasingly inaccurate as the number of running
tasks rises and more tasks spend their time waiting on runqueues rather
than sleeping, and it is impossible to tell whether a task waiting on a
runqueue only intends to run for a short period and then sleep again after
that runqueue wait. Furthermore, all such designs rely on a period of time
passing to accumulate some statistic on the task before deciding how much
preference to give it. The shorter this period, the more easily bursts of
cpu ruin the interactive task's behaviour. The longer this period, the
longer it takes for interactive tasks to get low scheduling latencies and a
fair share of cpu.

This design does not measure sleep time at all. Interactive tasks that
sleep often will wake up having consumed very little, if any, of their
quota for the current major priority rotation. The longer they have slept,
the less likely they are even to be on the current major priority rotation.
Once woken, though, they get to use up their full quota for that epoch,
whether that is the remaining part of a quota or a full one. Overall,
however, they can still only run as much cpu time in that epoch as any
other task of the same nice level. This means that two tasks behaving
completely differently, from fully cpu bound to waking/sleeping extremely
frequently, will still get the same quota of cpu, but the latter will use
its quota for that epoch in bursts rather than continuously. This
guarantees that interactive tasks get the same amount of cpu as cpu bound
ones.

The other requirement of interactive tasks is to obtain low latencies when
they are scheduled. Unlike the fully cpu bound tasks with the maximum
possible latencies described in the modelling deadline behaviour section
above, tasks that sleep will wake up with quota available, usually at the
current runqueue's priority level or better. This means the most latency
they are likely to see is one RR_INTERVAL, and they will often preempt the
current task if it is not itself of a sleeping nature. This then guarantees
very low latency for interactive tasks, and the lowest latencies for the
least cpu bound tasks.

Sunday, 4th March 2007
Con Kolivas

-- 
-ck

^ permalink raw reply	[flat|nested] 69+ messages in thread

end of thread, other threads:[~2007-03-18  1:29 UTC | newest]

Thread overview: 69+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-03-05  9:45 [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler Nicolas Mailhot
2007-03-05  9:53 ` Gene Heskett
2007-03-05 10:00   ` Nicolas Mailhot
2007-03-05 15:22   ` Paolo Ciarrocchi
2007-03-05 18:37     ` Gene Heskett
2007-03-05 18:20   ` Lee Revell
2007-03-05 19:19     ` Gene Heskett
2007-03-05 22:40       ` Andrew Morton
2007-03-05 23:19         ` Gene Heskett
2007-03-06  2:23       ` Ed Tomlinson
2007-03-06  2:54         ` Linus Torvalds
2007-03-06  3:36           ` Gene Heskett
2007-03-09  4:04           ` Bill Davidsen
2007-03-09  6:31             ` Linus Torvalds
2007-03-09  7:04               ` Bill Huey
2007-03-09 10:54               ` William Lee Irwin III
2007-03-09 14:54               ` Bill Davidsen
2007-03-09 18:11                 ` Linus Torvalds
2007-03-06 17:50   ` Bill Davidsen
2007-03-06 20:06     ` Con Kolivas
2007-03-09  4:21       ` Bill Davidsen
  -- strict thread matches above, loose matches on Subject: below --
2007-03-11 22:29 bert hubert
2007-03-11 22:57 ` Con Kolivas
2007-03-08 14:27 Tim Tassonis
2007-03-06  4:57 Shawn Starr
2007-03-04 20:35 Al Boldi
2007-03-04 21:49 ` Con Kolivas
     [not found]   ` <45EB45F7.3050208@simon.arlott.org.uk>
2007-03-04 22:27     ` Con Kolivas
2007-03-05 18:29       ` Simon Arlott
2007-03-05 21:36         ` Con Kolivas
2007-03-04 23:13   ` Willy Tarreau
2007-03-04 23:58     ` Con Kolivas
2007-03-05  1:09     ` Gene Heskett
2007-03-06  8:42 ` [ck] " Xavier Bestel
2007-03-06 15:15   ` Al Boldi
2007-03-11 18:11     ` Al Boldi
2007-03-11 21:52       ` Con Kolivas
2007-03-11 22:12         ` Con Kolivas
2007-03-12  4:42           ` Al Boldi
2007-03-12  4:53             ` Con Kolivas
2007-03-12 11:26               ` Al Boldi
2007-03-12 12:52                 ` Con Kolivas
2007-03-12 14:14                   ` Al Boldi
2007-03-12 14:58                     ` [ck] " jos poortvliet
2007-03-12 17:41                       ` Al Boldi
2007-03-12 18:05                     ` Con Kolivas
2007-03-18  1:30                   ` Bill Davidsen
2007-03-04  7:00 Con Kolivas
2007-03-04  7:45 ` [ck] " Con Kolivas
2007-03-04 14:04   ` Con Kolivas
2007-03-04 11:08 ` Gene Heskett
2007-03-04 11:47   ` Con Kolivas
2007-03-04 12:24     ` Gene Heskett
2007-03-04 12:46       ` Con Kolivas
2007-03-04 13:25         ` Gene Heskett
2007-03-04 13:49           ` Con Kolivas
2007-03-04 14:11             ` Gene Heskett
2007-03-05  2:31               ` Zwane Mwaikambo
2007-03-05  3:16                 ` Gene Heskett
2007-03-04 14:36             ` Willy Tarreau
2007-03-04 16:08               ` [ck] " jos poortvliet
2007-03-05 23:05                 ` Bill Davidsen
2007-03-06  0:18                   ` Con Kolivas
2007-03-06  4:41                     ` Willy Tarreau
2007-03-06  5:39                       ` Nicholas Miell
2007-03-06 19:04                       ` jos poortvliet
2007-03-06 21:37                       ` Bill Davidsen
2007-03-06 21:54                         ` Willy Tarreau
2007-03-05 21:52 ` Con Kolivas
2007-03-08  8:53 ` Ingo Molnar
2007-03-08 10:07   ` Con Kolivas
2007-03-08 20:25 ` Fabio Comolli
2007-03-08 20:57   ` Con Kolivas
2007-03-08 21:31     ` Fabio Comolli
