linux-kernel.vger.kernel.org archive mirror
* [patch][rfc] quell interactive feeding frenzy
@ 2006-04-07  9:38 Mike Galbraith
  2006-04-07  9:47 ` Andrew Morton
  2006-04-07 12:56 ` Con Kolivas
  0 siblings, 2 replies; 43+ messages in thread
From: Mike Galbraith @ 2006-04-07  9:38 UTC (permalink / raw)
  To: lkml; +Cc: Ingo Molnar, Andrew Morton, Nick Piggin, Peter Williams, Con Kolivas

Greetings,

Problem:  Wake-up -> cpu latency increases with the number of runnable
tasks, ergo adding this latency to sleep_avg becomes increasingly potent
as nr_running increases.  This turns into a very nasty problem with as
few as 10 httpd tasks doing round robin scheduling.  The result is that
you can only login with difficulty, and interactivity is nonexistent.

Solution:  Restrict the amount of boost a task can receive from this
mechanism, and disable the mechanism entirely when load is high.  As
always, there is a price for increasing fairness.  In this case, the
price seems worth it.  It bought me a usable 2.6 apache server.


Signed-off-by: Mike Galbraith <efault@gmx.de>

--- linux-2.6.17-rc1/kernel/sched.c.org	2006-04-07 08:52:47.000000000 +0200
+++ linux-2.6.17-rc1/kernel/sched.c	2006-04-07 08:57:34.000000000 +0200
@@ -3012,14 +3012,20 @@ go_idle:
 	queue = array->queue + idx;
 	next = list_entry(queue->next, task_t, run_list);
 
-	if (!rt_task(next) && interactive_sleep(next->sleep_type)) {
+	if (!rt_task(next) && interactive_sleep(next->sleep_type) &&
+			rq->nr_running < 1 + MAX_BONUS - CURRENT_BONUS(next)) {
 		unsigned long long delta = now - next->timestamp;
+		unsigned long max_delta;
 		if (unlikely((long long)(now - next->timestamp) < 0))
 			delta = 0;
 
 		if (next->sleep_type == SLEEP_INTERACTIVE)
 			delta = delta * (ON_RUNQUEUE_WEIGHT * 128 / 100) / 128;
 
+		max_delta = (1 + MAX_BONUS - CURRENT_BONUS(next)) * GRANULARITY;
+		max_delta = JIFFIES_TO_NS(max_delta);
+		if (delta > max_delta)
+			delta = max_delta;
 		array = next->array;
 		new_prio = recalc_task_prio(next, next->timestamp + delta);
 




* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-07  9:38 [patch][rfc] quell interactive feeding frenzy Mike Galbraith
@ 2006-04-07  9:47 ` Andrew Morton
  2006-04-07  9:52   ` Ingo Molnar
  2006-04-07 10:40   ` Mike Galbraith
  2006-04-07 12:56 ` Con Kolivas
  1 sibling, 2 replies; 43+ messages in thread
From: Andrew Morton @ 2006-04-07  9:47 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: linux-kernel, mingo, nickpiggin, pwil3058, kernel

Mike Galbraith <efault@gmx.de> wrote:
>
> Problem:

I don't know what to do with all these patches you keep sending.

a) The other sched guys seem to be hiding and

b) I'm still sitting on smpnice, and I don't think that's had all its
   problems ironed out yet, so putting the interactivity things in there as
   well will complicate getting that sorted out.

But it's always nice to get email from you ;)


* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-07  9:47 ` Andrew Morton
@ 2006-04-07  9:52   ` Ingo Molnar
  2006-04-07 10:57     ` Mike Galbraith
  2006-04-07 10:40   ` Mike Galbraith
  1 sibling, 1 reply; 43+ messages in thread
From: Ingo Molnar @ 2006-04-07  9:52 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Mike Galbraith, linux-kernel, nickpiggin, pwil3058, kernel


* Andrew Morton <akpm@osdl.org> wrote:

> Mike Galbraith <efault@gmx.de> wrote:
> >
> > Problem:
> 
> I don't know what to do with all these patches you keep sending.
> 
> a) The other sched guys seem to be hiding and
> 
> b) I'm still sitting on smpnice, and I don't think that's had all its
>    problems ironed out yet, so putting the interactivity things in 
>    there as well will complicate getting that sorted out.

i think we should try Mike's patches after smpnice got ironed out. The 
extreme-starvation cases should be handled more or less correctly now by 
the minimal set of changes from Mike that are upstream (knock on wood), 
the singing-dancing add-ons can probably wait a bit and smpnice clearly 
has priority.

	Ingo


* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-07  9:47 ` Andrew Morton
  2006-04-07  9:52   ` Ingo Molnar
@ 2006-04-07 10:40   ` Mike Galbraith
  1 sibling, 0 replies; 43+ messages in thread
From: Mike Galbraith @ 2006-04-07 10:40 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, mingo, nickpiggin, pwil3058, kernel

On Fri, 2006-04-07 at 02:47 -0700, Andrew Morton wrote:
> Mike Galbraith <efault@gmx.de> wrote:
> >
> > Problem:
> 
> I don't know what to do with all these patches you keep sending.

Print and place in nearest dentist office?  Either that or maybe bounce
this one off the distcc problem in your copious spare time.

	-Mike



* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-07  9:52   ` Ingo Molnar
@ 2006-04-07 10:57     ` Mike Galbraith
  2006-04-07 11:00       ` Con Kolivas
  0 siblings, 1 reply; 43+ messages in thread
From: Mike Galbraith @ 2006-04-07 10:57 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, linux-kernel, nickpiggin, pwil3058, kernel

On Fri, 2006-04-07 at 11:52 +0200, Ingo Molnar wrote:

> i think we should try Mike's patches after smpnice got ironed out. The 
> extreme-starvation cases should be handled more or less correctly now by 
> the minimal set of changes from Mike that are upstream (knock on wood), 
> the singing-dancing add-ons can probably wait a bit and smpnice clearly 
> has priority.

(I'm still trying to find ways to do less singing and dancing.)

This patch you may notice wasn't against an mm kernel.  I was more or
less separating this one from the others, because I consider this
problem to be very severe.  IMHO, this or something like it needs to get
upstream soon.

	-Mike



* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-07 10:57     ` Mike Galbraith
@ 2006-04-07 11:00       ` Con Kolivas
  2006-04-07 11:09         ` Mike Galbraith
  0 siblings, 1 reply; 43+ messages in thread
From: Con Kolivas @ 2006-04-07 11:00 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Ingo Molnar, Andrew Morton, linux-kernel, nickpiggin, pwil3058

On Friday 07 April 2006 20:57, Mike Galbraith wrote:
> On Fri, 2006-04-07 at 11:52 +0200, Ingo Molnar wrote:
> > i think we should try Mike's patches after smpnice got ironed out. The
> > extreme-starvation cases should be handled more or less correctly now by
> > the minimal set of changes from Mike that are upstream (knock on wood),
> > the singing-dancing add-ons can probably wait a bit and smpnice clearly
> > has priority.
>
> (I'm still trying to find ways to do less singing and dancing.)
>
> This patch you may notice wasn't against an mm kernel.  I was more or
> less separating this one from the others, because I consider this
> problem to be very severe.  IMHO, this or something like it needs to get
> upstream soon.

Which is a fine observation but your code is changing every 2nd day. Which is 
also fine because code needs to evolve. However that's not really the way we 
push stuff upstream...

-ck


* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-07 11:00       ` Con Kolivas
@ 2006-04-07 11:09         ` Mike Galbraith
  0 siblings, 0 replies; 43+ messages in thread
From: Mike Galbraith @ 2006-04-07 11:09 UTC (permalink / raw)
  To: Con Kolivas
  Cc: Ingo Molnar, Andrew Morton, linux-kernel, nickpiggin, pwil3058

On Fri, 2006-04-07 at 21:00 +1000, Con Kolivas wrote:
> On Friday 07 April 2006 20:57, Mike Galbraith wrote:
> > On Fri, 2006-04-07 at 11:52 +0200, Ingo Molnar wrote:
> > > i think we should try Mike's patches after smpnice got ironed out. The
> > > extreme-starvation cases should be handled more or less correctly now by
> > > the minimal set of changes from Mike that are upstream (knock on wood),
> > > the singing-dancing add-ons can probably wait a bit and smpnice clearly
> > > has priority.
> >
> > (I'm still trying to find ways to do less singing and dancing.)
> >
> > This patch you may notice wasn't against an mm kernel.  I was more or
> > less separating this one from the others, because I consider this
> > problem to be very severe.  IMHO, this or something like it needs to get
> > upstream soon.
> 
> Which is a fine observation but your code is changing every 2nd day. Which is 
> also fine because code needs to evolve. However that's not really the way we 
> push stuff upstream...

No, it's not changing much at all, though I wish it would.  WRT this
patch, you'll note that the mail subject contains the magical
incantation [rfc] (didn't work).  I care not one whit whether _my_ patch
gets sent upstream or to the bit bucket.  I care only that the problem
gets solved, and preferably sooner than later.

	-Mike



* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-07  9:38 [patch][rfc] quell interactive feeding frenzy Mike Galbraith
  2006-04-07  9:47 ` Andrew Morton
@ 2006-04-07 12:56 ` Con Kolivas
  2006-04-07 13:37   ` Mike Galbraith
  1 sibling, 1 reply; 43+ messages in thread
From: Con Kolivas @ 2006-04-07 12:56 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: lkml, Ingo Molnar, Andrew Morton, Nick Piggin, Peter Williams

On Friday 07 April 2006 19:38, Mike Galbraith wrote:
> Greetings,
>
> Problem:  Wake-up -> cpu latency increases with the number of runnable
> tasks, ergo adding this latency to sleep_avg becomes increasingly potent
> as nr_running increases.  This turns into a very nasty problem with as
> few as 10 httpd tasks doing round robin scheduling.  The result is that
> you can only login with difficulty, and interactivity is nonexistent.
>
> Solution:  Restrict the amount of boost a task can receive from this
> mechanism, and disable the mechanism entirely when load is high.  As
> always, there is a price for increasing fairness.  In this case, the
> price seems worth it.  It bought me a usable 2.6 apache server.

Since this is an RFC, here's my comments :)

This mechanism is designed to convert on-runqueue waiting time into sleep. The 
basic reason is that when the system is loaded, every task is fighting for 
cpu even if they only want say 1% cpu which means they never sleep and are 
waiting on a runqueue instead of sleeping 99% of the time. What you're doing 
is exactly biasing against what this mechanism is in place for. You'll get 
the same effect by bypassing or removing it entirely. Should we do that 
instead?
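
To put toy numbers on that (a back-of-the-envelope sketch only; MAX_BONUS = 10
and MAX_SLEEP_AVG of ~1000ms are assumed from the stock 2.6 defines):

/* Toy illustration of the point above: a task wanting ~1% cpu on a loaded
 * box runs ~10ms and then waits ~990ms on the runqueue.  Whether that wait
 * is credited as sleep decides whether it looks like a hog or interactive. */
#include <stdio.h>

#define MAX_BONUS	10	/* assumed stock 2.6 value */
#define MAX_SLEEP_AVG	1000	/* ms; assumed DEF_TIMESLICE * MAX_BONUS */

static int bonus(int sleep_avg_ms)
{
	if (sleep_avg_ms > MAX_SLEEP_AVG)
		sleep_avg_ms = MAX_SLEEP_AVG;
	return sleep_avg_ms * MAX_BONUS / MAX_SLEEP_AVG;
}

int main(void)
{
	printf("wait not credited: sleep_avg ~0ms   -> bonus %2d (looks like a hog)\n",
	       bonus(0));
	printf("wait credited:     sleep_avg ~990ms -> bonus %2d (looks interactive)\n",
	       bonus(990));
	return 0;
}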

-ck


* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-07 12:56 ` Con Kolivas
@ 2006-04-07 13:37   ` Mike Galbraith
  2006-04-07 13:56     ` Con Kolivas
  0 siblings, 1 reply; 43+ messages in thread
From: Mike Galbraith @ 2006-04-07 13:37 UTC (permalink / raw)
  To: Con Kolivas; +Cc: lkml, Ingo Molnar, Andrew Morton, Nick Piggin, Peter Williams

On Fri, 2006-04-07 at 22:56 +1000, Con Kolivas wrote:
> On Friday 07 April 2006 19:38, Mike Galbraith wrote:
> > Greetings,
> >
> > Problem:  Wake-up -> cpu latency increases with the number of runnable
> > tasks, ergo adding this latency to sleep_avg becomes increasingly potent
> > as nr_running increases.  This turns into a very nasty problem with as
> > few as 10 httpd tasks doing round robin scheduling.  The result is that
> > you can only login with difficulty, and interactivity is nonexistent.
> >
> > Solution:  Restrict the amount of boost a task can receive from this
> > mechanism, and disable the mechanism entirely when load is high.  As
> > always, there is a price for increasing fairness.  In this case, the
> > price seems worth it.  It bought me a usable 2.6 apache server.
> 
> Since this is an RFC, here's my comments :)
> 
> This mechanism is designed to convert on-runqueue waiting time into sleep. The 
> basic reason is that when the system is loaded, every task is fighting for 
> cpu even if they only want say 1% cpu which means they never sleep and are 
> waiting on a runqueue instead of sleeping 99% of the time. What you're doing 
> is exactly biasing against what this mechanism is in place for. You'll get 
> the same effect by bypassing or removing it entirely. Should we do that 
> instead?

Heck no.  That mechanism is just as much about fairness as it is about
interactivity, and as such is just fine and dandy in my book... once
it's toned down a bit^H^H^Htruckload.  What I'm doing isn't biasing
against the intent, I'm merely straightening the huge bend in favor of
interactive tasks who get this added boost over and over again, and
restricting the general effect to something practical.

Just look at what that mechanism does now with a 10 deep queue.  Every
dinky sleep can have an absolutely huge gob added to it, the exact worst
case number depends on how many cpus you have and whatnot.  Start a slew
of tasks, and you are doomed to have every task that sleeps for the
tiniest bit pegged at max interactive.
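
A rough simulation of one such task (toy model only, the real recalc_task_prio
accounting is more involved; DEF_TIMESLICE ~100ms, MAX_SLEEP_AVG ~1000ms and
MAX_BONUS = 10 are assumed stock values):

/* Toy model: one of 10 busy tasks round-robining with ~100ms slices.  Each
 * pass it runs one slice, then waits ~900ms behind the other nine; that
 * whole wait is credited as sleep, so sleep_avg pegs within a few rounds. */
#include <stdio.h>

#define MAX_BONUS	10
#define MAX_SLEEP_AVG	1000	/* ms, assumed */

int main(void)
{
	int nr_running = 10, slice_ms = 100;
	int sleep_avg = 0, round;

	for (round = 1; round <= 4; round++) {
		sleep_avg += (nr_running - 1) * slice_ms;  /* wait credited as sleep */
		if (sleep_avg > MAX_SLEEP_AVG)
			sleep_avg = MAX_SLEEP_AVG;
		sleep_avg -= slice_ms;                     /* charged for running */
		if (sleep_avg < 0)
			sleep_avg = 0;
		printf("round %d: sleep_avg %4d ms -> bonus %2d of %d\n",
		       round, sleep_avg, sleep_avg * MAX_BONUS / MAX_SLEEP_AVG,
		       MAX_BONUS);
	}
	return 0;
}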

Maybe what I did isn't the best that can be done, but something has to
be done about that.  It is very b0rken under heavy load.

	-Mike



* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-07 13:37   ` Mike Galbraith
@ 2006-04-07 13:56     ` Con Kolivas
  2006-04-07 14:14       ` Mike Galbraith
  0 siblings, 1 reply; 43+ messages in thread
From: Con Kolivas @ 2006-04-07 13:56 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: lkml, Ingo Molnar, Andrew Morton, Nick Piggin, Peter Williams

On Friday 07 April 2006 23:37, Mike Galbraith wrote:
> On Fri, 2006-04-07 at 22:56 +1000, Con Kolivas wrote:
> > This mechanism is designed to convert on-runqueue waiting time into
> > sleep. The basic reason is that when the system is loaded, every task is
> > fighting for cpu even if they only want say 1% cpu which means they never
> > sleep and are waiting on a runqueue instead of sleeping 99% of the time.
> > What you're doing is exactly biasing against what this mechanism is in
> > place for. You'll get the same effect by bypassing or removing it
> > entirely. Should we do that instead?
>
> Heck no.  That mechanism is just as much about fairness as it is about
> intertactivity, and as such is just fine and dandy in my book... once
> it's toned down a bit^H^H^Htruckload.  What I'm doing isn't biasing
> against the intent, I'm merely straightening the huge bend in favor of
> interactive tasks who get this added boost over and over again, and
> restricting the general effect to something practical.

Have you actually tried without that mechanism? No compromise will be correct 
there for fairness one way or the other. There simply is no way to tell if a 
task is really sleeping or really just waiting on a runqueue. That's why Ingo 
weighted the tasks waking from interrupts more because of their likely 
intent. It's still a best guess scenario (which I'm sure you know). Wouldn't 
it be lovely to have a system that didn't guess and had simple sleep/wake cpu 
usage based heuristics? Oh well.

> Just look at what that mechanism does now with a 10 deep queue.  Every
> dinky sleep can have an absolutely huge gob added to it, the exact worst
> case number depends on how many cpus you have and whatnot.  Start a slew
> of tasks, and you are doomed to have every task that sleeps for the
> tiniest bit pegged at max interactive.

I'm quite aware of the effect it has :)

> Maybe what I did isn't the best that can be done, but something has to
> be done about that.  It is very b0rken under heavy load.

Your compromise is as good as any.

-- 
-ck


* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-07 13:56     ` Con Kolivas
@ 2006-04-07 14:14       ` Mike Galbraith
  2006-04-07 15:16         ` Mike Galbraith
  2006-04-09 11:14         ` bert hubert
  0 siblings, 2 replies; 43+ messages in thread
From: Mike Galbraith @ 2006-04-07 14:14 UTC (permalink / raw)
  To: Con Kolivas; +Cc: lkml, Ingo Molnar, Andrew Morton, Nick Piggin, Peter Williams

On Fri, 2006-04-07 at 23:56 +1000, Con Kolivas wrote:
> On Friday 07 April 2006 23:37, Mike Galbraith wrote:
> > On Fri, 2006-04-07 at 22:56 +1000, Con Kolivas wrote:
> > > This mechanism is designed to convert on-runqueue waiting time into
> > > sleep. The basic reason is that when the system is loaded, every task is
> > > fighting for cpu even if they only want say 1% cpu which means they never
> > > sleep and are waiting on a runqueue instead of sleeping 99% of the time.
> > > What you're doing is exactly biasing against what this mechanism is in
> > > place for. You'll get the same effect by bypassing or removing it
> > > entirely. Should we do that instead?
> >
> > Heck no.  That mechanism is just as much about fairness as it is about
> > interactivity, and as such is just fine and dandy in my book... once
> > it's toned down a bit^H^H^Htruckload.  What I'm doing isn't biasing
> > against the intent, I'm merely straightening the huge bend in favor of
> > interactive tasks who get this added boost over and over again, and
> > restricting the general effect to something practical.
> 
> Have you actually tried without that mechanism?

Yes.  We're better off with it than without.

> > Just look at what that mechanism does now with a 10 deep queue.  Every
> > dinky sleep can have an absolutely huge gob added to it, the exact worst
> > case number depends on how many cpus you have and whatnot.  Start a slew
> > of tasks, and you are doomed to have every task that sleeps for the
> > tiniest bit pegged at max interactive.
> 
> I'm quite aware of the effect it has :)

Ok.  Do we then agree that it makes 2.6 unusable for small servers, and
that this constitutes a serious problem? :)

> > Maybe what I did isn't the best that can be done, but something has to
> > be done about that.  It is very b0rken under heavy load.
> 
> Your compromise is as good as any.

Ok.

	-Mike



* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-07 14:14       ` Mike Galbraith
@ 2006-04-07 15:16         ` Mike Galbraith
  2006-04-09 11:14         ` bert hubert
  1 sibling, 0 replies; 43+ messages in thread
From: Mike Galbraith @ 2006-04-07 15:16 UTC (permalink / raw)
  To: Con Kolivas; +Cc: lkml, Ingo Molnar, Andrew Morton, Nick Piggin, Peter Williams

On Fri, 2006-04-07 at 16:14 +0200, Mike Galbraith wrote:
> On Fri, 2006-04-07 at 23:56 +1000, Con Kolivas wrote:
> > > Maybe what I did isn't the best that can be done, but something has to
> > > be done about that.  It is very b0rken under heavy load.
> > 
> > Your compromise is as good as any.

My tree with that change doing make -j100.  That change is what's
keeping it half sane.

top - 16:58:01 up 21 min,  6 users,  load average: 119.54, 96.91, 50.42

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
20902 root      22   0 20580  17m 3680 R  4.4  1.7   0:00.41 cc1
21145 root      19   0 18224  14m 3688 R  4.4  1.5   0:00.30 cc1
22287 root      17   0 12668 7872 1988 R  4.4  0.8   0:00.11 cc1
21238 root      22   0 17020  13m 3724 R  4.0  1.4   0:00.32 cc1
21116 root      22   0 18048  13m 2468 R  4.0  1.3   0:00.30 cc1
21531 root      19   0 25688  22m 3744 R  4.0  2.2   0:01.03 cc1
21608 root      19   0 22512  18m 3720 R  4.0  1.9   0:00.60 cc1
21703 root      19   0 20372  17m 3752 R  4.0  1.7   0:00.72 cc1
21711 root      19   0 23044  19m 3748 R  4.0  1.9   0:00.81 cc1
21755 root      19   0 18072  14m 3696 R  4.0  1.4   0:00.42 cc1
21759 root      19   0 18332  14m 3712 R  4.0  1.5   0:00.43 cc1
22148 root      18   0 11484 6736 1976 R  3.2  0.7   0:00.08 cc1
21913 root      18   0 10304 5300 1952 R  2.0  0.5   0:00.05 cc1
22040 root      18   0 10304 5220 1952 R  2.0  0.5   0:00.05 cc1
22063 root      18   0 10296 5416 1976 R  2.0  0.5   0:00.05 cc1
22065 root      18   0 10300 5388 1976 R  2.0  0.5   0:00.05 cc1
22122 root      18   0 10444 5448 1952 R  2.0  0.5   0:00.05 cc1
22150 root      18   0 10444 5384 1952 R  2.0  0.5   0:00.05 cc1
20771 root      19   0 21412  17m 3740 D  1.6  1.8   0:00.67 cc1




* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-07 14:14       ` Mike Galbraith
  2006-04-07 15:16         ` Mike Galbraith
@ 2006-04-09 11:14         ` bert hubert
  2006-04-09 11:39           ` Mike Galbraith
  1 sibling, 1 reply; 43+ messages in thread
From: bert hubert @ 2006-04-09 11:14 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Con Kolivas, lkml, Ingo Molnar, Andrew Morton, Nick Piggin,
	Peter Williams

On Fri, Apr 07, 2006 at 04:14:54PM +0200, Mike Galbraith wrote:
> Ok.  Do we then agree that it makes 2.6 unusable for small servers, and
> that this constitutes a serious problem? :)

You sure? I may be down there in userspace dirt with the other actual Linux
users, but I hadn't noticed.

-- 
http://www.PowerDNS.com      Open source, database driven DNS Software 
http://netherlabs.nl              Open and Closed source services


* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-09 11:14         ` bert hubert
@ 2006-04-09 11:39           ` Mike Galbraith
  2006-04-09 12:14             ` bert hubert
  0 siblings, 1 reply; 43+ messages in thread
From: Mike Galbraith @ 2006-04-09 11:39 UTC (permalink / raw)
  To: bert hubert
  Cc: Con Kolivas, lkml, Ingo Molnar, Andrew Morton, Nick Piggin,
	Peter Williams

On Sun, 2006-04-09 at 13:14 +0200, bert hubert wrote:
> On Fri, Apr 07, 2006 at 04:14:54PM +0200, Mike Galbraith wrote:
> > Ok.  Do we then agree that it makes 2.6 unusable for small servers, and
> > that this constitutes a serious problem? :)
> 
> You sure? I may be down there in userspace dirt with the other actual Linux
> users, but I hadn't noticed.

Ok, unusable may be overstated.  Nonetheless, that bit of code causes
serious problems.  It makes my little PIII/500 test box trying to fill
one 100Mbit local network unusable.  That is not overstated.

	-Mike



* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-09 11:39           ` Mike Galbraith
@ 2006-04-09 12:14             ` bert hubert
  2006-04-09 18:07               ` Mike Galbraith
  2006-04-09 18:24               ` Mike Galbraith
  0 siblings, 2 replies; 43+ messages in thread
From: bert hubert @ 2006-04-09 12:14 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Con Kolivas, lkml, Ingo Molnar, Andrew Morton, Nick Piggin,
	Peter Williams

On Sun, Apr 09, 2006 at 01:39:38PM +0200, Mike Galbraith wrote:
> Ok, unusable may be overstated.  Nonetheless, that bit of code causes
> serious problems.  It makes my little PIII/500 test box trying to fill
> one 100Mbit local network unusable.  That is not overstated.

If you try to make a PIII/500 fill 100mbit of TCP/IP using lots of different
processes, that IS a corner load.
 
I'm sure you can fix this (rare) workload but are you very sure you are not
killing off performance for other situations?

I get flashbacks to the old days of the VM where we had lots of patches around
that would all solve (more or less) real problems, but never all at the same
time..

-- 
http://www.PowerDNS.com      Open source, database driven DNS Software 
http://netherlabs.nl              Open and Closed source services


* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-09 12:14             ` bert hubert
@ 2006-04-09 18:07               ` Mike Galbraith
  2006-04-10  9:12                 ` bert hubert
  2006-04-09 18:24               ` Mike Galbraith
  1 sibling, 1 reply; 43+ messages in thread
From: Mike Galbraith @ 2006-04-09 18:07 UTC (permalink / raw)
  To: bert hubert
  Cc: Con Kolivas, lkml, Ingo Molnar, Andrew Morton, Nick Piggin,
	Peter Williams

On Sun, 2006-04-09 at 14:14 +0200, bert hubert wrote:
> On Sun, Apr 09, 2006 at 01:39:38PM +0200, Mike Galbraith wrote:
> > Ok, unusable may be overstated.  Nonetheless, that bit of code causes
> > serious problems.  It makes my little PIII/500 test box trying to fill
> > one 100Mbit local network unusable.  That is not overstated.
> 
> If you try to make a PIII/500 fill 100mbit of TCP/IP using lots of different
> processes, that IS a corner load.
>  
> I'm sure you can fix this (rare) workload but are you very sure you are not
> killing off performance for other situations?

Rare?  What exactly is rare about a number of tasks serving data?  I
don't care if it's a P4 serving gigabit.  If you have to divide your
server into pieces (you do, and you know it) you're screwed. 

> I get flashbacks to the old days of the VM where we had lots of patches around
> that would all solve (more or less) real problems, but never all at the same
> time..

I choose to take the high road here, and will not respond.

	-Mike



* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-09 12:14             ` bert hubert
  2006-04-09 18:07               ` Mike Galbraith
@ 2006-04-09 18:24               ` Mike Galbraith
  1 sibling, 0 replies; 43+ messages in thread
From: Mike Galbraith @ 2006-04-09 18:24 UTC (permalink / raw)
  To: bert hubert
  Cc: Con Kolivas, lkml, Ingo Molnar, Andrew Morton, Nick Piggin,
	Peter Williams

On Sun, 2006-04-09 at 14:14 +0200, bert hubert wrote:
> On Sun, Apr 09, 2006 at 01:39:38PM +0200, Mike Galbraith wrote:
> > Ok, unusable may be overstated.  Nonetheless, that bit of code causes
> > serious problems.  It makes my little PIII/500 test box trying to fill
> > one 100Mbit local network unusable.  That is not overstated.
> 
> If you try to make a PIII/500 fill 100mbit of TCP/IP using lots of different
> processes, that IS a corner load.

I'm trying to wrap my head around this statement, and failing.  I have
10 tasks, I divide 100mbit by 10 (_very_ modest concurrency), and I can't
even login.

	-Mike 



* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-09 18:07               ` Mike Galbraith
@ 2006-04-10  9:12                 ` bert hubert
  2006-04-10 10:00                   ` Mike Galbraith
  0 siblings, 1 reply; 43+ messages in thread
From: bert hubert @ 2006-04-10  9:12 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Con Kolivas, lkml, Ingo Molnar, Andrew Morton, Nick Piggin,
	Peter Williams

On Sun, Apr 09, 2006 at 08:07:41PM +0200, Mike Galbraith wrote:
> Rare?  What exactly is rare about a number of tasks serving data?  I
> don't care if it's a P4 serving gigabit.  If you have to divide your
> server into pieces (you do, and you know it) you're screwed. 

You've not detailed your load. I assume it consists of lots of small files
being transferred over 10 apache processes? I also assume you max out your
system using apachebench? 

In general, Linux systems are not maxed out as they will disappoint that way
(like any system running with id=0). 

So yes, what you do is a 'rare load' as anybody trying to do this will
disappoint his users.

And any tweak you make to the scheduler this way is bound to affect another
load.

	Bert

-- 
http://www.PowerDNS.com      Open source, database driven DNS Software 
http://netherlabs.nl              Open and Closed source services


* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-10  9:12                 ` bert hubert
@ 2006-04-10 10:00                   ` Mike Galbraith
  2006-04-10 14:56                     ` Mike Galbraith
  0 siblings, 1 reply; 43+ messages in thread
From: Mike Galbraith @ 2006-04-10 10:00 UTC (permalink / raw)
  To: bert hubert
  Cc: Con Kolivas, lkml, Ingo Molnar, Andrew Morton, Nick Piggin,
	Peter Williams

On Mon, 2006-04-10 at 11:12 +0200, bert hubert wrote:
> On Sun, Apr 09, 2006 at 08:07:41PM +0200, Mike Galbraith wrote:
> > Rare?  What exactly is rare about a number of tasks serving data?  I
> > don't care if it's a P4 serving gigabit.  If you have to divide your
> > server into pieces (you do, and you know it) you're screwed. 
> 
> You've not detailed your load. I assume it consists of lots of small files
> being transferred over 10 apache processes? I also assume you max out your
> system using apachebench?

It's just retrieving the directory, so it's all cached.  Yes, it's ab.  

If it had to hit disk constantly, I'd probably be able to login, because
the stock scheduler blocks IO bound tasks from achieving max priority.

> In general, Linux systems are not maxed out as they will disappoint that way
> (like any system running with id=0). 
> 
> So yes, what you do is a 'rare load' as anybody trying to do this will
> disappoint his users.

Ok, it's rare... if you buy your hardware 10 sizes too large ;-)

The load just doesn't matter though, this apache load is by the
scheduler's own standard a cpu hog.  If you eliminate the man made
sleep, those httpds drop right down where they belong.

> And any tweak you make to the scheduler this way is bound to affect another
> load.

It's not a tweak, it's a bug fix, and of course it will affect other loads.
As things stand, that code is contributing to interactivity, and
fairness, but is also contributing heavily to grotesque _unfairness_ to
the point of starvation in the extreme.

You may not like the testcase, but it remains a bug exposing testcase.

	-Mike



* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-10 10:00                   ` Mike Galbraith
@ 2006-04-10 14:56                     ` Mike Galbraith
  2006-04-13  7:41                       ` Mike Galbraith
  0 siblings, 1 reply; 43+ messages in thread
From: Mike Galbraith @ 2006-04-10 14:56 UTC (permalink / raw)
  To: bert hubert
  Cc: Con Kolivas, lkml, Ingo Molnar, Andrew Morton, Nick Piggin,
	Peter Williams

On Mon, 2006-04-10 at 12:00 +0200, Mike Galbraith wrote:
> You may not like the testcase, but it remains a bug exposing testcase.

That proposed change just became moot.  Changing to pulling a 16k file
instead of the 20k directory makes it unmanageable with that change,
with it completely disabled, and even with a full throttling tree.

Oh well, I wanted to try run limiting queues anyway I guess (sigh).

	-Mike



* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-10 14:56                     ` Mike Galbraith
@ 2006-04-13  7:41                       ` Mike Galbraith
  2006-04-13 10:16                         ` Con Kolivas
  0 siblings, 1 reply; 43+ messages in thread
From: Mike Galbraith @ 2006-04-13  7:41 UTC (permalink / raw)
  To: bert hubert
  Cc: Con Kolivas, lkml, Ingo Molnar, Andrew Morton, Nick Piggin,
	Peter Williams

[-- Attachment #1: Type: text/plain, Size: 5139 bytes --]

On Mon, 2006-04-10 at 16:56 +0200, Mike Galbraith wrote:
> On Mon, 2006-04-10 at 12:00 +0200, Mike Galbraith wrote:
> > You may not like the testcase, but it remains a bug exposing testcase.
> 
> That proposed change just became moot.  Changing to pulling a 16k file
> instead of the 20k directory makes it unmanageable with that change,
> with it completely disabled, and even with a full throttling tree.
> 
> Oh well, I wanted to try run limiting queues anyway I guess (sigh).

Something like below?

This way also allowed me to eliminate the interactive agony of an array
switch when at 100% cpu.  Seems to work well.  No more agony, only tiny
pin pricks.

Anyway, interested readers will find a copy of irman2.c, which is nice
for testing interactive starvation, attached.   The effect is most
noticeable with something like bonnie, which otherwise has zero chance
against irman2.  Just about anything will do though.  Trying to fire up
Amarok is good for a chuckle.  Whatever.  (if anyone plays with irman2
on 2.6.16 or below, call it with -S 1)

	-Mike

--- linux-2.6.17-rc1x/kernel/sched.c.org	2006-04-07 08:52:47.000000000 +0200
+++ linux-2.6.17-rc1x/kernel/sched.c	2006-04-13 09:03:28.000000000 +0200
@@ -2578,6 +2578,52 @@ void account_steal_time(struct task_stru
 		cpustat->steal = cputime64_add(cpustat->steal, tmp);
 }
 
+#define TASK_LATENCY(p, nr_running) \
+	JIFFIES_TO_NS(SCALE(USER_PRIO(p->prio), 39, nr_running * \
+	DEF_TIMESLICE + STARVATION_LIMIT))
+
+static inline void requeue_starving(runqueue_t *rq, unsigned long long now)
+{
+	prio_array_t *array = rq->active;
+	unsigned long *bitmap = array->bitmap;
+	int prio = rq->curr->prio, idx = prio + 1;
+	int noninteractive = 0, nr_running = rq->active->nr_active;
+
+repeat:
+	while ((idx = find_next_bit(bitmap, MAX_PRIO, idx)) < MAX_PRIO) {
+		struct list_head *queue = array->queue + idx;
+		task_t *p = list_entry(queue->next, task_t, run_list);
+		unsigned long latency = TASK_LATENCY(p, nr_running);
+
+		if (!TASK_INTERACTIVE(p))
+			noninteractive = idx;
+
+		if (!batch_task(p) && p->timestamp + latency < now) {
+			dequeue_task(p, p->array);
+			if (p->array == rq->active && p->prio > prio)
+				p->prio = prio;
+			enqueue_task(p, rq->active);
+
+			if (array == rq->expired) {
+				int idx = find_next_bit(bitmap, MAX_PRIO, 0);
+				rq->best_expired_prio = idx;
+				if (idx == MAX_PRIO)
+					rq->expired_timestamp = 0;
+			} else return;
+		}
+		idx++;
+	}
+	if (rq->expired_timestamp && array == rq->active &&
+			(!noninteractive || EXPIRED_STARVING(rq))) {
+		array = rq->expired;
+		bitmap = array->bitmap;
+		nr_running = rq->nr_running;
+		idx = 0;
+		goto repeat;
+	}
+
+}
+
 /*
  * This function gets called by the timer code, with HZ frequency.
  * We call it with interrupts disabled.
@@ -2632,6 +2678,11 @@ void scheduler_tick(void)
 		goto out_unlock;
 	}
 	if (!--p->time_slice) {
+		/*
+		 * Slip starving tasks into the stream.
+		 */
+		if (rq->nr_running > 1)
+			requeue_starving(rq, now);
 		dequeue_task(p, rq->active);
 		set_tsk_need_resched(p);
 		p->prio = effective_prio(p);
@@ -2640,7 +2691,7 @@ void scheduler_tick(void)
 
 		if (!rq->expired_timestamp)
 			rq->expired_timestamp = jiffies;
-		if (!TASK_INTERACTIVE(p) || EXPIRED_STARVING(rq)) {
+		if (!TASK_INTERACTIVE(p)) {
 			enqueue_task(p, rq->expired);
 			if (p->static_prio < rq->best_expired_prio)
 				rq->best_expired_prio = p->static_prio;
@@ -2935,9 +2986,12 @@ need_resched_nonpreemptible:
 	schedstat_inc(rq, sched_cnt);
 	now = sched_clock();
 	if (likely((long long)(now - prev->timestamp) < NS_MAX_SLEEP_AVG)) {
+		int active = rq->active->nr_active;
 		run_time = now - prev->timestamp;
 		if (unlikely((long long)(now - prev->timestamp) < 0))
 			run_time = 0;
+		else if (active > 1)
+			run_time *= min(active, 1 + MAX_BONUS);
 	} else
 		run_time = NS_MAX_SLEEP_AVG;
 
@@ -2945,7 +2999,7 @@ need_resched_nonpreemptible:
 	 * Tasks charged proportionately less run_time at high sleep_avg to
 	 * delay them losing their interactive status
 	 */
-	run_time /= (CURRENT_BONUS(prev) ? : 1);
+	run_time /= 1 + CURRENT_BONUS(prev);
 
 	spin_lock_irq(&rq->lock);
 
@@ -3012,18 +3066,24 @@ go_idle:
 	queue = array->queue + idx;
 	next = list_entry(queue->next, task_t, run_list);
 
-	if (!rt_task(next) && interactive_sleep(next->sleep_type)) {
+	if (!rt_task(next) && interactive_sleep(next->sleep_type) &&
+			rq->nr_running < 1 + MAX_BONUS - CURRENT_BONUS(next)) {
 		unsigned long long delta = now - next->timestamp;
+		unsigned long max_delta;
 		if (unlikely((long long)(now - next->timestamp) < 0))
 			delta = 0;
 
 		if (next->sleep_type == SLEEP_INTERACTIVE)
 			delta = delta * (ON_RUNQUEUE_WEIGHT * 128 / 100) / 128;
 
+		max_delta = (1 + MAX_BONUS - CURRENT_BONUS(next)) * GRANULARITY;
+		max_delta = JIFFIES_TO_NS(max_delta);
+		if (delta > max_delta)
+			delta = max_delta;
 		array = next->array;
 		new_prio = recalc_task_prio(next, next->timestamp + delta);
 
-		if (unlikely(next->prio != new_prio)) {
+		if (unlikely(next->prio > new_prio)) {
 			dequeue_task(next, array);
 			next->prio = new_prio;
 			enqueue_task(next, array);


[-- Attachment #2: irman2.c --]
[-- Type: text/x-csrc, Size: 4330 bytes --]

/*
 *  irman by Davide Libenzi ( irman load generator )
 *  Copyright (C) 2003  Davide Libenzi
 *
 *  This program is free software; you can redistribute it and/or modify
 *  it under the terms of the GNU General Public License as published by
 *  the Free Software Foundation; either version 2 of the License, or
 *  (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU General Public License for more details.
 *
 *  You should have received a copy of the GNU General Public License
 *  along with this program; if not, write to the Free Software
 *  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
 *
 *  Davide Libenzi <davidel@xmailserver.org>
 *
 */

#include <sys/types.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <time.h>		/* time() in hog_process() */
#include <sys/socket.h>
#include <sys/signal.h>
#include <sys/resource.h>
#include <fcntl.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>


#define BUFSIZE (1024 * 32)


static int *pipes, *child;
static int num_pipes, num_active, num_child, use_socket;
static unsigned long burn_ms;
static char buf1[BUFSIZE], buf2[BUFSIZE];
static volatile sig_atomic_t run = 1;
pid_t parent;

static void signal_all(int signum) {
	if (getpid() == parent) {
		while (num_child >= 0) {
			kill(child[num_child], SIGKILL);
			num_child--;
		}
		exit(0);
	} else if (signum == SIGKILL || getppid() == 1)
		run = 0;
}

unsigned long long getustime(void) {
	struct timeval tm;

	gettimeofday(&tm, NULL);
	return (unsigned long long) tm.tv_sec * 1000ULL + (unsigned long long) tm.tv_usec / 1000ULL;
}


int burn_ms_cpu(unsigned long ms) {
	int i, cmp = 0;
	unsigned long long ts;

	ts = getustime();
	do {
		for (i = 0; i < 4; i++)
			cmp += memcmp(buf1, buf2, BUFSIZE);
	} while (ts + ms > getustime());
	return cmp;
}


pid_t hog_process(void) {
	pid_t pid;

	if (!(pid = fork())) {
		while (run) {
			printf("HOG running %u\n", time(NULL));
			burn_ms_cpu(burn_ms);
		}
		exit(0);
	}
	return pid;
}


pid_t irman_process(int n) {
	int nn;
	pid_t pid;
	u_char ch;

	if (!(pid = fork())) {
		if ((nn = n + num_active) >= num_pipes)
			nn -= num_pipes;
		while (run) {
			printf("reading %u\n", n);
			read(pipes[2 * n], &ch, 1);
			burn_ms_cpu(burn_ms);
			printf("writing %u\n", nn);
			write(pipes[2 * nn + 1], "s", 1);
		}
		exit(0);
	}
	return pid;
}

int main (int argc, char **argv) {
	struct rlimit rl;
	int i, c;
	long work;
	int *cp, run_secs = 0;
	extern char *optarg;
	struct sigaction action;

	parent = getpid();
	num_pipes = 40;
	num_active = 1;
	burn_ms = 300;
	use_socket = 0;
	while ((c = getopt(argc, argv, "n:b:a:s:S:")) != -1) {
		switch (c) {
		case 'n':
			num_pipes = atoi(optarg);
			break;
		case 'b':
			burn_ms = atoi(optarg);
			break;
		case 'a':
			num_active = atoi(optarg);
			break;
		case 's':
			run_secs = 1 + atoi(optarg);
			break;
		case 'S':
			use_socket = 1;
			break;
		default:
			fprintf(stderr, "Illegal argument \"%c\"\n", c);
			exit(1);
		}
	}

	rl.rlim_cur = rl.rlim_max = num_pipes * 2 + 50;
	if (setrlimit(RLIMIT_NOFILE, &rl) == -1) {
		perror("setrlimit"); 
		exit(1);
	}

	pipes = calloc(num_pipes * 2, sizeof(int));
	if (pipes == NULL) {
		perror("malloc");
		exit(1);
	}

	child = calloc(num_pipes, sizeof(int));
	if (child == NULL) {
		perror("malloc");
		exit(1);
	}

	for (cp = pipes, i = 0; i < num_pipes; i++, cp += 2) {
		if (!use_socket) {
			if(pipe(cp) == -1) {
				perror("pipe");
				exit(1);
			}
		} else if (socketpair(AF_UNIX, SOCK_STREAM, 0, cp) == -1) {
			perror("socketpair");
			exit(1);
		}
	}

	memset(buf1, 'f', sizeof(buf1));
	memset(buf2, 'f', sizeof(buf2));

	sigemptyset(&action.sa_mask);
	/* establish termination handler */
	action.sa_handler = signal_all;
	action.sa_flags = SA_NODEFER;
	if (sigaction(SIGTERM, &action, NULL) == -1) {
		perror("Could not install signal handler");
		exit(1);
	}

	for (i = 0; i < num_pipes; i++)
		child[i] = irman_process(i);

	child[i] = hog_process();
	num_child = i;

	for (i = 0; i < num_active; i++)
		write(pipes[2 * i + 1], "s", 1);

	while (--run_secs)
		sleep(1);
	signal_all(SIGKILL);
	exit(0);
}



* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-13  7:41                       ` Mike Galbraith
@ 2006-04-13 10:16                         ` Con Kolivas
  2006-04-13 11:05                           ` Mike Galbraith
  0 siblings, 1 reply; 43+ messages in thread
From: Con Kolivas @ 2006-04-13 10:16 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: bert hubert, lkml, Ingo Molnar, Andrew Morton, Nick Piggin,
	Peter Williams

On Thursday 13 April 2006 17:41, Mike Galbraith wrote:
> This way also allowed me to eliminate the interactive agony of an array
> switch when at 100% cpu.  Seems to work well.  No more agony, only tiny
> pin pricks.
>
> Anyway, interested readers will find a copy of irman2.c, which is nice
> for testing interactive starvation, attached.   The effect is most
> noticeable with something like bonnie, which otherwise has zero chance
> against irman2.  Just about anything will do though.  Trying to fire up
> Amarok is good for a chuckle.  Whatever.  (if anyone plays with irman2
> on 2.6.16 or below, call it with -S 1)

Comments.

> +repeat:
> +	while ((idx = find_next_bit(bitmap, MAX_PRIO, idx)) < MAX_PRIO) {

...

> +		goto repeat;

...

> +               if (rq->nr_running > 1)
> +                       requeue_starving(rq, now);

An O(n) function in scheduler_tick is probably not the way to tackle this.

-- 
-ck


* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-13 10:16                         ` Con Kolivas
@ 2006-04-13 11:05                           ` Mike Galbraith
  0 siblings, 0 replies; 43+ messages in thread
From: Mike Galbraith @ 2006-04-13 11:05 UTC (permalink / raw)
  To: Con Kolivas
  Cc: bert hubert, lkml, Ingo Molnar, Andrew Morton, Nick Piggin,
	Peter Williams

On Thu, 2006-04-13 at 20:16 +1000, Con Kolivas wrote:
> On Thursday 13 April 2006 17:41, Mike Galbraith wrote:
> > This way also allowed me to eliminate the interactive agony of an array
> > switch when at 100% cpu.  Seems to work well.  No more agony, only tiny
> > pin pricks.
> >
> 
> Comments.
> 
> > +repeat:
> > +	while ((idx = find_next_bit(bitmap, MAX_PRIO, idx)) < MAX_PRIO) {
> 
> ...
> 
> > +		goto repeat;
> 
> ...
> 
> > +               if (rq->nr_running > 1)
> > +                       requeue_starving(rq, now);
> 
> An O(n) function in scheduler_tick is probably not the way to tackle this.

Why not?  It's one quick-like-bunny stop per occupied queue per slice
completion.

	-Mike



* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-16  8:31                     ` Al Boldi
@ 2006-04-16  8:58                       ` Con Kolivas
  0 siblings, 0 replies; 43+ messages in thread
From: Con Kolivas @ 2006-04-16  8:58 UTC (permalink / raw)
  To: Al Boldi; +Cc: ck list, linux-kernel, Mike Galbraith

On Sunday 16 April 2006 18:31, Al Boldi wrote:
> Con Kolivas wrote:
> > On Thursday 13 April 2006 01:25, Al Boldi wrote:
> > > Con Kolivas wrote:
> > > > mean 68.7 seconds
> > > >
> > > > range 63-73 seconds.
> > >
> > > Could this 10s skew be improved to around 1s to aid smoothness?
> >
> > It turns out to be dependent on accounting of system time which only
> > staircase does at the moment btw. Currently it's done on a jiffy basis.
> > To increase the accuracy of this would incur incredible cost which I
> > don't consider worth it.
>
> Is this also related to that?

No.

> > > Much smoother, but I still get this choke w/ 2 eatm 9999 loops running:
> > >
> > > 9 MB 783 KB eaten in 130 msec (74 MB/s)
> > > 9 MB 783 KB eaten in 2416 msec (3 MB/s)		<<<<<<<<<<<<<
> > > 9 MB 783 KB eaten in 197 msec (48 MB/s)
> > >
> > > You may have to adjust the kb to get the same effect.
> >
> > I've seen it. It's an artefact of timekeeping that it takes an
> > accumulation of data to get all the information. Not much I can do about
> > it except to have timeslices so small that they thrash the crap out of
> > cpu caches and completely destroy throughput.
>
> So why is this not visible in other schedulers?

When I said there's not much I can do about it I mean with respect to the 
design.

> Are you sure this is not a priority boost problem?

Indeed it is related to the way cpu is proportioned out in staircase being 
both priority and slice. Problem? The magnitude of said problem is up to the 
observer to decide. It's a phenomenon of only two infinitely repeating, 
concurrent, rapidly forking workloads where one forks more often than every 
100ms and the other less often; ie your test case. I'm sure there's a real world workload 
somewhere somehow that exhibits this, but it's important to remember that 
overall it's fair with the occasional blip.

> > The current value, 6ms at 1000HZ, is chosen because it's the largest
> > value that can schedule a task in less than normal human perceptible
> > range when two competing heavily cpu bound tasks are the same priority.
> > At 250HZ it works out to 7.5ms and 10ms at 100HZ. Ironically in my
> > experimenting I found the cpu cache improvements become much less
> > significant above 7ms so I'm very happy with this compromise.
>
> Would you think this is dependent on cache-size and cpu-speed?

It is. Cache warmth time varies on architecture and design. Of course you're 
going to tell me to add a tunable and/or autotune this. Then that undoes 
limiting it to the human perception range. It really does cost us to export these 
things which are otherwise compile time constants... sigh.

> Also, what's this iso_cpu thing?

SCHED_ISO cpu usage which you're not using.

-- 
-ck


* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-16  6:02                   ` Con Kolivas
@ 2006-04-16  8:31                     ` Al Boldi
  2006-04-16  8:58                       ` Con Kolivas
  0 siblings, 1 reply; 43+ messages in thread
From: Al Boldi @ 2006-04-16  8:31 UTC (permalink / raw)
  To: Con Kolivas; +Cc: ck list, linux-kernel, Mike Galbraith

Con Kolivas wrote:
> On Thursday 13 April 2006 01:25, Al Boldi wrote:
> > Con Kolivas wrote:
> > > mean 68.7 seconds
> > >
> > > range 63-73 seconds.
> >
> > Could this 10s skew be improved to around 1s to aid smoothness?
>
> It turns out to be dependent on accounting of system time which only
> staircase does at the moment btw. Currently it's done on a jiffy basis. To
> increase the accuracy of this would incur incredible cost which I don't
> consider worth it.

Is this also related to that?

> > Much smoother, but I still get this choke w/ 2 eatm 9999 loops running:
> >
> > 9 MB 783 KB eaten in 130 msec (74 MB/s)
> > 9 MB 783 KB eaten in 2416 msec (3 MB/s)		<<<<<<<<<<<<<
> > 9 MB 783 KB eaten in 197 msec (48 MB/s)
> >
> > You may have to adjust the kb to get the same effect.
>
> I've seen it. It's an artefact of timekeeping that it takes an
> accumulation of data to get all the information. Not much I can do about
> it except to have timeslices so small that they thrash the crap out of cpu
> caches and completely destroy throughput.

So why is this not visible in other schedulers?

Are you sure this is not a priority boost problem?

> The current value, 6ms at 1000HZ, is chosen because it's the largest value
> that can schedule a task in less than normal human perceptible range when
> two competing heavily cpu bound tasks are the same priority. At 250HZ it
> works out to 7.5ms and 10ms at 100HZ. Ironically in my experimenting I
> found the cpu cache improvements become much less significant above 7ms so
> I'm very happy with this compromise.

Would you think this is dependent on cache-size and cpu-speed?

Also, what's this iso_cpu thing?

> Thanks!

Thank you!

--
Al



* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-12 15:25                 ` Al Boldi
  2006-04-13 11:51                   ` Con Kolivas
@ 2006-04-16  6:02                   ` Con Kolivas
  2006-04-16  8:31                     ` Al Boldi
  1 sibling, 1 reply; 43+ messages in thread
From: Con Kolivas @ 2006-04-16  6:02 UTC (permalink / raw)
  To: Al Boldi; +Cc: ck list, linux-kernel, Mike Galbraith

On Thursday 13 April 2006 01:25, Al Boldi wrote:
> Con Kolivas wrote:
> > mean 68.7 seconds
> >
> > range 63-73 seconds.
>
> Could this 10s skew be improved to around 1s to aid smoothness?

It turns out to be dependent on accounting of system time which only staircase 
does at the moment btw. Currently it's done on a jiffy basis. To increase the 
accuracy of this would incur incredible cost which I don't consider worth it.

-- 
-ck


* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-15 20:45                         ` Al Boldi
@ 2006-04-15 23:22                           ` Con Kolivas
  0 siblings, 0 replies; 43+ messages in thread
From: Con Kolivas @ 2006-04-15 23:22 UTC (permalink / raw)
  To: Al Boldi; +Cc: ck list, linux-kernel, Mike Galbraith

On Sunday 16 April 2006 06:45, Al Boldi wrote:
> Con Kolivas wrote:
> > Thanks for bringing this to my attention. A while back I had different
> > management of forked tasks and merged it with PF_NONSLEEP. Since then
> > I've changed the management of NONSLEEP tasks and didn't realise it had
> > adversely affected the accounting of forking tasks. This patch should
> > rectify it.
>
> Congrats!
>
> Much smoother, but I still get this choke w/ 2 eatm 9999 loops running:

> 9 MB 783 KB eaten in 130 msec (74 MB/s)
> 9 MB 783 KB eaten in 2416 msec (3 MB/s)		<<<<<<<<<<<<<
> 9 MB 783 KB eaten in 197 msec (48 MB/s)

> You may have to adjust the kb to get the same effect.

I've seen it. It's an artefact of timekeeping that it takes an accumulation of 
data to get all the information. Not much I can do about it except to have 
timeslices so small that they thrash the crap out of cpu caches and 
completely destroy throughput.

The current value, 6ms at 1000HZ, is chosen because it's the largest value 
that can schedule a task in less than normal human perceptible range when two 
competing heavily cpu bound tasks are the same priority. At 250HZ it works 
out to 7.5ms and 10ms at 100HZ. Ironically in my experimenting I found the 
cpu cache improvements become much less significant above 7ms so I'm very 
happy with this compromise.

Thanks!

-- 
-ck


* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-15  7:05                       ` Con Kolivas
@ 2006-04-15 20:45                         ` Al Boldi
  2006-04-15 23:22                           ` Con Kolivas
  0 siblings, 1 reply; 43+ messages in thread
From: Al Boldi @ 2006-04-15 20:45 UTC (permalink / raw)
  To: Con Kolivas; +Cc: ck list, linux-kernel, Mike Galbraith

Con Kolivas wrote:
> On Friday 14 April 2006 13:16, Al Boldi wrote:
> > Can you try the attached mem-eater passing it the number of kb to be
> > eaten.
> >
> >         i.e. '# while :; do ./eatm 9999 ; done'
> >
> > This will print the number of bytes eaten and the timing in ms.
> >
> > Assuming timeslice=100, adjust the number of kb to be eaten such that
> > the timing will be less than timeslice (something like 60ms).  Switch to
> > another vt and start another eatm w/ the number of kb yielding more than
> > timeslice (something like 140ms).  This eatm should starve completely
> > after exceeding timeslice.
> >
> > This problem also exists in mainline, but it is able to break out of it
> > to some extent.  Setting eatm kb to a timing larger than timeslice does
> > not exhibit this problem.
>
> Thanks for bringing this to my attention. A while back I had different
> management of forked tasks and merged it with PF_NONSLEEP. Since then I've
> changed the management of NONSLEEP tasks and didn't realise it had
> adversely affected the accounting of forking tasks. This patch should
> rectify it.

Congrats!

Much smoother, but I still get this choke w/ 2 eatm 9999 loops running:

9 MB 783 KB eaten in 131 msec (74 MB/s)
9 MB 783 KB eaten in 129 msec (75 MB/s)
9 MB 783 KB eaten in 129 msec (75 MB/s)
9 MB 783 KB eaten in 131 msec (74 MB/s)
9 MB 783 KB eaten in 133 msec (73 MB/s)
9 MB 783 KB eaten in 132 msec (73 MB/s)
9 MB 783 KB eaten in 128 msec (76 MB/s)
9 MB 783 KB eaten in 133 msec (73 MB/s)
9 MB 783 KB eaten in 129 msec (75 MB/s)
9 MB 783 KB eaten in 130 msec (74 MB/s)
9 MB 783 KB eaten in 2416 msec (3 MB/s)		<<<<<<<<<<<<<
9 MB 783 KB eaten in 197 msec (48 MB/s)
9 MB 783 KB eaten in 133 msec (73 MB/s)
9 MB 783 KB eaten in 132 msec (73 MB/s)
9 MB 783 KB eaten in 132 msec (73 MB/s)
9 MB 783 KB eaten in 126 msec (77 MB/s)
9 MB 783 KB eaten in 135 msec (72 MB/s)
9 MB 783 KB eaten in 132 msec (73 MB/s)
9 MB 783 KB eaten in 132 msec (73 MB/s)
9 MB 783 KB eaten in 134 msec (72 MB/s)
9 MB 783 KB eaten in 64 msec (152 MB/s)
9 MB 783 KB eaten in 63 msec (154 MB/s)
9 MB 783 KB eaten in 63 msec (154 MB/s)
9 MB 783 KB eaten in 63 msec (154 MB/s)
9 MB 783 KB eaten in 63 msec (154 MB/s)
9 MB 783 KB eaten in 64 msec (152 MB/s)
9 MB 783 KB eaten in 63 msec (154 MB/s)
9 MB 783 KB eaten in 64 msec (152 MB/s)
9 MB 783 KB eaten in 63 msec (154 MB/s)
9 MB 783 KB eaten in 63 msec (154 MB/s)
9 MB 783 KB eaten in 63 msec (154 MB/s)
9 MB 783 KB eaten in 63 msec (154 MB/s)
9 MB 783 KB eaten in 63 msec (154 MB/s)

You may have to adjust the kb to get the same effect.

Thanks!

--
Al



* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-14  3:16                     ` Al Boldi
@ 2006-04-15  7:05                       ` Con Kolivas
  2006-04-15 20:45                         ` Al Boldi
  0 siblings, 1 reply; 43+ messages in thread
From: Con Kolivas @ 2006-04-15  7:05 UTC (permalink / raw)
  To: Al Boldi; +Cc: ck list, linux-kernel, Mike Galbraith

On Friday 14 April 2006 13:16, Al Boldi wrote:
> Can you try the attached mem-eater passing it the number of kb to be eaten.
>
>         i.e. '# while :; do ./eatm 9999 ; done'
>
> This will print the number of bytes eaten and the timing in ms.
>
> Assuming timeslice=100, adjust the number of kb to be eaten such that the
> timing will be less than timeslice (something like 60ms).  Switch to
> another vt and start another eatm w/ the number of kb yielding more than
> timeslice (something like 140ms).  This eatm should starve completely after
> exceeding timeslice.
>
> This problem also exists in mainline, but it is able to break out of it to
> some extent.  Setting eatm kb to a timing larger than timeslice does not
> exhibit this problem.

Thanks for bringing this to my attention. A while back I had different 
management of forked tasks and merged it with PF_NONSLEEP. Since then I've 
changed the management of NONSLEEP tasks and didn't realise it had adversely 
affected the accounting of forking tasks. This patch should rectify it.

Thanks!
---
 include/linux/sched.h |    1 +
 kernel/sched.c        |    9 ++++++---
 2 files changed, 7 insertions(+), 3 deletions(-)

Index: linux-2.6.16-ck5/include/linux/sched.h
===================================================================
--- linux-2.6.16-ck5.orig/include/linux/sched.h	2006-04-15 16:32:18.000000000 +1000
+++ linux-2.6.16-ck5/include/linux/sched.h	2006-04-15 16:34:36.000000000 +1000
@@ -961,6 +961,7 @@ static inline void put_task_struct(struc
 #define PF_SWAPWRITE	0x01000000	/* Allowed to write to swap */
 #define PF_NONSLEEP	0x02000000	/* Waiting on in kernel activity */
 #define PF_ISOREF	0x04000000	/* SCHED_ISO task has used up quota */
+#define PF_FORKED	0x08000000	/* Task just forked another process */
 
 /*
  * Only the _current_ task can read/write to tsk->flags, but other
Index: linux-2.6.16-ck5/kernel/sched.c
===================================================================
--- linux-2.6.16-ck5.orig/kernel/sched.c	2006-04-15 16:32:18.000000000 +1000
+++ linux-2.6.16-ck5/kernel/sched.c	2006-04-15 16:34:35.000000000 +1000
@@ -18,7 +18,7 @@
  *  2004-04-02	Scheduler domains code by Nick Piggin
  *  2006-04-02	Staircase scheduling policy by Con Kolivas with help
  *		from William Lee Irwin III, Zwane Mwaikambo & Peter Williams.
- *		Staircase v15
+ *		Staircase v15_test2
  */
 
 #include <linux/mm.h>
@@ -809,6 +809,9 @@ static inline void recalc_task_prio(task
 	else
 		sleep_time = 0;
 
+	if (unlikely(p->flags & PF_FORKED))
+		sleep_time = 0;
+
 	/*
 	 * If we sleep longer than our running total and have not set the
 	 * PF_NONSLEEP flag we gain a bonus.
@@ -847,7 +850,7 @@ static void activate_task(task_t *p, run
 	p->time_slice = p->slice % rr ? : rr;
 	if (!rt_task(p)) {
 		recalc_task_prio(p, now);
-		p->flags &= ~PF_NONSLEEP;
+		p->flags &= ~(PF_NONSLEEP | PF_FORKED);
 		p->systime = 0;
 		p->prio = effective_prio(p);
 	}
@@ -1464,7 +1467,7 @@ void fastcall wake_up_new_task(task_t *p
 
 	/* Forked process gets no bonus to prevent fork bombs. */
 	p->bonus = 0;
-	current->flags |= PF_NONSLEEP;
+	current->flags |= PF_FORKED;
 
 	if (likely(cpu == this_cpu)) {
 		activate_task(p, rq, 1);
-- 
-ck

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-13 11:51                   ` Con Kolivas
@ 2006-04-14  3:16                     ` Al Boldi
  2006-04-15  7:05                       ` Con Kolivas
  0 siblings, 1 reply; 43+ messages in thread
From: Al Boldi @ 2006-04-14  3:16 UTC (permalink / raw)
  To: Con Kolivas; +Cc: ck list, linux-kernel, Mike Galbraith

[-- Attachment #1: Type: text/plain, Size: 2136 bytes --]

Con Kolivas wrote:
> On Thursday 13 April 2006 01:25, Al Boldi wrote:
> > Con Kolivas wrote:
> > > Nvidia driver; all separate tasks in top.
> >
> > On a 400MhzP2 i810drm w/ kernel HZ=1000 it stutters.
> > You may want to compensate for nvidia w/ a few cpu-hogs.
>
> I tried adding cpu hogs and it gets extremely slow very soon but still
> doesn't stutter here.
>
> > How many gears fps do you get?
>
> When those 3 are running concurrently (without any other cpu hogs) gears
> is showing 317 fps.

Your machine is probably too fast to show the problem, since it has enough 
cpu cycles per timeslice to complete the request.

Can you try the attached mem-eater passing it the number of kb to be eaten.

        i.e. '# while :; do ./eatm 9999 ; done' 

This will print the number of bytes eaten and the timing in ms.

Assuming timeslice=100, adjust the number of kb to be eaten such that the 
timing will be less than timeslice (something like 60ms).  Switch to another 
vt and start another eatm w/ the number of kb yielding more than timeslice 
(something like 140ms).  This eatm should starve completely after exceeding 
timeslice.

This problem also exists in mainline, but it is able to break out of it to 
some extent.  Setting eatm kb to a timing larger than timeslice does not 
exhibit this problem.

> > > range 63-73 seconds.
> >
> > Could this 10s skew be improved to around 1s to aid smoothness?
>
> I'm happy to try... but I doubt it. 10% difference over 10 tasks over 10
> mins of tasks of that wake/sleep nature is pretty good IMO. I'll see if
> there's anywhere else I can make the cpu accounting any better.

Great!

> As an aside, note that sched_clock and nanosecond timing with TSC isn't
> actually used if you use the pm timer which undoes any high res accounting
> the cpu scheduler can do (I noticed this when playing with pm timer that
> sched_clock just returns jiffies resolution instead of real nanosecond
> res). This could undo any smoothness that good cpu accounting can do.

Yes, pm-timer looks rather broken, at least on my machine.  Too bad it's on 
by default, as I always have to turn it off.

Thanks!

--
Al



[-- Attachment #2: eatm.c --]
[-- Type: text/x-csrc, Size: 810 bytes --]

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

/* elapsed(1) starts the timer, elapsed(0) returns milliseconds since then. */
unsigned long elapsed(int start) {

	static struct timeval s,e;

	if (start) return gettimeofday(&s, NULL);

	gettimeofday(&e, NULL);

	return ((e.tv_sec - s.tv_sec) * 1000 + (e.tv_usec - s.tv_usec) / 1000);

}

int main(int argc, char **argv) {

    unsigned long int i,j,max;
    unsigned char *p;

    /* Amount to eat, in KB (default 0x60000 KB = 384 MB). */
    if (argc>1)
	max=atol(argv[1]);
    else
	max=0x60000;

    elapsed(1);

    /* Allocate and touch whole megabytes first... */
    for (i=0;((i<max/1024) && (p = malloc(1024*1024)));i++) {
        for (j=0;j<1024;p[1024*j++]=0);
	fprintf(stderr,"\r%lu MB ",i+1);
    }

    /* ...then the remaining kilobytes, one at a time. */
    for (j=max-(i*=1024);((i<max) && (p = malloc(1024)));i++) {
	*p = 0;
    }
    fprintf(stderr,"%lu KB ",j-(max-i));

    fprintf(stderr,"eaten in %lu msec (%lu MB/s)\n",elapsed(0),i/(elapsed(0)?:1)*1000/1024);

    return 0;
}

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-12 15:25                 ` Al Boldi
@ 2006-04-13 11:51                   ` Con Kolivas
  2006-04-14  3:16                     ` Al Boldi
  2006-04-16  6:02                   ` Con Kolivas
  1 sibling, 1 reply; 43+ messages in thread
From: Con Kolivas @ 2006-04-13 11:51 UTC (permalink / raw)
  To: Al Boldi; +Cc: ck list, linux-kernel, Mike Galbraith

On Thursday 13 April 2006 01:25, Al Boldi wrote:
> Con Kolivas wrote:
> > Nvidia driver; all separate tasks in top.
>
> On a 400MhzP2 i810drm w/ kernel HZ=1000 it stutters.
> You may want to compensate for nvidia w/ a few cpu-hogs.

I tried adding cpu hogs and it gets extremely slow very soon but still doesn't 
stutter here.

> How many gears fps do you get?

When those 3 are running concurrently (without any other cpu hogs) gears is 
showing 317 fps.

> > range 63-73 seconds.
>
> Could this 10s skew be improved to around 1s to aid smoothness?

I'm happy to try... but I doubt it. 10% difference over 10 tasks over 10 mins 
of tasks of that wake/sleep nature is pretty good IMO. I'll see if there's 
anywhere else I can make the cpu accounting any better. 

As an aside, note that sched_clock and nanosecond timing with TSC isn't 
actually used if you use the pm timer which undoes any high res accounting 
the cpu scheduler can do (I noticed this when playing with pm timer that 
sched_clock just returns jiffies resolution instead of real nanosecond res). 
This could undo any smoothness that good cpu accounting can do.
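
As a rough illustration only (a sketch, not the actual timer code), a
tick-resolution sched_clock() fallback amounts to something like:

	unsigned long long sched_clock(void)
	{
		/* jiffies advances once per tick, so the best resolution
		 * available here is 1/HZ seconds, e.g. 1ms at HZ=1000. */
		return (unsigned long long)jiffies * (1000000000 / HZ);
	}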

-- 
-ck

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-12 11:27               ` Con Kolivas
@ 2006-04-12 15:25                 ` Al Boldi
  2006-04-13 11:51                   ` Con Kolivas
  2006-04-16  6:02                   ` Con Kolivas
  0 siblings, 2 replies; 43+ messages in thread
From: Al Boldi @ 2006-04-12 15:25 UTC (permalink / raw)
  To: Con Kolivas; +Cc: ck list, linux-kernel, Mike Galbraith

Con Kolivas wrote:
> On Wednesday 12 April 2006 20:39, Al Boldi wrote:
> > Con Kolivas wrote:
> > > Installed and tested here just now. They run smoothly concurrently
> > > here. Are you testing on staircase15?
> >
> > staircase14.2-test3.  Are you testing w/ DRM?  If not then all mesa
> > requests will be queued into X, and then runs as one task (check top
> > d.1)
>
> Nvidia driver; all separate tasks in top.

On a 400MhzP2 i810drm w/ kernel HZ=1000 it stutters.
You may want to compensate for nvidia w/ a few cpu-hogs.
How many gears fps do you get?

> > Try ping -A (10x).  top d.1 should show skewed times.  If you have a
> > fast machine, you may have to increase the load.
>
> Ran for a bit over 10 mins outside of X to avoid other tasks influencing
> results. I was too lazy to go to init 1.
>
> ps -eALo pid,spid,user,priority,ni,pcpu,vsize,time,args
>
> 15648 15648 root      39   0  9.2  1740 00:01:03 ping -A localhost
> 15649 15649 root      28   0  9.8  1740 00:01:06 ping -A localhost
> 15650 15650 root      39   0  9.9  1744 00:01:07 ping -A localhost
> 15651 15651 root      39   0  9.3  1740 00:01:03 ping -A localhost
> 15652 15652 root      39   0 10.3  1740 00:01:10 ping -A localhost
> 15653 15653 root      39   0 10.8  1740 00:01:13 ping -A localhost
> 15654 15654 root      39   0 10.0  1740 00:01:08 ping -A localhost
> 15655 15655 root      39   0 10.5  1740 00:01:11 ping -A localhost
> 15656 15656 root      39   0  9.9  1740 00:01:07 ping -A localhost
> 15657 15657 root      39   0 10.2  1740 00:01:09 ping -A localhost
>
> mean 68.7 seconds
>
> range 63-73 seconds.

Could this 10s skew be improved to around 1s to aid smoothness?

Thanks!

--
Al


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-12 10:39             ` Al Boldi
@ 2006-04-12 11:27               ` Con Kolivas
  2006-04-12 15:25                 ` Al Boldi
  0 siblings, 1 reply; 43+ messages in thread
From: Con Kolivas @ 2006-04-12 11:27 UTC (permalink / raw)
  To: Al Boldi; +Cc: ck list, linux-kernel, Mike Galbraith

On Wednesday 12 April 2006 20:39, Al Boldi wrote:
> Con Kolivas wrote:
> > Installed and tested here just now. They run smoothly concurrently here.
> > Are you testing on staircase15?
>
> staircase14.2-test3.  Are you testing w/ DRM?  If not then all mesa
> requests will be queued into X, and then runs as one task (check top d.1)

Nvidia driver; all separate tasks in top.

> Try ping -A (10x).  top d.1 should show skewed times.  If you have a fast
> machine, you may have to increase the load.

Ran for a bit over 10 mins outside of X to avoid other tasks influencing 
results. I was too lazy to go to init 1.

ps -eALo pid,spid,user,priority,ni,pcpu,vsize,time,args

15648 15648 root      39   0  9.2  1740 00:01:03 ping -A localhost
15649 15649 root      28   0  9.8  1740 00:01:06 ping -A localhost
15650 15650 root      39   0  9.9  1744 00:01:07 ping -A localhost
15651 15651 root      39   0  9.3  1740 00:01:03 ping -A localhost
15652 15652 root      39   0 10.3  1740 00:01:10 ping -A localhost
15653 15653 root      39   0 10.8  1740 00:01:13 ping -A localhost
15654 15654 root      39   0 10.0  1740 00:01:08 ping -A localhost
15655 15655 root      39   0 10.5  1740 00:01:11 ping -A localhost
15656 15656 root      39   0  9.9  1740 00:01:07 ping -A localhost
15657 15657 root      39   0 10.2  1740 00:01:09 ping -A localhost

mean 68.7 seconds

range 63-73 seconds.

For a load that wakes up so frequently for such a short period of time I think 
that is pretty fair cpu distribution over 10 mins. Over shorter periods top 
is hopeless at representing accurate cpu usage, especially at low HZ settings 
of the kernel. You can see in the ps output above that the current cpu 
distribution is pretty consistent across the tasks, which I consider quite 
fair over 10 mins.

-- 
-ck

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-12  9:36           ` Con Kolivas
@ 2006-04-12 10:39             ` Al Boldi
  2006-04-12 11:27               ` Con Kolivas
  0 siblings, 1 reply; 43+ messages in thread
From: Al Boldi @ 2006-04-12 10:39 UTC (permalink / raw)
  To: Con Kolivas; +Cc: ck list, linux-kernel, Mike Galbraith

Con Kolivas wrote:
> On Wednesday 12 April 2006 18:17, Al Boldi wrote:
> > Con Kolivas wrote:
> > > Single heavily cpu bound computationally intensive tasks (think
> > > rendering etc).
> >
> > Why do you need a switch for that?
>
> Because avoiding doing need_resched and reassessing priority at less
> regular intervals means less overhead, and there is always something else
> running on a pc. At low loads the longer timeslices and delayed preemption
> contribute considerably to cache warmth and throughput. Comparing
> staircase's sched_compute mode on kernbench at "optimal loads" (make -j4 x
> num_cpus) showed the best throughput of all the schedulers tested.

Great!

> > > Sorry I don't understand what you mean. Why do you say it's not fair
> > > (got a testcase?). What do you mean by "definitely not smooth". What
> > > is smoothness and on what workloads is it not smooth? Also by ia you
> > > mean what?
> >
> > ia=interactivity i.e: responsiveness under high load.
> > smooth=not jumpy i.e: run '# gears & morph3d & reflect &' w/o stutter
>
> Installed and tested here just now. They run smoothly concurrently here.
> Are you testing on staircase15?

staircase14.2-test3.  Are you testing w/ DRM?  If not then all mesa requests 
will be queued into X, and then runs as one task (check top d.1)

> > fair=non hogging i.e: spreading cpu-load across tasks evenly (top d.1)
>
> Only unblocked processes/threads where one depends on the other don't get
> equal share, which is as broken a testcase as relying on sched_yield. I
> have not seen a testcase demonstrating unfairness on current staircase.
> top shows me fair cpu usage.

Try ping -A (10x).  top d.1 should show skewed times.  If you have a fast 
machine, you may have to increase the load.

> > > Again I don't understand. Just how heavy a load is heavy? Your
> > > testcases are already in what I would call stratospheric range. I
> > > don't personally think a cpu scheduler should be optimised for load
> > > infinity. And how are you defining efficient? You say it doesn't
> > > "look" efficient? What "looks" inefficient about it?
> >
> > The idea here is to expose inefficiencies by driving the system into
> > saturation, and although staircase is more efficient than the default
> > 2.6 scheduler, it is obviously less efficient than spa.
>
> Where do you stop calling something saturation and start calling it
> absurd? By your reckoning staircase is stable to loads of 300 on one cpu.
> spa being stable to higher loads is hardly comparable given the
> interactivity disparity between it and staircase. A compromise is one that
> does both very well; not one perfectly and the other poorly.
>
> > > You want tunables? The only tunable in staircase is rr_interval which
> > > (in -ck) has an on/off for big/small (sched_compute) since most other
> > > numbers in between (in my experience) are pretty meaningless. I could
> > > export rr_interval directly instead... I've not seen a good argument
> > > for doing that. Got one?
> >
> > Smoothness control, maybe?
>
> Have to think about that one. I'm not seeing a smoothness issue.
>
> > > However there are no other tunables at all (just look at
> > > the code). All tasks of any nice level have available the whole
> > > priority range from 100-139 which appears as PRIO 0-39 on top.
> > > Limiting that (again) changes the semantics.
> >
> > Yes, limiting this could change the semantics for the sake of fairness,
> > it's up to you.
>
> There is no problem with fairness that I am aware of.

Let's see after you retry the tests.

Thanks!

--
Al


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-12  8:17         ` Al Boldi
@ 2006-04-12  9:36           ` Con Kolivas
  2006-04-12 10:39             ` Al Boldi
  0 siblings, 1 reply; 43+ messages in thread
From: Con Kolivas @ 2006-04-12  9:36 UTC (permalink / raw)
  To: Al Boldi; +Cc: ck list, linux-kernel, Mike Galbraith

On Wednesday 12 April 2006 18:17, Al Boldi wrote:
> Con Kolivas wrote:
> > Single heavily cpu bound computationally intensive tasks (think rendering
> > etc).
>
> Why do you need a switch for that?

Because avoiding doing need_resched and reassessing priority at less regular 
intervals means less overhead, and there is always something else running on 
a pc. At low loads the longer timeslices and delayed preemption contribute 
considerably to cache warmth and throughput. Comparing staircase's 
sched_compute mode on kernbench at "optimal loads" (make -j4 x num_cpus) 
showed the best throughput of all the schedulers tested.

> > Sorry I don't understand what you mean. Why do you say it's not fair (got
> > a testcase?). What do you mean by "definitely not smooth". What is
> > smoothness and on what workloads is it not smooth? Also by ia you mean
> > what?
>
> ia=interactivity i.e: responsiveness under high load.
> smooth=not jumpy i.e: run '# gears & morph3d & reflect &' w/o stutter

Installed and tested here just now. They run smoothly concurrently here. Are 
you testing on staircase15?

> fair=non hogging i.e: spreading cpu-load across tasks evenly (top d.1)

Only unblocked processes/threads where one depends on the other don't get 
equal share, which is as broken a testcase as relying on sched_yield. I have 
not seen a testcase demonstrating unfairness on current staircase. top shows 
me fair cpu usage.

> > Again I don't understand. Just how heavy a load is heavy? Your testcases
> > are already in what I would call stratospheric range. I don't personally
> > think a cpu scheduler should be optimised for load infinity. And how are
> > you defining efficient? You say it doesn't "look" efficient? What "looks"
> > inefficient about it?
>
> The idea here is to expose inefficiencies by driving the system into
> saturation, and although staircase is more efficient than the default 2.6
> scheduler, it is obviously less efficient than spa.

Where do you stop calling something saturation and start calling it absurd? By 
your reckoning staircase is stable to loads of 300 on one cpu. spa being 
stable to higher loads is hardly comparable given the interactivity disparity 
between it and staircase. A compromise is one that does both very well; not 
one perfectly and the other poorly.

> > You want tunables? The only tunable in staircase is rr_interval which (in
> > -ck) has an on/off for big/small (sched_compute) since most other numbers
> > in between (in my experience) are pretty meaningless. I could export
> > rr_interval directly instead... I've not seen a good argument for doing
> > that. Got one?
>
> Smoothness control, maybe?

Have to think about that one. I'm not seeing a smoothness issue.

> > However there are no other tunables at all (just look at
> > the code). All tasks of any nice level have available the whole priority
> > range from 100-139 which appears as PRIO 0-39 on top. Limiting that
> > (again) changes the semantics.
>
> Yes, limiting this could change the semantics for the sake of fairness,
> it's up to you.

There is no problem with fairness that I am aware of.

Thanks!

-- 
-ck

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-12  6:22       ` Con Kolivas
@ 2006-04-12  8:17         ` Al Boldi
  2006-04-12  9:36           ` Con Kolivas
  0 siblings, 1 reply; 43+ messages in thread
From: Al Boldi @ 2006-04-12  8:17 UTC (permalink / raw)
  To: Con Kolivas; +Cc: ck list, linux-kernel, Mike Galbraith

Con Kolivas wrote:
> On Wed, 12 Apr 2006 03:41 pm, Al Boldi wrote:
> > Con Kolivas wrote:
> > > Which is fine because sched_compute isn't designed for heavily
> > > multithreaded usage.
> >
> > What's it good for?
>
> Single heavily cpu bound computationally intensive tasks (think rendering
> etc).

Why do you need a switch for that?

> > > The same mechanism that is responsible for
> > > maintaining fairness is also responsible for creating its
> > > interactivity. That's what I mean by "interactive by design", and what
> > > makes it different from extracting interactivity out of other designs
> > > that have some form of estimator to add unfairness to create that
> > > interactivity.
> >
> > Yes, but staircase isn't really fair, and it's definitely not smooth. 
> > You are trying to get ia by aggressively attacking priority which kills
> > smoothness, and is only fair with a short run-queue.
>
> Sorry I don't understand what you mean. Why do you say it's not fair (got
> a testcase?). What do you mean by "definitely not smooth". What is
> smoothness and on what workloads is it not smooth? Also by ia you mean
> what?

ia=interactivity i.e: responsiveness under high load.
smooth=not jumpy i.e: run '# gears & morph3d & reflect &' w/o stutter
fair=non hogging i.e: spreading cpu-load across tasks evenly (top d.1)

> > > I know you're _very_ keen on the idea of some autotuning but I think
> > > this is the wrong thing to autotune. The whole point of staircase is
> > > it's a simple design without any interactivity estimator. It uses pure
> > > cpu accounting to change priority and that is a percentage which is
> > > effectively already tuned to the underlying cpu. Any
> > > benchmarking/aggressiveness "tuning" would undo the (effectively) very
> > > simple design.
> >
> > I like simple designs.  They tend to keep things to the point and aid
> > efficiency.  But staircase doesn't look efficient to me under heavy
> > load, and I would think this may be easily improved.
>
> Again I don't understand. Just how heavy a load is heavy? Your testcases
> are already in what I would call stratospheric range. I don't personally
> think a cpu scheduler should be optimised for load infinity. And how are
> you defining efficient? You say it doesn't "look" efficient? What "looks"
> inefficient about it?

The idea here is to expose inefficiencies by driving the system into 
saturation, and although staircase is more efficient than the default 2.6 
scheduler, it is obviously less efficient than spa.

> > Also, can you export  lowest/best prio as well as timeslice and friends
> > to procfs/sysfs?
>
> You want tunables? The only tunable in staircase is rr_interval which (in
> -ck) has an on/off for big/small (sched_compute) since most other numbers
> in between (in my experience) are pretty meaningless. I could export
> rr_interval directly instead... I've not seen a good argument for doing
> that. Got one? 

Smoothness control, maybe?

> However there are no other tunables at all (just look at
> the code). All tasks of any nice level have available the whole priority
> range from 100-139 which appears as PRIO 0-39 on top. Limiting that
> (again) changes the semantics.

Yes, limiting this could change the semantics for the sake of fairness, it's 
up to you.

> And another round of thanks :) But many more questions.

No problem.

Thanks!

--
Al


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-12  5:41     ` Al Boldi
@ 2006-04-12  6:22       ` Con Kolivas
  2006-04-12  8:17         ` Al Boldi
  0 siblings, 1 reply; 43+ messages in thread
From: Con Kolivas @ 2006-04-12  6:22 UTC (permalink / raw)
  To: Al Boldi; +Cc: ck list, linux-kernel, Mike Galbraith

On Wed, 12 Apr 2006 03:41 pm, Al Boldi wrote:
> Con Kolivas wrote:
> > Which is fine because sched_compute isn't designed for heavily
> > multithreaded usage.
>
> What's it good for?

Single heavily cpu bound computationally intensive tasks (think rendering 
etc).

> > Oh that's good because staircase14.2_test3 is basically staircase15 which
> > is in the current plugsched (ie newer than the staircase you tested in
> > plugsched-2.6.16 above). So it tolerates a load of up to 500 on single
> > cpu? That seems very robust to me.
>
> Yes, better than the default 2.6 scheduler.
>
> > > Your scheduler seems to be tuned for single-user multi-tasking, i.e.
> > > concurrent tasks around 10, where its aggressive nature is sustained by
> > > a short run-queue.  Once you go above 50, this aggressiveness starts to
> > > express itself as very jumpy.
> >
> > Oh no it's nothing like "tuned for single-user multi tasking". It seems a
> > common misconception because interactivity is a prime concern for
> > staircase but the idea is that we should be able to do interactivity
> > without sacrificing fairness.
>
> Agreed.
>
> > The same mechanism that is responsible for
> > maintaining fairness is also responsible for creating its interactivity.
> > That's what I mean by "interactive by design", and what makes it
> > different from extracting interactivity out of other designs that have
> > some form of estimator to add unfairness to create that interactivity.
>
> Yes, but staircase isn't really fair, and it's definitely not smooth.  You
> are trying to get ia by aggressively attacking priority which kills
> smoothness, and is only fair with a short run-queue.

Sorry I don't understand what you mean. Why do you say it's not fair (got a 
testcase?). What do you mean by "definitely not smooth". What is smoothness 
and on what workloads is it not smooth? Also by ia you mean what? 

> > I know you're _very_ keen on the idea of some autotuning but I think this
> > is the wrong thing to autotune. The whole point of staircase is it's a
> > simple design without any interactivity estimator. It uses pure cpu
> > accounting to change priority and that is a percentage which is
> > effectively already tuned to the underlying cpu. Any
> > benchmarking/aggressiveness "tuning" would undo the (effectively) very
> > simple design.
>
> I like simple designs.  They tend to keep things to the point and aid
> efficiency.  But staircase doesn't look efficient to me under heavy load,
> and I would think this may be easily improved.

Again I don't understand. Just how heavy a load is heavy? Your testcases are 
already in what I would call stratospheric range. I don't personally think a 
cpu scheduler should be optimised for load infinity. And how are you defining 
efficient? You say it doesn't "look" efficient? What "looks" inefficient 
about it?

> > Feel free to look at the code. Sleep for time Y, increase priority by
> > Y/RR_INTERVAL. Run for time X, drop priority by X/RR_INTERVAL. If it
> > drops to lowest priority it then jumps back up to best priority again (to
> > prevent it being "batch starved").
>
> Looks simple enough, and should work for short run'q's, but this looks
> unsustainable for long run'q's, due to the unconditional jump from lowest
> to best prio.

Looks? How? You've already shown that what I consider very long runqueues work 
fine.

> Making it conditional and maybe moderating X,Y,RR_INTERVAL 
> could be helpful.

I think over all meaningful loads, and into absurdly high load ranges it 
works. I don't think undoing the incredible simplicity that works over all 
that range should be done to optimise for loads even greater than that.

> Also, can you export  lowest/best prio as well as timeslice and friends to
> procfs/sysfs?

You want tunables? The only tunable in staircase is rr_interval which (in -ck) 
has an on/off for big/small (sched_compute) since most other numbers in 
between (in my experience) are pretty meaningless. I could export rr_interval 
directly instead... I've not seen a good argument for doing that. Got one? 
However there are no other tunables at all (just look at the code). All tasks 
of any nice level have available the whole priority range from 100-139 which 
appears as PRIO 0-39 on top. Limiting that (again) changes the semantics.

> > Thanks very much for testing :)
>
> Thank you!

And another round of thanks :) But many more questions.

--
-ck

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-11 22:56   ` Con Kolivas
@ 2006-04-12  5:41     ` Al Boldi
  2006-04-12  6:22       ` Con Kolivas
  0 siblings, 1 reply; 43+ messages in thread
From: Al Boldi @ 2006-04-12  5:41 UTC (permalink / raw)
  To: Con Kolivas, ck list; +Cc: linux-kernel, Mike Galbraith

Con Kolivas wrote:
> On Wednesday 12 April 2006 03:03, Al Boldi wrote:
> > With plugsched-2.6.16 your staircase sched reaches about 40 then slows
> > down, maxing around 100.  Setting sched_compute=1 causes console
> > lock-ups.
>
> Which is fine because sched_compute isn't designed for heavily
> multithreaded usage.

What's it good for?

> > With staircase14.2-test3 it reaches around 300 then slows down, halting
> > at around 500.
>
> Oh that's good because staircase14.2_test3 is basically staircase15 which
> is in the current plugsched (ie newer than the staircase you tested in
> plugsched-2.6.16 above). So it tolerates a load of up to 500 on single
> cpu? That seems very robust to me.

Yes, better than the default 2.6 scheduler.

> > Your scheduler seems to be tuned for single-user multi-tasking, i.e.
> > concurrent tasks around 10, where its aggressive nature is sustained by
> > a short run-queue.  Once you go above 50, this aggressiveness starts to
> > express itself as very jumpy.
>
> Oh no it's nothing like "tuned for single-user multi tasking". It seems a
> common misconception because interactivity is a prime concern for
> staircase but the idea is that we should be able to do interactivity
> without sacrificing fairness.

Agreed.

> The same mechanism that is responsible for
> maintaining fairness is also responsible for creating its interactivity.
> That's what I mean by "interactive by design", and what makes it different
> from extracting interactivity out of other designs that have some form of
> estimator to add unfairness to create that interactivity.

Yes, but staircase isn't really fair, and it's definitely not smooth.  You 
are trying to get ia by aggressively attacking priority which kills 
smoothness, and is only fair with a short run-queue.

> > This is of course very cpu/mem/ctxt dependent and it would be great, if
> > your scheduler could maybe do some simple on-the-fly benchmarking as it
> > reschedules, thus adjusting this aggressiveness depending on its
> > sustainability.
>
> I know you're _very_ keen on the idea of some autotuning but I think this
> is the wrong thing to autotune. The whole point of staircase is it's a
> simple design without any interactivity estimator. It uses pure cpu
> accounting to change priority and that is a percentage which is
> effectively already tuned to the underlying cpu. Any
> benchmarking/aggressiveness "tuning" would undo the (effectively) very
> simple design.

I like simple designs.  They tend to keep things to the point and aid 
efficiency.  But staircase doesn't look efficient to me under heavy load, 
and I would think this may be easily improved.

> Feel free to look at the code. Sleep for time Y, increase priority by
> Y/RR_INTERVAL. Run for time X, drop priority by X/RR_INTERVAL. If it drops
> to lowest priority it then jumps back up to best priority again (to
> prevent it being "batch starved").

Looks simple enough, and should work for short run'q's, but this looks 
unsustainable for long run'q's, due to the unconditional jump from lowest to 
best prio.  Making it conditional and maybe moderating X,Y,RR_INTERVAL could 
be helpful.

Also, can you export  lowest/best prio as well as timeslice and friends to 
procfs/sysfs?

> Thanks very much for testing :)

Thank you!

--
Al


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-11 17:03 ` Fwd: " Al Boldi
@ 2006-04-11 22:56   ` Con Kolivas
  2006-04-12  5:41     ` Al Boldi
  0 siblings, 1 reply; 43+ messages in thread
From: Con Kolivas @ 2006-04-11 22:56 UTC (permalink / raw)
  To: Al Boldi, ck list; +Cc: linux-kernel, Mike Galbraith

On Wednesday 12 April 2006 03:03, Al Boldi wrote:
> With plugsched-2.6.16 your staircase sched reaches about 40 then slows
> down, maxing around 100.  Setting sched_compute=1 causes console lock-ups.

Which is fine because sched_compute isn't designed for heavily multithreaded 
usage.

> With staircase14.2-test3 it reaches around 300 then slows down, halting at
> around 500.

Oh that's good because staircase14.2_test3 is basically staircase15 which is 
in the current plugsched (ie newer than the staircase you tested in 
plugsched-2.6.16 above). So it tolerates a load of up to 500 on single cpu? 
That seems very robust to me. 

> Your scheduler seems to be tuned for single-user multi-tasking, i.e.
> concurrent tasks around 10, where its aggressive nature is sustained by a
> short run-queue.  Once you go above 50, this aggressiveness starts to
> express itself as very jumpy.

Oh no it's nothing like "tuned for single-user multi tasking". It seems a 
common misconception because interactivity is a prime concern for staircase 
but the idea is that we should be able to do interactivity without 
sacrificing fairness. The same mechanism that is responsible for maintaining 
fairness is also responsible for creating its interactivity. That's what I 
mean by "interactive by design", and what makes it different from extracting 
interactivity out of other designs that have some form of estimator to add 
unfairness to create that interactivity.

> This is of course very cpu/mem/ctxt dependent and it would be great, if
> your scheduler could maybe do some simple on-the-fly benchmarking as it
> reschedules, thus adjusting this aggressiveness depending on its
> sustainability.

I know you're _very_ keen on the idea of some autotuning but I think this is 
the wrong thing to autotune. The whole point of staircase is it's a simple 
design without any interactivity estimator. It uses pure cpu accounting to 
change priority and that is a percentage which is effectively already tuned 
to the underlying cpu. Any benchmarking/aggressiveness "tuning" would undo 
the (effectively) very simple design. 

Feel free to look at the code. Sleep for time Y, increase priority by 
Y/RR_INTERVAL. Run for time X, drop priority by X/RR_INTERVAL. If it drops to 
lowest priority it then jumps back up to best priority again (to prevent it 
being "batch starved").

Thanks very much for testing :)

-- 
-ck

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-10 14:43   ` Al Boldi
@ 2006-04-11 10:57     ` Con Kolivas
  0 siblings, 0 replies; 43+ messages in thread
From: Con Kolivas @ 2006-04-11 10:57 UTC (permalink / raw)
  To: linux-kernel

Hi Al

On Tuesday 11 April 2006 00:43, Al Boldi wrote:
> After that the loadavg starts to wrap.
> And even then it is possible to login.
> And that's not with the default 2.6 scheduler, but rather w/ spa.

Since you seem to use plugsched, I wonder if you could tell me how does 
current staircase perform with a load like that?

-- 
-ck

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-09 18:33 ` Mike Galbraith
@ 2006-04-10 14:43   ` Al Boldi
  2006-04-11 10:57     ` Con Kolivas
  0 siblings, 1 reply; 43+ messages in thread
From: Al Boldi @ 2006-04-10 14:43 UTC (permalink / raw)
  To: linux-kernel; +Cc: Mike Galbraith, bert hubert

bert hubert wrote:
> In general, Linux systems are not maxed out as they will disappoint that
> way (like any system running with id=0).

top - 16:59:23 up 29 min,  0 users,  load average: 993.49, 796.33, 496.21
Tasks: 1039 total, 1000 running,  39 sleeping,   0 stopped,   0 zombie
Cpu(s):  47.6% user,  52.4% system,   0.0% nice,   0.0% idle,   0.0% IO-wait
Mem:    125796k total,   123344k used,     2452k free,       64k buffers
Swap:  1020088k total,     9176k used,  1010912k free,     1752k cached

  PID  PR  NI  VIRT  RES  SHR SWAP S %CPU    TIME+  Command
 3946  28   0  2404 1460  720  944 R  5.8   0:14.78 top
 4219  37   0  1580  488  416 1092 R  3.0   0:00.45 ping
 4214  37   0  1584  480  408 1104 R  2.8   0:00.46 ping
 4196  37   0  1580  480  408 1100 R  2.5   0:00.45 ping
 4175  37   0  1584  488  416 1096 R  2.3   0:00.30 ping
 3950  37   1  1580  492  416 1088 R  2.0   0:08.77 ping
 4136  37   0  1580  488  416 1092 R  2.0   0:00.36 ping
 4158  37   0  1584  484  408 1100 R  2.0   0:00.35 ping
 4177  37   0  1580  480  408 1100 R  2.0   0:00.27 ping
 4180  37   0  1580  484  408 1096 R  2.0   0:00.33 ping
 4194  37   0  1580  480  408 1100 R  2.0   0:00.40 ping
 4199  37   0  1584  484  408 1100 R  2.0   0:00.47 ping
 4189  37   0  1580  488  416 1092 R  1.8   0:00.43 ping
 4153  37   0  1584  480  408 1104 R  1.5   0:00.31 ping
 4170  37   0  1584  484  408 1100 R  1.5   0:00.30 ping
 4191  37   0  1584  492  416 1092 R  1.5   0:00.41 ping
 4209  37   0  1584  484  408 1100 R  1.5   0:00.39 ping
 4215  37   0  1584  484  408 1100 R  1.5   0:00.39 ping
 4221  37   0  1580  492  416 1088 R  1.5   0:00.37 ping
 4146  37   0  1580  488  416 1092 R  1.3   0:00.29 ping
 4156  37   0  1584  484  408 1100 R  1.3   0:00.32 ping
 4166  37   0  1584  488  416 1096 R  1.3   0:00.33 ping
 4183  37   0  1580  480  408 1100 R  1.3   0:00.37 ping
 4216  37   0  1584  480  408 1104 R  1.3   0:00.39 ping
 4229  37   0  1584  484  408 1100 R  1.3   0:00.41 ping
 4233  37   0  1584  488  416 1096 R  1.3   0:00.41 ping
 4137  37   0  1580  484  408 1096 R  1.0   0:00.33 ping
 4141  37   0  1584  484  408 1100 R  1.0   0:00.31 ping
 4150  37   0  1580  484  408 1096 R  1.0   0:00.30 ping
 4161  37   0  1580  480  408 1100 R  1.0   0:00.29 ping
 4172  37   0  1584  484  408 1100 R  1.0   0:00.28 ping
 4178  37   0  1580  480  408 1100 R  1.0   0:00.25 ping
 4182  37   0  1584  484  408 1100 R  1.0   0:00.23 ping

After that the loadavg starts to wrap.
And even then it is possible to login.
And that's not with the default 2.6 scheduler, but rather w/ spa.

Thanks!

--
Al




^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch][rfc] quell interactive feeding frenzy
  2006-04-09 16:44 Al Boldi
@ 2006-04-09 18:33 ` Mike Galbraith
  2006-04-10 14:43   ` Al Boldi
  0 siblings, 1 reply; 43+ messages in thread
From: Mike Galbraith @ 2006-04-09 18:33 UTC (permalink / raw)
  To: Al Boldi; +Cc: linux-kernel

On Sun, 2006-04-09 at 19:44 +0300, Al Boldi wrote:
> bert hubert wrote:
> > On Sun, Apr 09, 2006 at 01:39:38PM +0200, Mike Galbraith wrote:
> > > Ok, unusable may be overstated.  Nonetheless, that bit of code causes
> > > serious problems.  It makes my little PIII/500 test box trying to fill
> > > one 100Mbit local network unusable.  That is not overstated.
> >
> > If you try to make a PIII/500 fill 100mbit of TCP/IP using lots of
> > different processes, that IS a corner load.
> >
> > I'm sure you can fix this (rare) workload but are you very sure you are
> > not killing off performance for other situations?
> 
> This really has nothing to do w/ workload but rather w/ multi-user processing 
> /tasking /threading.  And the mere fact that the 2.6 kernel prefers 
> kernel-threads should imply an overall performance increase (think pdflush).
> 
> The reason why not many have noticed this scheduler problem(s) is because 
> most hackers nowadays work w/ the latest fastest hw available which does not 
> allow them to see these problems (think Windows, where most problems are 
> resolved by buying the latest hw).
> 
> Real Hackers never miss out on making their work run on the smallest common 
> denominator (think i386dx :).

Please don't trim the cc list.  I almost didn't see this, and I really
do want to hear each and every opinion.

	-Mike


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [patch][rfc] quell interactive feeding frenzy
@ 2006-04-09 16:44 Al Boldi
  2006-04-09 18:33 ` Mike Galbraith
  0 siblings, 1 reply; 43+ messages in thread
From: Al Boldi @ 2006-04-09 16:44 UTC (permalink / raw)
  To: linux-kernel

bert hubert wrote:
> On Sun, Apr 09, 2006 at 01:39:38PM +0200, Mike Galbraith wrote:
> > Ok, unusable may be overstated.  Nonetheless, that bit of code causes
> > serious problems.  It makes my little PIII/500 test box trying to fill
> > one 100Mbit local network unusable.  That is not overstated.
>
> If you try to make a PIII/500 fill 100mbit of TCP/IP using lots of
> different processes, that IS a corner load.
>
> I'm sure you can fix this (rare) workload but are you very sure you are
> not killing off performance for other situations?

This really has nothing to do w/ workload but rather w/ multi-user processing 
/tasking /threading.  And the mere fact that the 2.6 kernel prefers 
kernel-threads should imply an overall performance increase (think pdflush).

The reason why not many have noticed this scheduler problem(s) is because 
most hackers nowadays work w/ the latest fastest hw available which does not 
allow them to see these problems (think Windows, where most problems are 
resolved by buying the latest hw).

Real Hackers never miss out on making their work run on the smallest common 
denominator (think i386dx :).

Thanks!

--
Al


^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2006-04-16  8:58 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-04-07  9:38 [patch][rfc] quell interactive feeding frenzy Mike Galbraith
2006-04-07  9:47 ` Andrew Morton
2006-04-07  9:52   ` Ingo Molnar
2006-04-07 10:57     ` Mike Galbraith
2006-04-07 11:00       ` Con Kolivas
2006-04-07 11:09         ` Mike Galbraith
2006-04-07 10:40   ` Mike Galbraith
2006-04-07 12:56 ` Con Kolivas
2006-04-07 13:37   ` Mike Galbraith
2006-04-07 13:56     ` Con Kolivas
2006-04-07 14:14       ` Mike Galbraith
2006-04-07 15:16         ` Mike Galbraith
2006-04-09 11:14         ` bert hubert
2006-04-09 11:39           ` Mike Galbraith
2006-04-09 12:14             ` bert hubert
2006-04-09 18:07               ` Mike Galbraith
2006-04-10  9:12                 ` bert hubert
2006-04-10 10:00                   ` Mike Galbraith
2006-04-10 14:56                     ` Mike Galbraith
2006-04-13  7:41                       ` Mike Galbraith
2006-04-13 10:16                         ` Con Kolivas
2006-04-13 11:05                           ` Mike Galbraith
2006-04-09 18:24               ` Mike Galbraith
2006-04-09 16:44 Al Boldi
2006-04-09 18:33 ` Mike Galbraith
2006-04-10 14:43   ` Al Boldi
2006-04-11 10:57     ` Con Kolivas
     [not found] <200604112100.28725.kernel@kolivas.org>
2006-04-11 17:03 ` Fwd: " Al Boldi
2006-04-11 22:56   ` Con Kolivas
2006-04-12  5:41     ` Al Boldi
2006-04-12  6:22       ` Con Kolivas
2006-04-12  8:17         ` Al Boldi
2006-04-12  9:36           ` Con Kolivas
2006-04-12 10:39             ` Al Boldi
2006-04-12 11:27               ` Con Kolivas
2006-04-12 15:25                 ` Al Boldi
2006-04-13 11:51                   ` Con Kolivas
2006-04-14  3:16                     ` Al Boldi
2006-04-15  7:05                       ` Con Kolivas
2006-04-15 20:45                         ` Al Boldi
2006-04-15 23:22                           ` Con Kolivas
2006-04-16  6:02                   ` Con Kolivas
2006-04-16  8:31                     ` Al Boldi
2006-04-16  8:58                       ` Con Kolivas
