linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm: yield during swap prefetching
@ 2006-03-07 23:13 Con Kolivas
  2006-03-07 23:26 ` Andrew Morton
  0 siblings, 1 reply; 112+ messages in thread
From: Con Kolivas @ 2006-03-07 23:13 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm, Andrew Morton, ck

Swap prefetching doesn't use very much cpu but spends a lot of time waiting on 
disk in uninterruptible sleep. This means it won't get preempted often even at 
a low nice level since it is seen as sleeping most of the time. We want to 
minimise its cpu impact so yield where possible.

Signed-off-by: Con Kolivas <kernel@kolivas.org>
---
 mm/swap_prefetch.c |    1 +
 1 file changed, 1 insertion(+)

Index: linux-2.6.15-ck5/mm/swap_prefetch.c
===================================================================
--- linux-2.6.15-ck5.orig/mm/swap_prefetch.c	2006-03-02 14:00:46.000000000 +1100
+++ linux-2.6.15-ck5/mm/swap_prefetch.c	2006-03-08 08:49:32.000000000 +1100
@@ -421,6 +421,7 @@ static enum trickle_return trickle_swap(
 
 		if (trickle_swap_cache_async(swp_entry, node) == TRICKLE_DELAY)
 			break;
+		yield();
 	}
 
 	if (sp_stat.prefetched_pages) {

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] mm: yield during swap prefetching
  2006-03-07 23:13 [PATCH] mm: yield during swap prefetching Con Kolivas
@ 2006-03-07 23:26 ` Andrew Morton
  2006-03-07 23:32   ` Con Kolivas
  2006-03-08  8:48   ` [ck] Re: [PATCH] mm: yield during swap prefetching Andreas Mohr
  0 siblings, 2 replies; 112+ messages in thread
From: Andrew Morton @ 2006-03-07 23:26 UTC (permalink / raw)
  To: Con Kolivas; +Cc: linux-kernel, linux-mm, ck

Con Kolivas <kernel@kolivas.org> wrote:
>
> Swap prefetching doesn't use very much cpu but spends a lot of time waiting on 
> disk in uninterruptible sleep. This means it won't get preempted often even at 
> a low nice level since it is seen as sleeping most of the time. We want to 
> minimise its cpu impact so yield where possible.
> 
> Signed-off-by: Con Kolivas <kernel@kolivas.org>
> ---
>  mm/swap_prefetch.c |    1 +
>  1 file changed, 1 insertion(+)
> 
> Index: linux-2.6.15-ck5/mm/swap_prefetch.c
> ===================================================================
> --- linux-2.6.15-ck5.orig/mm/swap_prefetch.c	2006-03-02 14:00:46.000000000 +1100
> +++ linux-2.6.15-ck5/mm/swap_prefetch.c	2006-03-08 08:49:32.000000000 +1100
> @@ -421,6 +421,7 @@ static enum trickle_return trickle_swap(
>  
>  		if (trickle_swap_cache_async(swp_entry, node) == TRICKLE_DELAY)
>  			break;
> +		yield();
>  	}
>  
>  	if (sp_stat.prefetched_pages) {

yield() really sucks if there are a lot of runnable tasks.  And the amount
of CPU which that thread uses isn't likely to matter anyway.

I think it'd be better to just not do this.  Perhaps alter the thread's
static priority instead?  Does the scheduler have a knob which can be used
to disable a tasks's dynamic priority boost heuristic?

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] mm: yield during swap prefetching
  2006-03-07 23:26 ` Andrew Morton
@ 2006-03-07 23:32   ` Con Kolivas
  2006-03-08  0:05     ` Andrew Morton
  2006-03-08 13:36     ` [ck] " Con Kolivas
  2006-03-08  8:48   ` [ck] Re: [PATCH] mm: yield during swap prefetching Andreas Mohr
  1 sibling, 2 replies; 112+ messages in thread
From: Con Kolivas @ 2006-03-07 23:32 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-mm, ck

Andrew Morton writes:

> Con Kolivas <kernel@kolivas.org> wrote:
>>
>> Swap prefetching doesn't use very much cpu but spends a lot of time waiting on 
>> disk in uninterruptible sleep. This means it won't get preempted often even at 
>> a low nice level since it is seen as sleeping most of the time. We want to 
>> minimise its cpu impact so yield where possible.
>> 
>> Signed-off-by: Con Kolivas <kernel@kolivas.org>
>> ---
>>  mm/swap_prefetch.c |    1 +
>>  1 file changed, 1 insertion(+)
>> 
>> Index: linux-2.6.15-ck5/mm/swap_prefetch.c
>> ===================================================================
>> --- linux-2.6.15-ck5.orig/mm/swap_prefetch.c	2006-03-02 14:00:46.000000000 +1100
>> +++ linux-2.6.15-ck5/mm/swap_prefetch.c	2006-03-08 08:49:32.000000000 +1100
>> @@ -421,6 +421,7 @@ static enum trickle_return trickle_swap(
>>  
>>  		if (trickle_swap_cache_async(swp_entry, node) == TRICKLE_DELAY)
>>  			break;
>> +		yield();
>>  	}
>>  
>>  	if (sp_stat.prefetched_pages) {
> 
> yield() really sucks if there are a lot of runnable tasks.  And the amount
> of CPU which that thread uses isn't likely to matter anyway.
> 
> I think it'd be better to just not do this.  Perhaps alter the thread's
> static priority instead?  Does the scheduler have a knob which can be used
> to disable a tasks's dynamic priority boost heuristic?

We do have SCHED_BATCH but even that doesn't really have the desired effect. 
I know how much yield sucks and I actually want it to suck as much as yield 
does.

Cheers,
Con


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] mm: yield during swap prefetching
  2006-03-07 23:32   ` Con Kolivas
@ 2006-03-08  0:05     ` Andrew Morton
  2006-03-08  0:51       ` Con Kolivas
  2006-03-08 22:24       ` Pavel Machek
  2006-03-08 13:36     ` [ck] " Con Kolivas
  1 sibling, 2 replies; 112+ messages in thread
From: Andrew Morton @ 2006-03-08  0:05 UTC (permalink / raw)
  To: Con Kolivas; +Cc: linux-kernel, linux-mm, ck

Con Kolivas <kernel@kolivas.org> wrote:
>
> > yield() really sucks if there are a lot of runnable tasks.  And the amount
> > of CPU which that thread uses isn't likely to matter anyway.
> > 
> > I think it'd be better to just not do this.  Perhaps alter the thread's
> > static priority instead?  Does the scheduler have a knob which can be used
> > to disable a tasks's dynamic priority boost heuristic?
> 
> We do have SCHED_BATCH but even that doesn't really have the desired effect. 
> I know how much yield sucks and I actually want it to suck as much as yield 
> does.

Why do you want that?

If prefetch is doing its job then it will save the machine from a pile of
major faults in the near future.  The fact that the machine happens to be
running a number of busy tasks doesn't alter that.  It's _worth_ stealing a
few cycles from those tasks now to avoid lengthy D-state sleeps in the near
future?

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] mm: yield during swap prefetching
  2006-03-08  0:05     ` Andrew Morton
@ 2006-03-08  0:51       ` Con Kolivas
  2006-03-08  1:11         ` Andrew Morton
  2006-03-08 22:24       ` Pavel Machek
  1 sibling, 1 reply; 112+ messages in thread
From: Con Kolivas @ 2006-03-08  0:51 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-mm, ck

On Wed, 8 Mar 2006 11:05 am, Andrew Morton wrote:
> Con Kolivas <kernel@kolivas.org> wrote:
> > > yield() really sucks if there are a lot of runnable tasks.  And the
> > > amount of CPU which that thread uses isn't likely to matter anyway.
> > >
> > > I think it'd be better to just not do this.  Perhaps alter the thread's
> > > static priority instead?  Does the scheduler have a knob which can be
> > > used to disable a tasks's dynamic priority boost heuristic?
> >
> > We do have SCHED_BATCH but even that doesn't really have the desired
> > effect. I know how much yield sucks and I actually want it to suck as
> > much as yield does.
>
> Why do you want that?
>
> If prefetch is doing its job then it will save the machine from a pile of
> major faults in the near future.  The fact that the machine happens to be
> running a number of busy tasks doesn't alter that.  It's _worth_ stealing a
> few cycles from those tasks now to avoid lengthy D-state sleeps in the near
> future?

The test case is the 3d (gaming) app that uses 100% cpu. It never sets delay 
swap prefetch in any way so swap prefetching starts working. Once swap 
prefetching starts reading it is mostly in uninterruptible sleep and always 
wakes up on the active array ready for cpu, never expiring even with its 
miniscule timeslice. The 3d app is always expiring and landing on the expired 
array behind kprefetchd even though kprefetchd is nice 19. The practical 
upshot of all this is that kprefetchd does a lot of prefetching with 3d 
gaming going on, and no amount of priority fiddling stops it doing this. The 
disk access is noticeable during 3d gaming unfortunately. Yielding regularly 
means a heck of a lot less prefetching occurs and is no longer noticeable. 
When idle, yield()ing doesn't seem to adversely affect the effectiveness of 
the prefetching.

Cheers,
Con

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] mm: yield during swap prefetching
  2006-03-08  0:51       ` Con Kolivas
@ 2006-03-08  1:11         ` Andrew Morton
  2006-03-08  1:12           ` Con Kolivas
  0 siblings, 1 reply; 112+ messages in thread
From: Andrew Morton @ 2006-03-08  1:11 UTC (permalink / raw)
  To: Con Kolivas; +Cc: linux-kernel, linux-mm, ck

Con Kolivas <kernel@kolivas.org> wrote:
>
> On Wed, 8 Mar 2006 11:05 am, Andrew Morton wrote:
> > Con Kolivas <kernel@kolivas.org> wrote:
> > > > yield() really sucks if there are a lot of runnable tasks.  And the
> > > > amount of CPU which that thread uses isn't likely to matter anyway.
> > > >
> > > > I think it'd be better to just not do this.  Perhaps alter the thread's
> > > > static priority instead?  Does the scheduler have a knob which can be
> > > > used to disable a tasks's dynamic priority boost heuristic?
> > >
> > > We do have SCHED_BATCH but even that doesn't really have the desired
> > > effect. I know how much yield sucks and I actually want it to suck as
> > > much as yield does.
> >
> > Why do you want that?
> >
> > If prefetch is doing its job then it will save the machine from a pile of
> > major faults in the near future.  The fact that the machine happens to be
> > running a number of busy tasks doesn't alter that.  It's _worth_ stealing a
> > few cycles from those tasks now to avoid lengthy D-state sleeps in the near
> > future?
> 
> The test case is the 3d (gaming) app that uses 100% cpu. It never sets delay 
> swap prefetch in any way so swap prefetching starts working. Once swap 
> prefetching starts reading it is mostly in uninterruptible sleep and always 
> wakes up on the active array ready for cpu, never expiring even with its 
> miniscule timeslice. The 3d app is always expiring and landing on the expired 
> array behind kprefetchd even though kprefetchd is nice 19. The practical 
> upshot of all this is that kprefetchd does a lot of prefetching with 3d 
> gaming going on, and no amount of priority fiddling stops it doing this. The 
> disk access is noticeable during 3d gaming unfortunately. Yielding regularly 
> means a heck of a lot less prefetching occurs and is no longer noticeable. 
> When idle, yield()ing doesn't seem to adversely affect the effectiveness of 
> the prefetching.
> 

but, but.  If prefetching is prefetching stuff which that game will soon
use then it'll be an aggregate improvement.  If prefetch is prefetching
stuff which that game _won't_ use then prefetch is busted.  Using yield()
to artificially cripple kprefetchd is a rather sad workaround isn't it?

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] mm: yield during swap prefetching
  2006-03-08  1:11         ` Andrew Morton
@ 2006-03-08  1:12           ` Con Kolivas
  2006-03-08  1:19             ` Con Kolivas
                               ` (2 more replies)
  0 siblings, 3 replies; 112+ messages in thread
From: Con Kolivas @ 2006-03-08  1:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-mm, ck

On Wed, 8 Mar 2006 12:11 pm, Andrew Morton wrote:
> Con Kolivas <kernel@kolivas.org> wrote:
> > On Wed, 8 Mar 2006 11:05 am, Andrew Morton wrote:
> > > Con Kolivas <kernel@kolivas.org> wrote:
> > > > > yield() really sucks if there are a lot of runnable tasks.  And the
> > > > > amount of CPU which that thread uses isn't likely to matter anyway.
> > > > >
> > > > > I think it'd be better to just not do this.  Perhaps alter the
> > > > > thread's static priority instead?  Does the scheduler have a knob
> > > > > which can be used to disable a tasks's dynamic priority boost
> > > > > heuristic?
> > > >
> > > > We do have SCHED_BATCH but even that doesn't really have the desired
> > > > effect. I know how much yield sucks and I actually want it to suck as
> > > > much as yield does.
> > >
> > > Why do you want that?
> > >
> > > If prefetch is doing its job then it will save the machine from a pile
> > > of major faults in the near future.  The fact that the machine happens
> > > to be running a number of busy tasks doesn't alter that.  It's _worth_
> > > stealing a few cycles from those tasks now to avoid lengthy D-state
> > > sleeps in the near future?
> >
> > The test case is the 3d (gaming) app that uses 100% cpu. It never sets
> > delay swap prefetch in any way so swap prefetching starts working. Once
> > swap prefetching starts reading it is mostly in uninterruptible sleep and
> > always wakes up on the active array ready for cpu, never expiring even
> > with its miniscule timeslice. The 3d app is always expiring and landing
> > on the expired array behind kprefetchd even though kprefetchd is nice 19.
> > The practical upshot of all this is that kprefetchd does a lot of
> > prefetching with 3d gaming going on, and no amount of priority fiddling
> > stops it doing this. The disk access is noticeable during 3d gaming
> > unfortunately. Yielding regularly means a heck of a lot less prefetching
> > occurs and is no longer noticeable. When idle, yield()ing doesn't seem to
> > adversely affect the effectiveness of the prefetching.
>
> but, but.  If prefetching is prefetching stuff which that game will soon
> use then it'll be an aggregate improvement.  If prefetch is prefetching
> stuff which that game _won't_ use then prefetch is busted.  Using yield()
> to artificially cripple kprefetchd is a rather sad workaround isn't it?

It's not the stuff that it prefetches that's the problem; it's the disk 
access.

Con

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] mm: yield during swap prefetching
  2006-03-08  1:12           ` Con Kolivas
@ 2006-03-08  1:19             ` Con Kolivas
  2006-03-08  1:23             ` Andrew Morton
  2006-03-09  8:57             ` Helge Hafting
  2 siblings, 0 replies; 112+ messages in thread
From: Con Kolivas @ 2006-03-08  1:19 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-mm, ck

On Wed, 8 Mar 2006 12:12 pm, Con Kolivas wrote:
> On Wed, 8 Mar 2006 12:11 pm, Andrew Morton wrote:
> > Con Kolivas <kernel@kolivas.org> wrote:
> > > On Wed, 8 Mar 2006 11:05 am, Andrew Morton wrote:
> > > > Con Kolivas <kernel@kolivas.org> wrote:
> > > > > > yield() really sucks if there are a lot of runnable tasks.  And
> > > > > > the amount of CPU which that thread uses isn't likely to matter
> > > > > > anyway.
> > > > > >
> > > > > > I think it'd be better to just not do this.  Perhaps alter the
> > > > > > thread's static priority instead?  Does the scheduler have a knob
> > > > > > which can be used to disable a tasks's dynamic priority boost
> > > > > > heuristic?
> > > > >
> > > > > We do have SCHED_BATCH but even that doesn't really have the
> > > > > desired effect. I know how much yield sucks and I actually want it
> > > > > to suck as much as yield does.
> > > >
> > > > Why do you want that?
> > > >
> > > > If prefetch is doing its job then it will save the machine from a
> > > > pile of major faults in the near future.  The fact that the machine
> > > > happens to be running a number of busy tasks doesn't alter that. 
> > > > It's _worth_ stealing a few cycles from those tasks now to avoid
> > > > lengthy D-state sleeps in the near future?
> > >
> > > The test case is the 3d (gaming) app that uses 100% cpu. It never sets
> > > delay swap prefetch in any way so swap prefetching starts working. Once
> > > swap prefetching starts reading it is mostly in uninterruptible sleep
> > > and always wakes up on the active array ready for cpu, never expiring
> > > even with its miniscule timeslice. The 3d app is always expiring and
> > > landing on the expired array behind kprefetchd even though kprefetchd
> > > is nice 19. The practical upshot of all this is that kprefetchd does a
> > > lot of prefetching with 3d gaming going on, and no amount of priority
> > > fiddling stops it doing this. The disk access is noticeable during 3d
> > > gaming unfortunately. Yielding regularly means a heck of a lot less
> > > prefetching occurs and is no longer noticeable. When idle, yield()ing
> > > doesn't seem to adversely affect the effectiveness of the prefetching.
> >
> > but, but.  If prefetching is prefetching stuff which that game will soon
> > use then it'll be an aggregate improvement.  If prefetch is prefetching
> > stuff which that game _won't_ use then prefetch is busted.  Using yield()
> > to artificially cripple kprefetchd is a rather sad workaround isn't it?
>
> It's not the stuff that it prefetches that's the problem; it's the disk
> access.

I guess what I'm saying is there isn't enough information to delay swap 
prefetch when cpu usage is high which was my intention as well. Yielding has 
the desired effect without adding further accounting checks to swap_prefetch.

Cheers,
Con

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] mm: yield during swap prefetching
  2006-03-08  1:12           ` Con Kolivas
  2006-03-08  1:19             ` Con Kolivas
@ 2006-03-08  1:23             ` Andrew Morton
  2006-03-08  1:28               ` Con Kolivas
  2006-03-09  8:57             ` Helge Hafting
  2 siblings, 1 reply; 112+ messages in thread
From: Andrew Morton @ 2006-03-08  1:23 UTC (permalink / raw)
  To: Con Kolivas; +Cc: linux-kernel, linux-mm, ck

Con Kolivas <kernel@kolivas.org> wrote:
>
> > but, but.  If prefetching is prefetching stuff which that game will soon
> > use then it'll be an aggregate improvement.  If prefetch is prefetching
> > stuff which that game _won't_ use then prefetch is busted.  Using yield()
> > to artificially cripple kprefetchd is a rather sad workaround isn't it?
> 
> It's not the stuff that it prefetches that's the problem; it's the disk 
> access.

But the prefetch code tries to avoid prefetching when the disk is otherwise
busy (or it should - we discussed that a bit a while ago).

Sorry, I'm not trying to be awkward here - I think that nobbling prefetch
when there's a lot of CPU activity is just the wrong thing to do and it'll
harm other workloads.

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] mm: yield during swap prefetching
  2006-03-08  1:23             ` Andrew Morton
@ 2006-03-08  1:28               ` Con Kolivas
  2006-03-08  2:08                 ` Lee Revell
  2006-03-08  7:51                 ` Jan Knutar
  0 siblings, 2 replies; 112+ messages in thread
From: Con Kolivas @ 2006-03-08  1:28 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-mm, ck

On Wed, 8 Mar 2006 12:23 pm, Andrew Morton wrote:
> Con Kolivas <kernel@kolivas.org> wrote:
> > > but, but.  If prefetching is prefetching stuff which that game will
> > > soon use then it'll be an aggregate improvement.  If prefetch is
> > > prefetching stuff which that game _won't_ use then prefetch is busted. 
> > > Using yield() to artificially cripple kprefetchd is a rather sad
> > > workaround isn't it?
> >
> > It's not the stuff that it prefetches that's the problem; it's the disk
> > access.
>
> But the prefetch code tries to avoid prefetching when the disk is otherwise
> busy (or it should - we discussed that a bit a while ago).

Anything that does disk access delays prefetch fine. Things that only do heavy 
cpu do not delay prefetch. Anything reading from disk will be noticeable 
during 3d gaming.

> Sorry, I'm not trying to be awkward here - I think that nobbling prefetch
> when there's a lot of CPU activity is just the wrong thing to do and it'll
> harm other workloads.

I can't distinguish between when cpu activity is important (game) and when it 
is not (compile), and assuming worst case scenario and not doing any swap 
prefetching is my intent. I could add cpu accounting to prefetch_suitable() 
instead, but that gets rather messy and yielding achieves the same endpoint.

Cheers,
Con

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] mm: yield during swap prefetching
  2006-03-08  1:28               ` Con Kolivas
@ 2006-03-08  2:08                 ` Lee Revell
  2006-03-08  2:12                   ` Con Kolivas
  2006-03-08  7:51                 ` Jan Knutar
  1 sibling, 1 reply; 112+ messages in thread
From: Lee Revell @ 2006-03-08  2:08 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Andrew Morton, linux-kernel, linux-mm, ck

On Wed, 2006-03-08 at 12:28 +1100, Con Kolivas wrote:
> I can't distinguish between when cpu activity is important (game) and when it 
> is not (compile), and assuming worst case scenario and not doing any swap 
> prefetching is my intent. I could add cpu accounting to prefetch_suitable() 
> instead, but that gets rather messy and yielding achieves the same endpoint. 

Shouldn't the game be running with RT priority or at least at a low nice
value?

Lee


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] mm: yield during swap prefetching
  2006-03-08  2:08                 ` Lee Revell
@ 2006-03-08  2:12                   ` Con Kolivas
  2006-03-08  2:18                     ` Lee Revell
  0 siblings, 1 reply; 112+ messages in thread
From: Con Kolivas @ 2006-03-08  2:12 UTC (permalink / raw)
  To: Lee Revell; +Cc: Andrew Morton, linux-kernel, linux-mm, ck

On Wed, 8 Mar 2006 01:08 pm, Lee Revell wrote:
> On Wed, 2006-03-08 at 12:28 +1100, Con Kolivas wrote:
> > I can't distinguish between when cpu activity is important (game) and
> > when it is not (compile), and assuming worst case scenario and not doing
> > any swap prefetching is my intent. I could add cpu accounting to
> > prefetch_suitable() instead, but that gets rather messy and yielding
> > achieves the same endpoint.
>
> Shouldn't the game be running with RT priority or at least at a low nice
> value?

No way. Games run nice 0 SCHED_NORMAL.

Con

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] mm: yield during swap prefetching
  2006-03-08  2:12                   ` Con Kolivas
@ 2006-03-08  2:18                     ` Lee Revell
  2006-03-08  2:22                       ` Con Kolivas
  0 siblings, 1 reply; 112+ messages in thread
From: Lee Revell @ 2006-03-08  2:18 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Andrew Morton, linux-kernel, linux-mm, ck

On Wed, 2006-03-08 at 13:12 +1100, Con Kolivas wrote:
> On Wed, 8 Mar 2006 01:08 pm, Lee Revell wrote:
> > On Wed, 2006-03-08 at 12:28 +1100, Con Kolivas wrote:
> > > I can't distinguish between when cpu activity is important (game) and
> > > when it is not (compile), and assuming worst case scenario and not doing
> > > any swap prefetching is my intent. I could add cpu accounting to
> > > prefetch_suitable() instead, but that gets rather messy and yielding
> > > achieves the same endpoint.
> >
> > Shouldn't the game be running with RT priority or at least at a low nice
> > value?
> 
> No way. Games run nice 0 SCHED_NORMAL.

Maybe this is a stupid/OT question (answer off list if you think so) but
why not?  Isn't that the standard way of telling the scheduler that you
have a realtime constraint?  It's how pro audio stuff works which I
would think has similar RT requirements.

How is the scheduler supposed to know to penalize a kernel compile
taking 100% CPU but not a game using 100% CPU?

Lee


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] mm: yield during swap prefetching
  2006-03-08  2:18                     ` Lee Revell
@ 2006-03-08  2:22                       ` Con Kolivas
  2006-03-08  2:27                         ` Lee Revell
  0 siblings, 1 reply; 112+ messages in thread
From: Con Kolivas @ 2006-03-08  2:22 UTC (permalink / raw)
  To: Lee Revell; +Cc: Andrew Morton, linux-kernel, linux-mm, ck

On Wed, 8 Mar 2006 01:18 pm, Lee Revell wrote:
> On Wed, 2006-03-08 at 13:12 +1100, Con Kolivas wrote:
> > On Wed, 8 Mar 2006 01:08 pm, Lee Revell wrote:
> > > On Wed, 2006-03-08 at 12:28 +1100, Con Kolivas wrote:
> > > > I can't distinguish between when cpu activity is important (game) and
> > > > when it is not (compile), and assuming worst case scenario and not
> > > > doing any swap prefetching is my intent. I could add cpu accounting
> > > > to prefetch_suitable() instead, but that gets rather messy and
> > > > yielding achieves the same endpoint.
> > >
> > > Shouldn't the game be running with RT priority or at least at a low
> > > nice value?
> >
> > No way. Games run nice 0 SCHED_NORMAL.
>
> Maybe this is a stupid/OT question (answer off list if you think so) but
> why not?  Isn't that the standard way of telling the scheduler that you
> have a realtime constraint?  It's how pro audio stuff works which I
> would think has similar RT requirements.
>
> How is the scheduler supposed to know to penalize a kernel compile
> taking 100% CPU but not a game using 100% CPU?

Because being a serious desktop operating system that we are (bwahahahaha) 
means the user should not have special privileges to run something as simple 
as a game. Games should not need special scheduling classes. We can always 
use 'nice' for a compile though. Real time audio is a completely different 
world to this. 

Cheers,
Con

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] mm: yield during swap prefetching
  2006-03-08  2:22                       ` Con Kolivas
@ 2006-03-08  2:27                         ` Lee Revell
  2006-03-08  2:30                           ` Con Kolivas
  0 siblings, 1 reply; 112+ messages in thread
From: Lee Revell @ 2006-03-08  2:27 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Andrew Morton, linux-kernel, linux-mm, ck

On Wed, 2006-03-08 at 13:22 +1100, Con Kolivas wrote:
> > How is the scheduler supposed to know to penalize a kernel compile
> > taking 100% CPU but not a game using 100% CPU?
> 
> Because being a serious desktop operating system that we are (bwahahahaha) 
> means the user should not have special privileges to run something as simple 
> as a game. Games should not need special scheduling classes. We can always 
> use 'nice' for a compile though. Real time audio is a completely different 
> world to this.  

Actually recent distros like the upcoming Ubuntu Dapper support the new
RLIMIT_NICE and RLIMIT_RTPRIO so this would Just Work without any
special privileges (well, not root anyway - you'd have to put the user
in the right group and add one line to /etc/security/limits.conf).

I think OSX also uses special scheduling classes for stuff with RT
constraints.

The only barrier I see is that games aren't specifically written to take
advantage of RT scheduling because historically it's only been available
to root.

Lee


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] mm: yield during swap prefetching
  2006-03-08  2:27                         ` Lee Revell
@ 2006-03-08  2:30                           ` Con Kolivas
  2006-03-08  2:52                             ` [ck] " André Goddard Rosa
  0 siblings, 1 reply; 112+ messages in thread
From: Con Kolivas @ 2006-03-08  2:30 UTC (permalink / raw)
  To: Lee Revell; +Cc: Andrew Morton, linux-kernel, linux-mm, ck

On Wed, 8 Mar 2006 01:27 pm, Lee Revell wrote:
> On Wed, 2006-03-08 at 13:22 +1100, Con Kolivas wrote:
> > > How is the scheduler supposed to know to penalize a kernel compile
> > > taking 100% CPU but not a game using 100% CPU?
> >
> > Because being a serious desktop operating system that we are
> > (bwahahahaha) means the user should not have special privileges to run
> > something as simple as a game. Games should not need special scheduling
> > classes. We can always use 'nice' for a compile though. Real time audio
> > is a completely different world to this.
>
> Actually recent distros like the upcoming Ubuntu Dapper support the new
> RLIMIT_NICE and RLIMIT_RTPRIO so this would Just Work without any
> special privileges (well, not root anyway - you'd have to put the user
> in the right group and add one line to /etc/security/limits.conf).
>
> I think OSX also uses special scheduling classes for stuff with RT
> constraints.
>
> The only barrier I see is that games aren't specifically written to take
> advantage of RT scheduling because historically it's only been available
> to root.

Well as I said in my previous reply, games should _not_ need special 
scheduling classes. They are not written in a real time smart way and they do 
not have any realtime constraints or requirements.

Cheers,
Con


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [ck] Re: [PATCH] mm: yield during swap prefetching
  2006-03-08  2:30                           ` Con Kolivas
@ 2006-03-08  2:52                             ` André Goddard Rosa
  2006-03-08  3:03                               ` Lee Revell
  2006-03-08  3:05                               ` Con Kolivas
  0 siblings, 2 replies; 112+ messages in thread
From: André Goddard Rosa @ 2006-03-08  2:52 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Lee Revell, Andrew Morton, linux-mm, linux-kernel, ck

[...]
> > > Because being a serious desktop operating system that we are
> > > (bwahahahaha) means the user should not have special privileges to run
> > > something as simple as a game. Games should not need special scheduling
> > > classes. We can always use 'nice' for a compile though. Real time audio
> > > is a completely different world to this.
[...]
> Well as I said in my previous reply, games should _not_ need special
> scheduling classes. They are not written in a real time smart way and they do
> not have any realtime constraints or requirements.

Sorry Con, but I have to disagree with you on this.

Games are very complex software, involving heavy use of hardware resources
and they also have a lot of time constraints. So, I think they should
use RT priorities
if it is necessary to get the resources needed in time.

Thanks,
--
[]s,

André Goddard

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [ck] Re: [PATCH] mm: yield during swap prefetching
  2006-03-08  2:52                             ` [ck] " André Goddard Rosa
@ 2006-03-08  3:03                               ` Lee Revell
  2006-03-08  3:05                               ` Con Kolivas
  1 sibling, 0 replies; 112+ messages in thread
From: Lee Revell @ 2006-03-08  3:03 UTC (permalink / raw)
  To: André Goddard Rosa
  Cc: Con Kolivas, Andrew Morton, linux-mm, linux-kernel, ck

On Tue, 2006-03-07 at 22:52 -0400, André Goddard Rosa wrote:
> Sorry Con, but I have to disagree with you on this.
> 
> Games are very complex software, involving heavy use of hardware
> resources
> and they also have a lot of time constraints. So, I think they should
> use RT priorities
> if it is necessary to get the resources needed in time.
> 

The main reason I assumed games would want to use the POSIX realtime
features like priority scheduling etc. is that the simulation people all
use them - it seems like a very similar problem.

Lee


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] mm: yield during swap prefetching
  2006-03-08  2:52                             ` [ck] " André Goddard Rosa
  2006-03-08  3:03                               ` Lee Revell
@ 2006-03-08  3:05                               ` Con Kolivas
  2006-03-08 21:07                                 ` Zan Lynx
  1 sibling, 1 reply; 112+ messages in thread
From: Con Kolivas @ 2006-03-08  3:05 UTC (permalink / raw)
  To: André Goddard Rosa
  Cc: Lee Revell, Andrew Morton, linux-mm, linux-kernel, ck

[-- Attachment #1: Type: text/plain, Size: 2473 bytes --]

André Goddard Rosa writes:

> [...]
>> > > Because being a serious desktop operating system that we are
>> > > (bwahahahaha) means the user should not have special privileges to run
>> > > something as simple as a game. Games should not need special scheduling
>> > > classes. We can always use 'nice' for a compile though. Real time audio
>> > > is a completely different world to this.
> [...]
>> Well as I said in my previous reply, games should _not_ need special
>> scheduling classes. They are not written in a real time smart way and they do
>> not have any realtime constraints or requirements.
>
> Sorry Con, but I have to disagree with you on this.
>
> Games are very complex software, involving heavy use of hardware resources
> and they also have a lot of time constraints. So, I think they should
> use RT priorities
> if it is necessary to get the resources needed in time.

Excellent, I've opened the can of worms.

Yes, games are a in incredibly complex beast.

No they shouldn't need real time scheduling to work well if they are coded
properly. However, witness the fact that most of our games are windows
ports, therefore being lower quality than the original. Witness also the
fact that at last with dual core support, lots and lots (but not all) of
windows games on _windows_ are having scheduling trouble and jerky playback,
forcing them to crappily force binding to one cpu. As much as I'd love to
blame windows, it is almost certainly due to the coding of the application
since better games don't exhibit this problem. Now the games in question
can't be trusted to even run on SMP; do you really think they could cope
with good real time code? Good -complex- real time coding is very difficult.
If you take any game out there that currently exists and throw real time
scheduling at it, almost certainly it will hang the machine. No, I don't
believe games need realtime scheduling to work well; they just need to be
written well and the kernel needs to be unintrusive enough to work well with
them. Otherwise gaming would have needed realtime scheduling from day
one on all operating systems. Generic kernel activities should not cause
game stuttering either as users have little control over them. I do expect
users to not run too many userspace programs while trying to play games
though. I do not believe we should make games work well in the presence of
updatedb running for example.

Cheers,
Con


[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] mm: yield during swap prefetching
  2006-03-08  1:28               ` Con Kolivas
  2006-03-08  2:08                 ` Lee Revell
@ 2006-03-08  7:51                 ` Jan Knutar
  2006-03-08  8:39                   ` Con Kolivas
  1 sibling, 1 reply; 112+ messages in thread
From: Jan Knutar @ 2006-03-08  7:51 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Andrew Morton, linux-kernel, linux-mm, ck

On Wednesday 08 March 2006 03:28, Con Kolivas wrote:

> Anything that does disk access delays prefetch fine. Things that only do heavy 
> cpu do not delay prefetch. Anything reading from disk will be noticeable 
> during 3d gaming.

What exactly makes the disk accesses noticeable? Is it because they steal
time from the disk that the game otherwise would need, or do the disk accesses
themselves consume noticeable amounts of CPU time? 
Or, do bits of the game's executable drop from memory to make room for the
new stuff being pulled in from memory, causing the game to halt while it waits
for its pages to come back? On a related note, through advanced use of
handwaving and guessing, this seems to be the thing that kills my destop
experience (*buzzword alert*) most often. Checksumming a large file
seems to be less of an impact than things that seek alot, like updatedb.

I remember playing vegastrike on my linux desktop machine few years ago,
the game leaked so much memory that it filled my 2G swap rather often,
unleashing OOM killer mayhem. I "solved" this by putting swap on floppy at
lower priority than the 2G, and a 128M swap file as "backup" at even lower
priority than the floppy. I didn't notice the swapping to harddrive, but when it
started to swap to floppy, it made the game run a bit slower for a few seconds,
plus the floppy light went on, and I knew I had 128M left to  save my position
and quit.

If I needed floppy to make disk access noticeable on my very low end
machine... What are these new fancy things doing to make HD access
noticeable?

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] mm: yield during swap prefetching
  2006-03-08  7:51                 ` Jan Knutar
@ 2006-03-08  8:39                   ` Con Kolivas
  0 siblings, 0 replies; 112+ messages in thread
From: Con Kolivas @ 2006-03-08  8:39 UTC (permalink / raw)
  To: Jan Knutar; +Cc: Andrew Morton, linux-kernel, linux-mm, ck

On Wednesday 08 March 2006 18:51, Jan Knutar wrote:
> On Wednesday 08 March 2006 03:28, Con Kolivas wrote:
> > Anything that does disk access delays prefetch fine. Things that only do
> > heavy cpu do not delay prefetch. Anything reading from disk will be
> > noticeable during 3d gaming.
>
> What exactly makes the disk accesses noticeable? Is it because they steal
> time from the disk that the game otherwise would need, or do the disk
> accesses themselves consume noticeable amounts of CPU time?
> Or, do bits of the game's executable drop from memory to make room for the
> new stuff being pulled in from memory, causing the game to halt while it
> waits for its pages to come back? On a related note, through advanced use
> of handwaving and guessing, this seems to be the thing that kills my destop
> experience (*buzzword alert*) most often. Checksumming a large file seems
> to be less of an impact than things that seek alot, like updatedb.
>
> I remember playing vegastrike on my linux desktop machine few years ago,
> the game leaked so much memory that it filled my 2G swap rather often,
> unleashing OOM killer mayhem. I "solved" this by putting swap on floppy at
> lower priority than the 2G, and a 128M swap file as "backup" at even lower
> priority than the floppy. I didn't notice the swapping to harddrive, but
> when it started to swap to floppy, it made the game run a bit slower for a
> few seconds, plus the floppy light went on, and I knew I had 128M left to 
> save my position and quit.
>
> If I needed floppy to make disk access noticeable on my very low end
> machine... What are these new fancy things doing to make HD access
> noticeable?

It's the cumulative effect of the cpu used by the in kernel code paths and the 
kprefetchd kernel thread. Even running ultra low priority, if they read a lot 
from the hard drive it costs us cpu time (seen as I/O wait in top for 
example). Swap prefetch _never_ displaces anything from ram; it only ever 
reads things from swap if there is generous free ram available. Not only that 
but if it reads something from swap it is put at the end of the "least 
recently used" list meaning that if _anything_ needs ram, these are the first 
things displaced again.

Cheers,
Con

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [ck] Re: [PATCH] mm: yield during swap prefetching
  2006-03-07 23:26 ` Andrew Morton
  2006-03-07 23:32   ` Con Kolivas
@ 2006-03-08  8:48   ` Andreas Mohr
  2006-03-08  8:52     ` Con Kolivas
  1 sibling, 1 reply; 112+ messages in thread
From: Andreas Mohr @ 2006-03-08  8:48 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Con Kolivas, ck, linux-mm, linux-kernel

Hi,

On Tue, Mar 07, 2006 at 03:26:36PM -0800, Andrew Morton wrote:
> Con Kolivas <kernel@kolivas.org> wrote:
> >
> > Swap prefetching doesn't use very much cpu but spends a lot of time waiting on 
> > disk in uninterruptible sleep. This means it won't get preempted often even at 
> > a low nice level since it is seen as sleeping most of the time. We want to 
> > minimise its cpu impact so yield where possible.

> yield() really sucks if there are a lot of runnable tasks.  And the amount
> of CPU which that thread uses isn't likely to matter anyway.
> 
> I think it'd be better to just not do this.  Perhaps alter the thread's
> static priority instead?  Does the scheduler have a knob which can be used
> to disable a tasks's dynamic priority boost heuristic?

This problem occurs due to giving a priority boost to processes that are
sleeping a lot (e.g. in this case, I/O, from disk), right?
Forgive me my possibly less insightful comments, but maybe instead of adding
crude specific hacks (namely, yield()) to each specific problematic process as
it comes along (it just happens to be the swap prefetch thread this time)
there is a *general way* to give processes with lots of disk I/O sleeping
much smaller amounts of boost in order to get them preempted more often
in favour of an actually much more critical process (game)?
>From the discussion here it seems this problem is caused by a *general*
miscalculation of processes sleeping on disk I/O a lot.

Thus IMHO this problem should be solved in a general way if at all possible.

Andreas Mohr

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [ck] Re: [PATCH] mm: yield during swap prefetching
  2006-03-08  8:48   ` [ck] Re: [PATCH] mm: yield during swap prefetching Andreas Mohr
@ 2006-03-08  8:52     ` Con Kolivas
  0 siblings, 0 replies; 112+ messages in thread
From: Con Kolivas @ 2006-03-08  8:52 UTC (permalink / raw)
  To: Andreas Mohr; +Cc: Andrew Morton, ck, linux-mm, linux-kernel

On Wednesday 08 March 2006 19:48, Andreas Mohr wrote:
> Hi,
>
> On Tue, Mar 07, 2006 at 03:26:36PM -0800, Andrew Morton wrote:
> > Con Kolivas <kernel@kolivas.org> wrote:
> > > Swap prefetching doesn't use very much cpu but spends a lot of time
> > > waiting on disk in uninterruptible sleep. This means it won't get
> > > preempted often even at a low nice level since it is seen as sleeping
> > > most of the time. We want to minimise its cpu impact so yield where
> > > possible.
> >
> > yield() really sucks if there are a lot of runnable tasks.  And the
> > amount of CPU which that thread uses isn't likely to matter anyway.
> >
> > I think it'd be better to just not do this.  Perhaps alter the thread's
> > static priority instead?  Does the scheduler have a knob which can be
> > used to disable a tasks's dynamic priority boost heuristic?
>
> This problem occurs due to giving a priority boost to processes that are
> sleeping a lot (e.g. in this case, I/O, from disk), right?
> Forgive me my possibly less insightful comments, but maybe instead of
> adding crude specific hacks (namely, yield()) to each specific problematic
> process as it comes along (it just happens to be the swap prefetch thread
> this time) there is a *general way* to give processes with lots of disk I/O
> sleeping much smaller amounts of boost in order to get them preempted more
> often in favour of an actually much more critical process (game)?
>
> >From the discussion here it seems this problem is caused by a *general*
>
> miscalculation of processes sleeping on disk I/O a lot.
>
> Thus IMHO this problem should be solved in a general way if at all
> possible.

No. We already do special things for tasks waiting on uninterruptible sleep. 
This is more about what is exaggerated on a dual array expiring scheduler 
design that mainline has.

Cheers,
Con

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [ck] Re: [PATCH] mm: yield during swap prefetching
  2006-03-07 23:32   ` Con Kolivas
  2006-03-08  0:05     ` Andrew Morton
@ 2006-03-08 13:36     ` Con Kolivas
  2006-03-17  9:06       ` Ingo Molnar
  1 sibling, 1 reply; 112+ messages in thread
From: Con Kolivas @ 2006-03-08 13:36 UTC (permalink / raw)
  To: ck; +Cc: Andrew Morton, linux-mm, linux-kernel, Ingo Molnar

cc'ing Ingo...

On Wednesday 08 March 2006 10:32, Con Kolivas wrote:
> Andrew Morton writes:
> > Con Kolivas <kernel@kolivas.org> wrote:
> >> Swap prefetching doesn't use very much cpu but spends a lot of time
> >> waiting on disk in uninterruptible sleep. This means it won't get
> >> preempted often even at a low nice level since it is seen as sleeping
> >> most of the time. We want to minimise its cpu impact so yield where
> >> possible.
> >>
> >> Signed-off-by: Con Kolivas <kernel@kolivas.org>
> >> ---
> >>  mm/swap_prefetch.c |    1 +
> >>  1 file changed, 1 insertion(+)
> >>
> >> Index: linux-2.6.15-ck5/mm/swap_prefetch.c
> >> ===================================================================
> >> --- linux-2.6.15-ck5.orig/mm/swap_prefetch.c	2006-03-02
> >> 14:00:46.000000000 +1100 +++
> >> linux-2.6.15-ck5/mm/swap_prefetch.c	2006-03-08 08:49:32.000000000 +1100
> >> @@ -421,6 +421,7 @@ static enum trickle_return trickle_swap(
> >>
> >>  		if (trickle_swap_cache_async(swp_entry, node) == TRICKLE_DELAY)
> >>  			break;
> >> +		yield();
> >>  	}
> >>
> >>  	if (sp_stat.prefetched_pages) {
> >
> > yield() really sucks if there are a lot of runnable tasks.  And the
> > amount of CPU which that thread uses isn't likely to matter anyway.
> >
> > I think it'd be better to just not do this.  Perhaps alter the thread's
> > static priority instead?  Does the scheduler have a knob which can be
> > used to disable a tasks's dynamic priority boost heuristic?
>
> We do have SCHED_BATCH but even that doesn't really have the desired
> effect. I know how much yield sucks and I actually want it to suck as much
> as yield does.

Thinking some more on this I wonder if SCHED_BATCH isn't a strong enough 
scheduling hint if it's not suitable for such an application. Ingo do you 
think we could make SCHED_BATCH tasks always wake up on the expired array?

Cheers,
Con

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] mm: yield during swap prefetching
  2006-03-08  3:05                               ` Con Kolivas
@ 2006-03-08 21:07                                 ` Zan Lynx
  2006-03-08 23:00                                   ` Con Kolivas
  0 siblings, 1 reply; 112+ messages in thread
From: Zan Lynx @ 2006-03-08 21:07 UTC (permalink / raw)
  To: Con Kolivas
  Cc: André Goddard Rosa, Lee Revell, Andrew Morton, linux-mm,
	linux-kernel, ck

[-- Attachment #1: Type: text/plain, Size: 2429 bytes --]

On Wed, 2006-03-08 at 14:05 +1100, Con Kolivas wrote:
> André Goddard Rosa writes:
> 
> > [...]
> >> > > Because being a serious desktop operating system that we are
> >> > > (bwahahahaha) means the user should not have special privileges to run
> >> > > something as simple as a game. Games should not need special scheduling
> >> > > classes. We can always use 'nice' for a compile though. Real time audio
> >> > > is a completely different world to this.
> > [...]
> >> Well as I said in my previous reply, games should _not_ need special
> >> scheduling classes. They are not written in a real time smart way and they do
> >> not have any realtime constraints or requirements.
> > 
> > Sorry Con, but I have to disagree with you on this.
> > 
> > Games are very complex software, involving heavy use of hardware resources
> > and they also have a lot of time constraints. So, I think they should
> > use RT priorities
> > if it is necessary to get the resources needed in time.
> 
> Excellent, I've opened the can of worms.
> 
> Yes, games are a in incredibly complex beast.
> 
> No they shouldn't need real time scheduling to work well if they are coded 
> properly. However, witness the fact that most of our games are windows 
> ports, therefore being lower quality than the original. Witness also the 
> fact that at last with dual core support, lots and lots (but not all) of 
> windows games on _windows_ are having scheduling trouble and jerky playback, 
> forcing them to crappily force binding to one cpu.
[snip]

Games where you notice frame-rate chop because the *disk system* is
using too much CPU are perfect examples of applications that should be
using real-time.

Multiple CPU cores and multithreading in games is another perfect
example of programming that *needs* predictable real-time thread
priorities.  There is no other way to guarantee that physics processing
takes priority over graphics updates or AI, once each task becomes
separated from a monolithic main loop and spread over several CPU cores.

Because games often *are* badly written, a user-friendly Linux gaming
system does need a high-priority real-time task watching for a special
keystroke, like C-A-Del for example, so that it can kill the other
real-time tasks and return to the UI shell.

Games and real-time go together like they were made for each other.
-- 
Zan Lynx <zlynx@acm.org>

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 191 bytes --]

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] mm: yield during swap prefetching
  2006-03-08  0:05     ` Andrew Morton
  2006-03-08  0:51       ` Con Kolivas
@ 2006-03-08 22:24       ` Pavel Machek
  2006-03-09  2:22         ` Nick Piggin
  1 sibling, 1 reply; 112+ messages in thread
From: Pavel Machek @ 2006-03-08 22:24 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Con Kolivas, linux-kernel, linux-mm, ck

On Út 07-03-06 16:05:15, Andrew Morton wrote:
> Con Kolivas <kernel@kolivas.org> wrote:
> >
> > > yield() really sucks if there are a lot of runnable tasks.  And the amount
> > > of CPU which that thread uses isn't likely to matter anyway.
> > > 
> > > I think it'd be better to just not do this.  Perhaps alter the thread's
> > > static priority instead?  Does the scheduler have a knob which can be used
> > > to disable a tasks's dynamic priority boost heuristic?
> > 
> > We do have SCHED_BATCH but even that doesn't really have the desired effect. 
> > I know how much yield sucks and I actually want it to suck as much as yield 
> > does.
> 
> Why do you want that?
> 
> If prefetch is doing its job then it will save the machine from a pile of
> major faults in the near future.  The fact that the machine happens

Or maybe not.... it is prefetch, it may prefetch wrongly, and you
definitely want it doing nothing when system is loaded.... It only
makes sense to prefetch when system is idle.
								Pavel
-- 
Web maintainer for suspend.sf.net (www.sf.net/projects/suspend) wanted...

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] mm: yield during swap prefetching
  2006-03-08 21:07                                 ` Zan Lynx
@ 2006-03-08 23:00                                   ` Con Kolivas
  2006-03-08 23:48                                     ` Zan Lynx
  0 siblings, 1 reply; 112+ messages in thread
From: Con Kolivas @ 2006-03-08 23:00 UTC (permalink / raw)
  To: Zan Lynx
  Cc: André Goddard Rosa, Lee Revell, Andrew Morton, linux-mm,
	linux-kernel, ck

[-- Attachment #1: Type: text/plain, Size: 2593 bytes --]

Zan Lynx writes:

> On Wed, 2006-03-08 at 14:05 +1100, Con Kolivas wrote:
>> André Goddard Rosa writes:
>>
>> > [...]
>> >> > > Because being a serious desktop operating system that we are
>> >> > > (bwahahahaha) means the user should not have special privileges to run
>> >> > > something as simple as a game. Games should not need special scheduling
>> >> > > classes. We can always use 'nice' for a compile though. Real time audio
>> >> > > is a completely different world to this.
>> > [...]
>> >> Well as I said in my previous reply, games should _not_ need special
>> >> scheduling classes. They are not written in a real time smart way and they do
>> >> not have any realtime constraints or requirements.
>> >
>> > Sorry Con, but I have to disagree with you on this.
>> >
>> > Games are very complex software, involving heavy use of hardware resources
>> > and they also have a lot of time constraints. So, I think they should
>> > use RT priorities
>> > if it is necessary to get the resources needed in time.
>>
>> Excellent, I've opened the can of worms.
>>
>> Yes, games are a in incredibly complex beast.
>>
>> No they shouldn't need real time scheduling to work well if they are coded
>> properly. However, witness the fact that most of our games are windows
>> ports, therefore being lower quality than the original. Witness also the
>> fact that at last with dual core support, lots and lots (but not all) of
>> windows games on _windows_ are having scheduling trouble and jerky playback,
>> forcing them to crappily force binding to one cpu.
> [snip]
>
> Games where you notice frame-rate chop because the *disk system* is
> using too much CPU are perfect examples of applications that should be
> using real-time.
>
> Multiple CPU cores and multithreading in games is another perfect
> example of programming that *needs* predictable real-time thread
> priorities.  There is no other way to guarantee that physics processing
> takes priority over graphics updates or AI, once each task becomes
> separated from a monolithic main loop and spread over several CPU cores.
>
> Because games often *are* badly written, a user-friendly Linux gaming
> system does need a high-priority real-time task watching for a special
> keystroke, like C-A-Del for example, so that it can kill the other
> real-time tasks and return to the UI shell.
>
> Games and real-time go together like they were made for each other.

I guess every single well working windows game since the dawn of time is
some sort of anomaly then.

Cheers,
Con


[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] mm: yield during swap prefetching
  2006-03-08 23:00                                   ` Con Kolivas
@ 2006-03-08 23:48                                     ` Zan Lynx
  2006-03-09  0:07                                       ` Con Kolivas
  0 siblings, 1 reply; 112+ messages in thread
From: Zan Lynx @ 2006-03-08 23:48 UTC (permalink / raw)
  To: Con Kolivas
  Cc: André Goddard Rosa, Lee Revell, Andrew Morton, linux-mm,
	linux-kernel, ck

[-- Attachment #1: Type: text/plain, Size: 1083 bytes --]

On Thu, 2006-03-09 at 10:00 +1100, Con Kolivas wrote:
> Zan Lynx writes:
[snip]
> > Games and real-time go together like they were made for each other.
> 
> I guess every single well working windows game since the dawn of time is 
> some sort of anomaly then.

Yes, those Windows games are anomalies that rely on the OS scheduling
them AS IF they were real-time, but without actually claiming that
priority.

Because these games just assume they own the whole system and aren't
explicitly telling the OS about their real-time requirements, the OS has
to guess instead and can get it wrong, especially when hardware
capabilities advance in ways that force changes to the task scheduler
(multi-core, hyper-threading).  And you said it yourself, many old games
don't work well on dual-core systems.

I think your effort to improve the guessing is a good idea, and
thanks.  

Just don't dismiss the idea that games do have real-time requirements
and if they did things correctly, games would explicitly specify those
requirements.
-- 
Zan Lynx <zlynx@acm.org>

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 191 bytes --]

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] mm: yield during swap prefetching
  2006-03-08 23:48                                     ` Zan Lynx
@ 2006-03-09  0:07                                       ` Con Kolivas
  2006-03-09  3:13                                         ` Zan Lynx
  0 siblings, 1 reply; 112+ messages in thread
From: Con Kolivas @ 2006-03-09  0:07 UTC (permalink / raw)
  To: Zan Lynx
  Cc: André Goddard Rosa, Lee Revell, Andrew Morton, linux-mm,
	linux-kernel, ck

[-- Attachment #1: Type: text/plain, Size: 1403 bytes --]

Zan Lynx writes:

> On Thu, 2006-03-09 at 10:00 +1100, Con Kolivas wrote:
>> Zan Lynx writes:
> [snip]
>> > Games and real-time go together like they were made for each other.
>> 
>> I guess every single well working windows game since the dawn of time is 
>> some sort of anomaly then.
> 
> Yes, those Windows games are anomalies that rely on the OS scheduling
> them AS IF they were real-time, but without actually claiming that
> priority.
> 
> Because these games just assume they own the whole system and aren't
> explicitly telling the OS about their real-time requirements, the OS has
> to guess instead and can get it wrong, especially when hardware
> capabilities advance in ways that force changes to the task scheduler
> (multi-core, hyper-threading).  And you said it yourself, many old games
> don't work well on dual-core systems.
> 
> I think your effort to improve the guessing is a good idea, and
> thanks.  
> 
> Just don't dismiss the idea that games do have real-time requirements
> and if they did things correctly, games would explicitly specify those
> requirements.

Games worked on windows for a decade on single core without real time 
scheduling because that's what they were written for. 

Now that games are written for windows with dual core they work well - again 
without real time scheduling. 

Why should a port of these games to linux require real time?

Cheers,
Con


[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] mm: yield during swap prefetching
  2006-03-08 22:24       ` Pavel Machek
@ 2006-03-09  2:22         ` Nick Piggin
  2006-03-09  2:30           ` Con Kolivas
  0 siblings, 1 reply; 112+ messages in thread
From: Nick Piggin @ 2006-03-09  2:22 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Andrew Morton, Con Kolivas, linux-kernel, linux-mm, ck

Pavel Machek wrote:

>On Út 07-03-06 16:05:15, Andrew Morton wrote:
>
>>Why do you want that?
>>
>>If prefetch is doing its job then it will save the machine from a pile of
>>major faults in the near future.  The fact that the machine happens
>>
>
>Or maybe not.... it is prefetch, it may prefetch wrongly, and you
>definitely want it doing nothing when system is loaded.... It only
>makes sense to prefetch when system is idle.
>

Right. Prefetching is obviously going to have a very low work/benefit,
assuming your page reclaim is working properly, because a) it doesn't
deal with file pages, and b) it is doing work to reclaim pages that
have already been deemed to be the least important.

What it is good for is working around our interesting VM that apparently
allows updatedb to swap everything out (although I haven't seen this
problem myself), and artificial memory hogs. By moving work to times of
low cost. No problem with the theory behind it.

So as much as a major fault costs in terms of performance, the tiny
chance that prefetching will avoid it means even the CPU usage is
questionable. Using sched_yield() seems like a hack though.

--

Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] mm: yield during swap prefetching
  2006-03-09  2:22         ` Nick Piggin
@ 2006-03-09  2:30           ` Con Kolivas
  2006-03-09  2:57             ` Nick Piggin
  0 siblings, 1 reply; 112+ messages in thread
From: Con Kolivas @ 2006-03-09  2:30 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Pavel Machek, Andrew Morton, linux-kernel, linux-mm, ck

On Thu, 9 Mar 2006 01:22 pm, Nick Piggin wrote:
> Pavel Machek wrote:
> >On Út 07-03-06 16:05:15, Andrew Morton wrote:
> >>Why do you want that?
> >>
> >>If prefetch is doing its job then it will save the machine from a pile of
> >>major faults in the near future.  The fact that the machine happens
> >
> >Or maybe not.... it is prefetch, it may prefetch wrongly, and you
> >definitely want it doing nothing when system is loaded.... It only
> >makes sense to prefetch when system is idle.
>
> Right. Prefetching is obviously going to have a very low work/benefit,
> assuming your page reclaim is working properly, because a) it doesn't
> deal with file pages, and b) it is doing work to reclaim pages that
> have already been deemed to be the least important.
>
> What it is good for is working around our interesting VM that apparently
> allows updatedb to swap everything out (although I haven't seen this
> problem myself), and artificial memory hogs. By moving work to times of
> low cost. No problem with the theory behind it.
>
> So as much as a major fault costs in terms of performance, the tiny
> chance that prefetching will avoid it means even the CPU usage is
> questionable. Using sched_yield() seems like a hack though.

Yeah it's a hack alright. Funny how at last I find a place where yield does 
exactly what I want and because we hate yield so much noone wants me to use 
it all.

Cheers,
Con

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] mm: yield during swap prefetching
  2006-03-09  2:30           ` Con Kolivas
@ 2006-03-09  2:57             ` Nick Piggin
  2006-03-09  9:11               ` Con Kolivas
  0 siblings, 1 reply; 112+ messages in thread
From: Nick Piggin @ 2006-03-09  2:57 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Pavel Machek, Andrew Morton, linux-kernel, linux-mm, ck

Con Kolivas wrote:

>On Thu, 9 Mar 2006 01:22 pm, Nick Piggin wrote:
>
>>
>>So as much as a major fault costs in terms of performance, the tiny
>>chance that prefetching will avoid it means even the CPU usage is
>>questionable. Using sched_yield() seems like a hack though.
>>
>
>Yeah it's a hack alright. Funny how at last I find a place where yield does 
>exactly what I want and because we hate yield so much noone wants me to use 
>it all.
>
>

AFAIKS it is a hack for the same reason using it for locking is a hack,
it's just that prefetch doesn't care if it doesn't get the CPU back for
a while.

Given a yield implementation which does something completely different
for SCHED_OTHER tasks, you code may find it doesn't work so well anymore.
This is no different to the java folk using it with decent results for
locking. Just because it happened to work OK for them at the time didn't
mean it was the right thing to do.

I have always maintained that a SCHED_OTHER task calling sched_yield
is basically a bug because it is utterly undefined behaviour.

But being an in-kernel user that "knows" the implementation sort of does
the right thin, maybe you justify it that way.

--

Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] mm: yield during swap prefetching
  2006-03-09  0:07                                       ` Con Kolivas
@ 2006-03-09  3:13                                         ` Zan Lynx
  2006-03-09  4:08                                           ` Con Kolivas
  0 siblings, 1 reply; 112+ messages in thread
From: Zan Lynx @ 2006-03-09  3:13 UTC (permalink / raw)
  To: Con Kolivas
  Cc: André Goddard Rosa, Lee Revell, Andrew Morton, linux-mm,
	linux-kernel, ck

[-- Attachment #1: Type: text/plain, Size: 1298 bytes --]

On Thu, 2006-03-09 at 11:07 +1100, Con Kolivas wrote:
> Games worked on windows for a decade on single core without real time 
> scheduling because that's what they were written for. 
> 
> Now that games are written for windows with dual core they work well -
> again 
> without real time scheduling. 
> 
> Why should a port of these games to linux require real time?

That isn't what I said.  I said nothing about *requiring* anything, only
about how to do it better.

Here is what Con said that I was disagreeing with.  All the rest was to
justify my disagreement.  

Con said, "... games should _not_ need special scheduling classes. They
are not written in a real time smart way and they do not have any
realtime constraints or requirements."

And he said later, "No they shouldn't need real time scheduling to work
well if they are coded properly."

Here is a list of simple statements of what I am saying:
Games do have real-time requirements.
The OS guessing about real-time priorities will sometimes get it wrong.
Guessing task priority is worse than being told and knowing for sure.
Games should, in an ideal world, be using real-time OS scheduling.
Games would work better using real-time OS scheduling.

That is all from me.
-- 
Zan Lynx <zlynx@acm.org>

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 191 bytes --]

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] mm: yield during swap prefetching
  2006-03-09  3:13                                         ` Zan Lynx
@ 2006-03-09  4:08                                           ` Con Kolivas
  2006-03-09  4:54                                             ` Lee Revell
  0 siblings, 1 reply; 112+ messages in thread
From: Con Kolivas @ 2006-03-09  4:08 UTC (permalink / raw)
  To: Zan Lynx
  Cc: André Goddard Rosa, Lee Revell, Andrew Morton, linux-mm,
	linux-kernel, ck

[-- Attachment #1: Type: text/plain, Size: 1667 bytes --]

Zan Lynx writes:

> On Thu, 2006-03-09 at 11:07 +1100, Con Kolivas wrote:
>> Games worked on windows for a decade on single core without real time 
>> scheduling because that's what they were written for. 
>> 
>> Now that games are written for windows with dual core they work well -
>> again 
>> without real time scheduling. 
>> 
>> Why should a port of these games to linux require real time?
> 
> That isn't what I said.  I said nothing about *requiring* anything, only
> about how to do it better.
> 
> Here is what Con said that I was disagreeing with.  All the rest was to
> justify my disagreement.  
> 
> Con said, "... games should _not_ need special scheduling classes. They
> are not written in a real time smart way and they do not have any
> realtime constraints or requirements."
> 
> And he said later, "No they shouldn't need real time scheduling to work
> well if they are coded properly."
> 
> Here is a list of simple statements of what I am saying:
> Games do have real-time requirements.
> The OS guessing about real-time priorities will sometimes get it wrong.
> Guessing task priority is worse than being told and knowing for sure.
> Games should, in an ideal world, be using real-time OS scheduling.
> Games would work better using real-time OS scheduling.

At the risk of  being repetitive to the point of tiresome, my point is that 
there are no real time requirements in games. You're assuming that 
everything will be better if we assume that there are rt requirements and 
that we're simulating pseudo real time conditions currently. That's just not 
the case and never has been. That's why it has worked fine for so long.

Cheers,
Con


[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] mm: yield during swap prefetching
  2006-03-09  4:08                                           ` Con Kolivas
@ 2006-03-09  4:54                                             ` Lee Revell
  0 siblings, 0 replies; 112+ messages in thread
From: Lee Revell @ 2006-03-09  4:54 UTC (permalink / raw)
  To: Con Kolivas
  Cc: Zan Lynx, André Goddard Rosa, Andrew Morton, linux-mm,
	linux-kernel, ck

On Thu, 2006-03-09 at 15:08 +1100, Con Kolivas wrote:
> > Games do have real-time requirements.
> > The OS guessing about real-time priorities will sometimes get it wrong.
> > Guessing task priority is worse than being told and knowing for sure.
> > Games should, in an ideal world, be using real-time OS scheduling.
> > Games would work better using real-time OS scheduling.
> 
> At the risk of  being repetitive to the point of tiresome, my point is that 
> there are no real time requirements in games. You're assuming that 
> everything will be better if we assume that there are rt requirements and 
> that we're simulating pseudo real time conditions currently. That's just not 
> the case and never has been. That's why it has worked fine for so long. 

I think you are talking past each other, and are both right - Con is
saying games don't need realtime scheduling (SCHED_FIFO, low nice value,
whatever) to function correctly (true), while Zan is saying that games
have RT constraints in that they must react as fast as possible to user
input (also true).

Anyway, this is getting OT, I wish I had not raised this issue in this
thread.

Lee


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] mm: yield during swap prefetching
  2006-03-08  1:12           ` Con Kolivas
  2006-03-08  1:19             ` Con Kolivas
  2006-03-08  1:23             ` Andrew Morton
@ 2006-03-09  8:57             ` Helge Hafting
  2006-03-09  9:08               ` Con Kolivas
  2 siblings, 1 reply; 112+ messages in thread
From: Helge Hafting @ 2006-03-09  8:57 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Andrew Morton, linux-kernel, linux-mm, ck

Con Kolivas wrote:

>On Wed, 8 Mar 2006 12:11 pm, Andrew Morton wrote:
>  
>
>>but, but.  If prefetching is prefetching stuff which that game will soon
>>use then it'll be an aggregate improvement.  If prefetch is prefetching
>>stuff which that game _won't_ use then prefetch is busted.  Using yield()
>>to artificially cripple kprefetchd is a rather sad workaround isn't it?
>>    
>>
>
>It's not the stuff that it prefetches that's the problem; it's the disk 
>access.
>  
>
Well, seems you have some sorry kind of disk driver then?
An ide disk not using dma? 

A low-cpu task that only abuses the disk shouldn't make an impact
on a 3D game that hogs the cpu only.  Unless the driver for your
harddisk is faulty, using way more cpu than it need.

Use hdparm, check the basics:
unmaksirq=1, using_dma=1, multcount is some positive number,
such as 8 or 16, readahead is some positive number.
Also use hdparm -i and verify that the disk is using some
nice udma mode.  (too old for that, and it probably isn't worth
optimizing this for...)

Also make sure the disk driver isn't sharing an irq with the
3D card. 

Come to think of it, if your 3D game happens to saturate the
pci bus for long times, then disk accesses might indeed
be noticeable as they too need the bus.  Check if going to
a slower dma mode helps - this might free up the bus a bit.

Helge Hafting

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] mm: yield during swap prefetching
  2006-03-09  8:57             ` Helge Hafting
@ 2006-03-09  9:08               ` Con Kolivas
       [not found]                 ` <4410AFD3.7090505@bigpond.net.au>
  0 siblings, 1 reply; 112+ messages in thread
From: Con Kolivas @ 2006-03-09  9:08 UTC (permalink / raw)
  To: Helge Hafting; +Cc: Andrew Morton, linux-kernel, linux-mm, ck

On Thursday 09 March 2006 19:57, Helge Hafting wrote:
> Con Kolivas wrote:
> >On Wed, 8 Mar 2006 12:11 pm, Andrew Morton wrote:
> >>but, but.  If prefetching is prefetching stuff which that game will soon
> >>use then it'll be an aggregate improvement.  If prefetch is prefetching
> >>stuff which that game _won't_ use then prefetch is busted.  Using yield()
> >>to artificially cripple kprefetchd is a rather sad workaround isn't it?
> >
> >It's not the stuff that it prefetches that's the problem; it's the disk
> >access.
>
> Well, seems you have some sorry kind of disk driver then?
> An ide disk not using dma?
>
> A low-cpu task that only abuses the disk shouldn't make an impact
> on a 3D game that hogs the cpu only.  Unless the driver for your
> harddisk is faulty, using way more cpu than it need.
>
> Use hdparm, check the basics:
> unmaksirq=1, using_dma=1, multcount is some positive number,
> such as 8 or 16, readahead is some positive number.
> Also use hdparm -i and verify that the disk is using some
> nice udma mode.  (too old for that, and it probably isn't worth
> optimizing this for...)
>
> Also make sure the disk driver isn't sharing an irq with the
> 3D card.
>
> Come to think of it, if your 3D game happens to saturate the
> pci bus for long times, then disk accesses might indeed
> be noticeable as they too need the bus.  Check if going to
> a slower dma mode helps - this might free up the bus a bit.

Thanks for the hints. 

However I actually wrote the swap prefetch code and this is all about changing 
its behaviour to make it do what I want. The problem is that nice 19 will 
give it up to 5% cpu in the presence of a nice 0 task when I really don't 
want swap prefetch doing anything. Furthermore because it is constantly 
waking up from sleep (after disk activity) it is always given lower latency 
scheduling than a fully cpu bound nice 0 task - this is normally appropriate 
behaviour. Yielding regularly works around that issue. 

Ideally taking into account cpu usage and only working below a certain cpu 
threshold may be the better mechanism and it does appear this would be more 
popular. It would not be hard to implement, but does add yet more code to an 
increasingly complex heuristic used to detect "idleness". I am seriously 
considering it.

Cheers,
Con

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] mm: yield during swap prefetching
  2006-03-09  2:57             ` Nick Piggin
@ 2006-03-09  9:11               ` Con Kolivas
  0 siblings, 0 replies; 112+ messages in thread
From: Con Kolivas @ 2006-03-09  9:11 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Pavel Machek, Andrew Morton, linux-kernel, linux-mm, ck

On Thursday 09 March 2006 13:57, Nick Piggin wrote:
> Con Kolivas wrote:
> >On Thu, 9 Mar 2006 01:22 pm, Nick Piggin wrote:
> >>So as much as a major fault costs in terms of performance, the tiny
> >>chance that prefetching will avoid it means even the CPU usage is
> >>questionable. Using sched_yield() seems like a hack though.
> >
> >Yeah it's a hack alright. Funny how at last I find a place where yield
> > does exactly what I want and because we hate yield so much noone wants me
> > to use it all.
>
> AFAIKS it is a hack for the same reason using it for locking is a hack,
> it's just that prefetch doesn't care if it doesn't get the CPU back for
> a while.
>
> Given a yield implementation which does something completely different
> for SCHED_OTHER tasks, you code may find it doesn't work so well anymore.
> This is no different to the java folk using it with decent results for
> locking. Just because it happened to work OK for them at the time didn't
> mean it was the right thing to do.
>
> I have always maintained that a SCHED_OTHER task calling sched_yield
> is basically a bug because it is utterly undefined behaviour.
>
> But being an in-kernel user that "knows" the implementation sort of does
> the right thin, maybe you justify it that way.

You're right. Even if I do know exactly how yield works and am using it to my 
advantage, any solution that depends on the way yield works may well not work 
in the future. It does look like I should just check cpu usage as well in 
prefetch_suitable(). That will probably be the best generalised solution to 
this. 

Thanks.

Cheers,
Con

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [ck] Re: [PATCH] mm: yield during swap prefetching
       [not found]                 ` <4410AFD3.7090505@bigpond.net.au>
@ 2006-03-10  9:01                   ` Andreas Mohr
  2006-03-10  9:11                     ` Con Kolivas
  0 siblings, 1 reply; 112+ messages in thread
From: Andreas Mohr @ 2006-03-10  9:01 UTC (permalink / raw)
  To: Peter Williams
  Cc: Con Kolivas, Andrew Morton, linux-mm, linux-kernel, ck, Helge Hafting

Hi,

On Fri, Mar 10, 2006 at 09:44:35AM +1100, Peter Williams wrote:
> I'm working on a patch to add soft and hard CPU rate caps to the 
> scheduler and the soft caps may be useful for what you're trying to do. 
>  They are a generalization of your SCHED_BATCH implementation in 
> staircase (which would have been better called SCHED_BACKGROUND :-) 
Which SCHED_BATCH? ;) I only know it as SCHED_IDLEPRIO, which, come to think
of it, is a better name, I believe :-)
(renamed due to mainline introducing a *different* SCHED_BATCH mechanism)

> IMHO) in that a task with a soft cap will only use more CPU than that 
> cap if it (the cpu) would otherwise go unused.  The main difference 
> between this mechanism and staircase's SCHED_BATCH mechanism is that you 
> can specify how much (as parts per thousand of a CPU) the task can use 
> instead of just being background or not background.  With the soft cap 
> set to zero the effect would be essentially the same.
Interesting. Hopefully it will bring some nice results!

Andreas

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [ck] Re: [PATCH] mm: yield during swap prefetching
  2006-03-10  9:01                   ` [ck] " Andreas Mohr
@ 2006-03-10  9:11                     ` Con Kolivas
  0 siblings, 0 replies; 112+ messages in thread
From: Con Kolivas @ 2006-03-10  9:11 UTC (permalink / raw)
  To: Andreas Mohr
  Cc: Peter Williams, Andrew Morton, linux-mm, linux-kernel, ck, Helge Hafting

On Friday 10 March 2006 20:01, Andreas Mohr wrote:
> Hi,
>
> On Fri, Mar 10, 2006 at 09:44:35AM +1100, Peter Williams wrote:
> > I'm working on a patch to add soft and hard CPU rate caps to the
> > scheduler and the soft caps may be useful for what you're trying to do.
> >  They are a generalization of your SCHED_BATCH implementation in
> > staircase (which would have been better called SCHED_BACKGROUND :-)
>
> Which SCHED_BATCH? ;) I only know it as SCHED_IDLEPRIO, which, come to
> think of it, is a better name, I believe :-)
> (renamed due to mainline introducing a *different* SCHED_BATCH mechanism)

Just to clarify what Andreas is saying: I was forced to rename my SCHED_BATCH 
to SCHED_IDLEPRIO which is a more descriptive name anyway. That is in my 
2.6.16-rc based patches. SCHED_BATCH as you know is now used to mean "don't 
treat me as interactive" so I'm using this policy naming in 2.6.16- based 
patches.

Cheers,
Con

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [ck] Re: [PATCH] mm: yield during swap prefetching
  2006-03-08 13:36     ` [ck] " Con Kolivas
@ 2006-03-17  9:06       ` Ingo Molnar
  2006-03-17 10:46         ` interactive task starvation Mike Galbraith
  2006-03-17 12:38         ` [PATCH] sched: activate SCHED BATCH expired Con Kolivas
  0 siblings, 2 replies; 112+ messages in thread
From: Ingo Molnar @ 2006-03-17  9:06 UTC (permalink / raw)
  To: Con Kolivas; +Cc: ck, Andrew Morton, linux-mm, linux-kernel


* Con Kolivas <kernel@kolivas.org> wrote:

> > We do have SCHED_BATCH but even that doesn't really have the desired
> > effect. I know how much yield sucks and I actually want it to suck as much
> > as yield does.
> 
> Thinking some more on this I wonder if SCHED_BATCH isn't a strong 
> enough scheduling hint if it's not suitable for such an application. 
> Ingo do you think we could make SCHED_BATCH tasks always wake up on 
> the expired array?

yep, i think that's a good idea. In the worst case the starvation 
timeout should kick in.

	Ingo

^ permalink raw reply	[flat|nested] 112+ messages in thread

* interactive task starvation
  2006-03-17  9:06       ` Ingo Molnar
@ 2006-03-17 10:46         ` Mike Galbraith
  2006-03-17 17:15           ` Mike Galbraith
  2006-03-17 12:38         ` [PATCH] sched: activate SCHED BATCH expired Con Kolivas
  1 sibling, 1 reply; 112+ messages in thread
From: Mike Galbraith @ 2006-03-17 10:46 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: lkml

On Fri, 2006-03-17 at 10:06 +0100, Ingo Molnar wrote:

> yep, i think that's a good idea. In the worst case the starvation 
> timeout should kick in.

(I didn't want to hijack that thread ergo name change)

Speaking of the starvation timeout...

I'm beginning to wonder if it might not be a good idea to always have an
expired_timestamp to ensure that there is a limit to how long
interactive tasks can starve _each other_.  Yesterday, I ran some tests
with apache, and ended up waiting for over 3 minutes for a netstat|
grep :81|wc -l to finish when competing with 10 copies of httpd.  The
problem with the expired_timestamp is that if there is nobody already
expired, and if no non-interactive task exists, there's certainly no
expired_timestamp, there's no starvation limit. 

There are other ways to cure 'interactive starvation', but forcing an
array switch if a non-interactive task hasn't run for pick-a-number time
is the easiest.

	-Mike

(yup, folks would certainly feel it, and would _very_ likely gripe, so
it would probably have to be configurable)


^ permalink raw reply	[flat|nested] 112+ messages in thread

* [PATCH] sched: activate SCHED BATCH expired
  2006-03-17  9:06       ` Ingo Molnar
  2006-03-17 10:46         ` interactive task starvation Mike Galbraith
@ 2006-03-17 12:38         ` Con Kolivas
  2006-03-17 13:07           ` Ingo Molnar
  2006-03-17 13:26           ` Nick Piggin
  1 sibling, 2 replies; 112+ messages in thread
From: Con Kolivas @ 2006-03-17 12:38 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: ck, Andrew Morton, linux-kernel

On Friday 17 March 2006 20:06, Ingo Molnar wrote:
> * Con Kolivas <kernel@kolivas.org> wrote:
> > Thinking some more on this I wonder if SCHED_BATCH isn't a strong
> > enough scheduling hint if it's not suitable for such an application.
> > Ingo do you think we could make SCHED_BATCH tasks always wake up on
> > the expired array?
>
> yep, i think that's a good idea. In the worst case the starvation
> timeout should kick in.

Ok here's a patch that does exactly that. Without an "inline" hint, gcc 4.1.0
chooses not to inline this function. I can't say I have a strong opinion
about whether it should be inlined or not (93 bytes larger inlined), so I've
decided not to given the current trend.

Cheers,
Con
---
To increase the strength of SCHED_BATCH as a scheduling hint we can activate
batch tasks on the expired array since by definition they are latency
insensitive tasks.

Signed-off-by: Con Kolivas <kernel@kolivas.org>

---
 include/linux/sched.h |    1 +
 kernel/sched.c        |    9 ++++++---
 2 files changed, 7 insertions(+), 3 deletions(-)

Index: linux-2.6.16-rc6-mm1/include/linux/sched.h
===================================================================
--- linux-2.6.16-rc6-mm1.orig/include/linux/sched.h	2006-03-13 20:12:22.000000000 +1100
+++ linux-2.6.16-rc6-mm1/include/linux/sched.h	2006-03-17 23:08:31.000000000 +1100
@@ -485,6 +485,7 @@ struct signal_struct {
 #define MAX_PRIO		(MAX_RT_PRIO + 40)
 
 #define rt_task(p)		(unlikely((p)->prio < MAX_RT_PRIO))
+#define batch_task(p)		(unlikely((p)->policy == SCHED_BATCH))
 
 /*
  * Some day this will be a full-fledged user tracking system..
Index: linux-2.6.16-rc6-mm1/kernel/sched.c
===================================================================
--- linux-2.6.16-rc6-mm1.orig/kernel/sched.c	2006-03-13 20:12:15.000000000 +1100
+++ linux-2.6.16-rc6-mm1/kernel/sched.c	2006-03-17 23:08:12.000000000 +1100
@@ -737,9 +737,12 @@ static inline void dec_nr_running(task_t
 /*
  * __activate_task - move a task to the runqueue.
  */
-static inline void __activate_task(task_t *p, runqueue_t *rq)
+static void __activate_task(task_t *p, runqueue_t *rq)
 {
-	enqueue_task(p, rq->active);
+	if (batch_task(p))
+		enqueue_task(p, rq->expired);
+	else
+		enqueue_task(p, rq->active);
 	inc_nr_running(p, rq);
 }
 
@@ -758,7 +761,7 @@ static int recalc_task_prio(task_t *p, u
 	unsigned long long __sleep_time = now - p->timestamp;
 	unsigned long sleep_time;
 
-	if (unlikely(p->policy == SCHED_BATCH))
+	if (batch_task(p))
 		sleep_time = 0;
 	else {
 		if (__sleep_time > NS_MAX_SLEEP_AVG)

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] sched: activate SCHED BATCH expired
  2006-03-17 12:38         ` [PATCH] sched: activate SCHED BATCH expired Con Kolivas
@ 2006-03-17 13:07           ` Ingo Molnar
  2006-03-17 13:26           ` Nick Piggin
  1 sibling, 0 replies; 112+ messages in thread
From: Ingo Molnar @ 2006-03-17 13:07 UTC (permalink / raw)
  To: Con Kolivas; +Cc: ck, Andrew Morton, linux-kernel


* Con Kolivas <kernel@kolivas.org> wrote:

> To increase the strength of SCHED_BATCH as a scheduling hint we can activate
> batch tasks on the expired array since by definition they are latency
> insensitive tasks.
> 
> Signed-off-by: Con Kolivas <kernel@kolivas.org>

Acked-by: Ingo Molnar <mingo@elte.hu>

	Ingo

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] sched: activate SCHED BATCH expired
  2006-03-17 12:38         ` [PATCH] sched: activate SCHED BATCH expired Con Kolivas
  2006-03-17 13:07           ` Ingo Molnar
@ 2006-03-17 13:26           ` Nick Piggin
  2006-03-17 13:36             ` Con Kolivas
  1 sibling, 1 reply; 112+ messages in thread
From: Nick Piggin @ 2006-03-17 13:26 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Ingo Molnar, ck, Andrew Morton, linux-kernel

Con Kolivas wrote:

> 
> Ok here's a patch that does exactly that. Without an "inline" hint, gcc 4.1.0
> chooses not to inline this function. I can't say I have a strong opinion
> about whether it should be inlined or not (93 bytes larger inlined), so I've
> decided not to given the current trend.
> 

Sigh, sacrifice for the common case! :P


> Index: linux-2.6.16-rc6-mm1/kernel/sched.c
> ===================================================================
> --- linux-2.6.16-rc6-mm1.orig/kernel/sched.c	2006-03-13 20:12:15.000000000 +1100
> +++ linux-2.6.16-rc6-mm1/kernel/sched.c	2006-03-17 23:08:12.000000000 +1100
> @@ -737,9 +737,12 @@ static inline void dec_nr_running(task_t
>  /*
>   * __activate_task - move a task to the runqueue.
>   */
> -static inline void __activate_task(task_t *p, runqueue_t *rq)
> +static void __activate_task(task_t *p, runqueue_t *rq)
>  {
> -	enqueue_task(p, rq->active);
> +	if (batch_task(p))
> +		enqueue_task(p, rq->expired);
> +	else
> +		enqueue_task(p, rq->active);
>  	inc_nr_running(p, rq);
>  }
>  

I prefer:

   prio_array_t *target = rq->active;
   if (batch_task(p))
     target = rq->expired;
   enqueue_task(p, target);

Because gcc can use things like predicated instructions for it.
But perhaps it is smart enough these days to recognise this?
At least in the past I have seen it start using cmov after doing
such a conversion.

At any rate, I think it looks nicer as well. IMO, of course.

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] sched: activate SCHED BATCH expired
  2006-03-17 13:26           ` Nick Piggin
@ 2006-03-17 13:36             ` Con Kolivas
  2006-03-17 13:46               ` Nick Piggin
  2006-03-17 13:47               ` [ck] " Andreas Mohr
  0 siblings, 2 replies; 112+ messages in thread
From: Con Kolivas @ 2006-03-17 13:36 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Ingo Molnar, ck, Andrew Morton, linux-kernel

On Saturday 18 March 2006 00:26, Nick Piggin wrote:
> Con Kolivas wrote:
> > -static inline void __activate_task(task_t *p, runqueue_t *rq)
> > +static void __activate_task(task_t *p, runqueue_t *rq)
> >  {
> > -	enqueue_task(p, rq->active);
> > +	if (batch_task(p))
> > +		enqueue_task(p, rq->expired);
> > +	else
> > +		enqueue_task(p, rq->active);
> >  	inc_nr_running(p, rq);
> >  }
>
> I prefer:
>
>    prio_array_t *target = rq->active;
>    if (batch_task(p))
>      target = rq->expired;
>    enqueue_task(p, target);
>
> Because gcc can use things like predicated instructions for it.
> But perhaps it is smart enough these days to recognise this?
> At least in the past I have seen it start using cmov after doing
> such a conversion.
>
> At any rate, I think it looks nicer as well. IMO, of course.

Well on my one boring architecture here is a before and after, gcc 4.1.0 with 
optimise for size kernel config:
0xb01127da <__activate_task+0>: push   %ebp
0xb01127db <__activate_task+1>: mov    %esp,%ebp
0xb01127dd <__activate_task+3>: push   %esi
0xb01127de <__activate_task+4>: push   %ebx
0xb01127df <__activate_task+5>: mov    %eax,%esi
0xb01127e1 <__activate_task+7>: mov    %edx,%ebx
0xb01127e3 <__activate_task+9>: cmpl   $0x3,0x58(%eax)
0xb01127e7 <__activate_task+13>:        jne    0xb01127ee <__activate_task+20>
0xb01127e9 <__activate_task+15>:        mov    0x44(%edx),%edx
0xb01127ec <__activate_task+18>:        jmp    0xb01127f1 <__activate_task+23>
0xb01127ee <__activate_task+20>:        mov    0x40(%edx),%edx
0xb01127f1 <__activate_task+23>:        mov    %esi,%eax
0xb01127f3 <__activate_task+25>:        call   0xb01124bb <enqueue_task>
0xb01127f8 <__activate_task+30>:        incl   0x8(%ebx)
0xb01127fb <__activate_task+33>:        mov    0x18(%esi),%eax
0xb01127fe <__activate_task+36>:        add    %eax,0xc(%ebx)
0xb0112801 <__activate_task+39>:        pop    %ebx
0xb0112802 <__activate_task+40>:        pop    %esi
0xb0112803 <__activate_task+41>:        pop    %ebp
0xb0112804 <__activate_task+42>:        ret

Your version:
0xb01127da <__activate_task+0>: push   %ebp
0xb01127db <__activate_task+1>: mov    %esp,%ebp
0xb01127dd <__activate_task+3>: push   %esi
0xb01127de <__activate_task+4>: push   %ebx
0xb01127df <__activate_task+5>: mov    %eax,%esi
0xb01127e1 <__activate_task+7>: mov    %edx,%ebx
0xb01127e3 <__activate_task+9>: mov    0x40(%edx),%edx
0xb01127e6 <__activate_task+12>:        cmpl   $0x3,0x58(%eax)
0xb01127ea <__activate_task+16>:        jne    0xb01127ef <__activate_task+21>
0xb01127ec <__activate_task+18>:        mov    0x44(%ebx),%edx
0xb01127ef <__activate_task+21>:        mov    %esi,%eax
0xb01127f1 <__activate_task+23>:        call   0xb01124bb <enqueue_task>
0xb01127f6 <__activate_task+28>:        incl   0x8(%ebx)
0xb01127f9 <__activate_task+31>:        mov    0x18(%esi),%eax
0xb01127fc <__activate_task+34>:        add    %eax,0xc(%ebx)
0xb01127ff <__activate_task+37>:        pop    %ebx
0xb0112800 <__activate_task+38>:        pop    %esi
0xb0112801 <__activate_task+39>:        pop    %ebp
0xb0112802 <__activate_task+40>:        ret

I'm not attached to the style, just the feature. If you think it's warranted 
I'll change it.

Cheers,
Con

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] sched: activate SCHED BATCH expired
  2006-03-17 13:36             ` Con Kolivas
@ 2006-03-17 13:46               ` Nick Piggin
  2006-03-17 13:51                 ` Nick Piggin
  2006-03-17 14:11                 ` Con Kolivas
  2006-03-17 13:47               ` [ck] " Andreas Mohr
  1 sibling, 2 replies; 112+ messages in thread
From: Nick Piggin @ 2006-03-17 13:46 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Ingo Molnar, ck, Andrew Morton, linux-kernel

Con Kolivas wrote:
> On Saturday 18 March 2006 00:26, Nick Piggin wrote:
> 
>>Con Kolivas wrote:
>>
>>>-static inline void __activate_task(task_t *p, runqueue_t *rq)
>>>+static void __activate_task(task_t *p, runqueue_t *rq)
>>> {
>>>-	enqueue_task(p, rq->active);
>>>+	if (batch_task(p))
>>>+		enqueue_task(p, rq->expired);
>>>+	else
>>>+		enqueue_task(p, rq->active);
>>> 	inc_nr_running(p, rq);
>>> }
>>
>>I prefer:
>>
>>   prio_array_t *target = rq->active;
>>   if (batch_task(p))
>>     target = rq->expired;
>>   enqueue_task(p, target);
>>
>>Because gcc can use things like predicated instructions for it.
>>But perhaps it is smart enough these days to recognise this?
>>At least in the past I have seen it start using cmov after doing
>>such a conversion.
>>
>>At any rate, I think it looks nicer as well. IMO, of course.
> 
> 
> Well on my one boring architecture here is a before and after, gcc 4.1.0 with 
> optimise for size kernel config:

> I'm not attached to the style, just the feature. If you think it's warranted 
> I'll change it.
> 

I guess it isn't doing the cmov because it doesn't want to do the
extra load in the common case, which is fair enough (are you compiling
for a pentiumpro+, without generic x86 support? what about if you
turn off optimise for size?)

At least other archtectures might be able to make better use of it,
and I agree even for i386 the code looks better (and slightly smaller).

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [ck] Re: [PATCH] sched: activate SCHED BATCH expired
  2006-03-17 13:36             ` Con Kolivas
  2006-03-17 13:46               ` Nick Piggin
@ 2006-03-17 13:47               ` Andreas Mohr
  2006-03-17 13:59                 ` Con Kolivas
  2006-03-17 14:06                 ` Nick Piggin
  1 sibling, 2 replies; 112+ messages in thread
From: Andreas Mohr @ 2006-03-17 13:47 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Nick Piggin, ck, Andrew Morton, linux-kernel

Hi,

On Sat, Mar 18, 2006 at 12:36:10AM +1100, Con Kolivas wrote:
> I'm not attached to the style, just the feature. If you think it's warranted 
> I'll change it.

Seconded.

An even nicer way (this solution seems somewhat asymmetric) than

   prio_array_t *target = rq->active;
   if (batch_task(p))
     target = rq->expired;
   enqueue_task(p, target);

may be

   prio_array_t *target;
   if (batch_task(p))
     target = rq->expired;
   else
     target = rq->active;
   enqueue_task(p, target);

and thus (but this coding style may be considered overloaded):

   prio_array_t *target;
   target = batch_task(p) ?
	rq->expired : rq->active;
   enqueue_task(p, target);


But this discussion is clearly growing out of control now ;)

Andreas Mohr

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] sched: activate SCHED BATCH expired
  2006-03-17 13:46               ` Nick Piggin
@ 2006-03-17 13:51                 ` Nick Piggin
  2006-03-17 14:11                 ` Con Kolivas
  1 sibling, 0 replies; 112+ messages in thread
From: Nick Piggin @ 2006-03-17 13:51 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Ingo Molnar, ck, Andrew Morton, linux-kernel

Nick Piggin wrote:
> Con Kolivas wrote:

>> I'm not attached to the style, just the feature. If you think it's 
>> warranted I'll change it.
>>
> 

> At least other archtectures might be able to make better use of it,
> and I agree even for i386 the code looks better (and slightly smaller).
> 

s/I agree/I think/

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [ck] Re: [PATCH] sched: activate SCHED BATCH expired
  2006-03-17 13:47               ` [ck] " Andreas Mohr
@ 2006-03-17 13:59                 ` Con Kolivas
  2006-03-17 14:06                 ` Nick Piggin
  1 sibling, 0 replies; 112+ messages in thread
From: Con Kolivas @ 2006-03-17 13:59 UTC (permalink / raw)
  To: Andreas Mohr; +Cc: Nick Piggin, ck, Andrew Morton, linux-kernel

On Saturday 18 March 2006 00:47, Andreas Mohr wrote:
> Hi,
>
> On Sat, Mar 18, 2006 at 12:36:10AM +1100, Con Kolivas wrote:
> > I'm not attached to the style, just the feature. If you think it's
> > warranted I'll change it.
>
> Seconded.
>
> An even nicer way (this solution seems somewhat asymmetric) than
>
>    prio_array_t *target = rq->active;
>    if (batch_task(p))
>      target = rq->expired;
>    enqueue_task(p, target);
>
> may be
>
>    prio_array_t *target;
>    if (batch_task(p))
>      target = rq->expired;
>    else
>      target = rq->active;
>    enqueue_task(p, target);

Well I hadn't quite gone to bed so I tried yours for grins too and 
interestingly it produced the identical code to my original version.

> But this discussion is clearly growing out of control now ;)

I prefer a month's worth of this over a single more email about 
cd-fscking-record's amazing perfection.

Cheers,
Con

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [ck] Re: [PATCH] sched: activate SCHED BATCH expired
  2006-03-17 13:47               ` [ck] " Andreas Mohr
  2006-03-17 13:59                 ` Con Kolivas
@ 2006-03-17 14:06                 ` Nick Piggin
  1 sibling, 0 replies; 112+ messages in thread
From: Nick Piggin @ 2006-03-17 14:06 UTC (permalink / raw)
  To: Andreas Mohr; +Cc: Con Kolivas, ck, Andrew Morton, linux-kernel

Andreas Mohr wrote:
> Hi,
> 
> On Sat, Mar 18, 2006 at 12:36:10AM +1100, Con Kolivas wrote:
> 
>>I'm not attached to the style, just the feature. If you think it's warranted 
>>I'll change it.
> 
> 
> Seconded.
> 
> An even nicer way (this solution seems somewhat asymmetric) than
> 
>    prio_array_t *target = rq->active;
>    if (batch_task(p))
>      target = rq->expired;
>    enqueue_task(p, target);
> 
> may be
> 
>    prio_array_t *target;
>    if (batch_task(p))
>      target = rq->expired;
>    else
>      target = rq->active;
>    enqueue_task(p, target);
> 

It doesn't actually generate the same code here (I guess it is good
that gcc gives us this control).

I think my way is (ever so slightly) better because it gets the load
going earlier and comprises one less conditional jump (admittedly in
the slowpath). You'd probably never be able to measure a difference
between any of the variants, however ;)

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] sched: activate SCHED BATCH expired
  2006-03-17 13:46               ` Nick Piggin
  2006-03-17 13:51                 ` Nick Piggin
@ 2006-03-17 14:11                 ` Con Kolivas
  2006-03-17 14:59                   ` Ingo Molnar
  1 sibling, 1 reply; 112+ messages in thread
From: Con Kolivas @ 2006-03-17 14:11 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Ingo Molnar, ck, Andrew Morton, linux-kernel

On Saturday 18 March 2006 00:46, Nick Piggin wrote:
> I guess it isn't doing the cmov because it doesn't want to do the
> extra load in the common case, which is fair enough (are you compiling
> for a pentiumpro+, without generic x86 support?

For pentium4 with no generic support.

> what about if you 
> turn off optimise for size?)

Dunno, sleep is taking me...

> At least other archtectures might be able to make better use of it,
> and I agree even for i386 the code looks better (and slightly smaller).

Good enough for me. Here's a respin, thanks!

Cheers,
Con
---
To increase the strength of SCHED_BATCH as a scheduling hint we can activate
batch tasks on the expired array since by definition they are latency
insensitive tasks.

Signed-off-by: Con Kolivas <kernel@kolivas.org>

---
 include/linux/sched.h |    1 +
 kernel/sched.c        |   10 +++++++---
 2 files changed, 8 insertions(+), 3 deletions(-)

Index: linux-2.6.16-rc6-mm1/include/linux/sched.h
===================================================================
--- linux-2.6.16-rc6-mm1.orig/include/linux/sched.h	2006-03-13 20:12:22.000000000 +1100
+++ linux-2.6.16-rc6-mm1/include/linux/sched.h	2006-03-17 23:08:31.000000000 +1100
@@ -485,6 +485,7 @@ struct signal_struct {
 #define MAX_PRIO		(MAX_RT_PRIO + 40)
 
 #define rt_task(p)		(unlikely((p)->prio < MAX_RT_PRIO))
+#define batch_task(p)		(unlikely((p)->policy == SCHED_BATCH))
 
 /*
  * Some day this will be a full-fledged user tracking system..
Index: linux-2.6.16-rc6-mm1/kernel/sched.c
===================================================================
--- linux-2.6.16-rc6-mm1.orig/kernel/sched.c	2006-03-13 20:12:15.000000000 +1100
+++ linux-2.6.16-rc6-mm1/kernel/sched.c	2006-03-18 01:05:02.000000000 +1100
@@ -737,9 +737,13 @@ static inline void dec_nr_running(task_t
 /*
  * __activate_task - move a task to the runqueue.
  */
-static inline void __activate_task(task_t *p, runqueue_t *rq)
+static void __activate_task(task_t *p, runqueue_t *rq)
 {
-	enqueue_task(p, rq->active);
+	prio_array_t *target = rq->active;
+
+	if (batch_task(p))
+		target = rq->expired;
+	enqueue_task(p, target);
 	inc_nr_running(p, rq);
 }
 
@@ -758,7 +762,7 @@ static int recalc_task_prio(task_t *p, u
 	unsigned long long __sleep_time = now - p->timestamp;
 	unsigned long sleep_time;
 
-	if (unlikely(p->policy == SCHED_BATCH))
+	if (batch_task(p))
 		sleep_time = 0;
 	else {
 		if (__sleep_time > NS_MAX_SLEEP_AVG)

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH] sched: activate SCHED BATCH expired
  2006-03-17 14:11                 ` Con Kolivas
@ 2006-03-17 14:59                   ` Ingo Molnar
  0 siblings, 0 replies; 112+ messages in thread
From: Ingo Molnar @ 2006-03-17 14:59 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Nick Piggin, ck, Andrew Morton, linux-kernel


* Con Kolivas <kernel@kolivas.org> wrote:

> Good enough for me. Here's a respin, thanks!

> Signed-off-by: Con Kolivas <kernel@kolivas.org>

Still-Acked-by: Ingo Molnar <mingo@elte.hu>

	Ingo

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-17 10:46         ` interactive task starvation Mike Galbraith
@ 2006-03-17 17:15           ` Mike Galbraith
  2006-03-20  7:09             ` Mike Galbraith
  0 siblings, 1 reply; 112+ messages in thread
From: Mike Galbraith @ 2006-03-17 17:15 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: lkml

On Fri, 2006-03-17 at 11:46 +0100, Mike Galbraith wrote:
> On Fri, 2006-03-17 at 10:06 +0100, Ingo Molnar wrote:
> 
> > yep, i think that's a good idea. In the worst case the starvation 
> > timeout should kick in.
> 
> (I didn't want to hijack that thread ergo name change)
> 
> Speaking of the starvation timeout...
> 

<snip day late $ short idea>

Problem solved.  I now know why the starvation logic doesn't work.
Wakeups.  In the face of 10+ copies of httpd constantly waking up, it
seems it just takes ages to get around to switching arrays.

With the (urp) patch below, I now get...

[root]:# time netstat|grep :81|wc -l
   1648

real    0m27.735s
user    0m0.158s
sys     0m0.111s
[root]:# time netstat|grep :81|wc -l
   1817

real    0m13.550s
user    0m0.121s
sys     0m0.186s
[root]:# time netstat|grep :81|wc -l
   1641

real    0m17.022s
user    0m0.132s
sys     0m0.143s
[root]:# 

which certainly isn't pleasant, but it beats the heck out of minutes.

	-Mike

--- kernel/sched.c.org	2006-03-17 14:48:35.000000000 +0100
+++ kernel/sched.c	2006-03-17 17:41:25.000000000 +0100
@@ -662,11 +662,30 @@
 }
 
 /*
+ * We place interactive tasks back into the active array, if possible.
+ *
+ * To guarantee that this does not starve expired tasks we ignore the
+ * interactivity of a task if the first expired task had to wait more
+ * than a 'reasonable' amount of time. This deadline timeout is
+ * load-dependent, as the frequency of array switched decreases with
+ * increasing number of running tasks. We also ignore the interactivity
+ * if a better static_prio task has expired:
+ */
+#define EXPIRED_STARVING(rq) \
+	((STARVATION_LIMIT && ((rq)->expired_timestamp && \
+		(jiffies - (rq)->expired_timestamp >= \
+			STARVATION_LIMIT * ((rq)->nr_running) + 1))) || \
+			((rq)->curr->static_prio > (rq)->best_expired_prio))
+
+/*
  * __activate_task - move a task to the runqueue.
  */
 static inline void __activate_task(task_t *p, runqueue_t *rq)
 {
-	enqueue_task(p, rq->active);
+	prio_array_t *array = rq->active;
+	if (unlikely(EXPIRED_STARVING(rq)))
+		array = rq->expired;
+	enqueue_task(p, array);
 	rq->nr_running++;
 }
 
@@ -2461,22 +2480,6 @@
 }
 
 /*
- * We place interactive tasks back into the active array, if possible.
- *
- * To guarantee that this does not starve expired tasks we ignore the
- * interactivity of a task if the first expired task had to wait more
- * than a 'reasonable' amount of time. This deadline timeout is
- * load-dependent, as the frequency of array switched decreases with
- * increasing number of running tasks. We also ignore the interactivity
- * if a better static_prio task has expired:
- */
-#define EXPIRED_STARVING(rq) \
-	((STARVATION_LIMIT && ((rq)->expired_timestamp && \
-		(jiffies - (rq)->expired_timestamp >= \
-			STARVATION_LIMIT * ((rq)->nr_running) + 1))) || \
-			((rq)->curr->static_prio > (rq)->best_expired_prio))
-
-/*
  * Account user cpu time to a process.
  * @p: the process that the cpu time gets accounted to
  * @hardirq_offset: the offset to subtract from hardirq_count()



^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-17 17:15           ` Mike Galbraith
@ 2006-03-20  7:09             ` Mike Galbraith
  2006-03-20 10:22               ` Ingo Molnar
  2006-03-21  6:47               ` Willy Tarreau
  0 siblings, 2 replies; 112+ messages in thread
From: Mike Galbraith @ 2006-03-20  7:09 UTC (permalink / raw)
  To: lkml; +Cc: Ingo Molnar, Andrew Morton, Con Kolivas

[-- Attachment #1: Type: text/plain, Size: 1613 bytes --]

On Fri, 2006-03-17 at 18:15 +0100, Mike Galbraith wrote: 
> Problem solved.  I now know why the starvation logic doesn't work.
> Wakeups.  In the face of 10+ copies of httpd constantly waking up, it
> seems it just takes ages to get around to switching arrays.
> 
> With the (urp) patch below, I now get...
> 
> [root]:# time netstat|grep :81|wc -l
>    1648
> 
> real    0m27.735s
> user    0m0.158s
> sys     0m0.111s
> [root]:# time netstat|grep :81|wc -l
>    1817
> 
> real    0m13.550s
> user    0m0.121s
> sys     0m0.186s
> [root]:# time netstat|grep :81|wc -l
>    1641
> 
> real    0m17.022s
> user    0m0.132s
> sys     0m0.143s
> [root]:# 

For those interested in these kind of things, here are the numbers for
2.6.16-rc6-mm2 with my [tarball] throttle patches applied...

[root]:# time netstat|grep :81|wc -l
   1681

real    0m1.525s
user    0m0.141s
sys     0m0.136s
[root]:# time netstat|grep :81|wc -l
   1491

real    0m0.356s
user    0m0.130s
sys     0m0.114s
[root]:# time netstat|grep :81|wc -l
   1527

real    0m0.343s
user    0m0.129s
sys     0m0.114s
[root]:# time netstat|grep :81|wc -l
   1568

real    0m0.512s
user    0m0.112s
sys     0m0.138s

...while running with the same apache loadavg of over 10, and tunables
set to server mode (0,0).

<plug>
Even a desktop running with these settings is so interactive that I
could play a game of Maelstrom (asteroids like thing) while doing a make
-j30 in slow nfs mount and barely feel it.  In a local filesystem, I
could't feel it at all, so I added a thud 3, irman2 and a bonnie -s 2047
for good measure.  Try that with stock :)
</plug>

[-- Attachment #2: throttle-V22-2.6.16-rc6-mm2.tar.gz --]
[-- Type: application/x-compressed-tar, Size: 7205 bytes --]

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-20  7:09             ` Mike Galbraith
@ 2006-03-20 10:22               ` Ingo Molnar
  2006-03-21  6:47               ` Willy Tarreau
  1 sibling, 0 replies; 112+ messages in thread
From: Ingo Molnar @ 2006-03-20 10:22 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: lkml, Andrew Morton, Con Kolivas


* Mike Galbraith <efault@gmx.de> wrote:

> <plug>
> Even a desktop running with these settings is so interactive that I 
> could play a game of Maelstrom (asteroids like thing) while doing a 
> make -j30 in slow nfs mount and barely feel it.  In a local 
> filesystem, I could't feel it at all, so I added a thud 3, irman2 and 
> a bonnie -s 2047 for good measure.  Try that with stock :)
> </plug>

great! Please make sure all the patches make their way into -mm. We 
definitely want to try this for v2.6.17. Increasing starvation 
resistance _and_ interactivity via the same patchset is a rare feat ;-)

Acked-by: Ingo Molnar <mingo@elte.hu>

	Ingo

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-20  7:09             ` Mike Galbraith
  2006-03-20 10:22               ` Ingo Molnar
@ 2006-03-21  6:47               ` Willy Tarreau
  2006-03-21  7:51                 ` Mike Galbraith
  1 sibling, 1 reply; 112+ messages in thread
From: Willy Tarreau @ 2006-03-21  6:47 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: lkml, Ingo Molnar, Andrew Morton, Con Kolivas

Hi Mike,

On Mon, Mar 20, 2006 at 08:09:13AM +0100, Mike Galbraith wrote:
(...) 
> For those interested in these kind of things, here are the numbers for
> 2.6.16-rc6-mm2 with my [tarball] throttle patches applied...
> 
> [root]:# time netstat|grep :81|wc -l
>    1681
> 
> real    0m1.525s
> user    0m0.141s
> sys     0m0.136s
> [root]:# time netstat|grep :81|wc -l
>    1491
> 
> real    0m0.356s
> user    0m0.130s
> sys     0m0.114s
> [root]:# time netstat|grep :81|wc -l
>    1527
> 
> real    0m0.343s
> user    0m0.129s
> sys     0m0.114s
> [root]:# time netstat|grep :81|wc -l
>    1568
> 
> real    0m0.512s
> user    0m0.112s
> sys     0m0.138s
> 
> ...while running with the same apache loadavg of over 10, and tunables
> set to server mode (0,0).
> 
> <plug>
> Even a desktop running with these settings is so interactive that I
> could play a game of Maelstrom (asteroids like thing) while doing a make
> -j30 in slow nfs mount and barely feel it.  In a local filesystem, I
> could't feel it at all, so I added a thud 3, irman2 and a bonnie -s 2047
> for good measure.  Try that with stock :)
> </plug>

Very good job !
I told Grant in a private email that I felt confident the problem would
quickly be solved now that someone familiar with the scheduler could
reliably reproduce it. Your numbers look excellent, I'm willing to test.
Could you remind us what kernel and what patches we need to apply to
try the same, please ?

Cheers,
Willy


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-21  6:47               ` Willy Tarreau
@ 2006-03-21  7:51                 ` Mike Galbraith
  2006-03-21  9:13                   ` Willy Tarreau
  0 siblings, 1 reply; 112+ messages in thread
From: Mike Galbraith @ 2006-03-21  7:51 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: lkml, Ingo Molnar, Andrew Morton, Con Kolivas

[-- Attachment #1: Type: text/plain, Size: 1864 bytes --]

On Tue, 2006-03-21 at 07:47 +0100, Willy Tarreau wrote:
> Hi Mike,

Greetings!

> On Mon, Mar 20, 2006 at 08:09:13AM +0100, Mike Galbraith wrote:
> > real    0m0.512s
> > user    0m0.112s
> > sys     0m0.138s
> > 
> > ...while running with the same apache loadavg of over 10, and tunables
> > set to server mode (0,0).
...

> Very good job !
> I told Grant in a private email that I felt confident the problem would
> quickly be solved now that someone familiar with the scheduler could
> reliably reproduce it. Your numbers look excellent, I'm willing to test.
> Could you remind us what kernel and what patches we need to apply to
> try the same, please ?

You bet.  I'm most happy to have someone try it other than me :)

Apply the patches from the attached tarball in the obvious order to
2.6.16-rc6-mm2.  As delivered, it's knobs are set up for a desktop box.
For a server, you'll probably want maximum starvation resistance, so
echo 0 > /proc/sys/kernel/grace_g1 and grace_g2.  This will set the time
a task can exceed expected cpu (based upon sleep_avg) to zero seconds,
ie immediate throttling upon detection.  It will also disable some
interactivity specific code in the scheduler.

If you want to fiddle with the knobs, grace_g1 is the number of CPU
seconds a new task is authorized to run completely free of any
intervention... startup in a desktop environment.  grace_g2 is the
amount of CPU seconds a well behaved task can store for later usage.
With the throttling patch, an interactive task must earn the right to
exceed expected cpu by performing within expectations.  The longer the
task behaves, the more 'good carma' it earns.  This allows interactive
tasks to do a burst of activity, but the user determines how long that
burst==starvation is authorized.  Tasks with just use as much cpu as
they can get run headlong into the throttle.

	-Mike

[-- Attachment #2: throttle-V23-2.6.16-rc6-mm2.tar.gz --]
[-- Type: application/x-compressed-tar, Size: 7259 bytes --]

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-21  7:51                 ` Mike Galbraith
@ 2006-03-21  9:13                   ` Willy Tarreau
  2006-03-21  9:14                     ` Ingo Molnar
  0 siblings, 1 reply; 112+ messages in thread
From: Willy Tarreau @ 2006-03-21  9:13 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: lkml, Ingo Molnar, Andrew Morton, Con Kolivas


On Tue, Mar 21, 2006 at 08:51:38AM +0100, Mike Galbraith wrote:
> On Tue, 2006-03-21 at 07:47 +0100, Willy Tarreau wrote:
> > Hi Mike,
> 
> Greetings!

Thanks for the details,
I'll try to find some time to test your code quickly. If this fixes this
long standing problem, we should definitely try to get it into 2.6.17 !

Cheers,
Willy


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-21  9:13                   ` Willy Tarreau
@ 2006-03-21  9:14                     ` Ingo Molnar
  2006-03-21 11:15                       ` Willy Tarreau
  0 siblings, 1 reply; 112+ messages in thread
From: Ingo Molnar @ 2006-03-21  9:14 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: Mike Galbraith, lkml, Andrew Morton, Con Kolivas


* Willy Tarreau <willy@w.ods.org> wrote:

> 
> On Tue, Mar 21, 2006 at 08:51:38AM +0100, Mike Galbraith wrote:
> > On Tue, 2006-03-21 at 07:47 +0100, Willy Tarreau wrote:
> > > Hi Mike,
> > 
> > Greetings!
> 
> Thanks for the details,
> I'll try to find some time to test your code quickly. If this fixes this
> long standing problem, we should definitely try to get it into 2.6.17 !

the time window is quickly closing for that to happen though.

	Ingo

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-21  9:14                     ` Ingo Molnar
@ 2006-03-21 11:15                       ` Willy Tarreau
  2006-03-21 11:18                         ` Ingo Molnar
  0 siblings, 1 reply; 112+ messages in thread
From: Willy Tarreau @ 2006-03-21 11:15 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Mike Galbraith, lkml, Andrew Morton, Con Kolivas, bugsplatter

On Tue, Mar 21, 2006 at 10:14:22AM +0100, Ingo Molnar wrote:
> 
> * Willy Tarreau <willy@w.ods.org> wrote:
> 
> > 
> > On Tue, Mar 21, 2006 at 08:51:38AM +0100, Mike Galbraith wrote:
> > > On Tue, 2006-03-21 at 07:47 +0100, Willy Tarreau wrote:
> > > > Hi Mike,
> > > 
> > > Greetings!
> > 
> > Thanks for the details,
> > I'll try to find some time to test your code quickly. If this fixes this
> > long standing problem, we should definitely try to get it into 2.6.17 !
> 
> the time window is quickly closing for that to happen though.

Ingo, Mike,

it's a great day :-)

Right now, I'm typing this mail from my notebook which has 8 instances of
my exploit running in background. Previously, 4 of them were enough on this
machine to create pauses of up to 31 seconds. Right now, I can type normally,
and I simply can say that my exploit has no effect anymore ! It's just
consuming CPU and nothing else. I also tried to write 0 to grace_g[12] and
I find it even more responsive with 0 in those values. I've not had time to
do more extensive tests, but I can assure you that the problem is clearly
solved for me. I'd like Grant to test ssh on his firewall with it too.

Congratulations !
Willy


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-21 11:15                       ` Willy Tarreau
@ 2006-03-21 11:18                         ` Ingo Molnar
  2006-03-21 11:53                           ` Con Kolivas
  2006-03-21 12:07                           ` Mike Galbraith
  0 siblings, 2 replies; 112+ messages in thread
From: Ingo Molnar @ 2006-03-21 11:18 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Mike Galbraith, lkml, Andrew Morton, Con Kolivas, bugsplatter


* Willy Tarreau <willy@w.ods.org> wrote:

> On Tue, Mar 21, 2006 at 10:14:22AM +0100, Ingo Molnar wrote:
> > 
> > * Willy Tarreau <willy@w.ods.org> wrote:
> > 
> > > 
> > > On Tue, Mar 21, 2006 at 08:51:38AM +0100, Mike Galbraith wrote:
> > > > On Tue, 2006-03-21 at 07:47 +0100, Willy Tarreau wrote:
> > > > > Hi Mike,
> > > > 
> > > > Greetings!
> > > 
> > > Thanks for the details,
> > > I'll try to find some time to test your code quickly. If this fixes this
> > > long standing problem, we should definitely try to get it into 2.6.17 !
> > 
> > the time window is quickly closing for that to happen though.
> 
> Ingo, Mike,
> 
> it's a great day :-)
> 
> Right now, I'm typing this mail from my notebook which has 8 instances 
> of my exploit running in background. Previously, 4 of them were enough 
> on this machine to create pauses of up to 31 seconds. Right now, I can 
> type normally, and I simply can say that my exploit has no effect 
> anymore ! It's just consuming CPU and nothing else. I also tried to 
> write 0 to grace_g[12] and I find it even more responsive with 0 in 
> those values. I've not had time to do more extensive tests, but I can 
> assure you that the problem is clearly solved for me. I'd like Grant 
> to test ssh on his firewall with it too.

great work by Mike! One detail: i'd like there to be just one default 
throttling value, i.e. no grace_g tunables [so that we have just one 
default scheduler behavior]. Is the default grace_g[12] setting good 
enough for your workload?

	Ingo

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-21 11:18                         ` Ingo Molnar
@ 2006-03-21 11:53                           ` Con Kolivas
  2006-03-21 13:10                             ` Mike Galbraith
  2006-03-21 12:07                           ` Mike Galbraith
  1 sibling, 1 reply; 112+ messages in thread
From: Con Kolivas @ 2006-03-21 11:53 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Willy Tarreau, Mike Galbraith, lkml, Andrew Morton, bugsplatter

On Tuesday 21 March 2006 22:18, Ingo Molnar wrote:
> great work by Mike! One detail: i'd like there to be just one default
> throttling value, i.e. no grace_g tunables [so that we have just one
> default scheduler behavior]. Is the default grace_g[12] setting good
> enough for your workload?

I agree. If anything is required, a simple on/off tunable makes much more 
sense. Much like I suggested ages ago with an "interactive" switch which was 
rather unpopular when I first suggested it. Perhaps my marketing was wrong. 
Oh well.

Cheers,
Con

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-21 11:18                         ` Ingo Molnar
  2006-03-21 11:53                           ` Con Kolivas
@ 2006-03-21 12:07                           ` Mike Galbraith
  2006-03-21 12:59                             ` Willy Tarreau
  1 sibling, 1 reply; 112+ messages in thread
From: Mike Galbraith @ 2006-03-21 12:07 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Willy Tarreau, lkml, Andrew Morton, Con Kolivas, bugsplatter

On Tue, 2006-03-21 at 12:18 +0100, Ingo Molnar wrote:

> great work by Mike! One detail: i'd like there to be just one default 
> throttling value, i.e. no grace_g tunables [so that we have just one 
> default scheduler behavior]. Is the default grace_g[12] setting good 
> enough for your workload?

I can make the knobs compile time so we don't see random behavior
reports, but I don't think they can be totally eliminated.  Would that
be sufficient?

If so, the numbers as delivered should be fine for desktop boxen I
think.  People who are building custom kernels can bend to fit as
always.

	-Mike


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-21 12:07                           ` Mike Galbraith
@ 2006-03-21 12:59                             ` Willy Tarreau
  2006-03-21 13:24                               ` Mike Galbraith
  0 siblings, 1 reply; 112+ messages in thread
From: Willy Tarreau @ 2006-03-21 12:59 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Ingo Molnar, lkml, Andrew Morton, Con Kolivas, bugsplatter

On Tue, Mar 21, 2006 at 01:07:58PM +0100, Mike Galbraith wrote:
> On Tue, 2006-03-21 at 12:18 +0100, Ingo Molnar wrote:
> 
> > great work by Mike! One detail: i'd like there to be just one default 
> > throttling value, i.e. no grace_g tunables [so that we have just one 
> > default scheduler behavior]. Is the default grace_g[12] setting good 
> > enough for your workload?

The default values are infinitely better than mainline, but it is still
a huge improvement to reduce them (at least grace_g2) :

default : grace_g1=10, grace_g2=14400, loadavg oscillating between 7 and 12 :

willy@wtap:~$ time ls -la /data/src/tmp/|wc
   2271   18250  212211

real    0m5.759s
user    0m0.028s
sys     0m0.008s
willy@wtap:~$ time ls -la /data/src/tmp/|wc
   2271   18250  212211

real    0m3.476s
user    0m0.020s
sys     0m0.016s
willy@wtap:~$ 

I can still observe some occasionnal pauses of 1 to 3 seconds (once
to 4 times per minute).

- grace_g2 set to 0, load converges to a stable 8 :

willy@wtap:~$ time ls -la /data/src/tmp/|wc
   2271   18250  212211

real    0m0.441s
user    0m0.036s
sys     0m0.004s
willy@wtap:~$ time ls -la /data/src/tmp/|wc
   2271   18250  212211

real    0m0.400s
user    0m0.032s
sys     0m0.008s

I can still observe some rare cases of 1 second pauses (once or twice per
minute).

- grace_g2 and grace_g1 set to zero :

willy@wtap:~$ time ls -la /data/src/tmp/|wc
   2271   18250  212211

real    0m0.214s
user    0m0.028s
sys     0m0.008s
willy@wtap:~$ time ls -la /data/src/tmp/|wc
   2271   18250  212211

real    0m0.193s
user    0m0.032s
sys     0m0.008s

=> I never observe any pause, and the numbers above sometimes even
   get lower (around 75 ms).

I have also tried injecting traffic on my proxy, and at 16000 hits/s,
its does not impact overall system's responsiveness, whatever (g1,g2).

> I can make the knobs compile time so we don't see random behavior
> reports, but I don't think they can be totally eliminated.  Would that
> be sufficient?
> 
> If so, the numbers as delivered should be fine for desktop boxen I
> think.  People who are building custom kernels can bend to fit as
> always.

That would suit me perfectly. I think I would set them both to zero.
It's not clear to me what workload they can help, it seems that they
try to allow a sometimes unfair scheduling.

> 	-Mike

Cheers,
Willy


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-21 11:53                           ` Con Kolivas
@ 2006-03-21 13:10                             ` Mike Galbraith
  2006-03-21 13:13                               ` Con Kolivas
  0 siblings, 1 reply; 112+ messages in thread
From: Mike Galbraith @ 2006-03-21 13:10 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Ingo Molnar, Willy Tarreau, lkml, Andrew Morton, bugsplatter

On Tue, 2006-03-21 at 22:53 +1100, Con Kolivas wrote:
> On Tuesday 21 March 2006 22:18, Ingo Molnar wrote:
> > great work by Mike! One detail: i'd like there to be just one default
> > throttling value, i.e. no grace_g tunables [so that we have just one
> > default scheduler behavior]. Is the default grace_g[12] setting good
> > enough for your workload?
> 
> I agree. If anything is required, a simple on/off tunable makes much more 
> sense. Much like I suggested ages ago with an "interactive" switch which was 
> rather unpopular when I first suggested it.

Let me try to explain why on/off is not sufficient.

You notice how Willy said that his notebook is more responsive with
tunables set to 0,0?  That's important, because it's absolutely true...
depending what you're doing.  Setting tunables to 0,0 cuts off the idle
sleep logic, and the sleep_avg divisor - both of which were put there
specifically for interactivity - and returns the scheduler to more or
less original O(1) scheduler.  You and I both know that these are most
definitely needed in a Desktop environment.  For instance, if Willy
starts editing code in X, and scrolls while something is running in the
background, he'll suddenly say hey, maybe this _ain't_ more responsive,
because all of a sudden the starvation added with the interactivity
logic will be sorely missed as my throttle wrings X's neck.

How long should Willy be able to scroll without feeling the background,
and how long should Apache be able to starve his shell.  They are one
and the same, and I can't say, because I'm not Willy.  I don't know how
to get there from here without tunables.  Picking defaults is one thing,
but I don't know how to make it one-size-fits-all.  For the general
case, the values delivered will work fine.  For the apache case, they
absolutely 100% guaranteed will not.

	-Mike


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-21 13:10                             ` Mike Galbraith
@ 2006-03-21 13:13                               ` Con Kolivas
  2006-03-21 13:33                                 ` Mike Galbraith
  2006-03-21 13:38                                 ` Willy Tarreau
  0 siblings, 2 replies; 112+ messages in thread
From: Con Kolivas @ 2006-03-21 13:13 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Ingo Molnar, Willy Tarreau, lkml, Andrew Morton, bugsplatter

On Wednesday 22 March 2006 00:10, Mike Galbraith wrote:
> On Tue, 2006-03-21 at 22:53 +1100, Con Kolivas wrote:
> > On Tuesday 21 March 2006 22:18, Ingo Molnar wrote:
> > > great work by Mike! One detail: i'd like there to be just one default
> > > throttling value, i.e. no grace_g tunables [so that we have just one
> > > default scheduler behavior]. Is the default grace_g[12] setting good
> > > enough for your workload?
> >
> > I agree. If anything is required, a simple on/off tunable makes much more
> > sense. Much like I suggested ages ago with an "interactive" switch which
> > was rather unpopular when I first suggested it.
>
> Let me try to explain why on/off is not sufficient.
>
> You notice how Willy said that his notebook is more responsive with
> tunables set to 0,0?  That's important, because it's absolutely true...
> depending what you're doing.  Setting tunables to 0,0 cuts off the idle
> sleep logic, and the sleep_avg divisor - both of which were put there
> specifically for interactivity - and returns the scheduler to more or
> less original O(1) scheduler.  You and I both know that these are most
> definitely needed in a Desktop environment.  For instance, if Willy
> starts editing code in X, and scrolls while something is running in the
> background, he'll suddenly say hey, maybe this _ain't_ more responsive,
> because all of a sudden the starvation added with the interactivity
> logic will be sorely missed as my throttle wrings X's neck.
>
> How long should Willy be able to scroll without feeling the background,
> and how long should Apache be able to starve his shell.  They are one
> and the same, and I can't say, because I'm not Willy.  I don't know how
> to get there from here without tunables.  Picking defaults is one thing,
> but I don't know how to make it one-size-fits-all.  For the general
> case, the values delivered will work fine.  For the apache case, they
> absolutely 100% guaranteed will not.

So how do you propose we tune such a beast then? Apache users will use off, 
everyone else will have no idea but to use the defaults.

Cheers,
Con

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-21 12:59                             ` Willy Tarreau
@ 2006-03-21 13:24                               ` Mike Galbraith
  2006-03-21 13:53                                 ` Con Kolivas
  2006-03-21 22:51                                 ` Peter Williams
  0 siblings, 2 replies; 112+ messages in thread
From: Mike Galbraith @ 2006-03-21 13:24 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: Ingo Molnar, lkml, Andrew Morton, Con Kolivas, bugsplatter

On Tue, 2006-03-21 at 13:59 +0100, Willy Tarreau wrote:
> On Tue, Mar 21, 2006 at 01:07:58PM +0100, Mike Galbraith wrote:

> > I can make the knobs compile time so we don't see random behavior
> > reports, but I don't think they can be totally eliminated.  Would that
> > be sufficient?
> > 
> > If so, the numbers as delivered should be fine for desktop boxen I
> > think.  People who are building custom kernels can bend to fit as
> > always.
> 
> That would suit me perfectly. I think I would set them both to zero.
> It's not clear to me what workload they can help, it seems that they
> try to allow a sometimes unfair scheduling.

Correct.  Massively unfair scheduling is what interactivity requires.

	-Mike


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-21 13:13                               ` Con Kolivas
@ 2006-03-21 13:33                                 ` Mike Galbraith
  2006-03-21 13:37                                   ` Con Kolivas
  2006-03-21 13:38                                 ` Willy Tarreau
  1 sibling, 1 reply; 112+ messages in thread
From: Mike Galbraith @ 2006-03-21 13:33 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Ingo Molnar, Willy Tarreau, lkml, Andrew Morton, bugsplatter

On Wed, 2006-03-22 at 00:13 +1100, Con Kolivas wrote:
> On Wednesday 22 March 2006 00:10, Mike Galbraith wrote:
> > How long should Willy be able to scroll without feeling the background,
> > and how long should Apache be able to starve his shell.  They are one
> > and the same, and I can't say, because I'm not Willy.  I don't know how
> > to get there from here without tunables.  Picking defaults is one thing,
> > but I don't know how to make it one-size-fits-all.  For the general
> > case, the values delivered will work fine.  For the apache case, they
> > absolutely 100% guaranteed will not.
> 
> So how do you propose we tune such a beast then? Apache users will use off, 
> everyone else will have no idea but to use the defaults.

Set for desktop, which is intended to mostly emulate what we have right
now, which most people are quite happy with.  The throttle will still
nail most of the corner cases, and the other adjustments nail the
majority of what's left.  That leaves the hefty server type loads as
what certainly will require tuning.  They always need tuning.

	-Mike


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-21 13:33                                 ` Mike Galbraith
@ 2006-03-21 13:37                                   ` Con Kolivas
  2006-03-21 13:44                                     ` Willy Tarreau
  0 siblings, 1 reply; 112+ messages in thread
From: Con Kolivas @ 2006-03-21 13:37 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Ingo Molnar, Willy Tarreau, lkml, Andrew Morton, bugsplatter

On Wednesday 22 March 2006 00:33, Mike Galbraith wrote:
> On Wed, 2006-03-22 at 00:13 +1100, Con Kolivas wrote:
> > On Wednesday 22 March 2006 00:10, Mike Galbraith wrote:
> > > How long should Willy be able to scroll without feeling the background,
> > > and how long should Apache be able to starve his shell.  They are one
> > > and the same, and I can't say, because I'm not Willy.  I don't know how
> > > to get there from here without tunables.  Picking defaults is one
> > > thing, but I don't know how to make it one-size-fits-all.  For the
> > > general case, the values delivered will work fine.  For the apache
> > > case, they absolutely 100% guaranteed will not.
> >
> > So how do you propose we tune such a beast then? Apache users will use
> > off, everyone else will have no idea but to use the defaults.
>
> Set for desktop, which is intended to mostly emulate what we have right
> now, which most people are quite happy with.  The throttle will still
> nail most of the corner cases, and the other adjustments nail the
> majority of what's left.  That leaves the hefty server type loads as
> what certainly will require tuning.  They always need tuning.

That still sounds like just on/off to me. Default for desktop and 0,0 for 
server. Am I missing something?

Cheers,
Con

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-21 13:13                               ` Con Kolivas
  2006-03-21 13:33                                 ` Mike Galbraith
@ 2006-03-21 13:38                                 ` Willy Tarreau
  2006-03-21 13:48                                   ` Mike Galbraith
  1 sibling, 1 reply; 112+ messages in thread
From: Willy Tarreau @ 2006-03-21 13:38 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Mike Galbraith, Ingo Molnar, lkml, Andrew Morton, bugsplatter

On Wed, Mar 22, 2006 at 12:13:15AM +1100, Con Kolivas wrote:
> On Wednesday 22 March 2006 00:10, Mike Galbraith wrote:
> > On Tue, 2006-03-21 at 22:53 +1100, Con Kolivas wrote:
> > > On Tuesday 21 March 2006 22:18, Ingo Molnar wrote:
> > > > great work by Mike! One detail: i'd like there to be just one default
> > > > throttling value, i.e. no grace_g tunables [so that we have just one
> > > > default scheduler behavior]. Is the default grace_g[12] setting good
> > > > enough for your workload?
> > >
> > > I agree. If anything is required, a simple on/off tunable makes much more
> > > sense. Much like I suggested ages ago with an "interactive" switch which
> > > was rather unpopular when I first suggested it.
> >
> > Let me try to explain why on/off is not sufficient.
> >
> > You notice how Willy said that his notebook is more responsive with
> > tunables set to 0,0?  That's important, because it's absolutely true...
> > depending what you're doing.  Setting tunables to 0,0 cuts off the idle
> > sleep logic, and the sleep_avg divisor - both of which were put there
> > specifically for interactivity - and returns the scheduler to more or
> > less original O(1) scheduler.  You and I both know that these are most
> > definitely needed in a Desktop environment.  For instance, if Willy
> > starts editing code in X, and scrolls while something is running in the
> > background, he'll suddenly say hey, maybe this _ain't_ more responsive,
> > because all of a sudden the starvation added with the interactivity
> > logic will be sorely missed as my throttle wrings X's neck.
> >
> > How long should Willy be able to scroll without feeling the background,
> > and how long should Apache be able to starve his shell.  They are one
> > and the same, and I can't say, because I'm not Willy.  I don't know how
> > to get there from here without tunables.  Picking defaults is one thing,
> > but I don't know how to make it one-size-fits-all.  For the general
> > case, the values delivered will work fine.  For the apache case, they
> > absolutely 100% guaranteed will not.
> 
> So how do you propose we tune such a beast then? Apache users will use off, 
> everyone else will have no idea but to use the defaults.

What you describe is exactly a case for a tunable. Different people with
different workloads want different values. Seems fair enough. After all,
we already have /proc/sys/vm/swappiness, and things like that for the same
reason : the default value should suit most users, and the ones with
knowledge and different needs can tune their system. Maybe grace_{g1,g2}
should be renamed to be more explicit, may be we can automatically tune
one from the other and let only one tunable. But if both have a useful
effect, I don't see a reason for hiding them.

> Cheers,
> Con

Cheers,
Willy


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-21 13:37                                   ` Con Kolivas
@ 2006-03-21 13:44                                     ` Willy Tarreau
  2006-03-21 13:45                                       ` Con Kolivas
  0 siblings, 1 reply; 112+ messages in thread
From: Willy Tarreau @ 2006-03-21 13:44 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Mike Galbraith, Ingo Molnar, lkml, Andrew Morton, bugsplatter

On Wed, Mar 22, 2006 at 12:37:51AM +1100, Con Kolivas wrote:
> On Wednesday 22 March 2006 00:33, Mike Galbraith wrote:
> > On Wed, 2006-03-22 at 00:13 +1100, Con Kolivas wrote:
> > > On Wednesday 22 March 2006 00:10, Mike Galbraith wrote:
> > > > How long should Willy be able to scroll without feeling the background,
> > > > and how long should Apache be able to starve his shell.  They are one
> > > > and the same, and I can't say, because I'm not Willy.  I don't know how
> > > > to get there from here without tunables.  Picking defaults is one
> > > > thing, but I don't know how to make it one-size-fits-all.  For the
> > > > general case, the values delivered will work fine.  For the apache
> > > > case, they absolutely 100% guaranteed will not.
> > >
> > > So how do you propose we tune such a beast then? Apache users will use
> > > off, everyone else will have no idea but to use the defaults.
> >
> > Set for desktop, which is intended to mostly emulate what we have right
> > now, which most people are quite happy with.  The throttle will still
> > nail most of the corner cases, and the other adjustments nail the
> > majority of what's left.  That leaves the hefty server type loads as
> > what certainly will require tuning.  They always need tuning.
> 
> That still sounds like just on/off to me. Default for desktop and 0,0 for 
> server. Am I missing something?

Believe it or not, there *are* people running their servers with full
graphical environments. At the place we first encountered the interactivity
problem with my load-balancer, they first installed in on a full FC2 with the
OpenGL screen saver... No need to say they had scaling difficulties and trouble
to log in !

Although that's a stupid thing to do, what I want to show is that even on
servers, you can't easily predict the workload. Maybe a server which often
forks processes for dedicated tasks (eg: monitoring) would prefer running
between "desktop" and "server" mode.

> Cheers,
> Con

Cheers,
Willy


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-21 13:44                                     ` Willy Tarreau
@ 2006-03-21 13:45                                       ` Con Kolivas
  2006-03-21 14:01                                         ` Mike Galbraith
  0 siblings, 1 reply; 112+ messages in thread
From: Con Kolivas @ 2006-03-21 13:45 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Mike Galbraith, Ingo Molnar, lkml, Andrew Morton, bugsplatter

On Wednesday 22 March 2006 00:44, Willy Tarreau wrote:
> On Wed, Mar 22, 2006 at 12:37:51AM +1100, Con Kolivas wrote:
> > On Wednesday 22 March 2006 00:33, Mike Galbraith wrote:
> > > On Wed, 2006-03-22 at 00:13 +1100, Con Kolivas wrote:
> > > > On Wednesday 22 March 2006 00:10, Mike Galbraith wrote:
> > > > > How long should Willy be able to scroll without feeling the
> > > > > background, and how long should Apache be able to starve his shell.
> > > > >  They are one and the same, and I can't say, because I'm not Willy.
> > > > >  I don't know how to get there from here without tunables.  Picking
> > > > > defaults is one thing, but I don't know how to make it
> > > > > one-size-fits-all.  For the general case, the values delivered will
> > > > > work fine.  For the apache case, they absolutely 100% guaranteed
> > > > > will not.
> > > >
> > > > So how do you propose we tune such a beast then? Apache users will
> > > > use off, everyone else will have no idea but to use the defaults.
> > >
> > > Set for desktop, which is intended to mostly emulate what we have right
> > > now, which most people are quite happy with.  The throttle will still
> > > nail most of the corner cases, and the other adjustments nail the
> > > majority of what's left.  That leaves the hefty server type loads as
> > > what certainly will require tuning.  They always need tuning.
> >
> > That still sounds like just on/off to me. Default for desktop and 0,0 for
> > server. Am I missing something?
>
> Believe it or not, there *are* people running their servers with full
> graphical environments. At the place we first encountered the interactivity
> problem with my load-balancer, they first installed in on a full FC2 with
> the OpenGL screen saver... No need to say they had scaling difficulties and
> trouble to log in !
>
> Although that's a stupid thing to do, what I want to show is that even on
> servers, you can't easily predict the workload. Maybe a server which often
> forks processes for dedicated tasks (eg: monitoring) would prefer running
> between "desktop" and "server" mode.

I give up. Add as many tunables as you like in as many places as possible that 
even less people will understand. You've already told me you'll be running 
0,0.

Cheers,
Con

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-21 13:38                                 ` Willy Tarreau
@ 2006-03-21 13:48                                   ` Mike Galbraith
  0 siblings, 0 replies; 112+ messages in thread
From: Mike Galbraith @ 2006-03-21 13:48 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: Con Kolivas, Ingo Molnar, lkml, Andrew Morton, bugsplatter

On Tue, 2006-03-21 at 14:38 +0100, Willy Tarreau wrote:
> What you describe is exactly a case for a tunable. Different people with
> different workloads want different values. Seems fair enough. After all,
> we already have /proc/sys/vm/swappiness, and things like that for the same
> reason : the default value should suit most users, and the ones with
> knowledge and different needs can tune their system. Maybe grace_{g1,g2}
> should be renamed to be more explicit, may be we can automatically tune
> one from the other and let only one tunable. But if both have a useful
> effect, I don't see a reason for hiding them.

I'm wide open to suggestions.  I tried to make it functional, flexible,
and above all, dirt simple.  Adding 'acceptable' would be cool :)

	-Mike


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-21 13:24                               ` Mike Galbraith
@ 2006-03-21 13:53                                 ` Con Kolivas
  2006-03-21 14:17                                   ` Mike Galbraith
  2006-03-21 22:51                                 ` Peter Williams
  1 sibling, 1 reply; 112+ messages in thread
From: Con Kolivas @ 2006-03-21 13:53 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Willy Tarreau, Ingo Molnar, lkml, Andrew Morton, bugsplatter

On Wednesday 22 March 2006 00:24, Mike Galbraith wrote:
> On Tue, 2006-03-21 at 13:59 +0100, Willy Tarreau wrote:
> > That would suit me perfectly. I think I would set them both to zero.
> > It's not clear to me what workload they can help, it seems that they
> > try to allow a sometimes unfair scheduling.
>
> Correct.  Massively unfair scheduling is what interactivity requires.

To some degree, yes. Transient unfairness was all that it was supposed to do 
and clearly it failed at being transient. 

I would argue that good interactivity is possible with fairness by changing 
the design. I won't go there (to try and push it that is), though, as the 
opposition to changing the whole scheduler in place or making it pluggable 
has already been voiced numerous times over, and it would kill me to try and 
promote such an alternative ever again. Especially since the number of people 
willing to test interactive patches and report to lkml has dropped to 
virtually nil. 

The yardstick for changes is now the speed of 'ls' scrolling in the console. 
Where exactly are those extra cycles going I wonder? Do you think the 
scheduler somehow makes the cpu idle doing nothing in that timespace? Clearly 
that's not true, and userspace is making something spin unnecessarily, but 
we're gonna fix that by modifying the scheduler.... sigh

Cheers,
Con

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-21 13:45                                       ` Con Kolivas
@ 2006-03-21 14:01                                         ` Mike Galbraith
  2006-03-21 14:17                                           ` Con Kolivas
  0 siblings, 1 reply; 112+ messages in thread
From: Mike Galbraith @ 2006-03-21 14:01 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Willy Tarreau, Ingo Molnar, lkml, Andrew Morton, bugsplatter

On Wed, 2006-03-22 at 00:45 +1100, Con Kolivas wrote:

> I give up. Add as many tunables as you like in as many places as possible that 
> even less people will understand. You've already told me you'll be running 
> 0,0.

Instead of giving up, how about look at the code and make a suggestion
for improvement?  It's not an easy problem, as you're well aware.

I really don't see why you're (seemingly) getting irate.  Tunables for
this are no different that tunables like CHILD_PENALTY etc etc etc.  How
many casual users know those exist, much less understand them?

	-Mike


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-21 13:53                                 ` Con Kolivas
@ 2006-03-21 14:17                                   ` Mike Galbraith
  2006-03-21 14:19                                     ` Con Kolivas
  0 siblings, 1 reply; 112+ messages in thread
From: Mike Galbraith @ 2006-03-21 14:17 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Willy Tarreau, Ingo Molnar, lkml, Andrew Morton, bugsplatter

On Wed, 2006-03-22 at 00:53 +1100, Con Kolivas wrote:
> The yardstick for changes is now the speed of 'ls' scrolling in the console. 
> Where exactly are those extra cycles going I wonder? Do you think the 
> scheduler somehow makes the cpu idle doing nothing in that timespace? Clearly 
> that's not true, and userspace is making something spin unnecessarily, but 
> we're gonna fix that by modifying the scheduler.... sigh

*Blink*

Are you having a bad hair day??

	-Mike


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-21 14:01                                         ` Mike Galbraith
@ 2006-03-21 14:17                                           ` Con Kolivas
  2006-03-21 15:20                                             ` Con Kolivas
  0 siblings, 1 reply; 112+ messages in thread
From: Con Kolivas @ 2006-03-21 14:17 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Willy Tarreau, Ingo Molnar, lkml, Andrew Morton, bugsplatter

On Wednesday 22 March 2006 01:01, Mike Galbraith wrote:
> On Wed, 2006-03-22 at 00:45 +1100, Con Kolivas wrote:
> > I give up. Add as many tunables as you like in as many places as possible
> > that even less people will understand. You've already told me you'll be
> > running 0,0.
>
> Instead of giving up, how about look at the code and make a suggestion
> for improvement?  It's not an easy problem, as you're well aware.
>
> I really don't see why you're (seemingly) getting irate.  Tunables for
> this are no different that tunables like CHILD_PENALTY etc etc etc.  How
> many casual users know those exist, much less understand them?

Because I strongly believe that tunables for this sort of thing are wrong. 
CHILD_PENALTY and friends have never been exported apart from out-of-tree 
patches. These were meant to be tuned in the kernel and never exported. Ingo 
didn't want *any* tunables so I'm relatively flexible with an on/off switch 
which he doesn't like. I really do believe most users will only have it on or 
off though. 

Don't think I'm ignoring your code. You inspired me to do the original patches 
3 years ago. 

I have looked at your patch at length and basically what it does is variably 
convert the interactive estimator from full to zero over some timeframe 
choosable with your tunables. Since most users will use either full or zero I 
actually believe the same effect can be had by a tiny modification to 
enable/disable the estimator anyway. This is not to deny you've done a lot of 
work and confirmed that the estimator running indefinitely unthrottled is 
bad. What timeframe is correct to throttle is impossible to say 
though :-( Most desktop users would be quite happy with indefinite because 
they basically do not hit workloads that "exploit" it. Most server/hybrid 
setups are willing to sacrifice some interactivity for fairness, and the 
basic active->expired design gives them enough interactivity without 
virtually any boost anyway. Ironically, audio is fabulous on such a design 
since it virtually never consumes a full timeslice. 

So any value you place on the timeframe as the default ends up being a 
compromise, and this is what Ingo is suggesting. This is similar to when 
sleep_avg changed from 10 seconds to 30 seconds to 2 seconds at various 
times. Luckily the non linear decay of sleep_avg circumvents that being 
relevant... but it also leads to the exact issue you're trying to fix. Once 
again we're left with choosing some number, and as much as I'd like to help 
since I really care about the desktop, I don't think any compromise is 
correct. Just on or off.

Cheers,
Con

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-21 14:17                                   ` Mike Galbraith
@ 2006-03-21 14:19                                     ` Con Kolivas
  2006-03-21 14:25                                       ` Ingo Molnar
                                                         ` (2 more replies)
  0 siblings, 3 replies; 112+ messages in thread
From: Con Kolivas @ 2006-03-21 14:19 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Willy Tarreau, Ingo Molnar, lkml, Andrew Morton, bugsplatter

On Wednesday 22 March 2006 01:17, Mike Galbraith wrote:
> On Wed, 2006-03-22 at 00:53 +1100, Con Kolivas wrote:
> > The yardstick for changes is now the speed of 'ls' scrolling in the
> > console. Where exactly are those extra cycles going I wonder? Do you
> > think the scheduler somehow makes the cpu idle doing nothing in that
> > timespace? Clearly that's not true, and userspace is making something
> > spin unnecessarily, but we're gonna fix that by modifying the
> > scheduler.... sigh
>
> *Blink*
>
> Are you having a bad hair day??

My hair is approximately 3mm long so it's kinda hard for that to happen. 

What you're fixing with unfairness is worth pursuing. The 'ls' issue just 
blows my mind though for reasons I've just said. Where are the magic cycles 
going when nothing else is running that make it take ten times longer?

Cheers,
Con

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-21 14:19                                     ` Con Kolivas
@ 2006-03-21 14:25                                       ` Ingo Molnar
  2006-03-21 14:28                                         ` Con Kolivas
  2006-03-21 14:28                                       ` Mike Galbraith
  2006-03-21 14:39                                       ` Willy Tarreau
  2 siblings, 1 reply; 112+ messages in thread
From: Ingo Molnar @ 2006-03-21 14:25 UTC (permalink / raw)
  To: Con Kolivas
  Cc: Mike Galbraith, Willy Tarreau, lkml, Andrew Morton, bugsplatter


* Con Kolivas <kernel@kolivas.org> wrote:

> On Wednesday 22 March 2006 01:17, Mike Galbraith wrote:
> > On Wed, 2006-03-22 at 00:53 +1100, Con Kolivas wrote:
> > > The yardstick for changes is now the speed of 'ls' scrolling in the
> > > console. Where exactly are those extra cycles going I wonder? Do you
> > > think the scheduler somehow makes the cpu idle doing nothing in that
> > > timespace? Clearly that's not true, and userspace is making something
> > > spin unnecessarily, but we're gonna fix that by modifying the
> > > scheduler.... sigh
> >
> > *Blink*
> >
> > Are you having a bad hair day??
> 
> My hair is approximately 3mm long so it's kinda hard for that to happen. 
> 
> What you're fixing with unfairness is worth pursuing. The 'ls' issue 
> just blows my mind though for reasons I've just said. Where are the 
> magic cycles going when nothing else is running that make it take ten 
> times longer?

i believe such artifacts are due to array switches not happening (due to 
the workload getting queued back to rq->active, not rq->expired), and 
'ls' only gets a timeslice once in a while, every STARVATION_LIMIT
times. I.e. such workloads penalize the CPU-bound 'ls' process quite 
heavily.

	Ingo

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-21 14:25                                       ` Ingo Molnar
@ 2006-03-21 14:28                                         ` Con Kolivas
  2006-03-21 14:30                                           ` Ingo Molnar
  0 siblings, 1 reply; 112+ messages in thread
From: Con Kolivas @ 2006-03-21 14:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Mike Galbraith, Willy Tarreau, lkml, Andrew Morton, bugsplatter

On Wednesday 22 March 2006 01:25, Ingo Molnar wrote:
> * Con Kolivas <kernel@kolivas.org> wrote:
> > What you're fixing with unfairness is worth pursuing. The 'ls' issue
> > just blows my mind though for reasons I've just said. Where are the
> > magic cycles going when nothing else is running that make it take ten
> > times longer?
>
> i believe such artifacts are due to array switches not happening (due to
> the workload getting queued back to rq->active, not rq->expired), and
> 'ls' only gets a timeslice once in a while, every STARVATION_LIMIT
> times. I.e. such workloads penalize the CPU-bound 'ls' process quite
> heavily.

With nothing else running on the machine it should still get all the cpu no 
matter which array it's on though.

Con

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-21 14:19                                     ` Con Kolivas
  2006-03-21 14:25                                       ` Ingo Molnar
@ 2006-03-21 14:28                                       ` Mike Galbraith
  2006-03-21 14:30                                         ` Con Kolivas
  2006-03-21 14:39                                       ` Willy Tarreau
  2 siblings, 1 reply; 112+ messages in thread
From: Mike Galbraith @ 2006-03-21 14:28 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Willy Tarreau, Ingo Molnar, lkml, Andrew Morton, bugsplatter

On Wed, 2006-03-22 at 01:19 +1100, Con Kolivas wrote:
> On Wednesday 22 March 2006 01:17, Mike Galbraith wrote:
> > On Wed, 2006-03-22 at 00:53 +1100, Con Kolivas wrote:
> > > The yardstick for changes is now the speed of 'ls' scrolling in the
> > > console. Where exactly are those extra cycles going I wonder? Do you
> > > think the scheduler somehow makes the cpu idle doing nothing in that
> > > timespace? Clearly that's not true, and userspace is making something
> > > spin unnecessarily, but we're gonna fix that by modifying the
> > > scheduler.... sigh
> >
> > *Blink*
> >
> > Are you having a bad hair day??
> 
> My hair is approximately 3mm long so it's kinda hard for that to happen. 
> 
> What you're fixing with unfairness is worth pursuing. The 'ls' issue just 
> blows my mind though for reasons I've just said. Where are the magic cycles 
> going when nothing else is running that make it take ten times longer?

What I was talking about when I mentioned scrolling was rendering.

	-Mike


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-21 14:28                                       ` Mike Galbraith
@ 2006-03-21 14:30                                         ` Con Kolivas
  2006-03-21 14:32                                           ` Ingo Molnar
  2006-03-21 14:36                                           ` Mike Galbraith
  0 siblings, 2 replies; 112+ messages in thread
From: Con Kolivas @ 2006-03-21 14:30 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Willy Tarreau, Ingo Molnar, lkml, Andrew Morton, bugsplatter

On Wednesday 22 March 2006 01:28, Mike Galbraith wrote:
> On Wed, 2006-03-22 at 01:19 +1100, Con Kolivas wrote:
> > What you're fixing with unfairness is worth pursuing. The 'ls' issue just
> > blows my mind though for reasons I've just said. Where are the magic
> > cycles going when nothing else is running that make it take ten times
> > longer?
>
> What I was talking about when I mentioned scrolling was rendering.

I'm talking about the long standing report that 'ls' takes 10 times longer on 
2.6 90% of the time you run it, and doing 'ls | cat' makes it run as fast as 
2.4. This is what Willy has been fighting with.

Cheers,
Con

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-21 14:28                                         ` Con Kolivas
@ 2006-03-21 14:30                                           ` Ingo Molnar
  0 siblings, 0 replies; 112+ messages in thread
From: Ingo Molnar @ 2006-03-21 14:30 UTC (permalink / raw)
  To: Con Kolivas
  Cc: Mike Galbraith, Willy Tarreau, lkml, Andrew Morton, bugsplatter


* Con Kolivas <kernel@kolivas.org> wrote:

> On Wednesday 22 March 2006 01:25, Ingo Molnar wrote:
> > * Con Kolivas <kernel@kolivas.org> wrote:
> > > What you're fixing with unfairness is worth pursuing. The 'ls' issue
> > > just blows my mind though for reasons I've just said. Where are the
> > > magic cycles going when nothing else is running that make it take ten
> > > times longer?
> >
> > i believe such artifacts are due to array switches not happening (due to
> > the workload getting queued back to rq->active, not rq->expired), and
> > 'ls' only gets a timeslice once in a while, every STARVATION_LIMIT
> > times. I.e. such workloads penalize the CPU-bound 'ls' process quite
> > heavily.
> 
> With nothing else running on the machine it should still get all the 
> cpu no matter which array it's on though.

yes. I thought you were asking why 'ls' pauses so long during the 
aforementioned workloads (of loadavg 7-8) - and i answered that. If you 
meant something else then please re-explain it to me.

	Ingo

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-21 14:30                                         ` Con Kolivas
@ 2006-03-21 14:32                                           ` Ingo Molnar
  2006-03-21 14:44                                             ` Willy Tarreau
  2006-03-21 14:36                                           ` Mike Galbraith
  1 sibling, 1 reply; 112+ messages in thread
From: Ingo Molnar @ 2006-03-21 14:32 UTC (permalink / raw)
  To: Con Kolivas
  Cc: Mike Galbraith, Willy Tarreau, lkml, Andrew Morton, bugsplatter


* Con Kolivas <kernel@kolivas.org> wrote:

> On Wednesday 22 March 2006 01:28, Mike Galbraith wrote:
> > On Wed, 2006-03-22 at 01:19 +1100, Con Kolivas wrote:
> > > What you're fixing with unfairness is worth pursuing. The 'ls' issue just
> > > blows my mind though for reasons I've just said. Where are the magic
> > > cycles going when nothing else is running that make it take ten times
> > > longer?
> >
> > What I was talking about when I mentioned scrolling was rendering.
> 
> I'm talking about the long standing report that 'ls' takes 10 times 
> longer on 2.6 90% of the time you run it, and doing 'ls | cat' makes 
> it run as fast as 2.4. This is what Willy has been fighting with.

ah. That's i think a gnome-terminal artifact - it does some really 
stupid dynamic things while rendering, it 'skips' certain portions of 
rendering, depending on the speed of scrolling. Gnome 2.14 ought to have 
that fixed i think.

	Ingo

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-21 14:30                                         ` Con Kolivas
  2006-03-21 14:32                                           ` Ingo Molnar
@ 2006-03-21 14:36                                           ` Mike Galbraith
  2006-03-21 14:39                                             ` Con Kolivas
  1 sibling, 1 reply; 112+ messages in thread
From: Mike Galbraith @ 2006-03-21 14:36 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Willy Tarreau, Ingo Molnar, lkml, Andrew Morton, bugsplatter

On Wed, 2006-03-22 at 01:30 +1100, Con Kolivas wrote:
> On Wednesday 22 March 2006 01:28, Mike Galbraith wrote:
> > On Wed, 2006-03-22 at 01:19 +1100, Con Kolivas wrote:
> > > What you're fixing with unfairness is worth pursuing. The 'ls' issue just
> > > blows my mind though for reasons I've just said. Where are the magic
> > > cycles going when nothing else is running that make it take ten times
> > > longer?
> >
> > What I was talking about when I mentioned scrolling was rendering.
> 
> I'm talking about the long standing report that 'ls' takes 10 times longer on 
> 2.6 90% of the time you run it, and doing 'ls | cat' makes it run as fast as 
> 2.4. This is what Willy has been fighting with.

Oh.  I thought you were calling me a _moron_ :)

	-Mike


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-21 14:19                                     ` Con Kolivas
  2006-03-21 14:25                                       ` Ingo Molnar
  2006-03-21 14:28                                       ` Mike Galbraith
@ 2006-03-21 14:39                                       ` Willy Tarreau
  2006-03-21 18:39                                         ` Rafael J. Wysocki
  2 siblings, 1 reply; 112+ messages in thread
From: Willy Tarreau @ 2006-03-21 14:39 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Mike Galbraith, Ingo Molnar, lkml, Andrew Morton, bugsplatter

On Wed, Mar 22, 2006 at 01:19:49AM +1100, Con Kolivas wrote:
> On Wednesday 22 March 2006 01:17, Mike Galbraith wrote:
> > On Wed, 2006-03-22 at 00:53 +1100, Con Kolivas wrote:
> > > The yardstick for changes is now the speed of 'ls' scrolling in the
> > > console. Where exactly are those extra cycles going I wonder? Do you
> > > think the scheduler somehow makes the cpu idle doing nothing in that
> > > timespace? Clearly that's not true, and userspace is making something
> > > spin unnecessarily, but we're gonna fix that by modifying the
> > > scheduler.... sigh
> >
> > *Blink*
> >
> > Are you having a bad hair day??
> 
> My hair is approximately 3mm long so it's kinda hard for that to happen. 
> 
> What you're fixing with unfairness is worth pursuing. The 'ls' issue just 
> blows my mind though for reasons I've just said. Where are the magic cycles 
> going when nothing else is running that make it take ten times longer?

Con, those cycles are not "magic", if you look at the numbers, the time is
not spent in the process itself. From what has been observed since the
beginning, it is spent :
  - in other processes which are starvating the CPU (eg: X11 when xterm
    scrolls)
  - in context switches when you have a pipe somewhere and the CPU is
    bouncing between tasks.

Concerning your angriness about me being OK with (0,0) and still
asking for tunables, it's precisely because I know that *my* workload
is not everyone else's, and I don't want to conclude too quickly that
there are only two types of workloads. Maybe you're right, maybe you're
wrong. At least you're right for as long as no other workload has been
identified. But thinking like this is like some time ago when we thought
that "if it runs XMMS without skipping, it'll be OK for everyone".

> Cheers,
> Con

Cheers,
Willy


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-21 14:36                                           ` Mike Galbraith
@ 2006-03-21 14:39                                             ` Con Kolivas
  0 siblings, 0 replies; 112+ messages in thread
From: Con Kolivas @ 2006-03-21 14:39 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Willy Tarreau, Ingo Molnar, lkml, Andrew Morton, bugsplatter

On Wednesday 22 March 2006 01:36, Mike Galbraith wrote:
> Oh.  I thought you were calling me a _moron_ :)

No, never assume any emotion in email and I'm sorry if you interpreted it that 
way.

Since I run my own mailing list I had to make a FAQ on this.
http://ck.kolivas.org/faqs/replying-to-mailing-list.txt

Extract:
4. Be polite

	Humans by nature don't realise how much they depend on seeing facial
expressions, voice intonations and body language to determine the emotion
associated with words. In the context of email it is very common to
misinterpret people's emotions based on the text alone. English subtleties
will often be misinterpreted even across English speaking nations, and for
non-English speakers it becomes much harder. Without the author explicitly
stating his emotions, assume neutrality and respond politely.

Cheers,
Con

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-21 14:32                                           ` Ingo Molnar
@ 2006-03-21 14:44                                             ` Willy Tarreau
  2006-03-21 14:52                                               ` Ingo Molnar
  0 siblings, 1 reply; 112+ messages in thread
From: Willy Tarreau @ 2006-03-21 14:44 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Con Kolivas, Mike Galbraith, lkml, Andrew Morton, bugsplatter

On Tue, Mar 21, 2006 at 03:32:40PM +0100, Ingo Molnar wrote:
> 
> * Con Kolivas <kernel@kolivas.org> wrote:
> 
> > On Wednesday 22 March 2006 01:28, Mike Galbraith wrote:
> > > On Wed, 2006-03-22 at 01:19 +1100, Con Kolivas wrote:
> > > > What you're fixing with unfairness is worth pursuing. The 'ls' issue just
> > > > blows my mind though for reasons I've just said. Where are the magic
> > > > cycles going when nothing else is running that make it take ten times
> > > > longer?
> > >
> > > What I was talking about when I mentioned scrolling was rendering.
> > 
> > I'm talking about the long standing report that 'ls' takes 10 times 
> > longer on 2.6 90% of the time you run it, and doing 'ls | cat' makes 
> > it run as fast as 2.4. This is what Willy has been fighting with.
> 
> ah. That's i think a gnome-terminal artifact - it does some really 
> stupid dynamic things while rendering, it 'skips' certain portions of 
> rendering, depending on the speed of scrolling. Gnome 2.14 ought to have 
> that fixed i think.

Ah no, I never use those montruous environments ! xterm is already heavy.
don't you remember, we found that doing "ls" in an xterm was waking the
xterm process for every single line, which in turn woke the X server for
a one-line scroll, while adding the "|cat" acted like a buffer with batched
scrolls. Newer xterms have been improved to trigger jump scroll earlier and
don't exhibit this behaviour even on non-patched kernels. However, sshd
still shows the same problem IMHO.

> 	Ingo

Cheers,
Willy


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-21 14:44                                             ` Willy Tarreau
@ 2006-03-21 14:52                                               ` Ingo Molnar
  2006-03-29  3:01                                                 ` Lee Revell
  0 siblings, 1 reply; 112+ messages in thread
From: Ingo Molnar @ 2006-03-21 14:52 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Con Kolivas, Mike Galbraith, lkml, Andrew Morton, bugsplatter


* Willy Tarreau <willy@w.ods.org> wrote:

> Ah no, I never use those montruous environments ! xterm is already 
> heavy. [...]

[ offtopic note: gnome-terminal developers claim some massive speedups
  in Gnome 2.14, and my experiments on Fedora rawhide seem to 
  corraborate that - gnome-term is now faster (for me) than xterm. ]

> [...] don't you remember, we found that doing "ls" in an xterm was 
> waking the xterm process for every single line, which in turn woke the 
> X server for a one-line scroll, while adding the "|cat" acted like a 
> buffer with batched scrolls. Newer xterms have been improved to 
> trigger jump scroll earlier and don't exhibit this behaviour even on 
> non-patched kernels. However, sshd still shows the same problem IMHO.

yeah. The "|cat" changes the workload, which gets rated by the scheduler 
differently. Such artifacts are inevitable once interactivity heuristics 
are strong enough to significantly distort the equal sharing of CPU 
time.

	Ingo

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-21 14:17                                           ` Con Kolivas
@ 2006-03-21 15:20                                             ` Con Kolivas
  2006-03-21 17:50                                               ` Willy Tarreau
  2006-03-21 17:51                                               ` Mike Galbraith
  0 siblings, 2 replies; 112+ messages in thread
From: Con Kolivas @ 2006-03-21 15:20 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Willy Tarreau, Ingo Molnar, lkml, Andrew Morton, bugsplatter

On Wednesday 22 March 2006 01:17, Con Kolivas wrote:
> I actually believe the same effect can be had by a tiny 
> modification to enable/disable the estimator anyway.

Just for argument's sake it would look something like this.

Cheers,
Con
---
Add sysctl to enable/disable cpu scheduer interactivity estimator

Signed-off-by: Con Kolivas <kernel@kolivas.org>

---
 include/linux/sched.h  |    1 +
 include/linux/sysctl.h |    1 +
 kernel/sched.c         |   14 +++++++++++---
 kernel/sysctl.c        |    8 ++++++++
 4 files changed, 21 insertions(+), 3 deletions(-)

Index: linux-2.6.16-rc6-mm2/include/linux/sched.h
===================================================================
--- linux-2.6.16-rc6-mm2.orig/include/linux/sched.h	2006-03-19 11:15:27.000000000 +1100
+++ linux-2.6.16-rc6-mm2/include/linux/sched.h	2006-03-22 02:13:55.000000000 +1100
@@ -104,6 +104,7 @@ extern unsigned long nr_uninterruptible(
 extern unsigned long nr_active(void);
 extern unsigned long nr_iowait(void);
 extern unsigned long weighted_cpuload(const int cpu);
+extern int sched_interactive;
 
 #include <linux/time.h>
 #include <linux/param.h>
Index: linux-2.6.16-rc6-mm2/include/linux/sysctl.h
===================================================================
--- linux-2.6.16-rc6-mm2.orig/include/linux/sysctl.h	2006-03-19 11:15:27.000000000 +1100
+++ linux-2.6.16-rc6-mm2/include/linux/sysctl.h	2006-03-22 02:14:43.000000000 +1100
@@ -148,6 +148,7 @@ enum
 	KERN_SPIN_RETRY=70,	/* int: number of spinlock retries */
 	KERN_ACPI_VIDEO_FLAGS=71, /* int: flags for setting up video after ACPI sleep */
 	KERN_IA64_UNALIGNED=72, /* int: ia64 unaligned userland trap enable */
+	KERN_INTERACTIVE=73,	/* int: enable/disable interactivity estimator */
 };
 
 
Index: linux-2.6.16-rc6-mm2/kernel/sched.c
===================================================================
--- linux-2.6.16-rc6-mm2.orig/kernel/sched.c	2006-03-19 15:41:08.000000000 +1100
+++ linux-2.6.16-rc6-mm2/kernel/sched.c	2006-03-22 02:13:56.000000000 +1100
@@ -128,6 +128,9 @@
  * too hard.
  */
 
+/* Sysctl enable/disable interactive estimator */
+int sched_interactive __read_mostly = 1;
+
 #define CURRENT_BONUS(p) \
 	(NS_TO_JIFFIES((p)->sleep_avg) * MAX_BONUS / \
 		MAX_SLEEP_AVG)
@@ -151,7 +154,8 @@
 		INTERACTIVE_DELTA)
 
 #define TASK_INTERACTIVE(p) \
-	((p)->prio <= (p)->static_prio - DELTA(p))
+	((p)->prio <= (p)->static_prio - DELTA(p) && \
+		sched_interactive)
 
 #define INTERACTIVE_SLEEP(p) \
 	(JIFFIES_TO_NS(MAX_SLEEP_AVG * \
@@ -662,9 +666,13 @@ static int effective_prio(task_t *p)
 	if (rt_task(p))
 		return p->prio;
 
-	bonus = CURRENT_BONUS(p) - MAX_BONUS / 2;
+	prio = p->static_prio;
+
+	if (sched_interactive) {
+		bonus = CURRENT_BONUS(p) - MAX_BONUS / 2;
 
-	prio = p->static_prio - bonus;
+		prio -= bonus;
+	}
 	if (prio < MAX_RT_PRIO)
 		prio = MAX_RT_PRIO;
 	if (prio > MAX_PRIO-1)
Index: linux-2.6.16-rc6-mm2/kernel/sysctl.c
===================================================================
--- linux-2.6.16-rc6-mm2.orig/kernel/sysctl.c	2006-03-19 11:15:27.000000000 +1100
+++ linux-2.6.16-rc6-mm2/kernel/sysctl.c	2006-03-22 02:15:23.000000000 +1100
@@ -684,6 +684,14 @@ static ctl_table kern_table[] = {
 		.proc_handler	= &proc_dointvec,
 	},
 #endif
+	{
+		.ctl_name	= KERN_SCHED_INTERACTIVE,
+		.procname	= "interactive",
+		.data		= &sched_interactive,
+		.maxlen		= sizeof (int),
+	 	.mode		= 0644,
+		.proc_handler	= &proc_dointvec,
+	},
 	{ .ctl_name = 0 }
 };
 

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-21 15:20                                             ` Con Kolivas
@ 2006-03-21 17:50                                               ` Willy Tarreau
  2006-03-22  4:18                                                 ` Mike Galbraith
  2006-03-21 17:51                                               ` Mike Galbraith
  1 sibling, 1 reply; 112+ messages in thread
From: Willy Tarreau @ 2006-03-21 17:50 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Mike Galbraith, Ingo Molnar, lkml, Andrew Morton, bugsplatter

On Wed, Mar 22, 2006 at 02:20:10AM +1100, Con Kolivas wrote:
> On Wednesday 22 March 2006 01:17, Con Kolivas wrote:
> > I actually believe the same effect can be had by a tiny 
> > modification to enable/disable the estimator anyway.
> 
> Just for argument's sake it would look something like this.
> 
> Cheers,
> Con
> ---
> Add sysctl to enable/disable cpu scheduer interactivity estimator

At least, in May 2005, the equivalent of this patch I tested on
2.6.11.7 considerably improved responsiveness, but there was still
this very annoying slowdown when the load increased. vmstat delays
increased by one second every 10 processes. I retried again around
2.6.14 a few months ago, and it was the same. Perhaps Mike's code
and other changes in 2.6-mm really fix the initial problem (array
switching ?) and then only the interactivity boost is causing the
remaining trouble ?

Cheers,
Willy


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-21 15:20                                             ` Con Kolivas
  2006-03-21 17:50                                               ` Willy Tarreau
@ 2006-03-21 17:51                                               ` Mike Galbraith
  1 sibling, 0 replies; 112+ messages in thread
From: Mike Galbraith @ 2006-03-21 17:51 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Willy Tarreau, Ingo Molnar, lkml, Andrew Morton, bugsplatter

On Wed, 2006-03-22 at 02:20 +1100, Con Kolivas wrote:
> On Wednesday 22 March 2006 01:17, Con Kolivas wrote:
> > I actually believe the same effect can be had by a tiny 
> > modification to enable/disable the estimator anyway.
> 
> Just for argument's sake it would look something like this.

That won't have the same effect.  What you disabled isn't only about
interactivity.   It's also about preemption, throughput and fairness.

	-Mike

(we now interrupt this thread for an evening of real life;)


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-21 14:39                                       ` Willy Tarreau
@ 2006-03-21 18:39                                         ` Rafael J. Wysocki
  2006-03-21 19:32                                           ` Willy Tarreau
  0 siblings, 1 reply; 112+ messages in thread
From: Rafael J. Wysocki @ 2006-03-21 18:39 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Con Kolivas, Mike Galbraith, Ingo Molnar, lkml, Andrew Morton,
	bugsplatter

On Tuesday 21 March 2006 15:39, Willy Tarreau wrote:
> On Wed, Mar 22, 2006 at 01:19:49AM +1100, Con Kolivas wrote:
> > On Wednesday 22 March 2006 01:17, Mike Galbraith wrote:
> > > On Wed, 2006-03-22 at 00:53 +1100, Con Kolivas wrote:
> > > > The yardstick for changes is now the speed of 'ls' scrolling in the
> > > > console. Where exactly are those extra cycles going I wonder? Do you
> > > > think the scheduler somehow makes the cpu idle doing nothing in that
> > > > timespace? Clearly that's not true, and userspace is making something
> > > > spin unnecessarily, but we're gonna fix that by modifying the
> > > > scheduler.... sigh
> > >
> > > *Blink*
> > >
> > > Are you having a bad hair day??
> > 
> > My hair is approximately 3mm long so it's kinda hard for that to happen. 
> > 
> > What you're fixing with unfairness is worth pursuing. The 'ls' issue just 
> > blows my mind though for reasons I've just said. Where are the magic cycles 
> > going when nothing else is running that make it take ten times longer?
> 
> Con, those cycles are not "magic", if you look at the numbers, the time is
> not spent in the process itself. From what has been observed since the
> beginning, it is spent :
>   - in other processes which are starvating the CPU (eg: X11 when xterm
>     scrolls)
>   - in context switches when you have a pipe somewhere and the CPU is
>     bouncing between tasks.
> 
> Concerning your angriness about me being OK with (0,0) and still
> asking for tunables, it's precisely because I know that *my* workload
> is not everyone else's, and I don't want to conclude too quickly that
> there are only two types of workloads.

Well, perhaps we can assume there are only two types of workloads and
wait for a test case that will show the assumption is wrong?

> Maybe you're right, maybe you're wrong. At least you're right for as long
> as no other workload has been identified. But thinking like this is like
> some time ago when we thought that "if it runs XMMS without skipping,
> it'll be OK for everyone".

However, we should not try to anticipate every possible kind of workload
IMHO.

Greetings,
Rafael

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-21 18:39                                         ` Rafael J. Wysocki
@ 2006-03-21 19:32                                           ` Willy Tarreau
  2006-03-21 21:47                                             ` Rafael J. Wysocki
  0 siblings, 1 reply; 112+ messages in thread
From: Willy Tarreau @ 2006-03-21 19:32 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Con Kolivas, Mike Galbraith, Ingo Molnar, lkml, Andrew Morton,
	bugsplatter

On Tue, Mar 21, 2006 at 07:39:11PM +0100, Rafael J. Wysocki wrote:
> On Tuesday 21 March 2006 15:39, Willy Tarreau wrote:
> > On Wed, Mar 22, 2006 at 01:19:49AM +1100, Con Kolivas wrote:
> > > On Wednesday 22 March 2006 01:17, Mike Galbraith wrote:
> > > > On Wed, 2006-03-22 at 00:53 +1100, Con Kolivas wrote:
> > > > > The yardstick for changes is now the speed of 'ls' scrolling in the
> > > > > console. Where exactly are those extra cycles going I wonder? Do you
> > > > > think the scheduler somehow makes the cpu idle doing nothing in that
> > > > > timespace? Clearly that's not true, and userspace is making something
> > > > > spin unnecessarily, but we're gonna fix that by modifying the
> > > > > scheduler.... sigh
> > > >
> > > > *Blink*
> > > >
> > > > Are you having a bad hair day??
> > > 
> > > My hair is approximately 3mm long so it's kinda hard for that to happen. 
> > > 
> > > What you're fixing with unfairness is worth pursuing. The 'ls' issue just 
> > > blows my mind though for reasons I've just said. Where are the magic cycles 
> > > going when nothing else is running that make it take ten times longer?
> > 
> > Con, those cycles are not "magic", if you look at the numbers, the time is
> > not spent in the process itself. From what has been observed since the
> > beginning, it is spent :
> >   - in other processes which are starvating the CPU (eg: X11 when xterm
> >     scrolls)
> >   - in context switches when you have a pipe somewhere and the CPU is
> >     bouncing between tasks.
> > 
> > Concerning your angriness about me being OK with (0,0) and still
> > asking for tunables, it's precisely because I know that *my* workload
> > is not everyone else's, and I don't want to conclude too quickly that
> > there are only two types of workloads.
> 
> Well, perhaps we can assume there are only two types of workloads and
> wait for a test case that will show the assumption is wrong?

It would certainly fit most usages, but as soon as we find another group
of users complaining, we will add another sysctl just for them ? Perhaps
we could just resume the two current sysctls into one called
"interactivity_boost" with a value between 0 and 100, with the ability
for any user to increase or decrease it easily ? Mainline would be
pre-configured with something reasonable, like what Mike proposed as
default values for example, and server admins would only set it to
zero while desktop-intensive users could increase it a bit if they like
to.

> > Maybe you're right, maybe you're wrong. At least you're right for as long
> > as no other workload has been identified. But thinking like this is like
> > some time ago when we thought that "if it runs XMMS without skipping,
> > it'll be OK for everyone".
> 
> However, we should not try to anticipate every possible kind of workload
> IMHO.

I generally agree on this, except that we got caught once in this area for
this exact reason.

> Greetings,
> Rafael

Regards,
Willy


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-21 19:32                                           ` Willy Tarreau
@ 2006-03-21 21:47                                             ` Rafael J. Wysocki
  0 siblings, 0 replies; 112+ messages in thread
From: Rafael J. Wysocki @ 2006-03-21 21:47 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Con Kolivas, Mike Galbraith, Ingo Molnar, lkml, Andrew Morton,
	bugsplatter

On Tuesday 21 March 2006 20:32, Willy Tarreau wrote:
> On Tue, Mar 21, 2006 at 07:39:11PM +0100, Rafael J. Wysocki wrote:
> > On Tuesday 21 March 2006 15:39, Willy Tarreau wrote:
> > > On Wed, Mar 22, 2006 at 01:19:49AM +1100, Con Kolivas wrote:
> > > > On Wednesday 22 March 2006 01:17, Mike Galbraith wrote:
> > > > > On Wed, 2006-03-22 at 00:53 +1100, Con Kolivas wrote:
> > > > > > The yardstick for changes is now the speed of 'ls' scrolling in the
> > > > > > console. Where exactly are those extra cycles going I wonder? Do you
> > > > > > think the scheduler somehow makes the cpu idle doing nothing in that
> > > > > > timespace? Clearly that's not true, and userspace is making something
> > > > > > spin unnecessarily, but we're gonna fix that by modifying the
> > > > > > scheduler.... sigh
> > > > >
> > > > > *Blink*
> > > > >
> > > > > Are you having a bad hair day??
> > > > 
> > > > My hair is approximately 3mm long so it's kinda hard for that to happen. 
> > > > 
> > > > What you're fixing with unfairness is worth pursuing. The 'ls' issue just 
> > > > blows my mind though for reasons I've just said. Where are the magic cycles 
> > > > going when nothing else is running that make it take ten times longer?
> > > 
> > > Con, those cycles are not "magic", if you look at the numbers, the time is
> > > not spent in the process itself. From what has been observed since the
> > > beginning, it is spent :
> > >   - in other processes which are starvating the CPU (eg: X11 when xterm
> > >     scrolls)
> > >   - in context switches when you have a pipe somewhere and the CPU is
> > >     bouncing between tasks.
> > > 
> > > Concerning your angriness about me being OK with (0,0) and still
> > > asking for tunables, it's precisely because I know that *my* workload
> > > is not everyone else's, and I don't want to conclude too quickly that
> > > there are only two types of workloads.
> > 
> > Well, perhaps we can assume there are only two types of workloads and
> > wait for a test case that will show the assumption is wrong?
> 
> It would certainly fit most usages, but as soon as we find another group
> of users complaining, we will add another sysctl just for them ? Perhaps
> we could just resume the two current sysctls into one called
> "interactivity_boost" with a value between 0 and 100, with the ability
> for any user to increase or decrease it easily ? Mainline would be
> pre-configured with something reasonable, like what Mike proposed as
> default values for example, and server admins would only set it to
> zero while desktop-intensive users could increase it a bit if they like
> to.

Sounds reasonable to me.

Greetings,
Rafael

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-21 13:24                               ` Mike Galbraith
  2006-03-21 13:53                                 ` Con Kolivas
@ 2006-03-21 22:51                                 ` Peter Williams
  2006-03-22  3:49                                   ` Mike Galbraith
  1 sibling, 1 reply; 112+ messages in thread
From: Peter Williams @ 2006-03-21 22:51 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Willy Tarreau, Ingo Molnar, lkml, Andrew Morton, Con Kolivas,
	bugsplatter

Mike Galbraith wrote:
> On Tue, 2006-03-21 at 13:59 +0100, Willy Tarreau wrote:
> 
>>On Tue, Mar 21, 2006 at 01:07:58PM +0100, Mike Galbraith wrote:
> 
> 
>>>I can make the knobs compile time so we don't see random behavior
>>>reports, but I don't think they can be totally eliminated.  Would that
>>>be sufficient?
>>>
>>>If so, the numbers as delivered should be fine for desktop boxen I
>>>think.  People who are building custom kernels can bend to fit as
>>>always.
>>
>>That would suit me perfectly. I think I would set them both to zero.
>>It's not clear to me what workload they can help, it seems that they
>>try to allow a sometimes unfair scheduling.
> 
> 
> Correct.  Massively unfair scheduling is what interactivity requires.
> 

Selective unfairness not massive unfairness is what's required.  The 
hard part is automating the selectiveness especially when there are 
three quite different types of task that need special treatment: 1) the 
X server, 2) normal interactive tasks and 3) media streamers; each of 
which has different behavioural characteristics.  A single mechanism 
that classifies all of these as "interactive" will unfortunately catch a 
lot of tasks that don't belong to any one of these types.

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-21 22:51                                 ` Peter Williams
@ 2006-03-22  3:49                                   ` Mike Galbraith
  2006-03-22  3:59                                     ` Peter Williams
  2006-03-22 12:14                                     ` [interbench numbers] " Mike Galbraith
  0 siblings, 2 replies; 112+ messages in thread
From: Mike Galbraith @ 2006-03-22  3:49 UTC (permalink / raw)
  To: Peter Williams
  Cc: Willy Tarreau, Ingo Molnar, lkml, Andrew Morton, Con Kolivas,
	bugsplatter

On Wed, 2006-03-22 at 09:51 +1100, Peter Williams wrote:
> Mike Galbraith wrote:
> > On Tue, 2006-03-21 at 13:59 +0100, Willy Tarreau wrote:
> >>That would suit me perfectly. I think I would set them both to zero.
> >>It's not clear to me what workload they can help, it seems that they
> >>try to allow a sometimes unfair scheduling.
> > 
> > 
> > Correct.  Massively unfair scheduling is what interactivity requires.
> > 
> 
> Selective unfairness not massive unfairness is what's required.  The 
> hard part is automating the selectiveness especially when there are 
> three quite different types of task that need special treatment: 1) the 
> X server, 2) normal interactive tasks and 3) media streamers; each of 
> which has different behavioural characteristics.  A single mechanism 
> that classifies all of these as "interactive" will unfortunately catch a 
> lot of tasks that don't belong to any one of these types.

Yes, selective would be nice, but it's still massively unfair that is
required.  There is no criteria available for discrimination, so my
patches don't even try to classify, they only enforce the rules.  I
don't classify X as interactive, I merely provide a mechanism which
enables X to accumulate the cycles an interactive task needs to be able
to perform by actually _being_ interactive, by conforming to the
definition of sleep_avg.  Fortunately, it uses that mechanism.  I do
nothing more than trade stout rope for good behavior.  I anchor one end
to a boulder, the other to a task's neck.  The mechanism is agnostic.
The task determines whether it gets hung or not, and the user determines
how long the rope is.

	-Mike


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-22  3:49                                   ` Mike Galbraith
@ 2006-03-22  3:59                                     ` Peter Williams
  2006-03-22 12:14                                     ` [interbench numbers] " Mike Galbraith
  1 sibling, 0 replies; 112+ messages in thread
From: Peter Williams @ 2006-03-22  3:59 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Willy Tarreau, Ingo Molnar, lkml, Andrew Morton, Con Kolivas,
	bugsplatter

Mike Galbraith wrote:
> On Wed, 2006-03-22 at 09:51 +1100, Peter Williams wrote:
> 
>>Mike Galbraith wrote:
>>
>>>On Tue, 2006-03-21 at 13:59 +0100, Willy Tarreau wrote:
>>>
>>>>That would suit me perfectly. I think I would set them both to zero.
>>>>It's not clear to me what workload they can help, it seems that they
>>>>try to allow a sometimes unfair scheduling.
>>>
>>>
>>>Correct.  Massively unfair scheduling is what interactivity requires.
>>>
>>
>>Selective unfairness not massive unfairness is what's required.  The 
>>hard part is automating the selectiveness especially when there are 
>>three quite different types of task that need special treatment: 1) the 
>>X server, 2) normal interactive tasks and 3) media streamers; each of 
>>which has different behavioural characteristics.  A single mechanism 
>>that classifies all of these as "interactive" will unfortunately catch a 
>>lot of tasks that don't belong to any one of these types.
> 
> 
> Yes, selective would be nice, but it's still massively unfair that is
> required.  There is no criteria available for discrimination, so my
> patches don't even try to classify, they only enforce the rules.  I
> don't classify X as interactive, I merely provide a mechanism which
> enables X to accumulate the cycles an interactive task needs to be able
> to perform by actually _being_ interactive, by conforming to the
> definition of sleep_avg.

That's what I mean by classification :-)

>  Fortunately, it uses that mechanism.  I do
> nothing more than trade stout rope for good behavior.  I anchor one end
> to a boulder, the other to a task's neck.  The mechanism is agnostic.
> The task determines whether it gets hung or not, and the user determines
> how long the rope is.

I view that as a modification (hopefully an improvement) of the 
classification rules :-).  In particular, a variation in the persistence 
of a classification and the criteria for losing/downgrading it.

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-21 17:50                                               ` Willy Tarreau
@ 2006-03-22  4:18                                                 ` Mike Galbraith
  0 siblings, 0 replies; 112+ messages in thread
From: Mike Galbraith @ 2006-03-22  4:18 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: Con Kolivas, Ingo Molnar, lkml, Andrew Morton, bugsplatter

On Tue, 2006-03-21 at 18:50 +0100, Willy Tarreau wrote:
> On Wed, Mar 22, 2006 at 02:20:10AM +1100, Con Kolivas wrote:
> > On Wednesday 22 March 2006 01:17, Con Kolivas wrote:
> > > I actually believe the same effect can be had by a tiny 
> > > modification to enable/disable the estimator anyway.
> > 
> > Just for argument's sake it would look something like this.
> > 
> > Cheers,
> > Con
> > ---
> > Add sysctl to enable/disable cpu scheduer interactivity estimator
> 
> At least, in May 2005, the equivalent of this patch I tested on
> 2.6.11.7 considerably improved responsiveness, but there was still
> this very annoying slowdown when the load increased. vmstat delays
> increased by one second every 10 processes. I retried again around
> 2.6.14 a few months ago, and it was the same. Perhaps Mike's code
> and other changes in 2.6-mm really fix the initial problem (array
> switching ?) and then only the interactivity boost is causing the
> remaining trouble ?

The slowdown you see is because a timeslice is 100ms, and that patch
turned the scheduler into a non-preempting pure round-robin slug.

Array switching is only one aspect, and one I hadn't thought of as I was
tinkering with my patches, I discovered that aspect by accident.

My code does a few things, and all of them are part of the picture.  One
of them is to deal with excessive interactive boost.  Another is to
tighten timeslice enforcement, and another is to close the fundamental
hole in the concept sleep_avg.  That hole is causing the majority of the
problems that crop up, the interactivity bits only make it worse.  The
hole is this.  If priority is based solely upon % sleep time, even if
there is no interactive boost, even if accumulation vs consumption is
1:1, if you sleep 51% of the time, you will inevitably rise to max
priority, and be able to use 49% of the CPU at max priority forever.
The current heuristics make that very close to but not quite 95%.

The fact that we don't have _horrendous_ problems shows that the basic
concept of sleep_avg is pretty darn good.  Close the hole in any way you
can think of (mine is one), and it's excellent.


^ permalink raw reply	[flat|nested] 112+ messages in thread

* [interbench numbers] Re: interactive task starvation
  2006-03-22  3:49                                   ` Mike Galbraith
  2006-03-22  3:59                                     ` Peter Williams
@ 2006-03-22 12:14                                     ` Mike Galbraith
  2006-03-22 20:27                                       ` Con Kolivas
  1 sibling, 1 reply; 112+ messages in thread
From: Mike Galbraith @ 2006-03-22 12:14 UTC (permalink / raw)
  To: lkml
  Cc: Willy Tarreau, Ingo Molnar, lkml, Andrew Morton, Con Kolivas,
	bugsplatter, Peter Williams

Greetings,

I was asked to do some interbench runs, with various throttle settings,
see below.  I'll not attempt to interpret results, only present raw data
for others to examine.

Tested throttling patches version is V24, because while I was compiling
2.6.16-rc6-mm2 in preparation for comparison, I found I'd introduced an
SMP buglet in V23.  Something good came from the added testing whether
the results are informative or not :)

	-Mike

1. virgin 2.6.16-rc6-mm2.

Using 1975961 loops per ms, running every load for 30 seconds
Benchmarking kernel 2.6.16-rc6-mm2-smp at datestamp 200603221223

--- Benchmarking simulated cpu of Audio in the presence of simulated ---
Load	Latency +/- SD (ms)  Max Latency   % Desired CPU  % Deadlines Met
None	  0.024 +/- 0.0486         1		 100	        100
Video	  0.996 +/- 1.31        6.05		 100	        100
X	  0.336 +/- 0.739       5.01		 100	        100
Burn	  0.028 +/- 0.0905      2.05		 100	        100
Write	  0.058 +/- 0.508       12.1		 100	        100
Read	  0.043 +/- 0.115       1.66		 100	        100
Compile	  0.047 +/- 0.126       2.55		 100	        100
Memload	  0.258 +/- 4.57         112		99.8	       99.8

--- Benchmarking simulated cpu of Video in the presence of simulated ---
Load	Latency +/- SD (ms)  Max Latency   % Desired CPU  % Deadlines Met
None	  0.031 +/- 0.396       16.7		 100	       99.9
X	  0.722 +/- 3.35        30.7		 100	         97
Burn	  0.531 +/- 7.42         246		99.1	         98
Write	  0.302 +/- 2.31        40.4		99.9	       98.5
Read	  0.092 +/- 1.11        32.9		99.9	       99.7
Compile	  0.428 +/- 2.77        36.3		99.9	       97.9
Memload	  0.235 +/- 3.3          104		99.5	       99.1

--- Benchmarking simulated cpu of X in the presence of simulated ---
Load	Latency +/- SD (ms)  Max Latency   % Desired CPU  % Deadlines Met
None	   1.25 +/- 6.46          70		85.8	       83.2
Video	   17.8 +/- 32            92		31.7	       22.3
Burn	   45.5 +/- 97.5         503		8.35	       4.22
Write	   3.55 +/- 12.2          66		79.9	       73.6
Read	  0.739 +/- 3.04          20		87.4	         83
Compile	   51.9 +/- 122          857		10.7	       5.34
Memload	   1.81 +/- 6.67          54		85.1	       78.3

--- Benchmarking simulated cpu of Gaming in the presence of simulated ---
Load	Latency +/- SD (ms)  Max Latency   % Desired CPU
None	   8.65 +/- 14.8         116		  92
Video	   77.9 +/- 78.5         107		56.2
X	   64.2 +/- 72.9         124		60.9
Burn	    301 +/- 317          524		24.9
Write	   26.8 +/- 45.6         135		78.9
Read	   13.1 +/- 16.8        67.9		88.4
Compile	    478 +/- 519          765		17.3
Memload	   21.1 +/- 28.8         148		82.6


2. 2.6.16-rc6-mm2x with no throttling.

Using 1975961 loops per ms, running every load for 30 seconds
Benchmarking kernel 2.6.16-rc6-mm2x-smp at datestamp 200603220914

--- Benchmarking simulated cpu of Audio in the presence of simulated ---
Load	Latency +/- SD (ms)  Max Latency   % Desired CPU  % Deadlines Met
None	  0.062 +/- 0.11        1.09		 100	        100
Video	   1.15 +/- 1.53        11.4		 100	        100
X	  0.223 +/- 0.609       6.09		 100	        100
Burn	  0.039 +/- 0.258       6.01		 100	        100
Write	  0.194 +/- 0.837         14		 100	        100
Read	   0.05 +/- 0.202       3.01		 100	        100
Compile	  0.216 +/- 1.36          19		 100	        100
Memload	  0.218 +/- 2.22        51.4		 100	       99.8

--- Benchmarking simulated cpu of Video in the presence of simulated ---
Load	Latency +/- SD (ms)  Max Latency   % Desired CPU  % Deadlines Met
None	  0.185 +/- 1.6         18.8		 100	       99.1
X	   1.27 +/- 4.47          27		 100	       94.3
Burn	   1.57 +/- 13.3         345		98.1	         93
Write	  0.819 +/- 3.76        34.7		99.9	         96
Read	  0.301 +/- 2.05        18.7		 100	       98.5
Compile	   4.22 +/- 12.9         233		92.4	       80.2
Memload	  0.624 +/- 3.46        66.7		99.6	         97

--- Benchmarking simulated cpu of X in the presence of simulated ---
Load	Latency +/- SD (ms)  Max Latency   % Desired CPU  % Deadlines Met
None	   2.57 +/- 7.94          43		74.6	       67.7
Video	   17.6 +/- 32.2          99		31.2	       22.3
Burn	   40.1 +/- 79.4         716		12.9	       6.65
Write	   6.03 +/- 16.6          80		75.1	       64.6
Read	   2.52 +/- 7.49          42		74.8	       66.7
Compile	   54.1 +/- 79.3         410		15.6	       6.56
Memload	   2.08 +/- 6.93          48		77.3	       71.7

--- Benchmarking simulated cpu of Gaming in the presence of simulated ---
Load	Latency +/- SD (ms)  Max Latency   % Desired CPU
None	   12.3 +/- 16.6        65.3		  89
Video	   78.7 +/- 79.4         109		  56
X	   70.6 +/- 78.2         128		58.6
Burn	    468 +/- 492          737		17.6
Write	   36.6 +/- 52.7         300		73.2
Read	   18.3 +/- 20.6        47.9		84.5
Compile	    468 +/- 486          802		17.6
Memload	   21.4 +/- 27           132		82.4


3. 2.6.16-rc6-mm2x with default settings.

Using 1975961 loops per ms, running every load for 30 seconds
Benchmarking kernel 2.6.16-rc6-mm2x-smp at datestamp 200603221006

--- Benchmarking simulated cpu of Audio in the presence of simulated ---
Load	Latency +/- SD (ms)  Max Latency   % Desired CPU  % Deadlines Met
None	  0.033 +/- 0.0989      1.05		 100	        100
Video	  0.859 +/- 1.17        7.45		 100	        100
X	  0.239 +/- 0.662        7.1		 100	        100
Burn	   0.06 +/- 0.382       7.86		 100	        100
Write	  0.123 +/- 0.422       4.12		 100	        100
Read	  0.045 +/- 0.103       1.18		 100	        100
Compile	  0.292 +/- 2.9         65.8		 100	       99.8
Memload	  0.256 +/- 3.78        91.8		 100	       99.8

--- Benchmarking simulated cpu of Video in the presence of simulated ---
Load	Latency +/- SD (ms)  Max Latency   % Desired CPU  % Deadlines Met
None	  0.101 +/- 1.06        16.7		 100	       99.6
X	   1.13 +/- 4.38        33.7		99.9	       95.2
Burn	   10.7 +/- 47.1         410		67.2	       64.7
Write	   1.17 +/- 10.9         417		98.2	       94.8
Read	  0.127 +/- 1.13        16.8		 100	       99.6
Compile	    8.6 +/- 32.6         200		70.7	       63.6
Memload	  0.512 +/- 3.32        83.5		99.7	       97.6

--- Benchmarking simulated cpu of X in the presence of simulated ---
Load	Latency +/- SD (ms)  Max Latency   % Desired CPU  % Deadlines Met
None	    2.2 +/- 7.75          51		81.9	       74.9
Video	   15.8 +/- 29.4          81		  33	       23.9
Burn	   74.1 +/- 124          406		18.5	       9.57
Write	    4.6 +/- 14            86		  55	       48.5
Read	   1.75 +/- 5.16          26		80.7	       73.1
Compile	   71.2 +/- 124          468		21.8	       12.2
Memload	   2.95 +/- 9.31          70		75.6	       69.1

--- Benchmarking simulated cpu of Gaming in the presence of simulated ---
Load	Latency +/- SD (ms)  Max Latency   % Desired CPU
None	   13.7 +/- 17.9        56.4		87.9
Video	   74.6 +/- 75.4        98.5		57.3
X	   68.2 +/- 76.1         128		59.4
Burn	    515 +/- 526          735		16.3
Write	   35.5 +/- 58.3         505		73.8
Read	   15.7 +/- 17.8        45.8		86.4
Compile	    436 +/- 453          863		18.7
Memload	   22.3 +/- 30.1         227		81.8


4. 2.6.16-rc6-mm2x with max throttling.

Using 1975961 loops per ms, running every load for 30 seconds
Benchmarking kernel 2.6.16-rc6-mm2x-smp at datestamp 200603220938

--- Benchmarking simulated cpu of Audio in the presence of simulated ---
Load	Latency +/- SD (ms)  Max Latency   % Desired CPU  % Deadlines Met
None	  0.035 +/- 0.118       2.01		 100	        100
Video	  0.043 +/- 0.231       5.02		 100	        100
X	  0.109 +/- 0.737       12.3		 100	        100
Burn	  0.072 +/- 0.574       9.78		 100	        100
Write	   0.11 +/- 0.367       4.14		 100	        100
Read	  0.052 +/- 0.141       2.02		 100	        100
Compile	    0.5 +/- 4.84         112		99.8	       99.8
Memload	  0.093 +/- 0.461       9.13		 100	        100

--- Benchmarking simulated cpu of Video in the presence of simulated ---
Load	Latency +/- SD (ms)  Max Latency   % Desired CPU  % Deadlines Met
None	  0.187 +/- 1.59        16.7		 100	       99.1
X	    2.4 +/- 6.26        32.8		99.9	         90
Burn	   59.7 +/- 130          478		27.1	       23.8
Write	   2.08 +/- 9.24         208		98.3	       90.5
Read	  0.154 +/- 1.3         18.8		 100	       99.4
Compile	   57.9 +/- 130          714		28.3	       22.4
Memload	  0.743 +/- 3.7         66.7		99.8	       96.3

--- Benchmarking simulated cpu of X in the presence of simulated ---
Load	Latency +/- SD (ms)  Max Latency   % Desired CPU  % Deadlines Met
None	   1.73 +/- 6.46          42		74.4	         70
Video	   13.3 +/- 24.5          74		39.8	       29.2
Burn	    142 +/- 206          579		9.11	       4.69
Write	   4.51 +/- 14.1        88.4		61.4	       55.5
Read	   1.38 +/- 4.38          24		85.3	       78.3
Compile	    126 +/- 190          619		12.4	       6.51
Memload	   3.61 +/- 11.7          70		61.7	       55.8

--- Benchmarking simulated cpu of Gaming in the presence of simulated ---
Load	Latency +/- SD (ms)  Max Latency   % Desired CPU
None	   12.9 +/- 16.5        67.6		88.6
Video	   67.7 +/- 69          97.3		59.6
X	   70.7 +/- 77.7         130		58.6
Burn	    355 +/- 367          625		  22
Write	   35.6 +/- 61.3         545		73.8
Read	   23.1 +/- 28.4         115		81.3
Compile	    467 +/- 485          793		17.6
Memload	   25.6 +/- 32.9         138		79.6






^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [interbench numbers] Re: interactive task starvation
  2006-03-22 12:14                                     ` [interbench numbers] " Mike Galbraith
@ 2006-03-22 20:27                                       ` Con Kolivas
  2006-03-23  3:22                                         ` Mike Galbraith
  0 siblings, 1 reply; 112+ messages in thread
From: Con Kolivas @ 2006-03-22 20:27 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: lkml, Willy Tarreau, Ingo Molnar, Andrew Morton, bugsplatter,
	Peter Williams

On Wednesday 22 March 2006 23:14, Mike Galbraith wrote:
> Greetings,
>
> I was asked to do some interbench runs, with various throttle settings,
> see below.  I'll not attempt to interpret results, only present raw data
> for others to examine.
>
> Tested throttling patches version is V24, because while I was compiling
> 2.6.16-rc6-mm2 in preparation for comparison, I found I'd introduced an
> SMP buglet in V23.  Something good came from the added testing whether
> the results are informative or not :)

Thanks!

I wonder why the results are affected even without any throttling settings but 
just patched in? Specifically I'm talking about deadlines met with video 
being sensitive to this. Were there any other config differences between the 
tests? Changing HZ would invalidate the results for example. Comments?

Cheers,
Con

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [interbench numbers] Re: interactive task starvation
  2006-03-22 20:27                                       ` Con Kolivas
@ 2006-03-23  3:22                                         ` Mike Galbraith
  2006-03-23  5:43                                           ` Con Kolivas
  0 siblings, 1 reply; 112+ messages in thread
From: Mike Galbraith @ 2006-03-23  3:22 UTC (permalink / raw)
  To: Con Kolivas
  Cc: lkml, Willy Tarreau, Ingo Molnar, Andrew Morton, bugsplatter,
	Peter Williams

On Thu, 2006-03-23 at 07:27 +1100, Con Kolivas wrote:
> On Wednesday 22 March 2006 23:14, Mike Galbraith wrote:
> > Greetings,
> >
> > I was asked to do some interbench runs, with various throttle settings,
> > see below.  I'll not attempt to interpret results, only present raw data
> > for others to examine.
> >
> > Tested throttling patches version is V24, because while I was compiling
> > 2.6.16-rc6-mm2 in preparation for comparison, I found I'd introduced an
> > SMP buglet in V23.  Something good came from the added testing whether
> > the results are informative or not :)
> 
> Thanks!
> 
> I wonder why the results are affected even without any throttling settings but 
> just patched in? Specifically I'm talking about deadlines met with video 
> being sensitive to this. Were there any other config differences between the 
> tests? Changing HZ would invalidate the results for example. Comments?

I wondered the same.  The only difference then is the lower idle sleep
prio, tighter timeslice enforcement, and the SMP buglet fix for now <
p->timestamp due to SMP rounding.  Configs are identical.

	-Mike


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [interbench numbers] Re: interactive task starvation
  2006-03-23  3:22                                         ` Mike Galbraith
@ 2006-03-23  5:43                                           ` Con Kolivas
  2006-03-23  5:53                                             ` Mike Galbraith
  0 siblings, 1 reply; 112+ messages in thread
From: Con Kolivas @ 2006-03-23  5:43 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: lkml, Willy Tarreau, Ingo Molnar, Andrew Morton, bugsplatter,
	Peter Williams

On Thu, 23 Mar 2006 02:22 pm, Mike Galbraith wrote:
> On Thu, 2006-03-23 at 07:27 +1100, Con Kolivas wrote:
> > I wonder why the results are affected even without any throttling
> > settings but just patched in? Specifically I'm talking about deadlines
> > met with video being sensitive to this. Were there any other config
> > differences between the tests? Changing HZ would invalidate the results
> > for example. Comments?
>
> I wondered the same.  The only difference then is the lower idle sleep
> prio, tighter timeslice enforcement, and the SMP buglet fix for now <
> p->timestamp due to SMP rounding.  Configs are identical.

Ok well if we're going to run with this set of changes then we need to assess 
the affect of each change and splitting them up into separate patches would 
be appropriate normally anyway. That will allow us to track down which 
particular patch causes it. That won't mean we will turn down the change 
based on that one result, though, it will just help us understand it better.

Cheers,
Con

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [interbench numbers] Re: interactive task starvation
  2006-03-23  5:43                                           ` Con Kolivas
@ 2006-03-23  5:53                                             ` Mike Galbraith
  2006-03-23 11:07                                               ` Mike Galbraith
  0 siblings, 1 reply; 112+ messages in thread
From: Mike Galbraith @ 2006-03-23  5:53 UTC (permalink / raw)
  To: Con Kolivas
  Cc: lkml, Willy Tarreau, Ingo Molnar, Andrew Morton, bugsplatter,
	Peter Williams

On Thu, 2006-03-23 at 16:43 +1100, Con Kolivas wrote:
> On Thu, 23 Mar 2006 02:22 pm, Mike Galbraith wrote:
> > On Thu, 2006-03-23 at 07:27 +1100, Con Kolivas wrote:
> > > I wonder why the results are affected even without any throttling
> > > settings but just patched in? Specifically I'm talking about deadlines
> > > met with video being sensitive to this. Were there any other config
> > > differences between the tests? Changing HZ would invalidate the results
> > > for example. Comments?
> >
> > I wondered the same.  The only difference then is the lower idle sleep
> > prio, tighter timeslice enforcement, and the SMP buglet fix for now <
> > p->timestamp due to SMP rounding.  Configs are identical.
> 
> Ok well if we're going to run with this set of changes then we need to assess 
> the affect of each change and splitting them up into separate patches would 
> be appropriate normally anyway. That will allow us to track down which 
> particular patch causes it. That won't mean we will turn down the change 
> based on that one result, though, it will just help us understand it better.

I'm investigating now.

	-Mike


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [interbench numbers] Re: interactive task starvation
  2006-03-23  5:53                                             ` Mike Galbraith
@ 2006-03-23 11:07                                               ` Mike Galbraith
  2006-03-24  0:21                                                 ` Con Kolivas
  0 siblings, 1 reply; 112+ messages in thread
From: Mike Galbraith @ 2006-03-23 11:07 UTC (permalink / raw)
  To: Con Kolivas
  Cc: lkml, Willy Tarreau, Ingo Molnar, Andrew Morton, bugsplatter,
	Peter Williams

On Thu, 2006-03-23 at 06:53 +0100, Mike Galbraith wrote:
> On Thu, 2006-03-23 at 16:43 +1100, Con Kolivas wrote:
> > On Thu, 23 Mar 2006 02:22 pm, Mike Galbraith wrote:
> > > On Thu, 2006-03-23 at 07:27 +1100, Con Kolivas wrote:
> > > > I wonder why the results are affected even without any throttling
> > > > settings but just patched in? Specifically I'm talking about deadlines
> > > > met with video being sensitive to this. Were there any other config
> > > > differences between the tests? Changing HZ would invalidate the results
> > > > for example. Comments?
> > >
> > > I wondered the same.  The only difference then is the lower idle sleep
> > > prio, tighter timeslice enforcement, and the SMP buglet fix for now <
> > > p->timestamp due to SMP rounding.  Configs are identical.
> > 
> > Ok well if we're going to run with this set of changes then we need to assess 
> > the affect of each change and splitting them up into separate patches would 
> > be appropriate normally anyway. That will allow us to track down which 
> > particular patch causes it. That won't mean we will turn down the change 
> > based on that one result, though, it will just help us understand it better.
> 
> I'm investigating now.

Nothing conclusive.  Some of the difference may be because interbench
has a dependency on the idle sleep path popping tasks in a prio 16
instead of 18.  Some of it may be because I'm not restricting IO, doing
that makes a bit of difference.  Some of it is definitely plain old
jitter.

Six hours is long enough.  I'm all done chasing interbench numbers.

	-Mike

virgin

--- Benchmarking simulated cpu of Video in the presence of simulated ---
Load	Latency +/- SD (ms)  Max Latency   % Desired CPU  % Deadlines Met
None	  0.031 +/- 0.396       16.7		 100	       99.9
X	  0.722 +/- 3.35        30.7		 100	         97
Burn	  0.531 +/- 7.42         246		99.1	         98
Write	  0.302 +/- 2.31        40.4		99.9	       98.5
Read	  0.092 +/- 1.11        32.9		99.9	       99.7
Compile	  0.428 +/- 2.77        36.3		99.9	       97.9
Memload	  0.235 +/- 3.3          104		99.5	       99.1

throttle patches with throttling disabled

--- Benchmarking simulated cpu of Video in the presence of simulated ---
Load	Latency +/- SD (ms)  Max Latency   % Desired CPU  % Deadlines Met
None	  0.185 +/- 1.6         18.8		 100	       99.1
X	   1.27 +/- 4.47          27		 100	       94.3
Burn	   1.57 +/- 13.3         345		98.1	         93
Write	  0.819 +/- 3.76        34.7		99.9	         96
Read	  0.301 +/- 2.05        18.7		 100	       98.5
Compile	   4.22 +/- 12.9         233		92.4	       80.2
Memload	  0.624 +/- 3.46        66.7		99.6	         97

minus idle sleep

--- Benchmarking simulated cpu of Video in the presence of simulated ---
Load	Latency +/- SD (ms)  Max Latency   % Desired CPU  % Deadlines Met
None	  0.222 +/- 1.82        16.8		 100	       98.8
X	   1.02 +/- 3.9         30.7		 100	       95.7
Burn	  0.208 +/- 3.67         141		99.8	       99.3
Write	  0.755 +/- 3.62        37.2		99.9	       96.4
Read	  0.265 +/- 1.94        16.9		 100	       98.6
Compile	   2.16 +/- 15.2         333		96.7	       90.7
Memload	  0.723 +/- 3.5         37.4		99.8	       96.3

minus don't restrict IO

--- Benchmarking simulated cpu of Video in the presence of simulated ---
Load	Latency +/- SD (ms)  Max Latency   % Desired CPU  % Deadlines Met
None	  0.226 +/- 1.82        16.8		 100	       98.8
X	   1.38 +/- 4.68        49.4		99.9	       93.9
Burn	  0.513 +/- 9.62         339		98.8	       98.4
Write	  0.418 +/- 2.7         30.8		99.9	       97.9
Read	  0.565 +/- 2.99        16.7		 100	       96.8
Compile	   1.05 +/- 13.6         545		99.1	       95.1
Memload	  0.345 +/- 3.23        80.5		99.8	       98.5




^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [interbench numbers] Re: interactive task starvation
  2006-03-23 11:07                                               ` Mike Galbraith
@ 2006-03-24  0:21                                                 ` Con Kolivas
  2006-03-24  5:02                                                   ` Mike Galbraith
  0 siblings, 1 reply; 112+ messages in thread
From: Con Kolivas @ 2006-03-24  0:21 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: lkml, Willy Tarreau, Ingo Molnar, Andrew Morton, bugsplatter,
	Peter Williams

On Thursday 23 March 2006 22:07, Mike Galbraith wrote:
> Nothing conclusive.  Some of the difference may be because interbench
> has a dependency on the idle sleep path popping tasks in a prio 16
> instead of 18.  Some of it may be because I'm not restricting IO, doing
> that makes a bit of difference.  Some of it is definitely plain old
> jitter.

Thanks for those! Just a clarification please

> virgin

I assume 2.6.16-rc6-mm2 ?

> throttle patches with throttling disabled

With your full patchset but no throttling enabled?

> minus idle sleep

Full patchset -throttling-idlesleep ?

> minus don't restrict IO

Full patchset -throttling-idlesleep-restrictio ?

Can you please email the latest separate patches so we can see them in 
isolation? I promise I won't ask for any more interbench numbers any time 
soon :)

Thanks!

Cheers,
Con

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [interbench numbers] Re: interactive task starvation
  2006-03-24  0:21                                                 ` Con Kolivas
@ 2006-03-24  5:02                                                   ` Mike Galbraith
  2006-03-24  5:04                                                     ` Con Kolivas
  0 siblings, 1 reply; 112+ messages in thread
From: Mike Galbraith @ 2006-03-24  5:02 UTC (permalink / raw)
  To: Con Kolivas
  Cc: lkml, Willy Tarreau, Ingo Molnar, Andrew Morton, bugsplatter,
	Peter Williams

On Fri, 2006-03-24 at 11:21 +1100, Con Kolivas wrote:
> On Thursday 23 March 2006 22:07, Mike Galbraith wrote:
> > Nothing conclusive.  Some of the difference may be because interbench
> > has a dependency on the idle sleep path popping tasks in a prio 16
> > instead of 18.  Some of it may be because I'm not restricting IO, doing
> > that makes a bit of difference.  Some of it is definitely plain old
> > jitter.
> 
> Thanks for those! Just a clarification please
> 
> > virgin
> 
> I assume 2.6.16-rc6-mm2 ?

Yes.

> 
> > throttle patches with throttling disabled
> 
> With your full patchset but no throttling enabled?

Yes.

> 
> > minus idle sleep
> 
> Full patchset -throttling-idlesleep ?

Yes, using stock idle sleep bits.

> 
> > minus don't restrict IO
> 
> Full patchset -throttling-idlesleep-restrictio ?
> 

Yes.

> Can you please email the latest separate patches so we can see them in 
> isolation? I promise I won't ask for any more interbench numbers any time 
> soon :)

I've separated the buglet fix parts from the rest, so there are four
patches instead of two.  I've also hidden the knobs, though for the
testing phase at least, I personally think it would be better to leave
the knobs there for people to twiddle.  Something Willy said indicated
to me that 'credit' would be more palatable than 'grace', so I've
renamed and updated comments to match.  I think it might look better,
but can't know since 'grace' was perfectly fine for my taste buds ;-)

I'll post as soon as I do some more cleanup pondering and verification.

	-Mike


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [interbench numbers] Re: interactive task starvation
  2006-03-24  5:02                                                   ` Mike Galbraith
@ 2006-03-24  5:04                                                     ` Con Kolivas
  0 siblings, 0 replies; 112+ messages in thread
From: Con Kolivas @ 2006-03-24  5:04 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: lkml, Willy Tarreau, Ingo Molnar, Andrew Morton, bugsplatter,
	Peter Williams

On Friday 24 March 2006 16:02, Mike Galbraith wrote:
> I've separated the buglet fix parts from the rest, so there are four
> patches instead of two.  I've also hidden the knobs, though for the
> testing phase at least, I personally think it would be better to leave
> the knobs there for people to twiddle.  Something Willy said indicated
> to me that 'credit' would be more palatable than 'grace', so I've
> renamed and updated comments to match.  I think it might look better,
> but can't know since 'grace' was perfectly fine for my taste buds ;-)
>
> I'll post as soon as I do some more cleanup pondering and verification.

Great. I suggest making the base patch have the values hard coded as #defines 
and then have a patch on top that turns those into userspace tunables we can 
hand tune while in -mm which can then be dropped if/when merged upstream.

Cheers,
Con

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-21 14:52                                               ` Ingo Molnar
@ 2006-03-29  3:01                                                 ` Lee Revell
  2006-03-29  5:56                                                   ` Ray Lee
  0 siblings, 1 reply; 112+ messages in thread
From: Lee Revell @ 2006-03-29  3:01 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Willy Tarreau, Con Kolivas, Mike Galbraith, lkml, Andrew Morton,
	bugsplatter

On Tue, 2006-03-21 at 15:52 +0100, Ingo Molnar wrote:
> * Willy Tarreau <willy@w.ods.org> wrote:
> 
> > Ah no, I never use those montruous environments ! xterm is already 
> > heavy. [...]
> 
> [ offtopic note: gnome-terminal developers claim some massive speedups
>   in Gnome 2.14, and my experiments on Fedora rawhide seem to 
>   corraborate that - gnome-term is now faster (for me) than xterm. ]
> 
> > [...] don't you remember, we found that doing "ls" in an xterm was 
> > waking the xterm process for every single line, which in turn woke the 
> > X server for a one-line scroll, while adding the "|cat" acted like a 
> > buffer with batched scrolls. Newer xterms have been improved to 
> > trigger jump scroll earlier and don't exhibit this behaviour even on 
> > non-patched kernels. However, sshd still shows the same problem IMHO.
> 
> yeah. The "|cat" changes the workload, which gets rated by the scheduler 
> differently. Such artifacts are inevitable once interactivity heuristics 
> are strong enough to significantly distort the equal sharing of CPU 
> time.

Can you explain why terminal output ping-pongs back and forth between
taking a certain amount of time, and approximately 10x longer?  For
example here's the result of "time dmesg" 6 times in an xterm with a
constant background workload:

real    0m0.086s
user    0m0.005s
sys     0m0.012s

real    0m0.078s
user    0m0.008s
sys     0m0.009s

real    0m0.082s
user    0m0.004s
sys     0m0.013s

real    0m0.084s
user    0m0.005s
sys     0m0.011s

real    0m0.751s
user    0m0.006s
sys     0m0.017s

real    0m0.749s
user    0m0.005s
sys     0m0.017s

Why does it ping-pong between taking ~0.08s and ~0.75s like that?  The
behavior is completely reproducible.

Lee


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-29  3:01                                                 ` Lee Revell
@ 2006-03-29  5:56                                                   ` Ray Lee
  2006-03-29  6:16                                                     ` Lee Revell
  0 siblings, 1 reply; 112+ messages in thread
From: Ray Lee @ 2006-03-29  5:56 UTC (permalink / raw)
  To: Lee Revell
  Cc: Ingo Molnar, Willy Tarreau, Con Kolivas, Mike Galbraith, lkml,
	Andrew Morton, bugsplatter

On 3/28/06, Lee Revell <rlrevell@joe-job.com> wrote:
> Can you explain why terminal output ping-pongs back and forth between
> taking a certain amount of time, and approximately 10x longer?
[...]
> Why does it ping-pong between taking ~0.08s and ~0.75s like that?  The
> behavior is completely reproducible.

Does the scheduler have any concept of dependent tasks? (If so, hit
<delete> and move on.) If not, then the producer and consumer will be
scheduled randomly w/r/t each other, right? Sometimes producer then
consumer, sometimes vice versa. If so, the ping pong should be half of
the time slow, half of the time fast (+/- sqrt(N)), and the slow time
should scale directly with the number of tasks running on the system.

Do any of the above WAGs match what you see? If so, then perhaps it's
random just due to the order in which the tasks get initially
scheduled (dmesg vs ssh, or dmesg vs xterm vs X -- er, though I guess
in that latter case there's really <thinks> three separate timings
that you'd get back, as the triple set of tasks could be in one of six
orderings, one fast, one slow, and four equally mixed between the
two).

I wonder if on a pipe write, moving the reader to be right after the
writer in the list would even that out. (But only on cases where the
reader didn't just run -- wouldn't want a back and forth conversation
to starve everyone else...)

But like I said, just a WAG.

Ray

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: interactive task starvation
  2006-03-29  5:56                                                   ` Ray Lee
@ 2006-03-29  6:16                                                     ` Lee Revell
  0 siblings, 0 replies; 112+ messages in thread
From: Lee Revell @ 2006-03-29  6:16 UTC (permalink / raw)
  To: ray-gmail
  Cc: Ingo Molnar, Willy Tarreau, Con Kolivas, Mike Galbraith, lkml,
	Andrew Morton, bugsplatter

On Tue, 2006-03-28 at 21:56 -0800, Ray Lee wrote:
> Do any of the above WAGs match what you see? If so, then perhaps it's
> random just due to the order in which the tasks get initially
> scheduled (dmesg vs ssh, or dmesg vs xterm vs X -- er, though I guess
> in that latter case there's really <thinks> three separate timings
> that you'd get back, as the triple set of tasks could be in one of six
> orderings, one fast, one slow, and four equally mixed between the
> two).
> 

Possibly - *very* rarely, like 1 out of 50 or 100 times, it falls
somewhere in the middle.

Lee


^ permalink raw reply	[flat|nested] 112+ messages in thread

end of thread, other threads:[~2006-03-29  6:16 UTC | newest]

Thread overview: 112+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-03-07 23:13 [PATCH] mm: yield during swap prefetching Con Kolivas
2006-03-07 23:26 ` Andrew Morton
2006-03-07 23:32   ` Con Kolivas
2006-03-08  0:05     ` Andrew Morton
2006-03-08  0:51       ` Con Kolivas
2006-03-08  1:11         ` Andrew Morton
2006-03-08  1:12           ` Con Kolivas
2006-03-08  1:19             ` Con Kolivas
2006-03-08  1:23             ` Andrew Morton
2006-03-08  1:28               ` Con Kolivas
2006-03-08  2:08                 ` Lee Revell
2006-03-08  2:12                   ` Con Kolivas
2006-03-08  2:18                     ` Lee Revell
2006-03-08  2:22                       ` Con Kolivas
2006-03-08  2:27                         ` Lee Revell
2006-03-08  2:30                           ` Con Kolivas
2006-03-08  2:52                             ` [ck] " André Goddard Rosa
2006-03-08  3:03                               ` Lee Revell
2006-03-08  3:05                               ` Con Kolivas
2006-03-08 21:07                                 ` Zan Lynx
2006-03-08 23:00                                   ` Con Kolivas
2006-03-08 23:48                                     ` Zan Lynx
2006-03-09  0:07                                       ` Con Kolivas
2006-03-09  3:13                                         ` Zan Lynx
2006-03-09  4:08                                           ` Con Kolivas
2006-03-09  4:54                                             ` Lee Revell
2006-03-08  7:51                 ` Jan Knutar
2006-03-08  8:39                   ` Con Kolivas
2006-03-09  8:57             ` Helge Hafting
2006-03-09  9:08               ` Con Kolivas
     [not found]                 ` <4410AFD3.7090505@bigpond.net.au>
2006-03-10  9:01                   ` [ck] " Andreas Mohr
2006-03-10  9:11                     ` Con Kolivas
2006-03-08 22:24       ` Pavel Machek
2006-03-09  2:22         ` Nick Piggin
2006-03-09  2:30           ` Con Kolivas
2006-03-09  2:57             ` Nick Piggin
2006-03-09  9:11               ` Con Kolivas
2006-03-08 13:36     ` [ck] " Con Kolivas
2006-03-17  9:06       ` Ingo Molnar
2006-03-17 10:46         ` interactive task starvation Mike Galbraith
2006-03-17 17:15           ` Mike Galbraith
2006-03-20  7:09             ` Mike Galbraith
2006-03-20 10:22               ` Ingo Molnar
2006-03-21  6:47               ` Willy Tarreau
2006-03-21  7:51                 ` Mike Galbraith
2006-03-21  9:13                   ` Willy Tarreau
2006-03-21  9:14                     ` Ingo Molnar
2006-03-21 11:15                       ` Willy Tarreau
2006-03-21 11:18                         ` Ingo Molnar
2006-03-21 11:53                           ` Con Kolivas
2006-03-21 13:10                             ` Mike Galbraith
2006-03-21 13:13                               ` Con Kolivas
2006-03-21 13:33                                 ` Mike Galbraith
2006-03-21 13:37                                   ` Con Kolivas
2006-03-21 13:44                                     ` Willy Tarreau
2006-03-21 13:45                                       ` Con Kolivas
2006-03-21 14:01                                         ` Mike Galbraith
2006-03-21 14:17                                           ` Con Kolivas
2006-03-21 15:20                                             ` Con Kolivas
2006-03-21 17:50                                               ` Willy Tarreau
2006-03-22  4:18                                                 ` Mike Galbraith
2006-03-21 17:51                                               ` Mike Galbraith
2006-03-21 13:38                                 ` Willy Tarreau
2006-03-21 13:48                                   ` Mike Galbraith
2006-03-21 12:07                           ` Mike Galbraith
2006-03-21 12:59                             ` Willy Tarreau
2006-03-21 13:24                               ` Mike Galbraith
2006-03-21 13:53                                 ` Con Kolivas
2006-03-21 14:17                                   ` Mike Galbraith
2006-03-21 14:19                                     ` Con Kolivas
2006-03-21 14:25                                       ` Ingo Molnar
2006-03-21 14:28                                         ` Con Kolivas
2006-03-21 14:30                                           ` Ingo Molnar
2006-03-21 14:28                                       ` Mike Galbraith
2006-03-21 14:30                                         ` Con Kolivas
2006-03-21 14:32                                           ` Ingo Molnar
2006-03-21 14:44                                             ` Willy Tarreau
2006-03-21 14:52                                               ` Ingo Molnar
2006-03-29  3:01                                                 ` Lee Revell
2006-03-29  5:56                                                   ` Ray Lee
2006-03-29  6:16                                                     ` Lee Revell
2006-03-21 14:36                                           ` Mike Galbraith
2006-03-21 14:39                                             ` Con Kolivas
2006-03-21 14:39                                       ` Willy Tarreau
2006-03-21 18:39                                         ` Rafael J. Wysocki
2006-03-21 19:32                                           ` Willy Tarreau
2006-03-21 21:47                                             ` Rafael J. Wysocki
2006-03-21 22:51                                 ` Peter Williams
2006-03-22  3:49                                   ` Mike Galbraith
2006-03-22  3:59                                     ` Peter Williams
2006-03-22 12:14                                     ` [interbench numbers] " Mike Galbraith
2006-03-22 20:27                                       ` Con Kolivas
2006-03-23  3:22                                         ` Mike Galbraith
2006-03-23  5:43                                           ` Con Kolivas
2006-03-23  5:53                                             ` Mike Galbraith
2006-03-23 11:07                                               ` Mike Galbraith
2006-03-24  0:21                                                 ` Con Kolivas
2006-03-24  5:02                                                   ` Mike Galbraith
2006-03-24  5:04                                                     ` Con Kolivas
2006-03-17 12:38         ` [PATCH] sched: activate SCHED BATCH expired Con Kolivas
2006-03-17 13:07           ` Ingo Molnar
2006-03-17 13:26           ` Nick Piggin
2006-03-17 13:36             ` Con Kolivas
2006-03-17 13:46               ` Nick Piggin
2006-03-17 13:51                 ` Nick Piggin
2006-03-17 14:11                 ` Con Kolivas
2006-03-17 14:59                   ` Ingo Molnar
2006-03-17 13:47               ` [ck] " Andreas Mohr
2006-03-17 13:59                 ` Con Kolivas
2006-03-17 14:06                 ` Nick Piggin
2006-03-08  8:48   ` [ck] Re: [PATCH] mm: yield during swap prefetching Andreas Mohr
2006-03-08  8:52     ` Con Kolivas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).