linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [RFC] Re: Blender profiling-1 O16.2int
       [not found] <20030818110001.6564.64238.Mailman@lists.us.dell.com>
@ 2003-08-18 12:52 ` Max Hailperin
  2003-08-18 13:05   ` Con Kolivas
  2003-08-18 14:43   ` William Lee Irwin III
  0 siblings, 2 replies; 14+ messages in thread
From: Max Hailperin @ 2003-08-18 12:52 UTC (permalink / raw)
  To: Con Kolivas; +Cc: linux-kernel

   From: Con Kolivas <kernel@kolivas.org>
   To: Voluspa <voluspa@comhem.se>
   Subject: [RFC] Re: Blender profiling-1 O16.2int
   Date: Sun, 17 Aug 2003 23:36:42 +1000
   Cc: linux kernel mailing list <linux-kernel@vger.kernel.org>,
      Mike Galbraith <efault@gmx.de>, Andrew Morton <akpm@osdl.org>,
      William Lee Irwin III <wli@holomorphy.com>

   ... I do have some ideas about how to progress with this (some Mike
   has suggested to me previously), but so far most of them involve
   some detriment to the interactive performance on other apps. So I'm
   appealing to others for ideas.

   Comments?

   Con

I suggest you put a small limit on the number of times that a task can
be requeued onto the active array before it needs to go to the expired
array.  -max

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] Re: Blender profiling-1 O16.2int
  2003-08-18 12:52 ` [RFC] Re: Blender profiling-1 O16.2int Max Hailperin
@ 2003-08-18 13:05   ` Con Kolivas
  2003-08-18 14:43   ` William Lee Irwin III
  1 sibling, 0 replies; 14+ messages in thread
From: Con Kolivas @ 2003-08-18 13:05 UTC (permalink / raw)
  To: Max Hailperin; +Cc: linux-kernel

On Mon, 18 Aug 2003 22:52, Max Hailperin wrote:
>    ... I do have some ideas about how to progress with this (some Mike
>    has suggested to me previously), but so far most of them involve
>    some detriment to the interactive performance on other apps. So I'm
>    appealing to others for ideas.

> I suggest you put a small limit on the number of times that a task can
> be requeued onto the active array before it needs to go to the expired
> array.  -max

Thanks; I appreciate your comments. That's what's currently in place at the 
moment, and the lower the number of requeues the less it starves... 
Unfortunately, this also means that the shorter it takes a real interactive 
task to expire as well. This is the theme I'm currently working on though, as 
the starver gets rescheduled ahead of anything else and I've developed a way 
of letting other tasks sneak in ahead if it starts doing that. It needs more 
tuning but seems to work so far.

Cheers,
Con


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] Re: Blender profiling-1 O16.2int
  2003-08-18 12:52 ` [RFC] Re: Blender profiling-1 O16.2int Max Hailperin
  2003-08-18 13:05   ` Con Kolivas
@ 2003-08-18 14:43   ` William Lee Irwin III
  1 sibling, 0 replies; 14+ messages in thread
From: William Lee Irwin III @ 2003-08-18 14:43 UTC (permalink / raw)
  To: Max Hailperin; +Cc: Con Kolivas, linux-kernel

From: Con Kolivas <kernel@kolivas.org>
>    ... I do have some ideas about how to progress with this (some Mike
>    has suggested to me previously), but so far most of them involve
>    some detriment to the interactive performance on other apps. So I'm
>    appealing to others for ideas.

On Mon, Aug 18, 2003 at 07:52:35AM -0500, Max Hailperin wrote:
> I suggest you put a small limit on the number of times that a task can
> be requeued onto the active array before it needs to go to the expired
> array.  -max

Well, first off this undoes the effect of code already in place (the
interactivity heuristics), and second it's defeated by high arrival
rates spread across many tasks. The second is fatal but there isn't
much that can be done about it in the current design, and I don't
have enough subtlety to see how to fiddle with the bits necessary to
recover anything out of the first. AFAICT no fixed limit will ever
do; any given limit will be too many for some tasks and not enough
for others, and there's no way to guess based on the task (if what said
it was interactive was enough in the first place, you wouldn't have to
resort to this), so you've been hanged just for trying.

-- wli

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] Re: Blender profiling-1 O16.2int
  2003-08-26  2:54 Kester Maddock
@ 2003-08-26  5:03 ` Con Kolivas
  0 siblings, 0 replies; 14+ messages in thread
From: Con Kolivas @ 2003-08-26  5:03 UTC (permalink / raw)
  To: Kester Maddock, linux-kernel; +Cc: bf-committers

On Tue, 26 Aug 2003 12:54, Kester Maddock wrote:
> On Sun, Aug 17, 2003 at 11:36:42PM +1000, Con Kolivas wrote:
> > Now normally, blender should just sleep and wait till X comes
> > alive again before it does anything. However here it shows clearly that
> > it is spinning madly looking for something from X, and poor X can't do
> > anything. This is the busy on wait I've described.
> >
> > Second, any applications that exhibit this should be fixed since it is a
> > bug.
>
> Say if blender is polling the mouse in the view rotate loop?  (If so, you
> should be able to just hold down the middle mouse to starve, without moving
> the mouse.)
>
> Is this really a bug in the application?  Blender is interactive while
> rotating the view.  More CPU means more frames per second which gives the
> user a better experience.  The CPU usage will drop down when the user
> releases the middle mouse button.
>
> (OK, in this specific case blender could update the screen on mouse move
> events, but what about the general case eg a 3d game, where the screen
> is updated by eg monster ai?)
>
> And how do you fix it?  Would sleep(0) in these loops do, or do you need to
> select(...) on X?
>
> CC me please on replys.
>
> Thanks,
>
> Kester Maddock.
> ^ sends occaisional patches to blender.

Thanks for you attention. Yes this is a scheduler issue that makes it prone to 
the effect of priority inversion (heck even the mars rover module suffered 
it), brought out by the quirk in the coding. I wouldn't even know where to 
begin looking at the source of blender but the culprits we've tracked down so 
far that cause this use select with a very short timeout (15ms in one case). 
Basically it repeatedly times out, and X never gets adequate scheduling time 
to respond because the applications themselves are the ones preempting X.

Con


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [RFC] Re: Blender profiling-1 O16.2int
@ 2003-08-26  2:54 Kester Maddock
  2003-08-26  5:03 ` Con Kolivas
  0 siblings, 1 reply; 14+ messages in thread
From: Kester Maddock @ 2003-08-26  2:54 UTC (permalink / raw)
  To: linux-kernel; +Cc: bf-committers


On Sun, Aug 17, 2003 at 11:36:42PM +1000, Con Kolivas wrote:
> Now normally, blender should just sleep and wait till X comes
> alive again before it does anything. However here it shows clearly that 
> it is spinning madly looking for something from X, and poor X can't do 
> anything. This is the busy on wait I've described.

> Second, any applications that exhibit this should be fixed since it is a bug. 

Say if blender is polling the mouse in the view rotate loop?  (If so, you 
should be able to just hold down the middle mouse to starve, without moving 
the mouse.)

Is this really a bug in the application?  Blender is interactive while rotating 
the view.  More CPU means more frames per second which gives the user a
better experience.  The CPU usage will drop down when the user releases the
middle mouse button.

(OK, in this specific case blender could update the screen on mouse move 
events, but what about the general case eg a 3d game, where the screen
is updated by eg monster ai?)

And how do you fix it?  Would sleep(0) in these loops do, or do you need to
select(...) on X?

CC me please on replys.

Thanks,

Kester Maddock.
^ sends occaisional patches to blender.



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] Re: Blender profiling-1 O16.2int
  2003-08-19  0:28         ` Con Kolivas
  2003-08-19  0:37           ` Richard A Nelson
@ 2003-08-19  8:39           ` Rob Landley
  1 sibling, 0 replies; 14+ messages in thread
From: Rob Landley @ 2003-08-19  8:39 UTC (permalink / raw)
  To: Con Kolivas, Wes Janzen; +Cc: linux kernel mailing list

On Monday 18 August 2003 20:28, Con Kolivas wrote:
> On Tue, 19 Aug 2003 10:14, Wes Janzen wrote:
> > I think this problem is exacerbated when another app is competing for
> > the processor.  The machine just pauses unless I'm also doing something
> > else, in this case compiling XINE.  Once something is competing, it
> > looks like X takes an extraordinarily long time to come back into the
> > running queue.
>
> Yes that is correct behaviour and to be expected.
>
> > Is there a way to figure out when a process is spinning on a wait and
>
> That's the trick isn't it? No there isn't or else I'd fix it in a jiffy. If
> someone can think of a way I'd love to know.

Well, in theory you could fiddle with its memory map and force it to 
soft-fault all its pages back in, and count the soft faults.  But I know just 
enough about that to suspect it's a rast nest of evil complexity and loss of 
orthogonality.

Another thing is if it's calling the same syscalls in sequence with the same 
arguments...  (What syscalls IS it calling?)

Looking at the profile list you gave earlier, schedule is showing up.  If the 
sucker's calling schedule in the loop, that should be a heck of a hint that 
it's busy-waiting, dontcha think?

As long as you've got a queue for each priority, how about having an "I called 
schedule" queue?  (Or can you re-queue it somewhere lower than its actual 
priority?  How about one queue lower than the queue was on when it called 
schedule, without necessarily changing its dynamic priority...)

What DOES calling "schedule" from userspace mean with the current scheduler?

Rob



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] Re: Blender profiling-1 O16.2int
  2003-08-19  0:14       ` Wes Janzen
  2003-08-19  0:28         ` Con Kolivas
  2003-08-19  0:31         ` William Lee Irwin III
@ 2003-08-19  0:58         ` Wes Janzen
  2 siblings, 0 replies; 14+ messages in thread
From: Wes Janzen @ 2003-08-19  0:58 UTC (permalink / raw)
  To: linux kernel mailing list; +Cc: wli

Those benchmarks/tests I'm referring to are at:

I had to correct some things to get them to compile, but it was very 
simple.  I think one of the versions actually checks to see if it has 
been woken up more than once.

cspipe2a.cpp/csfast5.cpp:
-->Article with links to source:
    http://www-106.ibm.com/developerworks/linux/library/l-rt9/

csfast5a.cpp/fract5.cpp:
-->Original site (doesn't have the source, but explains the program):
     
http://www-106.ibm.com/developerworks/linux/library/l-rt10/?t=gr,lnxw01=ConSwiP2
-->Google search found me this site which has the source linked:
     
http://www-900.ibm.com/developerWorks/cn/linux/sdk/rt/part10/index.shtml#resources

Wes


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] Re: Blender profiling-1 O16.2int
  2003-08-19  0:28         ` Con Kolivas
@ 2003-08-19  0:37           ` Richard A Nelson
  2003-08-19  8:39           ` Rob Landley
  1 sibling, 0 replies; 14+ messages in thread
From: Richard A Nelson @ 2003-08-19  0:37 UTC (permalink / raw)
  To: Con Kolivas; +Cc: linux kernel mailing list

On Tue, 19 Aug 2003, Con Kolivas wrote:

> > Is there a way to figure out when a process is spinning on a wait and
>
> That's the trick isn't it? No there isn't or else I'd fix it in a jiffy. If
> someone can think of a way I'd love to know.

Have you considered instrumenting waits/locks ? I had to do that for a
schedular/locks/events (not Linux based), by extending the structures
to include current ownership and waiters...  At a problem point, you'd
then have to use something like sysreq to trigger a scan of the relevant
data areas.  I had to cause an error and do a postmortem via paper dump
:(

-- 
Rick Nelson
> : Any porters out there should feel happier knowing that DEC is shipping
> : me an AlphaPC that I intend to try getting linux running on: this will
> : definitely help flush out some of the most flagrant unportable stuff.
> : The Alpha is much more different from the i386 than the 68k stuff is, so
> : it's likely to get most of the stuff fixed.
>
> It's posts like this that almost convince us non-believers that there
> really is a god.
	-- Anthony Lovell, to Linus's remarks about porting

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] Re: Blender profiling-1 O16.2int
  2003-08-19  0:14       ` Wes Janzen
  2003-08-19  0:28         ` Con Kolivas
@ 2003-08-19  0:31         ` William Lee Irwin III
  2003-08-19  0:58         ` Wes Janzen
  2 siblings, 0 replies; 14+ messages in thread
From: William Lee Irwin III @ 2003-08-19  0:31 UTC (permalink / raw)
  To: Wes Janzen; +Cc: Con Kolivas, linux kernel mailing list

On Mon, Aug 18, 2003 at 07:14:12PM -0500, Wes Janzen wrote:
> That makes sense, I was running a program that I found on IBM's website 
> that's supposed to test context switching speed this weekend.  It has 1 
> free lock and passes them around the group.  If I put it up to 32 
> threads or so with one spare lock, I can start to see the starvation.  
> When running vmstat, it's apparent when the the starvation occurs as the 
> context switching sky-rockets.  I was going to add to the example code 
> to check for how many times a thread wakes up waiting for the lock and 
> can't get it after reading that message about locks in the list.  I 
> guess I won't have to do that now.  Anyway, that'll bring my system to a 
> halt when the thread count gets up over 256.  Still, it's usuable as 
> long as I'm not doing something else that makes heavy use of the processor.

Could you give a URL to that benchmark?

Thanks.


-- wli

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] Re: Blender profiling-1 O16.2int
  2003-08-19  0:14       ` Wes Janzen
@ 2003-08-19  0:28         ` Con Kolivas
  2003-08-19  0:37           ` Richard A Nelson
  2003-08-19  8:39           ` Rob Landley
  2003-08-19  0:31         ` William Lee Irwin III
  2003-08-19  0:58         ` Wes Janzen
  2 siblings, 2 replies; 14+ messages in thread
From: Con Kolivas @ 2003-08-19  0:28 UTC (permalink / raw)
  To: Wes Janzen; +Cc: linux kernel mailing list

On Tue, 19 Aug 2003 10:14, Wes Janzen wrote:
> I think this problem is exacerbated when another app is competing for
> the processor.  The machine just pauses unless I'm also doing something
> else, in this case compiling XINE.  Once something is competing, it
> looks like X takes an extraordinarily long time to come back into the
> running queue.

Yes that is correct behaviour and to be expected.

> Is there a way to figure out when a process is spinning on a wait and

That's the trick isn't it? No there isn't or else I'd fix it in a jiffy. If 
someone can think of a way I'd love to know.

> exponentially decrease it's bonus if they are consecutive?  I should
> probably read through the source and some of these posts before I make
> suggestions though, because I don't currently know much about how all
> that works.

If you get a chance could you try the prerelease O17 patch against O16.3 in 
http://kernel.kolivas.org/2.5/experimental to see if that helps? It's purely 
an anti-starvation patch.

Thanks for comments.

Cheers,
Con


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] Re: Blender profiling-1 O16.2int
  2003-08-17 13:36     ` Con Kolivas
  2003-08-17 16:34       ` Con Kolivas
  2003-08-18  0:30       ` William Lee Irwin III
@ 2003-08-19  0:14       ` Wes Janzen
  2003-08-19  0:28         ` Con Kolivas
                           ` (2 more replies)
  2 siblings, 3 replies; 14+ messages in thread
From: Wes Janzen @ 2003-08-19  0:14 UTC (permalink / raw)
  To: Con Kolivas; +Cc: linux kernel mailing list

That makes sense, I was running a program that I found on IBM's website 
that's supposed to test context switching speed this weekend.  It has 1 
free lock and passes them around the group.  If I put it up to 32 
threads or so with one spare lock, I can start to see the starvation.  
When running vmstat, it's apparent when the the starvation occurs as the 
context switching sky-rockets.  I was going to add to the example code 
to check for how many times a thread wakes up waiting for the lock and 
can't get it after reading that message about locks in the list.  I 
guess I won't have to do that now.  Anyway, that'll bring my system to a 
halt when the thread count gets up over 256.  Still, it's usuable as 
long as I'm not doing something else that makes heavy use of the processor.

I also had a nasty experience with a nautilus crash today that caused 
this problem with X, I'm sure much similar to the results from Blender.  
Obviously,  X keeps getting expired because I was forced to reset after 
waiting 1 1/2 hours for the system to shutdown (i.e. after the runlevel 
changed to 6 to the actual rebooting of the system).  Even after I 
managed to get out of X (though I don't know if the process was able 
ever to actually exit) the system took 5 minutes to get from "Stopping 
sound services" to the next message.  After that, I got a cursor that 
didn't flash, then a blank screen but no reboot.  I don't think X ever 
exited since it never took me back to VT7, and the XDM shutdown comes 
shortly before killing cron.  The last message I saw indicated it was 
shutting down service at...  At that point, I could no longer change 
between terminals though I could see continuing activity on my hard 
drive.  I waited 30 minutes from when I had exited X until I finally 
reset.  I had just clicked on the debug information button in bug buddy 
when this started.

I think this problem is exacerbated when another app is competing for 
the processor.  The machine just pauses unless I'm also doing something 
else, in this case compiling XINE.  Once something is competing, it 
looks like X takes an extraordinarily long time to come back into the 
running queue.

Is there a way to figure out when a process is spinning on a wait and 
exponentially decrease it's bonus if they are consecutive?  I should 
probably read through the source and some of these posts before I make 
suggestions though, because I don't currently know much about how all 
that works.

Wes

Con Kolivas wrote:

>Second, any applications that exhibit this should be fixed since it is a bug. 
>
>Finally I still need to find a way to prevent this from transiently starving 
>the system without undoing the interactive improvements to normal 
>applications; I certainly don't intend to make them "work nicely" just for 
>the sake of it.
>
>I do have some ideas about how to progress with this (some Mike has suggested 
>to me previously), but so far most of them involve some detriment to the 
>interactive performance on other apps. So I'm appealing to others for ideas.
>
>Comments?
>
>Con
>
>-
>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
>Please read the FAQ at  http://www.tux.org/lkml/
>
>  
>


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] Re: Blender profiling-1 O16.2int
  2003-08-17 13:36     ` Con Kolivas
  2003-08-17 16:34       ` Con Kolivas
@ 2003-08-18  0:30       ` William Lee Irwin III
  2003-08-19  0:14       ` Wes Janzen
  2 siblings, 0 replies; 14+ messages in thread
From: William Lee Irwin III @ 2003-08-18  0:30 UTC (permalink / raw)
  To: Con Kolivas
  Cc: Voluspa, linux kernel mailing list, Mike Galbraith, Andrew Morton

On Sun, Aug 17, 2003 at 11:36:42PM +1000, Con Kolivas wrote:
> Now for those that can't see what this is, blender and X are
> interactive tasks and getting high priority (PRI < 18) which makes
> sense. During heavy usage of blender, it is X that gets pegged for
> cpu usage (ie it is doing the work for blender), and eventually it
> gets expired onto the expired array for being naughty and stops doing
> anything till all other tasks have finished working on the active
> array. Now normally, blender should just sleep and wait till X comes
> alive again before it does anything. However here it shows clearly that 
> it is spinning madly looking for something from X, and poor X can't do 
> anything. This is the busy on wait I've described. Meanwhile, since blender 
> was seen as an interactive task (which it is), it preempts everything lower 
> priority than it till it also gets booted. 

Ugh, priority inversion.


-- wli

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] Re: Blender profiling-1 O16.2int
  2003-08-17 13:36     ` Con Kolivas
@ 2003-08-17 16:34       ` Con Kolivas
  2003-08-18  0:30       ` William Lee Irwin III
  2003-08-19  0:14       ` Wes Janzen
  2 siblings, 0 replies; 14+ messages in thread
From: Con Kolivas @ 2003-08-17 16:34 UTC (permalink / raw)
  To: Voluspa
  Cc: linux kernel mailing list, Mike Galbraith, Andrew Morton,
	William Lee Irwin III

Addendum:

KernProfile during starvation:
   435 get_offset_tsc                            18.9130
   273 do_softirq                                 1.8079
   263 sys_gettimeofday                           1.3350
   121 do_gettimeofday                            0.9167
    72 fget                                       1.1077
    68 schedule                                   0.0465
    20 do_sys_settimeofday                        0.0943
    20 do_settimeofday                            0.0627
    14 fput                                       0.7000
    14 add_wait_queue                             0.1474
    11 __wake_up                                  0.1250
    11 remove_wait_queue                          0.0815
    11 do_anonymous_page                          0.0199
     9 pipe_poll                                  0.0726
     9 link_path_walk                             0.0040
     8 free_hot_cold_page                         0.0311

vmstat during starvation
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 1  0      0  99716  56348 193256    0    0     0     0 1687  5962 92  8  0  0
 1  0      0  99716  56376 193256    0    0     0   144 1691  5976 81 19  0  0
 1  0      0  99748  56376 193256    0    0     0     0 1656  6057 87 13  0  0
 2  0      0  99748  56376 193256    0    0     0     0 1636  5477 85 15  0  0
11  0      0  99748  56376 193256    0    0     0     0 1430   445 51 49  0  0
12  0      0  99748  56376 193256    0    0     0     0 1459   362 52 48  0  0
12  0      0  99764  56392 193256    0    0     0    92 1474   400 51 49  0  0
12  0      0  99764  56392 193256    0    0     0     0 1463   356 53 47  0  0
12  0      0  99764  56392 193256    0    0     0     0 1466   379 50 50  0  0

Note drop in ctx, rise in load (although in top only blender is actually 
getting cpu at 99%) and massive rise in sys time.

Con


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [RFC] Re: Blender profiling-1 O16.2int
       [not found]   ` <20030817073859.51021571.voluspa@comhem.se>
@ 2003-08-17 13:36     ` Con Kolivas
  2003-08-17 16:34       ` Con Kolivas
                         ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Con Kolivas @ 2003-08-17 13:36 UTC (permalink / raw)
  To: Voluspa
  Cc: linux kernel mailing list, Mike Galbraith, Andrew Morton,
	William Lee Irwin III

I've investigated this starvation issue now that I've been given a testcase. 
First some background.  Mats reported that he could elicit starvation in 
blender so I downloaded that and have been thrashing it out trying to find 
out what happens. Here are relevant excerpts from the discussion.

On Sun, 17 Aug 2003 15:38, Voluspa wrote:
> Ah, the freezes has been there with all 2.6s, just shorter (1/2 - 2
> seconds). Where I didn't see them was in the older blender version of
> 2.23. I'll investigate it more thorough.

This struck me as interesting because the starvation was there in the vanilla 
scheduler anyway, but for a shorter period. Even more interesting was the 
suggestion that it didn't happen with previous versions of blender.

Reproducing it was:

> Voluspa <voluspa@comhem.se> wrote:
>Thusly: Start program, move mouse to get rid of splash screen. Click and
>hold down 3rd mouse button (unless you use simulate 3 buttons in X, then
>it's left-right-mousebutton simultaniously) anywhere on the white grid
>(first view is top-down) and move mouse around... That is how you rotate
>the world plane around an axes. Doing this quickly, or slowly for a
>longer period, I freeze.

Which I tried for some time and eventually managed to get locally. In the 
meantime,

> Voluspa <voluspa@comhem.se> wrote:
> But not in 2.23, not even with this O16.2! Don't see any freezes with
> that older Blender version.

So I had my suspicions about what was happening already based on that 
information. Anyway I ran top in batch mode, reproduced the starvation and 
got this (just X and 3[blender] at one second intervals):

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  Command
 782 root      16   0  109m  39m  77m R 80.5  7.9   2:37.17 X
12553 con       15   0 12648 6252 9072 S 19.9  1.2   0:07.98 3

  782 root      16   0  109m  39m  77m R 73.6  7.9   2:37.91 X
12553 con       15   0 12648 6252 9072 R 23.9  1.2   0:08.22 3

  782 root      16   0  109m  39m  77m R 79.5  7.9   2:38.71 X
12553 con       15   0 12648 6252 9072 S 20.9  1.2   0:08.43 3

Starvation starts here:
  782 root      16   0  109m  39m  77m R 95.4  7.9   2:39.67 X
12553 con       15   0 12648 6252 9072 S  3.0  1.2   0:08.46 3

12553 con       16   0 12648 6252 9072 R 73.6  1.2   0:09.20 3
  782 root      17   0  109m  39m  77m R 25.8  7.9   2:39.93 X

12553 con       16   0 12648 6252 9072 R 98.4  1.2   0:10.19 3
 782 root      17   0  109m  39m  77m R  0.0  7.9   2:39.93 X
 
12553 con       16   0 12648 6252 9072 R 99.4  1.2   0:11.19 3
  782 root      17   0  109m  39m  77m R  0.0  7.9   2:39.93 X

12553 con       16   0 12648 6252 9072 R 99.4  1.2   0:12.19 3
  782 root      17   0  109m  39m  77m R  0.0  7.9   2:39.93 X

12553 con       16   0 12648 6252 9072 R 99.9  1.2   0:13.20 3
  782 root      17   0  109m  39m  77m R  0.0  7.9   2:39.93 X

[snip some more seconds]
and ends here:
12553 con       15   0 12648 6252 9072 S 39.8  1.2   0:23.59 3
  782 root      16   0  109m  39m  77m S 16.9  7.9   2:40.10 X

  782 root      15   0  109m  39m  77m S  8.9  7.9   2:40.19 X
12553 con       15   0 12964 6576 9072 S  2.0  1.3   0:23.61 3

Now for those that can't see what this is, blender and X are interactive tasks 
and getting high priority (PRI < 18) which makes sense. During heavy usage of 
blender, it is X that gets pegged for cpu usage (ie it is doing the work for 
blender), and eventually it gets expired onto the expired array for being 
naughty and stops doing anything till all other tasks have finished working 
on the active array. Now normally, blender should just sleep and wait till X 
comes alive again before it does anything. However here it shows clearly that 
it is spinning madly looking for something from X, and poor X can't do 
anything. This is the busy on wait I've described. Meanwhile, since blender 
was seen as an interactive task (which it is), it preempts everything lower 
priority than it till it also gets booted. 

Now in O16.2 I managed to minimise this happening by not allowing tasks to 
preempt their waker, but this has no effect if X gets expired. The evidence I 
really needed was that this also happened to the vanilla scheduler to a 
lesser degree and that it is a software flaw. This same thing happens to 
2.6.0-test3 for 1-2 seconds, and doesn't happen at all with a previous 
version of blender.

So where does this leave us? 

Well first I can remove entirely the not preempting the waker in O16s since 
that is faulty for smp, and doesn't really fix the problem. 

Second, any applications that exhibit this should be fixed since it is a bug. 

Finally I still need to find a way to prevent this from transiently starving 
the system without undoing the interactive improvements to normal 
applications; I certainly don't intend to make them "work nicely" just for 
the sake of it.

I do have some ideas about how to progress with this (some Mike has suggested 
to me previously), but so far most of them involve some detriment to the 
interactive performance on other apps. So I'm appealing to others for ideas.

Comments?

Con


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2003-08-26  4:56 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20030818110001.6564.64238.Mailman@lists.us.dell.com>
2003-08-18 12:52 ` [RFC] Re: Blender profiling-1 O16.2int Max Hailperin
2003-08-18 13:05   ` Con Kolivas
2003-08-18 14:43   ` William Lee Irwin III
2003-08-26  2:54 Kester Maddock
2003-08-26  5:03 ` Con Kolivas
     [not found] <20030817003128.04855aed.voluspa@comhem.se>
     [not found] ` <200308171142.33131.kernel@kolivas.org>
     [not found]   ` <20030817073859.51021571.voluspa@comhem.se>
2003-08-17 13:36     ` Con Kolivas
2003-08-17 16:34       ` Con Kolivas
2003-08-18  0:30       ` William Lee Irwin III
2003-08-19  0:14       ` Wes Janzen
2003-08-19  0:28         ` Con Kolivas
2003-08-19  0:37           ` Richard A Nelson
2003-08-19  8:39           ` Rob Landley
2003-08-19  0:31         ` William Lee Irwin III
2003-08-19  0:58         ` Wes Janzen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).