* Re: Real-Time Preemption, comparison to 2.6.10-mm1
@ 2005-01-05 13:55 Mark_H_Johnson
  2005-01-05 14:38 ` K.R. Foley
From: Mark_H_Johnson @ 2005-01-05 13:55 UTC
  To: Lee Revell
  Cc: Andrew Morton, Bill Huey, Adam Heath, K.R. Foley, linux-kernel,
	Ingo Molnar, Florian Schmidt, Fernando Pablo Lopez-Lezcano,
	Rui Nuno Capela, Steven Rostedt, Thomas Gleixner

> On Tue, 2005-01-04 at 14:11 -0600, Mark_H_Johnson@raytheon.com wrote:
> > The non RT application starvation for mm1 was much less
> > pronounced but still present. I could watch the disk light
> > on the last two tests & see it go out (and stay out) for an
> > extended period. It does not make sense to me that a single RT
> > application (on a two CPU machine) and a nice'd non RT application
> > can cause this starvation behavior. This behavior was not
> > present on the 2.4 kernels and seems to be a regression to me.

> I think I am seeing this problem too.  It doesn't just apply to RT
> tasks; it seems that CPU-bound tasks starve each other.  I noticed that
> with the RT kernel, a kernel compile or dpkg will starve evolution, to
> the point where it takes 30 seconds to display a message.  If I go and
> background the CPU hog, the message renders _instantly_.

> It's definitely present with 2.6.10-rc2 + RT (PK config) and absent with
> 2.6.10 vanilla.  I need to figure out whether -mm has the problem.
My point was that -mm definitely has the problem (though to a lesser
degree). The tests I ran showed it on both the disk read and disk copy
stress tests. I guess I should try a vanilla 2.6.10 run as well to see
if it is something introduced in the -mm series (it certainly is not a
recent change...).

--Mark H Johnson
  <mailto:Mark_H_Johnson@raytheon.com>



* Re: Real-Time Preemption, comparison to 2.6.10-mm1
  2005-01-05 13:55 Real-Time Preemption, comparison to 2.6.10-mm1 Mark_H_Johnson
@ 2005-01-05 14:38 ` K.R. Foley
From: K.R. Foley @ 2005-01-05 14:38 UTC
  To: Mark_H_Johnson
  Cc: Lee Revell, Andrew Morton, Bill Huey, Adam Heath, linux-kernel,
	Ingo Molnar, Florian Schmidt, Fernando Pablo Lopez-Lezcano,
	Rui Nuno Capela, Steven Rostedt, Thomas Gleixner

Mark_H_Johnson@raytheon.com wrote:
>>On Tue, 2005-01-04 at 14:11 -0600, Mark_H_Johnson@raytheon.com wrote:
>>
>>>The non RT application starvation for mm1 was much less
>>>pronounced but still present. I could watch the disk light
>>>on the last two tests & see it go out (and stay out) for an
>>>extended period. It does not make sense to me that a single RT
>>>application (on a two CPU machine) and a nice'd non RT application
>>>can cause this starvation behavior. This behavior was not
>>>present on the 2.4 kernels and seems to be a regression to me.
> 
> 
>>I think I am seeing this problem too.  It doesn't just apply to RT
>>tasks; it seems that CPU-bound tasks starve each other.  I noticed that
>>with the RT kernel, a kernel compile or dpkg will starve evolution, to
>>the point where it takes 30 seconds to display a message.  If I go and
>>background the CPU hog, the message renders _instantly_.
> 
> 
>>It's definitely present with 2.6.10-rc2 + RT (PK config) and absent with
>>2.6.10 vanilla.  I need to figure out whether -mm has the problem.
> 
> My point was that -mm definitely has the problem (though to a lesser
> degree). The tests I ran showed it on both the disk read and disk copy
> stress tests. I guess I should try a vanilla 2.6.10 run as well to see
> if it is something introduced in the -mm series (it certainly is not a
> recent change...).
> 
> --Mark H Johnson
>   <mailto:Mark_H_Johnson@raytheon.com>
> 
> 
I'm curious: is anyone seeing this behavior on UP systems, or is it
only happening on SMP?

kr


* Re: Real-Time Preemption, comparison to 2.6.10-mm1
  2005-01-05 22:58 Mark_H_Johnson
@ 2005-01-07  0:48 ` K.R. Foley
From: K.R. Foley @ 2005-01-07  0:48 UTC
  To: Mark_H_Johnson
  Cc: Lee Revell, Andrew Morton, Bill Huey, Adam Heath, linux-kernel,
	Ingo Molnar, Florian Schmidt, Fernando Pablo Lopez-Lezcano,
	Rui Nuno Capela, Steven Rostedt, Thomas Gleixner

Mark_H_Johnson@raytheon.com wrote:
>>Do you have a simple way of triggering and trapping the starvation? That
>>of course is probably asking for a lot. :)
> 
> 
> What I have been doing is now recorded as bug #3997 at
>   http://bugme.osdl.org/show_bug.cgi?id=3997
> Running latencytest has the advantage of generating some charts
> that I can show others how well (or poorly) the system runs.
> 
> I just rebooted in SMP and did a few other tests with the following
> steps that should be simpler to set up and run.
> 
> [1] Create cpu_test.c as follows:
> 
> #define LOOPS 100000000
> int main() {
>   int u, v, k = 0, l = 0;  /* busy-work counters; start at zero */
>   for (v=0; v<100; v++) {
>     for (u=0; u<LOOPS; u++) {
>       k += 1;
>       if (!(u%100)) {  /* every 100th pass */
>         l += 1;
>         k = 0;
>       }
>     }
>   }
>   return k;
> }
> 
> On my 866 MHz Pentium III, it runs for about 3 minutes, 45 seconds.
> Adjust the outer loop in the code if you need to run shorter or
> longer. I would not run more than 5 minutes - you should easily
> get the symptom before then. This also puts a limit on how long
> you have to wait if the system gets "stuck" and does not respond
> to your keyboard or mouse.
> 
> [2] Build the application
>   gcc -o cpu_test cpu_test.c
> 
> [3] On a two CPU system, run in a separate window (or repeat
> this step N-1 times for your N CPU system)...
>   chrt -f 10 ./cpu_test
> 
> [4] In a separate window...
>   nice ./cpu_test
> 
> At this point, you should have the system 100% utilized with N-1
> real time applications & 1 nice application. I used top to confirm
> this result.
> 
> [5] In a separate window do one or more of the following:
>   a. head -c $1 /dev/zero >tmpfile
>   (replacing $1 with about 1.5x your physical memory size
>    - this is my "disk write" test)
>   b. cp tmpfile tmpfile2
>   (this is my "disk copy" test)
>   c. cat tmpfile tmpfile2 >/dev/null
>   (this is my "disk read" test)
> delete the files when done.
> 
> It appears (at least on my system) that disk I/O triggers the
> problem more than the other tests (x11perf, top [with no delay]).
> 
> I was, however, in an odd situation [just before I sent this message]
> where the disk copy appeared to be OK but I could not type in any
> of my windows - mouse input was OK but the keyboard was not. That
> may be a different variant of my "starvation problem".
> 
>   --Mark
> 
> 

I have been able to reproduce this on my system. Actually, I can pretty
much put the system to sleep - at least as far as all external input and
output is concerned - for a period of time. I haven't had time to look
into this further but will try later tonight.

kr


* Re: Real-Time Preemption, comparison to 2.6.10-mm1
@ 2005-01-05 22:58 Mark_H_Johnson
  2005-01-07  0:48 ` K.R. Foley
From: Mark_H_Johnson @ 2005-01-05 22:58 UTC
  To: K.R. Foley
  Cc: Lee Revell, Andrew Morton, Bill Huey, Adam Heath, linux-kernel,
	Ingo Molnar, Florian Schmidt, Fernando Pablo Lopez-Lezcano,
	Rui Nuno Capela, Steven Rostedt, Thomas Gleixner

> Do you have a simple way of triggering and trapping the starvation? That
> of course is probably asking for a lot. :)

What I have been doing is now recorded as bug #3997 at
  http://bugme.osdl.org/show_bug.cgi?id=3997
Running latencytest has the advantage of generating some charts
that I can show others how well (or poorly) the system runs.

I just rebooted in SMP and did a few other tests with the following
steps that should be simpler to set up and run.

[1] Create cpu_test.c as follows:

#define LOOPS 100000000
int main() {
  int u, v, k = 0, l = 0;  /* busy-work counters; start at zero */
  for (v=0; v<100; v++) {
    for (u=0; u<LOOPS; u++) {
      k += 1;
      if (!(u%100)) {  /* every 100th pass */
        l += 1;
        k = 0;
      }
    }
  }
  return k;
}

On my 866 MHz Pentium III, it runs for about 3 minutes, 45 seconds.
Adjust the outer loop in the code if you need to run shorter or
longer. I would not run more than 5 minutes - you should easily
get the symptom before then. This also puts a limit on how long
you have to wait if the system gets "stuck" and does not respond
to your keyboard or mouse.

[2] Build the application
  gcc -o cpu_test cpu_test.c

[3] On a two CPU system, run in a separate window (or repeat
this step N-1 times for your N CPU system)...
  chrt -f 10 ./cpu_test

[4] In a separate window...
  nice ./cpu_test

At this point, you should have the system 100% utilized with N-1
real time applications & 1 nice application. I used top to confirm
this result. (A programmatic way to request the same setup is
sketched below, after step [5].)

[5] In a separate window do one or more of the following:
  a. head -c $1 /dev/zero >tmpfile
  (replacing $1 with about 1.5x your physical memory size
   - this is my "disk write" test)
  b. cp tmpfile tmpfile2
  (this is my "disk copy" test)
  c. cat tmpfile tmpfile2 >/dev/null
  (this is my "disk read" test)
delete the files when done.
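
As an aside on steps [3] and [4]: the same scheduling setup can be
requested from inside the test program itself. A minimal sketch, not
part of the original recipe - priority 10 mirrors "chrt -f 10", and
the increment of 10 mirrors nice's default:

#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
  if (argc > 1) {
    /* with any argument, behave like "chrt -f 10" (needs root on
       2.6-era kernels): SCHED_FIFO at static priority 10 */
    struct sched_param sp = { .sched_priority = 10 };
    if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1)
      perror("sched_setscheduler");
  } else {
    /* otherwise behave like plain "nice": add 10 to the nice value */
    errno = 0;
    if (nice(10) == -1 && errno != 0)
      perror("nice");
  }
  /* ... the busy loop from cpu_test.c would go here ... */
  return 0;
}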

It appears (at least on my system) that disk I/O triggers the
problem more than the other tests (x11perf, top [with no delay]).

I was, however, in an odd situation [just before I sent this message]
where the disk copy appeared to be OK but I could not type in any
of my windows - mouse input was OK but the keyboard was not. That
may be a different variant of my "starvation problem".

  --Mark



* Re: Real-Time Preemption, comparison to 2.6.10-mm1
  2005-01-05 17:52 Mark_H_Johnson
@ 2005-01-05 21:20 ` K.R. Foley
From: K.R. Foley @ 2005-01-05 21:20 UTC
  To: Mark_H_Johnson
  Cc: Lee Revell, Andrew Morton, Bill Huey, Adam Heath, linux-kernel,
	Ingo Molnar, Florian Schmidt, Fernando Pablo Lopez-Lezcano,
	Rui Nuno Capela, Steven Rostedt, Thomas Gleixner

Mark_H_Johnson@Raytheon.com wrote:
> K.R. Foley wrote:
> 
>>Mark_H_Johnson@raytheon.com wrote:
> 
> [snip - long explanation of how a nice application can starve a non
> nice application for minutes at a time on an SMP system]
> 
> 
>>>My point was that -mm definitely has the problem (though to a lesser
>>>degree). The tests I ran showed it on both the disk read and disk copy
>>>stress tests. I guess I should try a vanilla 2.6.10 run as well to see
>>>if it is something introduced in the -mm series (it certainly is not a
>>>recent change...).
>>
>>I'm curious: is anyone seeing this behavior on UP systems, or is it
>>only happening on SMP?
> 
> The build of 2.6.10 vanilla just completed and I reran my tests with
> SMP and with maxcpus=1 (UP w/ SMP kernel).

Well, that blows one of the theories I was looking at. :( -mm is
carrying a patch that lengthens cache_hot_time to roughly a millisecond
instead of a microsecond, which could affect how quickly tasks get
migrated to an idle CPU.
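
For reference, a sketch of the mechanism in question - a paraphrase of
the 2.6-era balancing check, not the verbatim kernel source: a task
that last ran within the past cache_hot_time nanoseconds is treated as
cache-hot and the load balancer declines to migrate it, so a longer
window keeps tasks on their current CPU longer.

  /* paraphrase of the 2.6-era check, not verbatim kernel source */
  static inline int task_hot(struct task_struct *p,
                             unsigned long long now,
                             struct sched_domain *sd)
  {
          return (long long)(now - p->last_ran) <
                 (long long)sd->cache_hot_time;
  }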
> 
> The vanilla 2.6.10 kernel has the non RT starvation problem as well
> for both test runs. It looks like this is not something in -mm but a
> change between 2.4 and 2.6.
> 
> I did notice the test results were a little inconsistent between the
> two runs...
>              2.6.10 SMP    2.6.10 UP (w/ SMP kernel)
> disk write    starved          OK
> disk copy        OK         starved
> disk read     starved       starved
> but in both cases, a non nice (non RT) disk application was
> starved by a nice (non RT) cpu application for minutes.

Do you have a simple way of triggering and trapping the starvation? That 
of course is probably asking for a lot. :)

kr

> 
> I wonder who I should be talking to next about this (or whether I
> should just submit a bug report).
> 
>   --Mark
> 
> 



* Re: Real-Time Preemption, comparison to 2.6.10-mm1
@ 2005-01-05 17:52 Mark_H_Johnson
  2005-01-05 21:20 ` K.R. Foley
From: Mark_H_Johnson @ 2005-01-05 17:52 UTC
  To: K.R. Foley
  Cc: Lee Revell, Andrew Morton, Bill Huey, Adam Heath, linux-kernel,
	Ingo Molnar, Florian Schmidt, Fernando Pablo Lopez-Lezcano,
	Rui Nuno Capela, Steven Rostedt, Thomas Gleixner

K.R. Foley wrote:
>Mark_H_Johnson@raytheon.com wrote:
[snip - long explanation of how a nice application can starve a non
nice application for minutes at a time on an SMP system]

>> My point was that -mm definitely has the problem (though to a lesser
>> degree). The tests I ran showed it on both the disk read and disk copy
>> stress tests. I guess I should try a vanilla 2.6.10 run as well to see
>> if it is something introduced in the -mm series (it certainly is not a
>> recent change...).
>
>I'm curious: is anyone seeing this behavior on UP systems, or is it
>only happening on SMP?
The build of 2.6.10 vanilla just completed and I reran my tests with
SMP and with maxcpus=1 (UP w/ SMP kernel).

The vanilla 2.6.10 kernel has the non RT starvation problem as well
for both test runs. It looks like this is not something in -mm but a
change between 2.4 and 2.6.

I did notice the test results were a little inconsistent between the
two runs...
             2.6.10 SMP    2.6.10 UP (w/ SMP kernel)
disk write    starved          OK
disk copy        OK         starved
disk read     starved       starved
but in both cases, a non nice (non RT) disk application was
starved by a nice (non RT) cpu application for minutes.

I wonder who I should be talking to next about this (or whether I
should just submit a bug report).

  --Mark



* Re: Real-Time Preemption, comparison to 2.6.10-mm1
  2005-01-04 20:11 Mark_H_Johnson
@ 2005-01-04 21:37 ` Lee Revell
From: Lee Revell @ 2005-01-04 21:37 UTC
  To: Mark_H_Johnson
  Cc: Ingo Molnar, Andrew Morton, Bill Huey, linux-kernel,
	Rui Nuno Capela, K.R. Foley, Adam Heath, Florian Schmidt,
	Thomas Gleixner, Fernando Pablo Lopez-Lezcano, Steven Rostedt

On Tue, 2005-01-04 at 14:11 -0600, Mark_H_Johnson@raytheon.com wrote:
> The non RT application starvation for mm1 was much less
> pronounced but still present. I could watch the disk light
> on the last two tests & see it go out (and stay out) for an
> extended period. It does not make sense to me that a single RT
> application (on a two CPU machine) and a nice'd non RT application
> can cause this starvation behavior. This behavior was not
> present on the 2.4 kernels and seems to be a regression to me.

I think I am seeing this problem too.  It doesn't just apply to RT
tasks; it seems that CPU-bound tasks starve each other.  I noticed that
with the RT kernel, a kernel compile or dpkg will starve evolution, to
the point where it takes 30 seconds to display a message.  If I go and
background the CPU hog, the message renders _instantly_.

It's definitely present with 2.6.10-rc2 + RT (PK config) and absent with
2.6.10 vanilla.  I need to figure out whether -mm has the problem.

Lee



* Real-Time Preemption, comparison to 2.6.10-mm1
@ 2005-01-04 20:11 Mark_H_Johnson
  2005-01-04 21:37 ` Lee Revell
From: Mark_H_Johnson @ 2005-01-04 20:11 UTC
  To: Ingo Molnar
  Cc: Andrew Morton, Bill Huey, linux-kernel, Lee Revell,
	Rui Nuno Capela, Mark_H_Johnson, K.R. Foley, Adam Heath,
	Florian Schmidt, Thomas Gleixner, Fernando Pablo Lopez-Lezcano,
	Steven Rostedt

I just finished a couple of test runs of 2.6.10-mm1 (no realtime
preemption patches) for comparison. The results show that the realtime
preemption patches go a long way toward improving the latency of the
kernel, as measured by the maximum duration of the CPU loop in
latencytest. This was on my two CPU test system (866 MHz Pentium III).

Comparison of 2.6.9-V0.7.33-00RT and .33-00PK results with 2.6.10-mm1

00RT has PREEMPT_RT (and tracing)
00PK has PREEMPT_DESKTOP and no threaded IRQs (and tracing)
mm1 is 2.6.10-mm1 with CONFIG_PREEMPT=y and CONFIG_PREEMPT_BKL=y
2.4 has lowlat + preempt patches applied

% within 100 usec (200 usec for RT due to tracing overhead)
           CPU loop (%)          Elapsed Time (sec)        2.4
Test   mm1     RT     PK       mm1     RT      PK   |   CPU  Elapsed
X     99.90  96.78  99.88       62 *   90 *    83+* |  97.20   70
top   99.94  93.87 100.00       30 *   36 *    31+  |  97.48   29
neto  94.16  99.17 100.00       75 *  340 *   193+  |  96.23   36
neti  93.92  99.13  98.39       54 *  340 *   280 * |  95.86   41
diskw 89.16  98.04 100.00?      64 *  350 *   310+  |  77.64   29
diskc 92.31  95.56  99.94      320+*  350 *   310+  |  84.12   77
diskr 95.06  90.77  99.94      310+*  220 *   310+  |  90.66   86
total                          915   1726    1517   |         368
        [higher is better]        [lower is better]
* wide variation in audio duration
+ long stretch of audio duration "too fast"
? I believe I had non RT starvation & this result is in error
[chart shows a typical results for about 20 seconds and then
gets "really smooth" like the top chart for the remainder of
the measured duration]

The percentages are all within a few percentage points of each other,
so these results look pretty comparable.

Looking at ping response time:
  RT 0.226 / 0.486 / 2.475 / 0.083 msec
  PK 0.102 / 0.174 / 0.813 / 0.054 msec
 mm1 0.087 / 0.150 / 2.279 / 0.125 msec
for min / average / max / mdev values. Again, tracing penalizes
RT much more than PK so this is to be expected. The higher variation
on mm1 is perhaps to be expected (as well as the max value). The min
value is comparable to PK & is likely smaller due to differences in
tracing (PK had tracing, mm1 does not).
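
(For reference: ping's mdev column is its spread statistic, computed
roughly as sqrt(E[rtt^2] - E[rtt]^2), i.e. the standard deviation of
the round-trip times; a larger mdev means less consistent response.)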

The maximum duration of the CPU loop (as measured by the
application) is in the range of 2.05 to 3.30 msec for -00RT,
compared to the nominal 1.16 msec duration. The equivalent
numbers for -00PK are 1.21 to 2.61 msec. I would expect RT
to be better than PK on this measure, but that is never the
result I measure. For the mm1 kernel, the range was much
larger (as expected), from 1.18 to 5.59 msec. The huge
latencies in mm1 come primarily from disk reads and copies and
from the overhead of the ext3 file system.
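
For anyone who has not run latencytest: the CPU-loop figures come from
timing a busy loop with a fixed nominal duration and recording how far
each pass overruns it. A minimal sketch of that style of measurement -
assuming nothing about latencytest's internals, with a loop count that
would need calibrating to your CPU:

#include <stdio.h>
#include <time.h>

/* build: gcc -o loop loop.c -lrt (older glibc needs -lrt) */
static double now_ms(void)
{
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

int main(void)
{
  volatile unsigned long sink = 0;
  double t0, dt, worst = 0.0;
  unsigned long i;
  int pass;

  for (pass = 0; pass < 10000; pass++) {
    t0 = now_ms();
    for (i = 0; i < 1000000; i++)  /* nominally ~1 msec of work */
      sink += i;
    dt = now_ms() - t0;
    if (dt > worst)
      worst = dt;
  }
  printf("worst pass: %.2f msec\n", worst);
  return 0;
}

A pass that takes several times the nominal duration means the loop
was preempted or otherwise delayed, which is what the maximum-duration
numbers above capture.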

The non RT application starvation for mm1 was much less
pronounced but still present. I could watch the disk light
on the last two tests & see it go out (and stay out) for an
extended period. It does not make sense to me that a single RT
application (on a two CPU machine) and a nice'd non RT application
can cause this starvation behavior. This behavior was not
present on the 2.4 kernels and seems to be a regression to me.

I will put together a pair of 2.6.10-mm1-V0.7.34-xx kernels
tomorrow and rerun the tests to see if the results are consistent.

  --Mark


