* [Adeos-main] latency results for ppc and x86
       [not found] <45CD730A.6000405@domain.hid>
@ 2007-02-20  7:21 ` poornima r
  2007-02-21  7:13   ` Wolfgang Grandegger
  0 siblings, 1 reply; 15+ messages in thread
From: poornima r @ 2007-02-20  7:21 UTC (permalink / raw)
  To: Wolfgang Grandegger; +Cc: adeos-main


Hello,

Below are the scheduling-latency and interrupt-latency
test results on PPC and x86 with the I-pipe tracer
option disabled.

1. Please comment on these results (whether they are valid).
2. Is there any method to optimize (reduce) these latencies?

1)PPC:-
(MPC-860 at 48 MHz, 4 kB I-Cache and 4 kB D-Cache) 

User mode:-
root@domain.hid# ./latency -t0
== Sampling period: 1000000 us
== Test mode: periodic user-mode task
== All results in microseconds
warming up...
RTT|  00:00:01  (periodic user-mode task, 1000000 us period, priority 99)
RTH|-----lat min|-----lat avg|-----lat max|-overrun|----lat best|---lat worst
RTD|     167.000|     167.000|     167.000|       0|     167.000|     167.000
RTD|     176.000|     176.000|     176.000|       0|     167.000|     176.000
RTD|     168.000|     168.000|     168.000|       0|     167.000|     176.000
RTD|     171.000|     171.000|     171.000|       0|     167.000|     176.000

Kernel mode:-
root@domain.hid# ./latency -t1
== Sampling period: 1000000 us
== Test mode: in-kernel periodic task
== All results in microseconds
warming up...
RTT|  00:00:00  (in-kernel periodic task, 1000000 us period, priority 99)
RTH|-----lat min|-----lat avg|-----lat max|-overrun|----lat best|---lat worst
RTD|     123.000|     123.000|     123.000|       0|     123.000|     123.000
RTD|     125.000|     125.000|     125.000|       0|     123.000|     125.000
RTD|     128.333|     128.333|     128.333|       0|     123.000|     128.333
RTD|     127.000|     127.000|     127.000|       0|     123.000|     128.333

Interrupt mode:-
root@domain.hid# ./latency -t2
== Sampling period: 1000000 us
== Test mode: in-kernel timer handler
== All results in microseconds
warming up...
RTT|  00:00:01  (in-kernel timer handler, 1000000 us period, priority 99)
RTH|-----lat min|-----lat avg|-----lat max|-overrun|----lat best|---lat worst
RTD|      45.334|      45.334|      45.334|       0|      45.334|      45.334
RTD|      45.000|      45.000|      45.000|       0|      45.000|      45.334
RTD|      46.000|      46.000|      46.000|       0|      45.000|      46.000
RTD|      47.334|      47.334|      47.334|       0|      45.000|      47.334
RTD|      46.334|      46.334|      46.334|       0|      45.000|      47.334

2)X86:-
(Pentium4, 3.06GHz, 1024 KB cache size)
User mode:-
== Sampling period: 100 us
== Test mode: in-kernel periodic task
== All results in microseconds
warming up...

RTT|  00:00:01  (periodic user-mode task, 100 us period, priority 99)
RTH|-----lat min|-----lat avg|-----lat max|-overrun|----lat best|---lat worst
RTD|       3.807|      12.825|      21.565|       0|       3.807|      21.565
RTD|       3.796|      12.792|      21.483|       0|       3.796|      21.565
RTD|       3.770|      12.799|      21.501|       0|       3.770|      21.565
RTD|       3.578|      12.806|      20.890|       0|       3.578|      21.565
RTD|       3.755|      12.809|      21.486|       0|       3.578|

Kernel mode:-
== Sampling period: 100 us
== Test mode: in-kernel periodic task
== All results in microseconds
warming up...

RTT|  00:00:01  (in-kernel periodic task, 100 us period, priority 99)
RTH|-----lat min|-----lat avg|-----lat max|-overrun|----lat best|---lat worst
RTD|       2.381|       3.451|      19.620|       0|       2.381|      19.620
RTD|       2.332|       3.480|      19.930|       0|       2.332|      19.930
RTD|       2.382|       3.649|      19.609|       0|       2.332|      19.930
RTD|       2.323|       2.786|      14.351|       0|       2.323|      19.930
RTD|       2.375|       2.532|       5.519|       0|       2.323|      19.930
RTD|       2.332|       3.971|      19.617|       0|       2.323|      19.930

Interrupt mode:-
== Sampling period: 100 us
== Test mode: in-kernel timer handler
== All results in microseconds
warming up...

RTT|  00:00:01  (in-kernel timer handler, 100 us period, priority 99)
RTH|-----lat min|-----lat avg|-----lat max|-overrun|----lat best|---lat worst
RTD|      -1.563|       7.553|      15.736|       0|      -1.563|      15.736
RTD|      -1.579|       7.558|      15.804|       0|      -1.579|      15.804
RTD|      -1.584|       7.529|      16.167|       0|      -1.584|      16.167
RTD|      -1.548|       7.553|      16.186|       0|      -1.584|      16.186
RTD|      -1.585|       7.556|      16.275|       0|      -1.585|      16.275

Thanks,
Poornima


--- Wolfgang Grandegger <wg@domain.hid> wrote:

> Hello,
> 
> poornima r wrote:
> > Hello,
> > 
> > Srry for not replying all these days...
> > (Was not in in station, may be too personal!!!!!)
> > 
> > About software emulation error:
> > 
> > 4)Output of /proc/xenomai/faults after the illegal
> >>> instruction:-
> >>> root@domain.hid# cat
> >>> /proc/xenomai/faults
> >>> TRAP         CPU0
> >>>   0:            0    (Data or instruction
> access)
> >>>   1:            0    (Alignment)
> >>>   2:            0    (Altivec unavailable)
> >>>   3:            0    (Program check exception)
> >>>   4:            0    (Machine check exception)
> >>>   5:            0    (Unknown)
> >>>   6:            0    (Instruction breakpoint)
> >>>   7:            0    (Run mode exception)
> >>>   8:            0    (Single-step exception)
> >>>   9:            0    (Non-recoverable exception)
> >>>  10:            1    (Software emulation)
> >>>  11:            0    (Debug)
> >>>  12:            0    (SPE)
> >>>  13:            0    (Altivec assist)
> >> Hm, I see a software emulation exception which is
> >> also the reason for 
> >> the illegal instructions. What toolchain do you
> use?
> >> The toolchain 
> >> should support software FP emulation.
> > 
> > 1)I am using open source too chain with software
> > floating point emulation support.
> > (#ppc_8xx-gcc --v
> >
>
/lib/gcc/powerpc/3.4.3/../../../../target_powerpc/usr/include/c++/3.4.3
> > --with-numa-policy=no --with-float=soft)
> > 
> > 2)And the kernel is included with code to emulate
> a
> > floating-point                                    
>    
> >          unit, which will allow programs that 
> > use floating-point                                
>    
> >                           instructions to run
> > 
> > Kernel configuration 
> > ----CONFIG_MATH_EMULATION:y
> 
> If you build with "--with-float=soft" there is no
> need for math 
> emulation in the kernel. Likely, there is something
> wrong with your 
> tool-chain. Could you please try a known-to-work
> tool-chain like the 
> ELDK v4.x from http://www.denx.de.
> 
> Wolfgang.
> 
> > Thanks,
> > Poornima
> > 
> > --- Wolfgang Grandegger <wg@domain.hid> wrote:
> > 
> >> poornima r wrote:
> >>> Hi,
> >>>
> >>> 1)I am using open source kernel from Kernel.org,
> >>> but what is meant by vanilla kernel from
> >> Kernel.org?
> >>
> >> It's the kernel from kernel.org. This means that
> the
> >> Linux kernel 2.6.18 
> >> is running fine on your MPC860 platform as is?
> >> Thanks for the info.
> >>
> >>> 2)With sampling period of 500usec the system
> >> simply
> >>> hangs without printing any results (./latenct
> >> -p500)
> >>
> >> OK.
> >>
> >>> 3)cyclictest with -t1 option (without
> >> IPIPE-tracer)
> >>> root@domain.hid#
> ./cyclictest
> >> -t1
> >>> 2.04 0.50 0.17 8/27 174
> >>>
> >>> T: 0 (    0) P:99 I:    1000 C:       0 Min:
> >> 1000000
> >>> Act:       0 Avg:       0 Max:-1000000
> >>> Illegal instruction
> >>>
> >>> 4)Output of /proc/xenomai/faults after the
> illegal
> >>> instruction:-
> >>> root@domain.hid# cat
> >>> /proc/xenomai/faults
> >>> TRAP         CPU0
> >>>   0:            0    (Data or instruction
> access)
> >>>   1:            0    (Alignment)
> >>>   2:            0    (Altivec unavailable)
> >>>   3:            0    (Program check exception)
> >>>   4:            0    (Machine check exception)
> >>>   5:            0    (Unknown)
> >>>   6:            0    (Instruction breakpoint)
> >>>   7:            0    (Run mode exception)
> >>>   8:            0    (Single-step exception)
> >>>   9:            0    (Non-recoverable exception)
> >>>  10:            1    (Software emulation)
> >>>  11:            0    (Debug)
> >>>  12:            0    (SPE)
> >>>  13:            0    (Altivec assist)
> >> Hm, I see a software emulation exception which is
> >> also the reason for 
> >> the illegal instructions. What toolchain do you
> use?
> >> The toolchain 
> >> should support software FP emulation.
> >>
> >>> 5)Running switchtest:-
> >>> root@domain.hid#
> ./switchtest
> >> -n
> >>> --The system hangs wihtout printing any results
> >>>
> >>> Thanks,
> >>> Poornima
> >>>
> >>>
> >>> --- Wolfgang Grandegger <wg@domain.hid>
> wrote:
> >>>
> >>>> poornima r wrote:
> >>>>> Hi,
> >>>>>
> >>>>> Thanks for the reply.
> >>>>>
> >>>>> Linux version:linux-2.6.18
> >>>>> Xenomai: xenomai-2.3.0 (Stable version)
> >>>>> adeos patch:
> adeos-ipipe-2.6.18-ppc-1.5-01.patch
> >>>> OK, I'm curious, did you use the vanilla kernel
> >> from
> >>>> kernel.org?
> >>>> More comments below.
> >>>>
> >>>>> The tests were run as follows:
> >>>>> 1)The sampling period in the code for latency
> >> and
> >>>>> switchbench was changed to 1000000000ns(to
> >> remove
> >>>>> overrun error) 
> >>>>> 2)switchtest was run with -n5 option
> >>>>> 3)cyclictest was run with  -t5 option(5
> threads 
> >>>>> were created.)
> >>>>> 4)cyclictest was terminated with Illegal
> >>>> instruction
> >>>>> (after creating 5 threads) with IPIPE tracer
> >>>> enabled.
> >>>>
> >>>>> These were the results without I-PIPE Tracer
> >>>> option:
> >>>>> (All the tests were run without any load)
> >>>>> 1)LATENCY TEST:-
> >>>>> User mode:-
> >>>>> /mnt/out_xen/bin# ./latency -t0
> >>>>> == Sampling period: 1000000 us
> >>>>> == Test mode: periodic user-mode task
> >>>>> == All results in microseconds
> >>>>> warming up...
> >>>>> RTT|  00:00:01  (periodic user-mode task,
> >> 1000000
> >>>> us
> >>>>> period, priority 99)
> >>>>> RTH|-----lat min|-----lat avg|-----lat
> >>>>> max|-overrun|----lat best|---lat worst
> >>>>> RTD|     167.000|     167.000|     167.000|   
>  
> >>>> 0|  
> >>>>>   167.000|     167.000
> >>>>> RTD|     176.000|     176.000|     176.000|   
>  
> >>>> 0|  
> >>>>>   167.000|     176.000
> >>>>> RTD|     168.000|     168.000|     168.000|   
>  
> >>>> 0|  
> >>>>>   167.000|     176.000
> 
=== message truncated ===



 



* Re: [Adeos-main] latency results for ppc and x86
  2007-02-20  7:21 ` [Adeos-main] latency results for ppc and x86 poornima r
@ 2007-02-21  7:13   ` Wolfgang Grandegger
  2007-02-21  9:33     ` poornima r
  2007-03-14 12:51     ` [Adeos-main] test results for switchtest and cyclictest on x86 poornima r
  0 siblings, 2 replies; 15+ messages in thread
From: Wolfgang Grandegger @ 2007-02-21  7:13 UTC (permalink / raw)
  To: poornima r; +Cc: adeos-main

Hello,

poornima r wrote:
> Hello,
> 
> These were the scheduling latency and interrupt
> latency test results on ppc and x86 with IPIPE tracer
> option disabled.
> 
> 1.Please comment on these results (whether valid) and 

Your results are OK. These are actually the figures I remember from my 
own tests in the past.

> 2.Is there any method to optimize these results.

Not that I know of. There are a few ideas for reducing latencies
further, such as cache locking or TLB pinning.

> 1)PPC:-
> (MPC-860 at 48 MHz, 4 kB I-Cache and 4 kB D-Cache) 
> 
> User mode:-
> root@domain.hid# ./latency -t0
> == Sampling period: 1000000 us
> == Test mode: periodic user-mode task
> == All results in microseconds
> warming up...
> RTT|  00:00:01  (periodic user-mode task, 1000000 us
> period, priority 99)
> RTH|-----lat min|-----lat
> avg|-----latmax|-overrun|----lat best|---lat worst
> RTD|     167.000|     167.000|     167.000|       0|  
>    167.000|     167.000
> RTD|     176.000|     176.000|     176.000|       0|  
>    167.000|     176.000
> RTD|     168.000|     168.000|     168.000|       0|  
>     167.000|    176.000
> RTD|     171.000|     171.000|     171.000|       0|  
>     167.000|    176.000
> 
> Kernel mode:-
> root@domain.hid# ./latency -t1
> == Sampling period: 1000000 us
> == Test mode: in-kernel periodic task
> == All results in microseconds
> warming up...
> RTT|  00:00:00  (in-kernel periodic task, 1000000 us
> period, priority 99)
> RTH|-----lat min|-----lat
> avg|-----latmax|-overrun|----lat best|---lat worst
> RTD|     123.000|     123.000|     123.000|       0|  
>     123.000|     123.000
> RTD|     125.000|     125.000|     125.000|       0|  
>     123.000|     125.000
> RTD|     128.333|     128.333|     128.333|       0|  
>     123.000|     128.333
> RTD|     127.000|     127.000|     127.000|       0|  
>     123.000|     128.333
> 
> Interrupt mode:-
> root@domain.hid# ./latency -t2
> == Sampling period: 1000000 us
> == Test mode: in-kernel timer handler
> == All results in microseconds
> warming up...
> RTT|  00:00:01  (in-kernel timer handler, 1000000 us
> period, priority 99)
> RTH|-----lat min|-----lat
> avg|-----latmax|-overrun|----lat best|---lat worst
> RTD|      45.334|      45.334|       45.334|       0| 
>          45.334|      45.334
> RTD|      45.000|      45.000|       45.000|       0| 
>          45.000|      45.334
> RTD|      46.000|      46.000|       46.000|       0| 
>          45.000|      46.000
> RTD|      47.334|      47.334|       47.334|       0| 
>          45.000|      47.334
> RTD|      46.334|      46.334|       46.334|       0| 
>          45.000|      47.334

On the MPC860, the latencies are mainly due to code execution time, as
this processor is very slow.

> 2)X86:-
> (Pentium4, 3.06GHz, 1024 KB cache size)
> User mode:-
> Sampling period: 100 us
> == Test mode: in-kernel periodic task
> == All results in microseconds
> warming up...
> 
> RTT|  00:00:01  (periodic user-mode task, 100 us
> period, priority 99)
> RTH|-----lat min|-----lat avg|-----lat
> max|-overrun|----lat best|---lat worst
> RTD|       3.807|      12.825|      21.565|       0|  
>     3.807|      21.565
> RTD|       3.796|      12.792|      21.483|       0|  
>     3.796|      21.565
> RTD|       3.770|      12.799|      21.501|       0|  
>     3.770|      21.565
> RTD|       3.578|      12.806|      20.890|       0|  
>     3.578|      21.565
> RTD|       3.755|      12.809|      21.486|       0|  
>     3.578|
> 
> kernel mode:-
> Sampling period: 100 us
> == Test mode: in-kernel periodic task
> == All results in microseconds
> warming up...
> 
> RTT|  00:00:01  (in-kernel periodic task, 100 us
> period, priority 99)
> RTH|-----lat min|-----lat avg|-----lat
> max|-overrun|----lat best|---lat worst
> RTD|       2.381|       3.451|      19.620|       0|  
>     2.381|      19.620
> RTD|       2.332|       3.480|      19.930|       0|  
>     2.332|      19.930
> RTD|       2.382|       3.649|      19.609|       0|  
>     2.332|      19.930
> RTD|       2.323|       2.786|      14.351|       0|  
>     2.323|      19.930
> RTD|       2.375|       2.532|       5.519|       0|  
>     2.323|      19.930
> RTD|       2.332|       3.971|      19.617|       0|  
>     2.323|      19.930
> 
> Interrupt mode:-
> Sampling period: 100 us
> == Test mode: in-kernel timer handler
> == All results in microseconds
> warming up...
> 
> RTT|  00:00:01  (in-kernel timer handler, 100 us
> period, priority 99)
> RTH|-----lat min|-----lat avg|-----lat
> max|-overrun|----lat best|---lat worst
> RTD|      -1.563|       7.553|      15.736|       0|  
>    -1.563|      15.736
> RTD|      -1.579|       7.558|      15.804|       0|  
>    -1.579|      15.804
> RTD|      -1.584|       7.529|      16.167|       0|  
>    -1.584|      16.167
> RTD|      -1.548|       7.553|      16.186|       0|  
>    -1.584|      16.186
> RTD|      -1.585|       7.556|      16.275|       0|  
>    -1.585|      16.275

Latencies are mainly due to cache refills on the P4. Have you already
put load onto your system? If not, worst-case latencies will be even longer.
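One way to put load on the box while rerunning ./latency in another shell is a simple CPU/cache thrasher. A hypothetical sketch (not a Xenomai tool; buffer size and duration are arbitrary assumptions):

```python
import multiprocessing
import time

def cpu_and_cache_load(seconds=10, buf_kib=2048):
    """Busy-loop over a buffer larger than the L2 cache to force refills."""
    buf = bytearray(buf_kib * 1024)
    deadline = time.time() + seconds
    total = 0
    while time.time() < deadline:
        # Walk the buffer with a 64-byte stride (one cache line per access).
        for i in range(0, len(buf), 64):
            total += buf[i]
    return total

if __name__ == "__main__":
    # One worker per CPU; run the latency test concurrently elsewhere.
    procs = [multiprocessing.Process(target=cpu_and_cache_load, args=(1,))
             for _ in range(multiprocessing.cpu_count())]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

Disk and network I/O load would stress additional latency paths that a pure CPU loop does not touch.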

Wolfgang.

> Thanks,
> Poornima
> 
> 
> --- Wolfgang Grandegger <wg@domain.hid> wrote:
> 
>> Hello,
>>
>> poornima r wrote:
>>> Hello,
>>>
>>> Srry for not replying all these days...
>>> (Was not in in station, may be too personal!!!!!)
>>>
>>> About software emulation error:
>>>
>>> 4)Output of /proc/xenomai/faults after the illegal
>>>>> instruction:-
>>>>> root@domain.hid# cat
>>>>> /proc/xenomai/faults
>>>>> TRAP         CPU0
>>>>>   0:            0    (Data or instruction
>> access)
>>>>>   1:            0    (Alignment)
>>>>>   2:            0    (Altivec unavailable)
>>>>>   3:            0    (Program check exception)
>>>>>   4:            0    (Machine check exception)
>>>>>   5:            0    (Unknown)
>>>>>   6:            0    (Instruction breakpoint)
>>>>>   7:            0    (Run mode exception)
>>>>>   8:            0    (Single-step exception)
>>>>>   9:            0    (Non-recoverable exception)
>>>>>  10:            1    (Software emulation)
>>>>>  11:            0    (Debug)
>>>>>  12:            0    (SPE)
>>>>>  13:            0    (Altivec assist)
>>>> Hm, I see a software emulation exception which is
>>>> also the reason for 
>>>> the illegal instructions. What toolchain do you
>> use?
>>>> The toolchain 
>>>> should support software FP emulation.
>>> 1)I am using open source too chain with software
>>> floating point emulation support.
>>> (#ppc_8xx-gcc --v
>>>
> /lib/gcc/powerpc/3.4.3/../../../../target_powerpc/usr/include/c++/3.4.3
>>> --with-numa-policy=no --with-float=soft)
>>>
>>> 2)And the kernel is included with code to emulate
>> a
>>> floating-point                                    
>>    
>>>          unit, which will allow programs that 
>>> use floating-point                                
>>    
>>>                           instructions to run
>>>
>>> Kernel configuration 
>>> ----CONFIG_MATH_EMULATION:y
>> If you build with "--with-float=soft" there is no
>> need for math 
>> emulation in the kernel. Likely, there is something
>> wrong with your 
>> tool-chain. Could you please try a known-to-work
>> tool-chain like the 
>> ELDK v4.x from http://www.denx.de.
>>
>> Wolfgang.
>>
>>> Thanks,
>>> Poornima
>>>
>>> --- Wolfgang Grandegger <wg@domain.hid> wrote:
>>>
>>>> poornima r wrote:
>>>>> Hi,
>>>>>
>>>>> 1)I am using open source kernel from Kernel.org,
>>>>> but what is meant by vanilla kernel from
>>>> Kernel.org?
>>>>
>>>> It's the kernel from kernel.org. This means that
>> the
>>>> Linux kernel 2.6.18 
>>>> is running fine on your MPC860 platform as is?
>>>> Thanks for the info.
>>>>
>>>>> 2)With sampling period of 500usec the system
>>>> simply
>>>>> hangs without printing any results (./latenct
>>>> -p500)
>>>>
>>>> OK.
>>>>
>>>>> 3)cyclictest with -t1 option (without
>>>> IPIPE-tracer)
>>>>> root@domain.hid#
>> ./cyclictest
>>>> -t1
>>>>> 2.04 0.50 0.17 8/27 174
>>>>>
>>>>> T: 0 (    0) P:99 I:    1000 C:       0 Min:
>>>> 1000000
>>>>> Act:       0 Avg:       0 Max:-1000000
>>>>> Illegal instruction
>>>>>
>>>>> 4)Output of /proc/xenomai/faults after the
>> illegal
>>>>> instruction:-
>>>>> root@domain.hid# cat
>>>>> /proc/xenomai/faults
>>>>> TRAP         CPU0
>>>>>   0:            0    (Data or instruction
>> access)
>>>>>   1:            0    (Alignment)
>>>>>   2:            0    (Altivec unavailable)
>>>>>   3:            0    (Program check exception)
>>>>>   4:            0    (Machine check exception)
>>>>>   5:            0    (Unknown)
>>>>>   6:            0    (Instruction breakpoint)
>>>>>   7:            0    (Run mode exception)
>>>>>   8:            0    (Single-step exception)
>>>>>   9:            0    (Non-recoverable exception)
>>>>>  10:            1    (Software emulation)
>>>>>  11:            0    (Debug)
>>>>>  12:            0    (SPE)
>>>>>  13:            0    (Altivec assist)
>>>> Hm, I see a software emulation exception which is
>>>> also the reason for 
>>>> the illegal instructions. What toolchain do you
>> use?
>>>> The toolchain 
>>>> should support software FP emulation.
>>>>
>>>>> 5)Running switchtest:-
>>>>> root@domain.hid#
>> ./switchtest
>>>> -n
>>>>> --The system hangs wihtout printing any results
>>>>>
>>>>> Thanks,
>>>>> Poornima
>>>>>
>>>>>
>>>>> --- Wolfgang Grandegger <wg@domain.hid>
>> wrote:
>>>>>> poornima r wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Thanks for the reply.
>>>>>>>
>>>>>>> Linux version:linux-2.6.18
>>>>>>> Xenomai: xenomai-2.3.0 (Stable version)
>>>>>>> adeos patch:
>> adeos-ipipe-2.6.18-ppc-1.5-01.patch
>>>>>> OK, I'm curious, did you use the vanilla kernel
>>>> from
>>>>>> kernel.org?
>>>>>> More comments below.
>>>>>>
>>>>>>> The tests were run as follows:
>>>>>>> 1)The sampling period in the code for latency
>>>> and
>>>>>>> switchbench was changed to 1000000000ns(to
>>>> remove
>>>>>>> overrun error) 
>>>>>>> 2)switchtest was run with -n5 option
>>>>>>> 3)cyclictest was run with  -t5 option(5
>> threads 
>>>>>>> were created.)
>>>>>>> 4)cyclictest was terminated with Illegal
>>>>>> instruction
>>>>>>> (after creating 5 threads) with IPIPE tracer
>>>>>> enabled.
>>>>>>
>>>>>>> These were the results without I-PIPE Tracer
>>>>>> option:
>>>>>>> (All the tests were run without any load)
>>>>>>> 1)LATENCY TEST:-
>>>>>>> User mode:-
>>>>>>> /mnt/out_xen/bin# ./latency -t0
>>>>>>> == Sampling period: 1000000 us
>>>>>>> == Test mode: periodic user-mode task
>>>>>>> == All results in microseconds
>>>>>>> warming up...
>>>>>>> RTT|  00:00:01  (periodic user-mode task,
>>>> 1000000
>>>>>> us
>>>>>>> period, priority 99)
>>>>>>> RTH|-----lat min|-----lat avg|-----lat
>>>>>>> max|-overrun|----lat best|---lat worst
>>>>>>> RTD|     167.000|     167.000|     167.000|   
>>  
>>>>>> 0|  
>>>>>>>   167.000|     167.000
>>>>>>> RTD|     176.000|     176.000|     176.000|   
>>  
>>>>>> 0|  
>>>>>>>   167.000|     176.000
>>>>>>> RTD|     168.000|     168.000|     168.000|   
>>  
>>>>>> 0|  
>>>>>>>   167.000|     176.000
> === message truncated ===
> 
> 
> 
>  
> 
> _______________________________________________
> Adeos-main mailing list
> Adeos-main@domain.hid
> https://mail.gna.org/listinfo/adeos-main
> 
> 




* Re: [Adeos-main] latency results for ppc and x86
  2007-02-21  7:13   ` Wolfgang Grandegger
@ 2007-02-21  9:33     ` poornima r
  2007-02-21  9:33       ` Nicholas Mc Guire
  2007-03-14 12:51     ` [Adeos-main] test results for switchtest and cyclictest on x86 poornima r
  1 sibling, 1 reply; 15+ messages in thread
From: poornima r @ 2007-02-21  9:33 UTC (permalink / raw)
  To: Wolfgang Grandegger; +Cc: adeos-main



Hello, 

Thanks for the reply.
These tests were actually run without loading the system.

Thanks,
Poornima
Wolfgang Grandegger <wg@domain.hid> wrote:

Hello,

poornima r wrote:
> Hello,
> 
> These were the scheduling latency and interrupt
> latency test results on ppc and x86 with IPIPE tracer
> option disabled.
> 
> 1.Please comment on these results (whether valid) and 

Your results are OK. These are actually the figures I remember from my 
own tests in the past.

> 2.Is there any method to optimize these results.

No that I know of. There are a few ideas how to reduce latencies further 
like cache locking or TLB pinning.

> 1)PPC:-
> (MPC-860 at 48 MHz, 4 kB I-Cache and 4 kB D-Cache) 
> 
> User mode:-
> root@domain.hid# ./latency -t0
> == Sampling period: 1000000 us
> == Test mode: periodic user-mode task
> == All results in microseconds
> warming up...
> RTT|  00:00:01  (periodic user-mode task, 1000000 us
> period, priority 99)
> RTH|-----lat min|-----lat
> avg|-----latmax|-overrun|----lat best|---lat worst
> RTD|     167.000|     167.000|     167.000|       0|  
>    167.000|     167.000
> RTD|     176.000|     176.000|     176.000|       0|  
>    167.000|     176.000
> RTD|     168.000|     168.000|     168.000|       0|  
>     167.000|    176.000
> RTD|     171.000|     171.000|     171.000|       0|  
>     167.000|    176.000
> 
> Kernel mode:-
> root@domain.hid# ./latency -t1
> == Sampling period: 1000000 us
> == Test mode: in-kernel periodic task
> == All results in microseconds
> warming up...
> RTT|  00:00:00  (in-kernel periodic task, 1000000 us
> period, priority 99)
> RTH|-----lat min|-----lat
> avg|-----latmax|-overrun|----lat best|---lat worst
> RTD|     123.000|     123.000|     123.000|       0|  
>     123.000|     123.000
> RTD|     125.000|     125.000|     125.000|       0|  
>     123.000|     125.000
> RTD|     128.333|     128.333|     128.333|       0|  
>     123.000|     128.333
> RTD|     127.000|     127.000|     127.000|       0|  
>     123.000|     128.333
> 
> Interrupt mode:-
> root@domain.hid# ./latency -t2
> == Sampling period: 1000000 us
> == Test mode: in-kernel timer handler
> == All results in microseconds
> warming up...
> RTT|  00:00:01  (in-kernel timer handler, 1000000 us
> period, priority 99)
> RTH|-----lat min|-----lat
> avg|-----latmax|-overrun|----lat best|---lat worst
> RTD|      45.334|      45.334|       45.334|       0| 
>          45.334|      45.334
> RTD|      45.000|      45.000|       45.000|       0| 
>          45.000|      45.334
> RTD|      46.000|      46.000|       46.000|       0| 
>          45.000|      46.000
> RTD|      47.334|      47.334|       47.334|       0| 
>          45.000|      47.334
> RTD|      46.334|      46.334|       46.334|       0| 
>          45.000|      47.334

On the MPC860, the latencies are mainly due code execution time as this 
processor is very slow.

> 2)X86:-
> (Pentium4, 3.06GHz, 1024 KB cache size)
> User mode:-
> Sampling period: 100 us
> == Test mode: in-kernel periodic task
> == All results in microseconds
> warming up...
> 
> RTT|  00:00:01  (periodic user-mode task, 100 us
> period, priority 99)
> RTH|-----lat min|-----lat avg|-----lat
> max|-overrun|----lat best|---lat worst
> RTD|       3.807|      12.825|      21.565|       0|  
>     3.807|      21.565
> RTD|       3.796|      12.792|      21.483|       0|  
>     3.796|      21.565
> RTD|       3.770|      12.799|      21.501|       0|  
>     3.770|      21.565
> RTD|       3.578|      12.806|      20.890|       0|  
>     3.578|      21.565
> RTD|       3.755|      12.809|      21.486|       0|  
>     3.578|
> 
> kernel mode:-
> Sampling period: 100 us
> == Test mode: in-kernel periodic task
> == All results in microseconds
> warming up...
> 
> RTT|  00:00:01  (in-kernel periodic task, 100 us
> period, priority 99)
> RTH|-----lat min|-----lat avg|-----lat
> max|-overrun|----lat best|---lat worst
> RTD|       2.381|       3.451|      19.620|       0|  
>     2.381|      19.620
> RTD|       2.332|       3.480|      19.930|       0|  
>     2.332|      19.930
> RTD|       2.382|       3.649|      19.609|       0|  
>     2.332|      19.930
> RTD|       2.323|       2.786|      14.351|       0|  
>     2.323|      19.930
> RTD|       2.375|       2.532|       5.519|       0|  
>     2.323|      19.930
> RTD|       2.332|       3.971|      19.617|       0|  
>     2.323|      19.930
> 
> Interrupt mode:-
> Sampling period: 100 us
> == Test mode: in-kernel timer handler
> == All results in microseconds
> warming up...
> 
> RTT|  00:00:01  (in-kernel timer handler, 100 us
> period, priority 99)
> RTH|-----lat min|-----lat avg|-----lat
> max|-overrun|----lat best|---lat worst
> RTD|      -1.563|       7.553|      15.736|       0|  
>    -1.563|      15.736
> RTD|      -1.579|       7.558|      15.804|       0|  
>    -1.579|      15.804
> RTD|      -1.584|       7.529|      16.167|       0|  
>    -1.584|      16.167
> RTD|      -1.548|       7.553|      16.186|       0|  
>    -1.584|      16.186
> RTD|      -1.585|       7.556|      16.275|       0|  
>    -1.585|      16.275

Latencies are mainly due to cache refills on the P4. Have you already 
put load onto your system? If not, worst case latencies will be even longer.

Wolfgang.

> Thanks,
> Poornima
> 
> 
> --- Wolfgang Grandegger  wrote:
> 
>> Hello,
>>
>> poornima r wrote:
>>> Hello,
>>>
>>> Srry for not replying all these days...
>>> (Was not in in station, may be too personal!!!!!)
>>>
>>> About software emulation error:
>>>
>>> 4)Output of /proc/xenomai/faults after the illegal
>>>>> instruction:-
>>>>> root@domain.hid# cat
>>>>> /proc/xenomai/faults
>>>>> TRAP         CPU0
>>>>>   0:            0    (Data or instruction
>> access)
>>>>>   1:            0    (Alignment)
>>>>>   2:            0    (Altivec unavailable)
>>>>>   3:            0    (Program check exception)
>>>>>   4:            0    (Machine check exception)
>>>>>   5:            0    (Unknown)
>>>>>   6:            0    (Instruction breakpoint)
>>>>>   7:            0    (Run mode exception)
>>>>>   8:            0    (Single-step exception)
>>>>>   9:            0    (Non-recoverable exception)
>>>>>  10:            1    (Software emulation)
>>>>>  11:            0    (Debug)
>>>>>  12:            0    (SPE)
>>>>>  13:            0    (Altivec assist)
>>>> Hm, I see a software emulation exception which is
>>>> also the reason for 
>>>> the illegal instructions. What toolchain do you
>> use?
>>>> The toolchain 
>>>> should support software FP emulation.
>>> 1)I am using open source too chain with software
>>> floating point emulation support.
>>> (#ppc_8xx-gcc --v
>>>
> /lib/gcc/powerpc/3.4.3/../../../../target_powerpc/usr/include/c++/3.4.3
>>> --with-numa-policy=no --with-float=soft)
>>>
>>> 2)And the kernel is included with code to emulate
>> a
>>> floating-point                                    
>>    
>>>          unit, which will allow programs that 
>>> use floating-point                                
>>    
>>>                           instructions to run
>>>
>>> Kernel configuration 
>>> ----CONFIG_MATH_EMULATION:y
>> If you build with "--with-float=soft" there is no
>> need for math 
>> emulation in the kernel. Likely, there is something
>> wrong with your 
>> tool-chain. Could you please try a known-to-work
>> tool-chain like the 
>> ELDK v4.x from http://www.denx.de.
>>
>> Wolfgang.
>>
>>> Thanks,
>>> Poornima
>>>
>>> --- Wolfgang Grandegger  wrote:
>>>
>>>> poornima r wrote:
>>>>> Hi,
>>>>>
>>>>> 1)I am using open source kernel from Kernel.org,
>>>>> but what is meant by vanilla kernel from
>>>> Kernel.org?
>>>>
>>>> It's the kernel from kernel.org. This means that
>> the
>>>> Linux kernel 2.6.18 
>>>> is running fine on your MPC860 platform as is?
>>>> Thanks for the info.
>>>>
>>>>> 2)With sampling period of 500usec the system
>>>> simply
>>>>> hangs without printing any results (./latenct
>>>> -p500)
>>>>
>>>> OK.
>>>>
>>>>> 3)cyclictest with -t1 option (without
>>>> IPIPE-tracer)
>>>>> root@domain.hid#
>> ./cyclictest
>>>> -t1
>>>>> 2.04 0.50 0.17 8/27 174
>>>>>
>>>>> T: 0 (    0) P:99 I:    1000 C:       0 Min:
>>>> 1000000
>>>>> Act:       0 Avg:       0 Max:-1000000
>>>>> Illegal instruction
>>>>>
>>>>> 4)Output of /proc/xenomai/faults after the
>> illegal
>>>>> instruction:-
>>>>> root@domain.hid# cat
>>>>> /proc/xenomai/faults
>>>>> TRAP         CPU0
>>>>>   0:            0    (Data or instruction
>> access)
>>>>>   1:            0    (Alignment)
>>>>>   2:            0    (Altivec unavailable)
>>>>>   3:            0    (Program check exception)
>>>>>   4:            0    (Machine check exception)
>>>>>   5:            0    (Unknown)
>>>>>   6:            0    (Instruction breakpoint)
>>>>>   7:            0    (Run mode exception)
>>>>>   8:            0    (Single-step exception)
>>>>>   9:            0    (Non-recoverable exception)
>>>>>  10:            1    (Software emulation)
>>>>>  11:            0    (Debug)
>>>>>  12:            0    (SPE)
>>>>>  13:            0    (Altivec assist)
>>>> Hm, I see a software emulation exception which is
>>>> also the reason for 
>>>> the illegal instructions. What toolchain do you
>> use?
>>>> The toolchain 
>>>> should support software FP emulation.
>>>>
>>>>> 5)Running switchtest:-
>>>>> root@domain.hid#
>> ./switchtest
>>>> -n
>>>>> --The system hangs without printing any results
>>>>>
>>>>> Thanks,
>>>>> Poornima
>>>>>
>>>>>
>>>>> --- Wolfgang Grandegger 
>> wrote:
>>>>>> poornima r wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Thanks for the reply.
>>>>>>>
>>>>>>> Linux version:linux-2.6.18
>>>>>>> Xenomai: xenomai-2.3.0 (Stable version)
>>>>>>> adeos patch:
>> adeos-ipipe-2.6.18-ppc-1.5-01.patch
>>>>>> OK, I'm curious, did you use the vanilla kernel
>>>> from
>>>>>> kernel.org?
>>>>>> More comments below.
>>>>>>
>>>>>>> The tests were run as follows:
>>>>>>> 1)The sampling period in the code for latency
>>>> and
>>>>>>> switchbench was changed to 1000000000ns(to
>>>> remove
>>>>>>> overrun error) 
>>>>>>> 2)switchtest was run with -n5 option
>>>>>>> 3)cyclictest was run with  -t5 option(5
>> threads 
>>>>>>> were created.)
>>>>>>> 4)cyclictest was terminated with Illegal
>>>>>> instruction
>>>>>>> (after creating 5 threads) with IPIPE tracer
>>>>>> enabled.
>>>>>>
>>>>>>> These were the results without I-PIPE Tracer
>>>>>> option:
>>>>>>> (All the tests were run without any load)
>>>>>>> 1)LATENCY TEST:-
>>>>>>> User mode:-
>>>>>>> /mnt/out_xen/bin# ./latency -t0
>>>>>>> == Sampling period: 1000000 us
>>>>>>> == Test mode: periodic user-mode task
>>>>>>> == All results in microseconds
>>>>>>> warming up...
>>>>>>> RTT|  00:00:01  (periodic user-mode task,
>>>> 1000000
>>>>>> us
>>>>>>> period, priority 99)
>>>>>>> RTH|-----lat min|-----lat avg|-----lat
>>>>>>> max|-overrun|----lat best|---lat worst
>>>>>>> RTD|     167.000|     167.000|     167.000|   
>>  
>>>>>> 0|  
>>>>>>>   167.000|     167.000
>>>>>>> RTD|     176.000|     176.000|     176.000|   
>>  
>>>>>> 0|  
>>>>>>>   167.000|     176.000
>>>>>>> RTD|     168.000|     168.000|     168.000|   
>>  
>>>>>> 0|  
>>>>>>>   167.000|     176.000
> === message truncated ===
> 
> 
> 
> 
> _______________________________________________
> Adeos-main mailing list
> Adeos-main@domain.hid
> https://mail.gna.org/listinfo/adeos-main
> 
> 



 

[-- Attachment #2: Type: text/html, Size: 16396 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Adeos-main] latency results for ppc and x86
  2007-02-21  9:33     ` poornima r
@ 2007-02-21  9:33       ` Nicholas Mc Guire
  2007-02-21 10:49         ` Jan Kiszka
  0 siblings, 1 reply; 15+ messages in thread
From: Nicholas Mc Guire @ 2007-02-21  9:33 UTC (permalink / raw)
  To: poornima r; +Cc: adeos-main, Wolfgang Grandegger

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

>
> Latencies are mainly due to cache refills on the P4. Have you already
> put load onto your system? If not, worst case latencies will be even longer.
>

one possibility we found in RTLinux/GPL to reduce latency is to free up
TLBs by flushing a few of the TLB hot spots; basically, these flushpoints
are something like:

__asm__ __volatile__("invlpg %0": :"m" (*(char*)__builtin_return_address(0)));

placed where we know we don't need those lines any more (i.e. after
switching tasks or the like). By inserting only a few such flushpoints in
hot code on the kernel side we found a clear reduction of the worst-case
jitter and interrupt response times.

Aside from caches, BTB exhaustion in high-load situations is also a 
problem that has not been addressed much in the realtime variants - with 
the P6 families having a botched BTB prediction unit, one can use some 
"strange" constructions to reduce branch penalties - i.e.:

   if(!condition){slow_path();}
   else{fast_path();}

is more predictable than

   if(condition){fast_path();}
   else{slow_path();}

as in the first case the branch prediction is static, thus the worst case
is that you are jumping over a few bytes of object code when the condition
is not met. In the second case, the default if the BTB does not yet know
this branch is to guess not-taken and thus load the jump target of the 
slow path with the overhead of TLB/cache penalties.

Regarding the PPC numbers, the surprising thing for me is that the same
archs are doing MUCH better with old RTAI/RTLinux versions, i.e. a 2.4.4
kernel on a 50MHz MPC860 shows a worst case of 57us - so I do question
what is going wrong here in the 2.6.X branches of hard-realtime Linux -
my suspicion is that there is too much work being done on fast-hot CPUs
and the low-end is being neglected - which is bad, as the numbers you
post here for ADEOS are numbers reachable with the mainstream preemptive
kernel by now as well (of course not on the low-end systems, though).

hofrat
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFF3BHqnU7rXZKfY2oRAoUIAJ9F+Y/uwanXyUPJlTJYyOQm2H0efgCeOcTM
Hh1/eLtu+SHeZpjlIVQMLgM=
=0PD6
-----END PGP SIGNATURE-----



* Re: [Adeos-main] latency results for ppc and x86
  2007-02-21 10:49         ` Jan Kiszka
@ 2007-02-21 10:26           ` Nicholas Mc Guire
  2007-02-21 12:29             ` Jan Kiszka
  0 siblings, 1 reply; 15+ messages in thread
From: Nicholas Mc Guire @ 2007-02-21 10:26 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: adeos-main, Nicholas Mc Guire, Wolfgang Grandegger

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

>>> Latencies are mainly due to cache refills on the P4. Have you already
>>> put load onto your system? If not, worst case latencies will be even
>>> longer.
>>
>>
>> one possibility we found in RTLinux/GPL to reduce latency is to free up
>> TLBs by flushing a few of the TLB hot spots; basically, these flushpoints
>> are something like:
>>
>> __asm__ __volatile__("invlpg %0": :"m"
>> (*(char*)__builtin_return_address(0)));
>>
>> placed where we know we don't need those lines any more (i.e.
>> after switching tasks or the like). By inserting only a few such
>> flushpoints in
>> hot code on the kernel side we found a clear reduction of the worst-case
>> jitter and interrupt response times.
>
> Interesting. Are these flushpoints present in latest kernel patches of
> RTLinux/GPL? Sounds like a nice thing to play with on a rainy day. :)
>

yup - basically if you look at the latest patches (2.4.33-rtl3.2) you
will find them in the kernel code, or in the rtlinux core code 
(rtl_core.c and rtl_sched.c). The concept is of course not 
restricted to 2.4.X kernels; note though that some archs (notably MIPS)
have a problem with __builtin_return_address.


>>
>> Aside from caches, BTB exhaustion in high load situations is also a
>> problem that has not been addressed much in the realtime variants - with
>> the P6 families having a botched BTB prediction unit, one can use some
>> "strange" constructions to reduce branch penalties - i.e.:
>>
>>   if(!condition){slow_path();}
>>   else{fast_path();}
>>
>> is more predictable than
>>
>>   if(condition){fast_path();}
>>   else{slow_path();}
>
> I think this is also what likely()/unlikely() teaches the
> compiler on x86 (where there is no branch prediction predicate for the
> instructions), isn't it?
>

no not really - likely/unlikely give hints during compilation to relocate
the unlikely part to a distant location (some label at the end of the 
file...) but that does not change the problem at runtime with respect to
the worst case. The BTB uses a hysteresis of one miss/hit to adjust the
guess on P6 systems, with the default (if the address is not present in
the BTB) of not taken - thus if you reorder so that the "not taken" case
is the fast path, you will always have the fast path preloaded in
the pipeline.

if(likely(condition)){
    fast_path();
} else {
    slow_path();
}

will be fast on average, but the worst case is that the address is not
in the BTB, so the slow_path() target is loaded by default.

There is a paper on this (a bit messy) published at RTLWS7 (Lille) 2005
if you are interested in the details.

>>
>> as in the first case the branch prediction is static, thus the worst case
>> is that you are jumping over a few bytes of object code when the condition
>> is not met. in the second case the default if the BTB does not yet know
>> this branch is to guess not-taken and thus load the jump target of the
>> slow path with the overhead of TLB/cache penalties.
>>
>> Regarding the PPC numbers, the surprising thing for me is that the same
>> archs are doing MUCH better with old RTAI/RTLinux versions, i.e. 2.4.4
>> kernel on a 50MHz MPC860 shows a worst case of 57us - so I do question
>> what is going wrong here in the 2.6.X branches of hard-realtime Linux -
>
> You forget that old stuff was kernel-only, lacking a lot of Linux
> integration features. Recent I-pipe-based real-time via Xenomai normally
> includes support for user-space RT (you can switch it off, but hardly
> anyone does). So it's not a useful comparison given that new real-time
> projects almost always want full-featured user space these days. For a
> fairer comparison, one should consider a simple I-pipe domain that
> contains the real-time "application".
>

note that the numbers posted here WERE kernel numbers!
I know that people want to move to user-space - but what is the advantage
over RT-preempt then, if you use the dynamic tick patch (scheduled to go
mainline in 2.6.21, BTW)?

>> my suspicion is that there is too much work being done on fast-hot CPUs
>> and the low-end is being neglected - which is bad as the numbers you
>> post here for ADEOS are numbers reachable with mainstream preemptive
>> kernel by now as well (of course not on the low end systems though).
>
> That's scenario-dependent. Simple setups like a plain timed task can
> reach the dimension of I-pipe-based Xenomai, but more complex scenarios
> suffer from the exploding complexity in mainstream Linux, even with -rt.
> Just think of "simple" mutexes realised via futexes.
>

do you have some code samples with numbers ? I would be very interested in
a demo that shows this problem - I was not able to really find a smoking
gun with RT-preempt and dynamic ticks (2.6.17.2).

hofrat
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFF3B5hnU7rXZKfY2oRAmrGAJwN6SK3pGLMBcxSa2MT9HGQv0q4+wCfZVuq
Yxaynkg4Bitl0uMlFug6Yak=
=5xzd
-----END PGP SIGNATURE-----



* Re: [Adeos-main] latency results for ppc and x86
  2007-02-21  9:33       ` Nicholas Mc Guire
@ 2007-02-21 10:49         ` Jan Kiszka
  2007-02-21 10:26           ` Nicholas Mc Guire
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Kiszka @ 2007-02-21 10:49 UTC (permalink / raw)
  To: Nicholas Mc Guire; +Cc: adeos-main, Wolfgang Grandegger

[-- Attachment #1: Type: text/plain, Size: 3198 bytes --]

Nicholas Mc Guire wrote:
> 
>> Latencies are mainly due to cache refills on the P4. Have you already
>> put load onto your system? If not, worst case latencies will be even
>> longer.
> 
> 
> one possibility we found in RTLinux/GPL to reduce latency is to free up
> TLBs by flushing a few of the TLB hot spots, basically these flushpoints
> are something like:
> 
> __asm__ __volatile__("invlpg %0": :"m"
> (*(char*)__builtin_return_address(0)));
> 
> put at places where we know we don't need those lines any more (i.e.
> after switching tasks or the like). By inserting only a few such
> flushpoints in
> hot code on the kernel side we found a clear reduction of the worst case
> jitter and interrupt response times.

Interesting. Are these flushpoints present in latest kernel patches of
RTLinux/GPL? Sounds like a nice thing to play with on a rainy day. :)

> 
> Aside from caches, BTB exhaustion in high load situations is also a
> problem that has not been addressed much in the realtime variants - with
> the P6 families having a botched BTB prediction unit, one can use some
> "strange" constructions to reduce branch penalties - i.e.:
> 
>   if(!condition){slow_path();}
>   else{fast_path();}
> 
> is more predictable than
> 
>   if(condition){fast_path();}
>   else{slow_path();}

I think this is also what likely()/unlikely() teaches the
compiler on x86 (where there is no branch prediction predicate for the
instructions), isn't it?

> 
> as in the first case the branch prediction is static, thus the worst case
> is that you are jumping over a few bytes of object code when the condition
> is not met. in the second case the default if the BTB does not yet know
> this branch is to guess not-taken and thus load the jump target of the
> slow path with the overhead of TLB/cache penalties.
> 
> Regarding the PPC numbers, the surprising thing for me is that the same
> archs are doing MUCH better with old RTAI/RTLinux versions, i.e. 2.4.4
> kernel on a 50MHz MPC860 shows a worst case of 57us - so I do question
> what is going wrong here in the 2.6.X branches of hard-realtime Linux -

You forget that old stuff was kernel-only, lacking a lot of Linux
integration features. Recent I-pipe-based real-time via Xenomai normally
includes support for user-space RT (you can switch it off, but hardly
anyone does). So it's not a useful comparison given that new real-time
projects almost always want full-featured user space these days. For a
fairer comparison, one should consider a simple I-pipe domain that
contains the real-time "application".

> my suspicion is that there is too much work being done on fast-hot CPUs
> and the low-end is being neglected - which is bad as the numbers you
> post here for ADEOS are numbers reachable with mainstream preemptive
> kernel by now as well (of course not on the low end systems though).

That's scenario-dependent. Simple setups like a plain timed task can
reach the dimension of I-pipe-based Xenomai, but more complex scenarios
suffer from the exploding complexity in mainstream Linux, even with -rt.
Just think of "simple" mutexes realised via futexes.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 250 bytes --]


* Re: [Adeos-main] latency results for ppc and x86
  2007-02-21 12:29             ` Jan Kiszka
@ 2007-02-21 12:14               ` Nicholas Mc Guire
  2007-02-21 13:51                 ` Jan Kiszka
  0 siblings, 1 reply; 15+ messages in thread
From: Nicholas Mc Guire @ 2007-02-21 12:14 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: adeos-main, Nicholas Mc Guire, Wolfgang Grandegger

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

>> the unlikely part to a distant location (some label at the end of the
>> file...) but that does not change the problem at runtime with respect to
>> the worst case. The BTB uses a hysteresis of one miss/hit to adjust the
>> guess on P6 systems with the default (if the address is not present in
>> the BTB) of not taken - thus if you reorder for the "not taken" case
>> being the fast path you will always have the fast path preloaded in
>> the pipeline.
>>
>> if(likely(condition)){
>>    fast_path();
>> } else {
>>    slow_path();
>> }
>>
>> will be fast on average but the worst case is that the address is not
>> in the BTB so the slow_path() target is loaded by default.
>
> Ah, got the idea. How much arch/processor-type-dependent is this
> optimisation? It would surely makes no sense to optimise for arch X in
> generic code.
>

that's the problem - it is very x86-centric: P6 and AMD Duron/K7.

>>
>>> You forget that old stuff was kernel-only, lacking a lot of Linux
>>> integration features. Recent I-pipe-based real-time via Xenomai normally
>>> includes support for user-space RT (you can switch it off, but hardly
>>> anyone does). So it's not a useful comparison given that new real-time
>>> projects almost always want full-featured user space these days. For a
>>> fairer comparison, one should consider a simple I-pipe domain that
>>> contains the real-time "application".
>>
>>
>> note that the numbers posted here WERE kernel numbers !
>
> But with user space support enabled. There are no separate code paths
> for kernel and user space threads, basic infrastructure is shared here
> for good reasons.
>
>> I know that people want to move to user-space - but what is the advantage
>> over RT-preempt then if you use the dynamic tick patch (scheduled to go
>> mainline in 2.6.21 BTW) ?
>
> So far, determinism (both /wrt mainline and latest -rt).
>
> BTW, kernel space real time is specifically no longer recommendable for
> commercial projects that have to worry about the (likely non-GPL)
> license of their application code. And then there are those countless
> technical advantages that speed up the development process of user space
> apps.
>

well, I don't see that advantage at this point - determinism seems to be
in the same range as you get on ADEOS-based systems. That there is a
move towards user-space is clear.

>>
>>>> my suspicion is that there is too much work being done on fast-hot CPUs
>>>> and the low-end is being neglected - which is bad as the numbers you
>>>> post here for ADEOS are numbers reachable with mainstream preemptive
>>>> kernel by now as well (of course not on the low end systems though).
>>
>>> That's scenario-dependent. Simple setups like a plain timed task can
>>> reach the dimension of I-pipe-based Xenomai, but more complex scenarios
>>> suffer from the exploding complexity in mainstream Linux, even with -rt.
>>> Just think of "simple" mutexes realised via futexes.
>>
>>
>> do you have some code samples with numbers ? I would be very interested in
>> a demo that shows this problem - I was not able to really find a smoking
>> gun with RT-preempt and dynamic ticks (2.6.17.2).
>
> I can't help with demo code, but I can name a few conceptual issues:
>
> o Futexes may require to allocate memory when suspending on a contended
>   lock (refill_pi_state_cache)
> o Futexes depend on mmap_sem

ok - that's a nice one

> o Preemptible RCU read-sides can either lead to OOM or require
>   intrusive read-side priority boosting (see Paul McKenney's LWN
>   article)
> o Excessive lock nesting depths in critical code paths makes it hard to
>   predict worst-case behaviour (or to verify that measurements actually
>   already triggered them)

well, that's true for ADEOS/RTAI/RTLinux as well - we are also only 
black-box testing the RT-kernel - there currently is absolutely NO
proof for worst-case timing in any of the flavours of RT-Linux.

> o Any nanosleep&friends-using Linux process can schedule hrtimers at
>   arbitrary dates, requiring to have a pretty close look at the
>   (worst-case) timer usage pattern of the _whole_ system, not only the
>   SCHED_FIFO/RR part

true - but resource overload hits all flavours - and the split of
timers and timeouts in 2.6.18+ does reduce the risk clearly.

>
> That's what I can tell from the heart. But one would have to analyse the
> code more thoroughly I guess.
>
thanks for the input - at Embedded World, Thomas Gleixner 
demonstrated a simple control system that could sustain sub-10us
scheduling jitter under load, based on the latest rt-preempt + a bit
of tuning I guess (actually I don't know). The essence for me is that with 
the work in 2.6.X I don't see the big performance jump provided by the
hard-RT variants around - especially with respect to guaranteed worst
case (and not only "black-box" results).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFF3De1nU7rXZKfY2oRAnjkAJ9jsT6PAhwlY6Wu8a3wddTjHbcWZQCgn6cZ
8Ve6WL2E+QuENP9ezT0I3HU=
=hSbF
-----END PGP SIGNATURE-----



* Re: [Adeos-main] latency results for ppc and x86
  2007-02-21 10:26           ` Nicholas Mc Guire
@ 2007-02-21 12:29             ` Jan Kiszka
  2007-02-21 12:14               ` Nicholas Mc Guire
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Kiszka @ 2007-02-21 12:29 UTC (permalink / raw)
  To: Nicholas Mc Guire; +Cc: adeos-main, Wolfgang Grandegger

[-- Attachment #1: Type: text/plain, Size: 6592 bytes --]

Nicholas Mc Guire wrote:
>>>> Latencies are mainly due to cache refills on the P4. Have you already
>>>> put load onto your system? If not, worst case latencies will be even
>>>> longer.
>>>
>>>
>>> one possibility we found in RTLinux/GPL to reduce latency is to free up
>>> TLBs by flushing a few of the TLB hot spots, basically these flushpoints
>>> are something like:
>>>
>>> __asm__ __volatile__("invlpg %0": :"m"
>>> (*(char*)__builtin_return_address(0)));
>>>
>>> put at places where we know we don't need those lines any more (i.e.
>>> after switching tasks or the like). By inserting only a few such
>>> flushpoints in
>>> hot code on the kernel side we found a clear reduction of the worst case
>>> jitter and interrupt response times.
> 
>> Interesting. Are these flushpoints present in latest kernel patches of
>> RTLinux/GPL? Sounds like a nice thing to play with on a rainy day. :)
> 
> 
> yup - basically if you look at the latest patches (2.4.33-rtl3.2) you
> will find them in the kernel code, or in the rtlinux core code
> (rtl_core.c and rtl_sched.c). The concept is of course not restricted
> to 2.4.X kernels; note though that some archs (notably MIPS)
> have a problem with __builtin_return_address.

OK, thanks.

> 
> 
>>>
>>> Aside from caches, BTB exhaustion in high load situations is also a
>>> problem that has not been addressed much in the realtime variants - with
>>> the P6 families having a botched BTB prediction unit, one can use some
>>> "strange" constructions to reduce branch penalties - i.e.:
>>>
>>>   if(!condition){slow_path();}
>>>   else{fast_path();}
>>>
>>> is more predictable than
>>>
>>>   if(condition){fast_path();}
>>>   else{slow_path();}
> 
>> I think this is also what likely()/unlikely() teaches the
>> compiler on x86 (where there is no branch prediction predicate for the
>> instructions), isn't it?
> 
> 
> no not really - likely/unlikely give hints during compilation to relocate
> the unlikely part to a distant location (some label at the end of the
> file...) but that does not change the problem at runtime with respect to
> the worst case. The BTB uses a hysteresis of one miss/hit to adjust the
> guess on P6 systems with the default (if the address is not present in
> the BTB) of not taken - thus if you reorder for the "not taken" case
> being the fast path you will always have the fast path preloaded in
> the pipeline.
> 
> if(likely(condition)){
>    fast_path();
> } else {
>    slow_path();
> }
> 
> will be fast on average but the worst case is that the address is not
> in the BTB so the slow_path() target is loaded by default.

Ah, got the idea. How much arch/processor-type-dependent is this
optimisation? It would surely make no sense to optimise for arch X in
generic code.

> 
> There is a paper on this (a bit messy) published at RTLWS7 (Lile) 2005
> if you are interested in the details.
> 
>>>
>>> as in the first case the branch prediction is static, thus the worst
>>> case
>>> is that you are jumping over a few bytes of object code when the
>>> condition
>>> is not met. in the second case the default if the BTB does not yet know
>>> this branch is to guess not-taken and thus load the jump target of the
>>> slow path with the overhead of TLB/cache penalties.
>>>
>>> Regarding the PPC numbers, the surprising thing for me is that the same
>>> archs are doing MUCH better with old RTAI/RTLinux versions, i.e. 2.4.4
>>> kernel on a 50MHz MPC860 shows a worst case of 57us - so I do question
>>> what is going wrong here in the 2.6.X branches of hard-realtime Linux -
> 
>> You forget that old stuff was kernel-only, lacking a lot of Linux
>> integration features. Recent I-pipe-based real-time via Xenomai normally
>> includes support for user-space RT (you can switch it off, but hardly
>> anyone does). So it's not a useful comparison given that new real-time
>> projects almost always want full-featured user space these days. For a
>> fairer comparison, one should consider a simple I-pipe domain that
>> contains the real-time "application".
> 
> 
> note that the numbers posted here WERE kernel numbers !

But with user space support enabled. There are no separate code paths
for kernel and user space threads, basic infrastructure is shared here
for good reasons.

> I know that people want to move to user-space - but what is the advantage
> over RT-preempt then if you use the dynamic tick patch (scheduled to go
> mainline in 2.6.21 BTW) ?

So far, determinism (both /wrt mainline and latest -rt).

BTW, kernel space real time is specifically no longer recommendable for
commercial projects that have to worry about the (likely non-GPL)
license of their application code. And then there are those countless
technical advantages that speed up the development process of user space
apps.

> 
>>> my suspicion is that there is too much work being done on fast-hot CPUs
>>> and the low-end is being neglected - which is bad as the numbers you
>>> post here for ADEOS are numbers reachable with mainstream preemptive
>>> kernel by now as well (of course not on the low end systems though).
> 
>> That's scenario-dependent. Simple setups like a plain timed task can
>> reach the dimension of I-pipe-based Xenomai, but more complex scenarios
>> suffer from the exploding complexity in mainstream Linux, even with -rt.
>> Just think of "simple" mutexes realised via futexes.
> 
> 
> do you have some code samples with numbers ? I would be very interested in
> a demo that shows this problem - I was not able to really find a smoking
> gun with RT-preempt and dynamic ticks (2.6.17.2).

I can't help with demo code, but I can name a few conceptual issues:

 o Futexes may need to allocate memory when suspending on a contended
   lock (refill_pi_state_cache)
 o Futexes depend on mmap_sem
 o Preemptible RCU read-sides can either lead to OOM or require
   intrusive read-side priority boosting (see Paul McKenney's LWN
   article)
 o Excessive lock nesting depths in critical code paths makes it hard to
   predict worst-case behaviour (or to verify that measurements actually
   already triggered them)
 o Any nanosleep&friends-using Linux process can schedule hrtimers at
   arbitrary dates, requiring to have a pretty close look at the
   (worst-case) timer usage pattern of the _whole_ system, not only the
   SCHED_FIFO/RR part

That's what I can tell from the heart. But one would have to analyse the
code more thoroughly I guess.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 250 bytes --]


* Re: [Adeos-main] latency results for ppc and x86
  2007-02-21 12:14               ` Nicholas Mc Guire
@ 2007-02-21 13:51                 ` Jan Kiszka
  2007-02-21 14:52                   ` Wolfgang Grandegger
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Kiszka @ 2007-02-21 13:51 UTC (permalink / raw)
  To: Nicholas Mc Guire; +Cc: adeos-main, Wolfgang Grandegger

[-- Attachment #1: Type: text/plain, Size: 4524 bytes --]

Nicholas Mc Guire wrote:
>>> I know that people want to move to user-space - but what is the
>>> advantage
>>> over RT-preempt then if you use the dynamic tick patch (scheduled to go
>>> mainline in 2.6.21 BTW) ?
> 
>> So far, determinism (both /wrt mainline and latest -rt).
> 
>> BTW, kernel space real time is specifically no longer recommendable for
>> commercial projects that have to worry about the (likely non-GPL)
>> license of their application code. And then there are those countless
>> technical advantages that speed up the development process of user space
>> apps.
> 
> 
> well I don't see that advantage at this point - determinism seems to be
> in the same range as you get on ADEOS based systems. That there is a
> move towards user-space is clear.

Yeah, it /seems/...

> 
>>>
>>>>> my suspicion is that there is too much work being done on fast-hot
>>>>> CPUs
>>>>> and the low-end is being neglected - which is bad as the numbers you
>>>>> post here for ADEOS are numbers reachable with mainstream preemptive
>>>>> kernel by now as well (of course not on the low end systems though).
>>>
>>>> That's scenario-dependent. Simple setups like a plain timed task can
>>>> reach the dimension of I-pipe-based Xenomai, but more complex scenarios
>>>> suffer from the exploding complexity in mainstream Linux, even with
>>>> -rt.
>>>> Just think of "simple" mutexes realised via futexes.
>>>
>>>
>>> do you have some code samples with numbers ? I would be very
>>> interested in
>>> a demo that shows this problem - I was not able to really find a smoking
>>> gun with RT-preempt and dynamic ticks (2.6.17.2).
> 
>> I can't help with demo code, but I can name a few conceptual issues:
> 
>> o Futexes may require to allocate memory when suspending on a contended
>>   lock (refill_pi_state_cache)
>> o Futexes depend on mmap_sem
> 
> ok - that's a nice one
> 
>> o Preemptible RCU read-sides can either lead to OOM or require
>>   intrusive read-side priority boosting (see Paul McKenney's LWN
>>   article)
>> o Excessive lock nesting depths in critical code paths makes it hard to
>>   predict worst-case behaviour (or to verify that measurements actually
>>   already triggered them)
> 
> well, that's true for ADEOS/RTAI/RTLinux as well - we are also only
> black-box testing the RT-kernel - there currently is absolutely NO
> proof for worst-case timing in any of the flavours of RT-Linux.

Nope, it isn't. There are neither sleeping nor spinning lock nesting
depths of that kind in Xenomai or Adeos/I-pipe (or older RT extensions,
AFAIK) - ok, except for one spot in a driver we have scheduled for
re-design already.

> 
>> o Any nanosleep&friends-using Linux process can schedule hrtimers at
>>   arbitrary dates, requiring to have a pretty close look at the
>>   (worst-case) timer usage pattern of the _whole_ system, not only the
>>   SCHED_FIFO/RR part
> 
> true - but resource overload hits all flavours - and the split of
> timers and timeouts in 2.6.18+ does reduce the risk clearly.

Compared to making all Linux timers hrtimers? Yes, for sure. But that
would be an insane idea anyway, just considering all the network-related
timers.

> 
>> That's what I can tell from the heart. But one would have to analyse the
>> code more thoroughly I guess.
> 
> thanks for the input - at Embedded World, Thomas Gleixner
> demonstrated a simple control system that could sustain sub-10us
> scheduling jitter under load, based on the latest rt-preempt + a bit
> of tuning I guess (actually I don't know).

Without knowing the test (Wolfgang, did you see it?), I would guess the
setup as follows: dual-core GHz Pentium, isolated core for the timed
task, no peripheral interaction, no synchronisation means, likely even
no further syscalls except for the sleep service. Surely progress over
plain Linux, but that one's only useful for very specific scenarios.

No one claims -rt is not useful or too limited. Each approach has its
preferred application domain. Knowing the strengths and weaknesses of both is
required here. And providing the user the choice (like Xenomai 3 will).

> The essence for me is that with
> the work in 2.6.X I don't see the big performance jump provided by teh
> hard-RT variants around - especially with respect to guaranteed worst
> case (and not only "black-box" results).

Could it be a bit too enthusiastic to base such an assessment on a
corner-case demonstration?

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 250 bytes --]


* Re: [Adeos-main] latency results for ppc and x86
  2007-02-21 13:51                 ` Jan Kiszka
@ 2007-02-21 14:52                   ` Wolfgang Grandegger
  2007-02-21 15:10                     ` Nicholas Mc Guire
  0 siblings, 1 reply; 15+ messages in thread
From: Wolfgang Grandegger @ 2007-02-21 14:52 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: adeos-main, Nicholas Mc Guire

Jan Kiszka wrote:
> Nicholas Mc Guire wrote:
>>>> I know that people want to move to user-space - but what is the
>>>> advantage
>>>> over RT-preempt then if you use the dynamic tick patch (scheduled to go
>>>> mainline in 2.6.21 BTW) ?
>>> So far, determinism (both /wrt mainline and latest -rt).
>>> BTW, kernel space real time is specifically no longer recommendable for
>>> commercial projects that have to worry about the (likely non-GPL)
>>> license of their application code. And then there are those countless
>>> technical advantages that speed up the development process of user space
>>> apps.
>>
>> well I don't see that advantage at this point - determinism seems to be
>> in the same range as you get on ADEOS based systems. That there is a
>> move towards user-space is clear.
> 
> Yeah, it /seems/...
> 
>>>>>> my suspicion is that there is too much work being done on fast-hot
>>>>>> CPUs
>>>>>> and the low-end is being neglected - which is bad as the numbers you
>>>>>> post here for ADEOS are numbers reachable with mainstream preemptive
>>>>>> kernel by now as well (off course not on the low end systems though).
>>>>> That's scenario-dependent. Simple setups like a plain timed task can
>>>>> reach the dimension of I-pipe-based Xenomai, but more complex scenarios
>>>>> suffer from the exploding complexity in mainstream Linux, even with
>>>>> -rt.
>>>>> Just think of "simple" mutexes realised via futexes.
>>>>
>>>> do you have some code samples with numbers ? I would be very
>>>> interested in
>>>> a demo that shows this problem - I was not able to really find a smoking
>>>> gun with RT-preempt and dynamic ticks (2.6.17.2).
>>> I can't help with demo code, but I can name a few conceptual issues:
>>> o Futexes may require to allocate memory when suspending on a contented
>>>   lock (refill_pi_state_cache)
>>> o Futexes depend on mmap_sem
>> ok - thats a nice one
>>
>>> o Preemptible RCU read-sides can either lead to OOM or require
>>>   intrusive read-side priority boosting (see Paul McKenney's LWN
>>>   article)
>>> o Excessive lock nesting depths in critical code paths makes it hard to
>>>   predict worst-case behaviour (or to verify that measurements actually
>>>   already triggered them)
>> well thats true for ADEOS/RTAI/RTLinux as well - we are also only
>> black-box testing the RT-kernel - there currently is absolutley NO
>> prof for worst-case timing in any of the flavours of RT-Linux.
> 
> Nope, it isn't. There are neither sleeping not spinning lock nesting
> depths of that kind in Xenomai or Adeos/I-pipe (or older RT extensions,
> AFAIK) - ok, except for one spot in a driver we have scheduled for
> re-design already.
> 
>>> o Any nanosleep&friends-using Linux process can schedule hrtimers at
>>>   arbitrary dates, requiring to have a pretty close look at the
>>>   (worst-case) timer usage pattern of the _whole_ system, not only the
>>>   SCHED_FIFO/RR part
>> true - but resource overload hits all flavours - and the splitt of
>> timers and timeouts in 2.6.18++ does reduce the risk clearly.
> 
> Compared to making all Linux timers hrtimers? Yes, for sure. But that
> would be an insane idea anyway, just considering all the network-related
> timers.
> 
>>> That's what I can tell from the heart. But one would have to analyse the
>>> code more thoroughly I guess.
>> thanks for the imput - at the embedded world Thomas Gleixner
>> demonstrated a simple control system that could sustain sub 10us
>> scheduling jitter under load based on the latest rt-preempt + a bit
>> of tuning I guess (actually don't know).
> 
> Without knowing the test (Wolfgang, did you see it?), I would guess the
> setup as follows: dual-core GHz Pentium, isolated core for the timed
> task, no peripheral interaction, no synchronisation means, likely even
> no further syscalls except for the sleep service. Surely a progress over
> plain Linux, but that one's only useful for very specific scenarios.

No, I have not seen it. But I believe that with careful hardware
selection it's possible to achieve that. On high-end systems the latency
is dominated by the hardware. On low-end systems code size matters. So
far I have not seen any serious comparison for low-end Linux systems,
and -rt does not work yet on PowerPC (the high-res timer support is
still missing).

> No one claims -rt is not useful or too limited. Each approach has its
> preferred application domain. Knowing strength and weaknesses of both is
> required here. And providing the user the choice (like Xenomai 3 will).
> 
>> The essence for me is that with
>> the work in 2.6.X I don't see the big performance jump provided by teh
>> hard-RT variants around - especially with respect to guaranteed worst
>> case (and not only "black-box" results).
> 
> Could it be a bit too enthusiastic to base such an assessment on a
> corner-case demonstration?
> 
> Jan
> 




* Re: [Adeos-main] latency results for ppc and x86
  2007-02-21 14:52                   ` Wolfgang Grandegger
@ 2007-02-21 15:10                     ` Nicholas Mc Guire
  2007-02-21 18:27                       ` Jan Kiszka
  0 siblings, 1 reply; 15+ messages in thread
From: Nicholas Mc Guire @ 2007-02-21 15:10 UTC (permalink / raw)
  To: Wolfgang Grandegger; +Cc: Nicholas Mc Guire, adeos-main

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

>>>>> 
>>>>> do you have some code samples with numbers ? I would be very
>>>>> interested in
>>>>> a demo that shows this problem - I was not able to really find a 
>>>>> smoking
>>>>> gun with RT-preempt and dynamic ticks (2.6.17.2).
>>>> I can't help with demo code, but I can name a few conceptual issues:
>>>> o Futexes may require to allocate memory when suspending on a 
>>>> contented
>>>>   lock (refill_pi_state_cache)
>>>> o Futexes depend on mmap_sem
>>> ok - thats a nice one
>>> 
>>>> o Preemptible RCU read-sides can either lead to OOM or require
>>>>   intrusive read-side priority boosting (see Paul McKenney's LWN
>>>>   article)
>>>> o Excessive lock nesting depths in critical code paths makes it hard 
>>>> to
>>>>   predict worst-case behaviour (or to verify that measurements 
>>>> actually
>>>>   already triggered them)
>>> well thats true for ADEOS/RTAI/RTLinux as well - we are also only
>>> black-box testing the RT-kernel - there currently is absolutley NO
>>> prof for worst-case timing in any of the flavours of RT-Linux.
>> 
>> Nope, it isn't. There are neither sleeping not spinning lock nesting
>> depths of that kind in Xenomai or Adeos/I-pipe (or older RT extensions,
>> AFAIK) - ok, except for one spot in a driver we have scheduled for
>> re-design already.

that might be so - nevertheless there is no formal proof that the worst
case of ADEOS/I-pipe is X microseconds; the latency/jitter numbers are
based on black-box testing. In fact, one problem is that there are not
even code-coverage tools (or I just did not find them) that can provide
coverage data for ADEOS - so how can one guarantee a worst case?

>> 
>>>> o Any nanosleep&friends-using Linux process can schedule hrtimers at
>>>>   arbitrary dates, requiring to have a pretty close look at the
>>>>   (worst-case) timer usage pattern of the _whole_ system, not only the
>>>>   SCHED_FIFO/RR part
>>> true - but resource overload hits all flavours - and the splitt of
>>> timers and timeouts in 2.6.18++ does reduce the risk clearly.
>> 
>> Compared to making all Linux timers hrtimers? Yes, for sure. But that
>> would be an insane idea anyway, just considering all the network-related
>> timers.

well they were all on one timer wheel not too long ago - and yes - it was
insane ;)

>> 
>>>> That's what I can tell from the heart. But one would have to analyse 
>>>> the
>>>> code more thoroughly I guess.
>>> thanks for the imput - at the embedded world Thomas Gleixner
>>> demonstrated a simple control system that could sustain sub 10us
>>> scheduling jitter under load based on the latest rt-preempt + a bit
>>> of tuning I guess (actually don't know).
>> 
>> Without knowing the test (Wolfgang, did you see it?), I would guess the
>> setup as follows: dual-core GHz Pentium, isolated core for the timed
>> task, no peripheral interaction, no synchronisation means, likely even
>> no further syscalls except for the sleep service. Surely a progress over
>> plain Linux, but that one's only useful for very specific scenarios.
>
> No, I have not seen it. But I believe, that with careful hardware selection 
> it's possible to achieve that. On high-end systems the latency is dominate by 
> hardware. On low-end systems code size matters. So far I have not seen any 
> serious comparison for low-end Linux systems and -rt does not work yet on 
> PowerPC (the high-res support is still missing).

I did some on low-end x86 (ELAN SC520 at 133 MHz); the results are not
going to make many happy at this point (2.6.14-rt9 was my last test),
but I still have to run benchmarks with dynamic tick and some of the
tglx patches on low-end x86. The fact that, again, all archs except x86
are lagging behind is of course a key issue at this point.

>
>> No one claims -rt is not useful or too limited. Each approach has its
>> preferred application domain. Knowing strength and weaknesses of both is
>> required here. And providing the user the choice (like Xenomai 3 will).
>> 
>>> The essence for me is that with
>>> the work in 2.6.X I don't see the big performance jump provided by teh
>>> hard-RT variants around - especially with respect to guaranteed worst
>>> case (and not only "black-box" results).
>> 
>> Could it be a bit too enthusiastic to base such an assessment on a
>> corner-case demonstration?

it's not a corner-case demonstration; I've been doing benchmarks on
RT-preempt for quite some time now. There is still an advantage if you
run simple comparisons (jitter measurements), but it is clearly
shrinking. The problem I have with RT-preempt being at 50us and ADEOS at
15us is simply that the sector that actually needs the numbers
RT-preempt will most likely never reach is generally interested in
guaranteed times, and that's where it becomes tough to argue for any of
the hard-realtime extensions at this point. That is not to say
RT-preempt can replace ADEOS/RTAI/RTLinux-gpl; I'm just saying that the
numbers are no longer 2/3 orders of magnitude apart, as they were in
2.2.X/2.4.X, when arguing the case was simple.

Don't get me wrong, I'm not trying to argue away ADEOS/RTAI, or I would
have given up RTLinux/GPL quite some time ago - but I believe that if
these low-jitter/latency systems want to keep their acceptance in
industry, a key issue will be to improve the tools for
verification/validation - just take this discussion - it started out
with:

<snip>
> RTD|      -1.585|       7.556|      16.275|       0|
>    -1.585|      16.275

Latencies are mainly due to cache refills on the P4. Have you already
put load onto your system? If not, worst case latencies will be even 
longer.

<snip>

  THAT is a problem in arguing for ADEOS/I-pipe - WHAT is the worst case
now? What is the cause of the worst case? And can I really demonstrate
with strong evidence that the worst case on this system is actually XXXX
microseconds under arbitrary load and will not be higher in some strange
corner cases?

hofrat
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFF3GD9nU7rXZKfY2oRAptQAJ4iwYoaJtfTds9am4Gwxl6xSNqR1gCfcAMC
7p9PWIJ8a6mOErrMFGQ4MbI=
=QTjQ
-----END PGP SIGNATURE-----



* Re: [Adeos-main] latency results for ppc and x86
  2007-02-21 15:10                     ` Nicholas Mc Guire
@ 2007-02-21 18:27                       ` Jan Kiszka
  2007-02-21 19:07                         ` Nicholas Mc Guire
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Kiszka @ 2007-02-21 18:27 UTC (permalink / raw)
  To: Nicholas Mc Guire; +Cc: adeos-main, Wolfgang Grandegger

[-- Attachment #1: Type: text/plain, Size: 5363 bytes --]

Nicholas Mc Guire wrote:
>>>> well thats true for ADEOS/RTAI/RTLinux as well - we are also only
>>>> black-box testing the RT-kernel - there currently is absolutley NO
>>>> prof for worst-case timing in any of the flavours of RT-Linux.
>>>
>>> Nope, it isn't. There are neither sleeping not spinning lock nesting
>>> depths of that kind in Xenomai or Adeos/I-pipe (or older RT extensions,
>>> AFAIK) - ok, except for one spot in a driver we have scheduled for
>>> re-design already.
> 
> that might be so - never the less there is no formal-proof that the worst
> case of ADEOS/I-pipe is X-microseconds, the latency/jitter numbers are
> based on black-box testing. In fact one problem is that there are not even
> code-coverage tools (or I just did not find them) that can provide
> coverage data for ADEOS - thus how can one guarantee worst-case ?

The fact that tool support is "improvable" doesn't mean that such an
analysis is impossible. You may over-estimate, but you can derive
numbers for a given system (consisting of real-time core + RT
applications) based on a combined offline system analysis and runtime
measurements. But hardly anyone is doing this "for fun".

>>>> The essence for me is that with
>>>> the work in 2.6.X I don't see the big performance jump provided by teh
>>>> hard-RT variants around - especially with respect to guaranteed worst
>>>> case (and not only "black-box" results).
>>>
>>> Could it be a bit too enthusiastic to base such an assessment on a
>>> corner-case demonstration?
> 
> its not a corener case demonstration, Ive been doing benchmarks on rt
> preempt now for quite some time, there is still an advantage if you run
> simple comparisons (jitter measurements) - but it is clearly going down,
> The problem I have with RT-preempt being 50us and ADEOS is 15us is
> simply that the sector that does need those numbers that RT-preempt will
> most likely
> never reach is generally interested in guaranteed times, and thats where
> it becomes tough to argue any of the hard-realtime extensions at this
> point - that is not saying RT-preempt can replace ADEOS/RTAI/RTLinux-gpl
> Im just saying that the numbers are no longer 2/3 orders of
> magnitude,which they were in 2.2.X/2.4.X and where arguing the use was
> simple.

Granted, arguing becomes more hairy when you have to pull out low-level
system details like I posted (instead of discussing individual issues of
certain patches). There are scenarios where I would recommend -rt as
well, but so far only a few where the RT extensions would fit as well.

> 
> Don't get me wrong Im not trying to argue away ADEOS/RTAI or I would
> have given up RTLinux/GPL quite some time ago - but I belive if these
> low-jitter/latency systems want to keep there acceptance in industry a
> key issue will be to improve the tools for verification/validation -

Ack, and I'm sure they will emerge over time. I don't expect this to
happen just because someone enjoys it (adding features is always more
fun), but because users will at some point really need them. It's a
process that will derive from the steadily growing professional user
base in both industry and academia.

> just take this discussion - it started out with:
> 
> <snip>
>> RTD|      -1.585|       7.556|      16.275|       0|
>>    -1.585|      16.275
> 
> Latencies are mainly due to cache refills on the P4. Have you already
> put load onto your system? If not, worst case latencies will be even
> longer.

As pointed out earlier in this thread, those numbers don't tell much
without appropriate load and a significant runtime. We maintain
documentation on this in Xenomai, but it may be too tricky to find. And
as always, such a test only represents one simple snapshot. At the very
least you have to redo it on the target hardware with all peripheral
devices in use.

> 
> <snip>
> 
>  THAT is a problem in arguing for ADEOS/I-pipe - WHAT is the worst case
> now ? what is the cause of the worst case ? and can I really demonstrate
> by strong evidence that the worst case on this system is actually XXXX
> microseconds under arbitrary load and will not be higher in some strange
> corner cases ?

Leaving the completely formal proof aside (that's something even
microkernels still cannot provide), you may go to the drawing board,
develop a model of your _specific_ system, derive worst-case
constellations, and trace the real system for those events (probably
also stimulating them) while measuring latencies. Then add some safety
margin ;), and you have worst-case numbers of a far higher quality than
by just experimenting with benchmarks. This process can become complex
(i.e. costly), but it is doable.
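The "measured maximum plus safety margin" step can be sketched in a few
lines (wcet_estimate is a hypothetical name; the hard part is of course
constructing and stimulating the worst-case constellations that produce
the samples):

```c
#include <stddef.h>

/* Hypothetical helper: condense traced latency samples (taken while
 * stimulating worst-case constellations) into a worst-case figure by
 * taking the observed maximum and adding a safety margin in percent.
 * This yields an engineering estimate, not a formal proof. */
static long long wcet_estimate(const long long *samples, size_t n,
                               unsigned margin_pct)
{
    long long max = 0;
    size_t i;

    for (i = 0; i < n; i++)
        if (samples[i] > max)
            max = samples[i];
    return max + max * (long long)margin_pct / 100;
}
```

With, say, the figures quoted at the start of this discussion (16.275 us
observed maximum) and a 50% margin, one would state a ~24.4 us worst
case - valid only for that particular load and hardware.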

The point about co-scheduling approaches here is that they already come
with a simpler base model (for the RT part), and they allow you to
"tune" your system to simplify this model even further - without giving
up an integrated non-RT execution environment and its optimisations. We
will see the effect better on upcoming multi-core systems (not claiming
that Xenomai is already in /the/ perfect shape for them).


However, if you have suggestions on how to improve the current tool
situation, /me and likely others are all ears. And such improvements do
not have to be I-pipe/Xenomai-specific...

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 250 bytes --]


* Re: [Adeos-main] latency results for ppc and x86
  2007-02-21 18:27                       ` Jan Kiszka
@ 2007-02-21 19:07                         ` Nicholas Mc Guire
  2007-02-21 21:05                           ` Jan Kiszka
  0 siblings, 1 reply; 15+ messages in thread
From: Nicholas Mc Guire @ 2007-02-21 19:07 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: adeos-main, Nicholas Mc Guire, Wolfgang Grandegger

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

> Nicholas Mc Guire wrote:
>>>>> well thats true for ADEOS/RTAI/RTLinux as well - we are also only
>>>>> black-box testing the RT-kernel - there currently is absolutley NO
>>>>> prof for worst-case timing in any of the flavours of RT-Linux.
>>>>
>>>> Nope, it isn't. There are neither sleeping not spinning lock nesting
>>>> depths of that kind in Xenomai or Adeos/I-pipe (or older RT extensions,
>>>> AFAIK) - ok, except for one spot in a driver we have scheduled for
>>>> re-design already.
>>
>> that might be so - never the less there is no formal-proof that the worst
>> case of ADEOS/I-pipe is X-microseconds, the latency/jitter numbers are
>> based on black-box testing. In fact one problem is that there are not even
>> code-coverage tools (or I just did not find them) that can provide
>> coverage data for ADEOS - thus how can one guarantee worst-case ?
>
> The fact that tool support is "improvable" doesn't mean that such an
> analysis is impossible. You may over-estimate, but you can derive
> numbers for a given system (consisting of real-time core + RT
> applications) based on a combined offline system analysis and runtime
> measurements. But hardly anyone is doing this "for fun".
>

with the current status I don't think an off-line analysis is
reasonable. I don't think a model of ADEOS is reasonably doable, at
least not a modelling effort that would lead to any usable results - I
might be wrong - do you know of any such successful approaches? All
testing is really inherently limited; from black-box testing you simply
don't get any guarantees.

<snip>
>> its not a corener case demonstration, Ive been doing benchmarks on rt
>> preempt now for quite some time, there is still an advantage if you run
>> simple comparisons (jitter measurements) - but it is clearly going down,
>> The problem I have with RT-preempt being 50us and ADEOS is 15us is
>> simply that the sector that does need those numbers that RT-preempt will
>> most likely
>> never reach is generally interested in guaranteed times, and thats where
>> it becomes tough to argue any of the hard-realtime extensions at this
>> point - that is not saying RT-preempt can replace ADEOS/RTAI/RTLinux-gpl
>> Im just saying that the numbers are no longer 2/3 orders of
>> magnitude,which they were in 2.2.X/2.4.X and where arguing the use was
>> simple.
>
> Granted, arguing becomes more hairy when you have to pull out low-level
> system details like I posted (and not discussing individual issues of
> certain patches). There are scenarios where I would recommend -rt as
> well, but so far only few where RT extensions are fitting too.
>
>>
>> Don't get me wrong Im not trying to argue away ADEOS/RTAI or I would
>> have given up RTLinux/GPL quite some time ago - but I belive if these
>> low-jitter/latency systems want to keep there acceptance in industry a
>> key issue will be to improve the tools for verification/validation -
>
> Ack, and I'm sure they will emerge over the time. I don't expect this to
> happen just because someone enjoys it (adding features is always
> funnier), but because users will at some point really need them. It's a
> process that will derive from the steadily growing professional user
> base in both industry and academia.

let's see - I hope you are right - I'm just starting into an FMEA/HAZOP
for XtratuM "for fun" ;)

>>
>> <snip>
>>
>>  THAT is a problem in arguing for ADEOS/I-pipe - WHAT is the worst case
>> now ? what is the cause of the worst case ? and can I really demonstrate
>> by strong evidence that the worst case on this system is actually XXXX
>> microseconds under arbitrary load and will not be higher in some strange
>> corner cases ?
>
> Leaving the completely formal proof aside (that's something even
> microkernels still cannot provide), you may go to the drawing board,
> develop a model of your _specific_ system, derive worst-case
> constellations, and trace the real system for those events (probably
> also stimulating them) while measuring latencies. Then add some safety
> margin ;), and you have worst-case numbers of a far higher quality then
> by just experimenting with benchmarks. This process can become complex
> (ie. costly), but it is doable.
>
> The point about co-scheduling approaches is here, that they already come
> with a simpler base model (for the RT part), and they allow to "tune"
> your system to simplify this model even further - without giving up an
> integrated non-RT execution environment and its optimisations. We will
> see the effect better on upcoming multi-core systems (not claiming that
> Xenomai is already in /the/ perfect shape for them).
>
>
> However, if you have suggestions on how to improve the current tool
> situation, /me and likely others are all ears. And such improvements do
> not have to be I-pipe/Xenomai-specific...
>
well, one thing I'm looking into for RTLinux is to extend things like
kernel GCOV and KFI/KFT to RTLinux, as this allows much better
assessment. I guess those extensions would be equally worthwhile
for ADEOS/I-pipe/Xenomai.

refs:

  KFT: www.celinuxforum.org/CelfPubWiki/PatchArchive (last one is for 2.6.12)
  GCOV-Kernel: part of LTP now (last one is linux-2.6.16-gcov.patch.gz)

hofrat
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFF3Jh1nU7rXZKfY2oRAmZAAJ9ZneSKj4sRCx0h2CBlhCXvkkDVWQCfSxkb
RpGdirhoa91vElKgqrZ4Cpg=
=OSmY
-----END PGP SIGNATURE-----



* Re: [Adeos-main] latency results for ppc and x86
  2007-02-21 19:07                         ` Nicholas Mc Guire
@ 2007-02-21 21:05                           ` Jan Kiszka
  0 siblings, 0 replies; 15+ messages in thread
From: Jan Kiszka @ 2007-02-21 21:05 UTC (permalink / raw)
  To: Nicholas Mc Guire; +Cc: adeos-main, Wolfgang Grandegger

[-- Attachment #1: Type: text/plain, Size: 7125 bytes --]

Nicholas Mc Guire wrote:
>> Nicholas Mc Guire wrote:
>>>>>> well thats true for ADEOS/RTAI/RTLinux as well - we are also only
>>>>>> black-box testing the RT-kernel - there currently is absolutley NO
>>>>>> prof for worst-case timing in any of the flavours of RT-Linux.
>>>>>
>>>>> Nope, it isn't. There are neither sleeping not spinning lock nesting
>>>>> depths of that kind in Xenomai or Adeos/I-pipe (or older RT
>>>>> extensions,
>>>>> AFAIK) - ok, except for one spot in a driver we have scheduled for
>>>>> re-design already.
>>>
>>> that might be so - never the less there is no formal-proof that the
>>> worst
>>> case of ADEOS/I-pipe is X-microseconds, the latency/jitter numbers are
>>> based on black-box testing. In fact one problem is that there are not
>>> even
>>> code-coverage tools (or I just did not find them) that can provide
>>> coverage data for ADEOS - thus how can one guarantee worst-case ?
> 
>> The fact that tool support is "improvable" doesn't mean that such an
>> analysis is impossible. You may over-estimate, but you can derive
>> numbers for a given system (consisting of real-time core + RT
>> applications) based on a combined offline system analysis and runtime
>> measurements. But hardly anyone is doing this "for fun".
> 
> 
> with the current status I don't think a off-line analysis is resonable
> I don't think a model of ADEOS is resonably duable, alteast not a
> modleing that would lead to any usable results - I might be wrong - do
> you know of any such successfull approaches ? All testing is really
> inherently limited, from black-box testing you simply don't get any
> guarantees.

We are no longer black-box testing - thanks to our "KFT". I'm trying to
advertise this model heavily to users, but it still requires a bit too
much system knowledge. Still, modelling a system of I-pipe + Xenomai
remains an open challenge AFAIK.

> 
> <snip>
>>> its not a corener case demonstration, Ive been doing benchmarks on rt
>>> preempt now for quite some time, there is still an advantage if you run
>>> simple comparisons (jitter measurements) - but it is clearly going down,
>>> The problem I have with RT-preempt being 50us and ADEOS is 15us is
>>> simply that the sector that does need those numbers that RT-preempt will
>>> most likely
>>> never reach is generally interested in guaranteed times, and thats where
>>> it becomes tough to argue any of the hard-realtime extensions at this
>>> point - that is not saying RT-preempt can replace ADEOS/RTAI/RTLinux-gpl
>>> Im just saying that the numbers are no longer 2/3 orders of
>>> magnitude,which they were in 2.2.X/2.4.X and where arguing the use was
>>> simple.
> 
>> Granted, arguing becomes more hairy when you have to pull out low-level
>> system details like I posted (and not discussing individual issues of
>> certain patches). There are scenarios where I would recommend -rt as
>> well, but so far only few where RT extensions are fitting too.
> 
>>>
>>> Don't get me wrong Im not trying to argue away ADEOS/RTAI or I would
>>> have given up RTLinux/GPL quite some time ago - but I belive if these
>>> low-jitter/latency systems want to keep there acceptance in industry a
>>> key issue will be to improve the tools for verification/validation -
> 
>> Ack, and I'm sure they will emerge over the time. I don't expect this to
>> happen just because someone enjoys it (adding features is always
>> funnier), but because users will at some point really need them. It's a
>> process that will derive from the steadily growing professional user
>> base in both industry and academia.
> 
> let see - I hope you are right - I'm just starting into a FMEA/HAZOP for
> XtratuM "for fun" ;)

Will be interesting to hear/read about practical experiences.

> 
>>>
>>> <snip>
>>>
>>>  THAT is a problem in arguing for ADEOS/I-pipe - WHAT is the worst case
>>> now ? what is the cause of the worst case ? and can I really demonstrate
>>> by strong evidence that the worst case on this system is actually XXXX
>>> microseconds under arbitrary load and will not be higher in some strange
>>> corner cases ?
> 
>> Leaving the completely formal proof aside (that's something even
>> microkernels still cannot provide), you may go to the drawing board,
>> develop a model of your _specific_ system, derive worst-case
>> constellations, and trace the real system for those events (probably
>> also stimulating them) while measuring latencies. Then add some safety
>> margin ;), and you have worst-case numbers of a far higher quality then
>> by just experimenting with benchmarks. This process can become complex
>> (ie. costly), but it is doable.
> 
>> The point about co-scheduling approaches is here, that they already come
>> with a simpler base model (for the RT part), and they allow to "tune"
>> your system to simplify this model even further - without giving up an
>> integrated non-RT execution environment and its optimisations. We will
>> see the effect better on upcoming multi-core systems (not claiming that
>> Xenomai is already in /the/ perfect shape for them).
> 
> 
>> However, if you have suggestions on how to improve the current tool
>> situation, /me and likely others are all ears. And such improvements do
>> not have to be I-pipe/Xenomai-specific...
> 
> well one thing Im looking into for RTLinux is to extend things like
> kernel GCOV into RTLinux and KFI/KFT to RTLinux as this allows much
> better assessment. I guess that those extensions would equally be worth
> while
> for ADESO/I-pipe/Xenomai.
> 
> refs:
> 
>  KFT www.celinuxforum.org/CelfPubWiki/PatchArchive last one for 2.6.12
>  GCOV-Kernel part of LTP now (last one is for linux-2.6.16-gcov.patch.gz
> 

[Quick glance at the GCOV patch] Hmm, the thrilling thing is typically
locking, but I don't see a single spinlock, just some semaphores that
cannot be called from arbitrary contexts anyway. Hmm. Have you already
played with it on some kernel?

Regarding KFT: we have such a thing already. Partly derived from Ingo
Molnar's work, but with less impact during freeze, the function tracer
has been in I-pipe for more than a year. It's heavily used (at least by
the core team) for application and kernel debugging, and for latency
spotting of course. It is available for most I-pipe archs, even for the
latest x86_64 WiP. The funny thing is that even RTAI could make use of
it - if they only realised that it's in their patches.

Next to come (yeah, long announced) is LTTng support, i.e. patch and
front-end extensions for Xenomai. There is a working version lying
around somewhere in Canada; I just need to kick the guy again who did
that work for his thesis so that he rolls out a release and we can start
discussing the patch integration. Good to be reminded...

So there is definitely not nothing - but surely still enough to do :).
If you see some potential in cooperating on front-ends (given that you
still seem to head for your own kernel-patch path), let us know. I guess
there should be common ground.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 250 bytes --]


* Re: [Adeos-main] test results for switchtest and cyclictest on x86
  2007-02-21  7:13   ` Wolfgang Grandegger
  2007-02-21  9:33     ` poornima r
@ 2007-03-14 12:51     ` poornima r
  1 sibling, 0 replies; 15+ messages in thread
From: poornima r @ 2007-03-14 12:51 UTC (permalink / raw)
  To: Wolfgang Grandegger; +Cc: adeos-main


Hello,
   
  These are the results for cyclictest and switchtest runs on x86
  (without load).
   
  System specifications:-
  Xenomai: xenomai-2.3.0
  Linux kernel: linux-2.6.19.3
  CPU speed : 2.79GHz
  Cache: 512KB
  
 
  a) Switchtest:-
   
  Results:
   
  ./switchtest -T 50 -n rtk rtup rtus
  (-T: limit of test duration, i.e. 50 seconds
  -n: disables any use of FPU instructions
  threadspec: rtk = kernel-space real-time thread, rtup = user-space
  real-time thread running in primary mode, rtus = user-space real-time
  thread running in secondary mode)
   
  RTT| 00:00:01
  RTH|ctx switches|-------total
  RTD| 1000| 1000
  RTD| 1004| 2004
  RTD| 1002| 3006
  RTD| 1006| 4012
  RTD| 1004| 5016
  RTD| 1002| 6018
  RTD| 1006| 7024
  RTD| 998| 8022
  RTD| 1006| 9028
  RTD| 1004| 10032
  RTD| 516| 10548
  RTD| 504| 11052
  RTD| 504| 11556
  RTD| 504| 12060
  RTD| 504| 12564
  RTD| 504| 13068
  RTD| 504| 13572
  RTD| 504| 14076
  RTD| 504| 14580
  RTD| 504| 15084
  RTD| 504| 15588
  RTT| 00:00:22
  RTH|ctx switches|-------total
  RTD| 510| 16098
  RTD| 846| 16944
  RTD| 996| 17940
  RTD| 1002| 18942
  RTD| 996| 19938
  RTD| 1006| 20944
  RTD| 1004| 21948
  RTD| 1000| 22948
  RTD| 1004| 23952
  RTD| 1002| 24954
  RTD| 1006| 25960
  RTD| 1004| 26964
  RTD| 1002| 27966
  RTD| 1006| 28972
  RTD| 1004| 29976
  RTD| 1002| 30978
  RTD| 1006| 31984
  RTD| 998| 32982
  RTD| 1006| 33988
  RTD| 1004| 34992
  RTD| 1002| 35994
  RTT| 00:00:43
  RTH|ctx switches|-------total
  RTD| 1002| 36996
  RTD| 1002| 37998
  RTD| 1006| 39004
  RTD| 1004| 40008
  RTD| 1002| 41010
  RTD| 1002| 42012
  RTD| 1002| 43014
  RTD| 790| 43804
   
  The total number of context switches between the kernel-space real-time
  thread, the user-space real-time thread running in primary mode and the
  user-space real-time thread running in secondary mode is 43804.
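As a cross-check on the table above, the running "total" column is just the cumulative sum of the per-line "ctx switches" column. A minimal sketch (the first few values are taken from the RTD output above; the column layout is an assumption based on that output):

```python
# Reconstruct the running "total" column of the switchtest RTD output
# from the per-period context-switch counts.
from itertools import accumulate

per_period = [1000, 1004, 1002, 1006, 1004]  # "ctx switches" column
totals = list(accumulate(per_period))        # running "total" column

for n, total in zip(per_period, totals):
    print(f"RTD| {n}| {total}")
# First lines match the log: RTD| 1000| 1000, RTD| 1004| 2004, ...
```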
   
  b)cyclictest:-
   
  Results:- 
   
  ./cyclictest -l 100
  0.59 0.13 0.06 1/99 25688
  T: 0 (25688) P:99 I: 1000 C: 100 Min: 7 Act: 7 Avg: 9 Max: 24
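For reference (not from the original posts), the summary line can be split into its fields mechanically. A minimal Python sketch, assuming the usual cyclictest field meanings (P = priority, I = interval in us, C = completed cycles, Min/Act/Avg/Max = latencies in us):

```python
import re

line = "T: 0 (25688) P:99 I: 1000 C: 100 Min: 7 Act: 7 Avg: 9 Max: 24"

# Pull out each "Name: value" pair; the PID in parentheses is skipped
# because it has no letter-colon prefix.
fields = re.findall(r"([A-Za-z]+):\s*(-?\d+)", line)
stats = {name: int(value) for name, value in fields}

print(stats)  # e.g. {'T': 0, 'P': 99, 'I': 1000, 'C': 100, 'Min': 7, ...}
```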
   
  1) Please comment on these results.
  2) What are we benchmarking from the above cyclictest result?
   
  Thanks and Regards,
  Poornima

   
   
   
   
  Wolfgang Grandegger <wg@domain.hid> wrote:
  Hello,

poornima r wrote:
> Hello,
> 
> These were the scheduling latency and interrupt
> latency test results on ppc and x86 with IPIPE tracer
> option disabled.
> 
> 1.Please comment on these results (whether valid) and 

Your results are OK. These are actually the figures I remember from my 
own tests in the past.

> 2.Is there any method to optimize these results.

Not that I know of. There are a few ideas on how to reduce latencies
further, like cache locking or TLB pinning.

> 1)PPC:-
> (MPC-860 at 48 MHz, 4 kB I-Cache and 4 kB D-Cache) 
> 
> User mode:-
> root@domain.hid# ./latency -t0
> == Sampling period: 1000000 us
> == Test mode: periodic user-mode task
> == All results in microseconds
> warming up...
> RTT|  00:00:01  (periodic user-mode task, 1000000 us period, priority 99)
> RTH|-----lat min|-----lat avg|-----lat max|-overrun|----lat best|---lat worst
> RTD|     167.000|     167.000|     167.000|       0|     167.000|     167.000
> RTD|     176.000|     176.000|     176.000|       0|     167.000|     176.000
> RTD|     168.000|     168.000|     168.000|       0|     167.000|     176.000
> RTD|     171.000|     171.000|     171.000|       0|     167.000|     176.000
> 
> Kernel mode:-
> root@domain.hid# ./latency -t1
> == Sampling period: 1000000 us
> == Test mode: in-kernel periodic task
> == All results in microseconds
> warming up...
> RTT|  00:00:00  (in-kernel periodic task, 1000000 us period, priority 99)
> RTH|-----lat min|-----lat avg|-----lat max|-overrun|----lat best|---lat worst
> RTD|     123.000|     123.000|     123.000|       0|     123.000|     123.000
> RTD|     125.000|     125.000|     125.000|       0|     123.000|     125.000
> RTD|     128.333|     128.333|     128.333|       0|     123.000|     128.333
> RTD|     127.000|     127.000|     127.000|       0|     123.000|     128.333
> 
> Interrupt mode:-
> root@domain.hid# ./latency -t2
> == Sampling period: 1000000 us
> == Test mode: in-kernel timer handler
> == All results in microseconds
> warming up...
> RTT|  00:00:01  (in-kernel timer handler, 1000000 us period, priority 99)
> RTH|-----lat min|-----lat avg|-----lat max|-overrun|----lat best|---lat worst
> RTD|      45.334|      45.334|      45.334|       0|      45.334|      45.334
> RTD|      45.000|      45.000|      45.000|       0|      45.000|      45.334
> RTD|      46.000|      46.000|      46.000|       0|      45.000|      46.000
> RTD|      47.334|      47.334|      47.334|       0|      45.000|      47.334
> RTD|      46.334|      46.334|      46.334|       0|      45.000|      47.334

On the MPC860, the latencies are mainly due to code execution time, as
this processor is very slow.

> 2)X86:-
> (Pentium4, 3.06GHz, 1024 KB cache size)
> User mode:-
> == Sampling period: 100 us
> == Test mode: periodic user-mode task
> == All results in microseconds
> warming up...
> 
> RTT|  00:00:01  (periodic user-mode task, 100 us period, priority 99)
> RTH|-----lat min|-----lat avg|-----lat max|-overrun|----lat best|---lat worst
> RTD|       3.807|      12.825|      21.565|       0|       3.807|      21.565
> RTD|       3.796|      12.792|      21.483|       0|       3.796|      21.565
> RTD|       3.770|      12.799|      21.501|       0|       3.770|      21.565
> RTD|       3.578|      12.806|      20.890|       0|       3.578|      21.565
> RTD|       3.755|      12.809|      21.486|       0|       3.578|
> 
> kernel mode:-
> Sampling period: 100 us
> == Test mode: in-kernel periodic task
> == All results in microseconds
> warming up...
> 
> RTT|  00:00:01  (in-kernel periodic task, 100 us period, priority 99)
> RTH|-----lat min|-----lat avg|-----lat max|-overrun|----lat best|---lat worst
> RTD|       2.381|       3.451|      19.620|       0|       2.381|      19.620
> RTD|       2.332|       3.480|      19.930|       0|       2.332|      19.930
> RTD|       2.382|       3.649|      19.609|       0|       2.332|      19.930
> RTD|       2.323|       2.786|      14.351|       0|       2.323|      19.930
> RTD|       2.375|       2.532|       5.519|       0|       2.323|      19.930
> RTD|       2.332|       3.971|      19.617|       0|       2.323|      19.930
> 
> Interrupt mode:-
> Sampling period: 100 us
> == Test mode: in-kernel timer handler
> == All results in microseconds
> warming up...
> 
> RTT|  00:00:01  (in-kernel timer handler, 100 us period, priority 99)
> RTH|-----lat min|-----lat avg|-----lat max|-overrun|----lat best|---lat worst
> RTD|      -1.563|       7.553|      15.736|       0|      -1.563|      15.736
> RTD|      -1.579|       7.558|      15.804|       0|      -1.579|      15.804
> RTD|      -1.584|       7.529|      16.167|       0|      -1.584|      16.167
> RTD|      -1.548|       7.553|      16.186|       0|      -1.584|      16.186
> RTD|      -1.585|       7.556|      16.275|       0|      -1.585|      16.275

Latencies are mainly due to cache refills on the P4. Have you already
put load onto your system? If not, worst-case latencies will be even longer.

Wolfgang.

> Thanks,
> Poornima
> 
> 
> --- Wolfgang Grandegger wrote:
> 
>> Hello,
>>
>> poornima r wrote:
>>> Hello,
>>>
>>> Sorry for not replying all these days...
>>> (Was not in station, maybe too personal!!!!!)
>>>
>>> About software emulation error:
>>>
>>> 4)Output of /proc/xenomai/faults after the illegal
>>>>> instruction:-
>>>>> root@domain.hid# cat /proc/xenomai/faults
>>>>> TRAP CPU0
>>>>> 0: 0 (Data or instruction access)
>>>>> 1: 0 (Alignment)
>>>>> 2: 0 (Altivec unavailable)
>>>>> 3: 0 (Program check exception)
>>>>> 4: 0 (Machine check exception)
>>>>> 5: 0 (Unknown)
>>>>> 6: 0 (Instruction breakpoint)
>>>>> 7: 0 (Run mode exception)
>>>>> 8: 0 (Single-step exception)
>>>>> 9: 0 (Non-recoverable exception)
>>>>> 10: 1 (Software emulation)
>>>>> 11: 0 (Debug)
>>>>> 12: 0 (SPE)
>>>>> 13: 0 (Altivec assist)
>>>> Hm, I see a software emulation exception which is
>>>> also the reason for the illegal instructions.
>>>> What toolchain do you use? The toolchain
>>>> should support software FP emulation.
>>> 1)I am using an open source toolchain with software
>>> floating point emulation support.
>>> (#ppc_8xx-gcc --v
>>> /lib/gcc/powerpc/3.4.3/../../../../target_powerpc/usr/include/c++/3.4.3
>>> --with-numa-policy=no --with-float=soft)
>>>
>>> 2)And the kernel is built with code to emulate a
>>> floating-point unit, which will allow programs that
>>> use floating-point instructions to run.
>>>
>>> Kernel configuration
>>> ----CONFIG_MATH_EMULATION:y
>> If you build with "--with-float=soft" there is no
>> need for math 
>> emulation in the kernel. Likely, there is something
>> wrong with your 
>> tool-chain. Could you please try a known-to-work
>> tool-chain like the 
>> ELDK v4.x from http://www.denx.de.
>>
>> Wolfgang.
>>
>>> Thanks,
>>> Poornima
>>>
>>> --- Wolfgang Grandegger wrote:
>>>
>>>> poornima r wrote:
>>>>> Hi,
>>>>>
>>>>> 1)I am using open source kernel from Kernel.org,
>>>>> but what is meant by vanilla kernel from
>>>> Kernel.org?
>>>>
>>>> It's the kernel from kernel.org. This means that the
>>>> Linux kernel 2.6.18 is running fine on your MPC860 platform as is?
>>>> Thanks for the info.
>>>>
>>>>> 2)With sampling period of 500 usec the system simply
>>>>> hangs without printing any results (./latency -p500)
>>>>
>>>> OK.
>>>>
>>>>> 3)cyclictest with -t1 option (without IPIPE-tracer)
>>>>> root@domain.hid# ./cyclictest -t1
>>>>> 2.04 0.50 0.17 8/27 174
>>>>>
>>>>> T: 0 ( 0) P:99 I: 1000 C: 0 Min: 1000000 Act: 0 Avg: 0 Max:-1000000
>>>>> Illegal instruction
>>>>>
>>>>> 4)Output of /proc/xenomai/faults after the illegal
>>>>> instruction:-
>>>>> root@domain.hid# cat /proc/xenomai/faults
>>>>> TRAP CPU0
>>>>> 0: 0 (Data or instruction access)
>>>>> 1: 0 (Alignment)
>>>>> 2: 0 (Altivec unavailable)
>>>>> 3: 0 (Program check exception)
>>>>> 4: 0 (Machine check exception)
>>>>> 5: 0 (Unknown)
>>>>> 6: 0 (Instruction breakpoint)
>>>>> 7: 0 (Run mode exception)
>>>>> 8: 0 (Single-step exception)
>>>>> 9: 0 (Non-recoverable exception)
>>>>> 10: 1 (Software emulation)
>>>>> 11: 0 (Debug)
>>>>> 12: 0 (SPE)
>>>>> 13: 0 (Altivec assist)
>>>> Hm, I see a software emulation exception which is
>>>> also the reason for the illegal instructions.
>>>> What toolchain do you use? The toolchain
>>>> should support software FP emulation.
>>>>
>>>>> 5)Running switchtest:-
>>>>> root@domain.hid# ./switchtest -n
>>>>> --The system hangs without printing any results
>>>>>
>>>>> Thanks,
>>>>> Poornima
>>>>>
>>>>>
>>>>> --- Wolfgang Grandegger wrote:
>>>>>> poornima r wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Thanks for the reply.
>>>>>>>
>>>>>>> Linux version:linux-2.6.18
>>>>>>> Xenomai: xenomai-2.3.0 (Stable version)
>>>>>>> adeos patch: adeos-ipipe-2.6.18-ppc-1.5-01.patch
>>>>>> OK, I'm curious, did you use the vanilla kernel from
>>>>>> kernel.org?
>>>>>> More comments below.
>>>>>>
>>>>>>> The tests were run as follows:
>>>>>>> 1)The sampling period in the code for latency and
>>>>>>> switchbench was changed to 1000000000 ns (to remove
>>>>>>> overrun error)
>>>>>>> 2)switchtest was run with -n5 option
>>>>>>> 3)cyclictest was run with -t5 option (5 threads
>>>>>>> were created.)
>>>>>>> 4)cyclictest was terminated with Illegal instruction
>>>>>>> (after creating 5 threads) with IPIPE tracer enabled.
>>>>>>
>>>>>>> These were the results without I-PIPE Tracer option:
>>>>>>> (All the tests were run without any load)
>>>>>>> 1)LATENCY TEST:-
>>>>>>> User mode:-
>>>>>>> /mnt/out_xen/bin# ./latency -t0
>>>>>>> == Sampling period: 1000000 us
>>>>>>> == Test mode: periodic user-mode task
>>>>>>> == All results in microseconds
>>>>>>> warming up...
>>>>>>> RTT|  00:00:01  (periodic user-mode task, 1000000 us period, priority 99)
>>>>>>> RTH|-----lat min|-----lat avg|-----lat max|-overrun|----lat best|---lat worst
>>>>>>> RTD|     167.000|     167.000|     167.000|       0|     167.000|     167.000
>>>>>>> RTD|     176.000|     176.000|     176.000|       0|     167.000|     176.000
>>>>>>> RTD|     168.000|     168.000|     168.000|       0|     167.000|     176.000
> === message truncated ===
> 
> 
> 
> 
> 
> _______________________________________________
> Adeos-main mailing list
> Adeos-main@domain.hid
> https://mail.gna.org/listinfo/adeos-main
> 
> 



 




Thread overview: 15+ messages
     [not found] <45CD730A.6000405@domain.hid>
2007-02-20  7:21 ` [Adeos-main] latency results for ppc and x86 poornima r
2007-02-21  7:13   ` Wolfgang Grandegger
2007-02-21  9:33     ` poornima r
2007-02-21  9:33       ` Nicholas Mc Guire
2007-02-21 10:49         ` Jan Kiszka
2007-02-21 10:26           ` Nicholas Mc Guire
2007-02-21 12:29             ` Jan Kiszka
2007-02-21 12:14               ` Nicholas Mc Guire
2007-02-21 13:51                 ` Jan Kiszka
2007-02-21 14:52                   ` Wolfgang Grandegger
2007-02-21 15:10                     ` Nicholas Mc Guire
2007-02-21 18:27                       ` Jan Kiszka
2007-02-21 19:07                         ` Nicholas Mc Guire
2007-02-21 21:05                           ` Jan Kiszka
2007-03-14 12:51     ` [Adeos-main] test results for switchtest and cyclictest on x86 poornima r
