* [Xenomai-core] problem in pthread_mutex_lock/unlock
@ 2010-06-18 13:52 Nero Fernandez
  2010-06-18 14:12 ` Gilles Chanteperdrix
  0 siblings, 1 reply; 18+ messages in thread
From: Nero Fernandez @ 2010-06-18 13:52 UTC (permalink / raw)
  To: xenomai


[-- Attachment #1.1: Type: text/plain, Size: 1523 bytes --]

Hi,

Please find an archive attached, containing :
 - a program for testing context-switch-latency using posix-APIs
   for native linux kernel and xenomai-posix-skin (userspace).
 - Makefile to build it using xenomai

In brief, the attempt is to spawn a bunch of pthreads which share pairs
of read/write pthread mutexes in a circular fashion, as shown below, and
measure the time taken by pthread_0 to complete one (rel_mtx_0 - acq_mtx_n)
cycle (threads try to acquire their read mutexes and release their write
mutexes).

        pthread_0:write_mtx      =   pthread_1:read_mtx
        pthread_1:write_mtx      =   pthread_2:read_mtx
        ......
        pthread_(n-1):write_mtx  =   pthread_n:read_mtx
        pthread_n:write_mtx      =   pthread_0:read_mtx

The program runs fine in native linux userspace (ARM, x86), but in
xenomai-space the threads lock up while trying to acquire their respective
read mutexes.
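
Roughly, each thread's passing loop looks like the sketch below
(illustrative names only; the real code is in the attached archive):

#include <pthread.h>

#define NTHREADS 4

/* mtx[i] is pthread_i's read mutex and pthread_(i-1)'s write mutex */
static pthread_mutex_t mtx[NTHREADS];

static void *ring_thread(void *arg)
{
        long i = (long)arg;
        pthread_mutex_t *read_mtx  = &mtx[i];
        pthread_mutex_t *write_mtx = &mtx[(i + 1) % NTHREADS];

        for (;;) {
                pthread_mutex_lock(read_mtx);    /* wait for the token */
                /* ... timestamping for the latency figures ... */
                pthread_mutex_unlock(write_mtx); /* pass the token on  */
        }
        return NULL;
}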

A similar attempt using message-queues runs fine.

I am running a version between 2.5.2 and 2.5.3, and will check with 2.5.3
as well.

Following is my current xeno-config output:
# ./xeno-config
xeno-config --verbose
        --version="2.5.2"
        --cc="arm-linux-gcc"
        --arch="arm"
        --prefix="/opt/xeno_utils"
        --xeno-cflags="-I/opt/xeno_utils/include -D_GNU_SOURCE -D_REENTRANT
-Wall -pipe -D__XENO__"
        --xeno-ldflags="-L/opt/xeno_utils/lib -lxenomai -lpthread "
        --posix-cflags=""
        --posix-ldflags="-Wl,--wrap,pthread_create
-Wl,--wrap,pthread_setschedparam -Wl,--wrap,"


[-- Attachment #2: context_switch_latency.tgz --]
[-- Type: application/x-gzip, Size: 9807 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xenomai-core] problem in pthread_mutex_lock/unlock
  2010-06-18 13:52 [Xenomai-core] problem in pthread_mutex_lock/unlock Nero Fernandez
@ 2010-06-18 14:12 ` Gilles Chanteperdrix
       [not found]   ` <AANLkTinABTK2nMI0QfZVaULQ4OKwF0678PKOBc_OMIn1@domain.hid>
  0 siblings, 1 reply; 18+ messages in thread
From: Gilles Chanteperdrix @ 2010-06-18 14:12 UTC (permalink / raw)
  To: Nero Fernandez; +Cc: xenomai

Nero Fernandez wrote:
> Hi,
> 
> Please find an archive attached, containing :
>  - a program for testing context-switch-latency using posix-APIs
>    for native linux kernel and xenomai-posix-skin (userspace).
>  - Makefile to build it using xenomai

Your program is too long to analyse quickly. But it seems you are using
the mutexes as if they were recursive. Xenomai posix skin mutexes used to
be recursive by default, but no longer are.

Also note that your code does not check the return value of the posix
skin services, which is a really bad idea.

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Xenomai-core] Fwd:  problem in pthread_mutex_lock/unlock
       [not found]   ` <AANLkTinABTK2nMI0QfZVaULQ4OKwF0678PKOBc_OMIn1@domain.hid>
@ 2010-06-18 14:59     ` Nero Fernandez
  2010-06-18 15:08       ` Gilles Chanteperdrix
  0 siblings, 1 reply; 18+ messages in thread
From: Nero Fernandez @ 2010-06-18 14:59 UTC (permalink / raw)
  To: xenomai

[-- Attachment #1: Type: text/plain, Size: 1546 bytes --]

On Fri, Jun 18, 2010 at 7:42 PM, Gilles Chanteperdrix <
gilles.chanteperdrix@xenomai.org> wrote:

> Nero Fernandez wrote:
> > Hi,
> >
> > Please find an archive attached, containing :
> >  - a program for testing context-switch-latency using posix-APIs
> >    for native linux kernel and xenomai-posix-skin (userspace).
> >  - Makefile to build it using xenomai
>
> Your program is very long to tell fast. But it seems you are using the
> mutex as if they were recursive. Xenomai posix skin mutexes used to be
> recursive by default, but no longer are.
>
> Also note that your code does not check the return value of the posix
> skin services, which is a really bad idea.
>
> --
>                                             Gilles.
>

Thanks for the prompt response.

Could you explain 'recursive usage of mutex' a little further?
Are the xenomai pthread-mutexes very different in behaviour from regular
posix mutexes?

The major portions of the code are the following three methods:
 - main()
      creates the resources, spawns and collects the pthreads
 - rt_task_master()
      calibrates the dry-run time, controls the mutex-passing and
      reports the final timing; the measuring scheme is similar to lmbench's
 - rt_task_slave()
      task for passing the mutex lock()-unlock() chain along.

The program allocates the same number of mutexes as threads, so ideally
only one thread is active at a time, and each handles its read and write
locks in the following fashion (roughly as sketched below):
 - unlock then lock, in the case of the master
 - lock then unlock, in the case of the slaves
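
A rough sketch of the master's timing round (hypothetical names and a
simplified calibration; the real code is in the archive from the first mail):

#include <pthread.h>
#include <stdio.h>
#include <time.h>

static double elapsed_us(const struct timespec *a, const struct timespec *b)
{
        return (b->tv_sec - a->tv_sec) * 1e6 + (b->tv_nsec - a->tv_nsec) / 1e3;
}

/* master: release its write mutex to start a round, then block on its read
   mutex until the token has travelled through every slave and come back */
static void master_rounds(pthread_mutex_t *write_mtx, pthread_mutex_t *read_mtx,
                          int rounds, double dry_run_us)
{
        struct timespec t0, t1;
        int r;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (r = 0; r < rounds; r++) {
                pthread_mutex_unlock(write_mtx); /* the 'rel_mtx_0' step */
                pthread_mutex_lock(read_mtx);    /* the 'acq_mtx_n' step */
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        printf("avg per round: %.2f us (dry-run time subtracted)\n",
               elapsed_us(&t0, &t1) / rounds - dry_run_us);
}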


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xenomai-core] Fwd:  problem in pthread_mutex_lock/unlock
  2010-06-18 14:59     ` [Xenomai-core] Fwd: " Nero Fernandez
@ 2010-06-18 15:08       ` Gilles Chanteperdrix
  2010-06-18 19:45         ` Gilles Chanteperdrix
  0 siblings, 1 reply; 18+ messages in thread
From: Gilles Chanteperdrix @ 2010-06-18 15:08 UTC (permalink / raw)
  To: Nero Fernandez; +Cc: xenomai

Nero Fernandez wrote:
> 
> On Fri, Jun 18, 2010 at 7:42 PM, Gilles Chanteperdrix
> <gilles.chanteperdrix@xenomai.org
> <mailto:gilles.chanteperdrix@xenomai.org>> wrote:
> 
>     Nero Fernandez wrote:
>     > Hi,
>     >
>     > Please find an archive attached, containing :
>     >  - a program for testing context-switch-latency using posix-APIs
>     >    for native linux kernel and xenomai-posix-skin (userspace).
>     >  - Makefile to build it using xenomai
> 
>     Your program is very long to tell fast. But it seems you are using the
>     mutex as if they were recursive. Xenomai posix skin mutexes used to be
>     recursive by default, but no longer are.
> 
>     Also note that your code does not check the return value of the posix
>     skin services, which is a really bad idea.
> 
>     --
>                                                Gilles.
> 
> 
> Thanks for the prompt response.
> 
> Could you explain  'recursive usage of mutex' a little further?
> Are the xenomai pthread-mutexes very different in behaviour than regular
> posix mutexes?

The posix specification does not define the default type of a mutex. So,
 in short, the behaviour of a "regular posix mutex" is unspecified.
However, following the principle of least surprise, Xenomai chose, like
Linux, to use the "normal" type by default.

What the type of a posix mutex is, is explained in many places, starting
with the Xenomai API documentation. So, no, I will not repeat it here.
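
One practical note: if a test depends on a particular mutex type, request
it explicitly when the mutex is created, e.g. (plain POSIX, a sketch with
an illustrative helper name):

#include <pthread.h>

static int init_test_mutex(pthread_mutex_t *mtx)
{
        pthread_mutexattr_t attr;
        int err;

        err = pthread_mutexattr_init(&attr);
        if (err)
                return err;
        /* the default type is "normal"; ask for another one explicitly */
        err = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
        if (!err)
                err = pthread_mutex_init(mtx, &attr);
        pthread_mutexattr_destroy(&attr);
        return err;
}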


-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xenomai-core] Fwd:  problem in pthread_mutex_lock/unlock
  2010-06-18 15:08       ` Gilles Chanteperdrix
@ 2010-06-18 19:45         ` Gilles Chanteperdrix
  2010-06-23 20:45           ` Nero Fernandez
  0 siblings, 1 reply; 18+ messages in thread
From: Gilles Chanteperdrix @ 2010-06-18 19:45 UTC (permalink / raw)
  To: Nero Fernandez; +Cc: xenomai

Gilles Chanteperdrix wrote:
> Nero Fernandez wrote:
>> On Fri, Jun 18, 2010 at 7:42 PM, Gilles Chanteperdrix
>> <gilles.chanteperdrix@xenomai.org
>> <mailto:gilles.chanteperdrix@xenomai.org>> wrote:
>>
>>     Nero Fernandez wrote:
>>     > Hi,
>>     >
>>     > Please find an archive attached, containing :
>>     >  - a program for testing context-switch-latency using posix-APIs
>>     >    for native linux kernel and xenomai-posix-skin (userspace).
>>     >  - Makefile to build it using xenomai
>>
>>     Your program is very long to tell fast. But it seems you are using the
>>     mutex as if they were recursive. Xenomai posix skin mutexes used to be
>>     recursive by default, but no longer are.
>>
>>     Also note that your code does not check the return value of the posix
>>     skin services, which is a really bad idea.
>>
>>     --
>>                                                Gilles.
>>
>>
>> Thanks for the prompt response.
>>
>> Could you explain  'recursive usage of mutex' a little further?
>> Are the xenomai pthread-mutexes very different in behaviour than regular
>> posix mutexes?
> 
> The posix specification does not define the default type of a mutex. So,
>  in short, the behaviour of a "regular posix mutex" is unspecified.
> However, following the principle of least surprise, Xenomai chose, like
> Linux, to use the "normal" type by default.
> 
> What is the type of a posix mutex is explained in many places, starting
> with Xenomai API documentation. So, no, I will not repeat it here.

Actually, that is not your problem. However, you do not check the return
value of posix services, which is a bad idea. And indeed, if you check
it you will find your error: a thread which does not own a mutex tries
to unlock it.

Sorry, mutexes are not semaphores; this is invalid, and Xenomai returns an
error in such a case.
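
Checking the status would have pointed at it immediately, something along
these lines (a sketch, not your code):

#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* never ignore the status returned by the posix skin services */
static void unlock_checked(pthread_mutex_t *mtx, const char *who)
{
        int err = pthread_mutex_unlock(mtx);

        if (err)
                /* e.g. EPERM when the calling thread does not own the mutex */
                fprintf(stderr, "%s: pthread_mutex_unlock: %s\n",
                        who, strerror(err));
}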

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xenomai-core] Fwd: problem in pthread_mutex_lock/unlock
  2010-06-18 19:45         ` Gilles Chanteperdrix
@ 2010-06-23 20:45           ` Nero Fernandez
  2010-06-23 22:00             ` Philippe Gerum
  0 siblings, 1 reply; 18+ messages in thread
From: Nero Fernandez @ 2010-06-23 20:45 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai


[-- Attachment #1.1: Type: text/plain, Size: 2414 bytes --]

Thanks for your response, Gilles.

I modified the code to use semaphores instead of mutexes, which worked fine.
Attached is a compilation of some latency figures and system loading figures
(using lmbench) that I obtained from my proprietary ARM-9 board, using
Xenomai-2.5.2.

Any comments are welcome. TIY.
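
For reference, the passing step now looks roughly like this (illustrative
names; sem_post has no ownership requirement, unlike pthread_mutex_unlock):

#include <semaphore.h>

#define NTHREADS 4

/* sem[i] plays the role the i-th mutex played before */
static sem_t sem[NTHREADS];

static void *ring_thread(void *arg)
{
        long i = (long)arg;

        for (;;) {
                sem_wait(&sem[i]);                  /* wait for the token */
                sem_post(&sem[(i + 1) % NTHREADS]); /* pass it on         */
        }
        return NULL;
}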


On Sat, Jun 19, 2010 at 1:15 AM, Gilles Chanteperdrix <
gilles.chanteperdrix@xenomai.org> wrote:

> Gilles Chanteperdrix wrote:
> > Nero Fernandez wrote:
> >> On Fri, Jun 18, 2010 at 7:42 PM, Gilles Chanteperdrix
> >> <gilles.chanteperdrix@xenomai.org
> >> <mailto:gilles.chanteperdrix@xenomai.org>> wrote:
> >>
> >>     Nero Fernandez wrote:
> >>     > Hi,
> >>     >
> >>     > Please find an archive attached, containing :
> >>     >  - a program for testing context-switch-latency using posix-APIs
> >>     >    for native linux kernel and xenomai-posix-skin (userspace).
> >>     >  - Makefile to build it using xenomai
> >>
> >>     Your program is very long to tell fast. But it seems you are using
> the
> >>     mutex as if they were recursive. Xenomai posix skin mutexes used to
> be
> >>     recursive by default, but no longer are.
> >>
> >>     Also note that your code does not check the return value of the
> posix
> >>     skin services, which is a really bad idea.
> >>
> >>     --
> >>                                                Gilles.
> >>
> >>
> >> Thanks for the prompt response.
> >>
> >> Could you explain  'recursive usage of mutex' a little further?
> >> Are the xenomai pthread-mutexes very different in behaviour than regular
> >> posix mutexes?
> >
> > The posix specification does not define the default type of a mutex. So,
> >  in short, the behaviour of a "regular posix mutex" is unspecified.
> > However, following the principle of least surprise, Xenomai chose, like
> > Linux, to use the "normal" type by default.
> >
> > What is the type of a posix mutex is explained in many places, starting
> > with Xenomai API documentation. So, no, I will not repeat it here.
>
> Actually, that is not your problem. However, you do not check the return
> value of posix services, which is a bad idea. And indeed, if you check
> it you will find your error: a thread which does not own a mutex tries
> to unlock it.
>
> Sorry, mutex are not semaphore, this is invalid, and Xenomai returns an
> error in such a case.
>
> --
>                                             Gilles.
>


[-- Attachment #2: xenomai_analysis.xls --]
[-- Type: application/vnd.ms-excel, Size: 107008 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xenomai-core] Fwd: problem in pthread_mutex_lock/unlock
  2010-06-23 20:45           ` Nero Fernandez
@ 2010-06-23 22:00             ` Philippe Gerum
  2010-06-24 11:35               ` Nero Fernandez
  0 siblings, 1 reply; 18+ messages in thread
From: Philippe Gerum @ 2010-06-23 22:00 UTC (permalink / raw)
  To: Nero Fernandez; +Cc: xenomai

On Thu, 2010-06-24 at 02:15 +0530, Nero Fernandez wrote:
> Thanks for your response, Gilles.
> 
> i modified the code to use semaphore instead of mutex, which worked
> fine.
> Attached is a compilation of some latency figures and system loading
> figures (using lmbench)
> that i obtained from my proprietary ARM-9 board, using Xenomai-2.5.2.
> 
> Any comments are welcome. TIY.
> 

Yikes. Let me sum up what I understood from your intent:

- you are measuring lmbench test latencies, that is to say, you don't
measure the real-time core capabilities at all. Unless you crafted a
Xenomai-linked version of lmbench, you are basically testing regular
processes.

- you are benchmarking your own port of the interrupt pipeline over some
random, outdated vendor kernel (2.6.18-based Mvista 5.0 dates back to
2007, right?), albeit the original ARM port of such code is based on
mainline since day #1. Since the latest latency-saving features like
FCSE are available with Adeos patches on recent kernels, you are likely
looking at ancient light rays from a fossil galaxy (btw, this may
explain the incorrect results in the 0k context switch test - you don't
have FCSE enabled in your Adeos port, right?).

- instead of reporting figures from a real-time interrupt handler
actually connected to the Xenomai core, you hijacked the system timer
core to pile up your instrumentation on top of the original code you
were supposed to benchmark. If this helps, run /usr/xenomai/bin/latency
-t2 and you will get the real figures.

Quoting you, from your document:
"The intent for running these tests is to gauge the overhead of running
interrupt-virtualization and further running a (real-time co-kernel + 
interrupt virtualization) on an embedded-device."

I'm unsure that you clearly identified the functional layers. If you
don't measure the Xenomai core based on Xenomai activities, then you
don't measure the co-kernel overhead. Besides, trying to measure the
interrupt pipeline overhead via the lmbench micro-benchmarks makes no
sense.

> 
> On Sat, Jun 19, 2010 at 1:15 AM, Gilles Chanteperdrix
> <gilles.chanteperdrix@xenomai.org> wrote:
>         
>         Gilles Chanteperdrix wrote:
>         > Nero Fernandez wrote:
>         >> On Fri, Jun 18, 2010 at 7:42 PM, Gilles Chanteperdrix
>         >> <gilles.chanteperdrix@xenomai.org
>         >> <mailto:gilles.chanteperdrix@xenomai.org>> wrote:
>         >>
>         >>     Nero Fernandez wrote:
>         >>     > Hi,
>         >>     >
>         >>     > Please find an archive attached, containing :
>         >>     >  - a program for testing context-switch-latency using
>         posix-APIs
>         >>     >    for native linux kernel and xenomai-posix-skin
>         (userspace).
>         >>     >  - Makefile to build it using xenomai
>         >>
>         >>     Your program is very long to tell fast. But it seems
>         you are using the
>         >>     mutex as if they were recursive. Xenomai posix skin
>         mutexes used to be
>         >>     recursive by default, but no longer are.
>         >>
>         >>     Also note that your code does not check the return
>         value of the posix
>         >>     skin services, which is a really bad idea.
>         >>
>         >>     --
>         >>                                                Gilles.
>         >>
>         >>
>         >> Thanks for the prompt response.
>         >>
>         >> Could you explain  'recursive usage of mutex' a little
>         further?
>         >> Are the xenomai pthread-mutexes very different in behaviour
>         than regular
>         >> posix mutexes?
>         >
>         > The posix specification does not define the default type of
>         a mutex. So,
>         >  in short, the behaviour of a "regular posix mutex" is
>         unspecified.
>         > However, following the principle of least surprise, Xenomai
>         chose, like
>         > Linux, to use the "normal" type by default.
>         >
>         > What is the type of a posix mutex is explained in many
>         places, starting
>         > with Xenomai API documentation. So, no, I will not repeat it
>         here.
>         
>         
>         Actually, that is not your problem. However, you do not check
>         the return
>         value of posix services, which is a bad idea. And indeed, if
>         you check
>         it you will find your error: a thread which does not own a
>         mutex tries
>         to unlock it.
>         
>         Sorry, mutex are not semaphore, this is invalid, and Xenomai
>         returns an
>         error in such a case.
>         
>         --
>                                                    Gilles.
> 
> _______________________________________________
> Xenomai-core mailing list
> Xenomai-core@domain.hid
> https://mail.gna.org/listinfo/xenomai-core


-- 
Philippe.




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xenomai-core] Fwd: problem in pthread_mutex_lock/unlock
  2010-06-23 22:00             ` Philippe Gerum
@ 2010-06-24 11:35               ` Nero Fernandez
  2010-06-24 11:50                 ` Gilles Chanteperdrix
                                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Nero Fernandez @ 2010-06-24 11:35 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai

[-- Attachment #1: Type: text/plain, Size: 6833 bytes --]

Thanks for your response, Philippe.

The concerns while carrying out my experiments were to:

 - compare xenomai co-kernel overheads (timer and context-switch latencies)
   in xenomai-space vs similar native-linux overheads. These are presented
   in the first two sheets.

 - find out how the addition of xenomai, and of xenomai+adeos, affects the
   native kernel's performance. Here, lmbench was used on the native linux
   side to estimate the changes to standard linux services.

Regarding the addition of latency measurements in the sys-timer handler, I
performed a similar measurement from xnintr_clock_handler(), and the results
were similar to the ones reported from the sys-timer handler in
xenomai-enabled linux. While making both these measurements, I took care
that delay-value logging is done at the end of the handler routines, but
that the __ipipe_mach_tsc value is recorded at the beginning of the routine
(a patch for this is included in the worksheet itself).

Regarding the system, changing the kernel version would invalidate my
results, as the system is a released CE device and there are no plans to
upgrade its kernel.
AFAIK, enabling FCSE would limit the number of concurrent processes, hence
it is not viable in my scenario.
As far as the adeos patch is concerned, I took a recent one (2.6.32) and
back-ported it to 2.6.18, so as not to lose out on any new Adeos-only
upgrades. I carried out the back-port activity for two platforms, a
qemu-based integrator platform (for minimal functional validity) and my
proprietary board.

However, I am new to this field and would like to correct things if I went
wrong anywhere. Your comments and guidance would be much appreciated.






On Thu, Jun 24, 2010 at 3:30 AM, Philippe Gerum <rpm@xenomai.org> wrote:

> On Thu, 2010-06-24 at 02:15 +0530, Nero Fernandez wrote:
> > Thanks for your response, Gilles.
> >
> > i modified the code to use semaphore instead of mutex, which worked
> > fine.
> > Attached is a compilation of some latency figures and system loading
> > figures (using lmbench)
> > that i obtained from my proprietary ARM-9 board, using Xenomai-2.5.2.
> >
> > Any comments are welcome. TIY.
> >
>
> Yikes. Let me sum up what I understood from your intent:
>
> - you are measuring lmbench test latencies, that is to say, you don't
> measure the real-time core capabilities at all. Unless you crafted a
> Xenomai-linked version of lmbench, you are basically testing regular
> processes.
>
> - you are benchmarking your own port of the interrupt pipeline over some
> random, outdated vendor kernel (2.6.18-based Mvista 5.0 dates back to
> 2007, right?), albeit the original ARM port of such code is based on
> mainline since day #1. Since the latest latency-saving features like
> FCSE are available with Adeos patches on recent kernels, you are likely
> looking at ancient light rays from a fossile galaxy (btw, this may
> explain the incorrect results in the 0k context switch test - you don't
> have FCSE enabled in your Adeos port, right?).
>
> - instead of reporting figures from a real-time interrupt handler
> actually connected to the Xenomai core, you hijacked the system timer
> core to pile up your instrumentation on top of the original code you
> were supposed to benchmark. If this helps, run /usr/xenomai/bin/latency
> -t2 and you will get the real figures.
>
> Quoting you, from your document:
> "The intent for running these tests is to gauge the overhead of running
> interrupt-virtualization and further running a (real-time co-kernel +
> interrupt virtualization) on an embedded-device."
>
> I'm unsure that you clearly identified the functional layers. If you
> don't measure the Xenomai core based on Xenomai activities, then you
> don't measure the co-kernel overhead. Besides, trying to measure the
> interrupt pipeline overhead via the lmbench micro-benchmarks makes no
> sense.
>
> >
> > On Sat, Jun 19, 2010 at 1:15 AM, Gilles Chanteperdrix
> > <gilles.chanteperdrix@xenomai.org> wrote:
> >
> >         Gilles Chanteperdrix wrote:
> >         > Nero Fernandez wrote:
> >         >> On Fri, Jun 18, 2010 at 7:42 PM, Gilles Chanteperdrix
> >         >> <gilles.chanteperdrix@xenomai.org
> >         >> <mailto:gilles.chanteperdrix@xenomai.org>> wrote:
> >         >>
> >         >>     Nero Fernandez wrote:
> >         >>     > Hi,
> >         >>     >
> >         >>     > Please find an archive attached, containing :
> >         >>     >  - a program for testing context-switch-latency using
> >         posix-APIs
> >         >>     >    for native linux kernel and xenomai-posix-skin
> >         (userspace).
> >         >>     >  - Makefile to build it using xenomai
> >         >>
> >         >>     Your program is very long to tell fast. But it seems
> >         you are using the
> >         >>     mutex as if they were recursive. Xenomai posix skin
> >         mutexes used to be
> >         >>     recursive by default, but no longer are.
> >         >>
> >         >>     Also note that your code does not check the return
> >         value of the posix
> >         >>     skin services, which is a really bad idea.
> >         >>
> >         >>     --
> >         >>                                                Gilles.
> >         >>
> >         >>
> >         >> Thanks for the prompt response.
> >         >>
> >         >> Could you explain  'recursive usage of mutex' a little
> >         further?
> >         >> Are the xenomai pthread-mutexes very different in behaviour
> >         than regular
> >         >> posix mutexes?
> >         >
> >         > The posix specification does not define the default type of
> >         a mutex. So,
> >         >  in short, the behaviour of a "regular posix mutex" is
> >         unspecified.
> >         > However, following the principle of least surprise, Xenomai
> >         chose, like
> >         > Linux, to use the "normal" type by default.
> >         >
> >         > What is the type of a posix mutex is explained in many
> >         places, starting
> >         > with Xenomai API documentation. So, no, I will not repeat it
> >         here.
> >
> >
> >         Actually, that is not your problem. However, you do not check
> >         the return
> >         value of posix services, which is a bad idea. And indeed, if
> >         you check
> >         it you will find your error: a thread which does not own a
> >         mutex tries
> >         to unlock it.
> >
> >         Sorry, mutex are not semaphore, this is invalid, and Xenomai
> >         returns an
> >         error in such a case.
> >
> >         --
> >                                                    Gilles.
> >
> > _______________________________________________
> > Xenomai-core mailing list
> > Xenomai-core@domain.hid
> > https://mail.gna.org/listinfo/xenomai-core
>
>
> --
> Philippe.
>
>
>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xenomai-core] Fwd: problem in pthread_mutex_lock/unlock
  2010-06-24 11:35               ` Nero Fernandez
@ 2010-06-24 11:50                 ` Gilles Chanteperdrix
  2010-06-24 13:21                   ` Nero Fernandez
  2010-06-24 20:40                 ` Gilles Chanteperdrix
  2010-06-25 15:00                 ` [Xenomai-core] co-kernel benchmarking on arm926 (was: Fwd: problem in pthread_mutex_lock/unlock) Philippe Gerum
  2 siblings, 1 reply; 18+ messages in thread
From: Gilles Chanteperdrix @ 2010-06-24 11:50 UTC (permalink / raw)
  To: Nero Fernandez; +Cc: xenomai

Nero Fernandez wrote:
> Thanks for your response, Philippe.
> 
> The concerns while the carrying out my experiments were to:
> 
>  - compare xenomai co-kernel overheads (timer and context switch latencies)
>    in xenomai-space vs similar native-linux overheads. These are
> presented in
>    the first two sheets.

On what ARM system do you get these latency figures? I really doubt the
linux kernel has a bounded latency under 35us, because:
- the preempt_rt people, who work on getting a bounded latency, get
something around 200us on AT91, an ARM9;
- there would be no reason for the preempt_rt effort if the linux kernel
interrupt latency were already bounded.

So, I take it that you do your measurements without generating a load. We
do our measurements using the latency test, while generating a load for
several hours. And on the average ARM, we usually get an interrupt
latency around 50us.

Please add some load on the system, and do the measurements again. The
best source of load we have found so far is to run the LTP testsuite
while the latency test is running.

If you tell me what ARM SOC, or at least what ARM architecture revision
you use (the ARM920T core is an armv4, and the ARM926EJS is an armv5, so
ARM 9 does not tell us much), I can provide you with the root filesystem
we use for our tests.

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xenomai-core] Fwd: problem in pthread_mutex_lock/unlock
  2010-06-24 11:50                 ` Gilles Chanteperdrix
@ 2010-06-24 13:21                   ` Nero Fernandez
  2010-06-24 14:14                     ` Gilles Chanteperdrix
  0 siblings, 1 reply; 18+ messages in thread
From: Nero Fernandez @ 2010-06-24 13:21 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

[-- Attachment #1: Type: text/plain, Size: 2318 bytes --]

Yes, the measurements are on no-load scenarios.
I will try to repeat my measurements with system-loads as you suggest.

Following is the cpu-info of my board:
----------------------------------------------------------
Processor       : ARM926EJ-S rev 5 (v5l)
BogoMIPS        : 131.48
Features        : swp half fastmult edsp java
CPU implementer : 0x41
CPU architecture: 5TEJ
CPU variant     : 0x0
CPU part        : 0x926
CPU revision    : 5
Cache type      : write-back
Cache clean     : cp15 c7 ops
Cache lockdown  : format C
Cache format    : Harvard
I size          : 16384
I assoc         : 4
I line length   : 32
I sets          : 128
D size          : 16384
D assoc         : 4
D line length   : 32
D sets          : 128
----------------------------------------------------------

On Thu, Jun 24, 2010 at 5:20 PM, Gilles Chanteperdrix <
gilles.chanteperdrix@xenomai.org> wrote:

> Nero Fernandez wrote:
> > Thanks for your response, Philippe.
> >
> > The concerns while the carrying out my experiments were to:
> >
> >  - compare xenomai co-kernel overheads (timer and context switch
> latencies)
> >    in xenomai-space vs similar native-linux overheads. These are
> > presented in
> >    the first two sheets.
>
> On what ARM system do you get these latency figures? I really doubt the
> linux kernel has a bounded latency under 35us. Because:
> - the preempt_rt people, which work on getting a bounded latency get
> something around 200us on AT91, an ARM9;
> - there would be no reason of the preempt_rt effort if the linux kernel
> interrupt latency was already bounded.
>
> So, I take it that you do your measurement without generating a load. We
> do our measurements using the latency test, while generating a load for
> several hours. And on the average ARM, we usually get an interrupt
> latency around 50us.
>
> Please add some load on the system, and do the measurments again. The
> best source of load we have found so far is to load the LTP testsuite
> while running the latency test.
>
> If you tell me what ARM SOC, or at least what ARM architecture revision
> you use (the ARM920T core is an armv4, and the ARM926EJS is an armv5, so
> ARM 9 does not tell us much), I can provide you with the root filesystem
> we use for our tests.
>
> --
>                                             Gilles.
>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xenomai-core] Fwd: problem in pthread_mutex_lock/unlock
  2010-06-24 13:21                   ` Nero Fernandez
@ 2010-06-24 14:14                     ` Gilles Chanteperdrix
  2010-06-28 17:53                       ` Nero Fernandez
  0 siblings, 1 reply; 18+ messages in thread
From: Gilles Chanteperdrix @ 2010-06-24 14:14 UTC (permalink / raw)
  To: Nero Fernandez; +Cc: xenomai

Nero Fernandez wrote:
> 
> Yes, the measurements are on no-load scenarios.
> I will try to repeat my measurements with system-loads as you suggest.

You can find a working root filesystem image with Xenomai 2.5.3 compiled
here:
http://www.xenomai.org/~gch/pub/rootfs-arm926-ejs.tar.bz2

The root password is empty, the system launches a telnet daemon, so you
can log on the board via telnet.

To run the tests, launch in a first telnet session:
echo 0 > /proc/xenomai/latency
latency -T 2 -H
in a second telnet session, launch:
dohell
When you see "Listening on any address 5566", run on the host:
netcat <target-name-or-IP> 5566

where <target-name-or-IP> is the name of your arm board in the host
/etc/hosts file or its IP address.

Now, you can let the system run for as long as the latency test prints
messages. When the dohell script is done, it will kill the latency test,
which will cause it to print the histogram values and exit.

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xenomai-core] Fwd: problem in pthread_mutex_lock/unlock
  2010-06-24 11:35               ` Nero Fernandez
  2010-06-24 11:50                 ` Gilles Chanteperdrix
@ 2010-06-24 20:40                 ` Gilles Chanteperdrix
  2010-06-25 15:00                 ` [Xenomai-core] co-kernel benchmarking on arm926 (was: Fwd: problem in pthread_mutex_lock/unlock) Philippe Gerum
  2 siblings, 0 replies; 18+ messages in thread
From: Gilles Chanteperdrix @ 2010-06-24 20:40 UTC (permalink / raw)
  To: Nero Fernandez; +Cc: xenomai

Nero Fernandez wrote:
> As far as the adeos patch is concerned, i took a recent one (2.6.32) and
> back-ported
> it to 2.6.18, so as not to lose out on any new Adeos-only upgrades.

There is no such thing as an Adeos patch for linux 2.6.32 on the ARM
platform.

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xenomai-core] co-kernel benchmarking on arm926 (was: Fwd: problem in pthread_mutex_lock/unlock)
  2010-06-24 11:35               ` Nero Fernandez
  2010-06-24 11:50                 ` Gilles Chanteperdrix
  2010-06-24 20:40                 ` Gilles Chanteperdrix
@ 2010-06-25 15:00                 ` Philippe Gerum
  2010-06-28 17:50                   ` Nero Fernandez
  2 siblings, 1 reply; 18+ messages in thread
From: Philippe Gerum @ 2010-06-25 15:00 UTC (permalink / raw)
  To: Nero Fernandez; +Cc: xenomai

On Thu, 2010-06-24 at 17:05 +0530, Nero Fernandez wrote:
> Thanks for your response, Philippe.
> 
> The concerns while the carrying out my experiments were to:
> 
>  - compare xenomai co-kernel overheads (timer and context switch
> latencies)
>    in xenomai-space vs similar native-linux overheads. These are
> presented in 
>    the first two sheets.
> 
>  - find out, how addition of xenomai, xenomai+adeos effects the native
> kernel's 
>    performance. Here, lmbench was used on the native linux side to
> estimate
>    the changes to standard linux services.

How can you reasonably estimate the overhead of co-kernel services
without running any co-kernel services? Interrupt pipelining is not a
co-kernel service. You do nothing with interrupt pipelining except
enabling co-kernel services to be implemented with real-time response
guarantee.

> 
> Regarding the additions of latency measurements in sys-timer handler,
> i performed 
> a similar measurement from xnintr_clock_handler(), and the results
> were similar 
> to ones reported from sys-timer handler in xenomai-enabled linux.

If your benchmark is about Xenomai, then at least make sure to provide
results for Xenomai services, used in a relevant application and
platform context. Pretending that you instrumented
xnintr_clock_handler() at some point and got some results, but
eventually decided to illustrate your benchmark with other "similar"
results obtained from totally unrelated instrumentation code, does not
help in considering the figures relevant.

Btw, hooking xnintr_clock_handler() is not correct. Again, benchmarking
interrupt latency with Xenomai has to measure the entire code path, from
the moment the interrupt is taken by the CPU, until it is delivered to
the Xenomai service user. By instrumenting directly in
xnintr_clock_handler(), your test bypasses the Xenomai timer handling
code which delivers the timer tick to the user code, and the
rescheduling procedure as well, so your figures are optimistically wrong
for any normal use case based on real-time tasks.

>  While trying to 
> make both these measurements, i tried to take care that delay-value
> logging is 
> done at the end the handler routines,but the __ipipe_mach_tsc value is
> recorded 
> at the beginning of the routine (a patch for this is included in the
> worksheet itself)

This patch is hopelessly useless and misleading. Unless your intent is
to have your application directly embodied into low-level interrupt
handlers, you are not measuring the actual overhead.

Latency is not solely a matter of interrupt masking, but also a matter
of I/D cache misses, particularly on ARM - you have to traverse the
actual code until delivery to exhibit the latter.

This is exactly what the latency tests shipped with Xenomai are for:
- /usr/xenomai/bin/latency -t0/1/2
- /usr/xenomai/bin/klatency
- /usr/xenomai/bin/irqbench

If your system involves user-space tasks, then you should benchmark
user-space response time using latency [-t0]. If you plan to use
kernel-based tasks such as RTDM tasks, then latency -t1 and klatency
tests will provide correct results for your benchmark.
If you are interested only in interrupt latency, then latency -t2 will
help.

If you do think that those tests do not measure what you seem to be
interested in, then you may want to explain why on this list, so that we
eventually understand what you are after.

> 
> Regarding the system, changing the kernel version would invalidate my
> results
> as the system is a released CE device and has no plans to upgrade the
> kernel.

Ok. But that makes your benchmark 100% irrelevant with respect to
assessing the real performances of a decent co-kernel on your setup.

> AFAIK, enabling FCSE would limit the number of concurrent processes,
> hence
> becoming inviable in my scenario.

Ditto. Besides, FCSE as implemented in recent I-pipe patches has a
best-effort mode which lifts those limitations, at the expense of
voiding the latency guarantee, but on the average, that would still be
much better than always suffering the VIVT cache insanity without FCSE.

Quoting a previous mail of yours, regarding your target:
> Processor       : ARM926EJ-S rev 5 (v5l)

The latency hit induced by VIVT caching on arm926 is typically in the
180-200 us range under load in user-space, and 100-120 us in kernel
space. So, without FCSE, this would bite at each Xenomai __and__ linux
process context switch. Since your application requires that more than
95 processes be available in the system, you will likely get quite a few
switches in any given period of time, unless most of them always sleep,
of course.

Ok, so let me do some wild guesses here: you told us this is a CE-based
application; maybe it exists already? maybe it has to be put on steroids
for gaining decent real-time guarantees it doesn't have yet? and perhaps
the design of that application involves many processes undergoing
periodic activities, so lots of context switches with address space
changes during normal operations?

And, you want that to run on arm926, with no FCSE, and likely not a huge
amount of RAM either, with more than 95 different address spaces? Don't
you think there might be a problem? If so, don't you think implementing
a benchmark based on those assumptions might be irrelevant at some
point?

> As far as the adeos patch is concerned, i took a recent one (2.6.32)

I guess you meant 2.6.33?

>  and back-ported
> it to 2.6.18, so as not to lose out on any new Adeos-only upgrades. i
> carried out the 
> back-port activity for two platforms,a qemu-based integrator platform
> (for 
> minimal functional validity) and my proprietary board.
> 
> However, i am new to this field and would like to correct things if i
> went wrong anywhere.
> Your comments and guidance would be much appreciated.
> 

Since you told us only very few details, it's quite difficult to help.
AFAICS, the only advice that would make sense here, can be expressed as
a question for you: are you really, 100% sure that your app would fit on
that hardware, even without any real-time requirement?

> 
> 
> 
> 
> 
> 
> 
> 
> On Thu, Jun 24, 2010 at 3:30 AM, Philippe Gerum <rpm@xenomai.org>
> wrote:
>         On Thu, 2010-06-24 at 02:15 +0530, Nero Fernandez wrote:
>         > Thanks for your response, Gilles.
>         >
>         > i modified the code to use semaphore instead of mutex, which
>         worked
>         > fine.
>         > Attached is a compilation of some latency figures and system
>         loading
>         > figures (using lmbench)
>         > that i obtained from my proprietary ARM-9 board, using
>         Xenomai-2.5.2.
>         >
>         > Any comments are welcome. TIY.
>         >
>         
>         
>         Yikes. Let me sum up what I understood from your intent:
>         
>         - you are measuring lmbench test latencies, that is to say,
>         you don't
>         measure the real-time core capabilities at all. Unless you
>         crafted a
>         Xenomai-linked version of lmbench, you are basically testing
>         regular
>         processes.
>         
>         - you are benchmarking your own port of the interrupt pipeline
>         over some
>         random, outdated vendor kernel (2.6.18-based Mvista 5.0 dates
>         back to
>         2007, right?), albeit the original ARM port of such code is
>         based on
>         mainline since day #1. Since the latest latency-saving
>         features like
>         FCSE are available with Adeos patches on recent kernels, you
>         are likely
>         looking at ancient light rays from a fossile galaxy (btw, this
>         may
>         explain the incorrect results in the 0k context switch test -
>         you don't
>         have FCSE enabled in your Adeos port, right?).
>         
>         - instead of reporting figures from a real-time interrupt
>         handler
>         actually connected to the Xenomai core, you hijacked the
>         system timer
>         core to pile up your instrumentation on top of the original
>         code you
>         were supposed to benchmark. If this helps,
>         run /usr/xenomai/bin/latency
>         -t2 and you will get the real figures.
>         
>         Quoting you, from your document:
>         "The intent for running these tests is to gauge the overhead
>         of running
>         interrupt-virtualization and further running a (real-time
>         co-kernel +
>         interrupt virtualization) on an embedded-device."
>         
>         I'm unsure that you clearly identified the functional layers.
>         If you
>         don't measure the Xenomai core based on Xenomai activities,
>         then you
>         don't measure the co-kernel overhead. Besides, trying to
>         measure the
>         interrupt pipeline overhead via the lmbench micro-benchmarks
>         makes no
>         sense.
>         
>         
>         >
>         > On Sat, Jun 19, 2010 at 1:15 AM, Gilles Chanteperdrix
>         > <gilles.chanteperdrix@xenomai.org> wrote:
>         >
>         >         Gilles Chanteperdrix wrote:
>         >         > Nero Fernandez wrote:
>         >         >> On Fri, Jun 18, 2010 at 7:42 PM, Gilles
>         Chanteperdrix
>         >         >> <gilles.chanteperdrix@xenomai.org
>         >         >> <mailto:gilles.chanteperdrix@xenomai.org>> wrote:
>         >         >>
>         >         >>     Nero Fernandez wrote:
>         >         >>     > Hi,
>         >         >>     >
>         >         >>     > Please find an archive attached,
>         containing :
>         >         >>     >  - a program for testing
>         context-switch-latency using
>         >         posix-APIs
>         >         >>     >    for native linux kernel and
>         xenomai-posix-skin
>         >         (userspace).
>         >         >>     >  - Makefile to build it using xenomai
>         >         >>
>         >         >>     Your program is very long to tell fast. But
>         it seems
>         >         you are using the
>         >         >>     mutex as if they were recursive. Xenomai
>         posix skin
>         >         mutexes used to be
>         >         >>     recursive by default, but no longer are.
>         >         >>
>         >         >>     Also note that your code does not check the
>         return
>         >         value of the posix
>         >         >>     skin services, which is a really bad idea.
>         >         >>
>         >         >>     --
>         >         >>
>          Gilles.
>         >         >>
>         >         >>
>         >         >> Thanks for the prompt response.
>         >         >>
>         >         >> Could you explain  'recursive usage of mutex' a
>         little
>         >         further?
>         >         >> Are the xenomai pthread-mutexes very different in
>         behaviour
>         >         than regular
>         >         >> posix mutexes?
>         >         >
>         >         > The posix specification does not define the
>         default type of
>         >         a mutex. So,
>         >         >  in short, the behaviour of a "regular posix
>         mutex" is
>         >         unspecified.
>         >         > However, following the principle of least
>         surprise, Xenomai
>         >         chose, like
>         >         > Linux, to use the "normal" type by default.
>         >         >
>         >         > What is the type of a posix mutex is explained in
>         many
>         >         places, starting
>         >         > with Xenomai API documentation. So, no, I will not
>         repeat it
>         >         here.
>         >
>         >
>         >         Actually, that is not your problem. However, you do
>         not check
>         >         the return
>         >         value of posix services, which is a bad idea. And
>         indeed, if
>         >         you check
>         >         it you will find your error: a thread which does not
>         own a
>         >         mutex tries
>         >         to unlock it.
>         >
>         >         Sorry, mutex are not semaphore, this is invalid, and
>         Xenomai
>         >         returns an
>         >         error in such a case.
>         >
>         >         --
>         >                                                    Gilles.
>         >
>         
>         > _______________________________________________
>         > Xenomai-core mailing list
>         > Xenomai-core@domain.hid
>         > https://mail.gna.org/listinfo/xenomai-core
>         
>         
>         
>         --
>         Philippe.
>         
>         
> 


-- 
Philippe.




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xenomai-core] co-kernel benchmarking on arm926 (was: Fwd: problem in pthread_mutex_lock/unlock)
  2010-06-25 15:00                 ` [Xenomai-core] co-kernel benchmarking on arm926 (was: Fwd: problem in pthread_mutex_lock/unlock) Philippe Gerum
@ 2010-06-28 17:50                   ` Nero Fernandez
  2010-06-28 21:31                     ` Philippe Gerum
  2010-06-28 21:50                     ` [Xenomai-core] co-kernel benchmarking on arm926 Gilles Chanteperdrix
  0 siblings, 2 replies; 18+ messages in thread
From: Nero Fernandez @ 2010-06-28 17:50 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai

[-- Attachment #1: Type: text/plain, Size: 14617 bytes --]

On Fri, Jun 25, 2010 at 8:30 PM, Philippe Gerum <rpm@xenomai.org> wrote:

> On Thu, 2010-06-24 at 17:05 +0530, Nero Fernandez wrote:
> > Thanks for your response, Philippe.
> >
> > The concerns while the carrying out my experiments were to:
> >
> >  - compare xenomai co-kernel overheads (timer and context switch
> > latencies)
> >    in xenomai-space vs similar native-linux overheads. These are
> > presented in
> >    the first two sheets.
> >
> >  - find out, how addition of xenomai, xenomai+adeos effects the native
> > kernel's
> >    performance. Here, lmbench was used on the native linux side to
> > estimate
> >    the changes to standard linux services.
>
> How can your reasonably estimate the overhead of co-kernel services
> without running any co-kernel services? Interrupt pipelining is not a
> co-kernel service. You do nothing with interrupt pipelining except
> enabling co-kernel services to be implemented with real-time response
> guarantee.
>

Repeating myself, sheets 1 and 2 contain the results of running
co-kernel services (real-time pthreads, message-queues, semaphores
and clock_nanosleep) and making measurements of the scheduling
and timer-base functionality provided by the co-kernel via the posix skin.

The same code was then built against native posix instead of the
xenomai-posix skin, and similar measurements were taken for the linux
scheduler and timer base. This is something I cannot do with xenomai's
native test (use it for native linux benchmarking).
The point here is to demonstrate what kind of benefits may be drawn from
xenomai-space without any code change.
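
For instance, the timer part of the measurement boils down to something
like this (illustrative sketch; the same source is built once against the
xenomai posix skin and once against plain libc):

#include <stdio.h>
#include <time.h>

/* sleep until absolute deadlines and report the worst overshoot */
static void timer_latency(int iterations, long period_ns)
{
        struct timespec next, now;
        long late, worst = 0;
        int i;

        clock_gettime(CLOCK_MONOTONIC, &next);
        for (i = 0; i < iterations; i++) {
                next.tv_nsec += period_ns;
                while (next.tv_nsec >= 1000000000L) {
                        next.tv_nsec -= 1000000000L;
                        next.tv_sec++;
                }
                clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
                clock_gettime(CLOCK_MONOTONIC, &now);

                late = (now.tv_sec - next.tv_sec) * 1000000000L
                     + (now.tv_nsec - next.tv_nsec);
                if (late > worst)
                        worst = late;
        }
        printf("worst wakeup latency: %ld ns\n", worst);
}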



>
> > Regarding the additions of latency measurements in sys-timer handler,
> > i performed
> > a similar measurement from xnintr_clock_handler(), and the results
> > were similar
> > to ones reported from sys-timer handler in xenomai-enabled linux.
>
> If your benchmark is about Xenomai, then at least make sure to provide
> results for Xenomai services, used in a relevant application and
> platform context. Pretending that you instrumented
> xnintr_clock_handler() at some point and got some results, but
> eventually decided to illustrate your benchmark with other "similar"
> results obtained from a totally unrelated instrumentation code, does not
> help considering the figures as relevant.
>
> Btw, hooking xnintr_clock_handler() is not correct. Again, benchmarking
> interrupt latency with Xenomai has to measure the entire code path, from
> the moment the interrupt is taken by the CPU, until it is delivered to
> the Xenomai service user. By instrumenting directly in
> xnintr_clock_handler(), your test bypasses the Xenomai timer handling
> code which delivers the timer tick to the user code, and the
> rescheduling procedure as well, so your figures are optimistically wrong
> for any normal use case based on real-time tasks.
>

Regarding hooking a measurement device into the sys-timer handler itself,
it serves to observe the changes that xenomai's aperiodic handling of the
system timer brings. This measurement does not attempt to measure the
co-kernel services in any manner.


>

>  While trying to
> > make both these measurements, i tried to take care that delay-value
> > logging is
> > done at the end the handler routines,but the __ipipe_mach_tsc value is
> > recorded
> > at the beginning of the routine (a patch for this is included in the
> > worksheet itself)
>
> This patch is hopelessly useless and misleading. Unless your intent is
> to have your application directly embodied into low-level interrupt
> handlers, you are not measuring the actual overhead.
>
> Latency is not solely a matter of interrupt masking, but also a matter
> of I/D cache misses, particularly on ARM - you have to traverse the
> actual code until delivery to exhibit the latter.
>
> This is exactly what the latency tests shipped with Xenomai are for:
> - /usr/xenomai/bin/latency -t0/1/2
> - /usr/xenomai/bin/klatency
> - /usr/xenomai/bin/irqbench
>
> If your system involves user-space tasks, then you should benchmark
> user-space response time using latency [-t0]. If you plan to use
> kernel-based tasks such as RTDM tasks, then latency -t1 and klatency
> tests will provide correct results for your benchmark.
> If you are interested only in interrupt latency, then latency -t2 will
> help.
>
> If you do think that those tests do not measure what you seem to be
> interested in, then you may want to explain why on this list, so that we
> eventually understand what you are after.
>
> >
> > Regarding the system, changing the kernel version would invalidate my
> > results
> > as the system is a released CE device and has no plans to upgrade the
> > kernel.
>
> Ok. But that makes your benchmark 100% irrelevant with respect to
> assessing the real performances of a decent co-kernel on your setup.
>
> > AFAIK, enabling FCSE would limit the number of concurrent processes,
> > hence
> > becoming inviable in my scenario.
>
> Ditto. Besides, FCSE as implemented in recent I-pipe patches has a
> best-effort mode which lifts those limitations, at the expense of
> voiding the latency guarantee, but on the average, that would still be
> much better than always suffering the VIVT cache insanity without FCSE
>

Thanks for mentioning this. I will try to enable this option for
re-measurements.


> Quoting a previous mail of yours, regarding your target:
> > Processor       : ARM926EJ-S rev 5 (v5l)
>
> The latency hit induced by VIVT caching on arm926 is typically in the
> 180-200 us range under load in user-space, and 100-120 us in kernel
> space. So, without FCSE, this would bite at each Xenomai __and__ linux
> process context switch. Since your application requires that more than
> 95 processes be available in the system, you will likely get quite a few
> switches in any given period of time, unless most of them always sleep,
> of course.
>
> Ok, so let me do some wild guesses here: you told us this is a CE-based
> application; maybe it exists already? maybe it has to be put on steroïds
> for gaining decent real-time guarantees it doesn't have yet? and perhaps
> the design of that application involves many processes undergoing
> periodic activities, so lots of context switches with address space
> changes during normal operations?
>
> And, you want that to run on arm926, with no FCSE, and likely not a huge
> amount of RAM either, with more than 95 different address spaces? Don't
> you think there might be a problem? If so, don't you think implementing
> a benchmark based on those assumptions might be irrelevant at some
> point?
>
> > As far as the adeos patch is concerned, i took a recent one (2.6.32)
>
> I guess you meant 2.6.33?
>

Correction, 2.6.30.


> >  and back-ported
> > it to 2.6.18, so as not to lose out on any new Adeos-only upgrades. i
> > carried out the
> > back-port activity for two platforms,a qemu-based integrator platform
> > (for
> > minimal functional validity) and my proprietary board.
> >
> > However, i am new to this field and would like to correct things if i
> > went wrong anywhere.
> > Your comments and guidance would be much appreciated.
> >
>
> Since you told us only very few details, it's quite difficult to help.
> AFAICS, the only advice that would make sense here, can be expressed as
> a question for you: are you really, 100% sure that your app would fit on
> that hardware, even without any real-time requirement?
>
> >
> >
> >
> >
> >
> >
> >
> >
> > On Thu, Jun 24, 2010 at 3:30 AM, Philippe Gerum <rpm@xenomai.org>
> > wrote:
> >         On Thu, 2010-06-24 at 02:15 +0530, Nero Fernandez wrote:
> >         > Thanks for your response, Gilles.
> >         >
> >         > i modified the code to use semaphore instead of mutex, which
> >         worked
> >         > fine.
> >         > Attached is a compilation of some latency figures and system
> >         loading
> >         > figures (using lmbench)
> >         > that i obtained from my proprietary ARM-9 board, using
> >         Xenomai-2.5.2.
> >         >
> >         > Any comments are welcome. TIY.
> >         >
> >
> >
> >         Yikes. Let me sum up what I understood from your intent:
> >
> >         - you are measuring lmbench test latencies, that is to say,
> >         you don't
> >         measure the real-time core capabilities at all. Unless you
> >         crafted a
> >         Xenomai-linked version of lmbench, you are basically testing
> >         regular
> >         processes.
> >
> >         - you are benchmarking your own port of the interrupt pipeline
> >         over some
> >         random, outdated vendor kernel (2.6.18-based Mvista 5.0 dates
> >         back to
> >         2007, right?), albeit the original ARM port of such code is
> >         based on
> >         mainline since day #1. Since the latest latency-saving
> >         features like
> >         FCSE are available with Adeos patches on recent kernels, you
> >         are likely
> >         looking at ancient light rays from a fossile galaxy (btw, this
> >         may
> >         explain the incorrect results in the 0k context switch test -
> >         you don't
> >         have FCSE enabled in your Adeos port, right?).
> >
> >         - instead of reporting figures from a real-time interrupt
> >         handler
> >         actually connected to the Xenomai core, you hijacked the
> >         system timer
> >         core to pile up your instrumentation on top of the original
> >         code you
> >         were supposed to benchmark. If this helps,
> >         run /usr/xenomai/bin/latency
> >         -t2 and you will get the real figures.
> >
> >         Quoting you, from your document:
> >         "The intent for running these tests is to gauge the overhead
> >         of running
> >         interrupt-virtualization and further running a (real-time
> >         co-kernel +
> >         interrupt virtualization) on an embedded-device."
> >
> >         I'm unsure that you clearly identified the functional layers.
> >         If you
> >         don't measure the Xenomai core based on Xenomai activities,
> >         then you
> >         don't measure the co-kernel overhead. Besides, trying to
> >         measure the
> >         interrupt pipeline overhead via the lmbench micro-benchmarks
> >         makes no
> >         sense.
> >
> >
> >         >
> >         > On Sat, Jun 19, 2010 at 1:15 AM, Gilles Chanteperdrix
> >         > <gilles.chanteperdrix@xenomai.org> wrote:
> >         >
> >         >         Gilles Chanteperdrix wrote:
> >         >         > Nero Fernandez wrote:
> >         >         >> On Fri, Jun 18, 2010 at 7:42 PM, Gilles
> >         Chanteperdrix
> >         >         >> <gilles.chanteperdrix@xenomai.org
> >         >         >> <mailto:gilles.chanteperdrix@xenomai.org>> wrote:
> >         >         >>
> >         >         >>     Nero Fernandez wrote:
> >         >         >>     > Hi,
> >         >         >>     >
> >         >         >>     > Please find an archive attached,
> >         containing :
> >         >         >>     >  - a program for testing
> >         context-switch-latency using
> >         >         posix-APIs
> >         >         >>     >    for native linux kernel and
> >         xenomai-posix-skin
> >         >         (userspace).
> >         >         >>     >  - Makefile to build it using xenomai
> >         >         >>
> >         >         >>     Your program is very long to tell fast. But
> >         it seems
> >         >         you are using the
> >         >         >>     mutex as if they were recursive. Xenomai
> >         posix skin
> >         >         mutexes used to be
> >         >         >>     recursive by default, but no longer are.
> >         >         >>
> >         >         >>     Also note that your code does not check the
> >         return
> >         >         value of the posix
> >         >         >>     skin services, which is a really bad idea.
> >         >         >>
> >         >         >>     --
> >         >         >>
> >          Gilles.
> >         >         >>
> >         >         >>
> >         >         >> Thanks for the prompt response.
> >         >         >>
> >         >         >> Could you explain  'recursive usage of mutex' a
> >         little
> >         >         further?
> >         >         >> Are the xenomai pthread-mutexes very different in
> >         behaviour
> >         >         than regular
> >         >         >> posix mutexes?
> >         >         >
> >         >         > The posix specification does not define the
> >         default type of
> >         >         a mutex. So,
> >         >         >  in short, the behaviour of a "regular posix
> >         mutex" is
> >         >         unspecified.
> >         >         > However, following the principle of least
> >         surprise, Xenomai
> >         >         chose, like
> >         >         > Linux, to use the "normal" type by default.
> >         >         >
> >         >         > What is the type of a posix mutex is explained in
> >         many
> >         >         places, starting
> >         >         > with Xenomai API documentation. So, no, I will not
> >         repeat it
> >         >         here.
> >         >
> >         >
> >         >         Actually, that is not your problem. However, you do
> >         not check
> >         >         the return
> >         >         value of posix services, which is a bad idea. And
> >         indeed, if
> >         >         you check
> >         >         it you will find your error: a thread which does not
> >         own a
> >         >         mutex tries
> >         >         to unlock it.
> >         >
>         >         Sorry, mutexes are not semaphores, this is invalid, and
> >         Xenomai
> >         >         returns an
> >         >         error in such a case.
> >         >
> >         >         --
> >         >                                                    Gilles.
> >         >
> >
> >         > _______________________________________________
> >         > Xenomai-core mailing list
> >         > Xenomai-core@domain.hid
> >         > https://mail.gna.org/listinfo/xenomai-core
> >
> >
> >
> >         --
> >         Philippe.
> >
> >
> >
>
>
> --
> Philippe.
>
>
>
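
A minimal sketch of the two fixes discussed above, with illustrative names
only (this is not the attached test program): an error-checking mutex
reports EPERM when a thread that does not own it tries to unlock it, and a
counting semaphore supports the cross-thread hand-off the test actually
needs.

#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
#include <string.h>

static pthread_mutex_t mtx;
static sem_t handoff;

/* Thread B first repeats the bug (releasing a mutex it never locked),
 * then performs the same hand-off correctly with a semaphore. */
static void *thread_b(void *arg)
{
    int err = pthread_mutex_unlock(&mtx);   /* not the owner */
    if (err)
        fprintf(stderr, "unlock by non-owner: %s\n", strerror(err));
    sem_post(&handoff);                     /* valid from any thread */
    return NULL;
}

int main(void)
{
    pthread_mutexattr_t attr;
    pthread_t tid;

    /* Do not rely on the unspecified default type: request error checking. */
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&mtx, &attr);
    sem_init(&handoff, 0, 0);

    pthread_mutex_lock(&mtx);               /* main thread owns the mutex */
    pthread_create(&tid, NULL, thread_b, NULL);
    pthread_join(tid, NULL);

    sem_wait(&handoff);                     /* semaphore hand-off succeeds */
    pthread_mutex_unlock(&mtx);             /* owner unlocks: no error */

    pthread_mutex_destroy(&mtx);
    pthread_mutexattr_destroy(&attr);
    sem_destroy(&handoff);
    return 0;
}

Checking the return value of every lock/unlock call, as done for the unlock
above, is what exposes this kind of misuse right away.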

[-- Attachment #2: Type: text/html, Size: 18153 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xenomai-core] Fwd: problem in pthread_mutex_lock/unlock
  2010-06-24 14:14                     ` Gilles Chanteperdrix
@ 2010-06-28 17:53                       ` Nero Fernandez
  2010-06-28 19:26                         ` Gilles Chanteperdrix
  0 siblings, 1 reply; 18+ messages in thread
From: Nero Fernandez @ 2010-06-28 17:53 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

[-- Attachment #1: Type: text/plain, Size: 1423 bytes --]

Thanks for the rootfs, Gilles.
Although i am unable to use it directly (i get an 'illegal instruction'
error while running any application/busybox-applet), i will try
to construct a similar test-setup for further testing.

On Thu, Jun 24, 2010 at 7:44 PM, Gilles Chanteperdrix <
gilles.chanteperdrix@xenomai.org> wrote:

> Nero Fernandez wrote:
> >
> > Yes, the measurements are on no-load scenarios.
> > I will try to repeat my measurements with system-loads as you suggest.
>
> You can find a working root filesystem image with Xenomai 2.5.3 compiled
> here:
> http://www.xenomai.org/~gch/pub/rootfs-arm926-ejs.tar.bz2
>
> The root password is empty, the system launches a telnet daemon, so you
> can log on the board via telnet.
>
> To run the tests, launch in a first telnet session:
> echo 0 > /proc/xenomai/latency
> latency -T 2 -H
> in a second telnet session, launch:
> dohell
> When you see "Listening on any address 5566", run on the host:
> netcat <target-name-or-IP> 5566
>
> where <target-name-or-IP> is the name of your arm board in the host
> /etc/hosts file or its IP address.
>
> Now, you can let the system run as long as the latency test prints
> messages. When the dohell script is done, it will kill the latency test,
> which will cause it to print the histogram values and exit.
>
> --
>                                             Gilles.
>

[-- Attachment #2: Type: text/html, Size: 1927 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xenomai-core] Fwd: problem in pthread_mutex_lock/unlock
  2010-06-28 17:53                       ` Nero Fernandez
@ 2010-06-28 19:26                         ` Gilles Chanteperdrix
  0 siblings, 0 replies; 18+ messages in thread
From: Gilles Chanteperdrix @ 2010-06-28 19:26 UTC (permalink / raw)
  To: Nero Fernandez; +Cc: xenomai

Nero Fernandez wrote:
> Thanks for the rootfs, Gilles.
> Although i am unable to use it directly (i get 'illegal instruction'
> error while running any application/busybox-applet), i will try
> to construct a similar test-setup for further testing.

It is an EABI rootfs. Is EABI enabled in your kernel configuration?


-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xenomai-core] co-kernel benchmarking on arm926 (was: Fwd: problem in pthread_mutex_lock/unlock)
  2010-06-28 17:50                   ` Nero Fernandez
@ 2010-06-28 21:31                     ` Philippe Gerum
  2010-06-28 21:50                     ` [Xenomai-core] co-kernel benchmarking on arm926 Gilles Chanteperdrix
  1 sibling, 0 replies; 18+ messages in thread
From: Philippe Gerum @ 2010-06-28 21:31 UTC (permalink / raw)
  To: Nero Fernandez; +Cc: xenomai

On Mon, 2010-06-28 at 23:20 +0530, Nero Fernandez wrote:
> 
> 
> On Fri, Jun 25, 2010 at 8:30 PM, Philippe Gerum <rpm@xenomai.org>
> wrote:
>         On Thu, 2010-06-24 at 17:05 +0530, Nero Fernandez wrote:
>         > Thanks for your response, Philippe.
>         >
>         > The concerns while the carrying out my experiments were to:
>         >
>         >  - compare xenomai co-kernel overheads (timer and context
>         switch
>         > latencies)
>         >    in xenomai-space vs similar native-linux overheads. These
>         are
>         > presented in
>         >    the first two sheets.
>         >
>         >  - find out, how addition of xenomai, xenomai+adeos effects
>         the native
>         > kernel's
>         >    performance. Here, lmbench was used on the native linux
>         side to
>         > estimate
>         >    the changes to standard linux services.
>         
>         How can you reasonably estimate the overhead of co-kernel
>         services
>         without running any co-kernel services? Interrupt pipelining
>         is not a
>         co-kernel service. You do nothing with interrupt pipelining
>         except
>         enabling co-kernel services to be implemented with real-time
>         response
>         guarantee.
>  
> Repeating myself, sheets 1 and 2 contain the results of running
> co-kernel services (real-time pthreads, message queues, semaphores
> and clock_nanosleep) and making measurements regarding scheduling
> and timer-base functionality provided by the co-kernel via the posix skin.
> 

Ok, thanks for rehashing. But since you sent a series of worst-case
latency benchmarks done with no load on a sub-optimal, FCSE-less kernel,
I just wanted to be sure that we were now 100% on the same page
regarding the protocol for testing those picky things. But this
rehashing still does not answer my concerns, actually:

> Same code was then built against native posix, instead of the xenomai-posix
> skin, and similar measurements were taken for the linux scheduler and
> timerbase. This is something that i can't do with xenomai's native test
> (use it for native linux benchmarking).
> The point here is to demonstrate what kind of benefits may be drawn using
> xenomai-space without any code change.

Please re-read my post, you told us:

>  - find out, how addition of xenomai, xenomai+adeos effects the native
>         > kernel's
>         >    performance. Here, lmbench was used on the native linux
>         side to
>         > estimate
>         >    the changes to standard linux services.
>         

From your description, you are trying to measure the overhead of the
interrupt pipeline activity on some native lmbench load for assessing
the Xenomai core impact on your system. If you are not doing that, then
no issue.

>         
>         
> 
> 
>          
>         >
>         > Regarding the additions of latency measurements in sys-timer
>         handler,
>         > i performed
>         > a similar measurement from xnintr_clock_handler(), and the
>         results
>         > were similar
>         > to ones reported from sys-timer handler in xenomai-enabled
>         linux.
>         
>         If your benchmark is about Xenomai, then at least make sure to
>         provide
>         results for Xenomai services, used in a relevant application
>         and
>         platform context. Pretending that you instrumented
>         xnintr_clock_handler() at some point and got some results, but
>         eventually decided to illustrate your benchmark with other
>         "similar"
>         results obtained from a totally unrelated instrumentation
>         code, does not
>         help considering the figures as relevant.
>         
>         Btw, hooking xnintr_clock_handler() is not correct. Again,
>         benchmarking
>         interrupt latency with Xenomai has to measure the entire code
>         path, from
>         the moment the interrupt is taken by the CPU, until it is
>         delivered to
>         the Xenomai service user. By instrumenting directly in
>         xnintr_clock_handler(), your test bypasses the Xenomai timer
>         handling
>         code which delivers the timer tick to the user code, and the
>         rescheduling procedure as well, so your figures are
>         optimistically wrong
>         for any normal use case based on real-time tasks.
>  
> Regarding hooking up a measurement-device in sys-timer itself, it
> serves
> the benefit of observing the changes that xenomai's aperiodic handling
> of system-timer brings. This measurement does not attempt to measure 
> the co-kernel services in any manner.
>  

Your instrumentation code in the system timer handling seems to be about
measuring tick latencies, so if latency-related drifts in serving timers
are not "the changes" you want to observe this way, what changes do you
intend to observe, given that aperiodic tick management is 100%
Xenomai's business (see nucleus/timer.c)? 
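
The end-to-end measurement described above can be pictured with a minimal
user-space sketch along these lines; this is not the shipped latency test,
only the idea behind its -t0 mode, the period is arbitrary, and a real-time
scheduling policy would additionally be needed for the figure to mean
anything:

#include <stdio.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL
#define PERIOD_NS    1000000L        /* 1 ms, arbitrary */

static long long ts_diff_ns(const struct timespec *a, const struct timespec *b)
{
    return (a->tv_sec - b->tv_sec) * NSEC_PER_SEC + (a->tv_nsec - b->tv_nsec);
}

int main(void)
{
    struct timespec next, now;
    long long lat, max = 0;
    int i;

    clock_gettime(CLOCK_MONOTONIC, &next);

    for (i = 0; i < 10000; i++) {
        next.tv_nsec += PERIOD_NS;
        if (next.tv_nsec >= NSEC_PER_SEC) {
            next.tv_nsec -= NSEC_PER_SEC;
            next.tv_sec++;
        }
        /* Sleep until the absolute deadline, then measure how late the
         * wakeup is: this traverses the whole path from the timer interrupt
         * to the woken task, which is what the latency figure means. */
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        clock_gettime(CLOCK_MONOTONIC, &now);
        lat = ts_diff_ns(&now, &next);
        if (lat > max)
            max = lat;
    }

    printf("max wakeup latency: %lld ns\n", max);
    return 0;
}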

>          
>         >  While trying to
>         > make both these measurements, i tried to take care that
>         delay-value
>         > logging is
>         > done at the end of the handler routines, but the
>         __ipipe_mach_tsc value is
>         > recorded
>         > at the beginning of the routine (a patch for this is
>         included in the
>         > worksheet itself)
>         
>         This patch is hopelessly useless and misleading. Unless your
>         intent is
>         to have your application directly embodied into low-level
>         interrupt
>         handlers, you are not measuring the actual overhead.
>         
>         Latency is not solely a matter of interrupt masking, but also
>         a matter
>         of I/D cache misses, particularly on ARM - you have to
>         traverse the
>         actual code until delivery to exhibit the latter.
>         
>         This is exactly what the latency tests shipped with Xenomai
>         are for:
>         - /usr/xenomai/bin/latency -t0/1/2
>         - /usr/xenomai/bin/klatency
>         - /usr/xenomai/bin/irqbench
>         
>         If your system involves user-space tasks, then you should
>         benchmark
>         user-space response time using latency [-t0]. If you plan to
>         use
>         kernel-based tasks such as RTDM tasks, then latency -t1 and
>         klatency
>         tests will provide correct results for your benchmark.
>         If you are interested only in interrupt latency, then latency
>         -t2 will
>         help.
>         
>         If you do think that those tests do not measure what you seem
>         to be
>         interested in, then you may want to explain why on this list,
>         so that we
>         eventually understand what you are after.
>         
>         >
>         > Regarding the system, changing the kernel version would
>         invalidate my
>         > results
>         > as the system is a released CE device and has no plans to
>         upgrade the
>         > kernel.
>         
>         Ok. But that makes your benchmark 100% irrelevant with respect
>         to
>         assessing the real performances of a decent co-kernel on your
>         setup.
>         
>         > AFAIK, enabling FCSE would limit the number of concurrent
>         processes,
>         > hence
>         > becoming inviable in my scenario.
>         
>         Ditto. Besides, FCSE as implemented in recent I-pipe patches
>         has a
>         best-effort mode which lifts those limitations, at the expense
>         of
>         voiding the latency guarantee, but on the average, that would
>         still be
>         much better than always suffering the VIVT cache insanity
>         without FCSE
> 
> Thanks for mentioning this. I will try to enable this option for
> re-measurements.
>  
> 
>         Quoting a previous mail of yours, regarding your target:
>         > Processor       : ARM926EJ-S rev 5 (v5l)
>         
>         The latency hit induced by VIVT caching on arm926 is typically
>         in the
>         180-200 us range under load in user-space, and 100-120 us in
>         kernel
>         space. So, without FCSE, this would bite at each Xenomai
>         __and__ linux
>         process context switch. Since your application requires that
>         more than
>         95 processes be available in the system, you will likely get
>         quite a few
>         switches in any given period of time, unless most of them
>         always sleep,
>         of course.
>         
>         Ok, so let me do some wild guesses here: you told us this is a
>         CE-based
>         application; maybe it exists already? maybe it has to be put
>         on steroïds
>         for gaining decent real-time guarantees it doesn't have yet?
>         and perhaps
>         the design of that application involves many processes
>         undergoing
>         periodic activities, so lots of context switches with address
>         space
>         changes during normal operations?
>         
>         And, you want that to run on arm926, with no FCSE, and likely
>         not a huge
>         amount of RAM either, with more than 95 different address
>         spaces? Don't
>         you think there might be a problem? If so, don't you think
>         implementing
>         a benchmark based on those assumptions might be irrelevant at
>         some
>         point?
>         
>         > As far as the adeos patch is concerned, i took a recent one
>         (2.6.32)
>         
>         I guess you meant 2.6.33?
>  
> Correction, 2.6.30.
> 
> 
>         
>         >  and back-ported
>         > it to 2.6.18, so as not to lose out on any new Adeos-only
>         upgrades. i
>         > carried out the
>         > back-port activity for two platforms, a qemu-based integrator
>         platform
>         > (for
>         > minimal functional validity) and my proprietary board.
>         >
>         > However, i am new to this field and would like to correct
>         things if i
>         > went wrong anywhere.
>         > Your comments and guidance would be much appreciated.
>         >
>         
>         Since you told us only very few details, it's quite difficult
>         to help.
>         AFAICS, the only advice that would make sense here, can be
>         expressed as
>         a question for you: are you really, 100% sure that your app
>         would fit on
>         that hardware, even without any real-time requirement?
>         


-- 
Philippe.




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xenomai-core] co-kernel benchmarking on arm926
  2010-06-28 17:50                   ` Nero Fernandez
  2010-06-28 21:31                     ` Philippe Gerum
@ 2010-06-28 21:50                     ` Gilles Chanteperdrix
  1 sibling, 0 replies; 18+ messages in thread
From: Gilles Chanteperdrix @ 2010-06-28 21:50 UTC (permalink / raw)
  To: Nero Fernandez; +Cc: xenomai

Nero Fernandez wrote:
> On Fri, Jun 25, 2010 at 8:30 PM, Philippe Gerum <rpm@xenomai.org> wrote:
> 
>> On Thu, 2010-06-24 at 17:05 +0530, Nero Fernandez wrote:
>>> Thanks for your response, Philippe.
>>>
>>> The concerns while the carrying out my experiments were to:
>>>
>>>  - compare xenomai co-kernel overheads (timer and context switch
>>> latencies)
>>>    in xenomai-space vs similar native-linux overheads. These are
>>> presented in
>>>    the first two sheets.
>>>
>>>  - find out, how addition of xenomai, xenomai+adeos effects the native
>>> kernel's
>>>    performance. Here, lmbench was used on the native linux side to
>>> estimate
>>>    the changes to standard linux services.
>> How can you reasonably estimate the overhead of co-kernel services
>> without running any co-kernel services? Interrupt pipelining is not a
>> co-kernel service. You do nothing with interrupt pipelining except
>> enabling co-kernel services to be implemented with real-time response
>> guarantee.
>>
> 
> Repeating myself, sheets 1 and 2 contain the results of running
> co-kernel services (real-time pthreads, message queues, semaphores
> and clock_nanosleep) and making measurements regarding scheduling
> and timer-base functionality provided by the co-kernel via the posix skin.
> 
> Same code was then built against native posix, instead of the xenomai-posix
> skin, and similar measurements were taken for the linux scheduler and
> timerbase. This is something that i can't do with xenomai's native test
> (use it for native linux benchmarking).
> The point here is to demonstrate what kind of benefits may be drawn using
> xenomai-space without any code change.
> 
> 
> 
>>> Regarding the additions of latency measurements in sys-timer handler,
>>> i performed
>>> a similar measurement from xnintr_clock_handler(), and the results
>>> were similar
>>> to ones reported from sys-timer handler in xenomai-enabled linux.
>> If your benchmark is about Xenomai, then at least make sure to provide
>> results for Xenomai services, used in a relevant application and
>> platform context. Pretending that you instrumented
>> xnintr_clock_handler() at some point and got some results, but
>> eventually decided to illustrate your benchmark with other "similar"
>> results obtained from a totally unrelated instrumentation code, does not
>> help considering the figures as relevant.
>>
>> Btw, hooking xnintr_clock_handler() is not correct. Again, benchmarking
>> interrupt latency with Xenomai has to measure the entire code path, from
>> the moment the interrupt is taken by the CPU, until it is delivered to
>> the Xenomai service user. By instrumenting directly in
>> xnintr_clock_handler(), your test bypasses the Xenomai timer handling
>> code which delivers the timer tick to the user code, and the
>> rescheduling procedure as well, so your figures are optimistically wrong
>> for any normal use case based on real-time tasks.
>>
> 
> Regarding hooking up a measurement-device in sys-timer itself, it serves
> the benefit of observing the changes that xenomai's aperiodic handling
> of system-timer brings. This measurement does not attempt to measure
> the co-kernel services in any manner.
> 
> 
> 
>>  While trying to
>>> make both these measurements, i tried to take care that delay-value
>>> logging is
>>> done at the end of the handler routines, but the __ipipe_mach_tsc value is
>>> recorded
>>> at the beginning of the routine (a patch for this is included in the
>>> worksheet itself)
>> This patch is hopelessly useless and misleading. Unless your intent is
>> to have your application directly embodied into low-level interrupt
>> handlers, you are not measuring the actual overhead.
>>
>> Latency is not solely a matter of interrupt masking, but also a matter
>> of I/D cache misses, particularly on ARM - you have to traverse the
>> actual code until delivery to exhibit the latter.
>>
>> This is exactly what the latency tests shipped with Xenomai are for:
>> - /usr/xenomai/bin/latency -t0/1/2
>> - /usr/xenomai/bin/klatency
>> - /usr/xenomai/bin/irqbench
>>
>> If your system involves user-space tasks, then you should benchmark
>> user-space response time using latency [-t0]. If you plan to use
>> kernel-based tasks such as RTDM tasks, then latency -t1 and klatency
>> tests will provide correct results for your benchmark.
>> If you are interested only in interrupt latency, then latency -t2 will
>> help.
>>
>> If you do think that those tests do not measure what you seem to be
>> interested in, then you may want to explain why on this list, so that we
>> eventually understand what you are after.
>>
>>> Regarding the system, changing the kernel version would invalidate my
>>> results
>>> as the system is a released CE device and has no plans to upgrade the
>>> kernel.
>> Ok. But that makes your benchmark 100% irrelevant with respect to
>> assessing the real performances of a decent co-kernel on your setup.
>>
>>> AFAIK, enabling FCSE would limit the number of concurrent processes,
>>> hence
>>> becoming inviable in my scenario.
>> Ditto. Besides, FCSE as implemented in recent I-pipe patches has a
>> best-effort mode which lifts those limitations, at the expense of
>> voiding the latency guarantee, but on the average, that would still be
>> much better than always suffering the VIVT cache insanity without FCSE
>>
> 
> Thanks for mentioning this. I will try to enable this option for
> re-measurements.
> 
> 
>> Quoting a previous mail of yours, regarding your target:
>>> Processor       : ARM926EJ-S rev 5 (v5l)
>> The latency hit induced by VIVT caching on arm926 is typically in the
>> 180-200 us range under load in user-space, and 100-120 us in kernel
>> space. So, without FCSE, this would bite at each Xenomai __and__ linux
>> process context switch. Since your application requires that more than
>> 95 processes be available in the system, you will likely get quite a few
>> switches in any given period of time, unless most of them always sleep,
>> of course.
>>
>> Ok, so let me do some wild guesses here: you told us this is a CE-based
>> application; maybe it exists already? maybe it has to be put on steroïds
>> for gaining decent real-time guarantees it doesn't have yet? and perhaps
>> the design of that application involves many processes undergoing
>> periodic activities, so lots of context switches with address space
>> changes during normal operations?
>>
>> And, you want that to run on arm926, with no FCSE, and likely not a huge
>> amount of RAM either, with more than 95 different address spaces? Don't
>> you think there might be a problem? If so, don't you think implementing
>> a benchmark based on those assumptions might be irrelevant at some
>> point?
>>
>>> As far as the adeos patch is concerned, i took a recent one (2.6.32)
>> I guess you meant 2.6.33?
>>
> 
> Correction, 2.6.30.

Ok. If you are interested in the FCSE code, you may want to use FCSE v4.
See the comparison on the hackbench test here:
http://sisyphus.hd.free.fr/~gilles/pub/fcse/hackbench-fcse-v4.png

I did not rebase the I-pipe patch for 2.6.30 on this new fcse, but you
can find it in the patches for 2.6.31 and 2.6.33. Or as standalone trees
in my adeos git tree:
http://git.xenomai.org/?p=ipipe-gch.git;a=summary

Also note that since we are in the re-hashing tonight, as Philippe told
you, 95 processes is actually a lot on a low-end ARM platform, so you
had better be sure that you really need more than 95 processes
(beware, we are talking processes here, memory spaces, not threads; a
process may have as many threads as it wants) before deciding not to
use the FCSE guaranteed mode. Thinking that the number of processes is
unlimited on a low-end/embedded ARM system is an error: it is limited by
the available resources (RAM, CPU) on your system. The lower the
resources, the lower the practical limit is, and I bet this practical
limit is much lower than you would like.
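
To make the process/thread distinction concrete, a minimal sketch with an
arbitrary worker count (illustrative only): all of the workers below live in
a single process, hence a single memory space as far as FCSE accounting is
concerned.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define NWORKERS 32        /* arbitrary; these are threads, not processes */

static void *worker(void *arg)
{
    long id = (long)arg;
    int i;

    /* Placeholder for periodic work; being threads, these do not add
     * address spaces, so they do not count against a process limit. */
    for (i = 0; i < 5; i++) {
        printf("worker %ld, iteration %d\n", id, i);
        usleep(100000);
    }
    return NULL;
}

int main(void)
{
    pthread_t tids[NWORKERS];
    long i;

    for (i = 0; i < NWORKERS; i++)
        pthread_create(&tids[i], NULL, worker, (void *)i);
    for (i = 0; i < NWORKERS; i++)
        pthread_join(tids[i], NULL);

    return 0;
}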

-- 
					    Gilles.



^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2010-06-28 21:50 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-06-18 13:52 [Xenomai-core] problem in pthread_mutex_lock/unlock Nero Fernandez
2010-06-18 14:12 ` Gilles Chanteperdrix
     [not found]   ` <AANLkTinABTK2nMI0QfZVaULQ4OKwF0678PKOBc_OMIn1@domain.hid>
2010-06-18 14:59     ` [Xenomai-core] Fwd: " Nero Fernandez
2010-06-18 15:08       ` Gilles Chanteperdrix
2010-06-18 19:45         ` Gilles Chanteperdrix
2010-06-23 20:45           ` Nero Fernandez
2010-06-23 22:00             ` Philippe Gerum
2010-06-24 11:35               ` Nero Fernandez
2010-06-24 11:50                 ` Gilles Chanteperdrix
2010-06-24 13:21                   ` Nero Fernandez
2010-06-24 14:14                     ` Gilles Chanteperdrix
2010-06-28 17:53                       ` Nero Fernandez
2010-06-28 19:26                         ` Gilles Chanteperdrix
2010-06-24 20:40                 ` Gilles Chanteperdrix
2010-06-25 15:00                 ` [Xenomai-core] co-kernel benchmarking on arm926 (was: Fwd: problem in pthread_mutex_lock/unlock) Philippe Gerum
2010-06-28 17:50                   ` Nero Fernandez
2010-06-28 21:31                     ` Philippe Gerum
2010-06-28 21:50                     ` [Xenomai-core] co-kernel benchmarking on arm926 Gilles Chanteperdrix
