* [Xenomai] Simple application for invoking rtdm driver
@ 2018-03-20  1:42 Pintu Kumar
  2018-03-20  3:33 ` Greg Gallagher
  0 siblings, 1 reply; 18+ messages in thread
From: Pintu Kumar @ 2018-03-20  1:42 UTC (permalink / raw)
  To: Xenomai@xenomai.org

Hi,

I have developed a simple rtdm driver using: open, read_rt, write_rt, close.
Now I want to test it with a Xenomai application, using the native skin.

Here are my observations:

1) If I use the normal open, read, write system calls, then Xenomai reports
that the normal read/write methods are being used for RTDM.
So it does not work like that.

2) If I use rt_dev_open, rt_dev_read, rt_dev_write, then it works fine,
but the write/read latency is very high compared to the normal driver.
Also, the migration document says these are legacy APIs that should be
replaced with rtdm_open, etc. for Xenomai 3.0.
However, if I use rtdm_open, rtdm_write, etc., it does not compile.
I have included the rtdm/rtdm.h header file.

So please guide me on the right APIs to use to invoke the RTDM driver.
I could not find a suitable example in the test suite.


Thanks,
Pintu


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xenomai] Simple application for invoking rtdm driver
  2018-03-20  1:42 [Xenomai] Simple application for invoking rtdm driver Pintu Kumar
@ 2018-03-20  3:33 ` Greg Gallagher
  2018-03-20  5:27   ` Pintu Kumar
  0 siblings, 1 reply; 18+ messages in thread
From: Greg Gallagher @ 2018-03-20  3:33 UTC (permalink / raw)
  To: Pintu Kumar; +Cc: Xenomai@xenomai.org

If you want to use open, read, write you need to tell the Makefile to
use the POSIX skin.  You need something like this in your Makefile:

XENO_CONFIG := /usr/xenomai/bin/xeno-config
CFLAGS := $(shell $(XENO_CONFIG) --posix --cflags)
LDFLAGS := $(shell  $(XENO_CONFIG) --posix --ldflags)
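For reference, a complete minimal Makefile built around those three lines might look like this (the target name, source file, and xeno-config path are assumptions; adjust them to your installation):

```makefile
# Minimal sketch of a Makefile for a Cobalt/POSIX-skin application.
# "rtdm_test" and "rtdm_test.c" are placeholder names.
XENO_CONFIG := /usr/xenomai/bin/xeno-config
CFLAGS := $(shell $(XENO_CONFIG) --posix --cflags)
LDFLAGS := $(shell $(XENO_CONFIG) --posix --ldflags)

all: rtdm_test

rtdm_test: rtdm_test.c
	$(CC) $(CFLAGS) -o $@ $< $(LDFLAGS)

clean:
	rm -f rtdm_test
```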


-Greg



On Mon, Mar 19, 2018 at 9:42 PM, Pintu Kumar <pintu.ping@gmail.com> wrote:
> Hi,
>
> I have developed a simple rtdm driver using: open, read_rt, write_rt, close.
> Now I wanted to test it using a Xenomai native application, using native skin.
>
> Here are my observation.
>
> 1) If I use normal open, read, write system call, then Xenomai reports
> that normal read/write method is used for rtdm.
> So, it does not work like that.
>
> 2) If I use, rt_dev_open, rt_dev_read, rt_dev_write, then it works fine.
> But latency is very high for write/read, compared to normal.
> Also, the migration document says these are legacy API and should be
> replaced with rtdm_open, etc. for Xenomai 3.0.
> However, if I use rtdm_open, rtdm_write, etc, it could not compile successfully.
> I have included rtdm/rtdm.h header file.
>
> So, please guide me which are the right APIs to use to invoke the rtdm driver.
> I could to find the right example in test suite.
>
>
> Thanks,
> Pintu
>
> _______________________________________________
> Xenomai mailing list
> Xenomai@xenomai.org
> https://xenomai.org/mailman/listinfo/xenomai



* Re: [Xenomai] Simple application for invoking rtdm driver
  2018-03-20  3:33 ` Greg Gallagher
@ 2018-03-20  5:27   ` Pintu Kumar
  2018-03-20  7:26     ` Pintu Kumar
  0 siblings, 1 reply; 18+ messages in thread
From: Pintu Kumar @ 2018-03-20  5:27 UTC (permalink / raw)
  To: Greg Gallagher; +Cc: Xenomai@xenomai.org

On Tue, Mar 20, 2018 at 9:03 AM, Greg Gallagher <greg@embeddedgreg.com> wrote:
> If you want to use open, read, write you need to specify in the
> makefile to use the posix skin.  You need something like these in your
> Makefile:
>
> XENO_CONFIG := /usr/xenomai/bin/xeno-config
> CFLAGS := $(shell $(XENO_CONFIG) --posix --cflags)
> LDFLAGS := $(shell  $(XENO_CONFIG) --posix --ldflags)
>

Oh yes, I forgot to mention that it works with the posix skin.

But I want to use the native API only, so I removed the posix skin from
the Makefile.

For the native API, I am using rt_dev_{open, read, write}. Is this the
valid API for Xenomai 3.0?
Or is there something else?
Is there any reference?


Thanks,
Pintu


>
> -Greg
>
>
>
> On Mon, Mar 19, 2018 at 9:42 PM, Pintu Kumar <pintu.ping@gmail.com> wrote:
>> Hi,
>>
>> I have developed a simple rtdm driver using: open, read_rt, write_rt, close.
>> Now I wanted to test it using a Xenomai native application, using native skin.
>>
>> Here are my observation.
>>
>> 1) If I use normal open, read, write system call, then Xenomai reports
>> that normal read/write method is used for rtdm.
>> So, it does not work like that.
>>
>> 2) If I use, rt_dev_open, rt_dev_read, rt_dev_write, then it works fine.
>> But latency is very high for write/read, compared to normal.
>> Also, the migration document says these are legacy API and should be
>> replaced with rtdm_open, etc. for Xenomai 3.0.
>> However, if I use rtdm_open, rtdm_write, etc, it could not compile successfully.
>> I have included rtdm/rtdm.h header file.
>>
>> So, please guide me which are the right APIs to use to invoke the rtdm driver.
>> I could to find the right example in test suite.
>>
>>
>> Thanks,
>> Pintu
>>
>> _______________________________________________
>> Xenomai mailing list
>> Xenomai@xenomai.org
>> https://xenomai.org/mailman/listinfo/xenomai



* Re: [Xenomai] Simple application for invoking rtdm driver
  2018-03-20  5:27   ` Pintu Kumar
@ 2018-03-20  7:26     ` Pintu Kumar
  2018-03-20  9:32       ` Philippe Gerum
  0 siblings, 1 reply; 18+ messages in thread
From: Pintu Kumar @ 2018-03-20  7:26 UTC (permalink / raw)
  To: Greg Gallagher; +Cc: Xenomai@xenomai.org

On Tue, Mar 20, 2018 at 10:57 AM, Pintu Kumar <pintu.ping@gmail.com> wrote:
> On Tue, Mar 20, 2018 at 9:03 AM, Greg Gallagher <greg@embeddedgreg.com> wrote:
>> If you want to use open, read, write you need to specify in the
>> makefile to use the posix skin.  You need something like these in your
>> Makefile:
>>
>> XENO_CONFIG := /usr/xenomai/bin/xeno-config
>> CFLAGS := $(shell $(XENO_CONFIG) --posix --cflags)
>> LDFLAGS := $(shell  $(XENO_CONFIG) --posix --ldflags)
>>
>
> Oh yes I forgot to mention with posix skin it is working.
>
> But I wanted to use native API only, so I removed posix skin from Makefile.
>
> For, native API, I am using: rt_dev_{open, read, write}. Is this the
> valid API for Xenomai 3.0 ?
> Or there is something else?
> Is there any reference ?
>

Dear Greg,

In my sample, I am just copying a string between user and kernel space
and printing it.
For the normal driver, I get read/write latency like this:
write latency: 2.247 us
read latency: 2.202 us

For the Xenomai 3.0 rtdm driver, using rt_dev_{open, read, write},
I get latency like this:
write latency: 7.668 us
read latency: 5.558 us

My concern is: why is the latency higher in the RTDM case?
This is on an x86-64 machine.

Latency is a little better when using only the posix skin:
write latency: 3.587 us
read latency: 3.392 us


Do you have any input on this behavior?


Thanks,
Pintu


>
> Thanks,
> Pintu
>
>
>>
>> -Greg
>>
>>
>>
>> On Mon, Mar 19, 2018 at 9:42 PM, Pintu Kumar <pintu.ping@gmail.com> wrote:
>>> Hi,
>>>
>>> I have developed a simple rtdm driver using: open, read_rt, write_rt, close.
>>> Now I wanted to test it using a Xenomai native application, using native skin.
>>>
>>> Here are my observation.
>>>
>>> 1) If I use normal open, read, write system call, then Xenomai reports
>>> that normal read/write method is used for rtdm.
>>> So, it does not work like that.
>>>
>>> 2) If I use, rt_dev_open, rt_dev_read, rt_dev_write, then it works fine.
>>> But latency is very high for write/read, compared to normal.
>>> Also, the migration document says these are legacy API and should be
>>> replaced with rtdm_open, etc. for Xenomai 3.0.
>>> However, if I use rtdm_open, rtdm_write, etc, it could not compile successfully.
>>> I have included rtdm/rtdm.h header file.
>>>
>>> So, please guide me which are the right APIs to use to invoke the rtdm driver.
>>> I could to find the right example in test suite.
>>>
>>>
>>> Thanks,
>>> Pintu
>>>
>>> _______________________________________________
>>> Xenomai mailing list
>>> Xenomai@xenomai.org
>>> https://xenomai.org/mailman/listinfo/xenomai



* Re: [Xenomai] Simple application for invoking rtdm driver
  2018-03-20  7:26     ` Pintu Kumar
@ 2018-03-20  9:32       ` Philippe Gerum
  2018-03-20 11:31         ` Pintu Kumar
  0 siblings, 1 reply; 18+ messages in thread
From: Philippe Gerum @ 2018-03-20  9:32 UTC (permalink / raw)
  To: Pintu Kumar, Greg Gallagher; +Cc: Xenomai@xenomai.org

On 03/20/2018 08:26 AM, Pintu Kumar wrote:
> On Tue, Mar 20, 2018 at 10:57 AM, Pintu Kumar <pintu.ping@gmail.com> wrote:
>> On Tue, Mar 20, 2018 at 9:03 AM, Greg Gallagher <greg@embeddedgreg.com> wrote:
>>> If you want to use open, read, write you need to specify in the
>>> makefile to use the posix skin.  You need something like these in your
>>> Makefile:
>>>
>>> XENO_CONFIG := /usr/xenomai/bin/xeno-config
>>> CFLAGS := $(shell $(XENO_CONFIG) --posix --cflags)
>>> LDFLAGS := $(shell  $(XENO_CONFIG) --posix --ldflags)
>>>
>>
>> Oh yes I forgot to mention with posix skin it is working.
>>
>> But I wanted to use native API only, so I removed posix skin from Makefile.
>>
>> For, native API, I am using: rt_dev_{open, read, write}. Is this the
>> valid API for Xenomai 3.0 ?
>> Or there is something else?
>> Is there any reference ?
>>
> 
> Dear Greg,
> 
> In my sample, I am just copying some string from user <--> kernel and
> printing them.
> For normal driver, I get read/write latency like this:
> write latency: 2.247 us
> read latency: 2.202 us
> 
> For Xenomai 3.0 rtdm driver, using : rt_dev_{open, read, write}
> I get the latency like this:
> write latency: 7.668 us
> read latency: 5.558 us
> 
> My concern is, why the latency is higher in case of RTDM ?
> This is on x86-64 machine.
>

Did you stress your machine while your test was running? If not, you
were not measuring worst-case latency; you were measuring execution time,
which is different. If you want to actually measure latency for
real-time usage, you need to run your tests under significant stress
load. Under such load, the RTDM version should perform reliably below a
reasonable latency limit, while the "normal" version will experience
jitter above that limit.

A trivial stress load may be as simple as running a dd loop copying
128MB blocks from /dev/zero to /dev/null in the background; you may also
add a kernel compilation keeping all CPUs busy.
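The dd part of that stress recipe can be sketched as a small shell helper (block size and count here are arbitrary illustrative choices, not values from this thread):

```shell
#!/bin/sh
# Sketch of the stress load described above: stream zeroed blocks from
# /dev/zero to /dev/null in the background while the latency test runs.
stress_io() {
    # $1 = number of 1MB blocks to copy per pass
    dd if=/dev/zero of=/dev/null bs=1M count="$1" 2>/dev/null
}

stress_io 128 &          # one 128MB pass in the background
STRESS_PID=$!
# ... run the RTDM latency test here ...
wait "$STRESS_PID"
echo "stress pass finished"
```

For a sustained load, the `stress_io` call would be wrapped in a loop instead of a single background pass.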

Besides, you need to make sure to disable I-pipe and Cobalt debug
options, particularly CONFIG_IPIPE_TRACE and
CONFIG_XENO_OPT_DEBUG_LOCKING when running the RTDM case.

>
> Latency is little better, when using only posix skin:
> write latency: 3.587 us
> read latency: 3.392 us
>

This does not make much sense, see the excerpt from
include/trank/rtdm/rtdm.h, which simply wraps the inline rt_dev_write()
call to Cobalt's POSIX call [__wrap_]write() from lib/cobalt/rtdm.c:

#define rt_dev_call(__call, __args...)	\
({					\
	int __ret;			\
	__ret = __RT(__call(__args));	\
	__ret < 0 ? -errno : __ret;	\
})

static inline ssize_t rt_dev_write(int fd, const void *buf, size_t len)
{
	return rt_dev_call(write, fd, buf, len);
}

The way you measure the elapsed time may affect the measurement:
libalchemy's rt_timer_read() is definitely slower than libcobalt's
clock_gettime().

The POSIX skin is generally faster than the alchemy API, because it
implements wrappers to the corresponding Cobalt system calls (i.e.
libcobalt is Xenomai's libc equivalent). Alchemy has to traverse
libcopperplate before actual syscalls can be issued by libcobalt, which
it depends on, because libalchemy needs the copperplate interface layer
to shield itself from Cobalt/Mercury differences.

-- 
Philippe.



* Re: [Xenomai] Simple application for invoking rtdm driver
  2018-03-20  9:32       ` Philippe Gerum
@ 2018-03-20 11:31         ` Pintu Kumar
  2018-03-20 11:37           ` Philippe Gerum
  2018-03-20 11:45           ` Philippe Gerum
  0 siblings, 2 replies; 18+ messages in thread
From: Pintu Kumar @ 2018-03-20 11:31 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Xenomai@xenomai.org

On Tue, Mar 20, 2018 at 3:02 PM, Philippe Gerum <rpm@xenomai.org> wrote:
> On 03/20/2018 08:26 AM, Pintu Kumar wrote:
>> On Tue, Mar 20, 2018 at 10:57 AM, Pintu Kumar <pintu.ping@gmail.com> wrote:
>>> On Tue, Mar 20, 2018 at 9:03 AM, Greg Gallagher <greg@embeddedgreg.com> wrote:
>>>> If you want to use open, read, write you need to specify in the
>>>> makefile to use the posix skin.  You need something like these in your
>>>> Makefile:
>>>>
>>>> XENO_CONFIG := /usr/xenomai/bin/xeno-config
>>>> CFLAGS := $(shell $(XENO_CONFIG) --posix --cflags)
>>>> LDFLAGS := $(shell  $(XENO_CONFIG) --posix --ldflags)
>>>>
>>>
>>> Oh yes I forgot to mention with posix skin it is working.
>>>
>>> But I wanted to use native API only, so I removed posix skin from Makefile.
>>>
>>> For, native API, I am using: rt_dev_{open, read, write}. Is this the
>>> valid API for Xenomai 3.0 ?
>>> Or there is something else?
>>> Is there any reference ?
>>>
>>
>> Dear Greg,
>>
>> In my sample, I am just copying some string from user <--> kernel and
>> printing them.
>> For normal driver, I get read/write latency like this:
>> write latency: 2.247 us
>> read latency: 2.202 us
>>
>> For Xenomai 3.0 rtdm driver, using : rt_dev_{open, read, write}
>> I get the latency like this:
>> write latency: 7.668 us
>> read latency: 5.558 us
>>
>> My concern is, why the latency is higher in case of RTDM ?
>> This is on x86-64 machine.
>>
>
> Did you stress your machine while your test was running? If not, you
> were not measuring worst-case latency, you were measuring execution time
> in this case, which is different. If you want to actually measure
> latency for real-time usage, you need to run your tests under
> significant stress load. Under such load, the RTDM version should
> perform reliably below a reasonable latency limit, the "normal" version
> will experience jittery above that limit.
>
> A trivial stress load may be as simple as running a dd loop copying
> 128Mb blocks from /dev/zero to /dev/null in the background, you may also
> add a kernel compilation keeping all CPUs busy.
>

OK, I tried both options, but the normal driver latency is still much lower.
In fact, with a kernel build running in another terminal, the rtdm
latency shoots much higher.
Normal Kernel
--------------------
write latency: 3.084 us
read latency: 3.186 us

RTDM Kernel (native)
---------------------------------
write latency: 12.676 us
read latency: 9.858 us

RTDM Kernel (posix)
---------------------------------
write latency: 12.907 us
read latency: 8.699 us

During the beginning of the kernel build I even observed RTDM (native)
go as high as:
write latency: 4061.266 us
read latency: 3947.836 us

---------------------------------
As a quick reference, this is the snippet for the rtdm write method.

--------------------------------
static ssize_t rtdm_write(struct rtdm_fd *fd, const void __user *buff,
                          size_t len)
{
        struct dummy_context *context;
        int ret;

        context = rtdm_fd_to_private(fd);

        memset(context->buffer, 0, 4096);
        ret = rtdm_safe_copy_from_user(fd, context->buffer, buff, len);
        if (ret)
                return ret;
        rtdm_printk("write done\n");

        return len;
}

The normal driver's write is almost the same.

On the application side, I just invoke it using:
        t1 = rt_timer_read();
        ret = rt_dev_write(fd, msg, len);
        t2 = rt_timer_read();

Is there anything wrong on the rtdm side?
--------------------------------

> Besides, you need to make sure to disable I-pipe and Cobalt debug
> options, particularly CONFIG_IPIPE_TRACE and
> CONFIG_XENO_OPT_DEBUG_LOCKING when running the RTDM case.
>

Yes, these debug options are already disabled.

>>
>> Latency is little better, when using only posix skin:
>> write latency: 3.587 us
>> read latency: 3.392 us
>>
>
> This does not make much sense, see the excerpt from
> include/trank/rtdm/rtdm.h, which simply wraps the inline rt_dev_write()
> call to Cobalt's POSIX call [__wrap_]write() from lib/cobalt/rtdm.c:
>

OK, sorry, there was a mistake in the posix latency value:
I forgot to switch to the rtdm driver instead of the normal driver.
With the posix skin, using exactly the same application as for the
normal driver, the latency figures are almost the same as with the
native skin:
write latency: 7.044 us
read latency: 6.786 us


> #define rt_dev_call(__call, __args...)  \
> ({                                      \
>         int __ret;                      \
>         __ret = __RT(__call(__args));   \
>         __ret < 0 ? -errno : __ret;     \
> })
>
> static inline ssize_t rt_dev_write(int fd, const void *buf, size_t len)
> {
>         return rt_dev_call(write, fd, buf, len);
> }
>
> The way you measure the elapsed time may affect the measurement:
> libalchemy's rt_timer_read() is definitely slower than libcobalt's
> clock_gettime().

For the normal kernel driver (and rtdm with the posix skin) application,
I am using clock_gettime().
For the Xenomai rtdm driver with the native skin application, I am using
rt_timer_read().


>
> The POSIX skin is generally faster than the alchemy API, because it
> implements wrappers to the corresponding Cobalt system calls (i.e.
> libcobalt is Xenomai's libc equivalent). Alchemy has to traverse
> libcopperplate before actual syscalls may be issued by libcobalt it is
> depending on, because libalchemy needs the copperplate interface layer
> for shielding itself from Cobalt/Mercury differences.
>

Actually, from previous experience with a simple thread application,
rt_timer_read() with the native skin gave better latency than the posix
skin with the clock API.

> --
> Philippe.



* Re: [Xenomai] Simple application for invoking rtdm driver
  2018-03-20 11:31         ` Pintu Kumar
@ 2018-03-20 11:37           ` Philippe Gerum
  2018-03-20 11:45           ` Philippe Gerum
  1 sibling, 0 replies; 18+ messages in thread
From: Philippe Gerum @ 2018-03-20 11:37 UTC (permalink / raw)
  To: Pintu Kumar; +Cc: Xenomai@xenomai.org

On 03/20/2018 12:31 PM, Pintu Kumar wrote:
> On Tue, Mar 20, 2018 at 3:02 PM, Philippe Gerum <rpm@xenomai.org> wrote:
>> On 03/20/2018 08:26 AM, Pintu Kumar wrote:
>>> On Tue, Mar 20, 2018 at 10:57 AM, Pintu Kumar <pintu.ping@gmail.com> wrote:
>>>> On Tue, Mar 20, 2018 at 9:03 AM, Greg Gallagher <greg@embeddedgreg.com> wrote:
>>>>> If you want to use open, read, write you need to specify in the
>>>>> makefile to use the posix skin.  You need something like these in your
>>>>> Makefile:
>>>>>
>>>>> XENO_CONFIG := /usr/xenomai/bin/xeno-config
>>>>> CFLAGS := $(shell $(XENO_CONFIG) --posix --cflags)
>>>>> LDFLAGS := $(shell  $(XENO_CONFIG) --posix --ldflags)
>>>>>
>>>>
>>>> Oh yes I forgot to mention with posix skin it is working.
>>>>
>>>> But I wanted to use native API only, so I removed posix skin from Makefile.
>>>>
>>>> For, native API, I am using: rt_dev_{open, read, write}. Is this the
>>>> valid API for Xenomai 3.0 ?
>>>> Or there is something else?
>>>> Is there any reference ?
>>>>
>>>
>>> Dear Greg,
>>>
>>> In my sample, I am just copying some string from user <--> kernel and
>>> printing them.
>>> For normal driver, I get read/write latency like this:
>>> write latency: 2.247 us
>>> read latency: 2.202 us
>>>
>>> For Xenomai 3.0 rtdm driver, using : rt_dev_{open, read, write}
>>> I get the latency like this:
>>> write latency: 7.668 us
>>> read latency: 5.558 us
>>>
>>> My concern is, why the latency is higher in case of RTDM ?
>>> This is on x86-64 machine.
>>>
>>
>> Did you stress your machine while your test was running? If not, you
>> were not measuring worst-case latency, you were measuring execution time
>> in this case, which is different. If you want to actually measure
>> latency for real-time usage, you need to run your tests under
>> significant stress load. Under such load, the RTDM version should
>> perform reliably below a reasonable latency limit, the "normal" version
>> will experience jittery above that limit.
>>
>> A trivial stress load may be as simple as running a dd loop copying
>> 128Mb blocks from /dev/zero to /dev/null in the background, you may also
>> add a kernel compilation keeping all CPUs busy.
>>
> 
> OK, I tried both the option. But still normal driver latency is much lower.
> In fact, with kernel build in another terminal, rtdm latency shoots much higher.
> Normal Kernel
> --------------------
> write latency: 3.084 us
> read latency: 3.186 us
> 
> RTDM Kernel (native)
> ---------------------------------
> write latency: 12.676 us
> read latency: 9.858 us
> 
> RTDM Kernel (posix)
> ---------------------------------
> write latency: 12.907 us
> read latency: 8.699 us
> 
> During the beginning of kernel build I even observed, RTDM (native)
> goes as high as:
> write latency: 4061.266 us
> read latency: 3947.836 us

Just to make sure, we are actually discussing a test case on real
hardware, not VM stuff, right?

-- 
Philippe.



* Re: [Xenomai] Simple application for invoking rtdm driver
  2018-03-20 11:31         ` Pintu Kumar
  2018-03-20 11:37           ` Philippe Gerum
@ 2018-03-20 11:45           ` Philippe Gerum
  2018-03-20 12:00             ` Pintu Kumar
  1 sibling, 1 reply; 18+ messages in thread
From: Philippe Gerum @ 2018-03-20 11:45 UTC (permalink / raw)
  To: Pintu Kumar; +Cc: Xenomai@xenomai.org

On 03/20/2018 12:31 PM, Pintu Kumar wrote:
> On Tue, Mar 20, 2018 at 3:02 PM, Philippe Gerum <rpm@xenomai.org> wrote:
>> On 03/20/2018 08:26 AM, Pintu Kumar wrote:
>>> On Tue, Mar 20, 2018 at 10:57 AM, Pintu Kumar <pintu.ping@gmail.com> wrote:
>>>> On Tue, Mar 20, 2018 at 9:03 AM, Greg Gallagher <greg@embeddedgreg.com> wrote:
>>>>> If you want to use open, read, write you need to specify in the
>>>>> makefile to use the posix skin.  You need something like these in your
>>>>> Makefile:
>>>>>
>>>>> XENO_CONFIG := /usr/xenomai/bin/xeno-config
>>>>> CFLAGS := $(shell $(XENO_CONFIG) --posix --cflags)
>>>>> LDFLAGS := $(shell  $(XENO_CONFIG) --posix --ldflags)
>>>>>
>>>>
>>>> Oh yes I forgot to mention with posix skin it is working.
>>>>
>>>> But I wanted to use native API only, so I removed posix skin from Makefile.
>>>>
>>>> For, native API, I am using: rt_dev_{open, read, write}. Is this the
>>>> valid API for Xenomai 3.0 ?
>>>> Or there is something else?
>>>> Is there any reference ?
>>>>
>>>
>>> Dear Greg,
>>>
>>> In my sample, I am just copying some string from user <--> kernel and
>>> printing them.
>>> For normal driver, I get read/write latency like this:
>>> write latency: 2.247 us
>>> read latency: 2.202 us
>>>
>>> For Xenomai 3.0 rtdm driver, using : rt_dev_{open, read, write}
>>> I get the latency like this:
>>> write latency: 7.668 us
>>> read latency: 5.558 us
>>>
>>> My concern is, why the latency is higher in case of RTDM ?
>>> This is on x86-64 machine.
>>>
>>
>> Did you stress your machine while your test was running? If not, you
>> were not measuring worst-case latency, you were measuring execution time
>> in this case, which is different. If you want to actually measure
>> latency for real-time usage, you need to run your tests under
>> significant stress load. Under such load, the RTDM version should
>> perform reliably below a reasonable latency limit, the "normal" version
>> will experience jittery above that limit.
>>
>> A trivial stress load may be as simple as running a dd loop copying
>> 128Mb blocks from /dev/zero to /dev/null in the background, you may also
>> add a kernel compilation keeping all CPUs busy.
>>
> 
> OK, I tried both the option. But still normal driver latency is much lower.
> In fact, with kernel build in another terminal, rtdm latency shoots much higher.
> Normal Kernel
> --------------------
> write latency: 3.084 us
> read latency: 3.186 us
> 
> RTDM Kernel (native)
> ---------------------------------
> write latency: 12.676 us
> read latency: 9.858 us
> 
> RTDM Kernel (posix)
> ---------------------------------
> write latency: 12.907 us
> read latency: 8.699 us
> 
> During the beginning of kernel build I even observed, RTDM (native)
> goes as high as:
> write latency: 4061.266 us
> read latency: 3947.836 us
> 
> ---------------------------------
> As a quick reference, this is the snippet for the rtdm write method.
> 
> --------------------------------
> static ssize_t rtdm_write(..)
> {
>         struct dummy_context *context;
> 
>         context = rtdm_fd_to_private(fd);
> 
>         memset(context->buffer, 0, 4096);
>         rtdm_safe_copy_from_user(fd, context->buffer, buff, len);
>         rtdm_printk("write done\n");
> 
>         return len;
> }
> 
> The normal driver write is also almost same.
> 
> In the application side, I just invoke using:
>         t1 = rt_timer_read();
>         ret = rt_dev_write(fd, msg, len);
>         t2 = rt_timer_read();
> 
> Is there any thing wrong on the rtdm side ?
> --------------------------------
> 
>> Besides, you need to make sure to disable I-pipe and Cobalt debug
>> options, particularly CONFIG_IPIPE_TRACE and
>> CONFIG_XENO_OPT_DEBUG_LOCKING when running the RTDM case.
>>
> 
> Yes this debug options are already disabled.
> 
>>>
>>> Latency is little better, when using only posix skin:
>>> write latency: 3.587 us
>>> read latency: 3.392 us
>>>
>>
>> This does not make much sense, see the excerpt from
>> include/trank/rtdm/rtdm.h, which simply wraps the inline rt_dev_write()
>> call to Cobalt's POSIX call [__wrap_]write() from lib/cobalt/rtdm.c:
>>
> 
> OK sorry, there was a mistake in posix latency value.
> I forgot to switch to rtdm driver, instead of normal driver.
> With posix skin and using the exactly same as normal driver application.
> The latency figure was almost same as native skin.
> write latency: 7.044 us
> read latency: 6.786 us
> 
> 
>> #define rt_dev_call(__call, __args...)  \
>> ({                                      \
>>         int __ret;                      \
>>         __ret = __RT(__call(__args));   \
>>         __ret < 0 ? -errno : __ret;     \
>> })
>>
>> static inline ssize_t rt_dev_write(int fd, const void *buf, size_t len)
>> {
>>         return rt_dev_call(write, fd, buf, len);
>> }
>>
>> The way you measure the elapsed time may affect the measurement:
>> libalchemy's rt_timer_read() is definitely slower than libcobalt's
>> clock_gettime().
> 
> For normal kernel driver (and rtdm with posix skin) application, I am
> using clock_gettime().
> For Xenomai rtdm driver with native skin application, I am using rt_timer_read()
> 
> 
>>
>> The POSIX skin is generally faster than the alchemy API, because it
>> implements wrappers to the corresponding Cobalt system calls (i.e.
>> libcobalt is Xenomai's libc equivalent). Alchemy has to traverse
>> libcopperplate before actual syscalls may be issued by libcobalt it is
>> depending on, because libalchemy needs the copperplate interface layer
>> for shielding itself from Cobalt/Mercury differences.
>>
> 
> Actually, as per the previous experience for simple thread
> application, rt_timer_read() with native
> skin gave better latency, when compared to using posix skin with clock API.
> 

This behavior does not make much sense, simply looking at the library
code: rt_timer_read() may be considered a superset of libcobalt's
clock_gettime().

This could be a hint that you might not be linking against Cobalt's
POSIX API. You may want to check by running "nm" on your executable,
verifying that __wrap_* calls are listed (e.g. __wrap_clock_gettime
instead of clock_gettime).
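That check can be sketched as a tiny shell helper (the function name is invented for illustration; on a correctly wrapped Cobalt binary the grep would match):

```shell
#!/bin/sh
# Report whether an executable appears to be linked against Xenomai's
# libcobalt wrappers (symbols prefixed __wrap_) or plain libc.
check_wrapped() {
    if nm -D "$1" 2>/dev/null | grep -q '__wrap_clock_gettime'; then
        echo wrapped
    else
        echo plain
    fi
}

check_wrapped /bin/true   # an ordinary binary should print "plain"
```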

-- 
Philippe.



* Re: [Xenomai] Simple application for invoking rtdm driver
  2018-03-20 11:45           ` Philippe Gerum
@ 2018-03-20 12:00             ` Pintu Kumar
  2018-03-20 13:09               ` Philippe Gerum
  0 siblings, 1 reply; 18+ messages in thread
From: Pintu Kumar @ 2018-03-20 12:00 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Xenomai@xenomai.org

On Tue, Mar 20, 2018 at 5:15 PM, Philippe Gerum <rpm@xenomai.org> wrote:
> On 03/20/2018 12:31 PM, Pintu Kumar wrote:
>> On Tue, Mar 20, 2018 at 3:02 PM, Philippe Gerum <rpm@xenomai.org> wrote:
>>> On 03/20/2018 08:26 AM, Pintu Kumar wrote:
>>>> On Tue, Mar 20, 2018 at 10:57 AM, Pintu Kumar <pintu.ping@gmail.com> wrote:
>>>>> On Tue, Mar 20, 2018 at 9:03 AM, Greg Gallagher <greg@embeddedgreg.com> wrote:
>>>>>> If you want to use open, read, write you need to specify in the
>>>>>> makefile to use the posix skin.  You need something like these in your
>>>>>> Makefile:
>>>>>>
>>>>>> XENO_CONFIG := /usr/xenomai/bin/xeno-config
>>>>>> CFLAGS := $(shell $(XENO_CONFIG) --posix --cflags)
>>>>>> LDFLAGS := $(shell  $(XENO_CONFIG) --posix --ldflags)
>>>>>>
>>>>>
>>>>> Oh yes I forgot to mention with posix skin it is working.
>>>>>
>>>>> But I wanted to use native API only, so I removed posix skin from Makefile.
>>>>>
>>>>> For, native API, I am using: rt_dev_{open, read, write}. Is this the
>>>>> valid API for Xenomai 3.0 ?
>>>>> Or there is something else?
>>>>> Is there any reference ?
>>>>>
>>>>
>>>> Dear Greg,
>>>>
>>>> In my sample, I am just copying some string from user <--> kernel and
>>>> printing them.
>>>> For normal driver, I get read/write latency like this:
>>>> write latency: 2.247 us
>>>> read latency: 2.202 us
>>>>
>>>> For Xenomai 3.0 rtdm driver, using : rt_dev_{open, read, write}
>>>> I get the latency like this:
>>>> write latency: 7.668 us
>>>> read latency: 5.558 us
>>>>
>>>> My concern is, why the latency is higher in case of RTDM ?
>>>> This is on x86-64 machine.
>>>>
>>>
>>> Did you stress your machine while your test was running? If not, you
>>> were not measuring worst-case latency, you were measuring execution time
>>> in this case, which is different. If you want to actually measure
>>> latency for real-time usage, you need to run your tests under
>>> significant stress load. Under such load, the RTDM version should
>>> perform reliably below a reasonable latency limit, the "normal" version
>>> will experience jittery above that limit.
>>>
>>> A trivial stress load may be as simple as running a dd loop copying
>>> 128Mb blocks from /dev/zero to /dev/null in the background, you may also
>>> add a kernel compilation keeping all CPUs busy.
>>>
>>
>> OK, I tried both the option. But still normal driver latency is much lower.
>> In fact, with kernel build in another terminal, rtdm latency shoots much higher.
>> Normal Kernel
>> --------------------
>> write latency: 3.084 us
>> read latency: 3.186 us
>>
>> RTDM Kernel (native)
>> ---------------------------------
>> write latency: 12.676 us
>> read latency: 9.858 us
>>
>> RTDM Kernel (posix)
>> ---------------------------------
>> write latency: 12.907 us
>> read latency: 8.699 us
>>
>> During the beginning of kernel build I even observed, RTDM (native)
>> goes as high as:
>> write latency: 4061.266 us
>> read latency: 3947.836 us
>>
>> ---------------------------------
>> As a quick reference, this is the snippet for the rtdm write method.
>>
>> --------------------------------
>> static ssize_t rtdm_write(..)
>> {
>>         struct dummy_context *context;
>>
>>         context = rtdm_fd_to_private(fd);
>>
>>         memset(context->buffer, 0, 4096);
>>         rtdm_safe_copy_from_user(fd, context->buffer, buff, len);
>>         rtdm_printk("write done\n");
>>
>>         return len;
>> }
>>
>> The normal driver write is also almost same.
>>
>> In the application side, I just invoke using:
>>         t1 = rt_timer_read();
>>         ret = rt_dev_write(fd, msg, len);
>>         t2 = rt_timer_read();
>>
>> Is there any thing wrong on the rtdm side ?
>> --------------------------------
>>
>>> Besides, you need to make sure to disable I-pipe and Cobalt debug
>>> options, particularly CONFIG_IPIPE_TRACE and
>>> CONFIG_XENO_OPT_DEBUG_LOCKING when running the RTDM case.
>>>
>>
>> Yes, these debug options are already disabled.
>>
>>>>
>>>> Latency is a little better when using only the posix skin:
>>>> write latency: 3.587 us
>>>> read latency: 3.392 us
>>>>
>>>
>>> This does not make much sense, see the excerpt from
>>> include/trank/rtdm/rtdm.h, which simply wraps the inline rt_dev_write()
>>> call to Cobalt's POSIX call [__wrap_]write() from lib/cobalt/rtdm.c:
>>>
>>
>> OK, sorry, there was a mistake in the posix latency value:
>> I forgot to switch to the rtdm driver instead of the normal driver.
>> With the posix skin, using exactly the same application as for the
>> normal driver, the latency figures were almost the same as with the native skin.
>> write latency: 7.044 us
>> read latency: 6.786 us
>>
>>
>>> #define rt_dev_call(__call, __args...)  \
>>> ({                                      \
>>>         int __ret;                      \
>>>         __ret = __RT(__call(__args));   \
>>>         __ret < 0 ? -errno : __ret;     \
>>> })
>>>
>>> static inline ssize_t rt_dev_write(int fd, const void *buf, size_t len)
>>> {
>>>         return rt_dev_call(write, fd, buf, len);
>>> }
>>>
>>> The way you measure the elapsed time may affect the measurement:
>>> libalchemy's rt_timer_read() is definitely slower than libcobalt's
>>> clock_gettime().
>>
>> For the normal kernel driver (and rtdm with posix skin) application, I am
>> using clock_gettime().
>> For the Xenomai rtdm driver application with the native skin, I am using rt_timer_read()
>>
>>
>>>
>>> The POSIX skin is generally faster than the alchemy API, because it
>>> implements wrappers to the corresponding Cobalt system calls (i.e.
>>> libcobalt is Xenomai's libc equivalent). Alchemy has to traverse
>>> libcopperplate before actual syscalls may be issued by libcobalt, which it is
>>> depending on, because libalchemy needs the copperplate interface layer
>>> for shielding itself from Cobalt/Mercury differences.
>>>
>>
>> Actually, based on previous experience with a simple thread
>> application, rt_timer_read() with the native
>> skin gave better latency compared to using the posix skin with the clock API.
>>
>
> This behavior does not make much sense, simply looking at the library code:
> rt_timer_read() may be considered as a superset of libcobalt's
> clock_gettime.
>
> This could be a hint that you might not be testing with Cobalt's POSIX
> API. You may want to check by running "nm" on your executable, verifying
> that __wrap_* calls are listed (e.g. __wrap_clock_gettime instead of
> clock_gettime).
>

Yes, the wrap calls are listed in the symbol table.

posix# nm -a my_app | grep clock
                 U __wrap_clock_gettime


> --
> Philippe.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xenomai] Simple application for invoking rtdm driver
  2018-03-20 12:00             ` Pintu Kumar
@ 2018-03-20 13:09               ` Philippe Gerum
  2018-03-23 12:40                 ` Pintu Kumar
  0 siblings, 1 reply; 18+ messages in thread
From: Philippe Gerum @ 2018-03-20 13:09 UTC (permalink / raw)
  To: Pintu Kumar; +Cc: Xenomai@xenomai.org

On 03/20/2018 01:00 PM, Pintu Kumar wrote:
> On Tue, Mar 20, 2018 at 5:15 PM, Philippe Gerum <rpm@xenomai.org> wrote:
>> On 03/20/2018 12:31 PM, Pintu Kumar wrote:
>>> On Tue, Mar 20, 2018 at 3:02 PM, Philippe Gerum <rpm@xenomai.org> wrote:
>>>> On 03/20/2018 08:26 AM, Pintu Kumar wrote:
>>>>> On Tue, Mar 20, 2018 at 10:57 AM, Pintu Kumar <pintu.ping@gmail.com> wrote:
>>>>>> On Tue, Mar 20, 2018 at 9:03 AM, Greg Gallagher <greg@embeddedgreg.com> wrote:
>>>>>>> If you want to use open, read, write you need to specify in the
>>>>>>> makefile to use the posix skin.  You need something like these in your
>>>>>>> Makefile:
>>>>>>>
>>>>>>> XENO_CONFIG := /usr/xenomai/bin/xeno-config
>>>>>>> CFLAGS := $(shell $(XENO_CONFIG) --posix --cflags)
>>>>>>> LDFLAGS := $(shell  $(XENO_CONFIG) --posix --ldflags)
>>>>>>>
>>>>>>
>>>>>> Oh yes, I forgot to mention that it works with the posix skin.
>>>>>>
>>>>>> But I wanted to use the native API only, so I removed the posix skin from the Makefile.
>>>>>>
>>>>>> For the native API, I am using rt_dev_{open, read, write}. Is this the
>>>>>> valid API for Xenomai 3.0?
>>>>>> Or there is something else?
>>>>>> Is there any reference ?
>>>>>>
>>>>>
>>>>> Dear Greg,
>>>>>
>>>>> In my sample, I am just copying a string between user <--> kernel and
>>>>> printing it.
>>>>> For normal driver, I get read/write latency like this:
>>>>> write latency: 2.247 us
>>>>> read latency: 2.202 us
>>>>>
>>>>> For Xenomai 3.0 rtdm driver, using : rt_dev_{open, read, write}
>>>>> I get the latency like this:
>>>>> write latency: 7.668 us
>>>>> read latency: 5.558 us
>>>>>
>>>>> My concern is: why is the latency higher in the RTDM case?
>>>>> This is on x86-64 machine.
>>>>>
>>>>
>>>> Did you stress your machine while your test was running? If not, you
>>>> were not measuring worst-case latency, you were measuring execution time
>>>> in this case, which is different. If you want to actually measure
>>>> latency for real-time usage, you need to run your tests under
>>>> significant stress load. Under such load, the RTDM version should
>>>> perform reliably below a reasonable latency limit, the "normal" version
>>>> will experience jitter above that limit.
>>>>
>>>> A trivial stress load may be as simple as running a dd loop copying
>>>> 128MB blocks from /dev/zero to /dev/null in the background; you may also
>>>> add a kernel compilation keeping all CPUs busy.
>>>>
>>>
>>> OK, I tried both options. But the normal driver latency is still much lower.
>>> In fact, with a kernel build in another terminal, the rtdm latency shoots much higher.
>>> Normal Kernel
>>> --------------------
>>> write latency: 3.084 us
>>> read latency: 3.186 us
>>>
>>> RTDM Kernel (native)
>>> ---------------------------------
>>> write latency: 12.676 us
>>> read latency: 9.858 us
>>>
>>> RTDM Kernel (posix)
>>> ---------------------------------
>>> write latency: 12.907 us
>>> read latency: 8.699 us
>>>
>>> During the beginning of the kernel build, I even observed RTDM (native)
>>> going as high as:
>>> write latency: 4061.266 us
>>> read latency: 3947.836 us
>>>
>>> ---------------------------------
>>> As a quick reference, this is the snippet for the rtdm write method.
>>>
>>> --------------------------------
>>> static ssize_t rtdm_write(..)
>>> {
>>>         struct dummy_context *context;
>>>
>>>         context = rtdm_fd_to_private(fd);
>>>
>>>         memset(context->buffer, 0, 4096);
>>>         rtdm_safe_copy_from_user(fd, context->buffer, buff, len);
>>>         rtdm_printk("write done\n");
>>>
>>>         return len;
>>> }
>>>
>>> The normal driver's write is almost the same.
>>>
>>> On the application side, I just invoke:
>>>         t1 = rt_timer_read();
>>>         ret = rt_dev_write(fd, msg, len);
>>>         t2 = rt_timer_read();
>>>
>>> Is there anything wrong on the rtdm side?
>>> --------------------------------
>>>
>>>> Besides, you need to make sure to disable I-pipe and Cobalt debug
>>>> options, particularly CONFIG_IPIPE_TRACE and
>>>> CONFIG_XENO_OPT_DEBUG_LOCKING when running the RTDM case.
>>>>
>>>
>>> Yes, these debug options are already disabled.
>>>
>>>>>
>>>>> Latency is a little better when using only the posix skin:
>>>>> write latency: 3.587 us
>>>>> read latency: 3.392 us
>>>>>
>>>>
>>>> This does not make much sense, see the excerpt from
>>>> include/trank/rtdm/rtdm.h, which simply wraps the inline rt_dev_write()
>>>> call to Cobalt's POSIX call [__wrap_]write() from lib/cobalt/rtdm.c:
>>>>
>>>
>>> OK, sorry, there was a mistake in the posix latency value:
>>> I forgot to switch to the rtdm driver instead of the normal driver.
>>> With the posix skin, using exactly the same application as for the
>>> normal driver, the latency figures were almost the same as with the native skin.
>>> write latency: 7.044 us
>>> read latency: 6.786 us
>>>
>>>
>>>> #define rt_dev_call(__call, __args...)  \
>>>> ({                                      \
>>>>         int __ret;                      \
>>>>         __ret = __RT(__call(__args));   \
>>>>         __ret < 0 ? -errno : __ret;     \
>>>> })
>>>>
>>>> static inline ssize_t rt_dev_write(int fd, const void *buf, size_t len)
>>>> {
>>>>         return rt_dev_call(write, fd, buf, len);
>>>> }
>>>>
>>>> The way you measure the elapsed time may affect the measurement:
>>>> libalchemy's rt_timer_read() is definitely slower than libcobalt's
>>>> clock_gettime().
>>>
>>> For the normal kernel driver (and rtdm with posix skin) application, I am
>>> using clock_gettime().
>>> For the Xenomai rtdm driver application with the native skin, I am using rt_timer_read()
>>>
>>>
>>>>
>>>> The POSIX skin is generally faster than the alchemy API, because it
>>>> implements wrappers to the corresponding Cobalt system calls (i.e.
>>>> libcobalt is Xenomai's libc equivalent). Alchemy has to traverse
>>>> libcopperplate before actual syscalls may be issued by libcobalt, which it is
>>>> depending on, because libalchemy needs the copperplate interface layer
>>>> for shielding itself from Cobalt/Mercury differences.
>>>>
>>>
>>> Actually, based on previous experience with a simple thread
>>> application, rt_timer_read() with the native
>>> skin gave better latency compared to using the posix skin with the clock API.
>>>
>>
>> This behavior does not make much sense, simply looking at the library code:
>> rt_timer_read() may be considered as a superset of libcobalt's
>> clock_gettime.
>>
>> This could be a hint that you might not be testing with Cobalt's POSIX
>> API. You may want to check by running "nm" on your executable, verifying
>> that __wrap_* calls are listed (e.g. __wrap_clock_gettime instead of
>> clock_gettime).
>>
> 
> Yes, the wrap calls are listed in the symbol table.
> 
> posix# nm -a my_app | grep clock
>                  U __wrap_clock_gettime
> 
> 

Then you may suspect an SMI problem, or less likely a hyperthreading issue
given the magnitude of the latency figures. You may want to have a look
at this thread [1].

If this is a SMI issue, you should be able to see it on a regular kernel
as well, provided the protocol for testing is right and actually
exercises the same thing for an equivalent period of time on both ends
(Xenomai / native).

We generally don't know which configuration is under test on your end:
which I-pipe patch you have been using, which kernel release, which
Xenomai release (assuming you are not using the obsolete 3.0.0 original
release). Actually, we don't know much about your hw either (the Intel
micro-architecture is not defining when it comes to latency issues, the
SoC may be, the BIOS usually is).

AFAIU, the symptoms you are referring to are either SMI-related, or
might be related to some GPU driver issuing costly instructions
badly affecting the latency (e.g. wbinvd, although a 4 ms latency spike
is completely insane and does not fit the typical footprint of those
insns), or might denote a massive I-pipe/Xenomai bug.

The latter is always possible, but not the most likely at the moment, as
this kind of bug tends to be noticed by many people, and nobody did so
far with the stable Xenomai release and an official I-pipe patch for
x86, or any other architecture.

First I would recommend reading [2], providing the missing information
afterwards.

Then, I would make 100% sure that your SoC is SMI-free, and HT is
disabled in the BIOS, running a vanilla kernel: no I-pipe, no Xenomai,
just plain regular kernel.

[1] http://xenomai.org/pipermail/xenomai/2018-February/038393.html
[2] http://xenomai.org/asking-for-help/

-- 
Philippe.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xenomai] Simple application for invoking rtdm driver
  2018-03-20 13:09               ` Philippe Gerum
@ 2018-03-23 12:40                 ` Pintu Kumar
  2018-03-25 12:09                   ` Philippe Gerum
  0 siblings, 1 reply; 18+ messages in thread
From: Pintu Kumar @ 2018-03-23 12:40 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Xenomai@xenomai.org

Dear Philippe,

Thank you so much for your detailed explanation.

First to cross-check, I also tried on ARM BeagleBone (White) with
256MB RAM, Single core
These are the values I got.
===========================
NORMAL KERNEL Driver Build (with xenomai present)
---------------------------------------------------------------------------
write latency: 8235.083 us
read latency: 13636.875 us
-------------------------------------
write latency: 8192.542 us
read latency: 12859.833 us
--------------------------------------
write latency: 8182.833 us
read latency: 11003.333 us

===========================
XENOMAI RTDM Driver (native skin only)
--------------------------------------------------------
write latency: 8118.208 us
read latency: 12464.459 us
------------------------------------
write latency: 8162.083 us
read latency: 12885.000 us
------------------------------------
write latency: 8353.792 us
read latency: 11065.666 us

===========================
XENOMAI RTDM Driver (using posix skin only)
write latency: 8459.958 us
read latency: 12597.875 us
-------------------------------------
write latency: 8386.042 us
read latency: 11579.958 us
-------------------------------------
write latency: 8283.958 us
read latency: 13078.167 us
==============================================

So, it looks like random behavior.
Sometimes the normal driver is better, sometimes RTDM-native is better,
sometimes RTDM-posix is better.
I even tried firing dd commands in the background. In this case also
the normal kernel driver is better.



About previous SkyLake machine, please find the information below:
# Architecture:          x86_64
# CPU op-mode(s):        32-bit, 64-bit
# CPU(s):                8
# Core:                    i7-6700K CPU @ 4.00GHz
# Virtualization:        VT-x
# Linux Kernel:         4.9.51
# cat /proc/cmdline
initrd=0:\initrd.img-4.9.51-amd-x86-64-rtdm
root=/dev/disk/by-partlabel/system ro ip=off processor.max_cstate=1
idle=poll i915.enable_rc6=0 i915.enable_dc=0 i915.powersave=0 nosmap
maxcpus=8 xenomai.smi=disabled
# /usr/xenomai/sbin/version
Xenomai/cobalt v3.0.6 -- #5956064 (2018-03-20 12:13:33 +0100)
# cat /proc/ipipe/version
4
# cat /proc/xenomai/latency
896


This is the latest latency value on SkyLake machine
==================================================
NORMAL KERNEL DRIVER:
write latency: 2.053 us
read latency: 2.285 us
-------------------------
XENOMAI RTDM DRIVER (native skin)
write latency: 6.248 us
read latency: 5.024 us
-------------------------
XENOMAI RTDM DRIVER (posix skin)
write latency: 6.320 us
read latency: 5.524 us
-------------------------

==================================================
This is the code snippet for RTDM write function:
(Note: error checking is removed for brevity.)
--------------------------------------------------------------------
#include <linux/module.h>
#include <linux/errno.h>
#include <linux/kernel.h>
#include <linux/string.h>
#include <rtdm/driver.h>

static ssize_t my_write(struct rtdm_fd *fd, const void __user *buff,
                                size_t len)
{
        struct my_context *context;

        context = rtdm_fd_to_private(fd);

        memset(context->buffer, 0, PAGE_SIZE);
        rtdm_safe_copy_from_user(fd, context->buffer, buff, len);
        rtdm_printk(": %s: done\n", __func__);

        return len;
}

static struct rtdm_driver my_driver = {
        .profile_info           =       RTDM_PROFILE_INFO(name,
                                                          RTDM_CLASS_MISC,
                                                          0,
                                                          0),
        .device_flags           =       RTDM_NAMED_DEVICE|RTDM_EXCLUSIVE,
        .device_count           =       1,
        .context_size           =       sizeof(struct my_context),
        .ops = {
                .open           =       my_open,
                .read_rt        =       my_read,
                .write_rt       =       my_write,
                .close          =       my_close,
        },
};

Makefile:
Same as a normal Linux module Makefile, without any XENO_FLAGS.
-------------------------------------------------------------------
Application:
char msg[] = "something";
        len = strlen(msg);
        prev = rt_timer_read();
        ret = rt_dev_write(fd, msg, len);
        now = rt_timer_read();
        diff = (now - prev) / 1000.0;
        rt_printf("write latency: %5.3f us\n", diff);
--------------------------------------------------------------------
Hope this implementation is fine. If there is any mistake here, please
let me know.
==================================================

On the RTDM driver side, I even tried removing the memset and printk and
keeping just the copy_from_user, but that only reduces the latency by about 1 microsecond.
I also tried replacing rtdm_safe_copy_from_user with plain
rtdm_copy_from_user; nothing much changed.
So, two things seem likely to me:
- rtdm_copy_from_user takes more time compared to the normal kernel's copy_from_user
- rt_dev_write takes more time compared to a normal write call on a normal kernel.

Or, are there too many primary <--> secondary mode switches happening in
my RTDM driver?

Is there any other way to check this issue and improve the latency with
the rtdm driver?

If you have any other pointers/suggestions, please let me know.


Thanks,
Pintu

On Tue, Mar 20, 2018 at 6:39 PM, Philippe Gerum <rpm@xenomai.org> wrote:
> On 03/20/2018 01:00 PM, Pintu Kumar wrote:
>> On Tue, Mar 20, 2018 at 5:15 PM, Philippe Gerum <rpm@xenomai.org> wrote:
>>> On 03/20/2018 12:31 PM, Pintu Kumar wrote:
>>>> On Tue, Mar 20, 2018 at 3:02 PM, Philippe Gerum <rpm@xenomai.org> wrote:
>>>>> On 03/20/2018 08:26 AM, Pintu Kumar wrote:
>>>>>> On Tue, Mar 20, 2018 at 10:57 AM, Pintu Kumar <pintu.ping@gmail.com> wrote:
>>>>>>> On Tue, Mar 20, 2018 at 9:03 AM, Greg Gallagher <greg@embeddedgreg.com> wrote:
>>>>>>>> If you want to use open, read, write you need to specify in the
>>>>>>>> makefile to use the posix skin.  You need something like these in your
>>>>>>>> Makefile:
>>>>>>>>
>>>>>>>> XENO_CONFIG := /usr/xenomai/bin/xeno-config
>>>>>>>> CFLAGS := $(shell $(XENO_CONFIG) --posix --cflags)
>>>>>>>> LDFLAGS := $(shell  $(XENO_CONFIG) --posix --ldflags)
>>>>>>>>
>>>>>>>
>>>>>>> Oh yes, I forgot to mention that it works with the posix skin.
>>>>>>>
>>>>>>> But I wanted to use the native API only, so I removed the posix skin from the Makefile.
>>>>>>>
>>>>>>> For the native API, I am using rt_dev_{open, read, write}. Is this the
>>>>>>> valid API for Xenomai 3.0?
>>>>>>> Or there is something else?
>>>>>>> Is there any reference ?
>>>>>>>
>>>>>>
>>>>>> Dear Greg,
>>>>>>
>>>>>> In my sample, I am just copying a string between user <--> kernel and
>>>>>> printing it.
>>>>>> For normal driver, I get read/write latency like this:
>>>>>> write latency: 2.247 us
>>>>>> read latency: 2.202 us
>>>>>>
>>>>>> For Xenomai 3.0 rtdm driver, using : rt_dev_{open, read, write}
>>>>>> I get the latency like this:
>>>>>> write latency: 7.668 us
>>>>>> read latency: 5.558 us
>>>>>>
>>>>>> My concern is: why is the latency higher in the RTDM case?
>>>>>> This is on x86-64 machine.
>>>>>>
>>>>>
>>>>> Did you stress your machine while your test was running? If not, you
>>>>> were not measuring worst-case latency, you were measuring execution time
>>>>> in this case, which is different. If you want to actually measure
>>>>> latency for real-time usage, you need to run your tests under
>>>>> significant stress load. Under such load, the RTDM version should
>>>>> perform reliably below a reasonable latency limit, the "normal" version
>>>>> will experience jitter above that limit.
>>>>>
>>>>> A trivial stress load may be as simple as running a dd loop copying
>>>>> 128MB blocks from /dev/zero to /dev/null in the background; you may also
>>>>> add a kernel compilation keeping all CPUs busy.
>>>>>
>>>>
>>>> OK, I tried both options. But the normal driver latency is still much lower.
>>>> In fact, with a kernel build in another terminal, the rtdm latency shoots much higher.
>>>> Normal Kernel
>>>> --------------------
>>>> write latency: 3.084 us
>>>> read latency: 3.186 us
>>>>
>>>> RTDM Kernel (native)
>>>> ---------------------------------
>>>> write latency: 12.676 us
>>>> read latency: 9.858 us
>>>>
>>>> RTDM Kernel (posix)
>>>> ---------------------------------
>>>> write latency: 12.907 us
>>>> read latency: 8.699 us
>>>>
>>>> During the beginning of the kernel build, I even observed RTDM (native)
>>>> going as high as:
>>>> write latency: 4061.266 us
>>>> read latency: 3947.836 us
>>>>
>>>> ---------------------------------
>>>> As a quick reference, this is the snippet for the rtdm write method.
>>>>
>>>> --------------------------------
>>>> static ssize_t rtdm_write(..)
>>>> {
>>>>         struct dummy_context *context;
>>>>
>>>>         context = rtdm_fd_to_private(fd);
>>>>
>>>>         memset(context->buffer, 0, 4096);
>>>>         rtdm_safe_copy_from_user(fd, context->buffer, buff, len);
>>>>         rtdm_printk("write done\n");
>>>>
>>>>         return len;
>>>> }
>>>>
>>>> The normal driver's write is almost the same.
>>>>
>>>> On the application side, I just invoke:
>>>>         t1 = rt_timer_read();
>>>>         ret = rt_dev_write(fd, msg, len);
>>>>         t2 = rt_timer_read();
>>>>
>>>> Is there anything wrong on the rtdm side?
>>>> --------------------------------
>>>>
>>>>> Besides, you need to make sure to disable I-pipe and Cobalt debug
>>>>> options, particularly CONFIG_IPIPE_TRACE and
>>>>> CONFIG_XENO_OPT_DEBUG_LOCKING when running the RTDM case.
>>>>>
>>>>
>>>> Yes, these debug options are already disabled.
>>>>
>>>>>>
>>>>>> Latency is a little better when using only the posix skin:
>>>>>> write latency: 3.587 us
>>>>>> read latency: 3.392 us
>>>>>>
>>>>>
>>>>> This does not make much sense, see the excerpt from
>>>>> include/trank/rtdm/rtdm.h, which simply wraps the inline rt_dev_write()
>>>>> call to Cobalt's POSIX call [__wrap_]write() from lib/cobalt/rtdm.c:
>>>>>
>>>>
>>>> OK, sorry, there was a mistake in the posix latency value:
>>>> I forgot to switch to the rtdm driver instead of the normal driver.
>>>> With the posix skin, using exactly the same application as for the
>>>> normal driver, the latency figures were almost the same as with the native skin.
>>>> write latency: 7.044 us
>>>> read latency: 6.786 us
>>>>
>>>>
>>>>> #define rt_dev_call(__call, __args...)  \
>>>>> ({                                      \
>>>>>         int __ret;                      \
>>>>>         __ret = __RT(__call(__args));   \
>>>>>         __ret < 0 ? -errno : __ret;     \
>>>>> })
>>>>>
>>>>> static inline ssize_t rt_dev_write(int fd, const void *buf, size_t len)
>>>>> {
>>>>>         return rt_dev_call(write, fd, buf, len);
>>>>> }
>>>>>
>>>>> The way you measure the elapsed time may affect the measurement:
>>>>> libalchemy's rt_timer_read() is definitely slower than libcobalt's
>>>>> clock_gettime().
>>>>
>>>> For the normal kernel driver (and rtdm with posix skin) application, I am
>>>> using clock_gettime().
>>>> For the Xenomai rtdm driver application with the native skin, I am using rt_timer_read()
>>>>
>>>>
>>>>>
>>>>> The POSIX skin is generally faster than the alchemy API, because it
>>>>> implements wrappers to the corresponding Cobalt system calls (i.e.
>>>>> libcobalt is Xenomai's libc equivalent). Alchemy has to traverse
>>>>> libcopperplate before actual syscalls may be issued by libcobalt, which it is
>>>>> depending on, because libalchemy needs the copperplate interface layer
>>>>> for shielding itself from Cobalt/Mercury differences.
>>>>>
>>>>
>>>> Actually, based on previous experience with a simple thread
>>>> application, rt_timer_read() with the native
>>>> skin gave better latency compared to using the posix skin with the clock API.
>>>>
>>>
>>> This behavior does not make much sense, simply looking at the library code:
>>> rt_timer_read() may be considered as a superset of libcobalt's
>>> clock_gettime.
>>>
>>> This could be a hint that you might not be testing with Cobalt's POSIX
>>> API. You may want to check by running "nm" on your executable, verifying
>>> that __wrap_* calls are listed (e.g. __wrap_clock_gettime instead of
>>> clock_gettime).
>>>
>>
>> Yes, the wrap calls are listed in the symbol table.
>>
>> posix# nm -a my_app | grep clock
>>                  U __wrap_clock_gettime
>>
>>
>
> Then you may suspect an SMI problem, or less likely a hyperthreading issue
> given the magnitude of the latency figures. You may want to have a look
> at this thread [1].
>
> If this is a SMI issue, you should be able to see it on a regular kernel
> as well, provided the protocol for testing is right and actually
> exercises the same thing for an equivalent period of time on both ends
> (Xenomai / native).
>
> We generally don't know which configuration is under test on your end:
> which I-pipe patch you have been using, which kernel release, which
> Xenomai release (assuming you are not using the obsolete 3.0.0 original
> release). Actually, we don't know much about your hw either (the Intel
> micro-architecture is not defining when it comes to latency issues, the
> SoC may be, the BIOS usually is).
>
> AFAIU, the symptoms you are referring to are either SMI-related, or
> might be related to some GPU driver issuing costly instructions
> badly affecting the latency (e.g. wbinvd, although a 4 ms latency spike
> is completely insane and does not fit the typical footprint of those
> insns), or might denote a massive I-pipe/Xenomai bug.
>
> The latter is always possible, but not the most likely at the moment, as
> this kind of bug tends to be noticed by many people, and nobody did so
> far with the stable Xenomai release and an official I-pipe patch for
> x86, or any other architecture.
>
> First I would recommend reading [2], providing the missing information
> afterwards.
>
> Then, I would make 100% sure that your SoC is SMI-free, and HT is
> disabled in the BIOS, running a vanilla kernel: no I-pipe, no Xenomai,
> just plain regular kernel.
>
> [1] http://xenomai.org/pipermail/xenomai/2018-February/038393.html
> [2] http://xenomai.org/asking-for-help/
>
> --
> Philippe.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xenomai] Simple application for invoking rtdm driver
  2018-03-23 12:40                 ` Pintu Kumar
@ 2018-03-25 12:09                   ` Philippe Gerum
  2018-03-26 13:12                     ` Pintu Kumar
  0 siblings, 1 reply; 18+ messages in thread
From: Philippe Gerum @ 2018-03-25 12:09 UTC (permalink / raw)
  To: Pintu Kumar; +Cc: Xenomai@xenomai.org

On 03/23/2018 01:40 PM, Pintu Kumar wrote:
> Dear Philippe,
> 
> Thank you so much for your detailed explanation.
> 
> First to cross-check, I also tried on ARM BeagleBone (White) with
> 256MB RAM, Single core
> These are the values I got.

After how many samples?

> ===========================
> NORMAL KERNEL Driver Build (with xenomai present)
> ---------------------------------------------------------------------------
> write latency: 8235.083 us

Are you sure that any driver (plain Linux or Xenomai) would take up 8.2
MILLIseconds for performing a single write with your test module? Either
you meant 8235 nanoseconds, or something is really wrong with your
system. This said, benchmarking code calling printk() bluntly defeats
the purpose of the test.

> 
> So, it looks like random behavior.
> Sometimes the normal driver is better, sometimes RTDM-native is better,
> sometimes RTDM-posix is better.
> I even tried firing dd commands in the background. In this case also
> the normal kernel driver is better.
> 
>

[...]

> On the RTDM driver side, I even tried removing the memset and printk and
> keeping just the copy_from_user, but that only reduces the latency by about 1 microsecond.
> I also tried replacing rtdm_safe_copy_from_user with plain
> rtdm_copy_from_user; nothing much changed.
> So, two things seem likely to me:
> - rtdm_copy_from_user takes more time compared to the normal kernel's copy_from_user
> - rt_dev_write takes more time compared to a normal write call on a normal kernel.
> 
> Or, are there too many primary <--> secondary mode switches happening in
> my RTDM driver?
> 
> Is there any other way to check this issue and improve the latency with
> the rtdm driver?
> 
> If you have any other pointers/suggestions, please let me know.
> 
> 

After many iterations, we still have no precise idea of the test
you are actually running, since the application code is only sketched,
and the module code is only partially available to us, which does not
help either. Since there is no way we can converge to any sensible
result that way, I have demoed how I would write a simple test:

http://xenomai.org/downloads/xenomai/tmp/posix_test/

This test involves two modules, plain Linux and RTDM, and a single POSIX
client alternately built against libcobalt and the glibc.

It displays the min, max and average values observed for read() and
write() loops. More details are available from comments in the source
code regarding the measurement.

Once the two modules, and two test executables are built, just push the
modules (they can live together in the kernel, no conflict), then run
either of the executables for measuring 1) the execution time on the
write() side, and 2) the response time on the read side.
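A Makefile for building the client twice, once against the glibc and once against libcobalt, might look roughly like this; the xeno-config path and the test_posix.c file name are assumptions, not taken from the demoed sources.

```make
# Hedged sketch: produce a plain-glibc and a Cobalt build of one source file.
XENO_CONFIG ?= /usr/xenomai/bin/xeno-config

COBALT_CFLAGS  := $(shell $(XENO_CONFIG) --posix --cflags)
COBALT_LDFLAGS := $(shell $(XENO_CONFIG) --posix --ldflags)

all: test_posix_glibc test_posix_cobalt

test_posix_glibc: test_posix.c
	$(CC) -O2 -o $@ $< -lpthread

test_posix_cobalt: test_posix.c
	$(CC) -O2 $(COBALT_CFLAGS) -o $@ $< $(COBALT_LDFLAGS)

clean:
	$(RM) test_posix_glibc test_posix_cobalt
```

Keeping a single source file for both builds guarantees that only the linkage (glibc vs. libcobalt wrappers) differs between the two measurements.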

On imx6qp (quad-core ARM Cortex A9 1.2Ghz), under stress load (dd loop +
hackbench loops) after 15' runtime (which is not long enough for full
validation but significant for getting the general trend), the figures
are as follows:

Cobalt:

[15' 0"] RD_MIN | RD_MAX |  R_AVG  | WR_MIN | WR_MAX |  WR_AVG
--------------------------------------------------------------
              7 |     49 |   9.100 |      5 |     46 |  6.464

(plain) POSIX [CONFIG_PREEMPT]:

[15' 0"] RD_MIN | RD_MAX |  R_AVG  | WR_MIN | WR_MAX |  WR_AVG
--------------------------------------------------------------
             13 |    456 |  16.325 |      7 |    435 |  9.568


On x86_64 with the exact same code (embedded SoC, 4 x 2GHz CPUs),

Cobalt:

            2 |     12 |   3.059 |      1 |     13 |  2.015

(plain) POSIX [CONFIG_PREEMPT]:

            3 |    182 |   3.702 |      1 |    185 |  2.095


Those figures are consistent with what I'd expect from such test.

The Xenomai code base used is the tip of the stable-3.0.x branch. ARM
kernel is 4.14.4, x86 kernel is 4.9.51 with the latest I-pipe to date
for both.

NOTE about Alchemy: the figures with this API would be in the same
ballpark as Cobalt's, slightly higher (2-3 us worst-case) due to the
intermediate libcopperplate layer involved in implementing it. As I
mentioned earlier, using rt_dev* and friends makes no
difference from using Cobalt directly; those are macro wrappers
expanding to Cobalt calls.

If you want to figure out what a plain Linux kernel is capable of when it
comes to response time to timer events on your SoC, you can configure
Xenomai with --core=mercury, instead of cobalt. The stock "latency" test
will be built against the plain glibc, instead of libcobalt. Then you
can compare the latency figures to the results obtained from the same
test from a Cobalt build. Such test has been carefully crafted and
refined over the years: the results you get from it are trustworthy.

-- 
Philippe.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xenomai] Simple application for invoking rtdm driver
  2018-03-25 12:09                   ` Philippe Gerum
@ 2018-03-26 13:12                     ` Pintu Kumar
  2018-03-26 15:09                       ` Philippe Gerum
  0 siblings, 1 reply; 18+ messages in thread
From: Pintu Kumar @ 2018-03-26 13:12 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Xenomai@xenomai.org

Dear Philippe,

Thank you so much for your reply.
Please find my comments below.


On Sun, Mar 25, 2018 at 5:39 PM, Philippe Gerum <rpm@xenomai.org> wrote:
> On 03/23/2018 01:40 PM, Pintu Kumar wrote:
>> Dear Philippe,
>>
>> Thank you so much for your detailed explanation.
>>
>> First to cross-check, I also tried on ARM BeagleBone (White) with
>> 256MB RAM, Single core
>> These are the values I got.
>
> After how many samples?

Just 3 samples for each case, as an initial run to
understand the difference.

>
>> ===========================
>> NORMAL KERNEL Driver Build (with xenomai present)
>> ---------------------------------------------------------------------------
>> write latency: 8235.083 us
>
> Are you sure that any driver (plain Linux or Xenomai) would take up 8.2
> MILLIseconds for performing a single write with your test module? Either
> you meant 8235 nanoseconds, or something is really wrong with your
> system.

Yes, these values are in microseconds.
I have used the same code to measure latency for a native application,
and it reports fine.
These large values are seen only on the BeagleBone (White) with just
256MB RAM, and model name: ARMv7 Processor rev 2 (v7l).
This is a very old board and it is very slow in normal usage itself,
so these figures could be high.

This is the latency test output from same machine:
# /usr/xenomai/bin/latency
== Sampling period: 1000 us
== Test mode: periodic user-mode task
== All results in microseconds
warming up...
RTT|  00:00:01  (periodic user-mode task, 1000 us period, priority 99)
RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
RTD|     25.249|     29.711|     63.749|       0|     0|     25.249|     63.749
RTD|     25.207|     29.589|     60.749|       0|     0|     25.207|     63.749
RTD|     25.207|     29.701|     61.041|       0|     0|     25.207|     63.749
RTD|     22.874|     29.263|     54.749|       0|     0|     22.874|     63.749
RTD|     25.248|     29.542|     78.373|       0|     0|     22.874|     78.373
RTD|     15.081|     29.050|     55.082|       0|     0|     15.081|     78.373
RTD|     22.873|     28.940|     57.415|       0|     0|     15.081|     78.373
RTD|     25.331|     28.972|     55.498|       0|     0|     15.081|     78.373
RTD|     24.164|     28.071|     56.498|       0|     0|     15.081|     78.373
^C---|-----------|-----------|-----------|--------|------|-------------------------
RTS|     15.081|     29.204|     78.373|       0|     0|    00:00:10/00:00:10



> This said, benchmarking code calling printk() bluntly defeats
> the purpose of the test.

I also tried commenting out the printk, or replacing it with rt_printk.

>
>>
>> So, looks like random behavior.
>> Sometimes normal driver is better, sometime RTDM-native is better,
>> sometimes RTDM-posix is better
>> I even tried by firing dd commands in background. In this case also
>> normal kernel driver is better.
>>
>>
>
> [...]
>
>> At the RTDM driver side, I even tried removing memset, printk, and
>> kept just copy_from_user, but it just reduces to 1 micro-seconds.
>> Also I tried replacing the rtdm_safe_copy_from_user, with just
>> rtdm_copy_from_user, nothing much changed.
>> So, it seems 2 things to me:
>> - rtdm_copy_from_user - takes more thing compare to normal kernel copy_from_user
>> - rt_dev_write - takes more time compare to normal write call, in normal kernel.
>>
>> OR, is there too many primary<-->secondary switching happening in case
>> of my RTDM driver.
>>
>> Is there any other way to check this issue and improve latency with
>> rtdm driver ?
>>
>> If you have any other pointers/suggestions, please let me know.
>>
>>
>
> After many iterations, we still have no precise idea of the actual test
> you are actually running, since the application code is only sketched,
> and the module code is only partially available to us which does not
> help either.

OK, I will try to post my code on github so that you can review it.

> Since there is no way we can converge to any sensible
> result that way, I have demoed how I would write a simple test:
>
> http://xenomai.org/downloads/xenomai/tmp/posix_test/
>

OK. Thank you so much for providing the sample.

> This test involves two modules, plan Linux and RTDM, and a single POSIX
> client code alternatively built with libcobalt and glibc.
>
> It displays the min, max and average values observed for read() and
> write() loops. More details are available from comments in the source
> code regarding the measurement.
>

First of all, I checked your code.
I think your driver code (normal/rtdm) is almost the same as mine
(except for the event signal part).
From the application side, the difference is that you are using a
separate real-time thread to read/write the data,
while I am doing everything in main sequentially
(open->write->read->close), and measuring latency
only during write and read.

I even tried running the whole operation inside an RT task with priority 99.
In this case, latency values are reduced by almost half, but
still 2-3 us higher than with the normal driver.

> Once the two modules, and two test executables are built, just push the
> modules (they can live together in the kernel, no conflict), then run
> either of the executables for measuring 1) the execution time on the
> write() side, and 2) the response time on the read side.
>

Anyway, I have built your test application and modules (using my
Makefile) and verified them
on my x86_64 Skylake machine.

Here are the results that I obtained:

# ./posix_test ; ./cobalt_test
DEVICE: /dev/bar, all microseconds

[ 0' 0"] RD_MIN | RD_MAX |  R_AVG  | WR_MIN | WR_MAX |  WR_AVG
--------------------------------------------------------------
              0 |     16 |   0.518 |      0 |      7 |  0.338
              0 |     16 |   0.501 |      0 |     16 |  0.337
              0 |     16 |   0.501 |      0 |     16 |  0.337
              0 |     16 |   0.501 |      0 |     16 |  0.337
              0 |     16 |   0.501 |      0 |     16 |  0.337
              0 |     16 |   0.501 |      0 |     16 |  0.337
              0 |     16 |   0.501 |      0 |     16 |  0.337
              0 |     16 |   0.501 |      0 |     16 |  0.337
              0 |     16 |   0.501 |      0 |     16 |  0.337
^C
DEVICE: /dev/rtdm/foo, all microseconds

[ 0' 0"] RD_MIN | RD_MAX |  R_AVG  | WR_MIN | WR_MAX |  WR_AVG
--------------------------------------------------------------
              0 |      1 |   0.573 |      0 |      1 |  0.241
              0 |     17 |   0.570 |      0 |     17 |  0.240
              0 |     17 |   0.570 |      0 |     17 |  0.240
              0 |     17 |   0.570 |      0 |     17 |  0.240
              0 |     17 |   0.570 |      0 |     17 |  0.240
              0 |     17 |   0.570 |      0 |     17 |  0.240
              0 |     17 |   0.570 |      0 |     17 |  0.240
              0 |     17 |   0.570 |      0 |     17 |  0.240
^C

Here, I did not run any dd or hackbench loops.
This is just a plain run on an x86 PC.

Here also it looks like RD_MAX is higher for the rtdm case.
What does this indicate to you?

From this, do you see any configuration problem in my machine?
Can you share your /proc/cmdline for x86, in case you added anything?
Also, is there any config that you would have enabled/disabled?

One thing is, I am using xenomai-3 kernel drivers that are about 3
months old. But the ipipe is the same, and I also tried upgrading to
the latest xenomai-3 libraries.

One more thing:
on the same Skylake machine, when I measure latency for a simple
Xenomai task application,
I get better latency compared to a normal kernel posix application
(with 100 us sleep).

For your reference, I am also providing the latency test output from
the Skylake machine.
# /usr/xenomai/bin/latency
== Sampling period: 100 us
== Test mode: periodic user-mode task
== All results in microseconds
warming up...
RTT|  00:00:01  (periodic user-mode task, 100 us period, priority 99)
RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
RTD|     -0.176|      0.033|      0.806|       0|     0|     -0.176|      0.806
RTD|     -0.173|      0.033|      0.670|       0|     0|     -0.176|      0.806
RTD|     -0.150|      0.034|      1.000|       0|     0|     -0.176|      1.000
RTD|     -0.155|      0.033|      0.289|       0|     0|     -0.176|      1.000
RTD|     -0.169|      0.033|      0.841|       0|     0|     -0.176|      1.000
RTD|     -0.161|      0.033|      0.895|       0|     0|     -0.176|      1.000
RTD|     -0.177|      0.033|      0.209|       0|     0|     -0.177|      1.000
RTD|     -0.171|      0.033|      0.321|       0|     0|     -0.177|      1.000
RTD|     -0.159|      0.032|      0.208|       0|     0|     -0.177|      1.000
RTD|     -0.163|      0.033|      0.907|       0|     0|     -0.177|      1.000
RTD|     -0.154|      0.033|      0.707|       0|     0|     -0.177|      1.000
RTD|     -0.185|      0.084|      0.401|       0|     0|     -0.185|      1.000
RTD|     -0.175|      0.033|      0.539|       0|     0|     -0.185|      1.000
RTD|     -0.196|      0.033|      0.370|       0|     0|     -0.196|      1.000
RTD|     -0.178|      0.033|      0.800|       0|     0|     -0.196|      1.000
^C---|-----------|-----------|-----------|--------|------|-------------------------
RTS|     -0.196|      0.036|      1.000|       0|     0|    00:00:15/00:00:15



> On imx6qp (quad-core ARM Cortex A9 1.2Ghz), under stress load (dd loop +
> hackbench loops) after 15' runtime (which is not long enough for full
> validation but significant for getting the general trend), the figures
> are as follows:
>
> Cobalt:
>
> [15' 0"] RD_MIN | RD_MAX |  R_AVG  | WR_MIN | WR_MAX |  WR_AVG
> --------------------------------------------------------------
>               7 |     49 |   9.100 |      5 |     46 |  6.464
>
> (plain) POSIX [CONFIG_PREEMPT]:
>
> [15' 0"] RD_MIN | RD_MAX |  R_AVG  | WR_MIN | WR_MAX |  WR_AVG
> --------------------------------------------------------------
>              13 |    456 |  16.325 |      7 |    435 |  9.568
>
>
> On x86_64 with the exact same code (embbeded SoC 4 x 2Ghz CPU),
>
> Cobalt:
>
>             2 |     12 |   3.059 |      1 |     13 |  2.015
>
> (plain) POSIX [CONFIG_PREEMPT]:
>
>             3 |    182 |   3.702 |      1 |    185 |  2.095
>
>
> Those figures are consistent with what I'd expect from such test.
>
> The Xenomai code base used is the tip of the stable-3.0.x branch. ARM
> kernel is 4.14.4, x86 kernel is 4.9.51 with the latest I-pipe to date
> for both.
>
> NOTE about Alchemy: the figures with this API would be in the same
> ballpark than Cobalt, slightly higher (2-3 us worst-case) due to the
> intermediate libcopperplate layer involved in implementing it. As I
> mentioned earlier, using rt_dev* and friends does not make any
> difference than using Cobalt directly, those are macro wrappers
> expanding to Cobalt calls.
>
> If you want to figure out what a plain Linux kernel is apt to when it
> comes to response time to timer events on your SoC, you can configure
> Xenomai with --core=mercury, instead of cobalt. The stock "latency" test
> will be built against the plain glibc, instead of libcobalt. Then you
> can compare the latency figures to the results obtained from the same
> test from a Cobalt build. Such test has been carefully crafted and
> refined over the years: the results you get from it are trustworthy.
>

OK, thanks for your suggestions. I will try the mercury core as well.


> --
> Philippe.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xenomai] Simple application for invoking rtdm driver
  2018-03-26 13:12                     ` Pintu Kumar
@ 2018-03-26 15:09                       ` Philippe Gerum
  2018-03-27 12:09                         ` Pintu Kumar
  0 siblings, 1 reply; 18+ messages in thread
From: Philippe Gerum @ 2018-03-26 15:09 UTC (permalink / raw)
  To: Pintu Kumar; +Cc: Xenomai@xenomai.org

On 03/26/2018 03:12 PM, Pintu Kumar wrote:
> Dear Philippe,
> 
> Thank you so much for your reply.
> Please find my comments below.
> 
> 
> On Sun, Mar 25, 2018 at 5:39 PM, Philippe Gerum <rpm@xenomai.org> wrote:
>> On 03/23/2018 01:40 PM, Pintu Kumar wrote:
>>> Dear Philippe,
>>>
>>> Thank you so much for your detailed explanation.
>>>
>>> First to cross-check, I also tried on ARM BeagleBone (White) with
>>> 256MB RAM, Single core
>>> These are the values I got.
>>
>> After how many samples?
> 
> Just after 3 samples only for each cases. Just an initial run to
> understand the difference.
> 
>>
>>> ===========================
>>> NORMAL KERNEL Driver Build (with xenomai present)
>>> ---------------------------------------------------------------------------
>>> write latency: 8235.083 us
>>
>> Are you sure that any driver (plain Linux or Xenomai) would take up 8.2
>> MILLIseconds for performing a single write with your test module? Either
>> you meant 8235 nanoseconds, or something is really wrong with your
>> system.
> 
> Yes these values are calculated in micro-seconds.
> I have used the same to measure latency for native application, and it
> reports fine.
> These large values are seen only on Beagle bone (white) with just 256MB RAM,
> and model name: ARMv7 Processor rev 2 (v7l)
> I think this is very old board and its very slow in normal usage itself.
> So, these figures could be high.
>

No, these figures do not make sense in dual kernel context even on this
board and clearly denote a problem with the application, given the
figures you got on the same machine with a proper latency test as
illustrated below.

> This is the latency test output from same machine:
> # /usr/xenomai/bin/latency
> == Sampling period: 1000 us
> == Test mode: periodic user-mode task
> == All results in microseconds
> warming up...
> RTT|  00:00:01  (periodic user-mode task, 1000 us period, priority 99)
> RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
> RTD|     25.249|     29.711|     63.749|       0|     0|     25.249|     63.749
> RTD|     25.207|     29.589|     60.749|       0|     0|     25.207|     63.749
> RTD|     25.207|     29.701|     61.041|       0|     0|     25.207|     63.749
> RTD|     22.874|     29.263|     54.749|       0|     0|     22.874|     63.749
> RTD|     25.248|     29.542|     78.373|       0|     0|     22.874|     78.373
> RTD|     15.081|     29.050|     55.082|       0|     0|     15.081|     78.373
> RTD|     22.873|     28.940|     57.415|       0|     0|     15.081|     78.373
> RTD|     25.331|     28.972|     55.498|       0|     0|     15.081|     78.373
> RTD|     24.164|     28.071|     56.498|       0|     0|     15.081|     78.373
> ^C---|-----------|-----------|-----------|--------|------|-------------------------
> RTS|     15.081|     29.204|     78.373|       0|     0|    00:00:10/00:00:10
> 
> 
> 

[...]

> 
> I even tried running the whole operation inside a RT task with priority 99.

Regular Linux threads cannot compete with Cobalt threads by priority in
rt mode, those threads are managed by separate schedulers, and Cobalt's
scheduler always runs first. There is no point in raising the priority
of a Cobalt thread if no other Cobalt thread actually competes with it.

> Then in this case, latency values are reduced by almost half, but
> still 2-3 us higher than normal driver.
> 

Again, please read my previous answers; I'm going to rehash them:
your test does NOT measure latency as in "response time", simply because
it does not wait for any event. To respond to an event, you have to wait
for it first. Your test measures the execution time of a dummy write()
system call. The test I provided measures both the execution time of
write() AND the latency of read(), just like the "latency" test bundled
with Xenomai.

The values you got so far with any test are not trustworthy because:

- you don't add any stress load in parallel, so you are not measuring
anything close to a worst-case time,

- the test needs to run for much longer than a couple of seconds or,
even worse, a few iterations. It needs to run for hours under load to
be meaningful. The longer it runs with well-chosen stress loads in
parallel, the more trustworthy it can be.

In the absence of formal analysis, all we have is a probabilistic
approach for getting close to the real worst-case latency figures: the
only way we can get there is to hammer the target system hard, diversely
and long enough while measuring.
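In the spirit of the above, a stress recipe (dd loop + hackbench) can be
sketched as below; the block sizes and counts are arbitrary assumptions,
and the hackbench loop is commented out since it may not be installed
everywhere:

```shell
# Run a bounded I/O stress pass in the background while a latency test
# would run in the foreground; in a real measurement these loops would
# run unbounded, for hours, alongside the test.
dd if=/dev/zero of=/dev/null bs=1M count=512 2>/dev/null &
DD_PID=$!
# while :; do hackbench >/dev/null 2>&1; done &   # scheduler stress, if available
wait "$DD_PID"
echo "stress pass done"
```

Only results gathered under that kind of sustained load say anything
about the worst case.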

>> Once the two modules, and two test executables are built, just push the
>> modules (they can live together in the kernel, no conflict), then run
>> either of the executables for measuring 1) the execution time on the
>> write() side, and 2) the response time on the read side.
>>
> 
> Anyways, I have build your test application and modules (using my
> Makefile) and verified it
> on my x86_64 skylake machine.
> 
> Here are the results that I obtained:
> 
> # ./posix_test ; ./cobalt_test
> DEVICE: /dev/bar, all microseconds
> 
> [ 0' 0"] RD_MIN | RD_MAX |  R_AVG  | WR_MIN | WR_MAX |  WR_AVG
> --------------------------------------------------------------
>               0 |     16 |   0.518 |      0 |      7 |  0.338
>               0 |     16 |   0.501 |      0 |     16 |  0.337
>               0 |     16 |   0.501 |      0 |     16 |  0.337
>               0 |     16 |   0.501 |      0 |     16 |  0.337
>               0 |     16 |   0.501 |      0 |     16 |  0.337
>               0 |     16 |   0.501 |      0 |     16 |  0.337
>               0 |     16 |   0.501 |      0 |     16 |  0.337
>               0 |     16 |   0.501 |      0 |     16 |  0.337
>               0 |     16 |   0.501 |      0 |     16 |  0.337
> ^C
> DEVICE: /dev/rtdm/foo, all microseconds
> 
> [ 0' 0"] RD_MIN | RD_MAX |  R_AVG  | WR_MIN | WR_MAX |  WR_AVG
> --------------------------------------------------------------
>               0 |      1 |   0.573 |      0 |      1 |  0.241
>               0 |     17 |   0.570 |      0 |     17 |  0.240
>               0 |     17 |   0.570 |      0 |     17 |  0.240
>               0 |     17 |   0.570 |      0 |     17 |  0.240
>               0 |     17 |   0.570 |      0 |     17 |  0.240
>               0 |     17 |   0.570 |      0 |     17 |  0.240
>               0 |     17 |   0.570 |      0 |     17 |  0.240
>               0 |     17 |   0.570 |      0 |     17 |  0.240
> ^C
> 
> Here, I did not run any dd or hackbench loops.
> This is just a plan run on x86 PC.

Which is wrong and totally defeats the purpose of your test, see above.
Really, you do want to run significant stress load in parallel to any of
your test, which must last long enough to be meaningful.

> 
> Here also it looks like read_max is higher for rtdm case.

The information you have from RD_MAX after only a few seconds is
meaningless; the average might be slightly more useful. It says that on
average, it takes 69 nanoseconds more to run the RTDM write() syscall
compared to a native one, while no other activity is eagerly trying to
grab the CPU, or causing cacheline eviction. Which may definitely be the
case, since Xenomai may be running more code in the syscall path in some
situations.

Taking the argument to the extremes, this basically tells you that you
might want to use a native kernel for running empty write() system calls
on an idle machine. For any other usage, you might want to consider
other factors that may well happen in real world systems.

Metaphorically speaking, this real-time game is not about shooting the
ball through the hoop most of the time, but doing so every time a shot
is taken instead, including when the shooter is facing both the
non-cooperative 250 lbs power forward and 7 ft pivot from the other side.

-- 
Philippe.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xenomai] Simple application for invoking rtdm driver
  2018-03-26 15:09                       ` Philippe Gerum
@ 2018-03-27 12:09                         ` Pintu Kumar
  2018-03-27 13:05                           ` Philippe Gerum
  0 siblings, 1 reply; 18+ messages in thread
From: Pintu Kumar @ 2018-03-27 12:09 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Xenomai@xenomai.org

Hi,

Thank you so much for all your explanation.
But before I dig deeper, I have some simple questions which are
troubling me.

1) In the idle case, we see a latency improvement (~2-3 microseconds on
average) using a Xenomai native task application, compared to a normal
posix thread application (with 100 us sleep).
    Then why is it not seen with the RTDM driver and its application?

2) In the idle case itself, we have seen read/write latency
improvements using RTNET (loopback) and a simple UDP client/server
application using the Xenomai posix skin.
    Then why is the same not visible with the RTDM driver?

3) How do I choose when to develop a Xenomai native application, and
when to simply convert it using the posix skin?


Anyway, give me some time; I will share my samples on github for your review.
Here is a snapshot of my rtdm application.
----------------------------------------------------------------
int main(int argc, char *argv[])
{
        int ret, fd, len;
        char msg[] = "Hello!";
        char *buff = NULL;
        RTIME prev, now;
        float diff;

        /* the device name must be a string literal */
        fd = rt_dev_open("/dev/rtdm/rtsample", O_RDWR);
        if (fd < 0)
                return fd;

        len = strlen(msg);
        prev = rt_timer_read();
        ret = rt_dev_write(fd, msg, len);
        now = rt_timer_read();
        diff = (now - prev) / 1000.0;
        rt_printf("write latency: %5.3f us\n", diff);

        buff = malloc(4096);
        if (!buff)
                goto err;
        memset(buff, 0, 4096); /* zero the whole buffer, not just len bytes */
        prev = rt_timer_read();
        ret = rt_dev_read(fd, buff, len);
        now = rt_timer_read();
        diff = (now - prev) / 1000.0;
        rt_printf("Message from driver:\n");
        rt_printf("%s\n", buff);
        rt_printf("read latency: %5.3f us\n", diff);

        free(buff);
err:
        rt_dev_close(fd);

        return 0;
}
--------------------------------------------------------------------------
I used exactly the same application for normal driver, using normal
open/read/write calls.
Do you see any problem in the way, we measure latency here?

Yes, it is true that I am just measuring read/write system call timing
on both kernels.
We expect that Xenomai (or any RTOS) should give better latency both
in the idle case and on an over-loaded system.
Of course, on an over-loaded system, we can trust that a real-time
application (with strict timing requirements) will never cross the
deadline.


Thanks,
Pintu

On Mon, Mar 26, 2018 at 8:39 PM, Philippe Gerum <rpm@xenomai.org> wrote:
> On 03/26/2018 03:12 PM, Pintu Kumar wrote:
>> Dear Philippe,
>>
>> Thank you so much for your reply.
>> Please find my comments below.
>>
>>
>> On Sun, Mar 25, 2018 at 5:39 PM, Philippe Gerum <rpm@xenomai.org> wrote:
>>> On 03/23/2018 01:40 PM, Pintu Kumar wrote:
>>>> Dear Philippe,
>>>>
>>>> Thank you so much for your detailed explanation.
>>>>
>>>> First to cross-check, I also tried on ARM BeagleBone (White) with
>>>> 256MB RAM, Single core
>>>> These are the values I got.
>>>
>>> After how many samples?
>>
>> Just after 3 samples only for each cases. Just an initial run to
>> understand the difference.
>>
>>>
>>>> ===========================
>>>> NORMAL KERNEL Driver Build (with xenomai present)
>>>> ---------------------------------------------------------------------------
>>>> write latency: 8235.083 us
>>>
>>> Are you sure that any driver (plain Linux or Xenomai) would take up 8.2
>>> MILLIseconds for performing a single write with your test module? Either
>>> you meant 8235 nanoseconds, or something is really wrong with your
>>> system.
>>
>> Yes these values are calculated in micro-seconds.
>> I have used the same to measure latency for native application, and it
>> reports fine.
>> These large values are seen only on Beagle bone (white) with just 256MB RAM,
>> and model name: ARMv7 Processor rev 2 (v7l)
>> I think this is very old board and its very slow in normal usage itself.
>> So, these figures could be high.
>>
>
> No, these figures do not make sense in dual kernel context even on this
> board and clearly denote a problem with the application, given the
> figures you got on the same machine with a proper latency test as
> illustrated below.
>
>> This is the latency test output from same machine:
>> # /usr/xenomai/bin/latency
>> == Sampling period: 1000 us
>> == Test mode: periodic user-mode task
>> == All results in microseconds
>> warming up...
>> RTT|  00:00:01  (periodic user-mode task, 1000 us period, priority 99)
>> RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
>> RTD|     25.249|     29.711|     63.749|       0|     0|     25.249|     63.749
>> RTD|     25.207|     29.589|     60.749|       0|     0|     25.207|     63.749
>> RTD|     25.207|     29.701|     61.041|       0|     0|     25.207|     63.749
>> RTD|     22.874|     29.263|     54.749|       0|     0|     22.874|     63.749
>> RTD|     25.248|     29.542|     78.373|       0|     0|     22.874|     78.373
>> RTD|     15.081|     29.050|     55.082|       0|     0|     15.081|     78.373
>> RTD|     22.873|     28.940|     57.415|       0|     0|     15.081|     78.373
>> RTD|     25.331|     28.972|     55.498|       0|     0|     15.081|     78.373
>> RTD|     24.164|     28.071|     56.498|       0|     0|     15.081|     78.373
>> ^C---|-----------|-----------|-----------|--------|------|-------------------------
>> RTS|     15.081|     29.204|     78.373|       0|     0|    00:00:10/00:00:10
>>
>>
>>
>
> [...]
>
>>
>> I even tried running the whole operation inside a RT task with priority 99.
>
> Regular Linux threads cannot compete with Cobalt threads by priority in
> rt mode, those threads are managed by separate schedulers, and Cobalt's
> scheduler always runs first. There is no point in raising the priority
> of a Cobalt thread if no other Cobalt thread actually competes with it.
>
>> Then in this case, latency values are reduced by almost half, but
>> still 2-3 us higher than normal driver.
>>
>
> Again, please read my previous answers, I'm going to rehash them:
> your test does NOT measure latency as in "response time", simply because
> it does not wait for any event. To respond to an event, you have to wait
> for it first. Your test measures the execution time of a dummy write()
> system call. The test I provided does the measure execution time for
> write() AND the latency of read(), just like the "latency" test bundled
> with Xenomai.
>
> The values you got so far with any test are not trustworthy because:
>
> - you don't add any stress load in parallel, so your are not measuring
> anything close to a worst case time,
>
> - the test needs to run for much longer than a couple of seconds or even
> worse, iterations. It needs to run for hours under load to be
> meaningful. The longer it runs with well-chosen stress loads in
> parallel, the more trustworthy it can be.
>
> In absence of formal analysis, all we have is a probabilistic approach
> for getting close to the real worst-case latency figures: the only way
> we can get there is to hammer the target system hard, diversely and long
> enough while measuring.
>
>>> Once the two modules, and two test executables are built, just push the
>>> modules (they can live together in the kernel, no conflict), then run
>>> either of the executables for measuring 1) the execution time on the
>>> write() side, and 2) the response time on the read side.
>>>
>>
>> Anyways, I have build your test application and modules (using my
>> Makefile) and verified it
>> on my x86_64 skylake machine.
>>
>> Here are the results that I obtained:
>>
>> # ./posix_test ; ./cobalt_test
>> DEVICE: /dev/bar, all microseconds
>>
>> [ 0' 0"] RD_MIN | RD_MAX |  R_AVG  | WR_MIN | WR_MAX |  WR_AVG
>> --------------------------------------------------------------
>>               0 |     16 |   0.518 |      0 |      7 |  0.338
>>               0 |     16 |   0.501 |      0 |     16 |  0.337
>>               0 |     16 |   0.501 |      0 |     16 |  0.337
>>               0 |     16 |   0.501 |      0 |     16 |  0.337
>>               0 |     16 |   0.501 |      0 |     16 |  0.337
>>               0 |     16 |   0.501 |      0 |     16 |  0.337
>>               0 |     16 |   0.501 |      0 |     16 |  0.337
>>               0 |     16 |   0.501 |      0 |     16 |  0.337
>>               0 |     16 |   0.501 |      0 |     16 |  0.337
>> ^C
>> DEVICE: /dev/rtdm/foo, all microseconds
>>
>> [ 0' 0"] RD_MIN | RD_MAX |  R_AVG  | WR_MIN | WR_MAX |  WR_AVG
>> --------------------------------------------------------------
>>               0 |      1 |   0.573 |      0 |      1 |  0.241
>>               0 |     17 |   0.570 |      0 |     17 |  0.240
>>               0 |     17 |   0.570 |      0 |     17 |  0.240
>>               0 |     17 |   0.570 |      0 |     17 |  0.240
>>               0 |     17 |   0.570 |      0 |     17 |  0.240
>>               0 |     17 |   0.570 |      0 |     17 |  0.240
>>               0 |     17 |   0.570 |      0 |     17 |  0.240
>>               0 |     17 |   0.570 |      0 |     17 |  0.240
>> ^C
>>
>> Here, I did not run any dd or hackbench loops.
>> This is just a plan run on x86 PC.
>
> Which is wrong and totally defeats the purpose of your test, see above.
> Really, you do want to run significant stress load in parallel to any of
> your test, which must last long enough to be meaningful.
>
>>
>> Here also it looks like read_max is higher for rtdm case.
>
> The information you have from RD_MAX after only a few seconds is
> meaningless, the average might be slightly more useful. It says that on
> average, it takes 69 nanoseconds more to run the RTDM  write() syscall
> compared to a native one, while no other activity is eagerly trying to
> grab the CPU, or causing cacheline eviction. Which may definitely be the
> case, since Xenomai may be running more code in the syscall path in some
> situations.
>
> Taking the argument to the extremes, this basically tells you that you
> might want to use a native kernel for running empty write() system calls
> on an idle machine. For any other usage, you might want to consider
> other factors that may well happen in real world systems.
>
> Metaphorically speaking, this real-time game is not about shooting the
> ball through the hoop most of the time, but doing so every time a shot
> is taken instead, including when the shooter is facing both the
> non-cooperative 250 lbs power forward and 7 ft pivot from the other side.
>
> --
> Philippe.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xenomai] Simple application for invoking rtdm driver
  2018-03-27 12:09                         ` Pintu Kumar
@ 2018-03-27 13:05                           ` Philippe Gerum
  2018-04-02 13:48                             ` Pintu Kumar
  0 siblings, 1 reply; 18+ messages in thread
From: Philippe Gerum @ 2018-03-27 13:05 UTC (permalink / raw)
  To: Pintu Kumar; +Cc: Xenomai@xenomai.org

On 03/27/2018 02:09 PM, Pintu Kumar wrote:
> Hi,
> 
> Thank you so much for all your explanation.
> But, before I dig deeper, I have some simple questions which is troubling me.
> 
> 1) In Idle case, we see latency improvement (~2-3 micro-seconds on
> average) using Xenomai native task application, compared to normal
> posix thread application (with 100 us sleep).
>     Then why not it is seen with RTDM driver and its application ?
> 
> 2) In the idle case itself, we have seen read/write latency
> improvement using RTNET (loopback) and simple UDP client/server
> application using Xenomai posix skin
>     Then why not same is visible with RTDM driver?
>

All answers have already been given, or can be inferred fairly easily
from what was said earlier.

On a general note, it is safer not to compare apples and oranges when it
comes to benchmarking: two different codes usually do things
differently, so you may simply not be measuring the same work load. If
you really want to figure out what happens, you may need to dive into
the implementation.

> 3) How to choose, when to develop a Xenomai native application, and
> when to simply convert using posix skin ?
> 

Choose portability with POSIX. The so-called "native" - now Alchemy -
API was defined 14 years ago for a single purpose: making greybeards
coming from the traditional RTOS world comfortable with API semantics
close to what they were used to with VxWorks, VRTX, pSOS and friends.

Alchemy exists in Xenomai 3 only for the purpose of making the life of
people migrating from 2.x easier.

-- 
Philippe.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xenomai] Simple application for invoking rtdm driver
  2018-03-27 13:05                           ` Philippe Gerum
@ 2018-04-02 13:48                             ` Pintu Kumar
  2018-04-03 10:44                               ` Pintu Kumar
  0 siblings, 1 reply; 18+ messages in thread
From: Pintu Kumar @ 2018-04-02 13:48 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Xenomai@xenomai.org

Hi,

I tried changing the "native" skin to the "alchemy" skin and replaced the
header native/timer.h with alchemy/timer.h.
But when I build my RTDM application, I get the error below:

app_test.c: In function ‘main’:
app_test.c:41:7: warning: implicit declaration of function
‘rt_dev_open’ [-Wimplicit-function-declaration]
  fd = rt_dev_open(DEVICE_NAME, O_RDWR);
       ^
app_test.c:50:8: warning: implicit declaration of function
‘rt_dev_write’ [-Wimplicit-function-declaration]
  ret = rt_dev_write(fd, msg, len);
        ^
app_test.c:65:8: warning: implicit declaration of function
‘rt_dev_read’ [-Wimplicit-function-declaration]
  ret = rt_dev_read(fd, buff, len);
        ^
app_test.c:76:2: warning: implicit declaration of function
‘rt_dev_close’ [-Wimplicit-function-declaration]
  rt_dev_close(fd);
  ^
/tmp/ccTcHyFz.o: In function `main':
app_test.c:(.text.startup+0x45): undefined reference to `rt_dev_open'
app_test.c:(.text.startup+0x85): undefined reference to `rt_dev_write'
app_test.c:(.text.startup+0xf2): undefined reference to `rt_dev_read'
app_test.c:(.text.startup+0x15e): undefined reference to `rt_dev_close'
collect2: error: ld returned 1 exit status
Makefile:11: recipe for target 'app_test' failed
make: *** [app_test] Error 1

---------------------------------------
I even tried adding "-lrtdm" to the CFLAGS, but it did not work.
skin = alchemy
CC := $(shell $(INSTALL_PATH)/xeno-config --cc)
CFLAGS := $(shell $(INSTALL_PATH)/xeno-config --skin=$(skin) --cflags)
-O2 -lrtdm

I am using Xenomai-3.0.

Is there any specific library I need to include for alchemy with RTDM API?
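For reference, the Makefile fragment quoted above only asks xeno-config for compile flags; undefined references like these are a link-stage problem, so the linker also needs the skin's --ldflags from xeno-config (and "-lrtdm" placed in CFLAGS is never seen at link time). A hedged sketch, with the install path as an assumption:

```make
# Sketch: pull both compile and link flags from xeno-config.
# The xeno-config path is an assumption; adjust to your install.
XENO_CONFIG := /usr/xenomai/bin/xeno-config
skin := alchemy
CFLAGS  := $(shell $(XENO_CONFIG) --skin=$(skin) --cflags) -O2
LDFLAGS := $(shell $(XENO_CONFIG) --skin=$(skin) --ldflags)

app_test: app_test.c
	$(CC) -o $@ $< $(CFLAGS) $(LDFLAGS)
```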



Thanks,
Pintu




On Tue, Mar 27, 2018 at 6:35 PM, Philippe Gerum <rpm@xenomai.org> wrote:
> On 03/27/2018 02:09 PM, Pintu Kumar wrote:
>> Hi,
>>
>> Thank you so much for all your explanation.
>> But before I dig deeper, I have some simple questions which are troubling me.
>>
>> 1) In the idle case, we see a latency improvement (~2-3 microseconds on
>> average) using a Xenomai native task application, compared to a normal
>> POSIX thread application (with a 100 us sleep).
>>     Then why is it not seen with the RTDM driver and its application?
>>
>> 2) In the same idle case, we have seen a read/write latency
>> improvement using RTnet (loopback) and a simple UDP client/server
>> application using the Xenomai POSIX skin.
>>     Then why is the same not visible with the RTDM driver?
>>
>
> All answers have already been given, or can be inferred fairly easily from
> what was said earlier.
>
> On a general note, it is safer not to compare apples and oranges when it
> comes to benchmarking: two different codes usually do things
> differently, so you may simply not be measuring the same work load. If
> you really want to figure out what happens, you may need to dive into
> the implementation.
>
>> 3) How to choose, when to develop a Xenomai native application, and
>> when to simply convert using posix skin ?
>>
>
> Choose portability with POSIX. The so-called "native" - now Alchemy API
> - was defined 14 years ago for a single purpose: make greybeards coming
> from the traditional RTOS world comfortable with API semantics close to
> what they were used to with VxWorks, VRTX, pSOS and friends.
>
> Alchemy exists in Xenomai 3 only for the purpose of making the life of
> people migrating from 2.x easier.
>
> --
> Philippe.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xenomai] Simple application for invoking rtdm driver
  2018-04-02 13:48                             ` Pintu Kumar
@ 2018-04-03 10:44                               ` Pintu Kumar
  0 siblings, 0 replies; 18+ messages in thread
From: Pintu Kumar @ 2018-04-03 10:44 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Xenomai@xenomai.org

Hi,

Any clue on this?

I guess the native and alchemy skins are internally the same.
Then why does rt_dev_xxx not work with the alchemy skin?
Please let me know the alternate API to invoke the RTDM driver with the alchemy skin.

Thanks,
Pintu


On Mon, Apr 2, 2018 at 7:18 PM, Pintu Kumar <pintu.ping@gmail.com> wrote:
> Hi,
>
> I tried changing the "native" skin to the "alchemy" skin and replaced the
> header native/timer.h with alchemy/timer.h.
> But when I build my RTDM application, I get the error below:
>
> app_test.c: In function ‘main’:
> app_test.c:41:7: warning: implicit declaration of function
> ‘rt_dev_open’ [-Wimplicit-function-declaration]
>   fd = rt_dev_open(DEVICE_NAME, O_RDWR);
>        ^
> app_test.c:50:8: warning: implicit declaration of function
> ‘rt_dev_write’ [-Wimplicit-function-declaration]
>   ret = rt_dev_write(fd, msg, len);
>         ^
> app_test.c:65:8: warning: implicit declaration of function
> ‘rt_dev_read’ [-Wimplicit-function-declaration]
>   ret = rt_dev_read(fd, buff, len);
>         ^
> app_test.c:76:2: warning: implicit declaration of function
> ‘rt_dev_close’ [-Wimplicit-function-declaration]
>   rt_dev_close(fd);
>   ^
> /tmp/ccTcHyFz.o: In function `main':
> app_test.c:(.text.startup+0x45): undefined reference to `rt_dev_open'
> app_test.c:(.text.startup+0x85): undefined reference to `rt_dev_write'
> app_test.c:(.text.startup+0xf2): undefined reference to `rt_dev_read'
> app_test.c:(.text.startup+0x15e): undefined reference to `rt_dev_close'
> collect2: error: ld returned 1 exit status
> Makefile:11: recipe for target 'app_test' failed
> make: *** [app_test] Error 1
>
> ---------------------------------------
> I even tried adding "-lrtdm" to the CFLAGS, but it did not work.
> skin = alchemy
> CC := $(shell $(INSTALL_PATH)/xeno-config --cc)
> CFLAGS := $(shell $(INSTALL_PATH)/xeno-config --skin=$(skin) --cflags)
> -O2 -lrtdm
>
> I am using Xenomai-3.0.
>
> Is there any specific library I need to include for alchemy with RTDM API?
>
>
>
> Thanks,
> Pintu
>
>


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2018-04-03 10:44 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-20  1:42 [Xenomai] Simple application for invoking rtdm driver Pintu Kumar
2018-03-20  3:33 ` Greg Gallagher
2018-03-20  5:27   ` Pintu Kumar
2018-03-20  7:26     ` Pintu Kumar
2018-03-20  9:32       ` Philippe Gerum
2018-03-20 11:31         ` Pintu Kumar
2018-03-20 11:37           ` Philippe Gerum
2018-03-20 11:45           ` Philippe Gerum
2018-03-20 12:00             ` Pintu Kumar
2018-03-20 13:09               ` Philippe Gerum
2018-03-23 12:40                 ` Pintu Kumar
2018-03-25 12:09                   ` Philippe Gerum
2018-03-26 13:12                     ` Pintu Kumar
2018-03-26 15:09                       ` Philippe Gerum
2018-03-27 12:09                         ` Pintu Kumar
2018-03-27 13:05                           ` Philippe Gerum
2018-04-02 13:48                             ` Pintu Kumar
2018-04-03 10:44                               ` Pintu Kumar
