* [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
@ 2016-05-31 14:09 Wolfgang Netbal
  2016-05-31 14:16 ` Gilles Chanteperdrix
  2016-05-31 15:08 ` Philippe Gerum
  0 siblings, 2 replies; 36+ messages in thread
From: Wolfgang Netbal @ 2016-05-31 14:09 UTC (permalink / raw)
  To: xenomai

Dear all,

we have moved our application from "XENOMAI 2.6.2.1 + Linux 3.0.43" to
"XENOMAI 2.6.4 + Linux 3.10.53". Our target is an i.MX6DL. The system
is now up and running and is stable. Unfortunately, we see a
difference in performance: our old combination (XENOMAI 2.6.2.1 +
Linux 3.0.43) was slightly faster.

At the moment it looks like XENOMAI 2.6.4 calls
xnpod_schedule_handler much more often than XENOMAI 2.6.2.1 did in our old
system. Every call of xnpod_schedule_handler interrupts our main
XENOMAI task with priority = 95.

I have compared the configuration of both XENOMAI versions but did not
find any difference. I checked the source code (new commits) but also
did not find an explanation.


Any help would be greatly appreciated.

-- 
Wolfgang Netbal
Software Development
________________________________

SIGMATEK GmbH & Co KG
Sigmatekstraße 1
5112 Lamprechtshausen
Österreich / Austria

Tel.: +43/6274/4321-0
Fax: +43/6274/4321-300
E-Mail: wolfgang.netbal@sigmatek.at
http://www.sigmatek-automation.at

****************************Please note: ****************************
This email and all attachments are confidential and intended solely
for the person or entity to whom it is addressed. If you are not the
named addressee you must not make this email and all attachments
accessible to any other person. If you have received this email in
error please delete it together with all attachments.
*********************************************************************



* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
  2016-05-31 14:09 [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4 Wolfgang Netbal
@ 2016-05-31 14:16 ` Gilles Chanteperdrix
  2016-06-01 13:52   ` Wolfgang Netbal
  2016-05-31 15:08 ` Philippe Gerum
  1 sibling, 1 reply; 36+ messages in thread
From: Gilles Chanteperdrix @ 2016-05-31 14:16 UTC (permalink / raw)
  To: Wolfgang Netbal; +Cc: xenomai

On Tue, May 31, 2016 at 04:09:07PM +0200, Wolfgang Netbal wrote:
> Dear all,
> 
> we have moved our application from "XENOMAI 2.6.2.1 + Linux 3.0.43" to 
> "XENOMAI 2.6.4. + Linux 3.10.53". Our target is an i.MX6DL. The system 
> is now up and running and works stable. Unfortunately we see a 
> difference in the performance. Our old combination (XENOMAI 2.6.2.1 + 
> Linux 3.0.43) was slightly faster.
> 
> At the moment it looks like that XENOMAI 2.6.4 calls 
> xnpod_schedule_handler much more often then XENOMAI 2.6.2.1 in our old 
> system.  Every call of xnpod_schedule_handler interrupts our main 
> XENOMAI task with priority = 95.
> 
> I have compared the configuration of both XENOMAI versions but did not 
> found any difference. I checked the source code (new commits) but did 
> also not find a solution.

Have you tried Xenomai 2.6.4 with Linux 3.0.43, in order to see
whether the regression comes from the kernel update or the Xenomai update?

-- 
					    Gilles.
https://click-hack.org



* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
  2016-05-31 14:09 [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4 Wolfgang Netbal
  2016-05-31 14:16 ` Gilles Chanteperdrix
@ 2016-05-31 15:08 ` Philippe Gerum
  1 sibling, 0 replies; 36+ messages in thread
From: Philippe Gerum @ 2016-05-31 15:08 UTC (permalink / raw)
  To: wolfgang.netbal, xenomai

On 05/31/2016 04:09 PM, Wolfgang Netbal wrote:
> Dear all,
> 
> we have moved our application from "XENOMAI 2.6.2.1 + Linux 3.0.43" to
> "XENOMAI 2.6.4. + Linux 3.10.53". Our target is an i.MX6DL. The system
> is now up and running and works stable. Unfortunately we see a
> difference in the performance. Our old combination (XENOMAI 2.6.2.1 +
> Linux 3.0.43) was slightly faster.
>

Could you quantify "slightly faster"? This is a dual kernel system, so
changes on the regular kernel side and/or the co-kernel side may
have a measurable impact.

> At the moment it looks like that XENOMAI 2.6.4 calls
> xnpod_schedule_handler much more often then XENOMAI 2.6.2.1 in our old
> system.  Every call of xnpod_schedule_handler interrupts our main
> XENOMAI task with priority = 95.

That handler is attached to the inter-processor interrupt used for
rescheduling tasks running on a remote CPU. You may want to check the
CPU affinity settings of your tasks, and the way they interact/synchronize.
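
For illustration, a minimal sketch of pinning a task to one CPU with the
native skin's T_CPU() creation-mode bit, so that wakeups issued from that
same CPU never need a cross-CPU rescheduling IPI (task name, priority and
worker_body() are placeholders, not taken from the original post):

#include <native/task.h>

static RT_TASK worker;

static void worker_body(void *arg)
{
        /* real-time work loop */
}

static int create_pinned_worker(void)
{
        /* T_CPU(0) asks the nucleus to keep this task on CPU 0 */
        int err = rt_task_create(&worker, "worker", 0, 95, T_CPU(0));
        if (err)
                return err;
        return rt_task_start(&worker, worker_body, NULL);
}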

-- 
Philippe.



* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
  2016-05-31 14:16 ` Gilles Chanteperdrix
@ 2016-06-01 13:52   ` Wolfgang Netbal
  2016-06-01 14:12     ` Gilles Chanteperdrix
  0 siblings, 1 reply; 36+ messages in thread
From: Wolfgang Netbal @ 2016-06-01 13:52 UTC (permalink / raw)
  To: xenomai



On 2016-05-31 at 16:16, Gilles Chanteperdrix wrote:
> On Tue, May 31, 2016 at 04:09:07PM +0200, Wolfgang Netbal wrote:
>> Dear all,
>>
>> we have moved our application from "XENOMAI 2.6.2.1 + Linux 3.0.43" to
>> "XENOMAI 2.6.4. + Linux 3.10.53". Our target is an i.MX6DL. The system
>> is now up and running and works stable. Unfortunately we see a
>> difference in the performance. Our old combination (XENOMAI 2.6.2.1 +
>> Linux 3.0.43) was slightly faster.
>>
>> At the moment it looks like that XENOMAI 2.6.4 calls
>> xnpod_schedule_handler much more often then XENOMAI 2.6.2.1 in our old
>> system.  Every call of xnpod_schedule_handler interrupts our main
>> XENOMAI task with priority = 95.
>>
>> I have compared the configuration of both XENOMAI versions but did not
>> found any difference. I checked the source code (new commits) but did
>> also not find a solution.
> Have you tried Xenomai 2.6.4 with Linux 3.0.43 ? In order to see
> whether it comes from the kernel update or the Xenomai udpate?
I've tried Linux 3.0.43 with Xenomai 2.6.4 and there is no difference to
Xenomai 2.6.2.1.
It looks like the reason is something other than Xenomai.




* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
  2016-06-01 13:52   ` Wolfgang Netbal
@ 2016-06-01 14:12     ` Gilles Chanteperdrix
  2016-06-02  8:15       ` Wolfgang Netbal
  0 siblings, 1 reply; 36+ messages in thread
From: Gilles Chanteperdrix @ 2016-06-01 14:12 UTC (permalink / raw)
  To: Wolfgang Netbal; +Cc: xenomai

On Wed, Jun 01, 2016 at 03:52:06PM +0200, Wolfgang Netbal wrote:
> 
> 
> Am 2016-05-31 um 16:16 schrieb Gilles Chanteperdrix:
> > On Tue, May 31, 2016 at 04:09:07PM +0200, Wolfgang Netbal wrote:
> >> Dear all,
> >>
> >> we have moved our application from "XENOMAI 2.6.2.1 + Linux 3.0.43" to
> >> "XENOMAI 2.6.4. + Linux 3.10.53". Our target is an i.MX6DL. The system
> >> is now up and running and works stable. Unfortunately we see a
> >> difference in the performance. Our old combination (XENOMAI 2.6.2.1 +
> >> Linux 3.0.43) was slightly faster.
> >>
> >> At the moment it looks like that XENOMAI 2.6.4 calls
> >> xnpod_schedule_handler much more often then XENOMAI 2.6.2.1 in our old
> >> system.  Every call of xnpod_schedule_handler interrupts our main
> >> XENOMAI task with priority = 95.
> >>
> >> I have compared the configuration of both XENOMAI versions but did not
> >> found any difference. I checked the source code (new commits) but did
> >> also not find a solution.
> > Have you tried Xenomai 2.6.4 with Linux 3.0.43 ? In order to see
> > whether it comes from the kernel update or the Xenomai udpate?
> I've tried Linux 3.0.43 with Xenomai 2.6.4 an there is no difference to 
> Xenomai 2.6.2.1
> Looks like there is an other reason than Xenomai.

Ok, one thing to pay attention to on imx6 is the L2 cache write
allocate policy. You want to disable L2 write allocate on imx6 to
get low latencies. I do not know exactly which patches you are
using, so it is difficult to check, but the kernel normally displays
the value set in the L2 auxiliary control register at boot; you can
check in the datasheet whether that value means L2 write allocate is
disabled or not. And check whether you get the same value with 3.0 and
3.10.
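
As an illustration of that check, here is a minimal host-side sketch that
decodes an AUX_CTRL value taken from the boot log. The bit positions are
an assumption based on the L2C-310 (PL310) Auxiliary Control Register
layout and must be verified against the i.MX6 documentation, as suggested
above:

#include <stdio.h>
#include <stdlib.h>

/* Usage: decode_aux 0x02850000
   Bit positions follow the L2C-310 (PL310) Auxiliary Control Register
   layout (assumption - double-check against the datasheet). */
int main(int argc, char *argv[])
{
        unsigned long aux;

        if (argc < 2)
                return 1;
        aux = strtoul(argv[1], NULL, 16);
        printf("shared attribute override enable : %lu\n", (aux >> 22) & 1);
        printf("force write allocate [24:23]     : %lu (1 = no write allocate)\n",
               (aux >> 23) & 3);
        printf("data prefetch enable             : %lu\n", (aux >> 28) & 1);
        printf("instruction prefetch enable      : %lu\n", (aux >> 29) & 1);
        return 0;
}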

-- 
					    Gilles.
https://click-hack.org



* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
  2016-06-01 14:12     ` Gilles Chanteperdrix
@ 2016-06-02  8:15       ` Wolfgang Netbal
  2016-06-02  8:23         ` Gilles Chanteperdrix
  0 siblings, 1 reply; 36+ messages in thread
From: Wolfgang Netbal @ 2016-06-02  8:15 UTC (permalink / raw)
  To: xenomai



On 2016-06-01 at 16:12, Gilles Chanteperdrix wrote:
> On Wed, Jun 01, 2016 at 03:52:06PM +0200, Wolfgang Netbal wrote:
>>
>> Am 2016-05-31 um 16:16 schrieb Gilles Chanteperdrix:
>>> On Tue, May 31, 2016 at 04:09:07PM +0200, Wolfgang Netbal wrote:
>>>> Dear all,
>>>>
>>>> we have moved our application from "XENOMAI 2.6.2.1 + Linux 3.0.43" to
>>>> "XENOMAI 2.6.4. + Linux 3.10.53". Our target is an i.MX6DL. The system
>>>> is now up and running and works stable. Unfortunately we see a
>>>> difference in the performance. Our old combination (XENOMAI 2.6.2.1 +
>>>> Linux 3.0.43) was slightly faster.
>>>>
>>>> At the moment it looks like that XENOMAI 2.6.4 calls
>>>> xnpod_schedule_handler much more often then XENOMAI 2.6.2.1 in our old
>>>> system.  Every call of xnpod_schedule_handler interrupts our main
>>>> XENOMAI task with priority = 95.
>>>>
>>>> I have compared the configuration of both XENOMAI versions but did not
>>>> found any difference. I checked the source code (new commits) but did
>>>> also not find a solution.
>>> Have you tried Xenomai 2.6.4 with Linux 3.0.43 ? In order to see
>>> whether it comes from the kernel update or the Xenomai udpate?
>> I've tried Linux 3.0.43 with Xenomai 2.6.4 an there is no difference to
>> Xenomai 2.6.2.1
>> Looks like there is an other reason than Xenomai.
> Ok, one thing to pay attention to on imx6 is the L2 cache write
> allocate policy. You want to disable L2 write allocate on imx6 to
> get low latencies. I do not know which patches exactly you are
> using, so it is difficult to check, but the kernel normally displays
> the value set in the L2 auxiliary configuration register, you can
> check in the datasheet if it means that L2 write allocate is
> disabled or not. And check if you get the same value with 3.0 and
> 3.10.
Thank you for this hint. I looked around in the kernel config, but can't
find an option that sounds like L2 write allocate.
The only option I found was CACHE_L2X0, and that is enabled on both
kernels.
Do you have an idea what the name of this configuration is, or where in the
kernel sources it should be located, so I can find the name of the
config flag by searching the source code?




* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
  2016-06-02  8:15       ` Wolfgang Netbal
@ 2016-06-02  8:23         ` Gilles Chanteperdrix
  2016-06-06  7:03           ` Wolfgang Netbal
  0 siblings, 1 reply; 36+ messages in thread
From: Gilles Chanteperdrix @ 2016-06-02  8:23 UTC (permalink / raw)
  To: Wolfgang Netbal; +Cc: xenomai

On Thu, Jun 02, 2016 at 10:15:41AM +0200, Wolfgang Netbal wrote:
> 
> 
> Am 2016-06-01 um 16:12 schrieb Gilles Chanteperdrix:
> > On Wed, Jun 01, 2016 at 03:52:06PM +0200, Wolfgang Netbal wrote:
> >>
> >> Am 2016-05-31 um 16:16 schrieb Gilles Chanteperdrix:
> >>> On Tue, May 31, 2016 at 04:09:07PM +0200, Wolfgang Netbal wrote:
> >>>> Dear all,
> >>>>
> >>>> we have moved our application from "XENOMAI 2.6.2.1 + Linux 3.0.43" to
> >>>> "XENOMAI 2.6.4. + Linux 3.10.53". Our target is an i.MX6DL. The system
> >>>> is now up and running and works stable. Unfortunately we see a
> >>>> difference in the performance. Our old combination (XENOMAI 2.6.2.1 +
> >>>> Linux 3.0.43) was slightly faster.
> >>>>
> >>>> At the moment it looks like that XENOMAI 2.6.4 calls
> >>>> xnpod_schedule_handler much more often then XENOMAI 2.6.2.1 in our old
> >>>> system.  Every call of xnpod_schedule_handler interrupts our main
> >>>> XENOMAI task with priority = 95.
> >>>>
> >>>> I have compared the configuration of both XENOMAI versions but did not
> >>>> found any difference. I checked the source code (new commits) but did
> >>>> also not find a solution.
> >>> Have you tried Xenomai 2.6.4 with Linux 3.0.43 ? In order to see
> >>> whether it comes from the kernel update or the Xenomai udpate?
> >> I've tried Linux 3.0.43 with Xenomai 2.6.4 an there is no difference to
> >> Xenomai 2.6.2.1
> >> Looks like there is an other reason than Xenomai.
> > Ok, one thing to pay attention to on imx6 is the L2 cache write
> > allocate policy. You want to disable L2 write allocate on imx6 to
> > get low latencies. I do not know which patches exactly you are
> > using, so it is difficult to check, but the kernel normally displays
> > the value set in the L2 auxiliary configuration register, you can
> > check in the datasheet if it means that L2 write allocate is
> > disabled or not. And check if you get the same value with 3.0 and
> > 3.10.
> Thank you for this hint, I looked around in the kernel config, but cant 
> find
> an option sounds like L2 write allocate.
> The only option I found was CACHE_L2X0 and that is activated on both 
> kernels.
> Do you have an idea whats the name of this configuration or where in the
> kernel sources it should be located, so I can find out whats the name of 
> the
> config flag by searching the sourcecode.

I never talked about any kernel configuration option. I am talking
about checking the value passed to the L2 cache auxiliary control
register, which is a hardware register. Also, as I said, the value
passed to the L2 cache auxiliary register is printed by the kernel
during boot.

-- 
					    Gilles.
https://click-hack.org



* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
  2016-06-02  8:23         ` Gilles Chanteperdrix
@ 2016-06-06  7:03           ` Wolfgang Netbal
  2016-06-06 15:35             ` Gilles Chanteperdrix
  0 siblings, 1 reply; 36+ messages in thread
From: Wolfgang Netbal @ 2016-06-06  7:03 UTC (permalink / raw)
  To: xenomai



On 2016-06-02 at 10:23, Gilles Chanteperdrix wrote:
> On Thu, Jun 02, 2016 at 10:15:41AM +0200, Wolfgang Netbal wrote:
>>
>> Am 2016-06-01 um 16:12 schrieb Gilles Chanteperdrix:
>>> On Wed, Jun 01, 2016 at 03:52:06PM +0200, Wolfgang Netbal wrote:
>>>> Am 2016-05-31 um 16:16 schrieb Gilles Chanteperdrix:
>>>>> On Tue, May 31, 2016 at 04:09:07PM +0200, Wolfgang Netbal wrote:
>>>>>> Dear all,
>>>>>>
>>>>>> we have moved our application from "XENOMAI 2.6.2.1 + Linux 3.0.43" to
>>>>>> "XENOMAI 2.6.4. + Linux 3.10.53". Our target is an i.MX6DL. The system
>>>>>> is now up and running and works stable. Unfortunately we see a
>>>>>> difference in the performance. Our old combination (XENOMAI 2.6.2.1 +
>>>>>> Linux 3.0.43) was slightly faster.
>>>>>>
>>>>>> At the moment it looks like that XENOMAI 2.6.4 calls
>>>>>> xnpod_schedule_handler much more often then XENOMAI 2.6.2.1 in our old
>>>>>> system.  Every call of xnpod_schedule_handler interrupts our main
>>>>>> XENOMAI task with priority = 95.
>>>>>>
>>>>>> I have compared the configuration of both XENOMAI versions but did not
>>>>>> found any difference. I checked the source code (new commits) but did
>>>>>> also not find a solution.
>>>>> Have you tried Xenomai 2.6.4 with Linux 3.0.43 ? In order to see
>>>>> whether it comes from the kernel update or the Xenomai udpate?
>>>> I've tried Linux 3.0.43 with Xenomai 2.6.4 an there is no difference to
>>>> Xenomai 2.6.2.1
>>>> Looks like there is an other reason than Xenomai.
>>> Ok, one thing to pay attention to on imx6 is the L2 cache write
>>> allocate policy. You want to disable L2 write allocate on imx6 to
>>> get low latencies. I do not know which patches exactly you are
>>> using, so it is difficult to check, but the kernel normally displays
>>> the value set in the L2 auxiliary configuration register, you can
>>> check in the datasheet if it means that L2 write allocate is
>>> disabled or not. And check if you get the same value with 3.0 and
>>> 3.10.
>> Thank you for this hint, I looked around in the kernel config, but cant
>> find
>> an option sounds like L2 write allocate.
>> The only option I found was CACHE_L2X0 and that is activated on both
>> kernels.
>> Do you have an idea whats the name of this configuration or where in the
>> kernel sources it should be located, so I can find out whats the name of
>> the
>> config flag by searching the sourcecode.
> I never talked about any kernel configuration option. I am talking
> checking the value passed to the L2 cache auxiliary configuration
> register, this is a hardware register. Also, as I said, the value
> passed to the L2 cache auxiliary register is printed by the kernel
> during boot.
>
>
Sorry Gilles,
I found the message in the kernel log; you are right, they are different.
Kernel 3.0.43 shows:  l2x0: 16 ways, CACHE_ID 0x410000c8, AUX_CTRL
0x02850000, Cache size: 524288 B
Kernel 3.10.53 shows: l2x0: 16 ways, CACHE_ID 0x410000c8, AUX_CTRL
0x32c50000, Cache size: 524288 B
Kernel 3.10.53 additionally sets bits 22 (Shared attribute override
enable), 28 (Data prefetch) and 29 (Instruction prefetch).
I used the same settings on kernel 3.0.43 but the performance didn't
change, so it looks like these settings are not what slows down my
system.

What I saw while searching the kernel config was that there are a
few errata workarounds that are enabled as dependencies in 3.10.53;
to be sure none of the errata is the source of my performance reduction,
I enabled them on 3.0.43 as well.
But again, no difference to our default configuration.

To rule out that it is our application itself that is running slower,
I created a shell script incrementing a variable 10,000 times and
measured the runtime with time:

#!/bin/sh
var=0
while [  $var -lt $1 ]; do
     let var++
done

 > time /mnt/drive-C/CpuTime.sh 10000

On this test:
Kernel 3.0.43 with Xenomai 2.6.2.1 needs 480 ms
Kernel 3.10.53 with Xenomai 2.6.4 needs 820 ms

These differences are huge, and I'm not sure I can trust this test
because we also use a different busybox;
the difference using our application is between 2% and 3%
in the realtime task (Xenomai task with priority 95).
Do you have an idea why this is so much slower?
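
One way to take busybox and the shell out of the picture is to redo the
same measurement in plain C, timed with clock_gettime(); a minimal sketch
(iteration count and default value are arbitrary, link with -lrt on older
toolchains):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Same idea as the shell script above, but independent of the shell:
   counts up to the value given on the command line and prints the
   elapsed wall-clock time. */
int main(int argc, char *argv[])
{
        unsigned long limit = argc > 1 ? strtoul(argv[1], NULL, 10) : 10000000UL;
        volatile unsigned long var = 0;
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        while (var < limit)
                var++;
        clock_gettime(CLOCK_MONOTONIC, &t1);

        printf("%.3f ms\n", (t1.tv_sec - t0.tv_sec) * 1e3 +
                            (t1.tv_nsec - t0.tv_nsec) / 1e6);
        return 0;
}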

I also see differences when I use the xeno-test command to check the speed

Kernel 3.0.43 Xenomai 2.6.2.1

Started child 1209: /bin/sh /usr/xenomai/bin/xeno-test-run-wrapper 
/usr/xenomai/bin/xeno-test
+ echo 0
+ /usr/xenomai/bin/arith
mul: 0x79364d93, shft: 26
integ: 30, frac: 0x4d9364d9364d9364

signed positive operation: 0x03ffffffffffffff * 1000000000 / 33000000
inline calibration: 0x0000000000000000: 43.260 ns, rejected 0/10000
inlined llimd: 0x79364d9364d9362f: 1476.384 ns, rejected 4/10000
inlined llmulshft: 0x79364d92ffffffe1: 35.131 ns, rejected 0/10000
inlined nodiv_llimd: 0x79364d9364d9362f: 47.745 ns, rejected 3/10000
out of line calibration: 0x0000000000000000: 49.235 ns, rejected 2/10000
out of line llimd: 0x79364d9364d9362f: 1483.759 ns, rejected 2/10000
out of line llmulshft: 0x79364d92ffffffe1: 31.719 ns, rejected 2/10000
out of line nodiv_llimd: 0x79364d9364d9362f: 49.376 ns, rejected 0/10000

signed negative operation: 0xfc00000000000001 * 1000000000 / 33000000
inline calibration: 0x0000000000000000: 41.872 ns, rejected 0/10000
inlined llimd: 0x86c9b26c9b26c9d1: 1485.415 ns, rejected 2/10000
inlined llmulshft: 0x86c9b26d0000001e: 39.234 ns, rejected 0/10000
inlined nodiv_llimd: 0x86c9b26c9b26c9d1: 54.266 ns, rejected 1/10000
out of line calibration: 0x0000000000000000: 49.237 ns, rejected 0/10000
out of line llimd: 0x86c9b26c9b26c9d1: 1489.059 ns, rejected 1/10000
out of line llmulshft: 0xd45d172d0000001e: 36.847 ns, rejected 0/10000
out of line nodiv_llimd: 0x86c9b26c9b26c9d1: 56.973 ns, rejected 2/10000

unsigned operation: 0x03ffffffffffffff * 1000000000 / 33000000
inline calibration: 0x0000000000000000: 42.432 ns, rejected 1/10000
inlined nodiv_ullimd: 0x79364d9364d9362f: 51.083 ns, rejected 0/10000
out of line calibration: 0x0000000000000000: 48.086 ns, rejected 0/10000
out of line nodiv_ullimd: 0x79364d9364d9362f: 44.964 ns, rejected 0/10000
+ /usr/xenomai/bin/clocktest -C 42 -T 30
Xenomai: POSIX skin or CONFIG_XENO_OPT_PERVASIVE disabled.
(modprobe xeno_posix?)
+ /usr/xenomai/bin/clocktest -T 30
Xenomai: POSIX skin or CONFIG_XENO_OPT_PERVASIVE disabled.
(modprobe xeno_posix?)

Kernel 3.10.53 Xenomai 2.6.4

Started child 729: /bin/sh /usr/xenomai/bin/xeno-test-run-wrapper 
/usr/xenomai/bin/xeno-test
++ echo 0
++ /usr/xenomai/bin/arith
mul: 0x79364d93, shft: 26
integ: 30, frac: 0x4d9364d9364d9364

signed positive operation: 0x03ffffffffffffff * 1000000000 / 33000000
inline calibration: 0x0000000000000000: 42.979 ns, rejected 1/10000
inlined llimd: 0x79364d9364d9362f: 1491.632 ns, rejected 2/10000
inlined llmulshft: 0x79364d92ffffffe1: 37.873 ns, rejected 1/10000
inlined nodiv_llimd: 0x79364d9364d9362f: 50.520 ns, rejected 0/10000
out of line calibration: 0x0000000000000000: 50.611 ns, rejected 1/10000
out of line llimd: 0x79364d9364d9362f: 1476.381 ns, rejected 4/10000
out of line llmulshft: 0x79364d92ffffffe1: 25.364 ns, rejected 1/10000
out of line nodiv_llimd: 0x79364d9364d9362f: 45.493 ns, rejected 1/10000

signed negative operation: 0xfc00000000000001 * 1000000000 / 33000000
inline calibration: 0x0000000000000000: 42.962 ns, rejected 1/10000
inlined llimd: 0x86c9b26c9b26c9d1: 1488.811 ns, rejected 4/10000
inlined llmulshft: 0x86c9b26d0000001e: 42.972 ns, rejected 2/10000
inlined nodiv_llimd: 0x86c9b26c9b26c9d1: 55.611 ns, rejected 1/10000
out of line calibration: 0x0000000000000000: 50.572 ns, rejected 1/10000
out of line llimd: 0x86c9b26c9b26c9d1: 1481.904 ns, rejected 3/10000
out of line llmulshft: 0x86c9b26d0000001e: 27.818 ns, rejected 0/10000
out of line nodiv_llimd: 0x86c9b26c9b26c9d1: 53.008 ns, rejected 1/10000

unsigned operation: 0x03ffffffffffffff * 1000000000 / 33000000
inline calibration: 0x0000000000000000: 42.968 ns, rejected 0/10000
inlined nodiv_ullimd: 0x79364d9364d9362f: 53.060 ns, rejected 1/10000
out of line calibration: 0x0000000000000000: 50.591 ns, rejected 1/10000
out of line nodiv_ullimd: 0x79364d9364d9362f: 46.102 ns, rejected 1/10000
++ /usr/xenomai/bin/clocktest -C 42 -T 30
Xenomai: POSIX skin or CONFIG_XENO_OPT_PERVASIVE disabled.
(modprobe xeno_posix?)
++ /usr/xenomai/bin/clocktest -T 30
Xenomai: POSIX skin or CONFIG_XENO_OPT_PERVASIVE disabled.
(modprobe xeno_posix?)

Some of the operations are faster on the newer Xenomai but a few are much
slower, for example inlined llimd.

With every test I run it looks like the issue is not located in the kernel
or Xenomai.
Do you know of any speed issues in system libraries like libc or something
like that?

Kind regards
Wolfgang



* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
  2016-06-06  7:03           ` Wolfgang Netbal
@ 2016-06-06 15:35             ` Gilles Chanteperdrix
  2016-06-07 14:13               ` Wolfgang Netbal
  0 siblings, 1 reply; 36+ messages in thread
From: Gilles Chanteperdrix @ 2016-06-06 15:35 UTC (permalink / raw)
  To: Wolfgang Netbal; +Cc: xenomai

On Mon, Jun 06, 2016 at 09:03:40AM +0200, Wolfgang Netbal wrote:
> 
> 
> Am 2016-06-02 um 10:23 schrieb Gilles Chanteperdrix:
> > On Thu, Jun 02, 2016 at 10:15:41AM +0200, Wolfgang Netbal wrote:
> >>
> >> Am 2016-06-01 um 16:12 schrieb Gilles Chanteperdrix:
> >>> On Wed, Jun 01, 2016 at 03:52:06PM +0200, Wolfgang Netbal wrote:
> >>>> Am 2016-05-31 um 16:16 schrieb Gilles Chanteperdrix:
> >>>>> On Tue, May 31, 2016 at 04:09:07PM +0200, Wolfgang Netbal wrote:
> >>>>>> Dear all,
> >>>>>>
> >>>>>> we have moved our application from "XENOMAI 2.6.2.1 + Linux 3.0.43" to
> >>>>>> "XENOMAI 2.6.4. + Linux 3.10.53". Our target is an i.MX6DL. The system
> >>>>>> is now up and running and works stable. Unfortunately we see a
> >>>>>> difference in the performance. Our old combination (XENOMAI 2.6.2.1 +
> >>>>>> Linux 3.0.43) was slightly faster.
> >>>>>>
> >>>>>> At the moment it looks like that XENOMAI 2.6.4 calls
> >>>>>> xnpod_schedule_handler much more often then XENOMAI 2.6.2.1 in our old
> >>>>>> system.  Every call of xnpod_schedule_handler interrupts our main
> >>>>>> XENOMAI task with priority = 95.
> >>>>>>
> >>>>>> I have compared the configuration of both XENOMAI versions but did not
> >>>>>> found any difference. I checked the source code (new commits) but did
> >>>>>> also not find a solution.
> >>>>> Have you tried Xenomai 2.6.4 with Linux 3.0.43 ? In order to see
> >>>>> whether it comes from the kernel update or the Xenomai udpate?
> >>>> I've tried Linux 3.0.43 with Xenomai 2.6.4 an there is no difference to
> >>>> Xenomai 2.6.2.1
> >>>> Looks like there is an other reason than Xenomai.
> >>> Ok, one thing to pay attention to on imx6 is the L2 cache write
> >>> allocate policy. You want to disable L2 write allocate on imx6 to
> >>> get low latencies. I do not know which patches exactly you are
> >>> using, so it is difficult to check, but the kernel normally displays
> >>> the value set in the L2 auxiliary configuration register, you can
> >>> check in the datasheet if it means that L2 write allocate is
> >>> disabled or not. And check if you get the same value with 3.0 and
> >>> 3.10.
> >> Thank you for this hint, I looked around in the kernel config, but cant
> >> find
> >> an option sounds like L2 write allocate.
> >> The only option I found was CACHE_L2X0 and that is activated on both
> >> kernels.
> >> Do you have an idea whats the name of this configuration or where in the
> >> kernel sources it should be located, so I can find out whats the name of
> >> the
> >> config flag by searching the sourcecode.
> > I never talked about any kernel configuration option. I am talking
> > checking the value passed to the L2 cache auxiliary configuration
> > register, this is a hardware register. Also, as I said, the value
> > passed to the L2 cache auxiliary register is printed by the kernel
> > during boot.
> >
> >
> Sorry Gilles,
> I found the message in the kernel log, you are right they are different
> Kernel 3.0.43 shows   l2x0: 16 ways, CACHE_ID 0x410000c8, AUX_CTRL 
> 0x02850000, Cache size: 524288 B
> Kernel 3.10.53 shows l2x0: 16 ways, CACHE_ID 0x410000c8, AUX_CTRL 
> 0x32c50000, Cache size: 524288 B
> Kernel 3.10.53 sets addidtional the bits 22 (Shared attribute override 
> enable), 28 (Data prefetch) and 29 (Instruction prefetch)
> I used the same settings on Kernel 3.0.43 but the perfromance didn't 
> change, looks like this configurations didn't slow down my
> system.
> 
> What I have seen while searching the kernel config was that there are a 
> few errate that are activated as dependency in 3.10.53,
> to be sure none of the errata is the source of my performance reduction 
> I activated them on 3.0.43 as well.
> But again no difference to our default configuration.
> 
> To avoid our application is running slower I created a shell-script 
> incrementing a variable
> 10.000 times and measuring the runtime with time
> 
> #!/bin/sh
> var=0
> while [  $var -lt $1 ]; do
>      let var++
> done
> 
>  > time /mnt/drive-C/CpuTime.sh 10000
> 
> On this test
> Kernel 3.0.43 Xenomai 2.6.2.1  needs 480 ms
> Kernel 3.10.53  Xenomai 2.6.4  needs 820ms

If you run the same test several times on the same kernel, do you
reliably always get the same duration?

> 
> This differences are huge, an I'm not sure if I can trust this test
> because we also use a different busybox,
> and the difference using our application are between 2% and 3%
> in the realtime task (Xenomaitask with priority 95)
> Do you have an idea why this is that much slower ?

I would not call a 2% or 3% difference "much slower", only
measurement noise.

> 
> I also see differences when I use the xeno-test command to check the speed
> Some of the operations are faster on newer Xenomai but a few are much 
> slower,
> for example inlined llimd.

The differences in the "arith" test are measurement noise. Chances
are, if you run the arith test twice with the same kernel, you are
not going to get the same values.

> 
> With every test I run it looks like the issue is not located in Kernel 
> or Xenomai.
> Do you know any speed issues on system libraries like libc or something 
> like that ?

Stupid question: do the two kernels run the processor at the same
speed? You could have a difference if one kernel runs it at 1GHz and
the other at 800MHz for instance.
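
A quick way to verify this, assuming the cpufreq framework is enabled on
both kernels (otherwise the sysfs file below does not exist), is to read
the current frequency; a minimal sketch:

#include <stdio.h>

/* Prints the current CPU0 frequency in kHz from the cpufreq sysfs
   interface (assumption: cpufreq is enabled on both kernels). */
int main(void)
{
        FILE *f = fopen("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq", "r");
        unsigned long khz;

        if (!f) {
                perror("scaling_cur_freq");
                return 1;
        }
        if (fscanf(f, "%lu", &khz) == 1)
                printf("cpu0: %lu kHz\n", khz);
        fclose(f);
        return 0;
}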

-- 
					    Gilles.
https://click-hack.org



* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
  2016-06-06 15:35             ` Gilles Chanteperdrix
@ 2016-06-07 14:13               ` Wolfgang Netbal
  2016-06-07 17:00                 ` Gilles Chanteperdrix
  2016-06-07 17:22                 ` Philippe Gerum
  0 siblings, 2 replies; 36+ messages in thread
From: Wolfgang Netbal @ 2016-06-07 14:13 UTC (permalink / raw)
  Cc: xenomai



On 2016-06-06 at 17:35, Gilles Chanteperdrix wrote:
> On Mon, Jun 06, 2016 at 09:03:40AM +0200, Wolfgang Netbal wrote:
>>
>> Am 2016-06-02 um 10:23 schrieb Gilles Chanteperdrix:
>>> On Thu, Jun 02, 2016 at 10:15:41AM +0200, Wolfgang Netbal wrote:
>>>> Am 2016-06-01 um 16:12 schrieb Gilles Chanteperdrix:
>>>>> On Wed, Jun 01, 2016 at 03:52:06PM +0200, Wolfgang Netbal wrote:
>>>>>> Am 2016-05-31 um 16:16 schrieb Gilles Chanteperdrix:
>>>>>>> On Tue, May 31, 2016 at 04:09:07PM +0200, Wolfgang Netbal wrote:
>>>>>>>> Dear all,
>>>>>>>>
>>>>>>>> we have moved our application from "XENOMAI 2.6.2.1 + Linux 3.0.43" to
>>>>>>>> "XENOMAI 2.6.4. + Linux 3.10.53". Our target is an i.MX6DL. The system
>>>>>>>> is now up and running and works stable. Unfortunately we see a
>>>>>>>> difference in the performance. Our old combination (XENOMAI 2.6.2.1 +
>>>>>>>> Linux 3.0.43) was slightly faster.
>>>>>>>>
>>>>>>>> At the moment it looks like that XENOMAI 2.6.4 calls
>>>>>>>> xnpod_schedule_handler much more often then XENOMAI 2.6.2.1 in our old
>>>>>>>> system.  Every call of xnpod_schedule_handler interrupts our main
>>>>>>>> XENOMAI task with priority = 95.
As I wrote above, I get interrupt 1037 handled by rthal_apc_handler()
and interrupt 1038 handled by xnpod_schedule_handler() while my realtime
task is running on kernel 3.10.53 with Xenomai 2.6.4.
On kernel 3.0.43 with Xenomai 2.6.4 there are no interrupts, except the
ones that are sent by my board using GPIOs; these virtual interrupts
are assigned to both Xenomai and Linux, but I didn't see a handler
installed.
I'm pretty sure that these interrupts are slowing down my system, but
where do they come from?
Why don't I see them on kernel 3.0.43 with Xenomai 2.6.4?
How long do they take to process?

Is there any dependency in Xenomai between the kernel version and these
virtual interrupts?

>>>>>>>> I have compared the configuration of both XENOMAI versions but did not
>>>>>>>> found any difference. I checked the source code (new commits) but did
>>>>>>>> also not find a solution.
>>>>>>> Have you tried Xenomai 2.6.4 with Linux 3.0.43 ? In order to see
>>>>>>> whether it comes from the kernel update or the Xenomai udpate?
>>>>>> I've tried Linux 3.0.43 with Xenomai 2.6.4 an there is no difference to
>>>>>> Xenomai 2.6.2.1
>>>>>> Looks like there is an other reason than Xenomai.
>>>>> Ok, one thing to pay attention to on imx6 is the L2 cache write
>>>>> allocate policy. You want to disable L2 write allocate on imx6 to
>>>>> get low latencies. I do not know which patches exactly you are
>>>>> using, so it is difficult to check, but the kernel normally displays
>>>>> the value set in the L2 auxiliary configuration register, you can
>>>>> check in the datasheet if it means that L2 write allocate is
>>>>> disabled or not. And check if you get the same value with 3.0 and
>>>>> 3.10.
>>>> Thank you for this hint, I looked around in the kernel config, but cant
>>>> find
>>>> an option sounds like L2 write allocate.
>>>> The only option I found was CACHE_L2X0 and that is activated on both
>>>> kernels.
>>>> Do you have an idea whats the name of this configuration or where in the
>>>> kernel sources it should be located, so I can find out whats the name of
>>>> the
>>>> config flag by searching the sourcecode.
>>> I never talked about any kernel configuration option. I am talking
>>> checking the value passed to the L2 cache auxiliary configuration
>>> register, this is a hardware register. Also, as I said, the value
>>> passed to the L2 cache auxiliary register is printed by the kernel
>>> during boot.
>>>
>>>
>> Sorry Gilles,
>> I found the message in the kernel log, you are right they are different
>> Kernel 3.0.43 shows   l2x0: 16 ways, CACHE_ID 0x410000c8, AUX_CTRL
>> 0x02850000, Cache size: 524288 B
>> Kernel 3.10.53 shows l2x0: 16 ways, CACHE_ID 0x410000c8, AUX_CTRL
>> 0x32c50000, Cache size: 524288 B
>> Kernel 3.10.53 sets addidtional the bits 22 (Shared attribute override
>> enable), 28 (Data prefetch) and 29 (Instruction prefetch)
>> I used the same settings on Kernel 3.0.43 but the perfromance didn't
>> change, looks like this configurations didn't slow down my
>> system.
>>
>> What I have seen while searching the kernel config was that there are a
>> few errate that are activated as dependency in 3.10.53,
>> to be sure none of the errata is the source of my performance reduction
>> I activated them on 3.0.43 as well.
>> But again no difference to our default configuration.
>>
>> To avoid our application is running slower I created a shell-script
>> incrementing a variable
>> 10.000 times and measuring the runtime with time
>>
>> #!/bin/sh
>> var=0
>> while [  $var -lt $1 ]; do
>>       let var++
>> done
>>
>>   > time /mnt/drive-C/CpuTime.sh 10000
>>
>> On this test
>> Kernel 3.0.43 Xenomai 2.6.2.1  needs 480 ms
>> Kernel 3.10.53  Xenomai 2.6.4  needs 820ms
> If you run the same test several times on the same kernel, do you
> reliably always get the same duration?
Yes, I ran this test 10 times and the values are always nearly the same,
+/- 10 ms.
>> This differences are huge, an I'm not sure if I can trust this test
>> because we also use a different busybox,
>> and the difference using our application are between 2% and 3%
>> in the realtime task (Xenomaitask with priority 95)
>> Do you have an idea why this is that much slower ?
> I would not call a 2% or 3% difference "much slower", only
> measurement noise.
I see the difference of 2% or 3% when the realtime task takes only 20%
of the CPU time;
if I raise the used CPU time to 90%, I see a difference of 12%.
The percentages I wrote are average values measured over 10,000 cycles.

In every measurement or test I run, kernel 3.10.53 with Xenomai 2.6.4
is slower than
kernel 3.0.43 with Xenomai 2.6.2.1.
>> I also see differences when I use the xeno-test command to check the speed
>> Some of the operations are faster on newer Xenomai but a few are much
>> slower,
>> for example inlined llimd.
> The differences in the "arith" test are measurement noise. Chances
> are, if you run twice the arith test with the same kernel you are
> not going to find the same values.
>> With every test I run it looks like the issue is not located in Kernel
>> or Xenomai.
>> Do you know any speed issues on system libraries like libc or something
>> like that ?
> Stupid question: do the two kernels run the processor at the same
> speed? You could have a difference if one kernel runs it at 1GHz and
> the other at 800MHz for instance.
>
Yes, the two kernels are running on the same processor.
One of the first things I checked was whether they set the same CPU
frequency
and the same RAM settings.
I also swapped the microSD cards to rule out that I have a processor that
is faster.
There are no differences.

Thank you
Wolfgang



* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
  2016-06-07 14:13               ` Wolfgang Netbal
@ 2016-06-07 17:00                 ` Gilles Chanteperdrix
  2016-06-27 15:55                   ` Wolfgang Netbal
  2016-06-07 17:22                 ` Philippe Gerum
  1 sibling, 1 reply; 36+ messages in thread
From: Gilles Chanteperdrix @ 2016-06-07 17:00 UTC (permalink / raw)
  To: Wolfgang Netbal; +Cc: xenomai

On Tue, Jun 07, 2016 at 04:13:07PM +0200, Wolfgang Netbal wrote:
> 
> 
> Am 2016-06-06 um 17:35 schrieb Gilles Chanteperdrix:
> > On Mon, Jun 06, 2016 at 09:03:40AM +0200, Wolfgang Netbal wrote:
> >>
> >> Am 2016-06-02 um 10:23 schrieb Gilles Chanteperdrix:
> >>> On Thu, Jun 02, 2016 at 10:15:41AM +0200, Wolfgang Netbal wrote:
> >>>> Am 2016-06-01 um 16:12 schrieb Gilles Chanteperdrix:
> >>>>> On Wed, Jun 01, 2016 at 03:52:06PM +0200, Wolfgang Netbal wrote:
> >>>>>> Am 2016-05-31 um 16:16 schrieb Gilles Chanteperdrix:
> >>>>>>> On Tue, May 31, 2016 at 04:09:07PM +0200, Wolfgang Netbal wrote:
> >>>>>>>> Dear all,
> >>>>>>>>
> >>>>>>>> we have moved our application from "XENOMAI 2.6.2.1 + Linux 3.0.43" to
> >>>>>>>> "XENOMAI 2.6.4. + Linux 3.10.53". Our target is an i.MX6DL. The system
> >>>>>>>> is now up and running and works stable. Unfortunately we see a
> >>>>>>>> difference in the performance. Our old combination (XENOMAI 2.6.2.1 +
> >>>>>>>> Linux 3.0.43) was slightly faster.
> >>>>>>>>
> >>>>>>>> At the moment it looks like that XENOMAI 2.6.4 calls
> >>>>>>>> xnpod_schedule_handler much more often then XENOMAI 2.6.2.1 in our old
> >>>>>>>> system.  Every call of xnpod_schedule_handler interrupts our main
> >>>>>>>> XENOMAI task with priority = 95.
> As I wrote above, I get interrupts 1037 handled by rthal_apc_handler()
> and 1038 handled by xnpod_schedule_handler() while my realtime task
> is running on kernel 3.10.53 with Xenomai 2.6.4.
> On kernel 3.0.43 with Xenomai 2.6.4 there are no interrupts, except the
> once that are send by my board using GPIOs, but this virtual interrupts
> are assigned to Xenomai and Linux as well but I didn't see a handler 
> installed.
> I'm pretty sure that these interrupts are slowing down my system, but
> where do they come from ?
> why didn't I see them on Kernel 3.0.43 with Xenomai 2.6.4 ?
> how long do they need to process ?

What do you mean, you do not see them? If you are talking about the
rescheduling IPI, it used not to be bound to a virq (so it would
have a different irq number on Cortex-A9, something between 0 and 31
that would not show in the usual /proc files); I wonder if 3.0 is
before or after that change. Do you not see them in /proc, or do you see
them and their count does not increase?

As for where they come from, this is not a mystery: the reschedule
IPI is triggered when code on one CPU changes the scheduler state
(wakes up a thread, for instance) on another CPU. If you want to
avoid it, do not do that. That means: do not share mutexes between
threads running on different CPUs, make sure timers run
on the same CPU as the thread they signal, etc.
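
A minimal sketch of that advice with the Xenomai 2.x native skin (all
names and priorities are placeholders): two tasks that synchronize on a
mutex are both pinned to the same CPU, so releasing the mutex never has
to wake a thread on a remote CPU:

#include <native/task.h>
#include <native/mutex.h>

static RT_TASK producer, consumer;
static RT_MUTEX lock;

static void producer_body(void *arg) { /* ... rt_mutex_acquire/release ... */ }
static void consumer_body(void *arg) { /* ... rt_mutex_acquire/release ... */ }

static int setup(void)
{
        int err = rt_mutex_create(&lock, "shared-lock");
        if (err)
                return err;
        /* Both tasks pinned to CPU 0: waking the peer never targets a
           remote CPU, so no rescheduling IPI is needed for it. */
        err = rt_task_create(&producer, "producer", 0, 95, T_CPU(0));
        if (!err)
                err = rt_task_start(&producer, producer_body, NULL);
        if (!err)
                err = rt_task_create(&consumer, "consumer", 0, 90, T_CPU(0));
        if (!err)
                err = rt_task_start(&consumer, consumer_body, NULL);
        return err;
}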

The APC virq is used to multiplex several services, which you can
find by grepping the sources for rthal_apc_alloc:
./ksrc/skins/posix/apc.c:       pse51_lostage_apc = rthal_apc_alloc("pse51_lostage_handler",
./ksrc/skins/rtdm/device.c:     rtdm_apc = rthal_apc_alloc("deferred RTDM close", rtdm_apc_handler,
./ksrc/nucleus/registry.c:          rthal_apc_alloc("registry_export", &registry_proc_schedule, NULL);
./ksrc/nucleus/pipe.c:      rthal_apc_alloc("pipe_wakeup", &xnpipe_wakeup_proc, NULL);
./ksrc/nucleus/shadow.c:            rthal_apc_alloc("lostage_handler", &lostage_handler, NULL);
./ksrc/nucleus/select.c:        xnselect_apc = rthal_apc_alloc("xnselectors_destroy",

It would be interesting to know which of these services is triggered
a lot. One possibility I see would be root thread priority
inheritance, in which case it would be caused by mode switches. This
raises the question: does your application have threads migrating between
primary and secondary mode, do you see the count of mode switches increase
with the kernel change, and do you have root thread priority
inheritance enabled?

-- 
					    Gilles.
https://click-hack.org



* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
  2016-06-07 14:13               ` Wolfgang Netbal
  2016-06-07 17:00                 ` Gilles Chanteperdrix
@ 2016-06-07 17:22                 ` Philippe Gerum
  1 sibling, 0 replies; 36+ messages in thread
From: Philippe Gerum @ 2016-06-07 17:22 UTC (permalink / raw)
  To: wolfgang.netbal; +Cc: xenomai

On 06/07/2016 04:13 PM, Wolfgang Netbal wrote:

> As I wrote above, I get interrupts 1037 handled by rthal_apc_handler()
> and 1038 handled by xnpod_schedule_handler() while my realtime task
> is running on kernel 3.10.53 with Xenomai 2.6.4.
> On kernel 3.0.43 with Xenomai 2.6.4 there are no interrupts, except the
> once that are send by my board using GPIOs, but this virtual interrupts
> are assigned to Xenomai and Linux as well but I didn't see a handler
> installed.
> I'm pretty sure that these interrupts are slowing down my system, but
> where do they come from ?
> why didn't I see them on Kernel 3.0.43 with Xenomai 2.6.4 ?
> how long do they need to process ?
> 
> Is there any dependecy in Xenomai between the kernel version and this
> virtual interrupts ?
> 

Maybe you should consider reading all the replies you get:

On 05/31/2016 05:08 PM, Philippe Gerum wrote:
> On 05/31/2016 04:09 PM, Wolfgang Netbal wrote:
>> Dear all,
>>
>> we have moved our application from "XENOMAI 2.6.2.1 + Linux 3.0.43" to
>> "XENOMAI 2.6.4. + Linux 3.10.53". Our target is an i.MX6DL. The system
>> is now up and running and works stable. Unfortunately we see a
>> difference in the performance. Our old combination (XENOMAI 2.6.2.1 +
>> Linux 3.0.43) was slightly faster.
>>
>
> Could you quantify "slightly faster"? This is a dual kernel system, so
> changes on either the regular kernel side and/or the co-kernel side may
> have a measurable impact.
>
>> At the moment it looks like that XENOMAI 2.6.4 calls
>> xnpod_schedule_handler much more often then XENOMAI 2.6.2.1 in our old
>> system.  Every call of xnpod_schedule_handler interrupts our main
>> XENOMAI task with priority = 95.
>
> That handler is attached to the inter-processor interrupt used for
> rescheduling tasks running on a remote CPU. You may want to check the
> CPU affinity settings of your tasks, and the way they
interact/synchronize.
>

You may also want to check the mode switch count for your threads in
/proc/xenomai/stat (MSW field). I suspect your application may be
switching mode like crazy between Linux and Xenomai, causing interrupt
activity to wake up either side in turn.
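
For example, a small watcher that polls /proc/xenomai/stat once per second
and prints the MSW counter per thread could look like the sketch below; it
locates the MSW column by its header name rather than assuming a fixed
layout (assumption: the header line names the columns, as it does on
Xenomai 2.6.x):

#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Prints NAME and MSW (mode switch count) for every thread listed in
   /proc/xenomai/stat, once per second. */
int main(void)
{
        char line[256], *tok, *save, *col[16];

        for (;;) {
                FILE *f = fopen("/proc/xenomai/stat", "r");
                int msw = -1, n;

                if (!f) {
                        perror("/proc/xenomai/stat");
                        return 1;
                }
                while (fgets(line, sizeof(line), f)) {
                        n = 0;
                        for (tok = strtok_r(line, " \t\n", &save);
                             tok && n < 16; tok = strtok_r(NULL, " \t\n", &save))
                                col[n++] = tok;
                        if (n == 0)
                                continue;
                        if (msw < 0) {  /* header line: find the MSW column */
                                for (msw = 0; msw < n && strcmp(col[msw], "MSW"); msw++)
                                        ;
                        } else if (msw < n) {
                                printf("%-24s MSW=%s\n", col[n - 1], col[msw]);
                        }
                }
                printf("----\n");
                fclose(f);
                sleep(1);
        }
}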

-- 
Philippe.



* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
  2016-06-07 17:00                 ` Gilles Chanteperdrix
@ 2016-06-27 15:55                   ` Wolfgang Netbal
  2016-06-27 16:00                     ` Gilles Chanteperdrix
  2016-06-27 16:46                     ` Gilles Chanteperdrix
  0 siblings, 2 replies; 36+ messages in thread
From: Wolfgang Netbal @ 2016-06-27 15:55 UTC (permalink / raw)
  To: xenomai



On 2016-06-07 at 19:00, Gilles Chanteperdrix wrote:
> On Tue, Jun 07, 2016 at 04:13:07PM +0200, Wolfgang Netbal wrote:
>>
>> Am 2016-06-06 um 17:35 schrieb Gilles Chanteperdrix:
>>> On Mon, Jun 06, 2016 at 09:03:40AM +0200, Wolfgang Netbal wrote:
>>>> Am 2016-06-02 um 10:23 schrieb Gilles Chanteperdrix:
>>>>> On Thu, Jun 02, 2016 at 10:15:41AM +0200, Wolfgang Netbal wrote:
>>>>>> Am 2016-06-01 um 16:12 schrieb Gilles Chanteperdrix:
>>>>>>> On Wed, Jun 01, 2016 at 03:52:06PM +0200, Wolfgang Netbal wrote:
>>>>>>>> Am 2016-05-31 um 16:16 schrieb Gilles Chanteperdrix:
>>>>>>>>> On Tue, May 31, 2016 at 04:09:07PM +0200, Wolfgang Netbal wrote:
>>>>>>>>>> Dear all,
>>>>>>>>>>
>>>>>>>>>> we have moved our application from "XENOMAI 2.6.2.1 + Linux 3.0.43" to
>>>>>>>>>> "XENOMAI 2.6.4. + Linux 3.10.53". Our target is an i.MX6DL. The system
>>>>>>>>>> is now up and running and works stable. Unfortunately we see a
>>>>>>>>>> difference in the performance. Our old combination (XENOMAI 2.6.2.1 +
>>>>>>>>>> Linux 3.0.43) was slightly faster.
>>>>>>>>>>
>>>>>>>>>> At the moment it looks like that XENOMAI 2.6.4 calls
>>>>>>>>>> xnpod_schedule_handler much more often then XENOMAI 2.6.2.1 in our old
>>>>>>>>>> system.  Every call of xnpod_schedule_handler interrupts our main
>>>>>>>>>> XENOMAI task with priority = 95.
>> As I wrote above, I get interrupts 1037 handled by rthal_apc_handler()
>> and 1038 handled by xnpod_schedule_handler() while my realtime task
>> is running on kernel 3.10.53 with Xenomai 2.6.4.
>> On kernel 3.0.43 with Xenomai 2.6.4 there are no interrupts, except the
>> once that are send by my board using GPIOs, but this virtual interrupts
>> are assigned to Xenomai and Linux as well but I didn't see a handler
>> installed.
>> I'm pretty sure that these interrupts are slowing down my system, but
>> where do they come from ?
>> why didn't I see them on Kernel 3.0.43 with Xenomai 2.6.4 ?
>> how long do they need to process ?
> How do you mean you do not see them? If you are talking about the
> rescheduling API, it used no to be bound to a virq (so, it would
> have a different irq number on cortex A9, something between 0 and 31
> that would not show in the usual /proc files), I wonder if 3.0 is
> before or after that. You do not see them in /proc, or you see them
> and their count does not increase?
Sorry for the long delay; we ran a lot of tests to find out what could
be the reason for
the performance difference.

If I run cat /proc/ipipe/Xenomai I don't see the IRQ handler assigned to
the virtual
IRQ on kernel 3.0.43, but it looks like that is an issue of the kernel.
> As for where they come from, this is not a mystery, the reschedule
> IPI is triggered when code on one cpu changes the scheduler state
> (wakes up a thread for instance) on another cpu. If you want to
> avoid it, do not do that. That means, do not share mutex between
> threads running on different cpus, pay attention for timers to be
> running on the same cpu as the thread they signal, etc...
>
> The APC virq is used to multiplex several services, which you can
> find by grepping the sources for rthal_apc_alloc:
> ./ksrc/skins/posix/apc.c:       pse51_lostage_apc = rthal_apc_alloc("pse51_lostage_handler",
> ./ksrc/skins/rtdm/device.c:     rtdm_apc = rthal_apc_alloc("deferred RTDM close", rtdm_apc_handler,
> ./ksrc/nucleus/registry.c:          rthal_apc_alloc("registry_export", &registry_proc_schedule, NULL);
> ./ksrc/nucleus/pipe.c:      rthal_apc_alloc("pipe_wakeup", &xnpipe_wakeup_proc, NULL);
> ./ksrc/nucleus/shadow.c:            rthal_apc_alloc("lostage_handler", &lostage_handler, NULL);
> ./ksrc/nucleus/select.c:        xnselect_apc = rthal_apc_alloc("xnselectors_destroy",
>
> It would be interesting to know which of these services is triggered
> a lot. One possibility I see would be root thread priority
> inheritance, so it would be caused by mode switches. This brings the
> question: do your application have threads migrating between primary
> and secondary mode, do you see the count of mode switches increase
> with the kernel changes, do you have root thread priority
> inheritance enabled?
>
Here is a short summary of our tests and results, and at the end a few
questions :-)

We are using a Freescale i.MX6DL on our hardware and upgraded our operating system from
Freescale kernel 3.0.43 with Xenomai 2.6.2.1 and U-Boot 2013.04 (compiler: GCC 4.7.2) to
Freescale kernel 3.10.53 with Xenomai 2.6.4 and U-Boot 2016.01 (compiler: GCC 4.8.2).
On both kernels CONFIG_SMP is set.

What we see is that running a customer project in a Xenomai task with priority 95
takes 40% of the CPU time on kernel 3.0.43
and 47% of the CPU time on kernel 3.10.53,

so the new system is slower by 7 percentage points; extrapolated to 100% CPU load, that is a difference of about 15%.
To find out the reason for this difference we ran the following tests.
We tried to make the new system faster by changing individual components of the system:

- Changing U-Boot on the new system                     -> still 7% slower
- Copying kernel 3.0.43 to the new system               -> still 7% slower
- Building kernel 3.0.43 with Xenomai 2.6.4
  and copying it to the new system                      -> still 7% slower
- Compiling the new system with the old GCC version     -> still 7% slower
- We also checked the settings for RAM and CPU clock    -> these are equal

It looks like it is not one of the big components,
so we started to test some specific functions like rt_timer_tsc().
In the following example we stay in the while loop for 800 µs and
start this loop again after a 200 µs delay.
The application task running this code has priority 95.

Here is a simplified code snippet:
start = rt_timer_tsc();
do
{
	current = rt_timer_tsc();
	i++;	
} while((current - start) < 800)

i ends up at 1464 on the old system
i ends up at 1392 on the new system
which is a difference of about 5%.

Is it possible that the prefetching of code changed between the two kernel versions?
Since our customer application is bigger than our test code, the difference there can be greater.

Any hints as to what could be the reason for this slowdown?

  Kind regards
  Wolfgang





* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
  2016-06-27 15:55                   ` Wolfgang Netbal
@ 2016-06-27 16:00                     ` Gilles Chanteperdrix
  2016-06-28  8:08                       ` Wolfgang Netbal
  2016-06-27 16:46                     ` Gilles Chanteperdrix
  1 sibling, 1 reply; 36+ messages in thread
From: Gilles Chanteperdrix @ 2016-06-27 16:00 UTC (permalink / raw)
  To: Wolfgang Netbal; +Cc: xenomai

On Mon, Jun 27, 2016 at 05:55:12PM +0200, Wolfgang Netbal wrote:
> -Creating Kernel 3.0.43 with
>      Xenomai 2.6.4 and copy it to new system    -> still 7% slower

This contradicts what you said here:
https://xenomai.org/pipermail/xenomai/2016-June/036370.html

-- 
					    Gilles.
https://click-hack.org



* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
  2016-06-27 15:55                   ` Wolfgang Netbal
  2016-06-27 16:00                     ` Gilles Chanteperdrix
@ 2016-06-27 16:46                     ` Gilles Chanteperdrix
  2016-06-28  8:31                       ` Wolfgang Netbal
  1 sibling, 1 reply; 36+ messages in thread
From: Gilles Chanteperdrix @ 2016-06-27 16:46 UTC (permalink / raw)
  To: Wolfgang Netbal; +Cc: xenomai

On Mon, Jun 27, 2016 at 05:55:12PM +0200, Wolfgang Netbal wrote:
> 
> 
> Am 2016-06-07 um 19:00 schrieb Gilles Chanteperdrix:
> > On Tue, Jun 07, 2016 at 04:13:07PM +0200, Wolfgang Netbal wrote:
> >>
> >> Am 2016-06-06 um 17:35 schrieb Gilles Chanteperdrix:
> >>> On Mon, Jun 06, 2016 at 09:03:40AM +0200, Wolfgang Netbal wrote:
> >>>> Am 2016-06-02 um 10:23 schrieb Gilles Chanteperdrix:
> >>>>> On Thu, Jun 02, 2016 at 10:15:41AM +0200, Wolfgang Netbal wrote:
> >>>>>> Am 2016-06-01 um 16:12 schrieb Gilles Chanteperdrix:
> >>>>>>> On Wed, Jun 01, 2016 at 03:52:06PM +0200, Wolfgang Netbal wrote:
> >>>>>>>> Am 2016-05-31 um 16:16 schrieb Gilles Chanteperdrix:
> >>>>>>>>> On Tue, May 31, 2016 at 04:09:07PM +0200, Wolfgang Netbal wrote:
> >>>>>>>>>> Dear all,
> >>>>>>>>>>
> >>>>>>>>>> we have moved our application from "XENOMAI 2.6.2.1 + Linux 3.0.43" to
> >>>>>>>>>> "XENOMAI 2.6.4. + Linux 3.10.53". Our target is an i.MX6DL. The system
> >>>>>>>>>> is now up and running and works stable. Unfortunately we see a
> >>>>>>>>>> difference in the performance. Our old combination (XENOMAI 2.6.2.1 +
> >>>>>>>>>> Linux 3.0.43) was slightly faster.
> >>>>>>>>>>
> >>>>>>>>>> At the moment it looks like that XENOMAI 2.6.4 calls
> >>>>>>>>>> xnpod_schedule_handler much more often then XENOMAI 2.6.2.1 in our old
> >>>>>>>>>> system.  Every call of xnpod_schedule_handler interrupts our main
> >>>>>>>>>> XENOMAI task with priority = 95.
> >> As I wrote above, I get interrupts 1037 handled by rthal_apc_handler()
> >> and 1038 handled by xnpod_schedule_handler() while my realtime task
> >> is running on kernel 3.10.53 with Xenomai 2.6.4.
> >> On kernel 3.0.43 with Xenomai 2.6.4 there are no interrupts, except the
> >> once that are send by my board using GPIOs, but this virtual interrupts
> >> are assigned to Xenomai and Linux as well but I didn't see a handler
> >> installed.
> >> I'm pretty sure that these interrupts are slowing down my system, but
> >> where do they come from ?
> >> why didn't I see them on Kernel 3.0.43 with Xenomai 2.6.4 ?
> >> how long do they need to process ?
> > How do you mean you do not see them? If you are talking about the
> > rescheduling API, it used no to be bound to a virq (so, it would
> > have a different irq number on cortex A9, something between 0 and 31
> > that would not show in the usual /proc files), I wonder if 3.0 is
> > before or after that. You do not see them in /proc, or you see them
> > and their count does not increase?
> Sorry for the long delay, we ran a lot of tests to find out what could 
> be the reason for
> the performance difference.
> 
> If I call cat /proc/ipipe/Xenomai I dont see the IRQ handler assigned to 
> the virtual
> IRQ on Kernel 3.0.43, but it looks like thats an issue of the Kernel
> > As for where they come from, this is not a mystery, the reschedule
> > IPI is triggered when code on one cpu changes the scheduler state
> > (wakes up a thread for instance) on another cpu. If you want to
> > avoid it, do not do that. That means, do not share mutex between
> > threads running on different cpus, pay attention for timers to be
> > running on the same cpu as the thread they signal, etc...
> >
> > The APC virq is used to multiplex several services, which you can
> > find by grepping the sources for rthal_apc_alloc:
> > ./ksrc/skins/posix/apc.c:       pse51_lostage_apc = rthal_apc_alloc("pse51_lostage_handler",
> > ./ksrc/skins/rtdm/device.c:     rtdm_apc = rthal_apc_alloc("deferred RTDM close", rtdm_apc_handler,
> > ./ksrc/nucleus/registry.c:          rthal_apc_alloc("registry_export", &registry_proc_schedule, NULL);
> > ./ksrc/nucleus/pipe.c:      rthal_apc_alloc("pipe_wakeup", &xnpipe_wakeup_proc, NULL);
> > ./ksrc/nucleus/shadow.c:            rthal_apc_alloc("lostage_handler", &lostage_handler, NULL);
> > ./ksrc/nucleus/select.c:        xnselect_apc = rthal_apc_alloc("xnselectors_destroy",
> >
> > It would be interesting to know which of these services is triggered
> > a lot. One possibility I see would be root thread priority
> > inheritance, so it would be caused by mode switches. This brings the
> > question: do your application have threads migrating between primary
> > and secondary mode, do you see the count of mode switches increase
> > with the kernel changes, do you have root thread priority
> > inheritance enabled?
> >
> Here a short sum up of our tests and the results and at the end a few 
> questions :-)
> 
> we are using a Freescale imx6dl on our hardware and upgraded our operating system from
> Freescale Kernel 3.0.43 with Xenomai 2.6.2.1 and U-Boot 2013.04 as compiler we use GCC 4.7.2
> Freescale Kernel 3.10.53 with Xenomai 2.6.4 and U-Boot 2016.01 as compiler we use GCC 4.8.2
> On both Kernels the CONFIG_SMP is set.
> 
> What we see is that when we running a customer project in a Xenomai task with priority 95
> tooks 40% of the CPU time on Kernel 3.0.43
> and 47% of CPU time on Kernel 3.10.53
> 
> so the new system is slower by 7% if we sum up this to 100% CPU load we have a difference of 15%
> To find out what is the reason for this difference we ran the following test.
> We tried to get the new system faster by change some components of the system.
> 
> -Changing U-Boot on new system                -> still 7% slower
> -Copy Kernel 3.0.43 to new system            -> still 7% slower
> -Creating Kernel 3.0.43 with
>      Xenomai 2.6.4 and copy it to new system    -> still 7% slower
> -Compiling the new system with
>      old GCC version                                        -> still 7% slower
> -We also checked the settings for RAM and CPU clock -> these are equal
> 
> It looks like that is not one of the big components,
> so we started to test some special functions like rt_timer_tsc()
> In the following example we stay for 800µs in the while loop and
> start this loop again after 200µs delay.
> The task application running this code has priotity 95.
> 
> Here a simplified code snipped
> start = rt_timer_tsc();
> do
> {
> 	current = rt_timer_tsc();
> 	i++;	
> } while((current - start) < 800)

If your CPU is running at 1 GHz and uses the global timer as clock
source, the clock source runs at 500 MHz, so 800 ticks of the tsc
amount to roughly 1.6 us.

So, I do not really understand what you are talking about. But are
you sure the two kernels use the same clocksource for Xenomai?

Could you show us the result of "dmesg | grep I-pipe" with the two
kernels ?

-- 
					    Gilles.
https://click-hack.org


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
  2016-06-27 16:00                     ` Gilles Chanteperdrix
@ 2016-06-28  8:08                       ` Wolfgang Netbal
  0 siblings, 0 replies; 36+ messages in thread
From: Wolfgang Netbal @ 2016-06-28  8:08 UTC (permalink / raw)
  To: xenomai



Am 2016-06-27 um 18:00 schrieb Gilles Chanteperdrix:
> On Mon, Jun 27, 2016 at 05:55:12PM +0200, Wolfgang Netbal wrote:
>> -Creating Kernel 3.0.43 with
>>       Xenomai 2.6.4 and copy it to new system    -> still 7% slower
> This contradicts what you said here:
> https://xenomai.org/pipermail/xenomai/2016-June/036370.html
I was always trying to speed up the new system, so I always tested the
changes on the new system. When I wrote that nothing changed, I meant
that the new system did not get any faster.
Sorry for the missing details in the post above.
>



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
  2016-06-27 16:46                     ` Gilles Chanteperdrix
@ 2016-06-28  8:31                       ` Wolfgang Netbal
  2016-06-28  8:34                         ` Gilles Chanteperdrix
  0 siblings, 1 reply; 36+ messages in thread
From: Wolfgang Netbal @ 2016-06-28  8:31 UTC (permalink / raw)
  To: xenomai



Am 2016-06-27 um 18:46 schrieb Gilles Chanteperdrix:
> On Mon, Jun 27, 2016 at 05:55:12PM +0200, Wolfgang Netbal wrote:
>>
>> Am 2016-06-07 um 19:00 schrieb Gilles Chanteperdrix:
>>> On Tue, Jun 07, 2016 at 04:13:07PM +0200, Wolfgang Netbal wrote:
>>>> Am 2016-06-06 um 17:35 schrieb Gilles Chanteperdrix:
>>>>> On Mon, Jun 06, 2016 at 09:03:40AM +0200, Wolfgang Netbal wrote:
>>>>>> Am 2016-06-02 um 10:23 schrieb Gilles Chanteperdrix:
>>>>>>> On Thu, Jun 02, 2016 at 10:15:41AM +0200, Wolfgang Netbal wrote:
>>>>>>>> Am 2016-06-01 um 16:12 schrieb Gilles Chanteperdrix:
>>>>>>>>> On Wed, Jun 01, 2016 at 03:52:06PM +0200, Wolfgang Netbal wrote:
>>>>>>>>>> Am 2016-05-31 um 16:16 schrieb Gilles Chanteperdrix:
>>>>>>>>>>> On Tue, May 31, 2016 at 04:09:07PM +0200, Wolfgang Netbal wrote:
>>>>>>>>>>>> Dear all,
>>>>>>>>>>>>
>>>>>>>>>>>> we have moved our application from "XENOMAI 2.6.2.1 + Linux 3.0.43" to
>>>>>>>>>>>> "XENOMAI 2.6.4. + Linux 3.10.53". Our target is an i.MX6DL. The system
>>>>>>>>>>>> is now up and running and works stable. Unfortunately we see a
>>>>>>>>>>>> difference in the performance. Our old combination (XENOMAI 2.6.2.1 +
>>>>>>>>>>>> Linux 3.0.43) was slightly faster.
>>>>>>>>>>>>
>>>>>>>>>>>> At the moment it looks like that XENOMAI 2.6.4 calls
>>>>>>>>>>>> xnpod_schedule_handler much more often then XENOMAI 2.6.2.1 in our old
>>>>>>>>>>>> system.  Every call of xnpod_schedule_handler interrupts our main
>>>>>>>>>>>> XENOMAI task with priority = 95.
>>>> As I wrote above, I get interrupts 1037 handled by rthal_apc_handler()
>>>> and 1038 handled by xnpod_schedule_handler() while my realtime task
>>>> is running on kernel 3.10.53 with Xenomai 2.6.4.
>>>> On kernel 3.0.43 with Xenomai 2.6.4 there are no interrupts, except the
>>>> once that are send by my board using GPIOs, but this virtual interrupts
>>>> are assigned to Xenomai and Linux as well but I didn't see a handler
>>>> installed.
>>>> I'm pretty sure that these interrupts are slowing down my system, but
>>>> where do they come from ?
>>>> why didn't I see them on Kernel 3.0.43 with Xenomai 2.6.4 ?
>>>> how long do they need to process ?
>>> How do you mean you do not see them? If you are talking about the
>>> rescheduling API, it used no to be bound to a virq (so, it would
>>> have a different irq number on cortex A9, something between 0 and 31
>>> that would not show in the usual /proc files), I wonder if 3.0 is
>>> before or after that. You do not see them in /proc, or you see them
>>> and their count does not increase?
>> Sorry for the long delay, we ran a lot of tests to find out what could
>> be the reason for
>> the performance difference.
>>
>> If I call cat /proc/ipipe/Xenomai I dont see the IRQ handler assigned to
>> the virtual
>> IRQ on Kernel 3.0.43, but it looks like thats an issue of the Kernel
>>> As for where they come from, this is not a mystery, the reschedule
>>> IPI is triggered when code on one cpu changes the scheduler state
>>> (wakes up a thread for instance) on another cpu. If you want to
>>> avoid it, do not do that. That means, do not share mutex between
>>> threads running on different cpus, pay attention for timers to be
>>> running on the same cpu as the thread they signal, etc...
>>>
>>> The APC virq is used to multiplex several services, which you can
>>> find by grepping the sources for rthal_apc_alloc:
>>> ./ksrc/skins/posix/apc.c:       pse51_lostage_apc = rthal_apc_alloc("pse51_lostage_handler",
>>> ./ksrc/skins/rtdm/device.c:     rtdm_apc = rthal_apc_alloc("deferred RTDM close", rtdm_apc_handler,
>>> ./ksrc/nucleus/registry.c:          rthal_apc_alloc("registry_export", &registry_proc_schedule, NULL);
>>> ./ksrc/nucleus/pipe.c:      rthal_apc_alloc("pipe_wakeup", &xnpipe_wakeup_proc, NULL);
>>> ./ksrc/nucleus/shadow.c:            rthal_apc_alloc("lostage_handler", &lostage_handler, NULL);
>>> ./ksrc/nucleus/select.c:        xnselect_apc = rthal_apc_alloc("xnselectors_destroy",
>>>
>>> It would be interesting to know which of these services is triggered
>>> a lot. One possibility I see would be root thread priority
>>> inheritance, so it would be caused by mode switches. This brings the
>>> question: do your application have threads migrating between primary
>>> and secondary mode, do you see the count of mode switches increase
>>> with the kernel changes, do you have root thread priority
>>> inheritance enabled?
>>>
>> Here a short sum up of our tests and the results and at the end a few
>> questions :-)
>>
>> we are using a Freescale imx6dl on our hardware and upgraded our operating system from
>> Freescale Kernel 3.0.43 with Xenomai 2.6.2.1 and U-Boot 2013.04 as compiler we use GCC 4.7.2
>> Freescale Kernel 3.10.53 with Xenomai 2.6.4 and U-Boot 2016.01 as compiler we use GCC 4.8.2
>> On both Kernels the CONFIG_SMP is set.
>>
>> What we see is that when we running a customer project in a Xenomai task with priority 95
>> tooks 40% of the CPU time on Kernel 3.0.43
>> and 47% of CPU time on Kernel 3.10.53
>>
>> so the new system is slower by 7% if we sum up this to 100% CPU load we have a difference of 15%
>> To find out what is the reason for this difference we ran the following test.
>> We tried to get the new system faster by change some components of the system.
>>
>> -Changing U-Boot on new system                -> still 7% slower
>> -Copy Kernel 3.0.43 to new system            -> still 7% slower
>> -Creating Kernel 3.0.43 with
>>       Xenomai 2.6.4 and copy it to new system    -> still 7% slower
>> -Compiling the new system with
>>       old GCC version                                        -> still 7% slower
>> -We also checked the settings for RAM and CPU clock -> these are equal
>>
>> It looks like that is not one of the big components,
>> so we started to test some special functions like rt_timer_tsc()
>> In the following example we stay for 800µs in the while loop and
>> start this loop again after 200µs delay.
>> The task application running this code has priotity 95.
>>
>> Here a simplified code snipped
>> start = rt_timer_tsc();
>> do
>> {
>> 	current = rt_timer_tsc();
>> 	i++;	
>> } while((current - start) < 800)
> If your CPU is running at 1 GHz and uses the global timer as clock
> source, the clock source runs at 500 MHz, so 800 ticks of the tsc is
> something around 1.6 us
Sorry, I simplified the code snippet a little too much.
This is the correct code:

current = rt_timer_tsc2ns(rt_timer_tsc());
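
For reference, the full measurement loop with the conversion in place
would look roughly like this (only a sketch, assuming the Xenomai 2.x
native skin; the 800000/200000 ns constants stand for the 800 us /
200 us figures mentioned above, and the function name is made up):

#include <native/task.h>
#include <native/timer.h>

static void busy_window(void)
{
	SRTIME start, current;
	unsigned long i = 0;

	/* spin for 800 us, counting iterations as a rough throughput probe */
	start = rt_timer_tsc2ns(rt_timer_tsc());
	do {
		current = rt_timer_tsc2ns(rt_timer_tsc());
		i++;
	} while ((current - start) < 800000);

	/* wait 200 us before the next window; in one-shot timer mode the
	   delay argument of rt_task_sleep() is in nanoseconds */
	rt_task_sleep(200000);
}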

> So, I do not really understand what you are talking about. But are
> you sure the two kernels use the same clocksource for xenomai?
>
> Could you show us the result of "dmesg | grep I-pipe" with the two
> kernels ?
Output of Kernel 3.10.53 with Xenomai 2.6.4
I-pipe, 3.000 MHz clocksource
I-pipe, 396.000 MHz clocksource
I-pipe, 396.000 MHz timer
I-pipe, 396.000 MHz timer
I-pipe: head domain Xenomai registered.

Output of Kernel 3.0.43 with Xenomai 2.6.2.1
[    0.000000] I-pipe 1.18-13: pipeline enabled.
[    0.331999] I-pipe, 396.000 MHz timer
[    0.335720] I-pipe, 396.000 MHz clocksource
[    0.844016] I-pipe: Domain Xenomai registered.

The controller is an i.MX6DL; it can run at a maximum of 800 MHz.



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
  2016-06-28  8:31                       ` Wolfgang Netbal
@ 2016-06-28  8:34                         ` Gilles Chanteperdrix
  2016-06-28  9:15                           ` Wolfgang Netbal
  0 siblings, 1 reply; 36+ messages in thread
From: Gilles Chanteperdrix @ 2016-06-28  8:34 UTC (permalink / raw)
  To: Wolfgang Netbal; +Cc: xenomai

On Tue, Jun 28, 2016 at 10:31:00AM +0200, Wolfgang Netbal wrote:
> 
> 
> Am 2016-06-27 um 18:46 schrieb Gilles Chanteperdrix:
> > On Mon, Jun 27, 2016 at 05:55:12PM +0200, Wolfgang Netbal wrote:
> >>
> >> Am 2016-06-07 um 19:00 schrieb Gilles Chanteperdrix:
> >>> On Tue, Jun 07, 2016 at 04:13:07PM +0200, Wolfgang Netbal wrote:
> >>>> Am 2016-06-06 um 17:35 schrieb Gilles Chanteperdrix:
> >>>>> On Mon, Jun 06, 2016 at 09:03:40AM +0200, Wolfgang Netbal wrote:
> >>>>>> Am 2016-06-02 um 10:23 schrieb Gilles Chanteperdrix:
> >>>>>>> On Thu, Jun 02, 2016 at 10:15:41AM +0200, Wolfgang Netbal wrote:
> >>>>>>>> Am 2016-06-01 um 16:12 schrieb Gilles Chanteperdrix:
> >>>>>>>>> On Wed, Jun 01, 2016 at 03:52:06PM +0200, Wolfgang Netbal wrote:
> >>>>>>>>>> Am 2016-05-31 um 16:16 schrieb Gilles Chanteperdrix:
> >>>>>>>>>>> On Tue, May 31, 2016 at 04:09:07PM +0200, Wolfgang Netbal wrote:
> >>>>>>>>>>>> Dear all,
> >>>>>>>>>>>>
> >>>>>>>>>>>> we have moved our application from "XENOMAI 2.6.2.1 + Linux 3.0.43" to
> >>>>>>>>>>>> "XENOMAI 2.6.4. + Linux 3.10.53". Our target is an i.MX6DL. The system
> >>>>>>>>>>>> is now up and running and works stable. Unfortunately we see a
> >>>>>>>>>>>> difference in the performance. Our old combination (XENOMAI 2.6.2.1 +
> >>>>>>>>>>>> Linux 3.0.43) was slightly faster.
> >>>>>>>>>>>>
> >>>>>>>>>>>> At the moment it looks like that XENOMAI 2.6.4 calls
> >>>>>>>>>>>> xnpod_schedule_handler much more often then XENOMAI 2.6.2.1 in our old
> >>>>>>>>>>>> system.  Every call of xnpod_schedule_handler interrupts our main
> >>>>>>>>>>>> XENOMAI task with priority = 95.
> >>>> As I wrote above, I get interrupts 1037 handled by rthal_apc_handler()
> >>>> and 1038 handled by xnpod_schedule_handler() while my realtime task
> >>>> is running on kernel 3.10.53 with Xenomai 2.6.4.
> >>>> On kernel 3.0.43 with Xenomai 2.6.4 there are no interrupts, except the
> >>>> once that are send by my board using GPIOs, but this virtual interrupts
> >>>> are assigned to Xenomai and Linux as well but I didn't see a handler
> >>>> installed.
> >>>> I'm pretty sure that these interrupts are slowing down my system, but
> >>>> where do they come from ?
> >>>> why didn't I see them on Kernel 3.0.43 with Xenomai 2.6.4 ?
> >>>> how long do they need to process ?
> >>> How do you mean you do not see them? If you are talking about the
> >>> rescheduling API, it used no to be bound to a virq (so, it would
> >>> have a different irq number on cortex A9, something between 0 and 31
> >>> that would not show in the usual /proc files), I wonder if 3.0 is
> >>> before or after that. You do not see them in /proc, or you see them
> >>> and their count does not increase?
> >> Sorry for the long delay, we ran a lot of tests to find out what could
> >> be the reason for
> >> the performance difference.
> >>
> >> If I call cat /proc/ipipe/Xenomai I dont see the IRQ handler assigned to
> >> the virtual
> >> IRQ on Kernel 3.0.43, but it looks like thats an issue of the Kernel
> >>> As for where they come from, this is not a mystery, the reschedule
> >>> IPI is triggered when code on one cpu changes the scheduler state
> >>> (wakes up a thread for instance) on another cpu. If you want to
> >>> avoid it, do not do that. That means, do not share mutex between
> >>> threads running on different cpus, pay attention for timers to be
> >>> running on the same cpu as the thread they signal, etc...
> >>>
> >>> The APC virq is used to multiplex several services, which you can
> >>> find by grepping the sources for rthal_apc_alloc:
> >>> ./ksrc/skins/posix/apc.c:       pse51_lostage_apc = rthal_apc_alloc("pse51_lostage_handler",
> >>> ./ksrc/skins/rtdm/device.c:     rtdm_apc = rthal_apc_alloc("deferred RTDM close", rtdm_apc_handler,
> >>> ./ksrc/nucleus/registry.c:          rthal_apc_alloc("registry_export", &registry_proc_schedule, NULL);
> >>> ./ksrc/nucleus/pipe.c:      rthal_apc_alloc("pipe_wakeup", &xnpipe_wakeup_proc, NULL);
> >>> ./ksrc/nucleus/shadow.c:            rthal_apc_alloc("lostage_handler", &lostage_handler, NULL);
> >>> ./ksrc/nucleus/select.c:        xnselect_apc = rthal_apc_alloc("xnselectors_destroy",
> >>>
> >>> It would be interesting to know which of these services is triggered
> >>> a lot. One possibility I see would be root thread priority
> >>> inheritance, so it would be caused by mode switches. This brings the
> >>> question: do your application have threads migrating between primary
> >>> and secondary mode, do you see the count of mode switches increase
> >>> with the kernel changes, do you have root thread priority
> >>> inheritance enabled?
> >>>
> >> Here a short sum up of our tests and the results and at the end a few
> >> questions :-)
> >>
> >> we are using a Freescale imx6dl on our hardware and upgraded our operating system from
> >> Freescale Kernel 3.0.43 with Xenomai 2.6.2.1 and U-Boot 2013.04 as compiler we use GCC 4.7.2
> >> Freescale Kernel 3.10.53 with Xenomai 2.6.4 and U-Boot 2016.01 as compiler we use GCC 4.8.2
> >> On both Kernels the CONFIG_SMP is set.
> >>
> >> What we see is that when we running a customer project in a Xenomai task with priority 95
> >> tooks 40% of the CPU time on Kernel 3.0.43
> >> and 47% of CPU time on Kernel 3.10.53
> >>
> >> so the new system is slower by 7% if we sum up this to 100% CPU load we have a difference of 15%
> >> To find out what is the reason for this difference we ran the following test.
> >> We tried to get the new system faster by change some components of the system.
> >>
> >> -Changing U-Boot on new system                -> still 7% slower
> >> -Copy Kernel 3.0.43 to new system            -> still 7% slower
> >> -Creating Kernel 3.0.43 with
> >>       Xenomai 2.6.4 and copy it to new system    -> still 7% slower
> >> -Compiling the new system with
> >>       old GCC version                                        -> still 7% slower
> >> -We also checked the settings for RAM and CPU clock -> these are equal
> >>
> >> It looks like that is not one of the big components,
> >> so we started to test some special functions like rt_timer_tsc()
> >> In the following example we stay for 800µs in the while loop and
> >> start this loop again after 200µs delay.
> >> The task application running this code has priotity 95.
> >>
> >> Here a simplified code snipped
> >> start = rt_timer_tsc();
> >> do
> >> {
> >> 	current = rt_timer_tsc();
> >> 	i++;	
> >> } while((current - start) < 800)
> > If your CPU is running at 1 GHz and uses the global timer as clock
> > source, the clock source runs at 500 MHz, so 800 ticks of the tsc is
> > something around 1.6 us
> Sorry I simplified the code snippet a little bit to much.
> Thats the correct code.
> 
> current = rt_timer_tsc2ns(rt_timer_tsc());
> 
> > So, I do not really understand what you are talking about. But are
> > you sure the two kernels use the same clocksource for xenomai?
> >
> > Could you show us the result of "dmesg | grep I-pipe" with the two
> > kernels ?
> Output of Kernel 3.10.53 with Xenomai 2.6.4
> I-pipe, 3.000 MHz clocksource
> I-pipe, 396.000 MHz clocksource
> I-pipe, 396.000 MHz timer
> I-pipe, 396.000 MHz timer
> I-pipe: head domain Xenomai registered.
> 
> Output of Kernel 3.0.43 with Xenomai 2.6.2.1
> [    0.000000] I-pipe 1.18-13: pipeline enabled.
> [    0.331999] I-pipe, 396.000 MHz timer
> [    0.335720] I-pipe, 396.000 MHz clocksource
> [    0.844016] I-pipe: Domain Xenomai registered.
> 
> The controller is a imx6dl, this controller can run maximum 800MHz

OK, so the new kernel registers two tsc emulations. Could you run
the "tsc" regression test to measure the tsc latency? The two tsc
emulations have very different latencies, so the result would be
unmistakable.
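
In case the regression test binary turns out not to be installed on
your target, you can also get a rough idea of the tsc read cost by
hand. The snippet below is only a sketch against the native timer API
(LOOPS and the function name are made up), not a replacement for the
real test:

#include <stdio.h>
#include <native/timer.h>

#define LOOPS 1000000UL

static void estimate_tsc_cost(void)
{
	RTIME t0, t1;
	unsigned long i;

	t0 = rt_timer_tsc();
	for (i = 0; i < LOOPS; i++)
		(void)rt_timer_tsc();	/* back-to-back tsc reads */
	t1 = rt_timer_tsc();

	/* average cost of one rt_timer_tsc() call, in nanoseconds; note
	   that printf() switches the caller to secondary mode, so keep it
	   out of any time-critical path */
	printf("avg rt_timer_tsc() cost: %.1f ns\n",
	       (double)rt_timer_tsc2ns(t1 - t0) / LOOPS);
}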

-- 
					    Gilles.
https://click-hack.org


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
  2016-06-28  8:34                         ` Gilles Chanteperdrix
@ 2016-06-28  9:15                           ` Wolfgang Netbal
  2016-06-28  9:17                             ` Gilles Chanteperdrix
  0 siblings, 1 reply; 36+ messages in thread
From: Wolfgang Netbal @ 2016-06-28  9:15 UTC (permalink / raw)
  To: xenomai



Am 2016-06-28 um 10:34 schrieb Gilles Chanteperdrix:
> On Tue, Jun 28, 2016 at 10:31:00AM +0200, Wolfgang Netbal wrote:
>>
>> Am 2016-06-27 um 18:46 schrieb Gilles Chanteperdrix:
>>> On Mon, Jun 27, 2016 at 05:55:12PM +0200, Wolfgang Netbal wrote:
>>>> Am 2016-06-07 um 19:00 schrieb Gilles Chanteperdrix:
>>>>> On Tue, Jun 07, 2016 at 04:13:07PM +0200, Wolfgang Netbal wrote:
>>>>>> Am 2016-06-06 um 17:35 schrieb Gilles Chanteperdrix:
>>>>>>> On Mon, Jun 06, 2016 at 09:03:40AM +0200, Wolfgang Netbal wrote:
>>>>>>>> Am 2016-06-02 um 10:23 schrieb Gilles Chanteperdrix:
>>>>>>>>> On Thu, Jun 02, 2016 at 10:15:41AM +0200, Wolfgang Netbal wrote:
>>>>>>>>>> Am 2016-06-01 um 16:12 schrieb Gilles Chanteperdrix:
>>>>>>>>>>> On Wed, Jun 01, 2016 at 03:52:06PM +0200, Wolfgang Netbal wrote:
>>>>>>>>>>>> Am 2016-05-31 um 16:16 schrieb Gilles Chanteperdrix:
>>>>>>>>>>>>> On Tue, May 31, 2016 at 04:09:07PM +0200, Wolfgang Netbal wrote:
>>>>>>>>>>>>>> Dear all,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> we have moved our application from "XENOMAI 2.6.2.1 + Linux 3.0.43" to
>>>>>>>>>>>>>> "XENOMAI 2.6.4. + Linux 3.10.53". Our target is an i.MX6DL. The system
>>>>>>>>>>>>>> is now up and running and works stable. Unfortunately we see a
>>>>>>>>>>>>>> difference in the performance. Our old combination (XENOMAI 2.6.2.1 +
>>>>>>>>>>>>>> Linux 3.0.43) was slightly faster.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> At the moment it looks like that XENOMAI 2.6.4 calls
>>>>>>>>>>>>>> xnpod_schedule_handler much more often then XENOMAI 2.6.2.1 in our old
>>>>>>>>>>>>>> system.  Every call of xnpod_schedule_handler interrupts our main
>>>>>>>>>>>>>> XENOMAI task with priority = 95.
>>>>>> As I wrote above, I get interrupts 1037 handled by rthal_apc_handler()
>>>>>> and 1038 handled by xnpod_schedule_handler() while my realtime task
>>>>>> is running on kernel 3.10.53 with Xenomai 2.6.4.
>>>>>> On kernel 3.0.43 with Xenomai 2.6.4 there are no interrupts, except the
>>>>>> once that are send by my board using GPIOs, but this virtual interrupts
>>>>>> are assigned to Xenomai and Linux as well but I didn't see a handler
>>>>>> installed.
>>>>>> I'm pretty sure that these interrupts are slowing down my system, but
>>>>>> where do they come from ?
>>>>>> why didn't I see them on Kernel 3.0.43 with Xenomai 2.6.4 ?
>>>>>> how long do they need to process ?
>>>>> How do you mean you do not see them? If you are talking about the
>>>>> rescheduling API, it used no to be bound to a virq (so, it would
>>>>> have a different irq number on cortex A9, something between 0 and 31
>>>>> that would not show in the usual /proc files), I wonder if 3.0 is
>>>>> before or after that. You do not see them in /proc, or you see them
>>>>> and their count does not increase?
>>>> Sorry for the long delay, we ran a lot of tests to find out what could
>>>> be the reason for
>>>> the performance difference.
>>>>
>>>> If I call cat /proc/ipipe/Xenomai I dont see the IRQ handler assigned to
>>>> the virtual
>>>> IRQ on Kernel 3.0.43, but it looks like thats an issue of the Kernel
>>>>> As for where they come from, this is not a mystery, the reschedule
>>>>> IPI is triggered when code on one cpu changes the scheduler state
>>>>> (wakes up a thread for instance) on another cpu. If you want to
>>>>> avoid it, do not do that. That means, do not share mutex between
>>>>> threads running on different cpus, pay attention for timers to be
>>>>> running on the same cpu as the thread they signal, etc...
>>>>>
>>>>> The APC virq is used to multiplex several services, which you can
>>>>> find by grepping the sources for rthal_apc_alloc:
>>>>> ./ksrc/skins/posix/apc.c:       pse51_lostage_apc = rthal_apc_alloc("pse51_lostage_handler",
>>>>> ./ksrc/skins/rtdm/device.c:     rtdm_apc = rthal_apc_alloc("deferred RTDM close", rtdm_apc_handler,
>>>>> ./ksrc/nucleus/registry.c:          rthal_apc_alloc("registry_export", &registry_proc_schedule, NULL);
>>>>> ./ksrc/nucleus/pipe.c:      rthal_apc_alloc("pipe_wakeup", &xnpipe_wakeup_proc, NULL);
>>>>> ./ksrc/nucleus/shadow.c:            rthal_apc_alloc("lostage_handler", &lostage_handler, NULL);
>>>>> ./ksrc/nucleus/select.c:        xnselect_apc = rthal_apc_alloc("xnselectors_destroy",
>>>>>
>>>>> It would be interesting to know which of these services is triggered
>>>>> a lot. One possibility I see would be root thread priority
>>>>> inheritance, so it would be caused by mode switches. This brings the
>>>>> question: do your application have threads migrating between primary
>>>>> and secondary mode, do you see the count of mode switches increase
>>>>> with the kernel changes, do you have root thread priority
>>>>> inheritance enabled?
>>>>>
>>>> Here a short sum up of our tests and the results and at the end a few
>>>> questions :-)
>>>>
>>>> we are using a Freescale imx6dl on our hardware and upgraded our operating system from
>>>> Freescale Kernel 3.0.43 with Xenomai 2.6.2.1 and U-Boot 2013.04 as compiler we use GCC 4.7.2
>>>> Freescale Kernel 3.10.53 with Xenomai 2.6.4 and U-Boot 2016.01 as compiler we use GCC 4.8.2
>>>> On both Kernels the CONFIG_SMP is set.
>>>>
>>>> What we see is that when we running a customer project in a Xenomai task with priority 95
>>>> tooks 40% of the CPU time on Kernel 3.0.43
>>>> and 47% of CPU time on Kernel 3.10.53
>>>>
>>>> so the new system is slower by 7% if we sum up this to 100% CPU load we have a difference of 15%
>>>> To find out what is the reason for this difference we ran the following test.
>>>> We tried to get the new system faster by change some components of the system.
>>>>
>>>> -Changing U-Boot on new system                -> still 7% slower
>>>> -Copy Kernel 3.0.43 to new system            -> still 7% slower
>>>> -Creating Kernel 3.0.43 with
>>>>        Xenomai 2.6.4 and copy it to new system    -> still 7% slower
>>>> -Compiling the new system with
>>>>        old GCC version                                        -> still 7% slower
>>>> -We also checked the settings for RAM and CPU clock -> these are equal
>>>>
>>>> It looks like that is not one of the big components,
>>>> so we started to test some special functions like rt_timer_tsc()
>>>> In the following example we stay for 800µs in the while loop and
>>>> start this loop again after 200µs delay.
>>>> The task application running this code has priotity 95.
>>>>
>>>> Here a simplified code snipped
>>>> start = rt_timer_tsc();
>>>> do
>>>> {
>>>> 	current = rt_timer_tsc();
>>>> 	i++;	
>>>> } while((current - start) < 800)
>>> If your CPU is running at 1 GHz and uses the global timer as clock
>>> source, the clock source runs at 500 MHz, so 800 ticks of the tsc is
>>> something around 1.6 us
>> Sorry I simplified the code snippet a little bit to much.
>> Thats the correct code.
>>
>> current = rt_timer_tsc2ns(rt_timer_tsc());
>>
>>> So, I do not really understand what you are talking about. But are
>>> you sure the two kernels use the same clocksource for xenomai?
>>>
>>> Could you show us the result of "dmesg | grep I-pipe" with the two
>>> kernels ?
>> Output of Kernel 3.10.53 with Xenomai 2.6.4
>> I-pipe, 3.000 MHz clocksource
>> I-pipe, 396.000 MHz clocksource
>> I-pipe, 396.000 MHz timer
>> I-pipe, 396.000 MHz timer
>> I-pipe: head domain Xenomai registered.
>>
>> Output of Kernel 3.0.43 with Xenomai 2.6.2.1
>> [    0.000000] I-pipe 1.18-13: pipeline enabled.
>> [    0.331999] I-pipe, 396.000 MHz timer
>> [    0.335720] I-pipe, 396.000 MHz clocksource
>> [    0.844016] I-pipe: Domain Xenomai registered.
>>
>> The controller is a imx6dl, this controller can run maximum 800MHz
> Ok, so the new kernel registers two tsc emulations, could you run
> the "tsc" regression test to measure the tsc latency? The two tsc
> emulations have very different latencies, so the result would be
> unmistakable.
>

Output of Kernel 3.10.53 with Xenomai 2.6.4
/usr/xenomai/bin/latency
== Sampling period: 1000 us
== Test mode: periodic user-mode task
== All results in microseconds
warming up...
RTT|  00:00:01  (periodic user-mode task, 1000 us period, priority 99)
RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
RTD|     -3.048|     -1.960|      4.053|       0|     0|     -3.048|      4.053
RTD|     -3.064|     -1.874|      5.936|       0|     0|     -3.064|      5.936
RTD|     -3.137|     -1.963|      3.545|       0|     0|     -3.137|      5.936
RTD|     -3.069|     -1.968|      5.805|       0|     0|     -3.137|      5.936
RTD|     -3.064|     -1.945|      4.371|       0|     0|     -3.137|      5.936
RTD|     -3.071|     -1.905|      3.613|       0|     0|     -3.137|      5.936
RTD|     -3.119|     -1.766|      4.967|       0|     0|     -3.137|      5.936
RTD|     -3.119|     -1.910|      3.883|       0|     0|     -3.137|      5.936
RTD|     -3.102|     -1.910|      5.494|       0|     0|     -3.137|      5.936
RTD|     -3.107|     -1.907|      3.795|       0|     0|     -3.137|      5.936
RTD|     -3.066|     -1.935|      4.068|       0|     0|     -3.137|      5.936
RTD|     -2.960|     -1.920|      4.270|       0|     0|     -3.137|      5.936
RTD|     -3.190|     -2.003|      3.436|       0|     0|     -3.190|      5.936
RTD|     -3.026|     -2.003|      4.679|       0|     0|     -3.190|      5.936
RTD|     -3.149|     -2.011|      3.861|       0|     0|     -3.190|      5.936
RTD|     -3.059|     -1.990|      3.651|       0|     0|     -3.190|      5.936
RTD|     -3.119|     -1.940|      4.249|       0|     0|     -3.190|      5.936
RTD|     -3.192|     -1.983|      4.270|       0|     0|     -3.192|      5.936
RTD|     -3.026|     -2.003|      3.568|       0|     0|     -3.192|      5.936
RTD|     -3.096|     -1.973|      6.376|       0|     0|     -3.192|      6.376
RTD|     -3.258|     -1.953|      5.131|       0|     0|     -3.258|      6.376
RTT|  00:00:22  (periodic user-mode task, 1000 us period, priority 99)

Could the two timers be the reason for the -3.xxx values in lat min?
Is it possible to disable one of the two timers?

Output of Kernel 3.0.43 with Xenomai 2.6.2.1

/usr/xenomai/bin/latency
== Sampling period: 1000 us
== Test mode: periodic user-mode task
== All results in microseconds
warming up...
RTT|  00:00:01  (periodic user-mode task, 1000 us period, priority 99)
RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
RTD|      3.060|      5.098|     10.255|       0|     0|      3.060|     10.255
RTD|      3.073|      5.146|     10.742|       0|     0|      3.060|     10.742
RTD|      2.999|      5.146|     10.818|       0|     0|      2.999|     10.818
RTD|      3.249|      5.146|     10.936|       0|     0|      2.999|     10.936
RTD|      3.169|      5.184|     11.656|       0|     0|      2.999|     11.656
RTD|      3.133|      5.156|     10.881|       0|     0|      2.999|     11.656
RTD|      3.123|      5.068|      9.835|       0|     0|      2.999|     11.656
RTD|      3.032|      5.101|     10.628|       0|     0|      2.999|     11.656
RTD|      3.088|      5.047|     10.492|       0|     0|      2.999|     11.656
RTD|      3.068|      5.159|     11.681|       0|     0|      2.999|     11.681
RTD|      2.967|      5.073|     11.648|       0|     0|      2.967|     11.681
RTD|      3.073|      5.106|     11.499|       0|     0|      2.967|     11.681
RTD|      3.053|      5.063|     10.727|       0|     0|      2.967|     11.681
RTD|      3.159|      5.113|     10.560|       0|     0|      2.967|     11.681
RTD|      3.020|      5.078|     11.578|       0|     0|      2.967|     11.681
RTD|      3.227|      5.146|     10.856|       0|     0|      2.967|     11.681
RTD|      3.186|      5.118|     10.335|       0|     0|      2.967|     11.681
RTD|      3.116|      5.108|     10.782|       0|     0|      2.967|     11.681
RTD|      2.954|      5.095|     11.921|       0|     0|      2.954|     11.921
RTD|      2.952|      5.156|     10.631|       0|     0|      2.952|     11.921
RTD|      2.898|      5.121|     10.699|       0|     0|      2.898|     11.921
RTT|  00:00:22  (periodic user-mode task, 1000 us period, priority 99)




^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
  2016-06-28  9:15                           ` Wolfgang Netbal
@ 2016-06-28  9:17                             ` Gilles Chanteperdrix
  2016-06-28  9:28                               ` Wolfgang Netbal
  0 siblings, 1 reply; 36+ messages in thread
From: Gilles Chanteperdrix @ 2016-06-28  9:17 UTC (permalink / raw)
  To: Wolfgang Netbal; +Cc: xenomai

On Tue, Jun 28, 2016 at 11:15:14AM +0200, Wolfgang Netbal wrote:
> 
> 
> Am 2016-06-28 um 10:34 schrieb Gilles Chanteperdrix:
> > On Tue, Jun 28, 2016 at 10:31:00AM +0200, Wolfgang Netbal wrote:
> >>
> >> Am 2016-06-27 um 18:46 schrieb Gilles Chanteperdrix:
> >>> On Mon, Jun 27, 2016 at 05:55:12PM +0200, Wolfgang Netbal wrote:
> >>>> Am 2016-06-07 um 19:00 schrieb Gilles Chanteperdrix:
> >>>>> On Tue, Jun 07, 2016 at 04:13:07PM +0200, Wolfgang Netbal wrote:
> >>>>>> Am 2016-06-06 um 17:35 schrieb Gilles Chanteperdrix:
> >>>>>>> On Mon, Jun 06, 2016 at 09:03:40AM +0200, Wolfgang Netbal wrote:
> >>>>>>>> Am 2016-06-02 um 10:23 schrieb Gilles Chanteperdrix:
> >>>>>>>>> On Thu, Jun 02, 2016 at 10:15:41AM +0200, Wolfgang Netbal wrote:
> >>>>>>>>>> Am 2016-06-01 um 16:12 schrieb Gilles Chanteperdrix:
> >>>>>>>>>>> On Wed, Jun 01, 2016 at 03:52:06PM +0200, Wolfgang Netbal wrote:
> >>>>>>>>>>>> Am 2016-05-31 um 16:16 schrieb Gilles Chanteperdrix:
> >>>>>>>>>>>>> On Tue, May 31, 2016 at 04:09:07PM +0200, Wolfgang Netbal wrote:
> >>>>>>>>>>>>>> Dear all,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> we have moved our application from "XENOMAI 2.6.2.1 + Linux 3.0.43" to
> >>>>>>>>>>>>>> "XENOMAI 2.6.4. + Linux 3.10.53". Our target is an i.MX6DL. The system
> >>>>>>>>>>>>>> is now up and running and works stable. Unfortunately we see a
> >>>>>>>>>>>>>> difference in the performance. Our old combination (XENOMAI 2.6.2.1 +
> >>>>>>>>>>>>>> Linux 3.0.43) was slightly faster.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> At the moment it looks like that XENOMAI 2.6.4 calls
> >>>>>>>>>>>>>> xnpod_schedule_handler much more often then XENOMAI 2.6.2.1 in our old
> >>>>>>>>>>>>>> system.  Every call of xnpod_schedule_handler interrupts our main
> >>>>>>>>>>>>>> XENOMAI task with priority = 95.
> >>>>>> As I wrote above, I get interrupts 1037 handled by rthal_apc_handler()
> >>>>>> and 1038 handled by xnpod_schedule_handler() while my realtime task
> >>>>>> is running on kernel 3.10.53 with Xenomai 2.6.4.
> >>>>>> On kernel 3.0.43 with Xenomai 2.6.4 there are no interrupts, except the
> >>>>>> once that are send by my board using GPIOs, but this virtual interrupts
> >>>>>> are assigned to Xenomai and Linux as well but I didn't see a handler
> >>>>>> installed.
> >>>>>> I'm pretty sure that these interrupts are slowing down my system, but
> >>>>>> where do they come from ?
> >>>>>> why didn't I see them on Kernel 3.0.43 with Xenomai 2.6.4 ?
> >>>>>> how long do they need to process ?
> >>>>> How do you mean you do not see them? If you are talking about the
> >>>>> rescheduling API, it used no to be bound to a virq (so, it would
> >>>>> have a different irq number on cortex A9, something between 0 and 31
> >>>>> that would not show in the usual /proc files), I wonder if 3.0 is
> >>>>> before or after that. You do not see them in /proc, or you see them
> >>>>> and their count does not increase?
> >>>> Sorry for the long delay, we ran a lot of tests to find out what could
> >>>> be the reason for
> >>>> the performance difference.
> >>>>
> >>>> If I call cat /proc/ipipe/Xenomai I dont see the IRQ handler assigned to
> >>>> the virtual
> >>>> IRQ on Kernel 3.0.43, but it looks like thats an issue of the Kernel
> >>>>> As for where they come from, this is not a mystery, the reschedule
> >>>>> IPI is triggered when code on one cpu changes the scheduler state
> >>>>> (wakes up a thread for instance) on another cpu. If you want to
> >>>>> avoid it, do not do that. That means, do not share mutex between
> >>>>> threads running on different cpus, pay attention for timers to be
> >>>>> running on the same cpu as the thread they signal, etc...
> >>>>>
> >>>>> The APC virq is used to multiplex several services, which you can
> >>>>> find by grepping the sources for rthal_apc_alloc:
> >>>>> ./ksrc/skins/posix/apc.c:       pse51_lostage_apc = rthal_apc_alloc("pse51_lostage_handler",
> >>>>> ./ksrc/skins/rtdm/device.c:     rtdm_apc = rthal_apc_alloc("deferred RTDM close", rtdm_apc_handler,
> >>>>> ./ksrc/nucleus/registry.c:          rthal_apc_alloc("registry_export", &registry_proc_schedule, NULL);
> >>>>> ./ksrc/nucleus/pipe.c:      rthal_apc_alloc("pipe_wakeup", &xnpipe_wakeup_proc, NULL);
> >>>>> ./ksrc/nucleus/shadow.c:            rthal_apc_alloc("lostage_handler", &lostage_handler, NULL);
> >>>>> ./ksrc/nucleus/select.c:        xnselect_apc = rthal_apc_alloc("xnselectors_destroy",
> >>>>>
> >>>>> It would be interesting to know which of these services is triggered
> >>>>> a lot. One possibility I see would be root thread priority
> >>>>> inheritance, so it would be caused by mode switches. This brings the
> >>>>> question: do your application have threads migrating between primary
> >>>>> and secondary mode, do you see the count of mode switches increase
> >>>>> with the kernel changes, do you have root thread priority
> >>>>> inheritance enabled?
> >>>>>
> >>>> Here a short sum up of our tests and the results and at the end a few
> >>>> questions :-)
> >>>>
> >>>> we are using a Freescale imx6dl on our hardware and upgraded our operating system from
> >>>> Freescale Kernel 3.0.43 with Xenomai 2.6.2.1 and U-Boot 2013.04 as compiler we use GCC 4.7.2
> >>>> Freescale Kernel 3.10.53 with Xenomai 2.6.4 and U-Boot 2016.01 as compiler we use GCC 4.8.2
> >>>> On both Kernels the CONFIG_SMP is set.
> >>>>
> >>>> What we see is that when we running a customer project in a Xenomai task with priority 95
> >>>> tooks 40% of the CPU time on Kernel 3.0.43
> >>>> and 47% of CPU time on Kernel 3.10.53
> >>>>
> >>>> so the new system is slower by 7% if we sum up this to 100% CPU load we have a difference of 15%
> >>>> To find out what is the reason for this difference we ran the following test.
> >>>> We tried to get the new system faster by change some components of the system.
> >>>>
> >>>> -Changing U-Boot on new system                -> still 7% slower
> >>>> -Copy Kernel 3.0.43 to new system            -> still 7% slower
> >>>> -Creating Kernel 3.0.43 with
> >>>>        Xenomai 2.6.4 and copy it to new system    -> still 7% slower
> >>>> -Compiling the new system with
> >>>>        old GCC version                                        -> still 7% slower
> >>>> -We also checked the settings for RAM and CPU clock -> these are equal
> >>>>
> >>>> It looks like that is not one of the big components,
> >>>> so we started to test some special functions like rt_timer_tsc()
> >>>> In the following example we stay for 800µs in the while loop and
> >>>> start this loop again after 200µs delay.
> >>>> The task application running this code has priotity 95.
> >>>>
> >>>> Here a simplified code snipped
> >>>> start = rt_timer_tsc();
> >>>> do
> >>>> {
> >>>> 	current = rt_timer_tsc();
> >>>> 	i++;	
> >>>> } while((current - start) < 800)
> >>> If your CPU is running at 1 GHz and uses the global timer as clock
> >>> source, the clock source runs at 500 MHz, so 800 ticks of the tsc is
> >>> something around 1.6 us
> >> Sorry I simplified the code snippet a little bit to much.
> >> Thats the correct code.
> >>
> >> current = rt_timer_tsc2ns(rt_timer_tsc());
> >>
> >>> So, I do not really understand what you are talking about. But are
> >>> you sure the two kernels use the same clocksource for xenomai?
> >>>
> >>> Could you show us the result of "dmesg | grep I-pipe" with the two
> >>> kernels ?
> >> Output of Kernel 3.10.53 with Xenomai 2.6.4
> >> I-pipe, 3.000 MHz clocksource
> >> I-pipe, 396.000 MHz clocksource
> >> I-pipe, 396.000 MHz timer
> >> I-pipe, 396.000 MHz timer
> >> I-pipe: head domain Xenomai registered.
> >>
> >> Output of Kernel 3.0.43 with Xenomai 2.6.2.1
> >> [    0.000000] I-pipe 1.18-13: pipeline enabled.
> >> [    0.331999] I-pipe, 396.000 MHz timer
> >> [    0.335720] I-pipe, 396.000 MHz clocksource
> >> [    0.844016] I-pipe: Domain Xenomai registered.
> >>
> >> The controller is a imx6dl, this controller can run maximum 800MHz
> > Ok, so the new kernel registers two tsc emulations, could you run
> > the "tsc" regression test to measure the tsc latency? The two tsc
> > emulations have very different latencies, so the result would be
> > unmistakable.
> >
> 
> Output of Kernel 3.10.53 with Xenomai 2.6.4
> /usr/xenomai/bin/latency

This test is named "latency", not "tsc". As the different names
imply, they are not measuring the same thing.

-- 
					    Gilles.
https://click-hack.org


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
  2016-06-28  9:17                             ` Gilles Chanteperdrix
@ 2016-06-28  9:28                               ` Wolfgang Netbal
  2016-06-28  9:29                                 ` Gilles Chanteperdrix
  0 siblings, 1 reply; 36+ messages in thread
From: Wolfgang Netbal @ 2016-06-28  9:28 UTC (permalink / raw)
  To: gilles.chanteperdrix; +Cc: xenomai



Am 2016-06-28 um 11:17 schrieb Gilles Chanteperdrix:
> On Tue, Jun 28, 2016 at 11:15:14AM +0200, Wolfgang Netbal wrote:
>>
>> Am 2016-06-28 um 10:34 schrieb Gilles Chanteperdrix:
>>> On Tue, Jun 28, 2016 at 10:31:00AM +0200, Wolfgang Netbal wrote:
>>>> Am 2016-06-27 um 18:46 schrieb Gilles Chanteperdrix:
>>>>> On Mon, Jun 27, 2016 at 05:55:12PM +0200, Wolfgang Netbal wrote:
>>>>>> Am 2016-06-07 um 19:00 schrieb Gilles Chanteperdrix:
>>>>>>> On Tue, Jun 07, 2016 at 04:13:07PM +0200, Wolfgang Netbal wrote:
>>>>>>>> Am 2016-06-06 um 17:35 schrieb Gilles Chanteperdrix:
>>>>>>>>> On Mon, Jun 06, 2016 at 09:03:40AM +0200, Wolfgang Netbal wrote:
>>>>>>>>>> Am 2016-06-02 um 10:23 schrieb Gilles Chanteperdrix:
>>>>>>>>>>> On Thu, Jun 02, 2016 at 10:15:41AM +0200, Wolfgang Netbal wrote:
>>>>>>>>>>>> Am 2016-06-01 um 16:12 schrieb Gilles Chanteperdrix:
>>>>>>>>>>>>> On Wed, Jun 01, 2016 at 03:52:06PM +0200, Wolfgang Netbal wrote:
>>>>>>>>>>>>>> Am 2016-05-31 um 16:16 schrieb Gilles Chanteperdrix:
>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 04:09:07PM +0200, Wolfgang Netbal wrote:
>>>>>>>>>>>>>>>> Dear all,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> we have moved our application from "XENOMAI 2.6.2.1 + Linux 3.0.43" to
>>>>>>>>>>>>>>>> "XENOMAI 2.6.4. + Linux 3.10.53". Our target is an i.MX6DL. The system
>>>>>>>>>>>>>>>> is now up and running and works stable. Unfortunately we see a
>>>>>>>>>>>>>>>> difference in the performance. Our old combination (XENOMAI 2.6.2.1 +
>>>>>>>>>>>>>>>> Linux 3.0.43) was slightly faster.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> At the moment it looks like that XENOMAI 2.6.4 calls
>>>>>>>>>>>>>>>> xnpod_schedule_handler much more often then XENOMAI 2.6.2.1 in our old
>>>>>>>>>>>>>>>> system.  Every call of xnpod_schedule_handler interrupts our main
>>>>>>>>>>>>>>>> XENOMAI task with priority = 95.
>>>>>>>> As I wrote above, I get interrupts 1037 handled by rthal_apc_handler()
>>>>>>>> and 1038 handled by xnpod_schedule_handler() while my realtime task
>>>>>>>> is running on kernel 3.10.53 with Xenomai 2.6.4.
>>>>>>>> On kernel 3.0.43 with Xenomai 2.6.4 there are no interrupts, except the
>>>>>>>> once that are send by my board using GPIOs, but this virtual interrupts
>>>>>>>> are assigned to Xenomai and Linux as well but I didn't see a handler
>>>>>>>> installed.
>>>>>>>> I'm pretty sure that these interrupts are slowing down my system, but
>>>>>>>> where do they come from ?
>>>>>>>> why didn't I see them on Kernel 3.0.43 with Xenomai 2.6.4 ?
>>>>>>>> how long do they need to process ?
>>>>>>> How do you mean you do not see them? If you are talking about the
>>>>>>> rescheduling API, it used no to be bound to a virq (so, it would
>>>>>>> have a different irq number on cortex A9, something between 0 and 31
>>>>>>> that would not show in the usual /proc files), I wonder if 3.0 is
>>>>>>> before or after that. You do not see them in /proc, or you see them
>>>>>>> and their count does not increase?
>>>>>> Sorry for the long delay, we ran a lot of tests to find out what could
>>>>>> be the reason for
>>>>>> the performance difference.
>>>>>>
>>>>>> If I call cat /proc/ipipe/Xenomai I dont see the IRQ handler assigned to
>>>>>> the virtual
>>>>>> IRQ on Kernel 3.0.43, but it looks like thats an issue of the Kernel
>>>>>>> As for where they come from, this is not a mystery, the reschedule
>>>>>>> IPI is triggered when code on one cpu changes the scheduler state
>>>>>>> (wakes up a thread for instance) on another cpu. If you want to
>>>>>>> avoid it, do not do that. That means, do not share mutex between
>>>>>>> threads running on different cpus, pay attention for timers to be
>>>>>>> running on the same cpu as the thread they signal, etc...
>>>>>>>
>>>>>>> The APC virq is used to multiplex several services, which you can
>>>>>>> find by grepping the sources for rthal_apc_alloc:
>>>>>>> ./ksrc/skins/posix/apc.c:       pse51_lostage_apc = rthal_apc_alloc("pse51_lostage_handler",
>>>>>>> ./ksrc/skins/rtdm/device.c:     rtdm_apc = rthal_apc_alloc("deferred RTDM close", rtdm_apc_handler,
>>>>>>> ./ksrc/nucleus/registry.c:          rthal_apc_alloc("registry_export", &registry_proc_schedule, NULL);
>>>>>>> ./ksrc/nucleus/pipe.c:      rthal_apc_alloc("pipe_wakeup", &xnpipe_wakeup_proc, NULL);
>>>>>>> ./ksrc/nucleus/shadow.c:            rthal_apc_alloc("lostage_handler", &lostage_handler, NULL);
>>>>>>> ./ksrc/nucleus/select.c:        xnselect_apc = rthal_apc_alloc("xnselectors_destroy",
>>>>>>>
>>>>>>> It would be interesting to know which of these services is triggered
>>>>>>> a lot. One possibility I see would be root thread priority
>>>>>>> inheritance, so it would be caused by mode switches. This brings the
>>>>>>> question: do your application have threads migrating between primary
>>>>>>> and secondary mode, do you see the count of mode switches increase
>>>>>>> with the kernel changes, do you have root thread priority
>>>>>>> inheritance enabled?
>>>>>>>
>>>>>> Here a short sum up of our tests and the results and at the end a few
>>>>>> questions :-)
>>>>>>
>>>>>> we are using a Freescale imx6dl on our hardware and upgraded our operating system from
>>>>>> Freescale Kernel 3.0.43 with Xenomai 2.6.2.1 and U-Boot 2013.04 as compiler we use GCC 4.7.2
>>>>>> Freescale Kernel 3.10.53 with Xenomai 2.6.4 and U-Boot 2016.01 as compiler we use GCC 4.8.2
>>>>>> On both Kernels the CONFIG_SMP is set.
>>>>>>
>>>>>> What we see is that when we running a customer project in a Xenomai task with priority 95
>>>>>> tooks 40% of the CPU time on Kernel 3.0.43
>>>>>> and 47% of CPU time on Kernel 3.10.53
>>>>>>
>>>>>> so the new system is slower by 7% if we sum up this to 100% CPU load we have a difference of 15%
>>>>>> To find out what is the reason for this difference we ran the following test.
>>>>>> We tried to get the new system faster by change some components of the system.
>>>>>>
>>>>>> -Changing U-Boot on new system                -> still 7% slower
>>>>>> -Copy Kernel 3.0.43 to new system            -> still 7% slower
>>>>>> -Creating Kernel 3.0.43 with
>>>>>>         Xenomai 2.6.4 and copy it to new system    -> still 7% slower
>>>>>> -Compiling the new system with
>>>>>>         old GCC version                                        -> still 7% slower
>>>>>> -We also checked the settings for RAM and CPU clock -> these are equal
>>>>>>
>>>>>> It looks like that is not one of the big components,
>>>>>> so we started to test some special functions like rt_timer_tsc()
>>>>>> In the following example we stay for 800µs in the while loop and
>>>>>> start this loop again after 200µs delay.
>>>>>> The task application running this code has priotity 95.
>>>>>>
>>>>>> Here a simplified code snipped
>>>>>> start = rt_timer_tsc();
>>>>>> do
>>>>>> {
>>>>>> 	current = rt_timer_tsc();
>>>>>> 	i++;	
>>>>>> } while((current - start) < 800)
>>>>> If your CPU is running at 1 GHz and uses the global timer as clock
>>>>> source, the clock source runs at 500 MHz, so 800 ticks of the tsc is
>>>>> something around 1.6 us
>>>> Sorry I simplified the code snippet a little bit to much.
>>>> Thats the correct code.
>>>>
>>>> current = rt_timer_tsc2ns(rt_timer_tsc());
>>>>
>>>>> So, I do not really understand what you are talking about. But are
>>>>> you sure the two kernels use the same clocksource for xenomai?
>>>>>
>>>>> Could you show us the result of "dmesg | grep I-pipe" with the two
>>>>> kernels ?
>>>> Output of Kernel 3.10.53 with Xenomai 2.6.4
>>>> I-pipe, 3.000 MHz clocksource
>>>> I-pipe, 396.000 MHz clocksource
>>>> I-pipe, 396.000 MHz timer
>>>> I-pipe, 396.000 MHz timer
>>>> I-pipe: head domain Xenomai registered.
>>>>
>>>> Output of Kernel 3.0.43 with Xenomai 2.6.2.1
>>>> [    0.000000] I-pipe 1.18-13: pipeline enabled.
>>>> [    0.331999] I-pipe, 396.000 MHz timer
>>>> [    0.335720] I-pipe, 396.000 MHz clocksource
>>>> [    0.844016] I-pipe: Domain Xenomai registered.
>>>>
>>>> The controller is a imx6dl, this controller can run maximum 800MHz
>>> Ok, so the new kernel registers two tsc emulations, could you run
>>> the "tsc" regression test to measure the tsc latency? The two tsc
>>> emulations have very different latencies, so the result would be
>>> unmistakable.
>>>
>> Output of Kernel 3.10.53 with Xenomai 2.6.4
>> /usr/xenomai/bin/latency
> This test is named "latency", not "tsc". As the different names
> imply, they are not measuring the same thing.
>
Sorry for the stupid question,
but where do I find the "tsc" test? It is not located in the
/usr/xenomai/bin/ folder.



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
  2016-06-28  9:28                               ` Wolfgang Netbal
@ 2016-06-28  9:29                                 ` Gilles Chanteperdrix
  2016-06-28  9:51                                   ` Wolfgang Netbal
  0 siblings, 1 reply; 36+ messages in thread
From: Gilles Chanteperdrix @ 2016-06-28  9:29 UTC (permalink / raw)
  To: Wolfgang Netbal; +Cc: xenomai

On Tue, Jun 28, 2016 at 11:28:19AM +0200, Wolfgang Netbal wrote:
> 
> 
> Am 2016-06-28 um 11:17 schrieb Gilles Chanteperdrix:
> > On Tue, Jun 28, 2016 at 11:15:14AM +0200, Wolfgang Netbal wrote:
> >>
> >> Am 2016-06-28 um 10:34 schrieb Gilles Chanteperdrix:
> >>> On Tue, Jun 28, 2016 at 10:31:00AM +0200, Wolfgang Netbal wrote:
> >>>> Am 2016-06-27 um 18:46 schrieb Gilles Chanteperdrix:
> >>>>> On Mon, Jun 27, 2016 at 05:55:12PM +0200, Wolfgang Netbal wrote:
> >>>>>> Am 2016-06-07 um 19:00 schrieb Gilles Chanteperdrix:
> >>>>>>> On Tue, Jun 07, 2016 at 04:13:07PM +0200, Wolfgang Netbal wrote:
> >>>>>>>> Am 2016-06-06 um 17:35 schrieb Gilles Chanteperdrix:
> >>>>>>>>> On Mon, Jun 06, 2016 at 09:03:40AM +0200, Wolfgang Netbal wrote:
> >>>>>>>>>> Am 2016-06-02 um 10:23 schrieb Gilles Chanteperdrix:
> >>>>>>>>>>> On Thu, Jun 02, 2016 at 10:15:41AM +0200, Wolfgang Netbal wrote:
> >>>>>>>>>>>> Am 2016-06-01 um 16:12 schrieb Gilles Chanteperdrix:
> >>>>>>>>>>>>> On Wed, Jun 01, 2016 at 03:52:06PM +0200, Wolfgang Netbal wrote:
> >>>>>>>>>>>>>> Am 2016-05-31 um 16:16 schrieb Gilles Chanteperdrix:
> >>>>>>>>>>>>>>> On Tue, May 31, 2016 at 04:09:07PM +0200, Wolfgang Netbal wrote:
> >>>>>>>>>>>>>>>> Dear all,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> we have moved our application from "XENOMAI 2.6.2.1 + Linux 3.0.43" to
> >>>>>>>>>>>>>>>> "XENOMAI 2.6.4. + Linux 3.10.53". Our target is an i.MX6DL. The system
> >>>>>>>>>>>>>>>> is now up and running and works stable. Unfortunately we see a
> >>>>>>>>>>>>>>>> difference in the performance. Our old combination (XENOMAI 2.6.2.1 +
> >>>>>>>>>>>>>>>> Linux 3.0.43) was slightly faster.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> At the moment it looks like that XENOMAI 2.6.4 calls
> >>>>>>>>>>>>>>>> xnpod_schedule_handler much more often then XENOMAI 2.6.2.1 in our old
> >>>>>>>>>>>>>>>> system.  Every call of xnpod_schedule_handler interrupts our main
> >>>>>>>>>>>>>>>> XENOMAI task with priority = 95.
> >>>>>>>> As I wrote above, I get interrupts 1037 handled by rthal_apc_handler()
> >>>>>>>> and 1038 handled by xnpod_schedule_handler() while my realtime task
> >>>>>>>> is running on kernel 3.10.53 with Xenomai 2.6.4.
> >>>>>>>> On kernel 3.0.43 with Xenomai 2.6.4 there are no interrupts, except the
> >>>>>>>> once that are send by my board using GPIOs, but this virtual interrupts
> >>>>>>>> are assigned to Xenomai and Linux as well but I didn't see a handler
> >>>>>>>> installed.
> >>>>>>>> I'm pretty sure that these interrupts are slowing down my system, but
> >>>>>>>> where do they come from ?
> >>>>>>>> why didn't I see them on Kernel 3.0.43 with Xenomai 2.6.4 ?
> >>>>>>>> how long do they need to process ?
> >>>>>>> How do you mean you do not see them? If you are talking about the
> >>>>>>> rescheduling API, it used no to be bound to a virq (so, it would
> >>>>>>> have a different irq number on cortex A9, something between 0 and 31
> >>>>>>> that would not show in the usual /proc files), I wonder if 3.0 is
> >>>>>>> before or after that. You do not see them in /proc, or you see them
> >>>>>>> and their count does not increase?
> >>>>>> Sorry for the long delay, we ran a lot of tests to find out what could
> >>>>>> be the reason for
> >>>>>> the performance difference.
> >>>>>>
> >>>>>> If I call cat /proc/ipipe/Xenomai I dont see the IRQ handler assigned to
> >>>>>> the virtual
> >>>>>> IRQ on Kernel 3.0.43, but it looks like thats an issue of the Kernel
> >>>>>>> As for where they come from, this is not a mystery, the reschedule
> >>>>>>> IPI is triggered when code on one cpu changes the scheduler state
> >>>>>>> (wakes up a thread for instance) on another cpu. If you want to
> >>>>>>> avoid it, do not do that. That means, do not share mutex between
> >>>>>>> threads running on different cpus, pay attention for timers to be
> >>>>>>> running on the same cpu as the thread they signal, etc...
> >>>>>>>
> >>>>>>> The APC virq is used to multiplex several services, which you can
> >>>>>>> find by grepping the sources for rthal_apc_alloc:
> >>>>>>> ./ksrc/skins/posix/apc.c:       pse51_lostage_apc = rthal_apc_alloc("pse51_lostage_handler",
> >>>>>>> ./ksrc/skins/rtdm/device.c:     rtdm_apc = rthal_apc_alloc("deferred RTDM close", rtdm_apc_handler,
> >>>>>>> ./ksrc/nucleus/registry.c:          rthal_apc_alloc("registry_export", &registry_proc_schedule, NULL);
> >>>>>>> ./ksrc/nucleus/pipe.c:      rthal_apc_alloc("pipe_wakeup", &xnpipe_wakeup_proc, NULL);
> >>>>>>> ./ksrc/nucleus/shadow.c:            rthal_apc_alloc("lostage_handler", &lostage_handler, NULL);
> >>>>>>> ./ksrc/nucleus/select.c:        xnselect_apc = rthal_apc_alloc("xnselectors_destroy",
> >>>>>>>
> >>>>>>> It would be interesting to know which of these services is triggered
> >>>>>>> a lot. One possibility I see would be root thread priority
> >>>>>>> inheritance, so it would be caused by mode switches. This brings the
> >>>>>>> question: do your application have threads migrating between primary
> >>>>>>> and secondary mode, do you see the count of mode switches increase
> >>>>>>> with the kernel changes, do you have root thread priority
> >>>>>>> inheritance enabled?
> >>>>>>>
> >>>>>> Here a short sum up of our tests and the results and at the end a few
> >>>>>> questions :-)
> >>>>>>
> >>>>>> we are using a Freescale imx6dl on our hardware and upgraded our operating system from
> >>>>>> Freescale Kernel 3.0.43 with Xenomai 2.6.2.1 and U-Boot 2013.04 as compiler we use GCC 4.7.2
> >>>>>> Freescale Kernel 3.10.53 with Xenomai 2.6.4 and U-Boot 2016.01 as compiler we use GCC 4.8.2
> >>>>>> On both Kernels the CONFIG_SMP is set.
> >>>>>>
> >>>>>> What we see is that when we running a customer project in a Xenomai task with priority 95
> >>>>>> tooks 40% of the CPU time on Kernel 3.0.43
> >>>>>> and 47% of CPU time on Kernel 3.10.53
> >>>>>>
> >>>>>> so the new system is slower by 7% if we sum up this to 100% CPU load we have a difference of 15%
> >>>>>> To find out what is the reason for this difference we ran the following test.
> >>>>>> We tried to get the new system faster by change some components of the system.
> >>>>>>
> >>>>>> -Changing U-Boot on new system                -> still 7% slower
> >>>>>> -Copy Kernel 3.0.43 to new system            -> still 7% slower
> >>>>>> -Creating Kernel 3.0.43 with
> >>>>>>         Xenomai 2.6.4 and copy it to new system    -> still 7% slower
> >>>>>> -Compiling the new system with
> >>>>>>         old GCC version                                        -> still 7% slower
> >>>>>> -We also checked the settings for RAM and CPU clock -> these are equal
> >>>>>>
> >>>>>> It looks like that is not one of the big components,
> >>>>>> so we started to test some special functions like rt_timer_tsc()
> >>>>>> In the following example we stay for 800µs in the while loop and
> >>>>>> start this loop again after 200µs delay.
> >>>>>> The task application running this code has priotity 95.
> >>>>>>
> >>>>>> Here a simplified code snipped
> >>>>>> start = rt_timer_tsc();
> >>>>>> do
> >>>>>> {
> >>>>>> 	current = rt_timer_tsc();
> >>>>>> 	i++;	
> >>>>>> } while((current - start) < 800)
> >>>>> If your CPU is running at 1 GHz and uses the global timer as clock
> >>>>> source, the clock source runs at 500 MHz, so 800 ticks of the tsc is
> >>>>> something around 1.6 us
> >>>> Sorry I simplified the code snippet a little bit to much.
> >>>> Thats the correct code.
> >>>>
> >>>> current = rt_timer_tsc2ns(rt_timer_tsc());
> >>>>
> >>>>> So, I do not really understand what you are talking about. But are
> >>>>> you sure the two kernels use the same clocksource for xenomai?
> >>>>>
> >>>>> Could you show us the result of "dmesg | grep I-pipe" with the two
> >>>>> kernels ?
> >>>> Output of Kernel 3.10.53 with Xenomai 2.6.4
> >>>> I-pipe, 3.000 MHz clocksource
> >>>> I-pipe, 396.000 MHz clocksource
> >>>> I-pipe, 396.000 MHz timer
> >>>> I-pipe, 396.000 MHz timer
> >>>> I-pipe: head domain Xenomai registered.
> >>>>
> >>>> Output of Kernel 3.0.43 with Xenomai 2.6.2.1
> >>>> [    0.000000] I-pipe 1.18-13: pipeline enabled.
> >>>> [    0.331999] I-pipe, 396.000 MHz timer
> >>>> [    0.335720] I-pipe, 396.000 MHz clocksource
> >>>> [    0.844016] I-pipe: Domain Xenomai registered.
> >>>>
> >>>> The controller is a imx6dl, this controller can run maximum 800MHz
> >>> Ok, so the new kernel registers two tsc emulations, could you run
> >>> the "tsc" regression test to measure the tsc latency? The two tsc
> >>> emulations have very different latencies, so the result would be
> >>> unmistakable.
> >>>
> >> Output of Kernel 3.10.53 with Xenomai 2.6.4
> >> /usr/xenomai/bin/latency
> > This test is named "latency", not "tsc". As the different names
> > imply, they are not measuring the same thing.
> >
> Sorry for the stupied question,
> but where do I find the "tsc" test, because in folder /usr/xenomai/bin/ 
> is it not located

You have millions of files on your target? Or did you compile busybox
without support for the "find" utility?

-- 
					    Gilles.
https://click-hack.org


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
  2016-06-28  9:29                                 ` Gilles Chanteperdrix
@ 2016-06-28  9:51                                   ` Wolfgang Netbal
  2016-06-28  9:55                                     ` Gilles Chanteperdrix
  0 siblings, 1 reply; 36+ messages in thread
From: Wolfgang Netbal @ 2016-06-28  9:51 UTC (permalink / raw)
  To: gilles.chanteperdrix; +Cc: xenomai



Am 2016-06-28 um 11:29 schrieb Gilles Chanteperdrix:
> On Tue, Jun 28, 2016 at 11:28:19AM +0200, Wolfgang Netbal wrote:
>>
>> Am 2016-06-28 um 11:17 schrieb Gilles Chanteperdrix:
>>> On Tue, Jun 28, 2016 at 11:15:14AM +0200, Wolfgang Netbal wrote:
>>>> Am 2016-06-28 um 10:34 schrieb Gilles Chanteperdrix:
>>>>> On Tue, Jun 28, 2016 at 10:31:00AM +0200, Wolfgang Netbal wrote:
>>>>>> Am 2016-06-27 um 18:46 schrieb Gilles Chanteperdrix:
>>>>>>> On Mon, Jun 27, 2016 at 05:55:12PM +0200, Wolfgang Netbal wrote:
>>>>>>>> Am 2016-06-07 um 19:00 schrieb Gilles Chanteperdrix:
>>>>>>>>> On Tue, Jun 07, 2016 at 04:13:07PM +0200, Wolfgang Netbal wrote:
>>>>>>>>>> Am 2016-06-06 um 17:35 schrieb Gilles Chanteperdrix:
>>>>>>>>>>> On Mon, Jun 06, 2016 at 09:03:40AM +0200, Wolfgang Netbal wrote:
>>>>>>>>>>>> Am 2016-06-02 um 10:23 schrieb Gilles Chanteperdrix:
>>>>>>>>>>>>> On Thu, Jun 02, 2016 at 10:15:41AM +0200, Wolfgang Netbal wrote:
>>>>>>>>>>>>>> Am 2016-06-01 um 16:12 schrieb Gilles Chanteperdrix:
>>>>>>>>>>>>>>> On Wed, Jun 01, 2016 at 03:52:06PM +0200, Wolfgang Netbal wrote:
>>>>>>>>>>>>>>>> Am 2016-05-31 um 16:16 schrieb Gilles Chanteperdrix:
>>>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 04:09:07PM +0200, Wolfgang Netbal wrote:
>>>>>>>>>>>>>>>>>> Dear all,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> we have moved our application from "XENOMAI 2.6.2.1 + Linux 3.0.43" to
>>>>>>>>>>>>>>>>>> "XENOMAI 2.6.4. + Linux 3.10.53". Our target is an i.MX6DL. The system
>>>>>>>>>>>>>>>>>> is now up and running and works stable. Unfortunately we see a
>>>>>>>>>>>>>>>>>> difference in the performance. Our old combination (XENOMAI 2.6.2.1 +
>>>>>>>>>>>>>>>>>> Linux 3.0.43) was slightly faster.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> At the moment it looks like that XENOMAI 2.6.4 calls
>>>>>>>>>>>>>>>>>> xnpod_schedule_handler much more often then XENOMAI 2.6.2.1 in our old
>>>>>>>>>>>>>>>>>> system.  Every call of xnpod_schedule_handler interrupts our main
>>>>>>>>>>>>>>>>>> XENOMAI task with priority = 95.
>>>>>>>>>> As I wrote above, I get interrupts 1037 handled by rthal_apc_handler()
>>>>>>>>>> and 1038 handled by xnpod_schedule_handler() while my realtime task
>>>>>>>>>> is running on kernel 3.10.53 with Xenomai 2.6.4.
>>>>>>>>>> On kernel 3.0.43 with Xenomai 2.6.4 there are no interrupts, except the
>>>>>>>>>> once that are send by my board using GPIOs, but this virtual interrupts
>>>>>>>>>> are assigned to Xenomai and Linux as well but I didn't see a handler
>>>>>>>>>> installed.
>>>>>>>>>> I'm pretty sure that these interrupts are slowing down my system, but
>>>>>>>>>> where do they come from ?
>>>>>>>>>> why didn't I see them on Kernel 3.0.43 with Xenomai 2.6.4 ?
>>>>>>>>>> how long do they need to process ?
>>>>>>>>> How do you mean you do not see them? If you are talking about the
>>>>>>>>> rescheduling API, it used no to be bound to a virq (so, it would
>>>>>>>>> have a different irq number on cortex A9, something between 0 and 31
>>>>>>>>> that would not show in the usual /proc files), I wonder if 3.0 is
>>>>>>>>> before or after that. You do not see them in /proc, or you see them
>>>>>>>>> and their count does not increase?
>>>>>>>> Sorry for the long delay, we ran a lot of tests to find out what could
>>>>>>>> be the reason for
>>>>>>>> the performance difference.
>>>>>>>>
>>>>>>>> If I call cat /proc/ipipe/Xenomai I dont see the IRQ handler assigned to
>>>>>>>> the virtual
>>>>>>>> IRQ on Kernel 3.0.43, but it looks like thats an issue of the Kernel
>>>>>>>>> As for where they come from, this is not a mystery, the reschedule
>>>>>>>>> IPI is triggered when code on one cpu changes the scheduler state
>>>>>>>>> (wakes up a thread for instance) on another cpu. If you want to
>>>>>>>>> avoid it, do not do that. That means, do not share mutex between
>>>>>>>>> threads running on different cpus, pay attention for timers to be
>>>>>>>>> running on the same cpu as the thread they signal, etc...
>>>>>>>>>
>>>>>>>>> The APC virq is used to multiplex several services, which you can
>>>>>>>>> find by grepping the sources for rthal_apc_alloc:
>>>>>>>>> ./ksrc/skins/posix/apc.c:       pse51_lostage_apc = rthal_apc_alloc("pse51_lostage_handler",
>>>>>>>>> ./ksrc/skins/rtdm/device.c:     rtdm_apc = rthal_apc_alloc("deferred RTDM close", rtdm_apc_handler,
>>>>>>>>> ./ksrc/nucleus/registry.c:          rthal_apc_alloc("registry_export", &registry_proc_schedule, NULL);
>>>>>>>>> ./ksrc/nucleus/pipe.c:      rthal_apc_alloc("pipe_wakeup", &xnpipe_wakeup_proc, NULL);
>>>>>>>>> ./ksrc/nucleus/shadow.c:            rthal_apc_alloc("lostage_handler", &lostage_handler, NULL);
>>>>>>>>> ./ksrc/nucleus/select.c:        xnselect_apc = rthal_apc_alloc("xnselectors_destroy",
>>>>>>>>>
>>>>>>>>> It would be interesting to know which of these services is triggered
>>>>>>>>> a lot. One possibility I see would be root thread priority
>>>>>>>>> inheritance, so it would be caused by mode switches. This brings the
>>>>>>>>> question: do your application have threads migrating between primary
>>>>>>>>> and secondary mode, do you see the count of mode switches increase
>>>>>>>>> with the kernel changes, do you have root thread priority
>>>>>>>>> inheritance enabled?
>>>>>>>>>
>>>>>>>> Here a short sum up of our tests and the results and at the end a few
>>>>>>>> questions :-)
>>>>>>>>
>>>>>>>> we are using a Freescale imx6dl on our hardware and upgraded our operating system from
>>>>>>>> Freescale Kernel 3.0.43 with Xenomai 2.6.2.1 and U-Boot 2013.04 as compiler we use GCC 4.7.2
>>>>>>>> Freescale Kernel 3.10.53 with Xenomai 2.6.4 and U-Boot 2016.01 as compiler we use GCC 4.8.2
>>>>>>>> On both Kernels the CONFIG_SMP is set.
>>>>>>>>
>>>>>>>> What we see is that when we running a customer project in a Xenomai task with priority 95
>>>>>>>> tooks 40% of the CPU time on Kernel 3.0.43
>>>>>>>> and 47% of CPU time on Kernel 3.10.53
>>>>>>>>
>>>>>>>> so the new system is slower by 7% if we sum up this to 100% CPU load we have a difference of 15%
>>>>>>>> To find out what is the reason for this difference we ran the following test.
>>>>>>>> We tried to get the new system faster by change some components of the system.
>>>>>>>>
>>>>>>>> -Changing U-Boot on new system                -> still 7% slower
>>>>>>>> -Copy Kernel 3.0.43 to new system            -> still 7% slower
>>>>>>>> -Creating Kernel 3.0.43 with
>>>>>>>>          Xenomai 2.6.4 and copy it to new system    -> still 7% slower
>>>>>>>> -Compiling the new system with
>>>>>>>>          old GCC version                                        -> still 7% slower
>>>>>>>> -We also checked the settings for RAM and CPU clock -> these are equal
>>>>>>>>
>>>>>>>> It looks like that is not one of the big components,
>>>>>>>> so we started to test some special functions like rt_timer_tsc()
>>>>>>>> In the following example we stay for 800µs in the while loop and
>>>>>>>> start this loop again after 200µs delay.
>>>>>>>> The task application running this code has priotity 95.
>>>>>>>>
>>>>>>>> Here a simplified code snipped
>>>>>>>> start = rt_timer_tsc();
>>>>>>>> do
>>>>>>>> {
>>>>>>>> 	current = rt_timer_tsc();
>>>>>>>> 	i++;	
>>>>>>>> } while((current - start) < 800)
>>>>>>> If your CPU is running at 1 GHz and uses the global timer as clock
>>>>>>> source, the clock source runs at 500 MHz, so 800 ticks of the tsc is
>>>>>>> something around 1.6 us
>>>>>> Sorry I simplified the code snippet a little bit to much.
>>>>>> Thats the correct code.
>>>>>>
>>>>>> current = rt_timer_tsc2ns(rt_timer_tsc());
>>>>>>
>>>>>>> So, I do not really understand what you are talking about. But are
>>>>>>> you sure the two kernels use the same clocksource for xenomai?
>>>>>>>
>>>>>>> Could you show us the result of "dmesg | grep I-pipe" with the two
>>>>>>> kernels ?
>>>>>> Output of Kernel 3.10.53 with Xenomai 2.6.4
>>>>>> I-pipe, 3.000 MHz clocksource
>>>>>> I-pipe, 396.000 MHz clocksource
>>>>>> I-pipe, 396.000 MHz timer
>>>>>> I-pipe, 396.000 MHz timer
>>>>>> I-pipe: head domain Xenomai registered.
>>>>>>
>>>>>> Output of Kernel 3.0.43 with Xenomai 2.6.2.1
>>>>>> [    0.000000] I-pipe 1.18-13: pipeline enabled.
>>>>>> [    0.331999] I-pipe, 396.000 MHz timer
>>>>>> [    0.335720] I-pipe, 396.000 MHz clocksource
>>>>>> [    0.844016] I-pipe: Domain Xenomai registered.
>>>>>>
>>>>>> The controller is a imx6dl, this controller can run maximum 800MHz
>>>>> Ok, so the new kernel registers two tsc emulations, could you run
>>>>> the "tsc" regression test to measure the tsc latency? The two tsc
>>>>> emulations have very different latencies, so the result would be
>>>>> unmistakable.
>>>>>
>>>> Output of Kernel 3.10.53 with Xenomai 2.6.4
>>>> /usr/xenomai/bin/latency
>>> This test is named "latency", not "tsc". As the different names
>>> imply, they are not measuring the same thing.
>>>
>> Sorry for the stupied question,
>> but where do I find the "tsc" test, because in folder /usr/xenomai/bin/
>> is it not located
> You have millions of files on your target? Or you compiled busybox
> without support for the "find" utility ?
>
Sorry, I searched for tsc before but didn't find any executable called "tsc";
what I found instead is /usr/share/ghostscript with its config files, and
/usr/bin/xtscal.

I did find the file tsc.c in the Xenomai sources, though. Could you please
tell me which option I have to enable so that it gets built?



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
  2016-06-28  9:51                                   ` Wolfgang Netbal
@ 2016-06-28  9:55                                     ` Gilles Chanteperdrix
  2016-06-28 10:10                                       ` Wolfgang Netbal
  0 siblings, 1 reply; 36+ messages in thread
From: Gilles Chanteperdrix @ 2016-06-28  9:55 UTC (permalink / raw)
  To: Wolfgang Netbal; +Cc: xenomai

On Tue, Jun 28, 2016 at 11:51:39AM +0200, Wolfgang Netbal wrote:
> 
> 
> Am 2016-06-28 um 11:29 schrieb Gilles Chanteperdrix:
> > On Tue, Jun 28, 2016 at 11:28:19AM +0200, Wolfgang Netbal wrote:
> >>
> >> Am 2016-06-28 um 11:17 schrieb Gilles Chanteperdrix:
> >>> On Tue, Jun 28, 2016 at 11:15:14AM +0200, Wolfgang Netbal wrote:
> >>>> Am 2016-06-28 um 10:34 schrieb Gilles Chanteperdrix:
> >>>>> On Tue, Jun 28, 2016 at 10:31:00AM +0200, Wolfgang Netbal wrote:
> >>>>>> Am 2016-06-27 um 18:46 schrieb Gilles Chanteperdrix:
> >>>>>>> On Mon, Jun 27, 2016 at 05:55:12PM +0200, Wolfgang Netbal wrote:
> >>>>>>>> Am 2016-06-07 um 19:00 schrieb Gilles Chanteperdrix:
> >>>>>>>>> On Tue, Jun 07, 2016 at 04:13:07PM +0200, Wolfgang Netbal wrote:
> >>>>>>>>>> Am 2016-06-06 um 17:35 schrieb Gilles Chanteperdrix:
> >>>>>>>>>>> On Mon, Jun 06, 2016 at 09:03:40AM +0200, Wolfgang Netbal wrote:
> >>>>>>>>>>>> Am 2016-06-02 um 10:23 schrieb Gilles Chanteperdrix:
> >>>>>>>>>>>>> On Thu, Jun 02, 2016 at 10:15:41AM +0200, Wolfgang Netbal wrote:
> >>>>>>>>>>>>>> Am 2016-06-01 um 16:12 schrieb Gilles Chanteperdrix:
> >>>>>>>>>>>>>>> On Wed, Jun 01, 2016 at 03:52:06PM +0200, Wolfgang Netbal wrote:
> >>>>>>>>>>>>>>>> Am 2016-05-31 um 16:16 schrieb Gilles Chanteperdrix:
> >>>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 04:09:07PM +0200, Wolfgang Netbal wrote:
> >>>>>>>>>>>>>>>>>> Dear all,
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> we have moved our application from "XENOMAI 2.6.2.1 + Linux 3.0.43" to
> >>>>>>>>>>>>>>>>>> "XENOMAI 2.6.4. + Linux 3.10.53". Our target is an i.MX6DL. The system
> >>>>>>>>>>>>>>>>>> is now up and running and works stable. Unfortunately we see a
> >>>>>>>>>>>>>>>>>> difference in the performance. Our old combination (XENOMAI 2.6.2.1 +
> >>>>>>>>>>>>>>>>>> Linux 3.0.43) was slightly faster.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> At the moment it looks like that XENOMAI 2.6.4 calls
> >>>>>>>>>>>>>>>>>> xnpod_schedule_handler much more often then XENOMAI 2.6.2.1 in our old
> >>>>>>>>>>>>>>>>>> system.  Every call of xnpod_schedule_handler interrupts our main
> >>>>>>>>>>>>>>>>>> XENOMAI task with priority = 95.
> >>>>>>>>>> As I wrote above, I get interrupts 1037 handled by rthal_apc_handler()
> >>>>>>>>>> and 1038 handled by xnpod_schedule_handler() while my realtime task
> >>>>>>>>>> is running on kernel 3.10.53 with Xenomai 2.6.4.
> >>>>>>>>>> On kernel 3.0.43 with Xenomai 2.6.4 there are no interrupts, except the
> >>>>>>>>>> once that are send by my board using GPIOs, but this virtual interrupts
> >>>>>>>>>> are assigned to Xenomai and Linux as well but I didn't see a handler
> >>>>>>>>>> installed.
> >>>>>>>>>> I'm pretty sure that these interrupts are slowing down my system, but
> >>>>>>>>>> where do they come from ?
> >>>>>>>>>> why didn't I see them on Kernel 3.0.43 with Xenomai 2.6.4 ?
> >>>>>>>>>> how long do they need to process ?
> >>>>>>>>> How do you mean you do not see them? If you are talking about the
> >>>>>>>>> rescheduling API, it used no to be bound to a virq (so, it would
> >>>>>>>>> have a different irq number on cortex A9, something between 0 and 31
> >>>>>>>>> that would not show in the usual /proc files), I wonder if 3.0 is
> >>>>>>>>> before or after that. You do not see them in /proc, or you see them
> >>>>>>>>> and their count does not increase?
> >>>>>>>> Sorry for the long delay, we ran a lot of tests to find out what could
> >>>>>>>> be the reason for
> >>>>>>>> the performance difference.
> >>>>>>>>
> >>>>>>>> If I call cat /proc/ipipe/Xenomai I dont see the IRQ handler assigned to
> >>>>>>>> the virtual
> >>>>>>>> IRQ on Kernel 3.0.43, but it looks like thats an issue of the Kernel
> >>>>>>>>> As for where they come from, this is not a mystery, the reschedule
> >>>>>>>>> IPI is triggered when code on one cpu changes the scheduler state
> >>>>>>>>> (wakes up a thread for instance) on another cpu. If you want to
> >>>>>>>>> avoid it, do not do that. That means, do not share mutex between
> >>>>>>>>> threads running on different cpus, pay attention for timers to be
> >>>>>>>>> running on the same cpu as the thread they signal, etc...
> >>>>>>>>>
> >>>>>>>>> The APC virq is used to multiplex several services, which you can
> >>>>>>>>> find by grepping the sources for rthal_apc_alloc:
> >>>>>>>>> ./ksrc/skins/posix/apc.c:       pse51_lostage_apc = rthal_apc_alloc("pse51_lostage_handler",
> >>>>>>>>> ./ksrc/skins/rtdm/device.c:     rtdm_apc = rthal_apc_alloc("deferred RTDM close", rtdm_apc_handler,
> >>>>>>>>> ./ksrc/nucleus/registry.c:          rthal_apc_alloc("registry_export", &registry_proc_schedule, NULL);
> >>>>>>>>> ./ksrc/nucleus/pipe.c:      rthal_apc_alloc("pipe_wakeup", &xnpipe_wakeup_proc, NULL);
> >>>>>>>>> ./ksrc/nucleus/shadow.c:            rthal_apc_alloc("lostage_handler", &lostage_handler, NULL);
> >>>>>>>>> ./ksrc/nucleus/select.c:        xnselect_apc = rthal_apc_alloc("xnselectors_destroy",
> >>>>>>>>>
> >>>>>>>>> It would be interesting to know which of these services is triggered
> >>>>>>>>> a lot. One possibility I see would be root thread priority
> >>>>>>>>> inheritance, so it would be caused by mode switches. This brings the
> >>>>>>>>> question: do your application have threads migrating between primary
> >>>>>>>>> and secondary mode, do you see the count of mode switches increase
> >>>>>>>>> with the kernel changes, do you have root thread priority
> >>>>>>>>> inheritance enabled?
> >>>>>>>>>
> >>>>>>>> Here a short sum up of our tests and the results and at the end a few
> >>>>>>>> questions :-)
> >>>>>>>>
> >>>>>>>> we are using a Freescale imx6dl on our hardware and upgraded our operating system from
> >>>>>>>> Freescale Kernel 3.0.43 with Xenomai 2.6.2.1 and U-Boot 2013.04 as compiler we use GCC 4.7.2
> >>>>>>>> Freescale Kernel 3.10.53 with Xenomai 2.6.4 and U-Boot 2016.01 as compiler we use GCC 4.8.2
> >>>>>>>> On both Kernels the CONFIG_SMP is set.
> >>>>>>>>
> >>>>>>>> What we see is that when we running a customer project in a Xenomai task with priority 95
> >>>>>>>> tooks 40% of the CPU time on Kernel 3.0.43
> >>>>>>>> and 47% of CPU time on Kernel 3.10.53
> >>>>>>>>
> >>>>>>>> so the new system is slower by 7% if we sum up this to 100% CPU load we have a difference of 15%
> >>>>>>>> To find out what is the reason for this difference we ran the following test.
> >>>>>>>> We tried to get the new system faster by change some components of the system.
> >>>>>>>>
> >>>>>>>> -Changing U-Boot on new system                -> still 7% slower
> >>>>>>>> -Copy Kernel 3.0.43 to new system            -> still 7% slower
> >>>>>>>> -Creating Kernel 3.0.43 with
> >>>>>>>>          Xenomai 2.6.4 and copy it to new system    -> still 7% slower
> >>>>>>>> -Compiling the new system with
> >>>>>>>>          old GCC version                                        -> still 7% slower
> >>>>>>>> -We also checked the settings for RAM and CPU clock -> these are equal
> >>>>>>>>
> >>>>>>>> It looks like that is not one of the big components,
> >>>>>>>> so we started to test some special functions like rt_timer_tsc()
> >>>>>>>> In the following example we stay for 800µs in the while loop and
> >>>>>>>> start this loop again after 200µs delay.
> >>>>>>>> The task application running this code has priotity 95.
> >>>>>>>>
> >>>>>>>> Here a simplified code snipped
> >>>>>>>> start = rt_timer_tsc();
> >>>>>>>> do
> >>>>>>>> {
> >>>>>>>> 	current = rt_timer_tsc();
> >>>>>>>> 	i++;	
> >>>>>>>> } while((current - start) < 800)
> >>>>>>> If your CPU is running at 1 GHz and uses the global timer as clock
> >>>>>>> source, the clock source runs at 500 MHz, so 800 ticks of the tsc is
> >>>>>>> something around 1.6 us
> >>>>>> Sorry I simplified the code snippet a little bit to much.
> >>>>>> Thats the correct code.
> >>>>>>
> >>>>>> current = rt_timer_tsc2ns(rt_timer_tsc());
> >>>>>>
> >>>>>>> So, I do not really understand what you are talking about. But are
> >>>>>>> you sure the two kernels use the same clocksource for xenomai?
> >>>>>>>
> >>>>>>> Could you show us the result of "dmesg | grep I-pipe" with the two
> >>>>>>> kernels ?
> >>>>>> Output of Kernel 3.10.53 with Xenomai 2.6.4
> >>>>>> I-pipe, 3.000 MHz clocksource
> >>>>>> I-pipe, 396.000 MHz clocksource
> >>>>>> I-pipe, 396.000 MHz timer
> >>>>>> I-pipe, 396.000 MHz timer
> >>>>>> I-pipe: head domain Xenomai registered.
> >>>>>>
> >>>>>> Output of Kernel 3.0.43 with Xenomai 2.6.2.1
> >>>>>> [    0.000000] I-pipe 1.18-13: pipeline enabled.
> >>>>>> [    0.331999] I-pipe, 396.000 MHz timer
> >>>>>> [    0.335720] I-pipe, 396.000 MHz clocksource
> >>>>>> [    0.844016] I-pipe: Domain Xenomai registered.
> >>>>>>
> >>>>>> The controller is a imx6dl, this controller can run maximum 800MHz
> >>>>> Ok, so the new kernel registers two tsc emulations, could you run
> >>>>> the "tsc" regression test to measure the tsc latency? The two tsc
> >>>>> emulations have very different latencies, so the result would be
> >>>>> unmistakable.
> >>>>>
> >>>> Output of Kernel 3.10.53 with Xenomai 2.6.4
> >>>> /usr/xenomai/bin/latency
> >>> This test is named "latency", not "tsc". As the different names
> >>> imply, they are not measuring the same thing.
> >>>
> >> Sorry for the stupied question,
> >> but where do I find the "tsc" test, because in folder /usr/xenomai/bin/
> >> is it not located
> > You have millions of files on your target? Or you compiled busybox
> > without support for the "find" utility ?
> >
> Sorry I searched for tsc befor but didn't find any executable called "tsc"
> what I find is /usr/share/ghostscript and its config files and 
> /usr/bin/xtscal.
> 
> I find in the xenomai sources the file tsc.c, could you please tell me
> what option do I have to enable that this will be build ?

There is no option that can disable its compilation/installation as
far as I know.

It is normally built and installed under
@XENO_TEST_DIR@/regression/native

So the installation directory depends on the value you passed to the
configure --with-testdir option (the default being /usr/bin).
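
For example (the paths below are purely illustrative; substitute whatever
test dir you actually configured):

  # on the build machine, picking /usr/xenomai/bin as the test dir:
  ./configure --with-testdir=/usr/xenomai/bin ...
  # after installing, on the target the test should then be found at:
  /usr/xenomai/bin/regression/native/tsc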

-- 
					    Gilles.
https://click-hack.org


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
  2016-06-28  9:55                                     ` Gilles Chanteperdrix
@ 2016-06-28 10:10                                       ` Wolfgang Netbal
  2016-06-28 10:19                                         ` Gilles Chanteperdrix
  0 siblings, 1 reply; 36+ messages in thread
From: Wolfgang Netbal @ 2016-06-28 10:10 UTC (permalink / raw)
  To: gilles.chanteperdrix; +Cc: xenomai



Am 2016-06-28 um 11:55 schrieb Gilles Chanteperdrix:
> On Tue, Jun 28, 2016 at 11:51:39AM +0200, Wolfgang Netbal wrote:
>>
>> Am 2016-06-28 um 11:29 schrieb Gilles Chanteperdrix:
>>> On Tue, Jun 28, 2016 at 11:28:19AM +0200, Wolfgang Netbal wrote:
>>>> Am 2016-06-28 um 11:17 schrieb Gilles Chanteperdrix:
>>>>> On Tue, Jun 28, 2016 at 11:15:14AM +0200, Wolfgang Netbal wrote:
>>>>>> Am 2016-06-28 um 10:34 schrieb Gilles Chanteperdrix:
>>>>>>> On Tue, Jun 28, 2016 at 10:31:00AM +0200, Wolfgang Netbal wrote:
>>>>>>>> Am 2016-06-27 um 18:46 schrieb Gilles Chanteperdrix:
>>>>>>>>> On Mon, Jun 27, 2016 at 05:55:12PM +0200, Wolfgang Netbal wrote:
>>>>>>>>>> Am 2016-06-07 um 19:00 schrieb Gilles Chanteperdrix:
>>>>>>>>>>> On Tue, Jun 07, 2016 at 04:13:07PM +0200, Wolfgang Netbal wrote:
>>>>>>>>>>>> Am 2016-06-06 um 17:35 schrieb Gilles Chanteperdrix:
>>>>>>>>>>>>> On Mon, Jun 06, 2016 at 09:03:40AM +0200, Wolfgang Netbal wrote:
>>>>>>>>>>>>>> Am 2016-06-02 um 10:23 schrieb Gilles Chanteperdrix:
>>>>>>>>>>>>>>> On Thu, Jun 02, 2016 at 10:15:41AM +0200, Wolfgang Netbal wrote:
>>>>>>>>>>>>>>>> Am 2016-06-01 um 16:12 schrieb Gilles Chanteperdrix:
>>>>>>>>>>>>>>>>> On Wed, Jun 01, 2016 at 03:52:06PM +0200, Wolfgang Netbal wrote:
>>>>>>>>>>>>>>>>>> Am 2016-05-31 um 16:16 schrieb Gilles Chanteperdrix:
>>>>>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 04:09:07PM +0200, Wolfgang Netbal wrote:
>>>>>>>>>>>>>>>>>>>> Dear all,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> we have moved our application from "XENOMAI 2.6.2.1 + Linux 3.0.43" to
>>>>>>>>>>>>>>>>>>>> "XENOMAI 2.6.4. + Linux 3.10.53". Our target is an i.MX6DL. The system
>>>>>>>>>>>>>>>>>>>> is now up and running and works stable. Unfortunately we see a
>>>>>>>>>>>>>>>>>>>> difference in the performance. Our old combination (XENOMAI 2.6.2.1 +
>>>>>>>>>>>>>>>>>>>> Linux 3.0.43) was slightly faster.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> At the moment it looks like that XENOMAI 2.6.4 calls
>>>>>>>>>>>>>>>>>>>> xnpod_schedule_handler much more often then XENOMAI 2.6.2.1 in our old
>>>>>>>>>>>>>>>>>>>> system.  Every call of xnpod_schedule_handler interrupts our main
>>>>>>>>>>>>>>>>>>>> XENOMAI task with priority = 95.
>>>>>>>>>>>> As I wrote above, I get interrupts 1037 handled by rthal_apc_handler()
>>>>>>>>>>>> and 1038 handled by xnpod_schedule_handler() while my realtime task
>>>>>>>>>>>> is running on kernel 3.10.53 with Xenomai 2.6.4.
>>>>>>>>>>>> On kernel 3.0.43 with Xenomai 2.6.4 there are no interrupts, except the
>>>>>>>>>>>> once that are send by my board using GPIOs, but this virtual interrupts
>>>>>>>>>>>> are assigned to Xenomai and Linux as well but I didn't see a handler
>>>>>>>>>>>> installed.
>>>>>>>>>>>> I'm pretty sure that these interrupts are slowing down my system, but
>>>>>>>>>>>> where do they come from ?
>>>>>>>>>>>> why didn't I see them on Kernel 3.0.43 with Xenomai 2.6.4 ?
>>>>>>>>>>>> how long do they need to process ?
>>>>>>>>>>> How do you mean you do not see them? If you are talking about the
>>>>>>>>>>> rescheduling API, it used no to be bound to a virq (so, it would
>>>>>>>>>>> have a different irq number on cortex A9, something between 0 and 31
>>>>>>>>>>> that would not show in the usual /proc files), I wonder if 3.0 is
>>>>>>>>>>> before or after that. You do not see them in /proc, or you see them
>>>>>>>>>>> and their count does not increase?
>>>>>>>>>> Sorry for the long delay, we ran a lot of tests to find out what could
>>>>>>>>>> be the reason for
>>>>>>>>>> the performance difference.
>>>>>>>>>>
>>>>>>>>>> If I call cat /proc/ipipe/Xenomai I dont see the IRQ handler assigned to
>>>>>>>>>> the virtual
>>>>>>>>>> IRQ on Kernel 3.0.43, but it looks like thats an issue of the Kernel
>>>>>>>>>>> As for where they come from, this is not a mystery, the reschedule
>>>>>>>>>>> IPI is triggered when code on one cpu changes the scheduler state
>>>>>>>>>>> (wakes up a thread for instance) on another cpu. If you want to
>>>>>>>>>>> avoid it, do not do that. That means, do not share mutex between
>>>>>>>>>>> threads running on different cpus, pay attention for timers to be
>>>>>>>>>>> running on the same cpu as the thread they signal, etc...
>>>>>>>>>>>
>>>>>>>>>>> The APC virq is used to multiplex several services, which you can
>>>>>>>>>>> find by grepping the sources for rthal_apc_alloc:
>>>>>>>>>>> ./ksrc/skins/posix/apc.c:       pse51_lostage_apc = rthal_apc_alloc("pse51_lostage_handler",
>>>>>>>>>>> ./ksrc/skins/rtdm/device.c:     rtdm_apc = rthal_apc_alloc("deferred RTDM close", rtdm_apc_handler,
>>>>>>>>>>> ./ksrc/nucleus/registry.c:          rthal_apc_alloc("registry_export", &registry_proc_schedule, NULL);
>>>>>>>>>>> ./ksrc/nucleus/pipe.c:      rthal_apc_alloc("pipe_wakeup", &xnpipe_wakeup_proc, NULL);
>>>>>>>>>>> ./ksrc/nucleus/shadow.c:            rthal_apc_alloc("lostage_handler", &lostage_handler, NULL);
>>>>>>>>>>> ./ksrc/nucleus/select.c:        xnselect_apc = rthal_apc_alloc("xnselectors_destroy",
>>>>>>>>>>>
>>>>>>>>>>> It would be interesting to know which of these services is triggered
>>>>>>>>>>> a lot. One possibility I see would be root thread priority
>>>>>>>>>>> inheritance, so it would be caused by mode switches. This brings the
>>>>>>>>>>> question: do your application have threads migrating between primary
>>>>>>>>>>> and secondary mode, do you see the count of mode switches increase
>>>>>>>>>>> with the kernel changes, do you have root thread priority
>>>>>>>>>>> inheritance enabled?
>>>>>>>>>>>
>>>>>>>>>> Here a short sum up of our tests and the results and at the end a few
>>>>>>>>>> questions :-)
>>>>>>>>>>
>>>>>>>>>> we are using a Freescale imx6dl on our hardware and upgraded our operating system from
>>>>>>>>>> Freescale Kernel 3.0.43 with Xenomai 2.6.2.1 and U-Boot 2013.04 as compiler we use GCC 4.7.2
>>>>>>>>>> Freescale Kernel 3.10.53 with Xenomai 2.6.4 and U-Boot 2016.01 as compiler we use GCC 4.8.2
>>>>>>>>>> On both Kernels the CONFIG_SMP is set.
>>>>>>>>>>
>>>>>>>>>> What we see is that when we running a customer project in a Xenomai task with priority 95
>>>>>>>>>> tooks 40% of the CPU time on Kernel 3.0.43
>>>>>>>>>> and 47% of CPU time on Kernel 3.10.53
>>>>>>>>>>
>>>>>>>>>> so the new system is slower by 7% if we sum up this to 100% CPU load we have a difference of 15%
>>>>>>>>>> To find out what is the reason for this difference we ran the following test.
>>>>>>>>>> We tried to get the new system faster by change some components of the system.
>>>>>>>>>>
>>>>>>>>>> -Changing U-Boot on new system                -> still 7% slower
>>>>>>>>>> -Copy Kernel 3.0.43 to new system            -> still 7% slower
>>>>>>>>>> -Creating Kernel 3.0.43 with
>>>>>>>>>>           Xenomai 2.6.4 and copy it to new system    -> still 7% slower
>>>>>>>>>> -Compiling the new system with
>>>>>>>>>>           old GCC version                                        -> still 7% slower
>>>>>>>>>> -We also checked the settings for RAM and CPU clock -> these are equal
>>>>>>>>>>
>>>>>>>>>> It looks like that is not one of the big components,
>>>>>>>>>> so we started to test some special functions like rt_timer_tsc()
>>>>>>>>>> In the following example we stay for 800µs in the while loop and
>>>>>>>>>> start this loop again after 200µs delay.
>>>>>>>>>> The task application running this code has priotity 95.
>>>>>>>>>>
>>>>>>>>>> Here a simplified code snipped
>>>>>>>>>> start = rt_timer_tsc();
>>>>>>>>>> do
>>>>>>>>>> {
>>>>>>>>>> 	current = rt_timer_tsc();
>>>>>>>>>> 	i++;	
>>>>>>>>>> } while((current - start) < 800)
>>>>>>>>> If your CPU is running at 1 GHz and uses the global timer as clock
>>>>>>>>> source, the clock source runs at 500 MHz, so 800 ticks of the tsc is
>>>>>>>>> something around 1.6 us
>>>>>>>> Sorry I simplified the code snippet a little bit to much.
>>>>>>>> Thats the correct code.
>>>>>>>>
>>>>>>>> current = rt_timer_tsc2ns(rt_timer_tsc());
>>>>>>>>
>>>>>>>>> So, I do not really understand what you are talking about. But are
>>>>>>>>> you sure the two kernels use the same clocksource for xenomai?
>>>>>>>>>
>>>>>>>>> Could you show us the result of "dmesg | grep I-pipe" with the two
>>>>>>>>> kernels ?
>>>>>>>> Output of Kernel 3.10.53 with Xenomai 2.6.4
>>>>>>>> I-pipe, 3.000 MHz clocksource
>>>>>>>> I-pipe, 396.000 MHz clocksource
>>>>>>>> I-pipe, 396.000 MHz timer
>>>>>>>> I-pipe, 396.000 MHz timer
>>>>>>>> I-pipe: head domain Xenomai registered.
>>>>>>>>
>>>>>>>> Output of Kernel 3.0.43 with Xenomai 2.6.2.1
>>>>>>>> [    0.000000] I-pipe 1.18-13: pipeline enabled.
>>>>>>>> [    0.331999] I-pipe, 396.000 MHz timer
>>>>>>>> [    0.335720] I-pipe, 396.000 MHz clocksource
>>>>>>>> [    0.844016] I-pipe: Domain Xenomai registered.
>>>>>>>>
>>>>>>>> The controller is a imx6dl, this controller can run maximum 800MHz
>>>>>>> Ok, so the new kernel registers two tsc emulations, could you run
>>>>>>> the "tsc" regression test to measure the tsc latency? The two tsc
>>>>>>> emulations have very different latencies, so the result would be
>>>>>>> unmistakable.
>>>>>>>
>>>>>> Output of Kernel 3.10.53 with Xenomai 2.6.4
>>>>>> /usr/xenomai/bin/latency
>>>>> This test is named "latency", not "tsc". As the different names
>>>>> imply, they are not measuring the same thing.
>>>>>
>>>> Sorry for the stupied question,
>>>> but where do I find the "tsc" test, because in folder /usr/xenomai/bin/
>>>> is it not located
>>> You have millions of files on your target? Or you compiled busybox
>>> without support for the "find" utility ?
>>>
>> Sorry I searched for tsc befor but didn't find any executable called "tsc"
>> what I find is /usr/share/ghostscript and its config files and
>> /usr/bin/xtscal.
>>
>> I find in the xenomai sources the file tsc.c, could you please tell me
>> what option do I have to enable that this will be build ?
> There is no option that can disable its compilation/installation as
> far as I know.
>
> It is normally built and installed under
> @XENO_TEST_DIR@/regression/native
>
> So the installation directory depends on the value you passed to
> configure --with-testdir option (the default being /usr/bin).
>
Thanks a lot, I found it and downloaded it to my target.

Here is the output for Kernel 3.10.53 and Xenomai 2.6.4:

#> ./tsc
Checking tsc for 1 minute(s)
min: 10, max: 596, avg: 10.5056
min: 10, max: 595, avg: 10.5053
min: 10, max: 603, avg: 10.5053
min: 10, max: 630, avg: 10.5052
min: 10, max: 600, avg: 10.505
min: 10, max: 595, avg: 10.5056
min: 10, max: 562, avg: 10.505
min: 10, max: 605, avg: 10.5056
min: 10, max: 602, avg: 10.5055
min: 10, max: 595, avg: 10.5052


Here is the output for Kernel 3.0.43 and Xenomai 2.6.2.1:
#> ./tsc
Checking tsc for 1 minute(s)
min: 10, max: 611, avg: 11.5499
min: 10, max: 608, avg: 11.5443
min: 10, max: 81, avg: 11.5265
min: 10, max: 53, avg: 11.5155
min: 10, max: 152, avg: 11.5239
min: 10, max: 618, avg: 11.5352
min: 10, max: 588, avg: 11.5398
min: 10, max: 81, avg: 11.5269
min: 10, max: 532, avg: 11.5541
min: 10, max: 80, avg: 11.5394




^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
  2016-06-28 10:10                                       ` Wolfgang Netbal
@ 2016-06-28 10:19                                         ` Gilles Chanteperdrix
  2016-06-28 10:31                                           ` Wolfgang Netbal
  0 siblings, 1 reply; 36+ messages in thread
From: Gilles Chanteperdrix @ 2016-06-28 10:19 UTC (permalink / raw)
  To: Wolfgang Netbal; +Cc: xenomai

On Tue, Jun 28, 2016 at 12:10:06PM +0200, Wolfgang Netbal wrote:
> 
> 
> Am 2016-06-28 um 11:55 schrieb Gilles Chanteperdrix:
> > On Tue, Jun 28, 2016 at 11:51:39AM +0200, Wolfgang Netbal wrote:
> >>
> >> Am 2016-06-28 um 11:29 schrieb Gilles Chanteperdrix:
> >>> On Tue, Jun 28, 2016 at 11:28:19AM +0200, Wolfgang Netbal wrote:
> >>>> Am 2016-06-28 um 11:17 schrieb Gilles Chanteperdrix:
> >>>>> On Tue, Jun 28, 2016 at 11:15:14AM +0200, Wolfgang Netbal wrote:
> >>>>>> Am 2016-06-28 um 10:34 schrieb Gilles Chanteperdrix:
> >>>>>>> On Tue, Jun 28, 2016 at 10:31:00AM +0200, Wolfgang Netbal wrote:
> >>>>>>>> Am 2016-06-27 um 18:46 schrieb Gilles Chanteperdrix:
> >>>>>>>>> On Mon, Jun 27, 2016 at 05:55:12PM +0200, Wolfgang Netbal wrote:
> >>>>>>>>>> Am 2016-06-07 um 19:00 schrieb Gilles Chanteperdrix:
> >>>>>>>>>>> On Tue, Jun 07, 2016 at 04:13:07PM +0200, Wolfgang Netbal wrote:
> >>>>>>>>>>>> Am 2016-06-06 um 17:35 schrieb Gilles Chanteperdrix:
> >>>>>>>>>>>>> On Mon, Jun 06, 2016 at 09:03:40AM +0200, Wolfgang Netbal wrote:
> >>>>>>>>>>>>>> Am 2016-06-02 um 10:23 schrieb Gilles Chanteperdrix:
> >>>>>>>>>>>>>>> On Thu, Jun 02, 2016 at 10:15:41AM +0200, Wolfgang Netbal wrote:
> >>>>>>>>>>>>>>>> Am 2016-06-01 um 16:12 schrieb Gilles Chanteperdrix:
> >>>>>>>>>>>>>>>>> On Wed, Jun 01, 2016 at 03:52:06PM +0200, Wolfgang Netbal wrote:
> >>>>>>>>>>>>>>>>>> Am 2016-05-31 um 16:16 schrieb Gilles Chanteperdrix:
> >>>>>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 04:09:07PM +0200, Wolfgang Netbal wrote:
> >>>>>>>>>>>>>>>>>>>> Dear all,
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> we have moved our application from "XENOMAI 2.6.2.1 + Linux 3.0.43" to
> >>>>>>>>>>>>>>>>>>>> "XENOMAI 2.6.4. + Linux 3.10.53". Our target is an i.MX6DL. The system
> >>>>>>>>>>>>>>>>>>>> is now up and running and works stable. Unfortunately we see a
> >>>>>>>>>>>>>>>>>>>> difference in the performance. Our old combination (XENOMAI 2.6.2.1 +
> >>>>>>>>>>>>>>>>>>>> Linux 3.0.43) was slightly faster.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> At the moment it looks like that XENOMAI 2.6.4 calls
> >>>>>>>>>>>>>>>>>>>> xnpod_schedule_handler much more often then XENOMAI 2.6.2.1 in our old
> >>>>>>>>>>>>>>>>>>>> system.  Every call of xnpod_schedule_handler interrupts our main
> >>>>>>>>>>>>>>>>>>>> XENOMAI task with priority = 95.
> >>>>>>>>>>>> As I wrote above, I get interrupts 1037 handled by rthal_apc_handler()
> >>>>>>>>>>>> and 1038 handled by xnpod_schedule_handler() while my realtime task
> >>>>>>>>>>>> is running on kernel 3.10.53 with Xenomai 2.6.4.
> >>>>>>>>>>>> On kernel 3.0.43 with Xenomai 2.6.4 there are no interrupts, except the
> >>>>>>>>>>>> once that are send by my board using GPIOs, but this virtual interrupts
> >>>>>>>>>>>> are assigned to Xenomai and Linux as well but I didn't see a handler
> >>>>>>>>>>>> installed.
> >>>>>>>>>>>> I'm pretty sure that these interrupts are slowing down my system, but
> >>>>>>>>>>>> where do they come from ?
> >>>>>>>>>>>> why didn't I see them on Kernel 3.0.43 with Xenomai 2.6.4 ?
> >>>>>>>>>>>> how long do they need to process ?
> >>>>>>>>>>> How do you mean you do not see them? If you are talking about the
> >>>>>>>>>>> rescheduling API, it used no to be bound to a virq (so, it would
> >>>>>>>>>>> have a different irq number on cortex A9, something between 0 and 31
> >>>>>>>>>>> that would not show in the usual /proc files), I wonder if 3.0 is
> >>>>>>>>>>> before or after that. You do not see them in /proc, or you see them
> >>>>>>>>>>> and their count does not increase?
> >>>>>>>>>> Sorry for the long delay, we ran a lot of tests to find out what could
> >>>>>>>>>> be the reason for
> >>>>>>>>>> the performance difference.
> >>>>>>>>>>
> >>>>>>>>>> If I call cat /proc/ipipe/Xenomai I dont see the IRQ handler assigned to
> >>>>>>>>>> the virtual
> >>>>>>>>>> IRQ on Kernel 3.0.43, but it looks like thats an issue of the Kernel
> >>>>>>>>>>> As for where they come from, this is not a mystery, the reschedule
> >>>>>>>>>>> IPI is triggered when code on one cpu changes the scheduler state
> >>>>>>>>>>> (wakes up a thread for instance) on another cpu. If you want to
> >>>>>>>>>>> avoid it, do not do that. That means, do not share mutex between
> >>>>>>>>>>> threads running on different cpus, pay attention for timers to be
> >>>>>>>>>>> running on the same cpu as the thread they signal, etc...
> >>>>>>>>>>>
> >>>>>>>>>>> The APC virq is used to multiplex several services, which you can
> >>>>>>>>>>> find by grepping the sources for rthal_apc_alloc:
> >>>>>>>>>>> ./ksrc/skins/posix/apc.c:       pse51_lostage_apc = rthal_apc_alloc("pse51_lostage_handler",
> >>>>>>>>>>> ./ksrc/skins/rtdm/device.c:     rtdm_apc = rthal_apc_alloc("deferred RTDM close", rtdm_apc_handler,
> >>>>>>>>>>> ./ksrc/nucleus/registry.c:          rthal_apc_alloc("registry_export", &registry_proc_schedule, NULL);
> >>>>>>>>>>> ./ksrc/nucleus/pipe.c:      rthal_apc_alloc("pipe_wakeup", &xnpipe_wakeup_proc, NULL);
> >>>>>>>>>>> ./ksrc/nucleus/shadow.c:            rthal_apc_alloc("lostage_handler", &lostage_handler, NULL);
> >>>>>>>>>>> ./ksrc/nucleus/select.c:        xnselect_apc = rthal_apc_alloc("xnselectors_destroy",
> >>>>>>>>>>>
> >>>>>>>>>>> It would be interesting to know which of these services is triggered
> >>>>>>>>>>> a lot. One possibility I see would be root thread priority
> >>>>>>>>>>> inheritance, so it would be caused by mode switches. This brings the
> >>>>>>>>>>> question: do your application have threads migrating between primary
> >>>>>>>>>>> and secondary mode, do you see the count of mode switches increase
> >>>>>>>>>>> with the kernel changes, do you have root thread priority
> >>>>>>>>>>> inheritance enabled?
> >>>>>>>>>>>
> >>>>>>>>>> Here a short sum up of our tests and the results and at the end a few
> >>>>>>>>>> questions :-)
> >>>>>>>>>>
> >>>>>>>>>> we are using a Freescale imx6dl on our hardware and upgraded our operating system from
> >>>>>>>>>> Freescale Kernel 3.0.43 with Xenomai 2.6.2.1 and U-Boot 2013.04 as compiler we use GCC 4.7.2
> >>>>>>>>>> Freescale Kernel 3.10.53 with Xenomai 2.6.4 and U-Boot 2016.01 as compiler we use GCC 4.8.2
> >>>>>>>>>> On both Kernels the CONFIG_SMP is set.
> >>>>>>>>>>
> >>>>>>>>>> What we see is that when we running a customer project in a Xenomai task with priority 95
> >>>>>>>>>> tooks 40% of the CPU time on Kernel 3.0.43
> >>>>>>>>>> and 47% of CPU time on Kernel 3.10.53
> >>>>>>>>>>
> >>>>>>>>>> so the new system is slower by 7% if we sum up this to 100% CPU load we have a difference of 15%
> >>>>>>>>>> To find out what is the reason for this difference we ran the following test.
> >>>>>>>>>> We tried to get the new system faster by change some components of the system.
> >>>>>>>>>>
> >>>>>>>>>> -Changing U-Boot on new system                -> still 7% slower
> >>>>>>>>>> -Copy Kernel 3.0.43 to new system            -> still 7% slower
> >>>>>>>>>> -Creating Kernel 3.0.43 with
> >>>>>>>>>>           Xenomai 2.6.4 and copy it to new system    -> still 7% slower
> >>>>>>>>>> -Compiling the new system with
> >>>>>>>>>>           old GCC version                                        -> still 7% slower
> >>>>>>>>>> -We also checked the settings for RAM and CPU clock -> these are equal
> >>>>>>>>>>
> >>>>>>>>>> It looks like that is not one of the big components,
> >>>>>>>>>> so we started to test some special functions like rt_timer_tsc()
> >>>>>>>>>> In the following example we stay for 800µs in the while loop and
> >>>>>>>>>> start this loop again after 200µs delay.
> >>>>>>>>>> The task application running this code has priotity 95.
> >>>>>>>>>>
> >>>>>>>>>> Here a simplified code snipped
> >>>>>>>>>> start = rt_timer_tsc();
> >>>>>>>>>> do
> >>>>>>>>>> {
> >>>>>>>>>> 	current = rt_timer_tsc();
> >>>>>>>>>> 	i++;	
> >>>>>>>>>> } while((current - start) < 800)
> >>>>>>>>> If your CPU is running at 1 GHz and uses the global timer as clock
> >>>>>>>>> source, the clock source runs at 500 MHz, so 800 ticks of the tsc is
> >>>>>>>>> something around 1.6 us
> >>>>>>>> Sorry I simplified the code snippet a little bit to much.
> >>>>>>>> Thats the correct code.
> >>>>>>>>
> >>>>>>>> current = rt_timer_tsc2ns(rt_timer_tsc());
> >>>>>>>>
> >>>>>>>>> So, I do not really understand what you are talking about. But are
> >>>>>>>>> you sure the two kernels use the same clocksource for xenomai?
> >>>>>>>>>
> >>>>>>>>> Could you show us the result of "dmesg | grep I-pipe" with the two
> >>>>>>>>> kernels ?
> >>>>>>>> Output of Kernel 3.10.53 with Xenomai 2.6.4
> >>>>>>>> I-pipe, 3.000 MHz clocksource
> >>>>>>>> I-pipe, 396.000 MHz clocksource
> >>>>>>>> I-pipe, 396.000 MHz timer
> >>>>>>>> I-pipe, 396.000 MHz timer
> >>>>>>>> I-pipe: head domain Xenomai registered.
> >>>>>>>>
> >>>>>>>> Output of Kernel 3.0.43 with Xenomai 2.6.2.1
> >>>>>>>> [    0.000000] I-pipe 1.18-13: pipeline enabled.
> >>>>>>>> [    0.331999] I-pipe, 396.000 MHz timer
> >>>>>>>> [    0.335720] I-pipe, 396.000 MHz clocksource
> >>>>>>>> [    0.844016] I-pipe: Domain Xenomai registered.
> >>>>>>>>
> >>>>>>>> The controller is a imx6dl, this controller can run maximum 800MHz
> >>>>>>> Ok, so the new kernel registers two tsc emulations, could you run
> >>>>>>> the "tsc" regression test to measure the tsc latency? The two tsc
> >>>>>>> emulations have very different latencies, so the result would be
> >>>>>>> unmistakable.
> >>>>>>>
> >>>>>> Output of Kernel 3.10.53 with Xenomai 2.6.4
> >>>>>> /usr/xenomai/bin/latency
> >>>>> This test is named "latency", not "tsc". As the different names
> >>>>> imply, they are not measuring the same thing.
> >>>>>
> >>>> Sorry for the stupied question,
> >>>> but where do I find the "tsc" test, because in folder /usr/xenomai/bin/
> >>>> is it not located
> >>> You have millions of files on your target? Or you compiled busybox
> >>> without support for the "find" utility ?
> >>>
> >> Sorry I searched for tsc befor but didn't find any executable called "tsc"
> >> what I find is /usr/share/ghostscript and its config files and
> >> /usr/bin/xtscal.
> >>
> >> I find in the xenomai sources the file tsc.c, could you please tell me
> >> what option do I have to enable that this will be build ?
> > There is no option that can disable its compilation/installation as
> > far as I know.
> >
> > It is normally built and installed under
> > @XENO_TEST_DIR@/regression/native
> >
> > So the installation directory depends on the value you passed to
> > configure --with-testdir option (the default being /usr/bin).
> >
> Thanks a lot I found it and downloaded it to my target
> 
> Here are the output for Kernel 3.10.53 and Xenomai 2.6.4
> 
> #> ./tsc
> Checking tsc for 1 minute(s)
> min: 10, max: 596, avg: 10.5056
> min: 10, max: 595, avg: 10.5053
> min: 10, max: 603, avg: 10.5053
> min: 10, max: 630, avg: 10.5052
> min: 10, max: 600, avg: 10.505
> min: 10, max: 595, avg: 10.5056
> min: 10, max: 562, avg: 10.505
> min: 10, max: 605, avg: 10.5056
> min: 10, max: 602, avg: 10.5055
> min: 10, max: 595, avg: 10.5052

Could you let the test run until the end? (1 minute is not so long).

-- 
					    Gilles.
https://click-hack.org


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
  2016-06-28 10:19                                         ` Gilles Chanteperdrix
@ 2016-06-28 10:31                                           ` Wolfgang Netbal
  2016-06-28 10:39                                             ` Gilles Chanteperdrix
  0 siblings, 1 reply; 36+ messages in thread
From: Wolfgang Netbal @ 2016-06-28 10:31 UTC (permalink / raw)
  To: gilles.chanteperdrix; +Cc: xenomai



Am 2016-06-28 um 12:19 schrieb Gilles Chanteperdrix:
> On Tue, Jun 28, 2016 at 12:10:06PM +0200, Wolfgang Netbal wrote:
>>
>> Am 2016-06-28 um 11:55 schrieb Gilles Chanteperdrix:
>>> On Tue, Jun 28, 2016 at 11:51:39AM +0200, Wolfgang Netbal wrote:
>>>> Am 2016-06-28 um 11:29 schrieb Gilles Chanteperdrix:
>>>>> On Tue, Jun 28, 2016 at 11:28:19AM +0200, Wolfgang Netbal wrote:
>>>>>> Am 2016-06-28 um 11:17 schrieb Gilles Chanteperdrix:
>>>>>>> On Tue, Jun 28, 2016 at 11:15:14AM +0200, Wolfgang Netbal wrote:
>>>>>>>> Am 2016-06-28 um 10:34 schrieb Gilles Chanteperdrix:
>>>>>>>>> On Tue, Jun 28, 2016 at 10:31:00AM +0200, Wolfgang Netbal wrote:
>>>>>>>>>> Am 2016-06-27 um 18:46 schrieb Gilles Chanteperdrix:
>>>>>>>>>>> On Mon, Jun 27, 2016 at 05:55:12PM +0200, Wolfgang Netbal wrote:
>>>>>>>>>>>> Am 2016-06-07 um 19:00 schrieb Gilles Chanteperdrix:
>>>>>>>>>>>>> On Tue, Jun 07, 2016 at 04:13:07PM +0200, Wolfgang Netbal wrote:
>>>>>>>>>>>>>> Am 2016-06-06 um 17:35 schrieb Gilles Chanteperdrix:
>>>>>>>>>>>>>>> On Mon, Jun 06, 2016 at 09:03:40AM +0200, Wolfgang Netbal wrote:
>>>>>>>>>>>>>>>> Am 2016-06-02 um 10:23 schrieb Gilles Chanteperdrix:
>>>>>>>>>>>>>>>>> On Thu, Jun 02, 2016 at 10:15:41AM +0200, Wolfgang Netbal wrote:
>>>>>>>>>>>>>>>>>> Am 2016-06-01 um 16:12 schrieb Gilles Chanteperdrix:
>>>>>>>>>>>>>>>>>>> On Wed, Jun 01, 2016 at 03:52:06PM +0200, Wolfgang Netbal wrote:
>>>>>>>>>>>>>>>>>>>> Am 2016-05-31 um 16:16 schrieb Gilles Chanteperdrix:
>>>>>>>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 04:09:07PM +0200, Wolfgang Netbal wrote:
>>>>>>>>>>>>>>>>>>>>>> Dear all,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> we have moved our application from "XENOMAI 2.6.2.1 + Linux 3.0.43" to
>>>>>>>>>>>>>>>>>>>>>> "XENOMAI 2.6.4. + Linux 3.10.53". Our target is an i.MX6DL. The system
>>>>>>>>>>>>>>>>>>>>>> is now up and running and works stable. Unfortunately we see a
>>>>>>>>>>>>>>>>>>>>>> difference in the performance. Our old combination (XENOMAI 2.6.2.1 +
>>>>>>>>>>>>>>>>>>>>>> Linux 3.0.43) was slightly faster.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> At the moment it looks like that XENOMAI 2.6.4 calls
>>>>>>>>>>>>>>>>>>>>>> xnpod_schedule_handler much more often then XENOMAI 2.6.2.1 in our old
>>>>>>>>>>>>>>>>>>>>>> system.  Every call of xnpod_schedule_handler interrupts our main
>>>>>>>>>>>>>>>>>>>>>> XENOMAI task with priority = 95.
>>>>>>>>>>>>>> As I wrote above, I get interrupts 1037 handled by rthal_apc_handler()
>>>>>>>>>>>>>> and 1038 handled by xnpod_schedule_handler() while my realtime task
>>>>>>>>>>>>>> is running on kernel 3.10.53 with Xenomai 2.6.4.
>>>>>>>>>>>>>> On kernel 3.0.43 with Xenomai 2.6.4 there are no interrupts, except the
>>>>>>>>>>>>>> once that are send by my board using GPIOs, but this virtual interrupts
>>>>>>>>>>>>>> are assigned to Xenomai and Linux as well but I didn't see a handler
>>>>>>>>>>>>>> installed.
>>>>>>>>>>>>>> I'm pretty sure that these interrupts are slowing down my system, but
>>>>>>>>>>>>>> where do they come from ?
>>>>>>>>>>>>>> why didn't I see them on Kernel 3.0.43 with Xenomai 2.6.4 ?
>>>>>>>>>>>>>> how long do they need to process ?
>>>>>>>>>>>>> How do you mean you do not see them? If you are talking about the
>>>>>>>>>>>>> rescheduling API, it used no to be bound to a virq (so, it would
>>>>>>>>>>>>> have a different irq number on cortex A9, something between 0 and 31
>>>>>>>>>>>>> that would not show in the usual /proc files), I wonder if 3.0 is
>>>>>>>>>>>>> before or after that. You do not see them in /proc, or you see them
>>>>>>>>>>>>> and their count does not increase?
>>>>>>>>>>>> Sorry for the long delay, we ran a lot of tests to find out what could
>>>>>>>>>>>> be the reason for
>>>>>>>>>>>> the performance difference.
>>>>>>>>>>>>
>>>>>>>>>>>> If I call cat /proc/ipipe/Xenomai I dont see the IRQ handler assigned to
>>>>>>>>>>>> the virtual
>>>>>>>>>>>> IRQ on Kernel 3.0.43, but it looks like thats an issue of the Kernel
>>>>>>>>>>>>> As for where they come from, this is not a mystery, the reschedule
>>>>>>>>>>>>> IPI is triggered when code on one cpu changes the scheduler state
>>>>>>>>>>>>> (wakes up a thread for instance) on another cpu. If you want to
>>>>>>>>>>>>> avoid it, do not do that. That means, do not share mutex between
>>>>>>>>>>>>> threads running on different cpus, pay attention for timers to be
>>>>>>>>>>>>> running on the same cpu as the thread they signal, etc...
>>>>>>>>>>>>>
>>>>>>>>>>>>> The APC virq is used to multiplex several services, which you can
>>>>>>>>>>>>> find by grepping the sources for rthal_apc_alloc:
>>>>>>>>>>>>> ./ksrc/skins/posix/apc.c:       pse51_lostage_apc = rthal_apc_alloc("pse51_lostage_handler",
>>>>>>>>>>>>> ./ksrc/skins/rtdm/device.c:     rtdm_apc = rthal_apc_alloc("deferred RTDM close", rtdm_apc_handler,
>>>>>>>>>>>>> ./ksrc/nucleus/registry.c:          rthal_apc_alloc("registry_export", &registry_proc_schedule, NULL);
>>>>>>>>>>>>> ./ksrc/nucleus/pipe.c:      rthal_apc_alloc("pipe_wakeup", &xnpipe_wakeup_proc, NULL);
>>>>>>>>>>>>> ./ksrc/nucleus/shadow.c:            rthal_apc_alloc("lostage_handler", &lostage_handler, NULL);
>>>>>>>>>>>>> ./ksrc/nucleus/select.c:        xnselect_apc = rthal_apc_alloc("xnselectors_destroy",
>>>>>>>>>>>>>
>>>>>>>>>>>>> It would be interesting to know which of these services is triggered
>>>>>>>>>>>>> a lot. One possibility I see would be root thread priority
>>>>>>>>>>>>> inheritance, so it would be caused by mode switches. This brings the
>>>>>>>>>>>>> question: do your application have threads migrating between primary
>>>>>>>>>>>>> and secondary mode, do you see the count of mode switches increase
>>>>>>>>>>>>> with the kernel changes, do you have root thread priority
>>>>>>>>>>>>> inheritance enabled?
>>>>>>>>>>>>>
>>>>>>>>>>>> Here a short sum up of our tests and the results and at the end a few
>>>>>>>>>>>> questions :-)
>>>>>>>>>>>>
>>>>>>>>>>>> we are using a Freescale imx6dl on our hardware and upgraded our operating system from
>>>>>>>>>>>> Freescale Kernel 3.0.43 with Xenomai 2.6.2.1 and U-Boot 2013.04 as compiler we use GCC 4.7.2
>>>>>>>>>>>> Freescale Kernel 3.10.53 with Xenomai 2.6.4 and U-Boot 2016.01 as compiler we use GCC 4.8.2
>>>>>>>>>>>> On both Kernels the CONFIG_SMP is set.
>>>>>>>>>>>>
>>>>>>>>>>>> What we see is that when we running a customer project in a Xenomai task with priority 95
>>>>>>>>>>>> tooks 40% of the CPU time on Kernel 3.0.43
>>>>>>>>>>>> and 47% of CPU time on Kernel 3.10.53
>>>>>>>>>>>>
>>>>>>>>>>>> so the new system is slower by 7% if we sum up this to 100% CPU load we have a difference of 15%
>>>>>>>>>>>> To find out what is the reason for this difference we ran the following test.
>>>>>>>>>>>> We tried to get the new system faster by change some components of the system.
>>>>>>>>>>>>
>>>>>>>>>>>> -Changing U-Boot on new system                -> still 7% slower
>>>>>>>>>>>> -Copy Kernel 3.0.43 to new system            -> still 7% slower
>>>>>>>>>>>> -Creating Kernel 3.0.43 with
>>>>>>>>>>>>            Xenomai 2.6.4 and copy it to new system    -> still 7% slower
>>>>>>>>>>>> -Compiling the new system with
>>>>>>>>>>>>            old GCC version                                        -> still 7% slower
>>>>>>>>>>>> -We also checked the settings for RAM and CPU clock -> these are equal
>>>>>>>>>>>>
>>>>>>>>>>>> It looks like that is not one of the big components,
>>>>>>>>>>>> so we started to test some special functions like rt_timer_tsc()
>>>>>>>>>>>> In the following example we stay for 800µs in the while loop and
>>>>>>>>>>>> start this loop again after 200µs delay.
>>>>>>>>>>>> The task application running this code has priotity 95.
>>>>>>>>>>>>
>>>>>>>>>>>> Here a simplified code snipped
>>>>>>>>>>>> start = rt_timer_tsc();
>>>>>>>>>>>> do
>>>>>>>>>>>> {
>>>>>>>>>>>> 	current = rt_timer_tsc();
>>>>>>>>>>>> 	i++;	
>>>>>>>>>>>> } while((current - start) < 800)
>>>>>>>>>>> If your CPU is running at 1 GHz and uses the global timer as clock
>>>>>>>>>>> source, the clock source runs at 500 MHz, so 800 ticks of the tsc is
>>>>>>>>>>> something around 1.6 us
>>>>>>>>>> Sorry I simplified the code snippet a little bit to much.
>>>>>>>>>> Thats the correct code.
>>>>>>>>>>
>>>>>>>>>> current = rt_timer_tsc2ns(rt_timer_tsc());
>>>>>>>>>>
>>>>>>>>>>> So, I do not really understand what you are talking about. But are
>>>>>>>>>>> you sure the two kernels use the same clocksource for xenomai?
>>>>>>>>>>>
>>>>>>>>>>> Could you show us the result of "dmesg | grep I-pipe" with the two
>>>>>>>>>>> kernels ?
>>>>>>>>>> Output of Kernel 3.10.53 with Xenomai 2.6.4
>>>>>>>>>> I-pipe, 3.000 MHz clocksource
>>>>>>>>>> I-pipe, 396.000 MHz clocksource
>>>>>>>>>> I-pipe, 396.000 MHz timer
>>>>>>>>>> I-pipe, 396.000 MHz timer
>>>>>>>>>> I-pipe: head domain Xenomai registered.
>>>>>>>>>>
>>>>>>>>>> Output of Kernel 3.0.43 with Xenomai 2.6.2.1
>>>>>>>>>> [    0.000000] I-pipe 1.18-13: pipeline enabled.
>>>>>>>>>> [    0.331999] I-pipe, 396.000 MHz timer
>>>>>>>>>> [    0.335720] I-pipe, 396.000 MHz clocksource
>>>>>>>>>> [    0.844016] I-pipe: Domain Xenomai registered.
>>>>>>>>>>
>>>>>>>>>> The controller is a imx6dl, this controller can run maximum 800MHz
>>>>>>>>> Ok, so the new kernel registers two tsc emulations, could you run
>>>>>>>>> the "tsc" regression test to measure the tsc latency? The two tsc
>>>>>>>>> emulations have very different latencies, so the result would be
>>>>>>>>> unmistakable.
>>>>>>>>>
>>>>>>>> Output of Kernel 3.10.53 with Xenomai 2.6.4
>>>>>>>> /usr/xenomai/bin/latency
>>>>>>> This test is named "latency", not "tsc". As the different names
>>>>>>> imply, they are not measuring the same thing.
>>>>>>>
>>>>>> Sorry for the stupied question,
>>>>>> but where do I find the "tsc" test, because in folder /usr/xenomai/bin/
>>>>>> is it not located
>>>>> You have millions of files on your target? Or you compiled busybox
>>>>> without support for the "find" utility ?
>>>>>
>>>> Sorry I searched for tsc befor but didn't find any executable called "tsc"
>>>> what I find is /usr/share/ghostscript and its config files and
>>>> /usr/bin/xtscal.
>>>>
>>>> I find in the xenomai sources the file tsc.c, could you please tell me
>>>> what option do I have to enable that this will be build ?
>>> There is no option that can disable its compilation/installation as
>>> far as I know.
>>>
>>> It is normally built and installed under
>>> @XENO_TEST_DIR@/regression/native
>>>
>>> So the installation directory depends on the value you passed to
>>> configure --with-testdir option (the default being /usr/bin).
>>>
>> Thanks a lot I found it and downloaded it to my target
>>
>> Here are the output for Kernel 3.10.53 and Xenomai 2.6.4
>>
>> #> ./tsc
>> Checking tsc for 1 minute(s)
>> min: 10, max: 596, avg: 10.5056
>> min: 10, max: 595, avg: 10.5053
>> min: 10, max: 603, avg: 10.5053
>> min: 10, max: 630, avg: 10.5052
>> min: 10, max: 600, avg: 10.505
>> min: 10, max: 595, avg: 10.5056
>> min: 10, max: 562, avg: 10.505
>> min: 10, max: 605, avg: 10.5056
>> min: 10, max: 602, avg: 10.5055
>> min: 10, max: 595, avg: 10.5052
> Could you let the test run until the end ? (1 minute is not so long).
>
Sorry, I didn't want to overload the mail.

Here is the output for Kernel 3.10.53 and Xenomai 2.6.4:


#> ./tsc
Checking tsc for 1 minute(s)
min: 10, max: 595, avg: 10.5055
min: 10, max: 596, avg: 10.5048
min: 10, max: 595, avg: 10.5048
min: 10, max: 563, avg: 10.5047
min: 10, max: 594, avg: 10.505
min: 10, max: 652, avg: 10.5047
min: 10, max: 604, avg: 10.505
min: 10, max: 595, avg: 10.5045
min: 10, max: 595, avg: 10.5046
min: 10, max: 594, avg: 10.5046
min: 10, max: 582, avg: 10.5045
min: 10, max: 599, avg: 10.5046
min: 10, max: 379, avg: 10.5046
min: 10, max: 571, avg: 10.5046
min: 10, max: 534, avg: 10.5051
min: 10, max: 591, avg: 10.5046
min: 10, max: 595, avg: 10.5048
min: 10, max: 630, avg: 10.5049
min: 10, max: 534, avg: 10.5047
min: 10, max: 564, avg: 10.5048
min: 10, max: 520, avg: 10.5049
min: 10, max: 255, avg: 10.5047
min: 10, max: 594, avg: 10.5047
min: 10, max: 677, avg: 10.505
min: 10, max: 128, avg: 10.5053
min: 10, max: 530, avg: 10.5046
min: 10, max: 595, avg: 10.5047
min: 10, max: 595, avg: 10.5048
min: 10, max: 121, avg: 10.5044
min: 10, max: 595, avg: 10.5053
min: 10, max: 598, avg: 10.505
min: 10, max: 595, avg: 10.505
min: 10, max: 595, avg: 10.505
min: 10, max: 598, avg: 10.5047
min: 10, max: 143, avg: 10.5058
min: 10, max: 125, avg: 10.505
min: 10, max: 202, avg: 10.5047
min: 10, max: 595, avg: 10.5047
min: 10, max: 295, avg: 10.5049
min: 10, max: 156, avg: 10.5047
min: 10, max: 546, avg: 10.5049
min: 10, max: 547, avg: 10.5048
min: 10, max: 153, avg: 10.5048
min: 10, max: 116, avg: 10.5046
min: 10, max: 591, avg: 10.5052
min: 10, max: 594, avg: 10.5046
min: 10, max: 563, avg: 10.5048
min: 10, max: 101, avg: 10.5046
min: 10, max: 117, avg: 10.5048
min: 10, max: 608, avg: 10.505
min: 10, max: 593, avg: 10.5048
min: 10, max: 617, avg: 10.5047
min: 10, max: 596, avg: 10.505
min: 10, max: 529, avg: 10.5048
min: 10, max: 557, avg: 10.5052
min: 10, max: 104, avg: 10.5047
min: 10, max: 595, avg: 10.5048
min: 10, max: 595, avg: 10.5046
min: 10, max: 115, avg: 10.5045
min: 10, max: 196, avg: 10.5046
min: 10, max: 677, avg: 10.5048 -> 0.0265273 us

Here is the output for kernel 3.0.43 and Xenomai 2.6.2.1:

#> ./tsc
Checking tsc for 1 minute(s)
min: 10, max: 141, avg: 11.5371
min: 10, max: 181, avg: 11.5377
min: 10, max: 512, avg: 11.5444
min: 10, max: 93, avg: 11.5335
min: 10, max: 578, avg: 11.5581
min: 10, max: 81, avg: 11.5838
min: 10, max: 290, avg: 11.5455
min: 10, max: 105, avg: 11.5441
min: 10, max: 543, avg: 11.5648
min: 10, max: 111, avg: 11.5602
min: 10, max: 96, avg: 11.5607
min: 10, max: 77, avg: 11.5758
min: 10, max: 122, avg: 11.5601
min: 10, max: 78, avg: 11.5494
min: 10, max: 142, avg: 11.581
min: 10, max: 95, avg: 11.5513
min: 10, max: 142, avg: 11.5825
min: 10, max: 580, avg: 11.5829
min: 10, max: 104, avg: 11.5656
min: 10, max: 77, avg: 11.5833
min: 10, max: 667, avg: 11.5623
min: 10, max: 91, avg: 11.5742
min: 10, max: 656, avg: 11.5742
min: 10, max: 78, avg: 11.5866
min: 10, max: 428, avg: 11.5958
min: 10, max: 77, avg: 11.5945
min: 10, max: 81, avg: 11.5948
min: 10, max: 96, avg: 11.5787
min: 10, max: 85, avg: 11.5975
min: 10, max: 122, avg: 11.6049
min: 10, max: 516, avg: 11.5776
min: 10, max: 95, avg: 11.597
min: 10, max: 550, avg: 11.5868
min: 10, max: 147, avg: 11.6125
min: 10, max: 582, avg: 11.611
min: 10, max: 541, avg: 11.6105
min: 10, max: 154, avg: 11.5942
min: 10, max: 94, avg: 11.6217
min: 10, max: 74, avg: 11.6066
min: 10, max: 91, avg: 11.653
min: 10, max: 539, avg: 11.6248
min: 10, max: 166, avg: 11.5915
min: 10, max: 81, avg: 11.6156
min: 10, max: 111, avg: 11.6099
min: 10, max: 578, avg: 11.6348
min: 10, max: 95, avg: 11.6329
min: 10, max: 640, avg: 11.629
min: 10, max: 597, avg: 11.6336
min: 10, max: 95, avg: 11.615
min: 10, max: 555, avg: 11.5757
min: 10, max: 94, avg: 11.5749
min: 10, max: 530, avg: 11.5542
min: 10, max: 82, avg: 11.5346
min: 10, max: 119, avg: 11.5466
min: 10, max: 94, avg: 11.5169
min: 10, max: 173, avg: 11.5044
min: 10, max: 147, avg: 11.5309
min: 10, max: 196, avg: 11.4833
min: 10, max: 542, avg: 11.4979
min: 10, max: 95, avg: 11.4895
min: 10, max: 667, avg: 11.5755 -> 0.029231 us




^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
  2016-06-28 10:31                                           ` Wolfgang Netbal
@ 2016-06-28 10:39                                             ` Gilles Chanteperdrix
  2016-06-28 11:45                                               ` Wolfgang Netbal
  2016-06-28 11:55                                               ` Wolfgang Netbal
  0 siblings, 2 replies; 36+ messages in thread
From: Gilles Chanteperdrix @ 2016-06-28 10:39 UTC (permalink / raw)
  To: Wolfgang Netbal; +Cc: xenomai

On Tue, Jun 28, 2016 at 12:31:42PM +0200, Wolfgang Netbal wrote:
> 
> 
> Am 2016-06-28 um 12:19 schrieb Gilles Chanteperdrix:
> min: 10, max: 677, avg: 10.5048 -> 0.0265273 us
> 
> Here are the output for Kernel 3.0.43 and Xenomai 2.6.2.1
> 
> #> ./tsc
> min: 10, max: 667, avg: 11.5755 -> 0.029231 us

Ok. So, first, it confirms that the two configurations are running
the processor at the same frequency. But we seem to see a pattern:
the maxima in the case of the new kernel are consistently higher,
which would suggest that there is some difference in the cache. What
is the status of the two configurations with regard to the L2 cache
write allocate policy? Could you show us the tsc results of Xenomai
2.6.4 with the 3.0 kernel?
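
For reference, the L2 (PL310) configuration can be read back at run
time from user space, e.g. with the busybox devmem applet, assuming the
controller sits at its usual i.MX6 base address of 0x00A02000
(auxiliary control at offset 0x104, prefetch control at offset 0xF60):

#> devmem 0x00A02104 32
#> devmem 0x00A02F60 32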

-- 
					    Gilles.
https://click-hack.org


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
  2016-06-28 10:39                                             ` Gilles Chanteperdrix
@ 2016-06-28 11:45                                               ` Wolfgang Netbal
  2016-06-28 11:57                                                 ` Gilles Chanteperdrix
  2016-06-28 11:55                                               ` Wolfgang Netbal
  1 sibling, 1 reply; 36+ messages in thread
From: Wolfgang Netbal @ 2016-06-28 11:45 UTC (permalink / raw)
  To: xenomai



On 2016-06-28 at 12:39, Gilles Chanteperdrix wrote:
> On Tue, Jun 28, 2016 at 12:31:42PM +0200, Wolfgang Netbal wrote:
>>
>> Am 2016-06-28 um 12:19 schrieb Gilles Chanteperdrix:
>> min: 10, max: 677, avg: 10.5048 -> 0.0265273 us
>>
>> Here are the output for Kernel 3.0.43 and Xenomai 2.6.2.1
>>
>> #> ./tsc
>> min: 10, max: 667, avg: 11.5755 -> 0.029231 us
> Ok. So, first it confirms that the two configurations are running
> the processor at the same frequency. But we seem to see a pattern,
> the maxima in the case of the new kernel seems consistently higher.
> Which would suggest that there is some difference in the cache. What
> is the status of the two configurations with regard to the L2 cache
> write allocate policy? Could you show us the tsc results of Xenomai
> 2.6.4 with the 3.0 kernel ?
As requested, I built kernel 3.0.43 with Xenomai 2.6.4:
#> dmesg | grep "Linux version"
Linux version 3.0.43 (netwol@DSWUB001) (gcc version 4.7.2 (GCC) ) #186 
SMP PREEMPT Tue Jun 28 13:28:40 CEST 2016

#> dmesg | grep "Xenomai"
[    0.844697] I-pipe: Domain Xenomai registered.
[    0.849188] Xenomai: hal/arm started.
[    0.853189] Xenomai: scheduling class idle registered.
[    0.858350] Xenomai: scheduling class rt registered.
[    0.882246] Xenomai: real-time nucleus v2.6.4 (Jumpin' Out) loaded.
[    0.888926] Xenomai: starting native API services.
[    0.893811] Xenomai: starting RTDM services.

#> dmesg | grep I-pipe
[    0.000000] I-pipe 1.18-13: pipeline enabled.
[    0.331174] I-pipe, 396.000 MHz timer
[    0.334900] I-pipe, 396.000 MHz clocksource
[    0.844697] I-pipe: Domain Xenomai registered.


Here is the output of tsc:
#> ./tsc
Checking tsc for 1 minute(s)
min: 10, max: 24, avg: 10.5
min: 10, max: 24, avg: 10.5
min: 10, max: 26, avg: 10.5
min: 10, max: 22, avg: 10.5
min: 10, max: 26, avg: 10.5
min: 10, max: 18, avg: 10.5
min: 10, max: 51, avg: 10.5
min: 10, max: 21, avg: 10.5
min: 10, max: 22, avg: 10.5
min: 10, max: 21, avg: 10.5
min: 10, max: 32, avg: 10.5
min: 10, max: 23, avg: 10.5
min: 10, max: 47, avg: 10.5
min: 10, max: 35, avg: 10.5
min: 10, max: 29, avg: 10.5
min: 10, max: 21, avg: 10.5
min: 10, max: 345, avg: 10.5
min: 10, max: 23, avg: 10.5
min: 10, max: 21, avg: 10.5
min: 10, max: 21, avg: 10.5
min: 10, max: 23, avg: 10.5
min: 10, max: 42, avg: 10.5
min: 10, max: 25, avg: 10.5
min: 10, max: 21, avg: 10.5
min: 10, max: 23, avg: 10.5
min: 10, max: 21, avg: 10.5
min: 10, max: 38, avg: 10.5
min: 10, max: 35, avg: 10.5
min: 10, max: 21, avg: 10.5
min: 10, max: 21, avg: 10.5
min: 10, max: 28, avg: 10.5
min: 10, max: 25, avg: 10.5
min: 10, max: 22, avg: 10.5
min: 10, max: 21, avg: 10.5
min: 10, max: 26, avg: 10.5
min: 10, max: 21, avg: 10.5
min: 10, max: 24, avg: 10.5
min: 10, max: 21, avg: 10.5
min: 10, max: 19, avg: 10.5
min: 10, max: 23, avg: 10.5
min: 10, max: 21, avg: 10.5
min: 10, max: 35, avg: 10.5
min: 10, max: 21, avg: 10.5
min: 10, max: 21, avg: 10.5
min: 10, max: 25, avg: 10.5
min: 10, max: 24, avg: 10.5
min: 10, max: 36, avg: 10.5
min: 10, max: 35, avg: 10.5
min: 10, max: 21, avg: 10.5
min: 10, max: 34, avg: 10.5
min: 10, max: 21, avg: 10.5
min: 10, max: 36, avg: 10.5
min: 10, max: 21, avg: 10.5
min: 10, max: 26, avg: 10.5
min: 10, max: 21, avg: 10.5
min: 10, max: 26, avg: 10.5
min: 10, max: 40, avg: 10.5
min: 10, max: 23, avg: 10.5
min: 10, max: 21, avg: 10.5
min: 10, max: 21, avg: 10.5
min: 10, max: 345, avg: 10.5 -> 0.0265152 us



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
  2016-06-28 10:39                                             ` Gilles Chanteperdrix
  2016-06-28 11:45                                               ` Wolfgang Netbal
@ 2016-06-28 11:55                                               ` Wolfgang Netbal
  2016-06-28 12:01                                                 ` Gilles Chanteperdrix
  1 sibling, 1 reply; 36+ messages in thread
From: Wolfgang Netbal @ 2016-06-28 11:55 UTC (permalink / raw)
  To: xenomai



On 2016-06-28 at 12:39, Gilles Chanteperdrix wrote:
> On Tue, Jun 28, 2016 at 12:31:42PM +0200, Wolfgang Netbal wrote:
>>
>> Am 2016-06-28 um 12:19 schrieb Gilles Chanteperdrix:
>> min: 10, max: 677, avg: 10.5048 -> 0.0265273 us
>>
>> Here are the output for Kernel 3.0.43 and Xenomai 2.6.2.1
>>
>> #> ./tsc
>> min: 10, max: 667, avg: 11.5755 -> 0.029231 us
> Ok. So, first it confirms that the two configurations are running
> the processor at the same frequency. But we seem to see a pattern,
> the maxima in the case of the new kernel seems consistently higher.
> Which would suggest that there is some difference in the cache. What
> is the status of the two configurations with regard to the L2 cache
> write allocate policy?
Do you mean the configuration we checked in this request
https://xenomai.org/pipermail/xenomai/2016-June/036390.html
?
> Could you show us the tsc results of Xenomai
> 2.6.4 with the 3.0 kernel ?
>



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
  2016-06-28 11:45                                               ` Wolfgang Netbal
@ 2016-06-28 11:57                                                 ` Gilles Chanteperdrix
  0 siblings, 0 replies; 36+ messages in thread
From: Gilles Chanteperdrix @ 2016-06-28 11:57 UTC (permalink / raw)
  To: Wolfgang Netbal; +Cc: xenomai

On Tue, Jun 28, 2016 at 01:45:59PM +0200, Wolfgang Netbal wrote:
> 
> 
> Am 2016-06-28 um 12:39 schrieb Gilles Chanteperdrix:
> > On Tue, Jun 28, 2016 at 12:31:42PM +0200, Wolfgang Netbal wrote:
> >>
> >> Am 2016-06-28 um 12:19 schrieb Gilles Chanteperdrix:
> >> min: 10, max: 677, avg: 10.5048 -> 0.0265273 us
> >>
> >> Here are the output for Kernel 3.0.43 and Xenomai 2.6.2.1
> >>
> >> #> ./tsc
> >> min: 10, max: 667, avg: 11.5755 -> 0.029231 us
> > Ok. So, first it confirms that the two configurations are running
> > the processor at the same frequency. But we seem to see a pattern,
> > the maxima in the case of the new kernel seems consistently higher.
> > Which would suggest that there is some difference in the cache. What
> > is the status of the two configurations with regard to the L2 cache
> > write allocate policy? Could you show us the tsc results of Xenomai
> > 2.6.4 with the 3.0 kernel ?
> As requested I created a Kernel 3.0.43 with Xenomai 2.6.4
> #> dmesg | grep "Linux version"
> Linux version 3.0.43 (netwol@DSWUB001) (gcc version 4.7.2 (GCC) ) #186 
> SMP PREEMPT Tue Jun 28 13:28:40 CEST 2016
> 
> #> dmesg | grep "Xenomai"
> [    0.844697] I-pipe: Domain Xenomai registered.
> [    0.849188] Xenomai: hal/arm started.
> [    0.853189] Xenomai: scheduling class idle registered.
> [    0.858350] Xenomai: scheduling class rt registered.
> [    0.882246] Xenomai: real-time nucleus v2.6.4 (Jumpin' Out) loaded.
> [    0.888926] Xenomai: starting native API services.
> [    0.893811] Xenomai: starting RTDM services.
> 
> #> dmesg | grep I-pipe
> [    0.000000] I-pipe 1.18-13: pipeline enabled.
> [    0.331174] I-pipe, 396.000 MHz timer
> [    0.334900] I-pipe, 396.000 MHz clocksource
> [    0.844697] I-pipe: Domain Xenomai registered.
> 
> 
> Here the output of tsc
> min: 10, max: 345, avg: 10.5 -> 0.0265152 us

Ok, so 3.0.43 with 2.6.4 has the same consistent behaviour with
regard to the __xn_rdtsc() latency as 3.0.43 with 2.6.2.1. So:

- if you find 2.6.4 with 3.0.43 slower than 3.0.43 with 2.6.2.1, you
can remove the kernel version change from the mix and do your tests
from now on exclusively with kernel 3.0.43; the tsc latency is then
unlikely to be the cause of the performance difference. To really
make sure of that, you can replace __xn_rdtsc() in tsc.c with a call
to rt_timer_tsc(), recompile and rerun on the two remaining
configurations (a standalone sketch of this check follows after
these two points).

- if you find 2.6.4 with 3.0.43 is as fast as 3.0.43 with 2.6.2.1,
you can remove the Xenomai version change from the mix and do your
tests from now on exclusively with Xenomai 2.6.4. And the question about
differences in cache configuration remains.
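
If patching tsc.c is inconvenient, a rough standalone equivalent of
that check could look like the sketch below (my own sketch, not the
regression test itself; it only times consecutive rt_timer_tsc() calls
and is built against the native skin, e.g. via
"xeno-config --skin=native --cflags --ldflags"):

#include <stdio.h>
#include <sys/mman.h>
#include <native/task.h>
#include <native/timer.h>

/* Sketch: measure the cost of back-to-back rt_timer_tsc() calls, in tsc
 * ticks, in the same spirit as the "tsc" regression test. */
int main(void)
{
	RT_TASK desc;
	RTIME t0, t1, delta, min = ~0ULL, max = 0;
	double sum = 0;
	long i, loops = 1000000;

	mlockall(MCL_CURRENT | MCL_FUTURE);
	/* become a Xenomai thread so the numbers are comparable */
	rt_task_shadow(&desc, "tsc-sketch", 1, 0);

	for (i = 0; i < loops; i++) {
		t0 = rt_timer_tsc();
		t1 = rt_timer_tsc();
		delta = t1 - t0;
		if (delta < min)
			min = delta;
		if (delta > max)
			max = delta;
		sum += delta;
	}

	printf("min: %llu, max: %llu, avg: %g\n",
	       (unsigned long long)min, (unsigned long long)max,
	       sum / loops);
	return 0;
}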

-- 
					    Gilles.
https://click-hack.org


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
  2016-06-28 11:55                                               ` Wolfgang Netbal
@ 2016-06-28 12:01                                                 ` Gilles Chanteperdrix
  2016-06-28 14:32                                                   ` Wolfgang Netbal
  0 siblings, 1 reply; 36+ messages in thread
From: Gilles Chanteperdrix @ 2016-06-28 12:01 UTC (permalink / raw)
  To: Wolfgang Netbal; +Cc: xenomai

On Tue, Jun 28, 2016 at 01:55:27PM +0200, Wolfgang Netbal wrote:
> 
> 
> Am 2016-06-28 um 12:39 schrieb Gilles Chanteperdrix:
> > On Tue, Jun 28, 2016 at 12:31:42PM +0200, Wolfgang Netbal wrote:
> >>
> >> Am 2016-06-28 um 12:19 schrieb Gilles Chanteperdrix:
> >> min: 10, max: 677, avg: 10.5048 -> 0.0265273 us
> >>
> >> Here are the output for Kernel 3.0.43 and Xenomai 2.6.2.1
> >>
> >> #> ./tsc
> >> min: 10, max: 667, avg: 11.5755 -> 0.029231 us
> > Ok. So, first it confirms that the two configurations are running
> > the processor at the same frequency. But we seem to see a pattern,
> > the maxima in the case of the new kernel seems consistently higher.
> > Which would suggest that there is some difference in the cache. What
> > is the status of the two configurations with regard to the L2 cache
> > write allocate policy?
> Do you mean the configuration we checked in this request
> https://xenomai.org/pipermail/xenomai/2016-June/036390.html

This answer is based on a kernel message, which may be printed before
or after the I-pipe patch has changed the value written to the
register, so, essentially, it is useless. I would not call that
checking the L2 cache configuration for differences.

-- 
					    Gilles.
https://click-hack.org


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
  2016-06-28 12:01                                                 ` Gilles Chanteperdrix
@ 2016-06-28 14:32                                                   ` Wolfgang Netbal
  2016-06-28 14:42                                                     ` Gilles Chanteperdrix
  0 siblings, 1 reply; 36+ messages in thread
From: Wolfgang Netbal @ 2016-06-28 14:32 UTC (permalink / raw)
  To: xenomai



On 2016-06-28 at 14:01, Gilles Chanteperdrix wrote:
> On Tue, Jun 28, 2016 at 01:55:27PM +0200, Wolfgang Netbal wrote:
>>
>> Am 2016-06-28 um 12:39 schrieb Gilles Chanteperdrix:
>>> On Tue, Jun 28, 2016 at 12:31:42PM +0200, Wolfgang Netbal wrote:
>>>> Am 2016-06-28 um 12:19 schrieb Gilles Chanteperdrix:
>>>> min: 10, max: 677, avg: 10.5048 -> 0.0265273 us
>>>>
>>>> Here are the output for Kernel 3.0.43 and Xenomai 2.6.2.1
>>>>
>>>> #> ./tsc
>>>> min: 10, max: 667, avg: 11.5755 -> 0.029231 us
>>> Ok. So, first it confirms that the two configurations are running
>>> the processor at the same frequency. But we seem to see a pattern,
>>> the maxima in the case of the new kernel seems consistently higher.
>>> Which would suggest that there is some difference in the cache. What
>>> is the status of the two configurations with regard to the L2 cache
>>> write allocate policy?
>> Do you mean the configuration we checked in this request
>> https://xenomai.org/pipermail/xenomai/2016-June/036390.html
> This answer is based on a kernel message, which may happen before
> or after the I-pipe patch has changed the value passed to the
> register, so, essentially, it is useless. I would not call that
> checking the L2 cache configuration differences.
>
I read the values from the auxiliary control register while the
system was up and running, and I get the same values as I see in the
kernel log.

Kernel 3.10.53 [0xa02104]=0x32c50000
Kernel 3.0.43    [0xa02104]=0x2850000






^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
  2016-06-28 14:32                                                   ` Wolfgang Netbal
@ 2016-06-28 14:42                                                     ` Gilles Chanteperdrix
  2016-06-30  9:17                                                       ` Wolfgang Netbal
  0 siblings, 1 reply; 36+ messages in thread
From: Gilles Chanteperdrix @ 2016-06-28 14:42 UTC (permalink / raw)
  To: Wolfgang Netbal; +Cc: xenomai

On Tue, Jun 28, 2016 at 04:32:17PM +0200, Wolfgang Netbal wrote:
> 
> 
> Am 2016-06-28 um 14:01 schrieb Gilles Chanteperdrix:
> > On Tue, Jun 28, 2016 at 01:55:27PM +0200, Wolfgang Netbal wrote:
> >>
> >> Am 2016-06-28 um 12:39 schrieb Gilles Chanteperdrix:
> >>> On Tue, Jun 28, 2016 at 12:31:42PM +0200, Wolfgang Netbal wrote:
> >>>> Am 2016-06-28 um 12:19 schrieb Gilles Chanteperdrix:
> >>>> min: 10, max: 677, avg: 10.5048 -> 0.0265273 us
> >>>>
> >>>> Here are the output for Kernel 3.0.43 and Xenomai 2.6.2.1
> >>>>
> >>>> #> ./tsc
> >>>> min: 10, max: 667, avg: 11.5755 -> 0.029231 us
> >>> Ok. So, first it confirms that the two configurations are running
> >>> the processor at the same frequency. But we seem to see a pattern,
> >>> the maxima in the case of the new kernel seems consistently higher.
> >>> Which would suggest that there is some difference in the cache. What
> >>> is the status of the two configurations with regard to the L2 cache
> >>> write allocate policy?
> >> Do you mean the configuration we checked in this request
> >> https://xenomai.org/pipermail/xenomai/2016-June/036390.html
> > This answer is based on a kernel message, which may happen before
> > or after the I-pipe patch has changed the value passed to the
> > register, so, essentially, it is useless. I would not call that
> > checking the L2 cache configuration differences.
> >
> I readed the values from the auxiliary control register,
> when the system is up and running.
> I get the same values like I see in the Kernel log.
> 
> Kernel 3.10.53 [0xa02104]=0x32c50000
> Kernel 3.0.43    [0xa02104]=0x2850000

Ok, so, if I read this correctly, both values have 0x800000 set,
which means "force no allocate", and that is what we want. But there
are a lot of other questions in my answer which you have avoided
answering (and note that this one was only relevant in one of the two
cases, which I believe is not yours).
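
For reference (my own decoding from the PL310 TRM, so please
double-check), the bits that actually differ between the two values
are:

  0x32c50000 ^ 0x2850000 = 0x30400000
    bit 29 - instruction prefetch enable
    bit 28 - data prefetch enable
    bit 22 - shared attribute override enable

so the write allocate policy is the same in both kernels, but the new
one enables the L2 prefetch engines.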

-- 
					    Gilles.
https://click-hack.org


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
  2016-06-28 14:42                                                     ` Gilles Chanteperdrix
@ 2016-06-30  9:17                                                       ` Wolfgang Netbal
  2016-06-30  9:39                                                         ` Gilles Chanteperdrix
  0 siblings, 1 reply; 36+ messages in thread
From: Wolfgang Netbal @ 2016-06-30  9:17 UTC (permalink / raw)
  To: xenomai



On 2016-06-28 at 16:42, Gilles Chanteperdrix wrote:
> On Tue, Jun 28, 2016 at 04:32:17PM +0200, Wolfgang Netbal wrote:
>>
>> Am 2016-06-28 um 14:01 schrieb Gilles Chanteperdrix:
>>> On Tue, Jun 28, 2016 at 01:55:27PM +0200, Wolfgang Netbal wrote:
>>>> Am 2016-06-28 um 12:39 schrieb Gilles Chanteperdrix:
>>>>> On Tue, Jun 28, 2016 at 12:31:42PM +0200, Wolfgang Netbal wrote:
>>>>>> Am 2016-06-28 um 12:19 schrieb Gilles Chanteperdrix:
>>>>>> min: 10, max: 677, avg: 10.5048 -> 0.0265273 us
>>>>>>
>>>>>> Here are the output for Kernel 3.0.43 and Xenomai 2.6.2.1
>>>>>>
>>>>>> #> ./tsc
>>>>>> min: 10, max: 667, avg: 11.5755 -> 0.029231 us
>>>>> Ok. So, first it confirms that the two configurations are running
>>>>> the processor at the same frequency. But we seem to see a pattern,
>>>>> the maxima in the case of the new kernel seems consistently higher.
>>>>> Which would suggest that there is some difference in the cache. What
>>>>> is the status of the two configurations with regard to the L2 cache
>>>>> write allocate policy?
>>>> Do you mean the configuration we checked in this request
>>>> https://xenomai.org/pipermail/xenomai/2016-June/036390.html
>>> This answer is based on a kernel message, which may happen before
>>> or after the I-pipe patch has changed the value passed to the
>>> register, so, essentially, it is useless. I would not call that
>>> checking the L2 cache configuration differences.
>>>
>> I readed the values from the auxiliary control register,
>> when the system is up and running.
>> I get the same values like I see in the Kernel log.
>>
>> Kernel 3.10.53 [0xa02104]=0x32c50000
>> Kernel 3.0.43    [0xa02104]=0x2850000
> Ok, so, if I read this correctly both values have 0x800000 set,
> which means "force no allocate", and is what we want. But there are
> a lot of other questions in my answer which you avoid to answer (and
> note that that one was only relevant in one of two cases, which I
> believe is not yours).
>
Dear Gilles,

your first intuition was correct: the L2 cache configuration was
indeed the reason for our issue.
I disabled the instruction and data prefetching, and my customer
application is now as fast as on our old kernel.
The change that activated the prefetching is in the kernel file
arch/arm/mach-imx/system.c.
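
For reference, the hunk I mean looks roughly like this in the 3.10
tree (reconstructed from memory, so the exact constant and surrounding
code may differ; l2x0_base is the ioremapped PL310 base):

	/* arch/arm/mach-imx/system.c, imx_init_l2cache(), approximately */
	val = readl_relaxed(l2x0_base + L2X0_PREFETCH_CTRL);
	val |= 0x70800000;	/* sets, among others, bit 29 (instruction
				 * prefetch) and bit 28 (data prefetch) */
	writel_relaxed(val, l2x0_base + L2X0_PREFETCH_CTRL);

Masking out bits 29 and 28 here (or dropping the write entirely) is
what restored the old timing behaviour for us.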

We will additionally replace the function rt_timer_tsc() with
__xn_rdtsc(), as you recommended.

In our customer applications, the Xenomai task with priority 95 runs
every millisecond and works on different objects located at different
memory locations. When the objects are finished, we leave the Xenomai
domain and let Linux do its work.

Do you have any additional hints on which configuration settings (L2
cache or otherwise) could speed up this use case?

Thanks a lot for your support and patience.

Kind regards
Wolfgang



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
  2016-06-30  9:17                                                       ` Wolfgang Netbal
@ 2016-06-30  9:39                                                         ` Gilles Chanteperdrix
  0 siblings, 0 replies; 36+ messages in thread
From: Gilles Chanteperdrix @ 2016-06-30  9:39 UTC (permalink / raw)
  To: Wolfgang Netbal; +Cc: xenomai

On Thu, Jun 30, 2016 at 11:17:59AM +0200, Wolfgang Netbal wrote:
> 
> 
> Am 2016-06-28 um 16:42 schrieb Gilles Chanteperdrix:
> > On Tue, Jun 28, 2016 at 04:32:17PM +0200, Wolfgang Netbal wrote:
> >>
> >> Am 2016-06-28 um 14:01 schrieb Gilles Chanteperdrix:
> >>> On Tue, Jun 28, 2016 at 01:55:27PM +0200, Wolfgang Netbal wrote:
> >>>> Am 2016-06-28 um 12:39 schrieb Gilles Chanteperdrix:
> >>>>> On Tue, Jun 28, 2016 at 12:31:42PM +0200, Wolfgang Netbal wrote:
> >>>>>> Am 2016-06-28 um 12:19 schrieb Gilles Chanteperdrix:
> >>>>>> min: 10, max: 677, avg: 10.5048 -> 0.0265273 us
> >>>>>>
> >>>>>> Here are the output for Kernel 3.0.43 and Xenomai 2.6.2.1
> >>>>>>
> >>>>>> #> ./tsc
> >>>>>> min: 10, max: 667, avg: 11.5755 -> 0.029231 us
> >>>>> Ok. So, first it confirms that the two configurations are running
> >>>>> the processor at the same frequency. But we seem to see a pattern,
> >>>>> the maxima in the case of the new kernel seems consistently higher.
> >>>>> Which would suggest that there is some difference in the cache. What
> >>>>> is the status of the two configurations with regard to the L2 cache
> >>>>> write allocate policy?
> >>>> Do you mean the configuration we checked in this request
> >>>> https://xenomai.org/pipermail/xenomai/2016-June/036390.html
> >>> This answer is based on a kernel message, which may happen before
> >>> or after the I-pipe patch has changed the value passed to the
> >>> register, so, essentially, it is useless. I would not call that
> >>> checking the L2 cache configuration differences.
> >>>
> >> I readed the values from the auxiliary control register,
> >> when the system is up and running.
> >> I get the same values like I see in the Kernel log.
> >>
> >> Kernel 3.10.53 [0xa02104]=0x32c50000
> >> Kernel 3.0.43    [0xa02104]=0x2850000
> > Ok, so, if I read this correctly both values have 0x800000 set,
> > which means "force no allocate", and is what we want. But there are
> > a lot of other questions in my answer which you avoid to answer (and
> > note that that one was only relevant in one of two cases, which I
> > believe is not yours).
> >
> Dear Gilles,

Hi,

> 
> your first intention was correct, that the L2 cache configuration may be
> the reason for our issue.
> I disabled the instruction and data prefetching and my customer application
> is as fast as in our old kernel.

I thought you said the contrary in this mail:
https://xenomai.org/pipermail/xenomai/2016-June/036390.html

Well, not exactly the contrary, but that you had tried to enable
prefetching with 3.0.43 and that performance did not degrade.

> It was a change in the Kernel file arch/arm/mach-imx/system.c where the 
> prefetching
> was activated.
> 
> We will additional replace the function rt_timer_tsc() by __xn_rdtsc() 
> as you recommended.

No, do not do that. __xn_rdtsc() is a Xenomai internal function. The
"tsc" test uses it because it is a test measuring the execution time
of this function. What I said was to replace __xn_rdtsc() with
rt_timer_tsc() in the "tsc" test, to check that there is no performance
regression in rt_timer_tsc().

Also, I would be curious to understand why the execution time of
__xn_rdtsc() changed. The difference is just one processor cycle, so it
should not matter to applications, but still, I do not see what change
could cause this.

> 
> In our customer applications every millisecond the xenomai task with 
> priority 95
> is called and works on different objects that are located on different 
> memory locations.
> When the objects are finished we leave the xenomai domain and let work 
> Linux.
> 
> Do you have any additional hints for me what configrations L2 cache or 
> other that can
> speed up this use case ?

Well, no, I know nothing about the L2 cache configuration. A
customer told me that disabling write allocate was improving the
latency test results greatly on imx6, and I benchmarked it on OMAP4,
another Cortex-A9 based processor; you can find the benchmark
here:
https://xenomai.org/2014/08/benchmarks-xenomai-dual-kernel-over-linux-3-14/#For_the_Texas_Instrument_Panda_board_running_a_TI_OMAP4430_processor_at_1_GHz

Since it seemed to improve the performance on all the processors
Xenomai supported at the time with this L2 cache (i.e. really omap4
and imx6), we made the change in the I-pipe patch, with a kernel
parameter to turn it off in case someone prefers the original
behaviour.

> 
> Thanks a lot for you support and patience. 

Well, I am not always patient. But you are welcome.

-- 
					    Gilles.
https://click-hack.org


^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2016-06-30  9:39 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-31 14:09 [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4 Wolfgang Netbal
2016-05-31 14:16 ` Gilles Chanteperdrix
2016-06-01 13:52   ` Wolfgang Netbal
2016-06-01 14:12     ` Gilles Chanteperdrix
2016-06-02  8:15       ` Wolfgang Netbal
2016-06-02  8:23         ` Gilles Chanteperdrix
2016-06-06  7:03           ` Wolfgang Netbal
2016-06-06 15:35             ` Gilles Chanteperdrix
2016-06-07 14:13               ` Wolfgang Netbal
2016-06-07 17:00                 ` Gilles Chanteperdrix
2016-06-27 15:55                   ` Wolfgang Netbal
2016-06-27 16:00                     ` Gilles Chanteperdrix
2016-06-28  8:08                       ` Wolfgang Netbal
2016-06-27 16:46                     ` Gilles Chanteperdrix
2016-06-28  8:31                       ` Wolfgang Netbal
2016-06-28  8:34                         ` Gilles Chanteperdrix
2016-06-28  9:15                           ` Wolfgang Netbal
2016-06-28  9:17                             ` Gilles Chanteperdrix
2016-06-28  9:28                               ` Wolfgang Netbal
2016-06-28  9:29                                 ` Gilles Chanteperdrix
2016-06-28  9:51                                   ` Wolfgang Netbal
2016-06-28  9:55                                     ` Gilles Chanteperdrix
2016-06-28 10:10                                       ` Wolfgang Netbal
2016-06-28 10:19                                         ` Gilles Chanteperdrix
2016-06-28 10:31                                           ` Wolfgang Netbal
2016-06-28 10:39                                             ` Gilles Chanteperdrix
2016-06-28 11:45                                               ` Wolfgang Netbal
2016-06-28 11:57                                                 ` Gilles Chanteperdrix
2016-06-28 11:55                                               ` Wolfgang Netbal
2016-06-28 12:01                                                 ` Gilles Chanteperdrix
2016-06-28 14:32                                                   ` Wolfgang Netbal
2016-06-28 14:42                                                     ` Gilles Chanteperdrix
2016-06-30  9:17                                                       ` Wolfgang Netbal
2016-06-30  9:39                                                         ` Gilles Chanteperdrix
2016-06-07 17:22                 ` Philippe Gerum
2016-05-31 15:08 ` Philippe Gerum
