* [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
@ 2016-05-31 14:09 Wolfgang Netbal
  2016-05-31 14:16 ` Gilles Chanteperdrix
  2016-05-31 15:08 ` Philippe Gerum
  0 siblings, 2 replies; 36+ messages in thread

From: Wolfgang Netbal @ 2016-05-31 14:09 UTC (permalink / raw)
To: xenomai

Dear all,

we have moved our application from "XENOMAI 2.6.2.1 + Linux 3.0.43" to
"XENOMAI 2.6.4 + Linux 3.10.53". Our target is an i.MX6DL. The system
is now up and running and works stably. Unfortunately we see a
difference in performance. Our old combination (XENOMAI 2.6.2.1 +
Linux 3.0.43) was slightly faster.

At the moment it looks like XENOMAI 2.6.4 calls xnpod_schedule_handler
much more often than XENOMAI 2.6.2.1 did on our old system. Every call
of xnpod_schedule_handler interrupts our main XENOMAI task with
priority = 95.

I have compared the configuration of both XENOMAI versions but did not
find any difference. I also checked the source code (new commits) but
did not find a solution.

Any help would be greatly appreciated.

--
Wolfgang Netbal
Software Development
________________________________
SIGMATEK GmbH & Co KG
Sigmatekstraße 1
5112 Lamprechtshausen
Österreich / Austria
Tel.: +43/6274/4321-0
Fax: +43/6274/4321-300
E-Mail: wolfgang.netbal@sigmatek.at
http://www.sigmatek-automation.at

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
  2016-05-31 14:09 [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4 Wolfgang Netbal
@ 2016-05-31 14:16 ` Gilles Chanteperdrix
  2016-06-01 13:52   ` Wolfgang Netbal
  2016-05-31 15:08 ` Philippe Gerum
  1 sibling, 1 reply; 36+ messages in thread

From: Gilles Chanteperdrix @ 2016-05-31 14:16 UTC (permalink / raw)
To: Wolfgang Netbal; +Cc: xenomai

On Tue, May 31, 2016 at 04:09:07PM +0200, Wolfgang Netbal wrote:
> Dear all,
>
> we have moved our application from "XENOMAI 2.6.2.1 + Linux 3.0.43" to
> "XENOMAI 2.6.4 + Linux 3.10.53". Our target is an i.MX6DL. The system
> is now up and running and works stably. Unfortunately we see a
> difference in performance. Our old combination (XENOMAI 2.6.2.1 +
> Linux 3.0.43) was slightly faster.
>
> At the moment it looks like XENOMAI 2.6.4 calls xnpod_schedule_handler
> much more often than XENOMAI 2.6.2.1 did on our old system. Every call
> of xnpod_schedule_handler interrupts our main XENOMAI task with
> priority = 95.
>
> I have compared the configuration of both XENOMAI versions but did not
> find any difference. I also checked the source code (new commits) but
> did not find a solution.

Have you tried Xenomai 2.6.4 with Linux 3.0.43, in order to see whether
it comes from the kernel update or the Xenomai update?

--
Gilles.
https://click-hack.org

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
  2016-05-31 14:16 ` Gilles Chanteperdrix
@ 2016-06-01 13:52   ` Wolfgang Netbal
  2016-06-01 14:12     ` Gilles Chanteperdrix
  0 siblings, 1 reply; 36+ messages in thread

From: Wolfgang Netbal @ 2016-06-01 13:52 UTC (permalink / raw)
To: xenomai

On 2016-05-31 16:16, Gilles Chanteperdrix wrote:
> On Tue, May 31, 2016 at 04:09:07PM +0200, Wolfgang Netbal wrote:
>> Dear all,
>>
>> we have moved our application from "XENOMAI 2.6.2.1 + Linux 3.0.43" to
>> "XENOMAI 2.6.4 + Linux 3.10.53". Our target is an i.MX6DL. The system
>> is now up and running and works stably. Unfortunately we see a
>> difference in performance. Our old combination (XENOMAI 2.6.2.1 +
>> Linux 3.0.43) was slightly faster.
>>
>> At the moment it looks like XENOMAI 2.6.4 calls xnpod_schedule_handler
>> much more often than XENOMAI 2.6.2.1 did on our old system. Every call
>> of xnpod_schedule_handler interrupts our main XENOMAI task with
>> priority = 95.
>>
>> I have compared the configuration of both XENOMAI versions but did not
>> find any difference. I also checked the source code (new commits) but
>> did not find a solution.
> Have you tried Xenomai 2.6.4 with Linux 3.0.43, in order to see whether
> it comes from the kernel update or the Xenomai update?

I've tried Linux 3.0.43 with Xenomai 2.6.4 and there is no difference
from Xenomai 2.6.2.1. Looks like there is another reason than Xenomai.

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
  2016-06-01 13:52 ` Wolfgang Netbal
@ 2016-06-01 14:12   ` Gilles Chanteperdrix
  2016-06-02  8:15     ` Wolfgang Netbal
  0 siblings, 1 reply; 36+ messages in thread

From: Gilles Chanteperdrix @ 2016-06-01 14:12 UTC (permalink / raw)
To: Wolfgang Netbal; +Cc: xenomai

On Wed, Jun 01, 2016 at 03:52:06PM +0200, Wolfgang Netbal wrote:
> On 2016-05-31 16:16, Gilles Chanteperdrix wrote:
> > On Tue, May 31, 2016 at 04:09:07PM +0200, Wolfgang Netbal wrote:
> >> Dear all,
> >>
> >> we have moved our application from "XENOMAI 2.6.2.1 + Linux 3.0.43" to
> >> "XENOMAI 2.6.4 + Linux 3.10.53". Our target is an i.MX6DL. The system
> >> is now up and running and works stably. Unfortunately we see a
> >> difference in performance. Our old combination (XENOMAI 2.6.2.1 +
> >> Linux 3.0.43) was slightly faster.
> >>
> >> At the moment it looks like XENOMAI 2.6.4 calls xnpod_schedule_handler
> >> much more often than XENOMAI 2.6.2.1 did on our old system. Every call
> >> of xnpod_schedule_handler interrupts our main XENOMAI task with
> >> priority = 95.
> >>
> >> I have compared the configuration of both XENOMAI versions but did not
> >> find any difference. I also checked the source code (new commits) but
> >> did not find a solution.
> > Have you tried Xenomai 2.6.4 with Linux 3.0.43, in order to see whether
> > it comes from the kernel update or the Xenomai update?
> I've tried Linux 3.0.43 with Xenomai 2.6.4 and there is no difference
> from Xenomai 2.6.2.1. Looks like there is another reason than Xenomai.

Ok, one thing to pay attention to on imx6 is the L2 cache write
allocate policy. You want to disable L2 write allocate on imx6 to get
low latencies. I do not know which patches exactly you are using, so
it is difficult to check, but the kernel normally displays the value
set in the L2 auxiliary configuration register; you can check in the
datasheet whether it means that L2 write allocate is disabled or not.
And check if you get the same value with 3.0 and 3.10.

--
Gilles.
https://click-hack.org

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
  2016-06-01 14:12 ` Gilles Chanteperdrix
@ 2016-06-02  8:15   ` Wolfgang Netbal
  2016-06-02  8:23     ` Gilles Chanteperdrix
  0 siblings, 1 reply; 36+ messages in thread

From: Wolfgang Netbal @ 2016-06-02 8:15 UTC (permalink / raw)
To: xenomai

On 2016-06-01 16:12, Gilles Chanteperdrix wrote:
> On Wed, Jun 01, 2016 at 03:52:06PM +0200, Wolfgang Netbal wrote:
>> On 2016-05-31 16:16, Gilles Chanteperdrix wrote:
>>> On Tue, May 31, 2016 at 04:09:07PM +0200, Wolfgang Netbal wrote:
>>>> Dear all,
>>>>
>>>> we have moved our application from "XENOMAI 2.6.2.1 + Linux 3.0.43" to
>>>> "XENOMAI 2.6.4 + Linux 3.10.53". Our target is an i.MX6DL. The system
>>>> is now up and running and works stably. Unfortunately we see a
>>>> difference in performance. Our old combination (XENOMAI 2.6.2.1 +
>>>> Linux 3.0.43) was slightly faster.
>>>>
>>>> At the moment it looks like XENOMAI 2.6.4 calls xnpod_schedule_handler
>>>> much more often than XENOMAI 2.6.2.1 did on our old system. Every call
>>>> of xnpod_schedule_handler interrupts our main XENOMAI task with
>>>> priority = 95.
>>>>
>>>> I have compared the configuration of both XENOMAI versions but did not
>>>> find any difference. I also checked the source code (new commits) but
>>>> did not find a solution.
>>> Have you tried Xenomai 2.6.4 with Linux 3.0.43, in order to see whether
>>> it comes from the kernel update or the Xenomai update?
>> I've tried Linux 3.0.43 with Xenomai 2.6.4 and there is no difference
>> from Xenomai 2.6.2.1. Looks like there is another reason than Xenomai.
> Ok, one thing to pay attention to on imx6 is the L2 cache write
> allocate policy. You want to disable L2 write allocate on imx6 to get
> low latencies. I do not know which patches exactly you are using, so
> it is difficult to check, but the kernel normally displays the value
> set in the L2 auxiliary configuration register; you can check in the
> datasheet whether it means that L2 write allocate is disabled or not.
> And check if you get the same value with 3.0 and 3.10.

Thank you for this hint. I looked around in the kernel config, but
can't find an option that sounds like L2 write allocate. The only
option I found was CACHE_L2X0, and that is activated on both kernels.
Do you have an idea what the name of this configuration is, or where
in the kernel sources it should be located, so that I can find the
name of the config flag by searching the source code?

^ permalink raw reply	[flat|nested] 36+ messages in thread
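[Editor's note: the value is not a Kconfig symbol; it is hard-coded at the
platform's L2 cache init call. A plain grep over the kernel tree finds it.
The helper below is a sketch: the `find_l2_setup` name is made up here, and
the `l2x0_init`/`l2x0_of_init` symbol and paths are typical for 3.0/3.10-era
ARM kernels, so a vendor tree may differ.]

```shell
#!/bin/sh
# Sketch: search a kernel source tree for the code that programs the
# PL310/L2X0 auxiliary control register. In mainline 3.0/3.10 this is
# usually an l2x0_init()/l2x0_of_init() call in the machine support code.
find_l2_setup() {
    grep -rniE 'l2x0_init|aux_ctrl' "$1" 2>/dev/null
}

# Typical use from the kernel source root:
#   find_l2_setup arch/arm/mach-imx
#   find_l2_setup arch/arm/mm
```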
* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
  2016-06-02  8:15 ` Wolfgang Netbal
@ 2016-06-02  8:23   ` Gilles Chanteperdrix
  2016-06-06  7:03     ` Wolfgang Netbal
  0 siblings, 1 reply; 36+ messages in thread

From: Gilles Chanteperdrix @ 2016-06-02 8:23 UTC (permalink / raw)
To: Wolfgang Netbal; +Cc: xenomai

On Thu, Jun 02, 2016 at 10:15:41AM +0200, Wolfgang Netbal wrote:
> On 2016-06-01 16:12, Gilles Chanteperdrix wrote:
> > On Wed, Jun 01, 2016 at 03:52:06PM +0200, Wolfgang Netbal wrote:
> >> On 2016-05-31 16:16, Gilles Chanteperdrix wrote:
> >>> On Tue, May 31, 2016 at 04:09:07PM +0200, Wolfgang Netbal wrote:
> >>>> Dear all,
> >>>>
> >>>> we have moved our application from "XENOMAI 2.6.2.1 + Linux 3.0.43" to
> >>>> "XENOMAI 2.6.4 + Linux 3.10.53". Our target is an i.MX6DL. The system
> >>>> is now up and running and works stably. Unfortunately we see a
> >>>> difference in performance. Our old combination (XENOMAI 2.6.2.1 +
> >>>> Linux 3.0.43) was slightly faster.
> >>>>
> >>>> At the moment it looks like XENOMAI 2.6.4 calls xnpod_schedule_handler
> >>>> much more often than XENOMAI 2.6.2.1 did on our old system. Every call
> >>>> of xnpod_schedule_handler interrupts our main XENOMAI task with
> >>>> priority = 95.
> >>>>
> >>>> I have compared the configuration of both XENOMAI versions but did not
> >>>> find any difference. I also checked the source code (new commits) but
> >>>> did not find a solution.
> >>> Have you tried Xenomai 2.6.4 with Linux 3.0.43, in order to see whether
> >>> it comes from the kernel update or the Xenomai update?
> >> I've tried Linux 3.0.43 with Xenomai 2.6.4 and there is no difference
> >> from Xenomai 2.6.2.1. Looks like there is another reason than Xenomai.
> > Ok, one thing to pay attention to on imx6 is the L2 cache write
> > allocate policy. You want to disable L2 write allocate on imx6 to get
> > low latencies. I do not know which patches exactly you are using, so
> > it is difficult to check, but the kernel normally displays the value
> > set in the L2 auxiliary configuration register; you can check in the
> > datasheet whether it means that L2 write allocate is disabled or not.
> > And check if you get the same value with 3.0 and 3.10.
> Thank you for this hint. I looked around in the kernel config, but
> can't find an option that sounds like L2 write allocate. The only
> option I found was CACHE_L2X0, and that is activated on both kernels.
> Do you have an idea what the name of this configuration is, or where
> in the kernel sources it should be located, so that I can find the
> name of the config flag by searching the source code?

I never talked about any kernel configuration option. I am talking
about checking the value passed to the L2 cache auxiliary
configuration register; this is a hardware register. Also, as I said,
the value passed to the L2 cache auxiliary register is printed by the
kernel during boot.

--
Gilles.
https://click-hack.org

^ permalink raw reply	[flat|nested] 36+ messages in thread
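[Editor's note: to make the datasheet check concrete, the fields of the
AUX_CTRL value printed at boot can be extracted with shell arithmetic. The
bit positions below are taken from the ARM L2C-310 (PL310) TRM, where force
write allocate is bits [24:23] and 0b01 means "force no write allocate";
verify them against the i.MX6 documentation before relying on this sketch.
The two example values are the ones reported later in this thread.]

```shell
#!/bin/sh
# Decode the PL310 AUX_CTRL fields discussed in this thread.
# Bit positions per the ARM L2C-310 TRM; double-check the datasheet.
decode_aux_ctrl() {
    val=$(( $1 ))
    fwa=$(( (val >> 23) & 3 ))     # bits [24:23] force write allocate (1 = force no WA)
    shov=$(( (val >> 22) & 1 ))    # bit 22 shared attribute override enable
    dpref=$(( (val >> 28) & 1 ))   # bit 28 data prefetch enable
    ipref=$(( (val >> 29) & 1 ))   # bit 29 instruction prefetch enable
    echo "FWA=$fwa shared_override=$shov D-prefetch=$dpref I-prefetch=$ipref"
}

decode_aux_ctrl 0x02850000   # value reported for Linux 3.0.43
decode_aux_ctrl 0x32c50000   # value reported for Linux 3.10.53
```

Both values decode to FWA=1, i.e. write allocate forced off in both kernels;
the 3.10 value only adds the override and prefetch bits.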
* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
  2016-06-02  8:23 ` Gilles Chanteperdrix
@ 2016-06-06  7:03   ` Wolfgang Netbal
  2016-06-06 15:35     ` Gilles Chanteperdrix
  0 siblings, 1 reply; 36+ messages in thread

From: Wolfgang Netbal @ 2016-06-06 7:03 UTC (permalink / raw)
To: xenomai

On 2016-06-02 10:23, Gilles Chanteperdrix wrote:
> On Thu, Jun 02, 2016 at 10:15:41AM +0200, Wolfgang Netbal wrote:
>> On 2016-06-01 16:12, Gilles Chanteperdrix wrote:
>>> On Wed, Jun 01, 2016 at 03:52:06PM +0200, Wolfgang Netbal wrote:
>>>> On 2016-05-31 16:16, Gilles Chanteperdrix wrote:
>>>>> On Tue, May 31, 2016 at 04:09:07PM +0200, Wolfgang Netbal wrote:
>>>>>> Dear all,
>>>>>>
>>>>>> we have moved our application from "XENOMAI 2.6.2.1 + Linux 3.0.43" to
>>>>>> "XENOMAI 2.6.4 + Linux 3.10.53". Our target is an i.MX6DL. The system
>>>>>> is now up and running and works stably. Unfortunately we see a
>>>>>> difference in performance. Our old combination (XENOMAI 2.6.2.1 +
>>>>>> Linux 3.0.43) was slightly faster.
>>>>>>
>>>>>> At the moment it looks like XENOMAI 2.6.4 calls xnpod_schedule_handler
>>>>>> much more often than XENOMAI 2.6.2.1 did on our old system. Every call
>>>>>> of xnpod_schedule_handler interrupts our main XENOMAI task with
>>>>>> priority = 95.
>>>>>>
>>>>>> I have compared the configuration of both XENOMAI versions but did not
>>>>>> find any difference. I also checked the source code (new commits) but
>>>>>> did not find a solution.
>>>>> Have you tried Xenomai 2.6.4 with Linux 3.0.43, in order to see whether
>>>>> it comes from the kernel update or the Xenomai update?
>>>> I've tried Linux 3.0.43 with Xenomai 2.6.4 and there is no difference
>>>> from Xenomai 2.6.2.1. Looks like there is another reason than Xenomai.
>>> Ok, one thing to pay attention to on imx6 is the L2 cache write
>>> allocate policy. You want to disable L2 write allocate on imx6 to get
>>> low latencies. I do not know which patches exactly you are using, so
>>> it is difficult to check, but the kernel normally displays the value
>>> set in the L2 auxiliary configuration register; you can check in the
>>> datasheet whether it means that L2 write allocate is disabled or not.
>>> And check if you get the same value with 3.0 and 3.10.
>> Thank you for this hint. I looked around in the kernel config, but
>> can't find an option that sounds like L2 write allocate. The only
>> option I found was CACHE_L2X0, and that is activated on both kernels.
>> Do you have an idea what the name of this configuration is, or where
>> in the kernel sources it should be located, so that I can find the
>> name of the config flag by searching the source code?
> I never talked about any kernel configuration option. I am talking
> about checking the value passed to the L2 cache auxiliary
> configuration register; this is a hardware register. Also, as I said,
> the value passed to the L2 cache auxiliary register is printed by the
> kernel during boot.

Sorry Gilles,

I found the message in the kernel log; you are right, they are different.

Kernel 3.0.43 shows:
  l2x0: 16 ways, CACHE_ID 0x410000c8, AUX_CTRL 0x02850000, Cache size: 524288 B
Kernel 3.10.53 shows:
  l2x0: 16 ways, CACHE_ID 0x410000c8, AUX_CTRL 0x32c50000, Cache size: 524288 B

Kernel 3.10.53 additionally sets bits 22 (shared attribute override
enable), 28 (data prefetch) and 29 (instruction prefetch). I used the
same settings on kernel 3.0.43 but the performance didn't change, so
it looks like these settings don't slow down my system.

What I have seen while searching the kernel config was that a few
errata are activated as dependencies in 3.10.53. To be sure that none
of the errata is the source of my performance reduction, I activated
them on 3.0.43 as well. But again, no difference from our default
configuration.

To rule out that our application itself is simply running slower, I
created a shell script incrementing a variable 10,000 times and
measured the runtime with time:

#!/bin/sh
var=0
while [ $var -lt $1 ]; do
    let var++
done

> time /mnt/drive-C/CpuTime.sh 10000

On this test:
Kernel 3.0.43  Xenomai 2.6.2.1 needs 480 ms
Kernel 3.10.53 Xenomai 2.6.4   needs 820 ms

These differences are huge, and I'm not sure if I can trust this test,
because we also use a different busybox. The difference using our
application is between 2% and 3% in the realtime task (Xenomai task
with priority 95). Do you have an idea why this is so much slower?

I also see differences when I use the xeno-test command to check the
speed.

Kernel 3.0.43, Xenomai 2.6.2.1:

Started child 1209: /bin/sh /usr/xenomai/bin/xeno-test-run-wrapper /usr/xenomai/bin/xeno-test
+ echo 0
+ /usr/xenomai/bin/arith
mul: 0x79364d93, shft: 26
integ: 30, frac: 0x4d9364d9364d9364
signed positive operation: 0x03ffffffffffffff * 1000000000 / 33000000
inline calibration: 0x0000000000000000: 43.260 ns, rejected 0/10000
inlined llimd: 0x79364d9364d9362f: 1476.384 ns, rejected 4/10000
inlined llmulshft: 0x79364d92ffffffe1: 35.131 ns, rejected 0/10000
inlined nodiv_llimd: 0x79364d9364d9362f: 47.745 ns, rejected 3/10000
out of line calibration: 0x0000000000000000: 49.235 ns, rejected 2/10000
out of line llimd: 0x79364d9364d9362f: 1483.759 ns, rejected 2/10000
out of line llmulshft: 0x79364d92ffffffe1: 31.719 ns, rejected 2/10000
out of line nodiv_llimd: 0x79364d9364d9362f: 49.376 ns, rejected 0/10000
signed negative operation: 0xfc00000000000001 * 1000000000 / 33000000
inline calibration: 0x0000000000000000: 41.872 ns, rejected 0/10000
inlined llimd: 0x86c9b26c9b26c9d1: 1485.415 ns, rejected 2/10000
inlined llmulshft: 0x86c9b26d0000001e: 39.234 ns, rejected 0/10000
inlined nodiv_llimd: 0x86c9b26c9b26c9d1: 54.266 ns, rejected 1/10000
out of line calibration: 0x0000000000000000: 49.237 ns, rejected 0/10000
out of line llimd: 0x86c9b26c9b26c9d1: 1489.059 ns, rejected 1/10000
out of line llmulshft: 0xd45d172d0000001e: 36.847 ns, rejected 0/10000
out of line nodiv_llimd: 0x86c9b26c9b26c9d1: 56.973 ns, rejected 2/10000
unsigned operation: 0x03ffffffffffffff * 1000000000 / 33000000
inline calibration: 0x0000000000000000: 42.432 ns, rejected 1/10000
inlined nodiv_ullimd: 0x79364d9364d9362f: 51.083 ns, rejected 0/10000
out of line calibration: 0x0000000000000000: 48.086 ns, rejected 0/10000
out of line nodiv_ullimd: 0x79364d9364d9362f: 44.964 ns, rejected 0/10000
+ /usr/xenomai/bin/clocktest -C 42 -T 30
Xenomai: POSIX skin or CONFIG_XENO_OPT_PERVASIVE disabled. (modprobe xeno_posix?)
+ /usr/xenomai/bin/clocktest -T 30
Xenomai: POSIX skin or CONFIG_XENO_OPT_PERVASIVE disabled. (modprobe xeno_posix?)

Kernel 3.10.53, Xenomai 2.6.4:

Started child 729: /bin/sh /usr/xenomai/bin/xeno-test-run-wrapper /usr/xenomai/bin/xeno-test
++ echo 0
++ /usr/xenomai/bin/arith
mul: 0x79364d93, shft: 26
integ: 30, frac: 0x4d9364d9364d9364
signed positive operation: 0x03ffffffffffffff * 1000000000 / 33000000
inline calibration: 0x0000000000000000: 42.979 ns, rejected 1/10000
inlined llimd: 0x79364d9364d9362f: 1491.632 ns, rejected 2/10000
inlined llmulshft: 0x79364d92ffffffe1: 37.873 ns, rejected 1/10000
inlined nodiv_llimd: 0x79364d9364d9362f: 50.520 ns, rejected 0/10000
out of line calibration: 0x0000000000000000: 50.611 ns, rejected 1/10000
out of line llimd: 0x79364d9364d9362f: 1476.381 ns, rejected 4/10000
out of line llmulshft: 0x79364d92ffffffe1: 25.364 ns, rejected 1/10000
out of line nodiv_llimd: 0x79364d9364d9362f: 45.493 ns, rejected 1/10000
signed negative operation: 0xfc00000000000001 * 1000000000 / 33000000
inline calibration: 0x0000000000000000: 42.962 ns, rejected 1/10000
inlined llimd: 0x86c9b26c9b26c9d1: 1488.811 ns, rejected 4/10000
inlined llmulshft: 0x86c9b26d0000001e: 42.972 ns, rejected 2/10000
inlined nodiv_llimd: 0x86c9b26c9b26c9d1: 55.611 ns, rejected 1/10000
out of line calibration: 0x0000000000000000: 50.572 ns, rejected 1/10000
out of line llimd: 0x86c9b26c9b26c9d1: 1481.904 ns, rejected 3/10000
out of line llmulshft: 0x86c9b26d0000001e: 27.818 ns, rejected 0/10000
out of line nodiv_llimd: 0x86c9b26c9b26c9d1: 53.008 ns, rejected 1/10000
unsigned operation: 0x03ffffffffffffff * 1000000000 / 33000000
inline calibration: 0x0000000000000000: 42.968 ns, rejected 0/10000
inlined nodiv_ullimd: 0x79364d9364d9362f: 53.060 ns, rejected 1/10000
out of line calibration: 0x0000000000000000: 50.591 ns, rejected 1/10000
out of line nodiv_ullimd: 0x79364d9364d9362f: 46.102 ns, rejected 1/10000
++ /usr/xenomai/bin/clocktest -C 42 -T 30
Xenomai: POSIX skin or CONFIG_XENO_OPT_PERVASIVE disabled. (modprobe xeno_posix?)
++ /usr/xenomai/bin/clocktest -T 30
Xenomai: POSIX skin or CONFIG_XENO_OPT_PERVASIVE disabled. (modprobe xeno_posix?)

Some of the operations are faster on the newer Xenomai, but a few are
much slower, for example inlined llimd.

With every test I run, it looks like the issue is not located in the
kernel or Xenomai. Do you know of any speed issues in system libraries
like libc or something like that?

Kind regards
Wolfgang

^ permalink raw reply	[flat|nested] 36+ messages in thread
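[Editor's note: before trusting a single `time` run of a script like the
one above, it helps to repeat it and look at the spread. The harness below
is a sketch with a made-up name (`repeat_runs`); it deliberately uses only
`date +%s` (one-second resolution), since a busybox `date` may lack `%N`,
so the workload should run for several seconds per iteration.]

```shell
#!/bin/sh
# Run a command N times and print each wall-clock duration in seconds,
# so outliers and drift between runs become visible. Second resolution
# only; pick an iteration count that runs for a few seconds.
repeat_runs() {
    n=$1; shift
    i=0
    while [ "$i" -lt "$n" ]; do
        start=$(date +%s)
        "$@" > /dev/null 2>&1
        echo "run $i: $(( $(date +%s) - start )) s"
        i=$(( i + 1 ))
    done
}

# e.g.: repeat_runs 10 /mnt/drive-C/CpuTime.sh 100000
```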
* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
  2016-06-06  7:03 ` Wolfgang Netbal
@ 2016-06-06 15:35   ` Gilles Chanteperdrix
  2016-06-07 14:13     ` Wolfgang Netbal
  0 siblings, 1 reply; 36+ messages in thread

From: Gilles Chanteperdrix @ 2016-06-06 15:35 UTC (permalink / raw)
To: Wolfgang Netbal; +Cc: xenomai

On Mon, Jun 06, 2016 at 09:03:40AM +0200, Wolfgang Netbal wrote:
> On 2016-06-02 10:23, Gilles Chanteperdrix wrote:
> > On Thu, Jun 02, 2016 at 10:15:41AM +0200, Wolfgang Netbal wrote:
> >> On 2016-06-01 16:12, Gilles Chanteperdrix wrote:
> >>> On Wed, Jun 01, 2016 at 03:52:06PM +0200, Wolfgang Netbal wrote:
> >>>> On 2016-05-31 16:16, Gilles Chanteperdrix wrote:
> >>>>> On Tue, May 31, 2016 at 04:09:07PM +0200, Wolfgang Netbal wrote:
> >>>>>> Dear all,
> >>>>>>
> >>>>>> we have moved our application from "XENOMAI 2.6.2.1 + Linux 3.0.43" to
> >>>>>> "XENOMAI 2.6.4 + Linux 3.10.53". Our target is an i.MX6DL. The system
> >>>>>> is now up and running and works stably. Unfortunately we see a
> >>>>>> difference in performance. Our old combination (XENOMAI 2.6.2.1 +
> >>>>>> Linux 3.0.43) was slightly faster.
> >>>>>>
> >>>>>> At the moment it looks like XENOMAI 2.6.4 calls xnpod_schedule_handler
> >>>>>> much more often than XENOMAI 2.6.2.1 did on our old system. Every call
> >>>>>> of xnpod_schedule_handler interrupts our main XENOMAI task with
> >>>>>> priority = 95.
> >>>>>>
> >>>>>> I have compared the configuration of both XENOMAI versions but did not
> >>>>>> find any difference. I also checked the source code (new commits) but
> >>>>>> did not find a solution.
> >>>>> Have you tried Xenomai 2.6.4 with Linux 3.0.43, in order to see whether
> >>>>> it comes from the kernel update or the Xenomai update?
> >>>> I've tried Linux 3.0.43 with Xenomai 2.6.4 and there is no difference
> >>>> from Xenomai 2.6.2.1. Looks like there is another reason than Xenomai.
> >>> Ok, one thing to pay attention to on imx6 is the L2 cache write
> >>> allocate policy. You want to disable L2 write allocate on imx6 to get
> >>> low latencies. I do not know which patches exactly you are using, so
> >>> it is difficult to check, but the kernel normally displays the value
> >>> set in the L2 auxiliary configuration register; you can check in the
> >>> datasheet whether it means that L2 write allocate is disabled or not.
> >>> And check if you get the same value with 3.0 and 3.10.
> >> Thank you for this hint. I looked around in the kernel config, but
> >> can't find an option that sounds like L2 write allocate. The only
> >> option I found was CACHE_L2X0, and that is activated on both kernels.
> >> Do you have an idea what the name of this configuration is, or where
> >> in the kernel sources it should be located, so that I can find the
> >> name of the config flag by searching the source code?
> > I never talked about any kernel configuration option. I am talking
> > about checking the value passed to the L2 cache auxiliary
> > configuration register; this is a hardware register. Also, as I said,
> > the value passed to the L2 cache auxiliary register is printed by the
> > kernel during boot.
> Sorry Gilles,
>
> I found the message in the kernel log; you are right, they are different.
>
> Kernel 3.0.43 shows:
>   l2x0: 16 ways, CACHE_ID 0x410000c8, AUX_CTRL 0x02850000, Cache size: 524288 B
> Kernel 3.10.53 shows:
>   l2x0: 16 ways, CACHE_ID 0x410000c8, AUX_CTRL 0x32c50000, Cache size: 524288 B
>
> Kernel 3.10.53 additionally sets bits 22 (shared attribute override
> enable), 28 (data prefetch) and 29 (instruction prefetch). I used the
> same settings on kernel 3.0.43 but the performance didn't change, so
> it looks like these settings don't slow down my system.
>
> What I have seen while searching the kernel config was that a few
> errata are activated as dependencies in 3.10.53. To be sure that none
> of the errata is the source of my performance reduction, I activated
> them on 3.0.43 as well. But again, no difference from our default
> configuration.
>
> To rule out that our application itself is simply running slower, I
> created a shell script incrementing a variable 10,000 times and
> measured the runtime with time:
>
> #!/bin/sh
> var=0
> while [ $var -lt $1 ]; do
>     let var++
> done
>
> > time /mnt/drive-C/CpuTime.sh 10000
>
> On this test:
> Kernel 3.0.43  Xenomai 2.6.2.1 needs 480 ms
> Kernel 3.10.53 Xenomai 2.6.4   needs 820 ms

If you run the same test several times on the same kernel, do you
reliably always get the same duration?

> These differences are huge, and I'm not sure if I can trust this test,
> because we also use a different busybox. The difference using our
> application is between 2% and 3% in the realtime task (Xenomai task
> with priority 95). Do you have an idea why this is so much slower?

I would not call a 2% or 3% difference "much slower", only measurement
noise.

> I also see differences when I use the xeno-test command to check the
> speed. Some of the operations are faster on the newer Xenomai, but a
> few are much slower, for example inlined llimd.

The differences in the "arith" test are measurement noise. Chances
are, if you run the arith test twice with the same kernel you are not
going to find the same values.

> With every test I run, it looks like the issue is not located in the
> kernel or Xenomai. Do you know of any speed issues in system
> libraries like libc or something like that?

Stupid question: do the two kernels run the processor at the same
speed? You could have a difference if one kernel runs it at 1GHz and
the other at 800MHz for instance.

--
Gilles.
https://click-hack.org

^ permalink raw reply	[flat|nested] 36+ messages in thread
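[Editor's note: the CPU-speed question can be answered directly from the
standard cpufreq sysfs interface when the kernel has a cpufreq driver; the
sketch below falls back to a note when the interface is absent, since an
embedded kernel may run the core at a fixed clock with no cpufreq at all.]

```shell
#!/bin/sh
# Print the current CPU frequency and governor from cpufreq sysfs.
# Paths are the standard cpufreq layout for cpu0.
cpu_freq_report() {
    d=/sys/devices/system/cpu/cpu0/cpufreq
    found=0
    for f in cpuinfo_cur_freq scaling_cur_freq scaling_governor; do
        if [ -r "$d/$f" ]; then
            echo "$f: $(cat "$d/$f")"
            found=1
        fi
    done
    [ "$found" -eq 1 ] || echo "no cpufreq sysfs interface on this kernel"
}

cpu_freq_report
```

Running this once under each kernel makes a clock-speed difference (e.g.
996 MHz vs. 792 MHz) immediately visible.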
* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4 2016-06-06 15:35 ` Gilles Chanteperdrix @ 2016-06-07 14:13 ` Wolfgang Netbal 2016-06-07 17:00 ` Gilles Chanteperdrix 2016-06-07 17:22 ` Philippe Gerum 0 siblings, 2 replies; 36+ messages in thread From: Wolfgang Netbal @ 2016-06-07 14:13 UTC (permalink / raw) Cc: xenomai Am 2016-06-06 um 17:35 schrieb Gilles Chanteperdrix: > On Mon, Jun 06, 2016 at 09:03:40AM +0200, Wolfgang Netbal wrote: >> >> Am 2016-06-02 um 10:23 schrieb Gilles Chanteperdrix: >>> On Thu, Jun 02, 2016 at 10:15:41AM +0200, Wolfgang Netbal wrote: >>>> Am 2016-06-01 um 16:12 schrieb Gilles Chanteperdrix: >>>>> On Wed, Jun 01, 2016 at 03:52:06PM +0200, Wolfgang Netbal wrote: >>>>>> Am 2016-05-31 um 16:16 schrieb Gilles Chanteperdrix: >>>>>>> On Tue, May 31, 2016 at 04:09:07PM +0200, Wolfgang Netbal wrote: >>>>>>>> Dear all, >>>>>>>> >>>>>>>> we have moved our application from "XENOMAI 2.6.2.1 + Linux 3.0.43" to >>>>>>>> "XENOMAI 2.6.4. + Linux 3.10.53". Our target is an i.MX6DL. The system >>>>>>>> is now up and running and works stable. Unfortunately we see a >>>>>>>> difference in the performance. Our old combination (XENOMAI 2.6.2.1 + >>>>>>>> Linux 3.0.43) was slightly faster. >>>>>>>> >>>>>>>> At the moment it looks like that XENOMAI 2.6.4 calls >>>>>>>> xnpod_schedule_handler much more often then XENOMAI 2.6.2.1 in our old >>>>>>>> system. Every call of xnpod_schedule_handler interrupts our main >>>>>>>> XENOMAI task with priority = 95. As I wrote above, I get interrupts 1037 handled by rthal_apc_handler() and 1038 handled by xnpod_schedule_handler() while my realtime task is running on kernel 3.10.53 with Xenomai 2.6.4. On kernel 3.0.43 with Xenomai 2.6.4 there are no interrupts, except the once that are send by my board using GPIOs, but this virtual interrupts are assigned to Xenomai and Linux as well but I didn't see a handler installed. 
I'm pretty sure that these interrupts are slowing down my system, but where do they come from ? why didn't I see them on Kernel 3.0.43 with Xenomai 2.6.4 ? how long do they need to process ? Is there any dependecy in Xenomai between the kernel version and this virtual interrupts ? >>>>>>>> I have compared the configuration of both XENOMAI versions but did not >>>>>>>> found any difference. I checked the source code (new commits) but did >>>>>>>> also not find a solution. >>>>>>> Have you tried Xenomai 2.6.4 with Linux 3.0.43 ? In order to see >>>>>>> whether it comes from the kernel update or the Xenomai udpate? >>>>>> I've tried Linux 3.0.43 with Xenomai 2.6.4 an there is no difference to >>>>>> Xenomai 2.6.2.1 >>>>>> Looks like there is an other reason than Xenomai. >>>>> Ok, one thing to pay attention to on imx6 is the L2 cache write >>>>> allocate policy. You want to disable L2 write allocate on imx6 to >>>>> get low latencies. I do not know which patches exactly you are >>>>> using, so it is difficult to check, but the kernel normally displays >>>>> the value set in the L2 auxiliary configuration register, you can >>>>> check in the datasheet if it means that L2 write allocate is >>>>> disabled or not. And check if you get the same value with 3.0 and >>>>> 3.10. >>>> Thank you for this hint, I looked around in the kernel config, but cant >>>> find >>>> an option sounds like L2 write allocate. >>>> The only option I found was CACHE_L2X0 and that is activated on both >>>> kernels. >>>> Do you have an idea whats the name of this configuration or where in the >>>> kernel sources it should be located, so I can find out whats the name of >>>> the >>>> config flag by searching the sourcecode. >>> I never talked about any kernel configuration option. I am talking >>> checking the value passed to the L2 cache auxiliary configuration >>> register, this is a hardware register. 
Also, as I said, the value >>> passed to the L2 cache auxiliary register is printed by the kernel >>> during boot. >>> >>> >> Sorry Gilles, >> I found the message in the kernel log, you are right they are different >> Kernel 3.0.43 shows l2x0: 16 ways, CACHE_ID 0x410000c8, AUX_CTRL >> 0x02850000, Cache size: 524288 B >> Kernel 3.10.53 shows l2x0: 16 ways, CACHE_ID 0x410000c8, AUX_CTRL >> 0x32c50000, Cache size: 524288 B >> Kernel 3.10.53 sets addidtional the bits 22 (Shared attribute override >> enable), 28 (Data prefetch) and 29 (Instruction prefetch) >> I used the same settings on Kernel 3.0.43 but the perfromance didn't >> change, looks like this configurations didn't slow down my >> system. >> >> What I have seen while searching the kernel config was that there are a >> few errate that are activated as dependency in 3.10.53, >> to be sure none of the errata is the source of my performance reduction >> I activated them on 3.0.43 as well. >> But again no difference to our default configuration. >> >> To avoid our application is running slower I created a shell-script >> incrementing a variable >> 10.000 times and measuring the runtime with time >> >> #!/bin/sh >> var=0 >> while [ $var -lt $1 ]; do >> let var++ >> done >> >> > time /mnt/drive-C/CpuTime.sh 10000 >> >> On this test >> Kernel 3.0.43 Xenomai 2.6.2.1 needs 480 ms >> Kernel 3.10.53 Xenomai 2.6.4 needs 820ms > If you run the same test several times on the same kernel, do you > reliably always get the same duration? Yes I run this test 10 times and the values are always nearly the same +-10ms >> This differences are huge, an I'm not sure if I can trust this test >> because we also use a different busybox, >> and the difference using our application are between 2% and 3% >> in the realtime task (Xenomaitask with priority 95) >> Do you have an idea why this is that much slower ? > I would not call a 2% or 3% difference "much slower", only > measurement noise. 
I see the difference of 2% or 3% when the realtime task takes only 20% of
the CPU time; if I set the used CPU time to 90%, I have a difference of 12%.
The percentages I wrote are average values measured over 10,000 cycles.
In every measurement or test I run, kernel 3.10.53 with Xenomai 2.6.4 is
slower than kernel 3.0.43 with Xenomai 2.6.2.1.
>> I also see differences when I use the xeno-test command to check the speed.
>> Some of the operations are faster on the newer Xenomai but a few are much
>> slower,
>> for example inlined llimd.
> The differences in the "arith" test are measurement noise. Chances
> are, if you run the arith test twice with the same kernel you are
> not going to find the same values.
>> With every test I run it looks like the issue is not located in the Kernel
>> or Xenomai.
>> Do you know of any speed issues in system libraries like libc or something
>> like that?
> Stupid question: do the two kernels run the processor at the same
> speed? You could have a difference if one kernel runs it at 1GHz and
> the other at 800MHz for instance.
>
Yes, the two kernels are running on the same processor. One of the first
things I checked was whether they set the same CPU frequency and the same
RAM settings. I also swapped the microSD cards to rule out that one board
has a faster processor. There are no differences.

Thank you
Wolfgang
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4 2016-06-07 14:13 ` Wolfgang Netbal @ 2016-06-07 17:00 ` Gilles Chanteperdrix 2016-06-27 15:55 ` Wolfgang Netbal 2016-06-07 17:22 ` Philippe Gerum 1 sibling, 1 reply; 36+ messages in thread From: Gilles Chanteperdrix @ 2016-06-07 17:00 UTC (permalink / raw) To: Wolfgang Netbal; +Cc: xenomai On Tue, Jun 07, 2016 at 04:13:07PM +0200, Wolfgang Netbal wrote: > > > Am 2016-06-06 um 17:35 schrieb Gilles Chanteperdrix: > > On Mon, Jun 06, 2016 at 09:03:40AM +0200, Wolfgang Netbal wrote: > >> > >> Am 2016-06-02 um 10:23 schrieb Gilles Chanteperdrix: > >>> On Thu, Jun 02, 2016 at 10:15:41AM +0200, Wolfgang Netbal wrote: > >>>> Am 2016-06-01 um 16:12 schrieb Gilles Chanteperdrix: > >>>>> On Wed, Jun 01, 2016 at 03:52:06PM +0200, Wolfgang Netbal wrote: > >>>>>> Am 2016-05-31 um 16:16 schrieb Gilles Chanteperdrix: > >>>>>>> On Tue, May 31, 2016 at 04:09:07PM +0200, Wolfgang Netbal wrote: > >>>>>>>> Dear all, > >>>>>>>> > >>>>>>>> we have moved our application from "XENOMAI 2.6.2.1 + Linux 3.0.43" to > >>>>>>>> "XENOMAI 2.6.4. + Linux 3.10.53". Our target is an i.MX6DL. The system > >>>>>>>> is now up and running and works stable. Unfortunately we see a > >>>>>>>> difference in the performance. Our old combination (XENOMAI 2.6.2.1 + > >>>>>>>> Linux 3.0.43) was slightly faster. > >>>>>>>> > >>>>>>>> At the moment it looks like that XENOMAI 2.6.4 calls > >>>>>>>> xnpod_schedule_handler much more often then XENOMAI 2.6.2.1 in our old > >>>>>>>> system. Every call of xnpod_schedule_handler interrupts our main > >>>>>>>> XENOMAI task with priority = 95. > As I wrote above, I get interrupts 1037 handled by rthal_apc_handler() > and 1038 handled by xnpod_schedule_handler() while my realtime task > is running on kernel 3.10.53 with Xenomai 2.6.4. 
> On kernel 3.0.43 with Xenomai 2.6.4 there are no interrupts, except the
> ones that are sent by my board using GPIOs, but these virtual interrupts
> are assigned to Xenomai and Linux as well and I didn't see a handler
> installed.
> I'm pretty sure that these interrupts are slowing down my system, but
> where do they come from ?
> why didn't I see them on Kernel 3.0.43 with Xenomai 2.6.4 ?
> how long do they need to process ?
How do you mean, you do not see them? If you are talking about the
rescheduling IPI, it used not to be bound to a virq (so, it would
have a different irq number on cortex A9, something between 0 and 31
that would not show in the usual /proc files), I wonder if 3.0 is
before or after that. You do not see them in /proc, or you see them
and their count does not increase?

As for where they come from, this is not a mystery: the reschedule
IPI is triggered when code on one cpu changes the scheduler state
(wakes up a thread for instance) on another cpu. If you want to
avoid it, do not do that. That means: do not share mutexes between
threads running on different cpus, pay attention for timers to be
running on the same cpu as the thread they signal, etc...

The APC virq is used to multiplex several services, which you can
find by grepping the sources for rthal_apc_alloc:
./ksrc/skins/posix/apc.c: pse51_lostage_apc = rthal_apc_alloc("pse51_lostage_handler",
./ksrc/skins/rtdm/device.c: rtdm_apc = rthal_apc_alloc("deferred RTDM close", rtdm_apc_handler,
./ksrc/nucleus/registry.c: rthal_apc_alloc("registry_export", &registry_proc_schedule, NULL);
./ksrc/nucleus/pipe.c: rthal_apc_alloc("pipe_wakeup", &xnpipe_wakeup_proc, NULL);
./ksrc/nucleus/shadow.c: rthal_apc_alloc("lostage_handler", &lostage_handler, NULL);
./ksrc/nucleus/select.c: xnselect_apc = rthal_apc_alloc("xnselectors_destroy",

It would be interesting to know which of these services is triggered
a lot.
One possibility I see would be root thread priority inheritance, so it would be caused by mode switches. This brings the questions: does your application have threads migrating between primary and secondary mode, do you see the count of mode switches increase with the kernel change, and do you have root thread priority inheritance enabled? -- Gilles. https://click-hack.org ^ permalink raw reply [flat|nested] 36+ messages in thread
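Gilles' advice amounts to keeping the waker and the woken thread on the same CPU. Under Linux this can be done by pinning threads, sketched below with the GNU affinity call. The function name is made up for illustration; with Xenomai's native skin, task affinity would typically be requested at task creation time instead.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Pin the calling thread to a single CPU. Keeping the threads that wake
 * each other on the same core means no cross-CPU reschedule IPI is
 * needed, which is the xnpod_schedule_handler activity discussed above. */
int pin_self_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t), &set);
}
```

The same reasoning applies to timers: arm them from the CPU where the waiting thread runs, so the expiry does not have to signal across cores.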
* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4 2016-06-07 17:00 ` Gilles Chanteperdrix @ 2016-06-27 15:55 ` Wolfgang Netbal 2016-06-27 16:00 ` Gilles Chanteperdrix 2016-06-27 16:46 ` Gilles Chanteperdrix 0 siblings, 2 replies; 36+ messages in thread From: Wolfgang Netbal @ 2016-06-27 15:55 UTC (permalink / raw) To: xenomai Am 2016-06-07 um 19:00 schrieb Gilles Chanteperdrix: > On Tue, Jun 07, 2016 at 04:13:07PM +0200, Wolfgang Netbal wrote: >> >> Am 2016-06-06 um 17:35 schrieb Gilles Chanteperdrix: >>> On Mon, Jun 06, 2016 at 09:03:40AM +0200, Wolfgang Netbal wrote: >>>> Am 2016-06-02 um 10:23 schrieb Gilles Chanteperdrix: >>>>> On Thu, Jun 02, 2016 at 10:15:41AM +0200, Wolfgang Netbal wrote: >>>>>> Am 2016-06-01 um 16:12 schrieb Gilles Chanteperdrix: >>>>>>> On Wed, Jun 01, 2016 at 03:52:06PM +0200, Wolfgang Netbal wrote: >>>>>>>> Am 2016-05-31 um 16:16 schrieb Gilles Chanteperdrix: >>>>>>>>> On Tue, May 31, 2016 at 04:09:07PM +0200, Wolfgang Netbal wrote: >>>>>>>>>> Dear all, >>>>>>>>>> >>>>>>>>>> we have moved our application from "XENOMAI 2.6.2.1 + Linux 3.0.43" to >>>>>>>>>> "XENOMAI 2.6.4. + Linux 3.10.53". Our target is an i.MX6DL. The system >>>>>>>>>> is now up and running and works stable. Unfortunately we see a >>>>>>>>>> difference in the performance. Our old combination (XENOMAI 2.6.2.1 + >>>>>>>>>> Linux 3.0.43) was slightly faster. >>>>>>>>>> >>>>>>>>>> At the moment it looks like that XENOMAI 2.6.4 calls >>>>>>>>>> xnpod_schedule_handler much more often then XENOMAI 2.6.2.1 in our old >>>>>>>>>> system. Every call of xnpod_schedule_handler interrupts our main >>>>>>>>>> XENOMAI task with priority = 95. >> As I wrote above, I get interrupts 1037 handled by rthal_apc_handler() >> and 1038 handled by xnpod_schedule_handler() while my realtime task >> is running on kernel 3.10.53 with Xenomai 2.6.4. 
>> On kernel 3.0.43 with Xenomai 2.6.4 there are no interrupts, except the
>> ones that are sent by my board using GPIOs, but these virtual interrupts
>> are assigned to Xenomai and Linux as well and I didn't see a handler
>> installed.
>> I'm pretty sure that these interrupts are slowing down my system, but
>> where do they come from ?
>> why didn't I see them on Kernel 3.0.43 with Xenomai 2.6.4 ?
>> how long do they need to process ?
> How do you mean, you do not see them? If you are talking about the
> rescheduling IPI, it used not to be bound to a virq (so, it would
> have a different irq number on cortex A9, something between 0 and 31
> that would not show in the usual /proc files), I wonder if 3.0 is
> before or after that. You do not see them in /proc, or you see them
> and their count does not increase?
Sorry for the long delay, we ran a lot of tests to find out what could
be the reason for the performance difference.

If I call cat /proc/ipipe/Xenomai I don't see the IRQ handler assigned to
the virtual IRQ on Kernel 3.0.43, but it looks like that's an issue of the
kernel.
> As for where they come from, this is not a mystery, the reschedule
> IPI is triggered when code on one cpu changes the scheduler state
> (wakes up a thread for instance) on another cpu. If you want to
> avoid it, do not do that. That means, do not share mutex between
> threads running on different cpus, pay attention for timers to be
> running on the same cpu as the thread they signal, etc...
>
> The APC virq is used to multiplex several services, which you can
> find by grepping the sources for rthal_apc_alloc:
> ./ksrc/skins/posix/apc.c: pse51_lostage_apc = rthal_apc_alloc("pse51_lostage_handler",
> ./ksrc/skins/rtdm/device.c: rtdm_apc = rthal_apc_alloc("deferred RTDM close", rtdm_apc_handler,
> ./ksrc/nucleus/registry.c: rthal_apc_alloc("registry_export", &registry_proc_schedule, NULL);
> ./ksrc/nucleus/pipe.c: rthal_apc_alloc("pipe_wakeup", &xnpipe_wakeup_proc, NULL);
> ./ksrc/nucleus/shadow.c: rthal_apc_alloc("lostage_handler", &lostage_handler, NULL);
> ./ksrc/nucleus/select.c: xnselect_apc = rthal_apc_alloc("xnselectors_destroy",
>
> It would be interesting to know which of these services is triggered
> a lot. One possibility I see would be root thread priority
> inheritance, so it would be caused by mode switches. This brings the
> question: does your application have threads migrating between primary
> and secondary mode, do you see the count of mode switches increase
> with the kernel changes, do you have root thread priority
> inheritance enabled?
>
Here is a short summary of our tests and results, and at the end a few
questions :-)

We are using a Freescale imx6dl on our hardware and upgraded our operating
system from Freescale Kernel 3.0.43 with Xenomai 2.6.2.1 and U-Boot
2013.04, compiled with GCC 4.7.2, to Freescale Kernel 3.10.53 with Xenomai
2.6.4 and U-Boot 2016.01, compiled with GCC 4.8.2.
On both kernels CONFIG_SMP is set.

What we see is that a customer project running in a Xenomai task with
priority 95 takes 40% of the CPU time on Kernel 3.0.43
and 47% of the CPU time on Kernel 3.10.53,
so the new system is slower by 7 percentage points; extrapolated to 100%
CPU load that is a difference of about 15%.
To find out the reason for this difference we ran the following tests.
We tried to make the new system faster by changing some components of the
system:
-Changing U-Boot on the new system -> still 7% slower
-Copying Kernel 3.0.43 to the new system -> still 7% slower
-Building Kernel 3.0.43 with Xenomai 2.6.4 and copying it to the new system -> still 7% slower
-Compiling the new system with the old GCC version -> still 7% slower
-We also checked the settings for RAM and CPU clock -> these are equal

It looks like the cause is not one of the big components,
so we started to test some specific functions like rt_timer_tsc().
In the following example we stay for 800µs in the while loop and
start this loop again after a 200µs delay.
The application task running this code has priority 95.

Here is a simplified code snippet:
start = rt_timer_tsc();
do
{
current = rt_timer_tsc();
i++;
} while ((current - start) < 800);

On the old system i reaches 1464,
on the new system i reaches 1392,
which means a difference of 5%.

Is it possible that code prefetching changed between the two kernel
versions? Because our customer application is bigger than our test code,
the difference there can be greater.

Any hints as to what could be the reason for this slowdown?

Kind regards
Wolfgang
^ permalink raw reply [flat|nested] 36+ messages in thread
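Taking the stated 800 µs window at face value, the loop counts can be turned into a per-iteration cost, which makes the quoted 5% figure easy to check. The helpers below are illustrative, not code from the thread:

```c
/* Convert a loop count measured over a fixed window into a per-iteration
 * cost, and compute the relative slowdown between two counts. The 800 us
 * window and the counts 1464/1392 come from the message above. */
double ns_per_iteration(double window_us, long iterations)
{
    return window_us * 1000.0 / iterations;
}

double slowdown_percent(long old_count, long new_count)
{
    return 100.0 * (old_count - new_count) / (double)old_count;
}
```

With these numbers one iteration costs roughly 546 ns on the old system and 575 ns on the new one, and (1464 - 1392) / 1464 is about 4.9%, matching the "difference of 5%" above.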
* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4 2016-06-27 15:55 ` Wolfgang Netbal @ 2016-06-27 16:00 ` Gilles Chanteperdrix 2016-06-28 8:08 ` Wolfgang Netbal 2016-06-27 16:46 ` Gilles Chanteperdrix 1 sibling, 1 reply; 36+ messages in thread From: Gilles Chanteperdrix @ 2016-06-27 16:00 UTC (permalink / raw) To: Wolfgang Netbal; +Cc: xenomai On Mon, Jun 27, 2016 at 05:55:12PM +0200, Wolfgang Netbal wrote: > -Creating Kernel 3.0.43 with > Xenomai 2.6.4 and copy it to new system -> still 7% slower This contradicts what you said here: https://xenomai.org/pipermail/xenomai/2016-June/036370.html -- Gilles. https://click-hack.org ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4 2016-06-27 16:00 ` Gilles Chanteperdrix @ 2016-06-28 8:08 ` Wolfgang Netbal 0 siblings, 0 replies; 36+ messages in thread From: Wolfgang Netbal @ 2016-06-28 8:08 UTC (permalink / raw) To: xenomai
On 2016-06-27 at 18:00, Gilles Chanteperdrix wrote:
> On Mon, Jun 27, 2016 at 05:55:12PM +0200, Wolfgang Netbal wrote:
>> -Creating Kernel 3.0.43 with
>> Xenomai 2.6.4 and copy it to new system -> still 7% slower
> This contradicts what you said here:
> https://xenomai.org/pipermail/xenomai/2016-June/036370.html
I always tried to speed up the new system, so I always tested the changes
on the new system; when I wrote that nothing changed, I meant that the new
system doesn't speed up.
Sorry for the missing details in the above post.
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4 2016-06-27 15:55 ` Wolfgang Netbal 2016-06-27 16:00 ` Gilles Chanteperdrix @ 2016-06-27 16:46 ` Gilles Chanteperdrix 2016-06-28 8:31 ` Wolfgang Netbal 1 sibling, 1 reply; 36+ messages in thread From: Gilles Chanteperdrix @ 2016-06-27 16:46 UTC (permalink / raw) To: Wolfgang Netbal; +Cc: xenomai On Mon, Jun 27, 2016 at 05:55:12PM +0200, Wolfgang Netbal wrote: > > > Am 2016-06-07 um 19:00 schrieb Gilles Chanteperdrix: > > On Tue, Jun 07, 2016 at 04:13:07PM +0200, Wolfgang Netbal wrote: > >> > >> Am 2016-06-06 um 17:35 schrieb Gilles Chanteperdrix: > >>> On Mon, Jun 06, 2016 at 09:03:40AM +0200, Wolfgang Netbal wrote: > >>>> Am 2016-06-02 um 10:23 schrieb Gilles Chanteperdrix: > >>>>> On Thu, Jun 02, 2016 at 10:15:41AM +0200, Wolfgang Netbal wrote: > >>>>>> Am 2016-06-01 um 16:12 schrieb Gilles Chanteperdrix: > >>>>>>> On Wed, Jun 01, 2016 at 03:52:06PM +0200, Wolfgang Netbal wrote: > >>>>>>>> Am 2016-05-31 um 16:16 schrieb Gilles Chanteperdrix: > >>>>>>>>> On Tue, May 31, 2016 at 04:09:07PM +0200, Wolfgang Netbal wrote: > >>>>>>>>>> Dear all, > >>>>>>>>>> > >>>>>>>>>> we have moved our application from "XENOMAI 2.6.2.1 + Linux 3.0.43" to > >>>>>>>>>> "XENOMAI 2.6.4. + Linux 3.10.53". Our target is an i.MX6DL. The system > >>>>>>>>>> is now up and running and works stable. Unfortunately we see a > >>>>>>>>>> difference in the performance. Our old combination (XENOMAI 2.6.2.1 + > >>>>>>>>>> Linux 3.0.43) was slightly faster. > >>>>>>>>>> > >>>>>>>>>> At the moment it looks like that XENOMAI 2.6.4 calls > >>>>>>>>>> xnpod_schedule_handler much more often then XENOMAI 2.6.2.1 in our old > >>>>>>>>>> system. Every call of xnpod_schedule_handler interrupts our main > >>>>>>>>>> XENOMAI task with priority = 95. 
> >> As I wrote above, I get interrupts 1037 handled by rthal_apc_handler() > >> and 1038 handled by xnpod_schedule_handler() while my realtime task > >> is running on kernel 3.10.53 with Xenomai 2.6.4. > >> On kernel 3.0.43 with Xenomai 2.6.4 there are no interrupts, except the > >> once that are send by my board using GPIOs, but this virtual interrupts > >> are assigned to Xenomai and Linux as well but I didn't see a handler > >> installed. > >> I'm pretty sure that these interrupts are slowing down my system, but > >> where do they come from ? > >> why didn't I see them on Kernel 3.0.43 with Xenomai 2.6.4 ? > >> how long do they need to process ? > > How do you mean you do not see them? If you are talking about the > > rescheduling API, it used no to be bound to a virq (so, it would > > have a different irq number on cortex A9, something between 0 and 31 > > that would not show in the usual /proc files), I wonder if 3.0 is > > before or after that. You do not see them in /proc, or you see them > > and their count does not increase? > Sorry for the long delay, we ran a lot of tests to find out what could > be the reason for > the performance difference. > > If I call cat /proc/ipipe/Xenomai I dont see the IRQ handler assigned to > the virtual > IRQ on Kernel 3.0.43, but it looks like thats an issue of the Kernel > > As for where they come from, this is not a mystery, the reschedule > > IPI is triggered when code on one cpu changes the scheduler state > > (wakes up a thread for instance) on another cpu. If you want to > > avoid it, do not do that. That means, do not share mutex between > > threads running on different cpus, pay attention for timers to be > > running on the same cpu as the thread they signal, etc... 
> > > > The APC virq is used to multiplex several services, which you can > > find by grepping the sources for rthal_apc_alloc: > > ./ksrc/skins/posix/apc.c: pse51_lostage_apc = rthal_apc_alloc("pse51_lostage_handler", > > ./ksrc/skins/rtdm/device.c: rtdm_apc = rthal_apc_alloc("deferred RTDM close", rtdm_apc_handler, > > ./ksrc/nucleus/registry.c: rthal_apc_alloc("registry_export", ®istry_proc_schedule, NULL); > > ./ksrc/nucleus/pipe.c: rthal_apc_alloc("pipe_wakeup", &xnpipe_wakeup_proc, NULL); > > ./ksrc/nucleus/shadow.c: rthal_apc_alloc("lostage_handler", &lostage_handler, NULL); > > ./ksrc/nucleus/select.c: xnselect_apc = rthal_apc_alloc("xnselectors_destroy", > > > > It would be interesting to know which of these services is triggered > > a lot. One possibility I see would be root thread priority > > inheritance, so it would be caused by mode switches. This brings the > > question: do your application have threads migrating between primary > > and secondary mode, do you see the count of mode switches increase > > with the kernel changes, do you have root thread priority > > inheritance enabled? > > > Here a short sum up of our tests and the results and at the end a few > questions :-) > > we are using a Freescale imx6dl on our hardware and upgraded our operating system from > Freescale Kernel 3.0.43 with Xenomai 2.6.2.1 and U-Boot 2013.04 as compiler we use GCC 4.7.2 > Freescale Kernel 3.10.53 with Xenomai 2.6.4 and U-Boot 2016.01 as compiler we use GCC 4.8.2 > On both Kernels the CONFIG_SMP is set. > > What we see is that when we running a customer project in a Xenomai task with priority 95 > tooks 40% of the CPU time on Kernel 3.0.43 > and 47% of CPU time on Kernel 3.10.53 > > so the new system is slower by 7% if we sum up this to 100% CPU load we have a difference of 15% > To find out what is the reason for this difference we ran the following test. > We tried to get the new system faster by change some components of the system. 
> > -Changing U-Boot on new system -> still 7% slower > -Copy Kernel 3.0.43 to new system -> still 7% slower > -Creating Kernel 3.0.43 with > Xenomai 2.6.4 and copy it to new system -> still 7% slower > -Compiling the new system with > old GCC version -> still 7% slower > -We also checked the settings for RAM and CPU clock -> these are equal > > It looks like that is not one of the big components, > so we started to test some special functions like rt_timer_tsc() > In the following example we stay for 800µs in the while loop and > start this loop again after 200µs delay. > The task application running this code has priotity 95. > > Here a simplified code snipped > start = rt_timer_tsc(); > do > { > current = rt_timer_tsc(); > i++; > } while((current - start) < 800) If your CPU is running at 1 GHz and uses the global timer as clock source, the clock source runs at 500 MHz, so 800 ticks of the tsc is something around 1.6 us So, I do not really understand what you are talking about. But are you sure the two kernels use the same clocksource for xenomai? Could you show us the result of "dmesg | grep I-pipe" with the two kernels ? -- Gilles. https://click-hack.org ^ permalink raw reply [flat|nested] 36+ messages in thread
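Gilles' point is unit arithmetic: raw tsc ticks only mean something relative to the counter frequency. A tiny helper (hypothetical, for illustration) reproduces the numbers: 800 ticks at the 500 MHz he assumes is 1.6 µs, and even at the 396 MHz clocksource reported later in the thread it is only about 2 µs, nowhere near 800 µs:

```c
/* Convert raw tsc ticks to microseconds for a given counter frequency.
 * On a Cortex-A9, the global timer typically runs at half the CPU clock,
 * hence Gilles' 500 MHz figure for a 1 GHz core. */
double ticks_to_us(double ticks, double tsc_freq_hz)
{
    return ticks * 1e6 / tsc_freq_hz;
}
```

This is why comparing raw rt_timer_tsc() values against 800 cannot give an 800 µs window; the values must first go through rt_timer_tsc2ns(), as Wolfgang clarifies below.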
* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4 2016-06-27 16:46 ` Gilles Chanteperdrix @ 2016-06-28 8:31 ` Wolfgang Netbal 2016-06-28 8:34 ` Gilles Chanteperdrix 0 siblings, 1 reply; 36+ messages in thread From: Wolfgang Netbal @ 2016-06-28 8:31 UTC (permalink / raw) To: xenomai Am 2016-06-27 um 18:46 schrieb Gilles Chanteperdrix: > On Mon, Jun 27, 2016 at 05:55:12PM +0200, Wolfgang Netbal wrote: >> >> Am 2016-06-07 um 19:00 schrieb Gilles Chanteperdrix: >>> On Tue, Jun 07, 2016 at 04:13:07PM +0200, Wolfgang Netbal wrote: >>>> Am 2016-06-06 um 17:35 schrieb Gilles Chanteperdrix: >>>>> On Mon, Jun 06, 2016 at 09:03:40AM +0200, Wolfgang Netbal wrote: >>>>>> Am 2016-06-02 um 10:23 schrieb Gilles Chanteperdrix: >>>>>>> On Thu, Jun 02, 2016 at 10:15:41AM +0200, Wolfgang Netbal wrote: >>>>>>>> Am 2016-06-01 um 16:12 schrieb Gilles Chanteperdrix: >>>>>>>>> On Wed, Jun 01, 2016 at 03:52:06PM +0200, Wolfgang Netbal wrote: >>>>>>>>>> Am 2016-05-31 um 16:16 schrieb Gilles Chanteperdrix: >>>>>>>>>>> On Tue, May 31, 2016 at 04:09:07PM +0200, Wolfgang Netbal wrote: >>>>>>>>>>>> Dear all, >>>>>>>>>>>> >>>>>>>>>>>> we have moved our application from "XENOMAI 2.6.2.1 + Linux 3.0.43" to >>>>>>>>>>>> "XENOMAI 2.6.4. + Linux 3.10.53". Our target is an i.MX6DL. The system >>>>>>>>>>>> is now up and running and works stable. Unfortunately we see a >>>>>>>>>>>> difference in the performance. Our old combination (XENOMAI 2.6.2.1 + >>>>>>>>>>>> Linux 3.0.43) was slightly faster. >>>>>>>>>>>> >>>>>>>>>>>> At the moment it looks like that XENOMAI 2.6.4 calls >>>>>>>>>>>> xnpod_schedule_handler much more often then XENOMAI 2.6.2.1 in our old >>>>>>>>>>>> system. Every call of xnpod_schedule_handler interrupts our main >>>>>>>>>>>> XENOMAI task with priority = 95. >>>> As I wrote above, I get interrupts 1037 handled by rthal_apc_handler() >>>> and 1038 handled by xnpod_schedule_handler() while my realtime task >>>> is running on kernel 3.10.53 with Xenomai 2.6.4. 
>>>> On kernel 3.0.43 with Xenomai 2.6.4 there are no interrupts, except the >>>> once that are send by my board using GPIOs, but this virtual interrupts >>>> are assigned to Xenomai and Linux as well but I didn't see a handler >>>> installed. >>>> I'm pretty sure that these interrupts are slowing down my system, but >>>> where do they come from ? >>>> why didn't I see them on Kernel 3.0.43 with Xenomai 2.6.4 ? >>>> how long do they need to process ? >>> How do you mean you do not see them? If you are talking about the >>> rescheduling API, it used no to be bound to a virq (so, it would >>> have a different irq number on cortex A9, something between 0 and 31 >>> that would not show in the usual /proc files), I wonder if 3.0 is >>> before or after that. You do not see them in /proc, or you see them >>> and their count does not increase? >> Sorry for the long delay, we ran a lot of tests to find out what could >> be the reason for >> the performance difference. >> >> If I call cat /proc/ipipe/Xenomai I dont see the IRQ handler assigned to >> the virtual >> IRQ on Kernel 3.0.43, but it looks like thats an issue of the Kernel >>> As for where they come from, this is not a mystery, the reschedule >>> IPI is triggered when code on one cpu changes the scheduler state >>> (wakes up a thread for instance) on another cpu. If you want to >>> avoid it, do not do that. That means, do not share mutex between >>> threads running on different cpus, pay attention for timers to be >>> running on the same cpu as the thread they signal, etc... 
>>> >>> The APC virq is used to multiplex several services, which you can >>> find by grepping the sources for rthal_apc_alloc: >>> ./ksrc/skins/posix/apc.c: pse51_lostage_apc = rthal_apc_alloc("pse51_lostage_handler", >>> ./ksrc/skins/rtdm/device.c: rtdm_apc = rthal_apc_alloc("deferred RTDM close", rtdm_apc_handler, >>> ./ksrc/nucleus/registry.c: rthal_apc_alloc("registry_export", ®istry_proc_schedule, NULL); >>> ./ksrc/nucleus/pipe.c: rthal_apc_alloc("pipe_wakeup", &xnpipe_wakeup_proc, NULL); >>> ./ksrc/nucleus/shadow.c: rthal_apc_alloc("lostage_handler", &lostage_handler, NULL); >>> ./ksrc/nucleus/select.c: xnselect_apc = rthal_apc_alloc("xnselectors_destroy", >>> >>> It would be interesting to know which of these services is triggered >>> a lot. One possibility I see would be root thread priority >>> inheritance, so it would be caused by mode switches. This brings the >>> question: do your application have threads migrating between primary >>> and secondary mode, do you see the count of mode switches increase >>> with the kernel changes, do you have root thread priority >>> inheritance enabled? >>> >> Here a short sum up of our tests and the results and at the end a few >> questions :-) >> >> we are using a Freescale imx6dl on our hardware and upgraded our operating system from >> Freescale Kernel 3.0.43 with Xenomai 2.6.2.1 and U-Boot 2013.04 as compiler we use GCC 4.7.2 >> Freescale Kernel 3.10.53 with Xenomai 2.6.4 and U-Boot 2016.01 as compiler we use GCC 4.8.2 >> On both Kernels the CONFIG_SMP is set. >> >> What we see is that when we running a customer project in a Xenomai task with priority 95 >> tooks 40% of the CPU time on Kernel 3.0.43 >> and 47% of CPU time on Kernel 3.10.53 >> >> so the new system is slower by 7% if we sum up this to 100% CPU load we have a difference of 15% >> To find out what is the reason for this difference we ran the following test. >> We tried to get the new system faster by change some components of the system. 
>>
>> -Changing U-Boot on new system -> still 7% slower
>> -Copy Kernel 3.0.43 to new system -> still 7% slower
>> -Creating Kernel 3.0.43 with
>> Xenomai 2.6.4 and copy it to new system -> still 7% slower
>> -Compiling the new system with
>> old GCC version -> still 7% slower
>> -We also checked the settings for RAM and CPU clock -> these are equal
>>
>> It looks like that is not one of the big components,
>> so we started to test some special functions like rt_timer_tsc()
>> In the following example we stay for 800µs in the while loop and
>> start this loop again after 200µs delay.
>> The task application running this code has priority 95.
>>
>> Here a simplified code snippet
>> start = rt_timer_tsc();
>> do
>> {
>> current = rt_timer_tsc();
>> i++;
>> } while ((current - start) < 800);
> If your CPU is running at 1 GHz and uses the global timer as clock
> source, the clock source runs at 500 MHz, so 800 ticks of the tsc is
> something around 1.6 us
Sorry, I simplified the code snippet a little bit too much.
This is the correct code:

current = rt_timer_tsc2ns(rt_timer_tsc());

> So, I do not really understand what you are talking about. But are
> you sure the two kernels use the same clocksource for xenomai?
>
> Could you show us the result of "dmesg | grep I-pipe" with the two
> kernels ?
Output of Kernel 3.10.53 with Xenomai 2.6.4
I-pipe, 3.000 MHz clocksource
I-pipe, 396.000 MHz clocksource
I-pipe, 396.000 MHz timer
I-pipe, 396.000 MHz timer
I-pipe: head domain Xenomai registered.

Output of Kernel 3.0.43 with Xenomai 2.6.2.1
[ 0.000000] I-pipe 1.18-13: pipeline enabled.
[ 0.331999] I-pipe, 396.000 MHz timer
[ 0.335720] I-pipe, 396.000 MHz clocksource
[ 0.844016] I-pipe: Domain Xenomai registered.
* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4 2016-06-28 8:31 ` Wolfgang Netbal @ 2016-06-28 8:34 ` Gilles Chanteperdrix 2016-06-28 9:15 ` Wolfgang Netbal 0 siblings, 1 reply; 36+ messages in thread From: Gilles Chanteperdrix @ 2016-06-28 8:34 UTC (permalink / raw) To: Wolfgang Netbal; +Cc: xenomai On Tue, Jun 28, 2016 at 10:31:00AM +0200, Wolfgang Netbal wrote: > > > Am 2016-06-27 um 18:46 schrieb Gilles Chanteperdrix: > > On Mon, Jun 27, 2016 at 05:55:12PM +0200, Wolfgang Netbal wrote: > >> > >> Am 2016-06-07 um 19:00 schrieb Gilles Chanteperdrix: > >>> On Tue, Jun 07, 2016 at 04:13:07PM +0200, Wolfgang Netbal wrote: > >>>> Am 2016-06-06 um 17:35 schrieb Gilles Chanteperdrix: > >>>>> On Mon, Jun 06, 2016 at 09:03:40AM +0200, Wolfgang Netbal wrote: > >>>>>> Am 2016-06-02 um 10:23 schrieb Gilles Chanteperdrix: > >>>>>>> On Thu, Jun 02, 2016 at 10:15:41AM +0200, Wolfgang Netbal wrote: > >>>>>>>> Am 2016-06-01 um 16:12 schrieb Gilles Chanteperdrix: > >>>>>>>>> On Wed, Jun 01, 2016 at 03:52:06PM +0200, Wolfgang Netbal wrote: > >>>>>>>>>> Am 2016-05-31 um 16:16 schrieb Gilles Chanteperdrix: > >>>>>>>>>>> On Tue, May 31, 2016 at 04:09:07PM +0200, Wolfgang Netbal wrote: > >>>>>>>>>>>> Dear all, > >>>>>>>>>>>> > >>>>>>>>>>>> we have moved our application from "XENOMAI 2.6.2.1 + Linux 3.0.43" to > >>>>>>>>>>>> "XENOMAI 2.6.4. + Linux 3.10.53". Our target is an i.MX6DL. The system > >>>>>>>>>>>> is now up and running and works stable. Unfortunately we see a > >>>>>>>>>>>> difference in the performance. Our old combination (XENOMAI 2.6.2.1 + > >>>>>>>>>>>> Linux 3.0.43) was slightly faster. > >>>>>>>>>>>> > >>>>>>>>>>>> At the moment it looks like that XENOMAI 2.6.4 calls > >>>>>>>>>>>> xnpod_schedule_handler much more often then XENOMAI 2.6.2.1 in our old > >>>>>>>>>>>> system. Every call of xnpod_schedule_handler interrupts our main > >>>>>>>>>>>> XENOMAI task with priority = 95. 
> >>>> As I wrote above, I get interrupts 1037 handled by rthal_apc_handler() > >>>> and 1038 handled by xnpod_schedule_handler() while my realtime task > >>>> is running on kernel 3.10.53 with Xenomai 2.6.4. > >>>> On kernel 3.0.43 with Xenomai 2.6.4 there are no interrupts, except the > >>>> once that are send by my board using GPIOs, but this virtual interrupts > >>>> are assigned to Xenomai and Linux as well but I didn't see a handler > >>>> installed. > >>>> I'm pretty sure that these interrupts are slowing down my system, but > >>>> where do they come from ? > >>>> why didn't I see them on Kernel 3.0.43 with Xenomai 2.6.4 ? > >>>> how long do they need to process ? > >>> How do you mean you do not see them? If you are talking about the > >>> rescheduling API, it used no to be bound to a virq (so, it would > >>> have a different irq number on cortex A9, something between 0 and 31 > >>> that would not show in the usual /proc files), I wonder if 3.0 is > >>> before or after that. You do not see them in /proc, or you see them > >>> and their count does not increase? > >> Sorry for the long delay, we ran a lot of tests to find out what could > >> be the reason for > >> the performance difference. > >> > >> If I call cat /proc/ipipe/Xenomai I dont see the IRQ handler assigned to > >> the virtual > >> IRQ on Kernel 3.0.43, but it looks like thats an issue of the Kernel > >>> As for where they come from, this is not a mystery, the reschedule > >>> IPI is triggered when code on one cpu changes the scheduler state > >>> (wakes up a thread for instance) on another cpu. If you want to > >>> avoid it, do not do that. That means, do not share mutex between > >>> threads running on different cpus, pay attention for timers to be > >>> running on the same cpu as the thread they signal, etc... 
> >>> > >>> The APC virq is used to multiplex several services, which you can > >>> find by grepping the sources for rthal_apc_alloc: > >>> ./ksrc/skins/posix/apc.c: pse51_lostage_apc = rthal_apc_alloc("pse51_lostage_handler", > >>> ./ksrc/skins/rtdm/device.c: rtdm_apc = rthal_apc_alloc("deferred RTDM close", rtdm_apc_handler, > >>> ./ksrc/nucleus/registry.c: rthal_apc_alloc("registry_export", ®istry_proc_schedule, NULL); > >>> ./ksrc/nucleus/pipe.c: rthal_apc_alloc("pipe_wakeup", &xnpipe_wakeup_proc, NULL); > >>> ./ksrc/nucleus/shadow.c: rthal_apc_alloc("lostage_handler", &lostage_handler, NULL); > >>> ./ksrc/nucleus/select.c: xnselect_apc = rthal_apc_alloc("xnselectors_destroy", > >>> > >>> It would be interesting to know which of these services is triggered > >>> a lot. One possibility I see would be root thread priority > >>> inheritance, so it would be caused by mode switches. This brings the > >>> question: do your application have threads migrating between primary > >>> and secondary mode, do you see the count of mode switches increase > >>> with the kernel changes, do you have root thread priority > >>> inheritance enabled? > >>> > >> Here a short sum up of our tests and the results and at the end a few > >> questions :-) > >> > >> we are using a Freescale imx6dl on our hardware and upgraded our operating system from > >> Freescale Kernel 3.0.43 with Xenomai 2.6.2.1 and U-Boot 2013.04 as compiler we use GCC 4.7.2 > >> Freescale Kernel 3.10.53 with Xenomai 2.6.4 and U-Boot 2016.01 as compiler we use GCC 4.8.2 > >> On both Kernels the CONFIG_SMP is set. > >> > >> What we see is that when we running a customer project in a Xenomai task with priority 95 > >> tooks 40% of the CPU time on Kernel 3.0.43 > >> and 47% of CPU time on Kernel 3.10.53 > >> > >> so the new system is slower by 7% if we sum up this to 100% CPU load we have a difference of 15% > >> To find out what is the reason for this difference we ran the following test. 
> >> We tried to make the new system faster by changing some components of the system:
> >>
> >> - Changing U-Boot on the new system -> still 7% slower
> >> - Copying kernel 3.0.43 to the new system -> still 7% slower
> >> - Building kernel 3.0.43 with Xenomai 2.6.4 and copying it to the new system -> still 7% slower
> >> - Compiling the new system with the old GCC version -> still 7% slower
> >> - We also checked the settings for RAM and CPU clock -> these are equal
> >>
> >> So it looks like none of the big components is responsible,
> >> and we started to test some specific functions like rt_timer_tsc().
> >> In the following example we stay for 800 µs in the while loop and
> >> start the loop again after a 200 µs delay.
> >> The application task running this code has priority 95.
> >>
> >> Here is a simplified code snippet:
> >> start = rt_timer_tsc();
> >> do
> >> {
> >> current = rt_timer_tsc();
> >> i++;
> >> } while ((current - start) < 800);
> > If your CPU is running at 1 GHz and uses the global timer as clock
> > source, the clock source runs at 500 MHz, so 800 ticks of the tsc is
> > something around 1.6 µs.
> Sorry, I simplified the code snippet a little bit too much.
> This is the correct code:
>
> current = rt_timer_tsc2ns(rt_timer_tsc());
>
> > So, I do not really understand what you are talking about. But are
> > you sure the two kernels use the same clocksource for Xenomai?
> >
> > Could you show us the result of "dmesg | grep I-pipe" with the two
> > kernels?
> Output of kernel 3.10.53 with Xenomai 2.6.4:
> I-pipe, 3.000 MHz clocksource
> I-pipe, 396.000 MHz clocksource
> I-pipe, 396.000 MHz timer
> I-pipe, 396.000 MHz timer
> I-pipe: head domain Xenomai registered.
>
> Output of kernel 3.0.43 with Xenomai 2.6.2.1:
> [ 0.000000] I-pipe 1.18-13: pipeline enabled.
> [ 0.331999] I-pipe, 396.000 MHz timer
> [ 0.335720] I-pipe, 396.000 MHz clocksource
> [ 0.844016] I-pipe: Domain Xenomai registered.
> > The controller is an i.MX6DL; this controller can run at a maximum of 800 MHz.

Ok, so the new kernel registers two tsc emulations. Could you run
the "tsc" regression test to measure the tsc latency? The two tsc
emulations have very different latencies, so the result would be
unmistakable.

--
Gilles.
https://click-hack.org
* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4 2016-06-28 8:34 ` Gilles Chanteperdrix @ 2016-06-28 9:15 ` Wolfgang Netbal 2016-06-28 9:17 ` Gilles Chanteperdrix 0 siblings, 1 reply; 36+ messages in thread From: Wolfgang Netbal @ 2016-06-28 9:15 UTC (permalink / raw) To: xenomai
On 2016-06-28 at 10:34, Gilles Chanteperdrix wrote:
> On Tue, Jun 28, 2016 at 10:31:00AM +0200, Wolfgang Netbal wrote:
> [...]
>> The controller is an i.MX6DL; this controller can run at a maximum of 800 MHz.
> Ok, so the new kernel registers two tsc emulations, could you run
> the "tsc" regression test to measure the tsc latency? The two tsc
> emulations have very different latencies, so the result would be
> unmistakable.
>
Output of kernel 3.10.53 with Xenomai 2.6.4:
/usr/xenomai/bin/latency
== Sampling period: 1000 us
== Test mode: periodic user-mode task
== All results in microseconds
warming up...
RTT| 00:00:01 (periodic user-mode task, 1000 us period, priority 99)
RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
RTD| -3.048| -1.960| 4.053| 0| 0| -3.048| 4.053
RTD| -3.064| -1.874| 5.936| 0| 0| -3.064| 5.936
RTD| -3.137| -1.963| 3.545| 0| 0| -3.137| 5.936
RTD| -3.069| -1.968| 5.805| 0| 0| -3.137| 5.936
RTD| -3.064| -1.945| 4.371| 0| 0| -3.137| 5.936
RTD| -3.071| -1.905| 3.613| 0| 0| -3.137| 5.936
RTD| -3.119| -1.766| 4.967| 0| 0| -3.137| 5.936
RTD| -3.119| -1.910| 3.883| 0| 0| -3.137| 5.936
RTD| -3.102| -1.910| 5.494| 0| 0| -3.137| 5.936
RTD| -3.107| -1.907| 3.795| 0| 0| -3.137| 5.936
RTD| -3.066| -1.935| 4.068| 0| 0| -3.137| 5.936
RTD| -2.960| -1.920| 4.270| 0| 0| -3.137| 5.936
RTD| -3.190| -2.003| 3.436| 0| 0| -3.190| 5.936
RTD| -3.026| -2.003| 4.679| 0| 0| -3.190| 5.936
RTD| -3.149| -2.011| 3.861| 0| 0| -3.190| 5.936
RTD| -3.059| -1.990| 3.651| 0| 0| -3.190| 5.936
RTD| -3.119| -1.940| 4.249| 0| 0| -3.190| 5.936
RTD| -3.192| -1.983| 4.270| 0| 0| -3.192| 5.936
RTD| -3.026| -2.003| 3.568| 0| 0| -3.192| 5.936
RTD| -3.096| -1.973| 6.376| 0| 0| -3.192| 6.376
RTD| -3.258| -1.953| 5.131| 0| 0| -3.258| 6.376
RTT| 00:00:22 (periodic user-mode task, 1000 us period, priority 99)

Can the two timers be the reason for the -3.xxx values in the lat min column?
Is it possible to disable one of the two timers?

Output of kernel 3.0.43 with Xenomai 2.6.2.1:
/usr/xenomai/bin/latency
== Sampling period: 1000 us
== Test mode: periodic user-mode task
== All results in microseconds
warming up...
RTT| 00:00:01 (periodic user-mode task, 1000 us period, priority 99)
RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
RTD| 3.060| 5.098| 10.255| 0| 0| 3.060| 10.255
RTD| 3.073| 5.146| 10.742| 0| 0| 3.060| 10.742
RTD| 2.999| 5.146| 10.818| 0| 0| 2.999| 10.818
RTD| 3.249| 5.146| 10.936| 0| 0| 2.999| 10.936
RTD| 3.169| 5.184| 11.656| 0| 0| 2.999| 11.656
RTD| 3.133| 5.156| 10.881| 0| 0| 2.999| 11.656
RTD| 3.123| 5.068| 9.835| 0| 0| 2.999| 11.656
RTD| 3.032| 5.101| 10.628| 0| 0| 2.999| 11.656
RTD| 3.088| 5.047| 10.492| 0| 0| 2.999| 11.656
RTD| 3.068| 5.159| 11.681| 0| 0| 2.999| 11.681
RTD| 2.967| 5.073| 11.648| 0| 0| 2.967| 11.681
RTD| 3.073| 5.106| 11.499| 0| 0| 2.967| 11.681
RTD| 3.053| 5.063| 10.727| 0| 0| 2.967| 11.681
RTD| 3.159| 5.113| 10.560| 0| 0| 2.967| 11.681
RTD| 3.020| 5.078| 11.578| 0| 0| 2.967| 11.681
RTD| 3.227| 5.146| 10.856| 0| 0| 2.967| 11.681
RTD| 3.186| 5.118| 10.335| 0| 0| 2.967| 11.681
RTD| 3.116| 5.108| 10.782| 0| 0| 2.967| 11.681
RTD| 2.954| 5.095| 11.921| 0| 0| 2.954| 11.921
RTD| 2.952| 5.156| 10.631| 0| 0| 2.952| 11.921
RTD| 2.898| 5.121| 10.699| 0| 0| 2.898| 11.921
RTT| 00:00:22 (periodic user-mode task, 1000 us period, priority 99)
* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4 2016-06-28 9:15 ` Wolfgang Netbal @ 2016-06-28 9:17 ` Gilles Chanteperdrix 2016-06-28 9:28 ` Wolfgang Netbal 0 siblings, 1 reply; 36+ messages in thread From: Gilles Chanteperdrix @ 2016-06-28 9:17 UTC (permalink / raw) To: Wolfgang Netbal; +Cc: xenomai
On Tue, Jun 28, 2016 at 11:15:14AM +0200, Wolfgang Netbal wrote:
> [...]
> [...]
> > Ok, so the new kernel registers two tsc emulations, could you run
> > the "tsc" regression test to measure the tsc latency?
> Output of kernel 3.10.53 with Xenomai 2.6.4:
> /usr/xenomai/bin/latency

This test is named "latency", not "tsc". As the different names
imply, they are not measuring the same thing.

--
Gilles.
https://click-hack.org
* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4 2016-06-28 9:17 ` Gilles Chanteperdrix @ 2016-06-28 9:28 ` Wolfgang Netbal 2016-06-28 9:29 ` Gilles Chanteperdrix 0 siblings, 1 reply; 36+ messages in thread From: Wolfgang Netbal @ 2016-06-28 9:28 UTC (permalink / raw) To: gilles.chanteperdrix; +Cc: xenomai
On 2016-06-28 at 11:17, Gilles Chanteperdrix wrote:
> On Tue, Jun 28, 2016 at 11:15:14AM +0200, Wolfgang Netbal wrote:
> [...]
>> [...]
>> Output of kernel 3.10.53 with Xenomai 2.6.4:
>> /usr/xenomai/bin/latency
> This test is named "latency", not "tsc". As the different names
> imply, they are not measuring the same thing.
>
Sorry for the stupid question, but where do I find the "tsc" test? It is not located in the /usr/xenomai/bin/ folder.
* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4 2016-06-28 9:28 ` Wolfgang Netbal @ 2016-06-28 9:29 ` Gilles Chanteperdrix 2016-06-28 9:51 ` Wolfgang Netbal 0 siblings, 1 reply; 36+ messages in thread From: Gilles Chanteperdrix @ 2016-06-28 9:29 UTC (permalink / raw) To: Wolfgang Netbal; +Cc: xenomai
On Tue, Jun 28, 2016 at 11:28:19AM +0200, Wolfgang Netbal wrote:
> [...]
> >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> At the moment it looks like that XENOMAI 2.6.4 calls > >>>>>>>>>>>>>>>> xnpod_schedule_handler much more often then XENOMAI 2.6.2.1 in our old > >>>>>>>>>>>>>>>> system. Every call of xnpod_schedule_handler interrupts our main > >>>>>>>>>>>>>>>> XENOMAI task with priority = 95. > >>>>>>>> As I wrote above, I get interrupts 1037 handled by rthal_apc_handler() > >>>>>>>> and 1038 handled by xnpod_schedule_handler() while my realtime task > >>>>>>>> is running on kernel 3.10.53 with Xenomai 2.6.4. > >>>>>>>> On kernel 3.0.43 with Xenomai 2.6.4 there are no interrupts, except the > >>>>>>>> once that are send by my board using GPIOs, but this virtual interrupts > >>>>>>>> are assigned to Xenomai and Linux as well but I didn't see a handler > >>>>>>>> installed. > >>>>>>>> I'm pretty sure that these interrupts are slowing down my system, but > >>>>>>>> where do they come from ? > >>>>>>>> why didn't I see them on Kernel 3.0.43 with Xenomai 2.6.4 ? > >>>>>>>> how long do they need to process ? > >>>>>>> How do you mean you do not see them? If you are talking about the > >>>>>>> rescheduling API, it used no to be bound to a virq (so, it would > >>>>>>> have a different irq number on cortex A9, something between 0 and 31 > >>>>>>> that would not show in the usual /proc files), I wonder if 3.0 is > >>>>>>> before or after that. You do not see them in /proc, or you see them > >>>>>>> and their count does not increase? > >>>>>> Sorry for the long delay, we ran a lot of tests to find out what could > >>>>>> be the reason for > >>>>>> the performance difference. 
> >>>>>> > >>>>>> If I call cat /proc/ipipe/Xenomai I dont see the IRQ handler assigned to > >>>>>> the virtual > >>>>>> IRQ on Kernel 3.0.43, but it looks like thats an issue of the Kernel > >>>>>>> As for where they come from, this is not a mystery, the reschedule > >>>>>>> IPI is triggered when code on one cpu changes the scheduler state > >>>>>>> (wakes up a thread for instance) on another cpu. If you want to > >>>>>>> avoid it, do not do that. That means, do not share mutex between > >>>>>>> threads running on different cpus, pay attention for timers to be > >>>>>>> running on the same cpu as the thread they signal, etc... > >>>>>>> > >>>>>>> The APC virq is used to multiplex several services, which you can > >>>>>>> find by grepping the sources for rthal_apc_alloc: > >>>>>>> ./ksrc/skins/posix/apc.c: pse51_lostage_apc = rthal_apc_alloc("pse51_lostage_handler", > >>>>>>> ./ksrc/skins/rtdm/device.c: rtdm_apc = rthal_apc_alloc("deferred RTDM close", rtdm_apc_handler, > >>>>>>> ./ksrc/nucleus/registry.c: rthal_apc_alloc("registry_export", ®istry_proc_schedule, NULL); > >>>>>>> ./ksrc/nucleus/pipe.c: rthal_apc_alloc("pipe_wakeup", &xnpipe_wakeup_proc, NULL); > >>>>>>> ./ksrc/nucleus/shadow.c: rthal_apc_alloc("lostage_handler", &lostage_handler, NULL); > >>>>>>> ./ksrc/nucleus/select.c: xnselect_apc = rthal_apc_alloc("xnselectors_destroy", > >>>>>>> > >>>>>>> It would be interesting to know which of these services is triggered > >>>>>>> a lot. One possibility I see would be root thread priority > >>>>>>> inheritance, so it would be caused by mode switches. This brings the > >>>>>>> question: do your application have threads migrating between primary > >>>>>>> and secondary mode, do you see the count of mode switches increase > >>>>>>> with the kernel changes, do you have root thread priority > >>>>>>> inheritance enabled? 
> >>>>>>> > >>>>>> Here a short sum up of our tests and the results and at the end a few > >>>>>> questions :-) > >>>>>> > >>>>>> we are using a Freescale imx6dl on our hardware and upgraded our operating system from > >>>>>> Freescale Kernel 3.0.43 with Xenomai 2.6.2.1 and U-Boot 2013.04 as compiler we use GCC 4.7.2 > >>>>>> Freescale Kernel 3.10.53 with Xenomai 2.6.4 and U-Boot 2016.01 as compiler we use GCC 4.8.2 > >>>>>> On both Kernels the CONFIG_SMP is set. > >>>>>> > >>>>>> What we see is that when we running a customer project in a Xenomai task with priority 95 > >>>>>> tooks 40% of the CPU time on Kernel 3.0.43 > >>>>>> and 47% of CPU time on Kernel 3.10.53 > >>>>>> > >>>>>> so the new system is slower by 7% if we sum up this to 100% CPU load we have a difference of 15% > >>>>>> To find out what is the reason for this difference we ran the following test. > >>>>>> We tried to get the new system faster by change some components of the system. > >>>>>> > >>>>>> -Changing U-Boot on new system -> still 7% slower > >>>>>> -Copy Kernel 3.0.43 to new system -> still 7% slower > >>>>>> -Creating Kernel 3.0.43 with > >>>>>> Xenomai 2.6.4 and copy it to new system -> still 7% slower > >>>>>> -Compiling the new system with > >>>>>> old GCC version -> still 7% slower > >>>>>> -We also checked the settings for RAM and CPU clock -> these are equal > >>>>>> > >>>>>> It looks like that is not one of the big components, > >>>>>> so we started to test some special functions like rt_timer_tsc() > >>>>>> In the following example we stay for 800µs in the while loop and > >>>>>> start this loop again after 200µs delay. > >>>>>> The task application running this code has priotity 95. 
> >>>>>> > >>>>>> Here a simplified code snipped > >>>>>> start = rt_timer_tsc(); > >>>>>> do > >>>>>> { > >>>>>> current = rt_timer_tsc(); > >>>>>> i++; > >>>>>> } while((current - start) < 800) > >>>>> If your CPU is running at 1 GHz and uses the global timer as clock > >>>>> source, the clock source runs at 500 MHz, so 800 ticks of the tsc is > >>>>> something around 1.6 us > >>>> Sorry I simplified the code snippet a little bit to much. > >>>> Thats the correct code. > >>>> > >>>> current = rt_timer_tsc2ns(rt_timer_tsc()); > >>>> > >>>>> So, I do not really understand what you are talking about. But are > >>>>> you sure the two kernels use the same clocksource for xenomai? > >>>>> > >>>>> Could you show us the result of "dmesg | grep I-pipe" with the two > >>>>> kernels ? > >>>> Output of Kernel 3.10.53 with Xenomai 2.6.4 > >>>> I-pipe, 3.000 MHz clocksource > >>>> I-pipe, 396.000 MHz clocksource > >>>> I-pipe, 396.000 MHz timer > >>>> I-pipe, 396.000 MHz timer > >>>> I-pipe: head domain Xenomai registered. > >>>> > >>>> Output of Kernel 3.0.43 with Xenomai 2.6.2.1 > >>>> [ 0.000000] I-pipe 1.18-13: pipeline enabled. > >>>> [ 0.331999] I-pipe, 396.000 MHz timer > >>>> [ 0.335720] I-pipe, 396.000 MHz clocksource > >>>> [ 0.844016] I-pipe: Domain Xenomai registered. > >>>> > >>>> The controller is a imx6dl, this controller can run maximum 800MHz > >>> Ok, so the new kernel registers two tsc emulations, could you run > >>> the "tsc" regression test to measure the tsc latency? The two tsc > >>> emulations have very different latencies, so the result would be > >>> unmistakable. > >>> > >> Output of Kernel 3.10.53 with Xenomai 2.6.4 > >> /usr/xenomai/bin/latency > > This test is named "latency", not "tsc". As the different names > > imply, they are not measuring the same thing. > > > Sorry for the stupied question, > but where do I find the "tsc" test, because in folder /usr/xenomai/bin/ > is it not located You have millions of files on your target? 
Or you compiled busybox without support for the "find" utility?

--
Gilles.
https://click-hack.org
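Gilles's hint amounts to searching the rootfs with find(1). A minimal sketch follows; the fake directory tree is fabricated here only so the example is self-contained, and on a real target you would simply run `find / -name tsc -type f 2>/dev/null`:

```shell
#!/bin/sh
# Build a throwaway tree that mimics a target rootfs (fabricated
# layout for illustration), then locate the "tsc" binary in it.
root=$(mktemp -d)
mkdir -p "$root/usr/xenomai/bin/regression/native"
touch "$root/usr/xenomai/bin/regression/native/tsc"

# -type f restricts the match to regular files.
find "$root" -name tsc -type f

rm -rf "$root"
```

Searching from `/` can be slow on NFS-mounted or large filesystems; narrowing the search root (e.g. `/usr`) is usually enough.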
* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4 2016-06-28 9:29 ` Gilles Chanteperdrix @ 2016-06-28 9:51 ` Wolfgang Netbal 2016-06-28 9:55 ` Gilles Chanteperdrix 0 siblings, 1 reply; 36+ messages in thread From: Wolfgang Netbal @ 2016-06-28 9:51 UTC (permalink / raw) To: gilles.chanteperdrix; +Cc: xenomai

On 2016-06-28 at 11:29, Gilles Chanteperdrix wrote:
[full quote of the earlier thread trimmed]
> You have millions of files on your target?
> Or you compiled busybox
> without support for the "find" utility ?

Sorry, I searched for "tsc" before but didn't find any executable called "tsc"; what I find is /usr/share/ghostscript and its config files, and /usr/bin/xtscal.

I found the file tsc.c in the Xenomai sources; could you please tell me which option I have to enable so that it gets built?
* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4 2016-06-28 9:51 ` Wolfgang Netbal @ 2016-06-28 9:55 ` Gilles Chanteperdrix 2016-06-28 10:10 ` Wolfgang Netbal 0 siblings, 1 reply; 36+ messages in thread From: Gilles Chanteperdrix @ 2016-06-28 9:55 UTC (permalink / raw) To: Wolfgang Netbal; +Cc: xenomai

On Tue, Jun 28, 2016 at 11:51:39AM +0200, Wolfgang Netbal wrote:
[full quote of the earlier thread trimmed]
> Sorry I searched for tsc before but didn't find any executable called "tsc";
> what I find is /usr/share/ghostscript and its config files and /usr/bin/xtscal.
>
> I find in the xenomai sources the file tsc.c, could you please tell me
> what option do I have to enable so that this will be built?

There is no option that can disable its compilation/installation as far as I know. It is normally built and installed under @XENO_TEST_DIR@/regression/native, so the installation directory depends on the value you passed to configure's --with-testdir option (the default being /usr/bin).

--
Gilles.
https://click-hack.org
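The mapping Gilles describes can be written out concretely. The prefix below is an assumption for the sketch (it matches the /usr/xenomai/bin path used earlier in the thread); @XENO_TEST_DIR@ is whatever was passed to `./configure --with-testdir=...`, defaulting to /usr/bin:

```shell
#!/bin/sh
# Derive where the "tsc" regression test is installed from the
# --with-testdir value. Example value; substitute your own prefix.
XENO_TEST_DIR=/usr/xenomai/bin

# Per the explanation above, the regression tests land under
# @XENO_TEST_DIR@/regression/native.
echo "${XENO_TEST_DIR}/regression/native/tsc"
# prints /usr/xenomai/bin/regression/native/tsc
```

With the default testdir, the binary would instead be at /usr/bin/regression/native/tsc.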
* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4 2016-06-28 9:55 ` Gilles Chanteperdrix @ 2016-06-28 10:10 ` Wolfgang Netbal 2016-06-28 10:19 ` Gilles Chanteperdrix 0 siblings, 1 reply; 36+ messages in thread From: Wolfgang Netbal @ 2016-06-28 10:10 UTC (permalink / raw) To: gilles.chanteperdrix; +Cc: xenomai Am 2016-06-28 um 11:55 schrieb Gilles Chanteperdrix: > On Tue, Jun 28, 2016 at 11:51:39AM +0200, Wolfgang Netbal wrote: >> >> Am 2016-06-28 um 11:29 schrieb Gilles Chanteperdrix: >>> On Tue, Jun 28, 2016 at 11:28:19AM +0200, Wolfgang Netbal wrote: >>>> Am 2016-06-28 um 11:17 schrieb Gilles Chanteperdrix: >>>>> On Tue, Jun 28, 2016 at 11:15:14AM +0200, Wolfgang Netbal wrote: >>>>>> Am 2016-06-28 um 10:34 schrieb Gilles Chanteperdrix: >>>>>>> On Tue, Jun 28, 2016 at 10:31:00AM +0200, Wolfgang Netbal wrote: >>>>>>>> Am 2016-06-27 um 18:46 schrieb Gilles Chanteperdrix: >>>>>>>>> On Mon, Jun 27, 2016 at 05:55:12PM +0200, Wolfgang Netbal wrote: >>>>>>>>>> Am 2016-06-07 um 19:00 schrieb Gilles Chanteperdrix: >>>>>>>>>>> On Tue, Jun 07, 2016 at 04:13:07PM +0200, Wolfgang Netbal wrote: >>>>>>>>>>>> Am 2016-06-06 um 17:35 schrieb Gilles Chanteperdrix: >>>>>>>>>>>>> On Mon, Jun 06, 2016 at 09:03:40AM +0200, Wolfgang Netbal wrote: >>>>>>>>>>>>>> Am 2016-06-02 um 10:23 schrieb Gilles Chanteperdrix: >>>>>>>>>>>>>>> On Thu, Jun 02, 2016 at 10:15:41AM +0200, Wolfgang Netbal wrote: >>>>>>>>>>>>>>>> Am 2016-06-01 um 16:12 schrieb Gilles Chanteperdrix: >>>>>>>>>>>>>>>>> On Wed, Jun 01, 2016 at 03:52:06PM +0200, Wolfgang Netbal wrote: >>>>>>>>>>>>>>>>>> Am 2016-05-31 um 16:16 schrieb Gilles Chanteperdrix: >>>>>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 04:09:07PM +0200, Wolfgang Netbal wrote: >>>>>>>>>>>>>>>>>>>> Dear all, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> we have moved our application from "XENOMAI 2.6.2.1 + Linux 3.0.43" to >>>>>>>>>>>>>>>>>>>> "XENOMAI 2.6.4. + Linux 3.10.53". Our target is an i.MX6DL. 
The system >>>>>>>>>>>>>>>>>>>> is now up and running and works stable. Unfortunately we see a >>>>>>>>>>>>>>>>>>>> difference in the performance. Our old combination (XENOMAI 2.6.2.1 + >>>>>>>>>>>>>>>>>>>> Linux 3.0.43) was slightly faster. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> At the moment it looks like that XENOMAI 2.6.4 calls >>>>>>>>>>>>>>>>>>>> xnpod_schedule_handler much more often then XENOMAI 2.6.2.1 in our old >>>>>>>>>>>>>>>>>>>> system. Every call of xnpod_schedule_handler interrupts our main >>>>>>>>>>>>>>>>>>>> XENOMAI task with priority = 95. >>>>>>>>>>>> As I wrote above, I get interrupts 1037 handled by rthal_apc_handler() >>>>>>>>>>>> and 1038 handled by xnpod_schedule_handler() while my realtime task >>>>>>>>>>>> is running on kernel 3.10.53 with Xenomai 2.6.4. >>>>>>>>>>>> On kernel 3.0.43 with Xenomai 2.6.4 there are no interrupts, except the >>>>>>>>>>>> once that are send by my board using GPIOs, but this virtual interrupts >>>>>>>>>>>> are assigned to Xenomai and Linux as well but I didn't see a handler >>>>>>>>>>>> installed. >>>>>>>>>>>> I'm pretty sure that these interrupts are slowing down my system, but >>>>>>>>>>>> where do they come from ? >>>>>>>>>>>> why didn't I see them on Kernel 3.0.43 with Xenomai 2.6.4 ? >>>>>>>>>>>> how long do they need to process ? >>>>>>>>>>> How do you mean you do not see them? If you are talking about the >>>>>>>>>>> rescheduling API, it used no to be bound to a virq (so, it would >>>>>>>>>>> have a different irq number on cortex A9, something between 0 and 31 >>>>>>>>>>> that would not show in the usual /proc files), I wonder if 3.0 is >>>>>>>>>>> before or after that. You do not see them in /proc, or you see them >>>>>>>>>>> and their count does not increase? >>>>>>>>>> Sorry for the long delay, we ran a lot of tests to find out what could >>>>>>>>>> be the reason for >>>>>>>>>> the performance difference. 
>>>>>>>>>> >>>>>>>>>> If I call cat /proc/ipipe/Xenomai I dont see the IRQ handler assigned to >>>>>>>>>> the virtual >>>>>>>>>> IRQ on Kernel 3.0.43, but it looks like thats an issue of the Kernel >>>>>>>>>>> As for where they come from, this is not a mystery, the reschedule >>>>>>>>>>> IPI is triggered when code on one cpu changes the scheduler state >>>>>>>>>>> (wakes up a thread for instance) on another cpu. If you want to >>>>>>>>>>> avoid it, do not do that. That means, do not share mutex between >>>>>>>>>>> threads running on different cpus, pay attention for timers to be >>>>>>>>>>> running on the same cpu as the thread they signal, etc... >>>>>>>>>>> >>>>>>>>>>> The APC virq is used to multiplex several services, which you can >>>>>>>>>>> find by grepping the sources for rthal_apc_alloc: >>>>>>>>>>> ./ksrc/skins/posix/apc.c: pse51_lostage_apc = rthal_apc_alloc("pse51_lostage_handler", >>>>>>>>>>> ./ksrc/skins/rtdm/device.c: rtdm_apc = rthal_apc_alloc("deferred RTDM close", rtdm_apc_handler, >>>>>>>>>>> ./ksrc/nucleus/registry.c: rthal_apc_alloc("registry_export", ®istry_proc_schedule, NULL); >>>>>>>>>>> ./ksrc/nucleus/pipe.c: rthal_apc_alloc("pipe_wakeup", &xnpipe_wakeup_proc, NULL); >>>>>>>>>>> ./ksrc/nucleus/shadow.c: rthal_apc_alloc("lostage_handler", &lostage_handler, NULL); >>>>>>>>>>> ./ksrc/nucleus/select.c: xnselect_apc = rthal_apc_alloc("xnselectors_destroy", >>>>>>>>>>> >>>>>>>>>>> It would be interesting to know which of these services is triggered >>>>>>>>>>> a lot. One possibility I see would be root thread priority >>>>>>>>>>> inheritance, so it would be caused by mode switches. This brings the >>>>>>>>>>> question: do your application have threads migrating between primary >>>>>>>>>>> and secondary mode, do you see the count of mode switches increase >>>>>>>>>>> with the kernel changes, do you have root thread priority >>>>>>>>>>> inheritance enabled? 
>>>>>>>>>>> >>>>>>>>>> Here is a short summary of our tests and the results, and at the end a few >>>>>>>>>> questions :-) >>>>>>>>>> >>>>>>>>>> we are using a Freescale imx6dl on our hardware and upgraded our operating system from >>>>>>>>>> Freescale Kernel 3.0.43 with Xenomai 2.6.2.1 and U-Boot 2013.04 (as compiler we use GCC 4.7.2) to >>>>>>>>>> Freescale Kernel 3.10.53 with Xenomai 2.6.4 and U-Boot 2016.01 (as compiler we use GCC 4.8.2). >>>>>>>>>> On both Kernels CONFIG_SMP is set. >>>>>>>>>> >>>>>>>>>> What we see is that a customer project running in a Xenomai task with priority 95 >>>>>>>>>> takes 40% of the CPU time on Kernel 3.0.43 >>>>>>>>>> and 47% of CPU time on Kernel 3.10.53, >>>>>>>>>> >>>>>>>>>> so the new system is slower by 7%; scaled up to 100% CPU load we have a difference of 15%. >>>>>>>>>> To find out the reason for this difference we ran the following tests. >>>>>>>>>> We tried to make the new system faster by changing some components of the system. >>>>>>>>>> >>>>>>>>>> -Changing U-Boot on new system -> still 7% slower >>>>>>>>>> -Copy Kernel 3.0.43 to new system -> still 7% slower >>>>>>>>>> -Creating Kernel 3.0.43 with >>>>>>>>>> Xenomai 2.6.4 and copy it to new system -> still 7% slower >>>>>>>>>> -Compiling the new system with >>>>>>>>>> old GCC version -> still 7% slower >>>>>>>>>> -We also checked the settings for RAM and CPU clock -> these are equal >>>>>>>>>> >>>>>>>>>> It looks like it is not one of the big components, >>>>>>>>>> so we started to test some specific functions like rt_timer_tsc(). >>>>>>>>>> In the following example we stay for 800µs in the while loop and >>>>>>>>>> start this loop again after 200µs delay. >>>>>>>>>> The application task running this code has priority 95. 
>>>>>>>>>> >>>>>>>>>> Here is a simplified code snippet >>>>>>>>>> start = rt_timer_tsc(); >>>>>>>>>> do >>>>>>>>>> { >>>>>>>>>> current = rt_timer_tsc(); >>>>>>>>>> i++; >>>>>>>>>> } while ((current - start) < 800); >>>>>>>>> If your CPU is running at 1 GHz and uses the global timer as clock >>>>>>>>> source, the clock source runs at 500 MHz, so 800 ticks of the tsc is >>>>>>>>> something around 1.6 us >>>>>>>> Sorry, I simplified the code snippet a little bit too much. >>>>>>>> That's the correct code: >>>>>>>> >>>>>>>> current = rt_timer_tsc2ns(rt_timer_tsc()); >>>>>>>> >>>>>>>>> So, I do not really understand what you are talking about. But are >>>>>>>>> you sure the two kernels use the same clocksource for xenomai? >>>>>>>>> >>>>>>>>> Could you show us the result of "dmesg | grep I-pipe" with the two >>>>>>>>> kernels ? >>>>>>>> Output of Kernel 3.10.53 with Xenomai 2.6.4 >>>>>>>> I-pipe, 3.000 MHz clocksource >>>>>>>> I-pipe, 396.000 MHz clocksource >>>>>>>> I-pipe, 396.000 MHz timer >>>>>>>> I-pipe, 396.000 MHz timer >>>>>>>> I-pipe: head domain Xenomai registered. >>>>>>>> >>>>>>>> Output of Kernel 3.0.43 with Xenomai 2.6.2.1 >>>>>>>> [ 0.000000] I-pipe 1.18-13: pipeline enabled. >>>>>>>> [ 0.331999] I-pipe, 396.000 MHz timer >>>>>>>> [ 0.335720] I-pipe, 396.000 MHz clocksource >>>>>>>> [ 0.844016] I-pipe: Domain Xenomai registered. >>>>>>>> >>>>>>>> The controller is an imx6dl; this controller can run at a maximum of 800 MHz >>>>>>> Ok, so the new kernel registers two tsc emulations, could you run >>>>>>> the "tsc" regression test to measure the tsc latency? The two tsc >>>>>>> emulations have very different latencies, so the result would be >>>>>>> unmistakable. >>>>>>> >>>>>> Output of Kernel 3.10.53 with Xenomai 2.6.4 >>>>>> /usr/xenomai/bin/latency >>>>> This test is named "latency", not "tsc". As the different names >>>>> imply, they are not measuring the same thing. 
>>>>> >>>> Sorry for the stupid question, >>>> but where do I find the "tsc" test, because it is not located >>>> in the folder /usr/xenomai/bin/ >>> You have millions of files on your target? Or you compiled busybox >>> without support for the "find" utility ? >>> >> Sorry, I searched for tsc before but didn't find any executable called "tsc"; >> what I find is /usr/share/ghostscript and its config files and >> /usr/bin/xtscal. >> >> I found the file tsc.c in the xenomai sources, could you please tell me >> which option I have to enable so that it will be built ? > There is no option that can disable its compilation/installation as > far as I know. > > It is normally built and installed under > @XENO_TEST_DIR@/regression/native > > So the installation directory depends on the value you passed to > the configure --with-testdir option (the default being /usr/bin). > Thanks a lot, I found it and downloaded it to my target. Here is the output for Kernel 3.10.53 and Xenomai 2.6.4 #> ./tsc Checking tsc for 1 minute(s) min: 10, max: 596, avg: 10.5056 min: 10, max: 595, avg: 10.5053 min: 10, max: 603, avg: 10.5053 min: 10, max: 630, avg: 10.5052 min: 10, max: 600, avg: 10.505 min: 10, max: 595, avg: 10.5056 min: 10, max: 562, avg: 10.505 min: 10, max: 605, avg: 10.5056 min: 10, max: 602, avg: 10.5055 min: 10, max: 595, avg: 10.5052 Here is the output for Kernel 3.0.43 and Xenomai 2.6.2.1 #> ./tsc Checking tsc for 1 minute(s) min: 10, max: 611, avg: 11.5499 min: 10, max: 608, avg: 11.5443 min: 10, max: 81, avg: 11.5265 min: 10, max: 53, avg: 11.5155 min: 10, max: 152, avg: 11.5239 min: 10, max: 618, avg: 11.5352 min: 10, max: 588, avg: 11.5398 min: 10, max: 81, avg: 11.5269 min: 10, max: 532, avg: 11.5541 min: 10, max: 80, avg: 11.5394 ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4 2016-06-28 10:10 ` Wolfgang Netbal @ 2016-06-28 10:19 ` Gilles Chanteperdrix 2016-06-28 10:31 ` Wolfgang Netbal 0 siblings, 1 reply; 36+ messages in thread From: Gilles Chanteperdrix @ 2016-06-28 10:19 UTC (permalink / raw) To: Wolfgang Netbal; +Cc: xenomai On Tue, Jun 28, 2016 at 12:10:06PM +0200, Wolfgang Netbal wrote: > > > Am 2016-06-28 um 11:55 schrieb Gilles Chanteperdrix: > > On Tue, Jun 28, 2016 at 11:51:39AM +0200, Wolfgang Netbal wrote: > >> > >> Am 2016-06-28 um 11:29 schrieb Gilles Chanteperdrix: > >>> On Tue, Jun 28, 2016 at 11:28:19AM +0200, Wolfgang Netbal wrote: > >>>> Am 2016-06-28 um 11:17 schrieb Gilles Chanteperdrix: > >>>>> On Tue, Jun 28, 2016 at 11:15:14AM +0200, Wolfgang Netbal wrote: > >>>>>> Am 2016-06-28 um 10:34 schrieb Gilles Chanteperdrix: > >>>>>>> On Tue, Jun 28, 2016 at 10:31:00AM +0200, Wolfgang Netbal wrote: > >>>>>>>> Am 2016-06-27 um 18:46 schrieb Gilles Chanteperdrix: > >>>>>>>>> On Mon, Jun 27, 2016 at 05:55:12PM +0200, Wolfgang Netbal wrote: > >>>>>>>>>> Am 2016-06-07 um 19:00 schrieb Gilles Chanteperdrix: > >>>>>>>>>>> On Tue, Jun 07, 2016 at 04:13:07PM +0200, Wolfgang Netbal wrote: > >>>>>>>>>>>> Am 2016-06-06 um 17:35 schrieb Gilles Chanteperdrix: > >>>>>>>>>>>>> On Mon, Jun 06, 2016 at 09:03:40AM +0200, Wolfgang Netbal wrote: > >>>>>>>>>>>>>> Am 2016-06-02 um 10:23 schrieb Gilles Chanteperdrix: > >>>>>>>>>>>>>>> On Thu, Jun 02, 2016 at 10:15:41AM +0200, Wolfgang Netbal wrote: > >>>>>>>>>>>>>>>> Am 2016-06-01 um 16:12 schrieb Gilles Chanteperdrix: > >>>>>>>>>>>>>>>>> On Wed, Jun 01, 2016 at 03:52:06PM +0200, Wolfgang Netbal wrote: > >>>>>>>>>>>>>>>>>> Am 2016-05-31 um 16:16 schrieb Gilles Chanteperdrix: > >>>>>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 04:09:07PM +0200, Wolfgang Netbal wrote: > >>>>>>>>>>>>>>>>>>>> Dear all, > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> we have moved our application from "XENOMAI 2.6.2.1 + Linux 3.0.43" to > 
>>>>>>>>>>>>>>>>>>>> "XENOMAI 2.6.4. + Linux 3.10.53". Our target is an i.MX6DL. [...]
> > > Thanks a lot I found it and downloaded it to my target > > Here are the output for Kernel 3.10.53 and Xenomai 2.6.4 > > #> ./tsc > Checking tsc for 1 minute(s) > min: 10, max: 596, avg: 10.5056 > min: 10, max: 595, avg: 10.5053 > min: 10, max: 603, avg: 10.5053 > min: 10, max: 630, avg: 10.5052 > min: 10, max: 600, avg: 10.505 > min: 10, max: 595, avg: 10.5056 > min: 10, max: 562, avg: 10.505 > min: 10, max: 605, avg: 10.5056 > min: 10, max: 602, avg: 10.5055 > min: 10, max: 595, avg: 10.5052 Could you let the test run until the end ? (1 minute is not so long). -- Gilles. https://click-hack.org ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4 2016-06-28 10:19 ` Gilles Chanteperdrix @ 2016-06-28 10:31 ` Wolfgang Netbal 2016-06-28 10:39 ` Gilles Chanteperdrix 0 siblings, 1 reply; 36+ messages in thread From: Wolfgang Netbal @ 2016-06-28 10:31 UTC (permalink / raw) To: gilles.chanteperdrix; +Cc: xenomai Am 2016-06-28 um 12:19 schrieb Gilles Chanteperdrix: > On Tue, Jun 28, 2016 at 12:10:06PM +0200, Wolfgang Netbal wrote: >> >> Am 2016-06-28 um 11:55 schrieb Gilles Chanteperdrix: >>> On Tue, Jun 28, 2016 at 11:51:39AM +0200, Wolfgang Netbal wrote: >>>> Am 2016-06-28 um 11:29 schrieb Gilles Chanteperdrix: >>>>> On Tue, Jun 28, 2016 at 11:28:19AM +0200, Wolfgang Netbal wrote: >>>>>> Am 2016-06-28 um 11:17 schrieb Gilles Chanteperdrix: >>>>>>> On Tue, Jun 28, 2016 at 11:15:14AM +0200, Wolfgang Netbal wrote: >>>>>>>> Am 2016-06-28 um 10:34 schrieb Gilles Chanteperdrix: >>>>>>>>> On Tue, Jun 28, 2016 at 10:31:00AM +0200, Wolfgang Netbal wrote: >>>>>>>>>> Am 2016-06-27 um 18:46 schrieb Gilles Chanteperdrix: >>>>>>>>>>> On Mon, Jun 27, 2016 at 05:55:12PM +0200, Wolfgang Netbal wrote: >>>>>>>>>>>> Am 2016-06-07 um 19:00 schrieb Gilles Chanteperdrix: >>>>>>>>>>>>> On Tue, Jun 07, 2016 at 04:13:07PM +0200, Wolfgang Netbal wrote: >>>>>>>>>>>>>> Am 2016-06-06 um 17:35 schrieb Gilles Chanteperdrix: >>>>>>>>>>>>>>> On Mon, Jun 06, 2016 at 09:03:40AM +0200, Wolfgang Netbal wrote: >>>>>>>>>>>>>>>> Am 2016-06-02 um 10:23 schrieb Gilles Chanteperdrix: >>>>>>>>>>>>>>>>> On Thu, Jun 02, 2016 at 10:15:41AM +0200, Wolfgang Netbal wrote: >>>>>>>>>>>>>>>>>> Am 2016-06-01 um 16:12 schrieb Gilles Chanteperdrix: >>>>>>>>>>>>>>>>>>> On Wed, Jun 01, 2016 at 03:52:06PM +0200, Wolfgang Netbal wrote: >>>>>>>>>>>>>>>>>>>> Am 2016-05-31 um 16:16 schrieb Gilles Chanteperdrix: >>>>>>>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 04:09:07PM +0200, Wolfgang Netbal wrote: >>>>>>>>>>>>>>>>>>>>>> Dear all, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> we have moved our application 
from "XENOMAI 2.6.2.1 + Linux 3.0.43" to >>>>>>>>>>>>>>>>>>>>>> "XENOMAI 2.6.4. + Linux 3.10.53". Our target is an i.MX6DL. [...]
>>> >> Thanks a lot I found it and downloaded it to my target >> >> Here are the output for Kernel 3.10.53 and Xenomai 2.6.4 >> >> #> ./tsc >> Checking tsc for 1 minute(s) >> min: 10, max: 596, avg: 10.5056 >> min: 10, max: 595, avg: 10.5053 >> min: 10, max: 603, avg: 10.5053 >> min: 10, max: 630, avg: 10.5052 >> min: 10, max: 600, avg: 10.505 >> min: 10, max: 595, avg: 10.5056 >> min: 10, max: 562, avg: 10.505 >> min: 10, max: 605, avg: 10.5056 >> min: 10, max: 602, avg: 10.5055 >> min: 10, max: 595, avg: 10.5052 > Could you let the test run until the end ? (1 minute is not so long). > Sorry I dont want to overload the mail Here are the output for Kernel 3.10.53 and Xenomai 2.6.4 #> ./tsc Checking tsc for 1 minute(s) min: 10, max: 595, avg: 10.5055 min: 10, max: 596, avg: 10.5048 min: 10, max: 595, avg: 10.5048 min: 10, max: 563, avg: 10.5047 min: 10, max: 594, avg: 10.505 min: 10, max: 652, avg: 10.5047 min: 10, max: 604, avg: 10.505 min: 10, max: 595, avg: 10.5045 min: 10, max: 595, avg: 10.5046 min: 10, max: 594, avg: 10.5046 min: 10, max: 582, avg: 10.5045 min: 10, max: 599, avg: 10.5046 min: 10, max: 379, avg: 10.5046 min: 10, max: 571, avg: 10.5046 min: 10, max: 534, avg: 10.5051 min: 10, max: 591, avg: 10.5046 min: 10, max: 595, avg: 10.5048 min: 10, max: 630, avg: 10.5049 min: 10, max: 534, avg: 10.5047 min: 10, max: 564, avg: 10.5048 min: 10, max: 520, avg: 10.5049 min: 10, max: 255, avg: 10.5047 min: 10, max: 594, avg: 10.5047 min: 10, max: 677, avg: 10.505 min: 10, max: 128, avg: 10.5053 min: 10, max: 530, avg: 10.5046 min: 10, max: 595, avg: 10.5047 min: 10, max: 595, avg: 10.5048 min: 10, max: 121, avg: 10.5044 min: 10, max: 595, avg: 10.5053 min: 10, max: 598, avg: 10.505 min: 10, max: 595, avg: 10.505 min: 10, max: 595, avg: 10.505 min: 10, max: 598, avg: 10.5047 min: 10, max: 143, avg: 10.5058 min: 10, max: 125, avg: 10.505 min: 10, max: 202, avg: 10.5047 min: 10, max: 595, avg: 10.5047 min: 10, max: 295, avg: 10.5049 min: 10, max: 156, avg: 
10.5047 min: 10, max: 546, avg: 10.5049 min: 10, max: 547, avg: 10.5048 min: 10, max: 153, avg: 10.5048 min: 10, max: 116, avg: 10.5046 min: 10, max: 591, avg: 10.5052 min: 10, max: 594, avg: 10.5046 min: 10, max: 563, avg: 10.5048 min: 10, max: 101, avg: 10.5046 min: 10, max: 117, avg: 10.5048 min: 10, max: 608, avg: 10.505 min: 10, max: 593, avg: 10.5048 min: 10, max: 617, avg: 10.5047 min: 10, max: 596, avg: 10.505 min: 10, max: 529, avg: 10.5048 min: 10, max: 557, avg: 10.5052 min: 10, max: 104, avg: 10.5047 min: 10, max: 595, avg: 10.5048 min: 10, max: 595, avg: 10.5046 min: 10, max: 115, avg: 10.5045 min: 10, max: 196, avg: 10.5046 min: 10, max: 677, avg: 10.5048 -> 0.0265273 us Here are the output for Kernel 3.0.43 and Xenomai 2.6.2.1 #> ./tsc Checking tsc for 1 minute(s) min: 10, max: 141, avg: 11.5371 min: 10, max: 181, avg: 11.5377 min: 10, max: 512, avg: 11.5444 min: 10, max: 93, avg: 11.5335 min: 10, max: 578, avg: 11.5581 min: 10, max: 81, avg: 11.5838 min: 10, max: 290, avg: 11.5455 min: 10, max: 105, avg: 11.5441 min: 10, max: 543, avg: 11.5648 min: 10, max: 111, avg: 11.5602 min: 10, max: 96, avg: 11.5607 min: 10, max: 77, avg: 11.5758 min: 10, max: 122, avg: 11.5601 min: 10, max: 78, avg: 11.5494 min: 10, max: 142, avg: 11.581 min: 10, max: 95, avg: 11.5513 min: 10, max: 142, avg: 11.5825 min: 10, max: 580, avg: 11.5829 min: 10, max: 104, avg: 11.5656 min: 10, max: 77, avg: 11.5833 min: 10, max: 667, avg: 11.5623 min: 10, max: 91, avg: 11.5742 min: 10, max: 656, avg: 11.5742 min: 10, max: 78, avg: 11.5866 min: 10, max: 428, avg: 11.5958 min: 10, max: 77, avg: 11.5945 min: 10, max: 81, avg: 11.5948 min: 10, max: 96, avg: 11.5787 min: 10, max: 85, avg: 11.5975 min: 10, max: 122, avg: 11.6049 min: 10, max: 516, avg: 11.5776 min: 10, max: 95, avg: 11.597 min: 10, max: 550, avg: 11.5868 min: 10, max: 147, avg: 11.6125 min: 10, max: 582, avg: 11.611 min: 10, max: 541, avg: 11.6105 min: 10, max: 154, avg: 11.5942 min: 10, max: 94, avg: 11.6217 min: 10, 
max: 74, avg: 11.6066 min: 10, max: 91, avg: 11.653 min: 10, max: 539, avg: 11.6248 min: 10, max: 166, avg: 11.5915 min: 10, max: 81, avg: 11.6156 min: 10, max: 111, avg: 11.6099 min: 10, max: 578, avg: 11.6348 min: 10, max: 95, avg: 11.6329 min: 10, max: 640, avg: 11.629 min: 10, max: 597, avg: 11.6336 min: 10, max: 95, avg: 11.615 min: 10, max: 555, avg: 11.5757 min: 10, max: 94, avg: 11.5749 min: 10, max: 530, avg: 11.5542 min: 10, max: 82, avg: 11.5346 min: 10, max: 119, avg: 11.5466 min: 10, max: 94, avg: 11.5169 min: 10, max: 173, avg: 11.5044 min: 10, max: 147, avg: 11.5309 min: 10, max: 196, avg: 11.4833 min: 10, max: 542, avg: 11.4979 min: 10, max: 95, avg: 11.4895 min: 10, max: 667, avg: 11.5755 -> 0.029231 us ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4 2016-06-28 10:31 ` Wolfgang Netbal @ 2016-06-28 10:39 ` Gilles Chanteperdrix 2016-06-28 11:45 ` Wolfgang Netbal 2016-06-28 11:55 ` Wolfgang Netbal 0 siblings, 2 replies; 36+ messages in thread From: Gilles Chanteperdrix @ 2016-06-28 10:39 UTC (permalink / raw) To: Wolfgang Netbal; +Cc: xenomai On Tue, Jun 28, 2016 at 12:31:42PM +0200, Wolfgang Netbal wrote: > > > Am 2016-06-28 um 12:19 schrieb Gilles Chanteperdrix: > min: 10, max: 677, avg: 10.5048 -> 0.0265273 us > > Here are the output for Kernel 3.0.43 and Xenomai 2.6.2.1 > > #> ./tsc > min: 10, max: 667, avg: 11.5755 -> 0.029231 us Ok. So, first it confirms that the two configurations are running the processor at the same frequency. But we seem to see a pattern, the maxima in the case of the new kernel seems consistently higher. Which would suggest that there is some difference in the cache. What is the status of the two configurations with regard to the L2 cache write allocate policy? Could you show us the tsc results of Xenomai 2.6.4 with the 3.0 kernel ? -- Gilles. https://click-hack.org ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4 2016-06-28 10:39 ` Gilles Chanteperdrix @ 2016-06-28 11:45 ` Wolfgang Netbal 2016-06-28 11:57 ` Gilles Chanteperdrix 2016-06-28 11:55 ` Wolfgang Netbal 1 sibling, 1 reply; 36+ messages in thread From: Wolfgang Netbal @ 2016-06-28 11:45 UTC (permalink / raw) To: xenomai Am 2016-06-28 um 12:39 schrieb Gilles Chanteperdrix: > On Tue, Jun 28, 2016 at 12:31:42PM +0200, Wolfgang Netbal wrote: >> >> Am 2016-06-28 um 12:19 schrieb Gilles Chanteperdrix: >> min: 10, max: 677, avg: 10.5048 -> 0.0265273 us >> >> Here are the output for Kernel 3.0.43 and Xenomai 2.6.2.1 >> >> #> ./tsc >> min: 10, max: 667, avg: 11.5755 -> 0.029231 us > Ok. So, first it confirms that the two configurations are running > the processor at the same frequency. But we seem to see a pattern, > the maxima in the case of the new kernel seems consistently higher. > Which would suggest that there is some difference in the cache. What > is the status of the two configurations with regard to the L2 cache > write allocate policy? Could you show us the tsc results of Xenomai > 2.6.4 with the 3.0 kernel ? As requested I created a Kernel 3.0.43 with Xenomai 2.6.4 #> dmesg | grep "Linux version" Linux version 3.0.43 (netwol@DSWUB001) (gcc version 4.7.2 (GCC) ) #186 SMP PREEMPT Tue Jun 28 13:28:40 CEST 2016 #> dmesg | grep "Xenomai" [ 0.844697] I-pipe: Domain Xenomai registered. [ 0.849188] Xenomai: hal/arm started. [ 0.853189] Xenomai: scheduling class idle registered. [ 0.858350] Xenomai: scheduling class rt registered. [ 0.882246] Xenomai: real-time nucleus v2.6.4 (Jumpin' Out) loaded. [ 0.888926] Xenomai: starting native API services. [ 0.893811] Xenomai: starting RTDM services. #> dmesg | grep I-pipe [ 0.000000] I-pipe 1.18-13: pipeline enabled. [ 0.331174] I-pipe, 396.000 MHz timer [ 0.334900] I-pipe, 396.000 MHz clocksource [ 0.844697] I-pipe: Domain Xenomai registered. 
Here the output of tsc #> ./tsc Checking tsc for 1 minute(s) min: 10, max: 24, avg: 10.5 min: 10, max: 24, avg: 10.5 min: 10, max: 26, avg: 10.5 min: 10, max: 22, avg: 10.5 min: 10, max: 26, avg: 10.5 min: 10, max: 18, avg: 10.5 min: 10, max: 51, avg: 10.5 min: 10, max: 21, avg: 10.5 min: 10, max: 22, avg: 10.5 min: 10, max: 21, avg: 10.5 min: 10, max: 32, avg: 10.5 min: 10, max: 23, avg: 10.5 min: 10, max: 47, avg: 10.5 min: 10, max: 35, avg: 10.5 min: 10, max: 29, avg: 10.5 min: 10, max: 21, avg: 10.5 min: 10, max: 345, avg: 10.5 min: 10, max: 23, avg: 10.5 min: 10, max: 21, avg: 10.5 min: 10, max: 21, avg: 10.5 min: 10, max: 23, avg: 10.5 min: 10, max: 42, avg: 10.5 min: 10, max: 25, avg: 10.5 min: 10, max: 21, avg: 10.5 min: 10, max: 23, avg: 10.5 min: 10, max: 21, avg: 10.5 min: 10, max: 38, avg: 10.5 min: 10, max: 35, avg: 10.5 min: 10, max: 21, avg: 10.5 min: 10, max: 21, avg: 10.5 min: 10, max: 28, avg: 10.5 min: 10, max: 25, avg: 10.5 min: 10, max: 22, avg: 10.5 min: 10, max: 21, avg: 10.5 min: 10, max: 26, avg: 10.5 min: 10, max: 21, avg: 10.5 min: 10, max: 24, avg: 10.5 min: 10, max: 21, avg: 10.5 min: 10, max: 19, avg: 10.5 min: 10, max: 23, avg: 10.5 min: 10, max: 21, avg: 10.5 min: 10, max: 35, avg: 10.5 min: 10, max: 21, avg: 10.5 min: 10, max: 21, avg: 10.5 min: 10, max: 25, avg: 10.5 min: 10, max: 24, avg: 10.5 min: 10, max: 36, avg: 10.5 min: 10, max: 35, avg: 10.5 min: 10, max: 21, avg: 10.5 min: 10, max: 34, avg: 10.5 min: 10, max: 21, avg: 10.5 min: 10, max: 36, avg: 10.5 min: 10, max: 21, avg: 10.5 min: 10, max: 26, avg: 10.5 min: 10, max: 21, avg: 10.5 min: 10, max: 26, avg: 10.5 min: 10, max: 40, avg: 10.5 min: 10, max: 23, avg: 10.5 min: 10, max: 21, avg: 10.5 min: 10, max: 21, avg: 10.5 min: 10, max: 345, avg: 10.5 -> 0.0265152 us ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4 2016-06-28 11:45 ` Wolfgang Netbal @ 2016-06-28 11:57 ` Gilles Chanteperdrix 0 siblings, 0 replies; 36+ messages in thread From: Gilles Chanteperdrix @ 2016-06-28 11:57 UTC (permalink / raw) To: Wolfgang Netbal; +Cc: xenomai On Tue, Jun 28, 2016 at 01:45:59PM +0200, Wolfgang Netbal wrote: > > > Am 2016-06-28 um 12:39 schrieb Gilles Chanteperdrix: > > On Tue, Jun 28, 2016 at 12:31:42PM +0200, Wolfgang Netbal wrote: > >> > >> Am 2016-06-28 um 12:19 schrieb Gilles Chanteperdrix: > >> min: 10, max: 677, avg: 10.5048 -> 0.0265273 us > >> > >> Here are the output for Kernel 3.0.43 and Xenomai 2.6.2.1 > >> > >> #> ./tsc > >> min: 10, max: 667, avg: 11.5755 -> 0.029231 us > > Ok. So, first it confirms that the two configurations are running > > the processor at the same frequency. But we seem to see a pattern, > > the maxima in the case of the new kernel seems consistently higher. > > Which would suggest that there is some difference in the cache. What > > is the status of the two configurations with regard to the L2 cache > > write allocate policy? Could you show us the tsc results of Xenomai > > 2.6.4 with the 3.0 kernel ? > As requested I created a Kernel 3.0.43 with Xenomai 2.6.4 > #> dmesg | grep "Linux version" > Linux version 3.0.43 (netwol@DSWUB001) (gcc version 4.7.2 (GCC) ) #186 > SMP PREEMPT Tue Jun 28 13:28:40 CEST 2016 > > #> dmesg | grep "Xenomai" > [ 0.844697] I-pipe: Domain Xenomai registered. > [ 0.849188] Xenomai: hal/arm started. > [ 0.853189] Xenomai: scheduling class idle registered. > [ 0.858350] Xenomai: scheduling class rt registered. > [ 0.882246] Xenomai: real-time nucleus v2.6.4 (Jumpin' Out) loaded. > [ 0.888926] Xenomai: starting native API services. > [ 0.893811] Xenomai: starting RTDM services. > > #> dmesg | grep I-pipe > [ 0.000000] I-pipe 1.18-13: pipeline enabled. 
> [ 0.331174] I-pipe, 396.000 MHz timer > [ 0.334900] I-pipe, 396.000 MHz clocksource > [ 0.844697] I-pipe: Domain Xenomai registered. > > > Here the output of tsc > min: 10, max: 345, avg: 10.5 -> 0.0265152 us Ok, so 3.0.43 with 2.6.4 has the same consistent behaviour with regard to the __xn_rdtsc() latency as 3.0.43 with 2.6.2.1. So: - if you find 2.6.4 with 3.0.43 slower than 3.0.43 with 2.6.2.1, you can remove the kernel version change from the mix and do your tests from now on exclusively with the kernel 3.0.43, and the tsc latency is unlikely to be the cause of the performance difference. To really make sure of that, you can replace __xn_rdtsc() in tsc.c with a call to rt_timer_tsc(), recompile and rerun on the two remaining configurations. - if you find 2.6.4 with 3.0.43 is as fast as 3.0.43 with 2.6.2.1, you can remove the Xenomai version change from the mix and do your tests from now on exclusively with Xenomai 2.6.4. And the question about differences in cache configuration remains. -- Gilles. https://click-hack.org
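Gilles' two-configuration test hinges on timing back-to-back reads of the time source. As a rough illustration of what the tsc test measures (not the actual tsc.c, which is C code reading the Xenomai-internal __xn_rdtsc()), here is a self-contained sketch using Python's time.perf_counter_ns() as a stand-in for the counter read:

```python
# Sketch of the clock-read latency loop in the spirit of the "tsc"
# test discussed above.  The real test reads __xn_rdtsc(); Gilles'
# suggestion is to substitute rt_timer_tsc() there and compare.
# time.perf_counter_ns() stands in here so the sketch is portable.
import time

def clock_read_latency(loops=100_000):
    lo, hi, total = float("inf"), 0, 0
    for _ in range(loops):
        t0 = time.perf_counter_ns()
        t1 = time.perf_counter_ns()   # cost of one back-to-back read
        d = t1 - t0
        lo = min(lo, d)
        hi = max(hi, d)
        total += d
    return lo, hi, total / loops

lo, hi, avg = clock_read_latency()
print(f"min: {lo}, max: {hi}, avg: {avg:.4f} ns")
```

On the real targets one would rebuild tsc.c with rt_timer_tsc() substituted, as suggested, and compare the printed min/max/avg across the kernel and Xenomai combinations.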
* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4 2016-06-28 10:39 ` Gilles Chanteperdrix 2016-06-28 11:45 ` Wolfgang Netbal @ 2016-06-28 11:55 ` Wolfgang Netbal 2016-06-28 12:01 ` Gilles Chanteperdrix 1 sibling, 1 reply; 36+ messages in thread From: Wolfgang Netbal @ 2016-06-28 11:55 UTC (permalink / raw) To: xenomai Am 2016-06-28 um 12:39 schrieb Gilles Chanteperdrix: > On Tue, Jun 28, 2016 at 12:31:42PM +0200, Wolfgang Netbal wrote: >> >> Am 2016-06-28 um 12:19 schrieb Gilles Chanteperdrix: >> min: 10, max: 677, avg: 10.5048 -> 0.0265273 us >> >> Here are the output for Kernel 3.0.43 and Xenomai 2.6.2.1 >> >> #> ./tsc >> min: 10, max: 667, avg: 11.5755 -> 0.029231 us > Ok. So, first it confirms that the two configurations are running > the processor at the same frequency. But we seem to see a pattern, > the maxima in the case of the new kernel seems consistently higher. > Which would suggest that there is some difference in the cache. What > is the status of the two configurations with regard to the L2 cache > write allocate policy? Do you mean the configuration we checked in this request https://xenomai.org/pipermail/xenomai/2016-June/036390.html ? > Could you show us the tsc results of Xenomai > 2.6.4 with the 3.0 kernel ? > ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4 2016-06-28 11:55 ` Wolfgang Netbal @ 2016-06-28 12:01 ` Gilles Chanteperdrix 2016-06-28 14:32 ` Wolfgang Netbal 0 siblings, 1 reply; 36+ messages in thread From: Gilles Chanteperdrix @ 2016-06-28 12:01 UTC (permalink / raw) To: Wolfgang Netbal; +Cc: xenomai On Tue, Jun 28, 2016 at 01:55:27PM +0200, Wolfgang Netbal wrote: > > > Am 2016-06-28 um 12:39 schrieb Gilles Chanteperdrix: > > On Tue, Jun 28, 2016 at 12:31:42PM +0200, Wolfgang Netbal wrote: > >> > >> Am 2016-06-28 um 12:19 schrieb Gilles Chanteperdrix: > >> min: 10, max: 677, avg: 10.5048 -> 0.0265273 us > >> > >> Here are the output for Kernel 3.0.43 and Xenomai 2.6.2.1 > >> > >> #> ./tsc > >> min: 10, max: 667, avg: 11.5755 -> 0.029231 us > > Ok. So, first it confirms that the two configurations are running > > the processor at the same frequency. But we seem to see a pattern, > > the maxima in the case of the new kernel seems consistently higher. > > Which would suggest that there is some difference in the cache. What > > is the status of the two configurations with regard to the L2 cache > > write allocate policy? > Do you mean the configuration we checked in this request > https://xenomai.org/pipermail/xenomai/2016-June/036390.html This answer is based on a kernel message, which may happen before or after the I-pipe patch has changed the value passed to the register, so, essentially, it is useless. I would not call that checking the L2 cache configuration differences. -- Gilles. https://click-hack.org ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4 2016-06-28 12:01 ` Gilles Chanteperdrix @ 2016-06-28 14:32 ` Wolfgang Netbal 2016-06-28 14:42 ` Gilles Chanteperdrix 0 siblings, 1 reply; 36+ messages in thread From: Wolfgang Netbal @ 2016-06-28 14:32 UTC (permalink / raw) To: xenomai Am 2016-06-28 um 14:01 schrieb Gilles Chanteperdrix: > On Tue, Jun 28, 2016 at 01:55:27PM +0200, Wolfgang Netbal wrote: >> >> Am 2016-06-28 um 12:39 schrieb Gilles Chanteperdrix: >>> On Tue, Jun 28, 2016 at 12:31:42PM +0200, Wolfgang Netbal wrote: >>>> Am 2016-06-28 um 12:19 schrieb Gilles Chanteperdrix: >>>> min: 10, max: 677, avg: 10.5048 -> 0.0265273 us >>>> >>>> Here are the output for Kernel 3.0.43 and Xenomai 2.6.2.1 >>>> >>>> #> ./tsc >>>> min: 10, max: 667, avg: 11.5755 -> 0.029231 us >>> Ok. So, first it confirms that the two configurations are running >>> the processor at the same frequency. But we seem to see a pattern, >>> the maxima in the case of the new kernel seems consistently higher. >>> Which would suggest that there is some difference in the cache. What >>> is the status of the two configurations with regard to the L2 cache >>> write allocate policy? >> Do you mean the configuration we checked in this request >> https://xenomai.org/pipermail/xenomai/2016-June/036390.html > This answer is based on a kernel message, which may happen before > or after the I-pipe patch has changed the value passed to the > register, so, essentially, it is useless. I would not call that > checking the L2 cache configuration differences. > I read the values from the auxiliary control register while the system is up and running. I get the same values as I see in the kernel log. Kernel 3.10.53 [0xa02104]=0x32c50000 Kernel 3.0.43 [0xa02104]=0x2850000
* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4 2016-06-28 14:32 ` Wolfgang Netbal @ 2016-06-28 14:42 ` Gilles Chanteperdrix 2016-06-30 9:17 ` Wolfgang Netbal 0 siblings, 1 reply; 36+ messages in thread From: Gilles Chanteperdrix @ 2016-06-28 14:42 UTC (permalink / raw) To: Wolfgang Netbal; +Cc: xenomai On Tue, Jun 28, 2016 at 04:32:17PM +0200, Wolfgang Netbal wrote: > > > Am 2016-06-28 um 14:01 schrieb Gilles Chanteperdrix: > > On Tue, Jun 28, 2016 at 01:55:27PM +0200, Wolfgang Netbal wrote: > >> > >> Am 2016-06-28 um 12:39 schrieb Gilles Chanteperdrix: > >>> On Tue, Jun 28, 2016 at 12:31:42PM +0200, Wolfgang Netbal wrote: > >>>> Am 2016-06-28 um 12:19 schrieb Gilles Chanteperdrix: > >>>> min: 10, max: 677, avg: 10.5048 -> 0.0265273 us > >>>> > >>>> Here are the output for Kernel 3.0.43 and Xenomai 2.6.2.1 > >>>> > >>>> #> ./tsc > >>>> min: 10, max: 667, avg: 11.5755 -> 0.029231 us > >>> Ok. So, first it confirms that the two configurations are running > >>> the processor at the same frequency. But we seem to see a pattern, > >>> the maxima in the case of the new kernel seems consistently higher. > >>> Which would suggest that there is some difference in the cache. What > >>> is the status of the two configurations with regard to the L2 cache > >>> write allocate policy? > >> Do you mean the configuration we checked in this request > >> https://xenomai.org/pipermail/xenomai/2016-June/036390.html > > This answer is based on a kernel message, which may happen before > > or after the I-pipe patch has changed the value passed to the > > register, so, essentially, it is useless. I would not call that > > checking the L2 cache configuration differences. > > > I readed the values from the auxiliary control register, > when the system is up and running. > I get the same values like I see in the Kernel log. 
> > Kernel 3.10.53 [0xa02104]=0x32c50000 > Kernel 3.0.43 [0xa02104]=0x2850000 Ok, so, if I read this correctly both values have 0x800000 set, which means "force no allocate", and is what we want. But there are a lot of other questions in my answer which you avoided answering (and note that that one was only relevant in one of two cases, which I believe is not yours). -- Gilles. https://click-hack.org
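For readers decoding the register dumps by hand: per the ARM L2C-310 (PL310) TRM, bits [24:23] of the auxiliary control register form the write-allocate override, and 0x800000 (bit 23 alone) selects "force no write allocate" — the setting Gilles checks for here. A small sketch over the two quoted values:

```python
# Decode the "force no write allocate" bit in the two auxiliary
# control register dumps quoted above.  Bit position per the ARM
# L2C-310 (PL310) TRM: bits [24:23] are the write-allocate override,
# and 0b01 there (i.e. 0x800000) means "force no write allocate".
FORCE_NO_WRITE_ALLOCATE = 1 << 23

aux_ctrl = {
    "3.10.53": 0x32C50000,
    "3.0.43":  0x02850000,
}

for kernel, value in aux_ctrl.items():
    no_wa = bool(value & FORCE_NO_WRITE_ALLOCATE)
    print(f"Kernel {kernel}: aux_ctrl=0x{value:08x}, force-no-allocate={no_wa}")
```

Both values indeed have bit 23 set, matching Gilles' reading that write allocate is forced off in both configurations.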
* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4 2016-06-28 14:42 ` Gilles Chanteperdrix @ 2016-06-30 9:17 ` Wolfgang Netbal 2016-06-30 9:39 ` Gilles Chanteperdrix 0 siblings, 1 reply; 36+ messages in thread From: Wolfgang Netbal @ 2016-06-30 9:17 UTC (permalink / raw) To: xenomai Am 2016-06-28 um 16:42 schrieb Gilles Chanteperdrix: > On Tue, Jun 28, 2016 at 04:32:17PM +0200, Wolfgang Netbal wrote: >> >> Am 2016-06-28 um 14:01 schrieb Gilles Chanteperdrix: >>> On Tue, Jun 28, 2016 at 01:55:27PM +0200, Wolfgang Netbal wrote: >>>> Am 2016-06-28 um 12:39 schrieb Gilles Chanteperdrix: >>>>> On Tue, Jun 28, 2016 at 12:31:42PM +0200, Wolfgang Netbal wrote: >>>>>> Am 2016-06-28 um 12:19 schrieb Gilles Chanteperdrix: >>>>>> min: 10, max: 677, avg: 10.5048 -> 0.0265273 us >>>>>> >>>>>> Here are the output for Kernel 3.0.43 and Xenomai 2.6.2.1 >>>>>> >>>>>> #> ./tsc >>>>>> min: 10, max: 667, avg: 11.5755 -> 0.029231 us >>>>> Ok. So, first it confirms that the two configurations are running >>>>> the processor at the same frequency. But we seem to see a pattern, >>>>> the maxima in the case of the new kernel seems consistently higher. >>>>> Which would suggest that there is some difference in the cache. What >>>>> is the status of the two configurations with regard to the L2 cache >>>>> write allocate policy? >>>> Do you mean the configuration we checked in this request >>>> https://xenomai.org/pipermail/xenomai/2016-June/036390.html >>> This answer is based on a kernel message, which may happen before >>> or after the I-pipe patch has changed the value passed to the >>> register, so, essentially, it is useless. I would not call that >>> checking the L2 cache configuration differences. >>> >> I readed the values from the auxiliary control register, >> when the system is up and running. >> I get the same values like I see in the Kernel log. 
>> >> Kernel 3.10.53 [0xa02104]=0x32c50000 >> Kernel 3.0.43 [0xa02104]=0x2850000 > Ok, so, if I read this correctly both values have 0x800000 set, > which means "force no allocate", and is what we want. But there are > a lot of other questions in my answer which you avoid to answer (and > note that that one was only relevant in one of two cases, which I > believe is not yours). > Dear Gilles, your first suspicion was correct: the L2 cache configuration was indeed the reason for our issue. I disabled the instruction and data prefetching, and my customer application is now as fast as on our old kernel. It was a change in the kernel file arch/arm/mach-imx/system.c where the prefetching was activated. We will additionally replace the function rt_timer_tsc() by __xn_rdtsc() as you recommended. In our customer applications, every millisecond the Xenomai task with priority 95 is called and works on different objects that are located at different memory locations. When the objects are finished, we leave the Xenomai domain and let Linux do its work. Do you have any additional hints for me on which configurations (L2 cache or other) could speed up this use case? Thanks a lot for your support and patience. Kind regards Wolfgang
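The prefetch difference Wolfgang tracked down is already visible in the two register dumps posted earlier: per the ARM L2C-310 TRM, bit 29 of the auxiliary control register is the instruction prefetch enable and bit 28 the data prefetch enable, and both are set only in the 3.10.53 value. A quick check:

```python
# Compare the two auxiliary control register values quoted in this
# thread.  Bit names per the ARM L2C-310 TRM: bit 29 = instruction
# prefetch enable, bit 28 = data prefetch enable (bit 22, the
# shared-attribute-override enable, also differs here).
INSTR_PREFETCH = 1 << 29
DATA_PREFETCH  = 1 << 28

new_kernel = 0x32C50000   # 3.10.53
old_kernel = 0x02850000   # 3.0.43

diff = new_kernel ^ old_kernel
print(f"differing bits: 0x{diff:08x}")
print("instruction prefetch only in new kernel:",
      bool(diff & new_kernel & INSTR_PREFETCH))
print("data prefetch only in new kernel:",
      bool(diff & new_kernel & DATA_PREFETCH))
```

This is consistent with Wolfgang's finding: the 3.10 kernel enabled L2 prefetching that the 3.0 kernel left off.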
* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4 2016-06-30 9:17 ` Wolfgang Netbal @ 2016-06-30 9:39 ` Gilles Chanteperdrix 0 siblings, 0 replies; 36+ messages in thread From: Gilles Chanteperdrix @ 2016-06-30 9:39 UTC (permalink / raw) To: Wolfgang Netbal; +Cc: xenomai On Thu, Jun 30, 2016 at 11:17:59AM +0200, Wolfgang Netbal wrote: > > > Am 2016-06-28 um 16:42 schrieb Gilles Chanteperdrix: > > On Tue, Jun 28, 2016 at 04:32:17PM +0200, Wolfgang Netbal wrote: > >> > >> Am 2016-06-28 um 14:01 schrieb Gilles Chanteperdrix: > >>> On Tue, Jun 28, 2016 at 01:55:27PM +0200, Wolfgang Netbal wrote: > >>>> Am 2016-06-28 um 12:39 schrieb Gilles Chanteperdrix: > >>>>> On Tue, Jun 28, 2016 at 12:31:42PM +0200, Wolfgang Netbal wrote: > >>>>>> Am 2016-06-28 um 12:19 schrieb Gilles Chanteperdrix: > >>>>>> min: 10, max: 677, avg: 10.5048 -> 0.0265273 us > >>>>>> > >>>>>> Here are the output for Kernel 3.0.43 and Xenomai 2.6.2.1 > >>>>>> > >>>>>> #> ./tsc > >>>>>> min: 10, max: 667, avg: 11.5755 -> 0.029231 us > >>>>> Ok. So, first it confirms that the two configurations are running > >>>>> the processor at the same frequency. But we seem to see a pattern, > >>>>> the maxima in the case of the new kernel seems consistently higher. > >>>>> Which would suggest that there is some difference in the cache. What > >>>>> is the status of the two configurations with regard to the L2 cache > >>>>> write allocate policy? > >>>> Do you mean the configuration we checked in this request > >>>> https://xenomai.org/pipermail/xenomai/2016-June/036390.html > >>> This answer is based on a kernel message, which may happen before > >>> or after the I-pipe patch has changed the value passed to the > >>> register, so, essentially, it is useless. I would not call that > >>> checking the L2 cache configuration differences. > >>> > >> I readed the values from the auxiliary control register, > >> when the system is up and running. 
> >> I get the same values like I see in the Kernel log. > >> > >> Kernel 3.10.53 [0xa02104]=0x32c50000 > >> Kernel 3.0.43 [0xa02104]=0x2850000 > > Ok, so, if I read this correctly both values have 0x800000 set, > > which means "force no allocate", and is what we want. But there are > > a lot of other questions in my answer which you avoid to answer (and > > note that that one was only relevant in one of two cases, which I > > believe is not yours). > > > Dear Gilles, Hi, > > your first intention was correct, that the L2 cache configuration may be > the reason for our issue. > I disabled the instruction and data prefetching and my customer application > is as fast as in our old kernel. I thought you said the contrary in this mail: https://xenomai.org/pipermail/xenomai/2016-June/036390.html Well not exactly the contrary, but that you tried to enable prefetching with 3.0.43 and that performance did not degrade. > It was a change in the Kernel file arch/arm/mach-imx/system.c where the > prefetching > was activated. > > We will additional replace the function rt_timer_tsc() by __xn_rdtsc() > as you recommended. No, do not do that. __xn_rdtsc() is a Xenomai internal function. The "tsc" test uses it because it is a test measuring the execution time of this function. What I said was to replace __xn_rdtsc() with rt_timer_tsc() in the "tsc" test, to see if there was no performance regression in rt_timer_tsc(). Also, I would be curious to understand why the execution time of __xn_rdtsc() changed, it is just one processor cycle, so, it should not matter to applications, but still, I do not see what change could cause this. > > In our customer applications every millisecond the xenomai task with > priority 95 > is called and works on different objects that are located on different > memory locations. > When the objects are finished we leave the xenomai domain and let work > Linux. 
> > Do you have any additional hints for me what configrations L2 cache or > other that can > speed up this use case ? Well, no, I know nothing about the L2 cache configuration. A customer told me that disabling write allocate was improving the latency test results greatly on imx6, and I benchmarked it on OMAP4, another processor based on cortex A9, you can find the benchmark here: https://xenomai.org/2014/08/benchmarks-xenomai-dual-kernel-over-linux-3-14/#For_the_Texas_Instrument_Panda_board_running_a_TI_OMAP4430_processor_at_1_GHz Since it seemed to improve the performance on all the processors Xenomai supported at the time with this L2 cache (i.e. omap4 and imx6 really), we made the change in the I-pipe patch with a kernel parameter to disable it, in case someone would prefer to disable it. > > Thanks a lot for you support and patience. > <http://dict.leo.org/ende/index_de.html#/search=patience&searchLoc=0&resultOrder=basic&multiwordShowSingle=on&pos=0> Well, I am not always patient. But you are welcome. -- Gilles. https://click-hack.org ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4 2016-06-07 14:13 ` Wolfgang Netbal 2016-06-07 17:00 ` Gilles Chanteperdrix @ 2016-06-07 17:22 ` Philippe Gerum 1 sibling, 0 replies; 36+ messages in thread From: Philippe Gerum @ 2016-06-07 17:22 UTC (permalink / raw) To: wolfgang.netbal; +Cc: xenomai On 06/07/2016 04:13 PM, Wolfgang Netbal wrote: > As I wrote above, I get interrupts 1037 handled by rthal_apc_handler() > and 1038 handled by xnpod_schedule_handler() while my realtime task > is running on kernel 3.10.53 with Xenomai 2.6.4. > On kernel 3.0.43 with Xenomai 2.6.4 there are no interrupts, except the > once that are send by my board using GPIOs, but this virtual interrupts > are assigned to Xenomai and Linux as well but I didn't see a handler > installed. > I'm pretty sure that these interrupts are slowing down my system, but > where do they come from ? > why didn't I see them on Kernel 3.0.43 with Xenomai 2.6.4 ? > how long do they need to process ? > > Is there any dependecy in Xenomai between the kernel version and this > virtual interrupts ? > Maybe you should consider reading all the replies you get: On 05/31/2016 05:08 PM, Philippe Gerum wrote: > On 05/31/2016 04:09 PM, Wolfgang Netbal wrote: >> Dear all, >> >> we have moved our application from "XENOMAI 2.6.2.1 + Linux 3.0.43" to >> "XENOMAI 2.6.4. + Linux 3.10.53". Our target is an i.MX6DL. The system >> is now up and running and works stable. Unfortunately we see a >> difference in the performance. Our old combination (XENOMAI 2.6.2.1 + >> Linux 3.0.43) was slightly faster. >> > > Could you quantify "slightly faster"? This is a dual kernel system, so > changes on either the regular kernel side and/or the co-kernel side may > have a measurable impact. > >> At the moment it looks like that XENOMAI 2.6.4 calls >> xnpod_schedule_handler much more often then XENOMAI 2.6.2.1 in our old >> system. 
Every call of xnpod_schedule_handler interrupts our main >> XENOMAI task with priority = 95. > > That handler is attached to the inter-processor interrupt used for > rescheduling tasks running on a remote CPU. You may want to check the > CPU affinity settings of your tasks, and the way they interact/synchronize. > You may also want to check the mode switch count for your threads in /proc/xenomai/stat (MSW field). I suspect your application may be switching mode like crazy between Linux and Xenomai, causing interrupt activity for waking up either sides in turn. -- Philippe. ^ permalink raw reply [flat|nested] 36+ messages in thread
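Philippe's MSW hint can be automated. A hedged sketch that locates the MSW column from the header line of /proc/xenomai/stat rather than hard-coding positions, so it tolerates layout differences between Xenomai versions (the sample text below is illustrative, not captured from a real board):

```python
# Extract the MSW (mode switch) counter per thread from a
# /proc/xenomai/stat dump, finding the column by its header label.
# On a target one would pass open("/proc/xenomai/stat").read().
def msw_per_thread(stat_text):
    lines = [l for l in stat_text.strip().splitlines() if l.strip()]
    header = lines[0].split()
    msw_col, name_col = header.index("MSW"), header.index("NAME")
    result = {}
    for line in lines[1:]:
        fields = line.split()
        result[fields[name_col]] = int(fields[msw_col])
    return result

# Illustrative sample only; column set assumed from Xenomai 2.x.
sample = """\
CPU  PID    MSW        CSW        PF    STAT       %CPU  NAME
  0  0      0          150        0     00500080    99.7  ROOT
  0  1182   12345      23456      0     00300380     0.1  my_rt_task
"""
print(msw_per_thread(sample))
```

A rapidly growing MSW count for the priority-95 task would confirm Philippe's suspicion that the application keeps bouncing between the Linux and Xenomai domains.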
* Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4 2016-05-31 14:09 [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4 Wolfgang Netbal 2016-05-31 14:16 ` Gilles Chanteperdrix @ 2016-05-31 15:08 ` Philippe Gerum 1 sibling, 0 replies; 36+ messages in thread From: Philippe Gerum @ 2016-05-31 15:08 UTC (permalink / raw) To: wolfgang.netbal, xenomai On 05/31/2016 04:09 PM, Wolfgang Netbal wrote: > Dear all, > > we have moved our application from "XENOMAI 2.6.2.1 + Linux 3.0.43" to > "XENOMAI 2.6.4. + Linux 3.10.53". Our target is an i.MX6DL. The system > is now up and running and works stable. Unfortunately we see a > difference in the performance. Our old combination (XENOMAI 2.6.2.1 + > Linux 3.0.43) was slightly faster. > Could you quantify "slightly faster"? This is a dual kernel system, so changes on either the regular kernel side and/or the co-kernel side may have a measurable impact. > At the moment it looks like that XENOMAI 2.6.4 calls > xnpod_schedule_handler much more often then XENOMAI 2.6.2.1 in our old > system. Every call of xnpod_schedule_handler interrupts our main > XENOMAI task with priority = 95. That handler is attached to the inter-processor interrupt used for rescheduling tasks running on a remote CPU. You may want to check the CPU affinity settings of your tasks, and the way they interact/synchronize. -- Philippe. ^ permalink raw reply [flat|nested] 36+ messages in thread
end of thread, other threads:[~2016-06-30 9:39 UTC | newest] Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-05-31 14:09 [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4 Wolfgang Netbal 2016-05-31 14:16 ` Gilles Chanteperdrix 2016-06-01 13:52 ` Wolfgang Netbal 2016-06-01 14:12 ` Gilles Chanteperdrix 2016-06-02 8:15 ` Wolfgang Netbal 2016-06-02 8:23 ` Gilles Chanteperdrix 2016-06-06 7:03 ` Wolfgang Netbal 2016-06-06 15:35 ` Gilles Chanteperdrix 2016-06-07 14:13 ` Wolfgang Netbal 2016-06-07 17:00 ` Gilles Chanteperdrix 2016-06-27 15:55 ` Wolfgang Netbal 2016-06-27 16:00 ` Gilles Chanteperdrix 2016-06-28 8:08 ` Wolfgang Netbal 2016-06-27 16:46 ` Gilles Chanteperdrix 2016-06-28 8:31 ` Wolfgang Netbal 2016-06-28 8:34 ` Gilles Chanteperdrix 2016-06-28 9:15 ` Wolfgang Netbal 2016-06-28 9:17 ` Gilles Chanteperdrix 2016-06-28 9:28 ` Wolfgang Netbal 2016-06-28 9:29 ` Gilles Chanteperdrix 2016-06-28 9:51 ` Wolfgang Netbal 2016-06-28 9:55 ` Gilles Chanteperdrix 2016-06-28 10:10 ` Wolfgang Netbal 2016-06-28 10:19 ` Gilles Chanteperdrix 2016-06-28 10:31 ` Wolfgang Netbal 2016-06-28 10:39 ` Gilles Chanteperdrix 2016-06-28 11:45 ` Wolfgang Netbal 2016-06-28 11:57 ` Gilles Chanteperdrix 2016-06-28 11:55 ` Wolfgang Netbal 2016-06-28 12:01 ` Gilles Chanteperdrix 2016-06-28 14:32 ` Wolfgang Netbal 2016-06-28 14:42 ` Gilles Chanteperdrix 2016-06-30 9:17 ` Wolfgang Netbal 2016-06-30 9:39 ` Gilles Chanteperdrix 2016-06-07 17:22 ` Philippe Gerum 2016-05-31 15:08 ` Philippe Gerum