From mboxrd@z Thu Jan 1 00:00:00 1970
References: <20160531141646.GG5951@hermes.click-hack.org>
 <574EE886.2020907@sigmatek.at>
 <20160601141238.GC14103@hermes.click-hack.org>
 <574FEB2D.5010509@sigmatek.at>
 <20160602082318.GB1801@hermes.click-hack.org>
 <5755204C.6090701@sigmatek.at>
 <20160606153545.GA376@hermes.click-hack.org>
 <5756D673.4080408@sigmatek.at>
 <20160607170050.GA13922@hermes.click-hack.org>
 <57714C60.4070407@sigmatek.at>
 <20160627164604.GH18662@hermes.click-hack.org>
From: Wolfgang Netbal
Message-ID: <577235C4.2080507@sigmatek.at>
Date: Tue, 28 Jun 2016 10:31:00 +0200
MIME-Version: 1.0
In-Reply-To: <20160627164604.GH18662@hermes.click-hack.org>
Content-Type: text/plain; charset="windows-1252"; format="flowed"
Content-Transfer-Encoding: 8bit
Subject: Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
Reply-To: wolfgang.netbal@sigmatek.at
List-Id: Discussions about the Xenomai project
To: xenomai@xenomai.org

On 2016-06-27 18:46, Gilles Chanteperdrix wrote:
> On Mon, Jun 27, 2016 at 05:55:12PM +0200, Wolfgang Netbal wrote:
>>
>> On 2016-06-07 19:00, Gilles Chanteperdrix wrote:
>>> On Tue, Jun 07, 2016 at 04:13:07PM +0200, Wolfgang Netbal wrote:
>>>> On 2016-06-06 17:35, Gilles Chanteperdrix wrote:
>>>>> On Mon, Jun 06, 2016 at 09:03:40AM +0200, Wolfgang Netbal wrote:
>>>>>> On 2016-06-02 10:23, Gilles Chanteperdrix wrote:
>>>>>>> On Thu, Jun 02, 2016 at 10:15:41AM +0200, Wolfgang Netbal wrote:
>>>>>>>> On 2016-06-01 16:12, Gilles Chanteperdrix wrote:
>>>>>>>>> On Wed, Jun 01, 2016 at 03:52:06PM +0200, Wolfgang Netbal wrote:
>>>>>>>>>> On 2016-05-31 16:16, Gilles Chanteperdrix wrote:
>>>>>>>>>>> On Tue, May 31, 2016 at 04:09:07PM +0200, Wolfgang Netbal wrote:
>>>>>>>>>>>> Dear all,
>>>>>>>>>>>>
>>>>>>>>>>>> we have moved our application from "XENOMAI 2.6.2.1 + Linux
>>>>>>>>>>>> 3.0.43" to "XENOMAI 2.6.4 + Linux 3.10.53". Our target is an
>>>>>>>>>>>> i.MX6DL. The system is now up and running and works stably.
>>>>>>>>>>>> Unfortunately we see a difference in performance: our old
>>>>>>>>>>>> combination (XENOMAI 2.6.2.1 + Linux 3.0.43) was slightly
>>>>>>>>>>>> faster.
>>>>>>>>>>>>
>>>>>>>>>>>> At the moment it looks like XENOMAI 2.6.4 calls
>>>>>>>>>>>> xnpod_schedule_handler much more often than XENOMAI 2.6.2.1
>>>>>>>>>>>> did in our old system. Every call of xnpod_schedule_handler
>>>>>>>>>>>> interrupts our main XENOMAI task with priority = 95.
>>>> As I wrote above, I get interrupts 1037 handled by rthal_apc_handler()
>>>> and 1038 handled by xnpod_schedule_handler() while my realtime task
>>>> is running on kernel 3.10.53 with Xenomai 2.6.4.
>>>> On kernel 3.0.43 with Xenomai 2.6.4 there are no interrupts, except
>>>> the ones that are sent by my board using GPIOs; these virtual
>>>> interrupts are assigned to both Xenomai and Linux, but I didn't see
>>>> a handler installed.
>>>> I'm pretty sure that these interrupts are slowing down my system, but
>>>> where do they come from?
>>>> Why didn't I see them on kernel 3.0.43 with Xenomai 2.6.4?
>>>> How long do they need to process?
>>> How do you mean you do not see them? If you are talking about the
>>> rescheduling API, it used not to be bound to a virq (so, it would
>>> have a different irq number on cortex A9, something between 0 and 31
>>> that would not show in the usual /proc files); I wonder if 3.0 is
>>> before or after that. You do not see them in /proc, or you see them
>>> and their count does not increase?
>> Sorry for the long delay, we ran a lot of tests to find out what could
>> be the reason for the performance difference.
>>
>> If I call "cat /proc/ipipe/Xenomai" I don't see the IRQ handler
>> assigned to the virtual IRQ on kernel 3.0.43, but it looks like that
>> is an issue of the kernel.
>>> As for where they come from, this is not a mystery: the reschedule
>>> IPI is triggered when code on one cpu changes the scheduler state
>>> (wakes up a thread, for instance) on another cpu. If you want to
>>> avoid it, do not do that. That means: do not share mutexes between
>>> threads running on different cpus, pay attention for timers to be
>>> running on the same cpu as the thread they signal, etc...
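If I understand this correctly, keeping everything on one core should
make the reschedule IPI disappear. A minimal sketch of what we would
change, assuming the native skin (the task and mutex names are only
placeholders; T_CPU() is the affinity mode bit of rt_task_create()):

    #include <native/task.h>
    #include <native/mutex.h>

    static RT_TASK producer, consumer;
    static RT_MUTEX lock;

    /* Keep every task that shares the mutex on the same CPU, so that
     * waking up the peer never has to signal another core. */
    int setup(void)
    {
        int err = rt_mutex_create(&lock, "shared-lock");
        if (err)
            return err;

        /* T_CPU(0): both tasks are pinned to CPU 0 */
        err = rt_task_create(&producer, "producer", 0, 95, T_CPU(0));
        if (!err)
            err = rt_task_create(&consumer, "consumer", 0, 90, T_CPU(0));
        return err;
    }
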
>>> The APC virq is used to multiplex several services, which you can
>>> find by grepping the sources for rthal_apc_alloc:
>>>
>>> ./ksrc/skins/posix/apc.c:  pse51_lostage_apc = rthal_apc_alloc("pse51_lostage_handler",
>>> ./ksrc/skins/rtdm/device.c:  rtdm_apc = rthal_apc_alloc("deferred RTDM close", rtdm_apc_handler,
>>> ./ksrc/nucleus/registry.c:  rthal_apc_alloc("registry_export", &registry_proc_schedule, NULL);
>>> ./ksrc/nucleus/pipe.c:  rthal_apc_alloc("pipe_wakeup", &xnpipe_wakeup_proc, NULL);
>>> ./ksrc/nucleus/shadow.c:  rthal_apc_alloc("lostage_handler", &lostage_handler, NULL);
>>> ./ksrc/nucleus/select.c:  xnselect_apc = rthal_apc_alloc("xnselectors_destroy",
>>>
>>> It would be interesting to know which of these services is triggered
>>> a lot. One possibility I see would be root thread priority
>>> inheritance, so it would be caused by mode switches. This brings the
>>> questions: does your application have threads migrating between
>>> primary and secondary mode, do you see the count of mode switches
>>> increase with the kernel changes, and do you have root thread
>>> priority inheritance enabled?
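Regarding the mode-switch question: the MSW column in
/proc/xenomai/stat should show whether the count increases, and in
addition a task can ask the kernel to flag each migration itself. A
minimal sketch, assuming the native skin (the task body is only a
placeholder; on Xenomai 2.6 the T_WARNSW warning arrives as SIGXCPU):

    #include <signal.h>
    #include <stdio.h>
    #include <native/task.h>

    /* Invoked each time the calling task migrates to secondary mode. */
    static void warn_upon_switch(int sig)
    {
        /* A real handler would record a backtrace to locate the
         * offending syscall; here we only report the event. */
        fprintf(stderr, "mode switch detected\n");
    }

    static void rt_thread(void *arg)
    {
        signal(SIGXCPU, warn_upon_switch);

        /* T_WARNSW: deliver SIGXCPU on every migration from
         * primary to secondary mode. */
        rt_task_set_mode(0, T_WARNSW, NULL);

        /* ... real-time work ... */
    }
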
>> Here a short sum-up of our tests and the results, and at the end a
>> few questions :-)
>>
>> We are using a Freescale i.MX6DL on our hardware and upgraded our
>> operating system from Freescale kernel 3.0.43 with Xenomai 2.6.2.1
>> and U-Boot 2013.04 (compiled with GCC 4.7.2) to Freescale kernel
>> 3.10.53 with Xenomai 2.6.4 and U-Boot 2016.01 (compiled with GCC
>> 4.8.2). On both kernels CONFIG_SMP is set.
>>
>> What we see is that running a customer project in a Xenomai task
>> with priority 95 takes 40% of the CPU time on kernel 3.0.43 and 47%
>> of the CPU time on kernel 3.10.53.
>>
>> So the new system is slower by 7 percentage points; scaled up to
>> 100% CPU load this is a difference of 15%. To find out the reason
>> for this difference we ran the following tests, changing individual
>> components to try to make the new system faster:
>>
>> - Changing U-Boot on the new system                  -> still 7% slower
>> - Copying kernel 3.0.43 to the new system            -> still 7% slower
>> - Building kernel 3.0.43 with Xenomai 2.6.4
>>   and copying it to the new system                   -> still 7% slower
>> - Compiling the new system with the old GCC version  -> still 7% slower
>> - We also checked the settings for RAM and CPU clock -> these are equal
>>
>> It looks like it is not one of the big components, so we started to
>> test some specific functions like rt_timer_tsc(). In the following
>> example we stay for 800 µs in the while loop and start the loop
>> again after a 200 µs delay. The task running this code has
>> priority 95.
>>
>> Here is a simplified code snippet:
>>
>>     start = rt_timer_tsc();
>>     do
>>     {
>>         current = rt_timer_tsc();
>>         i++;
>>     } while ((current - start) < 800);
> If your CPU is running at 1 GHz and uses the global timer as clock
> source, the clock source runs at 500 MHz, so 800 ticks of the tsc is
> something around 1.6 us

Sorry, I simplified the code snippet a little bit too much. This is
the correct code:

    current = rt_timer_tsc2ns(rt_timer_tsc());

> So, I do not really understand what you are talking about. But are
> you sure the two kernels use the same clocksource for xenomai?
>
> Could you show us the result of "dmesg | grep I-pipe" with the two
> kernels?

Output of kernel 3.10.53 with Xenomai 2.6.4:

    I-pipe, 3.000 MHz clocksource
    I-pipe, 396.000 MHz clocksource
    I-pipe, 396.000 MHz timer
    I-pipe, 396.000 MHz timer
    I-pipe: head domain Xenomai registered.

Output of kernel 3.0.43 with Xenomai 2.6.2.1:

    [    0.000000] I-pipe 1.18-13: pipeline enabled.
    [    0.331999] I-pipe, 396.000 MHz timer
    [    0.335720] I-pipe, 396.000 MHz clocksource
    [    0.844016] I-pipe: Domain Xenomai registered.

The controller is an i.MX6DL; this controller can run at 800 MHz
maximum.
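
For completeness, here is a self-contained version of our measurement
loop (a sketch: the task name, the counting, and the printing are
simplified additions, but the 800 µs busy / 200 µs sleep pattern is
the one described above; with the default oneshot timer all values
are in nanoseconds):

    #include <sys/mman.h>
    #include <unistd.h>
    #include <rtdk.h>
    #include <native/task.h>
    #include <native/timer.h>

    #define BUSY_NS 800000   /* 800 us of busy-waiting */
    #define IDLE_NS 200000   /* 200 us of sleeping */

    static RT_TASK loop_task;

    static void loop_fn(void *arg)
    {
        for (;;) {
            SRTIME start = rt_timer_tsc2ns(rt_timer_tsc());
            SRTIME now;
            unsigned long i = 0;

            do {
                now = rt_timer_tsc2ns(rt_timer_tsc());
                i++;
            } while (now - start < BUSY_NS);

            /* iterations per 800 us window: fewer iterations
             * means the system got slower */
            rt_printf("%lu iterations\n", i);
            rt_task_sleep(IDLE_NS);
        }
    }

    int main(void)
    {
        mlockall(MCL_CURRENT | MCL_FUTURE);
        rt_print_auto_init(1);

        /* same setup as the application: priority 95, one CPU */
        rt_task_create(&loop_task, "tscloop", 0, 95, T_CPU(0));
        rt_task_start(&loop_task, &loop_fn, NULL);
        pause();
        return 0;
    }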