Date: Tue, 28 Jun 2016 11:29:55 +0200
From: Gilles Chanteperdrix
Message-ID: <20160628092955.GL18662@hermes.click-hack.org>
In-Reply-To: <57724333.6010608@sigmatek.at>
Subject: Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
To: Wolfgang Netbal
Cc: xenomai@xenomai.org

On Tue, Jun 28, 2016 at 11:28:19AM +0200, Wolfgang Netbal wrote:
>
>
> Am 2016-06-28 um 11:17 schrieb Gilles Chanteperdrix:
> > On Tue, Jun 28, 2016 at 11:15:14AM +0200, Wolfgang Netbal wrote:
> >>
> >> Am 2016-06-28 um 10:34 schrieb Gilles Chanteperdrix:
> >>> On Tue, Jun 28, 2016 at 10:31:00AM +0200, Wolfgang Netbal wrote:
> >>>> Am 2016-06-27 um 18:46 schrieb Gilles Chanteperdrix:
> >>>>> On Mon, Jun 27, 2016 at 05:55:12PM +0200, Wolfgang Netbal wrote:
> >>>>>> Am 2016-06-07 um 19:00 schrieb Gilles Chanteperdrix:
> >>>>>>> On Tue, Jun 07, 2016 at 04:13:07PM +0200, Wolfgang Netbal wrote:
> >>>>>>>> Am 2016-06-06 um 17:35 schrieb Gilles Chanteperdrix:
> >>>>>>>>> On Mon, Jun 06, 2016 at 09:03:40AM +0200, Wolfgang Netbal wrote:
> >>>>>>>>>> Am 2016-06-02 um 10:23 schrieb Gilles Chanteperdrix:
> >>>>>>>>>>> On Thu, Jun 02, 2016 at 10:15:41AM +0200, Wolfgang Netbal wrote:
> >>>>>>>>>>>> Am 2016-06-01 um 16:12 schrieb Gilles Chanteperdrix:
> >>>>>>>>>>>>> On Wed, Jun 01, 2016 at 03:52:06PM +0200, Wolfgang Netbal wrote:
> >>>>>>>>>>>>>> Am 2016-05-31 um 16:16 schrieb Gilles Chanteperdrix:
> >>>>>>>>>>>>>>> On Tue, May 31, 2016 at 04:09:07PM +0200, Wolfgang Netbal wrote:
> >>>>>>>>>>>>>>>> Dear all,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> we have moved our application from "XENOMAI 2.6.2.1 + Linux 3.0.43" to
> >>>>>>>>>>>>>>>> "XENOMAI 2.6.4 + Linux 3.10.53". Our target is an i.MX6DL. The system
> >>>>>>>>>>>>>>>> is now up and running and works stable. Unfortunately we see a
> >>>>>>>>>>>>>>>> difference in the performance. Our old combination (XENOMAI 2.6.2.1 +
> >>>>>>>>>>>>>>>> Linux 3.0.43) was slightly faster.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> At the moment it looks like XENOMAI 2.6.4 calls
> >>>>>>>>>>>>>>>> xnpod_schedule_handler much more often than XENOMAI 2.6.2.1 in our old
> >>>>>>>>>>>>>>>> system. Every call of xnpod_schedule_handler interrupts our main
> >>>>>>>>>>>>>>>> XENOMAI task with priority = 95.
> >>>>>>>> As I wrote above, I get interrupts 1037 handled by rthal_apc_handler()
> >>>>>>>> and 1038 handled by xnpod_schedule_handler() while my realtime task
> >>>>>>>> is running on kernel 3.10.53 with Xenomai 2.6.4.
> >>>>>>>> On kernel 3.0.43 with Xenomai 2.6.4 there are no interrupts, except the
> >>>>>>>> ones that are sent by my board using GPIOs, but these virtual interrupts
> >>>>>>>> are assigned to Xenomai and Linux as well, and I didn't see a handler
> >>>>>>>> installed.
> >>>>>>>> I'm pretty sure that these interrupts are slowing down my system, but
> >>>>>>>> where do they come from?
> >>>>>>>> Why didn't I see them on Kernel 3.0.43 with Xenomai 2.6.4?
> >>>>>>>> How long do they need to process?
> >>>>>>> How do you mean you do not see them? If you are talking about the
> >>>>>>> rescheduling IPI, it used not to be bound to a virq (so, it would
> >>>>>>> have a different irq number on cortex A9, something between 0 and 31
> >>>>>>> that would not show in the usual /proc files), I wonder if 3.0 is
> >>>>>>> before or after that. You do not see them in /proc, or you see them
> >>>>>>> and their count does not increase?
> >>>>>> Sorry for the long delay, we ran a lot of tests to find out what could
> >>>>>> be the reason for
> >>>>>> the performance difference.
> >>>>>>
> >>>>>> If I call cat /proc/ipipe/Xenomai I don't see the IRQ handler assigned to
> >>>>>> the virtual
> >>>>>> IRQ on Kernel 3.0.43, but it looks like that's an issue of the Kernel.
> >>>>>>> As for where they come from, this is not a mystery, the reschedule
> >>>>>>> IPI is triggered when code on one cpu changes the scheduler state
> >>>>>>> (wakes up a thread for instance) on another cpu. If you want to
> >>>>>>> avoid it, do not do that. That means, do not share mutexes between
> >>>>>>> threads running on different cpus, pay attention for timers to be
> >>>>>>> running on the same cpu as the thread they signal, etc...
> >>>>>>>
> >>>>>>> The APC virq is used to multiplex several services, which you can
> >>>>>>> find by grepping the sources for rthal_apc_alloc:
> >>>>>>> ./ksrc/skins/posix/apc.c:   pse51_lostage_apc = rthal_apc_alloc("pse51_lostage_handler",
> >>>>>>> ./ksrc/skins/rtdm/device.c: rtdm_apc = rthal_apc_alloc("deferred RTDM close", rtdm_apc_handler,
> >>>>>>> ./ksrc/nucleus/registry.c:  rthal_apc_alloc("registry_export", &registry_proc_schedule, NULL);
> >>>>>>> ./ksrc/nucleus/pipe.c:      rthal_apc_alloc("pipe_wakeup", &xnpipe_wakeup_proc, NULL);
> >>>>>>> ./ksrc/nucleus/shadow.c:    rthal_apc_alloc("lostage_handler", &lostage_handler, NULL);
> >>>>>>> ./ksrc/nucleus/select.c:    xnselect_apc = rthal_apc_alloc("xnselectors_destroy",
> >>>>>>>
> >>>>>>> It would be interesting to know which of these services is triggered
> >>>>>>> a lot. One possibility I see would be root thread priority
> >>>>>>> inheritance, so it would be caused by mode switches. This brings the
> >>>>>>> question: does your application have threads migrating between primary
> >>>>>>> and secondary mode, do you see the count of mode switches increase
> >>>>>>> with the kernel changes, do you have root thread priority
> >>>>>>> inheritance enabled?
> >>>>>>>
> >>>>>> Here is a short sum-up of our tests and the results, and at the end a few
> >>>>>> questions :-)
> >>>>>>
> >>>>>> We are using a Freescale imx6dl on our hardware and upgraded our operating system from
> >>>>>> Freescale Kernel 3.0.43 with Xenomai 2.6.2.1 and U-Boot 2013.04; as compiler we use GCC 4.7.2
> >>>>>> to Freescale Kernel 3.10.53 with Xenomai 2.6.4 and U-Boot 2016.01; as compiler we use GCC 4.8.2.
> >>>>>> On both Kernels, CONFIG_SMP is set.
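(Picking up the rescheduling IPI point above: a minimal sketch, assuming the Xenomai 2.6 native skin and its T_CPU() creation flag, of keeping a periodic task and the timer that wakes it on one CPU so the wakeup never has to cross CPUs. Task name, priority and period are made up for illustration; this is not code from the application discussed in this thread.)

/* Illustrative sketch: pin a periodic task to CPU 0 so that its timer
 * expiry and wakeup happen on the CPU it runs on; a wakeup performed
 * from another CPU is what triggers the rescheduling IPI. */
#include <sys/mman.h>
#include <native/task.h>
#include <native/timer.h>

static RT_TASK worker;

static void worker_fn(void *arg)
{
	/* The period is programmed from the pinned task itself, so the
	 * underlying timer fires on the same CPU (no cross-CPU wakeup). */
	rt_task_set_periodic(NULL, TM_NOW, 1000000); /* 1 ms, in ns */

	for (;;) {
		rt_task_wait_period(NULL);
		/* real-time work here */
	}
}

int main(void)
{
	mlockall(MCL_CURRENT | MCL_FUTURE);

	/* T_CPU(0) restricts the task to CPU 0 at creation time. */
	rt_task_create(&worker, "worker", 0, 95, T_CPU(0) | T_JOINABLE);
	rt_task_start(&worker, worker_fn, NULL);
	rt_task_join(&worker);
	return 0;
}

(Whether this applies here depends on where the wakeups come from in the real application; mutexes or other synchronization objects shared between CPUs would need the same kind of treatment.)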
> >>>>>>
> >>>>>> What we see is that running a customer project in a Xenomai task with priority 95
> >>>>>> takes 40% of the CPU time on Kernel 3.0.43
> >>>>>> and 47% of the CPU time on Kernel 3.10.53,
> >>>>>>
> >>>>>> so the new system is slower by 7%; scaled up to 100% CPU load this is a difference of 15%.
> >>>>>> To find out the reason for this difference we ran the following tests,
> >>>>>> trying to get the new system faster by changing some components of the system.
> >>>>>>
> >>>>>> - Changing U-Boot on the new system                    -> still 7% slower
> >>>>>> - Copying Kernel 3.0.43 to the new system              -> still 7% slower
> >>>>>> - Building Kernel 3.0.43 with
> >>>>>>   Xenomai 2.6.4 and copying it to the new system       -> still 7% slower
> >>>>>> - Compiling the new system with
> >>>>>>   the old GCC version                                  -> still 7% slower
> >>>>>> - We also checked the settings for RAM and CPU clock   -> these are equal
> >>>>>>
> >>>>>> It looks like it is not one of the big components,
> >>>>>> so we started to test some special functions like rt_timer_tsc().
> >>>>>> In the following example we stay for 800µs in the while loop and
> >>>>>> start this loop again after a 200µs delay.
> >>>>>> The application task running this code has priority 95.
> >>>>>>
> >>>>>> Here is a simplified code snippet:
> >>>>>> start = rt_timer_tsc();
> >>>>>> do
> >>>>>> {
> >>>>>>     current = rt_timer_tsc();
> >>>>>>     i++;
> >>>>>> } while ((current - start) < 800);
> >>>>> If your CPU is running at 1 GHz and uses the global timer as clock
> >>>>> source, the clock source runs at 500 MHz, so 800 ticks of the tsc is
> >>>>> something around 1.6 us.
> >>>> Sorry, I simplified the code snippet a little bit too much.
> >>>> This is the correct code:
> >>>>
> >>>> current = rt_timer_tsc2ns(rt_timer_tsc());
> >>>>
> >>>>> So, I do not really understand what you are talking about. But are
> >>>>> you sure the two kernels use the same clocksource for xenomai?
> >>>>>
> >>>>> Could you show us the result of "dmesg | grep I-pipe" with the two
> >>>>> kernels ?
> >>>> Output of Kernel 3.10.53 with Xenomai 2.6.4:
> >>>> I-pipe, 3.000 MHz clocksource
> >>>> I-pipe, 396.000 MHz clocksource
> >>>> I-pipe, 396.000 MHz timer
> >>>> I-pipe, 396.000 MHz timer
> >>>> I-pipe: head domain Xenomai registered.
> >>>>
> >>>> Output of Kernel 3.0.43 with Xenomai 2.6.2.1:
> >>>> [    0.000000] I-pipe 1.18-13: pipeline enabled.
> >>>> [    0.331999] I-pipe, 396.000 MHz timer
> >>>> [    0.335720] I-pipe, 396.000 MHz clocksource
> >>>> [    0.844016] I-pipe: Domain Xenomai registered.
> >>>>
> >>>> The controller is an imx6dl; this controller can run at maximum 800 MHz.
> >>> Ok, so the new kernel registers two tsc emulations, could you run
> >>> the "tsc" regression test to measure the tsc latency? The two tsc
> >>> emulations have very different latencies, so the result would be
> >>> unmistakable.
> >>>
> >> Output of Kernel 3.10.53 with Xenomai 2.6.4:
> >> /usr/xenomai/bin/latency
> > This test is named "latency", not "tsc". As the different names
> > imply, they are not measuring the same thing.
> >
> Sorry for the stupid question,
> but where do I find the "tsc" test? It is not located in the folder
> /usr/xenomai/bin/.

You have millions of files on your target? Or did you compile busybox
without support for the "find" utility?

-- 
Gilles.
https://click-hack.org
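(For reference, the measurement loop quoted above, written out as a self-contained Xenomai 2.6 native-skin program; the task name, the 800 us / 200 us constants from the thread, and the rt_printf reporting are illustrative additions, not the original application code.)

/* Sketch of the busy-wait measurement: spin for ~800 us reading the
 * tsc, then pause for 200 us. A lower iteration count on one kernel
 * points at a slower tsc read, e.g. a different tsc emulation. */
#include <sys/mman.h>
#include <unistd.h>
#include <rtdk.h>
#include <native/task.h>
#include <native/timer.h>

static RT_TASK meas;

static void measure(void *arg)
{
	for (;;) {
		SRTIME start = rt_timer_tsc2ns(rt_timer_tsc());
		SRTIME current;
		unsigned long i = 0;

		do {
			current = rt_timer_tsc2ns(rt_timer_tsc());
			i++;
		} while (current - start < 800000);	/* ~800 us busy window */

		rt_printf("%lu tsc reads in 800 us\n", i);
		rt_task_sleep(200000);			/* 200 us delay */
	}
}

int main(void)
{
	mlockall(MCL_CURRENT | MCL_FUTURE);
	rt_print_auto_init(1);

	rt_task_create(&meas, "tscloop", 0, 95, 0);
	rt_task_start(&meas, measure, NULL);
	pause();
	return 0;
}

(Building would typically go through xeno-config, something like
gcc tscloop.c $(xeno-config --skin=native --cflags --ldflags), though the
exact invocation depends on the installation.)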