From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Tue, 28 Jun 2016 10:34:04 +0200
From: Gilles Chanteperdrix
Message-ID: <20160628083404.GI18662@hermes.click-hack.org>
References: <20160601141238.GC14103@hermes.click-hack.org>
 <574FEB2D.5010509@sigmatek.at>
 <20160602082318.GB1801@hermes.click-hack.org>
 <5755204C.6090701@sigmatek.at>
 <20160606153545.GA376@hermes.click-hack.org>
 <5756D673.4080408@sigmatek.at>
 <20160607170050.GA13922@hermes.click-hack.org>
 <57714C60.4070407@sigmatek.at>
 <20160627164604.GH18662@hermes.click-hack.org>
 <577235C4.2080507@sigmatek.at>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
In-Reply-To: <577235C4.2080507@sigmatek.at>
Subject: Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
List-Id: Discussions about the Xenomai project
To: Wolfgang Netbal
Cc: xenomai@xenomai.org

On Tue, Jun 28, 2016 at 10:31:00AM +0200, Wolfgang Netbal wrote:
>
> On 2016-06-27 at 18:46, Gilles Chanteperdrix wrote:
> > On Mon, Jun 27, 2016 at 05:55:12PM +0200, Wolfgang Netbal wrote:
> >>
> >> On 2016-06-07 at 19:00, Gilles Chanteperdrix wrote:
> >>> On Tue, Jun 07, 2016 at 04:13:07PM +0200, Wolfgang Netbal wrote:
> >>>> On 2016-06-06 at 17:35, Gilles Chanteperdrix wrote:
> >>>>> On Mon, Jun 06, 2016 at 09:03:40AM +0200, Wolfgang Netbal wrote:
> >>>>>> On 2016-06-02 at 10:23, Gilles Chanteperdrix wrote:
> >>>>>>> On Thu, Jun 02, 2016 at 10:15:41AM +0200, Wolfgang Netbal wrote:
> >>>>>>>> On 2016-06-01 at 16:12, Gilles Chanteperdrix wrote:
> >>>>>>>>> On Wed, Jun 01, 2016 at 03:52:06PM +0200, Wolfgang Netbal wrote:
> >>>>>>>>>> On 2016-05-31 at 16:16, Gilles Chanteperdrix wrote:
> >>>>>>>>>>> On Tue, May 31, 2016 at 04:09:07PM +0200, Wolfgang Netbal wrote:
> >>>>>>>>>>>> Dear all,
> >>>>>>>>>>>>
> >>>>>>>>>>>> we have moved our application from "XENOMAI 2.6.2.1 + Linux 3.0.43"
> >>>>>>>>>>>> to "XENOMAI 2.6.4 + Linux 3.10.53". Our target is an i.MX6DL. The
> >>>>>>>>>>>> system is now up and running and works stably. Unfortunately we see
> >>>>>>>>>>>> a difference in performance: our old combination (XENOMAI 2.6.2.1 +
> >>>>>>>>>>>> Linux 3.0.43) was slightly faster.
> >>>>>>>>>>>>
> >>>>>>>>>>>> At the moment it looks like XENOMAI 2.6.4 calls
> >>>>>>>>>>>> xnpod_schedule_handler much more often than XENOMAI 2.6.2.1 did in
> >>>>>>>>>>>> our old system. Every call of xnpod_schedule_handler interrupts our
> >>>>>>>>>>>> main XENOMAI task with priority = 95.
> >>>> As I wrote above, I get interrupts 1037 handled by rthal_apc_handler()
> >>>> and 1038 handled by xnpod_schedule_handler() while my realtime task
> >>>> is running on kernel 3.10.53 with Xenomai 2.6.4.
> >>>> On kernel 3.0.43 with Xenomai 2.6.4 there are no interrupts, except the
> >>>> ones that are sent by my board using GPIOs; these virtual interrupts
> >>>> are assigned to both Xenomai and Linux, but I didn't see a handler
> >>>> installed.
> >>>> I'm pretty sure that these interrupts are slowing down my system, but
> >>>> where do they come from?
> >>>> Why didn't I see them on kernel 3.0.43 with Xenomai 2.6.4?
> >>>> How long do they take to process?
> >>> What do you mean, you do not see them? If you are talking about the
> >>> rescheduling API, it used not to be bound to a virq (so it would have
> >>> a different irq number on Cortex-A9, something between 0 and 31 that
> >>> would not show in the usual /proc files); I wonder whether 3.0 is from
> >>> before or after that change. Do you not see them in /proc, or do you
> >>> see them but their count does not increase?
> >> Sorry for the long delay, we ran a lot of tests to find out what could
> >> be the reason for the performance difference.
> >>
> >> If I call cat /proc/ipipe/Xenomai I don't see the IRQ handler assigned
> >> to the virtual IRQ on kernel 3.0.43, but it looks like that is an issue
> >> of the kernel.
> >>> As for where they come from, this is not a mystery: the reschedule
> >>> IPI is triggered when code on one cpu changes the scheduler state
> >>> (wakes up a thread, for instance) on another cpu. If you want to
> >>> avoid it, do not do that. That means: do not share mutexes between
> >>> threads running on different cpus, make sure timers run on the same
> >>> cpu as the thread they signal, etc...
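To make that concrete: a minimal, untested sketch for the Xenomai 2.6
native skin (the task name, priority and entry function are made up for
the example). Creating every task that touches a given synchronization
object with T_CPU() on the same cpu keeps all wakeups local, so no
reschedule IPI is needed:

#include <native/task.h>
#include <native/sem.h>

static RT_TASK consumer;
static RT_SEM event;

static void consumer_fn(void *arg)
{
        for (;;) {
                /* Woken by an rt_sem_v() issued from the same cpu. */
                rt_sem_p(&event, TM_INFINITE);
                /* ... handle the event ... */
        }
}

int setup(void)
{
        int err = rt_sem_create(&event, "event", 0, S_PRIO);
        if (err)
                return err;

        /* T_CPU(0) binds the task to cpu 0; the task calling
         * rt_sem_v() must be created with T_CPU(0) as well, otherwise
         * each rt_sem_v() kicks the scheduler on the other core. */
        err = rt_task_create(&consumer, "consumer", 0, 95, T_CPU(0));
        if (err)
                return err;

        return rt_task_start(&consumer, &consumer_fn, NULL);
}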
> >>> The APC virq is used to multiplex several services, which you can
> >>> find by grepping the sources for rthal_apc_alloc:
> >>>
> >>> ./ksrc/skins/posix/apc.c:   pse51_lostage_apc = rthal_apc_alloc("pse51_lostage_handler",
> >>> ./ksrc/skins/rtdm/device.c: rtdm_apc = rthal_apc_alloc("deferred RTDM close", rtdm_apc_handler,
> >>> ./ksrc/nucleus/registry.c:  rthal_apc_alloc("registry_export", &registry_proc_schedule, NULL);
> >>> ./ksrc/nucleus/pipe.c:      rthal_apc_alloc("pipe_wakeup", &xnpipe_wakeup_proc, NULL);
> >>> ./ksrc/nucleus/shadow.c:    rthal_apc_alloc("lostage_handler", &lostage_handler, NULL);
> >>> ./ksrc/nucleus/select.c:    xnselect_apc = rthal_apc_alloc("xnselectors_destroy",
> >>>
> >>> It would be interesting to know which of these services is triggered
> >>> a lot. One possibility I see would be root thread priority
> >>> inheritance, so it would be caused by mode switches. This brings the
> >>> questions: does your application have threads migrating between
> >>> primary and secondary mode, do you see the count of mode switches
> >>> increase with the kernel change, and do you have root thread priority
> >>> inheritance enabled?
> >>>
> >> Here is a short sum-up of our tests and their results, and at the end
> >> a few questions :-)
> >>
> >> We are using a Freescale i.MX6DL on our hardware and upgraded our
> >> operating system from Freescale kernel 3.0.43 with Xenomai 2.6.2.1 and
> >> U-Boot 2013.04 (compiled with GCC 4.7.2) to Freescale kernel 3.10.53
> >> with Xenomai 2.6.4 and U-Boot 2016.01 (compiled with GCC 4.8.2).
> >> On both kernels CONFIG_SMP is set.
> >>
> >> What we see is that a customer project running in a Xenomai task with
> >> priority 95 takes 40% of the CPU time on kernel 3.0.43 and 47% of the
> >> CPU time on kernel 3.10.53, so the new system is 7% slower;
> >> extrapolated to 100% CPU load, the difference is 15%.
> >> To find out the reason for this difference, we tried to speed up the
> >> new system by swapping individual components:
> >>
> >> - Changing U-Boot on the new system                  -> still 7% slower
> >> - Copying kernel 3.0.43 to the new system            -> still 7% slower
> >> - Building kernel 3.0.43 with Xenomai 2.6.4
> >>   and copying it to the new system                   -> still 7% slower
> >> - Compiling the new system with the old GCC version  -> still 7% slower
> >> - We also checked the settings for RAM and CPU clock -> these are equal
> >>
> >> So it looks like it is not one of the big components, and we started
> >> to test some specific functions like rt_timer_tsc().
> >> In the following example we stay in the while loop for 800µs and
> >> restart the loop after a 200µs delay.
> >> The application task running this code has priority 95.
> >>
> >> Here is a simplified code snippet:
> >>
> >> start = rt_timer_tsc();
> >> do
> >> {
> >>         current = rt_timer_tsc();
> >>         i++;
> >> } while ((current - start) < 800);
> > If your CPU is running at 1 GHz and uses the global timer as clock
> > source, the clock source runs at 500 MHz, so 800 ticks of the tsc are
> > something around 1.6 us.
> Sorry, I simplified the code snippet a little too much. This is the
> correct code:
>
> current = rt_timer_tsc2ns(rt_timer_tsc());
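So the full loop presumably looks something like the sketch below; the
variable types and the nanosecond constants (assuming the 800/200 really
mean 800 us of busy-waiting and a 200 us pause) are my guesses, not your
actual code:

#include <native/task.h>
#include <native/timer.h>

void measure(void)
{
        SRTIME start, current;
        unsigned long i = 0;

        for (;;) {
                start = rt_timer_tsc2ns(rt_timer_tsc());
                do {
                        current = rt_timer_tsc2ns(rt_timer_tsc());
                        /* iterations per window reflect the tsc read cost */
                        i++;
                } while (current - start < 800000);        /* 800 us in ns */

                rt_task_sleep(rt_timer_ns2ticks(200000));  /* 200 us pause */
        }
}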
> >> > >> -Changing U-Boot on new system -> still 7% slower > >> -Copy Kernel 3.0.43 to new system -> still 7% slower > >> -Creating Kernel 3.0.43 with > >> Xenomai 2.6.4 and copy it to new system -> still 7% slower > >> -Compiling the new system with > >> old GCC version -> still = 7% slower > >> -We also checked the settings for RAM and CPU clock -> these are equal > >> > >> It looks like that is not one of the big components, > >> so we started to test some special functions like rt_timer_tsc() > >> In the following example we stay for 800=B5s in the while loop and > >> start this loop again after 200=B5s delay. > >> The task application running this code has priotity 95. > >> > >> Here a simplified code snipped > >> start =3D rt_timer_tsc(); > >> do > >> { > >> current =3D rt_timer_tsc(); > >> i++;=09 > >> } while((current - start) < 800) > > If your CPU is running at 1 GHz and uses the global timer as clock > > source, the clock source runs at 500 MHz, so 800 ticks of the tsc is > > something around 1.6 us > Sorry I simplified the code snippet a little bit to much. > Thats the correct code. >=20 > current =3D rt_timer_tsc2ns(rt_timer_tsc()); >=20 > > So, I do not really understand what you are talking about. But are > > you sure the two kernels use the same clocksource for xenomai? > > > > Could you show us the result of "dmesg | grep I-pipe" with the two > > kernels ? > Output of Kernel 3.10.53 with Xenomai 2.6.4 > I-pipe, 3.000 MHz clocksource > I-pipe, 396.000 MHz clocksource > I-pipe, 396.000 MHz timer > I-pipe, 396.000 MHz timer > I-pipe: head domain Xenomai registered. >=20 > Output of Kernel 3.0.43 with Xenomai 2.6.2.1 > [ 0.000000] I-pipe 1.18-13: pipeline enabled. > [ 0.331999] I-pipe, 396.000 MHz timer > [ 0.335720] I-pipe, 396.000 MHz clocksource > [ 0.844016] I-pipe: Domain Xenomai registered. >=20 > The controller is a imx6dl, this controller can run maximum 800MHz Ok, so the new kernel registers two tsc emulations, could you run the "tsc" regression test to measure the tsc latency? The two tsc emulations have very different latencies, so the result would be unmistakable. --=20 Gilles. https://click-hack.org