From: Wolfgang Netbal <wolfgang.netbal@sigmatek.at>
To: xenomai@xenomai.org
Subject: Re: [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4
Date: Tue, 28 Jun 2016 11:15:14 +0200	[thread overview]
Message-ID: <57724022.8010904@sigmatek.at> (raw)
In-Reply-To: <20160628083404.GI18662@hermes.click-hack.org>



Am 2016-06-28 um 10:34 schrieb Gilles Chanteperdrix:
> On Tue, Jun 28, 2016 at 10:31:00AM +0200, Wolfgang Netbal wrote:
>>
>> Am 2016-06-27 um 18:46 schrieb Gilles Chanteperdrix:
>>> On Mon, Jun 27, 2016 at 05:55:12PM +0200, Wolfgang Netbal wrote:
>>>> Am 2016-06-07 um 19:00 schrieb Gilles Chanteperdrix:
>>>>> On Tue, Jun 07, 2016 at 04:13:07PM +0200, Wolfgang Netbal wrote:
>>>>>> Am 2016-06-06 um 17:35 schrieb Gilles Chanteperdrix:
>>>>>>> On Mon, Jun 06, 2016 at 09:03:40AM +0200, Wolfgang Netbal wrote:
>>>>>>>> Am 2016-06-02 um 10:23 schrieb Gilles Chanteperdrix:
>>>>>>>>> On Thu, Jun 02, 2016 at 10:15:41AM +0200, Wolfgang Netbal wrote:
>>>>>>>>>> Am 2016-06-01 um 16:12 schrieb Gilles Chanteperdrix:
>>>>>>>>>>> On Wed, Jun 01, 2016 at 03:52:06PM +0200, Wolfgang Netbal wrote:
>>>>>>>>>>>> Am 2016-05-31 um 16:16 schrieb Gilles Chanteperdrix:
>>>>>>>>>>>>> On Tue, May 31, 2016 at 04:09:07PM +0200, Wolfgang Netbal wrote:
>>>>>>>>>>>>>> Dear all,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> we have moved our application from "XENOMAI 2.6.2.1 + Linux 3.0.43" to
>>>>>>>>>>>>>> "XENOMAI 2.6.4 + Linux 3.10.53". Our target is an i.MX6DL. The system
>>>>>>>>>>>>>> is now up and running and works stably. Unfortunately we see a
>>>>>>>>>>>>>> difference in performance. Our old combination (XENOMAI 2.6.2.1 +
>>>>>>>>>>>>>> Linux 3.0.43) was slightly faster.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> At the moment it looks like XENOMAI 2.6.4 calls
>>>>>>>>>>>>>> xnpod_schedule_handler much more often than XENOMAI 2.6.2.1 did in our old
>>>>>>>>>>>>>> system. Every call of xnpod_schedule_handler interrupts our main
>>>>>>>>>>>>>> XENOMAI task with priority = 95.
>>>>>> As I wrote above, I get interrupts 1037 handled by rthal_apc_handler()
>>>>>> and 1038 handled by xnpod_schedule_handler() while my realtime task
>>>>>> is running on kernel 3.10.53 with Xenomai 2.6.4.
>>>>>> On kernel 3.0.43 with Xenomai 2.6.4 there are no such interrupts, except the
>>>>>> ones that are sent by my board using GPIOs; these virtual interrupts
>>>>>> are assigned to both Xenomai and Linux, but I didn't see a handler
>>>>>> installed.
>>>>>> I'm pretty sure that these interrupts are slowing down my system, but:
>>>>>> Where do they come from?
>>>>>> Why didn't I see them on kernel 3.0.43 with Xenomai 2.6.4?
>>>>>> How long do they take to process?
>>>>> What do you mean by not seeing them? If you are talking about the
>>>>> rescheduling IPI, it used not to be bound to a virq (so it would
>>>>> have a different irq number on Cortex-A9, something between 0 and 31
>>>>> that would not show up in the usual /proc files); I wonder whether 3.0 is
>>>>> before or after that change. Do you not see them in /proc at all, or do
>>>>> you see them and their count does not increase?
>>>> Sorry for the long delay, we ran a lot of tests to find out what could
>>>> be the reason for the performance difference.
>>>>
>>>> If I run cat /proc/ipipe/Xenomai I don't see the IRQ handler assigned to
>>>> the virtual IRQ on kernel 3.0.43, but it looks like that is an issue of the kernel.
>>>>> As for where they come from, this is not a mystery: the reschedule
>>>>> IPI is triggered when code on one cpu changes the scheduler state
>>>>> (wakes up a thread, for instance) on another cpu. If you want to
>>>>> avoid it, do not do that. That means: do not share mutexes between
>>>>> threads running on different cpus, make sure timers are
>>>>> running on the same cpu as the thread they signal, etc...
>>>>>
>>>>> The APC virq is used to multiplex several services, which you can
>>>>> find by grepping the sources for rthal_apc_alloc:
>>>>> ./ksrc/skins/posix/apc.c:       pse51_lostage_apc = rthal_apc_alloc("pse51_lostage_handler",
>>>>> ./ksrc/skins/rtdm/device.c:     rtdm_apc = rthal_apc_alloc("deferred RTDM close", rtdm_apc_handler,
>>>>> ./ksrc/nucleus/registry.c:          rthal_apc_alloc("registry_export", &registry_proc_schedule, NULL);
>>>>> ./ksrc/nucleus/pipe.c:      rthal_apc_alloc("pipe_wakeup", &xnpipe_wakeup_proc, NULL);
>>>>> ./ksrc/nucleus/shadow.c:            rthal_apc_alloc("lostage_handler", &lostage_handler, NULL);
>>>>> ./ksrc/nucleus/select.c:        xnselect_apc = rthal_apc_alloc("xnselectors_destroy",
>>>>>
>>>>> It would be interesting to know which of these services is triggered
>>>>> a lot. One possibility I see would be root thread priority
>>>>> inheritance, in which case it would be caused by mode switches. This raises the
>>>>> questions: does your application have threads migrating between primary
>>>>> and secondary mode, do you see the count of mode switches increase
>>>>> with the kernel change, and do you have root thread priority
>>>>> inheritance enabled?
>>>>>
>>>> Here is a short summary of our tests and their results, and at the end a few
>>>> questions :-)
>>>>
>>>> We are using a Freescale i.MX6DL on our hardware and upgraded our operating system from
>>>> Freescale kernel 3.0.43 with Xenomai 2.6.2.1 and U-Boot 2013.04, compiled with GCC 4.7.2, to
>>>> Freescale kernel 3.10.53 with Xenomai 2.6.4 and U-Boot 2016.01, compiled with GCC 4.8.2.
>>>> On both kernels CONFIG_SMP is set.
>>>>
>>>> What we see is that a customer project running in a Xenomai task with priority 95
>>>> takes 40% of the CPU time on kernel 3.0.43
>>>> and 47% of the CPU time on kernel 3.10.53,
>>>>
>>>> so the new system is slower by 7 percentage points; if we scale this up to 100% CPU load we have a difference of 15%.
>>>> To find the reason for this difference we ran the following tests:
>>>> we tried to make the new system faster by swapping individual components.
>>>>
>>>> - Changing U-Boot on the new system                  -> still 7% slower
>>>> - Copying kernel 3.0.43 to the new system            -> still 7% slower
>>>> - Building kernel 3.0.43 with Xenomai 2.6.4
>>>>   and copying it to the new system                   -> still 7% slower
>>>> - Compiling the new system with the old GCC version  -> still 7% slower
>>>> - We also checked the settings for RAM and CPU clock -> these are equal
>>>>
>>>> It looks like the cause is not one of the big components,
>>>> so we started to test some specific functions like rt_timer_tsc().
>>>> In the following example we stay in the while loop for 800 µs and
>>>> start the loop again after a 200 µs delay.
>>>> The application task running this code has priority 95.
>>>>
>>>> Here is a simplified code snippet:
>>>> start = rt_timer_tsc();
>>>> do
>>>> {
>>>> 	current = rt_timer_tsc();
>>>> 	i++;
>>>> } while ((current - start) < 800);
>>> If your CPU is running at 1 GHz and uses the global timer as clock
>>> source, the clock source runs at 500 MHz, so 800 ticks of the tsc is
>>> something around 1.6 us.
>> Sorry, I simplified the code snippet a little bit too much.
>> This is the correct code:
>>
>> current = rt_timer_tsc2ns(rt_timer_tsc());
>>
>>> So, I do not really understand what you are talking about. But are
>>> you sure the two kernels use the same clocksource for xenomai?
>>>
>>> Could you show us the result of "dmesg | grep I-pipe" with the two
>>> kernels ?
>> Output of Kernel 3.10.53 with Xenomai 2.6.4
>> I-pipe, 3.000 MHz clocksource
>> I-pipe, 396.000 MHz clocksource
>> I-pipe, 396.000 MHz timer
>> I-pipe, 396.000 MHz timer
>> I-pipe: head domain Xenomai registered.
>>
>> Output of Kernel 3.0.43 with Xenomai 2.6.2.1
>> [    0.000000] I-pipe 1.18-13: pipeline enabled.
>> [    0.331999] I-pipe, 396.000 MHz timer
>> [    0.335720] I-pipe, 396.000 MHz clocksource
>> [    0.844016] I-pipe: Domain Xenomai registered.
>>
>> The controller is an i.MX6DL; it can run at a maximum of 800 MHz.
> OK, so the new kernel registers two tsc emulations. Could you run
> the "tsc" regression test to measure the tsc latency? The two tsc
> emulations have very different latencies, so the result would be
> unmistakable.
>

Output of Kernel 3.10.53 with Xenomai 2.6.4
/usr/xenomai/bin/latency
== Sampling period: 1000 us
== Test mode: periodic user-mode task
== All results in microseconds
warming up...
RTT|  00:00:01  (periodic user-mode task, 1000 us period, priority 99)
RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
RTD|     -3.048|     -1.960|      4.053|       0|     0|     -3.048|      4.053
RTD|     -3.064|     -1.874|      5.936|       0|     0|     -3.064|      5.936
RTD|     -3.137|     -1.963|      3.545|       0|     0|     -3.137|      5.936
RTD|     -3.069|     -1.968|      5.805|       0|     0|     -3.137|      5.936
RTD|     -3.064|     -1.945|      4.371|       0|     0|     -3.137|      5.936
RTD|     -3.071|     -1.905|      3.613|       0|     0|     -3.137|      5.936
RTD|     -3.119|     -1.766|      4.967|       0|     0|     -3.137|      5.936
RTD|     -3.119|     -1.910|      3.883|       0|     0|     -3.137|      5.936
RTD|     -3.102|     -1.910|      5.494|       0|     0|     -3.137|      5.936
RTD|     -3.107|     -1.907|      3.795|       0|     0|     -3.137|      5.936
RTD|     -3.066|     -1.935|      4.068|       0|     0|     -3.137|      5.936
RTD|     -2.960|     -1.920|      4.270|       0|     0|     -3.137|      5.936
RTD|     -3.190|     -2.003|      3.436|       0|     0|     -3.190|      5.936
RTD|     -3.026|     -2.003|      4.679|       0|     0|     -3.190|      5.936
RTD|     -3.149|     -2.011|      3.861|       0|     0|     -3.190|      5.936
RTD|     -3.059|     -1.990|      3.651|       0|     0|     -3.190|      5.936
RTD|     -3.119|     -1.940|      4.249|       0|     0|     -3.190|      5.936
RTD|     -3.192|     -1.983|      4.270|       0|     0|     -3.192|      5.936
RTD|     -3.026|     -2.003|      3.568|       0|     0|     -3.192|      5.936
RTD|     -3.096|     -1.973|      6.376|       0|     0|     -3.192|      6.376
RTD|     -3.258|     -1.953|      5.131|       0|     0|     -3.258|      6.376
RTT|  00:00:22  (periodic user-mode task, 1000 us period, priority 99)

Could the two timers be the reason for the -3.xxx values in lat min?
Is it possible to disable one of the two timers?

Output of Kernel 3.0.43 with Xenomai 2.6.2.1

/usr/xenomai/bin/latency
== Sampling period: 1000 us
== Test mode: periodic user-mode task
== All results in microseconds
warming up...
RTT|  00:00:01  (periodic user-mode task, 1000 us period, priority 99)
RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
RTD|      3.060|      5.098|     10.255|       0|     0|      3.060|     10.255
RTD|      3.073|      5.146|     10.742|       0|     0|      3.060|     10.742
RTD|      2.999|      5.146|     10.818|       0|     0|      2.999|     10.818
RTD|      3.249|      5.146|     10.936|       0|     0|      2.999|     10.936
RTD|      3.169|      5.184|     11.656|       0|     0|      2.999|     11.656
RTD|      3.133|      5.156|     10.881|       0|     0|      2.999|     11.656
RTD|      3.123|      5.068|      9.835|       0|     0|      2.999|     11.656
RTD|      3.032|      5.101|     10.628|       0|     0|      2.999|     11.656
RTD|      3.088|      5.047|     10.492|       0|     0|      2.999|     11.656
RTD|      3.068|      5.159|     11.681|       0|     0|      2.999|     11.681
RTD|      2.967|      5.073|     11.648|       0|     0|      2.967|     11.681
RTD|      3.073|      5.106|     11.499|       0|     0|      2.967|     11.681
RTD|      3.053|      5.063|     10.727|       0|     0|      2.967|     11.681
RTD|      3.159|      5.113|     10.560|       0|     0|      2.967|     11.681
RTD|      3.020|      5.078|     11.578|       0|     0|      2.967|     11.681
RTD|      3.227|      5.146|     10.856|       0|     0|      2.967|     11.681
RTD|      3.186|      5.118|     10.335|       0|     0|      2.967|     11.681
RTD|      3.116|      5.108|     10.782|       0|     0|      2.967|     11.681
RTD|      2.954|      5.095|     11.921|       0|     0|      2.954|     11.921
RTD|      2.952|      5.156|     10.631|       0|     0|      2.952|     11.921
RTD|      2.898|      5.121|     10.699|       0|     0|      2.898|     11.921
RTT|  00:00:22  (periodic user-mode task, 1000 us period, priority 99)
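
For reference, here is the measurement loop from earlier in the thread with the
rt_timer_tsc2ns() correction applied, as a self-contained sketch (the 800 us busy
loop and the 200 us delay are the values mentioned above; the function name, the
variable names and the nanosecond comparison values are only illustrative):

#include <native/task.h>   /* rt_task_sleep() */
#include <native/timer.h>  /* rt_timer_tsc(), rt_timer_tsc2ns() */

static void busy_then_sleep(void)
{
	SRTIME start, current;
	unsigned long i = 0;

	start = rt_timer_tsc2ns(rt_timer_tsc());
	do
	{
		current = rt_timer_tsc2ns(rt_timer_tsc());
		i++;	/* i only counts iterations, as in the snippet above */
	} while ((current - start) < 800000);	/* tsc2ns values are in ns, so 800000 ns = 800 us */

	rt_task_sleep(200000);	/* 200 us delay before the next loop */
}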



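Also, regarding the reschedule IPI that Gilles explains above (it fires when one
CPU changes the scheduler state of another CPU): below is a minimal sketch of
pinning a real-time task to a single CPU with the native skin's T_CPU() mode bit,
so that wakeups and timers stay on that CPU. This is only an illustration under
that assumption, not our actual task setup; the names and the 1 ms period are
made up:

#include <native/task.h>
#include <native/timer.h>

static RT_TASK worker;

static void worker_fn(void *arg)
{
	(void)arg;
	/* 1 ms period, given in ns (default oneshot timer mode assumed) */
	rt_task_set_periodic(NULL, TM_NOW, 1000000);
	for (;;) {
		rt_task_wait_period(NULL);
		/* real-time work here, touching only objects used on this CPU */
	}
}

int start_worker(void)
{
	/* T_CPU(0) pins the task to CPU 0, so no other CPU ever has to wake
	   it up and the reschedule IPI is not needed on its behalf */
	int err = rt_task_create(&worker, "worker", 0, 95, T_CPU(0));
	if (err)
		return err;
	return rt_task_start(&worker, &worker_fn, NULL);
}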

Thread overview: 36+ messages
2016-05-31 14:09 [Xenomai] Performance impact after switching from 2.6.2.1 to 2.6.4 Wolfgang Netbal
2016-05-31 14:16 ` Gilles Chanteperdrix
2016-06-01 13:52   ` Wolfgang Netbal
2016-06-01 14:12     ` Gilles Chanteperdrix
2016-06-02  8:15       ` Wolfgang Netbal
2016-06-02  8:23         ` Gilles Chanteperdrix
2016-06-06  7:03           ` Wolfgang Netbal
2016-06-06 15:35             ` Gilles Chanteperdrix
2016-06-07 14:13               ` Wolfgang Netbal
2016-06-07 17:00                 ` Gilles Chanteperdrix
2016-06-27 15:55                   ` Wolfgang Netbal
2016-06-27 16:00                     ` Gilles Chanteperdrix
2016-06-28  8:08                       ` Wolfgang Netbal
2016-06-27 16:46                     ` Gilles Chanteperdrix
2016-06-28  8:31                       ` Wolfgang Netbal
2016-06-28  8:34                         ` Gilles Chanteperdrix
2016-06-28  9:15                           ` Wolfgang Netbal [this message]
2016-06-28  9:17                             ` Gilles Chanteperdrix
2016-06-28  9:28                               ` Wolfgang Netbal
2016-06-28  9:29                                 ` Gilles Chanteperdrix
2016-06-28  9:51                                   ` Wolfgang Netbal
2016-06-28  9:55                                     ` Gilles Chanteperdrix
2016-06-28 10:10                                       ` Wolfgang Netbal
2016-06-28 10:19                                         ` Gilles Chanteperdrix
2016-06-28 10:31                                           ` Wolfgang Netbal
2016-06-28 10:39                                             ` Gilles Chanteperdrix
2016-06-28 11:45                                               ` Wolfgang Netbal
2016-06-28 11:57                                                 ` Gilles Chanteperdrix
2016-06-28 11:55                                               ` Wolfgang Netbal
2016-06-28 12:01                                                 ` Gilles Chanteperdrix
2016-06-28 14:32                                                   ` Wolfgang Netbal
2016-06-28 14:42                                                     ` Gilles Chanteperdrix
2016-06-30  9:17                                                       ` Wolfgang Netbal
2016-06-30  9:39                                                         ` Gilles Chanteperdrix
2016-06-07 17:22                 ` Philippe Gerum
2016-05-31 15:08 ` Philippe Gerum