* [Xenomai] "RT throttling" issue
@ 2015-11-26 10:32 JK.Behnke
2015-11-28 6:43 ` Gilles Chanteperdrix
0 siblings, 1 reply; 13+ messages in thread
From: JK.Behnke @ 2015-11-26 10:32 UTC (permalink / raw)
To: xenomai
Hello,
in my xenomai 2.6.3 application I sporadically experience very long execution times of
a task that switches back and forth between secondary and primary mode.
I added the following code to measure elapsed and cpu time
//------------- start code snippet ---------------
struct rusage rsgT1;
struct rusage rsgT2;
struct timeval tvT1;
struct timeval tvT2;
gettimeofday(&tvT1, NULL);
getrusage(RUSAGE_SELF, &rsgT1);
//
// code block to be measured
//
gettimeofday(&tvT2, NULL);
getrusage(RUSAGE_SELF, &rsgT2);
//------------- end code snippet ---------------
I then calculate the difference and keep the maximum values in
CpuMax (getrusage) and TimeMax (gettimeofday) variables.
Typically I observe CpuMax=20 .. 100 ms and TimeMax with similar values.
But sometimes I get huge CpuMax values in the range 5000 .. 50000 ms.
Now my questions
In my system logfile I sometimes observe "RT throttling activated" messages.
According to the Linux kernel 3.8.13 documentation, the Linux scheduler throttles
RT tasks if they consume too much CPU time.
Could the observed behaviour be related to this?
Is it a problem if I call gettimeofday/getrusage multiple times in
a loop that is executed every 10 ms?
I would expect TimeMax to always be greater than or equal to CpuMax.
But sometimes I observe CpuMax values that are significantly larger than
TimeMax during the same test session.
Any ideas what might be the cause?
Thanks for any comments
Jochen
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Xenomai] "RT throttling" issue
2015-11-26 10:32 [Xenomai] "RT throttling" issue JK.Behnke
@ 2015-11-28 6:43 ` Gilles Chanteperdrix
2015-11-29 9:30 ` Jochen Behnke
2015-12-01 13:09 ` JK.Behnke
0 siblings, 2 replies; 13+ messages in thread
From: Gilles Chanteperdrix @ 2015-11-28 6:43 UTC (permalink / raw)
To: JK.Behnke; +Cc: xenomai
On Thu, Nov 26, 2015 at 11:32:57AM +0100, JK.Behnke@web.de wrote:
> Hello,
>
> in my xenomai 2.6.3 application I sporadically experience very long execution times of
> a task that switches back and forth between secondary and primary mode.
>
> I added the following code to measure elapsed and cpu time
> //------------- start code snippet ---------------
> struct rusage rsgT1;
> struct rusage rsgT2;
> struct timeval tvT1;
> struct timeval tvT2;
> gettimeofday(&tvT1, NULL);
Do you observe the same problem with Xenomai 2.6 git head, if you
use clock_gettime(CLOCK_HOST_REALTIME) instead of gettimeofday, at
least for threads running in primary mode ?
--
Gilles.
https://click-hack.org
* Re: [Xenomai] "RT throttling" issue
2015-11-28 6:43 ` Gilles Chanteperdrix
@ 2015-11-29 9:30 ` Jochen Behnke
2015-12-01 13:09 ` JK.Behnke
1 sibling, 0 replies; 13+ messages in thread
From: Jochen Behnke @ 2015-11-29 9:30 UTC (permalink / raw)
To: Gilles Chanteperdrix; +Cc: xenomai
Hello Gilles,
In the meantime I added rt_timer_tsc as a second alternative to measure
elapsed time. rt_timer_tsc returns values that are somewhat greater
than expected (around 39 ms instead of the expected 20 ms), but still
reasonable.
> On Thu, Nov 26, 2015 at 11:32:57AM +0100, JK.Behnke@web.de wrote:
>> Hello,
>>
>> in my xenomai 2.6.3 application I sporadically experience very long execution times of
>> a task that switches back and forth between secondary and primary mode.
>>
>> I added the following code to measure elapsed and cpu time
>> //------------- start code snippet ---------------
>> struct rusage rsgT1;
>> struct rusage rsgT2;
>> struct timeval tvT1;
>> struct timeval tvT2;
>> gettimeofday(&tvT1, NULL);
>
> Do you observe the same problem with Xenomai 2.6 git head, if you
> use clock_gettime(CLOCK_HOST_REALTIME) instead of gettimeofday, at
> least for threads running in primary mode ?
>
I'll give Xenomai 2.6 git head a try.
Thanks for your help.
Regards
Jochen
* Re: [Xenomai] "RT throttling" issue
2015-11-28 6:43 ` Gilles Chanteperdrix
2015-11-29 9:30 ` Jochen Behnke
@ 2015-12-01 13:09 ` JK.Behnke
2015-12-01 18:11 ` Gilles Chanteperdrix
1 sibling, 1 reply; 13+ messages in thread
From: JK.Behnke @ 2015-12-01 13:09 UTC (permalink / raw)
To: Gilles Chanteperdrix; +Cc: xenomai
Hello Gilles,
> in the meantime I added rt_timer_tsc as a second alternative to
> measure elapsed time. rt_timer_tsc returns values that are somewhat
> greater than expected (around 39 ms instead of the expected 20 ms), but
> still reasonable.
>
> > On Thu, Nov 26, 2015 at 11:32:57AM +0100, JK.Behnke@web.de wrote:
> >> Hello,
> >>
> >> in my xenomai 2.6.3 application I sporadically experience very
> >> long execution times of a task that switches back and forth between
> >> secondary and primary mode.
> >>
> >> I added the following code to measure elapsed and cpu time
> >> //------------- start code snippet ---------------
> >> struct rusage rsgT1;
> >> struct rusage rsgT2;
> >> struct timeval tvT1;
> >> struct timeval tvT2;
> >> gettimeofday(&tvT1, NULL);
> >
> > Do you observe the same problem with Xenomai 2.6 git head, if you
> > use clock_gettime(CLOCK_HOST_REALTIME) instead of gettimeofday, at
> > least for threads running in primary mode ?
I am now running xenomai 2.6.4 git head.
The application now uses the POSIX wrappers as described in the
documentation "Porting a Linux application to Xenomai dual kernel".
I replaced the gettimeofday() calls with
clock_gettime(CLOCK_HOST_REALTIME).
The getrusage() calls have been removed.
Both clock_gettime and rt_timer_tsc return reasonable values.
However, on one occasion I observed a loop execution time of 14000 ms
instead of the typical 30 ms.
This means that the task was blocked for about 14 seconds.
As I still experience "RT throttling" messages from the Linux kernel,
I guess that some lengthy operation (file access, memory
allocation, ...)
of some other task in my application produces the "RT throttling".
As a side effect the task gets blocked by the Linux scheduler while
in secondary mode. As mentioned earlier, the task in question switches
back and forth between secondary and primary mode.
Is it dangerous to use memory allocation and file access functions
inside a Linux task running under policy SCHED_FIFO, even though
they are not really time critical?
Could some other Linux process block these kind of tasks for such
a long time (14 sec) or even longer?
TIA for your comments on this
Regards
Jochen
* Re: [Xenomai] "RT throttling" issue
2015-12-01 13:09 ` JK.Behnke
@ 2015-12-01 18:11 ` Gilles Chanteperdrix
2015-12-02 12:58 ` JK.Behnke
0 siblings, 1 reply; 13+ messages in thread
From: Gilles Chanteperdrix @ 2015-12-01 18:11 UTC (permalink / raw)
To: JK.Behnke; +Cc: xenomai
On Tue, Dec 01, 2015 at 02:09:45PM +0100, JK.Behnke@web.de wrote:
> Hello Gilles,
>
> > in the meantime I added rt_timer_tsc as a second alternative to
> > measure elapsed time. rt_timer_tsc returns values that are somewhat
> > greater than expected (around 39 ms instead of the expected 20 ms), but
> > still reasonable.
> >
> > > On Thu, Nov 26, 2015 at 11:32:57AM +0100, JK.Behnke@web.de wrote:
> > >> Hello,
> > >>
> > >> in my xenomai 2.6.3 application I sporadically experience very
> > >> long execution times of a task that switches back and forth between
> > >> secondary and primary mode.
> > >>
> > >> I added the following code to measure elapsed and cpu time
> > >> //------------- start code snippet ---------------
> > >> struct rusage rsgT1;
> > >> struct rusage rsgT2;
> > >> struct timeval tvT1;
> > >> struct timeval tvT2;
> > >> gettimeofday(&tvT1, NULL);
> > >
> > > Do you observe the same problem with Xenomai 2.6 git head, if you
> > > use clock_gettime(CLOCK_HOST_REALTIME) instead of gettimeofday, at
> > > least for threads running in primary mode ?
> I am now running xenomai 2.6.4 git head.
> The application now uses the posix wrappers as described in the
> documentation "Porting a Linux application to Xenomai dual kernel"
> I replaced the gettimeofday() calls by
> clock_gettime(CLOCK_HOST_REALTIME).
> The getrusage() calls have been removed.
>
> Both clock_gettime and rt_timer_tsc return reasonable values.
> However, once I observed a loop execution time of 14000 ms instead of
> typical 30 ms.
If a "loop" runs 30 ms in primary mode, then you are starving Linux
of its timer interrupts. You should let Linux handle its timer
interrupt at least every 10 ms if running with HZ==100, or every
millisecond if running with HZ==1000.
> This means that the task was blocked for about 14 seconds.
> As I still experience "RT throttling" messages of the Linux kernel,
> I guess that some lengthy operation (file access, memory
> allocation,...)
No. "RT throttling" means that a task run by the Linux scheduler
with scheduling policy SCHED_FIFO or SCHED_RR is using more than 95%
of the CPU over a period of one second. This is bad; it is
probably a kind of infinite loop. gettimeofday called from primary
mode is known to cause such an issue (when the Xenomai watchdog has
detected the infinite loop in primary mode and kicked the task to
secondary mode), but if you have removed all calls to gettimeofday,
then the issue is elsewhere.
> of some other task in my application produces the "RT throttling".
> As a side effect the task gets blocked by the Linux scheduler while
> in secondary mode. As mentioned earlier, the task in question switches
> back and forth between secondary and primary mode.
>
> Is it dangerous to use memory allocation and file access functions
> inside a Linux task running under policy SCHED_FIFO, even though
> they are not really time critical?
File access causes the function to wait, not to use CPU. Memory
allocation should not be a problem either. Though due to the way
Linux handles its memory (all unused memory is used as disk cache), a
memory allocation may actually have to wait for a disk access to
finish, so in the worst case your task is not real-time at all.
> Could some other Linux process block these kind of tasks for such
> a long time (14 sec) or even longer?
Using the I-pipe tracer, you may be able to understand what happens.
--
Gilles.
https://click-hack.org
* Re: [Xenomai] "RT throttling" issue
2015-12-01 18:11 ` Gilles Chanteperdrix
@ 2015-12-02 12:58 ` JK.Behnke
2015-12-04 16:24 ` JK.Behnke
0 siblings, 1 reply; 13+ messages in thread
From: JK.Behnke @ 2015-12-02 12:58 UTC (permalink / raw)
To: Gilles Chanteperdrix; +Cc: xenomai
Hello Gilles,
thanks for your comments.
> On Tue, Dec 01, 2015 at 02:09:45PM +0100, JK.Behnke@web.de wrote:
> > Hello Gilles,
> >
> > > in the meantime I added rt_timer_tsc as a second alternative to
> > > measure elapsed time. rt_timer_tsc returns values that are somewhat
> > > greater than expected (around 39 ms instead of the expected 20 ms), but
> > > still reasonable.
> > >
> > > > On Thu, Nov 26, 2015 at 11:32:57AM +0100, JK.Behnke@web.de wrote:
> > > >> Hello,
> > > >>
> > > >> in my xenomai 2.6.3 application I sporadically experience very
> > > >> long execution times of a task that switches back and forth between
> > > >> secondary and primary mode.
> > > >>
> > > >> I added the following code to measure elapsed and cpu time
> > > >> //------------- start code snippet ---------------
> > > >> struct rusage rsgT1;
> > > >> struct rusage rsgT2;
> > > >> struct timeval tvT1;
> > > >> struct timeval tvT2;
> > > >> gettimeofday(&tvT1, NULL);
> > > >
> > > > Do you observe the same problem with Xenomai 2.6 git head, if you
> > > > use clock_gettime(CLOCK_HOST_REALTIME) instead of gettimeofday, at
> > > > least for threads running in primary mode ?
> > I am now running xenomai 2.6.4 git head.
> > The application now uses the posix wrappers as described in the
> > documentation "Porting a Linux application to Xenomai dual kernel"
> > I replaced the gettimeofday() calls by
> > clock_gettime(CLOCK_HOST_REALTIME).
> > The getrusage() calls have been removed.
> >
> > Both clock_gettime and rt_timer_tsc return reasonable values.
> > However, once I observed a loop execution time of 14000 ms instead of
> > typical 30 ms.
>
> If a "loop" runs 30 ms in primary mode, then you are starving Linux
> of its timer interrupts. You should let Linux handle its timer
> interrupt at least every 10 ms if running with HZ==100, or every
> millisecond if running with HZ==1000.
By "loop execution time" I meant the elapsed time between two
consecutive loop iterations. This includes the sleep time of 20 ms per
iteration. I typically observe maximum elapsed times of 23 ms .. 39 ms.
Sorry for being imprecise.
>
>
> > This means that the task was blocked for about 14 seconds.
> > As I still experience "RT throttling" messages of the Linux kernel,
> > I guess that some lengthy operation (file access, memory
> > allocation,...)
>
> No. "RT throttling" means that a task run by the Linux scheduler
> with scheduling policy SCHED_FIFO or SCHED_RR is using more than 95%
> of the CPU over a period of one second. This is bad; it is
> probably a kind of infinite loop. gettimeofday called from primary
> mode is known to cause such an issue (when the Xenomai watchdog has
> detected the infinite loop in primary mode and kicked the task to
> secondary mode), but if you have removed all calls to gettimeofday,
> then the issue is elsewhere.
>
My understanding is that tasks migrate automatically to secondary mode
when issuing a Linux system call, and migrate back to primary mode on
the next call of a Xenomai service. Isn't that true?
Are there any Linux system calls other than gettimeofday which may cause
problems when called from primary mode?
> > of some other task in my application produces the "RT throttling".
> > As a side effect the task gets blocked by the Linux scheduler while
> > in secondary mode. As mentioned earlier, the task in question switches
> > back and forth between secondary and primary mode.
> >
> > Is it dangerous to use memory allocation and file access functions
> > inside a Linux task running under policy SCHED_FIFO, even though
> > they are not really time critical?
>
> File access causes the function to wait, not to use CPU. Memory
> allocation should not be a problem either. Though due to the way
> Linux handles its memory (all unused memory is used as disk cache), a
> memory allocation may actually have to wait for a disk access to
> finish, so in the worst case your task is not real-time at all.
>
> > Could some other Linux process block these kind of tasks for such
> > a long time (14 sec) or even longer?
>
> Using the I-pipe tracer, you may be able to understand what happens.
Should I patch my Linux kernel so that xntrace_user_freeze is called
when the Linux scheduler emits the "RT throttling activated" message?
Regards
Jochen
* Re: [Xenomai] "RT throttling" issue
2015-12-02 12:58 ` JK.Behnke
@ 2015-12-04 16:24 ` JK.Behnke
2015-12-04 16:33 ` Gilles Chanteperdrix
0 siblings, 1 reply; 13+ messages in thread
From: JK.Behnke @ 2015-12-04 16:24 UTC (permalink / raw)
To: Gilles Chanteperdrix; +Cc: xenomai
Hello Gilles,
sorry for bothering you again.
> >
> > Using the I-pipe tracer, you may be able to understand what happens.
> Should I patch my Linux kernel so that xntrace_user_freeze is called,
> when Linux scheduler sends "RT throttling activated" message?
I have now been able to capture an I-pipe frozen file when the
"RT throttling" condition occurs.
I have done this by adding the line
ipipe_trace_freeze(1);
right before
printk_sched("sched: RT Throttling activated\n");
inside "kernel/sched/rt.c".
I have set back_trace_points to 10000 to get about 1 second of
backtrace history.
Here is a fragment of my huge frozen file which contains
the last 1.6 seconds before the freeze.
//================== frozen file start =======================
+----- Hard IRQs ('|': locked)
|+-- Xenomai
||+- Linux ('*': domain stalled, '+': current, '#': current+stalled)
||| +---------- Delay flag ('+': > 1 us, '!': > 10 us)
||| | +- NMI noise ('N')
||| | |
Type User Val. Time Delay Function (Parent)
:| # [ 4936] -<?>- 0 -1613869 0.852 xnpod_resume_thread+0x4e (xnthread_timeout_handler+0x1e)
:| # event tick@-1611814-1613868+ 3.538 xntimer_next_local_shot+0xb4 (xntimer_tick_aperiodic+0x1a0)
:| # [ 5066] prc6hmi -1 -1613864+ 1.964 __xnpod_schedule+0x77 (xnintr_clock_handler+0x105)
:| # [ 4936] -<?>- 0 -1613862+ 3.127 __xnpod_schedule+0x45d (xnpod_suspend_thread+0x197)
:| # [ 4936] -<?>- 0 -1613859+ 1.192 __xnpod_schedule+0x77 (xnpod_suspend_thread+0x171)
:| # [ 5066] prc6hmi -1 -1613858! 27.286 __xnpod_schedule+0x45d (xnintr_clock_handler+0x105)
:| *+[ 4936] -<?>- 0 -1613831+ 1.824 xnpod_resume_thread+0x4e (gatekeeper_thread+0xf1)
:| # [ 456] gatekee -1 -1613829+ 4.200 __xnpod_schedule+0x77 (xnpod_schedule_handler+0x29)
:| # [ 4936] -<?>- 0 -1613825 0.952 __xnpod_schedule+0x77 (xnpod_suspend_thread+0x171)
:| # [ 456] gatekee -1 -1613824! 13.001 __xnpod_schedule+0x45d (xnpod_schedule_handler+0x29)
:| *+[ 4936] -<?>- 0 -1613811 0.882 xnpod_resume_thread+0x4e (gatekeeper_thread+0xf1)
:| # [ 456] gatekee -1 -1613810+ 2.506 __xnpod_schedule+0x77 (xnpod_schedule_handler+0x29)
:| # [ 4936] -<?>- 0 -1613807 0.952 __xnpod_schedule+0x77 (xnpod_suspend_thread+0x171)
:| # [ 456] gatekee -1 -1613806! 11.588 __xnpod_schedule+0x45d (xnpod_schedule_handler+0x29)
:| *+[ 4936] -<?>- 0 -1613795 0.721 xnpod_resume_thread+0x4e (gatekeeper_thread+0xf1)
:| # [ 456] gatekee -1 -1613794+ 6.395 __xnpod_schedule+0x77 (xnpod_schedule_handler+0x29)
:| # [ 4936] -<?>- 0 -1613788 0.922 __xnpod_schedule+0x77 (xnpod_suspend_thread+0x171)
:| # [ 456] gatekee -1 -1613787! 10.124 __xnpod_schedule+0x45d (xnpod_schedule_handler+0x29)
:| *+[ 4936] -<?>- 0 -1613777 0.711 xnpod_resume_thread+0x4e (gatekeeper_thread+0xf1)
:| # [ 456] gatekee -1 -1613776+ 2.686 __xnpod_schedule+0x77 (xnpod_schedule_handler+0x29)
:| # [ 4936] -<?>- 0 -1613773 0.892 __xnpod_schedule+0x77 (xnpod_suspend_thread+0x197)
:| # [ 456] gatekee -1 -1613772! 1977.417 __xnpod_schedule+0x45d (xnpod_schedule_handler+0x29)
:| # [ 4956] -<?>- 50 -1611795+ 1.794 xnpod_resume_thread+0x4e (xnthread_timeout_handler+0x1e)
...
...
...
:| # event tick@-5129 -5382+ 4.771 xntimer_next_local_shot+0xb4 (xntimer_tick_aperiodic+0x1a0)
:| # [ 4961] -<?>- -1 -5377+ 3.448 __xnpod_schedule+0x77 (xnintr_clock_handler+0x105)
:| # [ 4936] -<?>- 0 -5374+ 7.027 __xnpod_schedule+0x45d (xnpod_suspend_thread+0x197)
:| # [ 4936] -<?>- 0 -5367+ 1.172 __xnpod_schedule+0x77 (xnpod_suspend_thread+0x171)
:| # [ 4961] -<?>- -1 -5366! 253.408 __xnpod_schedule+0x45d (xnintr_clock_handler+0x105)
:| # [ 4957] -<?>- 50 -5112+ 1.884 xnpod_resume_thread+0x4e (xnthread_periodic_handler+0x28)
:| # event tick@-4525 -5110+ 3.548 xntimer_next_local_shot+0xb4 (xntimer_tick_aperiodic+0x1a0)
:| # [ 4961] -<?>- -1 -5107+ 2.476 __xnpod_schedule+0x77 (xnintr_clock_handler+0x105)
:| # [ 4957] -<?>- 50 -5104! 59.785 __xnpod_schedule+0x45d (xnpod_suspend_thread+0x197)
:| # [ 4957] -<?>- 50 -5045+ 1.794 __xnpod_schedule+0x77 (xnpod_suspend_thread+0x197)
:| # [ 4961] -<?>- -1 -5043! 538.795 __xnpod_schedule+0x45d (xnintr_clock_handler+0x105)
:| # [ 5012] -<?>- 99 -4504+ 2.195 xnpod_resume_thread+0x4e (xnthread_periodic_handler+0x28)
:| # event tick@-129 -4502+ 4.180 xntimer_next_local_shot+0xb4 (xntimer_tick_aperiodic+0x1a0)
:| # [ 4961] -<?>- -1 -4498+ 2.596 __xnpod_schedule+0x77 (xnintr_clock_handler+0x105)
:| # [ 5012] -<?>- 99 -4495! 677.824 __xnpod_schedule+0x45d (xnpod_suspend_thread+0x197)
:| # [ 5012] -<?>- 99 -3817+ 5.844 __xnpod_schedule+0x77 (xnpod_suspend_thread+0x197)
:| # [ 4961] -<?>- -1 -3811! 3709.015 __xnpod_schedule+0x45d (xnintr_clock_handler+0x105)
:| # [ 4957] -<?>- 50 -102+ 3.147 xnpod_resume_thread+0x4e (xnthread_periodic_handler+0x28)
:| # event tick@474 -99+ 4.210 xntimer_next_local_shot+0xb4 (xntimer_tick_aperiodic+0x1a0)
:| # [ 4961] -<?>- -1 -95+ 3.388 __xnpod_schedule+0x77 (xnintr_clock_handler+0x105)
:| # [ 4957] -<?>- 50 -92! 55.645 __xnpod_schedule+0x45d (xnpod_suspend_thread+0x197)
:| # [ 4957] -<?>- 50 -36+ 2.065 __xnpod_schedule+0x77 (xnpod_suspend_thread+0x197)
:| # [ 4961] -<?>- -1 -34! 34.474 __xnpod_schedule+0x45d (xnintr_clock_handler+0x105)
< #freeze 0x00000001 0 489.264 update_curr_rt+0x10f (task_tick_rt+0x15)
| # [ 5012] -<?>- 99 489 1.724 xnpod_resume_thread+0x4e (xnthread_periodic_handler+0x28)
| # event tick@4870 490 3.147 xntimer_next_local_shot+0xb4 (xntimer_tick_aperiodic+0x1a0)
| # [ 0] -<?>- -1 494 2.866 __xnpod_schedule+0x77 (xnintr_clock_handler+0x105)
| # [ 5012] -<?>- 99 497 662.948 __xnpod_schedule+0x45d (xnpod_suspend_thread+0x197)
| # [ 5012] -<?>- 99 1159 3.608 __xnpod_schedule+0x77 (xnpod_suspend_thread+0x197)
| # [ 0] -<?>- -1 1163 3720.243 __xnpod_schedule+0x45d (xnintr_clock_handler+0x105)
| # [ 4957] -<?>- 50 4883 2.085 xnpod_resume_thread+0x4e (xnthread_periodic_handler+0x28)
| # event tick@5474 4885 3.368 xntimer_next_local_shot+0xb4 (xntimer_tick_aperiodic+0x1a0)
| # [ 0] -<?>- -1 4889 1.233 __xnpod_schedule+0x77 (xnintr_clock_handler+0x105)
| # [ 4957] -<?>- 50 4890 0.000 __xnpod_schedule+0x45d (xnpod_suspend_thread+0x197)
//================== frozen file end =======================
There is no process utilizing the CPU for > 950000 us, which
leads me to the conclusion that the problem is not caused by
an infinite loop.
However, I observe delay values of up to 3900 us multiple times, which
makes me think that there is high CPU utilization which in the
worst case exceeds the limit of 950000 us per 1-second interval.
Probably I would have to sum up all delay values of processes in
secondary mode within a 1-second time interval and see if the sum
exceeds 950000 us.
Now I have the following questions:
1. What is meant by "domain stalled", "current" and "current+stalled"?
2. What is the process with ID "[ 0]"?
3. How can I tell processes running in secondary mode from others?
(just by looking at the priority value?)
Any hint on interpreting my frozen file is appreciated.
Thanks
Jochen
* Re: [Xenomai] "RT throttling" issue
2015-12-04 16:24 ` JK.Behnke
@ 2015-12-04 16:33 ` Gilles Chanteperdrix
2015-12-04 19:09 ` Jochen Behnke
0 siblings, 1 reply; 13+ messages in thread
From: Gilles Chanteperdrix @ 2015-12-04 16:33 UTC (permalink / raw)
To: JK.Behnke; +Cc: xenomai
On Fri, Dec 04, 2015 at 05:24:28PM +0100, JK.Behnke@web.de wrote:
> Hello Gilles,
>
> sorry for bothering you again.
>
> > >
> > > Using the I-pipe tracer, you may be able to understand what happens.
> > Should I patch my Linux kernel so that xntrace_user_freeze is called,
> > when Linux scheduler sends "RT throttling activated" message?
>
> I now have been able to capture an I-pipe frozen file, when the
> "RT throttling" condition occurs.
> I have done this by adding the line
> ipipe_trace_freeze(1);
> right before
> printk_sched("sched: RT Throttling activated\n");
> inside "kernel/sched/rt.c".
> I have set back_trace_points to 10000 to get about 1 second of
> backtrace history.
>
> Here is a fragment of my huge frozen file which contains
> the last 1.6 seconds before the freeze.
>
> //================== frozen file start =======================
> +----- Hard IRQs ('|': locked)
> |+-- Xenomai
> ||+- Linux ('*': domain stalled, '+': current, '#': current+stalled)
> ||| +---------- Delay flag ('+': > 1 us, '!': > 10 us)
> ||| | +- NMI noise ('N')
> ||| | |
> Type User Val. Time Delay Function (Parent)
> :| # [ 4936] -<?>- 0 -1613869 0.852 xnpod_resume_thread+0x4e (xnthread_timeout_handler+0x1e)
> :| # event tick@-1611814-1613868+ 3.538 xntimer_next_local_shot+0xb4 (xntimer_tick_aperiodic+0x1a0)
> :| # [ 5066] prc6hmi -1 -1613864+ 1.964 __xnpod_schedule+0x77 (xnintr_clock_handler+0x105)
> :| # [ 4936] -<?>- 0 -1613862+ 3.127 __xnpod_schedule+0x45d (xnpod_suspend_thread+0x197)
> :| # [ 4936] -<?>- 0 -1613859+ 1.192 __xnpod_schedule+0x77 (xnpod_suspend_thread+0x171)
> :| # [ 5066] prc6hmi -1 -1613858! 27.286 __xnpod_schedule+0x45d (xnintr_clock_handler+0x105)
> :| *+[ 4936] -<?>- 0 -1613831+ 1.824 xnpod_resume_thread+0x4e (gatekeeper_thread+0xf1)
> :| # [ 456] gatekee -1 -1613829+ 4.200 __xnpod_schedule+0x77 (xnpod_schedule_handler+0x29)
> :| # [ 4936] -<?>- 0 -1613825 0.952 __xnpod_schedule+0x77 (xnpod_suspend_thread+0x171)
> :| # [ 456] gatekee -1 -1613824! 13.001 __xnpod_schedule+0x45d (xnpod_schedule_handler+0x29)
> :| *+[ 4936] -<?>- 0 -1613811 0.882 xnpod_resume_thread+0x4e (gatekeeper_thread+0xf1)
> :| # [ 456] gatekee -1 -1613810+ 2.506 __xnpod_schedule+0x77 (xnpod_schedule_handler+0x29)
> :| # [ 4936] -<?>- 0 -1613807 0.952 __xnpod_schedule+0x77 (xnpod_suspend_thread+0x171)
> :| # [ 456] gatekee -1 -1613806! 11.588 __xnpod_schedule+0x45d (xnpod_schedule_handler+0x29)
> :| *+[ 4936] -<?>- 0 -1613795 0.721 xnpod_resume_thread+0x4e (gatekeeper_thread+0xf1)
> :| # [ 456] gatekee -1 -1613794+ 6.395 __xnpod_schedule+0x77 (xnpod_schedule_handler+0x29)
> :| # [ 4936] -<?>- 0 -1613788 0.922 __xnpod_schedule+0x77 (xnpod_suspend_thread+0x171)
> :| # [ 456] gatekee -1 -1613787! 10.124 __xnpod_schedule+0x45d (xnpod_schedule_handler+0x29)
> :| *+[ 4936] -<?>- 0 -1613777 0.711 xnpod_resume_thread+0x4e (gatekeeper_thread+0xf1)
> :| # [ 456] gatekee -1 -1613776+ 2.686 __xnpod_schedule+0x77 (xnpod_schedule_handler+0x29)
> :| # [ 4936] -<?>- 0 -1613773 0.892 __xnpod_schedule+0x77 (xnpod_suspend_thread+0x197)
> :| # [ 456] gatekee -1 -1613772! 1977.417 __xnpod_schedule+0x45d (xnpod_schedule_handler+0x29)
> :| # [ 4956] -<?>- 50 -1611795+ 1.794 xnpod_resume_thread+0x4e (xnthread_timeout_handler+0x1e)
> ...
> ...
> ...
> :| # event tick@-5129 -5382+ 4.771 xntimer_next_local_shot+0xb4 (xntimer_tick_aperiodic+0x1a0)
> :| # [ 4961] -<?>- -1 -5377+ 3.448 __xnpod_schedule+0x77 (xnintr_clock_handler+0x105)
> :| # [ 4936] -<?>- 0 -5374+ 7.027 __xnpod_schedule+0x45d (xnpod_suspend_thread+0x197)
> :| # [ 4936] -<?>- 0 -5367+ 1.172 __xnpod_schedule+0x77 (xnpod_suspend_thread+0x171)
> :| # [ 4961] -<?>- -1 -5366! 253.408 __xnpod_schedule+0x45d (xnintr_clock_handler+0x105)
> :| # [ 4957] -<?>- 50 -5112+ 1.884 xnpod_resume_thread+0x4e (xnthread_periodic_handler+0x28)
> :| # event tick@-4525 -5110+ 3.548 xntimer_next_local_shot+0xb4 (xntimer_tick_aperiodic+0x1a0)
> :| # [ 4961] -<?>- -1 -5107+ 2.476 __xnpod_schedule+0x77 (xnintr_clock_handler+0x105)
> :| # [ 4957] -<?>- 50 -5104! 59.785 __xnpod_schedule+0x45d (xnpod_suspend_thread+0x197)
> :| # [ 4957] -<?>- 50 -5045+ 1.794 __xnpod_schedule+0x77 (xnpod_suspend_thread+0x197)
> :| # [ 4961] -<?>- -1 -5043! 538.795 __xnpod_schedule+0x45d (xnintr_clock_handler+0x105)
> :| # [ 5012] -<?>- 99 -4504+ 2.195 xnpod_resume_thread+0x4e (xnthread_periodic_handler+0x28)
> :| # event tick@-129 -4502+ 4.180 xntimer_next_local_shot+0xb4 (xntimer_tick_aperiodic+0x1a0)
> :| # [ 4961] -<?>- -1 -4498+ 2.596 __xnpod_schedule+0x77 (xnintr_clock_handler+0x105)
> :| # [ 5012] -<?>- 99 -4495! 677.824 __xnpod_schedule+0x45d (xnpod_suspend_thread+0x197)
> :| # [ 5012] -<?>- 99 -3817+ 5.844 __xnpod_schedule+0x77 (xnpod_suspend_thread+0x197)
> :| # [ 4961] -<?>- -1 -3811! 3709.015 __xnpod_schedule+0x45d (xnintr_clock_handler+0x105)
> :| # [ 4957] -<?>- 50 -102+ 3.147 xnpod_resume_thread+0x4e (xnthread_periodic_handler+0x28)
> :| # event tick@474 -99+ 4.210 xntimer_next_local_shot+0xb4 (xntimer_tick_aperiodic+0x1a0)
> :| # [ 4961] -<?>- -1 -95+ 3.388 __xnpod_schedule+0x77 (xnintr_clock_handler+0x105)
> :| # [ 4957] -<?>- 50 -92! 55.645 __xnpod_schedule+0x45d (xnpod_suspend_thread+0x197)
> :| # [ 4957] -<?>- 50 -36+ 2.065 __xnpod_schedule+0x77 (xnpod_suspend_thread+0x197)
> :| # [ 4961] -<?>- -1 -34! 34.474 __xnpod_schedule+0x45d (xnintr_clock_handler+0x105)
> < #freeze 0x00000001 0 489.264 update_curr_rt+0x10f (task_tick_rt+0x15)
> | # [ 5012] -<?>- 99 489 1.724 xnpod_resume_thread+0x4e (xnthread_periodic_handler+0x28)
> | # event tick@4870 490 3.147 xntimer_next_local_shot+0xb4 (xntimer_tick_aperiodic+0x1a0)
> | # [ 0] -<?>- -1 494 2.866 __xnpod_schedule+0x77 (xnintr_clock_handler+0x105)
> | # [ 5012] -<?>- 99 497 662.948 __xnpod_schedule+0x45d (xnpod_suspend_thread+0x197)
> | # [ 5012] -<?>- 99 1159 3.608 __xnpod_schedule+0x77 (xnpod_suspend_thread+0x197)
> | # [ 0] -<?>- -1 1163 3720.243 __xnpod_schedule+0x45d (xnintr_clock_handler+0x105)
> | # [ 4957] -<?>- 50 4883 2.085 xnpod_resume_thread+0x4e (xnthread_periodic_handler+0x28)
> | # event tick@5474 4885 3.368 xntimer_next_local_shot+0xb4 (xntimer_tick_aperiodic+0x1a0)
> | # [ 0] -<?>- -1 4889 1.233 __xnpod_schedule+0x77 (xnintr_clock_handler+0x105)
> | # [ 4957] -<?>- 50 4890 0.000 __xnpod_schedule+0x45d (xnpod_suspend_thread+0x197)
> //================== frozen file end =======================
>
> There is no process utilizing the CPU for > 950000 us, which
> leads me to the conclusion, that the problem is not caused by
> an infinite loop.
The problem is caused by processes scheduled by the Linux scheduler
utilizing the CPU without ever releasing it. These may be different
processes, you are right.
> However I observe delay values of up to 3900 us multiple times which
> makes me think that there is a high CPU utilization which in the
> worst case exceeds the limit of 950000 us per 1 second interval.
> Probably I would have to sum up all delay values of processes in
> secondary mode within 1 second time interval and see if it exceeds
> 950000 us.
>
> Now I have the following questions
> 1. What is meant by "domain stalled", "current" and "current+stalled"?
> 2. What is that process having ID "[ 0]"
> 3. How can I tell processes running in secondary mode from others?
> (just by looking at the priority value?)
>
> Any hint on interpreting my frozen file is appreciated.
This trace is worthless. To have a meaningful trace, enable tracing
on functions entry (the default when you enable the I-pipe tracer).
--
Gilles.
https://click-hack.org
* Re: [Xenomai] "RT throttling" issue
2015-12-04 16:33 ` Gilles Chanteperdrix
@ 2015-12-04 19:09 ` Jochen Behnke
2015-12-04 20:41 ` Gilles Chanteperdrix
0 siblings, 1 reply; 13+ messages in thread
From: Jochen Behnke @ 2015-12-04 19:09 UTC (permalink / raw)
To: Gilles Chanteperdrix; +Cc: xenomai
Hello Gilles,
thanks for your response.
>>
>> Now I have the following questions
>> 1. What is meant by "domain stalled", "current" and "current+stalled"?
>> 2. What is that process having ID "[ 0]"
>> 3. How can I tell processes running in secondary mode from others?
>> (just by looking at the priority value?)
>>
>> Any hint on interpreting my frozen file is appreciated.
>
> This trace is worthless. To have a meaningful trace, enable tracing
> on functions entry (the default when you enable the I-pipe tracer).
>
Do you mean CONFIG_FUNCTION_TRACE?
Regards
Jochen
* Re: [Xenomai] "RT throttling" issue
2015-12-04 19:09 ` Jochen Behnke
@ 2015-12-04 20:41 ` Gilles Chanteperdrix
2015-12-08 10:42 ` Jochen Behnke
0 siblings, 1 reply; 13+ messages in thread
From: Gilles Chanteperdrix @ 2015-12-04 20:41 UTC (permalink / raw)
To: Jochen Behnke; +Cc: xenomai
On Fri, Dec 04, 2015 at 08:09:09PM +0100, Jochen Behnke wrote:
> Hello Gilles,
>
> thanks for your response.
>
> >>
> >> Now I have the following questions
> >> 1. What is meant by "domain stalled", "current" and "current+stalled"?
> >> 2. What is that process having ID "[ 0]"
> >> 3. How can I tell processes running in secondary mode from others?
> >> (just by looking at the priority value?)
> >>
> >> Any hint on interpreting my frozen file is appreciated.
> >
> > This trace is worthless. To have a meaningful trace, enable tracing
> > on functions entry (the default when you enable the I-pipe tracer).
> >
> Do you mean CONFIG_FUNCTION_TRACE?
I mean CONFIG_IPIPE_TRACE_MCOUNT
--
Gilles.
https://click-hack.org
* Re: [Xenomai] "RT throttling" issue
2015-12-04 20:41 ` Gilles Chanteperdrix
@ 2015-12-08 10:42 ` Jochen Behnke
2015-12-08 11:05 ` Gilles Chanteperdrix
0 siblings, 1 reply; 13+ messages in thread
From: Jochen Behnke @ 2015-12-08 10:42 UTC (permalink / raw)
To: Gilles Chanteperdrix; +Cc: xenomai
Hello Gilles,
> > >> Now I have the following questions
> > >> 1. What is meant by "domain stalled", "current" and "current+stalled"?
> > >> 2. What is that process having ID "[ 0]"
> > >> 3. How can I tell processes running in secondary mode from others?
> > >> (just by looking at the priority value?)
> > >>
> > >> Any hint on interpreting my frozen file is appreciated.
> > >
> > > This trace is worthless. To have a meaningful trace, enable tracing
> > > on function entry (the default when you enable the I-pipe tracer).
> > >
> > Do you mean CONFIG_FUNCTION_TRACE?
>
> I mean CONFIG_IPIPE_TRACE_MCOUNT
>
I enabled CONFIG_IPIPE_TRACE_MCOUNT=y and set
CONFIG_IPIPE_TRACE_SHIFT=18
to get 2^18 = 262144 lines of backtrace history.
I captured some frozen files when RT throttling occurred and during
normal operation.
To me, they look very similar, and I am not sure which functions
I should look at.
I observe the longest delay values (>= 1000us) for the following
functions; they occur every 5..10 ms.
__ipipe_halt_root : delays = 1000us .. 3900us
__ipipe_restore_root_nosync : delays = 1000us .. 3900us
As "__ipipe_halt_root" is called by "default_idle", and the delay values
increase (up to 9984 us) when the application is not running, I guess
this is actually CPU idle time and not relevant.
"ipipe_root_only" also occasionally has long delay times.
Is ipipe_root_only run when Linux is executing?
Is this critical?
See extract below.
================== start: extract from frozen file ==================
:    +func    -420045+   1.443  ipipe_root_only+0x4  (copy_to_user+0x31)
:    +func    -420044    0.260  __ipipe_syscall_root+0x5  (sysenter_past_esp+0x5b)
:    +func    -420043    0.200  sys_time+0x5  (sysenter_do_call+0x12)
:    +func    -420043    0.210  get_seconds+0x3  (sys_time+0xf)
:    +func    -420043!  3910.393  ipipe_root_only+0x4  (sys_time+0x1c)
:|   +func    -416132    0.651  __ipipe_handle_irq+0x5  (common_interrupt+0x2c)
:|   +func    -416132    0.340  __ipipe_dispatch_irq+0x9  (__ipipe_handle_irq+0x7f)
================== end: extract from frozen file ==================
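[Editor's note: for readers reproducing this, frozen snapshots like the extract above are normally captured through the I-pipe tracer's /proc interface. The following is only a sketch assuming the standard control files documented by Xenomai (/proc/ipipe/trace/*) and the xntrace_user_freeze() user-space call; verify paths and the freeze API against the exact kernel and Xenomai version in use.]

```shell
# Sketch only: assumes the standard I-pipe tracer /proc interface
# (kernel built with CONFIG_IPIPE_TRACE=y and, for per-function
# entries, CONFIG_IPIPE_TRACE_MCOUNT=y).

echo 1 > /proc/ipipe/trace/enable       # start function tracing

# ... run the application; at the point of interest it calls
# xntrace_user_freeze(0, 1) to freeze the current back trace ...

cat /proc/ipipe/trace/frozen > frozen.txt   # snapshot frozen by the app
cat /proc/ipipe/trace/max    > max.txt      # worst-case path recorded so far
```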
Any other suggestions where to look?
Thanks in advance
Jochen
* Re: [Xenomai] "RT throttling" issue
2015-12-08 10:42 ` Jochen Behnke
@ 2015-12-08 11:05 ` Gilles Chanteperdrix
2015-12-08 12:12 ` Jochen Behnke
0 siblings, 1 reply; 13+ messages in thread
From: Gilles Chanteperdrix @ 2015-12-08 11:05 UTC (permalink / raw)
To: Jochen Behnke; +Cc: xenomai
On Tue, Dec 08, 2015 at 11:42:32AM +0100, Jochen Behnke wrote:
> "ipipe_root_only" also occasionally has long delay times.
> Is ipipe_root_only run when Linux is executing?
> Is this critical?
> See extract below.
> ================== start: extract from frozen file ==================
> :    +func    -420045+   1.443  ipipe_root_only+0x4  (copy_to_user+0x31)
> :    +func    -420044    0.260  __ipipe_syscall_root+0x5  (sysenter_past_esp+0x5b)
> :    +func    -420043    0.200  sys_time+0x5  (sysenter_do_call+0x12)
> :    +func    -420043    0.210  get_seconds+0x3  (sys_time+0xf)
> :    +func    -420043!  3910.393  ipipe_root_only+0x4  (sys_time+0x1c)
> :|   +func    -416132    0.651  __ipipe_handle_irq+0x5  (common_interrupt+0x2c)
Hi,
What you have here is a system call to the "time" function from a
task scheduled by Linux scheduler. When the system call is finished,
the system returns to user-space. ipipe_root_only happens to be the
last function called before the return to user-space and before the
next interrupt. ipipe_root_only then gets charged for the time spent
in user-space.
What version of the I-pipe patch are you using on which
architecture? On some architectures we try to create trace points
when entering and leaving the kernel to avoid this issue (with the
added benefit that the user-space value of the PC gets printed, which
may be useful if you have an infinite loop).
Regards.
--
Gilles.
https://click-hack.org
* Re: [Xenomai] "RT throttling" issue
2015-12-08 11:05 ` Gilles Chanteperdrix
@ 2015-12-08 12:12 ` Jochen Behnke
0 siblings, 0 replies; 13+ messages in thread
From: Jochen Behnke @ 2015-12-08 12:12 UTC (permalink / raw)
To: Gilles Chanteperdrix; +Cc: xenomai
> Sent: Tuesday, 8 December 2015 at 12:05
> From: "Gilles Chanteperdrix" <gilles.chanteperdrix@xenomai.org>
> To: "Jochen Behnke" <JK.Behnke@web.de>
> Cc: xenomai@xenomai.org
> Subject: Re: [Xenomai] "RT throttling" issue
> On Tue, Dec 08, 2015 at 11:42:32AM +0100, Jochen Behnke wrote:
> > "ipipe_root_only" also occasionally has long delay times.
> > Is ipipe_root_only run when Linux is executing?
> > Is this critical?
> > See extract below.
> > ================== start: extract from frozen file ==================
> > :    +func    -420045+   1.443  ipipe_root_only+0x4  (copy_to_user+0x31)
> > :    +func    -420044    0.260  __ipipe_syscall_root+0x5  (sysenter_past_esp+0x5b)
> > :    +func    -420043    0.200  sys_time+0x5  (sysenter_do_call+0x12)
> > :    +func    -420043    0.210  get_seconds+0x3  (sys_time+0xf)
> > :    +func    -420043!  3910.393  ipipe_root_only+0x4  (sys_time+0x1c)
> > :|   +func    -416132    0.651  __ipipe_handle_irq+0x5  (common_interrupt+0x2c)
>
> Hi,
>
> What you have here is a system call to the "time" function from a
> task scheduled by Linux scheduler. When the system call is finished,
> the system returns to user-space. ipipe_root_only happens to be the
> last function called before the return to user-space and before the
> next interrupt. ipipe_root_only then gets charged for the time spent
> in user-space.
>
> What version of the I-pipe patch are you using on which
> architecture? On some architectures we try to create trace points
> when entering and leaving the kernel to avoid this issue (with the
> added benefit that the user-space value of the PC gets printed, which
> may be useful if you have an infinite loop).
Xenomai: 2.6.4 githead
Architecture: x86
Platform: Atom E640/E660
I-pipe patch: ipipe-core-3.8.13-x86-4.patch
Regards
Jochen
end of thread, other threads:[~2015-12-08 12:12 UTC | newest]
2015-11-26 10:32 [Xenomai] "RT throttling" issue JK.Behnke
2015-11-28 6:43 ` Gilles Chanteperdrix
2015-11-29 9:30 ` Jochen Behnke
2015-12-01 13:09 ` JK.Behnke
2015-12-01 18:11 ` Gilles Chanteperdrix
2015-12-02 12:58 ` JK.Behnke
2015-12-04 16:24 ` JK.Behnke
2015-12-04 16:33 ` Gilles Chanteperdrix
2015-12-04 19:09 ` Jochen Behnke
2015-12-04 20:41 ` Gilles Chanteperdrix
2015-12-08 10:42 ` Jochen Behnke
2015-12-08 11:05 ` Gilles Chanteperdrix
2015-12-08 12:12 ` Jochen Behnke