* [Xenomai] "RT throttling" issue
@ 2015-11-26 10:32 JK.Behnke
  2015-11-28  6:43 ` Gilles Chanteperdrix
  0 siblings, 1 reply; 13+ messages in thread
From: JK.Behnke @ 2015-11-26 10:32 UTC (permalink / raw)
  To: xenomai

Hello,

In my Xenomai 2.6.3 application I sporadically experience very long execution times of
a task that switches back and forth between secondary and primary mode.

I added the following code to measure elapsed and CPU time:
//------------- start code snippet ---------------
#include <sys/time.h>      /* gettimeofday() */
#include <sys/resource.h>  /* getrusage() */

struct rusage rsgT1;
struct rusage rsgT2;
struct timeval tvT1;
struct timeval tvT2;

gettimeofday(&tvT1, NULL);        /* wall-clock time before */
getrusage(RUSAGE_SELF, &rsgT1);   /* CPU time before */
//
// code block to be measured
//
gettimeofday(&tvT2, NULL);        /* wall-clock time after */
getrusage(RUSAGE_SELF, &rsgT2);   /* CPU time after */
//------------- end code snippet ---------------

I then calculate the difference and keep the maximum values in
CpuMax (getrusage) and TimeMax (gettimeofday) variables.
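
The bookkeeping is roughly the following (a sketch only; tv_diff_ms and
update_max are illustrative helpers, not the exact code of my application):

//------------- start sketch ---------------
#include <sys/time.h>
#include <sys/resource.h>

/* millisecond difference between two struct timeval samples */
static long tv_diff_ms(const struct timeval *a, const struct timeval *b)
{
    return (b->tv_sec - a->tv_sec) * 1000L + (b->tv_usec - a->tv_usec) / 1000L;
}

static long TimeMax; /* max elapsed (wall-clock) time per block, in ms */
static long CpuMax;  /* max consumed CPU time per block, in ms */

static void update_max(const struct timeval *tvT1, const struct timeval *tvT2,
                       const struct rusage *rsgT1, const struct rusage *rsgT2)
{
    long elapsed = tv_diff_ms(tvT1, tvT2);
    long cpu = tv_diff_ms(&rsgT1->ru_utime, &rsgT2->ru_utime) +  /* user time   */
               tv_diff_ms(&rsgT1->ru_stime, &rsgT2->ru_stime);   /* system time */

    if (elapsed > TimeMax)
        TimeMax = elapsed;
    if (cpu > CpuMax)
        CpuMax = cpu;
}
//------------- end sketch ---------------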

Typically I observe CpuMax=20 .. 100 ms and TimeMax with similar values.
But sometimes I get huge CpuMax values in the range 5000 .. 50000 ms.

Now my questions:
In my system logfile I sometimes observe "RT throttling activated" messages.
According to the Linux kernel 3.8.13 documentation, the Linux scheduler throttles
RT tasks if they consume too much CPU time.
Could the observed behaviour be related to this?

Is it a problem if I call gettimeofday/getrusage multiple times in
a loop that is executed every 10 ms?

I would expect TimeMax to always be greater than or equal to CpuMax.
But sometimes I observe CpuMax values that are significantly larger than
TimeMax during the same test session.
Any ideas what might be the cause?



Thanks for any comments
Jochen



* Re: [Xenomai] "RT throttling" issue
  2015-11-26 10:32 [Xenomai] "RT throttling" issue JK.Behnke
@ 2015-11-28  6:43 ` Gilles Chanteperdrix
  2015-11-29  9:30   ` Jochen Behnke
  2015-12-01 13:09   ` JK.Behnke
  0 siblings, 2 replies; 13+ messages in thread
From: Gilles Chanteperdrix @ 2015-11-28  6:43 UTC (permalink / raw)
  To: JK.Behnke; +Cc: xenomai

On Thu, Nov 26, 2015 at 11:32:57AM +0100, JK.Behnke@web.de wrote:
> Hello,
>  
> in my xenomai 2.6.3 application I sporadically experience very long execution times of
> a task that switches back and forth between secondary and primary mode.
>  
> I added the following code to measure elapsed and cpu time
> //------------- start code snippet ---------------
> struct rusage rsgT1;
> struct rusage rsgT2;
> struct timeval tvT1;
> struct timeval tvT2;
> gettimeofday(&tvT1, NULL);

Do you observe the same problem with Xenomai 2.6 git head, if you
use clock_gettime(CLOCK_HOST_REALTIME) instead of gettimeofday, at
least for threads running in primary mode?
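
Something along these lines (assuming your application is built with the
Xenomai POSIX wrappers, which provide CLOCK_HOST_REALTIME):

//------------- start sketch ---------------
#include <time.h>

struct timespec tsT1, tsT2;
long long elapsed_ns;

clock_gettime(CLOCK_HOST_REALTIME, &tsT1);
//
// code block to be measured
//
clock_gettime(CLOCK_HOST_REALTIME, &tsT2);

/* elapsed wall-clock time in nanoseconds */
elapsed_ns = (tsT2.tv_sec - tsT1.tv_sec) * 1000000000LL
           + (tsT2.tv_nsec - tsT1.tv_nsec);
//------------- end sketch ---------------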

-- 
					    Gilles.
https://click-hack.org



* Re: [Xenomai] "RT throttling" issue
  2015-11-28  6:43 ` Gilles Chanteperdrix
@ 2015-11-29  9:30   ` Jochen Behnke
  2015-12-01 13:09   ` JK.Behnke
  1 sibling, 0 replies; 13+ messages in thread
From: Jochen Behnke @ 2015-11-29  9:30 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

Hello Gilles,

In the meantime I added rt_timer_tsc as a second way to measure
elapsed time. rt_timer_tsc returns values that are somewhat greater
than expected (around 39 ms instead of the expected 20 ms), but still
reasonable.
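
For reference, the tsc-based measurement looks roughly like this (a sketch
using the native skin; the surrounding loop is omitted):

//------------- start sketch ---------------
#include <native/timer.h>

RTIME tscT1, tscT2;
SRTIME elapsed_ns;

tscT1 = rt_timer_tsc();
//
// code block to be measured
//
tscT2 = rt_timer_tsc();

/* convert tsc ticks to nanoseconds for comparison with the other clocks */
elapsed_ns = rt_timer_tsc2ns((SRTIME)(tscT2 - tscT1));
//------------- end sketch ---------------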

> On Thu, Nov 26, 2015 at 11:32:57AM +0100, JK.Behnke@web.de wrote:
>> Hello,
>>
>> in my xenomai 2.6.3 application I sporadically experience very long execution times of
>> a task that switches back and forth between secondary and primary mode.
>>
>> I added the following code to measure elapsed and cpu time
>> //------------- start code snippet ---------------
>> struct rusage rsgT1;
>> struct rusage rsgT2;
>> struct timeval tvT1;
>> struct timeval tvT2;
>> gettimeofday(&tvT1, NULL);
>
> Do you observe the same problem with Xenomai 2.6 git head, if you
> use clock_gettime(CLOCK_HOST_REALTIME) instead of gettimeofday, at
> least for threads running in primary mode ?
>

I'll give Xenomai 2.6 git head a try.
Thanks for your help.

Regards
Jochen




* Re: [Xenomai] "RT throttling" issue
  2015-11-28  6:43 ` Gilles Chanteperdrix
  2015-11-29  9:30   ` Jochen Behnke
@ 2015-12-01 13:09   ` JK.Behnke
  2015-12-01 18:11     ` Gilles Chanteperdrix
  1 sibling, 1 reply; 13+ messages in thread
From: JK.Behnke @ 2015-12-01 13:09 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

Hello Gilles,

> In the meantime I added rt_timer_tsc as a second way to measure
> elapsed time. rt_timer_tsc returns values that are somewhat greater
> than expected (around 39 ms instead of the expected 20 ms), but still
> reasonable.
>
> > On Thu, Nov 26, 2015 at 11:32:57AM +0100, JK.Behnke@web.de wrote:
> >> Hello,
> >>
> >> In my Xenomai 2.6.3 application I sporadically experience very long
> >> execution times of a task that switches back and forth between
> >> secondary and primary mode.
> >>
> >> I added the following code to measure elapsed and CPU time:
> >> //------------- start code snippet ---------------
> >> struct rusage rsgT1;
> >> struct rusage rsgT2;
> >> struct timeval tvT1;
> >> struct timeval tvT2;
> >> gettimeofday(&tvT1, NULL);
> >
> > Do you observe the same problem with Xenomai 2.6 git head, if you
> > use clock_gettime(CLOCK_HOST_REALTIME) instead of gettimeofday, at
> > least for threads running in primary mode?

I am now running Xenomai 2.6.4 git head.
The application now uses the POSIX wrappers as described in the
documentation "Porting a Linux application to Xenomai dual kernel".
I replaced the gettimeofday() calls with
clock_gettime(CLOCK_HOST_REALTIME).
The getrusage() calls have been removed.

Both clock_gettime and rt_timer_tsc return reasonable values.
However, I once observed a loop execution time of 14000 ms instead of
the typical 30 ms.
This means that the task was blocked for about 14 seconds.
As I still see "RT throttling" messages from the Linux kernel,
I suspect that some lengthy operation (file access, memory
allocation, ...) of some other task in my application triggers the
"RT throttling".
As a side effect the task gets blocked by the Linux scheduler while
in secondary mode. As mentioned earlier, the task in question switches
back and forth between secondary and primary mode.

Is it dangerous to use memory allocation and file access functions
inside a Linux task running under policy SCHED_FIFO, even though
they are not really time critical?
Could some other Linux process block these kinds of tasks for such
a long time (14 s) or even longer?

TIA for your comments on this.

Regards
Jochen


* Re: [Xenomai] "RT throttling" issue
  2015-12-01 13:09   ` JK.Behnke
@ 2015-12-01 18:11     ` Gilles Chanteperdrix
  2015-12-02 12:58       ` JK.Behnke
  0 siblings, 1 reply; 13+ messages in thread
From: Gilles Chanteperdrix @ 2015-12-01 18:11 UTC (permalink / raw)
  To: JK.Behnke; +Cc: xenomai

On Tue, Dec 01, 2015 at 02:09:45PM +0100, JK.Behnke@web.de wrote:
> Hello Gilles,
> 
> > In the meantime I added rt_timer_tsc as a second way to measure
> > elapsed time. rt_timer_tsc returns values that are somewhat greater
> > than expected (around 39 ms instead of the expected 20 ms), but still
> > reasonable.
> >
> > > On Thu, Nov 26, 2015 at 11:32:57AM +0100, JK.Behnke@web.de wrote:
> > >> Hello,
> > >>
> > >> In my Xenomai 2.6.3 application I sporadically experience very long
> > >> execution times of a task that switches back and forth between
> > >> secondary and primary mode.
> > >>
> > >> I added the following code to measure elapsed and CPU time:
> > >> //------------- start code snippet ---------------
> > >> struct rusage rsgT1;
> > >> struct rusage rsgT2;
> > >> struct timeval tvT1;
> > >> struct timeval tvT2;
> > >> gettimeofday(&tvT1, NULL);
> > >
> > > Do you observe the same problem with Xenomai 2.6 git head, if you
> > > use clock_gettime(CLOCK_HOST_REALTIME) instead of gettimeofday, at
> > > least for threads running in primary mode?
> 
> I am now running Xenomai 2.6.4 git head.
> The application now uses the POSIX wrappers as described in the
> documentation "Porting a Linux application to Xenomai dual kernel".
> I replaced the gettimeofday() calls with
> clock_gettime(CLOCK_HOST_REALTIME).
> The getrusage() calls have been removed.
> 
> Both clock_gettime and rt_timer_tsc return reasonable values.
> However, I once observed a loop execution time of 14000 ms instead of
> the typical 30 ms.

If a "loop", runs 30ms in primary mode then you are starving Linux
from its timer interrupts. You should try to let linux at least
handle its interrupt every 10ms if running with HZ==100 or every
millisecond if running with HZ==1000.


>    This means that the task was blocked for about 14 seconds.
>    As I still experience "RT throttling" messages of the Linux kernel,
>    I guess that some lengthy operation (file access, memory
>    allocation,...)

No. "RT throttling" means that a task run by the Linux scheduler
with scheduling policy SCHED_FIFO or SCHED_RR is using more than 95%
of the cpu over a period of one second. This is bad, this is
probably a kind of infinite loop. gettimeofday called from primary
mode is known to cause such an issue (when xenomai watchdog has
detected the infinite loop in primary mode and kicked the task to
secondary mode), but if you have removed all calls to gettimeofday,
then the issue is elsewhere.

>    of some other task in my application produces the "RT throttling".
>    As a side effect the task gets blocked by the Linux scheduler while
>    in secondary mode. As mentioned earlier, the task in question switches
>    back and forth between secondary and primary mode.
> 
>    Is it dangerous to use memory allocation and file access functions
>    inside a Linux task running under policy SCHED_FIFO, even though
>    they are not really time critical?

File access causes the function to wait, not to use CPU. Memory
allocation should not be a problem either. However, due to the way
Linux handles its memory (all unused memory is used as disk cache), a
memory allocation may actually have to wait for a disk access to
finish, so in the worst case your task would not be real-time at all.

>    Could some other Linux process block these kind of tasks for such
>    a long time (14 sec) or even longer?

Using the I-pipe tracer, you may be able to understand what happens.

-- 
					    Gilles.
https://click-hack.org



* Re: [Xenomai] "RT throttling" issue
  2015-12-01 18:11     ` Gilles Chanteperdrix
@ 2015-12-02 12:58       ` JK.Behnke
  2015-12-04 16:24         ` JK.Behnke
  0 siblings, 1 reply; 13+ messages in thread
From: JK.Behnke @ 2015-12-02 12:58 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

Hello Gilles,

thanks for your comments.

> On Tue, Dec 01, 2015 at 02:09:45PM +0100, JK.Behnke@web.de wrote:
> > Hello Gilles,
> >
> > > In the meantime I added rt_timer_tsc as a second way to measure
> > > elapsed time. rt_timer_tsc returns values that are somewhat greater
> > > than expected (around 39 ms instead of the expected 20 ms), but still
> > > reasonable.
> > >
> > > > On Thu, Nov 26, 2015 at 11:32:57AM +0100, JK.Behnke@web.de wrote:
> > > >> Hello,
> > > >>
> > > >> In my Xenomai 2.6.3 application I sporadically experience very long
> > > >> execution times of a task that switches back and forth between
> > > >> secondary and primary mode.
> > > >>
> > > >> I added the following code to measure elapsed and CPU time:
> > > >> //------------- start code snippet ---------------
> > > >> struct rusage rsgT1;
> > > >> struct rusage rsgT2;
> > > >> struct timeval tvT1;
> > > >> struct timeval tvT2;
> > > >> gettimeofday(&tvT1, NULL);
> > > >
> > > > Do you observe the same problem with Xenomai 2.6 git head, if you
> > > > use clock_gettime(CLOCK_HOST_REALTIME) instead of gettimeofday, at
> > > > least for threads running in primary mode?
> >
> > I am now running Xenomai 2.6.4 git head.
> > The application now uses the POSIX wrappers as described in the
> > documentation "Porting a Linux application to Xenomai dual kernel".
> > I replaced the gettimeofday() calls with
> > clock_gettime(CLOCK_HOST_REALTIME).
> > The getrusage() calls have been removed.
> >
> > Both clock_gettime and rt_timer_tsc return reasonable values.
> > However, I once observed a loop execution time of 14000 ms instead of
> > the typical 30 ms.
> 
> If a "loop", runs 30ms in primary mode then you are starving Linux
> from its timer interrupts. You should try to let linux at least
> handle its interrupt every 10ms if running with HZ==100 or every
> millisecond if running with HZ==1000.
By "loop execution time" I meant the elapsed time between two
consecutive loops. This includes the sleep time of 20 ms per loop.
I typically observe max elapsed time values of 23ms .. 39ms. 
Sorry for being unprecise.


> 
> 
> > This means that the task was blocked for about 14 seconds.
> > As I still experience "RT throttling" messages of the Linux kernel,
> > I guess that some lengthy operation (file access, memory
> > allocation,...)
> 
> No. "RT throttling" means that a task run by the Linux scheduler
> with scheduling policy SCHED_FIFO or SCHED_RR is using more than 95%
> of the cpu over a period of one second. This is bad, this is
> probably a kind of infinite loop. gettimeofday called from primary
> mode is known to cause such an issue (when xenomai watchdog has
> detected the infinite loop in primary mode and kicked the task to
> secondary mode), but if you have removed all calls to gettimeofday,
> then the issue is elsewhere.
> 
My understanding is that tasks migrate automatically to secondary mode
when issuing a Linux system call, and migrate back to primary mode on
the next call to a Xenomai service. Isn't that true?

Are there any Linux system calls other than gettimeofday which may cause
problems when called from primary mode?


> > of some other task in my application produces the "RT throttling".
> > As a side effect the task gets blocked by the Linux scheduler while
> > in secondary mode. As mentioned earlier, the task in question switches
> > back and forth between secondary and primary mode.
> >
> > Is it dangerous to use memory allocation and file access functions
> > inside a Linux task running under policy SCHED_FIFO, even though
> > they are not really time critical?
> 
> File access cause the function to wait. Not to use cpu. Memory
> allocation should not be a problem either. Though due to the way
> Linux handle its memory (all unused memory is used as disk cache), a
> memory allocation may actually have to wait for a disk access to
> finish, so the worst case would make your task not real-time at all.
> 
> > Could some other Linux process block these kind of tasks for such
> > a long time (14 sec) or even longer?
> 
> Using the I-pipe tracer, you may be able to understand what happens.
Should I patch my Linux kernel so that xntrace_user_freeze is called
when the Linux scheduler prints the "RT throttling activated" message?


Regards
Jochen



* Re: [Xenomai] "RT throttling" issue
  2015-12-02 12:58       ` JK.Behnke
@ 2015-12-04 16:24         ` JK.Behnke
  2015-12-04 16:33           ` Gilles Chanteperdrix
  0 siblings, 1 reply; 13+ messages in thread
From: JK.Behnke @ 2015-12-04 16:24 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

Hello Gilles,

Sorry for bothering you again.

> >
> > Using the I-pipe tracer, you may be able to understand what happens.
> Should I patch my Linux kernel so that xntrace_user_freeze is called,
> when Linux scheduler sends "RT throttling activated" message?

I have now been able to capture an I-pipe frozen file when the
"RT throttling" condition occurs.
I did this by adding the line
  ipipe_trace_freeze(1);
right before
  printk_sched("sched: RT Throttling activated\n");
inside "kernel/sched/rt.c".
I set back_trace_points to 10000 to get about 1 second of
backtrace history.
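
In context, the change looks roughly like this (only a sketch; the exact code
around the printk in kernel/sched/rt.c differs between kernel versions):

//------------- start sketch ---------------
	rt_rq->rt_throttled = 1;

	ipipe_trace_freeze(1);	/* snapshot the I-pipe back trace at this point */
	printk_sched("sched: RT Throttling activated\n");
//------------- end sketch ---------------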

Here is a fragment of my huge frozen file which contains
the last 1.6 seconds before the freeze.

//================== frozen file start =======================
 +----- Hard IRQs ('|': locked)
 |+-- Xenomai
 ||+- Linux ('*': domain stalled, '+': current, '#': current+stalled)
 |||              +---------- Delay flag ('+': > 1 us, '!': > 10 us)
 |||              |       +- NMI noise ('N')
 |||              |       |
      Type      User Val.   Time    Delay  Function (Parent)
:|  # [ 4936] -<?>-    0 -1613869      0.852  xnpod_resume_thread+0x4e (xnthread_timeout_handler+0x1e)
:|  # event   tick@-1611814-1613868+   3.538  xntimer_next_local_shot+0xb4 (xntimer_tick_aperiodic+0x1a0)
:|  # [ 5066] prc6hmi -1 -1613864+   1.964  __xnpod_schedule+0x77 (xnintr_clock_handler+0x105)
:|  # [ 4936] -<?>-    0 -1613862+   3.127  __xnpod_schedule+0x45d (xnpod_suspend_thread+0x197)
:|  # [ 4936] -<?>-    0 -1613859+   1.192  __xnpod_schedule+0x77 (xnpod_suspend_thread+0x171)
:|  # [ 5066] prc6hmi -1 -1613858!  27.286  __xnpod_schedule+0x45d (xnintr_clock_handler+0x105)
:|  *+[ 4936] -<?>-    0 -1613831+   1.824  xnpod_resume_thread+0x4e (gatekeeper_thread+0xf1)
:|  # [  456] gatekee -1 -1613829+   4.200  __xnpod_schedule+0x77 (xnpod_schedule_handler+0x29)
:|  # [ 4936] -<?>-    0 -1613825      0.952  __xnpod_schedule+0x77 (xnpod_suspend_thread+0x171)
:|  # [  456] gatekee -1 -1613824!  13.001  __xnpod_schedule+0x45d (xnpod_schedule_handler+0x29)
:|  *+[ 4936] -<?>-    0 -1613811      0.882  xnpod_resume_thread+0x4e (gatekeeper_thread+0xf1)
:|  # [  456] gatekee -1 -1613810+   2.506  __xnpod_schedule+0x77 (xnpod_schedule_handler+0x29)
:|  # [ 4936] -<?>-    0 -1613807      0.952  __xnpod_schedule+0x77 (xnpod_suspend_thread+0x171)
:|  # [  456] gatekee -1 -1613806!  11.588  __xnpod_schedule+0x45d (xnpod_schedule_handler+0x29)
:|  *+[ 4936] -<?>-    0 -1613795      0.721  xnpod_resume_thread+0x4e (gatekeeper_thread+0xf1)
:|  # [  456] gatekee -1 -1613794+   6.395  __xnpod_schedule+0x77 (xnpod_schedule_handler+0x29)
:|  # [ 4936] -<?>-    0 -1613788      0.922  __xnpod_schedule+0x77 (xnpod_suspend_thread+0x171)
:|  # [  456] gatekee -1 -1613787!  10.124  __xnpod_schedule+0x45d (xnpod_schedule_handler+0x29)
:|  *+[ 4936] -<?>-    0 -1613777      0.711  xnpod_resume_thread+0x4e (gatekeeper_thread+0xf1)
:|  # [  456] gatekee -1 -1613776+   2.686  __xnpod_schedule+0x77 (xnpod_schedule_handler+0x29)
:|  # [ 4936] -<?>-    0 -1613773      0.892  __xnpod_schedule+0x77 (xnpod_suspend_thread+0x197)
:|  # [  456] gatekee -1 -1613772! 1977.417  __xnpod_schedule+0x45d (xnpod_schedule_handler+0x29)
:|  # [ 4956] -<?>-   50 -1611795+   1.794  xnpod_resume_thread+0x4e (xnthread_timeout_handler+0x1e)
...
...
...
:|  # event   tick@-5129 -5382+   4.771  xntimer_next_local_shot+0xb4 (xntimer_tick_aperiodic+0x1a0)
:|  # [ 4961] -<?>-   -1 -5377+   3.448  __xnpod_schedule+0x77 (xnintr_clock_handler+0x105)
:|  # [ 4936] -<?>-    0 -5374+   7.027  __xnpod_schedule+0x45d (xnpod_suspend_thread+0x197)
:|  # [ 4936] -<?>-    0 -5367+   1.172  __xnpod_schedule+0x77 (xnpod_suspend_thread+0x171)
:|  # [ 4961] -<?>-   -1 -5366! 253.408  __xnpod_schedule+0x45d (xnintr_clock_handler+0x105)
:|  # [ 4957] -<?>-   50 -5112+   1.884  xnpod_resume_thread+0x4e (xnthread_periodic_handler+0x28)
:|  # event   tick@-4525 -5110+   3.548  xntimer_next_local_shot+0xb4 (xntimer_tick_aperiodic+0x1a0)
:|  # [ 4961] -<?>-   -1 -5107+   2.476  __xnpod_schedule+0x77 (xnintr_clock_handler+0x105)
:|  # [ 4957] -<?>-   50 -5104!  59.785  __xnpod_schedule+0x45d (xnpod_suspend_thread+0x197)
:|  # [ 4957] -<?>-   50 -5045+   1.794  __xnpod_schedule+0x77 (xnpod_suspend_thread+0x197)
:|  # [ 4961] -<?>-   -1 -5043! 538.795  __xnpod_schedule+0x45d (xnintr_clock_handler+0x105)
:|  # [ 5012] -<?>-   99 -4504+   2.195  xnpod_resume_thread+0x4e (xnthread_periodic_handler+0x28)
:|  # event   tick@-129  -4502+   4.180  xntimer_next_local_shot+0xb4 (xntimer_tick_aperiodic+0x1a0)
:|  # [ 4961] -<?>-   -1 -4498+   2.596  __xnpod_schedule+0x77 (xnintr_clock_handler+0x105)
:|  # [ 5012] -<?>-   99 -4495! 677.824  __xnpod_schedule+0x45d (xnpod_suspend_thread+0x197)
:|  # [ 5012] -<?>-   99 -3817+   5.844  __xnpod_schedule+0x77 (xnpod_suspend_thread+0x197)
:|  # [ 4961] -<?>-   -1 -3811! 3709.015  __xnpod_schedule+0x45d (xnintr_clock_handler+0x105)
:|  # [ 4957] -<?>-   50  -102+   3.147  xnpod_resume_thread+0x4e (xnthread_periodic_handler+0x28)
:|  # event   tick@474     -99+   4.210  xntimer_next_local_shot+0xb4 (xntimer_tick_aperiodic+0x1a0)
:|  # [ 4961] -<?>-   -1   -95+   3.388  __xnpod_schedule+0x77 (xnintr_clock_handler+0x105)
:|  # [ 4957] -<?>-   50   -92!  55.645  __xnpod_schedule+0x45d (xnpod_suspend_thread+0x197)
:|  # [ 4957] -<?>-   50   -36+   2.065  __xnpod_schedule+0x77 (xnpod_suspend_thread+0x197)
:|  # [ 4961] -<?>-   -1   -34!  34.474  __xnpod_schedule+0x45d (xnintr_clock_handler+0x105)
<    #freeze  0x00000001     0    489.264  update_curr_rt+0x10f (task_tick_rt+0x15)
 |  # [ 5012] -<?>-   99   489      1.724  xnpod_resume_thread+0x4e (xnthread_periodic_handler+0x28)
 |  # event   tick@4870    490      3.147  xntimer_next_local_shot+0xb4 (xntimer_tick_aperiodic+0x1a0)
 |  # [    0] -<?>-   -1   494      2.866  __xnpod_schedule+0x77 (xnintr_clock_handler+0x105)
 |  # [ 5012] -<?>-   99   497    662.948  __xnpod_schedule+0x45d (xnpod_suspend_thread+0x197)
 |  # [ 5012] -<?>-   99  1159      3.608  __xnpod_schedule+0x77 (xnpod_suspend_thread+0x197)
 |  # [    0] -<?>-   -1  1163    3720.243  __xnpod_schedule+0x45d (xnintr_clock_handler+0x105)
 |  # [ 4957] -<?>-   50  4883      2.085  xnpod_resume_thread+0x4e (xnthread_periodic_handler+0x28)
 |  # event   tick@5474   4885      3.368  xntimer_next_local_shot+0xb4 (xntimer_tick_aperiodic+0x1a0)
 |  # [    0] -<?>-   -1  4889      1.233  __xnpod_schedule+0x77 (xnintr_clock_handler+0x105)
 |  # [ 4957] -<?>-   50  4890      0.000  __xnpod_schedule+0x45d (xnpod_suspend_thread+0x197)
//================== frozen file end =======================

There is no process utilizing the CPU for > 950000 us, which
leads me to the conclusion that the problem is not caused by
an infinite loop.
However, I observe delay values of up to 3900 us multiple times, which
makes me think that there is high CPU utilization which in the
worst case exceeds the limit of 950000 us per 1 second interval.
Probably I would have to sum up all delay values of processes in
secondary mode within a 1 second time interval and see if it exceeds
950000 us.

Now I have the following questions:
1. What is meant by "domain stalled", "current" and "current+stalled"?
2. What is that process having ID "[   0]"?
3. How can I tell processes running in secondary mode from others?
   (just by looking at the priority value?)

Any hint on interpreting my frozen file is appreciated.

Thanks
Jochen



* Re: [Xenomai] "RT throttling" issue
  2015-12-04 16:24         ` JK.Behnke
@ 2015-12-04 16:33           ` Gilles Chanteperdrix
  2015-12-04 19:09             ` Jochen Behnke
  0 siblings, 1 reply; 13+ messages in thread
From: Gilles Chanteperdrix @ 2015-12-04 16:33 UTC (permalink / raw)
  To: JK.Behnke; +Cc: xenomai

On Fri, Dec 04, 2015 at 05:24:28PM +0100, JK.Behnke@web.de wrote:
> Hello Gilles,
> 
> sorry for bothering you again.
> 
> > >
> > > Using the I-pipe tracer, you may be able to understand what happens.
> > Should I patch my Linux kernel so that xntrace_user_freeze is called,
> > when Linux scheduler sends "RT throttling activated" message?
> 
> I now have been able to capture an I-pipe frozen file, when the
> "RT throttling" condition occurs.
> I have done this by adding the line
>   ipipe_trace_freeze(1);
> right before
>   printk_sched("sched: RT Throttling activated\n");
> inside "kernel/sched/rt.c".
> I have set back_trace_points to 10000 to get about 1 second of
> backtrace history.
> 
> Here is a fragment of my huge frozen file which contains
> the last 1.6 seconds before the freeze.
> 
> //================== frozen file start =======================
>  +----- Hard IRQs ('|': locked)
>  |+-- Xenomai
>  ||+- Linux ('*': domain stalled, '+': current, '#': current+stalled)
>  |||              +---------- Delay flag ('+': > 1 us, '!': > 10 us)
>  |||              |       +- NMI noise ('N')
>  |||              |       |
>       Type      User Val.   Time    Delay  Function (Parent)
> :|  # [ 4936] -<?>-    0 -1613869      0.852  xnpod_resume_thread+0x4e (xnthread_timeout_handler+0x1e)
> :|  # event   tick@-1611814-1613868+   3.538  xntimer_next_local_shot+0xb4 (xntimer_tick_aperiodic+0x1a0)
> :|  # [ 5066] prc6hmi -1 -1613864+   1.964  __xnpod_schedule+0x77 (xnintr_clock_handler+0x105)
> :|  # [ 4936] -<?>-    0 -1613862+   3.127  __xnpod_schedule+0x45d (xnpod_suspend_thread+0x197)
> :|  # [ 4936] -<?>-    0 -1613859+   1.192  __xnpod_schedule+0x77 (xnpod_suspend_thread+0x171)
> :|  # [ 5066] prc6hmi -1 -1613858!  27.286  __xnpod_schedule+0x45d (xnintr_clock_handler+0x105)
> :|  *+[ 4936] -<?>-    0 -1613831+   1.824  xnpod_resume_thread+0x4e (gatekeeper_thread+0xf1)
> :|  # [  456] gatekee -1 -1613829+   4.200  __xnpod_schedule+0x77 (xnpod_schedule_handler+0x29)
> :|  # [ 4936] -<?>-    0 -1613825      0.952  __xnpod_schedule+0x77 (xnpod_suspend_thread+0x171)
> :|  # [  456] gatekee -1 -1613824!  13.001  __xnpod_schedule+0x45d (xnpod_schedule_handler+0x29)
> :|  *+[ 4936] -<?>-    0 -1613811      0.882  xnpod_resume_thread+0x4e (gatekeeper_thread+0xf1)
> :|  # [  456] gatekee -1 -1613810+   2.506  __xnpod_schedule+0x77 (xnpod_schedule_handler+0x29)
> :|  # [ 4936] -<?>-    0 -1613807      0.952  __xnpod_schedule+0x77 (xnpod_suspend_thread+0x171)
> :|  # [  456] gatekee -1 -1613806!  11.588  __xnpod_schedule+0x45d (xnpod_schedule_handler+0x29)
> :|  *+[ 4936] -<?>-    0 -1613795      0.721  xnpod_resume_thread+0x4e (gatekeeper_thread+0xf1)
> :|  # [  456] gatekee -1 -1613794+   6.395  __xnpod_schedule+0x77 (xnpod_schedule_handler+0x29)
> :|  # [ 4936] -<?>-    0 -1613788      0.922  __xnpod_schedule+0x77 (xnpod_suspend_thread+0x171)
> :|  # [  456] gatekee -1 -1613787!  10.124  __xnpod_schedule+0x45d (xnpod_schedule_handler+0x29)
> :|  *+[ 4936] -<?>-    0 -1613777      0.711  xnpod_resume_thread+0x4e (gatekeeper_thread+0xf1)
> :|  # [  456] gatekee -1 -1613776+   2.686  __xnpod_schedule+0x77 (xnpod_schedule_handler+0x29)
> :|  # [ 4936] -<?>-    0 -1613773      0.892  __xnpod_schedule+0x77 (xnpod_suspend_thread+0x197)
> :|  # [  456] gatekee -1 -1613772! 1977.417  __xnpod_schedule+0x45d (xnpod_schedule_handler+0x29)
> :|  # [ 4956] -<?>-   50 -1611795+   1.794  xnpod_resume_thread+0x4e (xnthread_timeout_handler+0x1e)
> ...
> ...
> ...
> :|  # event   tick@-5129 -5382+   4.771  xntimer_next_local_shot+0xb4 (xntimer_tick_aperiodic+0x1a0)
> :|  # [ 4961] -<?>-   -1 -5377+   3.448  __xnpod_schedule+0x77 (xnintr_clock_handler+0x105)
> :|  # [ 4936] -<?>-    0 -5374+   7.027  __xnpod_schedule+0x45d (xnpod_suspend_thread+0x197)
> :|  # [ 4936] -<?>-    0 -5367+   1.172  __xnpod_schedule+0x77 (xnpod_suspend_thread+0x171)
> :|  # [ 4961] -<?>-   -1 -5366! 253.408  __xnpod_schedule+0x45d (xnintr_clock_handler+0x105)
> :|  # [ 4957] -<?>-   50 -5112+   1.884  xnpod_resume_thread+0x4e (xnthread_periodic_handler+0x28)
> :|  # event   tick@-4525 -5110+   3.548  xntimer_next_local_shot+0xb4 (xntimer_tick_aperiodic+0x1a0)
> :|  # [ 4961] -<?>-   -1 -5107+   2.476  __xnpod_schedule+0x77 (xnintr_clock_handler+0x105)
> :|  # [ 4957] -<?>-   50 -5104!  59.785  __xnpod_schedule+0x45d (xnpod_suspend_thread+0x197)
> :|  # [ 4957] -<?>-   50 -5045+   1.794  __xnpod_schedule+0x77 (xnpod_suspend_thread+0x197)
> :|  # [ 4961] -<?>-   -1 -5043! 538.795  __xnpod_schedule+0x45d (xnintr_clock_handler+0x105)
> :|  # [ 5012] -<?>-   99 -4504+   2.195  xnpod_resume_thread+0x4e (xnthread_periodic_handler+0x28)
> :|  # event   tick@-129  -4502+   4.180  xntimer_next_local_shot+0xb4 (xntimer_tick_aperiodic+0x1a0)
> :|  # [ 4961] -<?>-   -1 -4498+   2.596  __xnpod_schedule+0x77 (xnintr_clock_handler+0x105)
> :|  # [ 5012] -<?>-   99 -4495! 677.824  __xnpod_schedule+0x45d (xnpod_suspend_thread+0x197)
> :|  # [ 5012] -<?>-   99 -3817+   5.844  __xnpod_schedule+0x77 (xnpod_suspend_thread+0x197)
> :|  # [ 4961] -<?>-   -1 -3811! 3709.015  __xnpod_schedule+0x45d (xnintr_clock_handler+0x105)
> :|  # [ 4957] -<?>-   50  -102+   3.147  xnpod_resume_thread+0x4e (xnthread_periodic_handler+0x28)
> :|  # event   tick@474     -99+   4.210  xntimer_next_local_shot+0xb4 (xntimer_tick_aperiodic+0x1a0)
> :|  # [ 4961] -<?>-   -1   -95+   3.388  __xnpod_schedule+0x77 (xnintr_clock_handler+0x105)
> :|  # [ 4957] -<?>-   50   -92!  55.645  __xnpod_schedule+0x45d (xnpod_suspend_thread+0x197)
> :|  # [ 4957] -<?>-   50   -36+   2.065  __xnpod_schedule+0x77 (xnpod_suspend_thread+0x197)
> :|  # [ 4961] -<?>-   -1   -34!  34.474  __xnpod_schedule+0x45d (xnintr_clock_handler+0x105)
> <    #freeze  0x00000001     0    489.264  update_curr_rt+0x10f (task_tick_rt+0x15)
>  |  # [ 5012] -<?>-   99   489      1.724  xnpod_resume_thread+0x4e (xnthread_periodic_handler+0x28)
>  |  # event   tick@4870    490      3.147  xntimer_next_local_shot+0xb4 (xntimer_tick_aperiodic+0x1a0)
>  |  # [    0] -<?>-   -1   494      2.866  __xnpod_schedule+0x77 (xnintr_clock_handler+0x105)
>  |  # [ 5012] -<?>-   99   497    662.948  __xnpod_schedule+0x45d (xnpod_suspend_thread+0x197)
>  |  # [ 5012] -<?>-   99  1159      3.608  __xnpod_schedule+0x77 (xnpod_suspend_thread+0x197)
>  |  # [    0] -<?>-   -1  1163    3720.243  __xnpod_schedule+0x45d (xnintr_clock_handler+0x105)
>  |  # [ 4957] -<?>-   50  4883      2.085  xnpod_resume_thread+0x4e (xnthread_periodic_handler+0x28)
>  |  # event   tick@5474   4885      3.368  xntimer_next_local_shot+0xb4 (xntimer_tick_aperiodic+0x1a0)
>  |  # [    0] -<?>-   -1  4889      1.233  __xnpod_schedule+0x77 (xnintr_clock_handler+0x105)
>  |  # [ 4957] -<?>-   50  4890      0.000  __xnpod_schedule+0x45d (xnpod_suspend_thread+0x197)
> //================== frozen file end =======================
> 
> There is no process utilizing the CPU  for > 950000 us, which
> leads me to the conclusion, that the problem is not caused by
> an infinite loop.

The problem is caused by processes scheduled by the Linux scheduler
utilizing the CPU without ever releasing it. These may be different
processes, you are right.

> However I observe delay values of up to 3900 us multiple times which
> makes me think that there is a high CPU utilization which in the
> worst case exceeds the limit of 950000 us per 1 second interval.
> Probably I would have to sum up all delay values of processes in 
> secondary mode within 1 second time interval and see if it exceeds
> 950000 us.
> 
> Now I have the following questions
> 1. What is meant by "domain stalled", "current" and "current+stalled"?
> 2. What is that process having ID "[   0]"
> 3. How can I tell processes running in secondary mode from others?
>    (just by looking at the priority value?)
> 
> Any hint on interpreting my frozen file is appreciated.

This trace is worthless. To have a meaningful trace, enable tracing
on function entry (the default when you enable the I-pipe tracer).

-- 
					    Gilles.
https://click-hack.org



* Re: [Xenomai] "RT throttling" issue
  2015-12-04 16:33           ` Gilles Chanteperdrix
@ 2015-12-04 19:09             ` Jochen Behnke
  2015-12-04 20:41               ` Gilles Chanteperdrix
  0 siblings, 1 reply; 13+ messages in thread
From: Jochen Behnke @ 2015-12-04 19:09 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

Hello Gilles,

thanks for your response.

>>
>> Now I have the following questions
>> 1. What is meant by "domain stalled", "current" and "current+stalled"?
>> 2. What is that process having ID "[   0]"
>> 3. How can I tell processes running in secondary mode from others?
>>     (just by looking at the priority value?)
>>
>> Any hint on interpreting my frozen file is appreciated.
>
> This trace is worthless. To have a meaningful trace, enable tracing
> on functions entry (the default when you enable the I-pipe tracer).
>
Do you mean CONFIG_FUNCTION_TRACE?

Regards
Jochen




* Re: [Xenomai] "RT throttling" issue
  2015-12-04 19:09             ` Jochen Behnke
@ 2015-12-04 20:41               ` Gilles Chanteperdrix
  2015-12-08 10:42                 ` Jochen Behnke
  0 siblings, 1 reply; 13+ messages in thread
From: Gilles Chanteperdrix @ 2015-12-04 20:41 UTC (permalink / raw)
  To: Jochen Behnke; +Cc: xenomai

On Fri, Dec 04, 2015 at 08:09:09PM +0100, Jochen Behnke wrote:
> Hello Gilles,
> 
> thanks for your response.
> 
> >>
> >> Now I have the following questions
> >> 1. What is meant by "domain stalled", "current" and "current+stalled"?
> >> 2. What is that process having ID "[   0]"
> >> 3. How can I tell processes running in secondary mode from others?
> >>     (just by looking at the priority value?)
> >>
> >> Any hint on interpreting my frozen file is appreciated.
> >
> > This trace is worthless. To have a meaningful trace, enable tracing
> > on functions entry (the default when you enable the I-pipe tracer).
> >
> Do you mean CONFIG_FUNCTION_TRACE?

I mean CONFIG_IPIPE_TRACE_MCOUNT

-- 
					    Gilles.
https://click-hack.org



* Re: [Xenomai] "RT throttling" issue
  2015-12-04 20:41               ` Gilles Chanteperdrix
@ 2015-12-08 10:42                 ` Jochen Behnke
  2015-12-08 11:05                   ` Gilles Chanteperdrix
  0 siblings, 1 reply; 13+ messages in thread
From: Jochen Behnke @ 2015-12-08 10:42 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

Hello Gilles,

> > >> Now I have the following questions:
> > >> 1. What is meant by "domain stalled", "current" and "current+stalled"?
> > >> 2. What is that process having ID "[   0]"?
> > >> 3. How can I tell processes running in secondary mode from others?
> > >>    (just by looking at the priority value?)
> > >>
> > >> Any hint on interpreting my frozen file is appreciated.
> > >
> > > This trace is worthless. To have a meaningful trace, enable tracing
> > > on function entry (the default when you enable the I-pipe tracer).
> > >
> > Do you mean CONFIG_FUNCTION_TRACE?
>
> I mean CONFIG_IPIPE_TRACE_MCOUNT
>

I enabled CONFIG_IPIPE_TRACE_MCOUNT=y and set CONFIG_IPIPE_TRACE_SHIFT=18
to get 256000 lines of backtrace history.
I captured some frozen files when RT throttling occurred and during
normal operation.
They look very similar to me, and I am not sure which functions
I should look at.

The longest delay values (>= 1000 us) I observe for the following
functions. They occur every 5..10 ms:
__ipipe_halt_root           : delays = 1000 us .. 3900 us
__ipipe_restore_root_nosync : delays = 1000 us .. 3900 us

As "__ipipe_halt_root" is called by "default_idle", and the delay values
increase (up to 9984 us) when the application is not running, I guess
this is actually CPU idle time and not relevant.

"ipipe_root_only" also occasionally has long delay times.
Is ipipe_root_only run when Linux is executing?
Is this critical?
See the extract below.
================== start: extract from frozen file ================================
:    +func               -420045+   1.443  ipipe_root_only+0x4 (copy_to_user+0x31)
:    +func               -420044      0.260  __ipipe_syscall_root+0x5 (sysenter_past_esp+0x5b)
:    +func               -420043      0.200  sys_time+0x5 (sysenter_do_call+0x12)
:    +func               -420043      0.210  get_seconds+0x3 (sys_time+0xf)
:    +func               -420043! 3910.393  ipipe_root_only+0x4 (sys_time+0x1c)
:|   +func               -416132      0.651  __ipipe_handle_irq+0x5 (common_interrupt+0x2c)
:|   +func               -416132      0.340  __ipipe_dispatch_irq+0x9 (__ipipe_handle_irq+0x7f)
================== end: extract from frozen file ================================

Any other suggestions where to look?
Thanks in advance
Jochen

* Re: [Xenomai] "RT throttling" issue
  2015-12-08 10:42                 ` Jochen Behnke
@ 2015-12-08 11:05                   ` Gilles Chanteperdrix
  2015-12-08 12:12                     ` Jochen Behnke
  0 siblings, 1 reply; 13+ messages in thread
From: Gilles Chanteperdrix @ 2015-12-08 11:05 UTC (permalink / raw)
  To: Jochen Behnke; +Cc: xenomai

On Tue, Dec 08, 2015 at 11:42:32AM +0100, Jochen Behnke wrote:
> "ipipe_root_only" also occasionally has long delay times.
> Is ipipe_root_only run when Linux is executing?
> Is this critical?
> See the extract below.
> ================== start: extract from frozen file ================================
> :    +func               -420045+   1.443  ipipe_root_only+0x4 (copy_to_user+0x31)
> :    +func               -420044      0.260  __ipipe_syscall_root+0x5 (sysenter_past_esp+0x5b)
> :    +func               -420043      0.200  sys_time+0x5 (sysenter_do_call+0x12)
> :    +func               -420043      0.210  get_seconds+0x3 (sys_time+0xf)
> :    +func               -420043! 3910.393  ipipe_root_only+0x4 (sys_time+0x1c)
> :|   +func               -416132      0.651  __ipipe_handle_irq+0x5 (common_interrupt+0x2c)

Hi,

What you have here is a system call to the "time" function from a
task scheduled by the Linux scheduler. When the system call is finished,
the system returns to user-space. ipipe_root_only happens to be the
last function called before the return to user-space and before the
next interrupt, so ipipe_root_only gets charged for the time spent
in user-space.

What version of the I-pipe patch are you using, on which
architecture? On some architectures we try to create trace points
when entering and leaving the kernel to avoid this issue (with the
added benefit that the user-space value of the PC gets printed, which
may be useful if you have an infinite loop).

Regards.

-- 
					    Gilles.
https://click-hack.org



* Re: [Xenomai] "RT throttling" issue
  2015-12-08 11:05                   ` Gilles Chanteperdrix
@ 2015-12-08 12:12                     ` Jochen Behnke
  0 siblings, 0 replies; 13+ messages in thread
From: Jochen Behnke @ 2015-12-08 12:12 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

> Sent: Tuesday, 08 December 2015 at 12:05
> From: "Gilles Chanteperdrix" <gilles.chanteperdrix@xenomai.org>
> To: "Jochen Behnke" <JK.Behnke@web.de>
> Cc: xenomai@xenomai.org
> Subject: Re: [Xenomai] "RT throttling" issue
> On Tue, Dec 08, 2015 at 11:42:32AM +0100, Jochen Behnke wrote:
> > "ipipe_root_only" also occasionally has long delay times.
> > Is ipipe_root_only run when Linux is executing?
> > Is this critical?
> > See the extract below.
> > ================== start: extract from frozen file ================================
> > :    +func               -420045+   1.443  ipipe_root_only+0x4 (copy_to_user+0x31)
> > :    +func               -420044      0.260  __ipipe_syscall_root+0x5 (sysenter_past_esp+0x5b)
> > :    +func               -420043      0.200  sys_time+0x5 (sysenter_do_call+0x12)
> > :    +func               -420043      0.210  get_seconds+0x3 (sys_time+0xf)
> > :    +func               -420043! 3910.393  ipipe_root_only+0x4 (sys_time+0x1c)
> > :|   +func               -416132      0.651  __ipipe_handle_irq+0x5 (common_interrupt+0x2c)
>
> Hi,
>
> What you have here is a system call to the "time" function from a
> task scheduled by Linux scheduler. When the system call is finished,
> the system returns to user-space. ipipe_root_only happens to be the
> last function called before the return to user-space and before the
> next interrupt. ipipe_root_only gets then charged for the time spent
> in user-space.
>
> What version of the I-pipe patch are you using on which
> architecture? On some architectures we try to create trace points
> when entering and leaving the kernel to avoid this issue (with the
> added benefit that the user-space value of the PC gets printed, this
> may be useful if you have an infinite loop).
Xenomai:      2.6.4 githead
Architecture: x86
Platform:     Atom E640/E660
I-pipe patch: ipipe-core-3.8.13-x86-4.patch


Regards
Jochen


