All of lore.kernel.org
 help / color / mirror / Atom feed
* Hunting mode switches
@ 2021-05-31 16:06 François Legal
  2021-05-31 16:14 ` Jan Kiszka
  0 siblings, 1 reply; 16+ messages in thread
From: François Legal @ 2021-05-31 16:06 UTC (permalink / raw)
  To: xenomai

Hello,

I'm having trouble hunting mode switches on an arm embedded application.

When I look at cat /proc/xenomai/sched/stat

CPU  PID    MSW        CSW        XSC        PF    STAT       %CPU  NAME
  0  0      0          476986     0          0     00218000   84.5  [ROOT/0]
  1  0      0          137743     0          0     00218000   99.3  [ROOT/1]
  1  667    0          137550     0          0     00200042    0.7  [rtnet-stack]
  0  669    0          2          0          0     00220042    0.0  [rtnet-rtpc]
  1  901    110        218        203        0     00264044    0.0  APP_Main
  0  911    1          1216       3524       0     00240046    0.0  ThreadPar
  0  912    3          56683      3531       0     00240046    0.2  ThreadMessage
  0  913    309702     470522     2152301    0     00240046    8.9  ThreadExec
  0  914    7          62         71         0     00240042    0.0  TheadGO
  0  915    3          647        1190       0     00248046    0.0  ThreadDbg
  0  916    1          2024       3087       0     00248046    0.0  ThreadCt
  0  917    1          35801      320489     0     00248046    0.8  ThreadSupInv
  0  918    1          1231       3596       0     00248046    0.0  ThreadMessage2
  0  919    1          161554     479305     0     00248042    1.4  ThreadSched
  0  0      0          247850     0          0     00000000    0.4  [IRQ18: [timer]]
  1  0      0          15419      0          0     00000000    0.0  [IRQ18: [timer]]
  0  0      0          137548     0          0     00000000    1.5  [IRQ31: rteth0]
  1  0      0          0          0          0     00000000    0.0  [IRQ31: rteth0]
  0  0      0          1          0          0     00000000    0.0  [IRQ118: rteth1-tx]
  1  0      0          0          0          0     00000000    0.0  [IRQ118: rteth1-tx]
  0  0      0          0          0          0     00000000    0.0  [IRQ119: rteth1-rx]
  1  0      0          0          0          0     00000000    0.0  [IRQ119: rteth1-rx]

But when following the procedure in [1], I get
no slacker

I can however see that Thread 913 seem to switch to secondary more than once out of 2 cycles !
What am I doing wrong.

Thanks in advance

François

[1] : https://source.denx.de/Xenomai/xenomai/-/wikis/Finding_Spurious_Relaxes



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Hunting mode switches
  2021-05-31 16:06 Hunting mode switches François Legal
@ 2021-05-31 16:14 ` Jan Kiszka
  2021-05-31 18:57   ` François Legal
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Kiszka @ 2021-05-31 16:14 UTC (permalink / raw)
  To: François Legal, xenomai

On 31.05.21 18:06, François Legal via Xenomai wrote:
> Hello,
> 
> I'm having trouble hunting mode switches on an arm embedded application.
> 
> When I look at cat /proc/xenomai/sched/stat
> 
> CPU  PID    MSW        CSW        XSC        PF    STAT       %CPU  NAME
>   0  0      0          476986     0          0     00218000   84.5  [ROOT/0]
>   1  0      0          137743     0          0     00218000   99.3  [ROOT/1]
>   1  667    0          137550     0          0     00200042    0.7  [rtnet-stack]
>   0  669    0          2          0          0     00220042    0.0  [rtnet-rtpc]
>   1  901    110        218        203        0     00264044    0.0  APP_Main
>   0  911    1          1216       3524       0     00240046    0.0  ThreadPar
>   0  912    3          56683      3531       0     00240046    0.2  ThreadMessage
>   0  913    309702     470522     2152301    0     00240046    8.9  ThreadExec
>   0  914    7          62         71         0     00240042    0.0  TheadGO
>   0  915    3          647        1190       0     00248046    0.0  ThreadDbg
>   0  916    1          2024       3087       0     00248046    0.0  ThreadCt
>   0  917    1          35801      320489     0     00248046    0.8  ThreadSupInv
>   0  918    1          1231       3596       0     00248046    0.0  ThreadMessage2
>   0  919    1          161554     479305     0     00248042    1.4  ThreadSched
>   0  0      0          247850     0          0     00000000    0.4  [IRQ18: [timer]]
>   1  0      0          15419      0          0     00000000    0.0  [IRQ18: [timer]]
>   0  0      0          137548     0          0     00000000    1.5  [IRQ31: rteth0]
>   1  0      0          0          0          0     00000000    0.0  [IRQ31: rteth0]
>   0  0      0          1          0          0     00000000    0.0  [IRQ118: rteth1-tx]
>   1  0      0          0          0          0     00000000    0.0  [IRQ118: rteth1-tx]
>   0  0      0          0          0          0     00000000    0.0  [IRQ119: rteth1-rx]
>   1  0      0          0          0          0     00000000    0.0  [IRQ119: rteth1-rx]
> 
> But when following the procedure in [1], I get 
> no slacker
> 
> I can however see that Thread 913 seem to switch to secondary more than once out of 2 cycles !
> What am I doing wrong.
> 

CONFIG_XENO_OPT_DEBUG_TRACE_RELAX is on?

Did you also try the in-app instrumentation via SIGDEBUG, maybe
attaching gdb to the process to catch the unwanted cases?

Jan

> Thanks in advance
> 
> François
> 
> [1] : https://source.denx.de/Xenomai/xenomai/-/wikis/Finding_Spurious_Relaxes
> 
> 

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Hunting mode switches
  2021-05-31 16:14 ` Jan Kiszka
@ 2021-05-31 18:57   ` François Legal
  2021-06-01  7:47     ` François Legal
  0 siblings, 1 reply; 16+ messages in thread
From: François Legal @ 2021-05-31 18:57 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai

Le Lundi, Mai 31, 2021 18:14 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit:

> On 31.05.21 18:06, François Legal via Xenomai wrote:
> > Hello,
> >
> > I'm having trouble hunting mode switches on an arm embedded application.
> >
> > When I look at cat /proc/xenomai/sched/stat
> >
> > CPU  PID    MSW        CSW        XSC        PF    STAT       %CPU  NAME
> >   0  0      0          476986     0          0     00218000   84.5  [ROOT/0]
> >   1  0      0          137743     0          0     00218000   99.3  [ROOT/1]
> >   1  667    0          137550     0          0     00200042    0.7  [rtnet-stack]
> >   0  669    0          2          0          0     00220042    0.0  [rtnet-rtpc]
> >   1  901    110        218        203        0     00264044    0.0  APP_Main
> >   0  911    1          1216       3524       0     00240046    0.0  ThreadPar
> >   0  912    3          56683      3531       0     00240046    0.2  ThreadMessage
> >   0  913    309702     470522     2152301    0     00240046    8.9  ThreadExec
> >   0  914    7          62         71         0     00240042    0.0  TheadGO
> >   0  915    3          647        1190       0     00248046    0.0  ThreadDbg
> >   0  916    1          2024       3087       0     00248046    0.0  ThreadCt
> >   0  917    1          35801      320489     0     00248046    0.8  ThreadSupInv
> >   0  918    1          1231       3596       0     00248046    0.0  ThreadMessage2
> >   0  919    1          161554     479305     0     00248042    1.4  ThreadSched
> >   0  0      0          247850     0          0     00000000    0.4  [IRQ18: [timer]]
> >   1  0      0          15419      0          0     00000000    0.0  [IRQ18: [timer]]
> >   0  0      0          137548     0          0     00000000    1.5  [IRQ31: rteth0]
> >   1  0      0          0          0          0     00000000    0.0  [IRQ31: rteth0]
> >   0  0      0          1          0          0     00000000    0.0  [IRQ118: rteth1-tx]
> >   1  0      0          0          0          0     00000000    0.0  [IRQ118: rteth1-tx]
> >   0  0      0          0          0          0     00000000    0.0  [IRQ119: rteth1-rx]
> >   1  0      0          0          0          0     00000000    0.0  [IRQ119: rteth1-rx]
> >
> > But when following the procedure in [1], I get
> > no slacker
> >
> > I can however see that Thread 913 seem to switch to secondary more than once out of 2 cycles !
> > What am I doing wrong.
> >
>
> CONFIG_XENO_OPT_DEBUG_TRACE_RELAX is on?
>

Yes, it is.

> Did you also try the in-app instrumentation via SIGDEBUG, maybe
> attaching gdb to the process to catch the unwanted cases?
>

No, not yet. It seems mode switches happen hundreds of time every seconds, so I was willing to try the slackspot.
I'll try that tomorrow and report back to the list

> Jan
>
> > Thanks in advance
> >
> > François
> >
> > [1] : https://source.denx.de/Xenomai/xenomai/-/wikis/Finding_Spurious_Relaxes
> >
> >
>
> --
> Siemens AG, T RDA IOT
> Corporate Competence Center Embedded Linux



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Hunting mode switches
  2021-05-31 18:57   ` François Legal
@ 2021-06-01  7:47     ` François Legal
  2021-06-01  8:31       ` Jan Kiszka
  2021-06-01  8:46       ` Philippe Gerum
  0 siblings, 2 replies; 16+ messages in thread
From: François Legal @ 2021-06-01  7:47 UTC (permalink / raw)
  Cc: Jan Kiszka, xenomai

Le Lundi, Mai 31, 2021 20:57 CEST, François Legal via Xenomai <xenomai@xenomai.org> a écrit:

> Le Lundi, Mai 31, 2021 18:14 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit:
>
> > On 31.05.21 18:06, François Legal via Xenomai wrote:
> > > Hello,
> > >
> > > I'm having trouble hunting mode switches on an arm embedded application.
> > >
> > > When I look at cat /proc/xenomai/sched/stat
> > >
> > > CPU  PID    MSW        CSW        XSC        PF    STAT       %CPU  NAME
> > >   0  0      0          476986     0          0     00218000   84.5  [ROOT/0]
> > >   1  0      0          137743     0          0     00218000   99.3  [ROOT/1]
> > >   1  667    0          137550     0          0     00200042    0.7  [rtnet-stack]
> > >   0  669    0          2          0          0     00220042    0.0  [rtnet-rtpc]
> > >   1  901    110        218        203        0     00264044    0.0  APP_Main
> > >   0  911    1          1216       3524       0     00240046    0.0  ThreadPar
> > >   0  912    3          56683      3531       0     00240046    0.2  ThreadMessage
> > >   0  913    309702     470522     2152301    0     00240046    8.9  ThreadExec
> > >   0  914    7          62         71         0     00240042    0.0  TheadGO
> > >   0  915    3          647        1190       0     00248046    0.0  ThreadDbg
> > >   0  916    1          2024       3087       0     00248046    0.0  ThreadCt
> > >   0  917    1          35801      320489     0     00248046    0.8  ThreadSupInv
> > >   0  918    1          1231       3596       0     00248046    0.0  ThreadMessage2
> > >   0  919    1          161554     479305     0     00248042    1.4  ThreadSched
> > >   0  0      0          247850     0          0     00000000    0.4  [IRQ18: [timer]]
> > >   1  0      0          15419      0          0     00000000    0.0  [IRQ18: [timer]]
> > >   0  0      0          137548     0          0     00000000    1.5  [IRQ31: rteth0]
> > >   1  0      0          0          0          0     00000000    0.0  [IRQ31: rteth0]
> > >   0  0      0          1          0          0     00000000    0.0  [IRQ118: rteth1-tx]
> > >   1  0      0          0          0          0     00000000    0.0  [IRQ118: rteth1-tx]
> > >   0  0      0          0          0          0     00000000    0.0  [IRQ119: rteth1-rx]
> > >   1  0      0          0          0          0     00000000    0.0  [IRQ119: rteth1-rx]
> > >
> > > But when following the procedure in [1], I get
> > > no slacker
> > >
> > > I can however see that Thread 913 seem to switch to secondary more than once out of 2 cycles !
> > > What am I doing wrong.
> > >
> >
> > CONFIG_XENO_OPT_DEBUG_TRACE_RELAX is on?
> >
>
> Yes, it is.
>
> > Did you also try the in-app instrumentation via SIGDEBUG, maybe
> > attaching gdb to the process to catch the unwanted cases?
> >
>
> No, not yet. It seems mode switches happen hundreds of time every seconds, so I was willing to try the slackspot.
> I'll try that tomorrow and report back to the list
>

So trying the manual way, there seem to be something happening, but I could not tell exactly what, and I not sure how to track this.

On thread ThreadExec, I get execution halted with SIGXCPU, and a backtrace indicating getpid(), then __pthread_kill() which I'm 100% sure I never call !

Moreover, when being halted at that point, all threads are not created, and all of which are created seem to have done 3 or 4 MSW each, and without triggering any break.

Moreover, in the stats capture below, the APP_Main (the main program entry), is looping on a 1s nanosleep, and it seems the MSW number increases once per second.

I'm really confused now.

François

> > Jan
> >
> > > Thanks in advance
> > >
> > > François
> > >
> > > [1] : https://source.denx.de/Xenomai/xenomai/-/wikis/Finding_Spurious_Relaxes
> > >
> > >
> >
> > --
> > Siemens AG, T RDA IOT
> > Corporate Competence Center Embedded Linux
>
>



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Hunting mode switches
  2021-06-01  7:47     ` François Legal
@ 2021-06-01  8:31       ` Jan Kiszka
  2021-06-02 10:26         ` François Legal
  2021-06-01  8:46       ` Philippe Gerum
  1 sibling, 1 reply; 16+ messages in thread
From: Jan Kiszka @ 2021-06-01  8:31 UTC (permalink / raw)
  To: François Legal; +Cc: xenomai

On 01.06.21 09:47, François Legal wrote:
> Le Lundi, Mai 31, 2021 20:57 CEST, François Legal via Xenomai <xenomai@xenomai.org> a écrit: 
>  
>> Le Lundi, Mai 31, 2021 18:14 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit: 
>>  
>>> On 31.05.21 18:06, François Legal via Xenomai wrote:
>>>> Hello,
>>>>
>>>> I'm having trouble hunting mode switches on an arm embedded application.
>>>>
>>>> When I look at cat /proc/xenomai/sched/stat
>>>>
>>>> CPU  PID    MSW        CSW        XSC        PF    STAT       %CPU  NAME
>>>>   0  0      0          476986     0          0     00218000   84.5  [ROOT/0]
>>>>   1  0      0          137743     0          0     00218000   99.3  [ROOT/1]
>>>>   1  667    0          137550     0          0     00200042    0.7  [rtnet-stack]
>>>>   0  669    0          2          0          0     00220042    0.0  [rtnet-rtpc]
>>>>   1  901    110        218        203        0     00264044    0.0  APP_Main
>>>>   0  911    1          1216       3524       0     00240046    0.0  ThreadPar
>>>>   0  912    3          56683      3531       0     00240046    0.2  ThreadMessage
>>>>   0  913    309702     470522     2152301    0     00240046    8.9  ThreadExec
>>>>   0  914    7          62         71         0     00240042    0.0  TheadGO
>>>>   0  915    3          647        1190       0     00248046    0.0  ThreadDbg
>>>>   0  916    1          2024       3087       0     00248046    0.0  ThreadCt
>>>>   0  917    1          35801      320489     0     00248046    0.8  ThreadSupInv
>>>>   0  918    1          1231       3596       0     00248046    0.0  ThreadMessage2
>>>>   0  919    1          161554     479305     0     00248042    1.4  ThreadSched
>>>>   0  0      0          247850     0          0     00000000    0.4  [IRQ18: [timer]]
>>>>   1  0      0          15419      0          0     00000000    0.0  [IRQ18: [timer]]
>>>>   0  0      0          137548     0          0     00000000    1.5  [IRQ31: rteth0]
>>>>   1  0      0          0          0          0     00000000    0.0  [IRQ31: rteth0]
>>>>   0  0      0          1          0          0     00000000    0.0  [IRQ118: rteth1-tx]
>>>>   1  0      0          0          0          0     00000000    0.0  [IRQ118: rteth1-tx]
>>>>   0  0      0          0          0          0     00000000    0.0  [IRQ119: rteth1-rx]
>>>>   1  0      0          0          0          0     00000000    0.0  [IRQ119: rteth1-rx]
>>>>
>>>> But when following the procedure in [1], I get 
>>>> no slacker
>>>>
>>>> I can however see that Thread 913 seem to switch to secondary more than once out of 2 cycles !
>>>> What am I doing wrong.
>>>>
>>>
>>> CONFIG_XENO_OPT_DEBUG_TRACE_RELAX is on?
>>>
>>
>> Yes, it is.
>>
>>> Did you also try the in-app instrumentation via SIGDEBUG, maybe
>>> attaching gdb to the process to catch the unwanted cases?
>>>
>>
>> No, not yet. It seems mode switches happen hundreds of time every seconds, so I was willing to try the slackspot.
>> I'll try that tomorrow and report back to the list 
>>
> 
> So trying the manual way, there seem to be something happening, but I could not tell exactly what, and I not sure how to track this.
> 
> On thread ThreadExec, I get execution halted with SIGXCPU, and a backtrace indicating getpid(), then __pthread_kill() which I'm 100% sure I never call !
> 
> Moreover, when being halted at that point, all threads are not created, and all of which are created seem to have done 3 or 4 MSW each, and without triggering any break.
> 
> Moreover, in the stats capture below, the APP_Main (the main program entry), is looping on a 1s nanosleep, and it seems the MSW number increases once per second.
> 
> I'm really confused now.
> 

Are you instrumenting only the part of the thread that has all
initialization done and until it starts to de-initialize again? You can
control when the mode switches should be reported and when not.

Then, if you have a backtrace and that backtrace contain a valid
syscall, you should also find a valid point in your own application when
this was started (indirectly). Maybe some library call?

Jan

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Hunting mode switches
  2021-06-01  7:47     ` François Legal
  2021-06-01  8:31       ` Jan Kiszka
@ 2021-06-01  8:46       ` Philippe Gerum
  1 sibling, 0 replies; 16+ messages in thread
From: Philippe Gerum @ 2021-06-01  8:46 UTC (permalink / raw)
  To: François Legal; +Cc: xenomai


François Legal via Xenomai <xenomai@xenomai.org> writes:

> Le Lundi, Mai 31, 2021 20:57 CEST, François Legal via Xenomai <xenomai@xenomai.org> a écrit: 
>  
>> Le Lundi, Mai 31, 2021 18:14 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit: 
>>  
>> > On 31.05.21 18:06, François Legal via Xenomai wrote:
>> > > Hello,
>> > > 
>> > > I'm having trouble hunting mode switches on an arm embedded application.
>> > > 
>> > > When I look at cat /proc/xenomai/sched/stat
>> > > 
>> > > CPU  PID    MSW        CSW        XSC        PF    STAT       %CPU  NAME
>> > >   0  0      0          476986     0          0     00218000   84.5  [ROOT/0]
>> > >   1  0      0          137743     0          0     00218000   99.3  [ROOT/1]
>> > >   1  667    0          137550     0          0     00200042    0.7  [rtnet-stack]
>> > >   0  669    0          2          0          0     00220042    0.0  [rtnet-rtpc]
>> > >   1  901    110        218        203        0     00264044    0.0  APP_Main
>> > >   0  911    1          1216       3524       0     00240046    0.0  ThreadPar
>> > >   0  912    3          56683      3531       0     00240046    0.2  ThreadMessage
>> > >   0  913    309702     470522     2152301    0     00240046    8.9  ThreadExec
>> > >   0  914    7          62         71         0     00240042    0.0  TheadGO
>> > >   0  915    3          647        1190       0     00248046    0.0  ThreadDbg
>> > >   0  916    1          2024       3087       0     00248046    0.0  ThreadCt
>> > >   0  917    1          35801      320489     0     00248046    0.8  ThreadSupInv
>> > >   0  918    1          1231       3596       0     00248046    0.0  ThreadMessage2
>> > >   0  919    1          161554     479305     0     00248042    1.4  ThreadSched
>> > >   0  0      0          247850     0          0     00000000    0.4  [IRQ18: [timer]]
>> > >   1  0      0          15419      0          0     00000000    0.0  [IRQ18: [timer]]
>> > >   0  0      0          137548     0          0     00000000    1.5  [IRQ31: rteth0]
>> > >   1  0      0          0          0          0     00000000    0.0  [IRQ31: rteth0]
>> > >   0  0      0          1          0          0     00000000    0.0  [IRQ118: rteth1-tx]
>> > >   1  0      0          0          0          0     00000000    0.0  [IRQ118: rteth1-tx]
>> > >   0  0      0          0          0          0     00000000    0.0  [IRQ119: rteth1-rx]
>> > >   1  0      0          0          0          0     00000000    0.0  [IRQ119: rteth1-rx]
>> > > 
>> > > But when following the procedure in [1], I get 
>> > > no slacker
>> > > 
>> > > I can however see that Thread 913 seem to switch to secondary more than once out of 2 cycles !
>> > > What am I doing wrong.
>> > > 
>> > 
>> > CONFIG_XENO_OPT_DEBUG_TRACE_RELAX is on?
>> > 
>> 
>> Yes, it is.
>> 
>> > Did you also try the in-app instrumentation via SIGDEBUG, maybe
>> > attaching gdb to the process to catch the unwanted cases?
>> > 
>> 
>> No, not yet. It seems mode switches happen hundreds of time every seconds, so I was willing to try the slackspot.
>> I'll try that tomorrow and report back to the list 
>> 
>
> So trying the manual way, there seem to be something happening, but I could not tell exactly what, and I not sure how to track this.
>
> On thread ThreadExec, I get execution halted with SIGXCPU, and a backtrace indicating getpid(), then __pthread_kill() which I'm 100% sure I never call !
>
> Moreover, when being halted at that point, all threads are not created, and all of which are created seem to have done 3 or 4 MSW each, and without triggering any break.
>
> Moreover, in the stats capture below, the APP_Main (the main program entry), is looping on a 1s nanosleep, and it seems the MSW number increases once per second.
>
> I'm really confused now.
>

You may be receiving a SIGDEBUG signal from the Cobalt core which is not
handled by cobalt_sigdebug_handler() in lib/cobalt (internal.c) and
therefore passed on to the original handler, likely SIG_DFL. So first
thing is to run the app over gdb, breakpoint into
cobalt_sigdebug_handler(), then inspect the cause code in siginfo and
the backtrace at this point (bt).

-- 
Philippe.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Hunting mode switches
  2021-06-01  8:31       ` Jan Kiszka
@ 2021-06-02 10:26         ` François Legal
  2021-06-02 10:59           ` Jan Kiszka
  0 siblings, 1 reply; 16+ messages in thread
From: François Legal @ 2021-06-02 10:26 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai

Le Mardi, Juin 01, 2021 10:31 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit:

> On 01.06.21 09:47, François Legal wrote:
> > Le Lundi, Mai 31, 2021 20:57 CEST, François Legal via Xenomai <xenomai@xenomai.org> a écrit:
> >
> >> Le Lundi, Mai 31, 2021 18:14 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit:
> >>
> >>> On 31.05.21 18:06, François Legal via Xenomai wrote:
> >>>> Hello,
> >>>>
> >>>> I'm having trouble hunting mode switches on an arm embedded application.
> >>>>
> >>>> When I look at cat /proc/xenomai/sched/stat
> >>>>
> >>>> CPU  PID    MSW        CSW        XSC        PF    STAT       %CPU  NAME
> >>>>   0  0      0          476986     0          0     00218000   84.5  [ROOT/0]
> >>>>   1  0      0          137743     0          0     00218000   99.3  [ROOT/1]
> >>>>   1  667    0          137550     0          0     00200042    0.7  [rtnet-stack]
> >>>>   0  669    0          2          0          0     00220042    0.0  [rtnet-rtpc]
> >>>>   1  901    110        218        203        0     00264044    0.0  APP_Main
> >>>>   0  911    1          1216       3524       0     00240046    0.0  ThreadPar
> >>>>   0  912    3          56683      3531       0     00240046    0.2  ThreadMessage
> >>>>   0  913    309702     470522     2152301    0     00240046    8.9  ThreadExec
> >>>>   0  914    7          62         71         0     00240042    0.0  TheadGO
> >>>>   0  915    3          647        1190       0     00248046    0.0  ThreadDbg
> >>>>   0  916    1          2024       3087       0     00248046    0.0  ThreadCt
> >>>>   0  917    1          35801      320489     0     00248046    0.8  ThreadSupInv
> >>>>   0  918    1          1231       3596       0     00248046    0.0  ThreadMessage2
> >>>>   0  919    1          161554     479305     0     00248042    1.4  ThreadSched
> >>>>   0  0      0          247850     0          0     00000000    0.4  [IRQ18: [timer]]
> >>>>   1  0      0          15419      0          0     00000000    0.0  [IRQ18: [timer]]
> >>>>   0  0      0          137548     0          0     00000000    1.5  [IRQ31: rteth0]
> >>>>   1  0      0          0          0          0     00000000    0.0  [IRQ31: rteth0]
> >>>>   0  0      0          1          0          0     00000000    0.0  [IRQ118: rteth1-tx]
> >>>>   1  0      0          0          0          0     00000000    0.0  [IRQ118: rteth1-tx]
> >>>>   0  0      0          0          0          0     00000000    0.0  [IRQ119: rteth1-rx]
> >>>>   1  0      0          0          0          0     00000000    0.0  [IRQ119: rteth1-rx]
> >>>>
> >>>> But when following the procedure in [1], I get
> >>>> no slacker
> >>>>
> >>>> I can however see that Thread 913 seem to switch to secondary more than once out of 2 cycles !
> >>>> What am I doing wrong.
> >>>>
> >>>
> >>> CONFIG_XENO_OPT_DEBUG_TRACE_RELAX is on?
> >>>
> >>
> >> Yes, it is.
> >>
> >>> Did you also try the in-app instrumentation via SIGDEBUG, maybe
> >>> attaching gdb to the process to catch the unwanted cases?
> >>>
> >>
> >> No, not yet. It seems mode switches happen hundreds of time every seconds, so I was willing to try the slackspot.
> >> I'll try that tomorrow and report back to the list
> >>
> >
> > So trying the manual way, there seem to be something happening, but I could not tell exactly what, and I not sure how to track this.
> >
> > On thread ThreadExec, I get execution halted with SIGXCPU, and a backtrace indicating getpid(), then __pthread_kill() which I'm 100% sure I never call !
> >
> > Moreover, when being halted at that point, all threads are not created, and all of which are created seem to have done 3 or 4 MSW each, and without triggering any break.
> >
> > Moreover, in the stats capture below, the APP_Main (the main program entry), is looping on a 1s nanosleep, and it seems the MSW number increases once per second.
> >
> > I'm really confused now.
> >
>
> Are you instrumenting only the part of the thread that has all
> initialization done and until it starts to de-initialize again? You can
> control when the mode switches should be reported and when not.
>
> Then, if you have a backtrace and that backtrace contain a valid
> syscall, you should also find a valid point in your own application when
> this was started (indirectly). Maybe some library call?
>
> Jan
>

So in the end, slackspot seems to work. I did not understand, by reading the manual, that slackspot/CONFIG_XENO_OPT_DEBUG_TRACE_RELAX also needed that each thread call pthread_setmode_np(0, PTHREAD_WARNSW, NULL);

So now, I've got a working backtrace, however, I'm still lost onto understanding what happens.

I believe I might be having some troubles with the signals, but I can't seem to find an explanation.
Maybe I should document more my app.

ThreadSched is pending on a timer (with sigwaitinfo). That time triggers every 750µs. When waking up, this thread will post one or 2 semaphores, that will unlock other threads (of which ThreadExec which seem to be doing MSW more than once every two activations).

I ended up modifying the app startup code because I was experiencing weirdness in the startup with debugger attached (some semaphores wouldn't unlock, ...). Anyhow, I ended up doing :

    sigemptyset(&sigHandling.sa_mask);
    sigHandling.sa_sigaction = sigdebug_handler;
    sigHandling.sa_flags = SA_SIGINFO;
    sigaction(SIGDEBUG, &sigHandling, NULL);

    sigemptyset(&sigset);
    sigaddset (&sigset, SIGALRM);
    sigaddset (&sigset, SIGDEBUG);
    sigprocmask (SIG_BLOCK, &sigset, NULL);
    pthread_sigmask (SIG_BLOCK, &sigset, NULL);

before creating my RT threads (of which ThreadSched), then doing

        sigdelset(&sigset, SIGALRM);
        sigprocmask (SIG_UNBLOCK, &sigset, NULL);
        pthread_sigmask (SIG_UNBLOCK, &sigset, NULL);

        signal(SIGINT, sighandler_exit);
        signal(SIGTERM, sighandler_exit);
        signal(SIGUSR1, sighandler_exit);

After my RT threads are created.

If I comment out all of these, and add the pthread_setmode_np some place when initialisation of ThreadExec is over, I end up with the program being forcibly stopped with the indication [1]+  CPU time limit exceeded

So I put all those back in place, start again, then /proc/xenomai/debug/relax gets populated :
1115 75731 syscall ThreadExec
0x12720 /usr/lib/libcobalt.so.2.0.0
0x127b0 /usr/lib/libcobalt.so.2.0.0
0x2cd40 /usr/lib/libc-2.29.so
.
/usr/bin/CCN2019_DSP
1115 80299 syscall ThreadExec
0xd58a0 /usr/lib/libc-2.29.so
.

On this, slackspot gives :
Thread[1115] "Module d'Execution des Blocs" started by /usr/bin/CCN2019_DSP (107769 times):
Caused by: syscall
   #0  0x00000000000d589f prctl() in /home/dev/crosstool/arm-cortexa9_neon-linux-gnueabihf/.build/src/glibc-2.29/misc/../sysdeps/unix/syscall-template.S:78

Thread[1115] "Module d'Execution des Blocs" started by /usr/bin/CCN2019_DSP (101892 times):
Caused by: syscall
   #0  0x000000000001271f ??? in [/usr/lib/libcobalt.so.2.0.0]
   #1  0x00000000000127af ??? in [/usr/lib/libcobalt.so.2.0.0]
   #2  0x000000000002cd3f __default_sa_restorer() in /home/dev/crosstool/arm-cortexa9_neon-linux-gnueabihf/.build/src/glibc-2.29/signal/../sysdeps/unix/sysv/linux/arm/sigrestorer.S:77

So know, I really don't know where to go from here !

Thanks in advance

François

> --
> Siemens AG, T RDA IOT
> Corporate Competence Center Embedded Linux



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Hunting mode switches
  2021-06-02 10:26         ` François Legal
@ 2021-06-02 10:59           ` Jan Kiszka
  2021-06-02 15:06             ` François Legal
  2021-06-03  8:57             ` François Legal
  0 siblings, 2 replies; 16+ messages in thread
From: Jan Kiszka @ 2021-06-02 10:59 UTC (permalink / raw)
  To: François Legal; +Cc: xenomai

On 02.06.21 12:26, François Legal wrote:
> Le Mardi, Juin 01, 2021 10:31 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit: 
>  
>> On 01.06.21 09:47, François Legal wrote:
>>> Le Lundi, Mai 31, 2021 20:57 CEST, François Legal via Xenomai <xenomai@xenomai.org> a écrit: 
>>>  
>>>> Le Lundi, Mai 31, 2021 18:14 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit: 
>>>>  
>>>>> On 31.05.21 18:06, François Legal via Xenomai wrote:
>>>>>> Hello,
>>>>>>
>>>>>> I'm having trouble hunting mode switches on an arm embedded application.
>>>>>>
>>>>>> When I look at cat /proc/xenomai/sched/stat
>>>>>>
>>>>>> CPU  PID    MSW        CSW        XSC        PF    STAT       %CPU  NAME
>>>>>>   0  0      0          476986     0          0     00218000   84.5  [ROOT/0]
>>>>>>   1  0      0          137743     0          0     00218000   99.3  [ROOT/1]
>>>>>>   1  667    0          137550     0          0     00200042    0.7  [rtnet-stack]
>>>>>>   0  669    0          2          0          0     00220042    0.0  [rtnet-rtpc]
>>>>>>   1  901    110        218        203        0     00264044    0.0  APP_Main
>>>>>>   0  911    1          1216       3524       0     00240046    0.0  ThreadPar
>>>>>>   0  912    3          56683      3531       0     00240046    0.2  ThreadMessage
>>>>>>   0  913    309702     470522     2152301    0     00240046    8.9  ThreadExec
>>>>>>   0  914    7          62         71         0     00240042    0.0  TheadGO
>>>>>>   0  915    3          647        1190       0     00248046    0.0  ThreadDbg
>>>>>>   0  916    1          2024       3087       0     00248046    0.0  ThreadCt
>>>>>>   0  917    1          35801      320489     0     00248046    0.8  ThreadSupInv
>>>>>>   0  918    1          1231       3596       0     00248046    0.0  ThreadMessage2
>>>>>>   0  919    1          161554     479305     0     00248042    1.4  ThreadSched
>>>>>>   0  0      0          247850     0          0     00000000    0.4  [IRQ18: [timer]]
>>>>>>   1  0      0          15419      0          0     00000000    0.0  [IRQ18: [timer]]
>>>>>>   0  0      0          137548     0          0     00000000    1.5  [IRQ31: rteth0]
>>>>>>   1  0      0          0          0          0     00000000    0.0  [IRQ31: rteth0]
>>>>>>   0  0      0          1          0          0     00000000    0.0  [IRQ118: rteth1-tx]
>>>>>>   1  0      0          0          0          0     00000000    0.0  [IRQ118: rteth1-tx]
>>>>>>   0  0      0          0          0          0     00000000    0.0  [IRQ119: rteth1-rx]
>>>>>>   1  0      0          0          0          0     00000000    0.0  [IRQ119: rteth1-rx]
>>>>>>
>>>>>> But when following the procedure in [1], I get 
>>>>>> no slacker
>>>>>>
>>>>>> I can however see that Thread 913 seem to switch to secondary more than once out of 2 cycles !
>>>>>> What am I doing wrong.
>>>>>>
>>>>>
>>>>> CONFIG_XENO_OPT_DEBUG_TRACE_RELAX is on?
>>>>>
>>>>
>>>> Yes, it is.
>>>>
>>>>> Did you also try the in-app instrumentation via SIGDEBUG, maybe
>>>>> attaching gdb to the process to catch the unwanted cases?
>>>>>
>>>>
>>>> No, not yet. It seems mode switches happen hundreds of time every seconds, so I was willing to try the slackspot.
>>>> I'll try that tomorrow and report back to the list 
>>>>
>>>
>>> So trying the manual way, there seem to be something happening, but I could not tell exactly what, and I not sure how to track this.
>>>
>>> On thread ThreadExec, I get execution halted with SIGXCPU, and a backtrace indicating getpid(), then __pthread_kill() which I'm 100% sure I never call !
>>>
>>> Moreover, when being halted at that point, all threads are not created, and all of which are created seem to have done 3 or 4 MSW each, and without triggering any break.
>>>
>>> Moreover, in the stats capture below, the APP_Main (the main program entry), is looping on a 1s nanosleep, and it seems the MSW number increases once per second.
>>>
>>> I'm really confused now.
>>>
>>
>> Are you instrumenting only the part of the thread that has all
>> initialization done and until it starts to de-initialize again? You can
>> control when the mode switches should be reported and when not.
>>
>> Then, if you have a backtrace and that backtrace contain a valid
>> syscall, you should also find a valid point in your own application when
>> this was started (indirectly). Maybe some library call?
>>
>> Jan
>>
> 
> So in the end, slackspot seems to work. I did not understand, by reading the manual, that slackspot/CONFIG_XENO_OPT_DEBUG_TRACE_RELAX also needed that each thread call pthread_setmode_np(0, PTHREAD_WARNSW, NULL);
> 

We are happy to take patches - also against the doc / the wiki.

> So now, I've got a working backtrace, however, I'm still lost onto understanding what happens.
> 
> I believe I might be having some troubles with the signals, but I can't seem to find an explanation.
> Maybe I should document more my app.
> 
> ThreadSched is pending on a timer (with sigwaitinfo). That time triggers every 750µs. When waking up, this thread will post one or 2 semaphores, that will unlock other threads (of which ThreadExec which seem to be doing MSW more than once every two activations).
> 
> I ended up modifying the app startup code because I was experiencing weirdness in the startup with debugger attached (some semaphores wouldn't unlock, ...). Anyhow, I ended up doing :
> 
>     sigemptyset(&sigHandling.sa_mask);
>     sigHandling.sa_sigaction = sigdebug_handler;
>     sigHandling.sa_flags = SA_SIGINFO;
>     sigaction(SIGDEBUG, &sigHandling, NULL);
> 
>     sigemptyset(&sigset);
>     sigaddset (&sigset, SIGALRM);
>     sigaddset (&sigset, SIGDEBUG);
>     sigprocmask (SIG_BLOCK, &sigset, NULL);
>     pthread_sigmask (SIG_BLOCK, &sigset, NULL);
> 
> before creating my RT threads (of which ThreadSched), then doing
> 
>         sigdelset(&sigset, SIGALRM);
>         sigprocmask (SIG_UNBLOCK, &sigset, NULL);
>         pthread_sigmask (SIG_UNBLOCK, &sigset, NULL);
> 
>         signal(SIGINT, sighandler_exit);
>         signal(SIGTERM, sighandler_exit);
>         signal(SIGUSR1, sighandler_exit);
> 
> After my RT threads are created.
> 
> If I comment out all of these, and add the pthread_setmode_np some place when initialisation of ThreadExec is over, I end up with the program being forcibly stopped with the indication [1]+  CPU time limit exceeded
> 
> So I put all those back in place, start again, then /proc/xenomai/debug/relax gets populated :
> 1115 75731 syscall ThreadExec 
> 0x12720 /usr/lib/libcobalt.so.2.0.0
> 0x127b0 /usr/lib/libcobalt.so.2.0.0
> 0x2cd40 /usr/lib/libc-2.29.so
> .
> /usr/bin/CCN2019_DSP
> 1115 80299 syscall ThreadExec 
> 0xd58a0 /usr/lib/libc-2.29.so
> .
> 
> On this, slackspot gives :
> Thread[1115] "Module d'Execution des Blocs" started by /usr/bin/CCN2019_DSP (107769 times):
> Caused by: syscall
>    #0  0x00000000000d589f prctl() in /home/dev/crosstool/arm-cortexa9_neon-linux-gnueabihf/.build/src/glibc-2.29/misc/../sysdeps/unix/syscall-template.S:78
> 
> Thread[1115] "Module d'Execution des Blocs" started by /usr/bin/CCN2019_DSP (101892 times):
> Caused by: syscall
>    #0  0x000000000001271f ??? in [/usr/lib/libcobalt.so.2.0.0]
>    #1  0x00000000000127af ??? in [/usr/lib/libcobalt.so.2.0.0]
>    #2  0x000000000002cd3f __default_sa_restorer() in /home/dev/crosstool/arm-cortexa9_neon-linux-gnueabihf/.build/src/glibc-2.29/signal/../sysdeps/unix/sysv/linux/arm/sigrestorer.S:77
> 
> So know, I really don't know where to go from here !
> 

Could you recompile with debugging symbols for libcobalt and your
application and check where exactly the Linux syscall is issued and what
library call from your application relates to it? gdb may provide this
better.

Jan

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Hunting mode switches
  2021-06-02 10:59           ` Jan Kiszka
@ 2021-06-02 15:06             ` François Legal
  2021-06-02 15:09               ` François Legal
  2021-06-03  8:57             ` François Legal
  1 sibling, 1 reply; 16+ messages in thread
From: François Legal @ 2021-06-02 15:06 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai

Le Mercredi, Juin 02, 2021 12:59 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit:

> On 02.06.21 12:26, François Legal wrote:
> > Le Mardi, Juin 01, 2021 10:31 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit:
> >
> >> On 01.06.21 09:47, François Legal wrote:
> >>> Le Lundi, Mai 31, 2021 20:57 CEST, François Legal via Xenomai <xenomai@xenomai.org> a écrit:
> >>>
> >>>> Le Lundi, Mai 31, 2021 18:14 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit:
> >>>>
> >>>>> On 31.05.21 18:06, François Legal via Xenomai wrote:
> >>>>>> Hello,
> >>>>>>
> >>>>>> I'm having trouble hunting mode switches on an arm embedded application.
> >>>>>>
> >>>>>> When I look at cat /proc/xenomai/sched/stat
> >>>>>>
> >>>>>> CPU  PID    MSW        CSW        XSC        PF    STAT       %CPU  NAME
> >>>>>>   0  0      0          476986     0          0     00218000   84.5  [ROOT/0]
> >>>>>>   1  0      0          137743     0          0     00218000   99.3  [ROOT/1]
> >>>>>>   1  667    0          137550     0          0     00200042    0.7  [rtnet-stack]
> >>>>>>   0  669    0          2          0          0     00220042    0.0  [rtnet-rtpc]
> >>>>>>   1  901    110        218        203        0     00264044    0.0  APP_Main
> >>>>>>   0  911    1          1216       3524       0     00240046    0.0  ThreadPar
> >>>>>>   0  912    3          56683      3531       0     00240046    0.2  ThreadMessage
> >>>>>>   0  913    309702     470522     2152301    0     00240046    8.9  ThreadExec
> >>>>>>   0  914    7          62         71         0     00240042    0.0  TheadGO
> >>>>>>   0  915    3          647        1190       0     00248046    0.0  ThreadDbg
> >>>>>>   0  916    1          2024       3087       0     00248046    0.0  ThreadCt
> >>>>>>   0  917    1          35801      320489     0     00248046    0.8  ThreadSupInv
> >>>>>>   0  918    1          1231       3596       0     00248046    0.0  ThreadMessage2
> >>>>>>   0  919    1          161554     479305     0     00248042    1.4  ThreadSched
> >>>>>>   0  0      0          247850     0          0     00000000    0.4  [IRQ18: [timer]]
> >>>>>>   1  0      0          15419      0          0     00000000    0.0  [IRQ18: [timer]]
> >>>>>>   0  0      0          137548     0          0     00000000    1.5  [IRQ31: rteth0]
> >>>>>>   1  0      0          0          0          0     00000000    0.0  [IRQ31: rteth0]
> >>>>>>   0  0      0          1          0          0     00000000    0.0  [IRQ118: rteth1-tx]
> >>>>>>   1  0      0          0          0          0     00000000    0.0  [IRQ118: rteth1-tx]
> >>>>>>   0  0      0          0          0          0     00000000    0.0  [IRQ119: rteth1-rx]
> >>>>>>   1  0      0          0          0          0     00000000    0.0  [IRQ119: rteth1-rx]
> >>>>>>
> >>>>>> But when following the procedure in [1], I get
> >>>>>> no slacker
> >>>>>>
> >>>>>> I can however see that Thread 913 seem to switch to secondary more than once out of 2 cycles !
> >>>>>> What am I doing wrong.
> >>>>>>
> >>>>>
> >>>>> CONFIG_XENO_OPT_DEBUG_TRACE_RELAX is on?
> >>>>>
> >>>>
> >>>> Yes, it is.
> >>>>
> >>>>> Did you also try the in-app instrumentation via SIGDEBUG, maybe
> >>>>> attaching gdb to the process to catch the unwanted cases?
> >>>>>
> >>>>
> >>>> No, not yet. It seems mode switches happen hundreds of time every seconds, so I was willing to try the slackspot.
> >>>> I'll try that tomorrow and report back to the list
> >>>>
> >>>
> >>> So trying the manual way, there seem to be something happening, but I could not tell exactly what, and I not sure how to track this.
> >>>
> >>> On thread ThreadExec, I get execution halted with SIGXCPU, and a backtrace indicating getpid(), then __pthread_kill() which I'm 100% sure I never call !
> >>>
> >>> Moreover, when being halted at that point, all threads are not created, and all of which are created seem to have done 3 or 4 MSW each, and without triggering any break.
> >>>
> >>> Moreover, in the stats capture below, the APP_Main (the main program entry), is looping on a 1s nanosleep, and it seems the MSW number increases once per second.
> >>>
> >>> I'm really confused now.
> >>>
> >>
> >> Are you instrumenting only the part of the thread that has all
> >> initialization done and until it starts to de-initialize again? You can
> >> control when the mode switches should be reported and when not.
> >>
> >> Then, if you have a backtrace and that backtrace contain a valid
> >> syscall, you should also find a valid point in your own application when
> >> this was started (indirectly). Maybe some library call?
> >>
> >> Jan
> >>
> >
> > So in the end, slackspot seems to work. I did not understand, by reading the manual, that slackspot/CONFIG_XENO_OPT_DEBUG_TRACE_RELAX also needed that each thread call pthread_setmode_np(0, PTHREAD_WARNSW, NULL);
> >
>
> We are happy to take patches - also against the doc / the wiki.
>
> > So now, I've got a working backtrace, however, I'm still lost onto understanding what happens.
> >
> > I believe I might be having some troubles with the signals, but I can't seem to find an explanation.
> > Maybe I should document more my app.
> >
> > ThreadSched is pending on a timer (with sigwaitinfo). That time triggers every 750µs. When waking up, this thread will post one or 2 semaphores, that will unlock other threads (of which ThreadExec which seem to be doing MSW more than once every two activations).
> >
> > I ended up modifying the app startup code because I was experiencing weirdness in the startup with debugger attached (some semaphores wouldn't unlock, ...). Anyhow, I ended up doing :
> >
> >     sigemptyset(&sigHandling.sa_mask);
> >     sigHandling.sa_sigaction = sigdebug_handler;
> >     sigHandling.sa_flags = SA_SIGINFO;
> >     sigaction(SIGDEBUG, &sigHandling, NULL);
> >
> >     sigemptyset(&sigset);
> >     sigaddset (&sigset, SIGALRM);
> >     sigaddset (&sigset, SIGDEBUG);
> >     sigprocmask (SIG_BLOCK, &sigset, NULL);
> >     pthread_sigmask (SIG_BLOCK, &sigset, NULL);
> >
> > before creating my RT threads (of which ThreadSched), then doing
> >
> >         sigdelset(&sigset, SIGALRM);
> >         sigprocmask (SIG_UNBLOCK, &sigset, NULL);
> >         pthread_sigmask (SIG_UNBLOCK, &sigset, NULL);
> >
> >         signal(SIGINT, sighandler_exit);
> >         signal(SIGTERM, sighandler_exit);
> >         signal(SIGUSR1, sighandler_exit);
> >
> > After my RT threads are created.
> >
> > If I comment out all of these, and add the pthread_setmode_np some place when initialisation of ThreadExec is over, I end up with the program being forcibly stopped with the indication [1]+  CPU time limit exceeded
> >
> > So I put all those back in place, start again, then /proc/xenomai/debug/relax gets populated :
> > 1115 75731 syscall ThreadExec
> > 0x12720 /usr/lib/libcobalt.so.2.0.0
> > 0x127b0 /usr/lib/libcobalt.so.2.0.0
> > 0x2cd40 /usr/lib/libc-2.29.so
> > .
> > /usr/bin/CCN2019_DSP
> > 1115 80299 syscall ThreadExec
> > 0xd58a0 /usr/lib/libc-2.29.so
> > .
> >
> > On this, slackspot gives :
> > Thread[1115] "Module d'Execution des Blocs" started by /usr/bin/CCN2019_DSP (107769 times):
> > Caused by: syscall
> >    #0  0x00000000000d589f prctl() in /home/dev/crosstool/arm-cortexa9_neon-linux-gnueabihf/.build/src/glibc-2.29/misc/../sysdeps/unix/syscall-template.S:78
> >
> > Thread[1115] "Module d'Execution des Blocs" started by /usr/bin/CCN2019_DSP (101892 times):
> > Caused by: syscall
> >    #0  0x000000000001271f ??? in [/usr/lib/libcobalt.so.2.0.0]
> >    #1  0x00000000000127af ??? in [/usr/lib/libcobalt.so.2.0.0]
> >    #2  0x000000000002cd3f __default_sa_restorer() in /home/dev/crosstool/arm-cortexa9_neon-linux-gnueabihf/.build/src/glibc-2.29/signal/../sysdeps/unix/sysv/linux/arm/sigrestorer.S:77
> >
> > So know, I really don't know where to go from here !
> >
>
> Could you recompile with debugging symbols for libcobalt and your
> application and check where exactly the Linux syscall is issued and what
> library call from your application relates to it? gdb may provide this
> better.
>

I'm trying hard to do that, but with the debug enabled libcobalt, my app stays blocked on a semaphore (I posted a topic on this on the list a while ago, but had no time to diagnose this). What is strange, is that when I start the app and reach the point where it gets locked, I get the following :

 cat /proc/xenomai/sched/threads
CPU  PID    CLASS  TYPE      PRI   TIMEOUT       STAT       NAME
  0  0      idle   core       -1   -             R          [ROOT/0]
  1  0      idle   core       -1   -             R          [ROOT/1]
  0  765    rt     core       98   -             W          [rtnet-stack]
  1  767    rt     core        0   -             W          [rtnet-rtpc]
  0  1069   rt     cobalt      0   -             Xts        APP_Main
  0  1071   rt     cobalt     60   -             s          ThreadInit
  0  1072   rt     cobalt     62   -             W          ThreadEcat
  0  1073   rt     cobalt     52   -             s          ThreadPar
  0  1074   rt     cobalt     60   -             s          ThreadMessage
  0  1075   rt     cobalt     55   -             s          ThreadExec
  0  1076   rt     cobalt     55   -             Ws         ThreadGO
  0  1077   rt     cobalt     51   -             s          ThreadDbg
  0  1078   rt     cobalt     64   -             ts         ThreadCT
  0  1079   rt     cobalt     51   -             s          ThreadSupInv
  0  1080   rt     cobalt     52   -             s          ThreadMessage2
  0  1081   rt     cobalt     70   -             s          ThreadSched

Then I halt the program in the debugger, then hit continue, then I get :

cat /proc/xenomai/sched/threads
CPU  PID    CLASS  TYPE      PRI   TIMEOUT       STAT       NAME
  0  0      idle   core       -1   -             R          [ROOT/0]
  1  0      idle   core       -1   -             R          [ROOT/1]
  0  765    rt     core       98   -             W          [rtnet-stack]
  1  767    rt     core        0   -             W          [rtnet-rtpc]
  0  1069   rt     cobalt      0   -             Xt         APP_Main
  0  1071   rt     cobalt     60   1266874550s283m  w          ThreadInit
  0  1072   rt     cobalt     62   -             W          ThreadEcat
  0  1073   rt     cobalt     52   41ms937us     w          ThreadPar
  0  1074   rt     cobalt     60   41ms917us     w          ThreadMessage
  0  1075   rt     cobalt     55   9ms653us      w          ThreadExec
  0  1076   rt     cobalt     55   -             W          ThreadGO
  0  1077   rt     cobalt     51   41ms949us     w          ThreadDbg
  0  1078   rt     cobalt     64   1ms387us      Dt         ThreadCT
  0  1079   rt     cobalt     51   41ms963us     w          ThreadSupInv
  0  1080   rt     cobalt     52   41ms955us     w          ThreadMessage2
  0  1081   rt     cobalt     70   -             W          ThreadSched

There shall be a ping/pong with 2 semaphores between ThreadInit and ThreadEcat. Thread init is supposed to pend with a 30s timeout on the semaphore.

I don't know what to do from here.

I tried to make a simple program to diagnose that too, but wasn't able to reproduce the problem.

François

> Jan
>
> --
> Siemens AG, T RDA IOT
> Corporate Competence Center Embedded Linux



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Hunting mode switches
  2021-06-02 15:06             ` François Legal
@ 2021-06-02 15:09               ` François Legal
  2021-06-02 15:13                 ` Jan Kiszka
  0 siblings, 1 reply; 16+ messages in thread
From: François Legal @ 2021-06-02 15:09 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai

Le Mercredi, Juin 02, 2021 17:06 CEST, François Legal <devel@thom.fr.eu.org> a écrit:

> Le Mercredi, Juin 02, 2021 12:59 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit:
>
> > On 02.06.21 12:26, François Legal wrote:
> > > Le Mardi, Juin 01, 2021 10:31 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit:
> > >
> > >> On 01.06.21 09:47, François Legal wrote:
> > >>> Le Lundi, Mai 31, 2021 20:57 CEST, François Legal via Xenomai <xenomai@xenomai.org> a écrit:
> > >>>
> > >>>> Le Lundi, Mai 31, 2021 18:14 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit:
> > >>>>
> > >>>>> On 31.05.21 18:06, François Legal via Xenomai wrote:
> > >>>>>> Hello,
> > >>>>>>
> > >>>>>> I'm having trouble hunting mode switches on an arm embedded application.
> > >>>>>>
> > >>>>>> When I look at cat /proc/xenomai/sched/stat
> > >>>>>>
> > >>>>>> CPU  PID    MSW        CSW        XSC        PF    STAT       %CPU  NAME
> > >>>>>>   0  0      0          476986     0          0     00218000   84.5  [ROOT/0]
> > >>>>>>   1  0      0          137743     0          0     00218000   99.3  [ROOT/1]
> > >>>>>>   1  667    0          137550     0          0     00200042    0.7  [rtnet-stack]
> > >>>>>>   0  669    0          2          0          0     00220042    0.0  [rtnet-rtpc]
> > >>>>>>   1  901    110        218        203        0     00264044    0.0  APP_Main
> > >>>>>>   0  911    1          1216       3524       0     00240046    0.0  ThreadPar
> > >>>>>>   0  912    3          56683      3531       0     00240046    0.2  ThreadMessage
> > >>>>>>   0  913    309702     470522     2152301    0     00240046    8.9  ThreadExec
> > >>>>>>   0  914    7          62         71         0     00240042    0.0  TheadGO
> > >>>>>>   0  915    3          647        1190       0     00248046    0.0  ThreadDbg
> > >>>>>>   0  916    1          2024       3087       0     00248046    0.0  ThreadCt
> > >>>>>>   0  917    1          35801      320489     0     00248046    0.8  ThreadSupInv
> > >>>>>>   0  918    1          1231       3596       0     00248046    0.0  ThreadMessage2
> > >>>>>>   0  919    1          161554     479305     0     00248042    1.4  ThreadSched
> > >>>>>>   0  0      0          247850     0          0     00000000    0.4  [IRQ18: [timer]]
> > >>>>>>   1  0      0          15419      0          0     00000000    0.0  [IRQ18: [timer]]
> > >>>>>>   0  0      0          137548     0          0     00000000    1.5  [IRQ31: rteth0]
> > >>>>>>   1  0      0          0          0          0     00000000    0.0  [IRQ31: rteth0]
> > >>>>>>   0  0      0          1          0          0     00000000    0.0  [IRQ118: rteth1-tx]
> > >>>>>>   1  0      0          0          0          0     00000000    0.0  [IRQ118: rteth1-tx]
> > >>>>>>   0  0      0          0          0          0     00000000    0.0  [IRQ119: rteth1-rx]
> > >>>>>>   1  0      0          0          0          0     00000000    0.0  [IRQ119: rteth1-rx]
> > >>>>>>
> > >>>>>> But when following the procedure in [1], I get
> > >>>>>> no slacker
> > >>>>>>
> > >>>>>> I can however see that Thread 913 seem to switch to secondary more than once out of 2 cycles !
> > >>>>>> What am I doing wrong.
> > >>>>>>
> > >>>>>
> > >>>>> CONFIG_XENO_OPT_DEBUG_TRACE_RELAX is on?
> > >>>>>
> > >>>>
> > >>>> Yes, it is.
> > >>>>
> > >>>>> Did you also try the in-app instrumentation via SIGDEBUG, maybe> >>>>> attaching gdb to the process to catch the unwanted cases?
> > >>>>>
> > >>>>
> > >>>> No, not yet. It seems mode switches happen hundreds of time every seconds, so I was willing to try the slackspot.
> > >>>> I'll try that tomorrow and report back to the list
> > >>>>
> > >>>
> > >>> So trying the manual way, there seem to be something happening, but I could not tell exactly what, and I not sure how to track this.
> > >>>
> > >>> On thread ThreadExec, I get execution halted with SIGXCPU, and a backtrace indicating getpid(), then __pthread_kill() which I'm 100% sure I never call !
> > >>>
> > >>> Moreover, when being halted at that point, all threads are not created, and all of which are created seem to have done 3 or 4 MSW each, and without triggering any break.
> > >>>
> > >>> Moreover, in the stats capture below, the APP_Main (the main program entry), is looping on a 1s nanosleep, and it seems the MSW number increases once per second.
> > >>>
> > >>> I'm really confused now.
> > >>>
> > >>
> > >> Are you instrumenting only the part of the thread that has all
> > >> initialization done and until it starts to de-initialize again? You can
> > >> control when the mode switches should be reported and when not.
> > >>
> > >> Then, if you have a backtrace and that backtrace contain a valid
> > >> syscall, you should also find a valid point in your own application when
> > >> this was started (indirectly). Maybe some library call?
> > >>
> > >> Jan
> > >>
> > >
> > > So in the end, slackspot seems to work. I did not understand, by reading the manual, that slackspot/CONFIG_XENO_OPT_DEBUG_TRACE_RELAX also needed that each thread call pthread_setmode_np(0, PTHREAD_WARNSW, NULL);
> > >
> >
> > We are happy to take patches - also against the doc / the wiki.
> >
> > > So now, I've got a working backtrace, however, I'm still lost onto understanding what happens.
> > >
> > > I believe I might be having some troubles with the signals, but I can't seem to find an explanation.
> > > Maybe I should document more my app.
> > >
> > > ThreadSched is pending on a timer (with sigwaitinfo). That time triggers every 750µs. When waking up, this thread will post one or 2 semaphores, that will unlock other threads (of which ThreadExec which seem to be doing MSW more than once every two activations).
> > >
> > > I ended up modifying the app startup code because I was experiencing weirdness in the startup with debugger attached (some semaphores wouldn't unlock, ...). Anyhow, I ended up doing :
> > >
> > >     sigemptyset(&sigHandling.sa_mask);
> > >     sigHandling.sa_sigaction = sigdebug_handler;
> > >     sigHandling.sa_flags = SA_SIGINFO;
> > >     sigaction(SIGDEBUG, &sigHandling, NULL);
> > >
> > >     sigemptyset(&sigset);
> > >     sigaddset (&sigset, SIGALRM);
> > >     sigaddset (&sigset, SIGDEBUG);
> > >     sigprocmask (SIG_BLOCK, &sigset, NULL);
> > >     pthread_sigmask (SIG_BLOCK, &sigset, NULL);
> > >
> > > before creating my RT threads (of which ThreadSched), then doing
> > >
> > >         sigdelset(&sigset, SIGALRM);
> > >         sigprocmask (SIG_UNBLOCK, &sigset, NULL);
> > >         pthread_sigmask (SIG_UNBLOCK, &sigset, NULL);
> > >
> > >         signal(SIGINT, sighandler_exit);
> > >         signal(SIGTERM, sighandler_exit);
> > >         signal(SIGUSR1, sighandler_exit);
> > >
> > > After my RT threads are created.
> > >
> > > If I comment out all of these, and add the pthread_setmode_np some place when initialisation of ThreadExec is over, I end up with the program being forcibly stopped with the indication [1]+  CPU time limit exceeded
> > >
> > > So I put all those back in place, start again, then /proc/xenomai/debug/relax gets populated :
> > > 1115 75731 syscall ThreadExec
> > > 0x12720 /usr/lib/libcobalt.so.2.0.0
> > > 0x127b0 /usr/lib/libcobalt.so.2.0.0
> > > 0x2cd40 /usr/lib/libc-2.29.so
> > > .
> > > /usr/bin/CCN2019_DSP
> > > 1115 80299 syscall ThreadExec
> > > 0xd58a0 /usr/lib/libc-2.29.so
> > > .
> > >
> > > On this, slackspot gives :
> > > Thread[1115] "Module d'Execution des Blocs" started by /usr/bin/CCN2019_DSP (107769 times):
> > > Caused by: syscall
> > >    #0  0x00000000000d589f prctl() in /home/dev/crosstool/arm-cortexa9_neon-linux-gnueabihf/.build/src/glibc-2.29/misc/../sysdeps/unix/syscall-template.S:78
> > >
> > > Thread[1115] "Module d'Execution des Blocs" started by /usr/bin/CCN2019_DSP (101892 times):
> > > Caused by: syscall
> > >    #0  0x000000000001271f ??? in [/usr/lib/libcobalt.so.2.0.0]
> > >    #1  0x00000000000127af ??? in [/usr/lib/libcobalt.so.2.0.0]
> > >    #2  0x000000000002cd3f __default_sa_restorer() in /home/dev/crosstool/arm-cortexa9_neon-linux-gnueabihf/.build/src/glibc-2.29/signal/../sysdeps/unix/sysv/linux/arm/sigrestorer.S:77
> > >
> > > So know, I really don't know where to go from here !
> > >
> >
> > Could you recompile with debugging symbols for libcobalt and your
> > application and check where exactly the Linux syscall is issued and what
> > library call from your application relates to it? gdb may provide this
> > better.
> >
>
> I'm trying hard to do that, but with the debug enabled libcobalt, my app stays blocked on a semaphore (I posted a topic on this on the list a while ago, but had no time to diagnose this). What is strange, is that when I start the app and reach the point where it gets locked, I get the following :
>
>  cat /proc/xenomai/sched/threads
> CPU  PID    CLASS  TYPE      PRI   TIMEOUT       STAT       NAME
>   0  0      idle   core       -1   -             R          [ROOT/0]
>   1  0      idle   core       -1   -             R          [ROOT/1]
>   0  765    rt     core       98   -             W          [rtnet-stack]
>   1  767    rt     core        0   -             W          [rtnet-rtpc]
>   0  1069   rt     cobalt      0   -             Xts        APP_Main
>   0  1071   rt     cobalt     60   -             s          ThreadInit
>   0  1072   rt     cobalt     62   -             W          ThreadEcat
>   0  1073   rt     cobalt     52   -             s          ThreadPar
>   0  1074   rt     cobalt     60   -             s          ThreadMessage
>   0  1075   rt     cobalt     55   -             s          ThreadExec
>   0  1076   rt     cobalt     55   -             Ws         ThreadGO
>   0  1077   rt     cobalt     51   -             s          ThreadDbg
>   0  1078   rt     cobalt     64   -             ts         ThreadCT
>   0  1079   rt     cobalt     51   -             s          ThreadSupInv
>   0  1080   rt     cobalt     52   -             s          ThreadMessage2
>   0  1081   rt     cobalt     70   -             s          ThreadSched
> Then I halt the program in the debugger, then hit continue, then I get :
>
> cat /proc/xenomai/sched/threads
> CPU  PID    CLASS  TYPE      PRI   TIMEOUT       STAT       NAME
>   0  0      idle   core       -1   -             R          [ROOT/0]
>   1  0      idle   core       -1   -             R          [ROOT/1]
>   0  765    rt     core       98   -             W          [rtnet-stack]
>   1  767    rt     core        0   -             W          [rtnet-rtpc]
>   0  1069   rt     cobalt      0   -             Xt         APP_Main
>   0  1071   rt     cobalt     60   1266874550s283m  w          ThreadInit
>   0  1072   rt     cobalt     62   -             W          ThreadEcat
>   0  1073   rt     cobalt     52   41ms937us     w          ThreadPar
>   0  1074   rt     cobalt     60   41ms917us     w          ThreadMessage
>   0  1075   rt     cobalt     55   9ms653us      w          ThreadExec
>   0  1076   rt     cobalt     55   -             W          ThreadGO
>   0  1077   rt     cobalt     51   41ms949us     w          ThreadDbg
>   0  1078   rt     cobalt     64   1ms387us      Dt         ThreadCT
>   0  1079   rt     cobalt     51   41ms963us     w          ThreadSupInv
>   0  1080   rt     cobalt     52   41ms955us     w          ThreadMessage2
>   0  1081   rt     cobalt     70   -             W          ThreadSched
> There shall be a ping/pong with 2 semaphores between ThreadInit and ThreadEcat. Thread init is supposed to pend with a 30s timeout on the semaphore.
>
> I don't know what to do from here.
>
> I tried to make a simple program to diagnose that too, but wasn't able to reproduce the problem.
>
> François
>

Plus, I don't know if it's related, but upon starting the app in the debugger, I get a SIGILL before entering main.

> > Jan
> >
> > --
> > Siemens AG, T RDA IOT
> > Corporate Competence Center Embedded Linux



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Hunting mode switches
  2021-06-02 15:09               ` François Legal
@ 2021-06-02 15:13                 ` Jan Kiszka
  0 siblings, 0 replies; 16+ messages in thread
From: Jan Kiszka @ 2021-06-02 15:13 UTC (permalink / raw)
  To: François Legal; +Cc: xenomai

On 02.06.21 17:09, François Legal wrote:
> Le Mercredi, Juin 02, 2021 17:06 CEST, François Legal <devel@thom.fr.eu.org> a écrit: 
>  
>> Le Mercredi, Juin 02, 2021 12:59 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit: 
>>  
>>> On 02.06.21 12:26, François Legal wrote:
>>>> Le Mardi, Juin 01, 2021 10:31 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit: 
>>>>  
>>>>> On 01.06.21 09:47, François Legal wrote:
>>>>>> Le Lundi, Mai 31, 2021 20:57 CEST, François Legal via Xenomai <xenomai@xenomai.org> a écrit: 
>>>>>>  
>>>>>>> Le Lundi, Mai 31, 2021 18:14 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit: 
>>>>>>>  
>>>>>>>> On 31.05.21 18:06, François Legal via Xenomai wrote:
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> I'm having trouble hunting mode switches on an arm embedded application.
>>>>>>>>>
>>>>>>>>> When I look at cat /proc/xenomai/sched/stat
>>>>>>>>>
>>>>>>>>> CPU  PID    MSW        CSW        XSC        PF    STAT       %CPU  NAME
>>>>>>>>>   0  0      0          476986     0          0     00218000   84.5  [ROOT/0]
>>>>>>>>>   1  0      0          137743     0          0     00218000   99.3  [ROOT/1]
>>>>>>>>>   1  667    0          137550     0          0     00200042    0.7  [rtnet-stack]
>>>>>>>>>   0  669    0          2          0          0     00220042    0.0  [rtnet-rtpc]
>>>>>>>>>   1  901    110        218        203        0     00264044    0.0  APP_Main
>>>>>>>>>   0  911    1          1216       3524       0     00240046    0.0  ThreadPar
>>>>>>>>>   0  912    3          56683      3531       0     00240046    0.2  ThreadMessage
>>>>>>>>>   0  913    309702     470522     2152301    0     00240046    8.9  ThreadExec
>>>>>>>>>   0  914    7          62         71         0     00240042    0.0  TheadGO
>>>>>>>>>   0  915    3          647        1190       0     00248046    0.0  ThreadDbg
>>>>>>>>>   0  916    1          2024       3087       0     00248046    0.0  ThreadCt
>>>>>>>>>   0  917    1          35801      320489     0     00248046    0.8  ThreadSupInv
>>>>>>>>>   0  918    1          1231       3596       0     00248046    0.0  ThreadMessage2
>>>>>>>>>   0  919    1          161554     479305     0     00248042    1.4  ThreadSched
>>>>>>>>>   0  0      0          247850     0          0     00000000    0.4  [IRQ18: [timer]]
>>>>>>>>>   1  0      0          15419      0          0     00000000    0.0  [IRQ18: [timer]]
>>>>>>>>>   0  0      0          137548     0          0     00000000    1.5  [IRQ31: rteth0]
>>>>>>>>>   1  0      0          0          0          0     00000000    0.0  [IRQ31: rteth0]
>>>>>>>>>   0  0      0          1          0          0     00000000    0.0  [IRQ118: rteth1-tx]
>>>>>>>>>   1  0      0          0          0          0     00000000    0.0  [IRQ118: rteth1-tx]
>>>>>>>>>   0  0      0          0          0          0     00000000    0.0  [IRQ119: rteth1-rx]
>>>>>>>>>   1  0      0          0          0          0     00000000    0.0  [IRQ119: rteth1-rx]
>>>>>>>>>
>>>>>>>>> But when following the procedure in [1], I get 
>>>>>>>>> no slacker
>>>>>>>>>
>>>>>>>>> I can however see that Thread 913 seem to switch to secondary more than once out of 2 cycles !
>>>>>>>>> What am I doing wrong.
>>>>>>>>>
>>>>>>>>
>>>>>>>> CONFIG_XENO_OPT_DEBUG_TRACE_RELAX is on?
>>>>>>>>
>>>>>>>
>>>>>>> Yes, it is.
>>>>>>>
>>>>>>>> Did you also try the in-app instrumentation via SIGDEBUG, maybe> >>>>> attaching gdb to the process to catch the unwanted cases?
>>>>>>>>
>>>>>>>
>>>>>>> No, not yet. It seems mode switches happen hundreds of time every seconds, so I was willing to try the slackspot.
>>>>>>> I'll try that tomorrow and report back to the list 
>>>>>>>
>>>>>>
>>>>>> So trying the manual way, there seem to be something happening, but I could not tell exactly what, and I not sure how to track this.
>>>>>>
>>>>>> On thread ThreadExec, I get execution halted with SIGXCPU, and a backtrace indicating getpid(), then __pthread_kill() which I'm 100% sure I never call !
>>>>>>
>>>>>> Moreover, when being halted at that point, all threads are not created, and all of which are created seem to have done 3 or 4 MSW each, and without triggering any break.
>>>>>>
>>>>>> Moreover, in the stats capture below, the APP_Main (the main program entry), is looping on a 1s nanosleep, and it seems the MSW number increases once per second.
>>>>>>
>>>>>> I'm really confused now.
>>>>>>
>>>>>
>>>>> Are you instrumenting only the part of the thread that has all
>>>>> initialization done and until it starts to de-initialize again? You can
>>>>> control when the mode switches should be reported and when not.
>>>>>
>>>>> Then, if you have a backtrace and that backtrace contain a valid
>>>>> syscall, you should also find a valid point in your own application when
>>>>> this was started (indirectly). Maybe some library call?
>>>>>
>>>>> Jan
>>>>>
>>>>
>>>> So in the end, slackspot seems to work. I did not understand, by reading the manual, that slackspot/CONFIG_XENO_OPT_DEBUG_TRACE_RELAX also needed that each thread call pthread_setmode_np(0, PTHREAD_WARNSW, NULL);
>>>>
>>>
>>> We are happy to take patches - also against the doc / the wiki.
>>>
>>>> So now, I've got a working backtrace, however, I'm still lost onto understanding what happens.
>>>>
>>>> I believe I might be having some troubles with the signals, but I can't seem to find an explanation.
>>>> Maybe I should document more my app.
>>>>
>>>> ThreadSched is pending on a timer (with sigwaitinfo). That time triggers every 750µs. When waking up, this thread will post one or 2 semaphores, that will unlock other threads (of which ThreadExec which seem to be doing MSW more than once every two activations).
>>>>
>>>> I ended up modifying the app startup code because I was experiencing weirdness in the startup with debugger attached (some semaphores wouldn't unlock, ...). Anyhow, I ended up doing :
>>>>
>>>>     sigemptyset(&sigHandling.sa_mask);
>>>>     sigHandling.sa_sigaction = sigdebug_handler;
>>>>     sigHandling.sa_flags = SA_SIGINFO;
>>>>     sigaction(SIGDEBUG, &sigHandling, NULL);
>>>>
>>>>     sigemptyset(&sigset);
>>>>     sigaddset (&sigset, SIGALRM);
>>>>     sigaddset (&sigset, SIGDEBUG);
>>>>     sigprocmask (SIG_BLOCK, &sigset, NULL);
>>>>     pthread_sigmask (SIG_BLOCK, &sigset, NULL);
>>>>
>>>> before creating my RT threads (of which ThreadSched), then doing
>>>>
>>>>         sigdelset(&sigset, SIGALRM);
>>>>         sigprocmask (SIG_UNBLOCK, &sigset, NULL);
>>>>         pthread_sigmask (SIG_UNBLOCK, &sigset, NULL);
>>>>
>>>>         signal(SIGINT, sighandler_exit);
>>>>         signal(SIGTERM, sighandler_exit);
>>>>         signal(SIGUSR1, sighandler_exit);
>>>>
>>>> After my RT threads are created.
>>>>
>>>> If I comment out all of these, and add the pthread_setmode_np some place when initialisation of ThreadExec is over, I end up with the program being forcibly stopped with the indication [1]+  CPU time limit exceeded
>>>>
>>>> So I put all those back in place, start again, then /proc/xenomai/debug/relax gets populated :
>>>> 1115 75731 syscall ThreadExec 
>>>> 0x12720 /usr/lib/libcobalt.so.2.0.0
>>>> 0x127b0 /usr/lib/libcobalt.so.2.0.0
>>>> 0x2cd40 /usr/lib/libc-2.29.so
>>>> .
>>>> /usr/bin/CCN2019_DSP
>>>> 1115 80299 syscall ThreadExec 
>>>> 0xd58a0 /usr/lib/libc-2.29.so
>>>> .
>>>>
>>>> On this, slackspot gives :
>>>> Thread[1115] "Module d'Execution des Blocs" started by /usr/bin/CCN2019_DSP (107769 times):
>>>> Caused by: syscall
>>>>    #0  0x00000000000d589f prctl() in /home/dev/crosstool/arm-cortexa9_neon-linux-gnueabihf/.build/src/glibc-2.29/misc/../sysdeps/unix/syscall-template.S:78
>>>>
>>>> Thread[1115] "Module d'Execution des Blocs" started by /usr/bin/CCN2019_DSP (101892 times):
>>>> Caused by: syscall
>>>>    #0  0x000000000001271f ??? in [/usr/lib/libcobalt.so.2.0.0]
>>>>    #1  0x00000000000127af ??? in [/usr/lib/libcobalt.so.2.0.0]
>>>>    #2  0x000000000002cd3f __default_sa_restorer() in /home/dev/crosstool/arm-cortexa9_neon-linux-gnueabihf/.build/src/glibc-2.29/signal/../sysdeps/unix/sysv/linux/arm/sigrestorer.S:77
>>>>
>>>> So know, I really don't know where to go from here !
>>>>
>>>
>>> Could you recompile with debugging symbols for libcobalt and your
>>> application and check where exactly the Linux syscall is issued and what
>>> library call from your application relates to it? gdb may provide this
>>> better.
>>>
>>
>> I'm trying hard to do that, but with the debug enabled libcobalt, my app stays blocked on a semaphore (I posted a topic on this on the list a while ago, but had no time to diagnose this). What is strange, is that when I start the app and reach the point where it gets locked, I get the following :
>>
>>  cat /proc/xenomai/sched/threads
>> CPU  PID    CLASS  TYPE      PRI   TIMEOUT       STAT       NAME
>>   0  0      idle   core       -1   -             R          [ROOT/0]
>>   1  0      idle   core       -1   -             R          [ROOT/1]
>>   0  765    rt     core       98   -             W          [rtnet-stack]
>>   1  767    rt     core        0   -             W          [rtnet-rtpc]
>>   0  1069   rt     cobalt      0   -             Xts        APP_Main
>>   0  1071   rt     cobalt     60   -             s          ThreadInit
>>   0  1072   rt     cobalt     62   -             W          ThreadEcat
>>   0  1073   rt     cobalt     52   -             s          ThreadPar
>>   0  1074   rt     cobalt     60   -             s          ThreadMessage
>>   0  1075   rt     cobalt     55   -             s          ThreadExec
>>   0  1076   rt     cobalt     55   -             Ws         ThreadGO
>>   0  1077   rt     cobalt     51   -             s          ThreadDbg
>>   0  1078   rt     cobalt     64   -             ts         ThreadCT
>>   0  1079   rt     cobalt     51   -             s          ThreadSupInv
>>   0  1080   rt     cobalt     52   -             s          ThreadMessage2
>>   0  1081   rt     cobalt     70   -             s          ThreadSched
>> Then I halt the program in the debugger, then hit continue, then I get :
>>
>> cat /proc/xenomai/sched/threads
>> CPU  PID    CLASS  TYPE      PRI   TIMEOUT       STAT       NAME
>>   0  0      idle   core       -1   -             R          [ROOT/0]
>>   1  0      idle   core       -1   -             R          [ROOT/1]
>>   0  765    rt     core       98   -             W          [rtnet-stack]
>>   1  767    rt     core        0   -             W          [rtnet-rtpc]
>>   0  1069   rt     cobalt      0   -             Xt         APP_Main
>>   0  1071   rt     cobalt     60   1266874550s283m  w          ThreadInit
>>   0  1072   rt     cobalt     62   -             W          ThreadEcat
>>   0  1073   rt     cobalt     52   41ms937us     w          ThreadPar
>>   0  1074   rt     cobalt     60   41ms917us     w          ThreadMessage
>>   0  1075   rt     cobalt     55   9ms653us      w          ThreadExec
>>   0  1076   rt     cobalt     55   -             W          ThreadGO
>>   0  1077   rt     cobalt     51   41ms949us     w          ThreadDbg
>>   0  1078   rt     cobalt     64   1ms387us      Dt         ThreadCT
>>   0  1079   rt     cobalt     51   41ms963us     w          ThreadSupInv
>>   0  1080   rt     cobalt     52   41ms955us     w          ThreadMessage2
>>   0  1081   rt     cobalt     70   -             W          ThreadSched
>> There shall be a ping/pong with 2 semaphores between ThreadInit and ThreadEcat. Thread init is supposed to pend with a 30s timeout on the semaphore.
>>
>> I don't know what to do from here.
>>
>> I tried to make a simple program to diagnose that too, but wasn't able to reproduce the problem.
>>
>> François
>>
> 
> Plus, I don't know if it's related, but upon starting the app in the debugger, I get a SIGILL before entering main.
> 

OK, one after the other - can you translate the addresses offline into
line numbers? Again, gdb can help, even if not running the code under it.

Jan

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Hunting mode switches
  2021-06-02 10:59           ` Jan Kiszka
  2021-06-02 15:06             ` François Legal
@ 2021-06-03  8:57             ` François Legal
  2021-06-03 18:39               ` François Legal
  1 sibling, 1 reply; 16+ messages in thread
From: François Legal @ 2021-06-03  8:57 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai

Le Mercredi, Juin 02, 2021 12:59 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit:

> On 02.06.21 12:26, François Legal wrote:
> > Le Mardi, Juin 01, 2021 10:31 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit:
> >
> >> On 01.06.21 09:47, François Legal wrote:
> >>> Le Lundi, Mai 31, 2021 20:57 CEST, François Legal via Xenomai <xenomai@xenomai.org> a écrit:
> >>>
> >>>> Le Lundi, Mai 31, 2021 18:14 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit:
> >>>>
> >>>>> On 31.05.21 18:06, François Legal via Xenomai wrote:
> >>>>>> Hello,
> >>>>>>
> >>>>>> I'm having trouble hunting mode switches on an arm embedded application.
> >>>>>>
> >>>>>> When I look at cat /proc/xenomai/sched/stat
> >>>>>>
> >>>>>> CPU  PID    MSW        CSW        XSC        PF    STAT       %CPU  NAME
> >>>>>>   0  0      0          476986     0          0     00218000   84.5  [ROOT/0]
> >>>>>>   1  0      0          137743     0          0     00218000   99.3  [ROOT/1]
> >>>>>>   1  667    0          137550     0          0     00200042    0.7  [rtnet-stack]
> >>>>>>   0  669    0          2          0          0     00220042    0.0  [rtnet-rtpc]
> >>>>>>   1  901    110        218        203        0     00264044    0.0  APP_Main
> >>>>>>   0  911    1          1216       3524       0     00240046    0.0  ThreadPar
> >>>>>>   0  912    3          56683      3531       0     00240046    0.2  ThreadMessage
> >>>>>>   0  913    309702     470522     2152301    0     00240046    8.9  ThreadExec
> >>>>>>   0  914    7          62         71         0     00240042    0.0  TheadGO
> >>>>>>   0  915    3          647        1190       0     00248046    0.0  ThreadDbg
> >>>>>>   0  916    1          2024       3087       0     00248046    0.0  ThreadCt
> >>>>>>   0  917    1          35801      320489     0     00248046    0.8  ThreadSupInv
> >>>>>>   0  918    1          1231       3596       0     00248046    0.0  ThreadMessage2
> >>>>>>   0  919    1          161554     479305     0     00248042    1.4  ThreadSched
> >>>>>>   0  0      0          247850     0          0     00000000    0.4  [IRQ18: [timer]]
> >>>>>>   1  0      0          15419      0          0     00000000    0.0  [IRQ18: [timer]]
> >>>>>>   0  0      0          137548     0          0     00000000    1.5  [IRQ31: rteth0]
> >>>>>>   1  0      0          0          0          0     00000000    0.0  [IRQ31: rteth0]
> >>>>>>   0  0      0          1          0          0     00000000    0.0  [IRQ118: rteth1-tx]
> >>>>>>   1  0      0          0          0          0     00000000    0.0  [IRQ118: rteth1-tx]
> >>>>>>   0  0      0          0          0          0     00000000    0.0  [IRQ119: rteth1-rx]
> >>>>>>   1  0      0          0          0          0     00000000    0.0  [IRQ119: rteth1-rx]
> >>>>>>
> >>>>>> But when following the procedure in [1], I get
> >>>>>> no slacker
> >>>>>>
> >>>>>> I can however see that Thread 913 seem to switch to secondary more than once out of 2 cycles !
> >>>>>> What am I doing wrong.
> >>>>>>
> >>>>>
> >>>>> CONFIG_XENO_OPT_DEBUG_TRACE_RELAX is on?
> >>>>>
> >>>>
> >>>> Yes, it is.
> >>>>
> >>>>> Did you also try the in-app instrumentation via SIGDEBUG, maybe
> >>>>> attaching gdb to the process to catch the unwanted cases?
> >>>>>
> >>>>
> >>>> No, not yet. It seems mode switches happen hundreds of time every seconds, so I was willing to try the slackspot.
> >>>> I'll try that tomorrow and report back to the list
> >>>>
> >>>
> >>> So trying the manual way, there seem to be something happening, but I could not tell exactly what, and I not sure how to track this.
> >>>
> >>> On thread ThreadExec, I get execution halted with SIGXCPU, and a backtrace indicating getpid(), then __pthread_kill() which I'm 100% sure I never call !
> >>>
> >>> Moreover, when being halted at that point, all threads are not created, and all of which are created seem to have done 3 or 4 MSW each, and without triggering any break.
> >>>
> >>> Moreover, in the stats capture below, the APP_Main (the main program entry), is looping on a 1s nanosleep, and it seems the MSW number increases once per second.
> >>>
> >>> I'm really confused now.
> >>>
> >>
> >> Are you instrumenting only the part of the thread that has all
> >> initialization done and until it starts to de-initialize again? You can
> >> control when the mode switches should be reported and when not.
> >>
> >> Then, if you have a backtrace and that backtrace contain a valid
> >> syscall, you should also find a valid point in your own application when
> >> this was started (indirectly). Maybe some library call?
> >>
> >> Jan
> >>
> >
> > So in the end, slackspot seems to work. I did not understand, by reading the manual, that slackspot/CONFIG_XENO_OPT_DEBUG_TRACE_RELAX also needed that each thread call pthread_setmode_np(0, PTHREAD_WARNSW, NULL);
> >
>
> We are happy to take patches - also against the doc / the wiki.
>
> > So now, I've got a working backtrace, however, I'm still lost onto understanding what happens.
> >
> > I believe I might be having some troubles with the signals, but I can't seem to find an explanation.
> > Maybe I should document more my app.
> >
> > ThreadSched is pending on a timer (with sigwaitinfo). That time triggers every 750µs. When waking up, this thread will post one or 2 semaphores, that will unlock other threads (of which ThreadExec which seem to be doing MSW more than once every two activations).
> >
> > I ended up modifying the app startup code because I was experiencing weirdness in the startup with debugger attached (some semaphores wouldn't unlock, ...). Anyhow, I ended up doing :
> >
> >     sigemptyset(&sigHandling.sa_mask);
> >     sigHandling.sa_sigaction = sigdebug_handler;
> >     sigHandling.sa_flags = SA_SIGINFO;
> >     sigaction(SIGDEBUG, &sigHandling, NULL);
> >
> >     sigemptyset(&sigset);
> >     sigaddset (&sigset, SIGALRM);
> >     sigaddset (&sigset, SIGDEBUG);
> >     sigprocmask (SIG_BLOCK, &sigset, NULL);
> >     pthread_sigmask (SIG_BLOCK, &sigset, NULL);
> >
> > before creating my RT threads (of which ThreadSched), then doing
> >
> >         sigdelset(&sigset, SIGALRM);
> >         sigprocmask (SIG_UNBLOCK, &sigset, NULL);
> >         pthread_sigmask (SIG_UNBLOCK, &sigset, NULL);
> >
> >         signal(SIGINT, sighandler_exit);
> >         signal(SIGTERM, sighandler_exit);
> >         signal(SIGUSR1, sighandler_exit);
> >
> > After my RT threads are created.
> >
> > If I comment out all of these, and add the pthread_setmode_np some place when initialisation of ThreadExec is over, I end up with the program being forcibly stopped with the indication [1]+  CPU time limit exceeded
> >
> > So I put all those back in place, start again, then /proc/xenomai/debug/relax gets populated :
> > 1115 75731 syscall ThreadExec
> > 0x12720 /usr/lib/libcobalt.so.2.0.0
> > 0x127b0 /usr/lib/libcobalt.so.2.0.0
> > 0x2cd40 /usr/lib/libc-2.29.so
> > .
> > /usr/bin/CCN2019_DSP
> > 1115 80299 syscall ThreadExec
> > 0xd58a0 /usr/lib/libc-2.29.so
> > .
> >
> > On this, slackspot gives :
> > Thread[1115] "Module d'Execution des Blocs" started by /usr/bin/CCN2019_DSP (107769 times):
> > Caused by: syscall
> >    #0  0x00000000000d589f prctl() in /home/dev/crosstool/arm-cortexa9_neon-linux-gnueabihf/.build/src/glibc-2.29/misc/../sysdeps/unix/syscall-template.S:78
> >
> > Thread[1115] "Module d'Execution des Blocs" started by /usr/bin/CCN2019_DSP (101892 times):
> > Caused by: syscall
> >    #0  0x000000000001271f ??? in [/usr/lib/libcobalt.so.2.0.0]
> >    #1  0x00000000000127af ??? in [/usr/lib/libcobalt.so.2.0.0]
> >    #2  0x000000000002cd3f __default_sa_restorer() in /home/dev/crosstool/arm-cortexa9_neon-linux-gnueabihf/.build/src/glibc-2.29/signal/../sysdeps/unix/sysv/linux/arm/sigrestorer.S:77
> >
> > So know, I really don't know where to go from here !
> >
>
> Could you recompile with debugging symbols for libcobalt and your
> application and check where exactly the Linux syscall is issued and what
> library call from your application relates to it? gdb may provide this
> better.
>

Ok. I think I found it. I had put in some debug code in my semaphore wrappers to check if the timeout given was not before the current time, and thus I used pthread_getname to display a trace with the name of the thread. This was the main problem.

> Jan
>
> --
> Siemens AG, T RDA IOT
> Corporate Competence Center Embedded Linux



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Hunting mode switches
  2021-06-03  8:57             ` François Legal
@ 2021-06-03 18:39               ` François Legal
  2021-06-04  7:21                 ` Jan Kiszka
  0 siblings, 1 reply; 16+ messages in thread
From: François Legal @ 2021-06-03 18:39 UTC (permalink / raw)
  Cc: Jan Kiszka, xenomai

Le Jeudi, Juin 03, 2021 10:57 CEST, François Legal via Xenomai <xenomai@xenomai.org> a écrit:

> Le Mercredi, Juin 02, 2021 12:59 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit:
>
> > On 02.06.21 12:26, François Legal wrote:
> > > Le Mardi, Juin 01, 2021 10:31 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit:
> > >
> > >> On 01.06.21 09:47, François Legal wrote:
> > >>> Le Lundi, Mai 31, 2021 20:57 CEST, François Legal via Xenomai <xenomai@xenomai.org> a écrit:
> > >>>
> > >>>> Le Lundi, Mai 31, 2021 18:14 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit:
> > >>>>
> > >>>>> On 31.05.21 18:06, François Legal via Xenomai wrote:
> > >>>>>> Hello,
> > >>>>>>
> > >>>>>> I'm having trouble hunting mode switches on an arm embedded application.
> > >>>>>>
> > >>>>>> When I look at cat /proc/xenomai/sched/stat
> > >>>>>>
> > >>>>>> CPU  PID    MSW        CSW        XSC        PF    STAT       %CPU  NAME
> > >>>>>>   0  0      0          476986     0          0     00218000   84.5  [ROOT/0]
> > >>>>>>   1  0      0          137743     0          0     00218000   99.3  [ROOT/1]
> > >>>>>>   1  667    0          137550     0          0     00200042    0.7  [rtnet-stack]
> > >>>>>>   0  669    0          2          0          0     00220042    0.0  [rtnet-rtpc]
> > >>>>>>   1  901    110        218        203        0     00264044    0.0  APP_Main
> > >>>>>>   0  911    1          1216       3524       0     00240046    0.0  ThreadPar
> > >>>>>>   0  912    3          56683      3531       0     00240046    0.2  ThreadMessage
> > >>>>>>   0  913    309702     470522     2152301    0     00240046    8.9  ThreadExec
> > >>>>>>   0  914    7          62         71         0     00240042    0.0  TheadGO
> > >>>>>>   0  915    3          647        1190       0     00248046    0.0  ThreadDbg
> > >>>>>>   0  916    1          2024       3087       0     00248046    0.0  ThreadCt
> > >>>>>>   0  917    1          35801      320489     0     00248046    0.8  ThreadSupInv
> > >>>>>>   0  918    1          1231       3596       0     00248046    0.0  ThreadMessage2
> > >>>>>>   0  919    1          161554     479305     0     00248042    1.4  ThreadSched
> > >>>>>>   0  0      0          247850     0          0     00000000    0.4  [IRQ18: [timer]]
> > >>>>>>   1  0      0          15419      0          0     00000000    0.0  [IRQ18: [timer]]
> > >>>>>>   0  0      0          137548     0          0     00000000    1.5  [IRQ31: rteth0]
> > >>>>>>   1  0      0          0          0          0     00000000    0.0  [IRQ31: rteth0]
> > >>>>>>   0  0      0          1          0          0     00000000    0.0  [IRQ118: rteth1-tx]
> > >>>>>>   1  0      0          0          0          0     00000000    0.0  [IRQ118: rteth1-tx]
> > >>>>>>   0  0      0          0          0          0     00000000    0.0  [IRQ119: rteth1-rx]
> > >>>>>>   1  0      0          0          0          0     00000000    0.0  [IRQ119: rteth1-rx]
> > >>>>>>
> > >>>>>> But when following the procedure in [1], I get
> > >>>>>> no slacker
> > >>>>>>
> > >>>>>> I can however see that Thread 913 seem to switch to secondary more than once out of 2 cycles !
> > >>>>>> What am I doing wrong.
> > >>>>>>
> > >>>>>
> > >>>>> CONFIG_XENO_OPT_DEBUG_TRACE_RELAX is on?
> > >>>>>
> > >>>>
> > >>>> Yes, it is.
> > >>>>
> > >>>>> Did you also try the in-app instrumentation via SIGDEBUG, maybe> >>>>> attaching gdb to the process to catch the unwanted cases?
> > >>>>>
> > >>>>
> > >>>> No, not yet. It seems mode switches happen hundreds of time every seconds, so I was willing to try the slackspot.
> > >>>> I'll try that tomorrow and report back to the list
> > >>>>
> > >>>
> > >>> So trying the manual way, there seem to be something happening, but I could not tell exactly what, and I not sure how to track this.
> > >>>
> > >>> On thread ThreadExec, I get execution halted with SIGXCPU, and a backtrace indicating getpid(), then __pthread_kill() which I'm 100% sure I never call !
> > >>>
> > >>> Moreover, when being halted at that point, all threads are not created, and all of which are created seem to have done 3 or 4 MSW each, and without triggering any break.
> > >>>
> > >>> Moreover, in the stats capture below, the APP_Main (the main program entry), is looping on a 1s nanosleep, and it seems the MSW number increases once per second.
> > >>>
> > >>> I'm really confused now.
> > >>>
> > >>
> > >> Are you instrumenting only the part of the thread that has all
> > >> initialization done and until it starts to de-initialize again? You can
> > >> control when the mode switches should be reported and when not.
> > >>
> > >> Then, if you have a backtrace and that backtrace contain a valid
> > >> syscall, you should also find a valid point in your own application when
> > >> this was started (indirectly). Maybe some library call?
> > >>
> > >> Jan
> > >>
> > >
> > > So in the end, slackspot seems to work. I did not understand, by reading the manual, that slackspot/CONFIG_XENO_OPT_DEBUG_TRACE_RELAX also needed that each thread call pthread_setmode_np(0, PTHREAD_WARNSW, NULL);
> > >
> >
> > We are happy to take patches - also against the doc / the wiki.
> >
> > > So now, I've got a working backtrace, however, I'm still lost onto understanding what happens.
> > >
> > > I believe I might be having some troubles with the signals, but I can't seem to find an explanation.
> > > Maybe I should document more my app.
> > >
> > > ThreadSched is pending on a timer (with sigwaitinfo). That time triggers every 750µs. When waking up, this thread will post one or 2 semaphores, that will unlock other threads (of which ThreadExec which seem to be doing MSW more than once every two activations).
> > >
> > > I ended up modifying the app startup code because I was experiencing weirdness in the startup with debugger attached (some semaphores wouldn't unlock, ...). Anyhow, I ended up doing :
> > >
> > >     sigemptyset(&sigHandling.sa_mask);
> > >     sigHandling.sa_sigaction = sigdebug_handler;
> > >     sigHandling.sa_flags = SA_SIGINFO;
> > >     sigaction(SIGDEBUG, &sigHandling, NULL);
> > >
> > >     sigemptyset(&sigset);
> > >     sigaddset (&sigset, SIGALRM);
> > >     sigaddset (&sigset, SIGDEBUG);
> > >     sigprocmask (SIG_BLOCK, &sigset, NULL);
> > >     pthread_sigmask (SIG_BLOCK, &sigset, NULL);
> > >
> > > before creating my RT threads (of which ThreadSched), then doing
> > >
> > >         sigdelset(&sigset, SIGALRM);
> > >         sigprocmask (SIG_UNBLOCK, &sigset, NULL);
> > >         pthread_sigmask (SIG_UNBLOCK, &sigset, NULL);
> > >
> > >         signal(SIGINT, sighandler_exit);
> > >         signal(SIGTERM, sighandler_exit);
> > >         signal(SIGUSR1, sighandler_exit);
> > >
> > > After my RT threads are created.
> > >
> > > If I comment out all of these, and add the pthread_setmode_np some place when initialisation of ThreadExec is over, I end up with the program being forcibly stopped with the indication [1]+  CPU time limit exceeded
> > >
> > > So I put all those back in place, start again, then /proc/xenomai/debug/relax gets populated :
> > > 1115 75731 syscall ThreadExec
> > > 0x12720 /usr/lib/libcobalt.so.2.0.0
> > > 0x127b0 /usr/lib/libcobalt.so.2.0.0
> > > 0x2cd40 /usr/lib/libc-2.29.so
> > > .
> > > /usr/bin/CCN2019_DSP
> > > 1115 80299 syscall ThreadExec
> > > 0xd58a0 /usr/lib/libc-2.29.so
> > > .
> > >
> > > On this, slackspot gives :
> > > Thread[1115] "Module d'Execution des Blocs" started by /usr/bin/CCN2019_DSP (107769 times):
> > > Caused by: syscall
> > >    #0  0x00000000000d589f prctl() in /home/dev/crosstool/arm-cortexa9_neon-linux-gnueabihf/.build/src/glibc-2.29/misc/../sysdeps/unix/syscall-template.S:78
> > >
> > > Thread[1115] "Module d'Execution des Blocs" started by /usr/bin/CCN2019_DSP (101892 times):
> > > Caused by: syscall
> > >    #0  0x000000000001271f ??? in [/usr/lib/libcobalt.so.2.0.0]
> > >    #1  0x00000000000127af ??? in [/usr/lib/libcobalt.so.2.0.0]
> > >    #2  0x000000000002cd3f __default_sa_restorer() in /home/dev/crosstool/arm-cortexa9_neon-linux-gnueabihf/.build/src/glibc-2.29/signal/../sysdeps/unix/sysv/linux/arm/sigrestorer.S:77
> > >
> > > So know, I really don't know where to go from here !
> > >
> >
> > Could you recompile with debugging symbols for libcobalt and your
> > application and check where exactly the Linux syscall is issued and what
> > library call from your application relates to it? gdb may provide this
> > better.
> >
>
> Ok. I think I found it. I had put in some debug code in my semaphore wrappers to check if the timeout given was not before the current time, and thus I used pthread_getname to display a trace with the name of the thread. This was the main problem.
>

After checking again, that wasn't it. I can still see lots (about the same as before) MSW on that same thread. However, the sigdebug never gets called now !
By checking the other, I could see that socket() triggers a MSW which is surprising.


> > Jan
> >
> > --
> > Siemens AG, T RDA IOT
> > Corporate Competence Center Embedded Linux
>
>



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Hunting mode switches
  2021-06-03 18:39               ` François Legal
@ 2021-06-04  7:21                 ` Jan Kiszka
  2021-06-04 13:04                   ` François Legal
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Kiszka @ 2021-06-04  7:21 UTC (permalink / raw)
  To: François Legal; +Cc: xenomai

On 03.06.21 20:39, François Legal wrote:
> Le Jeudi, Juin 03, 2021 10:57 CEST, François Legal via Xenomai <xenomai@xenomai.org> a écrit: 
>  
>> Le Mercredi, Juin 02, 2021 12:59 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit: 
>>  
>>> On 02.06.21 12:26, François Legal wrote:
>>>> Le Mardi, Juin 01, 2021 10:31 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit: 
>>>>  
>>>>> On 01.06.21 09:47, François Legal wrote:
>>>>>> Le Lundi, Mai 31, 2021 20:57 CEST, François Legal via Xenomai <xenomai@xenomai.org> a écrit: 
>>>>>>  
>>>>>>> Le Lundi, Mai 31, 2021 18:14 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit: 
>>>>>>>  
>>>>>>>> On 31.05.21 18:06, François Legal via Xenomai wrote:
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> I'm having trouble hunting mode switches on an arm embedded application.
>>>>>>>>>
>>>>>>>>> When I look at cat /proc/xenomai/sched/stat
>>>>>>>>>
>>>>>>>>> CPU  PID    MSW        CSW        XSC        PF    STAT       %CPU  NAME
>>>>>>>>>   0  0      0          476986     0          0     00218000   84.5  [ROOT/0]
>>>>>>>>>   1  0      0          137743     0          0     00218000   99.3  [ROOT/1]
>>>>>>>>>   1  667    0          137550     0          0     00200042    0.7  [rtnet-stack]
>>>>>>>>>   0  669    0          2          0          0     00220042    0.0  [rtnet-rtpc]
>>>>>>>>>   1  901    110        218        203        0     00264044    0.0  APP_Main
>>>>>>>>>   0  911    1          1216       3524       0     00240046    0.0  ThreadPar
>>>>>>>>>   0  912    3          56683      3531       0     00240046    0.2  ThreadMessage
>>>>>>>>>   0  913    309702     470522     2152301    0     00240046    8.9  ThreadExec
>>>>>>>>>   0  914    7          62         71         0     00240042    0.0  TheadGO
>>>>>>>>>   0  915    3          647        1190       0     00248046    0.0  ThreadDbg
>>>>>>>>>   0  916    1          2024       3087       0     00248046    0.0  ThreadCt
>>>>>>>>>   0  917    1          35801      320489     0     00248046    0.8  ThreadSupInv
>>>>>>>>>   0  918    1          1231       3596       0     00248046    0.0  ThreadMessage2
>>>>>>>>>   0  919    1          161554     479305     0     00248042    1.4  ThreadSched
>>>>>>>>>   0  0      0          247850     0          0     00000000    0.4  [IRQ18: [timer]]
>>>>>>>>>   1  0      0          15419      0          0     00000000    0.0  [IRQ18: [timer]]
>>>>>>>>>   0  0      0          137548     0          0     00000000    1.5  [IRQ31: rteth0]
>>>>>>>>>   1  0      0          0          0          0     00000000    0.0  [IRQ31: rteth0]
>>>>>>>>>   0  0      0          1          0          0     00000000    0.0  [IRQ118: rteth1-tx]
>>>>>>>>>   1  0      0          0          0          0     00000000    0.0  [IRQ118: rteth1-tx]
>>>>>>>>>   0  0      0          0          0          0     00000000    0.0  [IRQ119: rteth1-rx]
>>>>>>>>>   1  0      0          0          0          0     00000000    0.0  [IRQ119: rteth1-rx]
>>>>>>>>>
>>>>>>>>> But when following the procedure in [1], I get 
>>>>>>>>> no slacker
>>>>>>>>>
>>>>>>>>> I can however see that Thread 913 seem to switch to secondary more than once out of 2 cycles !
>>>>>>>>> What am I doing wrong.
>>>>>>>>>
>>>>>>>>
>>>>>>>> CONFIG_XENO_OPT_DEBUG_TRACE_RELAX is on?
>>>>>>>>
>>>>>>>
>>>>>>> Yes, it is.
>>>>>>>
>>>>>>>> Did you also try the in-app instrumentation via SIGDEBUG, maybe> >>>>> attaching gdb to the process to catch the unwanted cases?
>>>>>>>>
>>>>>>>
>>>>>>> No, not yet. It seems mode switches happen hundreds of time every seconds, so I was willing to try the slackspot.
>>>>>>> I'll try that tomorrow and report back to the list 
>>>>>>>
>>>>>>
>>>>>> So trying the manual way, there seem to be something happening, but I could not tell exactly what, and I not sure how to track this.
>>>>>>
>>>>>> On thread ThreadExec, I get execution halted with SIGXCPU, and a backtrace indicating getpid(), then __pthread_kill() which I'm 100% sure I never call !
>>>>>>
>>>>>> Moreover, when being halted at that point, all threads are not created, and all of which are created seem to have done 3 or 4 MSW each, and without triggering any break.
>>>>>>
>>>>>> Moreover, in the stats capture below, the APP_Main (the main program entry), is looping on a 1s nanosleep, and it seems the MSW number increases once per second.
>>>>>>
>>>>>> I'm really confused now.
>>>>>>
>>>>>
>>>>> Are you instrumenting only the part of the thread that has all
>>>>> initialization done and until it starts to de-initialize again? You can
>>>>> control when the mode switches should be reported and when not.
>>>>>
>>>>> Then, if you have a backtrace and that backtrace contain a valid
>>>>> syscall, you should also find a valid point in your own application when
>>>>> this was started (indirectly). Maybe some library call?
>>>>>
>>>>> Jan
>>>>>
>>>>
>>>> So in the end, slackspot seems to work. I did not understand, by reading the manual, that slackspot/CONFIG_XENO_OPT_DEBUG_TRACE_RELAX also needed that each thread call pthread_setmode_np(0, PTHREAD_WARNSW, NULL);
>>>>
>>>
>>> We are happy to take patches - also against the doc / the wiki.
>>>
>>>> So now, I've got a working backtrace, however, I'm still lost onto understanding what happens.
>>>>
>>>> I believe I might be having some troubles with the signals, but I can't seem to find an explanation.
>>>> Maybe I should document more my app.
>>>>
>>>> ThreadSched is pending on a timer (with sigwaitinfo). That time triggers every 750µs. When waking up, this thread will post one or 2 semaphores, that will unlock other threads (of which ThreadExec which seem to be doing MSW more than once every two activations).
>>>>
>>>> I ended up modifying the app startup code because I was experiencing weirdness in the startup with debugger attached (some semaphores wouldn't unlock, ...). Anyhow, I ended up doing :
>>>>
>>>>     sigemptyset(&sigHandling.sa_mask);
>>>>     sigHandling.sa_sigaction = sigdebug_handler;
>>>>     sigHandling.sa_flags = SA_SIGINFO;
>>>>     sigaction(SIGDEBUG, &sigHandling, NULL);
>>>>
>>>>     sigemptyset(&sigset);
>>>>     sigaddset (&sigset, SIGALRM);
>>>>     sigaddset (&sigset, SIGDEBUG);
>>>>     sigprocmask (SIG_BLOCK, &sigset, NULL);
>>>>     pthread_sigmask (SIG_BLOCK, &sigset, NULL);
>>>>
>>>> before creating my RT threads (of which ThreadSched), then doing
>>>>
>>>>         sigdelset(&sigset, SIGALRM);
>>>>         sigprocmask (SIG_UNBLOCK, &sigset, NULL);
>>>>         pthread_sigmask (SIG_UNBLOCK, &sigset, NULL);
>>>>
>>>>         signal(SIGINT, sighandler_exit);
>>>>         signal(SIGTERM, sighandler_exit);
>>>>         signal(SIGUSR1, sighandler_exit);
>>>>
>>>> After my RT threads are created.
>>>>
>>>> If I comment out all of these, and add the pthread_setmode_np some place when initialisation of ThreadExec is over, I end up with the program being forcibly stopped with the indication [1]+  CPU time limit exceeded
>>>>
>>>> So I put all those back in place, start again, then /proc/xenomai/debug/relax gets populated :
>>>> 1115 75731 syscall ThreadExec 
>>>> 0x12720 /usr/lib/libcobalt.so.2.0.0
>>>> 0x127b0 /usr/lib/libcobalt.so.2.0.0
>>>> 0x2cd40 /usr/lib/libc-2.29.so
>>>> .
>>>> /usr/bin/CCN2019_DSP
>>>> 1115 80299 syscall ThreadExec 
>>>> 0xd58a0 /usr/lib/libc-2.29.so
>>>> .
>>>>
>>>> On this, slackspot gives :
>>>> Thread[1115] "Module d'Execution des Blocs" started by /usr/bin/CCN2019_DSP (107769 times):
>>>> Caused by: syscall
>>>>    #0  0x00000000000d589f prctl() in /home/dev/crosstool/arm-cortexa9_neon-linux-gnueabihf/.build/src/glibc-2.29/misc/../sysdeps/unix/syscall-template.S:78
>>>>
>>>> Thread[1115] "Module d'Execution des Blocs" started by /usr/bin/CCN2019_DSP (101892 times):
>>>> Caused by: syscall
>>>>    #0  0x000000000001271f ??? in [/usr/lib/libcobalt.so.2.0.0]
>>>>    #1  0x00000000000127af ??? in [/usr/lib/libcobalt.so.2.0.0]
>>>>    #2  0x000000000002cd3f __default_sa_restorer() in /home/dev/crosstool/arm-cortexa9_neon-linux-gnueabihf/.build/src/glibc-2.29/signal/../sysdeps/unix/sysv/linux/arm/sigrestorer.S:77
>>>>
>>>> So know, I really don't know where to go from here !
>>>>
>>>
>>> Could you recompile with debugging symbols for libcobalt and your
>>> application and check where exactly the Linux syscall is issued and what
>>> library call from your application relates to it? gdb may provide this
>>> better.
>>>
>>
>> Ok. I think I found it. I had put in some debug code in my semaphore wrappers to check if the timeout given was not before the current time, and thus I used pthread_getname to display a trace with the name of the thread. This was the main problem.
>>
> 
> After checking again, that wasn't it. I can still see lots (about the same as before) MSW on that same thread. However, the sigdebug never gets called now !
> By checking the other, I could see that socket() triggers a MSW which is surprising.
> 

socket() is not a real-time service. File descriptor creation and
destruction - even for RTDM - will happen in the Linux domain.
Furthermore, socket() will fall back to Linux if the specified fd is not
valid for RTDM.

Jan

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Hunting mode switches
  2021-06-04  7:21                 ` Jan Kiszka
@ 2021-06-04 13:04                   ` François Legal
  0 siblings, 0 replies; 16+ messages in thread
From: François Legal @ 2021-06-04 13:04 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai

Le Vendredi, Juin 04, 2021 09:21 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit:

> On 03.06.21 20:39, François Legal wrote:
> > Le Jeudi, Juin 03, 2021 10:57 CEST, François Legal via Xenomai <xenomai@xenomai.org> a écrit:
> >
> >> Le Mercredi, Juin 02, 2021 12:59 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit:
> >>
> >>> On 02.06.21 12:26, François Legal wrote:
> >>>> Le Mardi, Juin 01, 2021 10:31 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit:
> >>>>
> >>>>> On 01.06.21 09:47, François Legal wrote:
> >>>>>> Le Lundi, Mai 31, 2021 20:57 CEST, François Legal via Xenomai <xenomai@xenomai.org> a écrit:
> >>>>>>
> >>>>>>> Le Lundi, Mai 31, 2021 18:14 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit:
> >>>>>>>
> >>>>>>>> On 31.05.21 18:06, François Legal via Xenomai wrote:
> >>>>>>>>> Hello,
> >>>>>>>>>
> >>>>>>>>> I'm having trouble hunting mode switches on an arm embedded application.
> >>>>>>>>>
> >>>>>>>>> When I look at cat /proc/xenomai/sched/stat
> >>>>>>>>>
> >>>>>>>>> CPU  PID    MSW        CSW        XSC        PF    STAT       %CPU  NAME
> >>>>>>>>>   0  0      0          476986     0          0     00218000   84.5  [ROOT/0]
> >>>>>>>>>   1  0      0          137743     0          0     00218000   99.3  [ROOT/1]
> >>>>>>>>>   1  667    0          137550     0          0     00200042    0.7  [rtnet-stack]
> >>>>>>>>>   0  669    0          2          0          0     00220042    0.0  [rtnet-rtpc]
> >>>>>>>>>   1  901    110        218        203        0     00264044    0.0  APP_Main
> >>>>>>>>>   0  911    1          1216       3524       0     00240046    0.0  ThreadPar
> >>>>>>>>>   0  912    3          56683      3531       0     00240046    0.2  ThreadMessage
> >>>>>>>>>   0  913    309702     470522     2152301    0     00240046    8.9  ThreadExec
> >>>>>>>>>   0  914    7          62         71         0     00240042    0.0  TheadGO
> >>>>>>>>>   0  915    3          647        1190       0     00248046    0.0  ThreadDbg
> >>>>>>>>>   0  916    1          2024       3087       0     00248046    0.0  ThreadCt
> >>>>>>>>>   0  917    1          35801      320489     0     00248046    0.8  ThreadSupInv
> >>>>>>>>>   0  918    1          1231       3596       0     00248046    0.0  ThreadMessage2
> >>>>>>>>>   0  919    1          161554     479305     0     00248042    1.4  ThreadSched
> >>>>>>>>>   0  0      0          247850     0          0     00000000    0.4  [IRQ18: [timer]]
> >>>>>>>>>   1  0      0          15419      0          0     00000000    0.0  [IRQ18: [timer]]
> >>>>>>>>>   0  0      0          137548     0          0     00000000    1.5  [IRQ31: rteth0]
> >>>>>>>>>   1  0      0          0          0          0     00000000    0.0  [IRQ31: rteth0]
> >>>>>>>>>   0  0      0          1          0          0     00000000    0.0  [IRQ118: rteth1-tx]
> >>>>>>>>>   1  0      0          0          0          0     00000000    0.0  [IRQ118: rteth1-tx]
> >>>>>>>>>   0  0      0          0          0          0     00000000    0.0  [IRQ119: rteth1-rx]
> >>>>>>>>>   1  0      0          0          0          0     00000000    0.0  [IRQ119: rteth1-rx]
> >>>>>>>>>
> >>>>>>>>> But when following the procedure in [1], I get
> >>>>>>>>> no slacker
> >>>>>>>>>
> >>>>>>>>> I can however see that Thread 913 seem to switch to secondary more than once out of 2 cycles !
> >>>>>>>>> What am I doing wrong.
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> CONFIG_XENO_OPT_DEBUG_TRACE_RELAX is on?
> >>>>>>>>
> >>>>>>>
> >>>>>>> Yes, it is.
> >>>>>>>
> >>>>>>>> Did you also try the in-app instrumentation via SIGDEBUG, maybe> >>>>> attaching gdb to the process to catch the unwanted cases?
> >>>>>>>>
> >>>>>>>
> >>>>>>> No, not yet. It seems mode switches happen hundreds of time every seconds, so I was willing to try the slackspot.
> >>>>>>> I'll try that tomorrow and report back to the list
> >>>>>>>
> >>>>>>
> >>>>>> So trying the manual way, there seem to be something happening, but I could not tell exactly what, and I not sure how to track this.
> >>>>>>
> >>>>>> On thread ThreadExec, I get execution halted with SIGXCPU, and a backtrace indicating getpid(), then __pthread_kill() which I'm 100% sure I never call !
> >>>>>>
> >>>>>> Moreover, when being halted at that point, all threads are not created, and all of which are created seem to have done 3 or 4 MSW each, and without triggering any break.
> >>>>>>
> >>>>>> Moreover, in the stats capture below, the APP_Main (the main program entry), is looping on a 1s nanosleep, and it seems the MSW number increases once per second.
> >>>>>>
> >>>>>> I'm really confused now.
> >>>>>>
> >>>>>
> >>>>> Are you instrumenting only the part of the thread that has all
> >>>>> initialization done and until it starts to de-initialize again? You can
> >>>>> control when the mode switches should be reported and when not.
> >>>>>
> >>>>> Then, if you have a backtrace and that backtrace contain a valid
> >>>>> syscall, you should also find a valid point in your own application when
> >>>>> this was started (indirectly). Maybe some library call?
> >>>>>
> >>>>> Jan
> >>>>>
> >>>>
> >>>> So in the end, slackspot seems to work. I did not understand, by reading the manual, that slackspot/CONFIG_XENO_OPT_DEBUG_TRACE_RELAX also needed that each thread call pthread_setmode_np(0, PTHREAD_WARNSW, NULL);
> >>>>
> >>>
> >>> We are happy to take patches - also against the doc / the wiki.
> >>>
> >>>> So now, I've got a working backtrace, however, I'm still lost onto understanding what happens.
> >>>>
> >>>> I believe I might be having some troubles with the signals, but I can't seem to find an explanation.
> >>>> Maybe I should document more my app.
> >>>>
> >>>> ThreadSched is pending on a timer (with sigwaitinfo). That time triggers every 750µs. When waking up, this thread will post one or 2 semaphores, that will unlock other threads (of which ThreadExec which seem to be doing MSW more than once every two activations).
> >>>>
> >>>> I ended up modifying the app startup code because I was experiencing weirdness in the startup with debugger attached (some semaphores wouldn't unlock, ...). Anyhow, I ended up doing :
> >>>>
> >>>>     sigemptyset(&sigHandling.sa_mask);
> >>>>     sigHandling.sa_sigaction = sigdebug_handler;
> >>>>     sigHandling.sa_flags = SA_SIGINFO;
> >>>>     sigaction(SIGDEBUG, &sigHandling, NULL);
> >>>>
> >>>>     sigemptyset(&sigset);
> >>>>     sigaddset (&sigset, SIGALRM);
> >>>>     sigaddset (&sigset, SIGDEBUG);
> >>>>     sigprocmask (SIG_BLOCK, &sigset, NULL);
> >>>>     pthread_sigmask (SIG_BLOCK, &sigset, NULL);
> >>>>
> >>>> before creating my RT threads (of which ThreadSched), then doing
> >>>>
> >>>>         sigdelset(&sigset, SIGALRM);
> >>>>         sigprocmask (SIG_UNBLOCK, &sigset, NULL);
> >>>>         pthread_sigmask (SIG_UNBLOCK, &sigset, NULL);
> >>>>
> >>>>         signal(SIGINT, sighandler_exit);
> >>>>         signal(SIGTERM, sighandler_exit);
> >>>>         signal(SIGUSR1, sighandler_exit);
> >>>>
> >>>> After my RT threads are created.
> >>>>
> >>>> If I comment out all of these, and add the pthread_setmode_np some place when initialisation of ThreadExec is over, I end up with the program being forcibly stopped with the indication [1]+  CPU time limit exceeded
> >>>>
> >>>> So I put all those back in place, start again, then /proc/xenomai/debug/relax gets populated :
> >>>> 1115 75731 syscall ThreadExec
> >>>> 0x12720 /usr/lib/libcobalt.so.2.0.0
> >>>> 0x127b0 /usr/lib/libcobalt.so.2.0.0
> >>>> 0x2cd40 /usr/lib/libc-2.29.so
> >>>> .
> >>>> /usr/bin/CCN2019_DSP
> >>>> 1115 80299 syscall ThreadExec
> >>>> 0xd58a0 /usr/lib/libc-2.29.so
> >>>> .
> >>>>
> >>>> On this, slackspot gives :
> >>>> Thread[1115] "Module d'Execution des Blocs" started by /usr/bin/CCN2019_DSP (107769 times):
> >>>> Caused by: syscall
> >>>>    #0  0x00000000000d589f prctl() in /home/dev/crosstool/arm-cortexa9_neon-linux-gnueabihf/.build/src/glibc-2.29/misc/../sysdeps/unix/syscall-template.S:78
> >>>>
> >>>> Thread[1115] "Module d'Execution des Blocs" started by /usr/bin/CCN2019_DSP (101892 times):
> >>>> Caused by: syscall
> >>>>    #0  0x000000000001271f ??? in [/usr/lib/libcobalt.so.2.0.0]
> >>>>    #1  0x00000000000127af ??? in [/usr/lib/libcobalt.so.2.0.0]
> >>>>    #2  0x000000000002cd3f __default_sa_restorer() in /home/dev/crosstool/arm-cortexa9_neon-linux-gnueabihf/.build/src/glibc-2.29/signal/../sysdeps/unix/sysv/linux/arm/sigrestorer.S:77
> >>>>
> >>>> So know, I really don't know where to go from here !
> >>>>
> >>>
> >>> Could you recompile with debugging symbols for libcobalt and your
> >>> application and check where exactly the Linux syscall is issued and what
> >>> library call from your application relates to it? gdb may provide this
> >>> better.
> >>>
> >>
> >> Ok. I think I found it. I had put in some debug code in my semaphore wrappers to check if the timeout given was not before the current time, and thus I used pthread_getname to display a trace with the name of the thread. This was the main problem.
> >>
> >
> > After checking again, that wasn't it. I can still see lots (about the same as before) MSW on that same thread. However, the sigdebug never gets called now !
> > By checking the other, I could see that socket() triggers a MSW which is surprising.
> >
>
> socket() is not a real-time service. File descriptor creation and
> destruction - even for RTDM - will happen in the Linux domain.
> Furthermore, socket() will fall back to Linux if the specified fd is not
> valid for RTDM.
>

Ok. So I may have been doing something wrong yesterday. I'm back to normal, and I had another libc clock_gettime call in the ThreadExec that was causing the MSW. I no longer experience MSW in the app threads, except for the main one, that loops on a 1s nanosleep. I get an MSW about every one seconds.

I decided to break in the kernel on xnthread_relax to try to get a glympse of what happens.
So here is the kernel bt when the break is taken :
    0  0x80197dd8 xnthread_relax(): thread.c, line 2074
    1  0x801b5108 ipipe_syscall_hook()+592: syscall.c, line 765
    2  0x801b5108 ipipe_syscall_hook()+592: syscall.c, line 765
    3  0x80187d08 __ipipe_notify_syscall()+116: core.c, line 1008
    4  0x801089f0 pipeline_syscall()+8: entry-common.S, line 319
    5  0x801089f0 pipeline_syscall()+8: entry-common.S, line 319

and the application main bt :
    0  0x76f9f050 __cobalt_clock_nanosleep()+72: clock.c, line 315
    1  0x76f9f084 __cobalt_nanosleep()+24: clock.c, line 354
    2  0x147c0 main()+392: ./src/Runtime/RunApplication.c, line 1147
    3  0x14adc xenomai_main()+76: bootstrap.c, line 96
    4  0x76d2d7fc
    5  unknown-pc

Can someone explain what is causing the MSW ?  Is it that I'm not supposed to call nanosleep or cobalt services from the main thread ?

François

> Jan
>
> --
> Siemens AG, T RDA IOT
> Corporate Competence Center Embedded Linux



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Hunting mode switches
@ 2021-05-31 16:06 François Legal
  0 siblings, 0 replies; 16+ messages in thread
From: François Legal @ 2021-05-31 16:06 UTC (permalink / raw)
  To: xenomai

Hello,

I'm having trouble hunting mode switches on an arm embedded application.

When I look at cat /proc/xenomai/sched/stat

CPU  PID    MSW        CSW        XSC        PF    STAT       %CPU  NAME
  0  0      0          476986     0          0     00218000   84.5  [ROOT/0]
  1  0      0          137743     0          0     00218000   99.3  [ROOT/1]
  1  667    0          137550     0          0     00200042    0.7  [rtnet-stack]
  0  669    0          2          0          0     00220042    0.0  [rtnet-rtpc]
  1  901    110        218        203        0     00264044    0.0  APP_Main
  0  911    1          1216       3524       0     00240046    0.0  ThreadPar
  0  912    3          56683      3531       0     00240046    0.2  ThreadMessage
  0  913    309702     470522     2152301    0     00240046    8.9  ThreadExec
  0  914    7          62         71         0     00240042    0.0  TheadGO
  0  915    3          647        1190       0     00248046    0.0  ThreadDbg
  0  916    1          2024       3087       0     00248046    0.0  ThreadCt
  0  917    1          35801      320489     0     00248046    0.8  ThreadSupInv
  0  918    1          1231       3596       0     00248046    0.0  ThreadMessage2
  0  919    1          161554     479305     0     00248042    1.4  ThreadSched
  0  0      0          247850     0          0     00000000    0.4  [IRQ18: [timer]]
  1  0      0          15419      0          0     00000000    0.0  [IRQ18: [timer]]
  0  0      0          137548     0          0     00000000    1.5  [IRQ31: rteth0]
  1  0      0          0          0          0     00000000    0.0  [IRQ31: rteth0]
  0  0      0          1          0          0     00000000    0.0  [IRQ118: rteth1-tx]
  1  0      0          0          0          0     00000000    0.0  [IRQ118: rteth1-tx]
  0  0      0          0          0          0     00000000    0.0  [IRQ119: rteth1-rx]
  1  0      0          0          0          0     00000000    0.0  [IRQ119: rteth1-rx]

But when following the procedure in [1], I get
no slacker

I can however see that Thread 913 seem to switch to secondary more than once out of 2 cycles !
What am I doing wrong.

Thanks in advance

François

[1] : https://source.denx.de/Xenomai/xenomai/-/wikis/Finding_Spurious_Relaxes



^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2021-06-04 13:04 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-31 16:06 Hunting mode switches François Legal
2021-05-31 16:14 ` Jan Kiszka
2021-05-31 18:57   ` François Legal
2021-06-01  7:47     ` François Legal
2021-06-01  8:31       ` Jan Kiszka
2021-06-02 10:26         ` François Legal
2021-06-02 10:59           ` Jan Kiszka
2021-06-02 15:06             ` François Legal
2021-06-02 15:09               ` François Legal
2021-06-02 15:13                 ` Jan Kiszka
2021-06-03  8:57             ` François Legal
2021-06-03 18:39               ` François Legal
2021-06-04  7:21                 ` Jan Kiszka
2021-06-04 13:04                   ` François Legal
2021-06-01  8:46       ` Philippe Gerum
  -- strict thread matches above, loose matches on Subject: below --
2021-05-31 16:06 François Legal

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.