* Re: More details on health monitoring notifications
@ 2022-07-07 20:02 Russell Johnson
  2022-07-14  8:19 ` Philippe Gerum
  0 siblings, 1 reply; 6+ messages in thread
From: Russell Johnson @ 2022-07-07 20:02 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai


Correct, it is running on x86. I am not sure what would be causing a page
fault in the EVL threads, as I figured the mlockall call made by evl_init
would prevent that from happening.

I have tried to run the app with gdb (this used to work fine pre-EVL), but
now I get all sorts of exceptions and error codes when running in the
debugger. I quite frequently see EPERM errors from evl mutex lock/unlock
(even though they work fine when not run through a debugger). Also, when I
swap the thread mode flag from T_HMOBS to T_HMSIG and run in the debugger,
I get a "11776 CPU time limit exceeded (core dumped)" error on the create
call for an evl event in one of my threads. I am not sure why I am seeing
such different behavior in the debugger versus a normal run when it comes
to the evl API calls. Any ideas?

* Re: More details on health monitoring notifications
  2022-07-07 20:02 More details on health monitoring notifications Russell Johnson
@ 2022-07-14  8:19 ` Philippe Gerum
  0 siblings, 0 replies; 6+ messages in thread
From: Philippe Gerum @ 2022-07-14  8:19 UTC (permalink / raw)
  To: Russell Johnson; +Cc: xenomai


Russell Johnson <russell.johnson@kratosdefense.com> writes:

> Correct, it is running on x86. I am not sure what would be causing a page
> fault in the EVL threads as I figured the mlockall call by evl_init would take
> care of preventing that from happening.

We may have a clue by knowing which type of memory is being referred
to.

- Did you trace the offending access using gdb, is it regular memory?

- Does your application fork() in any way prior to running into this
  issue?
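
One way to tell which type of memory an address belongs to is to look it
up in /proc/<pid>/maps while the process is alive. The Python sketch below
does that lookup on the text of a maps file. The helper name find_mapping
and the sample mappings are hypothetical, made up for illustration; only
the maps line layout (range, perms, offset, dev, inode, path) is the
standard Linux one. It works for any address, whether a PC reported by the
kernel or a faulting data address obtained from gdb.

```python
def find_mapping(maps_text, addr):
    """Return the /proc/<pid>/maps entry containing addr, or None.

    maps_text is the content of /proc/<pid>/maps; addr is an integer.
    """
    for line in maps_text.splitlines():
        # Fields: range, perms, offset, dev, inode, [path]
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        start_s, end_s = fields[0].split("-")
        start, end = int(start_s, 16), int(end_s, 16)
        if start <= addr < end:
            # Anonymous mappings have no path field at all.
            path = fields[5].strip() if len(fields) > 5 else ""
            return {
                "start": start,
                "end": end,
                "perms": fields[1],
                # Offset of addr within the backing file (if any).
                "file_offset": int(fields[2], 16) + (addr - start),
                "path": path,
            }
    return None


# Hypothetical maps content, NOT taken from the application in this thread.
SAMPLE_MAPS = (
    "55d0c0000000-55d0c0021000 rw-p 00000000 00:00 0      [heap]\n"
    "7f54a7800000-7f54a79b5000 r-xp 00022000 08:01 1234   /usr/lib/libexample.so\n"
)

if __name__ == "__main__":
    print(find_mapping(SAMPLE_MAPS, 0x7F54A7877FA6))
```

A file-backed r-xp entry points at code in a binary or shared library,
whereas [heap], [stack] or a path-less entry means anonymous memory.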

> I have tried to run the app with gdb (used to work fine pre-EVL), but now I

Which previous environment was this? Xenomai 3.x + I-pipe?

> get all sorts of exceptions and error codes when trying to run in the
> debugger. I quite frequently see EPERM issues with evl mutex unlock/lock (even
> though it works fine when not run through a debugger). Also, when I swap the
> thread mode flag from T_HMOBS to T_HMSIG and run in the debugger, I get a
> "11776 CPU time limit exceeded (core dumped)" error on the create call for an
> evl event in one of my threads. I am not sure why I am seeing such different
> behavior in the debugger versus the normal run when it comes to the evl API
> calls. Any ideas?

SIGXCPU is an alias of SIGDEBUG, the signal the EVL core uses to warn
you that something is wrong in general. For example, if
CONFIG_EVL_DEBUG_WOLI is set in the kernel configuration and the
application is doing something wrong with respect to mutex-based locking
patterns, the core would notify it using SIGDEBUG. The causes are
described for the T_WOLI bit at [1]. The core might also issue SIGDEBUG
when the watchdog triggers for the current thread (CONFIG_EVL_WATCHDOG).

[1] https://evlproject.org/core/user-api/thread/#health-monitoring
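
Since SIGDEBUG shares its number with SIGXCPU, a shell or test harness
reports it with the stock libc description of SIGXCPU, which is where a
message such as "CPU time limit exceeded" comes from. The Python sketch
below decodes a subprocess return code accordingly. The helper name
explain_returncode is hypothetical, and the EVL hint it appends is only an
assumption based on the explanation above, not something the code can
verify.

```python
import signal


def explain_returncode(returncode):
    """Describe a subprocess.Popen-style return code on POSIX.

    A negative value -N means the child was killed by signal N.
    """
    if returncode >= 0:
        return "exited with status %d" % returncode
    signum = -returncode
    try:
        sig = signal.Signals(signum)
    except ValueError:
        # Signal number with no symbolic name known to Python.
        return "killed by signal %d" % signum
    text = "killed by %s (%s)" % (sig.name, signal.strsignal(sig))
    if sig == signal.SIGXCPU:
        # Assumption: with an EVL application this is likely SIGDEBUG,
        # not an actual RLIMIT_CPU overrun.
        text += "; under EVL this may be SIGDEBUG"
    return text


if __name__ == "__main__":
    print(explain_returncode(-int(signal.SIGXCPU)))
```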

-- 
Philippe.


* Re: More details on health monitoring notifications
  2022-07-06 14:57   ` Philippe Gerum
@ 2022-07-06 19:05     ` Leonid Gasheev via Xenomai
  0 siblings, 0 replies; 6+ messages in thread
From: Leonid Gasheev via Xenomai @ 2022-07-06 19:05 UTC (permalink / raw)
  To: xenomai

On 06.07.2022 17:57, Philippe Gerum wrote:
> 
> Russell Johnson <russell.johnson@kratosdefense.com> writes:
> 
>> I also notice these lines in “dmesg” output from the EVL core. Can I use the excpt or user_pc to get any more information about what is causing these in-band switches?
>>
> 
> Yes. Exception #14 on x86 is the 'page fault' event, and x86 seems to be
> the architecture this code is running on. user_pc is the PC value in the
> virtual address space of the process at the point where it caused the trap.
> This is a - possibly legit - memory access which requires a so-called
> 'major fault' to be taken.
> 
> GDB may help figuring out the location of this code, in two ways:
> 
> - switching the HM notifier to 'signal' mode for your threads, using
>    T_HMSIG, you only need to run the app over GDB: the core will send it
>    a SIGDEBUG event, which GDB will trap. Using the GDB 'backtrace/bt'
>    command once the app is in break state after receipt should display a
>    backtrace to the offending code (IOW the EVL core makes sure that your
>    app receives SIGDEBUG immediately on top of the code causing the
>    trap).
> 
> - using the GDB 'list' command to list the code at the PC value reported
>    by the kernel should work. You just need to start the application
>    first: unless you have a complex dlopen-based scheme for running code
>    plugins, running until a breakpoint is taken in main() should be
>    enough before you can issue 'list *<PC_value>'.
> 
>> [ 7301.352255] EVL: thread_1 switching in-band [pid=6319, excpt=14, user_pc=0x7f54a7877fa6]
>> [ 7301.352689] EVL: thread_2 switching in-band [pid=6285, excpt=14, user_pc=0x7f54a7877fa6]
>>
>> From: Russell Johnson
>> Sent: Friday, July 1, 2022 11:19 AM
>> To: xenomai@lists.linux.dev
>> Subject: More details on health monitoring notifications
>>
>> Hello,
>>
>> I have gotten a health monitoring thread going that tracks all the EVL
>> threads running in my app. The goal was to try and figure out what is
>> causing occasional in-band switches in some of my EVL threads. I am seeing
>> the notifications now that give me a little more insight, and the two
>> diagnostic codes I am seeing are 2 (switched inband due to syscall) and 3
>> (switched inband due to fault). I am finding these to be quite hard to
>> pinpoint specifically throughout my code. Is there any way to get more
>> information into what exactly is causing these notifications from the EVL
>> core? It would be great to have a callstack of some kind, but I am not sure
>> if that is possible. Any insight would be helpful.
>>
>> Thanks,
>>
>> Russell

Hello Philippe, could you comment briefly on the following?

After the OS booted, I ran the following commands:
leo@orangepipc2:~$ uname -a
Linux orangepipc2 5.15.52-evl.2-sunxi64 #22.07.sl SMP IRQPIPE Mon Jul 4 09:36:12 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux

leo@orangepipc2:~$ dmesg | grep EVL
[    1.200014] IRQ pipeline: high-priority EVL stage added.
[    1.203250] EVL: core started [DEBUG]

leo@orangepipc2:~$ sudo evl test
basic-xbuf: OK
....
All test exit: OK

leo@orangepipc2:~$ dmesg | grep EVL
[    1.200014] IRQ pipeline: high-priority EVL stage added.
[    1.203250] EVL: core started [DEBUG]
[  413.466523] EVL: fault:1540 switching in-band [pid=1540, excpt=0, __arch_copy_to_user+0x84/0x218]
[  413.466757] EVL: fault:1540 resuming out-of-band [pid=1540, excpt=0, __entry_tramp_text_end+0x13e8/0x11000]
[  413.466793] EVL: fault:1540 switching in-band [pid=1540, excpt=0, __entry_tramp_text_end+0x13f4/0x11000]
[  413.466847] EVL: fault:1540 resuming out-of-band [pid=1540, excpt=0, __entry_tramp_text_end+0x13fc/0x11000]
[  413.466874] EVL: fault:1540 switching in-band [pid=1540, excpt=0, user_pc=0xaaaacc4f0cb8]

-- 
Leonid Gasheev


* Re: More details on health monitoring notifications
  2022-07-05 22:25 ` Russell Johnson
@ 2022-07-06 14:57   ` Philippe Gerum
  2022-07-06 19:05     ` Leonid Gasheev via Xenomai
  0 siblings, 1 reply; 6+ messages in thread
From: Philippe Gerum @ 2022-07-06 14:57 UTC (permalink / raw)
  To: Russell Johnson; +Cc: xenomai


Russell Johnson <russell.johnson@kratosdefense.com> writes:

> I also notice these lines in “dmesg” output from the EVL core. Can I use the excpt or user_pc to get any more information about what is causing these in-band switches?
>

Yes. Exception #14 on x86 is the 'page fault' event, and x86 seems to be
the architecture this code is running on. user_pc is the PC value in the
virtual address space of the process at the point where it caused the trap.
This is a - possibly legit - memory access which requires a so-called
'major fault' to be taken.

GDB may help figuring out the location of this code, in two ways:

- switching the HM notifier to 'signal' mode for your threads, using
  T_HMSIG, you only need to run the app over GDB: the core will send it
  a SIGDEBUG event, which GDB will trap. Using the GDB 'backtrace/bt'
  command once the app is in break state after receipt should display a
  backtrace to the offending code (IOW the EVL core makes sure that your
  app receives SIGDEBUG immediately on top of the code causing the
  trap).

- using the GDB 'list' command to list the code at the PC value reported
  by the kernel should work. You just need to start the application
  first: unless you have a complex dlopen-based scheme for running code
  plugins, running until a breakpoint is taken in main() should be
  enough before you can issue 'list *<PC_value>'.
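
To avoid copying PC values by hand, the log lines can be parsed and
turned into the matching GDB command. The following is a minimal Python
sketch under stated assumptions: the regular expression is modeled only on
the two "switching in-band" lines quoted in this thread, so the format may
differ on other EVL releases, and the helper names parse_evl_switch and
gdb_list_command are hypothetical.

```python
import re

# Modeled on the dmesg lines quoted in this thread (assumption: the
# format is stable). Lines reporting a kernel symbol instead of
# user_pc=... deliberately do not match.
EVL_SWITCH_RE = re.compile(
    r"EVL: (?P<thread>\S+) switching in-band "
    r"\[pid=(?P<pid>\d+), excpt=(?P<excpt>\d+), "
    r"user_pc=(?P<pc>0x[0-9a-fA-F]+)\]"
)


def parse_evl_switch(line):
    """Extract thread name, pid, exception number and user PC, or None."""
    m = EVL_SWITCH_RE.search(line)
    if m is None:
        return None
    return {
        "thread": m.group("thread"),
        "pid": int(m.group("pid")),
        "excpt": int(m.group("excpt")),
        "user_pc": int(m.group("pc"), 16),
    }


def gdb_list_command(event):
    """Build the GDB 'list *<PC_value>' command for a parsed event."""
    return "list *0x%x" % event["user_pc"]


if __name__ == "__main__":
    sample = ("[ 7301.352255] EVL: thread_1 switching in-band "
              "[pid=6319, excpt=14, user_pc=0x7f54a7877fa6]")
    print(gdb_list_command(parse_evl_switch(sample)))
```

The resulting command is only meaningful once the application has been
started under GDB, as described above.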

> [ 7301.352255] EVL: thread_1 switching in-band [pid=6319, excpt=14, user_pc=0x7f54a7877fa6]
> [ 7301.352689] EVL: thread_2 switching in-band [pid=6285, excpt=14, user_pc=0x7f54a7877fa6]
>
> From: Russell Johnson
> Sent: Friday, July 1, 2022 11:19 AM
> To: xenomai@lists.linux.dev
> Subject: More details on health monitoring notifications
>
> Hello,
>
> I have gotten a health monitoring thread going that tracks all the EVL
> threads running in my app. The goal was to try and figure out what is
> causing occasional in-band switches in some of my EVL threads. I am seeing
> the notifications now that give me a little more insight, and the two
> diagnostic codes I am seeing are 2 (switched inband due to syscall) and 3
> (switched inband due to fault). I am finding these to be quite hard to
> pinpoint specifically throughout my code. Is there any way to get more
> information into what exactly is causing these notifications from the EVL
> core? It would be great to have a callstack of some kind, but I am not sure
> if that is possible. Any insight would be helpful.
>
> Thanks,
>
> Russell


-- 
Philippe.


* More details on health monitoring notifications
  2022-07-01 17:19 Russell Johnson
@ 2022-07-05 22:25 ` Russell Johnson
  2022-07-06 14:57   ` Philippe Gerum
  0 siblings, 1 reply; 6+ messages in thread
From: Russell Johnson @ 2022-07-05 22:25 UTC (permalink / raw)
  To: xenomai


I also notice these lines in "dmesg" output from the EVL core. Can I use the
excpt or user_pc to get any more information about what is causing these
in-band switches?

[ 7301.352255] EVL: thread_1 switching in-band [pid=6319, excpt=14, user_pc=0x7f54a7877fa6]
[ 7301.352689] EVL: thread_2 switching in-band [pid=6285, excpt=14, user_pc=0x7f54a7877fa6]

From: Russell Johnson
Sent: Friday, July 1, 2022 11:19 AM
To: xenomai@lists.linux.dev
Subject: More details on health monitoring notifications

Hello,

I have gotten a health monitoring thread going that tracks all the EVL
threads running in my app. The goal was to try and figure out what is
causing occasional in-band switches in some of my EVL threads. I am seeing
the notifications now that give me a little more insight, and the two
diagnostic codes I am seeing are 2 (switched inband due to syscall) and 3
(switched inband due to fault). I am finding these to be quite hard to
pinpoint specifically throughout my code. Is there any way to get more
information into what exactly is causing these notifications from the EVL
core? It would be great to have a callstack of some kind, but I am not sure
if that is possible. Any insight would be helpful.

Thanks,

Russell

* More details on health monitoring notifications
@ 2022-07-01 17:19 Russell Johnson
  2022-07-05 22:25 ` Russell Johnson
  0 siblings, 1 reply; 6+ messages in thread
From: Russell Johnson @ 2022-07-01 17:19 UTC (permalink / raw)
  To: xenomai


Hello,

I have gotten a health monitoring thread going that tracks all the EVL
threads running in my app. The goal was to try and figure out what is
causing occasional in-band switches in some of my EVL threads. I am seeing
the notifications now that give me a little more insight, and the two
diagnostic codes I am seeing are 2 (switched inband due to syscall) and 3
(switched inband due to fault). I am finding these to be quite hard to
pinpoint specifically throughout my code. Is there any way to get more
information into what exactly is causing these notifications from the EVL
core? It would be great to have a callstack of some kind, but I am not sure
if that is possible. Any insight would be helpful.

Thanks,

Russell
