All of lore.kernel.org
 help / color / mirror / Atom feed
From: "François Legal" <devel@thom.fr.eu.org>
To: Jan Kiszka <jan.kiszka@siemens.com>
Cc: xenomai@xenomai.org
Subject: Re: Hunting mode switches
Date: Wed, 02 Jun 2021 12:26:06 +0200	[thread overview]
Message-ID: <1202-60b75c80-10b-1ab27320@266157148> (raw)
In-Reply-To: <3bc0c0d9-c283-13f8-5c42-fcf8229a3d90@siemens.com>

Le Mardi, Juin 01, 2021 10:31 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit:

> On 01.06.21 09:47, François Legal wrote:
> > Le Lundi, Mai 31, 2021 20:57 CEST, François Legal via Xenomai <xenomai@xenomai.org> a écrit:
> >
> >> Le Lundi, Mai 31, 2021 18:14 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit:
> >>
> >>> On 31.05.21 18:06, François Legal via Xenomai wrote:
> >>>> Hello,
> >>>>
> >>>> I'm having trouble hunting mode switches on an arm embedded application.
> >>>>
> >>>> When I look at cat /proc/xenomai/sched/stat
> >>>>
> >>>> CPU  PID    MSW        CSW        XSC        PF    STAT       %CPU  NAME
> >>>>   0  0      0          476986     0          0     00218000   84.5  [ROOT/0]
> >>>>   1  0      0          137743     0          0     00218000   99.3  [ROOT/1]
> >>>>   1  667    0          137550     0          0     00200042    0.7  [rtnet-stack]
> >>>>   0  669    0          2          0          0     00220042    0.0  [rtnet-rtpc]
> >>>>   1  901    110        218        203        0     00264044    0.0  APP_Main
> >>>>   0  911    1          1216       3524       0     00240046    0.0  ThreadPar
> >>>>   0  912    3          56683      3531       0     00240046    0.2  ThreadMessage
> >>>>   0  913    309702     470522     2152301    0     00240046    8.9  ThreadExec
> >>>>   0  914    7          62         71         0     00240042    0.0  TheadGO
> >>>>   0  915    3          647        1190       0     00248046    0.0  ThreadDbg
> >>>>   0  916    1          2024       3087       0     00248046    0.0  ThreadCt
> >>>>   0  917    1          35801      320489     0     00248046    0.8  ThreadSupInv
> >>>>   0  918    1          1231       3596       0     00248046    0.0  ThreadMessage2
> >>>>   0  919    1          161554     479305     0     00248042    1.4  ThreadSched
> >>>>   0  0      0          247850     0          0     00000000    0.4  [IRQ18: [timer]]
> >>>>   1  0      0          15419      0          0     00000000    0.0  [IRQ18: [timer]]
> >>>>   0  0      0          137548     0          0     00000000    1.5  [IRQ31: rteth0]
> >>>>   1  0      0          0          0          0     00000000    0.0  [IRQ31: rteth0]
> >>>>   0  0      0          1          0          0     00000000    0.0  [IRQ118: rteth1-tx]
> >>>>   1  0      0          0          0          0     00000000    0.0  [IRQ118: rteth1-tx]
> >>>>   0  0      0          0          0          0     00000000    0.0  [IRQ119: rteth1-rx]
> >>>>   1  0      0          0          0          0     00000000    0.0  [IRQ119: rteth1-rx]
> >>>>
> >>>> But when following the procedure in [1], I get
> >>>> no slacker
> >>>>
> >>>> I can however see that Thread 913 seem to switch to secondary more than once out of 2 cycles !
> >>>> What am I doing wrong.
> >>>>
> >>>
> >>> CONFIG_XENO_OPT_DEBUG_TRACE_RELAX is on?
> >>>
> >>
> >> Yes, it is.
> >>
> >>> Did you also try the in-app instrumentation via SIGDEBUG, maybe
> >>> attaching gdb to the process to catch the unwanted cases?
> >>>
> >>
> >> No, not yet. It seems mode switches happen hundreds of time every seconds, so I was willing to try the slackspot.
> >> I'll try that tomorrow and report back to the list
> >>
> >
> > So trying the manual way, there seem to be something happening, but I could not tell exactly what, and I not sure how to track this.
> >
> > On thread ThreadExec, I get execution halted with SIGXCPU, and a backtrace indicating getpid(), then __pthread_kill() which I'm 100% sure I never call !
> >
> > Moreover, when being halted at that point, all threads are not created, and all of which are created seem to have done 3 or 4 MSW each, and without triggering any break.
> >
> > Moreover, in the stats capture below, the APP_Main (the main program entry), is looping on a 1s nanosleep, and it seems the MSW number increases once per second.
> >
> > I'm really confused now.
> >
>
> Are you instrumenting only the part of the thread that has all
> initialization done and until it starts to de-initialize again? You can
> control when the mode switches should be reported and when not.
>
> Then, if you have a backtrace and that backtrace contain a valid
> syscall, you should also find a valid point in your own application when
> this was started (indirectly). Maybe some library call?
>
> Jan
>

So in the end, slackspot seems to work. I did not understand, by reading the manual, that slackspot/CONFIG_XENO_OPT_DEBUG_TRACE_RELAX also needed that each thread call pthread_setmode_np(0, PTHREAD_WARNSW, NULL);

So now, I've got a working backtrace, however, I'm still lost onto understanding what happens.

I believe I might be having some troubles with the signals, but I can't seem to find an explanation.
Maybe I should document more my app.

ThreadSched is pending on a timer (with sigwaitinfo). That time triggers every 750µs. When waking up, this thread will post one or 2 semaphores, that will unlock other threads (of which ThreadExec which seem to be doing MSW more than once every two activations).

I ended up modifying the app startup code because I was experiencing weirdness in the startup with debugger attached (some semaphores wouldn't unlock, ...). Anyhow, I ended up doing :

    sigemptyset(&sigHandling.sa_mask);
    sigHandling.sa_sigaction = sigdebug_handler;
    sigHandling.sa_flags = SA_SIGINFO;
    sigaction(SIGDEBUG, &sigHandling, NULL);

    sigemptyset(&sigset);
    sigaddset (&sigset, SIGALRM);
    sigaddset (&sigset, SIGDEBUG);
    sigprocmask (SIG_BLOCK, &sigset, NULL);
    pthread_sigmask (SIG_BLOCK, &sigset, NULL);

before creating my RT threads (of which ThreadSched), then doing

        sigdelset(&sigset, SIGALRM);
        sigprocmask (SIG_UNBLOCK, &sigset, NULL);
        pthread_sigmask (SIG_UNBLOCK, &sigset, NULL);

        signal(SIGINT, sighandler_exit);
        signal(SIGTERM, sighandler_exit);
        signal(SIGUSR1, sighandler_exit);

After my RT threads are created.

If I comment out all of these, and add the pthread_setmode_np some place when initialisation of ThreadExec is over, I end up with the program being forcibly stopped with the indication [1]+  CPU time limit exceeded

So I put all those back in place, start again, then /proc/xenomai/debug/relax gets populated :
1115 75731 syscall ThreadExec
0x12720 /usr/lib/libcobalt.so.2.0.0
0x127b0 /usr/lib/libcobalt.so.2.0.0
0x2cd40 /usr/lib/libc-2.29.so
.
/usr/bin/CCN2019_DSP
1115 80299 syscall ThreadExec
0xd58a0 /usr/lib/libc-2.29.so
.

On this, slackspot gives :
Thread[1115] "Module d'Execution des Blocs" started by /usr/bin/CCN2019_DSP (107769 times):
Caused by: syscall
   #0  0x00000000000d589f prctl() in /home/dev/crosstool/arm-cortexa9_neon-linux-gnueabihf/.build/src/glibc-2.29/misc/../sysdeps/unix/syscall-template.S:78

Thread[1115] "Module d'Execution des Blocs" started by /usr/bin/CCN2019_DSP (101892 times):
Caused by: syscall
   #0  0x000000000001271f ??? in [/usr/lib/libcobalt.so.2.0.0]
   #1  0x00000000000127af ??? in [/usr/lib/libcobalt.so.2.0.0]
   #2  0x000000000002cd3f __default_sa_restorer() in /home/dev/crosstool/arm-cortexa9_neon-linux-gnueabihf/.build/src/glibc-2.29/signal/../sysdeps/unix/sysv/linux/arm/sigrestorer.S:77

So know, I really don't know where to go from here !

Thanks in advance

François

> --
> Siemens AG, T RDA IOT
> Corporate Competence Center Embedded Linux



  reply	other threads:[~2021-06-02 10:26 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-31 16:06 Hunting mode switches François Legal
2021-05-31 16:14 ` Jan Kiszka
2021-05-31 18:57   ` François Legal
2021-06-01  7:47     ` François Legal
2021-06-01  8:31       ` Jan Kiszka
2021-06-02 10:26         ` François Legal [this message]
2021-06-02 10:59           ` Jan Kiszka
2021-06-02 15:06             ` François Legal
2021-06-02 15:09               ` François Legal
2021-06-02 15:13                 ` Jan Kiszka
2021-06-03  8:57             ` François Legal
2021-06-03 18:39               ` François Legal
2021-06-04  7:21                 ` Jan Kiszka
2021-06-04 13:04                   ` François Legal
2021-06-01  8:46       ` Philippe Gerum
  -- strict thread matches above, loose matches on Subject: below --
2021-05-31 16:06 François Legal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1202-60b75c80-10b-1ab27320@266157148 \
    --to=devel@thom.fr.eu.org \
    --cc=jan.kiszka@siemens.com \
    --cc=xenomai@xenomai.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.