* Xenomai 3: Wierd problem resuming task
@ 2021-07-16 9:45 Mauro S.
2021-07-16 9:47 ` Jan Kiszka
2021-07-16 12:06 ` Philippe Gerum
0 siblings, 2 replies; 8+ messages in thread
From: Mauro S. @ 2021-07-16 9:45 UTC (permalink / raw)
To: xenomai
Hi,
I'm using Xenomai 3 (master branch, commit
bca41678742be80c3a0d5a01935c671c385a95a1) on an x86_64 Intel Atom
x5-E8000 with 2 GB RAM, using the kernel from the Xenomai repos, in the
Cobalt configuration. The SMI workaround is enabled and all latency tests
are good.
I'm facing a very weird problem in my application. I have some
tasks with priority < 90 that call rt_task_suspend() on themselves.
A task with priority 99 then resumes all the other tasks with
rt_task_resume() once they are suspended.
Sometimes a task does not get resumed.
In /proc/xenomai/sched/stat I have this status for my suspended task:
CPU  PID  MSW  CSW  XSC  PF  STAT      %CPU  NAME
  1  620    3    8   13   0  00048041   0.0  t12
Analyzing the scenario by attaching gdb to the application, I see that
the task that fails to resume has this backtrace:
#0 0x00007f0ec14c9b38 in __cobalt_kill (pid=620, sig=65) at signal.c:100
100 signal.c: No such file or directory.
(gdb) bt
#0 0x00007f0ec14c9b38 in __cobalt_kill (pid=620, sig=65) at signal.c:100
#1 0x00007f0ec14eb379 in threadobj_suspend (thobj=thobj@entry=0x7f0ec06bfcd0) at threadobj.c:335
#2 0x00007f0ec15013dc in rt_task_suspend (task=task@entry=0x0) at task.c:1154
That looks OK to me. If I understand correctly, the task is blocked in its
SIGSUSP handler, which calls sigsuspend() and waits for SIGRESM in order
to "restart".
Then I placed breakpoints at the places where rt_task_resume() was called,
and in rt_task_resume() itself. I set tcb->suspends=1 with GDB and followed
the subsequent call to threadobj_resume(). There I placed more
breakpoints, forced the __THREAD_S_SUSPENDED bit in thobj->status, and
observed that __RT(kill(thobj->pid, SIGRESM)) was called and returned 0.
thobj->pid has the right value.
But the suspended task does not get resumed.
Any idea/suggestion?
Thanks in advance, regards
--
Mauro
* Re: Xenomai 3: Wierd problem resuming task
From: Jan Kiszka @ 2021-07-16 9:47 UTC (permalink / raw)
To: Mauro S., xenomai
Can you derive a simple, shareable test case from this? That will ensure
we are not misinterpreting what your code actually does.
Jan
--
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux
* Re: Xenomai 3: Wierd problem resuming task
From: Mauro S. @ 2021-07-16 11:32 UTC (permalink / raw)
To: Jan Kiszka, xenomai
On 16/07/21 11:47, Jan Kiszka wrote:
> Can you derive a simple, shareable test case from this? That will ensure
> we are not misinterpreting what your code actually does.
I will try to reduce the application code to a simple test, but I'm not
sure I will succeed.
I will get back to you either way.
Thanks.
--
Mauro
* Re: Xenomai 3: Wierd problem resuming task
From: Philippe Gerum @ 2021-07-16 12:06 UTC (permalink / raw)
To: Mauro S.; +Cc: xenomai
Mauro S. via Xenomai <xenomai@xenomai.org> writes:
> In /proc/xenomai/sched/stat I have this status for my suspended task:
>
> CPU  PID  MSW  CSW  XSC  PF  STAT      %CPU  NAME
>   1  620    3    8   13   0  00048041   0.0  t12
                                      ^
That bit (XNSUSP) indicates that the core thinks the task is still in
the suspended state.
You may want to check what happens in __cobalt_kill()
(kernel/cobalt/posix/signal.c), and in COBALT_SYSCALL(kill, ...), to
make sure the call actually succeeds.
--
Philippe.
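The STAT column discussed above is a hexadecimal bitmask of thread state flags. The following is a minimal sketch of how 00048041 breaks down. The flag values in the table are an assumption written from memory of the Xenomai 3 Cobalt headers, not something stated in this thread, and should be checked against the thread header of the tree in use. The helper name decode_stat() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* ASSUMPTION: bit values as recalled from the Cobalt thread header of
 * Xenomai 3; only a subset is listed. Verify against your own tree. */
struct stat_bit {
    unsigned long mask;
    const char *name;
};

static const struct stat_bit stat_bits[] = {
    { 0x00000001UL, "XNSUSP"    }, /* suspended */
    { 0x00000002UL, "XNPEND"    }, /* waiting for a resource */
    { 0x00000004UL, "XNDELAY"   }, /* delayed */
    { 0x00000008UL, "XNREADY"   }, /* linked to the ready queue */
    { 0x00000010UL, "XNDORMANT" }, /* not started yet */
    { 0x00000020UL, "XNZOMBIE"  }, /* being deleted */
    { 0x00000040UL, "XNMAPPED"  }, /* mapped to a Linux task */
    { 0x00000080UL, "XNRELAX"   }, /* relaxed (running in Linux mode) */
    { 0x00008000UL, "XNFPU"     }, /* uses the FPU */
    { 0x00010000UL, "XNROOT"    }, /* root (Linux/idle) thread */
    { 0x00020000UL, "XNWEAK"    }, /* non real-time shadow */
    { 0x00040000UL, "XNUSER"    }, /* shadow thread in userland */
};

/* Write the space-separated names of the known bits set in 'stat' into
 * 'buf' and return the number of characters written. Decoding stops
 * early if the buffer is too small. Bits missing from the table above
 * are silently ignored. */
size_t decode_stat(unsigned long stat, char *buf, size_t len)
{
    size_t used = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof(stat_bits) / sizeof(stat_bits[0]); i++) {
        if (!(stat & stat_bits[i].mask))
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         used ? " " : "", stat_bits[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break; /* output truncated, stop here */
        used += (size_t)n;
    }
    return used;
}
```

Under those assumed values, calling decode_stat(0x00048041UL, buf, sizeof(buf)) with a 128-byte buffer yields "XNSUSP XNMAPPED XNFPU XNUSER", which is consistent with the remark that the core still considers the task suspended.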
* Re: Xenomai 3: Wierd problem resuming task
From: Mauro S. @ 2021-07-16 14:43 UTC (permalink / raw)
To: Philippe Gerum; +Cc: xenomai
On 16/07/21 14:06, Philippe Gerum wrote:
> You may want to check what happens in __cobalt_kill()
> (kernel/cobalt/posix/signal.c), and in COBALT_SYSCALL(kill, ...), to
> make sure the call actually succeeds.
I traced the code down to lib/cobalt/signal.c (the library-side
__cobalt_kill()) and checked the return value of
XENOMAI_SYSCALL2(sc_cobalt_kill, pid, sig), which is 0.
If I understand correctly, kernel/cobalt/posix/signal.c is kernel code.
Should I add printk() calls, or should I attach gdb to the kernel (which
I don't think I'm able to do)?
Thanks
* Re: Xenomai 3: Wierd problem resuming task
From: Philippe Gerum @ 2021-07-16 14:58 UTC (permalink / raw)
To: Mauro S.; +Cc: xenomai
Mauro S. <mau.salvi@tin.it> writes:
> If I understand correctly, kernel/cobalt/posix/signal.c is kernel
> code. Should I add printk() calls, or should I attach gdb to the
> kernel (which I don't think I'm able to do)?
printk() would be fine.
--
Philippe.
* Re: Xenomai 3: Wierd problem resuming task
From: Mauro S. @ 2021-07-19 8:10 UTC (permalink / raw)
To: xenomai
On 16/07/21 16:58, Philippe Gerum wrote:
> printk() would be fine.
Hi Philippe and Jan,
I need to check my application more thoroughly, because there are some
other strange behaviors that make me think there could be memory leaks
somewhere (and a simple standalone test does not show the problem I have
seen).
I will get back to you if I find anything related to Xenomai.
Thanks again for your help.
Regards
--
Mauro
* Re: Xenomai 3: Wierd problem resuming task
From: Mauro S. @ 2021-09-14 14:09 UTC (permalink / raw)
To: xenomai
On 19/07/21 10:10, Mauro S. via Xenomai wrote:
> I will get back to you if I find anything related to Xenomai.
Hi all,
after a ton of attempts, small test executables, and headaches, I think I
can say with near certainty that the problem I observe is the same one
reported in this old thread [1], and that it is therefore better to move
to semaphores instead of using rt_task_suspend()/rt_task_resume().
Thanks to all, regards
[1] https://xenomai.org/pipermail/xenomai/2016-January/035781.html
--
Mauro
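As a rough illustration of the semaphore-based alternative mentioned above, here is a minimal sketch in C. It uses plain POSIX semaphores and pthreads as a stand-in, not the Alchemy API; with Alchemy the corresponding calls would presumably be rt_sem_p() and rt_sem_v(). The names worker_main() and run_demo() are hypothetical. The property it shows is that a semaphore counts posts, so a wakeup issued before the waiter blocks is remembered rather than lost. Whether a lost wakeup is what actually went wrong in this thread is not established here.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

struct worker {
    sem_t wakeup;   /* posted by the controller, waited on by the worker */
    int cycles;     /* number of wakeups the worker should consume */
    int done;       /* number of wakeups actually consumed */
};

/* Worker: blocks on the semaphore where the original design called
 * rt_task_suspend() on itself. */
static void *worker_main(void *arg)
{
    struct worker *w = arg;

    for (int i = 0; i < w->cycles; i++) {
        while (sem_wait(&w->wakeup) != 0) {
            if (errno != EINTR)
                return NULL; /* unexpected error, give up */
            /* interrupted by a signal: retry */
        }
        w->done++;
    }
    return NULL;
}

/* Controller: posts every wakeup BEFORE the worker even exists, then
 * starts the worker. Returns the number of wakeups the worker saw,
 * or -1 on setup failure. */
int run_demo(int cycles)
{
    struct worker w = { .cycles = cycles, .done = 0 };
    pthread_t tid;

    if (sem_init(&w.wakeup, 0, 0) != 0)
        return -1;

    for (int i = 0; i < cycles; i++)
        sem_post(&w.wakeup); /* counted, even with nobody waiting yet */

    if (pthread_create(&tid, NULL, worker_main, &w) != 0) {
        sem_destroy(&w.wakeup);
        return -1;
    }
    pthread_join(tid, NULL);
    sem_destroy(&w.wakeup);
    return w.done;
}
```

In this sketch the controller deliberately runs ahead of the worker, which is the worst case for ordering; run_demo(5) still reports that the worker consumed all five wakeups. The program needs to be built with pthread support (for example with -pthread on GCC).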