* Xenomai 3: Weird problem resuming task
From: Mauro S. @ 2021-07-16  9:45 UTC
  To: xenomai

Hi,

I'm using Xenomai 3 (master branch, commit
bca41678742be80c3a0d5a01935c671c385a95a1) on an x86_64 Intel Atom
x5-E8000 with 2 GB RAM, running the kernel from the Xenomai repositories
in the Cobalt configuration. The SMI workaround is enabled and all
latency tests look good.

I'm facing a very weird problem in my application. I have some tasks
with priority < 90 that call rt_task_suspend() on themselves. A task
with priority 99 then resumes each of the other tasks with
rt_task_resume() once it is suspended.

Sometimes a task does not get resumed.

In /proc/xenomai/sched/stat I see this status for my suspended task:

CPU  PID    MSW        CSW        XSC        PF    STAT       %CPU  NAME
   1  620    3          8          13         0     00048041    0.0  t12

Analyzing the scenario by attaching gdb to the application, I observe
that the task that does not resume has this backtrace:

#0  0x00007f0ec14c9b38 in __cobalt_kill (pid=620, sig=65) at signal.c:100
100	signal.c: No such file or directory.
(gdb) bt
#0  0x00007f0ec14c9b38 in __cobalt_kill (pid=620, sig=65) at signal.c:100
#1  0x00007f0ec14eb379 in threadobj_suspend (thobj=thobj@entry=0x7f0ec06bfcd0) at threadobj.c:335
#2  0x00007f0ec15013dc in rt_task_suspend (task=task@entry=0x0) at task.c:1154

That looks OK to me. If I understand correctly, the task is blocked in
its SIGSUSP handler, which calls sigsuspend() and waits for SIGRESM to
"restart".

Next, I placed breakpoints where rt_task_resume() is called and in
rt_task_resume() itself. I set tcb->suspends=1 with gdb and followed the
subsequent call to threadobj_resume(). I then placed breakpoints in
threadobj_resume(), forced the __THREAD_S_SUSPENDED bit in
thobj->status, and observed that __RT(kill(thobj->pid, SIGRESM)) was
called and returned 0. thobj->pid has the correct value.

But the suspended task does not get resumed.

Any idea/suggestion?

Thanks in advance, regards

-- 
Mauro



* Re: Xenomai 3: Weird problem resuming task
From: Jan Kiszka @ 2021-07-16  9:47 UTC
  To: Mauro S., xenomai

On 16.07.21 11:45, Mauro S. via Xenomai wrote:
> [...]
> Any idea/suggestion?
>

Can you derive a sharable simple test case from this? That will ensure
we are not misinterpreting what your code actually does.

Jan

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux



* Re: Xenomai 3: Weird problem resuming task
From: Mauro S. @ 2021-07-16 11:32 UTC
  To: Jan Kiszka, xenomai

On 16/07/21 11:47, Jan Kiszka wrote:
> On 16.07.21 11:45, Mauro S. via Xenomai wrote:
>> [...]
>
> Can you derive a sharable simple test case from this?
> [...]


I will try to reduce the application code to a simple test case, but
I'm not sure I will succeed.
I will get back to you either way.

Thanks.

--
Mauro



* Re: Xenomai 3: Weird problem resuming task
From: Philippe Gerum @ 2021-07-16 12:06 UTC
  To: Mauro S.; +Cc: xenomai


Mauro S. via Xenomai <xenomai@xenomai.org> writes:

> [...]
> In /proc/xenomai/sched/stat I have this status for my suspended task:
>
> CPU  PID    MSW        CSW        XSC        PF    STAT       %CPU  NAME
>   1  620    3          8          13         0     00048041    0.0  t12
                                                            ^

That bit (XNSUSP) indicates that the core still considers the task to
be suspended.
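
A minimal sketch of checking the STAT word of such a line for the XNSUSP bit. It assumes the column layout of the header quoted above (CPU, PID, MSW, CSW, XSC, PF, STAT in hexadecimal, %CPU, NAME), which may differ between Xenomai versions, and it assumes XNSUSP is bit 0 (0x1), as the caret under the last hex digit suggests; check the Cobalt headers in your own tree for the authoritative value. parse_sched_stat_line(), xnsusp_set(), check_reported_line() and ASSUMED_XNSUSP are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Assumption: XNSUSP is status bit 0, as the caret in the reply suggests. */
#define ASSUMED_XNSUSP 0x00000001u

/*
 * Hypothetical helper: parse one data line of /proc/xenomai/sched/stat,
 * assuming the "CPU PID MSW CSW XSC PF STAT %CPU NAME" layout shown above.
 * Returns true and fills *pid, *status and name (32 bytes) on success.
 */
static bool parse_sched_stat_line(const char *line, int *pid,
                                  unsigned int *status, char name[32])
{
    unsigned int cpu;
    unsigned long msw, csw, xsc, pf;
    double cpu_pct;

    /* STAT is printed in hexadecimal, the counters in decimal. */
    int n = sscanf(line, "%u %d %lu %lu %lu %lu %x %lf %31s",
                   &cpu, pid, &msw, &csw, &xsc, &pf, status, &cpu_pct,
                   name);
    return n == 9;
}

/* True if the (assumed) XNSUSP bit is set in a thread status word. */
static bool xnsusp_set(unsigned int status)
{
    return (status & ASSUMED_XNSUSP) != 0;
}

/* Usage: the line reported for task t12 in this thread. */
static void check_reported_line(void)
{
    int pid;
    unsigned int status;
    char name[32];

    bool ok = parse_sched_stat_line(
        "   1  620    3          8          13         0     00048041    0.0  t12",
        &pid, &status, name);
    assert(ok && pid == 620 && strcmp(name, "t12") == 0);
    assert(status == 0x00048041u);
    assert(xnsusp_set(status)); /* bit 0 of 0x00048041 is set */
    (void)ok;
}
```

The header line itself does not parse (the leading "CPU" is not a number), so the helper can be fed every line of the file and will simply reject it.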

>
> [...]
> But the suspended task does not get resumed.
>
> Any idea/suggestion?
>

You may want to check what happens in __cobalt_kill()
(kernel/cobalt/posix/signal.c), and in COBALT_SYSCALL(kill, ...), to
make sure the call actually succeeds.

-- 
Philippe.



* Re: Xenomai 3: Weird problem resuming task
From: Mauro S. @ 2021-07-16 14:43 UTC
  To: Philippe Gerum; +Cc: xenomai

On 16/07/21 14:06, Philippe Gerum wrote:
>
> Mauro S. via Xenomai <xenomai@xenomai.org> writes:
>
>> [...]
>
> You may want to check what happens in __cobalt_kill()
> (kernel/cobalt/posix/signal.c), and in COBALT_SYSCALL(kill, ...), to
> make sure the call actually succeeds.

I followed the code down to lib/cobalt/signal.c (the library side of
__cobalt_kill()) and checked the return value of
XENOMAI_SYSCALL2(sc_cobalt_kill, pid, sig), which is 0.

If I understand correctly, kernel/cobalt/posix/signal.c is kernel code.
Should I add printk() calls, or do I have to attach gdb to the kernel
(which I don't think I'm able to do)?

thanks



* Re: Xenomai 3: Weird problem resuming task
From: Philippe Gerum @ 2021-07-16 14:58 UTC
  To: Mauro S.; +Cc: xenomai


Mauro S. <mau.salvi@tin.it> writes:

> On 16/07/21 14:06, Philippe Gerum wrote:
>> [...]
>
> If I understand correctly, kernel/cobalt/posix/signal.c is kernel
> code. Should I add printks? [...]

printk() would be fine.

-- 
Philippe.



* Re: Xenomai 3: Weird problem resuming task
From: Mauro S. @ 2021-07-19  8:10 UTC
  To: xenomai

On 16/07/21 16:58, Philippe Gerum wrote:
>
> Mauro S. <mau.salvi@tin.it> writes:
>
>> [...]
>
> printk() would be fine.
>

Hi Philippe and Jan,

I need to examine my application more deeply, because some other
strange behaviors make me think there could be memory leaks somewhere
(and a simple standalone test program does not show the problem I
observed).

I will get back to you if I find anything related to Xenomai.

Thanks again for your help.

Regards

-- 
Mauro




* Re: Xenomai 3: Weird problem resuming task
From: Mauro S. @ 2021-09-14 14:09 UTC

On 19/07/21 10:10, Mauro S. via Xenomai wrote:
> [...]
> Thanks again for your help.
>
> Regards
>

Hi all,

after countless attempts, small test programs and headaches, I think I
can state with near certainty that the problem I observe is the same one
reported in this old thread [1], and therefore that it is better to move
to semaphores instead of using rt_task_suspend()/rt_task_resume().
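
As an illustration of the semaphore approach, here is a minimal sketch using plain POSIX threads and unnamed semaphores, so it builds on a stock Linux host; with the Alchemy API, rt_sem_p()/rt_sem_v() would play the roles of sem_wait()/sem_post(). Each worker blocks on its own semaphore instead of calling rt_task_suspend(NULL), and a controller loop, standing in for the priority-99 task, posts that semaphore instead of calling rt_task_resume(). The property that matters is that a post issued before the worker actually starts waiting is kept in the semaphore count rather than lost. The names (struct worker, worker_main, run_handoff, NWORKERS, ROUNDS) are hypothetical, and priorities and Xenomai specifics are deliberately left out.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

#define NWORKERS 3 /* hypothetical number of low-priority tasks */
#define ROUNDS   5 /* hypothetical number of wait/wake-up cycles */

/* Hypothetical per-task state: one wake-up semaphore per worker. */
struct worker {
    sem_t wakeup;    /* posted by the controller, waited on by the worker */
    sem_t parked;    /* posted by the worker when it has finished a round */
    int rounds_done; /* written only by the worker thread */
};

/* sem_wait() can fail with EINTR when a signal handler runs; retry then. */
static void wait_nointr(sem_t *s)
{
    while (sem_wait(s) != 0 && errno == EINTR)
        ;
}

static void *worker_main(void *arg)
{
    struct worker *w = arg;

    for (int i = 0; i < ROUNDS; i++) {
        /* ... one round of application work would go here ... */
        sem_post(&w->parked);    /* report that this round is done */
        wait_nointr(&w->wakeup); /* instead of rt_task_suspend(NULL) */
        w->rounds_done++;
    }
    return NULL;
}

/* Controller loop. Returns the total number of completed rounds,
 * i.e. NWORKERS * ROUNDS when no wake-up was lost. */
static int run_handoff(void)
{
    struct worker w[NWORKERS];
    pthread_t tid[NWORKERS];
    int total = 0;

    for (int i = 0; i < NWORKERS; i++) {
        int rc = sem_init(&w[i].wakeup, 0, 0);
        rc |= sem_init(&w[i].parked, 0, 0);
        w[i].rounds_done = 0;
        rc |= pthread_create(&tid[i], NULL, worker_main, &w[i]);
        assert(rc == 0);
        (void)rc;
    }

    for (int r = 0; r < ROUNDS; r++) {
        for (int i = 0; i < NWORKERS; i++) {
            wait_nointr(&w[i].parked);
            /* Instead of rt_task_resume(): this post may land before the
             * worker reaches sem_wait(); the count keeps it, so it is
             * not lost. */
            sem_post(&w[i].wakeup);
        }
    }

    for (int i = 0; i < NWORKERS; i++) {
        pthread_join(tid[i], NULL); /* join makes rounds_done visible */
        total += w[i].rounds_done;
        sem_destroy(&w[i].wakeup);
        sem_destroy(&w[i].parked);
    }
    return total;
}
```

Build with -pthread; calling run_handoff() returns NWORKERS * ROUNDS (15 with the values above), because every wake-up is consumed exactly once even when it is posted early.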

Thanks to all, regards

[1] https://xenomai.org/pipermail/xenomai/2016-January/035781.html

-- 
Mauro

