xenomai.lists.linux.dev archive mirror
* "failed to allocate main tcb" error
@ 2022-12-21  9:05 Mauro S.
  2022-12-21 14:29 ` Jan Kiszka
  0 siblings, 1 reply; 15+ messages in thread
From: Mauro S. @ 2022-12-21  9:05 UTC (permalink / raw)
  To: xenomai

Hi all,

I'm using Xenomai 3.1.2 on an x86-64 Atom processor, with Copperplate and
pshared enabled. The Linux kernel is 5.4.181.

I have a main application and a separate service application that connects
to the main one using a shared session.

After the main application has been running for some time, the service
application can no longer connect to the main one and fails with the error

     0"004.166| BUG in main_overlay(): [main] failed to allocate main tcb

Digging a bit in the code, I found that the failing function is the
xnmalloc() call in lib/copperplate/threadobj.c:__threadobj_alloc().
If I understand correctly, this function tries to allocate some space
from the shared heap. But the shared heap is almost entirely free:

   # cat /proc/xenomai/heap
     TOTAL      FREE  NAME
   4194304   3616512  system heap
   1048576   1041776  shared heap
   1048576   1048480  private heap[14505]
     36864     19968  xddp-pool@0
     24576      7680  xddp-pool@10
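
For reference, the FREE column can be turned into a usage figure with a
few lines of C. This is only a sketch: it assumes the three-column
TOTAL/FREE/NAME layout shown above, and `struct heap_stat`,
`parse_heap_line()` and `heap_used()` are names made up for this
illustration, not part of Xenomai.

```c
#include <stdio.h>
#include <string.h>

/* One row of /proc/xenomai/heap, as laid out in the listing above. */
struct heap_stat {
	unsigned long total;	/* bytes managed by the heap */
	unsigned long free;	/* bytes currently unallocated */
	char name[64];		/* e.g. "shared heap" */
};

/*
 * Parse a "TOTAL FREE NAME" row. Returns 0 on success, -1 if the line
 * does not start with two numbers (e.g. the column header).
 */
int parse_heap_line(const char *line, struct heap_stat *st)
{
	int consumed = 0;

	if (sscanf(line, "%lu %lu %n", &st->total, &st->free, &consumed) < 2)
		return -1;

	/* The rest of the line is the heap name, which may contain spaces. */
	strncpy(st->name, line + consumed, sizeof(st->name) - 1);
	st->name[sizeof(st->name) - 1] = '\0';
	st->name[strcspn(st->name, "\r\n")] = '\0';

	return 0;
}

/* Bytes in use, i.e. TOTAL minus FREE. */
unsigned long heap_used(const struct heap_stat *st)
{
	return st->total - st->free;
}
```

Applied to the shared heap row above, this gives 1048576 - 1041776 = 6800
bytes in use, so the free-byte count alone does not explain the failure.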

Any suggestions about what could be the problem?

Thanks in advance, regards
-- 
Mauro S.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: "failed to allocate main tcb" error
  2022-12-21  9:05 "failed to allocate main tcb" error Mauro S.
@ 2022-12-21 14:29 ` Jan Kiszka
  2022-12-21 14:45   ` Mauro S.
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Kiszka @ 2022-12-21 14:29 UTC (permalink / raw)
  To: Mauro S., xenomai

On 21.12.22 10:05, Mauro S. wrote:
> Hi all,
>
> I'm using Xenomai 3.1.2 on a x86-64 Atom processor, with Copperplate and
> pshared enabled. Linux kernel is 5.4.181.
>
> I have a main application, and another service application that connects
> to the main one using shared session.
>
> After some time the main application is running, the service application
> stops to connect to the main one with the error
>
>     0"004.166| BUG in main_overlay(): [main] failed to allocate main tcb
>
> Digging a bit in the code, I found that the failing function is the
> xnmalloc() call in lib/copperplate/threadobj.c:__threadobj_alloc().
> If I understood correctly, this function tries to allocate some space
> from the shared heap. But the shared heap is almost free
>
>   # cat /proc/xenomai/heap
>     TOTAL      FREE  NAME
>   4194304   3616512  system heap
>   1048576   1041776  shared heap
>   1048576   1048480  private heap[14505]
>     36864     19968  xddp-pool@0
>     24576      7680  xddp-pool@10
>

This is the code that triggers the message:

        if (pthread_key_create(&threadobj_tskey, finalize_thread))
                early_panic("failed to allocate TSD key");

pthread_key_create has nothing to do with xnmalloc. That's a plain glibc
service, and its documentation says:

       The pthread_key_create() function shall fail if:

       EAGAIN The system lacked the necessary resources to create
another thread-specific data key, or the system-imposed limit on the
total number of keys per process {PTHREAD_KEYS_MAX} has been exceeded.

       ENOMEM Insufficient memory exists to create the key.

Already checked THOSE conditions? Are you possibly creating threads
without cleaning them up completely?

Jan


* Re: "failed to allocate main tcb" error
  2022-12-21 14:29 ` Jan Kiszka
@ 2022-12-21 14:45   ` Mauro S.
  2022-12-21 14:48     ` Jan Kiszka
  0 siblings, 1 reply; 15+ messages in thread
From: Mauro S. @ 2022-12-21 14:45 UTC (permalink / raw)
  To: Jan Kiszka, xenomai

On 21/12/22 15:29, Jan Kiszka wrote:
> This is the code that triggers the message:
> 
>          if (pthread_key_create(&threadobj_tskey, finalize_thread))
>                  early_panic("failed to allocate TSD key");
> 
> pthread_key_create has nothing to do with xnmalloc. That's a plain glibc
> service, and its documentation says:
> 
>         The pthread_key_create() function shall fail if:
> 
>         EAGAIN The system lacked the necessary resources to create
> another thread-specific data key, or the system-imposed limit on the
> total number of keys per process {PTHREAD_KEYS_MAX} has been exceeded.
> 
>         ENOMEM Insufficient memory exists to create the key.
> 
> Already checked THOSE conditions? Are you possibly creating threads
> without cleaning them up completely?
> 
> Jan
> 

Hi Jan,

thank you.
Sorry, but I can't see how the code you pointed to (which panics with the
message "failed to allocate TSD key") could produce the
"failed to allocate main tcb" message.

Thanks again, regards

-- 
Mauro S.


* Re: "failed to allocate main tcb" error
  2022-12-21 14:45   ` Mauro S.
@ 2022-12-21 14:48     ` Jan Kiszka
  2022-12-21 14:57       ` Mauro S.
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Kiszka @ 2022-12-21 14:48 UTC (permalink / raw)
  To: Mauro S., xenomai

On 21.12.22 15:45, Mauro S. wrote:
> Hi Jan,
>
> thank you.
> Sorry but I can't figure out how the code you specified (that panics
> generating the message "failed to allocate TSD key") could trigger the
> "failed to allocate main tcb" message.
>

https://source.denx.de/Xenomai/xenomai/-/blob/v3.1.2/lib/copperplate/threadobj.c#L86

That's the only line in Xenomai which raises this message. If you don't
believe it triggered it, start your application in gdb and put a
breakpoint on that panic line.

Jan


* Re: "failed to allocate main tcb" error
  2022-12-21 14:48     ` Jan Kiszka
@ 2022-12-21 14:57       ` Mauro S.
  2022-12-21 15:11         ` Jan Kiszka
  0 siblings, 1 reply; 15+ messages in thread
From: Mauro S. @ 2022-12-21 14:57 UTC (permalink / raw)
  To: Jan Kiszka, xenomai

On 21/12/22 15:48, Jan Kiszka wrote:
> https://source.denx.de/Xenomai/xenomai/-/blob/v3.1.2/lib/copperplate/threadobj.c#L86
> 
> That's the only line in Xenomai which raises this message. If you don't
> believe it triggered it, start your application in gdb and put a
> breakpoint on that panic line.
> 
> Jan
> 

I believe that this code triggers the message "failed to allocate TSD key".

But I get the message "failed to allocate main tcb", which is raised in
the same file, at line 1790.

Regards
-- 
Mauro S.


* Re: "failed to allocate main tcb" error
  2022-12-21 14:57       ` Mauro S.
@ 2022-12-21 15:11         ` Jan Kiszka
  2022-12-21 15:25           ` Mauro S.
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Kiszka @ 2022-12-21 15:11 UTC (permalink / raw)
  To: Mauro S., xenomai

On 21.12.22 15:57, Mauro S. wrote:
> I believe that this code triggers the message "failed to allocate TSD key".
>
> But I get the message "failed to allocate main tcb", that is in the same
> file, but at the line #1790.
>

Ah, sorry, you are right. Didn't read carefully enough.

Could the heap be fragmented after lots and lots of
allocations/releases? But what is also a bit strange is that the main
thread is continuously mapped. Are you restarting the main application
for every request?
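
To make the fragmentation question concrete: a heap can report plenty of
free space and still be unable to satisfy a request if that space is
split into small pieces. The following is a toy model only, assuming a
16-slot arena with first-fit allocation; `struct toy_arena` and the
`arena_*()` helpers are invented for this sketch and do not reflect how
the Xenomai shared heap is implemented.

```c
#include <stddef.h>

#define ARENA_SLOTS 16	/* toy arena: 16 slots of equal size */

struct toy_arena {
	unsigned char used[ARENA_SLOTS];	/* 1 = slot allocated */
};

/* Total number of free slots, contiguous or not. */
size_t arena_free_slots(const struct toy_arena *a)
{
	size_t i, n = 0;

	for (i = 0; i < ARENA_SLOTS; i++)
		n += !a->used[i];
	return n;
}

/* Length of the longest run of adjacent free slots. */
size_t arena_largest_run(const struct toy_arena *a)
{
	size_t i, run = 0, best = 0;

	for (i = 0; i < ARENA_SLOTS; i++) {
		run = a->used[i] ? 0 : run + 1;
		if (run > best)
			best = run;
	}
	return best;
}

/*
 * First-fit allocation of `count` adjacent slots. Returns the index of
 * the first slot, or -1 if no free run is long enough.
 */
int arena_alloc(struct toy_arena *a, size_t count)
{
	size_t i, run = 0;

	if (count == 0 || count > ARENA_SLOTS)
		return -1;

	for (i = 0; i < ARENA_SLOTS; i++) {
		run = a->used[i] ? 0 : run + 1;
		if (run == count) {
			size_t start = i + 1 - count, j;

			for (j = start; j <= i; j++)
				a->used[j] = 1;
			return (int)start;
		}
	}
	return -1;
}

/* Release `count` slots starting at `start`. */
void arena_free(struct toy_arena *a, size_t start, size_t count)
{
	size_t j;

	for (j = start; j < start + count && j < ARENA_SLOTS; j++)
		a->used[j] = 0;
}
```

If every slot is allocated and every second one is then released, half of
the arena is free, yet a request for two adjacent slots fails because the
longest free run is a single slot.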

Jan


* Re: "failed to allocate main tcb" error
  2022-12-21 15:11         ` Jan Kiszka
@ 2022-12-21 15:25           ` Mauro S.
  2022-12-21 16:04             ` Jan Kiszka
  0 siblings, 1 reply; 15+ messages in thread
From: Mauro S. @ 2022-12-21 15:25 UTC (permalink / raw)
  To: Jan Kiszka, xenomai

On 21/12/22 16:11, Jan Kiszka wrote:
> Ah, sorry, you are right. Didn't read carefully enough.

Ok, no problem :-)

> 
> Could the heap be fragmented after lots and lots of
> allocations/releases? But what is also a bit strange is that the main
> thread is continuously mapped. Are you restarting the main application
> for every request?
> 

The main application is started once and keeps running, but internally it
creates and destroys many Xenomai tasks.

During the lifetime of the main application (which can be long), the
service application reads some statistics from the main application,
prints them and exits. So the main application is never restarted, while
the service application is restarted for every request, which happens
every 5 seconds (the error does not seem to depend on the length of the
restart period, though).

On another device I have a similar (but not identical) main application
and the same service application, and there the problem does not show up.

So the problem should be located in the main application, but I can't
figure out where it could be.

Thanks in advance

-- 
Mauro S.


* Re: "failed to allocate main tcb" error
  2022-12-21 15:25           ` Mauro S.
@ 2022-12-21 16:04             ` Jan Kiszka
  2022-12-22 13:57               ` Mauro S.
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Kiszka @ 2022-12-21 16:04 UTC (permalink / raw)
  To: Mauro S., xenomai

On 21.12.22 16:25, Mauro S. wrote:
> The main application is started once and keeps running, but internally
> creates and destroys many Xenomai tasks.
>
> During the main application lifetime (that can be long), the service
> application reads some statistics from the main application, prints
> these statistics and exits. Then, the main application is never
> restarted, and the service application is restarted for every request,
> that happens every 5 seconds (but the error seems not dependent on the
> service application restart period length).

OK, so each instance of the service application maps its main thread
against Xenomai, and does so as a pshared app using the shared heap.

>
> On another device I have a similar (but not same) main application, and
> the same service application, and this problem does not show.
>
> Then, the problem should be located in the main application, but I can't
> figure out where the problem could be.

As the allocation happens against a shared, global heap, the issue may
be global as well. You could either try to trace the allocation/release
pattern against the shared heap and look for anomalies or even factor
out that pattern into a stand-alone reproducer.
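
One way to approach the tracing idea is a thin wrapper around the
allocation and release calls that logs each request and keeps running
counters. The sketch below is an illustration under assumptions:
`traced_alloc()`, `traced_free()` and `struct alloc_trace` are
hypothetical names, and plain malloc()/free() stand in for whichever heap
calls the application really uses.

```c
#include <stdio.h>
#include <stdlib.h>

/* Running totals for the traced allocator. */
struct alloc_trace {
	unsigned long allocs;	/* successful allocations */
	unsigned long frees;	/* releases of non-NULL pointers */
	unsigned long failures;	/* allocations that returned NULL */
};

/* Allocate `size` bytes and log the request to `log` (may be NULL). */
void *traced_alloc(struct alloc_trace *t, size_t size, FILE *log)
{
	void *p = malloc(size);	/* stand-in for the real heap call */

	if (p)
		t->allocs++;
	else
		t->failures++;
	if (log)
		fprintf(log, "alloc size=%zu ptr=%p\n", size, p);
	return p;
}

/* Log and release a block previously returned by traced_alloc(). */
void traced_free(struct alloc_trace *t, void *p, FILE *log)
{
	if (p)
		t->frees++;
	if (log)
		fprintf(log, "free ptr=%p\n", p);
	free(p);
}

/* Blocks allocated but not yet released. */
unsigned long trace_outstanding(const struct alloc_trace *t)
{
	return t->allocs - t->frees;
}
```

A steadily growing outstanding count, or a log in which sizes and
pointers stop pairing up, would be the kind of anomaly to look for; the
logged sequence could also be replayed from a stand-alone program.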

Jan


* Re: "failed to allocate main tcb" error
  2022-12-21 16:04             ` Jan Kiszka
@ 2022-12-22 13:57               ` Mauro S.
  2022-12-22 14:11                 ` Jan Kiszka
  0 siblings, 1 reply; 15+ messages in thread
From: Mauro S. @ 2022-12-22 13:57 UTC (permalink / raw)
  To: Jan Kiszka, xenomai

On 21/12/22 17:04, Jan Kiszka wrote:
> Ok, so the service applications are mapping its main thread against
> Xenomai, and that as pshared apps using the shared heap.
> 

Yes, exactly.

>>
>> On another device I have a similar (but not same) main application, and
>> the same service application, and this problem does not show.
>>
>> Then, the problem should be located in the main application, but I can't
>> figure out where the problem could be.
> 
> As the allocation happens against a shared, global heap, the issue may
> be global as well. You could either try to trace the allocation/release
> pattern against the shared heap and look for anomalies or even factor
> out that pattern into a stand-alone reproducer.

Thanks for your suggestions. Tracing the allocation/release pattern would
be a huge amount of work for this application, so I started with a simple
test: enlarging the heap sizes.
Before, I had a 4096 KB system heap, a 1024 KB shared heap and a 1024 KB
private heap.
Now I have an 8192 KB system heap, a 4096 KB shared heap and a 2048 KB
private heap.
It made no difference: the error still occurs. Could this be a hint that
the problem is heap corruption rather than fragmentation?
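
A common way to test for the heap-corruption suspicion is to place a
known byte pattern after each block and check it later. The following is
a minimal sketch of that idea, not a Xenomai facility: `guarded_alloc()`
and `guard_intact()` are hypothetical helpers, and malloc() is only a
stand-in for the heap under test.

```c
#include <stdlib.h>
#include <string.h>

#define GUARD_BYTE 0xA5
#define GUARD_LEN  8

/*
 * Allocate `size` usable bytes followed by GUARD_LEN guard bytes.
 * malloc() is only a stand-in for the heap under test.
 */
void *guarded_alloc(size_t size)
{
	unsigned char *p = malloc(size + GUARD_LEN);

	if (p)
		memset(p + size, GUARD_BYTE, GUARD_LEN);
	return p;
}

/* Returns 1 if the guard after the block is intact, 0 if overwritten. */
int guard_intact(const void *block, size_t size)
{
	const unsigned char *g = (const unsigned char *)block + size;
	size_t i;

	for (i = 0; i < GUARD_LEN; i++)
		if (g[i] != GUARD_BYTE)
			return 0;
	return 1;
}
```

Checking the guard when a block is released, or periodically, narrows
down which buffer was overrun, which the free-byte figures cannot show.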

During the tests, I also noticed that when the error starts to happen, no
tasks are being created or deleted in the main application. But some
queue and pipe communication is active, so the heap could be in use by
those.

I will keep investigating this problem. If you have any other ideas or
hints, I'll be glad to hear them.

Thank you, regards

-- 
Mauro S.


* Re: "failed to allocate main tcb" error
  2022-12-22 13:57               ` Mauro S.
@ 2022-12-22 14:11                 ` Jan Kiszka
  2022-12-22 15:42                   ` Mauro S.
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Kiszka @ 2022-12-22 14:11 UTC (permalink / raw)
  To: Mauro S., xenomai

On 22.12.22 14:57, Mauro S. wrote:
> On 21/12/22 17:04, Jan Kiszka wrote:
>> On 21.12.22 16:25, Mauro S. wrote:
>>> On 21/12/22 16:11, Jan Kiszka wrote:
>>>> On 21.12.22 15:57, Mauro S. wrote:
>>>>> On 21/12/22 15:48, Jan Kiszka wrote:
>>>>>> On 21.12.22 15:45, Mauro S. wrote:
>>>>>>> On 21/12/22 15:29, Jan Kiszka wrote:
>>>>>>>> On 21.12.22 10:05, Mauro S. wrote:
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I'm using Xenomai 3.1.2 on a x86-64 Atom processor, with
>>>>>>>>> Copperplate and
>>>>>>>>> pshared enabled. Linux kernel is 5.4.181.
>>>>>>>>>
>>>>>>>>> I have a main application, and another service application that
>>>>>>>>> connects
>>>>>>>>> to the main one using shared session.
>>>>>>>>>
>>>>>>>>> After some time the main application is running, the service
>>>>>>>>> application
>>>>>>>>> stops to connect to the main one with the error
>>>>>>>>>
>>>>>>>>>         0"004.166| BUG in main_overlay(): [main] failed to
>>>>>>>>> allocate main
>>>>>>>>> tcb
>>>>>>>>>
>>>>>>>>> Digging a bit in the code, I found that the failing function is
>>>>>>>>> the
>>>>>>>>> xnmalloc() call in
>>>>>>>>> lib/copperplate/threadobj.c:__threadobj_alloc().
>>>>>>>>> If I understood correctly, this function tries to allocate some
>>>>>>>>> space
>>>>>>>>> from the shared heap. But the shared heap is almost free
>>>>>>>>>
>>>>>>>>>       # cat /proc/xenomai/heap
>>>>>>>>>         TOTAL      FREE  NAME
>>>>>>>>>       4194304   3616512  system heap
>>>>>>>>>       1048576   1041776  shared heap
>>>>>>>>>       1048576   1048480  private heap[14505]
>>>>>>>>>         36864     19968  xddp-pool@0
>>>>>>>>>         24576      7680  xddp-pool@10
>>>>>>>>>
>>>>>>>>
>>>>>>>> This is the code that triggers the message:
>>>>>>>>
>>>>>>>>             if (pthread_key_create(&threadobj_tskey,
>>>>>>>> finalize_thread))
>>>>>>>>                     early_panic("failed to allocate TSD key");
>>>>>>>>
>>>>>>>> pthread_key_create has nothing to do with xnmalloc. That's a plain
>>>>>>>> glibc
>>>>>>>> service, and its documentation says:
>>>>>>>>
>>>>>>>>            The pthread_key_create() function shall fail if:
>>>>>>>>
>>>>>>>>            EAGAIN The system lacked the necessary resources to
>>>>>>>> create
>>>>>>>> another thread-specific data key, or the system-imposed limit on
>>>>>>>> the
>>>>>>>> total number of keys per process {PTHREAD_KEYS_MAX} has been
>>>>>>>> exceeded.
>>>>>>>>
>>>>>>>>            ENOMEM Insufficient memory exists to create the key.
>>>>>>>>
>>>>>>>> Already checked THOSE conditions? Are you possibly creating threads
>>>>>>>> without cleaning them up completely?
>>>>>>>>
>>>>>>>> Jan
>>>>>>>>
>>>>>>>
>>>>>>> Hi Jan,
>>>>>>>
>>>>>>> thank you.
>>>>>>> Sorry but I can't figure out how the code you specified (that panics
>>>>>>> generating the message "failed to allocate TSD key") could
>>>>>>> trigger the
>>>>>>> "failed to allocate main tcb" message.
>>>>>>>
>>>>>>
>>>>>> https://source.denx.de/Xenomai/xenomai/-/blob/v3.1.2/lib/copperplate/threadobj.c#L86
>>>>>>
>>>>>> That's the only line in Xenomai which raises this message. If you
>>>>>> don't
>>>>>> believe it triggered it, start your application in gdb and put a
>>>>>> breakpoint on that panic line.
>>>>>>
>>>>>> Jan
>>>>>>
>>>>>
>>>>> I believe that this code triggers the message "failed to allocate TSD
>>>>> key".
>>>>>
>>>>> But I get the message "failed to allocate main tcb", that is in the
>>>>> same
>>>>> file, but at the line #1790.
>>>>>
>>>>
>>>> Ah, sorry, you are right. Didn't read carefully enough.
>>>
>>> Ok, no problem :-)
>>>
>>>>
>>>> Could the heap be fragmented after lots and lots of
>>>> allocations/releases? But what is also a bit strange is that the main
>>>> thread is continuously mapped. Are you restarting the main application
>>>> for every request?
>>>>
>>>
>>> The main application is started once and keeps running, but internally
>>> creates and destroys many Xenomai tasks.
>>>
>>> During the main application lifetime (that can be long), the service
>>> application reads some statistics from the main application, prints
>>> these statistics and exits. Then, the main application is never
>>> restarted, and the service application is restarted for every request,
>>> that happens every 5 seconds (but the error seems not dependent on the
>>> service application restart period length).
>>
>> Ok, so the service applications are mapping its main thread against
>> Xenomai, and that as pshared apps using the shared heap.
>>
>
> Yes, exactly.
>
>>>
>>> On another device I have a similar (but not same) main application, and
>>> the same service application, and this problem does not show.
>>>
>>> Then, the problem should be located in the main application, but I can't
>>> figure out where the problem could be.
>>
>> As the allocation happens against a shared, global heap, the issue may
>> be global as well. You could either try to trace the allocation/release
>> pattern against the shared heap and look for anomalies or even factor
>> out that pattern into a stand-alone reproducer.
>
> Thanks for your suggestions. Trace the allocation/release pattern would
> be a huge work for this application, then I started with a simple test:
> enlarge the heap sizes.
> Before I had a 4096kb system heap, 1024kb shared heap, 1024kb private heap.
> Now I have a 8192kb system heap, 4096kb shared heap, 2048kb private heap.
> Nothing, the error happens again. Could this be a clue that this is not
> a fragmentation problem and could be a heap corruption problem?

Not impossible. The control structures should be heading the allocated
memory blocks, and thus could get corrupted if some other user writes
beyond the end of its assigned block.
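
To make that failure mode concrete, here is a minimal sketch under
stated assumptions: a hypothetical allocator that stores a small header
directly in front of each payload inside one byte arena. The struct,
the magic value and the function names are invented for illustration
and are not taken from the Xenomai sources.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_MAGIC 0xB10CB10Cu

/* Hypothetical control structure placed in front of every payload. */
struct block_hdr {
    uint32_t magic;   /* sanity marker checked by the allocator */
    uint32_t size;    /* payload size in bytes */
};

/* Write a header at byte offset off and return the payload offset. */
size_t place_block(unsigned char *arena, size_t off, uint32_t size)
{
    struct block_hdr h = { BLOCK_MAGIC, size };

    assert(arena != NULL);
    memcpy(arena + off, &h, sizeof(h));
    return off + sizeof(h);
}

/* Check that the header in front of a payload still carries the magic. */
int header_intact(const unsigned char *arena, size_t payload_off)
{
    struct block_hdr h;

    memcpy(&h, arena + payload_off - sizeof(h), sizeof(h));
    return h.magic == BLOCK_MAGIC;
}

/* A user filling its payload; passing len larger than the block size
 * models the bug, since the write then spills into the next header. */
void user_write(unsigned char *arena, size_t payload_off, size_t len)
{
    memset(arena + payload_off, 0xAA, len);
}
```

With two adjacent 16-byte blocks, writing 20 bytes into the first one
destroys the magic of the second, while the first header stays valid.
The damage only shows up later, when the allocator next touches the
neighbouring block.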

>
> During tests, I also noted that when the error starts to happen, no
> tasks in main applications are created/deleted. But there are some
> queues and pipes communications active, then the heap could be used by
> them.

All objects that could now be used across processes are allocated on the
shared heap if pshared is on. So we are not only looking at task objects
and their life cycles.

>
> I will continue investigating on this problem. If you have some other
> ideas/hints, I'll be glad to hear them.

Try reducing the factors that could contribute to it, e.g. the types of
objects used. That may help narrow down the actual trigger.

Jan


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: "failed to allocate main tcb" error
  2022-12-22 14:11                 ` Jan Kiszka
@ 2022-12-22 15:42                   ` Mauro S.
  2023-01-05 20:25                     ` Mauro S.
  0 siblings, 1 reply; 15+ messages in thread
From: Mauro S. @ 2022-12-22 15:42 UTC (permalink / raw)
  To: Jan Kiszka, xenomai

Il 22/12/22 15:11, Jan Kiszka ha scritto:
> On 22.12.22 14:57, Mauro S. wrote:
>> Il 21/12/22 17:04, Jan Kiszka ha scritto:
>>> On 21.12.22 16:25, Mauro S. wrote:
>>>> Il 21/12/22 16:11, Jan Kiszka ha scritto:
>>>>> On 21.12.22 15:57, Mauro S. wrote:
>>>>>> Il 21/12/22 15:48, Jan Kiszka ha scritto:
>>>>>>> On 21.12.22 15:45, Mauro S. wrote:
>>>>>>>> Il 21/12/22 15:29, Jan Kiszka ha scritto:
>>>>>>>>> On 21.12.22 10:05, Mauro S. wrote:
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> I'm using Xenomai 3.1.2 on a x86-64 Atom processor, with
>>>>>>>>>> Copperplate and
>>>>>>>>>> pshared enabled. Linux kernel is 5.4.181.
>>>>>>>>>>
>>>>>>>>>> I have a main application, and another service application that
>>>>>>>>>> connects
>>>>>>>>>> to the main one using shared session.
>>>>>>>>>>
>>>>>>>>>> After some time the main application is running, the service
>>>>>>>>>> application
>>>>>>>>>> stops to connect to the main one with the error
>>>>>>>>>>
>>>>>>>>>>          0"004.166| BUG in main_overlay(): [main] failed to
>>>>>>>>>> allocate main
>>>>>>>>>> tcb
>>>>>>>>>>
>>>>>>>>>> Digging a bit in the code, I found that the failing function is
>>>>>>>>>> the
>>>>>>>>>> xnmalloc() call in
>>>>>>>>>> lib/copperplate/threadobj.c:__threadobj_alloc().
>>>>>>>>>> If I understood correctly, this function tries to allocate some
>>>>>>>>>> space
>>>>>>>>>> from the shared heap. But the shared heap is almost free
>>>>>>>>>>
>>>>>>>>>>        # cat /proc/xenomai/heap
>>>>>>>>>>          TOTAL      FREE  NAME
>>>>>>>>>>        4194304   3616512  system heap
>>>>>>>>>>        1048576   1041776  shared heap
>>>>>>>>>>        1048576   1048480  private heap[14505]
>>>>>>>>>>          36864     19968  xddp-pool@0
>>>>>>>>>>          24576      7680  xddp-pool@10
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This is the code that triggers the message:
>>>>>>>>>
>>>>>>>>>              if (pthread_key_create(&threadobj_tskey,
>>>>>>>>> finalize_thread))
>>>>>>>>>                      early_panic("failed to allocate TSD key");
>>>>>>>>>
>>>>>>>>> pthread_key_create has nothing to do with xnmalloc. That's a plain
>>>>>>>>> glibc
>>>>>>>>> service, and its documentation says:
>>>>>>>>>
>>>>>>>>>             The pthread_key_create() function shall fail if:
>>>>>>>>>
>>>>>>>>>             EAGAIN The system lacked the necessary resources to
>>>>>>>>> create
>>>>>>>>> another thread-specific data key, or the system-imposed limit on
>>>>>>>>> the
>>>>>>>>> total number of keys per process {PTHREAD_KEYS_MAX} has been
>>>>>>>>> exceeded.
>>>>>>>>>
>>>>>>>>>             ENOMEM Insufficient memory exists to create the key.
>>>>>>>>>
>>>>>>>>> Already checked THOSE conditions? Are you possibly creating threads
>>>>>>>>> without cleaning them up completely?
>>>>>>>>>
>>>>>>>>> Jan
>>>>>>>>>
>>>>>>>>
>>>>>>>> Hi Jan,
>>>>>>>>
>>>>>>>> thank you.
>>>>>>>> Sorry but I can't figure out how the code you specified (that panics
>>>>>>>> generating the message "failed to allocate TSD key") could
>>>>>>>> trigger the
>>>>>>>> "failed to allocate main tcb" message.
>>>>>>>>
>>>>>>>
>>>>>>> https://source.denx.de/Xenomai/xenomai/-/blob/v3.1.2/lib/copperplate/threadobj.c#L86
>>>>>>>
>>>>>>> That's the only line in Xenomai which raises this message. If you
>>>>>>> don't
>>>>>>> believe it triggered it, start your application in gdb and put a
>>>>>>> breakpoint on that panic line.
>>>>>>>
>>>>>>> Jan
>>>>>>>
>>>>>>
>>>>>> I believe that this code triggers the message "failed to allocate TSD
>>>>>> key".
>>>>>>
>>>>>> But I get the message "failed to allocate main tcb", that is in the
>>>>>> same
>>>>>> file, but at the line #1790.
>>>>>>
>>>>>
>>>>> Ah, sorry, you are right. Didn't read carefully enough.
>>>>
>>>> Ok, no problem :-)
>>>>
>>>>>
>>>>> Could the heap be fragmented after lots and lots of
>>>>> allocations/releases? But what is also a bit strange is that the main
>>>>> thread is continuously mapped. Are you restarting the main application
>>>>> for every request?
>>>>>
>>>>
>>>> The main application is started once and keeps running, but internally
>>>> creates and destroys many Xenomai tasks.
>>>>
>>>> During the main application lifetime (that can be long), the service
>>>> application reads some statistics from the main application, prints
>>>> these statistics and exits. Then, the main application is never
>>>> restarted, and the service application is restarted for every request,
>>>> that happens every 5 seconds (but the error seems not dependent on the
>>>> service application restart period length).
>>>
>>> Ok, so the service applications are mapping its main thread against
>>> Xenomai, and that as pshared apps using the shared heap.
>>>
>>
>> Yes, exactly.
>>
>>>>
>>>> On another device I have a similar (but not same) main application, and
>>>> the same service application, and this problem does not show.
>>>>
>>>> Then, the problem should be located in the main application, but I can't
>>>> figure out where the problem could be.
>>>
>>> As the allocation happens against a shared, global heap, the issue may
>>> be global as well. You could either try to trace the allocation/release
>>> pattern against the shared heap and look for anomalies or even factor
>>> out that pattern into a stand-alone reproducer.
>>
>> Thanks for your suggestions. Trace the allocation/release pattern would
>> be a huge work for this application, then I started with a simple test:
>> enlarge the heap sizes.
>> Before I had a 4096kb system heap, 1024kb shared heap, 1024kb private heap.
>> Now I have a 8192kb system heap, 4096kb shared heap, 2048kb private heap.
>> Nothing, the error happens again. Could this be a clue that this is not
>> a fragmentation problem and could be a heap corruption problem?
> 
> Not impossible. The control structures should be heading the allocated
> memory blocks, thus could get corrupted if some other user overwrites
> its assigned block.
> 
>>
>> During tests, I also noted that when the error starts to happen, no
>> tasks in main applications are created/deleted. But there are some
>> queues and pipes communications active, then the heap could be used by
>> them.
> 
> All objects that could now be used across processes are allocated on the
> shared heap if pshared is on. So we are not only looking at task objects
> and their life cycles.
> 
>>
>> I will continue investigating on this problem. If you have some other
>> ideas/hints, I'll be glad to hear them.
> 
> Try reducing factors that could contribute to it, e.g. the types of
> objects used. That may help narrowing down the actual trigger.
> 
> Jan
> 

Thanks Jan, I will do some tests and (hopefully) come back with results.

Regards
-- 
Mauro S.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: "failed to allocate main tcb" error
  2022-12-22 15:42                   ` Mauro S.
@ 2023-01-05 20:25                     ` Mauro S.
  2023-01-09 13:00                       ` Jan Kiszka
  0 siblings, 1 reply; 15+ messages in thread
From: Mauro S. @ 2023-01-05 20:25 UTC (permalink / raw)
  To: xenomai; +Cc: Jan Kiszka

[-- Attachment #1: Type: text/plain, Size: 7983 bytes --]

Il 22/12/22 16:42, Mauro S. ha scritto:
> Il 22/12/22 15:11, Jan Kiszka ha scritto:
>> On 22.12.22 14:57, Mauro S. wrote:
>>> Il 21/12/22 17:04, Jan Kiszka ha scritto:
>>>> On 21.12.22 16:25, Mauro S. wrote:
>>>>> Il 21/12/22 16:11, Jan Kiszka ha scritto:
>>>>>> On 21.12.22 15:57, Mauro S. wrote:
>>>>>>> Il 21/12/22 15:48, Jan Kiszka ha scritto:
>>>>>>>> On 21.12.22 15:45, Mauro S. wrote:
>>>>>>>>> Il 21/12/22 15:29, Jan Kiszka ha scritto:
>>>>>>>>>> On 21.12.22 10:05, Mauro S. wrote:
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> I'm using Xenomai 3.1.2 on a x86-64 Atom processor, with
>>>>>>>>>>> Copperplate and
>>>>>>>>>>> pshared enabled. Linux kernel is 5.4.181.
>>>>>>>>>>>
>>>>>>>>>>> I have a main application, and another service application that
>>>>>>>>>>> connects
>>>>>>>>>>> to the main one using shared session.
>>>>>>>>>>>
>>>>>>>>>>> After some time the main application is running, the service
>>>>>>>>>>> application
>>>>>>>>>>> stops to connect to the main one with the error
>>>>>>>>>>>
>>>>>>>>>>>          0"004.166| BUG in main_overlay(): [main] failed to
>>>>>>>>>>> allocate main
>>>>>>>>>>> tcb
>>>>>>>>>>>
>>>>>>>>>>> Digging a bit in the code, I found that the failing function is
>>>>>>>>>>> the
>>>>>>>>>>> xnmalloc() call in
>>>>>>>>>>> lib/copperplate/threadobj.c:__threadobj_alloc().
>>>>>>>>>>> If I understood correctly, this function tries to allocate some
>>>>>>>>>>> space
>>>>>>>>>>> from the shared heap. But the shared heap is almost free
>>>>>>>>>>>
>>>>>>>>>>>        # cat /proc/xenomai/heap
>>>>>>>>>>>          TOTAL      FREE  NAME
>>>>>>>>>>>        4194304   3616512  system heap
>>>>>>>>>>>        1048576   1041776  shared heap
>>>>>>>>>>>        1048576   1048480  private heap[14505]
>>>>>>>>>>>          36864     19968  xddp-pool@0
>>>>>>>>>>>          24576      7680  xddp-pool@10
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> This is the code that triggers the message:
>>>>>>>>>>
>>>>>>>>>>              if (pthread_key_create(&threadobj_tskey,
>>>>>>>>>> finalize_thread))
>>>>>>>>>>                      early_panic("failed to allocate TSD key");
>>>>>>>>>>
>>>>>>>>>> pthread_key_create has nothing to do with xnmalloc. That's a 
>>>>>>>>>> plain
>>>>>>>>>> glibc
>>>>>>>>>> service, and its documentation says:
>>>>>>>>>>
>>>>>>>>>>             The pthread_key_create() function shall fail if:
>>>>>>>>>>
>>>>>>>>>>             EAGAIN The system lacked the necessary resources to
>>>>>>>>>> create
>>>>>>>>>> another thread-specific data key, or the system-imposed limit on
>>>>>>>>>> the
>>>>>>>>>> total number of keys per process {PTHREAD_KEYS_MAX} has been
>>>>>>>>>> exceeded.
>>>>>>>>>>
>>>>>>>>>>             ENOMEM Insufficient memory exists to create the key.
>>>>>>>>>>
>>>>>>>>>> Already checked THOSE conditions? Are you possibly creating 
>>>>>>>>>> threads
>>>>>>>>>> without cleaning them up completely?
>>>>>>>>>>
>>>>>>>>>> Jan
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Jan,
>>>>>>>>>
>>>>>>>>> thank you.
>>>>>>>>> Sorry but I can't figure out how the code you specified (that 
>>>>>>>>> panics
>>>>>>>>> generating the message "failed to allocate TSD key") could
>>>>>>>>> trigger the
>>>>>>>>> "failed to allocate main tcb" message.
>>>>>>>>>
>>>>>>>>
>>>>>>>> https://source.denx.de/Xenomai/xenomai/-/blob/v3.1.2/lib/copperplate/threadobj.c#L86
>>>>>>>>
>>>>>>>> That's the only line in Xenomai which raises this message. If you
>>>>>>>> don't
>>>>>>>> believe it triggered it, start your application in gdb and put a
>>>>>>>> breakpoint on that panic line.
>>>>>>>>
>>>>>>>> Jan
>>>>>>>>
>>>>>>>
>>>>>>> I believe that this code triggers the message "failed to allocate
>>>>>>> TSD
>>>>>>> key".
>>>>>>>
>>>>>>> But I get the message "failed to allocate main tcb", that is in the
>>>>>>> same
>>>>>>> file, but at the line #1790.
>>>>>>>
>>>>>>
>>>>>> Ah, sorry, you are right. Didn't read carefully enough.
>>>>>
>>>>> Ok, no problem :-)
>>>>>
>>>>>>
>>>>>> Could the heap be fragmented after lots and lots of
>>>>>> allocations/releases? But what is also a bit strange is that the main
>>>>>> thread is continuously mapped. Are you restarting the main 
>>>>>> application
>>>>>> for every request?
>>>>>>
>>>>>
>>>>> The main application is started once and keeps running, but internally
>>>>> creates and destroys many Xenomai tasks.
>>>>>
>>>>> During the main application lifetime (that can be long), the service
>>>>> application reads some statistics from the main application, prints
>>>>> these statistics and exits. Then, the main application is never
>>>>> restarted, and the service application is restarted for every request,
>>>>> that happens every 5 seconds (but the error seems not dependent on the
>>>>> service application restart period length).
>>>>
>>>> Ok, so the service applications are mapping its main thread against
>>>> Xenomai, and that as pshared apps using the shared heap.
>>>>
>>>
>>> Yes, exactly.
>>>
>>>>>
>>>>> On another device I have a similar (but not same) main application, 
>>>>> and
>>>>> the same service application, and this problem does not show.
>>>>>
>>>>> Then, the problem should be located in the main application, but I 
>>>>> can't
>>>>> figure out where the problem could be.
>>>>
>>>> As the allocation happens against a shared, global heap, the issue may
>>>> be global as well. You could either try to trace the allocation/release
>>>> pattern against the shared heap and look for anomalies or even factor
>>>> out that pattern into a stand-alone reproducer.
>>>
>>> Thanks for your suggestions. Trace the allocation/release pattern would
>>> be a huge work for this application, then I started with a simple test:
>>> enlarge the heap sizes.
>>> Before I had a 4096kb system heap, 1024kb shared heap, 1024kb private 
>>> heap.
>>> Now I have a 8192kb system heap, 4096kb shared heap, 2048kb private 
>>> heap.
>>> Nothing, the error happens again. Could this be a clue that this is not
>>> a fragmentation problem and could be a heap corruption problem?
>>
>> Not impossible. The control structures should be heading the allocated
>> memory blocks, thus could get corrupted if some other user overwrites
>> its assigned block.
>>
>>>
>>> During tests, I also noted that when the error starts to happen, no
>>> tasks in main applications are created/deleted. But there are some
>>> queues and pipes communications active, then the heap could be used by
>>> them.
>>
>> All objects that could now be used across processes are allocated on the
>> shared heap if pshared is on. So we are not only looking at task objects
>> and their life cycles.
>>
>>>
>>> I will continue investigating on this problem. If you have some other
>>> ideas/hints, I'll be glad to hear them.
>>
>> Try reducing factors that could contribute to it, e.g. the types of
>> objects used. That may help narrowing down the actual trigger.
>>
>> Jan
>>
> 
> Thanks Jan, I will do some tests and (hopefully) come back with results.

Hi Jan,

sorry for the delay.

Attached is the code of two small test programs that reproduce the
problem, together with the script that starts them.

Briefly, the "xeno-test-tcb-allocate-mainproc" application creates two
rt buffers (request and response) and waits indefinitely on the request
buffer. "xeno-test-tcb-allocate-secproc" writes a request to the request
buffer and waits for the answer on the response buffer. When "mainproc"
receives the request, it sends a response carrying an incrementing
counter on the response buffer and goes back to waiting for requests.
"secproc" receives the response, prints its content and exits.

"secproc" is launched every 0.5 seconds. After about four minutes of
running, "secproc" starts to fail at launch with the error

    BUG in main_overlay(): [main] failed to allocate main tcb
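
While such a reproducer runs, it may help to sample the shared heap
line of /proc/xenomai/heap and log how many bytes are in use. The
following is a minimal sketch that assumes the "TOTAL FREE NAME" line
layout shown in the listing at the start of this thread. The function
names are hypothetical helpers, not part of any Xenomai API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one data line of the heap table ("TOTAL FREE NAME").
 * Returns 1 and fills the outputs on success, 0 for lines that do not
 * start with two numbers (such as the column header).
 */
int parse_heap_line(const char *line, unsigned long *total,
                    unsigned long *free_bytes, char *name, size_t name_len)
{
    int consumed = 0;

    assert(line && total && free_bytes && name && name_len > 0);

    if (sscanf(line, " %lu %lu %n", total, free_bytes, &consumed) != 2)
        return 0;

    /* The rest of the line is the heap name, which may contain spaces. */
    strncpy(name, line + consumed, name_len - 1);
    name[name_len - 1] = '\0';
    name[strcspn(name, "\n")] = '\0';
    return 1;
}

/* Bytes currently in use on a heap. */
unsigned long heap_used(unsigned long total, unsigned long free_bytes)
{
    return total - free_bytes;
}
```

Feeding each line read with fgets() to parse_heap_line() and printing
heap_used() for the "shared heap" entry after every "secproc" launch
would show whether usage grows steadily or stays flat before the
failure.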

Thanks in advance, regards

-- 
Mauro S.

[-- Attachment #2: tcb_allocate_test.tar.gz --]
[-- Type: application/gzip, Size: 1755 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: "failed to allocate main tcb" error
  2023-01-05 20:25                     ` Mauro S.
@ 2023-01-09 13:00                       ` Jan Kiszka
  2023-01-09 19:30                         ` Jan Kiszka
  2023-01-09 21:13                         ` R: " Mauro
  0 siblings, 2 replies; 15+ messages in thread
From: Jan Kiszka @ 2023-01-09 13:00 UTC (permalink / raw)
  To: Mauro S., xenomai

On 05.01.23 21:25, Mauro S. wrote:
> Il 22/12/22 16:42, Mauro S. ha scritto:
>> Il 22/12/22 15:11, Jan Kiszka ha scritto:
>>> On 22.12.22 14:57, Mauro S. wrote:
>>>> Il 21/12/22 17:04, Jan Kiszka ha scritto:
>>>>> On 21.12.22 16:25, Mauro S. wrote:
>>>>>> Il 21/12/22 16:11, Jan Kiszka ha scritto:
>>>>>>> On 21.12.22 15:57, Mauro S. wrote:
>>>>>>>> Il 21/12/22 15:48, Jan Kiszka ha scritto:
>>>>>>>>> On 21.12.22 15:45, Mauro S. wrote:
>>>>>>>>>> Il 21/12/22 15:29, Jan Kiszka ha scritto:
>>>>>>>>>>> On 21.12.22 10:05, Mauro S. wrote:
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>> I'm using Xenomai 3.1.2 on a x86-64 Atom processor, with
>>>>>>>>>>>> Copperplate and
>>>>>>>>>>>> pshared enabled. Linux kernel is 5.4.181.
>>>>>>>>>>>>
>>>>>>>>>>>> I have a main application, and another service application that
>>>>>>>>>>>> connects
>>>>>>>>>>>> to the main one using shared session.
>>>>>>>>>>>>
>>>>>>>>>>>> After some time the main application is running, the service
>>>>>>>>>>>> application
>>>>>>>>>>>> stops to connect to the main one with the error
>>>>>>>>>>>>
>>>>>>>>>>>>          0"004.166| BUG in main_overlay(): [main] failed to
>>>>>>>>>>>> allocate main
>>>>>>>>>>>> tcb
>>>>>>>>>>>>
>>>>>>>>>>>> Digging a bit in the code, I found that the failing function is
>>>>>>>>>>>> the
>>>>>>>>>>>> xnmalloc() call in
>>>>>>>>>>>> lib/copperplate/threadobj.c:__threadobj_alloc().
>>>>>>>>>>>> If I understood correctly, this function tries to allocate some
>>>>>>>>>>>> space
>>>>>>>>>>>> from the shared heap. But the shared heap is almost free
>>>>>>>>>>>>
>>>>>>>>>>>>        # cat /proc/xenomai/heap
>>>>>>>>>>>>          TOTAL      FREE  NAME
>>>>>>>>>>>>        4194304   3616512  system heap
>>>>>>>>>>>>        1048576   1041776  shared heap
>>>>>>>>>>>>        1048576   1048480  private heap[14505]
>>>>>>>>>>>>          36864     19968  xddp-pool@0
>>>>>>>>>>>>          24576      7680  xddp-pool@10
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> This is the code that triggers the message:
>>>>>>>>>>>
>>>>>>>>>>>              if (pthread_key_create(&threadobj_tskey,
>>>>>>>>>>> finalize_thread))
>>>>>>>>>>>                      early_panic("failed to allocate TSD key");
>>>>>>>>>>>
>>>>>>>>>>> pthread_key_create has nothing to do with xnmalloc. That's a
>>>>>>>>>>> plain
>>>>>>>>>>> glibc
>>>>>>>>>>> service, and its documentation says:
>>>>>>>>>>>
>>>>>>>>>>>             The pthread_key_create() function shall fail if:
>>>>>>>>>>>
>>>>>>>>>>>             EAGAIN The system lacked the necessary resources to
>>>>>>>>>>> create
>>>>>>>>>>> another thread-specific data key, or the system-imposed limit on
>>>>>>>>>>> the
>>>>>>>>>>> total number of keys per process {PTHREAD_KEYS_MAX} has been
>>>>>>>>>>> exceeded.
>>>>>>>>>>>
>>>>>>>>>>>             ENOMEM Insufficient memory exists to create the key.
>>>>>>>>>>>
>>>>>>>>>>> Already checked THOSE conditions? Are you possibly creating
>>>>>>>>>>> threads
>>>>>>>>>>> without cleaning them up completely?
>>>>>>>>>>>
>>>>>>>>>>> Jan
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi Jan,
>>>>>>>>>>
>>>>>>>>>> thank you.
>>>>>>>>>> Sorry but I can't figure out how the code you specified (that
>>>>>>>>>> panics
>>>>>>>>>> generating the message "failed to allocate TSD key") could
>>>>>>>>>> trigger the
>>>>>>>>>> "failed to allocate main tcb" message.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> https://source.denx.de/Xenomai/xenomai/-/blob/v3.1.2/lib/copperplate/threadobj.c#L86
>>>>>>>>>
>>>>>>>>> That's the only line in Xenomai which raises this message. If you
>>>>>>>>> don't
>>>>>>>>> believe it triggered it, start your application in gdb and put a
>>>>>>>>> breakpoint on that panic line.
>>>>>>>>>
>>>>>>>>> Jan
>>>>>>>>>
>>>>>>>>
>>>>>>>> I believe that this code triggers the message "failed to
>>>>>>>> allocate TSD
>>>>>>>> key".
>>>>>>>>
>>>>>>>> But I get the message "failed to allocate main tcb", that is in the
>>>>>>>> same
>>>>>>>> file, but at the line #1790.
>>>>>>>>
>>>>>>>
>>>>>>> Ah, sorry, you are right. Didn't read carefully enough.
>>>>>>
>>>>>> Ok, no problem :-)
>>>>>>
>>>>>>>
>>>>>>> Could the heap be fragmented after lots and lots of
>>>>>>> allocations/releases? But what is also a bit strange is that the
>>>>>>> main
>>>>>>> thread is continuously mapped. Are you restarting the main
>>>>>>> application
>>>>>>> for every request?
>>>>>>>
>>>>>>
>>>>>> The main application is started once and keeps running, but
>>>>>> internally
>>>>>> creates and destroys many Xenomai tasks.
>>>>>>
>>>>>> During the main application lifetime (that can be long), the service
>>>>>> application reads some statistics from the main application, prints
>>>>>> these statistics and exits. Then, the main application is never
>>>>>> restarted, and the service application is restarted for every
>>>>>> request,
>>>>>> that happens every 5 seconds (but the error seems not dependent on
>>>>>> the
>>>>>> service application restart period length).
>>>>>
>>>>> Ok, so the service applications are mapping its main thread against
>>>>> Xenomai, and that as pshared apps using the shared heap.
>>>>>
>>>>
>>>> Yes, exactly.
>>>>
>>>>>>
>>>>>> On another device I have a similar (but not same) main
>>>>>> application, and
>>>>>> the same service application, and this problem does not show.
>>>>>>
>>>>>> Then, the problem should be located in the main application, but I
>>>>>> can't
>>>>>> figure out where the problem could be.
>>>>>
>>>>> As the allocation happens against a shared, global heap, the issue may
>>>>> be global as well. You could either try to trace the
>>>>> allocation/release
>>>>> pattern against the shared heap and look for anomalies or even factor
>>>>> out that pattern into a stand-alone reproducer.
>>>>
>>>> Thanks for your suggestions. Trace the allocation/release pattern would
>>>> be a huge work for this application, then I started with a simple test:
>>>> enlarge the heap sizes.
>>>> Before I had a 4096kb system heap, 1024kb shared heap, 1024kb
>>>> private heap.
>>>> Now I have a 8192kb system heap, 4096kb shared heap, 2048kb private
>>>> heap.
>>>> Nothing, the error happens again. Could this be a clue that this is not
>>>> a fragmentation problem and could be a heap corruption problem?
>>>
>>> Not impossible. The control structures should be heading the allocated
>>> memory blocks, thus could get corrupted if some other user overwrites
>>> its assigned block.
>>>
>>>>
>>>> During tests, I also noted that when the error starts to happen, no
>>>> tasks in main applications are created/deleted. But there are some
>>>> queues and pipes communications active, then the heap could be used by
>>>> them.
>>>
>>> All objects that could now be used across processes are allocated on the
>>> shared heap if pshared is on. So we are not only looking at task objects
>>> and their life cycles.
>>>
>>>>
>>>> I will continue investigating on this problem. If you have some other
>>>> ideas/hints, I'll be glad to hear them.
>>>
>>> Try reducing factors that could contribute to it, e.g. the types of
>>> objects used. That may help narrowing down the actual trigger.
>>>
>>> Jan
>>>
>>
>> Thanks Jan, I will do some tests and (hopefully) come back with results.
> 
> Hi Jan,
> 
> sorry for the delay.
> 
> Attached there is the code of two small test programs able to reproduce
> the problem. There is also the script to start them.
> 
> Briefly, the "xeno-test-tcb-allocate-mainproc" application creates two
> rt buffers (request and response) and waits indefinitely on the request
> buffer. On the request buffer, the "xeno-test-tcb-allocate-secproc"
> writes a request and waits for the response on the response buffer. When
> the request is received by the "mainproc", the "mainproc" sends a
> response with an incremental counter on the response buffer, and returns
> to wait requests. "secproc" receives the response, prints its content
> and exits.
> 
> "secproc" is launched every 0.5 seconds. After about four minutes
> running, the "secproc" starts to fail at launch with the error
> 
>    BUG in main_overlay(): [main] failed to allocate main tcb
> 

Thanks, reproduced. Now trying to understand it.

Your test already revealed another issue, namely avoidable
sign-conversion warnings in the Xenomai headers. A patch will follow.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: "failed to allocate main tcb" error
  2023-01-09 13:00                       ` Jan Kiszka
@ 2023-01-09 19:30                         ` Jan Kiszka
  2023-01-09 21:13                         ` R: " Mauro
  1 sibling, 0 replies; 15+ messages in thread
From: Jan Kiszka @ 2023-01-09 19:30 UTC (permalink / raw)
  To: Mauro S., xenomai

On 09.01.23 14:00, Jan Kiszka wrote:
> On 05.01.23 21:25, Mauro S. wrote:
>> Il 22/12/22 16:42, Mauro S. ha scritto:
>>> Il 22/12/22 15:11, Jan Kiszka ha scritto:
>>>> On 22.12.22 14:57, Mauro S. wrote:
>>>>> Il 21/12/22 17:04, Jan Kiszka ha scritto:
>>>>>> On 21.12.22 16:25, Mauro S. wrote:
>>>>>>> Il 21/12/22 16:11, Jan Kiszka ha scritto:
>>>>>>>> On 21.12.22 15:57, Mauro S. wrote:
>>>>>>>>> Il 21/12/22 15:48, Jan Kiszka ha scritto:
>>>>>>>>>> On 21.12.22 15:45, Mauro S. wrote:
>>>>>>>>>>> Il 21/12/22 15:29, Jan Kiszka ha scritto:
>>>>>>>>>>>> On 21.12.22 10:05, Mauro S. wrote:
>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm using Xenomai 3.1.2 on a x86-64 Atom processor, with
>>>>>>>>>>>>> Copperplate and
>>>>>>>>>>>>> pshared enabled. Linux kernel is 5.4.181.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have a main application, and another service application that
>>>>>>>>>>>>> connects
>>>>>>>>>>>>> to the main one using shared session.
>>>>>>>>>>>>>
>>>>>>>>>>>>> After some time the main application is running, the service
>>>>>>>>>>>>> application
>>>>>>>>>>>>> stops to connect to the main one with the error
>>>>>>>>>>>>>
>>>>>>>>>>>>>          0"004.166| BUG in main_overlay(): [main] failed to
>>>>>>>>>>>>> allocate main
>>>>>>>>>>>>> tcb
>>>>>>>>>>>>>
>>>>>>>>>>>>> Digging a bit in the code, I found that the failing function is
>>>>>>>>>>>>> the
>>>>>>>>>>>>> xnmalloc() call in
>>>>>>>>>>>>> lib/copperplate/threadobj.c:__threadobj_alloc().
>>>>>>>>>>>>> If I understood correctly, this function tries to allocate some
>>>>>>>>>>>>> space
>>>>>>>>>>>>> from the shared heap. But the shared heap is almost free
>>>>>>>>>>>>>
>>>>>>>>>>>>>        # cat /proc/xenomai/heap
>>>>>>>>>>>>>          TOTAL      FREE  NAME
>>>>>>>>>>>>>        4194304   3616512  system heap
>>>>>>>>>>>>>        1048576   1041776  shared heap
>>>>>>>>>>>>>        1048576   1048480  private heap[14505]
>>>>>>>>>>>>>          36864     19968  xddp-pool@0
>>>>>>>>>>>>>          24576      7680  xddp-pool@10
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> This is the code that triggers the message:
>>>>>>>>>>>>
>>>>>>>>>>>>              if (pthread_key_create(&threadobj_tskey,
>>>>>>>>>>>> finalize_thread))
>>>>>>>>>>>>                      early_panic("failed to allocate TSD key");
>>>>>>>>>>>>
>>>>>>>>>>>> pthread_key_create has nothing to do with xnmalloc. That's a
>>>>>>>>>>>> plain
>>>>>>>>>>>> glibc
>>>>>>>>>>>> service, and its documentation says:
>>>>>>>>>>>>
>>>>>>>>>>>>             The pthread_key_create() function shall fail if:
>>>>>>>>>>>>
>>>>>>>>>>>>             EAGAIN The system lacked the necessary resources to
>>>>>>>>>>>> create
>>>>>>>>>>>> another thread-specific data key, or the system-imposed limit on
>>>>>>>>>>>> the
>>>>>>>>>>>> total number of keys per process {PTHREAD_KEYS_MAX} has been
>>>>>>>>>>>> exceeded.
>>>>>>>>>>>>
>>>>>>>>>>>>             ENOMEM Insufficient memory exists to create the key.
>>>>>>>>>>>>
>>>>>>>>>>>> Already checked THOSE conditions? Are you possibly creating
>>>>>>>>>>>> threads
>>>>>>>>>>>> without cleaning them up completely?
>>>>>>>>>>>>
>>>>>>>>>>>> Jan
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hi Jan,
>>>>>>>>>>>
>>>>>>>>>>> thank you.
>>>>>>>>>>> Sorry but I can't figure out how the code you specified (that
>>>>>>>>>>> panics
>>>>>>>>>>> generating the message "failed to allocate TSD key") could
>>>>>>>>>>> trigger the
>>>>>>>>>>> "failed to allocate main tcb" message.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> https://source.denx.de/Xenomai/xenomai/-/blob/v3.1.2/lib/copperplate/threadobj.c#L86
>>>>>>>>>>
>>>>>>>>>> That's the only line in Xenomai which raises this message. If you
>>>>>>>>>> don't
>>>>>>>>>> believe it triggered it, start your application in gdb and put a
>>>>>>>>>> breakpoint on that panic line.
>>>>>>>>>>
>>>>>>>>>> Jan
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I believe that this code triggers the message "failed to
>>>>>>>>> allocate TSD
>>>>>>>>> key".
>>>>>>>>>
>>>>>>>>> But I get the message "failed to allocate main tcb", which is in
>>>>>>>>> the same
>>>>>>>>> file, but at line #1790.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Ah, sorry, you are right. Didn't read carefully enough.
>>>>>>>
>>>>>>> Ok, no problem :-)
>>>>>>>
>>>>>>>>
>>>>>>>> Could the heap be fragmented after lots and lots of
>>>>>>>> allocations/releases? But what is also a bit strange is that the
>>>>>>>> main
>>>>>>>> thread is continuously mapped. Are you restarting the main
>>>>>>>> application
>>>>>>>> for every request?
>>>>>>>>
>>>>>>>
>>>>>>> The main application is started once and keeps running, but
>>>>>>> internally
>>>>>>> creates and destroys many Xenomai tasks.
>>>>>>>
>>>>>>> During the main application lifetime (that can be long), the service
>>>>>>> application reads some statistics from the main application, prints
>>>>>>> these statistics and exits. Then, the main application is never
>>>>>>> restarted, and the service application is restarted for every
>>>>>>> request,
>>>>>>> that happens every 5 seconds (but the error seems not dependent on
>>>>>>> the
>>>>>>> service application restart period length).
>>>>>>
>>>>>> Ok, so the service applications are mapping their main threads against
>>>>>> Xenomai, and they do that as pshared apps using the shared heap.
>>>>>>
>>>>>
>>>>> Yes, exactly.
>>>>>
>>>>>>>
>>>>>>> On another device I have a similar (but not same) main
>>>>>>> application, and
>>>>>>> the same service application, and this problem does not show.
>>>>>>>
>>>>>>> Then, the problem should be located in the main application, but I
>>>>>>> can't
>>>>>>> figure out where the problem could be.
>>>>>>
>>>>>> As the allocation happens against a shared, global heap, the issue may
>>>>>> be global as well. You could either try to trace the
>>>>>> allocation/release
>>>>>> pattern against the shared heap and look for anomalies or even factor
>>>>>> out that pattern into a stand-alone reproducer.
>>>>>
>>>>> Thanks for your suggestions. Tracing the allocation/release pattern
>>>>> would be a huge amount of work for this application, so I started with
>>>>> a simple test: enlarging the heap sizes.
>>>>> Before, I had a 4096 KB system heap, a 1024 KB shared heap and a
>>>>> 1024 KB private heap.
>>>>> Now I have an 8192 KB system heap, a 4096 KB shared heap and a
>>>>> 2048 KB private heap.
>>>>> It made no difference: the error still happens. Could this be a clue
>>>>> that this is not a fragmentation problem but a heap corruption problem?
>>>>
>>>> Not impossible. The control structures should be heading the allocated
>>>> memory blocks, thus could get corrupted if some other user overwrites
>>>> its assigned block.
>>>>
>>>>>
>>>>> During tests, I also noticed that when the error starts to happen, no
>>>>> tasks are being created or deleted in the main application. But some
>>>>> queue and pipe communications are active, so the heap could be used by
>>>>> them.
>>>>
>>>> All objects that could now be used across processes are allocated on the
>>>> shared heap if pshared is on. So we are not only looking at task objects
>>>> and their life cycles.
>>>>
>>>>>
>>>>> I will continue investigating on this problem. If you have some other
>>>>> ideas/hints, I'll be glad to hear them.
>>>>
>>>> Try reducing factors that could contribute to it, e.g. the types of
>>>> objects used. That may help narrowing down the actual trigger.
>>>>
>>>> Jan
>>>>
>>>
>>> Thanks Jan, I will do some tests and (hopefully) come back with results.
>>
>> Hi Jan,
>>
>> sorry for the delay.
>>
>> Attached there is the code of two small test programs able to reproduce
>> the problem. There is also the script to start them.
>>
>> Briefly, the "xeno-test-tcb-allocate-mainproc" application creates two
>> rt buffers (request and response) and waits indefinitely on the request
>> buffer. On the request buffer, the "xeno-test-tcb-allocate-secproc"
>> writes a request and waits for the response on the response buffer. When
>> the request is received by the "mainproc", the "mainproc" sends a
>> response with an incremental counter on the response buffer, and returns
>> to wait requests. "secproc" receives the response, prints its content
>> and exits.
>>
>> "secproc" is launched every 0.5 seconds. After about four minutes
>> running, the "secproc" starts to fail at launch with the error
>>
>>    BUG in main_overlay(): [main] failed to allocate main tcb
>>
> 
> Thanks, reproduced, now trying to understand.
> 

We are leaking the main tcb on the shared heap. That heap - and I also 
regularly forget this - is not tracked by the kernel but purely in 
userspace. Enable the registry, and you can watch
/var/run/xenomai/root/test/system/heaps going up with every secondary call.
This seems to be needed because the normal pthread-key cleanup handler is not
called on main exits:

diff --git a/lib/copperplate/threadobj.c b/lib/copperplate/threadobj.c
index f4588a17a8..56d0f1b231 100644
--- a/lib/copperplate/threadobj.c
+++ b/lib/copperplate/threadobj.c
@@ -1770,6 +1770,11 @@ int threadobj_set_schedprio(struct threadobj *thobj, int priority)
 	return threadobj_set_schedparam(thobj, policy, &param_ex);
 }
 
+static void main_exit(void)
+{
+	threadobj_free(threadobj_current());
+}
+
 static inline int main_overlay(void)
 {
 	struct threadobj_init_data idata;
@@ -1806,6 +1811,8 @@ static inline int main_overlay(void)
 	threadobj_prologue(tcb, NULL);
 	pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, NULL);
 
+	atexit(main_exit);
+
 	return 0;
 }

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



* R: Re: "failed to allocate main tcb" error
  2023-01-09 13:00                       ` Jan Kiszka
  2023-01-09 19:30                         ` Jan Kiszka
@ 2023-01-09 21:13                         ` Mauro
  1 sibling, 0 replies; 15+ messages in thread
From: Mauro @ 2023-01-09 21:13 UTC (permalink / raw)
  To: Jan Kiszka, xenomai

    ------ Original Message ------
    From: jan.kiszka@siemens.com
    To: mau.salvi@tin.it; xenomai@lists.linux.dev
    Sent: Monday, 9 January 2023 20:30
    Subject: Re: "failed to allocate main tcb" error

  ---8<----

  We are leaking the main tcb on the shared heap. That heap - and I also
  regularly forget this - is not tracked by the kernel but purely in
  userspace. Enable the registry, and you can watch
  /var/run/xenomai/root/test/system/heaps going up with every secondary call.
  This seems to be needed because the normal pthread-key cleanup handler is not
  called on main exits:

  diff --git a/lib/copperplate/threadobj.c b/lib/copperplate/threadobj.c
  index f4588a17a8..56d0f1b231 100644
  --- a/lib/copperplate/threadobj.c
  +++ b/lib/copperplate/threadobj.c
  @@ -1770,6 +1770,11 @@ int threadobj_set_schedprio(struct threadobj *thobj, int priority)
   	return threadobj_set_schedparam(thobj, policy, &param_ex);
   }

  +static void main_exit(void)
  +{
  +	threadobj_free(threadobj_current());
  +}
  +
   static inline int main_overlay(void)
   {
   	struct threadobj_init_data idata;
  @@ -1806,6 +1811,8 @@ static inline int main_overlay(void)
   	threadobj_prologue(tcb, NULL);
   	pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, NULL);

  +	atexit(main_exit);
  +
   	return 0;
   }

  Jan



Thank you very much Jan.

Regards

-- 
Mauro S.




end of thread, other threads:[~2023-01-09 21:16 UTC | newest]

Thread overview: 15+ messages
2022-12-21  9:05 "failed to allocate main tcb" error Mauro S.
2022-12-21 14:29 ` Jan Kiszka
2022-12-21 14:45   ` Mauro S.
2022-12-21 14:48     ` Jan Kiszka
2022-12-21 14:57       ` Mauro S.
2022-12-21 15:11         ` Jan Kiszka
2022-12-21 15:25           ` Mauro S.
2022-12-21 16:04             ` Jan Kiszka
2022-12-22 13:57               ` Mauro S.
2022-12-22 14:11                 ` Jan Kiszka
2022-12-22 15:42                   ` Mauro S.
2023-01-05 20:25                     ` Mauro S.
2023-01-09 13:00                       ` Jan Kiszka
2023-01-09 19:30                         ` Jan Kiszka
2023-01-09 21:13                         ` R: " Mauro
