* "failed to allocate main tcb" error
@ 2022-12-21 9:05 Mauro S.
2022-12-21 14:29 ` Jan Kiszka
0 siblings, 1 reply; 15+ messages in thread
From: Mauro S. @ 2022-12-21 9:05 UTC (permalink / raw)
To: xenomai
Hi all,
I'm using Xenomai 3.1.2 on an x86-64 Atom processor, with Copperplate and
pshared enabled. The Linux kernel is 5.4.181.
I have a main application and a separate service application that connects
to the main one through a shared session.
After the main application has been running for some time, the service
application can no longer connect to it and fails with the error
0"004.166| BUG in main_overlay(): [main] failed to allocate main tcb
Digging a bit in the code, I found that the failing call is the
xnmalloc() in lib/copperplate/threadobj.c:__threadobj_alloc().
If I understand correctly, this function tries to allocate some space
from the shared heap. But the shared heap is almost entirely free:
# cat /proc/xenomai/heap
TOTAL    FREE     NAME
4194304  3616512  system heap
1048576  1041776  shared heap
1048576  1048480  private heap[14505]
36864    19968    xddp-pool@0
24576    7680     xddp-pool@10
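For reference, the bytes in use per heap are simply TOTAL minus FREE. A minimal C sketch, assuming data rows shaped like the ones in this listing; heap_used_bytes() is a hypothetical helper for illustration, not part of Xenomai:

```c
#include <stdio.h>

/*
 * Hypothetical helper: parse one data row of /proc/xenomai/heap
 * ("TOTAL FREE NAME") and return the number of bytes in use,
 * or -1 if the row does not start with two numbers.
 */
long heap_used_bytes(const char *row)
{
	unsigned long total, free_bytes;

	if (sscanf(row, "%lu %lu", &total, &free_bytes) != 2)
		return -1;
	if (free_bytes > total)
		return -1;
	return (long)(total - free_bytes);
}
```

For the shared heap row this gives 1048576 - 1041776 = 6800 bytes in use. The figure only describes the total; it says nothing about the size of the largest contiguous free block.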
Any suggestions about what could be the problem?
Thanks in advance, regards
--
Mauro S.
* Re: "failed to allocate main tcb" error
2022-12-21 9:05 "failed to allocate main tcb" error Mauro S.
@ 2022-12-21 14:29 ` Jan Kiszka
2022-12-21 14:45 ` Mauro S.
0 siblings, 1 reply; 15+ messages in thread
From: Jan Kiszka @ 2022-12-21 14:29 UTC (permalink / raw)
To: Mauro S., xenomai
On 21.12.22 10:05, Mauro S. wrote:
> Hi all,
>
> I'm using Xenomai 3.1.2 on a x86-64 Atom processor, with Copperplate and
> pshared enabled. Linux kernel is 5.4.181.
>
> I have a main application, and another service application that connects
> to the main one using shared session.
>
> After some time the main application is running, the service application
> stops to connect to the main one with the error
>
> 0"004.166| BUG in main_overlay(): [main] failed to allocate main tcb
>
> Digging a bit in the code, I found that the failing function is the
> xnmalloc() call in lib/copperplate/threadobj.c:__threadobj_alloc().
> If I understood correctly, this function tries to allocate some space
> from the shared heap. But the shared heap is almost free
>
> # cat /proc/xenomai/heap
> TOTAL FREE NAME
> 4194304 3616512 system heap
> 1048576 1041776 shared heap
> 1048576 1048480 private heap[14505]
> 36864 19968 xddp-pool@0
> 24576 7680 xddp-pool@10
>
This is the code that triggers the message:
	if (pthread_key_create(&threadobj_tskey, finalize_thread))
		early_panic("failed to allocate TSD key");
pthread_key_create has nothing to do with xnmalloc. That's a plain glibc
service, and its documentation says:
  The pthread_key_create() function shall fail if:
  EAGAIN  The system lacked the necessary resources to create another
          thread-specific data key, or the system-imposed limit on the
          total number of keys per process {PTHREAD_KEYS_MAX} has been
          exceeded.
  ENOMEM  Insufficient memory exists to create the key.
Already checked THOSE conditions? Are you possibly creating threads
without cleaning them up completely?
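To tell those two conditions apart, the return value can be inspected directly. A minimal sketch, assuming plain POSIX threads; create_key_verbose() is a hypothetical helper for illustration and does not exist in Xenomai or glibc:

```c
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: create a thread-specific data key and report
 * why the call failed. pthread_key_create() returns the error number
 * directly; it does not set errno.
 */
int create_key_verbose(pthread_key_t *key, void (*destructor)(void *))
{
	int ret = pthread_key_create(key, destructor);

	if (ret == EAGAIN)
		fprintf(stderr, "key limit (PTHREAD_KEYS_MAX) or resources exhausted\n");
	else if (ret == ENOMEM)
		fprintf(stderr, "out of memory while creating the key\n");
	else if (ret)
		fprintf(stderr, "pthread_key_create: %s\n", strerror(ret));

	return ret;
}
```

Keys are a per-process resource, so a process that keeps creating keys without calling pthread_key_delete() would eventually hit the EAGAIN case.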
Jan
* Re: "failed to allocate main tcb" error
2022-12-21 14:29 ` Jan Kiszka
@ 2022-12-21 14:45 ` Mauro S.
2022-12-21 14:48 ` Jan Kiszka
0 siblings, 1 reply; 15+ messages in thread
From: Mauro S. @ 2022-12-21 14:45 UTC (permalink / raw)
To: Jan Kiszka, xenomai
Il 21/12/22 15:29, Jan Kiszka ha scritto:
> On 21.12.22 10:05, Mauro S. wrote:
>> Hi all,
>>
>> I'm using Xenomai 3.1.2 on a x86-64 Atom processor, with Copperplate and
>> pshared enabled. Linux kernel is 5.4.181.
>>
>> I have a main application, and another service application that connects
>> to the main one using shared session.
>>
>> After some time the main application is running, the service application
>> stops to connect to the main one with the error
>>
>> 0"004.166| BUG in main_overlay(): [main] failed to allocate main tcb
>>
>> Digging a bit in the code, I found that the failing function is the
>> xnmalloc() call in lib/copperplate/threadobj.c:__threadobj_alloc().
>> If I understood correctly, this function tries to allocate some space
>> from the shared heap. But the shared heap is almost free
>>
>> # cat /proc/xenomai/heap
>> TOTAL FREE NAME
>> 4194304 3616512 system heap
>> 1048576 1041776 shared heap
>> 1048576 1048480 private heap[14505]
>> 36864 19968 xddp-pool@0
>> 24576 7680 xddp-pool@10
>>
>
> This is the code that triggers the message:
>
> if (pthread_key_create(&threadobj_tskey, finalize_thread))
> early_panic("failed to allocate TSD key");
>
> pthread_key_create has nothing to do with xnmalloc. That's a plain glibc
> service, and its documentation says:
>
> The pthread_key_create() function shall fail if:
>
> EAGAIN The system lacked the necessary resources to create
> another thread-specific data key, or the system-imposed limit on the
> total number of keys per process {PTHREAD_KEYS_MAX} has been exceeded.
>
> ENOMEM Insufficient memory exists to create the key.
>
> Already checked THOSE conditions? Are you possibly creating threads
> without cleaning them up completely?
>
> Jan
>
Hi Jan,
thank you.
Sorry, but I can't see how the code you pointed to (which panics with
the message "failed to allocate TSD key") could produce the
"failed to allocate main tcb" message.
Thanks again, regards
--
Mauro S.
* Re: "failed to allocate main tcb" error
2022-12-21 14:45 ` Mauro S.
@ 2022-12-21 14:48 ` Jan Kiszka
2022-12-21 14:57 ` Mauro S.
0 siblings, 1 reply; 15+ messages in thread
From: Jan Kiszka @ 2022-12-21 14:48 UTC (permalink / raw)
To: Mauro S., xenomai
On 21.12.22 15:45, Mauro S. wrote:
> Il 21/12/22 15:29, Jan Kiszka ha scritto:
>> On 21.12.22 10:05, Mauro S. wrote:
>>> Hi all,
>>>
>>> I'm using Xenomai 3.1.2 on a x86-64 Atom processor, with Copperplate and
>>> pshared enabled. Linux kernel is 5.4.181.
>>>
>>> I have a main application, and another service application that connects
>>> to the main one using shared session.
>>>
>>> After some time the main application is running, the service application
>>> stops to connect to the main one with the error
>>>
>>> 0"004.166| BUG in main_overlay(): [main] failed to allocate main
>>> tcb
>>>
>>> Digging a bit in the code, I found that the failing function is the
>>> xnmalloc() call in lib/copperplate/threadobj.c:__threadobj_alloc().
>>> If I understood correctly, this function tries to allocate some space
>>> from the shared heap. But the shared heap is almost free
>>>
>>> # cat /proc/xenomai/heap
>>> TOTAL FREE NAME
>>> 4194304 3616512 system heap
>>> 1048576 1041776 shared heap
>>> 1048576 1048480 private heap[14505]
>>> 36864 19968 xddp-pool@0
>>> 24576 7680 xddp-pool@10
>>>
>>
>> This is the code that triggers the message:
>>
>> if (pthread_key_create(&threadobj_tskey, finalize_thread))
>> early_panic("failed to allocate TSD key");
>>
>> pthread_key_create has nothing to do with xnmalloc. That's a plain glibc
>> service, and its documentation says:
>>
>> The pthread_key_create() function shall fail if:
>>
>> EAGAIN The system lacked the necessary resources to create
>> another thread-specific data key, or the system-imposed limit on the
>> total number of keys per process {PTHREAD_KEYS_MAX} has been exceeded.
>>
>> ENOMEM Insufficient memory exists to create the key.
>>
>> Already checked THOSE conditions? Are you possibly creating threads
>> without cleaning them up completely?
>>
>> Jan
>>
>
> Hi Jan,
>
> thank you.
> Sorry but I can't figure out how the code you specified (that panics
> generating the message "failed to allocate TSD key") could trigger the
> "failed to allocate main tcb" message.
>
https://source.denx.de/Xenomai/xenomai/-/blob/v3.1.2/lib/copperplate/threadobj.c#L86
That's the only line in Xenomai which raises this message. If you don't
believe it triggered it, start your application in gdb and put a
breakpoint on that panic line.
Jan
* Re: "failed to allocate main tcb" error
2022-12-21 14:48 ` Jan Kiszka
@ 2022-12-21 14:57 ` Mauro S.
2022-12-21 15:11 ` Jan Kiszka
0 siblings, 1 reply; 15+ messages in thread
From: Mauro S. @ 2022-12-21 14:57 UTC (permalink / raw)
To: Jan Kiszka, xenomai
Il 21/12/22 15:48, Jan Kiszka ha scritto:
> On 21.12.22 15:45, Mauro S. wrote:
>> Il 21/12/22 15:29, Jan Kiszka ha scritto:
>>> On 21.12.22 10:05, Mauro S. wrote:
>>>> Hi all,
>>>>
>>>> I'm using Xenomai 3.1.2 on a x86-64 Atom processor, with Copperplate and
>>>> pshared enabled. Linux kernel is 5.4.181.
>>>>
>>>> I have a main application, and another service application that connects
>>>> to the main one using shared session.
>>>>
>>>> After some time the main application is running, the service application
>>>> stops to connect to the main one with the error
>>>>
>>>> 0"004.166| BUG in main_overlay(): [main] failed to allocate main
>>>> tcb
>>>>
>>>> Digging a bit in the code, I found that the failing function is the
>>>> xnmalloc() call in lib/copperplate/threadobj.c:__threadobj_alloc().
>>>> If I understood correctly, this function tries to allocate some space
>>>> from the shared heap. But the shared heap is almost free
>>>>
>>>> # cat /proc/xenomai/heap
>>>> TOTAL FREE NAME
>>>> 4194304 3616512 system heap
>>>> 1048576 1041776 shared heap
>>>> 1048576 1048480 private heap[14505]
>>>> 36864 19968 xddp-pool@0
>>>> 24576 7680 xddp-pool@10
>>>>
>>>
>>> This is the code that triggers the message:
>>>
>>> if (pthread_key_create(&threadobj_tskey, finalize_thread))
>>> early_panic("failed to allocate TSD key");
>>>
>>> pthread_key_create has nothing to do with xnmalloc. That's a plain glibc
>>> service, and its documentation says:
>>>
>>> The pthread_key_create() function shall fail if:
>>>
>>> EAGAIN The system lacked the necessary resources to create
>>> another thread-specific data key, or the system-imposed limit on the
>>> total number of keys per process {PTHREAD_KEYS_MAX} has been exceeded.
>>>
>>> ENOMEM Insufficient memory exists to create the key.
>>>
>>> Already checked THOSE conditions? Are you possibly creating threads
>>> without cleaning them up completely?
>>>
>>> Jan
>>>
>>
>> Hi Jan,
>>
>> thank you.
>> Sorry but I can't figure out how the code you specified (that panics
>> generating the message "failed to allocate TSD key") could trigger the
>> "failed to allocate main tcb" message.
>>
>
> https://source.denx.de/Xenomai/xenomai/-/blob/v3.1.2/lib/copperplate/threadobj.c#L86
>
> That's the only line in Xenomai which raises this message. If you don't
> believe it triggered it, start your application in gdb and put a
> breakpoint on that panic line.
>
> Jan
>
I believe that this code triggers the message "failed to allocate TSD key".
But the message I get is "failed to allocate main tcb", which is raised in
the same file, at line 1790.
Regards
--
Mauro S.
* Re: "failed to allocate main tcb" error
2022-12-21 14:57 ` Mauro S.
@ 2022-12-21 15:11 ` Jan Kiszka
2022-12-21 15:25 ` Mauro S.
0 siblings, 1 reply; 15+ messages in thread
From: Jan Kiszka @ 2022-12-21 15:11 UTC (permalink / raw)
To: Mauro S., xenomai
On 21.12.22 15:57, Mauro S. wrote:
> Il 21/12/22 15:48, Jan Kiszka ha scritto:
>> On 21.12.22 15:45, Mauro S. wrote:
>>> Il 21/12/22 15:29, Jan Kiszka ha scritto:
>>>> On 21.12.22 10:05, Mauro S. wrote:
>>>>> Hi all,
>>>>>
>>>>> I'm using Xenomai 3.1.2 on a x86-64 Atom processor, with
>>>>> Copperplate and
>>>>> pshared enabled. Linux kernel is 5.4.181.
>>>>>
>>>>> I have a main application, and another service application that
>>>>> connects
>>>>> to the main one using shared session.
>>>>>
>>>>> After some time the main application is running, the service
>>>>> application
>>>>> stops to connect to the main one with the error
>>>>>
>>>>> 0"004.166| BUG in main_overlay(): [main] failed to allocate main
>>>>> tcb
>>>>>
>>>>> Digging a bit in the code, I found that the failing function is the
>>>>> xnmalloc() call in lib/copperplate/threadobj.c:__threadobj_alloc().
>>>>> If I understood correctly, this function tries to allocate some space
>>>>> from the shared heap. But the shared heap is almost free
>>>>>
>>>>> # cat /proc/xenomai/heap
>>>>> TOTAL FREE NAME
>>>>> 4194304 3616512 system heap
>>>>> 1048576 1041776 shared heap
>>>>> 1048576 1048480 private heap[14505]
>>>>> 36864 19968 xddp-pool@0
>>>>> 24576 7680 xddp-pool@10
>>>>>
>>>>
>>>> This is the code that triggers the message:
>>>>
>>>> if (pthread_key_create(&threadobj_tskey, finalize_thread))
>>>> early_panic("failed to allocate TSD key");
>>>>
>>>> pthread_key_create has nothing to do with xnmalloc. That's a plain
>>>> glibc
>>>> service, and its documentation says:
>>>>
>>>> The pthread_key_create() function shall fail if:
>>>>
>>>> EAGAIN The system lacked the necessary resources to create
>>>> another thread-specific data key, or the system-imposed limit on the
>>>> total number of keys per process {PTHREAD_KEYS_MAX} has been exceeded.
>>>>
>>>> ENOMEM Insufficient memory exists to create the key.
>>>>
>>>> Already checked THOSE conditions? Are you possibly creating threads
>>>> without cleaning them up completely?
>>>>
>>>> Jan
>>>>
>>>
>>> Hi Jan,
>>>
>>> thank you.
>>> Sorry but I can't figure out how the code you specified (that panics
>>> generating the message "failed to allocate TSD key") could trigger the
>>> "failed to allocate main tcb" message.
>>>
>>
>> https://source.denx.de/Xenomai/xenomai/-/blob/v3.1.2/lib/copperplate/threadobj.c#L86
>>
>> That's the only line in Xenomai which raises this message. If you don't
>> believe it triggered it, start your application in gdb and put a
>> breakpoint on that panic line.
>>
>> Jan
>>
>
> I believe thet this code triggers the message "failed to allocate TSD key".
>
> But I get the message "failed to allocate main tcb", that is in the same
> file, but at the line #1790.
>
Ah, sorry, you are right. I didn't read carefully enough.
Could the heap be fragmented after lots and lots of
allocations/releases? But what is also a bit strange is that the main
thread is continuously mapped. Are you restarting the main application
for every request?
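To illustrate the fragmentation idea: a heap can report plenty of free space in total and still be unable to satisfy a request that needs adjacent space. The following is a toy model only, with made-up names (toy_pool, toy_alloc and so on) and fixed-size slots; it is not how Xenomai's heap allocator is implemented:

```c
#include <stdbool.h>
#include <stddef.h>

#define POOL_SLOTS 16

/* Toy pool: one flag per fixed-size slot; true means the slot is in use. */
struct toy_pool {
	bool used[POOL_SLOTS];
};

/* Total number of free slots, comparable to a FREE column. */
size_t toy_total_free(const struct toy_pool *p)
{
	size_t n = 0;

	for (size_t i = 0; i < POOL_SLOTS; i++)
		if (!p->used[i])
			n++;
	return n;
}

/* Longest run of adjacent free slots, i.e. the largest request that fits. */
size_t toy_largest_free_run(const struct toy_pool *p)
{
	size_t best = 0, run = 0;

	for (size_t i = 0; i < POOL_SLOTS; i++) {
		run = p->used[i] ? 0 : run + 1;
		if (run > best)
			best = run;
	}
	return best;
}

/* First-fit allocation of 'count' adjacent slots; returns start index or -1. */
int toy_alloc(struct toy_pool *p, size_t count)
{
	size_t run = 0;

	if (count == 0 || count > POOL_SLOTS)
		return -1;
	for (size_t i = 0; i < POOL_SLOTS; i++) {
		run = p->used[i] ? 0 : run + 1;
		if (run == count) {
			size_t start = i + 1 - count;

			for (size_t j = start; j <= i; j++)
				p->used[j] = true;
			return (int)start;
		}
	}
	return -1;
}

/* Release 'count' slots starting at 'start'. */
void toy_free(struct toy_pool *p, size_t start, size_t count)
{
	for (size_t j = start; j < start + count && j < POOL_SLOTS; j++)
		p->used[j] = false;
}
```

If all 16 slots are allocated one by one and every other slot is then released, half of the pool is free, yet a request for two adjacent slots fails because the largest free run is a single slot.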
Jan
* Re: "failed to allocate main tcb" error
2022-12-21 15:11 ` Jan Kiszka
@ 2022-12-21 15:25 ` Mauro S.
2022-12-21 16:04 ` Jan Kiszka
0 siblings, 1 reply; 15+ messages in thread
From: Mauro S. @ 2022-12-21 15:25 UTC (permalink / raw)
To: Jan Kiszka, xenomai
Il 21/12/22 16:11, Jan Kiszka ha scritto:
> On 21.12.22 15:57, Mauro S. wrote:
>> Il 21/12/22 15:48, Jan Kiszka ha scritto:
>>> On 21.12.22 15:45, Mauro S. wrote:
>>>> Il 21/12/22 15:29, Jan Kiszka ha scritto:
>>>>> On 21.12.22 10:05, Mauro S. wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> I'm using Xenomai 3.1.2 on a x86-64 Atom processor, with
>>>>>> Copperplate and
>>>>>> pshared enabled. Linux kernel is 5.4.181.
>>>>>>
>>>>>> I have a main application, and another service application that
>>>>>> connects
>>>>>> to the main one using shared session.
>>>>>>
>>>>>> After some time the main application is running, the service
>>>>>> application
>>>>>> stops to connect to the main one with the error
>>>>>>
>>>>>> 0"004.166| BUG in main_overlay(): [main] failed to allocate main
>>>>>> tcb
>>>>>>
>>>>>> Digging a bit in the code, I found that the failing function is the
>>>>>> xnmalloc() call in lib/copperplate/threadobj.c:__threadobj_alloc().
>>>>>> If I understood correctly, this function tries to allocate some space
>>>>>> from the shared heap. But the shared heap is almost free
>>>>>>
>>>>>> # cat /proc/xenomai/heap
>>>>>> TOTAL FREE NAME
>>>>>> 4194304 3616512 system heap
>>>>>> 1048576 1041776 shared heap
>>>>>> 1048576 1048480 private heap[14505]
>>>>>> 36864 19968 xddp-pool@0
>>>>>> 24576 7680 xddp-pool@10
>>>>>>
>>>>>
>>>>> This is the code that triggers the message:
>>>>>
>>>>> if (pthread_key_create(&threadobj_tskey, finalize_thread))
>>>>> early_panic("failed to allocate TSD key");
>>>>>
>>>>> pthread_key_create has nothing to do with xnmalloc. That's a plain
>>>>> glibc
>>>>> service, and its documentation says:
>>>>>
>>>>> The pthread_key_create() function shall fail if:
>>>>>
>>>>> EAGAIN The system lacked the necessary resources to create
>>>>> another thread-specific data key, or the system-imposed limit on the
>>>>> total number of keys per process {PTHREAD_KEYS_MAX} has been exceeded.
>>>>>
>>>>> ENOMEM Insufficient memory exists to create the key.
>>>>>
>>>>> Already checked THOSE conditions? Are you possibly creating threads
>>>>> without cleaning them up completely?
>>>>>
>>>>> Jan
>>>>>
>>>>
>>>> Hi Jan,
>>>>
>>>> thank you.
>>>> Sorry but I can't figure out how the code you specified (that panics
>>>> generating the message "failed to allocate TSD key") could trigger the
>>>> "failed to allocate main tcb" message.
>>>>
>>>
>>> https://source.denx.de/Xenomai/xenomai/-/blob/v3.1.2/lib/copperplate/threadobj.c#L86
>>>
>>> That's the only line in Xenomai which raises this message. If you don't
>>> believe it triggered it, start your application in gdb and put a
>>> breakpoint on that panic line.
>>>
>>> Jan
>>>
>>
>> I believe thet this code triggers the message "failed to allocate TSD key".
>>
>> But I get the message "failed to allocate main tcb", that is in the same
>> file, but at the line #1790.
>>
>
> Ah, sorry, you are right. Didn't read careful enough.
Ok, no problem :-)
>
> Could the heap be fragmented after lots and lots of
> allocations/releases? But what is also a bit strange is that the main
> thread is continuously mapped. Are you restarting the main application
> for every request?
>
The main application is started once and keeps running, but internally it
creates and destroys many Xenomai tasks.
During the main application's lifetime (which can be long), the service
application reads some statistics from the main application, prints
them and exits. So the main application is never restarted, while the
service application is restarted for every request, which happens every
5 seconds (but the error does not seem to depend on the length of the
service application restart period).
On another device I have a similar (but not identical) main application
with the same service application, and the problem does not show up there.
So the problem should be located in the main application, but I can't
figure out where it could be.
Thanks in advance
--
Mauro S.
* Re: "failed to allocate main tcb" error
2022-12-21 15:25 ` Mauro S.
@ 2022-12-21 16:04 ` Jan Kiszka
2022-12-22 13:57 ` Mauro S.
0 siblings, 1 reply; 15+ messages in thread
From: Jan Kiszka @ 2022-12-21 16:04 UTC (permalink / raw)
To: Mauro S., xenomai
On 21.12.22 16:25, Mauro S. wrote:
> Il 21/12/22 16:11, Jan Kiszka ha scritto:
>> On 21.12.22 15:57, Mauro S. wrote:
>>> Il 21/12/22 15:48, Jan Kiszka ha scritto:
>>>> On 21.12.22 15:45, Mauro S. wrote:
>>>>> Il 21/12/22 15:29, Jan Kiszka ha scritto:
>>>>>> On 21.12.22 10:05, Mauro S. wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I'm using Xenomai 3.1.2 on a x86-64 Atom processor, with
>>>>>>> Copperplate and
>>>>>>> pshared enabled. Linux kernel is 5.4.181.
>>>>>>>
>>>>>>> I have a main application, and another service application that
>>>>>>> connects
>>>>>>> to the main one using shared session.
>>>>>>>
>>>>>>> After some time the main application is running, the service
>>>>>>> application
>>>>>>> stops to connect to the main one with the error
>>>>>>>
>>>>>>> 0"004.166| BUG in main_overlay(): [main] failed to
>>>>>>> allocate main
>>>>>>> tcb
>>>>>>>
>>>>>>> Digging a bit in the code, I found that the failing function is the
>>>>>>> xnmalloc() call in lib/copperplate/threadobj.c:__threadobj_alloc().
>>>>>>> If I understood correctly, this function tries to allocate some
>>>>>>> space
>>>>>>> from the shared heap. But the shared heap is almost free
>>>>>>>
>>>>>>> # cat /proc/xenomai/heap
>>>>>>> TOTAL FREE NAME
>>>>>>> 4194304 3616512 system heap
>>>>>>> 1048576 1041776 shared heap
>>>>>>> 1048576 1048480 private heap[14505]
>>>>>>> 36864 19968 xddp-pool@0
>>>>>>> 24576 7680 xddp-pool@10
>>>>>>>
>>>>>>
>>>>>> This is the code that triggers the message:
>>>>>>
>>>>>> if (pthread_key_create(&threadobj_tskey, finalize_thread))
>>>>>> early_panic("failed to allocate TSD key");
>>>>>>
>>>>>> pthread_key_create has nothing to do with xnmalloc. That's a plain
>>>>>> glibc
>>>>>> service, and its documentation says:
>>>>>>
>>>>>> The pthread_key_create() function shall fail if:
>>>>>>
>>>>>> EAGAIN The system lacked the necessary resources to create
>>>>>> another thread-specific data key, or the system-imposed limit on the
>>>>>> total number of keys per process {PTHREAD_KEYS_MAX} has been
>>>>>> exceeded.
>>>>>>
>>>>>> ENOMEM Insufficient memory exists to create the key.
>>>>>>
>>>>>> Already checked THOSE conditions? Are you possibly creating threads
>>>>>> without cleaning them up completely?
>>>>>>
>>>>>> Jan
>>>>>>
>>>>>
>>>>> Hi Jan,
>>>>>
>>>>> thank you.
>>>>> Sorry but I can't figure out how the code you specified (that panics
>>>>> generating the message "failed to allocate TSD key") could trigger the
>>>>> "failed to allocate main tcb" message.
>>>>>
>>>>
>>>> https://source.denx.de/Xenomai/xenomai/-/blob/v3.1.2/lib/copperplate/threadobj.c#L86
>>>>
>>>> That's the only line in Xenomai which raises this message. If you don't
>>>> believe it triggered it, start your application in gdb and put a
>>>> breakpoint on that panic line.
>>>>
>>>> Jan
>>>>
>>>
>>> I believe thet this code triggers the message "failed to allocate TSD
>>> key".
>>>
>>> But I get the message "failed to allocate main tcb", that is in the same
>>> file, but at the line #1790.
>>>
>>
>> Ah, sorry, you are right. Didn't read careful enough.
>
> Ok, no problem :-)
>
>>
>> Could the heap be fragmented after lots and lots of
>> allocations/releases? But what is also a bit strange is that the main
>> thread is continuously mapped. Are you restarting the main application
>> for every request?
>>
>
> The main application is started once and keeps running, but internally
> creates and destroys many Xenomay tasks.
>
> During the main application lifetime (that can be long), the service
> application reads some statistics from the main application, prints
> these statistics and exits. Then, the main application is never
> restarted, and the service application is restarted for every request,
> that happens every 5 seconds (but the error seems not dependent on the
> service application restart period length).
OK, so each service application instance maps its main thread to
Xenomai, and it does so as a pshared app using the shared heap.
>
> On another device I have a similar (but not same) main application, and
> the same service application, and this problem does not show.
>
> Then, the problem should be located in the main application, but I can't
> figure out where the problem could be.
As the allocation happens against a shared, global heap, the issue may
be global as well. You could either try to trace the allocation/release
pattern against the shared heap and look for anomalies or even factor
out that pattern into a stand-alone reproducer.
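One possible shape for such tracing is a pair of thin wrappers around the allocator that log every call and keep running counters. This is only a sketch under stated assumptions: traced_alloc(), traced_free() and traced_outstanding() are hypothetical names, malloc()/free() stand in for the shared-heap allocator, and the counters are not thread-safe:

```c
#include <stdio.h>
#include <stdlib.h>

/* Running totals for the traced allocator (not thread-safe). */
struct alloc_stats {
	unsigned long allocs;   /* successful allocations */
	unsigned long frees;    /* releases */
	unsigned long failures; /* allocations that returned NULL */
};

static struct alloc_stats trace_stats;

/* Hypothetical wrapper: malloc() stands in for the shared-heap allocator. */
void *traced_alloc(size_t size, const char *tag)
{
	void *p = malloc(size);

	if (p)
		trace_stats.allocs++;
	else
		trace_stats.failures++;
	fprintf(stderr, "alloc %s size=%zu ptr=%p\n", tag, size, p);
	return p;
}

/* Hypothetical wrapper: log the release, then hand the block back. */
void traced_free(void *p, const char *tag)
{
	if (!p)
		return;
	trace_stats.frees++;
	fprintf(stderr, "free  %s ptr=%p\n", tag, p);
	free(p);
}

/* Blocks still outstanding; a value that only ever grows hints at a leak. */
long traced_outstanding(void)
{
	return (long)trace_stats.allocs - (long)trace_stats.frees;
}
```

Comparing the outstanding count before and after each service application run would show whether something is left behind on every connect/exit cycle.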
Jan
* Re: "failed to allocate main tcb" error
2022-12-21 16:04 ` Jan Kiszka
@ 2022-12-22 13:57 ` Mauro S.
2022-12-22 14:11 ` Jan Kiszka
0 siblings, 1 reply; 15+ messages in thread
From: Mauro S. @ 2022-12-22 13:57 UTC (permalink / raw)
To: Jan Kiszka, xenomai
Il 21/12/22 17:04, Jan Kiszka ha scritto:
> On 21.12.22 16:25, Mauro S. wrote:
>> Il 21/12/22 16:11, Jan Kiszka ha scritto:
>>> On 21.12.22 15:57, Mauro S. wrote:
>>>> Il 21/12/22 15:48, Jan Kiszka ha scritto:
>>>>> On 21.12.22 15:45, Mauro S. wrote:
>>>>>> Il 21/12/22 15:29, Jan Kiszka ha scritto:
>>>>>>> On 21.12.22 10:05, Mauro S. wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I'm using Xenomai 3.1.2 on a x86-64 Atom processor, with
>>>>>>>> Copperplate and
>>>>>>>> pshared enabled. Linux kernel is 5.4.181.
>>>>>>>>
>>>>>>>> I have a main application, and another service application that
>>>>>>>> connects
>>>>>>>> to the main one using shared session.
>>>>>>>>
>>>>>>>> After some time the main application is running, the service
>>>>>>>> application
>>>>>>>> stops to connect to the main one with the error
>>>>>>>>
>>>>>>>> 0"004.166| BUG in main_overlay(): [main] failed to
>>>>>>>> allocate main
>>>>>>>> tcb
>>>>>>>>
>>>>>>>> Digging a bit in the code, I found that the failing function is the
>>>>>>>> xnmalloc() call in lib/copperplate/threadobj.c:__threadobj_alloc().
>>>>>>>> If I understood correctly, this function tries to allocate some
>>>>>>>> space
>>>>>>>> from the shared heap. But the shared heap is almost free
>>>>>>>>
>>>>>>>> # cat /proc/xenomai/heap
>>>>>>>> TOTAL FREE NAME
>>>>>>>> 4194304 3616512 system heap
>>>>>>>> 1048576 1041776 shared heap
>>>>>>>> 1048576 1048480 private heap[14505]
>>>>>>>> 36864 19968 xddp-pool@0
>>>>>>>> 24576 7680 xddp-pool@10
>>>>>>>>
>>>>>>>
>>>>>>> This is the code that triggers the message:
>>>>>>>
>>>>>>> if (pthread_key_create(&threadobj_tskey, finalize_thread))
>>>>>>> early_panic("failed to allocate TSD key");
>>>>>>>
>>>>>>> pthread_key_create has nothing to do with xnmalloc. That's a plain
>>>>>>> glibc
>>>>>>> service, and its documentation says:
>>>>>>>
>>>>>>> The pthread_key_create() function shall fail if:
>>>>>>>
>>>>>>> EAGAIN The system lacked the necessary resources to create
>>>>>>> another thread-specific data key, or the system-imposed limit on the
>>>>>>> total number of keys per process {PTHREAD_KEYS_MAX} has been
>>>>>>> exceeded.
>>>>>>>
>>>>>>> ENOMEM Insufficient memory exists to create the key.
>>>>>>>
>>>>>>> Already checked THOSE conditions? Are you possibly creating threads
>>>>>>> without cleaning them up completely?
>>>>>>>
>>>>>>> Jan
>>>>>>>
>>>>>>
>>>>>> Hi Jan,
>>>>>>
>>>>>> thank you.
>>>>>> Sorry but I can't figure out how the code you specified (that panics
>>>>>> generating the message "failed to allocate TSD key") could trigger the
>>>>>> "failed to allocate main tcb" message.
>>>>>>
>>>>>
>>>>> https://source.denx.de/Xenomai/xenomai/-/blob/v3.1.2/lib/copperplate/threadobj.c#L86
>>>>>
>>>>> That's the only line in Xenomai which raises this message. If you don't
>>>>> believe it triggered it, start your application in gdb and put a
>>>>> breakpoint on that panic line.
>>>>>
>>>>> Jan
>>>>>
>>>>
>>>> I believe thet this code triggers the message "failed to allocate TSD
>>>> key".
>>>>
>>>> But I get the message "failed to allocate main tcb", that is in the same
>>>> file, but at the line #1790.
>>>>
>>>
>>> Ah, sorry, you are right. Didn't read careful enough.
>>
>> Ok, no problem :-)
>>
>>>
>>> Could the heap be fragmented after lots and lots of
>>> allocations/releases? But what is also a bit strange is that the main
>>> thread is continuously mapped. Are you restarting the main application
>>> for every request?
>>>
>>
>> The main application is started once and keeps running, but internally
>> creates and destroys many Xenomay tasks.
>>
>> During the main application lifetime (that can be long), the service
>> application reads some statistics from the main application, prints
>> these statistics and exits. Then, the main application is never
>> restarted, and the service application is restarted for every request,
>> that happens every 5 seconds (but the error seems not dependent on the
>> service application restart period length).
>
> Ok, so the service applications are mapping its main thread against
> Xenomai, and that as pshared apps using the shared heap.
>
Yes, exactly.
>>
>> On another device I have a similar (but not same) main application, and
>> the same service application, and this problem does not show.
>>
>> Then, the problem should be located in the main application, but I can't
>> figure out where the problem could be.
>
> As the allocation happens against a shared, global heap, the issue may
> be global as well. You could either try to trace the allocation/release
> pattern against the shared heap and look for anomalies or even factor
> out that pattern into a stand-alone reproducer.
Thanks for your suggestions. Tracing the allocation/release pattern would
be a huge amount of work for this application, so I started with a simple
test: enlarging the heap sizes.
Before, I had a 4096 KB system heap, a 1024 KB shared heap and a 1024 KB private heap.
Now I have an 8192 KB system heap, a 4096 KB shared heap and a 2048 KB private heap.
No change: the error happens again. Could this be a clue that this is not
a fragmentation problem but a heap corruption problem?
During the tests, I also noticed that when the error starts to happen, no
tasks are being created or deleted in the main application. However, some
queue and pipe communication is active, so the heap could be in use by
them.
I will continue investigating this problem. If you have any other
ideas or hints, I'll be glad to hear them.
Thank you, regards
--
Mauro S.
* Re: "failed to allocate main tcb" error
2022-12-22 13:57 ` Mauro S.
@ 2022-12-22 14:11 ` Jan Kiszka
2022-12-22 15:42 ` Mauro S.
0 siblings, 1 reply; 15+ messages in thread
From: Jan Kiszka @ 2022-12-22 14:11 UTC (permalink / raw)
To: Mauro S., xenomai
On 22.12.22 14:57, Mauro S. wrote:
> Il 21/12/22 17:04, Jan Kiszka ha scritto:
>> On 21.12.22 16:25, Mauro S. wrote:
>>> Il 21/12/22 16:11, Jan Kiszka ha scritto:
>>>> On 21.12.22 15:57, Mauro S. wrote:
>>>>> Il 21/12/22 15:48, Jan Kiszka ha scritto:
>>>>>> On 21.12.22 15:45, Mauro S. wrote:
>>>>>>> Il 21/12/22 15:29, Jan Kiszka ha scritto:
>>>>>>>> On 21.12.22 10:05, Mauro S. wrote:
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I'm using Xenomai 3.1.2 on a x86-64 Atom processor, with
>>>>>>>>> Copperplate and
>>>>>>>>> pshared enabled. Linux kernel is 5.4.181.
>>>>>>>>>
>>>>>>>>> I have a main application, and another service application that
>>>>>>>>> connects
>>>>>>>>> to the main one using shared session.
>>>>>>>>>
>>>>>>>>> After some time the main application is running, the service
>>>>>>>>> application
>>>>>>>>> stops to connect to the main one with the error
>>>>>>>>>
>>>>>>>>> 0"004.166| BUG in main_overlay(): [main] failed to
>>>>>>>>> allocate main
>>>>>>>>> tcb
>>>>>>>>>
>>>>>>>>> Digging a bit in the code, I found that the failing function is
>>>>>>>>> the
>>>>>>>>> xnmalloc() call in
>>>>>>>>> lib/copperplate/threadobj.c:__threadobj_alloc().
>>>>>>>>> If I understood correctly, this function tries to allocate some
>>>>>>>>> space
>>>>>>>>> from the shared heap. But the shared heap is almost free
>>>>>>>>>
>>>>>>>>> # cat /proc/xenomai/heap
>>>>>>>>> TOTAL FREE NAME
>>>>>>>>> 4194304 3616512 system heap
>>>>>>>>> 1048576 1041776 shared heap
>>>>>>>>> 1048576 1048480 private heap[14505]
>>>>>>>>> 36864 19968 xddp-pool@0
>>>>>>>>> 24576 7680 xddp-pool@10
>>>>>>>>>
>>>>>>>>
>>>>>>>> This is the code that triggers the message:
>>>>>>>>
>>>>>>>> if (pthread_key_create(&threadobj_tskey,
>>>>>>>> finalize_thread))
>>>>>>>> early_panic("failed to allocate TSD key");
>>>>>>>>
>>>>>>>> pthread_key_create has nothing to do with xnmalloc. That's a plain
>>>>>>>> glibc
>>>>>>>> service, and its documentation says:
>>>>>>>>
>>>>>>>> The pthread_key_create() function shall fail if:
>>>>>>>>
>>>>>>>> EAGAIN The system lacked the necessary resources to
>>>>>>>> create
>>>>>>>> another thread-specific data key, or the system-imposed limit on
>>>>>>>> the
>>>>>>>> total number of keys per process {PTHREAD_KEYS_MAX} has been
>>>>>>>> exceeded.
>>>>>>>>
>>>>>>>> ENOMEM Insufficient memory exists to create the key.
>>>>>>>>
>>>>>>>> Already checked THOSE conditions? Are you possibly creating threads
>>>>>>>> without cleaning them up completely?
>>>>>>>>
>>>>>>>> Jan
>>>>>>>>
>>>>>>>
>>>>>>> Hi Jan,
>>>>>>>
>>>>>>> thank you.
>>>>>>> Sorry but I can't figure out how the code you specified (that panics
>>>>>>> generating the message "failed to allocate TSD key") could
>>>>>>> trigger the
>>>>>>> "failed to allocate main tcb" message.
>>>>>>>
>>>>>>
>>>>>> https://source.denx.de/Xenomai/xenomai/-/blob/v3.1.2/lib/copperplate/threadobj.c#L86
>>>>>>
>>>>>> That's the only line in Xenomai which raises this message. If you
>>>>>> don't
>>>>>> believe it triggered it, start your application in gdb and put a
>>>>>> breakpoint on that panic line.
>>>>>>
>>>>>> Jan
>>>>>>
>>>>>
>>>>> I believe that this code triggers the message "failed to allocate TSD
>>>>> key".
>>>>>
>>>>> But I get the message "failed to allocate main tcb", that is in the
>>>>> same
>>>>> file, but at the line #1790.
>>>>>
>>>>
>>>> Ah, sorry, you are right. Didn't read carefully enough.
>>>
>>> Ok, no problem :-)
>>>
>>>>
>>>> Could the heap be fragmented after lots and lots of
>>>> allocations/releases? But what is also a bit strange is that the main
>>>> thread is continuously mapped. Are you restarting the main application
>>>> for every request?
>>>>
>>>
>>> The main application is started once and keeps running, but internally
>>> creates and destroys many Xenomai tasks.
>>>
>>> During the main application lifetime (that can be long), the service
>>> application reads some statistics from the main application, prints
>>> these statistics and exits. Then, the main application is never
>>> restarted, and the service application is restarted for every request,
>>> that happens every 5 seconds (but the error seems not dependent on the
>>> service application restart period length).
>>
>> Ok, so the service applications are mapping its main thread against
>> Xenomai, and that as pshared apps using the shared heap.
>>
>
> Yes, exactly.
>
>>>
>>> On another device I have a similar (but not same) main application, and
>>> the same service application, and this problem does not show.
>>>
>>> Then, the problem should be located in the main application, but I can't
>>> figure out where the problem could be.
>>
>> As the allocation happens against a shared, global heap, the issue may
>> be global as well. You could either try to trace the allocation/release
>> pattern against the shared heap and look for anomalies or even factor
>> out that pattern into a stand-alone reproducer.
>
> Thanks for your suggestions. Trace the allocation/release pattern would
> be a huge work for this application, then I started with a simple test:
> enlarge the heap sizes.
> Before I had a 4096kb system heap, 1024kb shared heap, 1024kb private heap.
> Now I have a 8192kb system heap, 4096kb shared heap, 2048kb private heap.
> Nothing, the error happens again. Could this be a clue that this is not
> a fragmentation problem and could be a heap corruption problem?
Not impossible. The control structures should sit at the head of the
allocated memory blocks, so they could get corrupted if some other user
writes beyond its assigned block.
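To illustrate the corruption scenario, here is a minimal sketch of a toy
bump allocator that stores a small header directly in front of each
block. It is not Xenomai's heap implementation: toy_alloc(), toy_check(),
struct toy_hdr and TOY_MAGIC are hypothetical names made up for this
example. It only shows how writing past the end of one block lands on the
control data of the next one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAGIC     0x5eedb10cU	/* hypothetical integrity marker */
#define TOY_POOL_SIZE 256

/* Hypothetical control structure placed ahead of every payload. */
struct toy_hdr {
	uint32_t magic;
	uint32_t size;	/* payload size in bytes */
};

static unsigned char toy_pool[TOY_POOL_SIZE];
static size_t toy_top;	/* offset of the first unused byte */

/* Carve header + payload out of the pool; NULL when the pool is full. */
static void *toy_alloc(size_t size)
{
	struct toy_hdr hdr = { TOY_MAGIC, (uint32_t)size };
	void *payload;

	assert(size > 0);
	if (size > TOY_POOL_SIZE ||
	    toy_top + sizeof(hdr) + size > TOY_POOL_SIZE)
		return NULL;

	memcpy(toy_pool + toy_top, &hdr, sizeof(hdr));
	payload = toy_pool + toy_top + sizeof(hdr);
	toy_top += sizeof(hdr) + size;

	return payload;
}

/* Return 1 if the header in front of this payload is still intact. */
static int toy_check(const void *payload)
{
	struct toy_hdr hdr;

	memcpy(&hdr, (const unsigned char *)payload - sizeof(hdr), sizeof(hdr));

	return hdr.magic == TOY_MAGIC;
}
```

Allocating two 16-byte blocks and then writing 16 + sizeof(struct toy_hdr)
bytes into the first one leaves the first header intact but makes
toy_check() fail for the second block.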
>
> During tests, I also noted that when the error starts to happen, no
> tasks in main applications are created/deleted. But there are some
> queues and pipes communications active, then the heap could be used by
> them.
All objects that could now be used across processes are allocated on the
shared heap if pshared is on. So we are not only looking at task objects
and their life cycles.
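Since several object types are involved, it can help to log the heap
figures over time and see which row moves. A minimal sketch, assuming
rows formatted like the /proc/xenomai/heap output quoted in the original
report ("TOTAL FREE NAME"); parse_heap_row(), heap_used() and struct
heap_stat are hypothetical helpers, not part of any Xenomai API:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct heap_stat {
	unsigned long total;
	unsigned long free_bytes;
	char name[64];
};

/*
 * Parse one data row such as "1048576 1041776 shared heap".
 * Returns 0 on success, -1 for the header row or any malformed row.
 */
static int parse_heap_row(const char *row, struct heap_stat *st)
{
	int consumed = 0;

	assert(row != NULL && st != NULL);

	if (sscanf(row, "%lu %lu %n", &st->total, &st->free_bytes,
		   &consumed) != 2)
		return -1;
	if (consumed <= 0 || st->free_bytes > st->total)
		return -1;

	/* The name is the rest of the row and may contain spaces. */
	snprintf(st->name, sizeof(st->name), "%s", row + consumed);
	st->name[strcspn(st->name, "\r\n")] = '\0';

	return st->name[0] != '\0' ? 0 : -1;
}

/* Bytes currently in use according to the parsed row. */
static unsigned long heap_used(const struct heap_stat *st)
{
	return st->total - st->free_bytes;
}
```

Feeding each line read from the file to parse_heap_row() and printing
heap_used() at regular intervals would show whether any of the listed
heaps shrinks while the failure builds up.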
>
> I will continue investigating on this problem. If you have some other
> ideas/hints, I'll be glad to hear them.
Try reducing factors that could contribute to it, e.g. the types of
objects used. That may help narrowing down the actual trigger.
Jan
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: "failed to allocate main tcb" error
2022-12-22 14:11 ` Jan Kiszka
@ 2022-12-22 15:42 ` Mauro S.
2023-01-05 20:25 ` Mauro S.
0 siblings, 1 reply; 15+ messages in thread
From: Mauro S. @ 2022-12-22 15:42 UTC (permalink / raw)
To: Jan Kiszka, xenomai
On 22/12/22 15:11, Jan Kiszka wrote:
> On 22.12.22 14:57, Mauro S. wrote:
>> On 21/12/22 17:04, Jan Kiszka wrote:
>>> On 21.12.22 16:25, Mauro S. wrote:
>>>> On 21/12/22 16:11, Jan Kiszka wrote:
>>>>> On 21.12.22 15:57, Mauro S. wrote:
>>>>>> On 21/12/22 15:48, Jan Kiszka wrote:
>>>>>>> On 21.12.22 15:45, Mauro S. wrote:
>>>>>>>> On 21/12/22 15:29, Jan Kiszka wrote:
>>>>>>>>> On 21.12.22 10:05, Mauro S. wrote:
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> I'm using Xenomai 3.1.2 on a x86-64 Atom processor, with
>>>>>>>>>> Copperplate and
>>>>>>>>>> pshared enabled. Linux kernel is 5.4.181.
>>>>>>>>>>
>>>>>>>>>> I have a main application, and another service application that
>>>>>>>>>> connects
>>>>>>>>>> to the main one using shared session.
>>>>>>>>>>
>>>>>>>>>> After some time the main application is running, the service
>>>>>>>>>> application
>>>>>>>>>> stops to connect to the main one with the error
>>>>>>>>>>
>>>>>>>>>> 0"004.166| BUG in main_overlay(): [main] failed to
>>>>>>>>>> allocate main
>>>>>>>>>> tcb
>>>>>>>>>>
>>>>>>>>>> Digging a bit in the code, I found that the failing function is
>>>>>>>>>> the
>>>>>>>>>> xnmalloc() call in
>>>>>>>>>> lib/copperplate/threadobj.c:__threadobj_alloc().
>>>>>>>>>> If I understood correctly, this function tries to allocate some
>>>>>>>>>> space
>>>>>>>>>> from the shared heap. But the shared heap is almost free
>>>>>>>>>>
>>>>>>>>>> # cat /proc/xenomai/heap
>>>>>>>>>> TOTAL FREE NAME
>>>>>>>>>> 4194304 3616512 system heap
>>>>>>>>>> 1048576 1041776 shared heap
>>>>>>>>>> 1048576 1048480 private heap[14505]
>>>>>>>>>> 36864 19968 xddp-pool@0
>>>>>>>>>> 24576 7680 xddp-pool@10
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This is the code that triggers the message:
>>>>>>>>>
>>>>>>>>> if (pthread_key_create(&threadobj_tskey,
>>>>>>>>> finalize_thread))
>>>>>>>>> early_panic("failed to allocate TSD key");
>>>>>>>>>
>>>>>>>>> pthread_key_create has nothing to do with xnmalloc. That's a plain
>>>>>>>>> glibc
>>>>>>>>> service, and its documentation says:
>>>>>>>>>
>>>>>>>>> The pthread_key_create() function shall fail if:
>>>>>>>>>
>>>>>>>>> EAGAIN The system lacked the necessary resources to
>>>>>>>>> create
>>>>>>>>> another thread-specific data key, or the system-imposed limit on
>>>>>>>>> the
>>>>>>>>> total number of keys per process {PTHREAD_KEYS_MAX} has been
>>>>>>>>> exceeded.
>>>>>>>>>
>>>>>>>>> ENOMEM Insufficient memory exists to create the key.
>>>>>>>>>
>>>>>>>>> Already checked THOSE conditions? Are you possibly creating threads
>>>>>>>>> without cleaning them up completely?
>>>>>>>>>
>>>>>>>>> Jan
>>>>>>>>>
>>>>>>>>
>>>>>>>> Hi Jan,
>>>>>>>>
>>>>>>>> thank you.
>>>>>>>> Sorry but I can't figure out how the code you specified (that panics
>>>>>>>> generating the message "failed to allocate TSD key") could
>>>>>>>> trigger the
>>>>>>>> "failed to allocate main tcb" message.
>>>>>>>>
>>>>>>>
>>>>>>> https://source.denx.de/Xenomai/xenomai/-/blob/v3.1.2/lib/copperplate/threadobj.c#L86
>>>>>>>
>>>>>>> That's the only line in Xenomai which raises this message. If you
>>>>>>> don't
>>>>>>> believe it triggered it, start your application in gdb and put a
>>>>>>> breakpoint on that panic line.
>>>>>>>
>>>>>>> Jan
>>>>>>>
>>>>>>
>>>>>> I believe that this code triggers the message "failed to allocate TSD
>>>>>> key".
>>>>>>
>>>>>> But I get the message "failed to allocate main tcb", that is in the
>>>>>> same
>>>>>> file, but at the line #1790.
>>>>>>
>>>>>
>>>>> Ah, sorry, you are right. Didn't read carefully enough.
>>>>
>>>> Ok, no problem :-)
>>>>
>>>>>
>>>>> Could the heap be fragmented after lots and lots of
>>>>> allocations/releases? But what is also a bit strange is that the main
>>>>> thread is continuously mapped. Are you restarting the main application
>>>>> for every request?
>>>>>
>>>>
>>>> The main application is started once and keeps running, but internally
>>>> creates and destroys many Xenomai tasks.
>>>>
>>>> During the main application lifetime (that can be long), the service
>>>> application reads some statistics from the main application, prints
>>>> these statistics and exits. Then, the main application is never
>>>> restarted, and the service application is restarted for every request,
>>>> that happens every 5 seconds (but the error seems not dependent on the
>>>> service application restart period length).
>>>
>>> Ok, so the service applications are mapping its main thread against
>>> Xenomai, and that as pshared apps using the shared heap.
>>>
>>
>> Yes, exactly.
>>
>>>>
>>>> On another device I have a similar (but not same) main application, and
>>>> the same service application, and this problem does not show.
>>>>
>>>> Then, the problem should be located in the main application, but I can't
>>>> figure out where the problem could be.
>>>
>>> As the allocation happens against a shared, global heap, the issue may
>>> be global as well. You could either try to trace the allocation/release
>>> pattern against the shared heap and look for anomalies or even factor
>>> out that pattern into a stand-alone reproducer.
>>
>> Thanks for your suggestions. Trace the allocation/release pattern would
>> be a huge work for this application, then I started with a simple test:
>> enlarge the heap sizes.
>> Before I had a 4096kb system heap, 1024kb shared heap, 1024kb private heap.
>> Now I have a 8192kb system heap, 4096kb shared heap, 2048kb private heap.
>> Nothing, the error happens again. Could this be a clue that this is not
>> a fragmentation problem and could be a heap corruption problem?
>
> Not impossible. The control structures should be heading the allocated
> memory blocks, thus could get corrupted if some other user overwrites
> its assigned block.
>
>>
>> During tests, I also noted that when the error starts to happen, no
>> tasks in main applications are created/deleted. But there are some
>> queues and pipes communications active, then the heap could be used by
>> them.
>
> All objects that could now be used across processes are allocated on the
> shared heap if pshared is on. So we are not only looking at task objects
> and their life cycles.
>
>>
>> I will continue investigating on this problem. If you have some other
>> ideas/hints, I'll be glad to hear them.
>
> Try reducing factors that could contribute to it, e.g. the types of
> objects used. That may help narrowing down the actual trigger.
>
> Jan
>
Thanks Jan, I will do some tests and (hopefully) come back with results.
Regards
--
Mauro S.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: "failed to allocate main tcb" error
2022-12-22 15:42 ` Mauro S.
@ 2023-01-05 20:25 ` Mauro S.
2023-01-09 13:00 ` Jan Kiszka
0 siblings, 1 reply; 15+ messages in thread
From: Mauro S. @ 2023-01-05 20:25 UTC (permalink / raw)
To: xenomai; +Cc: Jan Kiszka
[-- Attachment #1: Type: text/plain, Size: 7983 bytes --]
On 22/12/22 16:42, Mauro S. wrote:
> On 22/12/22 15:11, Jan Kiszka wrote:
>> On 22.12.22 14:57, Mauro S. wrote:
>>> On 21/12/22 17:04, Jan Kiszka wrote:
>>>> On 21.12.22 16:25, Mauro S. wrote:
>>>>> On 21/12/22 16:11, Jan Kiszka wrote:
>>>>>> On 21.12.22 15:57, Mauro S. wrote:
>>>>>>> On 21/12/22 15:48, Jan Kiszka wrote:
>>>>>>>> On 21.12.22 15:45, Mauro S. wrote:
>>>>>>>>> On 21/12/22 15:29, Jan Kiszka wrote:
>>>>>>>>>> On 21.12.22 10:05, Mauro S. wrote:
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> I'm using Xenomai 3.1.2 on a x86-64 Atom processor, with
>>>>>>>>>>> Copperplate and
>>>>>>>>>>> pshared enabled. Linux kernel is 5.4.181.
>>>>>>>>>>>
>>>>>>>>>>> I have a main application, and another service application that
>>>>>>>>>>> connects
>>>>>>>>>>> to the main one using shared session.
>>>>>>>>>>>
>>>>>>>>>>> After some time the main application is running, the service
>>>>>>>>>>> application
>>>>>>>>>>> stops to connect to the main one with the error
>>>>>>>>>>>
>>>>>>>>>>> 0"004.166| BUG in main_overlay(): [main] failed to
>>>>>>>>>>> allocate main
>>>>>>>>>>> tcb
>>>>>>>>>>>
>>>>>>>>>>> Digging a bit in the code, I found that the failing function is
>>>>>>>>>>> the
>>>>>>>>>>> xnmalloc() call in
>>>>>>>>>>> lib/copperplate/threadobj.c:__threadobj_alloc().
>>>>>>>>>>> If I understood correctly, this function tries to allocate some
>>>>>>>>>>> space
>>>>>>>>>>> from the shared heap. But the shared heap is almost free
>>>>>>>>>>>
>>>>>>>>>>> # cat /proc/xenomai/heap
>>>>>>>>>>> TOTAL FREE NAME
>>>>>>>>>>> 4194304 3616512 system heap
>>>>>>>>>>> 1048576 1041776 shared heap
>>>>>>>>>>> 1048576 1048480 private heap[14505]
>>>>>>>>>>> 36864 19968 xddp-pool@0
>>>>>>>>>>> 24576 7680 xddp-pool@10
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> This is the code that triggers the message:
>>>>>>>>>>
>>>>>>>>>> if (pthread_key_create(&threadobj_tskey,
>>>>>>>>>> finalize_thread))
>>>>>>>>>> early_panic("failed to allocate TSD key");
>>>>>>>>>>
>>>>>>>>>> pthread_key_create has nothing to do with xnmalloc. That's a
>>>>>>>>>> plain
>>>>>>>>>> glibc
>>>>>>>>>> service, and its documentation says:
>>>>>>>>>>
>>>>>>>>>> The pthread_key_create() function shall fail if:
>>>>>>>>>>
>>>>>>>>>> EAGAIN The system lacked the necessary resources to
>>>>>>>>>> create
>>>>>>>>>> another thread-specific data key, or the system-imposed limit on
>>>>>>>>>> the
>>>>>>>>>> total number of keys per process {PTHREAD_KEYS_MAX} has been
>>>>>>>>>> exceeded.
>>>>>>>>>>
>>>>>>>>>> ENOMEM Insufficient memory exists to create the key.
>>>>>>>>>>
>>>>>>>>>> Already checked THOSE conditions? Are you possibly creating
>>>>>>>>>> threads
>>>>>>>>>> without cleaning them up completely?
>>>>>>>>>>
>>>>>>>>>> Jan
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Jan,
>>>>>>>>>
>>>>>>>>> thank you.
>>>>>>>>> Sorry but I can't figure out how the code you specified (that
>>>>>>>>> panics
>>>>>>>>> generating the message "failed to allocate TSD key") could
>>>>>>>>> trigger the
>>>>>>>>> "failed to allocate main tcb" message.
>>>>>>>>>
>>>>>>>>
>>>>>>>> https://source.denx.de/Xenomai/xenomai/-/blob/v3.1.2/lib/copperplate/threadobj.c#L86
>>>>>>>>
>>>>>>>> That's the only line in Xenomai which raises this message. If you
>>>>>>>> don't
>>>>>>>> believe it triggered it, start your application in gdb and put a
>>>>>>>> breakpoint on that panic line.
>>>>>>>>
>>>>>>>> Jan
>>>>>>>>
>>>>>>>
>>>>>>> I believe that this code triggers the message "failed to allocate
>>>>>>> TSD
>>>>>>> key".
>>>>>>>
>>>>>>> But I get the message "failed to allocate main tcb", that is in the
>>>>>>> same
>>>>>>> file, but at the line #1790.
>>>>>>>
>>>>>>
>>>>>> Ah, sorry, you are right. Didn't read carefully enough.
>>>>>
>>>>> Ok, no problem :-)
>>>>>
>>>>>>
>>>>>> Could the heap be fragmented after lots and lots of
>>>>>> allocations/releases? But what is also a bit strange is that the main
>>>>>> thread is continuously mapped. Are you restarting the main
>>>>>> application
>>>>>> for every request?
>>>>>>
>>>>>
>>>>> The main application is started once and keeps running, but internally
>>>>> creates and destroys many Xenomai tasks.
>>>>>
>>>>> During the main application lifetime (that can be long), the service
>>>>> application reads some statistics from the main application, prints
>>>>> these statistics and exits. Then, the main application is never
>>>>> restarted, and the service application is restarted for every request,
>>>>> that happens every 5 seconds (but the error seems not dependent on the
>>>>> service application restart period length).
>>>>
>>>> Ok, so the service applications are mapping its main thread against
>>>> Xenomai, and that as pshared apps using the shared heap.
>>>>
>>>
>>> Yes, exactly.
>>>
>>>>>
>>>>> On another device I have a similar (but not same) main application,
>>>>> and
>>>>> the same service application, and this problem does not show.
>>>>>
>>>>> Then, the problem should be located in the main application, but I
>>>>> can't
>>>>> figure out where the problem could be.
>>>>
>>>> As the allocation happens against a shared, global heap, the issue may
>>>> be global as well. You could either try to trace the allocation/release
>>>> pattern against the shared heap and look for anomalies or even factor
>>>> out that pattern into a stand-alone reproducer.
>>>
>>> Thanks for your suggestions. Trace the allocation/release pattern would
>>> be a huge work for this application, then I started with a simple test:
>>> enlarge the heap sizes.
>>> Before I had a 4096kb system heap, 1024kb shared heap, 1024kb private
>>> heap.
>>> Now I have a 8192kb system heap, 4096kb shared heap, 2048kb private
>>> heap.
>>> Nothing, the error happens again. Could this be a clue that this is not
>>> a fragmentation problem and could be a heap corruption problem?
>>
>> Not impossible. The control structures should be heading the allocated
>> memory blocks, thus could get corrupted if some other user overwrites
>> its assigned block.
>>
>>>
>>> During tests, I also noted that when the error starts to happen, no
>>> tasks in main applications are created/deleted. But there are some
>>> queues and pipes communications active, then the heap could be used by
>>> them.
>>
>> All objects that could now be used across processes are allocated on the
>> shared heap if pshared is on. So we are not only looking at task objects
>> and their life cycles.
>>
>>>
>>> I will continue investigating on this problem. If you have some other
>>> ideas/hints, I'll be glad to hear them.
>>
>> Try reducing factors that could contribute to it, e.g. the types of
>> objects used. That may help narrowing down the actual trigger.
>>
>> Jan
>>
>
> Thanks Jan, I will do some tests and (hopefully) come back with results.
Hi Jan,
sorry for the delay.
Attached is the code of two small test programs that reproduce the
problem, together with the script that starts them.
Briefly, the "xeno-test-tcb-allocate-mainproc" application creates two
rt buffers (request and response) and waits indefinitely on the request
buffer. "xeno-test-tcb-allocate-secproc" writes a request to the request
buffer and waits for the reply on the response buffer. When "mainproc"
receives the request, it sends a response carrying an incrementing
counter on the response buffer and goes back to waiting for requests.
"secproc" receives the response, prints its content and exits.
"secproc" is launched every 0.5 seconds. After about four minutes of
running, "secproc" starts to fail at launch with the error
BUG in main_overlay(): [main] failed to allocate main tcb
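The relaunch loop described above can be sketched in plain POSIX C. This
is not the attached start script, whose content is not shown here:
run_once() and relaunch_until_failure() are hypothetical helpers, and the
sketch assumes that a failing launch either exits with a non-zero status
or is killed by a signal.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Run the command once. Returns its exit status, or -1 if it could not
 * be started, could not be waited for, or was terminated by a signal.
 */
static int run_once(char *const argv[])
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		execvp(argv[0], argv);
		_exit(127);	/* only reached if exec failed */
	}
	while (waitpid(pid, &status, 0) < 0) {
		if (errno != EINTR)
			return -1;
	}

	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/*
 * Relaunch the command every period_ms milliseconds until it fails or
 * max_runs launches have succeeded. Returns the number of successful
 * launches, which tells how long it took for the failure to appear.
 */
static long relaunch_until_failure(char *const argv[], long max_runs,
				   long period_ms)
{
	struct timespec period;
	long ok = 0;

	assert(argv != NULL && argv[0] != NULL && period_ms >= 0);

	period.tv_sec = period_ms / 1000;
	period.tv_nsec = (period_ms % 1000) * 1000000L;

	while (ok < max_runs && run_once(argv) == 0) {
		ok++;
		nanosleep(&period, NULL);
	}

	return ok;
}
```

Calling relaunch_until_failure() with an argv naming the "secproc" binary
and period_ms set to 500 would mimic the 0.5 second launch period; the
returned count times the period gives the time to first failure.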
Thanks in advance, regards
--
Mauro S.
[-- Attachment #2: tcb_allocate_test.tar.gz --]
[-- Type: application/gzip, Size: 1755 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: "failed to allocate main tcb" error
2023-01-05 20:25 ` Mauro S.
@ 2023-01-09 13:00 ` Jan Kiszka
2023-01-09 19:30 ` Jan Kiszka
2023-01-09 21:13 ` R: " Mauro
0 siblings, 2 replies; 15+ messages in thread
From: Jan Kiszka @ 2023-01-09 13:00 UTC (permalink / raw)
To: Mauro S., xenomai
On 05.01.23 21:25, Mauro S. wrote:
> On 22/12/22 16:42, Mauro S. wrote:
>> On 22/12/22 15:11, Jan Kiszka wrote:
>>> On 22.12.22 14:57, Mauro S. wrote:
>>>> On 21/12/22 17:04, Jan Kiszka wrote:
>>>>> On 21.12.22 16:25, Mauro S. wrote:
>>>>>> On 21/12/22 16:11, Jan Kiszka wrote:
>>>>>>> On 21.12.22 15:57, Mauro S. wrote:
>>>>>>>> On 21/12/22 15:48, Jan Kiszka wrote:
>>>>>>>>> On 21.12.22 15:45, Mauro S. wrote:
>>>>>>>>>> On 21/12/22 15:29, Jan Kiszka wrote:
>>>>>>>>>>> On 21.12.22 10:05, Mauro S. wrote:
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>> I'm using Xenomai 3.1.2 on a x86-64 Atom processor, with
>>>>>>>>>>>> Copperplate and
>>>>>>>>>>>> pshared enabled. Linux kernel is 5.4.181.
>>>>>>>>>>>>
>>>>>>>>>>>> I have a main application, and another service application that
>>>>>>>>>>>> connects
>>>>>>>>>>>> to the main one using shared session.
>>>>>>>>>>>>
>>>>>>>>>>>> After some time the main application is running, the service
>>>>>>>>>>>> application
>>>>>>>>>>>> stops to connect to the main one with the error
>>>>>>>>>>>>
>>>>>>>>>>>> 0"004.166| BUG in main_overlay(): [main] failed to
>>>>>>>>>>>> allocate main
>>>>>>>>>>>> tcb
>>>>>>>>>>>>
>>>>>>>>>>>> Digging a bit in the code, I found that the failing function is
>>>>>>>>>>>> the
>>>>>>>>>>>> xnmalloc() call in
>>>>>>>>>>>> lib/copperplate/threadobj.c:__threadobj_alloc().
>>>>>>>>>>>> If I understood correctly, this function tries to allocate some
>>>>>>>>>>>> space
>>>>>>>>>>>> from the shared heap. But the shared heap is almost free
>>>>>>>>>>>>
>>>>>>>>>>>> # cat /proc/xenomai/heap
>>>>>>>>>>>> TOTAL FREE NAME
>>>>>>>>>>>> 4194304 3616512 system heap
>>>>>>>>>>>> 1048576 1041776 shared heap
>>>>>>>>>>>> 1048576 1048480 private heap[14505]
>>>>>>>>>>>> 36864 19968 xddp-pool@0
>>>>>>>>>>>> 24576 7680 xddp-pool@10
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> This is the code that triggers the message:
>>>>>>>>>>>
>>>>>>>>>>> if (pthread_key_create(&threadobj_tskey,
>>>>>>>>>>> finalize_thread))
>>>>>>>>>>> early_panic("failed to allocate TSD key");
>>>>>>>>>>>
>>>>>>>>>>> pthread_key_create has nothing to do with xnmalloc. That's a
>>>>>>>>>>> plain
>>>>>>>>>>> glibc
>>>>>>>>>>> service, and its documentation says:
>>>>>>>>>>>
>>>>>>>>>>> The pthread_key_create() function shall fail if:
>>>>>>>>>>>
>>>>>>>>>>> EAGAIN The system lacked the necessary resources to
>>>>>>>>>>> create
>>>>>>>>>>> another thread-specific data key, or the system-imposed limit on
>>>>>>>>>>> the
>>>>>>>>>>> total number of keys per process {PTHREAD_KEYS_MAX} has been
>>>>>>>>>>> exceeded.
>>>>>>>>>>>
>>>>>>>>>>> ENOMEM Insufficient memory exists to create the key.
>>>>>>>>>>>
>>>>>>>>>>> Already checked THOSE conditions? Are you possibly creating
>>>>>>>>>>> threads
>>>>>>>>>>> without cleaning them up completely?
>>>>>>>>>>>
>>>>>>>>>>> Jan
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi Jan,
>>>>>>>>>>
>>>>>>>>>> thank you.
>>>>>>>>>> Sorry but I can't figure out how the code you specified (that
>>>>>>>>>> panics
>>>>>>>>>> generating the message "failed to allocate TSD key") could
>>>>>>>>>> trigger the
>>>>>>>>>> "failed to allocate main tcb" message.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> https://source.denx.de/Xenomai/xenomai/-/blob/v3.1.2/lib/copperplate/threadobj.c#L86
>>>>>>>>>
>>>>>>>>> That's the only line in Xenomai which raises this message. If you
>>>>>>>>> don't
>>>>>>>>> believe it triggered it, start your application in gdb and put a
>>>>>>>>> breakpoint on that panic line.
>>>>>>>>>
>>>>>>>>> Jan
>>>>>>>>>
>>>>>>>>
>>>>>>>> I believe that this code triggers the message "failed to
>>>>>>>> allocate TSD
>>>>>>>> key".
>>>>>>>>
>>>>>>>> But I get the message "failed to allocate main tcb", that is in the
>>>>>>>> same
>>>>>>>> file, but at the line #1790.
>>>>>>>>
>>>>>>>
>>>>>>> Ah, sorry, you are right. Didn't read carefully enough.
>>>>>>
>>>>>> Ok, no problem :-)
>>>>>>
>>>>>>>
>>>>>>> Could the heap be fragmented after lots and lots of
>>>>>>> allocations/releases? But what is also a bit strange is that the
>>>>>>> main
>>>>>>> thread is continuously mapped. Are you restarting the main
>>>>>>> application
>>>>>>> for every request?
>>>>>>>
>>>>>>
>>>>>> The main application is started once and keeps running, but
>>>>>> internally
>>>>>> creates and destroys many Xenomai tasks.
>>>>>>
>>>>>> During the main application lifetime (that can be long), the service
>>>>>> application reads some statistics from the main application, prints
>>>>>> these statistics and exits. Then, the main application is never
>>>>>> restarted, and the service application is restarted for every
>>>>>> request,
>>>>>> that happens every 5 seconds (but the error seems not dependent on
>>>>>> the
>>>>>> service application restart period length).
>>>>>
>>>>> Ok, so the service applications are mapping its main thread against
>>>>> Xenomai, and that as pshared apps using the shared heap.
>>>>>
>>>>
>>>> Yes, exactly.
>>>>
>>>>>>
>>>>>> On another device I have a similar (but not same) main
>>>>>> application, and
>>>>>> the same service application, and this problem does not show.
>>>>>>
>>>>>> Then, the problem should be located in the main application, but I
>>>>>> can't
>>>>>> figure out where the problem could be.
>>>>>
>>>>> As the allocation happens against a shared, global heap, the issue may
>>>>> be global as well. You could either try to trace the
>>>>> allocation/release
>>>>> pattern against the shared heap and look for anomalies or even factor
>>>>> out that pattern into a stand-alone reproducer.
>>>>
>>>> Thanks for your suggestions. Trace the allocation/release pattern would
>>>> be a huge work for this application, then I started with a simple test:
>>>> enlarge the heap sizes.
>>>> Before I had a 4096kb system heap, 1024kb shared heap, 1024kb
>>>> private heap.
>>>> Now I have a 8192kb system heap, 4096kb shared heap, 2048kb private
>>>> heap.
>>>> Nothing, the error happens again. Could this be a clue that this is not
>>>> a fragmentation problem and could be a heap corruption problem?
>>>
>>> Not impossible. The control structures should be heading the allocated
>>> memory blocks, thus could get corrupted if some other user overwrites
>>> its assigned block.
>>>
>>>>
>>>> During tests, I also noted that when the error starts to happen, no
>>>> tasks in main applications are created/deleted. But there are some
>>>> queues and pipes communications active, then the heap could be used by
>>>> them.
>>>
>>> All objects that could now be used across processes are allocated on the
>>> shared heap if pshared is on. So we are not only looking at task objects
>>> and their life cycles.
>>>
>>>>
>>>> I will continue investigating on this problem. If you have some other
>>>> ideas/hints, I'll be glad to hear them.
>>>
>>> Try reducing factors that could contribute to it, e.g. the types of
>>> objects used. That may help narrowing down the actual trigger.
>>>
>>> Jan
>>>
>>
>> Thanks Jan, I will do some tests and (hopefully) come back with results.
>
> Hi Jan,
>
> sorry for the delay.
>
> Attached there is the code of two small test programs able to reproduce
> the problem. There is also the script to start them.
>
> Briefly, the "xeno-test-tcb-allocate-mainproc" application creates two
> rt buffers (request and response) and waits indefinitely on the request
> buffer. On the request buffer, the "xeno-test-tcb-allocate-secproc"
> writes a request and waits for the response on the response buffer. When
> the request is received by the "mainproc", the "mainproc" sends a
> response with an incremental counter on the response buffer, and returns
> to wait requests. "secproc" receives the response, prints its content
> and exits.
>
> "secproc" is launched every 0.5 seconds. After about four minutes
> running, the "secproc" starts to fail at launch with the error
>
> BUG in main_overlay(): [main] failed to allocate main tcb
>
Thanks, reproduced, now trying to understand.
Your test already revealed some other issue, namely avoidable
sign-conversion warnings of Xenomai headers. Patch will come.
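For readers unfamiliar with that class of warning: GCC's -Wsign-conversion
flags implicit conversions between signed and unsigned integer types. The
sketch below is a generic illustration, not the code from the Xenomai
headers (which is not shown in this thread); to_size() is a hypothetical
helper.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper converting a signed byte count to size_t.
 * A plain "size_t n = nbytes;" would draw a diagnostic from GCC when
 * -Wsign-conversion is enabled; checking the sign and casting
 * explicitly states the intent and avoids the warning.
 */
static size_t to_size(int nbytes)
{
	assert(nbytes >= 0);

	return (size_t)nbytes;
}
```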
Jan
--
Siemens AG, Technology
Competence Center Embedded Linux
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: "failed to allocate main tcb" error
2023-01-09 13:00 ` Jan Kiszka
@ 2023-01-09 19:30 ` Jan Kiszka
2023-01-09 21:13 ` R: " Mauro
1 sibling, 0 replies; 15+ messages in thread
From: Jan Kiszka @ 2023-01-09 19:30 UTC (permalink / raw)
To: Mauro S., xenomai
On 09.01.23 14:00, Jan Kiszka wrote:
> On 05.01.23 21:25, Mauro S. wrote:
>> On 22/12/22 16:42, Mauro S. wrote:
>>> On 22/12/22 15:11, Jan Kiszka wrote:
>>>> On 22.12.22 14:57, Mauro S. wrote:
>>>>> On 21/12/22 17:04, Jan Kiszka wrote:
>>>>>> On 21.12.22 16:25, Mauro S. wrote:
>>>>>>> On 21/12/22 16:11, Jan Kiszka wrote:
>>>>>>>> On 21.12.22 15:57, Mauro S. wrote:
>>>>>>>>> On 21/12/22 15:48, Jan Kiszka wrote:
>>>>>>>>>> On 21.12.22 15:45, Mauro S. wrote:
>>>>>>>>>>> On 21/12/22 15:29, Jan Kiszka wrote:
>>>>>>>>>>>> On 21.12.22 10:05, Mauro S. wrote:
>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm using Xenomai 3.1.2 on a x86-64 Atom processor, with
>>>>>>>>>>>>> Copperplate and
>>>>>>>>>>>>> pshared enabled. Linux kernel is 5.4.181.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have a main application, and another service application that
>>>>>>>>>>>>> connects
>>>>>>>>>>>>> to the main one using shared session.
>>>>>>>>>>>>>
>>>>>>>>>>>>> After some time the main application is running, the service
>>>>>>>>>>>>> application
>>>>>>>>>>>>> stops to connect to the main one with the error
>>>>>>>>>>>>>
>>>>>>>>>>>>> 0"004.166| BUG in main_overlay(): [main] failed to
>>>>>>>>>>>>> allocate main
>>>>>>>>>>>>> tcb
>>>>>>>>>>>>>
>>>>>>>>>>>>> Digging a bit in the code, I found that the failing function is
>>>>>>>>>>>>> the
>>>>>>>>>>>>> xnmalloc() call in
>>>>>>>>>>>>> lib/copperplate/threadobj.c:__threadobj_alloc().
>>>>>>>>>>>>> If I understood correctly, this function tries to allocate some
>>>>>>>>>>>>> space
>>>>>>>>>>>>> from the shared heap. But the shared heap is almost free
>>>>>>>>>>>>>
>>>>>>>>>>>>> # cat /proc/xenomai/heap
>>>>>>>>>>>>> TOTAL FREE NAME
>>>>>>>>>>>>> 4194304 3616512 system heap
>>>>>>>>>>>>> 1048576 1041776 shared heap
>>>>>>>>>>>>> 1048576 1048480 private heap[14505]
>>>>>>>>>>>>> 36864 19968 xddp-pool@0
>>>>>>>>>>>>> 24576 7680 xddp-pool@10
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> This is the code that triggers the message:
>>>>>>>>>>>>
>>>>>>>>>>>> if (pthread_key_create(&threadobj_tskey,
>>>>>>>>>>>> finalize_thread))
>>>>>>>>>>>> early_panic("failed to allocate TSD key");
>>>>>>>>>>>>
>>>>>>>>>>>> pthread_key_create has nothing to do with xnmalloc. That's a
>>>>>>>>>>>> plain
>>>>>>>>>>>> glibc
>>>>>>>>>>>> service, and its documentation says:
>>>>>>>>>>>>
>>>>>>>>>>>> The pthread_key_create() function shall fail if:
>>>>>>>>>>>>
>>>>>>>>>>>> EAGAIN The system lacked the necessary resources to
>>>>>>>>>>>> create
>>>>>>>>>>>> another thread-specific data key, or the system-imposed limit on
>>>>>>>>>>>> the
>>>>>>>>>>>> total number of keys per process {PTHREAD_KEYS_MAX} has been
>>>>>>>>>>>> exceeded.
>>>>>>>>>>>>
>>>>>>>>>>>> ENOMEM Insufficient memory exists to create the key.
>>>>>>>>>>>>
>>>>>>>>>>>> Already checked THOSE conditions? Are you possibly creating
>>>>>>>>>>>> threads
>>>>>>>>>>>> without cleaning them up completely?
>>>>>>>>>>>>
>>>>>>>>>>>> Jan
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hi Jan,
>>>>>>>>>>>
>>>>>>>>>>> thank you.
>>>>>>>>>>> Sorry but I can't figure out how the code you specified (that
>>>>>>>>>>> panics
>>>>>>>>>>> generating the message "failed to allocate TSD key") could
>>>>>>>>>>> trigger the
>>>>>>>>>>> "failed to allocate main tcb" message.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> https://source.denx.de/Xenomai/xenomai/-/blob/v3.1.2/lib/copperplate/threadobj.c#L86
>>>>>>>>>>
>>>>>>>>>> That's the only line in Xenomai which raises this message. If you
>>>>>>>>>> don't
>>>>>>>>>> believe it triggered it, start your application in gdb and put a
>>>>>>>>>> breakpoint on that panic line.
>>>>>>>>>>
>>>>>>>>>> Jan
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I believe that this code triggers the message "failed to
>>>>>>>>> allocate TSD
>>>>>>>>> key".
>>>>>>>>>
>>>>>>>>> But I get the message "failed to allocate main tcb", that is in the
>>>>>>>>> same
>>>>>>>>> file, but at the line #1790.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Ah, sorry, you are right. Didn't read carefully enough.
>>>>>>>
>>>>>>> Ok, no problem :-)
>>>>>>>
>>>>>>>>
>>>>>>>> Could the heap be fragmented after lots and lots of
>>>>>>>> allocations/releases? But what is also a bit strange is that the
>>>>>>>> main
>>>>>>>> thread is continuously mapped. Are you restarting the main
>>>>>>>> application
>>>>>>>> for every request?
>>>>>>>>
>>>>>>>
>>>>>>> The main application is started once and keeps running, but
>>>>>>> internally
>>>>>>> creates and destroys many Xenomai tasks.
>>>>>>>
>>>>>>> During the main application lifetime (that can be long), the service
>>>>>>> application reads some statistics from the main application, prints
>>>>>>> these statistics and exits. Then, the main application is never
>>>>>>> restarted, and the service application is restarted for every
>>>>>>> request,
>>>>>>> that happens every 5 seconds (but the error seems not dependent on
>>>>>>> the
>>>>>>> service application restart period length).
>>>>>>
>>>>>> Ok, so the service applications are mapping their main threads against
>>>>>> Xenomai, and they do that as pshared apps using the shared heap.
>>>>>>
>>>>>
>>>>> Yes, exactly.
>>>>>
>>>>>>>
>>>>>>> On another device I have a similar (but not same) main
>>>>>>> application, and
>>>>>>> the same service application, and this problem does not show.
>>>>>>>
>>>>>>> Then, the problem should be located in the main application, but I
>>>>>>> can't
>>>>>>> figure out where the problem could be.
>>>>>>
>>>>>> As the allocation happens against a shared, global heap, the issue may
>>>>>> be global as well. You could either try to trace the
>>>>>> allocation/release
>>>>>> pattern against the shared heap and look for anomalies or even factor
>>>>>> out that pattern into a stand-alone reproducer.
>>>>>
>>>>> Thanks for your suggestions. Tracing the allocation/release pattern
>>>>> would be a huge amount of work for this application, so I started
>>>>> with a simple test: enlarging the heap sizes.
>>>>> Before, I had a 4096 KB system heap, a 1024 KB shared heap and a
>>>>> 1024 KB private heap.
>>>>> Now I have an 8192 KB system heap, a 4096 KB shared heap and a
>>>>> 2048 KB private heap.
>>>>> It made no difference, the error happens again. Could this be a clue
>>>>> that this is not a fragmentation problem but a heap corruption problem?
>>>>
>>>> Not impossible. The control structures should be heading the allocated
>>>> memory blocks, thus could get corrupted if some other user overwrites
>>>> its assigned block.
>>>>
>>>>>
>>>>> During tests, I also noted that when the error starts to happen, no
>>>>> tasks are being created/deleted in the main application. But some
>>>>> queue and pipe communications are active, so the heap could be used
>>>>> by them.
>>>>
>>>> All objects that could now be used across processes are allocated on the
>>>> shared heap if pshared is on. So we are not only looking at task objects
>>>> and their life cycles.
>>>>
>>>>>
>>>>> I will continue investigating on this problem. If you have some other
>>>>> ideas/hints, I'll be glad to hear them.
>>>>
>>>> Try reducing factors that could contribute to it, e.g. the types of
>>>> objects used. That may help narrowing down the actual trigger.
>>>>
>>>> Jan
>>>>
>>>
>>> Thanks Jan, I will do some tests and (hopefully) come back with results.
>>
>> Hi Jan,
>>
>> sorry for the delay.
>>
>> Attached is the code of two small test programs that reproduce
>> the problem. There is also the script to start them.
>>
>> Briefly, the "xeno-test-tcb-allocate-mainproc" application creates two
>> rt buffers (request and response) and waits indefinitely on the request
>> buffer. On the request buffer, the "xeno-test-tcb-allocate-secproc"
>> writes a request and waits for the response on the response buffer. When
>> the request is received by the "mainproc", the "mainproc" sends a
>> response with an incremental counter on the response buffer, and goes
>> back to waiting for requests. "secproc" receives the response, prints
>> its content and exits.
>>
>> "secproc" is launched every 0.5 seconds. After running for about four
>> minutes, "secproc" starts to fail at launch with the error
>>
>> BUG in main_overlay(): [main] failed to allocate main tcb
>>
>
> Thanks, reproduced, now trying to understand.
>
We are leaking the main tcb on the shared heap. That heap - and I also
regularly forget this - is not tracked by the kernel but purely in
userspace. Enable the registry, and you can watch
/var/run/xenomai/root/test/system/heaps going up with every secondary
call. This seems to be needed because the normal pthread-key cleanup
handler is not called on main exits:
diff --git a/lib/copperplate/threadobj.c b/lib/copperplate/threadobj.c
index f4588a17a8..56d0f1b231 100644
--- a/lib/copperplate/threadobj.c
+++ b/lib/copperplate/threadobj.c
@@ -1770,6 +1770,11 @@ int threadobj_set_schedprio(struct threadobj *thobj, int priority)
 	return threadobj_set_schedparam(thobj, policy, &param_ex);
 }
 
+static void main_exit(void)
+{
+	threadobj_free(threadobj_current());
+}
+
 static inline int main_overlay(void)
 {
 	struct threadobj_init_data idata;
@@ -1806,6 +1811,8 @@ static inline int main_overlay(void)
 	threadobj_prologue(tcb, NULL);
 	pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, NULL);
 
+	atexit(main_exit);
+
 	return 0;
 }
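
For anyone who wants to see the underlying behavior outside of Xenomai,
here is a minimal plain-POSIX sketch. It is not Copperplate code:
run_exit_demo(), child_body(), the 'D'/'A' tags and the two mask names
are made up for this illustration. A forked child attaches a dummy
object to a pthread key with a destructor, registers an atexit()
handler, and then calls exit(), which is the same path taken when
main() returns. The parent collects which handlers reported back over
a pipe.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define TSD_DTOR_RAN 0x1	/* hypothetical flag: pthread-key destructor ran */
#define ATEXIT_RAN   0x2	/* hypothetical flag: atexit() handler ran */

static int report_fd = -1;	/* write end of the pipe, set in the child only */
static pthread_key_t demo_key;

/* Send a one-byte tag to the parent; best effort, errors are ignored. */
static void report(char tag)
{
	ssize_t n;

	assert(report_fd >= 0);
	n = write(report_fd, &tag, 1);
	(void)n;
}

static void tsd_destructor(void *value)
{
	(void)value;
	report('D');
}

static void exit_handler(void)
{
	report('A');
}

/* Child side: the main thread owns an object attached to a pthread key. */
static void child_body(void)
{
	static int fake_tcb;	/* stand-in for a per-thread control block */

	if (pthread_key_create(&demo_key, tsd_destructor) ||
	    pthread_setspecific(demo_key, &fake_tcb) ||
	    atexit(exit_handler))
		_exit(2);

	exit(0);		/* same path as returning from main() */
}

/* Fork a child, let it exit, return a mask of the handlers that ran. */
int run_exit_demo(void)
{
	int fds[2], status, mask = 0;
	char tag;
	pid_t pid;

	if (pipe(fds))
		return -1;

	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	if (pid == 0) {
		close(fds[0]);
		report_fd = fds[1];
		child_body();	/* does not return */
	}

	close(fds[1]);
	while (read(fds[0], &tag, 1) == 1)
		mask |= (tag == 'D') ? TSD_DTOR_RAN : ATEXIT_RAN;
	close(fds[0]);

	if (waitpid(pid, &status, 0) != pid ||
	    !WIFEXITED(status) || WEXITSTATUS(status) != 0)
		return -1;

	return mask;
}
```

Calling run_exit_demo() from a small main() (built with -pthread)
should return only ATEXIT_RAN: POSIX runs key destructors when a thread
exits, not when the process goes through exit(). That is why an
exit-time hook is one way to release something the main thread would
otherwise leave behind in memory that outlives the process.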
Jan
--
Siemens AG, Technology
Competence Center Embedded Linux
^ permalink raw reply related [flat|nested] 15+ messages in thread
* R: Re: "failed to allocate main tcb" error
2023-01-09 13:00 ` Jan Kiszka
2023-01-09 19:30 ` Jan Kiszka
@ 2023-01-09 21:13 ` Mauro
1 sibling, 0 replies; 15+ messages in thread
From: Mauro @ 2023-01-09 21:13 UTC (permalink / raw)
To: Jan Kiszka, xenomai
------ Original Message ------
From: jan.kiszka@siemens.com
To: mau.salvi@tin.it; xenomai@lists.linux.dev
Sent: Monday, January 9, 2023 20:30
Subject: Re: "failed to allocate main tcb" error
---8<----
We are leaking the main tcb on the shared heap. That heap - and I also
regularly forget this - is not tracked by the kernel but purely in
userspace. Enable the registry, and you can watch
/var/run/xenomai/root/test/system/heaps going up with every secondary
call. This seems to be needed because the normal pthread-key cleanup
handler is not called on main exits:
diff --git a/lib/copperplate/threadobj.c b/lib/copperplate/threadobj.c
index f4588a17a8..56d0f1b231 100644
--- a/lib/copperplate/threadobj.c
+++ b/lib/copperplate/threadobj.c
@@ -1770,6 +1770,11 @@ int threadobj_set_schedprio(struct threadobj *thobj, int priority)
 	return threadobj_set_schedparam(thobj, policy, &param_ex);
 }
 
+static void main_exit(void)
+{
+	threadobj_free(threadobj_current());
+}
+
 static inline int main_overlay(void)
 {
 	struct threadobj_init_data idata;
@@ -1806,6 +1811,8 @@ static inline int main_overlay(void)
 	threadobj_prologue(tcb, NULL);
 	pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, NULL);
 
+	atexit(main_exit);
+
 	return 0;
 }
Jan
Thank you very much Jan.
Regards
--
Mauro S.
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2023-01-09 21:16 UTC | newest]
Thread overview: 15+ messages
2022-12-21 9:05 "failed to allocate main tcb" error Mauro S.
2022-12-21 14:29 ` Jan Kiszka
2022-12-21 14:45 ` Mauro S.
2022-12-21 14:48 ` Jan Kiszka
2022-12-21 14:57 ` Mauro S.
2022-12-21 15:11 ` Jan Kiszka
2022-12-21 15:25 ` Mauro S.
2022-12-21 16:04 ` Jan Kiszka
2022-12-22 13:57 ` Mauro S.
2022-12-22 14:11 ` Jan Kiszka
2022-12-22 15:42 ` Mauro S.
2023-01-05 20:25 ` Mauro S.
2023-01-09 13:00 ` Jan Kiszka
2023-01-09 19:30 ` Jan Kiszka
2023-01-09 21:13 ` R: " Mauro