* [Xenomai] [xenomai-forge] timer-internal consumes a lot of cpu
From: Ronny Meeus @ 2014-12-07 16:26 UTC
  To: xenomai

Hello,

We are using the xenomai-forge implementation.
From time to time we see an issue where the timer-internal thread
consumes a complete core. It shows up when we send broadcast traffic
that needs to be handled by the Linux kernel (ARP).

The priority of the kernel thread handling the packets sits between the
priorities of the timer-internal thread and the application threads. All
threads run on the same core.
If the priority of the timer-internal thread is lowered below that of
the kernel thread, the load disappears immediately.
So it looks like there is some busy polling on a common resource that is
currently held by the application thread running at the lowest prio.

I see that the timer lock being used is a mutex with priority
inheritance, so I would expect the prio of the application thread to be
raised as soon as the timer-internal thread tries to obtain the mutex.

It might be that it has nothing to do with the mutex; this is just my guess.

Has anybody seen similar issues before?

Best regards,
Ronny

* Re: [Xenomai] [xenomai-forge] timer-internal consumes a lot of cpu
From: Ronny Meeus @ 2014-12-16 17:36 UTC
  To: xenomai

On Sun, Dec 7, 2014 at 5:26 PM, Ronny Meeus <ronny.meeus@gmail.com> wrote:
> I see that the timer lock being used is a mutex with priority
> inheritance, so I would expect the prio of the application thread to be
> raised as soon as the timer-internal thread tries to obtain the mutex.

After investigating this issue in more detail, I have the impression
that it has nothing to do with the mutex used to protect the timer, but
with the condition variable used to implement the pSOS event interface.

I found references on the web that explain an issue with the internal
mutex used inside the POSIX library to implement a condition variable.
See:
https://bugzilla.redhat.com/show_bug.cgi?id=438484
http://marc.info/?t=134688711000002&r=1&w=2

If this is indeed true, it means that the usage of condition variables
is not safe at all (from a priority-inheritance point of view).

Has anybody experienced issues like this before?
Are there any solutions / workarounds available (for example, avoiding
condition variables and using PI mutexes instead)?
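To make the suspected pattern concrete, it boils down to pairing a PI
mutex with a condvar, roughly like this (an untested sketch with made-up
names, not the actual copperplate code):

#include <pthread.h>

static pthread_mutex_t lock;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int event_posted;

static void init_sync(void)
{
        pthread_mutexattr_t mattr;

        pthread_mutexattr_init(&mattr);
        /* PI covers the user-visible mutex only. */
        pthread_mutexattr_setprotocol(&mattr, PTHREAD_PRIO_INHERIT);
        pthread_mutex_init(&lock, &mattr);
        pthread_mutexattr_destroy(&mattr);
}

static void wait_event(void)
{
        pthread_mutex_lock(&lock);
        while (!event_posted)
                /* glibc takes an internal, non-PI lock in here, which
                   is what the reports above describe. */
                pthread_cond_wait(&cond, &lock);
        event_posted = 0;
        pthread_mutex_unlock(&lock);
}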

* Re: [Xenomai] [xenomai-forge] timer-internal consumes a lot of cpu
From: Philippe Gerum @ 2014-12-16 17:58 UTC
  To: Ronny Meeus, xenomai

On 12/16/2014 06:36 PM, Ronny Meeus wrote:
> If this is indeed true, it means that the usage of condition variables
> is not safe at all (from a priority-inheritance point of view).

Yes, condvars are known not to work nicely with PI mutexes in glibc.

> 
> Has anybody experienced issues like this before?
> Are there any solutions / workarounds available (for example, avoiding
> condition variables and using PI mutexes instead)?

Disabling PI for mutexes is the only option.
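For illustration, "disabling PI" just means initializing the mutex with
the default protocol; a minimal sketch (untested):

#include <pthread.h>

static pthread_mutex_t lock;

static void init_lock_without_pi(void)
{
        pthread_mutexattr_t mattr;

        pthread_mutexattr_init(&mattr);
        /* PTHREAD_PRIO_NONE is the default: no priority inheritance,
           which sidesteps the glibc condvar problem at the cost of
           possible inversion on the mutex itself. */
        pthread_mutexattr_setprotocol(&mattr, PTHREAD_PRIO_NONE);
        pthread_mutex_init(&lock, &mattr);
        pthread_mutexattr_destroy(&mattr);
}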

-- 
Philippe.


* Re: [Xenomai] [xenomai-forge] timer-internal consumes a lot of cpu
From: Ronny Meeus @ 2014-12-16 19:41 UTC
  To: Philippe Gerum; +Cc: xenomai

On Tue, Dec 16, 2014 at 6:58 PM, Philippe Gerum <rpm@xenomai.org> wrote:
> Yes, condvars are known not to work nicely with PI mutexes in glibc.

Philippe,
I do not understand.

We just use the pSOS interface of xenomai-forge, which internally uses
condition variables.
Does this mean that we cannot use the pSOS interface with glibc?

If the above statement is correct, which libc should we use to make it
work?

* Re: [Xenomai] [xenomai-forge] timer-internal consumes a lot of cpu
From: Philippe Gerum @ 2014-12-16 20:07 UTC
  To: Ronny Meeus; +Cc: xenomai

On 12/16/2014 08:41 PM, Ronny Meeus wrote:
> Does this mean that we cannot use the pSOS interface with glibc?
>
> If the above statement is correct, which libc should we use to make it
> work?

A release of glibc that fixes this issue. I must admit that I have not
tracked this problem lately. Jan likely knows better here.

-- 
Philippe.


* Re: [Xenomai] [xenomai-forge] timer-internal consumes a lot of cpu
From: Ronny Meeus @ 2014-12-18  8:00 UTC
  To: Philippe Gerum, jan.kiszka; +Cc: xenomai

>
> A release of glibc that fixes this issue. I must admit that I have not
> tracked this problem lately. Jan likely knows better here.
>

Jan,

which glibc version solves the priority-inversion issue with condition
variables? I already tried glibc 2.18, but the issue is still there.

Regards,
Ronny


* Re: [Xenomai] [xenomai-forge] timer-internal consumes a lot of cpu
From: Jan Kiszka @ 2014-12-18  9:04 UTC
  To: Ronny Meeus, Philippe Gerum; +Cc: xenomai

On 2014-12-18 09:00, Ronny Meeus wrote:
> which glibc version solves the priority-inversion issue with condition
> variables? I already tried glibc 2.18, but the issue is still there.

The bug is still not fixed, and discussion stalled again, see
https://sourceware.org/bugzilla/show_bug.cgi?id=11588

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


* Re: [Xenomai] [xenomai-forge] timer-internal consumes a lot of cpu
From: Ronny Meeus @ 2014-12-18 12:28 UTC
  To: Jan Kiszka; +Cc: xenomai

On Thu, Dec 18, 2014 at 10:04 AM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> The bug is still not fixed, and discussion stalled again, see
> https://sourceware.org/bugzilla/show_bug.cgi?id=11588

Philippe, Jan,

as long as this issue is not fixed in glibc, it is in my opinion not OK
to use condition variables in application space for real-time
applications.

Since the pSOS skin uses condition variables to implement events and
real-time priority threads to implement pSOS tasks, it is by definition
broken and not usable for any real application.

For example, the internal timer server, which sends events to
lower-priority tasks, will be blocked until all middle-priority tasks
have completed. We have seen massive load consumed by the internal timer
server due to this.
What happens is that the timer thread is blocked on the mutex currently
owned by a thread running at normal (lower) priority. Every time a Linux
timer expires, a signal is sent to the timer server, which wakes up the
task and returns to the C library, which re-invokes the futex call. In
case a high number of timers is used, the overhead of this can be large.
Since the timer server is running at the highest priority (-100), we see
all kinds of strange crashes.

The same priority inversion holds for our own drivers, since they run at
high prio as well.

Has the replacement of these condition variables by some other POSIX
mechanism (like mutexes) ever been considered?

Ronny


* Re: [Xenomai] [xenomai-forge] timer-internal consumes a lot of cpu
From: Jan Kiszka @ 2014-12-18 13:35 UTC
  To: Ronny Meeus; +Cc: xenomai

On 2014-12-18 13:28, Ronny Meeus wrote:
> as long as this issue is not fixed in glibc, it is in my opinion not OK
> to use condition variables in application space for real-time
> applications.

...when combining them with PI mutexes, right. For real-time QEMU/KVM, I
worked around this by using prio-ceiling mutexes. That is far from
optimal, performance-wise, but at least you avoid random lockups or the
other side effects of that bug.
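Roughly like this (an untested sketch; the ceiling value 99 is only
illustrative):

#include <pthread.h>

static pthread_mutex_t lock;

static void init_lock_with_ceiling(void)
{
        pthread_mutexattr_t mattr;

        pthread_mutexattr_init(&mattr);
        /* Every locker runs at the ceiling priority while holding the
           mutex, so PI is never needed. */
        pthread_mutexattr_setprotocol(&mattr, PTHREAD_PRIO_PROTECT);
        /* Picking the ceiling is the hard part; 99 is made up here. */
        pthread_mutexattr_setprioceiling(&mattr, 99);
        pthread_mutex_init(&lock, &mattr);
        pthread_mutexattr_destroy(&mattr);
}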

> 
> Has the replacement of these condition variables by some other POSIX
> mechanism (like mutexes) ever been considered?

Sometimes it is possible to design an algorithm that uses a semaphore
for event signaling instead. That doesn't work for all condvar
scenarios, though.
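Something along these lines (an untested sketch with made-up names):

#include <semaphore.h>

static sem_t event;

/* Call once at startup: process-private semaphore, initial count 0. */
static void init_event(void)
{
        sem_init(&event, 0, 0);
}

/* Signaler side: no mutex is involved, so there is no internal lock on
   which a priority inversion could build up. */
static void send_event(void)
{
        sem_post(&event);
}

/* Waiter side: blocks until an event is posted. */
static void wait_event(void)
{
        sem_wait(&event);
}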

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


* Re: [Xenomai] [xenomai-forge] timer-internal consumes a lot of cpu
From: Gilles Chanteperdrix @ 2014-12-18 14:12 UTC
  To: Ronny Meeus; +Cc: Jan Kiszka, xenomai

On Thu, Dec 18, 2014 at 01:28:42PM +0100, Ronny Meeus wrote:
> as long as this issue is not fixed in glibc, it is in my opinion not OK
> to use condition variables in application space for real-time
> applications.

I believe Xenomai Cobalt does not suffer from the same issue; condition
variables should work fine with priority inheritance there.

Otherwise, have you tried some alternate libc, such as musl:
http://www.musl-libc.org/

The following blog:
http://ewontfix.com/

seems to show that the musl maintainers try to report glibc bugs and
avoid them in their implementation.

I have not tried Xenomai with musl at all, so maybe it does not even
compile. But just compiling a testcase for the condvar issue with that
libc would help determine whether it has the same issue or not.
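Such a testcase could be as small as this (an untested sketch; the error
reporting is an assumption):

#include <pthread.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
        pthread_mutexattr_t mattr;
        pthread_mutex_t lock;
        int ret;

        pthread_mutexattr_init(&mattr);
        ret = pthread_mutexattr_setprotocol(&mattr, PTHREAD_PRIO_INHERIT);
        if (ret) {
                /* A libc without PI support is expected to bail out here. */
                printf("pthread_mutexattr_setprotocol: %s\n", strerror(ret));
                return 1;
        }
        pthread_mutex_init(&lock, &mattr);
        /* A full test would add high/middle/low-priority SCHED_FIFO
           threads around pthread_cond_wait()/pthread_cond_signal() and
           check that the waiter is not starved. */
        return 0;
}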

-- 
					    Gilles.


* Re: [Xenomai] [xenomai-forge] timer-internal consumes a lot of cpu
From: Ronny Meeus @ 2014-12-18 14:17 UTC
  To: Jan Kiszka; +Cc: xenomai

>> as long as this issue is not fixed in glibc, it is in my opinion not OK
>> to use condition variables in application space for real-time
>> applications.
>
> ...when combining them with PI mutexes, right. For real-time QEMU/KVM, I
> worked around this by using prio-ceiling mutexes. That is far from
> optimal, performance-wise, but at least you avoid random lockups or the
> other side effects of that bug.
>
>

Jan, to be clear:

We do not use PI mutexes in our application; we just use pSOS
primitives. Internally in the copperplate lib (see
copperplate/syncobj.c), condition variables are used in combination with
PI mutexes:

static inline
int monitor_wait_grant(struct syncobj *sobj,
                       struct threadobj *current,
                       const struct timespec *timeout)
{
        /* Wait on the per-thread condvar with the syncobj's PI mutex. */
        if (timeout)
                return -pthread_cond_timedwait(&current->core.grant_sync,
                                               &sobj->core.lock, timeout);

        return -pthread_cond_wait(&current->core.grant_sync, &sobj->core.lock);
}

where sobj->core.lock is a mutex with PI:

pthread_mutexattr_init(&mattr);
pthread_mutexattr_settype(&mattr, mutex_type_attribute);
pthread_mutexattr_setprotocol(&mattr, PTHREAD_PRIO_INHERIT);
ret = __bt(-pthread_mutexattr_setpshared(&mattr, mutex_scope_attribute));


Ronny


* Re: [Xenomai] [xenomai-forge] timer-internal consumes a lot of cpu
From: Jan Kiszka @ 2014-12-18 14:58 UTC
  To: Gilles Chanteperdrix, Ronny Meeus; +Cc: xenomai

On 2014-12-18 15:12, Gilles Chanteperdrix wrote:
> I believe Xenomai Cobalt does not suffer from the same issue; condition
> variables should work fine with priority inheritance there.

Yes, this is a Mercury-only issue. Cobalt is fine, as its own
implementation of POSIX mutexes and condvars is correct in this regard.

> 
> Otherwise, have you tried some alternate libc, such as musl:
> http://www.musl-libc.org/

Well, as with many of those "light-weight" re-implementations, there are
"small" issues with the bits required for real-time:

http://git.musl-libc.org/cgit/musl/tree/src/thread/pthread_mutexattr_setprotocol.c

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


* Re: [Xenomai] [xenomai-forge] timer-internal consumes a lot of cpu
From: Gilles Chanteperdrix @ 2014-12-18 15:04 UTC
  To: Jan Kiszka; +Cc: xenomai

On Thu, Dec 18, 2014 at 03:58:52PM +0100, Jan Kiszka wrote:
> Well, as with many of those "light-weight" re-implementations, there are
> "small" issues with the bits required for real-time:
>
> http://git.musl-libc.org/cgit/musl/tree/src/thread/pthread_mutexattr_setprotocol.c

On the other hand, no implementation at all, with a clear ENOTSUPP, is
better than a partial and buggy implementation that cannot be trusted
anyway.

-- 
					    Gilles.


* Re: [Xenomai] [xenomai-forge] timer-internal consumes a lot of cpu
From: Ronny Meeus @ 2014-12-18 15:25 UTC
  To: Gilles Chanteperdrix; +Cc: Jan Kiszka, xenomai

On Thu, Dec 18, 2014 at 4:04 PM, Gilles Chanteperdrix
<gilles.chanteperdrix@xenomai.org> wrote:
> On the other hand, no implementation at all, with a clear ENOTSUPP, is
> better than a partial and buggy implementation that cannot be trusted
> anyway.

Gilles, I agree.

In the meantime I tried it already.

This is indeed the trace I get when running my test application with musl:
# ./cond_test_arm
pthread_mutexattr_setprotocol: Not supported

Cobalt is not an option for us either, since in that case all Linux
applications will run at low priority. On top of that, we also get a
huge priority inversion each time a Linux system call is made.

Do we have other options to fix forge?

Ronny


* Re: [Xenomai] [xenomai-forge] timer-internal consumes a lot of cpu
From: Gilles Chanteperdrix @ 2014-12-18 15:30 UTC
  To: Ronny Meeus; +Cc: Jan Kiszka, xenomai

On Thu, Dec 18, 2014 at 04:25:40PM +0100, Ronny Meeus wrote:
> Cobalt is not an option for us either, since in that case all Linux
> applications will run at low priority. On top of that, we also get a
> huge priority inversion each time a Linux system call is made.
> 
> Do we have other options to fix forge?

Well, three options have been proposed, if I followed this thread
correctly:
- stop using priority inheritance for these internal mutexes, at the
risk of creating priority inversions
- switch to priority ceiling (but what will be the ceiling? 99?)
- use Cobalt.

-- 
					    Gilles.


* Re: [Xenomai] [xenomai-forge] timer-internal consumes a lot of cpu
From: Jan Kiszka @ 2014-12-18 15:35 UTC
  To: Gilles Chanteperdrix, Ronny Meeus; +Cc: xenomai

On 2014-12-18 16:30, Gilles Chanteperdrix wrote:
> Well, three options have been proposed, if I followed this thread
> correctly:
> - stop using priority inheritance for these internal mutexes, at the
> risk of creating priority inversions
> - switch to priority ceiling (but what will be the ceiling? 99?)

Likely - part of the reason why that is not a general solution.

> - use Cobalt.

- use a patched glibc
- fix upstream glibc - non-trivial, as history shows, but long overdue

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Xenomai] [xenomai-forge] timer-internal consumes a lot of cpu
From: Ronny Meeus @ 2014-12-18 15:49 UTC
  To: Jan Kiszka; +Cc: xenomai

On Thu, Dec 18, 2014 at 4:35 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> - use a patched glibc

I tried to apply the patch on glibc 2.20, but it looks like the issue is
still present. Even if the patch did solve it, we would force all users
of forge to work with a patched version of glibc and to move to 2.20.
This might not always be easy.

> - fix upstream glibc - non-trivial, as history shows, but long overdue

Another option is to implement the priority boost in the copperplate
lib: before the signal is done, raise the priority of the waiting task
to that of the task signaling the condition variable (in case the prio
of the waiting task is lower). Once the thread is unblocked, restore the
original priority (from the thread that received the signal).
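Roughly (an untested sketch with made-up names, not the actual
copperplate API):

#include <pthread.h>
#include <sched.h>

static void signal_with_boost(pthread_cond_t *cond, pthread_t waiter,
                              int waiter_prio, int self_prio)
{
        struct sched_param p;

        if (waiter_prio < self_prio) {
                /* Boost the waiter so it can re-acquire the condvar's
                   internal lock without being starved. */
                p.sched_priority = self_prio;
                pthread_setschedparam(waiter, SCHED_FIFO, &p);
        }
        pthread_cond_signal(cond);
        /* The woken thread would restore its own original priority
           after returning from pthread_cond_wait(). */
}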

Ronny


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Xenomai] [xenomai-forge] timer-internal consumes a lot of cpu
From: Jan Kiszka @ 2014-12-18 16:06 UTC
  To: Ronny Meeus; +Cc: xenomai

On 2014-12-18 16:49, Ronny Meeus wrote:
> I tried to apply the patch on glibc 2.20, but it looks like the issue
> is still present.

You will need to extend the existing condvar users and tell glibc that
those vars will be used in combination with PI mutexes
(pthread_condattr_setprotocol_np).
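Presumably along these lines (pthread_condattr_setprotocol_np is not an
upstream interface; the exact signature is assumed from the patch
discussion):

pthread_condattr_t cattr;
pthread_cond_t cond;

pthread_condattr_init(&cattr);
/* Non-upstream extension from the proposed glibc patch. */
pthread_condattr_setprotocol_np(&cattr, PTHREAD_PRIO_INHERIT);
pthread_cond_init(&cond, &cattr);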

> Even if the patch did solve it, we would force all users of forge to
> work with a patched version of glibc and to move to 2.20. This might
> not always be easy.

Yes, that is unhandy.

> 
> Another option is to implement the priority boost in the copperplate
> lib: before the signal is done, raise the priority of the waiting task
> to that of the task signaling the condition variable. Once the thread
> is unblocked, restore the original priority.

That implies you know the prios of the involved threads. Doesn't sound
like a generic solution either.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Xenomai] [xenomai-forge] timer-internal consumes a lot of cpu
From: Ronny Meeus @ 2014-12-18 17:21 UTC
  To: Jan Kiszka; +Cc: xenomai

>
> That implies you know the prios of the involved threads. Doesn't sound
> like a generic solution either.
>
> Jan
>

I think Xenomai knows the involved threads.

Philippe,
is the list of waiting threads not kept in the thread object?

Ronny


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Xenomai] [xenomai-forge] timer-internal consumes a lot of cpu
From: Ronny Meeus @ 2014-12-23 17:36 UTC
  To: Philippe Gerum; +Cc: xenomai

On Thu, Dec 18, 2014 at 6:21 PM, Ronny Meeus <ronny.meeus@gmail.com> wrote:
> I think Xenomai knows the involved threads.
>
> Philippe,
> is the list of waiting threads not kept in the thread object?

Philippe,
any feedback on the discussion from your side?

Ronny


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Xenomai] [xenomai-forge] timer-internal consumes a lot of cpu
From: Philippe Gerum @ 2015-01-23 16:45 UTC
  To: Ronny Meeus; +Cc: xenomai

On 12/23/2014 06:36 PM, Ronny Meeus wrote:
> Philippe,
> any feedback on the discussion from your side?

Since designing on top of a blatant glibc bug does not make any sense,
the only reasonable option is to work around this issue for Mercury
specifically. Cobalt implements PI-aware condvars properly, with an
additional optimization that makes them pretty efficient, so I don't see
the point in switching to a sub-optimal option only to fix a long
overdue glibc issue.

I pushed a workaround following a brute-force approach to the -next
branch, which applies a static temporary boost to the caller about to
signal or wait for a PI-enabled condvar. I won't bother with dynamic
tracking of priorities for this issue; that would introduce nasty races
and would only work for the syncobj abstraction. Besides, if the thread
signaling the condvar is the timer manager, the boost would always take
place anyway.

Hopefully this patch helps fix the issue on your end.
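In outline, the brute-force boost amounts to something like this (an
untested sketch; the real code is in the -next branch and these names
are made up):

#include <pthread.h>
#include <sched.h>

static void boost_and_signal(pthread_cond_t *cond)
{
        struct sched_param old, top;
        int policy;

        pthread_getschedparam(pthread_self(), &policy, &old);
        /* Static temporary boost: run at a fixed top priority across
           the condvar operation... */
        top.sched_priority = sched_get_priority_max(SCHED_FIFO);
        pthread_setschedparam(pthread_self(), SCHED_FIFO, &top);
        pthread_cond_signal(cond);
        /* ...then drop back to the caller's original priority. */
        pthread_setschedparam(pthread_self(), policy, &old);
}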

-- 
Philippe.

