* [Xenomai] copperplate/registry daemon connection failure
@ 2016-12-13  7:23 Ronny Meeus
  2016-12-27  9:56 ` Philippe Gerum
  0 siblings, 1 reply; 8+ messages in thread
From: Ronny Meeus @ 2016-12-13  7:23 UTC (permalink / raw)
  To: xenomai

Hello

Context: we use the Mercury core (xenomai-3.0.3).

in commit 880b3acbd876a65f8fbe8c27b09762b06c06e846:
copperplate/registry: force SCHED_OTHER on helper threads
of
Sun Jul 26 12:37:15 2015 +0200

The scheduling class of the registry threads is forced to
SCHED_OTHER  at priority 0.
This change is causing issues in our use case since our system
is fully loaded during init.

What I observe is that the application is not able to connect to
the registry daemon and exits with the following error:
   0"988.361| WARNING: [main] cannot connect to registry daemon
   0"989.141| WARNING: [main] setup call copperplate failed
   0"989.369| BUG in xenomai_init(): [main] initialization failed, EAGAIN

As a test I made a change to the code and started the registry threads
in RR mode at a high priority and then the issue is not observed.
In our system all threads (application and kernel ones) are running
in the real-time domain so apps running in the OTHER domain will
have very little CPU left to consume ...
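
For illustration, a minimal sketch of the change I tested (the policy and
priority below are the only difference from the shipped code; the priority
value of 80 is arbitrary, just "high enough" for our load):

        struct sched_param schedp;

        /* Test change: keep the registry helper in the RT class instead
         * of forcing SCHED_OTHER at priority 0. */
        schedp.sched_priority = 80;
        __STD(pthread_setschedparam(pthread_self(), SCHED_RR, &schedp));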

For me it is not clear how the synchronization between the daemon
and the application is happening. The daemon is setting up a
Unix domain socket to which the client connects. But how does the
application know that the daemon has finished the creation of
the socket?

Thanks
Ronny



* Re: [Xenomai] copperplate/registry daemon connection failure
  2016-12-13  7:23 [Xenomai] copperplate/registry daemon connection failure Ronny Meeus
@ 2016-12-27  9:56 ` Philippe Gerum
  2017-01-06  9:21   ` Ronny Meeus
  0 siblings, 1 reply; 8+ messages in thread
From: Philippe Gerum @ 2016-12-27  9:56 UTC (permalink / raw)
  To: Ronny Meeus, xenomai

On 12/13/2016 08:23 AM, Ronny Meeus wrote:
> Hello
> 
> Context: we use the Mercury core (xenomai-3.0.3).
> 
> in commit 880b3acbd876a65f8fbe8c27b09762b06c06e846:
> copperplate/registry: force SCHED_OTHER on helper threads
> of
> Sun Jul 26 12:37:15 2015 +0200
> 
> The scheduling class of the registry threads is forced to
> SCHED_OTHER  at priority 0.
> This change is causing issues in our use case since our system
> is fully loaded during init.
> 
> What I observe is that the application is not able to connect to
> the registry daemon and exits with the following error:
>    0"988.361| WARNING: [main] cannot connect to registry daemon
>    0"989.141| WARNING: [main] setup call copperplate failed
>    0"989.369| BUG in xenomai_init(): [main] initialization failed, EAGAIN
> 
> As a test I made a change to the code and started the registry threads
> in RR mode at a high priority and then the issue is not observed.
> In our system all threads (application and kernel ones) are running
> in the real-time domain so apps running in the OTHER domain will
> have very little CPU left to consume ...
> 

libfuse assumes SIGCHLD is not ignored when calling fuse_main(); this is
the root of the issue. Thread priority is not involved in this bug;
tweaking it may only make the failure less likely, but does not fix it (e.g.
on SMP). The core issue is guaranteeing that a forkee exiting early
stays in a zombie state until the parent can issue waitpid() to collect
its status, which add_mount() from libfuse requires.

Some details are given here:
http://git.xenomai.org/xenomai-3.git/commit/?h=stable-3.0.x&id=4364bdd358e893b1f4f7f644a35191b3fc7f4180
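
As a side note, here is a minimal standalone illustration (plain POSIX, not
Xenomai or libfuse code) of the behavior at stake: once SIGCHLD is ignored,
an early-exiting child is auto-reaped instead of staying a zombie, so the
parent's waitpid() finds nothing and typically fails with ECHILD, which is
what breaks the fork/waitpid sequence add_mount() relies on:

        #include <signal.h>
        #include <stdio.h>
        #include <string.h>
        #include <errno.h>
        #include <unistd.h>
        #include <sys/wait.h>

        int main(void)
        {
                int status;
                pid_t pid;

                /* Ignoring SIGCHLD makes the kernel auto-reap children:
                 * they no longer linger as zombies for waitpid(). */
                signal(SIGCHLD, SIG_IGN);

                pid = fork();
                if (pid == 0)
                        _exit(0);       /* child exits immediately */

                sleep(1);               /* child has exited and been reaped */

                if (waitpid(pid, &status, 0) < 0)
                        printf("waitpid: %s\n", strerror(errno));  /* ECHILD */
                else
                        printf("collected status %d\n", status);

                return 0;
        }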

> For me it is not clear how the synchronization between the daemon
> and the application is happening. The daemon is setting up a
> Unix domain socket to which the client connects. But how does the
> application know that the daemon has finished the creation of
> the socket?
> 

It does not have to. Check the combined logic in bind_socket() and
connect_regd().
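
In rough form, that combined logic boils down to the retry loop below. This
is NOT the actual copperplate source, just a simplified sketch of the idea,
with spawn_daemon() standing in for the real fork/exec of sysregd; the
client never waits for an explicit handshake, it just retries the connect
and (re)spawns the daemon in between:

        /* Simplified sketch of the connect_regd() retry logic (not the
         * real implementation). */
        static int connect_regd_sketch(const struct sockaddr_un *sun, socklen_t len)
        {
                int s, retries;

                for (retries = 0; retries < 3; retries++) {
                        s = socket(AF_UNIX, SOCK_STREAM, 0);
                        if (connect(s, (const struct sockaddr *)sun, len) == 0)
                                return s;       /* daemon is up and listening */

                        close(s);
                        spawn_daemon();         /* placeholder: fork/exec sysregd */
                        while (usleep(200000) > 0)
                                ;               /* give it at least 200 ms to bind() */
                }

                return -EAGAIN; /* surfaces as "cannot connect to registry daemon" */
        }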

-- 
Philippe.



* Re: [Xenomai] copperplate/registry daemon connection failure
  2016-12-27  9:56 ` Philippe Gerum
@ 2017-01-06  9:21   ` Ronny Meeus
  2017-01-06  9:29     ` Philippe Gerum
  0 siblings, 1 reply; 8+ messages in thread
From: Ronny Meeus @ 2017-01-06  9:21 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai

On Tue, Dec 27, 2016 at 10:56 AM, Philippe Gerum <rpm@xenomai.org> wrote:
> On 12/13/2016 08:23 AM, Ronny Meeus wrote:
>> Hello
>>
>> Context: we use the Mercury core (xenomai-3.0.3).
>>
>> in commit 880b3acbd876a65f8fbe8c27b09762b06c06e846:
>> copperplate/registry: force SCHED_OTHER on helper threads
>> of
>> Sun Jul 26 12:37:15 2015 +0200
>>
>> The scheduling class of the registry threads is forced to
>> SCHED_OTHER  at priority 0.
>> This change is causing issues in our use case since our system
>> is fully loaded during init.
>>
>> What I observe is that the application is not able to connect to
>> the registry daemon and exits with the following error:
>>    0"988.361| WARNING: [main] cannot connect to registry daemon
>>    0"989.141| WARNING: [main] setup call copperplate failed
>>    0"989.369| BUG in xenomai_init(): [main] initialization failed, EAGAIN
>>
>> As a test I made a change to the code and started the registry threads
>> in RR mode at a high priority and then the issue is not observed.
>> In our system all threads (application and kernel ones) are running
>> in the real-time domain so apps running in the OTHER domain will
>> have very little CPU left to consume ...
>>
>
> libfuse assumes SIGCHLD is not ignored when calling fuse_main(); this is
> the root of the issue. Thread priority is not involved in this bug;
> tweaking it may only make the failure less likely, but does not fix it (e.g.
> on SMP). The core issue is guaranteeing that a forkee exiting early
> stays in a zombie state until the parent can issue waitpid() to collect
> its status, which add_mount() from libfuse requires.
>
> Some details are given here:
> http://git.xenomai.org/xenomai-3.git/commit/?h=stable-3.0.x&id=4364bdd358e893b1f4f7f644a35191b3fc7f4180
>

I did a test with this patch but it does not resolve the issue.
I see exactly the same error message as mentioned above.
Note that I added traces to the application code (marked with connect_regd)
and in sysregd (marked with <pid> regd).

isam_app_wrapper: Starting background application /usr/bin/taskset 1
/bin/setarch linux32 -R   /isam/slot_1101/run/isam_nt_app
mLogicalSlotId=4353 noSMAS
app_config_file=/isam/slot_1101/config/app_config
wdog_kick_file=isam_nt_app_slot_1101
connect_regd (try 0): create socket
connect_regd (try 0): connect
connect_regd (try 0): spawn daemon
2525 regd: before changing to SCHED_OTHER
connect_regd (try 1): create socket
connect_regd (try 1): connect
connect_regd (try 1): spawn daemon
2547 regd: before changing to SCHED_OTHER
connect_regd (try 2): create socket
connect_regd (try 2): connect
connect_regd (try 2): spawn daemon
2569 regd: before changing to SCHED_OTHER
   0"867.727| WARNING: [main] cannot connect to registry daemon
   0"868.518| WARNING: [main] setup call copperplate failed
   0"868.753| BUG in xenomai_init(): [main] initialization failed, EAGAIN
isam_app_wrapper: /isam/slot_1101/run/isam_nt_app has stopped!
isam_app_wrapper: System wide reset triggered due to escalation!
ISAM application /isam/slot_1101/run/isam_nt_app exited.
isam_app_wrapper: Not rebooting (config flag 'reboot' is 0)...
2525 regd: after changing to SCHED_OTHER
2547 regd: after changing to SCHED_OTHER
2569 regd: after changing to SCHED_OTHER

From the traces you can clearly see that the daemon is started 3 times
and that it is actually started immediately, but once the scheduling class
is changed, it gets scheduled out and has to wait until the application
has given up and triggered a reboot because of the initialization error.
Only when the application has stopped are the daemon threads allowed to
run and continue, printing the traces I placed directly before and after
the call that changes the scheduling class:

+       printf("%d regd: before changing to SCHED_OTHER\n", tmp);
+       fflush(NULL);
+
        /* Force SCHED_OTHER. */
        schedp.sched_priority = 0;
        __STD(pthread_setschedparam(pthread_self(), SCHED_OTHER, &schedp));

+       printf("%d regd: after changing to SCHED_OTHER\n", tmp);
+       fflush(NULL);


>> For me it is not clear how the synchronization between the daemon
>> and the application is happening. The daemon is setting up a
>> Unix domain socket to which the client connects. But how does the
>> application know that the daemon has finished the creation of
>> the socket?
>>
>
> It does not have to. Check the combined logic in bind_socket() and
> connect_regd().

That logic I had seen before.
As I understand the code it just tries 3 times to connect to the daemon
and if not successful, it just tries to start it again and reconnects ...
I find it a strange logic just to try something 3 times and hope it
will succeed.
In our case the CPU is fully loaded with RT threads so my assumption is that
the daemon, running at nonRT prio will not be scheduled at all.
(also see the traces above that confirm my assumptions)

I would expect to see some kind of synchronization mechanism between the
daemon and the application.

Ronny

>
> --
> Philippe.



* Re: [Xenomai] copperplate/registry daemon connection failure
  2017-01-06  9:21   ` Ronny Meeus
@ 2017-01-06  9:29     ` Philippe Gerum
  2017-01-06  9:54       ` Ronny Meeus
  0 siblings, 1 reply; 8+ messages in thread
From: Philippe Gerum @ 2017-01-06  9:29 UTC (permalink / raw)
  To: Ronny Meeus; +Cc: xenomai

On 01/06/2017 10:21 AM, Ronny Meeus wrote:
> That logic I had seen before.
> As I understand the code it just tries 3 times to connect to the daemon
> and if not successful, it just tries to start it again and reconnects ...
> I find it a strange logic just to try something 3 times and hope it
> will succeed.
> In our case the CPU is fully loaded with RT threads so my assumption is that
> the daemon, running at nonRT prio will not be scheduled at all.
> (also see the traces above that confirm my assumptions)
> 
> I would expect to see some kind of synchronization mechanism between the
> daemon and the application.
> 

The point is that such init code is aimed at running early, prior to any
Xenomai application code, this is cold bootstrap code. The fact that
your app can spawn threads overconsuming the CPU earlier than Xenomai's
basic init code is a problem for Xenomai.

-- 
Philippe.



* Re: [Xenomai] copperplate/registry daemon connection failure
  2017-01-06  9:29     ` Philippe Gerum
@ 2017-01-06  9:54       ` Ronny Meeus
  2017-01-06 11:00         ` Philippe Gerum
  0 siblings, 1 reply; 8+ messages in thread
From: Ronny Meeus @ 2017-01-06  9:54 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai

On Fri, Jan 6, 2017 at 10:29 AM, Philippe Gerum <rpm@xenomai.org> wrote:
> On 01/06/2017 10:21 AM, Ronny Meeus wrote:
>> That logic I had seen before.
>> As I understand the code it just tries 3 times to connect to the daemon
>> and if not successful, it just tries to start it again and reconnects ...
>> I find it a strange logic just to try something 3 times and hope it
>> will succeed.
>> In our case the CPU is fully loaded with RT threads so my assumption is that
>> the daemon, running at nonRT prio will not be scheduled at all.
>> (also see the traces above that confirm my assumptions)
>>
>> I would expect to see some kind of synchronization mechanism between the
>> daemon and the application.
>>
>
> The point is that such init code is aimed at running early, prior to any
> Xenomai application code, this is cold bootstrap code. The fact that
> your app can spawn threads overconsuming the CPU earlier than Xenomai's
> basic init code is a problem for Xenomai.
>

Philippe,

on our system we have a lot of Xenomai applications running, it can be up to 10
or more. So it is impossible to guarantee that there will be CPU power available
at the moment Xenomai init is called.
Next to the application code also the Linux kernel threads can consume a lot of
CPU power (especially during init).

Xenomai applications can be started during init but also at runtime, so it is
impossible to make assumptions about the availability of CPU power.

Ronny

> --
> Philippe.



* Re: [Xenomai] copperplate/registry daemon connection failure
  2017-01-06  9:54       ` Ronny Meeus
@ 2017-01-06 11:00         ` Philippe Gerum
  2017-01-06 13:16           ` Ronny Meeus
  0 siblings, 1 reply; 8+ messages in thread
From: Philippe Gerum @ 2017-01-06 11:00 UTC (permalink / raw)
  To: Ronny Meeus; +Cc: xenomai

On 01/06/2017 10:54 AM, Ronny Meeus wrote:
> On Fri, Jan 6, 2017 at 10:29 AM, Philippe Gerum <rpm@xenomai.org> wrote:
>> On 01/06/2017 10:21 AM, Ronny Meeus wrote:
>>> That logic I had seen before.
>>> As I understand the code it just tries 3 times to connect to the daemon
>>> and if not successful, it just tries to start it again and reconnects ...
>>> I find it a strange logic just to try something 3 times and hope it
>>> will succeed.
>>> In our case the CPU is fully loaded with RT threads so my assumption is that
>>> the daemon, running at nonRT prio will not be scheduled at all.
>>> (also see the traces above that confirm my assumptions)
>>>
>>> I would expect to see some kind of synchronization mechanism between the
>>> daemon and the application.
>>>
>>
>> The point is that such init code is aimed at running early, prior to any
>> Xenomai application code, this is cold bootstrap code. The fact that
>> your app can spawn threads overconsuming the CPU earlier than Xenomai's
>> basic init code is a problem for Xenomai.
>>
> 
> Philippe,
> 
> on our system we have a lot of Xenomai applications running, it can be up to 10
> or more. So it is impossible to guarantee that there will be CPU power available
> at the moment Xenomai init is called.
> Next to the application code also the Linux kernel threads can consume a lot of
> CPU power (especially during init).
>

I don't see how your app could ever compete with drivers during the
kernel bootstrap phase, just because no application can run until user
mode is started, which is last in the process, by definition.

If referring to kernel helper threads overconsuming CPU during plain
runtime or soon after user mode is entered, maybe you should consider
determining why this happens, this does not look quite normal
(vendor-originated mmc driver with broken power mgmt, massive logging on
slow flash medium?). Maybe you did already, and I would be interested to
know about your findings.

> Xenomai applications can be started during init but also at runtime, so it is
> impossible to make assumptions about the availability of CPU power.
> 

You obviously do make assumptions about the CPU power, such as assuming
that your system can cope in a deterministic way with running distinct
or even unrelated set of CPU-hungry threads from multiple real-time apps
concurrently. Xenomai makes the assumption that the current CPU should
be able to process all of the pending regular (non-rt) activity within 3
seconds, which seems reasonable. We could make it 30, no issue with
that, but that would not address the real problem anyway.

Your point is about requiring Xenomai to work around a seemingly massive
overload condition in the regular Linux system when the app initializes,
hoping for the best. I don't think this is the way to go, this would
only paper over the core issue, with potentially nasty effects.

Typically, a consequence of raising the priority of registry threads to
address this issue would be to serve fuse-fs requests at high (rt)
priority, directly competing with other SCHED_RR/SCHED_FIFO threads in
the system, since you don't run Cobalt, and therefore the co-kernel
could not save your day in this case.

Therefore, anyone issuing "cat /var/run/xenomai/*/*" on a terminal would
actually compete with some real-time threads in your application(s),
possibly delaying them for an undefined amount of time. At any rate, our
fuse-fs threads that do string formatting to output human-readable
reports upon (interactive) request should not compete with real-time
threads, really.

Regarding the fact that your system cannot respond within 3 seconds to a
socket connection, you still have the option to start the daemon
separately, before the application is launched. Any showstopper with
that option?

-- 
Philippe.



* Re: [Xenomai] copperplate/registry daemon connection failure
  2017-01-06 11:00         ` Philippe Gerum
@ 2017-01-06 13:16           ` Ronny Meeus
  2017-01-08 12:02             ` Philippe Gerum
  0 siblings, 1 reply; 8+ messages in thread
From: Ronny Meeus @ 2017-01-06 13:16 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai

On Fri, Jan 6, 2017 at 12:00 PM, Philippe Gerum <rpm@xenomai.org> wrote:
> On 01/06/2017 10:54 AM, Ronny Meeus wrote:
>> On Fri, Jan 6, 2017 at 10:29 AM, Philippe Gerum <rpm@xenomai.org> wrote:
>>> On 01/06/2017 10:21 AM, Ronny Meeus wrote:
>>>> That logic I had seen before.
>>>> As I understand the code it just tries 3 times to connect to the daemon
>>>> and if not successful, it just tries to start it again and reconnects ...
>>>> I find it a strange logic just to try something 3 times and hope it
>>>> will succeed.
>>>> In our case the CPU is fully loaded with RT threads so my assumption is that
>>>> the daemon, running at nonRT prio will not be scheduled at all.
>>>> (also see the traces above that confirm my assumptions)
>>>>
>>>> I would expect to see some kind of synchronization mechanism between the
>>>> daemon and the application.
>>>>
>>>
>>> The point is that such init code is aimed at running early, prior to any
>>> Xenomai application code, this is cold bootstrap code. The fact that
>>> your app can spawn threads overconsuming the CPU earlier than Xenomai's
>>> basic init code is a problem for Xenomai.
>>>
>>
>> Philippe,
>>
>> on our system we have a lot of Xenomai applications running, it can be up to 10
>> or more. So it is impossible to guarantee that there will be CPU power available
>> at the moment Xenomai init is called.
>> Next to the application code also the Linux kernel threads can consume a lot of
>> CPU power (especially during init).
>>
>
> I don't see how your app could ever compete with drivers during the
> kernel bootstrap phase, just because no application can run until user
> mode is started, which is last in the process, by definition.
>
> If referring to kernel helper threads overconsuming CPU during plain
> runtime or soon after user mode is entered, maybe you should consider
> determining why this happens, this does not look quite normal
> (vendor-originated mmc driver with broken power mgmt, massive logging on
> slow flash medium?). Maybe you did already, and I would be interested to
> know about your findings.
>

We have done an evolution from a non-Linux system (pSOS) to a Linux-based
system by using Xenomai. Our application's low priority threads run at pSOS
priorities below 5, which is mapped by Xenomai onto the Linux FIFO scheduler.
This means that even very low priority application threads (logging etc.) have
a much higher priority than Linux apps (running on the OTHER scheduler).
These low priority application threads can even consume a complete CPU,
and this is not an issue since, if there is more important work to do, it will be
done by a higher priority thread.

For this model to work we needed to fit all other threads (like shells,
dropbear, etc.) in the RT range. For example the shells are running at prio -31,
Linux kernel threads at prio -95, etc. Also the init thread of Linux runs at
prio -31.

So in our system it is perfectly normal that threads running in the non-RT
scheduling range are not scheduled at all.

With the previous version of Xenomai we used, the scheduling class of the
sysregd app was not explicitly set and inherited the priority of its creator.
So it was nicely put in the medium/low priority range of the system (also -31
in our case), and there was no issue ...
Only with the 3.0.3 version, we see the issue because it is explicitly put in
the SCHED_OTHER scheduler.

>> Xenomai applications can be started during init but also at runtime, so it is
>> impossible to make assumptions about the availability of CPU power.
>>
>
> You obviously do make assumptions about the CPU power, such as assuming
> that your system can cope in a deterministic way with running distinct
> or even unrelated set of CPU-hungry threads from multiple real-time apps
> concurrently. Xenomai makes the assumption that the current CPU should
> be able to process all of the pending regular (non-rt) activity within 3
> seconds, which seems reasonable. We could make it 30, no issue with
> that, but that would not address the real problem anyway.

I do not know where the 3 seconds you mention are coming from.
I only see 200ms in the code.

        default:
                /*
                 * Make sure we sleep at least 200 ms regardless of
                 * signal receipts.
                 */
                while (usleep(200000) > 0) ;
                regd_pid = pid;
                barrier();

>
> Your point is about requiring Xenomai to work around a seemingly massive
> overload condition in the regular Linux system when the app initializes,
> hoping for the best. I don't think this is the way to go, this would
> only paper over the core issue, with potentially nasty effects.
>
> Typically, a consequence of raising the priority of registry threads to
> address this issue would be to serve fuse-fs requests at high (rt)
> priority, directly competing with other SCHED_RR/SCHED_FIFO threads in
> the system, since you don't run Cobalt, and therefore the co-kernel
> could not save your day in this case.
>
> Therefore, anyone issuing "cat /var/run/xenomai/*/*" on a terminal would
> actually compete with some real-time threads in your application(s),
> possibly delaying them for an undefined amount of time. At any rate, our
> fuse-fs threads that do string formatting to output human-readable
> reports upon (interactive) request should not compete with real-time
> threads, really.

This is acceptable for us since all threads run in the RT range, see before.
It is even correct, since otherwise the "cat /var/run/xenomai/*/*" operation
would suffer from a massive priority inversion: it will only be handled
if all lower priority activities have finished and the non-RT task gets a
slot to run ...

> Regarding the fact that your system cannot respond within 3 seconds to a
> socket connection, you still have the option to start the daemon
> separately, before the application is launched. Any showstopper with
> that option?

I think this will not solve anything. The daemon will be created, but still the
threads will be running in the non-RT range so the connect and/or the receive
will not be handled correctly ...

>
> --
> Philippe.



* Re: [Xenomai] copperplate/registry daemon connection failure
  2017-01-06 13:16           ` Ronny Meeus
@ 2017-01-08 12:02             ` Philippe Gerum
  0 siblings, 0 replies; 8+ messages in thread
From: Philippe Gerum @ 2017-01-08 12:02 UTC (permalink / raw)
  To: Ronny Meeus; +Cc: xenomai

On 01/06/2017 02:16 PM, Ronny Meeus wrote:
> On Fri, Jan 6, 2017 at 12:00 PM, Philippe Gerum <rpm@xenomai.org> wrote:
>> On 01/06/2017 10:54 AM, Ronny Meeus wrote:
>>> On Fri, Jan 6, 2017 at 10:29 AM, Philippe Gerum <rpm@xenomai.org> wrote:
>>>> On 01/06/2017 10:21 AM, Ronny Meeus wrote:
>>>>> That logic I had seen before.
>>>>> As I understand the code it just tries 3 times to connect to the daemon
>>>>> and if not successful, it just tries to start it again and reconnects ...
>>>>> I find it a strange logic just to try something 3 times and hope it
>>>>> will succeed.
>>>>> In our case the CPU is fully loaded with RT threads so my assumption is that
>>>>> the daemon, running at nonRT prio will not be scheduled at all.
>>>>> (also see the traces above that confirm my assumptions)
>>>>>
>>>>> I would expect to see some kind of synchronization mechanism between the
>>>>> daemon and the application.
>>>>>
>>>>
>>>> The point is that such init code is aimed at running early, prior to any
>>>> Xenomai application code, this is cold bootstrap code. The fact that
>>>> your app can spawn threads overconsuming the CPU earlier than Xenomai's
>>>> basic init code is a problem for Xenomai.
>>>>
>>>
>>> Philippe,
>>>
>>> on our system we have a lot of Xenomai applications running, it can be up to 10
>>> or more. So it is impossible to guarantee that there will be CPU power available
>>> at the moment Xenomai init is called.
>>> Next to the application code also the Linux kernel threads can consume a lot of
>>> CPU power (especially during init).
>>>
>>
>> I don't see how your app could ever compete with drivers during the
>> kernel bootstrap phase, just because no application can run until user
>> mode is started, which is last in the process, by definition.
>>
>> If referring to kernel helper threads overconsuming CPU during plain
>> runtime or soon after user mode is entered, maybe you should consider
>> determining why this happens, this does not look quite normal
>> (vendor-originated mmc driver with broken power mgmt, massive logging on
>> slow flash medium?). Maybe you did already, and I would be interested to
>> know about your findings.
>>
> 
> We have done an evolution from a non-Linux system (pSOS) to a Linux-based
> system by using Xenomai. Our application's low priority threads run at pSOS
> priorities below 5, which is mapped by Xenomai onto the Linux FIFO scheduler.
> This means that even very low priority application threads (logging etc.) have
> a much higher priority than Linux apps (running on the OTHER scheduler).
> These low priority application threads can even consume a complete CPU,
> and this is not an issue since, if there is more important work to do, it will be
> done by a higher priority thread.
> 
> For this model to work we needed to fit all other threads (like shells,
> dropbear, etc.) in the RT range. For example the shells are running at prio -31,
> Linux kernel threads at prio -95, etc. Also the init thread of Linux runs at
> prio -31.
> 
> So in our system it is perfectly normal that threads running in the non-RT
> scheduling range are not scheduled at all.

Excluding SCHED_OTHER is definitely not a normal situation wrt Xenomai
though.

> 
> With the previous version of Xenomai we used, the scheduling class of the
> sysregd app was not explicitly set and inherited the priority of its creator.
> So it was nicely put in the medium/low priority range of the system (also -31
> in our case), and there was no issue ...
> Only with the 3.0.3 version,

I don't think so.

> we see the issue because it is explicitly put in
> the SCHED_OTHER scheduler.
> 

This update was brought in by commit #880b3ac for v3.0-rc5; nothing
changed since then in sysregd regarding thread priorities. Although such
change does have an impact on your system since it pushes all of the
Linux infrastructure to the SCHED_FIFO class, it hardly introduces a new
situation Xenomai-wise.

I explained the logic of such a change already: in a regular system, we
just don't want low priority threads to compete with the real-time
activity. That would be even worse in a dual kernel configuration, since
those threads would flip runtime modes between the libfuse routines and
the sysregd implementation.

In addition, we can't force any other policy than SCHED_OTHER in
sysregd, because Xenomai allows some apps to run without root
privileges, so we may not always inherit them.

So, the only way to fix your case would be to set the scheduling policy
after sysregd has started, but before client apps issue connection
attempts. And for that, you need no code change; see below.

>>> Xenomai applications can be started during init but also at runtime, so it is
>>> impossible to make assumptions about the availability of CPU power.
>>>
>>
>> You obviously do make assumptions about the CPU power, such as assuming
>> that your system can cope in a deterministic way with running distinct
>> or even unrelated set of CPU-hungry threads from multiple real-time apps
>> concurrently. Xenomai makes the assumption that the current CPU should
>> be able to process all of the pending regular (non-rt) activity within 3
>> seconds, which seems reasonable. We could make it 30, no issue with
>> that, but that would not address the real problem anyway.
> 
> I do not know where the 3 seconds you mention are coming from.
> I only see 200ms in the code.
> 
>         default:
>                 /*
>                  * Make sure we sleep at least 200 ms regardless of
>                  * signal receipts.
>                  */
>                 while (usleep(200000) > 0) ;
>                 regd_pid = pid;
>                 barrier();
> 

Correct, the code I was inadvertently looking at is v3.0-rc1, likely
because in a previous mail you mentioned that some of your apps would
depend on pre-3.0 material (i.e. prior to changing the
copperplate_init() signature).

The code switched to a 200 ms delay with commit #f444cb7, in the 3.0-rc4
time frame, which means that you could revert that patch, or wait for 15
attempts to complete (i.e. roughly the original 3 s with 200 ms per attempt),
or whatever count fits your requirements.

>> Regarding the fact that your system cannot respond within 3 seconds to a
>> socket connection, you still have the option to start the daemon
>> separately, before the application is launched. Any showstopper with
>> that option?
> 
> I think this will not solve anything. The daemon will be created, but still the
> threads will be running in the non-RT range so the connect and/or the receive
> will not be handled correctly ...

# /usr/xenomai/sbin/sysregd --root=/your/registry/rootdir --daemonize --linger
# chrt -f -p <your-rt-prio> $(pidof sysregd)

-- 
Philippe.


