* [Xenomai] Crash with longer dlopen/dlclose sequence
From: Edouard Tisserant @ 2018-04-26  7:02 UTC
  To: Xenomai (xenomai@xenomai.org)

Good Morning !

I'm chasing the origin of a random segfault while porting Beremiz to
Xenomai 3.

The Beremiz PLC runtime loads PLC logic as a shared object. Loading is
performed by a dlopen call from the Python interpreter. Each time the PLC
programmer tries a new program, the previous shared object is dlclosed and
the new program is dlopened.

Of course, there are in-depth checks to ensure that all
dlopen/dlclose/dlsym operations are done from the main thread only, and it
is ensured that all real-time tasks and resources have been closed
before dlclose.

Also, I did check that the implicit call to xenomai_init_dso() really
happens when linking the shared object with bootstrap-pic.o. I also tried
an explicit call to xenomai_init (once at first load, or after every
dlopen), with no change.

I tried the latest commit on this topic: "boilerplate/setup: introduce
destructors for __setup_call"
(5511e76040444af875ae1bb099c13a25b16336fc). Unfortunately it didn't fix
the segfault, but it did remove the Xenomai "Bad syscall" warning that
sometimes appeared after dlclose.

The segfault never happens at the first reload, i.e. dlopen/dlclose/dlopen
never fails. You have to extend the sequence to at least
dlopen/dlclose/dlopen/dlclose/dlopen to see the crash. In other words,
the smokey/dlopen test doesn't try hard enough to catch the problem. I have
to reload about 6 times to get a crash. Also, the crash seems more
likely to occur if no symbol from the shared object was called
between dlopen and dlclose (dlsym was called).
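
A minimal Python sketch of the reload sequence described above, in case
it helps to reproduce the problem outside Beremiz. The helper name
reload_cycle and the file name plc.so are hypothetical, and
_ctypes.dlclose is a private CPython helper (available on POSIX builds),
used here only as an assumption about how the unload could be done.

```python
import ctypes
import _ctypes  # private CPython module; provides dlclose() on POSIX builds


def _default_dlclose(lib):
    # Unload the shared object behind a ctypes.CDLL instance.
    _ctypes.dlclose(lib._handle)


def reload_cycle(path, loads, dlopen=ctypes.CDLL, dlclose=_default_dlclose):
    """Load `path` `loads` times, unloading it between consecutive loads.

    loads=3 gives dlopen/dlclose/dlopen/dlclose/dlopen, the shortest
    sequence said to show the crash. Returns the handle of the last load.
    """
    lib = None
    for _ in range(loads):
        if lib is not None:
            dlclose(lib)
        lib = dlopen(path)
    return lib
```

Calling reload_cycle("./plc.so", loads=7) would correspond to the "about
6 reloads" mentioned above. The dlopen/dlclose parameters are there so
that the loop can be exercised with stand-ins on a machine without the
shared object.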

Enabling full Xenomai debug didn't reveal more details about the crash.
Post-mortem debugging (gdb -c core) works, but gdb can't give me any
backtrace:

(gdb) bt
#0  0x00007fb8 in ?? ()
#1  0xb520c098 in ?? ()

Is there a way to get gdb to tell a bit more about what happens in
boilerplate/copperplate? How can I find where it crashes?

Cheers,

Edouard




* Re: [Xenomai] Crash with longer dlopen/dlclose sequence
From: Edouard Tisserant @ 2018-04-26  7:39 UTC
  To: Xenomai (xenomai@xenomai.org)

@Henning Schild :

I just saw your message from Tuesday: '[PATCH 3/3] build: link dlopen
libs with "nodelete"'

Is "nodelete" the only way to make it stable? Does it apply only to
Xenomai libs (i.e. alchemy, copperplate) or should it also apply to the
final shared object that uses Xenomai libs?

Edouard






* Re: [Xenomai] Crash with longer dlopen/dlclose sequence
From: Edouard Tisserant @ 2018-04-26  9:24 UTC
  To: Xenomai (xenomai@xenomai.org)

One more question. Sorry for flooding the list.

As a workaround to avoid leaking memory, I would like to try this :

 - xeno_stub.so : stub library, linked with bootstrap-pic.o
 - 1.so, 2.so, ... n.so : libraries calling alchemy/posix realtime
resources, NOT linked with bootstrap-pic.o

Process life-cycle would be :

- process start

dlopen(xeno_stub.so)

dlopen(1.so)
dlsym + call 1.so
dlclose(1.so)
...
dlopen(n.so)
dlsym + call n.so
dlclose(n.so)

dlclose(xeno_stub.so)

- process end

Is it correct to assume that, this way, the pointers set up by calling
xenomai_init() as a side effect of the first dlopen() would stay valid while
other non-bootstrap-pic libraries are loaded and unloaded?






* Re: [Xenomai] Crash with longer dlopen/dlclose sequence
From: Henning Schild @ 2018-04-26 11:07 UTC
  To: Edouard Tisserant; +Cc: Xenomai (xenomai@xenomai.org)

Hi,

I recently looked into dlopen()/dlclose() again, because the testcase I
wrote did not really work.

What I found is that dlclose() currently cannot work because we lack
destructor code for the core and skins. Removing the setup_descriptors
from the list is only a small fraction of what would actually be needed.

In fact we would need the reverse of
"struct setup_descriptor->init()" for every file in lib/*/init.c.
On dlclose() one would need to call those and remove the descriptor
from the list. But since there are no destructors, we are leaking
half-initialized state, because dlclose() might not actually go and free
all its memory/state.

So for now the state is that dlclose() is not supported; restart your
application if you need another "plugin".
Enabling dlclose() is surely possible, but not trivial.

Could you please have a look at
http://www.xenomai.org/pipermail/xenomai/2018-April/038821.html

That will actually "prevent" dlclose.

Henning




* Re: [Xenomai] Crash with longer dlopen/dlclose sequence
From: Philippe Gerum @ 2018-04-26 15:30 UTC
  To: Edouard Tisserant, Xenomai (xenomai@xenomai.org)

On 04/26/2018 11:24 AM, Edouard Tisserant wrote:
> One more question. Sorry for flooding the list.
> 
> As a workaround to avoid leaking memory, I would like to try this :
> 
>  - xeno_stub.so : stub library, linked with bootstrap-pic.o
>  - 1.so, 2.so, ... n.so : libraries calling alchemy/posix realtime
> resources, NOT linked with bootstrap-pic.o
> 
> Process life-cycle would be :
> 
> - process start
> 
> dlopen(xeno_stub.so)
> 
> dlopen(1.so)
> dlsym + call 1.so
> dlclose(1.so)
> ...
> dlopen(n.so)
> dlsym + call n.so
> dlclose(n.so)
> 
> dlclose(xeno_stub.so)
> 
> - process end
> 
> Is it correct to assume that, this way, the pointers set up by calling
> xenomai_init() as a side effect of the first dlopen() would stay valid while
> other non-bootstrap-pic libraries are loaded and unloaded?

If by pointer setup, you mean all the init stuff run by the various
Xenomai libraries when called on behalf of xenomai_init(), then yes. The
bootstrap mechanism should be able to support the lifecycle described above.

The init sequence is explained here:
http://xenomai.org/2015/05/application-setup-and-init/#Initialization_sequence

This document refers to the bootstrap module included in the main
executable, but the same logic would apply to DSOs which include
bootstrap-pic, such as xeno_stub.so.

-- 
Philippe.


* Re: [Xenomai] Crash with longer dlopen/dlclose sequence
From: Edouard Tisserant @ 2018-04-26 17:16 UTC
  To: Philippe Gerum, Xenomai (xenomai@xenomai.org)


> If by pointer setup, you mean all the init stuff run by the various
> Xenomai libraries when called on behalf of xenomai_init(), then yes.
Perfect! Then I will build that xeno_stub.so, dlopen() it and it will
be all fine....

Wait. Did I miss something? Do I need to build and dlopen a stub shared
object just to call something from bootstrap*.o? Could I do that
by dlopening and calling the right shared object directly?

It seems that __xenomai_init() calls many things from the setup list, in
multiple shared objects. So I would need to build another flavor of the
boilerplate: bootstrap.so, from which I can call xenomai_init() directly
using Python's ctypes. Does it look realistic?

Also, I wonder what happens to non-real-time threads which have been
created before xenomai_init(). Is it safe if there are already running
threads using posix when xenomai_init() is called? Should xenomai_init
be called before any thread is created?

> The init sequence is explained here:
> http://xenomai.org/2015/05/application-setup-and-init/#Initialization_sequence
Seems like search engines never link to the real stuff. Many thanks.

--
Edouard



* Re: [Xenomai] Crash with longer dlopen/dlclose sequence
From: Edouard Tisserant @ 2018-04-27 10:31 UTC
  To: Philippe Gerum, Xenomai (xenomai@xenomai.org)


> It seems that __xenomai_init() calls many things from the setup list, in
> multiple shared objects. So I would need to build another flavor of the
> boilerplate: bootstrap.so, from which I can call xenomai_init() directly
> using Python's ctypes. Does it look realistic?

For the record, the following Python code calls xenomai_init_dso() directly
from libcobalt.so.

from ctypes import *
# declaration
argvp_type = POINTER(POINTER(POINTER(c_char)))
cobalt = cdll.LoadLibrary("libcobalt.so")
cobalt.xenomai_init_dso.argtypes = (POINTER(c_int), argvp_type)
# init params
argc = c_int(0)
argv = (POINTER(c_char) * 2)()
# bytes literal: create_string_buffer() rejects str on Python 3
argv[0] = create_string_buffer(b"prog_name")
argv[1] = None
# call
cobalt.xenomai_init_dso(pointer(argc), cast(pointer(argv), argvp_type))

Now let's see if dlopen/dlclose of some code using alchemy segfaults or not...

--
Edouard




* Re: [Xenomai] Crash with longer dlopen/dlclose sequence
From: Edouard Tisserant @ 2018-04-27 14:43 UTC
  To: Philippe Gerum, Xenomai (xenomai@xenomai.org)


> Now let's see if dlopen/dlclose of some code using alchemy segfaults or not...

Well, it didn't work as-is, but I finally got it working!

If I dlopened anything relying on copperplate after that Python
code, it triggered an assertion: __register_setup_call: Assertion
`!main_init_done' failed.

The reason is that all Xenomai libraries we plan to use have to be
loaded _before_ the call to xenomai_init, so that their setup calls are
registered.

Here is the updated (and simplified) Python code:

from ctypes import *
for name in ["cobalt", "modechk", "copperplate", "alchemy"]:
    globals()[name] = CDLL("lib"+name+".so", mode=RTLD_GLOBAL)
cobalt.xenomai_init(pointer(c_int(0)),
    pointer((POINTER(c_char)*2)(create_string_buffer(b"python"), None)))

Note: the order of the dlopen calls matters. The arguments passed to
xenomai_init(&argc,&argv) are here only to prevent dereferencing NULL.
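
For reference, a C-style argc/argv pair can be built from ctypes as
sketched below. This is only an illustration: make_argv is a
hypothetical helper, and it assumes that the callee takes &argc and
&argv as the note says, so the char ** value is kept in its own variable
before its address is taken. Unlike the snippet above, argc here counts
the entries actually present. No Xenomai library is called.

```python
import ctypes

CharPtr = ctypes.POINTER(ctypes.c_char)  # char *


def make_argv(args):
    """Return (argc, argv, keepalive) for a list of str arguments.

    argc is a c_int and argv is a `char **` pointing at a NULL-terminated
    array; keepalive must stay referenced while C code uses them.
    """
    bufs = [ctypes.create_string_buffer(a.encode()) for a in args]
    array = (CharPtr * (len(bufs) + 1))()  # last slot stays NULL
    for i, buf in enumerate(bufs):
        array[i] = ctypes.cast(buf, CharPtr)
    argc = ctypes.c_int(len(bufs))
    argv = ctypes.cast(array, ctypes.POINTER(CharPtr))  # char **
    return argc, argv, (array, bufs)


argc, argv, keepalive = make_argv(["python"])
argcp = ctypes.pointer(argc)  # int *, i.e. &argc
argvp = ctypes.pointer(argv)  # char ***, i.e. &argv
```

argcp and argvp are then what would be handed to a function declared
with (int *, char ***) parameters.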

After this has been executed, you can load and unload many shared objects
that use alchemy and posix (they should not be linked with bootstrap-pic.o).

Thanks for your answers.

--
Edouard




* Re: [Xenomai] Crash with longer dlopen/dlclose sequence
From: Philippe Gerum @ 2018-04-27 15:43 UTC
  To: Edouard Tisserant, Xenomai (xenomai@xenomai.org)

On 04/27/2018 04:43 PM, Edouard Tisserant wrote:
> 
>> Now let's see if dlopen/dlclose of some code using alchemy segfaults or not...
> 
> Well, it didn't work as-is, but I finally got it working!
> 
> If I dlopened anything relying on copperplate after that Python
> code, it triggered an assertion: __register_setup_call: Assertion
> `!main_init_done' failed.
> 
> The reason is that all Xenomai libraries we plan to use have to be
> loaded _before_ the call to xenomai_init, so that their setup calls are
> registered.
> 
> Here is the updated (and simplified) Python code:
> 
> from ctypes import *
> for name in ["cobalt", "modechk", "copperplate", "alchemy"]:
>     globals()[name] = CDLL("lib"+name+".so", mode=RTLD_GLOBAL)
> cobalt.xenomai_init(pointer(c_int(0)),
>     pointer((POINTER(c_char)*2)(create_string_buffer(b"python"), None)))
> 
> Note: the order of the dlopen calls matters. The arguments passed to
> xenomai_init(&argc,&argv) are here only to prevent dereferencing NULL.
> 
> After this has been executed, you can load and unload many shared objects
> that use alchemy and posix (they should not be linked with bootstrap-pic.o).
> 

Looks good. The key point is that xenomai_init() is the call from the
API which controls the whole Xenomai init process, after which Xenomai
libs are fully functional. This explains why any library which wants to
be part of that process needs its setup descriptor to be known and
registered prior to issuing this call.

The bootstrap module is just a helper that calls xenomai_init()
automatically before the main() routine is entered (bootstrap.o) when
glued to a final executable, or right after a DSO is loaded
(bootstrap-pic.o) when glued to a shared library. If the application
wants fine-grained control over the init sequence instead, it just
needs to leave the bootstrap module out of the executable/DSO and call
xenomai_init() manually when it sees fit.
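
The ordering rule can be illustrated with a toy model in Python. This is
not Xenomai code: the class SetupRegistry and its methods are made up
for the illustration, and only the name main_init_done echoes the
assertion reported earlier in the thread.

```python
class SetupRegistry:
    """Toy model of a list of setup descriptors processed once at init."""

    def __init__(self):
        self.descriptors = []        # (name, init callable), in load order
        self.main_init_done = False

    def register_setup_call(self, name, init):
        # Nothing may register once the init pass has run.
        if self.main_init_done:
            raise AssertionError("!main_init_done")
        self.descriptors.append((name, init))

    def init(self):
        # Run every registered init routine once, in registration order.
        for _name, init in self.descriptors:
            init()
        self.main_init_done = True


registry = SetupRegistry()
ready = []
for lib in ["cobalt", "modechk", "copperplate", "alchemy"]:
    # Loading a library registers its descriptor (modelled by hand here).
    registry.register_setup_call(lib, lambda lib=lib: ready.append(lib))
registry.init()

try:
    registry.register_setup_call("late", lambda: None)
    late_accepted = True
except AssertionError:
    late_accepted = False
```

In this model a library that shows up after init() is refused, which is
the situation the `!main_init_done' assertion flags.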

-- 
Philippe.


* Re: [Xenomai] Crash with longer dlopen/dlclose sequence
From: Edouard Tisserant @ 2018-05-09  7:45 UTC
  To: Philippe Gerum, Xenomai (xenomai@xenomai.org)

I noticed this in my dmesg:

[Xenomai] bad syscall <0xf0002>

The message isn't associated with any crash. It doesn't seem to break
anything, but it doesn't smell good... does it?

It happens each time I dlopen a cobalt-wrapped shared object (it also
happens when just using alchemy, without any other special linker
instructions).

I was looking at kernel/cobalt/posix/syscall*, but I still couldn't
figure out what is emitting that wrong syscall when dlopening. In case
it matters, OABI compatibility is not enabled in my build, and the target
is an imx28.

I'm a bit lost. Where should I look?







* Re: [Xenomai] Crash with longer dlopen/dlclose sequence
From: Edouard Tisserant @ 2018-05-09  8:57 UTC
  To: Philippe Gerum, Xenomai (xenomai@xenomai.org)

> [Xenomai] bad syscall <0xf0002>

It appears that 0xf0002 is a private ARM SWI, as declared in Linux's
arch/arm/include/uapi/asm/unistd.h:

/*
 * The following SWIs are ARM private.
 */
#define __ARM_NR_BASE           (__NR_SYSCALL_BASE+0x0f0000)
#define __ARM_NR_breakpoint     (__ARM_NR_BASE+1)
#define __ARM_NR_cacheflush     (__ARM_NR_BASE+2)
...

The code in Xenomai's kernel/cobalt/posix/syscall.c must be wrong:

linux_syscall:
    code = __xn_get_syscall_nr(regs);
    if (code >= NR_syscalls)
        goto bad_syscall;

NR_syscalls is defined in the kernel as 400, so private ARM SWIs are
always rejected.

What should we do? Can we safely ignore the fact that the cacheflush
syscall is not being executed?

Otherwise, what should the logic be to tell good syscalls from bad ones?





* Re: [Xenomai] Crash with longer dlopen/dlclose sequence
From: Edouard Tisserant @ 2018-05-09 10:27 UTC
  To: Philippe Gerum, Xenomai (xenomai@xenomai.org)


>> [Xenomai] bad syscall <0xf0002>
> #define __ARM_NR_BASE           (__NR_SYSCALL_BASE+0x0f0000)
> #define __ARM_NR_cacheflush     (__ARM_NR_BASE+2)

This special syscall is issued by glibc in _dl_relocate_object (see
elf/dl-reloc.c and sysdeps/unix/sysv/linux/arm/dl-machine.h).

As a workaround, rebuilding everything with -fPIC avoids the relocation,
and hence the bad syscall.








* Re: [Xenomai] Crash with longer dlopen/dlclose sequence
From: Philippe Gerum @ 2018-05-12 17:08 UTC
  To: Edouard Tisserant, Xenomai (xenomai@xenomai.org)

On 05/09/2018 12:27 PM, Edouard Tisserant wrote:
> 
>>> [Xenomai] bad syscall <0xf0002>
>> #define __ARM_NR_BASE           (__NR_SYSCALL_BASE+0x0f0000)
>> #define __ARM_NR_cacheflush     (__ARM_NR_BASE+2)
> 
> This special syscall is issued by glibc in _dl_relocate_object (see
> elf/dl-reloc.c and sysdeps/unix/sysv/linux/arm/dl-machine.h).
> 
> As a workaround, rebuilding everything with -fPIC avoids the relocation,
> and hence the bad syscall.
> 

Thanks for digging this out. Could you try this patch, keeping the relocation enabled in your build? TIA,

diff --git a/kernel/cobalt/arch/arm/include/asm/xenomai/syscall.h b/kernel/cobalt/arch/arm/include/asm/xenomai/syscall.h
index 92afe8d9d..baa352181 100644
--- a/kernel/cobalt/arch/arm/include/asm/xenomai/syscall.h
+++ b/kernel/cobalt/arch/arm/include/asm/xenomai/syscall.h
@@ -33,6 +33,11 @@
 #define __ARM_NR_ipipe	(__NR_SYSCALL_BASE + XENO_ARM_SYSCALL)
 #endif
 
+/*
+ * Cobalt syscall numbers can be fetched from ARM_ORIG_r0 with ARM_r7
+ * containing the Xenomai syscall marker, Linux syscalls directly from
+ * ARM_r7 (may require the OABI tweak).
+ */
 #define __xn_reg_sys(__regs)	((__regs)->ARM_ORIG_r0)
 /* In OABI_COMPAT mode, handle both OABI and EABI userspace syscalls */
 #ifdef CONFIG_OABI_COMPAT
@@ -46,11 +51,15 @@
 #define __xn_syscall(__regs)	(__xn_reg_sys(__regs) & ~__COBALT_SYSCALL_BIT)
 
 /*
- * Returns the syscall number depending on the handling core. Cobalt
- * syscall numbers can be fetched from ARM_ORIG_r0 with ARM_r7
- * containing the Xenomai syscall marker, Linux syscalls directly from
- * ARM_r7 (may require the OABI tweak).
+ * Root syscall number with predicate (valid only if
+ * !__xn_syscall_p(__regs)).
  */
+#define __xn_rootcall_p(__regs, __code)					\
+	({								\
+		*(__code) = __xn_abi_decode(__regs);			\
+		*(__code) < NR_syscalls || *(__code) >= __ARM_NR_BASE;	\
+	})
+	
 static inline long __xn_get_syscall_nr(struct pt_regs *regs)
 {
 	return __xn_syscall_p(regs) ? __xn_reg_sys(regs) : __xn_abi_decode(regs);
diff --git a/kernel/cobalt/posix/syscall.c b/kernel/cobalt/posix/syscall.c
index 68700a336..058a8282b 100644
--- a/kernel/cobalt/posix/syscall.c
+++ b/kernel/cobalt/posix/syscall.c
@@ -666,10 +666,6 @@ ret_handled:
 	return KEVENT_STOP;
 
 linux_syscall:
-	code = __xn_get_syscall_nr(regs);
-	if (code >= NR_syscalls)
-		goto bad_syscall;
-
 	if (xnsched_root_p())
 		/*
 		 * The call originates from the Linux domain, either
@@ -679,6 +675,9 @@ linux_syscall:
 		 */
 		return KEVENT_PROPAGATE;
 
+	if (!__xn_rootcall_p(regs, &code))
+		goto bad_syscall;
+
 	/*
 	 * We know this is a Cobalt thread since it runs over the head
 	 * domain, however the current syscall should be handled by
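
As a plain illustration of what the patch changes, the old and new
acceptance checks can be modelled in Python. This is a sketch, not
kernel code: it assumes NR_syscalls is 400 as stated earlier in the
thread, and an EABI-only build where __NR_SYSCALL_BASE is 0, so that
__ARM_NR_BASE is 0x0f0000. The Python names are invented.

```python
NR_SYSCALLS = 400            # value quoted earlier in the thread
NR_SYSCALL_BASE = 0          # assumption: EABI-only build (no OABI compat)
ARM_NR_BASE = NR_SYSCALL_BASE + 0x0F0000
ARM_NR_CACHEFLUSH = ARM_NR_BASE + 2


def old_check(code):
    # Before the patch: anything at or above NR_syscalls is a bad syscall.
    return code < NR_SYSCALLS


def rootcall_p(code):
    # After the patch: regular Linux syscalls or ARM private calls pass.
    return code < NR_SYSCALLS or code >= ARM_NR_BASE


for code in (4, 399, 400, ARM_NR_CACHEFLUSH):
    print(hex(code), old_check(code), rootcall_p(code))
```

The patch also moves the check after the xnsched_root_p() test, which
this model does not cover.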

-- 
Philippe.


* Re: [Xenomai] Crash with longer dlopen/dlclose sequence
From: Edouard Tisserant @ 2018-05-16  9:20 UTC
  To: Philippe Gerum, Xenomai (xenomai@xenomai.org)


>>>> [Xenomai] bad syscall <0xf0002>

> Thanks for digging this out. Could you try this patch, keeping the relocation enabled in your build? TIA,

Works for me: no more "bad syscall" complaints, even when linking with some
non -fPIC compiled objects.




* Re: [Xenomai] Crash with longer dlopen/dlclose sequence
From: Philippe Gerum @ 2018-05-18  7:08 UTC

On 05/16/2018 11:20 AM, Edouard Tisserant wrote:
> 
>>>>> [Xenomai] bad syscall <0xf0002>
> 
>> Thanks for digging this out. Could you try this patch, keeping the relocation enabled in your build? TIA,
> 
> Works for me: no more "bad syscall" complaints, even when linking with some
> non -fPIC compiled objects.
> 

Thanks for following up on this. I merged the change into the -next
branch for now.

-- 
Philippe.

