* [Xenomai] segfault in printer_loop()
@ 2017-11-01  6:29 C Smith
  2017-11-01  7:35 ` Jan Kiszka
  0 siblings, 1 reply; 8+ messages in thread
From: C Smith @ 2017-11-01  6:29 UTC (permalink / raw)
  To: xenomai

I finally caught all the variables in a corefile in gdb:
(gdb) bt
#0  0xb76d70db in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib/libpthread.so.0
#1  0xb76f97f4 in printer_loop (arg=0x0) at rt_print.c:685
#2  0xb76d3adf in start_thread () from /lib/libpthread.so.0
#3  0xb746444e in clone () from /lib/libc.so.6
(gdb) print printer_wakeup
$1 = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0,
__woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0},
__size = '\000' <repeats 47 times>, __align = 0}
(gdb) print buffer_lock
$2 = {__data = {__lock = 0, __count = 0, __owner = 0, __kind = 0, __nusers
= 0, {__spins = 0, __list = {__next = 0x0}}}, __size = '\000' <repeats 23
times>, __align = 0}
(gdb) print buffers
$3 = 4
(gdb) print arg
No symbol "arg" in current context.
(gdb) print mask
No symbol "mask" in current context.
(gdb) print unlock
$4 = {void (void *)} 0xb76f96f9 <unlock>
(gdb) info threads
  Id   Target Id         Frame
  2    Thread 0xb73686c0 (LWP 20462) 0xffffe424 in ?? ()
* 1    Thread 0xb76f5b40 (LWP 20464) 0xb76d70db in
pthread_cond_wait@@GLIBC_2.3.2
() from /lib/libpthread.so.0
(gdb) print &printer_wakeup
$5 = (pthread_cond_t *) 0xb76fda20
(gdb) print &buffer_lock
$6 = (pthread_mutex_t *) 0xb76fd9fc

I see now that all variables used in this function are statically
allocated, and thus not null pointers, so what could cause a segfault?

Thanks, -C Smith

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Xenomai] segfault in printer_loop()
  2017-11-01  6:29 [Xenomai] segfault in printer_loop() C Smith
@ 2017-11-01  7:35 ` Jan Kiszka
       [not found]   ` <CA+K1mPF+SOhOeVpYktjNCcD7u403CtUXkM1Hcz_SS-6wwG50xg@mail.gmail.com>
  0 siblings, 1 reply; 8+ messages in thread
From: Jan Kiszka @ 2017-11-01  7:35 UTC (permalink / raw)
  To: C Smith, xenomai

On 2017-11-01 07:29, C Smith wrote:
> I finally caught all the variables in a corefile in gdb:
> (gdb) bt
> #0  0xb76d70db in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib/libpthread.so.0
> #1  0xb76f97f4 in printer_loop (arg=0x0) at rt_print.c:685
> #2  0xb76d3adf in start_thread () from /lib/libpthread.so.0
> #3  0xb746444e in clone () from /lib/libc.so.6
> (gdb) print printer_wakeup
> $1 = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0,
> __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0},
> __size = '\000' <repeats 47 times>, __align = 0}
> (gdb) print buffer_lock
> $2 = {__data = {__lock = 0, __count = 0, __owner = 0, __kind = 0, __nusers
> = 0, {__spins = 0, __list = {__next = 0x0}}}, __size = '\000' <repeats 23
> times>, __align = 0}
> (gdb) print buffers
> $3 = 4
> (gdb) print arg
> No symbol "arg" in current context.
> (gdb) print mask
> No symbol "mask" in current context.
> (gdb) print unlock
> $4 = {void (void *)} 0xb76f96f9 <unlock>
> (gdb) info threads
>   Id   Target Id         Frame
>   2    Thread 0xb73686c0 (LWP 20462) 0xffffe424 in ?? ()
> * 1    Thread 0xb76f5b40 (LWP 20464) 0xb76d70db in
> pthread_cond_wait@@GLIBC_2.3.2
> () from /lib/libpthread.so.0
> (gdb) print &printer_wakeup
> $5 = (pthread_cond_t *) 0xb76fda20
> (gdb) print &buffer_lock
> $6 = (pthread_mutex_t *) 0xb76fd9fc
> 
> I see now that all variables used in this function are statically
> allocated, and thus not null pointers, so what could cause a segfault?

The crash is inside pthread_cond_wait, likely while it tries to access
some pointer that the condvar or the mutex structure contains. You could
check whether a) both have been properly initialized prior to the crash
and b) neither of them has been corrupted by some faulty code. I suspect
the latter is the problem here.

Once you have identified a corrupted field, you can set a memory
watchpoint on its modification and hopefully track down the code causing it.

Jan



* Re: [Xenomai] segfault in printer_loop()
       [not found]   ` <CA+K1mPF+SOhOeVpYktjNCcD7u403CtUXkM1Hcz_SS-6wwG50xg@mail.gmail.com>
@ 2017-11-10  7:02     ` Jan Kiszka
       [not found]       ` <CA+K1mPGSTNyk0JZPQs4sSyuu+Xbi=cChCwFU1uokM9gNAe6n2Q@mail.gmail.com>
  0 siblings, 1 reply; 8+ messages in thread
From: Jan Kiszka @ 2017-11-10  7:02 UTC (permalink / raw)
  To: C Smith, Xenomai

On 2017-11-10 07:58, C Smith wrote:
> Agreed the segfault is inside pthread_cond_wait(); the contents of the
> args are shown in the previous post.
> dmesg says this:
> app[12316]: segfault at c ip b76fe0db sp b771c268 error 4 in
> libpthread-2.15.so[b76f4000+16000]
> 
> And gdb shows me the same address. After a segfault generated inside gdb:
> p $_siginfo._sifields._sigfault.si_addr
> $9 = (void *) 0xc
> 
> I've done further testing and in gdb I found that my app segfaults
> before hitting the first line of main().
> Thus I am unable to catch it in gdb with a hardware watchpoint. I
> attempted to do so by first making my
> app hit a breakpoint on the first line of main(), then I set a
> watchpoint on 0xC in
> gdb and run, but I never get a segfault after that point, after over 100
> runs.

You can set a breakpoint on __rt_print_init, for example.

> 
> Note that the app launches only 1 realtime thread now in these tests (in
> original tests it had 3 threads).
> Here is the one way I was able to get the app to run without
> segfaulting, even with multiple real
> time threads: I set kernel boot option maxcpus=1.  (on a SMP kernel with
> 4 cores). I was then able to run
> the app over 80 times with no segfault.
> 
> So the segfault is happening on about 10% of runs, in printer_loop(),
> apparently before the first line of main(), and I am unable to
> catch the bad memory access in a debugger with a watchpoint.
> Do you have a suggestion as to how to further debug this?

See above. Maybe that alone will give you a hint: if that function, for
whatever reason, happens to be called twice, that could explain the
issue as well. Then please catch the backtraces of all invocations.
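A breakpoint with attached commands makes it easy to capture the
backtrace of every invocation; a sketch in the gdb syntax used throughout
this thread:

(gdb) break __rt_print_init
(gdb) commands
> silent
> bt
> continue
> end
(gdb) run

If the breakpoint fires more than once, each hit prints its backtrace and
execution resumes automatically.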

Jan



* Re: [Xenomai] segfault in printer_loop()
       [not found]       ` <CA+K1mPGSTNyk0JZPQs4sSyuu+Xbi=cChCwFU1uokM9gNAe6n2Q@mail.gmail.com>
@ 2017-11-10 10:07         ` Jan Kiszka
  0 siblings, 0 replies; 8+ messages in thread
From: Jan Kiszka @ 2017-11-10 10:07 UTC (permalink / raw)
  To: C Smith, Xenomai

Please always keep the list in CC.

On 2017-11-10 08:34, C Smith wrote:
> The hardware watchpoint did not catch the bad memory access. Here is the
> gdb session based on your advice:
> (gdb) r
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/libthread_db.so.1".
> [New Thread 0xb7fccb40 (LWP 22537)]
> 
> Breakpoint 4, __rt_print_init () at rt_print.c:756
> (gdb) set variable gDebug = 0xC
> (gdb) watch *gDebug
> Hardware watchpoint 10: *gDebug

You need to set the watchpoint on the address of the condition-variable
field that is going to be changed, not on the invalid value that is
written to it.
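Concretely, using the symbols from the dumps above (the __data field
paths are glibc-internal and version-specific, so treat this as a
sketch):

(gdb) break __rt_print_init
(gdb) run
(gdb) watch -l printer_wakeup.__data.__mutex
(gdb) watch -l buffer_lock.__data.__lock
(gdb) continue

With -l, gdb watches the resolved address, so the watchpoint catches any
writer, whichever thread it runs on.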

Jan

> (gdb) cont
> 
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0xb7fccb40 (LWP 22537)]
> 0xb7fae0db in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
> (gdb) bt
> #0  0xb7fae0db in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib/libpthread.so.0
> #1  0xb7fd07de in printer_loop (arg=0x0) at rt_print.c:693
> #2  0xb7faaadf in start_thread () from /lib/libpthread.so.0
> #3  0xb7d3b44e in clone () from /lib/libc.so.6
> (gdb) info threads
>   Id   Target Id         Frame
> * 2    Thread 0xb7fccb40 (LWP 22537) "app" 0xb7fae0db in
> pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
>   1    Thread 0xb7c3f6c0 (LWP 22536) "app" 0xffffe424 in
> __kernel_vsyscall ()
> (gdb) p $_siginfo._sifields._sigfault.si_addr
> $5 = (void *) 0xc
> (gdb) frame 1
> #1  0xb7fd07de in printer_loop (arg=0x0) at rt_print.c:693
> (gdb) print buffers
> $6 = 4
> (gdb) print printer_wakeup
> $7 = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq =
> 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0},
> __size = '\000' <repeats 47 times>, __align = 0}
> (gdb) print buffer_lock
> $8 = {__data = {__lock = 0, __count = 0, __owner = 0, __kind = 0,
> __nusers = 0, {__spins = 0, __list = {__next = 0x0}}}, __size = '\000'
> <repeats 23 times>, __align = 0}
> 
> That line 693 in my sources is the same line of printer_loop() as usual: 
>     pthread_cond_wait(&printer_wakeup, &buffer_lock);
> I didn't hit breakpoint 4 more than once.
> How else might I catch the bad memory access?
> 
> thanks,
> -C Smith
> 
> 
> On Thu, Nov 9, 2017 at 11:02 PM, Jan Kiszka <jan.kiszka@web.de> wrote:
> 
>     On 2017-11-10 07:58, C Smith wrote:
>     > Agreed the segfault is inside pthread_cond_wait(); the contents of the
>     > args are shown in the previous post.
>     > dmesg says this:
>     > app[12316]: segfault at c ip b76fe0db sp b771c268 error 4 in
>     > libpthread-2.15.so[b76f4000+16000]
>     >
>     > And gdb shows me the same address. After a segfault generated inside gdb:
>     > p $_siginfo._sifields._sigfault.si_addr
>     > $9 = (void *) 0xc
>     >
>     > I've done further testing and in gdb I found that my app segfaults
>     > before hitting the first line of main().
>     > Thus I am unable to catch it in gdb with a hardware watchpoint. I
>     > attempted to do so by first making my
>     > app hit a breakpoint on the first line of main(), then I set a
>     > watchpoint on 0xC in
>     > gdb and run, but I never get a segfault after that point, after over 100
>     > runs.
> 
>     You can set a breakpoint on __rt_print_init, for example.
> 
>     >
>     > Note that the app launches only 1 realtime thread now in these tests (in
>     > original tests it had 3 threads).
>     > Here is the one way I was able to get the app to run without
>     > segfaulting, even with multiple real
>     > time threads: I set kernel boot option maxcpus=1.  (on a SMP kernel with
>     > 4 cores). I was then able to run
>     > the app over 80 times with no segfault.
>     >
>     > So the segfault is happening on about 10% of runs, in printer_loop(),
>     > apparently before the first line of main(), and I am unable to
>     > catch the bad memory access in a debugger with a watchpoint.
>     > Do you have a suggestion as to how to further debug this?
> 
>     See above. Maybe that alone will give you a hint: if that function, for
>     whatever reason, happens to be called twice, that could explain the
>     issue as well. Then please catch the backtraces of all invocations.
> 
>     Jan
> 
> 




* Re: [Xenomai] segfault in printer_loop()
  2017-11-13  6:39 C Smith
@ 2017-11-13  7:41 ` Jan Kiszka
  0 siblings, 0 replies; 8+ messages in thread
From: Jan Kiszka @ 2017-11-13  7:41 UTC (permalink / raw)
  To: C Smith, xenomai

On 2017-11-13 07:39, C Smith wrote:
> Hi Jan,
> 
> I have found a workaround for the problem. Instead of the startup segfault
> happening 10% of the time, I have now started my RT app 90 times with a
> single RT thread, and 80 times with its original three RT threads - with no
> segfaults.
> 
> Per your question: I don't think the problem is that __rt_print_init() is
> getting called twice. The normal order of execution is like this:
> 
> . printer_loop() gets called first when a xenomai RT app starts up
> 
> . pthread_mutex_lock() sets the buffer_lock struct so __lock and __owner
> are nonzero:
> (gdb) p buffer_lock
> $4 = {__data = {__lock = 1, __count = 0, __owner = 18681, __kind = 0,
> __nusers = 1, {__spins = 0, __list = {__next = 0x0}}}, __size =
> "\001\000\000\000\000\000\000\000\371H\000\000\000\000\000\000\001\000\000\000\000\000\000",
> __align = 1}
> 
> . then pthread_cond_wait() calls __rt_print_init()
> 
> . inside  __rt_print_init(), printer_wakeup has a valid __mutex:
> (gdb) print printer_wakeup
> $5 = {__data = {__lock = 0, __futex = 1, __total_seq = 1, __wakeup_seq = 0,
> __woken_seq = 0, __mutex = 0xb7fd4a1c, __nwaiters = 2, __broadcast_seq =
> 0}, __size = "\000\000\000\000\001\000\000\000\001", '\000' <repeats 23
> times>, "\034J\375\267\002\000\000\000\000\000\000\000\000\000\000",
> __align = 4294967296}
> 
> . Then, continuing, we get to the first line of main() OK, with no segfault.
> 
> You had advised to watch for corruption of the vars pthread_cond_wait()
> uses.
> In contrast to the above, when the segfault occurs, the vars buffer_lock
> and printer_wakeup, which get passed into pthread_cond_wait(), contain all
> zeros:
> 
> (gdb) print buffer_lock
> $6 = {__data = {__lock = 0, __count = 0, __owner = 0, __kind = 0, __nusers
> = 0, {__spins = 0, __list = {__next = 0x0}}}, __size = '\000' <repeats 23
> times>, __align = 0}
> (gdb) print printer_wakeup
> $7 = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0,
> __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0},
> __size = '\000' <repeats 47 times>, __align = 0}
> 
> There is one pointer in the pthread_cond_t structure:
> printer_wakeup.__data.__mutex
> So perhaps pthread_cond_wait() dereferences this null mutex pointer? The
> segfault always happens on access of address 0xC.

You can probably find out what it dereferences by installing debug
symbols for the glibc. But let's assume it's the mutex: This reference
is set by pthread_cond_wait itself when it associates the provided mutex
with the condition variable on function entry. Hence my assumption that
we are seeing corruption during the execution of cond_wait.

> 
> This segfault first appeared when I compiled my app for SMP, and it goes
> away if I use kernel arg maxcpus=1. Perhaps some SMP race condition is
> occasionally preventing the data structures (buffer_lock,printer_wakeup)
> from being ready for pthread_cond_wait()?
> 
> As a protection against this I have patched the rt_print.c printer_loop()
> code, skipping the call to pthread_cond_wait() if those two structures
> (buffer_lock,printer_wakeup) are not ready. There is no reason to wait on a
> thread which is not locked and where the mutex is nonexistent, right?
> 
> This is the patch:
> 
> --- rt_print_A.c    2014-09-24 13:57:49.000000000 -0700
> +++ rt_print_B.c    2017-11-11 23:24:34.309832301 -0800
> @@ -680,9 +680,10 @@
>      while (1) {
>          pthread_cleanup_push(unlock, &buffer_lock);
>          pthread_mutex_lock(&buffer_lock);
> -
> -        while (buffers == 0)
> -            pthread_cond_wait(&printer_wakeup, &buffer_lock);
> +
> +        if ((buffer_lock.__data.__lock != 0) &&
> (printer_wakeup.__data.__mutex != 0))
> +            while (buffers == 0)
> +                pthread_cond_wait(&printer_wakeup, &buffer_lock);
> 
>          print_buffers();
> 
> Can you verify that this patch is safe?

It's definitely not safe, because we still have no clue what actually goes wrong.

My suggestion to debug this via watchpoints still stands: First find out
which field is actually dereferenced on the crash. Then set a watchpoint
on it during __rt_print_init.

There is some ordering issue with the initialization functions that I
cannot explain yet. Have a specific look at when and how often
forked_child_init is run, because it a) reinitializes buffer_lock and b)
spawns the printer thread. In theory, everything should be up and ready
PRIOR to that spawning.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Corporate Competence Center Embedded Linux



* [Xenomai] segfault in printer_loop()
@ 2017-11-13  6:39 C Smith
  2017-11-13  7:41 ` Jan Kiszka
  0 siblings, 1 reply; 8+ messages in thread
From: C Smith @ 2017-11-13  6:39 UTC (permalink / raw)
  To: xenomai

Hi Jan,

I have found a workaround for the problem. Instead of the startup segfault
happening 10% of the time, I have now started my RT app 90 times with a
single RT thread, and 80 times with its original three RT threads - with no
segfaults.

Per your question: I don't think the problem is that __rt_print_init() is
getting called twice. The normal order of execution is like this:

. printer_loop() gets called first when a xenomai RT app starts up

. pthread_mutex_lock() sets the buffer_lock struct so __lock and __owner
are nonzero:
(gdb) p buffer_lock
$4 = {__data = {__lock = 1, __count = 0, __owner = 18681, __kind = 0,
__nusers = 1, {__spins = 0, __list = {__next = 0x0}}}, __size =
"\001\000\000\000\000\000\000\000\371H\000\000\000\000\000\000\001\000\000\000\000\000\000",
__align = 1}

. then pthread_cond_wait() calls __rt_print_init()

. inside  __rt_print_init(), printer_wakeup has a valid __mutex:
(gdb) print printer_wakeup
$5 = {__data = {__lock = 0, __futex = 1, __total_seq = 1, __wakeup_seq = 0,
__woken_seq = 0, __mutex = 0xb7fd4a1c, __nwaiters = 2, __broadcast_seq =
0}, __size = "\000\000\000\000\001\000\000\000\001", '\000' <repeats 23
times>, "\034J\375\267\002\000\000\000\000\000\000\000\000\000\000",
__align = 4294967296}

. Then, continuing, we get to the first line of main() OK, with no segfault.

You had advised to watch for corruption of the vars pthread_cond_wait()
uses.
In contrast to the above, when the segfault occurs, the vars buffer_lock
and printer_wakeup, which get passed into pthread_cond_wait(), contain all
zeros:

(gdb) print buffer_lock
$6 = {__data = {__lock = 0, __count = 0, __owner = 0, __kind = 0, __nusers
= 0, {__spins = 0, __list = {__next = 0x0}}}, __size = '\000' <repeats 23
times>, __align = 0}
(gdb) print printer_wakeup
$7 = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0,
__woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0},
__size = '\000' <repeats 47 times>, __align = 0}

There is one pointer in the pthread_cond_t structure:
printer_wakeup.__data.__mutex
So perhaps pthread_cond_wait() dereferences this null mutex pointer? The
segfault always happens on access of address 0xC.

This segfault first appeared when I compiled my app for SMP, and it goes
away if I use kernel arg maxcpus=1. Perhaps some SMP race condition is
occasionally preventing the data structures (buffer_lock,printer_wakeup)
from being ready for pthread_cond_wait()?

As a protection against this I have patched the rt_print.c printer_loop()
code, skipping the call to pthread_cond_wait() if those two structures
(buffer_lock,printer_wakeup) are not ready. There is no reason to wait on a
thread which is not locked and where the mutex is nonexistent, right?

This is the patch:

--- rt_print_A.c    2014-09-24 13:57:49.000000000 -0700
+++ rt_print_B.c    2017-11-11 23:24:34.309832301 -0800
@@ -680,9 +680,10 @@
     while (1) {
         pthread_cleanup_push(unlock, &buffer_lock);
         pthread_mutex_lock(&buffer_lock);
-
-        while (buffers == 0)
-            pthread_cond_wait(&printer_wakeup, &buffer_lock);
+
+        if ((buffer_lock.__data.__lock != 0) &&
(printer_wakeup.__data.__mutex != 0))
+            while (buffers == 0)
+                pthread_cond_wait(&printer_wakeup, &buffer_lock);

         print_buffers();

Can you verify that this patch is safe?

thanks,
-C Smith


* Re: [Xenomai] segfault in printer_loop()
@ 2017-11-01  0:12 C Smith
  0 siblings, 0 replies; 8+ messages in thread
From: C Smith @ 2017-11-01  0:12 UTC (permalink / raw)
  To: xenomai

One update for clarification:

The gcc command line is identical between the old xeno 2.6.2 machine
which produces a stable app,
and this new xeno 2.6.4 machine which produces segfaults in my app.
You'll notice some redundancy
in the gcc command line below, as I am using more than one skin and
calling xeno-config more than
once in the Makefile (but that has never been a problem in years of
using 2.6.2):

gcc -g3 -I/usr/xenomai/include -D_GNU_SOURCE -D_REENTRANT -D__XENO__
-I/usr/include/libxml2 -I/usr/local/rtnet/include -I"SOEM/" -I"SOEM/osal"
-I"SOEM/oshw/linux" -I"SOEM/soem"    -Xlinker -rpath -Xlinker /usr/xenomai/lib
app.c ../include/dia_dev_app.h ../include/crc_table.h ../include/dacdefs.h
../include/ov_version.h ../include/adcdefs.h ../include/app_version.h
../include/app.h ../include/canodefs.h ../include/preproc_app.h
../include/app_mem_manager_data.h ../include/comm_dta_app.h
../include/comproto.h
../modules/rtdinsync.h quad.o dac.o adc.o
SOEM/lib/linux/liboshw.a SOEM/lib/linux/libosal.a SOEM/lib/linux/libsoem.a
-L../lib -lapp -lnative -L/usr/xenomai/lib -lxenomai -lpthread -lrt -lrtdm
-L/usr/xenomai/lib -lxenomai -lpthread -lrt
-Wl,@/usr/xenomai/lib/posix.wrappers
-L/usr/xenomai/lib -lpthread_rt -lxenomai -lpthread -lrt  -lxml2 -lz -lm
-L"SOEM/lib/linux" -Wl,--start-group -loshw -losal -lsoem
-Wl,--end-group -lm -o app

Thanks, -C. Smith

Original post:

My xenomai application is segfaulting at startup, 1 in 10 times I run it.
When I catch it in a debugger or get a core file it says the segfault was
not in my code but in the xenomai sources:

rt_print.c line 685:
pthread_cond_wait(&printer_wakeup, &buffer_lock);

(gdb) bt
#0  0xb77120db in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib/libpthread.so.0
#1  0xb77347de in printer_loop (arg=0x0) at rt_print.c:685
#2  0xb770eadf in start_thread () from /lib/libpthread.so.0
#3  0xb749f44e in clone () from /lib/libc.so.6
(gdb) info threads
 Id  Target Id         Frame
 2   Thread 0xb73a36c0 (LWP 7235) 0xffffe424 in ?? ()
*1   Thread 0xb7730b40 (LWP 7238) 0xb77120db in pthread_cond_wait@
@GLIBC_2.3.2
() from /lib/libpthread.so.0

Note that there is no printing whatsoever in my code. This is a mature
application which has been running successfully on xenomai 2.6.2 for a few
years - but now I am running it on xenomai 2.6.4 on kernel 3.14.17.
Another difference is that I am now using a faster motherboard. I have a
suspicion that there is a race condition which is causing uninitialized
thread variables. I believe this is during the creation of a thread where
xenomai prints the new thread info to stdout.

Could &printer_wakeup, &buffer_lock be invalid?
I was unable to evaluate them in the debugger; I think their values are
gone from the stack/heap by the time I get to them.

There are no differences in rt_print.c between xenomai 2.6.4 and 2.6.5.

Can you provide a way to modify the code of printer_loop() to detect and
work around the problem?


* [Xenomai] segfault in printer_loop()
@ 2017-10-27  2:08 C Smith
  0 siblings, 0 replies; 8+ messages in thread
From: C Smith @ 2017-10-27  2:08 UTC (permalink / raw)
  To: xenomai

My xenomai application is segfaulting at startup, 1 in 10 times I run it.
When I catch it in a debugger or get a core file it says the segfault was
not in my code but in the xenomai sources:

rt_print.c line 685:
pthread_cond_wait(&printer_wakeup, &buffer_lock);

(gdb) bt
#0  0xb77120db in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib/libpthread.so.0
#1  0xb77347de in printer_loop (arg=0x0) at rt_print.c:685
#2  0xb770eadf in start_thread () from /lib/libpthread.so.0
#3  0xb749f44e in clone () from /lib/libc.so.6
(gdb) info threads
 Id  Target Id         Frame
 2   Thread 0xb73a36c0 (LWP 7235) 0xffffe424 in ?? ()
*1   Thread 0xb7730b40 (LWP 7238) 0xb77120db in pthread_cond_wait@
@GLIBC_2.3.2
() from /lib/libpthread.so.0

Note that there is no printing whatsoever in my code. This is a mature
application which has been running successfully on xenomai 2.6.2 for a few
years - but now I am running it on xenomai 2.6.4 on kernel 3.14.17.
Another difference is that I am now using a faster motherboard. I have a
suspicion that there is a race condition which is causing uninitialized
thread variables. I believe this is during the creation of a thread where
xenomai prints the new thread info to stdout.

Could &printer_wakeup, &buffer_lock be invalid?
I was unable to evaluate them in the debugger; I think their values are
gone from the stack/heap by the time I get to them.

There are no differences in rt_print.c between xenomai 2.6.4 and 2.6.5.

Can you provide a way to modify the code of printer_loop() to detect and
work around the problem?


Thread overview: 8+ messages
2017-11-01  6:29 [Xenomai] segfault in printer_loop() C Smith
2017-11-01  7:35 ` Jan Kiszka
     [not found]   ` <CA+K1mPF+SOhOeVpYktjNCcD7u403CtUXkM1Hcz_SS-6wwG50xg@mail.gmail.com>
2017-11-10  7:02     ` Jan Kiszka
     [not found]       ` <CA+K1mPGSTNyk0JZPQs4sSyuu+Xbi=cChCwFU1uokM9gNAe6n2Q@mail.gmail.com>
2017-11-10 10:07         ` Jan Kiszka
  -- strict thread matches above, loose matches on Subject: below --
2017-11-13  6:39 C Smith
2017-11-13  7:41 ` Jan Kiszka
2017-11-01  0:12 C Smith
2017-10-27  2:08 C Smith
