* [PATCH] lib/cobalt: init: do not call pthread_atfork() from atfork() handlers
From: Philippe Gerum @ 2019-05-03 15:09 UTC
  To: xenomai

Since glibc 2.28, calling pthread_atfork() from within a fork handler
hangs, due to unexpected recursive locking on an internal lock that
both the fork handling code and pthread_atfork() want to acquire.  To
fix this, the cobalt fork handler must be registered outside of the
atfork handling context it installs.

While at it, group all base inits which do not need to be repeated in
the forked child, so that they are excluded from the atfork context.

The problematic change was introduced between glibc-2.27.9000 and
glibc-2.28 [1]; it triggered a bug in the glibc test suite [2].

[1] git://sourceware.org/git/glibc.git, 27761a104
[2] git://sourceware.org/git/glibc.git, 669ff911e

Signed-off-by: Philippe Gerum <rpm@xenomai.org>
---
 lib/cobalt/init.c | 43 +++++++++++++++++++++++++------------------
 1 file changed, 25 insertions(+), 18 deletions(-)

diff --git a/lib/cobalt/init.c b/lib/cobalt/init.c
index abd990692..02a99c569 100644
--- a/lib/cobalt/init.c
+++ b/lib/cobalt/init.c
@@ -184,20 +184,26 @@ static void low_init(void)
 	cobalt_ticks_init(f->clock_freq);
 }
 
+static int cobalt_init_2(void);
+
 static void cobalt_fork_handler(void)
 {
 	cobalt_unmap_umm();
 	cobalt_clear_tsd();
 	cobalt_print_init_atfork();
-	if (cobalt_init())
+	if (cobalt_init_2())
 		exit(EXIT_FAILURE);
 }
 
-static void __cobalt_init(void)
+static inline void commit_stack_memory(void)
 {
-	struct sigaction sa;
+	char stk[PTHREAD_STACK_MIN / 2];
+	cobalt_commit_memory(stk);
+}
 
-	low_init();
+static void cobalt_init_1(void)
+{
+	struct sigaction sa;
 
 	sa.sa_sigaction = cobalt_sigdebug_handler;
 	sigemptyset(&sa.sa_mask);
@@ -228,20 +234,9 @@ static void __cobalt_init(void)
 			    " sizeof(cobalt_sem_shadow): %Zd!",
 			    sizeof(sem_t),
 			    sizeof(struct cobalt_sem_shadow));
-
-	cobalt_mutex_init();
-	cobalt_sched_init();
-	cobalt_thread_init();
-	cobalt_print_init();
 }
 
-static inline void commit_stack_memory(void)
-{
-	char stk[PTHREAD_STACK_MIN / 2];
-	cobalt_commit_memory(stk);
-}
-
-int cobalt_init(void)
+static int cobalt_init_2(void)
 {
 	pthread_t ptid = pthread_self();
 	struct sched_param parm;
@@ -249,7 +244,12 @@ int cobalt_init(void)
 
 	commit_stack_memory();	/* We only need this for the main thread */
 	cobalt_default_condattr_init();
-	__cobalt_init();
+
+	low_init();
+	cobalt_mutex_init();
+	cobalt_sched_init();
+	cobalt_thread_init();
+	cobalt_print_init();
 
 	if (__cobalt_control_bind)
 		return 0;
@@ -288,12 +288,19 @@ int cobalt_init(void)
 	return 0;
 }
 
+int cobalt_init(void)
+{
+	cobalt_init_1();
+
+	return cobalt_init_2();
+}
+
 static int get_int_arg(const char *name, const char *arg,
 		       int *valp, int min)
 {
 	int value, ret;
 	char *p;
-	
+
 	errno = 0;
 	value = (int)strtol(arg, &p, 10);
 	if (errno || *p || value < min) {
-- 
2.20.1
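
The init split can be modelled outside of Xenomai. The following is a hypothetical, self-contained sketch, not libcobalt code: `init_once()` stands for `cobalt_init_1()` and is assumed to be where the child-side fork handler gets registered (as the commit message implies), while `init_per_process()` stands for `cobalt_init_2()`, the only part the handler re-runs. Because the handler never reaches `pthread_atfork()`, the lock it shares with the fork handling code is never taken recursively.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical counters standing in for real initialization work. */
static int once_count;         /* bumped by the once-per-program phase */
static int per_process_count;  /* bumped by the per-process phase */

/* Analogue of cobalt_init_2(): safe to run again in every new process. */
static int init_per_process(void)
{
	per_process_count++;
	return 0;
}

/* Child-side atfork handler: re-runs only the per-process phase and
 * never calls pthread_atfork() itself. */
static void fork_handler(void)
{
	if (init_per_process())
		_exit(EXIT_FAILURE);
}

/* Analogue of cobalt_init_1(): runs once and registers the handler. */
static void init_once(void)
{
	once_count++;
	pthread_atfork(NULL, NULL, fork_handler);
}

/* Analogue of the public cobalt_init(): both phases, in order. */
int lib_init(void)
{
	init_once();
	return init_per_process();
}

/* Initializes, forks once, and returns 0 if the child saw the
 * per-process phase run again (and the once-only phase not rerun). */
int fork_demo(void)
{
	int status;
	pid_t pid;

	if (lib_init())
		return -1;

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0)
		/* Child: fork_handler() already ran before fork() returned. */
		_exit(once_count == 1 && per_process_count == 2 ? 0 : 1);

	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

In libcobalt the child-side handler also performs process-specific cleanup first (cobalt_unmap_umm(), cobalt_clear_tsd(), cobalt_print_init_atfork()); the counters here only make the once vs. per-process behavior observable.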




* RE: [PATCH] lib/cobalt: init: do not call pthread_atfork() from atfork() handlers
From: Lange Norbert @ 2019-05-14  9:53 UTC
  To: Philippe Gerum, Xenomai (xenomai@xenomai.org)

(re-adding the mailing list)

> -----Original Message-----
> From: Philippe Gerum <rpm@xenomai.org>
> Sent: Tuesday, 14 May 2019 10:38
> To: Lange Norbert <norbert.lange@andritz.com>
> Subject: Re: [PATCH] lib/cobalt: init: do not call pthread_atfork() from
> atfork() handlers
>
> E-MAIL FROM A NON-ANDRITZ SOURCE: AS A SECURITY MEASURE, PLEASE
> EXERCISE CAUTION WITH E-MAIL CONTENT AND ANY LINKS OR
> ATTACHMENTS.
>
>
> On 5/14/19 10:35 AM, Philippe Gerum wrote:
> > On 5/6/19 9:56 AM, Lange Norbert wrote:
> >> Hello Philippe,
> >>
> >> with this patch, smokey's "fork" test finishes when run alone, but
> >> the smokey suite hangs when running that test after the mutex or
> >> cvars test, e.g.:
> >>
> >> smokey --run=10,11
> >> smokey --run=12,11
> >
> > I cannot reproduce this with glibc 2.28 and the tip of my
> > for-upstream tree, which includes that fix. Which glibc are you running?

glibc 2.28; the Xenomai userspace is based on the current master branch
with the fix applied (tested both with and without our company-specific changes on top).

>
> Is this the sequence which hangs on your end?
>
> ~ # smokey --run=13-14
> posix_cond OK
> posix_fork OK
> ~ # smokey --run=15-14
> posix_mutex OK
> posix_fork OK
>

Yes:

root@buildroot:~# /usr/xenomai/bin/smokey --run=10
posix_mutex OK
root@buildroot:~# /usr/xenomai/bin/smokey --run=11
posix_fork OK
root@buildroot:~# /usr/xenomai/bin/smokey --run=10,11
posix_mutex OK

When it hangs, this is the stack trace:
(I switched to crosstool-NG for the toolchain and did not enable debug info for glibc, hence the unresolved "??" frames.)

(gdb) bt
#0  0x00007f45b4d86feb in ?? () from /lib64/libc.so.6
#1  0x00007f45b4de8b95 in malloc () from /lib64/libc.so.6
#2  0x00007f45b4f81a53 in ?? () from /lib64/ld-linux-x86-64.so.2
#3  0x00007f45b4f83149 in ?? () from /lib64/ld-linux-x86-64.so.2
#4  0x00007f45b4f8d4cc in ?? () from /lib64/ld-linux-x86-64.so.2
#5  0x00007f45b4ea7bcf in _dl_catch_exception () from /lib64/libc.so.6
#6  0x00007f45b4f8d0bb in ?? () from /lib64/ld-linux-x86-64.so.2
#7  0x00007f45b4ea71a3 in ?? () from /lib64/libc.so.6
#8  0x00007f45b4ea7bcf in _dl_catch_exception () from /lib64/libc.so.6
#9  0x00007f45b4ea7c40 in _dl_catch_error () from /lib64/libc.so.6
#10 0x00007f45b4ea72a8 in ?? () from /lib64/libc.so.6
#11 0x00007f45b4ea7317 in __libc_dlopen_mode () from /lib64/libc.so.6
#12 0x00007f45b4dacfa5 in ?? () from /lib64/libc.so.6
#13 0x00007f45b4f2051f in ?? () from /lib64/libpthread.so.0
#14 0x00007f45b4e7ca84 in backtrace () from /lib64/libc.so.6
#15 0x00007f45b4f4960c in cobalt_sigshadow_handler (sig=<optimized out>, si=<optimized out>, ctxt=<optimized out>) at /tmp/tmp.cTCUJSMLNc/xeno/lib/cobalt/sigshadow.c:55
#16 0x00007f45b4f4965d in sigshadow_handler (sig=28, si=0x7ffcd1fb45f0, ctxt=0x7ffcd1fb44c0) at /tmp/tmp.cTCUJSMLNc/xeno/lib/cobalt/sigshadow.c:80
#17 <signal handler called>
#18 0x00007f45b4e6b242 in mmap64 () from /lib64/libc.so.6
#19 0x00007f45b4de6a3c in ?? () from /lib64/libc.so.6
#20 0x00007f45b4de775c in ?? () from /lib64/libc.so.6
#21 0x00007f45b4de8ba7 in malloc () from /lib64/libc.so.6
#22 0x00007f45b4f66754 in heapobj_pkg_init_private () at /tmp/tmp.cTCUJSMLNc/xeno/lib/copperplate/heapobj-heapmem.c:102
#23 0x00007f45b4f62f17 in copperplate_init () at /tmp/tmp.cTCUJSMLNc/xeno/lib/copperplate/init.c:199
#24 0x00007f45b4f4d6b0 in __xenomai_init (argcp=argcp@entry=0x7ffcd1fb4e44, argvp=argvp@entry=0x7ffcd1fb4e48, me=me@entry=0x7f45b4f53f29 "program") at /tmp/tmp.cTCUJSMLNc/xeno/lib/boilerplate/setup.c:630
#25 0x00007f45b4f4dcac in xenomai_init (argcp=0x7ffcd1fb4e44, argvp=0x7ffcd1fb4e48) at /tmp/tmp.cTCUJSMLNc/xeno/lib/boilerplate/setup.c:685
#26 0x0000000000405177 in ?? ()
#27 0x000000000041ae7d in ?? ()
#28 0x00007f45b4d8740b in __libc_start_main () from /lib64/libc.so.6
#29 0x000000000040544a in ?? ()

Regards, Norbert

PS: Could you please have a look at this as well: https://www.xenomai.org/pipermail/xenomai/2019-March/040572.html
(It's unrelated, but it came back to mind while I was looking at some of my private commits that are not upstream yet.)
________________________________

This message and any attachments are solely for the use of the intended recipients. They may contain privileged and/or confidential information or other information protected from disclosure. If you are not an intended recipient, you are hereby notified that you received this email in error and that any review, dissemination, distribution or copying of this email and any attachment is strictly prohibited. If you have received this email in error, please contact the sender and delete the message and any attachment from your system.

ANDRITZ HYDRO GmbH


Rechtsform/ Legal form: Gesellschaft mit beschränkter Haftung / Corporation

Firmensitz/ Registered seat: Wien

Firmenbuchgericht/ Court of registry: Handelsgericht Wien

Firmenbuchnummer/ Company registration: FN 61833 g

DVR: 0605077

UID-Nr.: ATU14756806


Thank You
________________________________


* Re: [PATCH] lib/cobalt: init: do not call pthread_atfork() from atfork() handlers
From: Philippe Gerum @ 2019-05-14 10:04 UTC
  To: Lange Norbert, Xenomai (xenomai@xenomai.org)

On 5/14/19 11:53 AM, Lange Norbert wrote:
> When it hangs, this is the stacktrace:
> (switched to crosstool-NG for the toolchain, did not check to enable debuginfo for glibc).
> 
> (gdb) bt
> #0  0x00007f45b4d86feb in ?? () from /lib64/libc.so.6
> #1  0x00007f45b4de8b95 in malloc () from /lib64/libc.so.6
> #2  0x00007f45b4f81a53 in ?? () from /lib64/ld-linux-x86-64.so.2
> #3  0x00007f45b4f83149 in ?? () from /lib64/ld-linux-x86-64.so.2
> #4  0x00007f45b4f8d4cc in ?? () from /lib64/ld-linux-x86-64.so.2
> #5  0x00007f45b4ea7bcf in _dl_catch_exception () from /lib64/libc.so.6

That is a different issue, possibly not directly related. backtrace() is
called from signal context in the default SIGSHADOW handler libcobalt
installs, which is unsafe since backtrace() may call malloc() (here via
the dynamic loader, on its first use). This run ends up with a recursive
call to malloc(), which deadlocks on the internal arena lock. Disabling
CONFIG_XENO_OPT_DEBUG_TRACE_RELAX may paper over the issue.

-- 
Philippe.



* RE: [PATCH] lib/cobalt: init: do not call pthread_atfork() from atfork() handlers
From: Lange Norbert @ 2019-05-14 10:24 UTC
  To: Philippe Gerum, Xenomai (xenomai@xenomai.org)



> -----Original Message-----
> From: Philippe Gerum <rpm@xenomai.org>
> Sent: Tuesday, 14 May 2019 12:05
> To: Lange Norbert <norbert.lange@andritz.com>; Xenomai
> (xenomai@xenomai.org) <xenomai@xenomai.org>
> Subject: Re: [PATCH] lib/cobalt: init: do not call pthread_atfork() from
> atfork() handlers
>
> That is a different issue, possibly not directly related. backtrace() is used over
> a signal context in the default SIGSHADOW handler libcobalt installs, which is
> unsafe since backtrace() calls malloc(). This run ends up with a recursive call
> to malloc() which deadlocks on the internal arena lock. Disabling
> CONFIG_XENO_OPT_DEBUG_TRACE_RELAX may paper over the issue.

So, if I understood this correctly, smokey's fork test causes some relaxation log entries,
and a "clean" application should leave real-time mode before calling fork().
(That is separate from the fact that a deadlock should not happen in any case.)

Norbert


* FW: [PATCH] lib/cobalt: init: do not call pthread_atfork() from atfork() handlers
From: Lange Norbert @ 2019-05-14 10:40 UTC
  To: Xenomai (xenomai@xenomai.org), Philippe Gerum (rpm@xenomai.org)

What do you think about this hackaround (attached)?
Calling backtrace() from a signal handler is still not "clean", but it does work.

Norbert

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-libcobalt-resolve-backtrace-early.patch
Type: application/octet-stream
Size: 1018 bytes
Desc: 0001-libcobalt-resolve-backtrace-early.patch
URL: <http://xenomai.org/pipermail/xenomai/attachments/20190514/6b6aa20a/attachment.obj>


* Re: [PATCH] lib/cobalt: init: do not call pthread_atfork() from atfork() handlers
From: Philippe Gerum @ 2019-05-14 13:58 UTC
  To: Lange Norbert, Xenomai (xenomai@xenomai.org)

On 5/14/19 12:24 PM, Lange Norbert wrote:
> So smokeys fork will cause some relaxation log entries if I understood this correctly,
> and a "clean" application should leave realtime before calling fork.
> (Unrelated to the issue that a deadlock should not happen)
> 

Aside from calling a not-yet-initialized backtrace() from a signal handler, which is wrong, this issue seems to pop up because two other conditions are met:

- a test which runs before smokey-fork switched the main smokey thread to a real-time scheduling policy (e.g. posix-mutex.c does so when entering run_posix_mutex()). When smokey-fork re-execs, the child inherits the scheduling settings of the main thread, so it ends up running the copperplate inits as a SCHED_FIFO thread, which is eligible for relax detection (a SCHED_OTHER/SCHED_WEAK thread would not be).
- xnthread_relax() in the core always asks for a backtrace via a SIGSHADOW signal upon detection, including when the current thread does not want to be warned about mode switches, which seems wrong. If the warn-on-switch bit (XNWARN) were honored as it should be, the backtrace request via SIGSHADOW would not be sent in the first place.

Clearing either of these conditions, as the two patches below do, should tame the issue. I would actually recommend merging both.

diff --git a/kernel/cobalt/thread.c b/kernel/cobalt/thread.c
index fa7a65569..6caf9ef37 100644
--- a/kernel/cobalt/thread.c
+++ b/kernel/cobalt/thread.c
@@ -2131,11 +2131,11 @@ void xnthread_relax(int notify, int reason)
 	 * information.
 	 */
 	xnthread_propagate_schedparam(thread);
-	
+
 	if (xnthread_test_state(thread, XNUSER) && notify) {
-		xndebug_notify_relax(thread, reason);
 		if (xnthread_test_state(thread, XNWARN)) {
 			/* Help debugging spurious relaxes. */
+			xndebug_notify_relax(thread, reason);
 			memset(&si, 0, sizeof(si));
 			si.si_signo = SIGDEBUG;
 			si.si_code = SI_QUEUE;
diff --git a/testsuite/smokey/main.c b/testsuite/smokey/main.c
index 12321dfc5..5702e825a 100644
--- a/testsuite/smokey/main.c
+++ b/testsuite/smokey/main.c
@@ -22,6 +22,7 @@
 
 int main(int argc, char *const argv[])
 {
+	struct sched_param param = { .sched_priority = 0 };
 	struct smokey_test *t;
 	int ret, fails = 0;
 
@@ -29,6 +30,7 @@ int main(int argc, char *const argv[])
 		return 0;
 
 	for_each_smokey_test(t) {
+		pthread_setschedparam(pthread_self(), SCHED_OTHER, &param);
 		ret = t->run(t, argc, argv);
 		if (ret) {
 			if (ret == -ENOSYS) {

-- 
Philippe.
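
The scheduling-policy inheritance described in the first point, and the reset performed by the smokey/main.c hunk, can be observed from plain user space on Linux. The sketch below is illustrative only: the helper names are hypothetical, and since switching to SCHED_FIFO needs privileges, it uses SCHED_BATCH as a stand-in for "a policy other than SCHED_OTHER" to show that a forked child starts with its parent's policy.

```c
#define _GNU_SOURCE            /* for SCHED_BATCH */
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Same call as the smokey/main.c hunk: drop back to SCHED_OTHER,
 * whose static priority must be 0. Returns 0 or an errno value. */
int reset_to_sched_other(void)
{
	struct sched_param param = { .sched_priority = 0 };

	return pthread_setschedparam(pthread_self(), SCHED_OTHER, &param);
}

/* Switch to a non-default policy; unlike SCHED_FIFO, SCHED_BATCH
 * normally needs no privileges on Linux. */
int switch_to_sched_batch(void)
{
	struct sched_param param = { .sched_priority = 0 };

	return pthread_setschedparam(pthread_self(), SCHED_BATCH, &param);
}

/* Fork and return the scheduling policy the child starts with, or -1
 * on error. Policy constants are small, so they fit in an exit status. */
int child_policy_after_fork(void)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(sched_getscheduler(0));

	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

Calling `switch_to_sched_batch()` and then `child_policy_after_fork()` should report SCHED_BATCH; after `reset_to_sched_other()`, the child reports SCHED_OTHER again, which is the state smokey restores before each test with that hunk applied.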



* Re: FW: [PATCH] lib/cobalt: init: do not call pthread_atfork() from atfork() handlers
From: Philippe Gerum @ 2019-05-14 14:01 UTC
  To: Lange Norbert, Xenomai (xenomai@xenomai.org)

On 5/14/19 12:40 PM, Lange Norbert wrote:
> What do you think about this hackaround?
> Still not "clean" to call in a signal handler but it does work.

Yes, this is the workaround the backtrace() documentation recommends for
this situation. I would put it in install_sigshadow() instead, though,
since that is where (and why) we need it in the first place.

-- 
Philippe.



* Re: [PATCH] lib/cobalt: init: do not call pthread_atfork() from atfork() handlers
From: Jan Kiszka @ 2019-05-27  7:17 UTC
  To: Philippe Gerum, xenomai

On 03.05.19 17:09, Philippe Gerum via Xenomai wrote:
> Since glibc 2.28, calling pthread_atfork() over the context of a fork
> handler hangs, due to unexpected recursive locking on a common lock
> both want to acquire.  To fix this, the cobalt fork handler needs to
> be registered outside of the atfork handling context it installs.

Thanks, applied to next.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux



* Re: [PATCH] lib/cobalt: init: do not call pthread_atfork() from atfork() handlers
  2019-05-14 13:58             ` Philippe Gerum
@ 2019-05-27  7:19               ` Jan Kiszka
  2019-05-29  8:22                 ` Philippe Gerum
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Kiszka @ 2019-05-27  7:19 UTC (permalink / raw)
  To: Philippe Gerum, Lange Norbert, Xenomai (xenomai@xenomai.org)

On 14.05.19 15:58, Philippe Gerum via Xenomai wrote:
> On 5/14/19 12:24 PM, Lange Norbert wrote:
>>
>>
>>> -----Original Message-----
>>> From: Philippe Gerum <rpm@xenomai.org>
>>> Sent: Dienstag, 14. Mai 2019 12:05
>>> To: Lange Norbert <norbert.lange@andritz.com>; Xenomai
>>> (xenomai@xenomai.org) <xenomai@xenomai.org>
>>> Subject: Re: [PATCH] lib/cobalt: init: do not call pthread_atfork() from
>>> atfork() handlers
>>>
>>>
>>>
>>> On 5/14/19 11:53 AM, Lange Norbert wrote:
>>>> (re-adding ML)
>>>>
>>>>> -----Original Message-----
>>>>> From: Philippe Gerum <rpm@xenomai.org>
>>>>> Sent: Dienstag, 14. Mai 2019 10:38
>>>>> To: Lange Norbert <norbert.lange@andritz.com>
>>>>> Subject: Re: [PATCH] lib/cobalt: init: do not call pthread_atfork()
>>>>> from atfork() handlers
>>>>>
>>>>>
>>>>>
>>>>> On 5/14/19 10:35 AM, Philippe Gerum wrote:
>>>>>> On 5/6/19 9:56 AM, Lange Norbert wrote:
>>>>>>> Hello Philippe,
>>>>>>>
>>>>>>> Using this patch, smokey's "fork test" finishes when run alone, but
>>>>>>> the smokey suite hangs when running that test after the mutex or
>>>>>>> condvar test, e.g.:
>>>>>>>
>>>>>>> smokey --run=10,11
>>>>>>> smokey --run=12,11
>>>>>>
>>>>>> I cannot reproduce this with glibc 2.28, and the tip of my
>>>>>> for-upstream tree which includes that fix. Which glibc are you running?
>>>>
>>>> glibc 2.28; Xenomai userspace is based on the current master branch
>>>> with the fix added (tested both with and without our company-specific
>>>> changes on top).
>>>>
>>>>>
>>>>> Is this the sequence which hangs on your end?
>>>>>
>>>>> ~ # smokey --run=13-14
>>>>> posix_cond OK
>>>>> posix_fork OK
>>>>> ~ # smokey --run=15-14
>>>>> posix_mutex OK
>>>>> posix_fork OK
>>>>>
>>>>
>>>> Yes:
>>>>
>>>> root@buildroot:~# /usr/xenomai/bin/smokey --run=10
>>>> posix_mutex OK
>>>> root@buildroot:~# /usr/xenomai/bin/smokey --run=11
>>>> posix_fork OK
>>>> root@buildroot:~# /usr/xenomai/bin/smokey --run=10,11
>>>> posix_mutex OK
>>>>
>>>> When it hangs, this is the stacktrace:
>>>> (switched to crosstool-NG for the toolchain; did not enable
>>>> debuginfo for glibc).
>>>>
>>>> (gdb) bt
>>>> #0  0x00007f45b4d86feb in ?? () from /lib64/libc.so.6
>>>> #1  0x00007f45b4de8b95 in malloc () from /lib64/libc.so.6
>>>> #2  0x00007f45b4f81a53 in ?? () from /lib64/ld-linux-x86-64.so.2
>>>> #3  0x00007f45b4f83149 in ?? () from /lib64/ld-linux-x86-64.so.2
>>>> #4  0x00007f45b4f8d4cc in ?? () from /lib64/ld-linux-x86-64.so.2
>>>> #5  0x00007f45b4ea7bcf in _dl_catch_exception () from /lib64/libc.so.6
>>>
>>> That is a different issue, possibly not directly related. backtrace() is called
>>> from signal context in the default SIGSHADOW handler libcobalt installs, which is
>>> unsafe since backtrace() calls malloc(). This run ends up with a recursive call
>>> to malloc(), which deadlocks on the internal arena lock. Disabling
>>> CONFIG_XENO_OPT_DEBUG_TRACE_RELAX may paper over the issue.
>>
>> So, if I understood this correctly, smokey's fork test will cause some relaxation
>> log entries, and a "clean" application should leave real-time mode before calling
>> fork(). (This is separate from the fact that the deadlock should not happen.)
>>
> 
> Aside from calling backtrace() from a signal handler without initializing it first, which is wrong, this issue seems to pop up because two other conditions are met:
> 
> - some test which precedes smokey-fork switched the scheduling class of the main() smokey thread to a real-time policy (e.g. posix-mutex.c does so when entering run_posix_mutex()). When smokey-fork re-execs, the child inherits the scheduler settings of the main thread, so it eventually runs the copperplate inits as a SCHED_FIFO thread, which is eligible for relax detection (a SCHED_OTHER/SCHED_WEAK thread would not be).
> - xnthread_relax() from the core always asks for a backtrace via a SIGSHADOW signal upon detection, including when the current thread does not want to be warned about mode switches, which seems wrong. If the warn-on-switch bit were considered as it should be, the request for a backtrace via SIGSHADOW would not be sent in the first place.
> 
> Clearing either of these conditions, as each of the two patches below does, should tame the issue. I would actually recommend merging both.
> 
> diff --git a/kernel/cobalt/thread.c b/kernel/cobalt/thread.c
> index fa7a65569..6caf9ef37 100644
> --- a/kernel/cobalt/thread.c
> +++ b/kernel/cobalt/thread.c
> @@ -2131,11 +2131,11 @@ void xnthread_relax(int notify, int reason)
>   	 * information.
>   	 */
>   	xnthread_propagate_schedparam(thread);
> -	
> +
>   	if (xnthread_test_state(thread, XNUSER) && notify) {
> -		xndebug_notify_relax(thread, reason);
>   		if (xnthread_test_state(thread, XNWARN)) {
>   			/* Help debugging spurious relaxes. */
> +			xndebug_notify_relax(thread, reason);
>   			memset(&si, 0, sizeof(si));
>   			si.si_signo = SIGDEBUG;
>   			si.si_code = SI_QUEUE;
> diff --git a/testsuite/smokey/main.c b/testsuite/smokey/main.c
> index 12321dfc5..5702e825a 100644
> --- a/testsuite/smokey/main.c
> +++ b/testsuite/smokey/main.c
> @@ -22,6 +22,7 @@
>   
>   int main(int argc, char *const argv[])
>   {
> +	struct sched_param param = { .sched_priority = 0 };
>   	struct smokey_test *t;
>   	int ret, fails = 0;
>   
> @@ -29,6 +30,7 @@ int main(int argc, char *const argv[])
>   		return 0;
>   
>   	for_each_smokey_test(t) {
> +		pthread_setschedparam(pthread_self(), SCHED_OTHER, &param);
>   		ret = t->run(t, argc, argv);
>   		if (ret) {
>   			if (ret == -ENOSYS) {
> 

Can you provide those hunks as patches?

And how about the backtrace() issue?

Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux
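Norbert's remark that a clean application should leave real-time mode before forking relies on a forked child inheriting the calling thread's scheduling policy. So does the smokey main.c hunk above. The sketch below shows this with plain glibc calls. Smokey itself goes through the libcobalt wrapper of pthread_setschedparam(), which this sketch does not model. reset_to_sched_other() and child_inherited_policy() are hypothetical helpers.

```c
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Put the calling thread back to SCHED_OTHER, as the proposed smokey
 * main.c hunk does before each test. Returns 0 or an error number. */
int reset_to_sched_other(void)
{
	struct sched_param param;

	memset(&param, 0, sizeof(param));
	param.sched_priority = 0;	/* the only valid priority for SCHED_OTHER */

	return pthread_setschedparam(pthread_self(), SCHED_OTHER, &param);
}

/* Fork a child which reports the scheduling policy it inherited
 * through its exit status. Returns that policy, or -1 on error. */
int child_inherited_policy(void)
{
	struct sched_param param;
	int policy, status;
	pid_t pid;

	pid = fork();
	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Child: query the policy inherited from the parent thread. */
		if (pthread_getschedparam(pthread_self(), &policy, &param))
			_exit(255);
		_exit(policy);
	}

	/* Parent: collect the child's report. */
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	if (WEXITSTATUS(status) == 255)
		return -1;

	return WEXITSTATUS(status);
}
```

Calling child_inherited_policy() right after switching the caller to SCHED_FIFO (which needs privileges) would instead report SCHED_FIFO. That matches the inheritance Philippe describes for the smokey fork test.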



* Re: [PATCH] lib/cobalt: init: do not call pthread_atfork() from atfork() handlers
  2019-05-27  7:19               ` Jan Kiszka
@ 2019-05-29  8:22                 ` Philippe Gerum
  0 siblings, 0 replies; 10+ messages in thread
From: Philippe Gerum @ 2019-05-29  8:22 UTC (permalink / raw)
  To: Jan Kiszka, Lange Norbert, Xenomai (xenomai@xenomai.org)

On 5/27/19 9:19 AM, Jan Kiszka wrote:
> On 14.05.19 15:58, Philippe Gerum via Xenomai wrote:
>> [...]
> 
> Can you provide those hunks as patches?
>

Those are pending in my queue with other patches. I'll submit them later
today.

> And how about the backtrace() issue?
> 
The most practical way is to warm up backtrace() with a dummy call during
init, as Norbert suggested. I'll add a patch doing so to the queue.

-- 
Philippe.




Thread overview: 10+ messages
2019-05-03 15:09 [PATCH] lib/cobalt: init: do not call pthread_atfork() from atfork() handlers Philippe Gerum
     [not found] ` <VI1PR05MB591765785A35FB3685463256F6300@VI1PR05MB5917.eurprd05.prod.outlook.com>
     [not found]   ` <6ef3a9cf-cb30-59c4-1983-4475fdc7cd6e@xenomai.org>
     [not found]     ` <c0230398-8da5-a0b4-d30c-17e91e2f0cb7@xenomai.org>
2019-05-14  9:53       ` Lange Norbert
2019-05-14 10:04         ` Philippe Gerum
2019-05-14 10:24           ` Lange Norbert
2019-05-14 10:40             ` FW: " Lange Norbert
2019-05-14 14:01               ` Philippe Gerum
2019-05-14 13:58             ` Philippe Gerum
2019-05-27  7:19               ` Jan Kiszka
2019-05-29  8:22                 ` Philippe Gerum
2019-05-27  7:17 ` Jan Kiszka
