linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* PROBLEM: threaded process stuck on 2.4.2[12]
@ 2003-08-04 17:49 Jan Ciger
  0 siblings, 0 replies; only message in thread
From: Jan Ciger @ 2003-08-04 17:49 UTC (permalink / raw)
  To: linux-kernel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Hello,

It seems, that 2.4.21 and 2.4.22-pre10 have a problem with
threads. Some apps hang in rt_sigsuspend() call.

How to reproduce :

1) Download OpenProducer
http://www.andesengineering.com/Producer/Download/Producer-0.8.2-2.tar.gz

2) Download OpenSceneGraph
http://www.openscenegraph.org/download/snapshots/OpenSceneGraph-0.9.4-2.tar.gz

3) build and install OpenProducer and then OpenSceneGraph

4) after installation, run e.g. osganimate

For me the application hangs, until I hit CTRL+Z (suspend) and then do
either bg or fg from the shell.

Strace shows the application hanging in :

sigreturn()                             = ? (mask now [RTMIN])
rt_sigprocmask(SIG_SETMASK, NULL, [RTMIN], 8) = 0
rt_sigsuspend([] <unfinished ...>

If I start the application in gdb, I get first SIG32, that should be OK
and then SIGTRAP. After hitting continue on SIGTRAP, the application
hangs in :

0x40fda714 in pthread_getconcurrency () from /lib/i686/libpthread.so.0

It seems, that some signal is never delivered or there is a deadlock of
sorts.

I tried another strace and it seems, that something fishy is going on :

- -----------------------

$ strace -e trace=signal -f osganimate
rt_sigaction(SIGRTMIN, {0x40544000, [], SA_RESTORER, 0x405e1488}, NULL,
8) = 0
rt_sigaction(SIGRT_1, {0x405438c0, [], SA_RESTORER, 0x405e1488}, NULL,
8) = 0
rt_sigaction(SIGRT_2, {0x40544040, [], SA_RESTORER, 0x405e1488}, NULL,
8) = 0
rt_sigprocmask(SIG_BLOCK, [RTMIN], NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [33], NULL, 8) = 0
Process 5191 attached (waiting for parent)
Process 5191 resumed (parent 5187 ready)
[pid  5191] --- SIGSTOP (Stopped (signal)) @ 0 (0) ---
[pid  5191] rt_sigprocmask(SIG_SETMASK, ~[TRAP 33], NULL, 8) = 0
[pid  5187] rt_sigprocmask(SIG_SETMASK, NULL, [RTMIN], 8) = 0
[pid  5187] rt_sigprocmask(SIG_SETMASK, NULL, [RTMIN], 8) = 0
[pid  5187] rt_sigsuspend([]Process 5192 attached
~ <unfinished ...>
[pid  5192] --- SIGSTOP (Stopped (signal)) @ 0 (0) ---
[pid  5191] kill(5187, SIGRTMIN)        = 0
[pid  5192] rt_sigprocmask(SIG_SETMASK, [RTMIN], NULL, 8) = 0
[pid  5187] --- SIGRTMIN (Unknown signal 32) @ 0 (0) ---
[pid  5192] rt_sigprocmask(SIG_SETMASK, NULL,  <unfinished ...>
[pid  5187] <... rt_sigsuspend resumed> ) = -1 EINTR (Interrupted system
call)
[pid  5192] <... rt_sigprocmask resumed> [RTMIN], 8) = 0
[pid  5187] sigreturn( <unfinished ...>
[pid  5192] rt_sigsuspend([] <unfinished ...>
[pid  5187] <... sigreturn resumed> )   = ? (mask now [RTMIN])
[pid  5187] rt_sigprocmask(SIG_SETMASK, NULL, [RTMIN], 8) = 0
[pid  5187] rt_sigprocmask(SIG_SETMASK, NULL, [RTMIN], 8) = 0
[pid  5187] rt_sigsuspend([]Process 5193 attached (waiting for parent)
Process 5193 resumed (parent 5191 ready)
~ <unfinished ...>
[pid  5193] --- SIGSTOP (Stopped (signal)) @ 0 (0) ---
[pid  5191] kill(5187, SIGRTMIN)        = 0
[pid  5193] rt_sigprocmask(SIG_SETMASK, [RTMIN], NULL, 8) = 0
[pid  5187] --- SIGRTMIN (Unknown signal 32) @ 0 (0) ---
[pid  5187] <... rt_sigsuspend resumed> ) = -1 EINTR (Interrupted system
call)
[pid  5187] sigreturn()                 = ? (mask now [RTMIN])
[pid  5187] rt_sigprocmask(SIG_SETMASK, NULL, [RTMIN], 8) = 0
[pid  5187] rt_sigsuspend([] <unfinished ...>
[pid  5193] rt_sigprocmask(SIG_SETMASK, NULL, [RTMIN], 8) = 0
[pid  5193] rt_sigprocmask(SIG_SETMASK, NULL, [RTMIN], 8) = 0
[pid  5193] rt_sigsuspend([]Process 5194 attached (waiting for parent)
Process 5194 resumed (parent 5191 ready)
~ <unfinished ...>
[pid  5194] --- SIGSTOP (Stopped (signal)) @ 0 (0) ---
[pid  5191] kill(5193, SIGRTMIN <unfinished ...>
[pid  5193] --- SIGRTMIN (Unknown signal 32) @ 0 (0) ---
Process 5193 detached

^^^^^^^^^^^^^^^^^^^^^^^^^ thread died ?

[pid  5191] <... kill resumed> )        = 0
[pid  5194] rt_sigprocmask(SIG_SETMASK, [RTMIN], NULL, 8) = 0
[pid  5194] kill(5193, SIGRTMIN)        = 0
[pid  5194] kill(5193, SIGRTMIN)        = 0

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Trying to send a signal to a dead thread, it seems and then it waits
forever for response.

When I run it inside gdb, the dead thread is in <defunct> state :-(

I am not PosixThreads expert, but this looks abnormal to me. Either some
signal got missing (SIGSEGV or SIGCHLD, for example, if something
crashed and the thread manager didn't "notice" the disappearance of one
of the threads) or there is something funny going on.

I didn't try to reproduce this problem on 2.5/6 kernels, since some
drivers my machine needs do not compile yet (like SCSI AIC7xxxx and NVidia)

My box is dual Xeon 2.2GHz with 1GB RAM, running 2.4.21-6mdk (Mandrake
enterprise kernel) now, but the same bug happens also with 2.4.22-pre10
and stock 2.4.21 compiled from source (tested yesterday and today).

I will provide more info on request, I do not want to flood the list.

Some advice would be extremely appreciated, because this is a show
stopper for me.

Regards,

Jan
- --

Jan Ciger
VRlab EPFL Switzerland
GPG public key : http://www.keyserver.net/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQE/Lpyyn11XseNj94gRAk/eAKCXI4SIakqr3/n7ZJnnn6+bukhoKACg6s78
43GodPkQAjO1ZBsKzzBpMAw=
=Lq4F
-----END PGP SIGNATURE-----



^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2003-08-04 17:49 UTC | newest]

Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-08-04 17:49 PROBLEM: threaded process stuck on 2.4.2[12] Jan Ciger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).