linux-kernel.vger.kernel.org archive mirror
* EINTR causes sigwaitinfo and pthread_kill to become hosed
@ 2004-08-24 15:28 Wilkerson, Bryan P
  2004-08-24 21:34 ` Roland McGrath
  0 siblings, 1 reply; 3+ messages in thread
From: Wilkerson, Bryan P @ 2004-08-24 15:28 UTC
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2422 bytes --]


The attached code mimics the algorithm used by a major software vendor to
suspend threads.   The attached diagram outlines that algorithm.   The
problems start when you attempt to use ptrace() while T2 is waiting on
sigwaitinfo().   I've tried this on a variety of kernels (2.4.21-99,
2.6.5-7.97 and RHEL4 Alpha 3) and platforms (x86 and IA64) with the same
result.   

1.  Start by running:  ./sig-suspend --no-ptrace

The test harness operates correctly in this mode.  The worker thread
works until it receives a SIGUSR1; its signal handler then calls
sigwaitinfo, which effectively suspends the worker thread (no
"### Worker thread working.." messages).  sigwaitinfo returns when it
gets a second SIGUSR, the handler exits, and the worker thread resumes.
This continues for 500 iterations.

2.  Introduce a thread that does a ptrace attach/detach at regular
intervals:  ./sig-suspend

The test does a couple of iterations correctly and then suddenly fails.
The sigwaitinfo in the worker thread's signal handler returns -1 and
errno=4  (EINTR).  This is definitely caused by the PTRACE_DETACH (you
can verify by extending the attach period with --attach=20000).   What
is the proper way to recover from errno=EINTR?   The logical thing to do
seems to be to just go back and call sigwaitinfo again...

3.  Tell it not to abort the test on EINTR:  ./sig-suspend
--no-fail-eintr

This is where it really gets messed up.  Not sure why, but as you can
see the signal handler is no longer able to send a signal back to the
main thread to acknowledge the resume.   It would appear that instead of
sending the signal back to the main thread the signal gets delivered
back to the worker thread and each ack it tries to send comes back to
it.  Worker thread becomes consumed with processing the signals being
echoed back to it.   Main thread never gets its ack and goes on waiting
forever.  

The vendor's code actually ignores the return from sigwaitinfo and acts
as if it successfully received the signal it was waiting for in the
signal handler.  That can also be simulated in the test harness with the
--no-check-sigwait option but the result is the same as the --no-fail-eintr
option.    

If this is patched already that is great, but a workaround would also be
helpful as my client refuses to require its customers to patch the
kernel to run their product.  

-bryan


[-- Attachment #2: sig-suspend.diagram.ZIP --]
[-- Type: application/x-zip-compressed, Size: 38308 bytes --]

[-- Attachment #3: sig-suspend.ZIP --]
[-- Type: application/x-zip-compressed, Size: 3148 bytes --]

[-- Attachment #4: Makefile --]
[-- Type: application/octet-stream, Size: 524 bytes --]



CFLAGS += -O0 -g

LIBS += -lpthread

TARGETS=sig-suspend

all: $(TARGETS)

$(TARGETS):  %:%.o 
	$(CC) -o $@ $(CFLAGS) $^ $(LIBS)

clean:
	$(RM) -f *.o

distclean: clean
	$(RM) -f $(TARGETS)

depend:
	$(MKDEP) $(INCDIR) $(CFLAGS) $(SRCS)

.SUFFIXES: .c .S .o .lo .cpp

.S.o:
	$(CC) $(INCDIR) $(CFLAGS) -c $*.S
.c.o:
	$(CC) $(INCDIR) $(CFLAGS) -c $*.c
.cpp.o:
	$(CXX) $(INCDIR) $(CFLAGS) -c $*.cpp
.c.lo:
	$(CC) -fPIC -DPIC $(INCDIR) $(CFLAGS) -c $*.c -o $*.lo
.S.lo:
	$(CC) -fPIC -DPIC $(INCDIR) $(CFLAGS) -c $*.S -o $*.lo


* Re: EINTR causes sigwaitinfo and pthread_kill to become hosed
  2004-08-24 15:28 EINTR causes sigwaitinfo and pthread_kill to become hosed Wilkerson, Bryan P
@ 2004-08-24 21:34 ` Roland McGrath
  0 siblings, 0 replies; 3+ messages in thread
From: Roland McGrath @ 2004-08-24 21:34 UTC
  To: Wilkerson, Bryan P; +Cc: linux-kernel

Have you reproduced this on 2.6.8.1?  I'm not seeing it so far.


Thanks,
Roland


* RE: EINTR causes sigwaitinfo and pthread_kill to become hosed
@ 2004-08-24 21:58 Wilkerson, Bryan P
  0 siblings, 0 replies; 3+ messages in thread
From: Wilkerson, Bryan P @ 2004-08-24 21:58 UTC
  To: Roland McGrath; +Cc: linux-kernel


Roland McGrath [mailto:roland@redhat.com] wrote:

> Have you reproduced this on 2.6.8.1?  I'm not seeing it so far.

Thanks for looking at it.  I have not tried it on 2.6.8.1.  I will try it
as soon as time permits.  So far I've tried 2.4.21-99, 2.6.5-7.97 and
RHEL4 Alpha 3 (don't have the disk loaded in my system right now to tell
you the kernel version).


