* [PATCH] Set SA_RESTART flag on SIGCHLD handler
[not found] ` <20120319033751.GA25237@burratino>
@ 2012-06-21 7:48 ` Anders Kaseorg
[not found] ` <20120622003628.GD4730@burratino>
0 siblings, 1 reply; 6+ messages in thread
From: Anders Kaseorg @ 2012-06-21 7:48 UTC (permalink / raw)
To: dash; +Cc: Jonathan Nieder, git
It looks like this was never forwarded to the dash mailing list. Any
thoughts from upstream?
Context: http://thread.gmane.org/gmane.comp.version-control.git.debian/147
(FTR, it did happen again.)
Anders
On Sun, 18 Mar 2012, Jonathan Nieder wrote:
> Anders Kaseorg wrote:
>
> > So I guess I’m willing to just hope for now that this doesn’t happen
> > again. (Unless you think it’s a dash bug that EINTR isn’t handled
> > there?)
>
> Yes, I suspect it's a dash >= 0.5.6 bug. On initialization dash sets
> up a signal handler for SIGCHLD without the SA_RESTART flag, which can
> cause other dash code to fail at seemingly random times. Similar
> problems would arise when scripts set up traps.
>
> diff --git i/src/trap.c w/src/trap.c
> index 17316c95..cd99596f 100644
> --- i/src/trap.c
> +++ w/src/trap.c
> @@ -259,7 +259,7 @@ setsignal(int signo)
> act.sa_handler = SIG_DFL;
> }
> *t = action;
> - act.sa_flags = 0;
> + act.sa_flags = SA_RESTART;
> sigfillset(&act.sa_mask);
> sigaction(signo, &act, 0);
> }
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] Set SA_RESTART flag on SIGCHLD handler
[not found] ` <20120622003628.GD4730@burratino>
@ 2012-06-22 9:36 ` Anders Kaseorg
2012-06-26 10:03 ` Anders Kaseorg
2012-06-22 22:02 ` Jilles Tjoelker
1 sibling, 1 reply; 6+ messages in thread
From: Anders Kaseorg @ 2012-06-22 9:36 UTC (permalink / raw)
To: Jonathan Nieder; +Cc: dash, git
On 06/21/2012 08:36 PM, Jonathan Nieder wrote:
> Thanks. Here's a less invasive patch to experiment with.
Hmm. I’ve kicked off several builds in the git-core PPA, first under
dash with the original patch, now under dash with the less invasive
patch. Under either dash, almost all of the builds end up trapped in
what looks like a deadlock, at various parts of t9010-svn-fe.sh.
https://launchpad.net/~git-core/+archive/ppa/+build/3596228
https://launchpad.net/~git-core/+archive/ppa/+build/3596229
https://launchpad.net/~git-core/+archive/ppa/+build/3597370
https://launchpad.net/~git-core/+archive/ppa/+build/3597371
I’ll try to debug this later by hacking up the package to run that
script through strace -fiv during build (since I still can’t reproduce
locally), unless you have anything more clever to try.
Anders
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] Set SA_RESTART flag on SIGCHLD handler
[not found] ` <20120622003628.GD4730@burratino>
2012-06-22 9:36 ` Anders Kaseorg
@ 2012-06-22 22:02 ` Jilles Tjoelker
1 sibling, 0 replies; 6+ messages in thread
From: Jilles Tjoelker @ 2012-06-22 22:02 UTC (permalink / raw)
To: Jonathan Nieder; +Cc: Anders Kaseorg, dash, git
On Thu, Jun 21, 2012 at 07:36:28PM -0500, Jonathan Nieder wrote:
> Anders Kaseorg wrote:
> > On Sun, 18 Mar 2012, Jonathan Nieder wrote:
> >> --- i/src/trap.c
> >> +++ w/src/trap.c
> >> @@ -259,7 +259,7 @@ setsignal(int signo)
> >> act.sa_handler = SIG_DFL;
> >> }
> >> *t = action;
> >> - act.sa_flags = 0;
> >> + act.sa_flags = SA_RESTART;
> >> sigfillset(&act.sa_mask);
> >> sigaction(signo, &act, 0);
> >> }
> [...]
> > It looks like this was never forwarded to the dash mailing list. Any
> > thoughts from upstream?
> > Context: http://thread.gmane.org/gmane.comp.version-control.git.debian/147
> > (FTR, it did happen again.)
> Thanks. Here's a less invasive patch to experiment with.
> POSIX (XCU §2.11 "Signals and Error Handling") requires that the "wait"
> utility must return immediately when interrupted by a trapped signal,
> so SA_RESTART is not appropriate for those.
Same thing for "read", even though that does not make much sense (the
only thing the trap handler can do is abandon the read, since bytes
already read but not part of a complete line may be lost).
On the other hand, this interruption is not specified to happen with
signals whose default action is to ignore. Many shells don't do this,
though (perhaps because of the difficulty of finding what the default
action is).
> It does not seem to give any advice on whether signals interrupt
> redirection.
For certain situations it certainly does. For simple commands without
command name, it says any redirections shall be performed in a subshell
environment; hence, trapped signals are reset to the default action.
This is what most shells appear to do in all cases. Again, this may not
be the most useful behaviour but it matches what the Bourne shell did.
> diff --git i/src/redir.c w/src/redir.c
> index f96a76bc..7fb7748f 100644
> --- i/src/redir.c
> +++ w/src/redir.c
> @@ -206,8 +206,11 @@ openredirect(union node *redir)
> /* FALLTHROUGH */
> case NCLOBBER:
> fname = redir->nfile.expfname;
> - if ((f = open64(fname, O_WRONLY|O_CREAT|O_TRUNC, 0666)) < 0)
> + while ((f = open64(fname, O_WRONLY|O_CREAT|O_TRUNC, 0666)) < 0) {
> + if (errno == EINTR)
> + continue;
> goto ecreate;
> + }
> break;
> case NAPPEND:
> fname = redir->nfile.expfname;
It looks like this patch breaks interrupting (via SIGINT) a blocking
open, for example of a fifo. Although the race condition associated with
relying on [EINTR] is very hard to close, I don't think that's reason to
give up on it entirely.
I think it is better to rethink commit
3800d4934391b144fd261a7957aea72ced7d47ea that caused this bug (and some
others that have been fixed). A signal handler for SIGCHLD was added to
allow waiting for a trapped signal or a child process at the same time
(using sigsuspend, as was done, or sigwait). However, leaving the signal
handler installed all the time causes unexpected [EINTR] errors, so I
suggest only installing it during the "wait" builtin or specifying
SA_RESTART only for this special SIGCHLD handler (while leaving all
other signal handlers interrupting). In either case, behaviour for
SIGCHLD handlers installed because of CHLD traps should remain
unchanged. The former seems safest because SA_RESTART handlers still
cause visible effects such as short writes.
In case of a script doing 'trap "" CHLD', it is perhaps undesirable for
"wait" to block forever, given that wait(2) doesn't do that either.
--
Jilles Tjoelker
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] Set SA_RESTART flag on SIGCHLD handler
2012-06-22 9:36 ` Anders Kaseorg
@ 2012-06-26 10:03 ` Anders Kaseorg
2012-06-26 12:19 ` Anders Kaseorg
0 siblings, 1 reply; 6+ messages in thread
From: Anders Kaseorg @ 2012-06-26 10:03 UTC (permalink / raw)
To: Jonathan Nieder; +Cc: dash, git
I wrote this greatly simplified skeleton of the failing Git test, and
found that I can reproduce what looks like the same problem, fairly
reliably:
$ rm -f fifo
$ mkfifo fifo
$ dash -c 'while echo -n .; do : < fifo & : > fifo; wait $!; done'
...............................dash: 1: cannot create fifo: Interrupted
system call
After running this through strace, it seems clear what went wrong. The
parent’s open("fifo", O_WRONLY) is what causes the child’s open("fifo",
O_RDONLY) to return. If the child exits immediately, the parent may
receive a SIGCHLD before its own open() returns. Then when its SIGCHLD
handler exits, open() returns EINTR.
That also explains why the SA_RESTART and restart-on-EINTR patches only
turned it into a deadlock. If the open() were restarted now, it would
just block to wait for a _second_ reader of the FIFO, which will never
arrive.
But if that’s what happened, it seems like a kernel bug to me!
Shouldn’t an open() call only be restarted or return EINTR in the case
that it hasn’t had any side effects yet and can safely be restarted?
Anders
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] Set SA_RESTART flag on SIGCHLD handler
2012-06-26 10:03 ` Anders Kaseorg
@ 2012-06-26 12:19 ` Anders Kaseorg
2012-07-21 23:14 ` Anders Kaseorg
0 siblings, 1 reply; 6+ messages in thread
From: Anders Kaseorg @ 2012-06-26 12:19 UTC (permalink / raw)
To: Jonathan Nieder; +Cc: dash, git
On Tue, 26 Jun 2012, Anders Kaseorg wrote:
> But if that’s what happened, it seems like a kernel bug to me!
> Shouldn’t an open() call only be restarted or return EINTR in the case
> that it hasn’t had any side effects yet and can safely be restarted?
I submitted a kernel patch:
http://marc.info/?l=linux-kernel&m=134071285509470
Anders
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] Set SA_RESTART flag on SIGCHLD handler
2012-06-26 12:19 ` Anders Kaseorg
@ 2012-07-21 23:14 ` Anders Kaseorg
0 siblings, 0 replies; 6+ messages in thread
From: Anders Kaseorg @ 2012-07-21 23:14 UTC (permalink / raw)
To: Anders Kaseorg; +Cc: Jonathan Nieder, dash, git, 678852
On 06/26/2012 08:19 AM, Anders Kaseorg wrote:
>> But if that’s what happened, it seems like a kernel bug to me!
>> Shouldn’t an open() call only be restarted or return EINTR in the case
>> that it hasn’t had any side effects yet and can safely be restarted?
>
> I submitted a kernel patch:
> http://marc.info/?l=linux-kernel&m=134071285509470
That’s now merged as v3.5~31, v3.4.6~25, and v3.0.38~17.
Anders
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2012-07-21 23:14 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <4F665E77.3060102@mit.edu>
[not found] ` <20120319025201.GA24778@burratino>
[not found] ` <4F66A6CA.4010002@mit.edu>
[not found] ` <20120319033751.GA25237@burratino>
2012-06-21 7:48 ` [PATCH] Set SA_RESTART flag on SIGCHLD handler Anders Kaseorg
[not found] ` <20120622003628.GD4730@burratino>
2012-06-22 9:36 ` Anders Kaseorg
2012-06-26 10:03 ` Anders Kaseorg
2012-06-26 12:19 ` Anders Kaseorg
2012-07-21 23:14 ` Anders Kaseorg
2012-06-22 22:02 ` Jilles Tjoelker
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).