dash.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Set SA_RESTART flag on SIGCHLD handler
       [not found]     ` <20120319033751.GA25237@burratino>
@ 2012-06-21  7:48       ` Anders Kaseorg
       [not found]         ` <20120622003628.GD4730@burratino>
  0 siblings, 1 reply; 6+ messages in thread
From: Anders Kaseorg @ 2012-06-21  7:48 UTC (permalink / raw)
  To: dash; +Cc: Jonathan Nieder, git

It looks like this was never forwarded to the dash mailing list.  Any 
thoughts from upstream?

Context: http://thread.gmane.org/gmane.comp.version-control.git.debian/147
(FTR, it did happen again.)

Anders

On Sun, 18 Mar 2012, Jonathan Nieder wrote:

> Anders Kaseorg wrote:
> 
> > So I guess I’m willing to just hope for now that this doesn’t happen
> > again.  (Unless you think it’s a dash bug that EINTR isn’t handled
> > there?)
> 
> Yes, I suspect it's a dash >= 0.5.6 bug.  On initialization dash sets
> up a signal handler for SIGCHLD without the SA_RESTART flag, which can
> cause other dash code to fail at seemingly random times.  Similar
> problems would arise when scripts set up traps.
> 
> diff --git i/src/trap.c w/src/trap.c
> index 17316c95..cd99596f 100644
> --- i/src/trap.c
> +++ w/src/trap.c
> @@ -259,7 +259,7 @@ setsignal(int signo)
>  		act.sa_handler = SIG_DFL;
>  	}
>  	*t = action;
> -	act.sa_flags = 0;
> +	act.sa_flags = SA_RESTART;
>  	sigfillset(&act.sa_mask);
>  	sigaction(signo, &act, 0);
>  }
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Set SA_RESTART flag on SIGCHLD handler
       [not found]         ` <20120622003628.GD4730@burratino>
@ 2012-06-22  9:36           ` Anders Kaseorg
  2012-06-26 10:03             ` Anders Kaseorg
  2012-06-22 22:02           ` Jilles Tjoelker
  1 sibling, 1 reply; 6+ messages in thread
From: Anders Kaseorg @ 2012-06-22  9:36 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: dash, git

On 06/21/2012 08:36 PM, Jonathan Nieder wrote:
> Thanks.  Here's a less invasive patch to experiment with.

Hmm.  I’ve kicked off several builds in the git-core PPA, first under 
dash with the original patch, now under dash with the less invasive 
patch.  Under either dash, almost all of the builds end up trapped in 
what looks like a deadlock, at various parts of t9010-svn-fe.sh.

https://launchpad.net/~git-core/+archive/ppa/+build/3596228
https://launchpad.net/~git-core/+archive/ppa/+build/3596229
https://launchpad.net/~git-core/+archive/ppa/+build/3597370
https://launchpad.net/~git-core/+archive/ppa/+build/3597371

I’ll try to debug this later by hacking up the package to run that 
script through strace -fiv during build (since I still can’t reproduce 
locally), unless you have anything more clever to try.

Anders


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Set SA_RESTART flag on SIGCHLD handler
       [not found]         ` <20120622003628.GD4730@burratino>
  2012-06-22  9:36           ` Anders Kaseorg
@ 2012-06-22 22:02           ` Jilles Tjoelker
  1 sibling, 0 replies; 6+ messages in thread
From: Jilles Tjoelker @ 2012-06-22 22:02 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Anders Kaseorg, dash, git

On Thu, Jun 21, 2012 at 07:36:28PM -0500, Jonathan Nieder wrote:
> Anders Kaseorg wrote:
> > On Sun, 18 Mar 2012, Jonathan Nieder wrote:

> >> --- i/src/trap.c
> >> +++ w/src/trap.c
> >> @@ -259,7 +259,7 @@ setsignal(int signo)
> >>  		act.sa_handler = SIG_DFL;
> >>  	}
> >>  	*t = action;
> >> -	act.sa_flags = 0;
> >> +	act.sa_flags = SA_RESTART;
> >>  	sigfillset(&act.sa_mask);
> >>  	sigaction(signo, &act, 0);
> >>  }
> [...]
> > It looks like this was never forwarded to the dash mailing list.  Any 
> > thoughts from upstream?

> > Context: http://thread.gmane.org/gmane.comp.version-control.git.debian/147
> > (FTR, it did happen again.)

> Thanks.  Here's a less invasive patch to experiment with.

> POSIX (XCU §2.11 "Signals and Error Handling") requires that the "wait"
> utility must return immediately when interrupted by a trapped signal,
> so SA_RESTART is not appropriate for those.

Same thing for "read", even though that does not make much sense (the
only thing the trap handler can do is abandon the read, since bytes
already read but not part of a complete line may be lost).

On the other hand, this interruption is not specified to happen with
signals whose default action is to ignore. Many shells don't do this,
though (perhaps because of the difficulty of finding what the default
action is).

> It does not seem to give any advice on whether signals interrupt
> redirection.

For certain situations it certainly does. For simple commands without
command name, it says any redirections shall be performed in a subshell
environment; hence, trapped signals are reset to the default action.
This is what most shells appear to do in all cases. Again, this may not
be the most useful behaviour but it matches what the Bourne shell did.

> diff --git i/src/redir.c w/src/redir.c
> index f96a76bc..7fb7748f 100644
> --- i/src/redir.c
> +++ w/src/redir.c
> @@ -206,8 +206,11 @@ openredirect(union node *redir)
>  		/* FALLTHROUGH */
>  	case NCLOBBER:
>  		fname = redir->nfile.expfname;
> -		if ((f = open64(fname, O_WRONLY|O_CREAT|O_TRUNC, 0666)) < 0)
> +		while ((f = open64(fname, O_WRONLY|O_CREAT|O_TRUNC, 0666)) < 0) {
> +			if (errno == EINTR)
> +				continue;
>  			goto ecreate;
> +		}
>  		break;
>  	case NAPPEND:
>  		fname = redir->nfile.expfname;

It looks like this patch breaks interrupting (via SIGINT) a blocking
open, for example of a fifo. Although the race condition associated with
relying on [EINTR] is very hard to close, I don't think that's reason to
give up on it entirely.

I think it is better to rethink commit
3800d4934391b144fd261a7957aea72ced7d47ea that caused this bug (and some
others that have been fixed). A signal handler for SIGCHLD was added to
allow waiting for a trapped signal or a child process at the same time
(using sigsuspend, as was done, or sigwait). However, leaving the signal
handler installed all the time causes unexpected [EINTR] errors, so I
suggest only installing it during the "wait" builtin or specifying
SA_RESTART only for this special SIGCHLD handler (while leaving all
other signal handlers interrupting). In either case, behaviour for
SIGCHLD handlers installed because of CHLD traps should remain
unchanged. The former seems safest because SA_RESTART handlers still
cause visible effects such as short writes.

In case of a script doing 'trap "" CHLD', it is perhaps undesirable for
"wait" to block forever, given that wait(2) doesn't do that either.

-- 
Jilles Tjoelker

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Set SA_RESTART flag on SIGCHLD handler
  2012-06-22  9:36           ` Anders Kaseorg
@ 2012-06-26 10:03             ` Anders Kaseorg
  2012-06-26 12:19               ` Anders Kaseorg
  0 siblings, 1 reply; 6+ messages in thread
From: Anders Kaseorg @ 2012-06-26 10:03 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: dash, git

I wrote this greatly simplified skeleton of the failing Git test, and 
found that I can reproduce what looks like the same problem, fairly 
reliably:

$ rm -f fifo
$ mkfifo fifo
$ dash -c 'while echo -n .; do : < fifo & : > fifo; wait $!; done'
...............................dash: 1: cannot create fifo: Interrupted 
system call

After running this through strace, it seems clear what went wrong.  The 
parent’s open("fifo", O_WRONLY) is what causes the child’s open("fifo", 
O_RDONLY) to return.  If the child exits immediately, the parent may 
receive a SIGCHLD before its own open() returns.  Then when its SIGCHLD 
handler exits, open() returns EINTR.

That also explains why the SA_RESTART and restart-on-EINTR patches only 
turned it into a deadlock.  If the open() were restarted now, it would 
just block to wait for a _second_ reader of the FIFO, which will never 
arrive.

But if that’s what happened, it seems like a kernel bug to me! 
Shouldn’t an open() call only be restarted or return EINTR in the case 
that it hasn’t had any side effects yet and can safely be restarted?

Anders


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Set SA_RESTART flag on SIGCHLD handler
  2012-06-26 10:03             ` Anders Kaseorg
@ 2012-06-26 12:19               ` Anders Kaseorg
  2012-07-21 23:14                 ` Anders Kaseorg
  0 siblings, 1 reply; 6+ messages in thread
From: Anders Kaseorg @ 2012-06-26 12:19 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: dash, git

On Tue, 26 Jun 2012, Anders Kaseorg wrote:
> But if that’s what happened, it seems like a kernel bug to me!
> Shouldn’t an open() call only be restarted or return EINTR in the case
> that it hasn’t had any side effects yet and can safely be restarted?

I submitted a kernel patch:
http://marc.info/?l=linux-kernel&m=134071285509470

Anders

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Set SA_RESTART flag on SIGCHLD handler
  2012-06-26 12:19               ` Anders Kaseorg
@ 2012-07-21 23:14                 ` Anders Kaseorg
  0 siblings, 0 replies; 6+ messages in thread
From: Anders Kaseorg @ 2012-07-21 23:14 UTC (permalink / raw)
  To: Anders Kaseorg; +Cc: Jonathan Nieder, dash, git, 678852

On 06/26/2012 08:19 AM, Anders Kaseorg wrote:
>> But if that’s what happened, it seems like a kernel bug to me!
>> Shouldn’t an open() call only be restarted or return EINTR in the case
>> that it hasn’t had any side effects yet and can safely be restarted?
>
> I submitted a kernel patch:
> http://marc.info/?l=linux-kernel&m=134071285509470

That’s now merged as v3.5~31, v3.4.6~25, and v3.0.38~17.

Anders


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2012-07-21 23:14 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <4F665E77.3060102@mit.edu>
     [not found] ` <20120319025201.GA24778@burratino>
     [not found]   ` <4F66A6CA.4010002@mit.edu>
     [not found]     ` <20120319033751.GA25237@burratino>
2012-06-21  7:48       ` [PATCH] Set SA_RESTART flag on SIGCHLD handler Anders Kaseorg
     [not found]         ` <20120622003628.GD4730@burratino>
2012-06-22  9:36           ` Anders Kaseorg
2012-06-26 10:03             ` Anders Kaseorg
2012-06-26 12:19               ` Anders Kaseorg
2012-07-21 23:14                 ` Anders Kaseorg
2012-06-22 22:02           ` Jilles Tjoelker

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).