From: Jilles Tjoelker <email@example.com>
To: Harald van Dijk <firstname.lastname@example.org>
Cc: busybox <email@example.com>, Martijn Dekker <firstname.lastname@example.org>,
DASH shell mailing list <email@example.com>,
Robert Elz <kre@munnari.OZ.AU>,
Bug reports for the GNU Bourne Again SHell <firstname.lastname@example.org>
Subject: Re: Bizarre interaction bug involving bash w/ lastpipe + Almquist 'wait'
Date: Sun, 9 Feb 2020 20:03:22 +0100
Message-ID: <20200209190322.GA28226@stack.nl>
On Sat, Feb 08, 2020 at 06:39:38PM +0000, Harald van Dijk wrote:
> On 07/02/2020 02:41, Robert Elz wrote:
> > Date: Thu, 6 Feb 2020 16:12:06 +0000
> > From: Martijn Dekker <email@example.com>
> > Message-ID: <firstname.lastname@example.org>
> > | NetBSD sh behaves differently. NetBSD 8.1 sh (as installed on sdf.org
> > | and sdf-eu.org) seem to act completely normally, but NetBSD 9.0rc2 sh
> > | (on my VirtualBox test VM) segfaults. Output on NetBSD 9.0rc2:
> > I have updated my opinion on that, I think it is "don't have the bug",
> > though it is possible a blocked SIGCHLD acts differently on NetBSD than
> > on other systems. On NetBSD it seems to affect nothing (the shell does
> > not rely upon receiving SIGCHLD so not getting it is irrelevant) and
> > the wait code when given an arg (as your script did) would always wait
> > until that process exited, and return as soon as it did.
> I think you're right that this isn't SIGCHLD behaving differently on NetBSD,
> it's that NetBSD sh does not have the same problem the other ash-based
> shells do. The problem is with sigsuspend, which in dash looks like:
>	sigblockall(&oldmask);
>	while (!gotsigchld && !pending_sig)
>		sigsuspend(&oldmask);
>	sigclearmask();
> This clearly cannot work when oldmask blocks SIGCHLD.
> NetBSD sh does not use sigsuspend here, so avoids that problem.
> I changed gwsh to call sigclearmask() on shell startup, but plan to check
> whether this loop is really necessary at some later time. It was added to
> dash to fix a race condition, where that race condition was apparently
> introduced by a fix for another race condition. If NetBSD sh manages to
> avoid this pattern, and assuming NetBSD sh is not still susceptible to one
> of those race conditions, the fix for it in the other shells would seem to
> be more complicated than necessary, and simplifying things would be good.
I have not tested whether the bug actually happens in NetBSD sh but I
think the complexity is necessary. The problem is that the wait builtin
must wait for either process termination or a signal, and relying on an
[EINTR] error return to abort a blocking waitpid() or similar leaves a
window in which a signal can arrive just before the program goes to sleep.
In a script this could look like
	trap 'echo cleaning up; exit' TERM
and if a TERM signal comes in just before the wait system call is
invoked, the signal handler sets a flag but the trap is not taken until
a process terminates or another signal comes in.
FreeBSD sh also has a -T flag that causes traps to be taken immediately
while waiting for a process to terminate. This has the same issue with
waiting for process termination or a signal.
There are various solutions here:
* Make sure SIGCHLD is caught, reducing the problem to waiting for
  signals only. This can then be done using sigsuspend() or sigwait().
  Most ash variants that have closed this race window have chosen this
  option.

  The SIGCHLD handler could be installed globally or only for the
  duration of the wait builtin.

* Call longjmp() from the signal handler. The blocking wait will have to
  be changed to waitid() with WNOWAIT so no exit statuses are lost when
  a signal comes in just after waitid() returns.

  Note that ash variants already call longjmp() from a SIGINT signal
  handler in certain situations in interactive mode, so it is not
  really a strange thing to do.

* Use musl's solution for [EINTR] in the context of pthread
  cancellation, checking the saved program counter when a signal
  arrives. Although theoretically portable, it requires writing
  architecture-specific code in practice.

* Use FreeBSD libthr's solution for [EINTR] in the context of pthread
  cancellation, asking the kernel to abort the next blocking system call
  with [EINTR] immediately from the signal handler. This is not portable
  to other kernels.