dash.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Changes to job handling cause hangs in wait
@ 2020-11-16 10:29 Andrej Shadura
  2020-11-17  3:33 ` Herbert Xu
  2020-12-01  5:38 ` Herbert Xu
  0 siblings, 2 replies; 21+ messages in thread
From: Andrej Shadura @ 2020-11-16 10:29 UTC (permalink / raw)
  To: dash; +Cc: Michael Biebl

Hi,

I’d like to forward a bug report I received. Michael, who reported it,
was able to bisect it to 6c691b3, which was the first of the series of
commits changing job handling.

I must note that I had troubles debugging it: I was only able to
reproduce the issue in an autopkgtest virtual machine, but not on my
normal system nor in a fresh Debian unstable Docker container.

I’m considering reverting some of the relevant commits, but it’d be best
if we could track this down and get it fixed.

The full details can be found at: https://bugs.debian.org/974705

Thanks.

-- 
Cheers,
  Andrej

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Changes to job handling cause hangs in wait
  2020-11-16 10:29 Changes to job handling cause hangs in wait Andrej Shadura
@ 2020-11-17  3:33 ` Herbert Xu
  2020-12-01  5:38 ` Herbert Xu
  1 sibling, 0 replies; 21+ messages in thread
From: Herbert Xu @ 2020-11-17  3:33 UTC (permalink / raw)
  To: Andrej Shadura; +Cc: dash, Michael Biebl

On Mon, Nov 16, 2020 at 10:29:16AM +0000, Andrej Shadura wrote:
>
> I must note that I had troubles debugging it: I was only able to
> reproduce the issue in an autopkgtest virtual machine, but not on my
> normal system nor in a fresh Debian unstable Docker container.

Since you can reproduce it, could you please run the offending script
under strace -f and send me the result? That should pin-point the
problem.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Changes to job handling cause hangs in wait
  2020-11-16 10:29 Changes to job handling cause hangs in wait Andrej Shadura
  2020-11-17  3:33 ` Herbert Xu
@ 2020-12-01  5:38 ` Herbert Xu
  2020-12-01  5:42   ` Herbert Xu
  1 sibling, 1 reply; 21+ messages in thread
From: Herbert Xu @ 2020-12-01  5:38 UTC (permalink / raw)
  To: Andrej Shadura; +Cc: dash, Michael Biebl, 974705

On Mon, Nov 16, 2020 at 10:29:16AM +0000, Andrej Shadura wrote:
> 
> I’d like to forward a bug report I received. Michael, who reported it,
> was able to bisect it to 6c691b3, which was the first of the series of
> commits changing job handling.
> 
> I must note that I had troubles debugging it: I was only able to
> reproduce the issue in an autopkgtest virtual machine, but not on my
> normal system nor in a fresh Debian unstable Docker container.

FWIW I'm unable to reproduce it with autopkgtest.  This is what
I get:

root@test0:~# autopkgtest --test-name=timedated systemd-246.6/ -B -- lxc -s autopkgtest-sid
autopkgtest [16:32:45]: starting date: 2020-12-01
autopkgtest [16:32:45]: version 5.15
autopkgtest [16:32:45]: host test0; command line: /usr/bin/autopkgtest --test-name=timedated systemd-246.6/ -B -- lxc -s autopkgtest-sid
autopkgtest [16:33:13]: testbed dpkg architecture: amd64
autopkgtest [16:33:13]: testbed running kernel: Linux 5.9.0-rc1+ #18 SMP PREEMPT Sat Aug 29 23:45:30 AEST 2020
autopkgtest [16:33:13]: @@@@@@@@@@@@@@@@@@@@ unbuilt-tree systemd-246.6/
autopkgtest [16:33:18]: testing package systemd version 246.6-5
autopkgtest [16:33:18]: build not needed
autopkgtest [16:33:18]: test timedated: preparing testbed
Reading package lists... Done
Building dependency tree
Reading state information... Done
Correcting dependencies...Starting pkgProblemResolver with broken count: 0
Starting 2 pkgProblemResolver with broken count: 0
Done
 Done
Starting pkgProblemResolver with broken count: 0
Starting 2 pkgProblemResolver with broken count: 0
Done
The following additional packages will be installed:
  acl libnss-systemd
The following NEW packages will be installed:
  acl libnss-systemd
0 upgraded, 2 newly installed, 0 to remove and 0 not upgraded.
1 not fully installed or removed.
Need to get 255 kB of archives.
After this operation, 577 kB of additional disk space will be used.
Get:1 http://deb.debian.org/debian sid/main amd64 libnss-systemd amd64 246.6-5 [194 kB]
Get:2 http://deb.debian.org/debian sid/main amd64 acl amd64 2.2.53-8 [60.8 kB]
Fetched 255 kB in 1s (489 kB/s)
debconf: delaying package configuration, since apt-utils is not installed
Selecting previously unselected package libnss-systemd:amd64.
(Reading database ... 12641 files and directories currently installed.)
Preparing to unpack .../libnss-systemd_246.6-5_amd64.deb ...
Unpacking libnss-systemd:amd64 (246.6-5) ...
Selecting previously unselected package acl.
Preparing to unpack .../acl_2.2.53-8_amd64.deb ...
Unpacking acl (2.2.53-8) ...
Setting up libnss-systemd:amd64 (246.6-5) ...
First installation detected...
Checking NSS setup...
Setting up acl (2.2.53-8) ...
Setting up autopkgtest-satdep (0) ...
Processing triggers for libc-bin (2.31-4) ...
(Reading database ... 12667 files and directories currently installed.)
Removing autopkgtest-satdep (0) ...
autopkgtest [16:33:29]: test timedated: [-----------------------
original tz: Etc/UTC
timedatectl works
change timezone
reset timezone to original
no adjtime file
UTC set in adjtime file
non-zero values in adjtime file
fourth line adjtime file
no final newline in adjtime file
only one line in adjtime file
only one line in adjtime file, no final newline
only two lines in adjtime file
only two lines in adjtime file, no final newline
unknown value in 3rd line of adjtime file
disable NTP
enable NTP
re-disable NTP
autopkgtest [16:33:32]: test timedated: -----------------------]
autopkgtest [16:33:32]: test timedated:  - - - - - - - - - - results - - - - - - - - - -
timedated            PASS
autopkgtest [16:33:32]: @@@@@@@@@@@@@@@@@@@@ summary
timedated            PASS
root@test0:~# 

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Changes to job handling cause hangs in wait
  2020-12-01  5:38 ` Herbert Xu
@ 2020-12-01  5:42   ` Herbert Xu
  2020-12-01  6:06     ` Herbert Xu
  0 siblings, 1 reply; 21+ messages in thread
From: Herbert Xu @ 2020-12-01  5:42 UTC (permalink / raw)
  To: Andrej Shadura; +Cc: dash, Michael Biebl, 974705

On Tue, Dec 01, 2020 at 04:38:37PM +1100, Herbert Xu wrote:
> 
> FWIW I'm unable to reproduce it with autopkgtest.  This is what
> I get:
> 
> root@test0:~# autopkgtest --test-name=timedated systemd-246.6/ -B -- lxc -s autopkgtest-sid
> autopkgtest [16:32:45]: starting date: 2020-12-01
> autopkgtest [16:32:45]: version 5.15
> autopkgtest [16:32:45]: host test0; command line: /usr/bin/autopkgtest --test-name=timedated systemd-246.6/ -B -- lxc -s autopkgtest-sid
> autopkgtest [16:33:13]: testbed dpkg architecture: amd64
> autopkgtest [16:33:13]: testbed running kernel: Linux 5.9.0-rc1+ #18 SMP PREEMPT Sat Aug 29 23:45:30 AEST 2020
> autopkgtest [16:33:13]: @@@@@@@@@@@@@@@@@@@@ unbuilt-tree systemd-246.6/
> autopkgtest [16:33:18]: testing package systemd version 246.6-5

Nevermind, I see that the script has been modified to use bash.

I can reproduce the problem now so it's all good.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Changes to job handling cause hangs in wait
  2020-12-01  5:42   ` Herbert Xu
@ 2020-12-01  6:06     ` Herbert Xu
  2020-12-01  6:56       ` Herbert Xu
  2020-12-01 10:14       ` Harald van Dijk
  0 siblings, 2 replies; 21+ messages in thread
From: Herbert Xu @ 2020-12-01  6:06 UTC (permalink / raw)
  To: Andrej Shadura; +Cc: dash, Michael Biebl, 974705

On Tue, Dec 01, 2020 at 04:42:03PM +1100, Herbert Xu wrote:
>
> Nevermind, I see that the script has been modified to use bash.
> 
> I can reproduce the problem now so it's all good.

OK the problem is this:

	sh -c 'sleep 1d& exec $MYSHELL -c "sleep 1& wait"'

You can replace MYSHELL with whatever shell you want to use.

Essentially dash will now wait for all children, even ones that
were created prior to its existence, however, bash only waits for
children that it created directly.

FWIW ksh exhibits the same behaviour as dash and I think there
is nothing wrong with this.

So the problem is really in the parent of this shell, which appears
to be bash:

bash -c set -e; export USER=`id -nu`; . /etc/profile >/dev/null 2>&1 || true;  . ~/.profile >/dev/null 2>&1 || true; buildtree="/tmp/autopkgtest-lxc.is4n6xxr/downtmp/build.f2G/real-tree"; mkdir -p -m 1777 -- "/tmp/autopkgtest-lxc.is4n6xxr/downtmp/timedated-artifacts"; export AUTOPKGTEST_ARTIFACTS="/tmp/autopkgtest-lxc.is4n6xxr/downtmp/timedated-artifacts"; export ADT_ARTIFACTS="$AUTOPKGTEST_ARTIFACTS"; mkdir -p -m 755 "/tmp/autopkgtest-lxc.is4n6xxr/downtmp/autopkgtest_tmp"; export AUTOPKGTEST_TMP="/tmp/autopkgtest-lxc.is4n6xxr/downtmp/autopkgtest_tmp"; export ADTTMP="$AUTOPKGTEST_TMP"; export DEBIAN_FRONTEND=noninteractive; export LANG=C.UTF-8; export DEB_BUILD_OPTIONS=parallel=2; unset LANGUAGE LC_CTYPE LC_NUMERIC LC_TIME LC_COLLATE   LC_MONETARY LC_MESSAGES LC_PAPER LC_NAME LC_ADDRESS   LC_TELEPHONE LC_MEASUREMENT LC_IDENTIFICATION LC_ALL;rm -f /tmp/autopkgtest_script_pid; set -C; echo $$ > /tmp/autopkgtest_script_pid; set +C; trap "rm -f /tmp/autopkgtest_script_pid" EXIT INT QUIT 
 PIPE; cd "$buildtree"; export AUTOPKGTEST_NORMAL_USER=; export ADT_NORMAL_USER=; chmod +x /tmp/autopkgtest-lxc.is4n6xxr/downtmp/build.f2G/real-tree/debian/tests/timedated; touch /tmp/autopkgtest-lxc.is4n6xxr/downtmp/timedated-stdout /tmp/autopkgtest-lxc.is4n6xxr/downtmp/timedated-stderr; /tmp/autopkgtest-lxc.is4n6xxr/downtmp/build.f2G/real-tree/debian/tests/timedated 2> >(tee -a /tmp/autopkgtest-lxc.is4n6xxr/downtmp/timedated-stderr >&2) > >(tee -a /tmp/autopkgtest-lxc.is4n6xxr/downtmp/timedated-stdout);

For some reason this is causing the final two tee's to be created
as children of debian/tests/timedated rather than the bash shell.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Changes to job handling cause hangs in wait
  2020-12-01  6:06     ` Herbert Xu
@ 2020-12-01  6:56       ` Herbert Xu
  2020-12-01 23:21         ` Michael Biebl
  2020-12-03 12:49         ` Michael Biebl
  2020-12-01 10:14       ` Harald van Dijk
  1 sibling, 2 replies; 21+ messages in thread
From: Herbert Xu @ 2020-12-01  6:56 UTC (permalink / raw)
  To: Andrej Shadura; +Cc: dash, Michael Biebl, 974705

On Tue, Dec 01, 2020 at 05:06:18PM +1100, Herbert Xu wrote:
>
> For some reason this is causing the final two tee's to be created
> as children of debian/tests/timedated rather than the bash shell.

An alternative to changing the parent is of course to do

	wait $MONPID

instead of

	wait

I think this makes more sense anyway as otherwise someone could
easily introduce a hang if they unwittingly add a backgrounded
job to this script.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Changes to job handling cause hangs in wait
  2020-12-01  6:06     ` Herbert Xu
  2020-12-01  6:56       ` Herbert Xu
@ 2020-12-01 10:14       ` Harald van Dijk
  2020-12-01 10:34         ` Herbert Xu
  1 sibling, 1 reply; 21+ messages in thread
From: Harald van Dijk @ 2020-12-01 10:14 UTC (permalink / raw)
  To: Herbert Xu, Andrej Shadura; +Cc: dash, Michael Biebl, 974705

On 01/12/2020 06:06, Herbert Xu wrote:
> On Tue, Dec 01, 2020 at 04:42:03PM +1100, Herbert Xu wrote:
>>
>> Nevermind, I see that the script has been modified to use bash.
>>
>> I can reproduce the problem now so it's all good.
> 
> OK the problem is this:
> 
> 	sh -c 'sleep 1d& exec $MYSHELL -c "sleep 1& wait"'
> 
> You can replace MYSHELL with whatever shell you want to use.
> 
> Essentially dash will now wait for all children, even ones that
> were created prior to its existence, however, bash only waits for
> children that it created directly.
> 
> FWIW ksh exhibits the same behaviour as dash and I think there
> is nothing wrong with this.

POSIX says:

   "If  the  wait utility is invoked with no operands, it shall wait 
until all process IDs known to the invoking shell have terminated and 
exit with a zero exit status."

I would say that child processes that were created before dash was 
started do not have process IDs known to dash.

> So the problem is really in the parent of this shell, which appears
> to be bash:
> 
> bash -c set -e; export USER=`id -nu`; . /etc/profile >/dev/null 2>&1 || true;  . ~/.profile >/dev/null 2>&1 || true; buildtree="/tmp/autopkgtest-lxc.is4n6xxr/downtmp/build.f2G/real-tree"; mkdir -p -m 1777 -- "/tmp/autopkgtest-lxc.is4n6xxr/downtmp/timedated-artifacts"; export AUTOPKGTEST_ARTIFACTS="/tmp/autopkgtest-lxc.is4n6xxr/downtmp/timedated-artifacts"; export ADT_ARTIFACTS="$AUTOPKGTEST_ARTIFACTS"; mkdir -p -m 755 "/tmp/autopkgtest-lxc.is4n6xxr/downtmp/autopkgtest_tmp"; export AUTOPKGTEST_TMP="/tmp/autopkgtest-lxc.is4n6xxr/downtmp/autopkgtest_tmp"; export ADTTMP="$AUTOPKGTEST_TMP"; export DEBIAN_FRONTEND=noninteractive; export LANG=C.UTF-8; export DEB_BUILD_OPTIONS=parallel=2; unset LANGUAGE LC_CTYPE LC_NUMERIC LC_TIME LC_COLLATE   LC_MONETARY LC_MESSAGES LC_PAPER LC_NAME LC_ADDRESS   LC_TELEPHONE LC_MEASUREMENT LC_IDENTIFICATION LC_ALL;rm -f /tmp/autopkgtest_script_pid; set -C; echo $$ > /tmp/autopkgtest_script_pid; set +C; trap "rm -f /tmp/autopkgtest_script_pid" EXIT INT QUIT PIPE; cd "$buildtree"; export AUTOPKGTEST_NORMAL_USER=; export ADT_NORMAL_USER=; chmod +x /tmp/autopkgtest-lxc.is4n6xxr/downtmp/build.f2G/real-tree/debian/tests/timedated; touch /tmp/autopkgtest-lxc.is4n6xxr/downtmp/timedated-stdout /tmp/autopkgtest-lxc.is4n6xxr/downtmp/timedated-stderr; /tmp/autopkgtest-lxc.is4n6xxr/downtmp/build.f2G/real-tree/debian/tests/timedated 2> >(tee -a /tmp/autopkgtest-lxc.is4n6xxr/downtmp/timedated-stderr >&2) > >(tee -a /tmp/autopkgtest-lxc.is4n6xxr/downtmp/timedated-stdout);
> 
> For some reason this is causing the final two tee's to be created
> as children of debian/tests/timedated rather than the bash shell.

This is because of the same optimisation that dash also has, where it 
tries to avoid creating a subshell for the last command in a list when 
it can just exec() without a fork() instead. A minimal example without 
an explicit exec is

   bash -c 'dash -c ": & wait" <(sleep 1d)'

Cheers,
Harald van Dijk

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Changes to job handling cause hangs in wait
  2020-12-01 10:14       ` Harald van Dijk
@ 2020-12-01 10:34         ` Herbert Xu
  2020-12-01 10:50           ` Harald van Dijk
  0 siblings, 1 reply; 21+ messages in thread
From: Herbert Xu @ 2020-12-01 10:34 UTC (permalink / raw)
  To: Harald van Dijk; +Cc: Andrej Shadura, dash, Michael Biebl, 974705

On Tue, Dec 01, 2020 at 10:14:04AM +0000, Harald van Dijk wrote:
> 
> POSIX says:
> 
>   "If  the  wait utility is invoked with no operands, it shall wait until
> all process IDs known to the invoking shell have terminated and exit with a
> zero exit status."
> 
> I would say that child processes that were created before dash was started
> do not have process IDs known to dash.

Well I disagree and the fact that the original ksh does the same
thing is an important fact.

> > bash -c set -e; export USER=`id -nu`; . /etc/profile >/dev/null 2>&1 || true;  . ~/.profile >/dev/null 2>&1 || true; buildtree="/tmp/autopkgtest-lxc.is4n6xxr/downtmp/build.f2G/real-tree"; mkdir -p -m 1777 -- "/tmp/autopkgtest-lxc.is4n6xxr/downtmp/timedated-artifacts"; export AUTOPKGTEST_ARTIFACTS="/tmp/autopkgtest-lxc.is4n6xxr/downtmp/timedated-artifacts"; export ADT_ARTIFACTS="$AUTOPKGTEST_ARTIFACTS"; mkdir -p -m 755 "/tmp/autopkgtest-lxc.is4n6xxr/downtmp/autopkgtest_tmp"; export AUTOPKGTEST_TMP="/tmp/autopkgtest-lxc.is4n6xxr/downtmp/autopkgtest_tmp"; export ADTTMP="$AUTOPKGTEST_TMP"; export DEBIAN_FRONTEND=noninteractive; export LANG=C.UTF-8; export DEB_BUILD_OPTIONS=parallel=2; unset LANGUAGE LC_CTYPE LC_NUMERIC LC_TIME LC_COLLATE   LC_MONETARY LC_MESSAGES LC_PAPER LC_NAME LC_ADDRESS   LC_TELEPHONE LC_MEASUREMENT LC_IDENTIFICATION LC_ALL;rm -f /tmp/autopkgtest_script_pid; set -C; echo $$ > /tmp/autopkgtest_script_pid; set +C; trap "rm -f /tmp/autopkgtest_script_pid" EXIT INT Q
 UIT PIPE; cd "$buildtree"; export AUTOPKGTEST_NORMAL_USER=; export ADT_NORMAL_USER=; chmod +x /tmp/autopkgtest-lxc.is4n6xxr/downtmp/build.f2G/real-tree/debian/tests/timedated; touch /tmp/autopkgtest-lxc.is4n6xxr/downtmp/timedated-stdout /tmp/autopkgtest-lxc.is4n6xxr/downtmp/timedated-stderr; /tmp/autopkgtest-lxc.is4n6xxr/downtmp/build.f2G/real-tree/debian/tests/timedated 2> >(tee -a /tmp/autopkgtest-lxc.is4n6xxr/downtmp/timedated-stderr >&2) > >(tee -a /tmp/autopkgtest-lxc.is4n6xxr/downtmp/timedated-stdout);
> > 
> > For some reason this is causing the final two tee's to be created
> > as children of debian/tests/timedated rather than the bash shell.
> 
> This is because of the same optimisation that dash also has, where it tries
> to avoid creating a subshell for the last command in a list when it can just
> exec() without a fork() instead. A minimal example without an explicit exec
> is
> 
>   bash -c 'dash -c ": & wait" <(sleep 1d)'

I'm not sure about that because bash itself is still hanging around,
if it were really the -c optimisation then bash should not appear in
the ps output at all.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Changes to job handling cause hangs in wait
  2020-12-01 10:34         ` Herbert Xu
@ 2020-12-01 10:50           ` Harald van Dijk
  2020-12-01 10:53             ` Herbert Xu
  0 siblings, 1 reply; 21+ messages in thread
From: Harald van Dijk @ 2020-12-01 10:50 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Andrej Shadura, dash, Michael Biebl, 974705

On 01/12/2020 10:34, Herbert Xu wrote:
> On Tue, Dec 01, 2020 at 10:14:04AM +0000, Harald van Dijk wrote:
>>
>> POSIX says:
>>
>>    "If  the  wait utility is invoked with no operands, it shall wait until
>> all process IDs known to the invoking shell have terminated and exit with a
>> zero exit status."
>>
>> I would say that child processes that were created before dash was started
>> do not have process IDs known to dash.
> 
> Well I disagree

Then dash still has a bug regardless:

   bash -c 'dash -c "wait" <(sleep 1d)'

Here, dash does not wait for the child process.

The sleep process is either known to dash, in which case dash must wait 
in both examples, or not known to dash, in which case dash must wait in 
neither example.

>                 and the fact that the original ksh does the same
> thing is an important fact.

The original ksh is well-known not to conform to the current POSIX 
requirements, in some cases because POSIX changed requirements, in some 
cases because ksh's behaviour was clearly buggy and not worth standardising.

>>> bash -c set -e; export USER=`id -nu`; . /etc/profile >/dev/null 2>&1 || true;  . ~/.profile >/dev/null 2>&1 || true; buildtree="/tmp/autopkgtest-lxc.is4n6xxr/downtmp/build.f2G/real-tree"; mkdir -p -m 1777 -- "/tmp/autopkgtest-lxc.is4n6xxr/downtmp/timedated-artifacts"; export AUTOPKGTEST_ARTIFACTS="/tmp/autopkgtest-lxc.is4n6xxr/downtmp/timedated-artifacts"; export ADT_ARTIFACTS="$AUTOPKGTEST_ARTIFACTS"; mkdir -p -m 755 "/tmp/autopkgtest-lxc.is4n6xxr/downtmp/autopkgtest_tmp"; export AUTOPKGTEST_TMP="/tmp/autopkgtest-lxc.is4n6xxr/downtmp/autopkgtest_tmp"; export ADTTMP="$AUTOPKGTEST_TMP"; export DEBIAN_FRONTEND=noninteractive; export LANG=C.UTF-8; export DEB_BUILD_OPTIONS=parallel=2; unset LANGUAGE LC_CTYPE LC_NUMERIC LC_TIME LC_COLLATE   LC_MONETARY LC_MESSAGES LC_PAPER LC_NAME LC_ADDRESS   LC_TELEPHONE LC_MEASUREMENT LC_IDENTIFICATION LC_ALL;rm -f /tmp/autopkgtest_script_pid; set -C; echo $$ > /tmp/autopkgtest_script_pid; set +C; trap "rm -f /tmp/autopkgtest_script_pid" EXIT INT QUIT PIPE; cd "$buildtree"; export AUTOPKGTEST_NORMAL_USER=; export ADT_NORMAL_USER=; chmod +x /tmp/autopkgtest-lxc.is4n6xxr/downtmp/build.f2G/real-tree/debian/tests/timedated; touch /tmp/autopkgtest-lxc.is4n6xxr/downtmp/timedated-stdout /tmp/autopkgtest-lxc.is4n6xxr/downtmp/timedated-stderr; /tmp/autopkgtest-lxc.is4n6xxr/downtmp/build.f2G/real-tree/debian/tests/timedated 2> >(tee -a /tmp/autopkgtest-lxc.is4n6xxr/downtmp/timedated-stderr >&2) > >(tee -a /tmp/autopkgtest-lxc.is4n6xxr/downtmp/timedated-stdout);
>>>
>>> For some reason this is causing the final two tee's to be created
>>> as children of debian/tests/timedated rather than the bash shell.
>>
>> This is because of the same optimisation that dash also has, where it tries
>> to avoid creating a subshell for the last command in a list when it can just
>> exec() without a fork() instead. A minimal example without an explicit exec
>> is
>>
>>    bash -c 'dash -c ": & wait" <(sleep 1d)'
> 
> I'm not sure about that because bash itself is still hanging around,
> if it were really the -c optimisation then bash should not appear in
> the ps output at all.

Good point, it's a bit more complicated. Can take a more in-depth look 
at that example later.

Here's a simpler example without bash at all, then, where it is far 
clearer that the -c optimisation is responsible:

   dash -c 'sleep 1d & dash -c ": & wait"'

This used to exit immediately, leaving the sleep process running in the 
background without waiting for it. On the dash that's currently provided 
by Ubuntu, based on 0.5.10.2, it still behaves that way. With current 
dash from Git, it does not. This is clearly a change in behaviour in 
dash not in response to any bug (real or not) in bash.

Cheers,
Harald van Dijk

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Changes to job handling cause hangs in wait
  2020-12-01 10:50           ` Harald van Dijk
@ 2020-12-01 10:53             ` Herbert Xu
  2020-12-01 10:55               ` Harald van Dijk
  2020-12-01 13:04               ` Michael Biebl
  0 siblings, 2 replies; 21+ messages in thread
From: Herbert Xu @ 2020-12-01 10:53 UTC (permalink / raw)
  To: Harald van Dijk; +Cc: Andrej Shadura, dash, Michael Biebl, 974705

On Tue, Dec 01, 2020 at 10:50:19AM +0000, Harald van Dijk wrote:
> 
> This used to exit immediately, leaving the sleep process running in the
> background without waiting for it. On the dash that's currently provided by
> Ubuntu, based on 0.5.10.2, it still behaves that way. With current dash from
> Git, it does not. This is clearly a change in behaviour in dash not in
> response to any bug (real or not) in bash.

I'm not suggesting it's a bug in bash.  If anything it's a bug
in the script.  You should never do a naked wait unless you are
sure that there are no other children around that you don't know of.

In the original bug, the proper solution is to wait on the PID
that the script just sent a kill signal to.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Changes to job handling cause hangs in wait
  2020-12-01 10:53             ` Herbert Xu
@ 2020-12-01 10:55               ` Harald van Dijk
  2020-12-01 10:56                 ` Herbert Xu
  2020-12-01 13:04               ` Michael Biebl
  1 sibling, 1 reply; 21+ messages in thread
From: Harald van Dijk @ 2020-12-01 10:55 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Andrej Shadura, dash, Michael Biebl, 974705

On 01/12/2020 10:53, Herbert Xu wrote:
> On Tue, Dec 01, 2020 at 10:50:19AM +0000, Harald van Dijk wrote:
>>
>> This used to exit immediately, leaving the sleep process running in the
>> background without waiting for it. On the dash that's currently provided by
>> Ubuntu, based on 0.5.10.2, it still behaves that way. With current dash from
>> Git, it does not. This is clearly a change in behaviour in dash not in
>> response to any bug (real or not) in bash.
> 
> I'm not suggesting it's a bug in bash.

You wrote: "So the problem is really in the parent of this shell, which 
appears to be bash:"

Cheers,
Harald van Dijk

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Changes to job handling cause hangs in wait
  2020-12-01 10:55               ` Harald van Dijk
@ 2020-12-01 10:56                 ` Herbert Xu
  2020-12-01 10:59                   ` Harald van Dijk
  0 siblings, 1 reply; 21+ messages in thread
From: Herbert Xu @ 2020-12-01 10:56 UTC (permalink / raw)
  To: Harald van Dijk; +Cc: Andrej Shadura, dash, Michael Biebl, 974705

On Tue, Dec 01, 2020 at 10:55:06AM +0000, Harald van Dijk wrote:
> 
> You wrote: "So the problem is really in the parent of this shell, which
> appears to be bash:"

You should read my follow-up email too that suggested changing
the systemd script.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Changes to job handling cause hangs in wait
  2020-12-01 10:56                 ` Herbert Xu
@ 2020-12-01 10:59                   ` Harald van Dijk
  2020-12-01 11:01                     ` Herbert Xu
  0 siblings, 1 reply; 21+ messages in thread
From: Harald van Dijk @ 2020-12-01 10:59 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Andrej Shadura, dash, Michael Biebl, 974705

On 01/12/2020 10:56, Herbert Xu wrote:
> On Tue, Dec 01, 2020 at 10:55:06AM +0000, Harald van Dijk wrote:
>>
>> You wrote: "So the problem is really in the parent of this shell, which
>> appears to be bash:"
> 
> You should read my follow-up email too that suggested changing
> the systemd script.

I do not appreciate your false accusation that I did not read your 
follow-up email. You suggested that as an alternative, without 
retracting your claim that this is a problem in bash.

Cheers,
Harald van Dijk

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Changes to job handling cause hangs in wait
  2020-12-01 10:59                   ` Harald van Dijk
@ 2020-12-01 11:01                     ` Herbert Xu
  0 siblings, 0 replies; 21+ messages in thread
From: Herbert Xu @ 2020-12-01 11:01 UTC (permalink / raw)
  To: Harald van Dijk; +Cc: Andrej Shadura, dash, Michael Biebl, 974705

On Tue, Dec 01, 2020 at 10:59:20AM +0000, Harald van Dijk wrote:
> 
> I do not appreciate your false accusation that I did not read your follow-up
> email. You suggested that as an alternative, without retracting your claim
> that this is a problem in bash.

Well I'm not accusing you of anything, I was just trying to make
sure you are aware of my complete message.

In any case, I never said that this is a bug in bash, merely that
bash is involved in creating a situation where dash started out
with children in the background.

If anything this could just be an artefact of the bash script
rather than bash itself.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Changes to job handling cause hangs in wait
  2020-12-01 10:53             ` Herbert Xu
  2020-12-01 10:55               ` Harald van Dijk
@ 2020-12-01 13:04               ` Michael Biebl
  1 sibling, 0 replies; 21+ messages in thread
From: Michael Biebl @ 2020-12-01 13:04 UTC (permalink / raw)
  To: Herbert Xu, Harald van Dijk; +Cc: Andrej Shadura, dash, 974705

Am 01.12.20 um 11:53 schrieb Herbert Xu:
> On Tue, Dec 01, 2020 at 10:50:19AM +0000, Harald van Dijk wrote:
>>
>> This used to exit immediately, leaving the sleep process running in the
>> background without waiting for it. On the dash that's currently provided by
>> Ubuntu, based on 0.5.10.2, it still behaves that way. With current dash from
>> Git, it does not. This is clearly a change in behaviour in dash not in
>> response to any bug (real or not) in bash.
> 
> I'm not suggesting it's a bug in bash.  If anything it's a bug
> in the script.

Keep in mind, that the timedated script is not the only affected script, 
there is at least src:fence-agents as well and there might be other 
packages affected as well as local scripts which expect /bin/sh to 
behave POSIX compliant in that regard.

If this change in behaviour is kept, I would at least expect a big fat 
NEWS entry in the Debian dash package which is shown on upgrades and it 
should probably also update the package description which currently 
still says that it is a POSIX-compliant shell.

Regards,
Michael

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Changes to job handling cause hangs in wait
  2020-12-01  6:56       ` Herbert Xu
@ 2020-12-01 23:21         ` Michael Biebl
  2020-12-01 23:26           ` Herbert Xu
  2020-12-03 12:49         ` Michael Biebl
  1 sibling, 1 reply; 21+ messages in thread
From: Michael Biebl @ 2020-12-01 23:21 UTC (permalink / raw)
  To: Herbert Xu, Andrej Shadura; +Cc: dash, 974705

Am 01.12.20 um 07:56 schrieb Herbert Xu:
> On Tue, Dec 01, 2020 at 05:06:18PM +1100, Herbert Xu wrote:
>>
>> For some reason this is causing the final two tee's to be created
>> as children of debian/tests/timedated rather than the bash shell.
> 
> An alternative to changing the parent is of course to do
> 
> 	wait $MONPID
> 
> instead of
> 
> 	wait
> 
> I think this makes more sense anyway as otherwise someone could
> easily introduce a hang if they unwittingly add a backgrounded
> job to this script.

I'll update the timedated test accordingly.
Incidentally we already have
https://salsa.debian.org/systemd-team/systemd/-/merge_requests/110

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Changes to job handling cause hangs in wait
  2020-12-01 23:21         ` Michael Biebl
@ 2020-12-01 23:26           ` Herbert Xu
  2020-12-02  5:31             ` [PATCH] jobs: Only block in waitcmd on first run Herbert Xu
  2020-12-03 12:27             ` Changes to job handling cause hangs in wait Michael Biebl
  0 siblings, 2 replies; 21+ messages in thread
From: Herbert Xu @ 2020-12-01 23:26 UTC (permalink / raw)
  To: Michael Biebl; +Cc: Andrej Shadura, dash, 974705

On Wed, Dec 02, 2020 at 12:21:52AM +0100, Michael Biebl wrote:
>
> I'll update the timedated test accordingly.
> Incidentally we already have
> https://salsa.debian.org/systemd-team/systemd/-/merge_requests/110

Thanks.

I will make dash not wait for existing children as this is indeed
a change in behaviour which is undesirable.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH] jobs: Only block in waitcmd on first run
  2020-12-01 23:26           ` Herbert Xu
@ 2020-12-02  5:31             ` Herbert Xu
  2020-12-03 12:27             ` Changes to job handling cause hangs in wait Michael Biebl
  1 sibling, 0 replies; 21+ messages in thread
From: Herbert Xu @ 2020-12-02  5:31 UTC (permalink / raw)
  To: Michael Biebl; +Cc: Andrej Shadura, dash, 974705

This patch ensures that waitcmd never blocks unless there are
outstanding jobs.  This could otherwise trigger a hang if children
were created prior to the shell coming into existence, or if
there are backgrounded children of other kinds (e.g., a here-
document).

Fixes: 6c691b3e5099 ("jobs: Only clear gotsigchld when waiting...")
Reported-by: Michael Biebl <biebl@debian.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/src/jobs.c b/src/jobs.c
index 3417633..516786f 100644
--- a/src/jobs.c
+++ b/src/jobs.c
@@ -81,6 +81,7 @@
 #define DOWAIT_NONBLOCK 0
 #define DOWAIT_BLOCK 1
 #define DOWAIT_WAITCMD 2
+#define DOWAIT_WAITCMD_ALL 4
 
 /* array of jobs */
 static struct job *jobtab;
@@ -615,7 +616,7 @@ waitcmd(int argc, char **argv)
 				jp->waited = 1;
 				jp = jp->prev_job;
 			}
-			if (!dowait(DOWAIT_WAITCMD, 0))
+			if (!dowait(DOWAIT_WAITCMD_ALL, 0))
 				goto sigout;
 		}
 	}
@@ -1138,6 +1139,7 @@ static int dowait(int block, struct job *jp)
 		pid = waitone(block, jp);
 		rpid &= !!pid;
 
+		block &= ~DOWAIT_WAITCMD_ALL;
 		if (!pid || (jp && jp->state != JOBRUNNING))
 			block = DOWAIT_NONBLOCK;
 	} while (pid >= 0);
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: Changes to job handling cause hangs in wait
  2020-12-01 23:26           ` Herbert Xu
  2020-12-02  5:31             ` [PATCH] jobs: Only block in waitcmd on first run Herbert Xu
@ 2020-12-03 12:27             ` Michael Biebl
  2020-12-07  3:55               ` Herbert Xu
  1 sibling, 1 reply; 21+ messages in thread
From: Michael Biebl @ 2020-12-03 12:27 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Andrej Shadura, dash, 974705

Am 02.12.20 um 00:26 schrieb Herbert Xu:
> On Wed, Dec 02, 2020 at 12:21:52AM +0100, Michael Biebl wrote:
>>
>> I'll update the timedated test accordingly.
>> Incidentally we already have
>> https://salsa.debian.org/systemd-team/systemd/-/merge_requests/110
> 
> Thanks.

Hm, so I applied this change and switched backed to dash, but now I get

autopkgtest systemd --test-name=timedated -s -- lxc -s autopkgtest-sid
...
disable NTP
enable NTP
Terminated
autopkgtest [13:21:42]: test timedated: -----------------------]
autopkgtest [13:21:42]: test timedated:  - - - - - - - - - - results - - 
- - - - - - - -
timedated            FAIL non-zero exit status 143


Running with sh -x
...
+ kill 825
+ wait 825
Terminated


Herbert, any idea what going wrong there?
(systemd 247.1-2 and dash 0.5.11+git20200708+dd9ef66-2)

Michael

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Changes to job handling cause hangs in wait
  2020-12-01  6:56       ` Herbert Xu
  2020-12-01 23:21         ` Michael Biebl
@ 2020-12-03 12:49         ` Michael Biebl
  1 sibling, 0 replies; 21+ messages in thread
From: Michael Biebl @ 2020-12-03 12:49 UTC (permalink / raw)
  To: Herbert Xu, Andrej Shadura; +Cc: dash, 974705

Am 01.12.20 um 07:56 schrieb Herbert Xu:
> On Tue, Dec 01, 2020 at 05:06:18PM +1100, Herbert Xu wrote:
>>
>> For some reason this is causing the final two tee's to be created
>> as children of debian/tests/timedated rather than the bash shell.
> 
> An alternative to changing the parent is of course to do
> 
> 	wait $MONPID
> 
> instead of
> 
> 	wait
> 
> I think this makes more sense anyway as otherwise someone could
> easily introduce a hang if they unwittingly add a backgrounded
> job to this script.

I applied this change, but now the script terminates with 143 [1]

I'm happy for any suggestions to make the script more correct/robust and 
make it pass successfully.

Michael


[1] 
https://ci.debian.net/data/autopkgtest/testing/amd64/s/systemd/8600208/log.gz


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Changes to job handling cause hangs in wait
  2020-12-03 12:27             ` Changes to job handling cause hangs in wait Michael Biebl
@ 2020-12-07  3:55               ` Herbert Xu
  0 siblings, 0 replies; 21+ messages in thread
From: Herbert Xu @ 2020-12-07  3:55 UTC (permalink / raw)
  To: Michael Biebl; +Cc: Andrej Shadura, dash, 974705

On Thu, Dec 03, 2020 at 01:27:51PM +0100, Michael Biebl wrote:
>
> Hm, so I applied this change and switched backed to dash, but now I get
> 
> autopkgtest systemd --test-name=timedated -s -- lxc -s autopkgtest-sid
> ...
> disable NTP
> enable NTP
> Terminated
> autopkgtest [13:21:42]: test timedated: -----------------------]
> autopkgtest [13:21:42]: test timedated:  - - - - - - - - - - results - - - -
> - - - - - -
> timedated            FAIL non-zero exit status 143
> 
> 
> Running with sh -x
> ...
> + kill 825
> + wait 825
> Terminated
> 
> 
> Herbert, any idea what going wrong there?
> (systemd 247.1-2 and dash 0.5.11+git20200708+dd9ef66-2)

Sorry, my bad.  wait(1) with no arguments ignores the error status
of the child and always return zero.  wait(1) specifically on a
child obviously returns the error status of that child.  Since the
child was killed, we need to ignore that, so this works for me:

	wait $MONPID 2> /dev/null || :

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2020-12-07  3:56 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-16 10:29 Changes to job handling cause hangs in wait Andrej Shadura
2020-11-17  3:33 ` Herbert Xu
2020-12-01  5:38 ` Herbert Xu
2020-12-01  5:42   ` Herbert Xu
2020-12-01  6:06     ` Herbert Xu
2020-12-01  6:56       ` Herbert Xu
2020-12-01 23:21         ` Michael Biebl
2020-12-01 23:26           ` Herbert Xu
2020-12-02  5:31             ` [PATCH] jobs: Only block in waitcmd on first run Herbert Xu
2020-12-03 12:27             ` Changes to job handling cause hangs in wait Michael Biebl
2020-12-07  3:55               ` Herbert Xu
2020-12-03 12:49         ` Michael Biebl
2020-12-01 10:14       ` Harald van Dijk
2020-12-01 10:34         ` Herbert Xu
2020-12-01 10:50           ` Harald van Dijk
2020-12-01 10:53             ` Herbert Xu
2020-12-01 10:55               ` Harald van Dijk
2020-12-01 10:56                 ` Herbert Xu
2020-12-01 10:59                   ` Harald van Dijk
2020-12-01 11:01                     ` Herbert Xu
2020-12-01 13:04               ` Michael Biebl

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).