'return' from subshell in function doesn't

dash.vger.kernel.org archive mirror

* 'return' from subshell in function doesn't
@ 2020-03-08 12:35 ` Dirk Fieldhouse
  2020-03-08 13:44   ` Harald van Dijk
  0 siblings, 1 reply; 6+ messages in thread
From: Dirk Fieldhouse @ 2020-03-08 12:35 UTC
  To: DASH mailing list

POSIX
<https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#return>
says, and has since at least 2004:
"The return utility shall cause the shell to stop executing the current
function or dot script. If the shell is not currently executing a
function or dot script, the results are unspecified."

Clear enough, one would think, but consider this example:

foo() {
  echo "$1" |
    while read -r xx _; do
      if [ "$xx" = fum ]; then
        echo EQ
        return 0
      fi
    done
  echo NE
  return 1
}

According to the spec we expect:

$ foo fum || echo WTF
EQ
$

What actually happens, with dash-0.5.8-2.1ubuntu2 and dash-0.5.9.1 built
from source:

$ foo fum || echo WTF
EQ
NE
WTF
$ foo baz || echo OK
NE
OK
$

The same happens with bash-4.3-14ubuntu1.4 and busybox-static-1:1.22.0-15ubuntu1.4.

A simpler test case shows that the issue is 'return' not breaking out of
a subshell:

bar() {
  ( if [ "$1" = fum ]; then
      echo EQ
      return 0
    fi )
  echo NE
  return 1
}

barbar() {
  if [ "$1" = fum ]; then
    echo EQ
    return 0
  fi
  echo NE
  return 1
}

$ bar fum || echo WTF
EQ
NE
WTF
$ bar baz || echo OK
NE
OK
$ barbar fum || echo WTF
EQ
$

As POSIX refers to subshells explicitly elsewhere (e.g. 'exit') it's
difficult to believe that "subshell" was accidentally omitted from the
list of contexts that 'return' should return from, but implementation
behaviours consistently contradict the spec as written. Can they be made
conformant without breaking existing scripts?

A work-around is to make any subshell explicit and 'exit' from it:

foo_wa() {
  echo "$1" |
    ( while read -r xx _; do
        if [ "$xx" = fum ]; then
          echo EQ
          exit 0
        fi
      done
      exit 1 ) && return
  ret=$?
  echo NE
  return $ret
}

$ foo_wa fum || echo WTF
EQ
$
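
A further workaround, not proposed in the thread, avoids the pipeline altogether: feed the loop from a here-document so that it runs in the function's own shell environment, where 'return' behaves as the spec describes. This is a sketch assuming a POSIX shell such as dash or bash (some historical Bourne shells ran redirected loops in a subshell); the name foo_heredoc is hypothetical.

```shell
# Hypothetical variant of foo(): no pipeline, so no subshell is involved.
foo_heredoc() {
  while read -r xx _; do
    if [ "$xx" = fum ]; then
      echo EQ
      return 0   # runs in the function's own shell, so it leaves foo_heredoc
    fi
  done <<EOF
$1
EOF
  echo NE
  return 1
}

foo_heredoc fum || echo WTF
```

Because the redirection applies to the 'while' compound command rather than creating a pipeline, the loop body is not placed in a subshell environment.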

--
London SW6
UK


* Re: 'return' from subshell in function doesn't
  2020-03-08 12:35 ` 'return' from subshell in function doesn't Dirk Fieldhouse
@ 2020-03-08 13:44   ` Harald van Dijk
  2020-03-08 14:40     ` Dirk Fieldhouse
  0 siblings, 1 reply; 6+ messages in thread
From: Harald van Dijk @ 2020-03-08 13:44 UTC
  To: Dirk Fieldhouse, DASH mailing list

On 08/03/2020 12:35, Dirk Fieldhouse wrote:
> POSIX
> <https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#return> 
> 
> says, and has since at least 2004:
> "The return utility shall cause the shell to stop executing the current
> function or dot script. If the shell is not currently executing a
> function or dot script, the results are unspecified."
>[...]
> As POSIX refers to subshells explicitly elsewhere (eg 'exit') it's
> difficult to believe that "subshell" was accidentally omitted from the
> list of contexts that 'return' should return from, but implementation
> behaviours consistently contradict the spec as written. Can they be made
> conformant without breaking existing scripts?

In the subshell, the shell should not be considered to still be 
executing a function or dot script. As such, the results should be 
unspecified, and any behaviour should be valid. The standard may be 
underspecified here, but any other interpretation is not reasonable.

Subshells work by starting a new process. The parent process waits for
the subshell to finish and acts on its exit status. The child process
has very few ways to influence its parent process other than that,
and the parent process might not even still be running by the time the
child process gets to the return statement.
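
The point about separate processes can be seen in a small sketch (the function name demo is hypothetical): an assignment made inside the parentheses is lost when the subshell ends, and the only thing the parent gets back is the exit status.

```shell
# Hypothetical illustration: a subshell cannot change its parent's variables;
# the parent only sees the exit status the subshell finishes with.
demo() {
  x=parent
  ( x=child; exit 7 )
  status=$?
  echo "x=$x status=$status"
}

demo   # prints: x=parent status=7
```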

Cheers,
Harald van Dijk


* Re: 'return' from subshell in function doesn't
  2020-03-08 13:44   ` Harald van Dijk
@ 2020-03-08 14:40     ` Dirk Fieldhouse
  2020-03-08 15:19       ` Harald van Dijk
  0 siblings, 1 reply; 6+ messages in thread
From: Dirk Fieldhouse @ 2020-03-08 14:40 UTC
  To: DASH mailing list; +Cc: Harald van Dijk

On 08/03/20 13:44, Harald van Dijk wrote:
> On 08/03/2020 12:35, Dirk Fieldhouse wrote:
>> POSIX
>> <https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#return>
>> says, and has since at least 2004:
>> "The return utility shall cause the shell to stop executing the current
>> function or dot script. If the shell is not currently executing a
>> function or dot script, the results are unspecified."
>> [...]
>> As POSIX refers to subshells explicitly elsewhere (eg 'exit') it's
>> difficult to believe that "subshell" was accidentally omitted from the
>> list of contexts that 'return' should return from, but implementation
>> behaviours consistently contradict the spec as written. Can they be made
>> conformant without breaking existing scripts?
>
> In the subshell, the shell should not be considered to still be
> executing a function or dot script. As such, the results should be
> unspecified, and any behaviour should be valid. The standard may be
> underspecified here, but any other interpretation is not reasonable.

Your argument here is essentially saying that the spec left out an
exception concerning subshells. If you read the spec without having
knowledge of existing shell internals, it's entirely reasonable (and IMO
desirable) to consider that a shell function is a lexical group, like a
script file, which is being executed as long as any command within the
function's defining compound command is running.

Otherwise the definition of a shell function would have to be limited to
certain types of compound command, i.e. excluding command substitution,
commands grouped with parentheses, asynchronous lists, and (under
implementation-specific circumstances) pipelines.

The behaviour that I expected is supported by
<https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_09_05>:

"A function is a user-defined name that is used as a simple command to
call a compound command with new positional parameters. ...

The compound-command shall be executed whenever the function name is
specified as the name of a simple command (see Command Search and
Execution). ... If the special built-in return ... is executed in the
compound-command, the function completes and execution shall resume with
the next command after the function call."

> Subshells work by starting a new process. The parent process waits for
> the subshell to finish and acts on its exit status. The child process
> has very little ways to influence its parent process other than that,
> and the parent process might not even still be running by the time the
> child process gets to the return statement.

What the conforming implementation has to do shouldn't be of concern to
the shell programmer, especially since a subshell may, but need not, be
created implicitly in a pipeline; in particular any subshell processes
are transparent to the shell programmer ($! "shall expand to the same
value as that of the current shell"). What POSIX says presumably means
that the implementation should wait for any subprocesses, threads or
whatever spawned in the course of executing a function to complete
(subject to &) before continuing to execute the next command. If the
calling script process or some spawned thread of control gets killed
before the return can be executed, that's just an exception, the sort of
thing that traps exist for.

However, as your interpretation seems to have been widely made by shell
implementations, is it necessary to abandon the behaviour currently
specified in favour of a more pragmatic specification?

/df

--
London SW6
UK


* Re: 'return' from subshell in function doesn't
  2020-03-08 14:40     ` Dirk Fieldhouse
@ 2020-03-08 15:19       ` Harald van Dijk
  2020-03-09  7:44         ` Stephane Chazelas
  2020-03-09 12:43         ` Dirk Fieldhouse
  0 siblings, 2 replies; 6+ messages in thread
From: Harald van Dijk @ 2020-03-08 15:19 UTC
  To: Dirk Fieldhouse, DASH mailing list

On 08/03/2020 14:40, Dirk Fieldhouse wrote:
> On 08/03/20 13:44, Harald van Dijk wrote:
>> Subshells work by starting a new process. The parent process waits for
>> the subshell to finish and acts on its exit status. The child process
>> has very little ways to influence its parent process other than that,
>> and the parent process might not even still be running by the time the
>> child process gets to the return statement.
> 
> What the conforming implementation has to do shouldn't be of concern to
> the shell programmer, especially since a subshell may, but need not, be
> created implicitly in a pipeline; in particular any subshell processes
> are transparent to the shell programmer ($! "shall expand to the same
> value as that of the current shell").

I think you meant $$ there, but this is the difference between theory
and practice. In theory, the standard is perfect and shell internals
are irrelevant: we can just look at what the standard says. In practice,
unfortunately, the standard is not perfect and there are numerous cases
where the standard is either ambiguous or contradicts implementations,
and where this is deemed a defect in the standard rather than in the
implementations. It need not even be because what the standard specifies
is unreasonable, it can just be because what the standard specifies
is unintended.

>                                       What POSIX says presumably means
> that the implementation should wait for any subprocesses, threads or
> whatever spawned in the course of executing a function to complete
> (subject to &) before continuing to execute the next command. If the
> calling script process or some spawned thread of control gets killed
> before the return can be executed, that's just an exception, the sort of
> thing that traps exist for.

Sure, for the parent process, but for the child process it leaves 
questions unanswered such as what the expected output would be of:

   f() {
     (kill -9 $$; return; echo hello)
   }
   f
   echo bye

This cannot print 'bye', but should it print 'hello'? The 'return' 
statement cannot return from 'f' if the main process is killed, so would 
the subshell just continue execution with the command after 'return'?

I would argue that even if you disagree that the behaviour should be 
unspecified in your original example, it should still be unspecified in 
mine.

> However, as your interpretation seems to have been widely made by shell
> implementations, is it necessary to abandon the behaviour currently
> specified in favour of a more pragmatic specification?

I suspect so. There is a case I forgot about though:

   f() (
     return 0
     echo bug
   )
   f

This should not print 'bug', and does not in any shell I can think to 
test. By your interpretation of the standard, this is currently 
specified. By mine, it would be unspecified, but I would agree that it 
should be fine for the whole function to be defined using a () compound 
command, and to contain a return statement directly inside it.
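
A sketch of that case with a non-zero status (the value 3 is arbitrary) is expected, in dash and bash at least, to show that 'return' simply ends the subshell that forms the function body, and that its argument becomes the function's exit status; other shells are not covered here.

```shell
# The function body is itself a ( ) subshell, so 'return' just ends it.
f() (
  return 3
  echo bug      # never reached
)

f
echo "status: $?"
```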

The same problem applies to the 'break' and 'continue' statements too:

   for var in x y z
   do
     echo $var
     (break)
   done

This prints x, y, and z in all shells: the 'break' statement in the
subshell does not cause the loop to terminate. Some shells additionally
print a warning or error message such as "break: not in a loop". Here
again, presumably the intent of the standard is not that the 'break'
statement should cause the loop to terminate. It is not something that
shells do, and it is not something that is reasonable for shells to
implement.
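
Since a subshell cannot end its parent's loop, one portable pattern (an illustration, not something proposed in the thread; the name loop_demo is hypothetical) is to have the subshell exit with a non-zero status and let the parent shell, which owns the loop, execute 'break'.

```shell
# Hypothetical sketch: the subshell signals via its exit status and the
# parent, which owns the loop, performs the actual 'break'.
loop_demo() {
  for var in x y z
  do
    echo "$var"
    ( [ "$var" = y ] && exit 1; exit 0 ) || break
  done
}

loop_demo   # prints x and y, then stops
```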

This is looking like a giant can of worms I'm not sure I'm ready to see 
opened. :)

Cheers,
Harald van Dijk


* Re: 'return' from subshell in function doesn't
  2020-03-08 15:19       ` Harald van Dijk
@ 2020-03-09  7:44         ` Stephane Chazelas
  2020-03-09 12:43         ` Dirk Fieldhouse
  1 sibling, 0 replies; 6+ messages in thread
From: Stephane Chazelas @ 2020-03-09  7:44 UTC
  To: Harald van Dijk; +Cc: Dirk Fieldhouse, DASH mailing list

2020-03-08 15:19:01 +0000, Harald van Dijk:
[...]
> The same problem applies to the 'break' and 'continue' statements too:
> 
>   for var in x y z
>   do
>     echo $var
>     (break)
>   done
> 
> This prints x, y, and z in all shells, the 'break' statement in the subshell
> does not cause the loop to terminate. Some shells additionally print a
> warning or error message such as "break: not in a loop". Here again,
> presumably the intent of the standard is not that the 'break' statement
> should cause the loop to terminate. It is not something that shells do, and
> it is not something that is reasonable for shells to implement.
> 
> This is looking like a giant can of worms I'm not sure I'm ready to see
> opened. :)
[...]

See https://www.austingroupbugs.net/view.php?id=842 and its
resolution
(https://www.austingroupbugs.net/view.php?id=842#c2257) about that.

-- 
Stephane


* Re: 'return' from subshell in function doesn't
  2020-03-08 15:19       ` Harald van Dijk
  2020-03-09  7:44         ` Stephane Chazelas
@ 2020-03-09 12:43         ` Dirk Fieldhouse
  1 sibling, 0 replies; 6+ messages in thread
From: Dirk Fieldhouse @ 2020-03-09 12:43 UTC
  To: DASH mailing list; +Cc: Harald van Dijk

On 08/03/20 15:19, Harald van Dijk wrote:
> On 08/03/2020 14:40, Dirk Fieldhouse wrote:
>> On 08/03/20 13:44, Harald van Dijk wrote:
>>> Subshells work by starting a new process. ...
>>
>> What the conforming implementation has to do shouldn't be of concern to
>> the shell programmer, especially since a subshell may, but need not, be
>> created implicitly in a pipeline; in particular any subshell processes
>> are transparent to the shell programmer ($! "shall expand to the same
>> value as that of the current shell").
>
> I think you meant $$ there, but this is the difference between theory
> and practice. In theory, the standard is perfect, and shell internals
> are irrelevant, we can just look at what the standard says. In practice,
> unfortunately, the standard is not perfect and there are numerous cases
> where the standard is either ambiguous or contradicts implementations,
> and where this is deemed a defect in the standard rather than in the
> implementations. It need not even be because what the standard specifies
> is unreasonable, it can just be because the what the standard specifies
> is unintended.

Yes obvs $$, thanks. If a supplier has to warrant conformance to the
standard they have a problem if what the standard says is universally
ignored. Something has to give. I think it's fair to say that
historically there has been convergence from both directions. But this
particular issue seems to have been a blind spot, perhaps because it
seems so obvious to the implementers who also work on the spec (we've
all been there).

>>                                       What POSIX says presumably means
>> that the implementation should wait for any subprocesses, threads or
>> whatever spawned in the course of executing a function to complete
>> (subject to &) before continuing to execute the next command. If the
>> calling script process or some spawned thread of control gets killed
>> before the return can be executed, that's just an exception, the sort of
>> thing that traps exist for.
>
> Sure, for the parent process, but for the child process it leaves
> questions unanswered such as what the expected output would be of:
>
>    f() {
>      (kill -9 $$; return; echo hello)
>    }
>    f
>    echo bye

If you kill the shell (you're not supposed to know that kill only kills
some main process) you shouldn't expect any subsequent command to have
run. A better name for this f() is cut_off_the_branch_I_am_sitting_on().

> I would argue that even if you disagree that the behaviour should be
> unspecified in your original example, it should still be unspecified in
> mine.
>
>> However, as your interpretation seems to have been widely made by shell
>> implementations, is it necessary to abandon the behaviour currently
>> specified in favour of a more pragmatic specification?
>
> I suspect so. There is a case I forgot about though:
>
>    f() (
>      return 0
>      echo bug
>    )
>    f
>
> This should not print 'bug', and does not in any shell I can think to
> test. By your interpretation of the standard, this is currently
> specified. By mine, it would be unspecified, but I would agree that it
> should be fine for the whole function to be defined using a () compound
> command, and to contain a return statement directly inside it.

+1. Apparently the consensus has been that 'return' in a subshell means
'exit'. But should someone write a test suite against POSIX.1-2017 with a
test case similar to my original bar(), these implementations will all
fail it.
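
A minimal sketch of such a test case, assuming only that each shell to be checked can be invoked by name with -c (the list of shell names and the printed labels are illustrative): it runs a bar()-like function under each shell found on PATH and reports whether 'return' in the subshell left the function.

```shell
# Hypothetical check modelled on bar(): under the reading of POSIX argued here,
# '( return 0 )' would leave f with status 0; in practice it only ends the subshell.
check='f() { ( return 0 ); echo NE; return 1; }
if f >/dev/null; then echo returned-from-function; else echo returned-from-subshell-only; fi'

for sh in sh dash bash busybox; do
  command -v "$sh" >/dev/null 2>&1 || continue
  case $sh in
    busybox) set -- busybox sh ;;   # assumes busybox was built with its sh applet
    *) set -- "$sh" ;;
  esac
  printf '%s: %s\n' "$sh" "$("$@" -c "$check")"
done
```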

> The same problem applies to the 'break' and 'continue' statements too:
>
>    for var in x y z
>    do
>      echo $var
>      (break)
>    done

As Stephane (instigator of the relevant defect report) pointed out, this
has been addressed in the 2017 text, so that your 'break' example is
unspecified behaviour (unenclosed break or continue). Of course similar
wording could have been used to restrict the specified behaviour of
'return' as well -- but wasn't.

> This is looking like a giant can of worms I'm not sure I'm ready to see
> opened. :)

Otherwise it would have been sorted out before and I wouldn't have
raised it!

This <https://www.austingroupbugs.net/view.php?id=1042> POSIX DR touches
on the same issue but doesn't come to grips with 'return'.

This <https://www.austingroupbugs.net/view.php?id=1247> DR identifies at
least one other case where an implicit subshell is used.

I suppose further discussion should be at austin-group-l?

regards
/df

--
London SW6
UK


