dash.vger.kernel.org archive mirror
* heredoc and subshell
@ 2016-02-23 22:07 Oleg Bulatov
  2016-02-23 22:49 ` Eric Blake
  2016-02-23 23:18 ` Jilles Tjoelker
  0 siblings, 2 replies; 9+ messages in thread
From: Oleg Bulatov @ 2016-02-23 22:07 UTC (permalink / raw)
  To: dash

Hello,

While trying to minimize some shell code, I found a non-obvious corner case with heredocs and subshells.

Does POSIX specify how the following code should be parsed? The dash output for this code differs from that of bash and zsh.

--- code
prefix() { sed -e "s/^/$1:/"; }
DASH_CODE() { :; }

prefix A <<XXX && echo "$(prefix B <<XXX
echo line 1
XXX
echo line 2)" && prefix DASH_CODE <<DASH_CODE
echo line 3
XXX
echo line 4)"
echo line 5
DASH_CODE

--- bash 4.3.42 output:
A:echo line 3
B:echo line 1
line 2
DASH_CODE:echo line 4)"
DASH_CODE:echo line 5

--- dash 0.5.8 output:
A:echo line 1
B:echo line 2)" && prefix DASH_CODE <<DASH_CODE
B:echo line 3
line 4
line 5
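
To reproduce the comparison, the test case can be written to a file and run under whichever shells happen to be installed. This is only a minimal sketch: the helper names write_testcase and run_under_shells and the list of shells are assumptions for illustration, and the output will depend on the shell versions present.

```shell
#!/bin/sh
# Hypothetical helper: write the test case from this message to the file "$1".
# The quoted delimiter ('EOF') keeps the body completely literal.
write_testcase() {
    cat >"$1" <<'EOF'
prefix() { sed -e "s/^/$1:/"; }
DASH_CODE() { :; }

prefix A <<XXX && echo "$(prefix B <<XXX
echo line 1
XXX
echo line 2)" && prefix DASH_CODE <<DASH_CODE
echo line 3
XXX
echo line 4)"
echo line 5
DASH_CODE
EOF
}

# Hypothetical helper: run script "$1" under each named shell that is installed.
run_under_shells() {
    script=$1
    shift
    for sh in "$@"; do
        command -v "$sh" >/dev/null 2>&1 || continue
        printf '%s\n' "--- $sh"
        "$sh" "$script" 2>&1
    done
}

tmp=$(mktemp) || exit 1
write_testcase "$tmp"
run_under_shells "$tmp" bash dash zsh ksh mksh
rm -f "$tmp"
```

Shells that are not installed are skipped silently, so the harness can be run unchanged on different systems.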

-- 
Oleg

* Re: heredoc and subshell
  2016-02-23 22:07 heredoc and subshell Oleg Bulatov
@ 2016-02-23 22:49 ` Eric Blake
  2016-02-23 23:16   ` Eric Blake
       [not found]   ` <56CCE1E3.1060805-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2016-02-23 23:18 ` Jilles Tjoelker
  1 sibling, 2 replies; 9+ messages in thread
From: Eric Blake @ 2016-02-23 22:49 UTC (permalink / raw)
  To: Oleg Bulatov, dash, Austin Group

[adding the Austin Group]

On 02/23/2016 03:07 PM, Oleg Bulatov wrote:
> Hello,
> 
> trying to minimize a shell code I found an unobvious moment with heredocs and subshells.

Thanks for a cool testcase.

> 
> Is it specified by POSIX how next code should be parsed? dash output for this code differs from bash and zsh.

XCU 2.3 says:

When an io_here token has been recognized by the grammar (see Shell
Grammar), one or more of the subsequent lines immediately following the
next NEWLINE token form the body of one or more here-documents and shall
be parsed according to the rules of Here-Document.

and 2.7.4 says:

The here-document shall be treated as a single word that begins after
the next <newline> and continues until there is a line containing only
the delimiter and a <newline>, with no <blank> characters in between.
Then the next here-document starts, if there is one.

but with no mention of what happens if you somehow manage to make the
next <newline> be part of an incomplete shell word on the line
containing the here-doc operator.
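
For contrast, here is a minimal sketch of the ordinary case that the quoted text does cover: the rest of the command line after the here-doc operator still belongs to the command, and the body starts only after the next newline. The function name heredoc_then_pipe is made up for illustration.

```shell
# Hypothetical helper name; the point is the shape of the first line.
heredoc_then_pipe() {
    # The remainder of the line (the pipe into sed) is parsed first;
    # the here-document body begins only after the next newline.
    cat <<EOF | sed -e 's/^/> /'
hello
EOF
}

heredoc_then_pipe    # prints "> hello"
```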

> 
> --- code
> prefix() { sed -e "s/^/$1:/"; }
> DASH_CODE() { :; }
> 
> prefix A <<XXX && echo "$(prefix B <<XXX
> echo line 1
> XXX
> echo line 2)" && prefix DASH_CODE <<DASH_CODE
> echo line 3
> XXX
> echo line 4)"
> echo line 5
> DASH_CODE
> 
> --- bash 4.3.42 output:
> A:echo line 3
> B:echo line 1
> line 2
> DASH_CODE:echo line 4)"
> DASH_CODE:echo line 5

So, it looks like bash is interpreting this as "first newline that is
not in the middle of another shell word": it parses the entire $(...)
construct through line 2 as if there were no newlines, then treats the
newline after DASH_CODE as starting the heredoc, outputting A: with
line 3 as the lone line in that heredoc.  Then it moves on to the
second command in the && sequence by processing the command
substitution (a heredoc outputting line 1, then the output of line 2),
and finally moves on to the third component of the && sequence, a
heredoc delimited by DASH_CODE, with both lines 4 and 5 output with the
DASH_CODE: prefix.

> 
> --- dash 0.5.8 output:
> A:echo line 1
> B:echo line 2)" && prefix DASH_CODE <<DASH_CODE
> B:echo line 3
> line 4
> line 5
> 

Meanwhile, dash is taking the literal first newline as the start of the
first heredoc, and outputting A: with line 1; then consuming the next
heredoc as lines 2 and 3 before finding the end of the command
substitution on line 4, then outputting line 5 on its own and doing
nothing else for the DASH_CODE function call.

ksh 93u+ 2012-08-01 behaves differently yet again:

B:echo line 1
line 2 && prefix DASH_CODE <<DASH_CODE
echo line 3
XXX
echo line 4)
line 5

and I'm having a hard time explaining that one.  Even better, modify the
script a bit:

$ head -n1 foo
prefix() { echo " $1:"; sed -e "s/^/$1:/"; }

and now I see:

$ ksh ./foo
Segmentation fault (core dumped)

but only sometimes; other times I get:

 A:
 B:
B:echo line 1
line 2 && prefix DASH_CODE <<DASH_CODE
echo line 3
XXX
echo line 4)
line 5

so it looks like some data-dependent race is tickling a bug in ksh.
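
Since the crash is intermittent, one way to get a feel for how often it happens is to run the script repeatedly and count the non-zero exit statuses. A rough sketch, assuming nothing about ksh itself; count_failures is a hypothetical helper name:

```shell
# Hypothetical helper: run a command "$1" times and print how many runs
# ended with a non-zero exit status (a segfault counts as one).
count_failures() {
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
}
```

With the modified script saved as ./foo, something like count_failures 50 ksh ./foo would report how many of 50 runs failed.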

Maybe we need a defect against the standard that says behavior is
unspecified if the next <newline> after a here-doc operator occurs in
the middle of a shell word.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


* Re: heredoc and subshell
  2016-02-23 22:49 ` Eric Blake
@ 2016-02-23 23:16   ` Eric Blake
       [not found]   ` <56CCE1E3.1060805-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  1 sibling, 0 replies; 9+ messages in thread
From: Eric Blake @ 2016-02-23 23:16 UTC (permalink / raw)
  To: Oleg Bulatov, dash, Austin Group

On 02/23/2016 03:49 PM, Eric Blake wrote:
> [adding the Austin Group]
> 
> On 02/23/2016 03:07 PM, Oleg Bulatov wrote:
>> Hello,
>>
>> trying to minimize a shell code I found an unobvious moment with heredocs and subshells.
> 
> Thanks for a cool testcase.
> 
>>
>> Is it specified by POSIX how next code should be parsed? dash output for this code differs from bash and zsh.
> 
> XCU 2.3 says:
> 
> When an io_here token has been recognized by the grammar (see Shell
> Grammar), one or more of the subsequent lines immediately following the
> next NEWLINE token form the body of one or more here-documents and shall
> be parsed according to the rules of Here-Document.
> 
> and 2.7.4 says:
> 
> The here-document shall be treated as a single word that begins after
> the next <newline> and continues until there is a line containing only
> the delimiter and a <newline>, with no <blank> characters in between.
> Then the next here-document starts, if there is one.
> 
> but with no mention of what happens if you somehow manage to make the
> next <newline> be part of an incomplete shell word on the line
> containing the here-doc operator.

As it is, there is a shorter test case showing that the shells I tested
don't always start looking for the heredoc body after the first newline:

$ dash -c 'cat <<ONE && cat \''
<<TWO
a
ONE
b
TWO
'
a
b

The newline immediately after the backslash is NOT used to start the
first heredoc.
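
The same test is perhaps easier to read as a script than through the -c quoting. A sketch, with continuation_demo as a made-up function name; the backslash-newline is removed before tokenization, so both here-doc operators end up on one logical line:

```shell
# Hypothetical wrapper around the test case above.
continuation_demo() {
    # The backslash joins the next line to this one, so both <<ONE and
    # <<TWO precede the first real newline; the bodies follow in order.
    cat <<ONE && cat \
<<TWO
a
ONE
b
TWO
}

continuation_demo    # prints "a" then "b"
```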

> Maybe we need a defect against the standard that says behavior is
> unspecified if the next <newline> after a here-doc operator occurs in
> the middle of a shell word.

Or maybe refine the wording to state the first unescaped newline, since
backslash escaping seems to work consistently (the confusion only begins
with newlines inside an incomplete command substitution).

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


* Re: heredoc and subshell
  2016-02-23 22:07 heredoc and subshell Oleg Bulatov
  2016-02-23 22:49 ` Eric Blake
@ 2016-02-23 23:18 ` Jilles Tjoelker
  2016-02-24  8:46   ` Thorsten Glaser
  1 sibling, 1 reply; 9+ messages in thread
From: Jilles Tjoelker @ 2016-02-23 23:18 UTC (permalink / raw)
  To: Oleg Bulatov; +Cc: dash

On Wed, Feb 24, 2016 at 01:07:44AM +0300, Oleg Bulatov wrote:
> trying to minimize a shell code I found an unobvious moment with
> heredocs and subshells.

> Is it specified by POSIX how next code should be parsed? dash output
> for this code differs from bash and zsh.

> --- code
> prefix() { sed -e "s/^/$1:/"; }
> DASH_CODE() { :; }
> 
> prefix A <<XXX && echo "$(prefix B <<XXX
> echo line 1
> XXX
> echo line 2)" && prefix DASH_CODE <<DASH_CODE
> echo line 3
> XXX
> echo line 4)"
> echo line 5
> DASH_CODE

> --- bash 4.3.42 output:
> A:echo line 3
> B:echo line 1
> line 2
> DASH_CODE:echo line 4)"
> DASH_CODE:echo line 5

> --- dash 0.5.8 output:
> A:echo line 1
> B:echo line 2)" && prefix DASH_CODE <<DASH_CODE
> B:echo line 3
> line 4
> line 5

I think POSIX is clear that the bash/zsh behaviour is correct and the
dash behaviour is wrong. In XCU 2.6.3 Command Substitution, it says:

] With the $(command) form, all characters following the open
] parenthesis to the matching closing parenthesis constitute the
] command.

Therefore, the shell should not start reading the here-document
belonging to  prefix A <<XXX  while it is still inside the command
substitution  $(prefix B <<XXX.

Instead, the here-document belonging to  prefix B <<XXX  should be read.
The line ending the command substitution contains another <<
redirection; the here-documents are read in order of the <<
redirections.
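
The ordering rule is easiest to see without any command substitution involved. A small sketch (order_demo is a hypothetical name): two << redirections on one line, with their bodies following in the same order.

```shell
# Hypothetical helper: two here-documents started on the same line.
order_demo() {
    # Bodies are read in the order of the << operators:
    # first the one delimited by FIRST, then the one delimited by SECOND.
    cat <<FIRST; cat <<SECOND
one
FIRST
two
SECOND
}

order_demo    # prints "one" then "two"
```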

In FreeBSD sh, another ash derivative, I fixed this in
FreeBSD SVN r208655,
https://github.com/freebsd/freebsd/commit/930ce3922652c50fc8b621b14b6238b325d7f16f

Interestingly, mksh parses this in yet another way. In unmodified form,
it fails with  here document 'XXX' unclosed. After appending an XXX
line, it outputs:

A:echo line 5
A:DASH_CODE
B:echo line 2)" && prefix DASH_CODE <<DASH_CODE
B:echo line 3
line 4

The here-document containing line 1 seems lost entirely.

The ksh93 93u+ 2012-08-01 from FreeBSD ports segfaults while executing
the script.

In conclusion, it seems unwise to rely on this construct in scripts that
are to be distributed.
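
One way to sidestep the ambiguity, sketched here as a suggestion rather than anything a shell or the standard prescribes, is to run the command substitution on its own first and store the result, so that no here-doc operator is pending when the substitution spans a newline. The variable name b_out is an assumption for illustration; the rest follows the original test case.

```shell
prefix() { sed -e "s/^/$1:/"; }

# Evaluate the inner command first; its here-document is entirely
# inside the $(...) and no other << is pending on this line.
b_out=$(prefix B <<XXX
echo line 1
XXX
echo line 2)

# Now both remaining << operators sit on one line with no multi-line
# word in between, so their bodies simply follow in order.
prefix A <<XXX && echo "$b_out" && prefix DASH_CODE <<DASH_CODE
echo line 3
XXX
echo line 4)"
echo line 5
DASH_CODE
```

Written this way, the script should print the same five lines as the bash output quoted above.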

-- 
Jilles Tjoelker

* Re: heredoc and subshell
  2016-02-23 23:18 ` Jilles Tjoelker
@ 2016-02-24  8:46   ` Thorsten Glaser
  0 siblings, 0 replies; 9+ messages in thread
From: Thorsten Glaser @ 2016-02-24  8:46 UTC (permalink / raw)
  To: Jilles Tjoelker; +Cc: Oleg Bulatov, dash, miros-mksh

(Thanks to izabera for bringing this to my attention!)

On Wed, 24 Feb 2016, Jilles Tjoelker wrote:
> On Wed, Feb 24, 2016 at 01:07:44AM +0300, Oleg Bulatov wrote:

> > --- bash 4.3.42 output:

That’s the output I’d expect, from manually reading this.

> Interestingly, mksh parses this in yet another way. In unmodified form,
> it fails with  here document 'XXX' unclosed.

Yeah, I noticed… yay another bug to fix (just had two last night)…

> After appending an XXX line, it outputs:

Ah okay, thanks for this hint, that should make debugging easier.

bye,
//mirabilos
-- 
Sometimes they [people] care too much: pretty printers [and syntax highligh-
ting, d.A.] mechanically produce pretty output that accentuates irrelevant
detail in the program, which is as sensible as putting all the prepositions
in English text in bold font.	-- Rob Pike in "Notes on Programming in C"

* Re: heredoc and subshell
       [not found]   ` <56CCE1E3.1060805-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2016-02-24 10:58     ` Joerg Schilling
       [not found]       ` <56cd8cd1.LdVYbpfe+0tQMm8I%Joerg.Schilling-8LS2qeF34IpklNlQbfROjRvVK+yQ3ZXh@public.gmane.org>
  2016-02-24 20:27       ` Oleg Bulatov
  0 siblings, 2 replies; 9+ messages in thread
From: Joerg Schilling @ 2016-02-24 10:58 UTC (permalink / raw)
  To: oleg, eblake, dash, austin-group-l

Eric Blake wrote:

> > --- code
> > prefix() { sed -e "s/^/$1:/"; }
> > DASH_CODE() { :; }
> > 
> > prefix A <<XXX && echo "$(prefix B <<XXX
> > echo line 1
> > XXX
> > echo line 2)" && prefix DASH_CODE <<DASH_CODE
> > echo line 3
> > XXX
> > echo line 4)"
> > echo line 5
> > DASH_CODE
> > 
> > --- bash 4.3.42 output:
> > A:echo line 3
> > B:echo line 1
> > line 2
> > DASH_CODE:echo line 4)"
> > DASH_CODE:echo line 5
>
> So, it looks like bash is interpreting this as "first newline that is
> not in the middle of another shell word), and parses the entire $(...)

I would like to get an explanation on what I should understand by:

	"first newline that is not in the middle of another shell word)


BTW: If I replace $(..) by `..` and feed the code to the original SVr4 Bourne 
Shell, I get the same output as you got from bash. I would guess that the bash 
output you added above is correct.

Note that the command substitution is part of a " quoted string and even 
without that, it would need to be parsed first.

The POSIX version from ksh88 (seen on Solaris) behaves the same with $(..) and 
the `..` variant and the same as bash.

The fact that ksh93 behaves differently with $(..) and the `..` variant makes it
obvious that ksh93 has a bug.

Jörg

-- 
 EMail:joerg@schily.net                    (home) Jörg Schilling D-13353 Berlin
       joerg.schilling@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.org/private/ http://sourceforge.net/projects/schilytools/files/'

* Re: heredoc and subshell
       [not found]       ` <56cd8cd1.LdVYbpfe+0tQMm8I%Joerg.Schilling-8LS2qeF34IpklNlQbfROjRvVK+yQ3ZXh@public.gmane.org>
@ 2016-02-24 12:27         ` Shware Systems
  2016-02-24 12:32           ` Joerg Schilling
  0 siblings, 1 reply; 9+ messages in thread
From: Shware Systems @ 2016-02-24 12:27 UTC (permalink / raw)
  To: oleg, eblake, dash, austin-group-l, Joerg.Schilling

As near as I can tell from 2.3, io_here takes precedence over the string concatenation of normal tokenization, so the output of bash should be:
A:echo line 1
Syntax error: Missing right paren
Syntax error: Missing closing double quote

as the list production terminates at the first newline, and a new one doesn't start until after the second line with XXX to satisfy the second io_here. It can be argued provisions for trying to find the end of an open string should continue after the io_here body, but Item 7 of 2.3 has the newline terminating it.

The first newline is the one left after line joining has been done. Before tokenization starts the input file position should be pointing to the character after it so the first io_here can immediately start reading the body from there until the first XXX line, and that advances it so the second might be processed, and the line after starts a new list, not continues as if no io_here was present. The syntax error detection happens before the second block gets consumed, but the presence of the operator while scanning for the right paren reserves it.



* Re: heredoc and subshell
  2016-02-24 12:27         ` Shware Systems
@ 2016-02-24 12:32           ` Joerg Schilling
  0 siblings, 0 replies; 9+ messages in thread
From: Joerg Schilling @ 2016-02-24 12:32 UTC (permalink / raw)
  To: shwaresyst, oleg, eblake, dash, austin-group-l

Shware Systems <shwaresyst@aol.com> wrote:

> Near as I can tell, from 2.3, io_here has precedence over string concatenation of normal tokenization so the output of bash should be:
> A:echo line 1
> Syntax error: Missing right paren
> Syntax error: Missing closing double quote
>
> as the list production terminates at the first newline, and a new one doesn't start until after the second line with XXX to satisfy the second io_here. It can be argued provisions for trying to find the end of an open string should continue after the io_here body, but Item 7 of 2.3 has the newline terminating it.

So you are arguing that $(...) and `...` behave differently?

The interesting news is that the Schily Bourne Shell and mksh currently behave
similarly to what you would like to see.

Jörg

-- 
 EMail:joerg@schily.net                    (home) Jörg Schilling D-13353 Berlin
       joerg.schilling@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.org/private/ http://sourceforge.net/projects/schilytools/files/'

* Re: heredoc and subshell
  2016-02-24 10:58     ` Joerg Schilling
       [not found]       ` <56cd8cd1.LdVYbpfe+0tQMm8I%Joerg.Schilling-8LS2qeF34IpklNlQbfROjRvVK+yQ3ZXh@public.gmane.org>
@ 2016-02-24 20:27       ` Oleg Bulatov
  1 sibling, 0 replies; 9+ messages in thread
From: Oleg Bulatov @ 2016-02-24 20:27 UTC (permalink / raw)
  To: Joerg Schilling, eblake, dash, austin-group-l

24.02.2016, 13:58, "Joerg Schilling" <Joerg.Schilling@fokus.fraunhofer.de>:
> BTW: If I replace $(..) by `..` and feed the code to the original SVr4 Bourne
> Shell, I get the same output as you got from bash. I would guess that the bash
> output you added above is correct.

The behavior of `..` with heredocs also has interesting corner cases:

$ cat x.sh 
cat <<END ; echo `echo a
echo b`
hello
END
$ dash x.sh 
echo ba
x.sh: 3: x.sh: hello: not found
x.sh: 4: x.sh: END: not found
$ bash x.sh
hello
a b
$

On the one hand it reads the heredoc; on the other hand it processes the closing of the subshell.
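
A possible workaround for this case, again only a sketch with a made-up variable name (ab), is to finish the multi-line `..` substitution before the line that carries the << operator:

```shell
# Complete the multi-line backquote substitution on its own first.
ab=`echo a
echo b`

# The here-doc line now contains no word that spans a newline.
# $ab is deliberately unquoted so that "a" and "b" are passed to echo
# as two separate words, as in the original x.sh.
cat <<END ; echo $ab
hello
END
```

This should print hello followed by a b, matching the bash output shown above.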

-- 
Oleg
