dash.vger.kernel.org archive mirror
From: Eric Blake <eblake@redhat.com>
To: Oleg Bulatov <oleg@bulatov.me>,
	dash@vger.kernel.org, Austin Group <austin-group-l@opengroup.org>
Subject: Re: heredoc and subshell
Date: Tue, 23 Feb 2016 15:49:07 -0700
Message-ID: <56CCE1E3.1060805@redhat.com>
In-Reply-To: <2978711456265264@web9h.yandex.ru>

[adding the Austin Group]

On 02/23/2016 03:07 PM, Oleg Bulatov wrote:
> Hello,
> 
> trying to minimize a shell code I found an unobvious moment with heredocs and subshells.

Thanks for a cool testcase.

> 
> Is it specified by POSIX how next code should be parsed? dash output for this code differs from bash and zsh.

XCU 2.3 says:

When an io_here token has been recognized by the grammar (see Shell
Grammar), one or more of the subsequent lines immediately following the
next NEWLINE token form the body of one or more here-documents and shall
be parsed according to the rules of Here-Document.

and 2.7.4 says:

The here-document shall be treated as a single word that begins after
the next <newline> and continues until there is a line containing only
the delimiter and a <newline>, with no <blank> characters in between.
Then the next here-document starts, if there is one.

but with no mention of what happens if you somehow manage to make the
next <newline> be part of an incomplete shell word on the line
containing the here-doc operator.
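
For contrast, here is a minimal sketch of the case that the quoted text
of 2.7.4 does cover: two here-doc operators on one line, where the line
ends with an ordinary newline that is not inside any shell word. The
function name two_heredocs and the delimiters ONE and TWO are made up
for illustration and do not come from the testcase.

```shell
# Hypothetical helper: both here-doc operators appear on one line, and
# that line ends with a plain newline, so the bodies follow in operator
# order ("Then the next here-document starts, if there is one").
two_heredocs() {
    cat <<ONE; cat <<TWO
first body
ONE
second body
TWO
}

two_heredocs
```

This prints "first body" and then "second body". The testcase below
differs in that the first newline after <<XXX falls inside a
double-quoted word containing an unfinished $(...).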

> 
> --- code
> prefix() { sed -e "s/^/$1:/"; }
> DASH_CODE() { :; }
> 
> prefix A <<XXX && echo "$(prefix B <<XXX
> echo line 1
> XXX
> echo line 2)" && prefix DASH_CODE <<DASH_CODE
> echo line 3
> XXX
> echo line 4)"
> echo line 5
> DASH_CODE
> 
> --- bash 4.3.42 output:
> A:echo line 3
> B:echo line 1
> line 2
> DASH_CODE:echo line 4)"
> DASH_CODE:echo line 5

So, it looks like bash is interpreting this as "first newline that is
not in the middle of another shell word": it parses the entire $(...)
construct through line 2 as if there were no newlines, then treats the
newline after DASH_CODE as starting the heredoc, outputting A: with
line 3 as the lone line in that heredoc.  Then it moves on to
the second command in the && sequence, processing the command
substitution (a heredoc outputting line 1, then the output of line 2),
and finally moves on to the third component of the && sequence, a
heredoc delimited by DASH_CODE, with both lines 4 and 5 output with the
DASH_CODE: prefix.

> 
> --- dash 0.5.8 output:
> A:echo line 1
> B:echo line 2)" && prefix DASH_CODE <<DASH_CODE
> B:echo line 3
> line 4
> line 5
> 

Meanwhile, dash is taking the literal first newline as the start of the
first heredoc, and outputting A: with line 1; then consuming the next
heredoc as lines 2 and 3 before finding the end of the command
substitution on line 4, then outputting line 5 on its own and doing
nothing else for the DASH_CODE function call.

ksh 93u+ 2012-08-01 behaves differently yet again:

B:echo line 1
line 2 && prefix DASH_CODE <<DASH_CODE
echo line 3
XXX
echo line 4)
line 5

and I'm having a hard time explaining that one.  Even better, modify the
script a bit:

$ head -n1 foo
prefix() { echo " $1:"; sed -e "s/^/$1:/"; }

and now I see:

$ ksh ./foo
Segmentation fault (core dumped)

but only sometimes; other times I get:

 A:
 B:
B:echo line 1
line 2 && prefix DASH_CODE <<DASH_CODE
echo line 3
XXX
echo line 4)
line 5

so it looks like some data-dependent race is tickling a bug in ksh.
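
Since the results differ per shell, and for ksh even per run, it can
help to run the same file under several shells in one go. The following
is only a sketch: compare_shells is a hypothetical helper, not something
from this thread, and it assumes the testcase is saved in a file such as
./foo.

```shell
# compare_shells SCRIPT SHELL...
# Hypothetical helper: run SCRIPT under each named shell and print its
# output (stdout and stderr) followed by its exit status.
compare_shells() {
    script=$1
    shift
    for sh in "$@"; do
        # Skip shells that are not installed on this machine.
        command -v "$sh" >/dev/null 2>&1 || continue
        printf '%s\n' "--- $sh"
        "$sh" "$script" 2>&1
        printf '%s\n' "(exit status $?)"
    done
}
```

A call such as "compare_shells ./foo bash dash ksh zsh" would then print
one section per installed shell. Repeating the call a few times is one
way to look for the intermittent ksh crash, which would normally show up
as a non-zero exit status.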

Maybe we need a defect against the standard that says behavior is
unspecified if the next <newline> after a here-doc operator occurs in
the middle of a shell word.
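
Until the standard says something about this, a script can sidestep the
question by never letting the newline after a here-doc operator fall
inside another shell word. The sketch below is one such rewrite of a
simplified version of the testcase; the names unambiguous, b_out, C and
YYY are made up for illustration, and the stray )" line from the
original is dropped.

```shell
prefix() { sed -e "s/^/$1:/"; }

# Hypothetical rewrite that avoids the ambiguous construct.
unambiguous() {
    # Run the command substitution as its own command first, so its
    # here-document is complete before the outer line is parsed.
    b_out=$(prefix B <<XXX
echo line 1
XXX
echo line 2)

    # No newline sits inside a shell word on this line, so the two
    # here-document bodies follow in operator order.
    prefix A <<XXX && echo "$b_out" && prefix C <<YYY
echo line 3
XXX
echo line 4
YYY
}

unambiguous
```

Written this way, the expected output is "A:echo line 3", "B:echo line 1",
"line 2" and "C:echo line 4", which matches the ordering bash produced
for the original script.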

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




