* heredoc and subshell
From: Oleg Bulatov @ 2016-02-23 22:07 UTC
To: dash

Hello,

While trying to minimize some shell code, I found a non-obvious corner
case with heredocs and subshells.

Does POSIX specify how the following code should be parsed? The dash
output for this code differs from bash and zsh.

--- code
prefix() { sed -e "s/^/$1:/"; }
DASH_CODE() { :; }

prefix A <<XXX && echo "$(prefix B <<XXX
echo line 1
XXX
echo line 2)" && prefix DASH_CODE <<DASH_CODE
echo line 3
XXX
echo line 4)"
echo line 5
DASH_CODE

--- bash 4.3.42 output:
A:echo line 3
B:echo line 1
line 2
DASH_CODE:echo line 4)"
DASH_CODE:echo line 5

--- dash 0.5.8 output:
A:echo line 1
B:echo line 2)" && prefix DASH_CODE <<DASH_CODE
B:echo line 3
line 4
line 5

--
Oleg

* Re: heredoc and subshell
From: Eric Blake @ 2016-02-23 22:49 UTC
To: Oleg Bulatov, dash, Austin Group

[adding the Austin Group]

On 02/23/2016 03:07 PM, Oleg Bulatov wrote:
> Hello,
>
> While trying to minimize some shell code, I found a non-obvious corner
> case with heredocs and subshells.

Thanks for a cool testcase.

>
> Does POSIX specify how the following code should be parsed? The dash
> output for this code differs from bash and zsh.

XCU 2.3 says:

When an io_here token has been recognized by the grammar (see Shell
Grammar), one or more of the subsequent lines immediately following the
next NEWLINE token form the body of one or more here-documents and shall
be parsed according to the rules of Here-Document.

and 2.7.4 says:

The here-document shall be treated as a single word that begins after
the next <newline> and continues until there is a line containing only
the delimiter and a <newline>, with no <blank> characters in between.
Then the next here-document starts, if there is one.

but with no mention of what happens if you somehow manage to make the
next <newline> be part of an incomplete shell word on the line
containing the here-doc operator.

>
> --- code
> prefix() { sed -e "s/^/$1:/"; }
> DASH_CODE() { :; }
>
> prefix A <<XXX && echo "$(prefix B <<XXX
> echo line 1
> XXX
> echo line 2)" && prefix DASH_CODE <<DASH_CODE
> echo line 3
> XXX
> echo line 4)"
> echo line 5
> DASH_CODE
>
> --- bash 4.3.42 output:
> A:echo line 3
> B:echo line 1
> line 2
> DASH_CODE:echo line 4)"
> DASH_CODE:echo line 5

So, it looks like bash is interpreting this as "first newline that is
not in the middle of another shell word", and parses the entire $(...)
construct through line 2 as if there were no newlines, then treats the
newline after DASH_CODE as starting the heredoc, outputting A: with
line 3 as the lone line in that heredoc. Then it moves on to the second
command in the && sequence, processing the command substitution (a
heredoc outputting line 1, then the output of line 2), and finally
moves on to the third component of the && sequence, a final heredoc
delimited by DASH_CODE, with both lines 4 and 5 output with the
DASH_CODE: prefix.

>
> --- dash 0.5.8 output:
> A:echo line 1
> B:echo line 2)" && prefix DASH_CODE <<DASH_CODE
> B:echo line 3
> line 4
> line 5
>

Meanwhile, dash is taking the literal first newline as the start of the
first heredoc, and outputting A: with line 1; then consuming the next
heredoc as lines 2 and 3 before finding the end of the command
substitution on line 4, then outputting line 5 on its own and doing
nothing else for the DASH_CODE function call.

ksh 93u+ 2012-08-01 behaves differently yet again:

B:echo line 1
line 2 && prefix DASH_CODE <<DASH_CODE
echo line 3
XXX
echo line 4)
line 5

and I'm having a hard time explaining that one. Even better, modify the
script a bit:

$ head -n1 foo
prefix() { echo " $1:"; sed -e "s/^/$1:/"; }

and now I see:

$ ksh ./foo
Segmentation fault (core dumped)

but only sometimes; other times I get:

A:
B:
B:echo line 1
line 2 && prefix DASH_CODE <<DASH_CODE
echo line 3
XXX
echo line 4)
line 5

so it looks like some data-dependent race is tickling a bug in ksh.

Maybe we need a defect against the standard that says behavior is
unspecified if the next <newline> after a here-doc operator occurs in
the middle of a shell word.

--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org

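For contrast with the test case, here is a minimal sketch of the uncontroversial situation that the XCU 2.7.4 wording quoted above covers directly: two here-document operators on one line, with the first newline not inside any other shell word. The function name `two_heredocs` and the delimiters `FIRST` and `SECOND` are made up for illustration; the bodies should be consumed in the order of the `<<` operators.

```shell
# Two here-document operators on one line; no command substitution
# or other multi-line word is open when the newline is reached.
two_heredocs() {
cat <<FIRST && cat <<SECOND
body of FIRST
FIRST
body of SECOND
SECOND
}

two_heredocs
```

The disagreement in this thread only starts once the newline that follows the first operator falls inside an unfinished word such as `"$(...`.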
* Re: heredoc and subshell
From: Eric Blake @ 2016-02-23 23:16 UTC
To: Oleg Bulatov, dash, Austin Group

On 02/23/2016 03:49 PM, Eric Blake wrote:
> [adding the Austin Group]
>
> On 02/23/2016 03:07 PM, Oleg Bulatov wrote:
>> Hello,
>>
>> While trying to minimize some shell code, I found a non-obvious corner
>> case with heredocs and subshells.
>
> Thanks for a cool testcase.
>
>>
>> Does POSIX specify how the following code should be parsed? The dash
>> output for this code differs from bash and zsh.
>
> XCU 2.3 says:
>
> When an io_here token has been recognized by the grammar (see Shell
> Grammar), one or more of the subsequent lines immediately following the
> next NEWLINE token form the body of one or more here-documents and shall
> be parsed according to the rules of Here-Document.
>
> and 2.7.4 says:
>
> The here-document shall be treated as a single word that begins after
> the next <newline> and continues until there is a line containing only
> the delimiter and a <newline>, with no <blank> characters in between.
> Then the next here-document starts, if there is one.
>
> but with no mention of what happens if you somehow manage to make the
> next <newline> be part of an incomplete shell word on the line
> containing the here-doc operator.

As it is, all shells I tested have a shorter test case that proves they
don't always start looking for the heredoc body after the first newline:

$ dash -c 'cat <<ONE && cat \'' <<TWO
a
ONE
b
TWO
'
a
b

The newline immediately after the backslash is NOT used to start the
first heredoc.

> Maybe we need a defect against the standard that says behavior is
> unspecified if the next <newline> after a here-doc operator occurs in
> the middle of a shell word.

Or maybe refine the wording to state the first unescaped newline, since
backslash escaping seems to work consistently (the confusion only begins
with newlines inside an incomplete command substitution).

--
Eric Blake

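The backslash case can also be tried as a standalone script. This is a sketch under the assumption that the shell removes a backslash-newline pair before tokenizing, so the command line continues onto the next physical line and the here-document bodies only start after the first unescaped newline. The function name `continued_line` is hypothetical.

```shell
# The backslash-newline after the second "cat" is a line continuation,
# so "<<TWO" still belongs to the same command line as "<<ONE".
continued_line() {
cat <<ONE && cat \
<<TWO
a
ONE
b
TWO
}

continued_line
```

If the escaped newline were treated as the start of the first here-document, the `<<TWO` line would end up inside the body of `ONE`.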
* Re: heredoc and subshell
From: Joerg Schilling @ 2016-02-24 10:58 UTC
To: oleg, eblake, dash, austin-group-l

Eric Blake wrote:

> > --- code
> > prefix() { sed -e "s/^/$1:/"; }
> > DASH_CODE() { :; }
> >
> > prefix A <<XXX && echo "$(prefix B <<XXX
> > echo line 1
> > XXX
> > echo line 2)" && prefix DASH_CODE <<DASH_CODE
> > echo line 3
> > XXX
> > echo line 4)"
> > echo line 5
> > DASH_CODE
> >
> > --- bash 4.3.42 output:
> > A:echo line 3
> > B:echo line 1
> > line 2
> > DASH_CODE:echo line 4)"
> > DASH_CODE:echo line 5
>
> So, it looks like bash is interpreting this as "first newline that is
> not in the middle of another shell word", and parses the entire $(...)

I would like to get an explanation of what I should understand by:

	"first newline that is not in the middle of another shell word"

BTW: If I replace $(..) by `..` and feed the code to the original SVr4
Bourne Shell, I get the same output as you got from bash. I would guess
that the bash output you added above is correct. Note that the command
substitution is part of a " quoted string, and even without that, it
would need to be parsed first.

The POSIX version from ksh88 (seen on Solaris) behaves the same with
$(..) and the `..` variant, and the same as bash.

The fact that ksh93 behaves differently with $(..) and the `..` variant
makes it obvious that ksh93 has a bug.

Jörg

--
 EMail:joerg@schily.net (home) Jörg Schilling D-13353 Berlin
       joerg.schilling@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.org/private/ http://sourceforge.net/projects/schilytools/files/

* Re: heredoc and subshell
From: Shware Systems @ 2016-02-24 12:27 UTC
To: oleg, eblake, dash, austin-group-l, Joerg.Schilling

Near as I can tell from 2.3, io_here has precedence over the string
concatenation of normal tokenization, so the output of bash should be:

A:echo line 1
Syntax error: Missing right paren
Syntax error: Missing closing double quote

as the list production terminates at the first newline, and a new one
doesn't start until after the second line with XXX, which satisfies the
second io_here. It can be argued that provisions for trying to find the
end of an open string should continue after the io_here body, but Item 7
of 2.3 has the newline terminating it.

The first newline is the one left after line joining has been done.
Before tokenization starts, the input file position should be pointing
to the character after it, so the first io_here can immediately start
reading the body from there until the first XXX line. That advances the
position so the second io_here can be processed, and the line after it
starts a new list rather than continuing as if no io_here were present.
The syntax error detection happens before the second block gets
consumed, but the presence of the operator while scanning for the right
paren reserves it.

* Re: heredoc and subshell
From: Joerg Schilling @ 2016-02-24 12:32 UTC
To: shwaresyst, oleg, eblake, dash, austin-group-l

Shware Systems <shwaresyst@aol.com> wrote:

> Near as I can tell, from 2.3, io_here has precedence over string
> concatenation of normal tokenization so the output of bash should be:
> A:echo line 1
> Syntax error: Missing right paren
> Syntax error: Missing closing double quote
>
> as the list production terminates at the first newline, and a new one
> doesn't start until after the second line with XXX to satisfy the
> second io_here. It can be argued provisions for trying to find the end
> of an open string should continue after the io_here body, but Item 7
> of 2.3 has the newline terminating it.

So you are arguing that $(...) and `...` behave differently?

The interesting news is that the Schily Bourne Shell and mksh currently
behave similarly to what you would like to see.

Jörg

* Re: heredoc and subshell
From: Oleg Bulatov @ 2016-02-24 20:27 UTC
To: Joerg Schilling, eblake, dash, austin-group-l

24.02.2016, 13:58, "Joerg Schilling" <Joerg.Schilling@fokus.fraunhofer.de>:
> BTW: If I replace $(..) by `..` and feed the code to the original SVr4 Bourne
> Shell, I get the same output as you got from bash. I would guess that the bash
> output you added above is correct.

The behavior of `..` with a heredoc also has interesting corner cases:

$ cat x.sh
cat <<END ; echo `echo a
echo b`
hello
END
$ dash x.sh
echo ba
x.sh: 3: x.sh: hello: not found
x.sh: 4: x.sh: END: not found
$ bash x.sh
hello
a b
$

On the one hand dash reads the heredoc, on the other hand it processes
the closing of the subshell.

--
Oleg

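One way to sidestep the backquote corner case, sketched here as an assumption rather than as anything proposed in the thread, is to finish the backquoted command on a line of its own so that no newline inside `..` shares a line with a `<<` operator. The function name `backtick_first` and the variable `ab` are hypothetical.

```shell
# Run the backquoted commands to completion first; the heredoc line
# below then contains no unfinished command substitution.
backtick_first() {
ab=`echo a; echo b`
cat <<END; echo $ab
hello
END
}

backtick_first
```

The unquoted `$ab` is deliberate: field splitting joins the two output lines with a space, mirroring what the unquoted backquotes in x.sh do under bash.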
* Re: heredoc and subshell
From: Jilles Tjoelker @ 2016-02-23 23:18 UTC
To: Oleg Bulatov
Cc: dash

On Wed, Feb 24, 2016 at 01:07:44AM +0300, Oleg Bulatov wrote:
> While trying to minimize some shell code, I found a non-obvious corner
> case with heredocs and subshells.

> Does POSIX specify how the following code should be parsed? The dash
> output for this code differs from bash and zsh.

> --- code
> prefix() { sed -e "s/^/$1:/"; }
> DASH_CODE() { :; }
>
> prefix A <<XXX && echo "$(prefix B <<XXX
> echo line 1
> XXX
> echo line 2)" && prefix DASH_CODE <<DASH_CODE
> echo line 3
> XXX
> echo line 4)"
> echo line 5
> DASH_CODE

> --- bash 4.3.42 output:
> A:echo line 3
> B:echo line 1
> line 2
> DASH_CODE:echo line 4)"
> DASH_CODE:echo line 5

> --- dash 0.5.8 output:
> A:echo line 1
> B:echo line 2)" && prefix DASH_CODE <<DASH_CODE
> B:echo line 3
> line 4
> line 5

I think POSIX is clear that the bash/zsh behaviour is correct and the
dash behaviour is wrong. In XCU 2.6.3 Command Substitution, it says:

] With the $(command) form, all characters following the open
] parenthesis to the matching closing parenthesis constitute the
] command.

Therefore, the shell should not start reading the here-document
belonging to prefix A <<XXX while it is still inside the command
substitution $(prefix B <<XXX. Instead, the here-document belonging to
prefix B <<XXX should be read. The line ending the command substitution
contains another << redirection; the here-documents are read in order
of the << redirections.

In FreeBSD sh, another ash derivative, I fixed this in FreeBSD SVN
r208655,
https://github.com/freebsd/freebsd/commit/930ce3922652c50fc8b621b14b6238b325d7f16f

Interestingly, mksh parses this in yet another way. In unmodified form,
it fails with here document 'XXX' unclosed. After appending an XXX line,
it outputs:

A:echo line 5
A:DASH_CODE
B:echo line 2)" && prefix DASH_CODE <<DASH_CODE
B:echo line 3
line 4

The here-document containing line 1 seems lost entirely.

The ksh93 93u+ 2012-08-01 from FreeBSD ports segfaults while executing
the script.

In conclusion, it seems unwise to rely on this construct in scripts
that are to be distributed.

--
Jilles Tjoelker

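For scripts that have to run on several shells, one possible way to avoid the construct is sketched below. It assumes that every shell of interest handles a here-document inside $(...) correctly when no other here-document is pending on the same line, which is the case the disagreement above does not touch. The names `unambiguous`, `b_out`, the `C` prefix and the `YYY` delimiter are invented for the example; `prefix` is the function from the original test case.

```shell
prefix() { sed -e "s/^/$1:/"; }

unambiguous() {
# 1. Complete the command substitution, including its own
#    here-document, before any other << operator appears.
b_out=$(prefix B <<XXX
echo line 1
XXX
echo line 2)
# 2. Only then use a line carrying further here-document operators;
#    their bodies follow in the order of the << operators.
prefix A <<XXX && echo "$b_out" && prefix C <<YYY
echo line 3
XXX
echo line 4
YYY
}

unambiguous
```

With this split, no newline inside an open command substitution ever competes with a pending here-document, so the parse no longer depends on which reading of XCU 2.3 a shell implements.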
* Re: heredoc and subshell
From: Thorsten Glaser @ 2016-02-24 8:46 UTC
To: Jilles Tjoelker
Cc: Oleg Bulatov, dash, miros-mksh

(Thanks to izabera for bringing this to my attention!)

On Wed, 24 Feb 2016, Jilles Tjoelker wrote:

> On Wed, Feb 24, 2016 at 01:07:44AM +0300, Oleg Bulatov wrote:

> > --- bash 4.3.42 output:

That’s the output I’d expect, from manually reading this.

> Interestingly, mksh parses this in yet another way. In unmodified form,
> it fails with here document 'XXX' unclosed.

Yeah, I noticed… yay another bug to fix (just had two last night)…

> After appending an XXX line, it outputs:

Ah okay, thanks for this hint, that should make debugging easier.

bye,
//mirabilos
--
Sometimes they [people] care too much: pretty printers [and syntax
highlighting, d.A.] mechanically produce pretty output that accentuates
irrelevant detail in the program, which is as sensible as putting all
the prepositions in English text in bold font.
	-- Rob Pike in "Notes on Programming in C"
