dash.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Alias/heredoc/cmdsubst regression
@ 2020-01-24 23:48 Martijn Dekker
  2020-01-25  1:13 ` Harald van Dijk
  0 siblings, 1 reply; 7+ messages in thread
From: Martijn Dekker @ 2020-01-24 23:48 UTC (permalink / raw)
  To: DASH shell mailing list

There is a regression involving alias expansion, a here-document, and a 
command substitution. The following worked fine until dash 0.5.8; it 
throws a syntax error as of dash 0.5.9.

alias BEGIN='{' END='}'
BEGIN
cat <<eof
$(echo hi)
eof
END

With some git brute-forcing I determined this was the commit that broke it:

> commit 7c245aa8ed33ba5db30eef9369d67036a05b0371
> Author: Herbert Xu <herbert@gondor.apana.org.au>
> Date:   Tue Oct 28 17:22:16 2014 +0800
> 
>     [PARSER] Simplify EOF/newline handling in list parser
> 
>     This patch simplifies the EOF and new handling in the list parser.
>     In particular, it eliminates a case where we may leave here-documents
>     unfinished upon EOF.
> 
>     It also removes special EOF/newline handling from parsecmd.

Related shells:
- I've confirmed the test case to work fine on:
	FreeBSD sh 9, 10, 11 and 12
	NetBSD sh 8 and 9.0RC1
	busybox ash 1.20 - 1.24
- I've confirmed the test case broken on:
	busyboxy ash 1.25 through current

Since busybox ash tracks dash, it's not surprising it broke there as well.

- M.

-- 
modernish -- harness the shell
https://github.com/modernish/modernish

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Alias/heredoc/cmdsubst regression
  2020-01-24 23:48 Alias/heredoc/cmdsubst regression Martijn Dekker
@ 2020-01-25  1:13 ` Harald van Dijk
  2020-01-25  1:38   ` Harald van Dijk
  0 siblings, 1 reply; 7+ messages in thread
From: Harald van Dijk @ 2020-01-25  1:13 UTC (permalink / raw)
  To: Martijn Dekker, DASH shell mailing list

On 24/01/2020 23:48, Martijn Dekker wrote:
> There is a regression involving alias expansion, a here-document, and a 
> command substitution. The following worked fine until dash 0.5.8; it 
> throws a syntax error as of dash 0.5.9.
> 
> alias BEGIN='{' END='}'
> BEGIN
> cat <<eof
> $(echo hi)
> eof
> END

Nice find.

When the newline after cat <<eof is seen, checkkwd is changed to 
indicate that the shell is in a state where aliases can be expanded. 
Then, parseheredoc() is called, which in turn calls readtoken1() to 
parse the here-document. readtoken() re-sets checkkwd once it is done, 
but readtoken1() does not, so normally this preserves the "can expand 
aliases" state. However, nested command substitutions do reset checkkwd, 
so things break.

Until 0.5.8, parseheredoc() was called first, and only after that did 
checkkwd get changed.

Either parseheredoc() needs to save and restore checkkwd, or the code 
calling parseheredoc() needs to ensure that it sets checkkwd as 
appropriate afterwards.

Cheers,
Harald van Dijk

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Alias/heredoc/cmdsubst regression
  2020-01-25  1:13 ` Harald van Dijk
@ 2020-01-25  1:38   ` Harald van Dijk
  2020-01-25 12:19     ` [PATCH] parseheredoc: fix alias expansion: save and restore checkkwd Martijn Dekker
  2020-01-25 16:26     ` Alias/heredoc/cmdsubst regression Martijn Dekker
  0 siblings, 2 replies; 7+ messages in thread
From: Harald van Dijk @ 2020-01-25  1:38 UTC (permalink / raw)
  To: Martijn Dekker, DASH shell mailing list

On 25/01/2020 01:13, Harald van Dijk wrote:
> On 24/01/2020 23:48, Martijn Dekker wrote:
>> There is a regression involving alias expansion, a here-document, and 
>> a command substitution. The following worked fine until dash 0.5.8; it 
>> throws a syntax error as of dash 0.5.9.
>>
>> alias BEGIN='{' END='}'
>> BEGIN
>> cat <<eof
>> $(echo hi)
>> eof
>> END
> 
> Nice find.
> 
> When the newline after cat <<eof is seen, checkkwd is changed to 
> indicate that the shell is in a state where aliases can be expanded. 
> Then, parseheredoc() is called, which in turn calls readtoken1() to 
> parse the here-document. readtoken() re-sets checkkwd once it is done, 
> but readtoken1() does not, so normally this preserves the "can expand 
> aliases" state. However, nested command substitutions do reset checkkwd, 
> so things break.
> 
> Until 0.5.8, parseheredoc() was called first, and only after that did 
> checkkwd get changed.
> 
> Either parseheredoc() needs to save and restore checkkwd, or the code 
> calling parseheredoc() needs to ensure that it sets checkkwd as 
> appropriate afterwards.

There is another place that parseheredoc() can be called from where 
checkkwd was not being corrected afterwards:

   alias BEGIN='{' END='}'
   : <<EOF &&
   $(echo hi)
   EOF
   BEGIN
   echo ok
   END

This has been failing for longer. This prints "Syntax error: "(" 
unexpected" since at least 0.5.1.

> Cheers,
> Harald van Dijk

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH] parseheredoc: fix alias expansion: save and restore checkkwd
  2020-01-25  1:38   ` Harald van Dijk
@ 2020-01-25 12:19     ` Martijn Dekker
  2020-01-25 16:17       ` Harald van Dijk
  2020-01-25 16:26     ` Alias/heredoc/cmdsubst regression Martijn Dekker
  1 sibling, 1 reply; 7+ messages in thread
From: Martijn Dekker @ 2020-01-25 12:19 UTC (permalink / raw)
  To: Harald van Dijk, DASH shell mailing list

[-- Attachment #1: Type: text/plain, Size: 1677 bytes --]

Op 25-01-20 om 02:38 schreef Harald van Dijk:
> On 25/01/2020 01:13, Harald van Dijk wrote:
>> On 24/01/2020 23:48, Martijn Dekker wrote:
>>> There is a regression involving alias expansion, a here-document, and 
>>> a command substitution. The following worked fine until dash 0.5.8; 
>>> it throws a syntax error as of dash 0.5.9.
>>>
>>> alias BEGIN='{' END='}'
>>> BEGIN
>>> cat <<eof
>>> $(echo hi)
>>> eof
>>> END
>>
>> Nice find.
>>
>> When the newline after cat <<eof is seen, checkkwd is changed to 
>> indicate that the shell is in a state where aliases can be expanded. 
>> Then, parseheredoc() is called, which in turn calls readtoken1() to 
>> parse the here-document. readtoken() re-sets checkkwd once it is done, 
>> but readtoken1() does not, so normally this preserves the "can expand 
>> aliases" state. However, nested command substitutions do reset 
>> checkkwd, so things break.
>>
>> Until 0.5.8, parseheredoc() was called first, and only after that did 
>> checkkwd get changed.
>>
>> Either parseheredoc() needs to save and restore checkkwd, or the code 
>> calling parseheredoc() needs to ensure that it sets checkkwd as 
>> appropriate afterwards.

Saving and restoring checkkwd in parseheredoc() seems the simplest and 
the most future-proof, so here's a patch to do that.

> There is another place that parseheredoc() can be called from where 
> checkkwd was not being corrected afterwards:
> 
>    alias BEGIN='{' END='}'
>    : <<EOF &&
>    $(echo hi)
>    EOF
>    BEGIN
>    echo ok
>    END
> 
> This has been failing for longer.

The patch fixes this as well.

-- 
modernish -- harness the shell
https://github.com/modernish/modernish

[-- Attachment #2: parseheredoc.patch --]
[-- Type: text/plain, Size: 397 bytes --]

diff --git a/src/parser.c b/src/parser.c
index b318b08..8840262 100644
--- a/src/parser.c
+++ b/src/parser.c
@@ -661,6 +661,7 @@ parsefname(void)
 STATIC void
 parseheredoc(void)
 {
+	int savecheckkwd = checkkwd;
 	struct heredoc *here;
 	union node *n;
 
@@ -683,6 +684,8 @@ parseheredoc(void)
 		here->here->nhere.doc = n;
 		here = here->next;
 	}
+
+	checkkwd = savecheckkwd;
 }
 
 STATIC int

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] parseheredoc: fix alias expansion: save and restore checkkwd
  2020-01-25 12:19     ` [PATCH] parseheredoc: fix alias expansion: save and restore checkkwd Martijn Dekker
@ 2020-01-25 16:17       ` Harald van Dijk
  0 siblings, 0 replies; 7+ messages in thread
From: Harald van Dijk @ 2020-01-25 16:17 UTC (permalink / raw)
  To: Martijn Dekker, DASH shell mailing list

On 25/01/2020 12:19, Martijn Dekker wrote:
> Op 25-01-20 om 02:38 schreef Harald van Dijk:
>> On 25/01/2020 01:13, Harald van Dijk wrote:
>>> On 24/01/2020 23:48, Martijn Dekker wrote:
>>>> There is a regression involving alias expansion, a here-document, 
>>>> and a command substitution. The following worked fine until dash 
>>>> 0.5.8; it throws a syntax error as of dash 0.5.9.
>>>>
>>>> alias BEGIN='{' END='}'
>>>> BEGIN
>>>> cat <<eof
>>>> $(echo hi)
>>>> eof
>>>> END
>>>
>>> Nice find.
>>>
>>> When the newline after cat <<eof is seen, checkkwd is changed to 
>>> indicate that the shell is in a state where aliases can be expanded. 
>>> Then, parseheredoc() is called, which in turn calls readtoken1() to 
>>> parse the here-document. readtoken() re-sets checkkwd once it is 
>>> done, but readtoken1() does not, so normally this preserves the "can 
>>> expand aliases" state. However, nested command substitutions do reset 
>>> checkkwd, so things break.
>>>
>>> Until 0.5.8, parseheredoc() was called first, and only after that did 
>>> checkkwd get changed.
>>>
>>> Either parseheredoc() needs to save and restore checkkwd, or the code 
>>> calling parseheredoc() needs to ensure that it sets checkkwd as 
>>> appropriate afterwards.
> 
> Saving and restoring checkkwd in parseheredoc() seems the simplest and 
> the most future-proof, so here's a patch to do that.

I have now gone over the places where parseheredoc() is called.

The calls in list() are followed by a return statement. It does not 
matter what value checkkwd has afterwards, as no new command will be 
parsed, or at least not before checkkwd gets reset anyway.

The call in readtoken() under parsebackq only happens when a word 
containing a command substitution is being parsed. This can never be a 
valid alias name, so here too it does not matter what value checkkwd has 
afterwards. If checkkwd should be restored to make the code easier to 
understand, it should be restored to the value it had before the list() 
call, not the value it had before parseheredoc(), so this is something 
that cannot be done from within parseheredoc().

The only place where checkkwd being clobbered matters is at the start of 
readtoken(), under the "eat newlines" comment. This is very close to the 
place where CHKALIAS gets checked and it is possible to instead change 
the check to respect the already-saved value of checkkwd:

--- a/src/parser.c
+++ b/src/parser.c
@@ -734,7 +734,7 @@ top:
                 }
         }

-       if (checkkwd & CHKALIAS) {
+       if ((kwd | checkkwd) & CHKALIAS) {
                 struct alias *ap;
                 if ((ap = lookupalias(wordtext, 1)) != NULL) {
                         if (*ap->val) {

The value of checkkwd may have been clobbered by xxreadtoken() as well, 
not just by parseheredoc(), so this approach would handle that too. 
However, if a token is a newline, a valid keyword, or a valid alias 
name, then checkkwd is guaranteed not to get clobbered by xxreadtoken(), 
so it should make no difference in practice which earlier value of 
checkkwd gets used, and there should be no observable difference between 
your patch and mine.

Cheers,
Harald van Dijk

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Alias/heredoc/cmdsubst regression
  2020-01-25  1:38   ` Harald van Dijk
  2020-01-25 12:19     ` [PATCH] parseheredoc: fix alias expansion: save and restore checkkwd Martijn Dekker
@ 2020-01-25 16:26     ` Martijn Dekker
  2020-01-25 16:33       ` Harald van Dijk
  1 sibling, 1 reply; 7+ messages in thread
From: Martijn Dekker @ 2020-01-25 16:26 UTC (permalink / raw)
  To: Harald van Dijk, DASH shell mailing list

Op 25-01-20 om 02:38 schreef Harald van Dijk:
[...]
>> Nice find.

Thanks, but I can't take credit for being particularly astute here. The 
modernish loop construct (including the 'find' utility being fully 
integrated into the shell as 'LOOP find <var> [ in <path> ... ] [ 
<find-expression> ]; DO ... DONE' -- check it out!) is based on a 
triplet of aliases that define a block using { ... }, so it was only a 
matter of time before this bug simply bit me.

Thanks for the analysis. It made it trivial to patch. I hope Herbert agrees.

> There is another place that parseheredoc() can be called from where 
> checkkwd was not being corrected afterwards:
> 
>    alias BEGIN='{' END='}'
>    : <<EOF &&
>    $(echo hi)
>    EOF
>    BEGIN
>    echo ok
>    END
> 
> This has been failing for longer. This prints "Syntax error: "(" 
> unexpected" since at least 0.5.1.

Interesting. For me, this one prints (on dash 0.5.7 until current):
testalias2.sh: 5: testalias2.sh: {: not found
ok
testalias2.sh: 7: testalias2.sh: Syntax error: "}" unexpected

But anyway, thank you for this second test case. Time to add a bug 
detection ID (BUG_ALIASHDOC) and couple of regression tests with these 
two test cases to modernish -- and document that here-docs containing 
command substitutions that are lexically within modernish loops won't be 
portable for another five years or so. :/

It's unfortunate dash still doesn't have a regression test suite 
(Herbert never responded to my question, just over a year ago, if he 
would consider a patch to add one), but modernish regress-tests all the 
shells it will run on, so 'dash bin/modernish --test' can fulfil that 
function.

- M.

-- 
modernish -- harness the shell
https://github.com/modernish/modernish

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Alias/heredoc/cmdsubst regression
  2020-01-25 16:26     ` Alias/heredoc/cmdsubst regression Martijn Dekker
@ 2020-01-25 16:33       ` Harald van Dijk
  0 siblings, 0 replies; 7+ messages in thread
From: Harald van Dijk @ 2020-01-25 16:33 UTC (permalink / raw)
  To: Martijn Dekker, DASH shell mailing list

On 25/01/2020 16:26, Martijn Dekker wrote:
> Op 25-01-20 om 02:38 schreef Harald van Dijk:
>>    alias BEGIN='{' END='}'
>>    : <<EOF &&
>>    $(echo hi)
>>    EOF
>>    BEGIN
>>    echo ok
>>    END
>>
>> This has been failing for longer. This prints "Syntax error: "(" 
>> unexpected" since at least 0.5.1.
> 
> Interesting. For me, this one prints (on dash 0.5.7 until current):
> testalias2.sh: 5: testalias2.sh: {: not found
> ok
> testalias2.sh: 7: testalias2.sh: Syntax error: "}" unexpected

Oops! That is what it prints for me too and dash has done so since 0.5.1 
(except for the line numbers in the error messages). I got the "Syntax 
error: "(" unexpected" on another test, was typing up an e-mail, noticed 
my test was overly complicated, simplified it, verified that it still 
failed since 0.5.1, but forgot to update the error message in my mail. 
Sorry for the confusion.

Cheers,
Harald van Dijk

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-01-25 16:34 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-24 23:48 Alias/heredoc/cmdsubst regression Martijn Dekker
2020-01-25  1:13 ` Harald van Dijk
2020-01-25  1:38   ` Harald van Dijk
2020-01-25 12:19     ` [PATCH] parseheredoc: fix alias expansion: save and restore checkkwd Martijn Dekker
2020-01-25 16:17       ` Harald van Dijk
2020-01-25 16:26     ` Alias/heredoc/cmdsubst regression Martijn Dekker
2020-01-25 16:33       ` Harald van Dijk

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).