On 10/10/16 22:20, Harald van Dijk wrote:
> On 08/10/16 21:42, Martijn Dekker wrote:
>> On 01-10-16 at 19:17, Denys Vlasenko wrote:
>>> ash-vars/var_unbackslash.tests
>>
>> ITYM ash-vars/var_unbackslash1.tests
>>
>>> echo Forty two:$\
>>> (\
>>> (\
>>> 42\
>>> )\
>>> )
>>> dash says: Syntax error: Missing '))'
>>
>> Yes, but it's not clear to me that it shouldn't.
>>
>> Hmm... maybe this is indeed a bug:
>> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_02_01
>>
>> "A <backslash> that is not quoted shall preserve the literal value of
>> the following character, with the exception of a <newline>. If a
>> <newline> follows the <backslash>, the shell shall interpret this as
>> line continuation. The <backslash> and <newline> shall be removed before
>> splitting the input into tokens. Since the escaped <newline> is removed
>> entirely from the input and is not replaced by any white space, it
>> cannot serve as a token separator."
>>
>> So, unless I'm misreading this, it looks like backslashes need to be
>> parsed before *any* other kind of lexical analysis.
>
> There does appear to be one exception: a comment may end with a
> backslash. This does not cause the next line to be treated as a comment:
> once a # is seen, the remaining characters on the line are not subjected
> to the regular lexical analysis, so the above does not apply.
>
> I would have expected another exception to be in alias expansions that
> end in a backslash. Shells are not entirely in agreement there, but most
> appear to treat this the regular way, meaning
>
> dash -c 'alias bs=\\
> bs
> '
>
> prints nothing.
>
> dash has a pgetc_eatbnl function already in parser.c which skips any
> backslash-newline combinations. It's not used everywhere it could be.
> There is also some duplicated backslash-newline handling elsewhere in
> parser.c. Replacing all the calls to pgetc() to call pgetc_eatbnl()
> instead, with the exception of the one that handles comments, and
> removing the duplicated backslash-newline handling, lets this test case
> work, as well as several other similar ones, such as:
>
> : &\
> & :
>
> : \
> <\
> <\
> EO\
> F
> 123
> E\
> OF
>
> A nice benefit is that the removal of the duplicated BSNL handling
> causes a reduction in code size.
>
> There are probably a few corner cases I'm not handling correctly in this
> patch, though. Feedback welcome.

With more extensive testing, the only issue I've seen is what Jilles
Tjoelker had already mentioned, namely that backslash-newline should be
preserved inside single-quoted strings, and also that it should be
preserved inside heredocs where any part of the delimiter is quoted:

cat <<\EOF
\
EOF

dash's parsing treats this mostly the same as a single-quoted string,
and the same extra check handles both cases.

Here's an updated patch. Hoping this looks okay and can be applied.

> Cheers,
> Harald van Dijk
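
For anyone who wants to check a shell against the rules discussed above, here is a minimal sketch. It is not part of the patch and not taken from any test suite; the variable names are made up for illustration. It sticks to the uncontroversial cases (plain words, single quotes, and heredocs with and without a quoted delimiter), so a shell that follows POSIX should pass all four regardless of how it handles the trickier `$\`-newline-`((` case.

```shell
#!/bin/sh
# Hypothetical check script; variable names are illustrative only.

# 1. Unquoted backslash-newline is removed before tokenizing, so the two
#    halves below should form one word rather than two.
unquoted=$(echo joi\
ned)

# 2. Inside single quotes the backslash and the newline are both literal.
single=$(printf '%s' 'a\
b')

# 3. Heredoc with an unquoted delimiter: the body is treated much like a
#    double-quoted string, so backslash-newline is still removed.
heredoc_plain=$(cat <<EOF
one\
two
EOF
)

# 4. Heredoc with any part of the delimiter quoted: the body is literal,
#    so backslash-newline must be preserved.
heredoc_quoted=$(cat <<\EOF
one\
two
EOF
)

printf 'unquoted:       [%s]\n' "$unquoted"
printf 'single:         [%s]\n' "$single"
printf 'heredoc_plain:  [%s]\n' "$heredoc_plain"
printf 'heredoc_quoted: [%s]\n' "$heredoc_quoted"
```

Cases 2 and 4 are the ones that a blanket "eat backslash-newline everywhere" change would break, which is why they are worth keeping next to cases 1 and 3 when testing.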