* dash tested against ash testsuite: 17 failures @ 2016-10-01 17:17 Denys Vlasenko 2016-10-08 19:42 ` Martijn Dekker 0 siblings, 1 reply; 6+ messages in thread From: Denys Vlasenko @ 2016-10-01 17:17 UTC (permalink / raw) To: Herbert Xu, busybox, Ron Yorston, dash [-- Attachment #1: Type: text/plain, Size: 2341 bytes --] ash-signals/savetrap.tests While trying to port some fixes from dash, one of them broke some signal handling testcases. I guess some of things we test for are not tested by dash developers. I ran entire ash testsuite against dash, deleted failing things which dash does not support (some bashisms), and false positives caused by differences in error messages. What's left is: ash-glob/glob2.tests: Evidently, dash supports \f -> ^L escape. This test uses \f as invalid backslash escape, hence differences. ash-misc/echo_write_error.tests EPIPE errors in echo are not reported ash-misc/func2.tests $((i++)) not supported ash-misc/local1.tests Doesn't unset as described: local a # the above line unsets $a echo "A2:'$a'" ash-misc/shift1.tests "shift N" fails if fewer than N argv[i] exists (likely not a bug, but bash does it differently) ash-redir/redir.tests echo errors due to closed stdout are not reported ash-redir/redir3.tests "echo foo >&9" correctly says "9: Bad file descriptor" and exitcode is 2 (in bash, it is 1). ash-redir/redir7.tests ash-redir/redir8.tests uni\x81code filename is not found by uni?code glob pattern. ash-signals/reap1.tests Builtins never wait for children. This loop will not ever stop: sleep 1 & PID=$! while kill -0 $PID >/dev/null 2>&1; do true done ash-signals/savetrap.tests `trap` does not work as expected ash-signals/sigint1.tests trap "exit 0" SIGINT - does not like name "SIGINT". "INT" works. ash-signals/signal4.tests trap "echo Trapped" BADNAME TERM - abort seeing BADNAME. bash complains, but sets the trap for the second signal. ash-signals/signal8.tests "kill %1" in non-interactive shell does not find previously backgrounded task. ash-vars/var-utf8-length.tests ${#VAR} counts unicode chars in bash ash-vars/var_unbackslash.tests b=-$a-\t-\\-\"-\`-\--\z-\*-\?- b="-$a-\t-\\-\"-\`-\--\z-\*-\?-" b='-$a-\t-\\-\"-\`-\--\z-\*-\?-' b1=$b "b1=$b" You can imagine... not everything went well here... ash-vars/var_unbackslash.tests echo Forty two:$\ (\ (\ 42\ )\ ) dash says: Syntax error: Missing '))' The tarball is attached. Unpack, add an ash -> /path/to/your/dash symlink, then run ./run-all [-- Attachment #2: ash_test.tar.gz --] [-- Type: application/x-gzip, Size: 13851 bytes --] ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: dash tested against ash testsuite: 17 failures 2016-10-01 17:17 dash tested against ash testsuite: 17 failures Denys Vlasenko @ 2016-10-08 19:42 ` Martijn Dekker 2016-10-10 20:20 ` Harald van Dijk 2016-10-10 21:51 ` Jilles Tjoelker 0 siblings, 2 replies; 6+ messages in thread From: Martijn Dekker @ 2016-10-08 19:42 UTC (permalink / raw) To: dash Op 01-10-16 om 19:17 schreef Denys Vlasenko: > ash-glob/glob2.tests: > Evidently, dash supports \f -> ^L escape. > This test uses \f as invalid backslash escape, > hence differences. The test uses the "echo" builtin, which is very very unportable and explicitly not standardised by POSIX for this case. A test failure based on a difference in how "echo" interprets backslash escapes is not really relevant. Use printf instead. http://pubs.opengroup.org/onlinepubs/9699919799/utilities/echo.html#tag_20_37_16 "It is not possible to use echo portably across all POSIX systems unless both -n (as the first argument) and escape sequences are omitted." > ash-misc/echo_write_error.tests > EPIPE errors in echo are not reported They're reported as an I/O error in echo. While that is not as specific, it's still accurate. I don't really see a problem. > ash-misc/func2.tests > $((i++)) not supported This is not required by POSIX. Use $((i+=1)) instead. http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_04 "The sizeof() operator and the prefix and postfix "++" and "--" operators are not required." > ash-misc/local1.tests > Doesn't unset as described: > local a > # the above line unsets $a > echo "A2:'$a'" From the dash man page: "When a variable is made local, it inherits the initial value and exported and readonly flags from the variable with the same name in the surrounding scope, if there is one. Otherwise, the variable is initially unset." Looks to me like dash behaves as advertised. Local variables aren't standardised so implementations are free to do as they please. (This is an interesting difference though, one I wasn't aware of until now.) > ash-misc/shift1.tests > "shift N" fails if fewer than N argv[i] exists > (likely not a bug, but bash does it differently) POSIX explicitly allows either behaviour: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_26_14 "If the n operand is invalid or is greater than "$#", this may be considered a syntax error and a non-interactive shell may exit; if the shell does not exit in this case, a non-zero exit status shall be returned. Otherwise, zero shall be returned." Cross-platform scripts must check that the operand is less than or equal than $# before invoking 'shift'. > ash-redir/redir.tests > echo errors due to closed stdout are not reported Again, they are in fact reported but as an I/O error. My response would be the same as for ash-misc/echo_write_error.tests above. > ash-redir/redir3.tests > "echo foo >&9" correctly says "9: Bad file descriptor" > and exitcode is 2 (in bash, it is 1). Not a bug, IMO. POSIX simply says: "A failure to open or create a file shall cause a redirection to fail" but does not specify with what exit status it should fail. Any non-zero exit status < 128 should be fine. > ash-redir/redir7.tests > ash-redir/redir8.tests > uni\x81code filename is not found by uni?code glob pattern. dash does not support unicode; AFAIK, this is a design choice (I'm not a developer). > ash-signals/reap1.tests > Builtins never wait for children. This loop will not ever stop: > sleep 1 & > PID=$! > while kill -0 $PID >/dev/null 2>&1; do > true > done Interesting bug. The behaviour seems to depend on the presence of a 'while' loop. Executed manually without a loop, it behaves as expected: $ sleep 10 & PID=$! $ kill -0 $PID $ kill -0 $PID [...etc...] $ kill -0 $PID [1] + Done sleep 10 $ kill -0 $PID dash: 11: kill: No such process Executed with a loop it's infinite even in an interactive shell, but only the first time around: $ sleep 10 & PID=$! $ while kill -0 $PID; do :; done ^C [1] + Done sleep 10 $ while kill -0 $PID; do :; done dash: 15: kill: No such process Strange behaviour and certainly looks like a rather obscure bug. > ash-signals/savetrap.tests > `trap` does not work as expected Again, the SIG prefix is not required to be supported in POSIX (see above). Unfortunately, according to POSIX, the feature that a subshell containing only a single "trap" command with no operands inherits the parent shell's traps so that they can be stored like save_traps=$(trap), is optional! Hence this can't be considered a bug in dash, although I'd certainly like to submit it as a feature request. POSIX: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_28 "When a subshell is entered, traps that are not being ignored shall be set to the default actions, except in the case of a command substitution containing only a single trap command, when the traps need not be altered. Implementations may check for this case using only lexical analysis [...]." Only bash, yash, AT&T ksh93 and busybox ash currently support this optional feature. This makes it extremely inconvenient to store the output of 'trap' in a variable in a cross-platform manner. (In my cross-platform shell library, modernish, which offers stack-based traps, I've had to implement a workaround involving the atomic, parallel-proof creation of a temporary file. Since 'mktemp' isn't cross-platform either, that has been an interesting exercise involving output redirection under 'set -C' in a subshell.) > ash-signals/sigint1.tests > trap "exit 0" SIGINT - does not like name "SIGINT". "INT" works. Not a bug. The SIG prefix is an optional extension. Only "INT" without the prefix is mandated by POSIX. http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_28 "The condition can be EXIT, 0 (equivalent to EXIT), or a signal specified using a symbolic name, without the SIG prefix, as listed in the tables of signal names in the <signal.h> header defined in XBD Headers; for example, HUP, INT, QUIT, TERM. Implementations may permit names with the SIG prefix or ignore case in signal names as an extension." > ash-signals/signal4.tests > trap "echo Trapped" BADNAME TERM - abort seeing BADNAME. > bash complains, but sets the trap for the second signal. I don't see how this is a bug. > ash-signals/signal8.tests > "kill %1" in non-interactive shell does not find > previously backgrounded task. Not a bug. '%1' is a job control job ID, but in a non-interactive shell there's no job control. Either 'set -m' to turn on job control first (and deal with extra terminal clutter), or just use "$!" to kill by PID. > ash-vars/var-utf8-length.tests > ${#VAR} counts unicode chars in bash dash does not support unicode; AFAIK, this is a design choice (I'm not a developer). > ash-vars/var_unbackslash.tests > b=-$a-\t-\\-\"-\`-\--\z-\*-\?- > b="-$a-\t-\\-\"-\`-\--\z-\*-\?-" > b='-$a-\t-\\-\"-\`-\--\z-\*-\?-' > b1=$b > "b1=$b" > You can imagine... not everything went well here... Again, this is probably down to legitimate differences in the notoriously unportable "echo" builtin. Test this using printf instead. > ash-vars/var_unbackslash.tests ITYM ash-vars/var_unbackslash1.tests > echo Forty two:$\ > (\ > (\ > 42\ > )\ > ) > dash says: Syntax error: Missing '))' Yes, but it's not clear to me that it shouldn't. Hmm... maybe this is indeed a bug: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_02_01 "A <backslash> that is not quoted shall preserve the literal value of the following character, with the exception of a <newline>. If a <newline> follows the <backslash>, the shell shall interpret this as line continuation. The <backslash> and <newline> shall be removed before splitting the input into tokens. Since the escaped <newline> is removed entirely from the input and is not replaced by any white space, it cannot serve as a token separator." So, unless I'm misreading this, it looks like backslashes need to be parsed before *any* other kind of lexical analysis. I hope some of the above was useful. - M. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: dash tested against ash testsuite: 17 failures 2016-10-08 19:42 ` Martijn Dekker @ 2016-10-10 20:20 ` Harald van Dijk 2016-10-12 17:24 ` Harald van Dijk 2016-10-10 21:51 ` Jilles Tjoelker 1 sibling, 1 reply; 6+ messages in thread From: Harald van Dijk @ 2016-10-10 20:20 UTC (permalink / raw) To: Martijn Dekker, dash [-- Attachment #1: Type: text/plain, Size: 2358 bytes --] On 08/10/16 21:42, Martijn Dekker wrote: > Op 01-10-16 om 19:17 schreef Denys Vlasenko: >> ash-vars/var_unbackslash.tests > > ITYM ash-vars/var_unbackslash1.tests > >> echo Forty two:$\ >> (\ >> (\ >> 42\ >> )\ >> ) >> dash says: Syntax error: Missing '))' > > Yes, but it's not clear to me that it shouldn't. > > Hmm... maybe this is indeed a bug: > http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_02_01 > "A <backslash> that is not quoted shall preserve the literal value of > the following character, with the exception of a <newline>. If a > <newline> follows the <backslash>, the shell shall interpret this as > line continuation. The <backslash> and <newline> shall be removed before > splitting the input into tokens. Since the escaped <newline> is removed > entirely from the input and is not replaced by any white space, it > cannot serve as a token separator." > > So, unless I'm misreading this, it looks like backslashes need to be > parsed before *any* other kind of lexical analysis. There does appear to be one exception: a comment may end with a backslash. This does not cause the next line to be treated as a comment: once a # is seen, the remaining characters on the line are not subjected to the regular lexical analysis, so the above does not apply. I would have expected another exception to be in alias expansions that end in a backslash. Shells are not entirely in agreement there, but most appear to treat this the regular way, meaning dash -c 'alias bs=\\ bs ' prints nothing. dash has a pgetc_eatbnl function already in parser.c which skips any backslash-newline combinations. It's not used everywhere it could be. There is also some duplicated backslash-newline handling elsewhere in parser.c. Replacing all the calls to pgetc() to call pgetc_eatbnl() instead, with the exception of the one that handles comments, and removing the duplicated backslash-newline handling, lets this test case work, as well as several other similar ones, such as: : &\ & : : \ <\ <\ EO\ F 123 E\ OF A nice benefit is that the removal of the duplicated BSNL handling causes a reduction in code size. There are probably a few corner cases I'm not handling correctly in this patch, though. Feedback welcome. Cheers, Harald van Dijk [-- Attachment #2: parser-bnl.patch --] [-- Type: text/x-patch, Size: 4477 bytes --] --- a/src/parser.c +++ b/src/parser.c @@ -106,6 +106,7 @@ STATIC void parseheredoc(void); STATIC int peektoken(void); STATIC int readtoken(void); STATIC int xxreadtoken(void); +STATIC int pgetc_eatbnl(); STATIC int readtoken1(int, char const *, char *, int); STATIC void synexpect(int) __attribute__((__noreturn__)); STATIC void synerror(const char *) __attribute__((__noreturn__)); @@ -656,7 +657,7 @@ parseheredoc(void) if (needprompt) { setprompt(2); } - readtoken1(pgetc(), here->here->type == NHERE? SQSYNTAX : DQSYNTAX, + readtoken1(pgetc_eatbnl(), here->here->type == NHERE? SQSYNTAX : DQSYNTAX, here->eofmark, here->striptabs); n = (union node *)stalloc(sizeof (struct narg)); n->narg.type = NARG; @@ -782,7 +783,7 @@ xxreadtoken(void) setprompt(2); } for (;;) { /* until token or start of word found */ - c = pgetc(); + c = pgetc_eatbnl(); switch (c) { case ' ': case '\t': case PEOA: @@ -791,30 +792,23 @@ xxreadtoken(void) while ((c = pgetc()) != '\n' && c != PEOF); pungetc(); continue; - case '\\': - if (pgetc() == '\n') { - nlprompt(); - continue; - } - pungetc(); - goto breakloop; case '\n': nlnoprompt(); RETURN(TNL); case PEOF: RETURN(TEOF); case '&': - if (pgetc() == '&') + if (pgetc_eatbnl() == '&') RETURN(TAND); pungetc(); RETURN(TBACKGND); case '|': - if (pgetc() == '|') + if (pgetc_eatbnl() == '|') RETURN(TOR); pungetc(); RETURN(TPIPE); case ';': - if (pgetc() == ';') + if (pgetc_eatbnl() == ';') RETURN(TENDCASE); pungetc(); RETURN(TSEMI); @@ -822,11 +816,9 @@ xxreadtoken(void) RETURN(TLP); case ')': RETURN(TRP); - default: - goto breakloop; } + break; } -breakloop: return readtoken1(c, BASESYNTAX, (char *)NULL, 0); #undef RETURN } @@ -903,7 +895,7 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs) attyline(); if (syntax == BASESYNTAX) return readtoken(); - c = pgetc(); + c = pgetc_eatbnl(); goto loop; } #endif @@ -916,7 +908,7 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs) goto endword; /* exit outer loop */ USTPUTC(c, out); nlprompt(); - c = pgetc(); + c = pgetc_eatbnl(); goto loop; /* continue outer loop */ case CWORD: USTPUTC(c, out); @@ -997,7 +989,7 @@ quotemark: USTPUTC(c, out); --parenlevel; } else { - if (pgetc() == ')') { + if (pgetc_eatbnl() == ')') { USTPUTC(CTLENDARI, out); if (!--arinest) syntax = prevsyntax; @@ -1025,7 +1017,7 @@ quotemark: USTPUTC(c, out); } } - c = pgetc(); + c = pgetc_eatbnl(); } } endword: @@ -1132,7 +1124,7 @@ parseredir: { np = (union node *)stalloc(sizeof (struct nfile)); if (c == '>') { np->nfile.fd = 1; - c = pgetc(); + c = pgetc_eatbnl(); if (c == '>') np->type = NAPPEND; else if (c == '|') @@ -1145,7 +1137,7 @@ parseredir: { } } else { /* c == '<' */ np->nfile.fd = 0; - switch (c = pgetc()) { + switch (c = pgetc_eatbnl()) { case '<': if (sizeof (struct nfile) != sizeof (struct nhere)) { np = (union node *)stalloc(sizeof (struct nhere)); @@ -1154,7 +1146,7 @@ parseredir: { np->type = NHERE; heredoc = (struct heredoc *)stalloc(sizeof (struct heredoc)); heredoc->here = np; - if ((c = pgetc()) == '-') { + if ((c = pgetc_eatbnl()) == '-') { heredoc->striptabs = 1; } else { heredoc->striptabs = 0; @@ -1336,21 +1328,12 @@ parsebackq: { if (needprompt) { setprompt(2); } - switch (pc = pgetc()) { + switch (pc = pgetc_eatbnl()) { case '`': goto done; case '\\': - if ((pc = pgetc()) == '\n') { - nlprompt(); - /* - * If eating a newline, avoid putting - * the newline into the new character - * stream (via the STPUTC after the - * switch). - */ - continue; - } + pc = pgetc_eatbnl(); if (pc != '\\' && pc != '`' && pc != '$' && (!dblquote || pc != '"')) STPUTC('\\', pout); @@ -1529,7 +1512,7 @@ expandstr(const char *ps) saveprompt = doprompt; doprompt = 0; - readtoken1(pgetc(), DQSYNTAX, FAKEEOFMARK, 0); + readtoken1(pgetc_eatbnl(), DQSYNTAX, FAKEEOFMARK, 0); doprompt = saveprompt; ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: dash tested against ash testsuite: 17 failures 2016-10-10 20:20 ` Harald van Dijk @ 2016-10-12 17:24 ` Harald van Dijk 0 siblings, 0 replies; 6+ messages in thread From: Harald van Dijk @ 2016-10-12 17:24 UTC (permalink / raw) To: Martijn Dekker, dash [-- Attachment #1: Type: text/plain, Size: 2977 bytes --] On 10/10/16 22:20, Harald van Dijk wrote: > On 08/10/16 21:42, Martijn Dekker wrote: >> Op 01-10-16 om 19:17 schreef Denys Vlasenko: >>> ash-vars/var_unbackslash.tests >> >> ITYM ash-vars/var_unbackslash1.tests >> >>> echo Forty two:$\ >>> (\ >>> (\ >>> 42\ >>> )\ >>> ) >>> dash says: Syntax error: Missing '))' >> >> Yes, but it's not clear to me that it shouldn't. >> >> Hmm... maybe this is indeed a bug: >> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_02_01 >> >> "A <backslash> that is not quoted shall preserve the literal value of >> the following character, with the exception of a <newline>. If a >> <newline> follows the <backslash>, the shell shall interpret this as >> line continuation. The <backslash> and <newline> shall be removed before >> splitting the input into tokens. Since the escaped <newline> is removed >> entirely from the input and is not replaced by any white space, it >> cannot serve as a token separator." >> >> So, unless I'm misreading this, it looks like backslashes need to be >> parsed before *any* other kind of lexical analysis. > > There does appear to be one exception: a comment may end with a > backslash. This does not cause the next line to be treated as a comment: > once a # is seen, the remaining characters on the line are not subjected > to the regular lexical analysis, so the above does not apply. > > I would have expected another exception to be in alias expansions that > end in a backslash. Shells are not entirely in agreement there, but most > appear to treat this the regular way, meaning > > dash -c 'alias bs=\\ > bs > ' > > prints nothing. > > dash has a pgetc_eatbnl function already in parser.c which skips any > backslash-newline combinations. It's not used everywhere it could be. > There is also some duplicated backslash-newline handling elsewhere in > parser.c. Replacing all the calls to pgetc() to call pgetc_eatbnl() > instead, with the exception of the one that handles comments, and > removing the duplicated backslash-newline handling, lets this test case > work, as well as several other similar ones, such as: > > : &\ > & : > > : \ > <\ > <\ > EO\ > F > 123 > E\ > OF > > A nice benefit is that the removal of the duplicated BSNL handling > causes a reduction in code size. > > There are probably a few corner cases I'm not handling correctly in this > patch, though. Feedback welcome. With more extensive testing, the only issue I've seen is what Jilles Tjoelker had already mentioned, namely that backslash-newline should be preserved inside single-quoted strings, and also that it should be preserved inside heredocs where any part of the delimiter is quoted: cat <<\EOF \ EOF dash's parsing treats this mostly the same as a single-quoted string, and the same extra check handles both cases. Here's an updated patch. Hoping this looks okay and can be applied. > Cheers, > Harald van Dijk [-- Attachment #2: parser-bnl.patch --] [-- Type: text/x-patch, Size: 4955 bytes --] --- a/src/parser.c +++ b/src/parser.c @@ -106,6 +106,7 @@ STATIC void parseheredoc(void); STATIC int peektoken(void); STATIC int readtoken(void); STATIC int xxreadtoken(void); +STATIC int pgetc_eatbnl(); STATIC int readtoken1(int, char const *, char *, int); STATIC void synexpect(int) __attribute__((__noreturn__)); STATIC void synerror(const char *) __attribute__((__noreturn__)); @@ -656,8 +657,10 @@ parseheredoc(void) if (needprompt) { setprompt(2); } - readtoken1(pgetc(), here->here->type == NHERE? SQSYNTAX : DQSYNTAX, - here->eofmark, here->striptabs); + if (here->here->type == NHERE) + readtoken1(pgetc(), SQSYNTAX, here->eofmark, here->striptabs); + else + readtoken1(pgetc_eatbnl(), DQSYNTAX, here->eofmark, here->striptabs); n = (union node *)stalloc(sizeof (struct narg)); n->narg.type = NARG; n->narg.next = NULL; @@ -782,7 +785,7 @@ xxreadtoken(void) setprompt(2); } for (;;) { /* until token or start of word found */ - c = pgetc(); + c = pgetc_eatbnl(); switch (c) { case ' ': case '\t': case PEOA: @@ -791,30 +794,23 @@ xxreadtoken(void) while ((c = pgetc()) != '\n' && c != PEOF); pungetc(); continue; - case '\\': - if (pgetc() == '\n') { - nlprompt(); - continue; - } - pungetc(); - goto breakloop; case '\n': nlnoprompt(); RETURN(TNL); case PEOF: RETURN(TEOF); case '&': - if (pgetc() == '&') + if (pgetc_eatbnl() == '&') RETURN(TAND); pungetc(); RETURN(TBACKGND); case '|': - if (pgetc() == '|') + if (pgetc_eatbnl() == '|') RETURN(TOR); pungetc(); RETURN(TPIPE); case ';': - if (pgetc() == ';') + if (pgetc_eatbnl() == ';') RETURN(TENDCASE); pungetc(); RETURN(TSEMI); @@ -822,11 +818,9 @@ xxreadtoken(void) RETURN(TLP); case ')': RETURN(TRP); - default: - goto breakloop; } + break; } -breakloop: return readtoken1(c, BASESYNTAX, (char *)NULL, 0); #undef RETURN } @@ -903,7 +897,7 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs) attyline(); if (syntax == BASESYNTAX) return readtoken(); - c = pgetc(); + c = syntax == SQSYNTAX ? pgetc() : pgetc_eatbnl(); goto loop; } #endif @@ -916,7 +910,7 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs) goto endword; /* exit outer loop */ USTPUTC(c, out); nlprompt(); - c = pgetc(); + c = syntax == SQSYNTAX ? pgetc() : pgetc_eatbnl(); goto loop; /* continue outer loop */ case CWORD: USTPUTC(c, out); @@ -933,8 +927,6 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs) USTPUTC(CTLESC, out); USTPUTC('\\', out); pungetc(); - } else if (c == '\n') { - nlprompt(); } else { if ( dblquote && @@ -997,7 +989,7 @@ quotemark: USTPUTC(c, out); --parenlevel; } else { - if (pgetc() == ')') { + if (pgetc_eatbnl() == ')') { USTPUTC(CTLENDARI, out); if (!--arinest) syntax = prevsyntax; @@ -1025,7 +1017,7 @@ quotemark: USTPUTC(c, out); } } - c = pgetc(); + c = syntax == SQSYNTAX ? pgetc() : pgetc_eatbnl(); } } endword: @@ -1132,7 +1124,7 @@ parseredir: { np = (union node *)stalloc(sizeof (struct nfile)); if (c == '>') { np->nfile.fd = 1; - c = pgetc(); + c = pgetc_eatbnl(); if (c == '>') np->type = NAPPEND; else if (c == '|') @@ -1145,7 +1137,7 @@ parseredir: { } } else { /* c == '<' */ np->nfile.fd = 0; - switch (c = pgetc()) { + switch (c = pgetc_eatbnl()) { case '<': if (sizeof (struct nfile) != sizeof (struct nhere)) { np = (union node *)stalloc(sizeof (struct nhere)); @@ -1154,7 +1146,7 @@ parseredir: { np->type = NHERE; heredoc = (struct heredoc *)stalloc(sizeof (struct heredoc)); heredoc->here = np; - if ((c = pgetc()) == '-') { + if ((c = pgetc_eatbnl()) == '-') { heredoc->striptabs = 1; } else { heredoc->striptabs = 0; @@ -1336,21 +1328,12 @@ parsebackq: { if (needprompt) { setprompt(2); } - switch (pc = pgetc()) { + switch (pc = pgetc_eatbnl()) { case '`': goto done; case '\\': - if ((pc = pgetc()) == '\n') { - nlprompt(); - /* - * If eating a newline, avoid putting - * the newline into the new character - * stream (via the STPUTC after the - * switch). - */ - continue; - } + pc = pgetc_eatbnl(); if (pc != '\\' && pc != '`' && pc != '$' && (!dblquote || pc != '"')) STPUTC('\\', pout); @@ -1529,7 +1512,7 @@ expandstr(const char *ps) saveprompt = doprompt; doprompt = 0; - readtoken1(pgetc(), DQSYNTAX, FAKEEOFMARK, 0); + readtoken1(pgetc_eatbnl(), DQSYNTAX, FAKEEOFMARK, 0); doprompt = saveprompt; ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: dash tested against ash testsuite: 17 failures 2016-10-08 19:42 ` Martijn Dekker 2016-10-10 20:20 ` Harald van Dijk @ 2016-10-10 21:51 ` Jilles Tjoelker 2016-10-10 22:30 ` Harald van Dijk 1 sibling, 1 reply; 6+ messages in thread From: Jilles Tjoelker @ 2016-10-10 21:51 UTC (permalink / raw) To: Martijn Dekker; +Cc: dash On Sat, Oct 08, 2016 at 09:42:12PM +0200, Martijn Dekker wrote: > Op 01-10-16 om 19:17 schreef Denys Vlasenko: > > ash-glob/glob2.tests: > > Evidently, dash supports \f -> ^L escape. > > This test uses \f as invalid backslash escape, > > hence differences. > The test uses the "echo" builtin, which is very very unportable and > explicitly not standardised by POSIX for this case. A test failure based > on a difference in how "echo" interprets backslash escapes is not really > relevant. Use printf instead. > http://pubs.opengroup.org/onlinepubs/9699919799/utilities/echo.html#tag_20_37_16 > "It is not possible to use echo portably across all POSIX systems unless > both -n (as the first argument) and escape sequences are omitted." True, but dash seems to be the only common one in the Linux/BSD world to recognize so many backslash escape sequences in echo (without the thoroughly non-standard -e option or special configuration). FreeBSD's /bin/echo knows \c if it is final but shell builtins are more commonly used than /bin/echo. In some sense, it seems like this behaviour is there just to force applications to be POSIX-compliant. I think it may be causing more trouble than it is worth. > > ash-misc/echo_write_error.tests > > EPIPE errors in echo are not reported > They're reported as an I/O error in echo. While that is not as specific, > it's still accurate. I don't really see a problem. The dash code I have, git revision afe0e0152e4dc12d84be3c02d6d62b0456d68580, returns exit status 1 but does not write an error message to standard error. This is definitely a bug. The error message's wording is unspecified, though. > > ash-misc/func2.tests > > $((i++)) not supported > This is not required by POSIX. Use $((i+=1)) instead. > http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_04 > "The sizeof() operator and the prefix and postfix "++" and "--" > operators are not required." > > ash-misc/local1.tests > > Doesn't unset as described: > > local a > > # the above line unsets $a > > echo "A2:'$a'" > >From the dash man page: > "When a variable is made local, it inherits the initial value and > exported and readonly flags from the variable with the same name in the > surrounding scope, if there is one. Otherwise, the variable is > initially unset." > Looks to me like dash behaves as advertised. Local variables aren't > standardised so implementations are free to do as they please. > (This is an interesting difference though, one I wasn't aware of until now.) This is not the first time I heard about it but I am not convinced it needs changing. > > ash-misc/shift1.tests > > "shift N" fails if fewer than N argv[i] exists > > (likely not a bug, but bash does it differently) > POSIX explicitly allows either behaviour: > http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_26_14 > "If the n operand is invalid or is greater than "$#", this may be > considered a syntax error and a non-interactive shell may exit; if the > shell does not exit in this case, a non-zero exit status shall be > returned. Otherwise, zero shall be returned." > Cross-platform scripts must check that the operand is less than or equal > than $# before invoking 'shift'. Yes. > > ash-redir/redir.tests > > echo errors due to closed stdout are not reported > Again, they are in fact reported but as an I/O error. My response would > be the same as for ash-misc/echo_write_error.tests above. Same bug as above: no diagnostic is written. > > ash-redir/redir3.tests > > "echo foo >&9" correctly says "9: Bad file descriptor" > > and exitcode is 2 (in bash, it is 1). > Not a bug, IMO. POSIX simply says: "A failure to open or create a file > shall cause a redirection to fail" but does not specify with what exit > status it should fail. Any non-zero exit status < 128 should be fine. Yes, and I think an exit status of 2 is slightly more useful if the utility is one that uses status 1 for a "false" result and statuses greater than 1 for real errors, while of the same usefulness if the utility is one that uses statuses greater than 0 for errors. > > ash-redir/redir7.tests > > ash-redir/redir8.tests > > uni\x81code filename is not found by uni?code glob pattern. > dash does not support unicode; AFAIK, this is a design choice (I'm not a > developer). This is actually caused by some level of Unicode support -- a simple bytes=characters matcher would accept this. This Unicode support is in libc's fnmatch() and rejects the invalid byte sequence. What to do with this invalid byte sequence is an interesting question. POSIX is not helpful since it does not consider this case. In FreeBSD sh, I have decided that ? and bracket expressions can only match valid characters, while * and literals can match any byte sequence. One reason for this is the (somewhat simplistic) implementation of ${var#word}, etc. which find the shortest or longest matching part by repeated matching and should definitely take a full character for ? if one exists. > > ash-signals/reap1.tests > > Builtins never wait for children. This loop will not ever stop: > > sleep 1 & > > PID=$! > > while kill -0 $PID >/dev/null 2>&1; do > > true > > done > Interesting bug. > The behaviour seems to depend on the presence of a 'while' loop. > Executed manually without a loop, it behaves as expected: > $ sleep 10 & PID=$! > $ kill -0 $PID > $ kill -0 $PID > [...etc...] > $ kill -0 $PID > [1] + Done sleep 10 > $ kill -0 $PID > dash: 11: kill: No such process > Executed with a loop it's infinite even in an interactive shell, but > only the first time around: > $ sleep 10 & PID=$! > $ while kill -0 $PID; do :; done > ^C > [1] + Done sleep 10 > $ while kill -0 $PID; do :; done > dash: 15: kill: No such process > Strange behaviour and certainly looks like a rather obscure bug. This is because dash only checks for terminated child processes at certain points, such as when waiting for a process or printing a prompt. For example, placing 'sleep 1' within the loop will make it work, since the wait for the foreground process will also collect the background process's status. Interestingly, I already did something about this in FreeBSD sh for the case where this may cause unbounded accumulation of zombies, but not this case, where kill keeps succeeding indefinitely. Note that allowing kill to fail with [ESRCH] "No such process" inherently creates the possibility of signalling an unrelated new process, so it should be avoided if possible. > > ash-signals/savetrap.tests > > `trap` does not work as expected > Again, the SIG prefix is not required to be supported in POSIX (see above). > Unfortunately, according to POSIX, the feature that a subshell > containing only a single "trap" command with no operands inherits the > parent shell's traps so that they can be stored like save_traps=$(trap), > is optional! Hence this can't be considered a bug in dash, although I'd > certainly like to submit it as a feature request. > POSIX: > http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_28 For clarification, this is in XCU 2.14 Special Built-In Utilities, trap. > "When a subshell is entered, traps that are not being ignored shall be > set to the default actions, except in the case of a command substitution > containing only a single trap command, when the traps need not be > altered. Implementations may check for this case using only lexical > analysis [...]." The next paragraph puts additional requirements on implementations that do not implement the this check, which likewise ensure that "$(trap)" expands to the current traps and not always an empty string. On another note, the quoted part seems broken. It specifies that the traps be not only printed but completely inherited, and therefore the EXIT trap (if any) also executed after printing the traps. > Only bash, yash, AT&T ksh93 and busybox ash currently support this > optional feature. This makes it extremely inconvenient to store the > output of 'trap' in a variable in a cross-platform manner. > (In my cross-platform shell library, modernish, which offers stack-based > traps, I've had to implement a workaround involving the atomic, > parallel-proof creation of a temporary file. Since 'mktemp' isn't > cross-platform either, that has been an interesting exercise involving > output redirection under 'set -C' in a subshell.) Practically, that is still needed. FreeBSD sh also supports the useful meaning of $(trap). > > ash-signals/sigint1.tests > > trap "exit 0" SIGINT - does not like name "SIGINT". "INT" works. > Not a bug. The SIG prefix is an optional extension. Only "INT" without > the prefix is mandated by POSIX. > http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_28 > "The condition can be EXIT, 0 (equivalent to EXIT), or a signal > specified using a symbolic name, without the SIG prefix, as listed in > the tables of signal names in the <signal.h> header defined in XBD > Headers; for example, HUP, INT, QUIT, TERM. Implementations may permit > names with the SIG prefix or ignore case in signal names as an extension." > > ash-signals/signal4.tests > > trap "echo Trapped" BADNAME TERM - abort seeing BADNAME. > > bash complains, but sets the trap for the second signal. > I don't see how this is a bug. POSIX is not entirely clear but there is some text in XCU 1.4 Utility Description Defaults CONSEQUENCES OF ERRORS that suggests subsequent operands should continue to be processed after an erroring operand, which is almost ubiquitously followed by utilities such as cp, mv, rm, find and ls. However, the same section also says that "Default" means any changes to the process state are unspecified. In FreeBSD sh (SVN r199641), I considered it most sensible to continue processing subsequent signal names (returning exit status 1 later). > > ash-signals/signal8.tests > > "kill %1" in non-interactive shell does not find > > previously backgrounded task. > Not a bug. '%1' is a job control job ID, but in a non-interactive shell > there's no job control. Either 'set -m' to turn on job control first > (and deal with extra terminal clutter), or just use "$!" to kill by PID. In my reading of POSIX, there is nothing that makes the %job notation depend on 'set -m'. However, the description of %job operands in kill requires a signal to be sent to a background process group, which does not exist if 'set -m' was not in effect when the job was created. An obvious generalization is to send the signal to each process in the job if 'set -m' was not in effect when it was created. The %1 syntax avoids process ID reuse issues but unfortunately it cannot usually be used if there are multiple background jobs since it may be unclear when a job number becomes available for reuse. > > ash-vars/var-utf8-length.tests > > ${#VAR} counts unicode chars in bash > dash does not support unicode; AFAIK, this is a design choice (I'm not a > developer). That may be but it does cause a non-compliance when the entire system wants to support UTF-8. > > ash-vars/var_unbackslash.tests > > b=-$a-\t-\\-\"-\`-\--\z-\*-\?- > > b="-$a-\t-\\-\"-\`-\--\z-\*-\?-" > > b='-$a-\t-\\-\"-\`-\--\z-\*-\?-' > > b1=$b > > "b1=$b" > > You can imagine... not everything went well here... > Again, this is probably down to legitimate differences in the > notoriously unportable "echo" builtin. Test this using printf instead. Seems like it. > > ash-vars/var_unbackslash.tests > ITYM ash-vars/var_unbackslash1.tests > > echo Forty two:$\ > > (\ > > (\ > > 42\ > > )\ > > ) > > dash says: Syntax error: Missing '))' > Yes, but it's not clear to me that it shouldn't. > Hmm... maybe this is indeed a bug: > http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_02_01 > "A <backslash> that is not quoted shall preserve the literal value of > the following character, with the exception of a <newline>. If a > <newline> follows the <backslash>, the shell shall interpret this as > line continuation. The <backslash> and <newline> shall be removed before > splitting the input into tokens. Since the escaped <newline> is removed > entirely from the input and is not replaced by any white space, it > cannot serve as a token separator." > So, unless I'm misreading this, it looks like backslashes need to be > parsed before *any* other kind of lexical analysis. Yes, for <backslash><newline> sequences that are not quoted. For example, <single-quote><backslash><newline><single-quote> contains two characters between the quotes, not zero. -- Jilles Tjoelker ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: dash tested against ash testsuite: 17 failures 2016-10-10 21:51 ` Jilles Tjoelker @ 2016-10-10 22:30 ` Harald van Dijk 0 siblings, 0 replies; 6+ messages in thread From: Harald van Dijk @ 2016-10-10 22:30 UTC (permalink / raw) To: Jilles Tjoelker, Martijn Dekker; +Cc: dash On 10-10-16 23:51, Jilles Tjoelker wrote: > On Sat, Oct 08, 2016 at 09:42:12PM +0200, Martijn Dekker wrote: >> Op 01-10-16 om 19:17 schreef Denys Vlasenko: >>> ash-vars/var_unbackslash.tests > >> ITYM ash-vars/var_unbackslash1.tests > >>> echo Forty two:$\ >>> (\ >>> (\ >>> 42\ >>> )\ >>> ) >>> dash says: Syntax error: Missing '))' > >> Yes, but it's not clear to me that it shouldn't. > >> Hmm... maybe this is indeed a bug: >> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_02_01 >> "A <backslash> that is not quoted shall preserve the literal value of >> the following character, with the exception of a <newline>. If a >> <newline> follows the <backslash>, the shell shall interpret this as >> line continuation. The <backslash> and <newline> shall be removed before >> splitting the input into tokens. Since the escaped <newline> is removed >> entirely from the input and is not replaced by any white space, it >> cannot serve as a token separator." > >> So, unless I'm misreading this, it looks like backslashes need to be >> parsed before *any* other kind of lexical analysis. > > Yes, for <backslash><newline> sequences that are not quoted. > > For example, <single-quote><backslash><newline><single-quote> contains > two characters between the quotes, not zero. Ah, right, I missed that exception. Note that it only applies to single-quoted strings. In double-quoted strings, backslash-newline should be removed just as when unquoted. Cheers, Harald van Dijk ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2016-10-12 17:32 UTC | newest] Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-10-01 17:17 dash tested against ash testsuite: 17 failures Denys Vlasenko 2016-10-08 19:42 ` Martijn Dekker 2016-10-10 20:20 ` Harald van Dijk 2016-10-12 17:24 ` Harald van Dijk 2016-10-10 21:51 ` Jilles Tjoelker 2016-10-10 22:30 ` Harald van Dijk
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).