dash.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* dash tested against ash testsuite: 17 failures
@ 2016-10-01 17:17 Denys Vlasenko
  2016-10-08 19:42 ` Martijn Dekker
  0 siblings, 1 reply; 6+ messages in thread
From: Denys Vlasenko @ 2016-10-01 17:17 UTC (permalink / raw)
  To: Herbert Xu, busybox, Ron Yorston, dash

[-- Attachment #1: Type: text/plain, Size: 2341 bytes --]

ash-signals/savetrap.tests
While trying to port some fixes from dash,
one of them broke some signal handling testcases.

I guess some of things we test for are not tested
by dash developers.

I ran entire ash testsuite against dash, deleted failing
things which dash does not support (some bashisms),
and false positives caused by differences in error
messages.

What's left is:

ash-glob/glob2.tests:
    Evidently, dash supports \f -> ^L escape.
    This test uses \f as invalid backslash escape,
    hence differences.

ash-misc/echo_write_error.tests
    EPIPE errors in echo are not reported

ash-misc/func2.tests
    $((i++)) not supported

ash-misc/local1.tests
    Doesn't unset as described:
        local a
        # the above line unsets $a
        echo "A2:'$a'"

ash-misc/shift1.tests
    "shift N" fails if fewer than N argv[i] exists
    (likely not a bug, but bash does it differently)

ash-redir/redir.tests
    echo errors due to closed stdout are not reported

ash-redir/redir3.tests
    "echo foo >&9" correctly says "9: Bad file descriptor"
    and exitcode is 2 (in bash, it is 1).

ash-redir/redir7.tests
ash-redir/redir8.tests
    uni\x81code filename is not found by uni?code glob pattern.

ash-signals/reap1.tests
    Builtins never wait for children. This loop will not ever stop:
    sleep 1 &
    PID=$!
    while kill -0 $PID >/dev/null 2>&1; do
        true
    done

ash-signals/savetrap.tests
    `trap` does not work as expected

ash-signals/sigint1.tests
    trap "exit 0" SIGINT - does not like name "SIGINT". "INT" works.

ash-signals/signal4.tests
    trap "echo Trapped" BADNAME TERM - abort seeing BADNAME.
    bash complains, but sets the trap for the second signal.

ash-signals/signal8.tests
    "kill %1" in non-interactive shell does not find
    previously backgrounded task.

ash-vars/var-utf8-length.tests
    ${#VAR} counts unicode chars in bash

ash-vars/var_unbackslash.tests
    b=-$a-\t-\\-\"-\`-\--\z-\*-\?-
    b="-$a-\t-\\-\"-\`-\--\z-\*-\?-"
    b='-$a-\t-\\-\"-\`-\--\z-\*-\?-'
    b1=$b
    "b1=$b"
    You can imagine... not everything went well here...

ash-vars/var_unbackslash.tests
    echo Forty two:$\
    (\
    (\
    42\
    )\
    )
    dash says: Syntax error: Missing '))'


The tarball is attached. Unpack, add an ash -> /path/to/your/dash
symlink, then run ./run-all

[-- Attachment #2: ash_test.tar.gz --]
[-- Type: application/x-gzip, Size: 13851 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: dash tested against ash testsuite: 17 failures
  2016-10-01 17:17 dash tested against ash testsuite: 17 failures Denys Vlasenko
@ 2016-10-08 19:42 ` Martijn Dekker
  2016-10-10 20:20   ` Harald van Dijk
  2016-10-10 21:51   ` Jilles Tjoelker
  0 siblings, 2 replies; 6+ messages in thread
From: Martijn Dekker @ 2016-10-08 19:42 UTC (permalink / raw)
  To: dash

Op 01-10-16 om 19:17 schreef Denys Vlasenko:
> ash-glob/glob2.tests:
>     Evidently, dash supports \f -> ^L escape.
>     This test uses \f as invalid backslash escape,
>     hence differences.

The test uses the "echo" builtin, which is very very unportable and
explicitly not standardised by POSIX for this case. A test failure based
on a difference in how "echo" interprets backslash escapes is not really
relevant. Use printf instead.

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/echo.html#tag_20_37_16
"It is not possible to use echo portably across all POSIX systems unless
both -n (as the first argument) and escape sequences are omitted."

> ash-misc/echo_write_error.tests
>     EPIPE errors in echo are not reported

They're reported as an I/O error in echo. While that is not as specific,
it's still accurate. I don't really see a problem.

> ash-misc/func2.tests
>     $((i++)) not supported

This is not required by POSIX. Use $((i+=1)) instead.
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_04
"The sizeof() operator and the prefix and postfix "++" and "--"
operators are not required."

> ash-misc/local1.tests
>     Doesn't unset as described:
>         local a
>         # the above line unsets $a
>         echo "A2:'$a'"

From the dash man page:
"When a variable is made local, it inherits the initial value and
exported and readonly flags from the variable with the same name in the
surrounding scope, if there is one.  Otherwise, the variable is
initially unset."

Looks to me like dash behaves as advertised. Local variables aren't
standardised so implementations are free to do as they please.

(This is an interesting difference though, one I wasn't aware of until now.)

> ash-misc/shift1.tests
>     "shift N" fails if fewer than N argv[i] exists
>     (likely not a bug, but bash does it differently)

POSIX explicitly allows either behaviour:
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_26_14
"If the n operand is invalid or is greater than "$#", this may be
considered a syntax error and a non-interactive shell may exit; if the
shell does not exit in this case, a non-zero exit status shall be
returned. Otherwise, zero shall be returned."

Cross-platform scripts must check that the operand is less than or equal
than $# before invoking 'shift'.

> ash-redir/redir.tests
>     echo errors due to closed stdout are not reported

Again, they are in fact reported but as an I/O error. My response would
be the same as for ash-misc/echo_write_error.tests above.

> ash-redir/redir3.tests
>     "echo foo >&9" correctly says "9: Bad file descriptor"
>     and exitcode is 2 (in bash, it is 1).

Not a bug, IMO. POSIX simply says: "A failure to open or create a file
shall cause a redirection to fail" but does not specify with what exit
status it should fail. Any non-zero exit status < 128 should be fine.

> ash-redir/redir7.tests
> ash-redir/redir8.tests
>     uni\x81code filename is not found by uni?code glob pattern.

dash does not support unicode; AFAIK, this is a design choice (I'm not a
developer).

> ash-signals/reap1.tests
>     Builtins never wait for children. This loop will not ever stop:
>     sleep 1 &
>     PID=$!
>     while kill -0 $PID >/dev/null 2>&1; do
>         true
>     done

Interesting bug.

The behaviour seems to depend on the presence of a 'while' loop.
Executed manually without a loop, it behaves as expected:

$ sleep 10 & PID=$!
$ kill -0 $PID
$ kill -0 $PID
[...etc...]
$ kill -0 $PID
[1] + Done                       sleep 10
$ kill -0 $PID
dash: 11: kill: No such process

Executed with a loop it's infinite even in an interactive shell, but
only the first time around:

$ sleep 10 & PID=$!
$ while kill -0 $PID; do :; done
^C
[1] + Done                       sleep 10
$ while kill -0 $PID; do :; done
dash: 15: kill: No such process

Strange behaviour and certainly looks like a rather obscure bug.

> ash-signals/savetrap.tests
>     `trap` does not work as expected

Again, the SIG prefix is not required to be supported in POSIX (see above).

Unfortunately, according to POSIX, the feature that a subshell
containing only a single "trap" command with no operands inherits the
parent shell's traps so that they can be stored like save_traps=$(trap),
is optional! Hence this can't be considered a bug in dash, although I'd
certainly like to submit it as a feature request.

POSIX:
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_28
"When a subshell is entered, traps that are not being ignored shall be
set to the default actions, except in the case of a command substitution
containing only a single trap command, when the traps need not be
altered. Implementations may check for this case using only lexical
analysis [...]."

Only bash, yash, AT&T ksh93 and busybox ash currently support this
optional feature. This makes it extremely inconvenient to store the
output of 'trap' in a variable in a cross-platform manner.

(In my cross-platform shell library, modernish, which offers stack-based
traps, I've had to implement a workaround involving the atomic,
parallel-proof creation of a temporary file. Since 'mktemp' isn't
cross-platform either, that has been an interesting exercise involving
output redirection under 'set -C' in a subshell.)

> ash-signals/sigint1.tests
>     trap "exit 0" SIGINT - does not like name "SIGINT". "INT" works.

Not a bug. The SIG prefix is an optional extension. Only "INT" without
the prefix is mandated by POSIX.
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_28
"The condition can be EXIT, 0 (equivalent to EXIT), or a signal
specified using a symbolic name, without the SIG prefix, as listed in
the tables of signal names in the <signal.h> header defined in XBD
Headers; for example, HUP, INT, QUIT, TERM. Implementations may permit
names with the SIG prefix or ignore case in signal names as an extension."

> ash-signals/signal4.tests
>     trap "echo Trapped" BADNAME TERM - abort seeing BADNAME.
>     bash complains, but sets the trap for the second signal.

I don't see how this is a bug.

> ash-signals/signal8.tests
>     "kill %1" in non-interactive shell does not find
>     previously backgrounded task.

Not a bug. '%1' is a job control job ID, but in a non-interactive shell
there's no job control. Either 'set -m' to turn on job control first
(and deal with extra terminal clutter), or just use "$!" to kill by PID.

> ash-vars/var-utf8-length.tests
>     ${#VAR} counts unicode chars in bash

dash does not support unicode; AFAIK, this is a design choice (I'm not a
developer).

> ash-vars/var_unbackslash.tests
>     b=-$a-\t-\\-\"-\`-\--\z-\*-\?-
>     b="-$a-\t-\\-\"-\`-\--\z-\*-\?-"
>     b='-$a-\t-\\-\"-\`-\--\z-\*-\?-'
>     b1=$b
>     "b1=$b"
>     You can imagine... not everything went well here...

Again, this is probably down to legitimate differences in the
notoriously unportable "echo" builtin. Test this using printf instead.

> ash-vars/var_unbackslash.tests

ITYM ash-vars/var_unbackslash1.tests

>     echo Forty two:$\
>     (\
>     (\
>     42\
>     )\
>     )
>     dash says: Syntax error: Missing '))'

Yes, but it's not clear to me that it shouldn't.

Hmm... maybe this is indeed a bug:
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_02_01
"A <backslash> that is not quoted shall preserve the literal value of
the following character, with the exception of a <newline>. If a
<newline> follows the <backslash>, the shell shall interpret this as
line continuation. The <backslash> and <newline> shall be removed before
splitting the input into tokens. Since the escaped <newline> is removed
entirely from the input and is not replaced by any white space, it
cannot serve as a token separator."

So, unless I'm misreading this, it looks like backslashes need to be
parsed before *any* other kind of lexical analysis.


I hope some of the above was useful.

- M.



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: dash tested against ash testsuite: 17 failures
  2016-10-08 19:42 ` Martijn Dekker
@ 2016-10-10 20:20   ` Harald van Dijk
  2016-10-12 17:24     ` Harald van Dijk
  2016-10-10 21:51   ` Jilles Tjoelker
  1 sibling, 1 reply; 6+ messages in thread
From: Harald van Dijk @ 2016-10-10 20:20 UTC (permalink / raw)
  To: Martijn Dekker, dash

[-- Attachment #1: Type: text/plain, Size: 2358 bytes --]

On 08/10/16 21:42, Martijn Dekker wrote:
> Op 01-10-16 om 19:17 schreef Denys Vlasenko:
>> ash-vars/var_unbackslash.tests
>
> ITYM ash-vars/var_unbackslash1.tests
>
>>     echo Forty two:$\
>>     (\
>>     (\
>>     42\
>>     )\
>>     )
>>     dash says: Syntax error: Missing '))'
>
> Yes, but it's not clear to me that it shouldn't.
>
> Hmm... maybe this is indeed a bug:
> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_02_01
> "A <backslash> that is not quoted shall preserve the literal value of
> the following character, with the exception of a <newline>. If a
> <newline> follows the <backslash>, the shell shall interpret this as
> line continuation. The <backslash> and <newline> shall be removed before
> splitting the input into tokens. Since the escaped <newline> is removed
> entirely from the input and is not replaced by any white space, it
> cannot serve as a token separator."
>
> So, unless I'm misreading this, it looks like backslashes need to be
> parsed before *any* other kind of lexical analysis.

There does appear to be one exception: a comment may end with a 
backslash. This does not cause the next line to be treated as a comment: 
once a # is seen, the remaining characters on the line are not subjected 
to the regular lexical analysis, so the above does not apply.

I would have expected another exception to be in alias expansions that 
end in a backslash. Shells are not entirely in agreement there, but most 
appear to treat this the regular way, meaning

   dash -c 'alias bs=\\
   bs
   '

prints nothing.

dash has a pgetc_eatbnl function already in parser.c which skips any 
backslash-newline combinations. It's not used everywhere it could be. 
There is also some duplicated backslash-newline handling elsewhere in 
parser.c. Replacing all the calls to pgetc() to call pgetc_eatbnl() 
instead, with the exception of the one that handles comments, and 
removing the duplicated backslash-newline handling, lets this test case 
work, as well as several other similar ones, such as:

   : &\
   & :

   : \
   <\
   <\
   EO\
   F
   123
   E\
   OF

A nice benefit is that the removal of the duplicated BSNL handling 
causes a reduction in code size.

There are probably a few corner cases I'm not handling correctly in this 
patch, though. Feedback welcome.

Cheers,
Harald van Dijk

[-- Attachment #2: parser-bnl.patch --]
[-- Type: text/x-patch, Size: 4477 bytes --]

--- a/src/parser.c
+++ b/src/parser.c
@@ -106,6 +106,7 @@ STATIC void parseheredoc(void);
 STATIC int peektoken(void);
 STATIC int readtoken(void);
 STATIC int xxreadtoken(void);
+STATIC int pgetc_eatbnl();
 STATIC int readtoken1(int, char const *, char *, int);
 STATIC void synexpect(int) __attribute__((__noreturn__));
 STATIC void synerror(const char *) __attribute__((__noreturn__));
@@ -656,7 +657,7 @@ parseheredoc(void)
 		if (needprompt) {
 			setprompt(2);
 		}
-		readtoken1(pgetc(), here->here->type == NHERE? SQSYNTAX : DQSYNTAX,
+		readtoken1(pgetc_eatbnl(), here->here->type == NHERE? SQSYNTAX : DQSYNTAX,
 				here->eofmark, here->striptabs);
 		n = (union node *)stalloc(sizeof (struct narg));
 		n->narg.type = NARG;
@@ -782,7 +783,7 @@ xxreadtoken(void)
 		setprompt(2);
 	}
 	for (;;) {	/* until token or start of word found */
-		c = pgetc();
+		c = pgetc_eatbnl();
 		switch (c) {
 		case ' ': case '\t':
 		case PEOA:
@@ -791,30 +792,23 @@ xxreadtoken(void)
 			while ((c = pgetc()) != '\n' && c != PEOF);
 			pungetc();
 			continue;
-		case '\\':
-			if (pgetc() == '\n') {
-				nlprompt();
-				continue;
-			}
-			pungetc();
-			goto breakloop;
 		case '\n':
 			nlnoprompt();
 			RETURN(TNL);
 		case PEOF:
 			RETURN(TEOF);
 		case '&':
-			if (pgetc() == '&')
+			if (pgetc_eatbnl() == '&')
 				RETURN(TAND);
 			pungetc();
 			RETURN(TBACKGND);
 		case '|':
-			if (pgetc() == '|')
+			if (pgetc_eatbnl() == '|')
 				RETURN(TOR);
 			pungetc();
 			RETURN(TPIPE);
 		case ';':
-			if (pgetc() == ';')
+			if (pgetc_eatbnl() == ';')
 				RETURN(TENDCASE);
 			pungetc();
 			RETURN(TSEMI);
@@ -822,11 +816,9 @@ xxreadtoken(void)
 			RETURN(TLP);
 		case ')':
 			RETURN(TRP);
-		default:
-			goto breakloop;
 		}
+		break;
 	}
-breakloop:
 	return readtoken1(c, BASESYNTAX, (char *)NULL, 0);
 #undef RETURN
 }
@@ -903,7 +895,7 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs)
 			attyline();
 			if (syntax == BASESYNTAX)
 				return readtoken();
-			c = pgetc();
+			c = pgetc_eatbnl();
 			goto loop;
 		}
 #endif
@@ -916,7 +908,7 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs)
 					goto endword;	/* exit outer loop */
 				USTPUTC(c, out);
 				nlprompt();
-				c = pgetc();
+				c = pgetc_eatbnl();
 				goto loop;		/* continue outer loop */
 			case CWORD:
 				USTPUTC(c, out);
@@ -997,7 +989,7 @@ quotemark:
 					USTPUTC(c, out);
 					--parenlevel;
 				} else {
-					if (pgetc() == ')') {
+					if (pgetc_eatbnl() == ')') {
 						USTPUTC(CTLENDARI, out);
 						if (!--arinest)
 							syntax = prevsyntax;
@@ -1025,7 +1017,7 @@ quotemark:
 					USTPUTC(c, out);
 				}
 			}
-			c = pgetc();
+			c = pgetc_eatbnl();
 		}
 	}
 endword:
@@ -1132,7 +1124,7 @@ parseredir: {
 	np = (union node *)stalloc(sizeof (struct nfile));
 	if (c == '>') {
 		np->nfile.fd = 1;
-		c = pgetc();
+		c = pgetc_eatbnl();
 		if (c == '>')
 			np->type = NAPPEND;
 		else if (c == '|')
@@ -1145,7 +1137,7 @@ parseredir: {
 		}
 	} else {	/* c == '<' */
 		np->nfile.fd = 0;
-		switch (c = pgetc()) {
+		switch (c = pgetc_eatbnl()) {
 		case '<':
 			if (sizeof (struct nfile) != sizeof (struct nhere)) {
 				np = (union node *)stalloc(sizeof (struct nhere));
@@ -1154,7 +1146,7 @@ parseredir: {
 			np->type = NHERE;
 			heredoc = (struct heredoc *)stalloc(sizeof (struct heredoc));
 			heredoc->here = np;
-			if ((c = pgetc()) == '-') {
+			if ((c = pgetc_eatbnl()) == '-') {
 				heredoc->striptabs = 1;
 			} else {
 				heredoc->striptabs = 0;
@@ -1336,21 +1328,12 @@ parsebackq: {
 			if (needprompt) {
 				setprompt(2);
 			}
-			switch (pc = pgetc()) {
+			switch (pc = pgetc_eatbnl()) {
 			case '`':
 				goto done;
 
 			case '\\':
-                                if ((pc = pgetc()) == '\n') {
-					nlprompt();
-					/*
-					 * If eating a newline, avoid putting
-					 * the newline into the new character
-					 * stream (via the STPUTC after the
-					 * switch).
-					 */
-					continue;
-				}
+                                pc = pgetc_eatbnl();
                                 if (pc != '\\' && pc != '`' && pc != '$'
                                     && (!dblquote || pc != '"'))
                                         STPUTC('\\', pout);
@@ -1529,7 +1512,7 @@ expandstr(const char *ps)
 	saveprompt = doprompt;
 	doprompt = 0;
 
-	readtoken1(pgetc(), DQSYNTAX, FAKEEOFMARK, 0);
+	readtoken1(pgetc_eatbnl(), DQSYNTAX, FAKEEOFMARK, 0);
 
 	doprompt = saveprompt;
 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: dash tested against ash testsuite: 17 failures
  2016-10-08 19:42 ` Martijn Dekker
  2016-10-10 20:20   ` Harald van Dijk
@ 2016-10-10 21:51   ` Jilles Tjoelker
  2016-10-10 22:30     ` Harald van Dijk
  1 sibling, 1 reply; 6+ messages in thread
From: Jilles Tjoelker @ 2016-10-10 21:51 UTC (permalink / raw)
  To: Martijn Dekker; +Cc: dash

On Sat, Oct 08, 2016 at 09:42:12PM +0200, Martijn Dekker wrote:
> Op 01-10-16 om 19:17 schreef Denys Vlasenko:
> > ash-glob/glob2.tests:
> >     Evidently, dash supports \f -> ^L escape.
> >     This test uses \f as invalid backslash escape,
> >     hence differences.

> The test uses the "echo" builtin, which is very very unportable and
> explicitly not standardised by POSIX for this case. A test failure based
> on a difference in how "echo" interprets backslash escapes is not really
> relevant. Use printf instead.

> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/echo.html#tag_20_37_16
> "It is not possible to use echo portably across all POSIX systems unless
> both -n (as the first argument) and escape sequences are omitted."

True, but dash seems to be the only common one in the Linux/BSD world to
recognize so many backslash escape sequences in echo (without the
thoroughly non-standard -e option or special configuration). FreeBSD's
/bin/echo knows \c if it is final but shell builtins are more commonly
used than /bin/echo.

In some sense, it seems like this behaviour is there just to force
applications to be POSIX-compliant. I think it may be causing more
trouble than it is worth.

> > ash-misc/echo_write_error.tests
> >     EPIPE errors in echo are not reported

> They're reported as an I/O error in echo. While that is not as specific,
> it's still accurate. I don't really see a problem.

The dash code I have, git revision
afe0e0152e4dc12d84be3c02d6d62b0456d68580, returns exit status 1 but does
not write an error message to standard error. This is definitely a bug.

The error message's wording is unspecified, though.

> > ash-misc/func2.tests
> >     $((i++)) not supported

> This is not required by POSIX. Use $((i+=1)) instead.
> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_04
> "The sizeof() operator and the prefix and postfix "++" and "--"
> operators are not required."

> > ash-misc/local1.tests
> >     Doesn't unset as described:
> >         local a
> >         # the above line unsets $a
> >         echo "A2:'$a'"

> >From the dash man page:
> "When a variable is made local, it inherits the initial value and
> exported and readonly flags from the variable with the same name in the
> surrounding scope, if there is one.  Otherwise, the variable is
> initially unset."

> Looks to me like dash behaves as advertised. Local variables aren't
> standardised so implementations are free to do as they please.

> (This is an interesting difference though, one I wasn't aware of until now.)

This is not the first time I heard about it but I am not convinced it
needs changing.

> > ash-misc/shift1.tests
> >     "shift N" fails if fewer than N argv[i] exists
> >     (likely not a bug, but bash does it differently)

> POSIX explicitly allows either behaviour:
> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_26_14
> "If the n operand is invalid or is greater than "$#", this may be
> considered a syntax error and a non-interactive shell may exit; if the
> shell does not exit in this case, a non-zero exit status shall be
> returned. Otherwise, zero shall be returned."

> Cross-platform scripts must check that the operand is less than or equal
> than $# before invoking 'shift'.

Yes.

> > ash-redir/redir.tests
> >     echo errors due to closed stdout are not reported

> Again, they are in fact reported but as an I/O error. My response would
> be the same as for ash-misc/echo_write_error.tests above.

Same bug as above: no diagnostic is written.

> > ash-redir/redir3.tests
> >     "echo foo >&9" correctly says "9: Bad file descriptor"
> >     and exitcode is 2 (in bash, it is 1).

> Not a bug, IMO. POSIX simply says: "A failure to open or create a file
> shall cause a redirection to fail" but does not specify with what exit
> status it should fail. Any non-zero exit status < 128 should be fine.

Yes, and I think an exit status of 2 is slightly more useful if the
utility is one that uses status 1 for a "false" result and statuses
greater than 1 for real errors, while of the same usefulness if the
utility is one that uses statuses greater than 0 for errors.

> > ash-redir/redir7.tests
> > ash-redir/redir8.tests
> >     uni\x81code filename is not found by uni?code glob pattern.

> dash does not support unicode; AFAIK, this is a design choice (I'm not a
> developer).

This is actually caused by some level of Unicode support -- a simple
bytes=characters matcher would accept this. This Unicode support is in
libc's fnmatch() and rejects the invalid byte sequence.

What to do with this invalid byte sequence is an interesting question.
POSIX is not helpful since it does not consider this case.

In FreeBSD sh, I have decided that ? and bracket expressions can only
match valid characters, while * and literals can match any byte
sequence. One reason for this is the (somewhat simplistic)
implementation of ${var#word}, etc. which find the shortest or longest
matching part by repeated matching and should definitely take a full
character for ? if one exists.

> > ash-signals/reap1.tests
> >     Builtins never wait for children. This loop will not ever stop:
> >     sleep 1 &
> >     PID=$!
> >     while kill -0 $PID >/dev/null 2>&1; do
> >         true
> >     done

> Interesting bug.

> The behaviour seems to depend on the presence of a 'while' loop.
> Executed manually without a loop, it behaves as expected:

> $ sleep 10 & PID=$!
> $ kill -0 $PID
> $ kill -0 $PID
> [...etc...]
> $ kill -0 $PID
> [1] + Done                       sleep 10
> $ kill -0 $PID
> dash: 11: kill: No such process

> Executed with a loop it's infinite even in an interactive shell, but
> only the first time around:

> $ sleep 10 & PID=$!
> $ while kill -0 $PID; do :; done
> ^C
> [1] + Done                       sleep 10
> $ while kill -0 $PID; do :; done
> dash: 15: kill: No such process

> Strange behaviour and certainly looks like a rather obscure bug.

This is because dash only checks for terminated child processes at
certain points, such as when waiting for a process or printing a prompt.
For example, placing 'sleep 1' within the loop will make it work, since
the wait for the foreground process will also collect the background
process's status.

Interestingly, I already did something about this in FreeBSD sh for the
case where this may cause unbounded accumulation of zombies, but not
this case, where kill keeps succeeding indefinitely.

Note that allowing kill to fail with [ESRCH] "No such process"
inherently creates the possibility of signalling an unrelated new
process, so it should be avoided if possible.

> > ash-signals/savetrap.tests
> >     `trap` does not work as expected

> Again, the SIG prefix is not required to be supported in POSIX (see above).

> Unfortunately, according to POSIX, the feature that a subshell
> containing only a single "trap" command with no operands inherits the
> parent shell's traps so that they can be stored like save_traps=$(trap),
> is optional! Hence this can't be considered a bug in dash, although I'd
> certainly like to submit it as a feature request.

> POSIX:
> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_28

For clarification, this is in XCU 2.14 Special Built-In Utilities, trap.

> "When a subshell is entered, traps that are not being ignored shall be
> set to the default actions, except in the case of a command substitution
> containing only a single trap command, when the traps need not be
> altered. Implementations may check for this case using only lexical
> analysis [...]."

The next paragraph puts additional requirements on implementations that
do not implement the this check, which likewise ensure that "$(trap)"
expands to the current traps and not always an empty string.

On another note, the quoted part seems broken. It specifies that the
traps be not only printed but completely inherited, and therefore the
EXIT trap (if any) also executed after printing the traps.

> Only bash, yash, AT&T ksh93 and busybox ash currently support this
> optional feature. This makes it extremely inconvenient to store the
> output of 'trap' in a variable in a cross-platform manner.

> (In my cross-platform shell library, modernish, which offers stack-based
> traps, I've had to implement a workaround involving the atomic,
> parallel-proof creation of a temporary file. Since 'mktemp' isn't
> cross-platform either, that has been an interesting exercise involving
> output redirection under 'set -C' in a subshell.)

Practically, that is still needed.

FreeBSD sh also supports the useful meaning of $(trap).

> > ash-signals/sigint1.tests
> >     trap "exit 0" SIGINT - does not like name "SIGINT". "INT" works.

> Not a bug. The SIG prefix is an optional extension. Only "INT" without
> the prefix is mandated by POSIX.
> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_28
> "The condition can be EXIT, 0 (equivalent to EXIT), or a signal
> specified using a symbolic name, without the SIG prefix, as listed in
> the tables of signal names in the <signal.h> header defined in XBD
> Headers; for example, HUP, INT, QUIT, TERM. Implementations may permit
> names with the SIG prefix or ignore case in signal names as an extension."

> > ash-signals/signal4.tests
> >     trap "echo Trapped" BADNAME TERM - abort seeing BADNAME.
> >     bash complains, but sets the trap for the second signal.

> I don't see how this is a bug.

POSIX is not entirely clear but there is some text in XCU 1.4 Utility
Description Defaults CONSEQUENCES OF ERRORS that suggests subsequent
operands should continue to be processed after an erroring operand,
which is almost ubiquitously followed by utilities such as cp, mv, rm,
find and ls. However, the same section also says that "Default" means
any changes to the process state are unspecified.

In FreeBSD sh (SVN r199641), I considered it most sensible to continue
processing subsequent signal names (returning exit status 1 later).

> > ash-signals/signal8.tests
> >     "kill %1" in non-interactive shell does not find
> >     previously backgrounded task.

> Not a bug. '%1' is a job control job ID, but in a non-interactive shell
> there's no job control. Either 'set -m' to turn on job control first
> (and deal with extra terminal clutter), or just use "$!" to kill by PID.

In my reading of POSIX, there is nothing that makes the %job notation
depend on 'set -m'. However, the description of %job operands in kill
requires a signal to be sent to a background process group, which does
not exist if 'set -m' was not in effect when the job was created.

An obvious generalization is to send the signal to each process in the
job if 'set -m' was not in effect when it was created.

The %1 syntax avoids process ID reuse issues but unfortunately it cannot
usually be used if there are multiple background jobs since it may be
unclear when a job number becomes available for reuse.

> > ash-vars/var-utf8-length.tests
> >     ${#VAR} counts unicode chars in bash

> dash does not support unicode; AFAIK, this is a design choice (I'm not a
> developer).

That may be but it does cause a non-compliance when the entire system
wants to support UTF-8.

> > ash-vars/var_unbackslash.tests
> >     b=-$a-\t-\\-\"-\`-\--\z-\*-\?-
> >     b="-$a-\t-\\-\"-\`-\--\z-\*-\?-"
> >     b='-$a-\t-\\-\"-\`-\--\z-\*-\?-'
> >     b1=$b
> >     "b1=$b"
> >     You can imagine... not everything went well here...

> Again, this is probably down to legitimate differences in the
> notoriously unportable "echo" builtin. Test this using printf instead.

Seems like it.

> > ash-vars/var_unbackslash.tests

> ITYM ash-vars/var_unbackslash1.tests

> >     echo Forty two:$\
> >     (\
> >     (\
> >     42\
> >     )\
> >     )
> >     dash says: Syntax error: Missing '))'

> Yes, but it's not clear to me that it shouldn't.

> Hmm... maybe this is indeed a bug:
> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_02_01
> "A <backslash> that is not quoted shall preserve the literal value of
> the following character, with the exception of a <newline>. If a
> <newline> follows the <backslash>, the shell shall interpret this as
> line continuation. The <backslash> and <newline> shall be removed before
> splitting the input into tokens. Since the escaped <newline> is removed
> entirely from the input and is not replaced by any white space, it
> cannot serve as a token separator."

> So, unless I'm misreading this, it looks like backslashes need to be
> parsed before *any* other kind of lexical analysis.

Yes, for <backslash><newline> sequences that are not quoted.

For example, <single-quote><backslash><newline><single-quote> contains
two characters between the quotes, not zero.

-- 
Jilles Tjoelker

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: dash tested against ash testsuite: 17 failures
  2016-10-10 21:51   ` Jilles Tjoelker
@ 2016-10-10 22:30     ` Harald van Dijk
  0 siblings, 0 replies; 6+ messages in thread
From: Harald van Dijk @ 2016-10-10 22:30 UTC (permalink / raw)
  To: Jilles Tjoelker, Martijn Dekker; +Cc: dash

On 10-10-16 23:51, Jilles Tjoelker wrote:
> On Sat, Oct 08, 2016 at 09:42:12PM +0200, Martijn Dekker wrote:
>> Op 01-10-16 om 19:17 schreef Denys Vlasenko:
>>> ash-vars/var_unbackslash.tests
>
>> ITYM ash-vars/var_unbackslash1.tests
>
>>>     echo Forty two:$\
>>>     (\
>>>     (\
>>>     42\
>>>     )\
>>>     )
>>>     dash says: Syntax error: Missing '))'
>
>> Yes, but it's not clear to me that it shouldn't.
>
>> Hmm... maybe this is indeed a bug:
>> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_02_01
>> "A <backslash> that is not quoted shall preserve the literal value of
>> the following character, with the exception of a <newline>. If a
>> <newline> follows the <backslash>, the shell shall interpret this as
>> line continuation. The <backslash> and <newline> shall be removed before
>> splitting the input into tokens. Since the escaped <newline> is removed
>> entirely from the input and is not replaced by any white space, it
>> cannot serve as a token separator."
>
>> So, unless I'm misreading this, it looks like backslashes need to be
>> parsed before *any* other kind of lexical analysis.
>
> Yes, for <backslash><newline> sequences that are not quoted.
>
> For example, <single-quote><backslash><newline><single-quote> contains
> two characters between the quotes, not zero.

Ah, right, I missed that exception. Note that it only applies to 
single-quoted strings. In double-quoted strings, backslash-newline 
should be removed just as when unquoted.

Cheers,
Harald van Dijk

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: dash tested against ash testsuite: 17 failures
  2016-10-10 20:20   ` Harald van Dijk
@ 2016-10-12 17:24     ` Harald van Dijk
  0 siblings, 0 replies; 6+ messages in thread
From: Harald van Dijk @ 2016-10-12 17:24 UTC (permalink / raw)
  To: Martijn Dekker, dash

[-- Attachment #1: Type: text/plain, Size: 2977 bytes --]

On 10/10/16 22:20, Harald van Dijk wrote:
> On 08/10/16 21:42, Martijn Dekker wrote:
>> Op 01-10-16 om 19:17 schreef Denys Vlasenko:
>>> ash-vars/var_unbackslash.tests
>>
>> ITYM ash-vars/var_unbackslash1.tests
>>
>>>     echo Forty two:$\
>>>     (\
>>>     (\
>>>     42\
>>>     )\
>>>     )
>>>     dash says: Syntax error: Missing '))'
>>
>> Yes, but it's not clear to me that it shouldn't.
>>
>> Hmm... maybe this is indeed a bug:
>> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_02_01
>>
>> "A <backslash> that is not quoted shall preserve the literal value of
>> the following character, with the exception of a <newline>. If a
>> <newline> follows the <backslash>, the shell shall interpret this as
>> line continuation. The <backslash> and <newline> shall be removed before
>> splitting the input into tokens. Since the escaped <newline> is removed
>> entirely from the input and is not replaced by any white space, it
>> cannot serve as a token separator."
>>
>> So, unless I'm misreading this, it looks like backslashes need to be
>> parsed before *any* other kind of lexical analysis.
>
> There does appear to be one exception: a comment may end with a
> backslash. This does not cause the next line to be treated as a comment:
> once a # is seen, the remaining characters on the line are not subjected
> to the regular lexical analysis, so the above does not apply.
>
> I would have expected another exception to be in alias expansions that
> end in a backslash. Shells are not entirely in agreement there, but most
> appear to treat this the regular way, meaning
>
>   dash -c 'alias bs=\\
>   bs
>   '
>
> prints nothing.
>
> dash has a pgetc_eatbnl function already in parser.c which skips any
> backslash-newline combinations. It's not used everywhere it could be.
> There is also some duplicated backslash-newline handling elsewhere in
> parser.c. Replacing all the calls to pgetc() to call pgetc_eatbnl()
> instead, with the exception of the one that handles comments, and
> removing the duplicated backslash-newline handling, lets this test case
> work, as well as several other similar ones, such as:
>
>   : &\
>   & :
>
>   : \
>   <\
>   <\
>   EO\
>   F
>   123
>   E\
>   OF
>
> A nice benefit is that the removal of the duplicated BSNL handling
> causes a reduction in code size.
>
> There are probably a few corner cases I'm not handling correctly in this
> patch, though. Feedback welcome.

With more extensive testing, the only issue I've seen is what Jilles 
Tjoelker had already mentioned, namely that backslash-newline should be 
preserved inside single-quoted strings, and also that it should be 
preserved inside heredocs where any part of the delimiter is quoted:

   cat <<\EOF
   \
   EOF

dash's parsing treats this mostly the same as a single-quoted string, 
and the same extra check handles both cases.

Here's an updated patch. Hoping this looks okay and can be applied.

> Cheers,
> Harald van Dijk

[-- Attachment #2: parser-bnl.patch --]
[-- Type: text/x-patch, Size: 4955 bytes --]

--- a/src/parser.c
+++ b/src/parser.c
@@ -106,6 +106,7 @@ STATIC void parseheredoc(void);
 STATIC int peektoken(void);
 STATIC int readtoken(void);
 STATIC int xxreadtoken(void);
+STATIC int pgetc_eatbnl();
 STATIC int readtoken1(int, char const *, char *, int);
 STATIC void synexpect(int) __attribute__((__noreturn__));
 STATIC void synerror(const char *) __attribute__((__noreturn__));
@@ -656,8 +657,10 @@ parseheredoc(void)
 		if (needprompt) {
 			setprompt(2);
 		}
-		readtoken1(pgetc(), here->here->type == NHERE? SQSYNTAX : DQSYNTAX,
-				here->eofmark, here->striptabs);
+		if (here->here->type == NHERE)
+			readtoken1(pgetc(), SQSYNTAX, here->eofmark, here->striptabs);
+		else
+			readtoken1(pgetc_eatbnl(), DQSYNTAX, here->eofmark, here->striptabs);
 		n = (union node *)stalloc(sizeof (struct narg));
 		n->narg.type = NARG;
 		n->narg.next = NULL;
@@ -782,7 +785,7 @@ xxreadtoken(void)
 		setprompt(2);
 	}
 	for (;;) {	/* until token or start of word found */
-		c = pgetc();
+		c = pgetc_eatbnl();
 		switch (c) {
 		case ' ': case '\t':
 		case PEOA:
@@ -791,30 +794,23 @@ xxreadtoken(void)
 			while ((c = pgetc()) != '\n' && c != PEOF);
 			pungetc();
 			continue;
-		case '\\':
-			if (pgetc() == '\n') {
-				nlprompt();
-				continue;
-			}
-			pungetc();
-			goto breakloop;
 		case '\n':
 			nlnoprompt();
 			RETURN(TNL);
 		case PEOF:
 			RETURN(TEOF);
 		case '&':
-			if (pgetc() == '&')
+			if (pgetc_eatbnl() == '&')
 				RETURN(TAND);
 			pungetc();
 			RETURN(TBACKGND);
 		case '|':
-			if (pgetc() == '|')
+			if (pgetc_eatbnl() == '|')
 				RETURN(TOR);
 			pungetc();
 			RETURN(TPIPE);
 		case ';':
-			if (pgetc() == ';')
+			if (pgetc_eatbnl() == ';')
 				RETURN(TENDCASE);
 			pungetc();
 			RETURN(TSEMI);
@@ -822,11 +818,9 @@ xxreadtoken(void)
 			RETURN(TLP);
 		case ')':
 			RETURN(TRP);
-		default:
-			goto breakloop;
 		}
+		break;
 	}
-breakloop:
 	return readtoken1(c, BASESYNTAX, (char *)NULL, 0);
 #undef RETURN
 }
@@ -903,7 +897,7 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs)
 			attyline();
 			if (syntax == BASESYNTAX)
 				return readtoken();
-			c = pgetc();
+			c = syntax == SQSYNTAX ? pgetc() : pgetc_eatbnl();
 			goto loop;
 		}
 #endif
@@ -916,7 +910,7 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs)
 					goto endword;	/* exit outer loop */
 				USTPUTC(c, out);
 				nlprompt();
-				c = pgetc();
+				c = syntax == SQSYNTAX ? pgetc() : pgetc_eatbnl();
 				goto loop;		/* continue outer loop */
 			case CWORD:
 				USTPUTC(c, out);
@@ -933,8 +927,6 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs)
 					USTPUTC(CTLESC, out);
 					USTPUTC('\\', out);
 					pungetc();
-				} else if (c == '\n') {
-					nlprompt();
 				} else {
 					if (
 						dblquote &&
@@ -997,7 +989,7 @@ quotemark:
 					USTPUTC(c, out);
 					--parenlevel;
 				} else {
-					if (pgetc() == ')') {
+					if (pgetc_eatbnl() == ')') {
 						USTPUTC(CTLENDARI, out);
 						if (!--arinest)
 							syntax = prevsyntax;
@@ -1025,7 +1017,7 @@ quotemark:
 					USTPUTC(c, out);
 				}
 			}
-			c = pgetc();
+			c = syntax == SQSYNTAX ? pgetc() : pgetc_eatbnl();
 		}
 	}
 endword:
@@ -1132,7 +1124,7 @@ parseredir: {
 	np = (union node *)stalloc(sizeof (struct nfile));
 	if (c == '>') {
 		np->nfile.fd = 1;
-		c = pgetc();
+		c = pgetc_eatbnl();
 		if (c == '>')
 			np->type = NAPPEND;
 		else if (c == '|')
@@ -1145,7 +1137,7 @@ parseredir: {
 		}
 	} else {	/* c == '<' */
 		np->nfile.fd = 0;
-		switch (c = pgetc()) {
+		switch (c = pgetc_eatbnl()) {
 		case '<':
 			if (sizeof (struct nfile) != sizeof (struct nhere)) {
 				np = (union node *)stalloc(sizeof (struct nhere));
@@ -1154,7 +1146,7 @@ parseredir: {
 			np->type = NHERE;
 			heredoc = (struct heredoc *)stalloc(sizeof (struct heredoc));
 			heredoc->here = np;
-			if ((c = pgetc()) == '-') {
+			if ((c = pgetc_eatbnl()) == '-') {
 				heredoc->striptabs = 1;
 			} else {
 				heredoc->striptabs = 0;
@@ -1336,21 +1328,12 @@ parsebackq: {
 			if (needprompt) {
 				setprompt(2);
 			}
-			switch (pc = pgetc()) {
+			switch (pc = pgetc_eatbnl()) {
 			case '`':
 				goto done;
 
 			case '\\':
-                                if ((pc = pgetc()) == '\n') {
-					nlprompt();
-					/*
-					 * If eating a newline, avoid putting
-					 * the newline into the new character
-					 * stream (via the STPUTC after the
-					 * switch).
-					 */
-					continue;
-				}
+                                pc = pgetc_eatbnl();
                                 if (pc != '\\' && pc != '`' && pc != '$'
                                     && (!dblquote || pc != '"'))
                                         STPUTC('\\', pout);
@@ -1529,7 +1512,7 @@ expandstr(const char *ps)
 	saveprompt = doprompt;
 	doprompt = 0;
 
-	readtoken1(pgetc(), DQSYNTAX, FAKEEOFMARK, 0);
+	readtoken1(pgetc_eatbnl(), DQSYNTAX, FAKEEOFMARK, 0);
 
 	doprompt = saveprompt;
 

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2016-10-12 17:32 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-01 17:17 dash tested against ash testsuite: 17 failures Denys Vlasenko
2016-10-08 19:42 ` Martijn Dekker
2016-10-10 20:20   ` Harald van Dijk
2016-10-12 17:24     ` Harald van Dijk
2016-10-10 21:51   ` Jilles Tjoelker
2016-10-10 22:30     ` Harald van Dijk

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).