dash.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* dash: read does not ignore trailing spaces
@ 2015-12-02 22:37 Gioele Barabucci
  2015-12-02 23:25 ` Jilles Tjoelker
  2015-12-03 21:02 ` Harald van Dijk
  0 siblings, 2 replies; 20+ messages in thread
From: Gioele Barabucci @ 2015-12-02 22:37 UTC (permalink / raw)
  To: dash

Hello,

I am forwarding a bug [1] reported by a Debian user: `read` does not
ignore trailing spaces. The current version of dash is affected by
this bug.

A simple test from the original reporter:

    $ dash -c 'echo "  a b  " | { read v ; echo "<$v>" ; }'
    <a b  >

    $ bash -c 'echo "  a b  " | { read v ; echo "<$v>" ; }'
    <a b>

Other shells like posh and mksh behave like bash.

This error is reproducible with dash 0.5.7 and with the current master
git master branch, commit 2e5842258bd5b252ffdaa630db09c9a19a9717ca.

[1] https://bugs.debian.org/794965

--
Gioele Barabucci <gioele@svario.it>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: dash: read does not ignore trailing spaces
  2015-12-02 22:37 dash: read does not ignore trailing spaces Gioele Barabucci
@ 2015-12-02 23:25 ` Jilles Tjoelker
  2015-12-03 21:02 ` Harald van Dijk
  1 sibling, 0 replies; 20+ messages in thread
From: Jilles Tjoelker @ 2015-12-02 23:25 UTC (permalink / raw)
  To: Gioele Barabucci; +Cc: dash

On Wed, Dec 02, 2015 at 11:37:17PM +0100, Gioele Barabucci wrote:
> I am forwarding a bug [1] reported by a Debian user: `read` does not
> ignore trailing spaces. The current version of dash is affected by
> this bug.

> A simple test from the original reporter:

>     $ dash -c 'echo "  a b  " | { read v ; echo "<$v>" ; }'
>     <a b  >

>     $ bash -c 'echo "  a b  " | { read v ; echo "<$v>" ; }'
>     <a b>

> Other shells like posh and mksh behave like bash.

> This error is reproducible with dash 0.5.7 and with the current master
> git master branch, commit 2e5842258bd5b252ffdaa630db09c9a19a9717ca.

> [1] https://bugs.debian.org/794965

This is a valid bug. Note that it only occurs when there are more fields
than variables. For example,
  dash -c 'echo "  a  " | { read v ; echo "<$v>" ; }'
correctly prints <a>.

Since dash has its own code for read's splitting, it is not possible to
take a fix from NetBSD or FreeBSD sh, other than by replacing the
splitting code completely with their version.

-- 
Jilles Tjoelker

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: dash: read does not ignore trailing spaces
  2015-12-02 22:37 dash: read does not ignore trailing spaces Gioele Barabucci
  2015-12-02 23:25 ` Jilles Tjoelker
@ 2015-12-03 21:02 ` Harald van Dijk
  2015-12-03 21:17   ` Stephane Chazelas
  1 sibling, 1 reply; 20+ messages in thread
From: Harald van Dijk @ 2015-12-03 21:02 UTC (permalink / raw)
  To: Gioele Barabucci; +Cc: dash

[-- Attachment #1: Type: text/plain, Size: 1829 bytes --]

On 02/12/2015 23:37, Gioele Barabucci wrote:
> Hello,
>
> I am forwarding a bug [1] reported by a Debian user: `read` does not
> ignore trailing spaces. The current version of dash is affected by
> this bug.
>
> A simple test from the original reporter:
>
>      $ dash -c 'echo "  a b  " | { read v ; echo "<$v>" ; }'
>      <a b  >
>
>      $ bash -c 'echo "  a b  " | { read v ; echo "<$v>" ; }'
>      <a b>
>
> Other shells like posh and mksh behave like bash.

This is indeed a bug based on the current specification. In the past, 
the specification was unclear and the dash interpretation was a 
legitimate one, but currently it explicitly spells out that trailing IFS 
whitespace gets ignored.

However, unless I'm misreading the spec, mksh and bash don't seem to 
implement it properly either: only trailing IFS whitespace is supposed 
to be dropped. IFS non-whitespace is supposed to remain, even at the end 
of the input. mksh and bash remove it, posh and zsh leave it in:

   $ for shell in bash mksh posh zsh; do printf %s: "$shell"; $shell -c 
'IFS=,; echo a, | { read v; echo "<$v>"; }'; done
   bash:<a>
   mksh:<a>
   posh:<a,>
   zsh:<a,>

As far as I can tell, the posh/zsh behaviour is the correct behaviour, 
but I'm not convinced yet my interpretation is correct.

Attached is a not fully tested proof of concept to implement the 
posh/zsh behaviour in dash by extending ifsbreakup() to allow specifying 
a maximum number of arguments instead of fixing it up in 
readcmd_handle_line(). It returns <a b> in your test, and <a,> in mine. 
Feedback welcome.

Cheers,
Harald van Dijk

> This error is reproducible with dash 0.5.7 and with the current master
> git master branch, commit 2e5842258bd5b252ffdaa630db09c9a19a9717ca.
>
> [1] https://bugs.debian.org/794965
>
> --
> Gioele Barabucci <gioele@svario.it>

[-- Attachment #2: dash-read-ifs.patch --]
[-- Type: text/plain, Size: 3999 bytes --]

diff --git a/src/expand.c b/src/expand.c
index b2d710d..6afd562 100644
--- a/src/expand.c
+++ b/src/expand.c
@@ -203,7 +203,7 @@ expandarg(union node *arg, struct arglist *arglist, int flag)
 	 * TODO - EXP_REDIR
 	 */
 	if (flag & EXP_FULL) {
-		ifsbreakup(p, &exparg);
+		ifsbreakup(p, 0, &exparg);
 		*exparg.lastp = NULL;
 		exparg.lastp = &exparg.list;
 		expandmeta(exparg.list, flag);
@@ -1016,9 +1016,11 @@ recordregion(int start, int end, int nulonly)
  * Break the argument string into pieces based upon IFS and add the
  * strings to the argument list.  The regions of the string to be
  * searched for IFS characters have been stored by recordregion.
+ * If maxargs is non-zero, at most maxargs arguments will be created, by
+ * joining together the last arguments.
  */
 void
-ifsbreakup(char *string, struct arglist *arglist)
+ifsbreakup(char *string, int maxargs, struct arglist *arglist)
 {
 	struct ifsregion *ifsp;
 	struct strlist *sp;
@@ -1054,12 +1056,36 @@ ifsbreakup(char *string, struct arglist *arglist)
 						start = p;
 						continue;
 					}
+					/* If only reading one more argument, combine any field terminators,
+					 * except for trailing IFS whitespace. */
+					if (maxargs == 1) {
+						q = p;
+						p++;
+						if (ifsspc) {
+							/* Ignore IFS whitespace at end */
+							for (;;) {
+								if (p >= string + ifsp->endoff) {
+									*q = '\0';
+									goto add;
+								}
+								if (*p == (char)CTLESC)
+									p++;
+								ifsspc = strchr(ifs, *p) && strchr(defifs, *p);
+								p++;
+								if (!ifsspc) {
+									break;
+								}
+							}
+						}
+						continue;
+					}
 					*q = '\0';
 					sp = (struct strlist *)stalloc(sizeof *sp);
 					sp->text = start;
 					*arglist->lastp = sp;
 					arglist->lastp = &sp->next;
 					p++;
+					if (maxargs) maxargs--;
 					if (!nulonly) {
 						for (;;) {
 							if (p >= string + ifsp->endoff) {
diff --git a/src/expand.h b/src/expand.h
index 6a90f67..26dc5b4 100644
--- a/src/expand.h
+++ b/src/expand.h
@@ -69,7 +69,7 @@ char *_rmescapes(char *, int);
 int casematch(union node *, char *);
 void recordregion(int, int, int);
 void removerecordregions(int); 
-void ifsbreakup(char *, struct arglist *);
+void ifsbreakup(char *, int, struct arglist *);
 void ifsfree(void);
 
 /* From arith.y */
diff --git a/src/miscbltin.c b/src/miscbltin.c
index b596fd2..39b9c47 100644
--- a/src/miscbltin.c
+++ b/src/miscbltin.c
@@ -67,28 +67,21 @@
  *  less fields than variables -> remaining variables unset.
  *
  *  @param line complete line of input
+ *  @param ac argument count
  *  @param ap argument (variable) list
  *  @param len length of line including trailing '\0'
  */
 static void
-readcmd_handle_line(char *s, char **ap)
+readcmd_handle_line(char *s, int ac, char **ap)
 {
 	struct arglist arglist;
 	struct strlist *sl;
-	char *backup;
-	char *line;
 
-	/* ifsbreakup will fiddle with stack region... */
-	line = stackblock();
 	s = grabstackstr(s);
 
-	/* need a copy, so that delimiters aren't lost
-	 * in case there are more fields than variables */
-	backup = sstrdup(line);
-
 	arglist.lastp = &arglist.list;
 	
-	ifsbreakup(s, &arglist);
+	ifsbreakup(s, ac, &arglist);
 	*arglist.lastp = NULL;
 	ifsfree();
 
@@ -104,21 +97,6 @@ readcmd_handle_line(char *s, char **ap)
 			return;
 		}
 
-		/* remaining fields present, but no variables left. */
-		if (!ap[1] && sl->next) {
-			size_t offset;
-			char *remainder;
-
-			/* FIXME little bit hacky, assuming that ifsbreakup 
-			 * will not modify the length of the string */
-			offset = sl->text - s;
-			remainder = backup + offset;
-			rmescapes(remainder);
-			setvar(*ap, remainder, 0);
-
-			return;
-		}
-		
 		/* set variable to field */
 		rmescapes(sl->text);
 		setvar(*ap, sl->text, 0);
@@ -211,7 +189,7 @@ start:
 out:
 	recordregion(startloc, p - (char *)stackblock(), 0);
 	STACKSTRNUL(p);
-	readcmd_handle_line(p + 1, ap);
+	readcmd_handle_line(p + 1, argc - (ap - argv), ap);
 	return status;
 }
 

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: dash: read does not ignore trailing spaces
  2015-12-03 21:02 ` Harald van Dijk
@ 2015-12-03 21:17   ` Stephane Chazelas
  2015-12-03 21:43     ` Martijn Dekker
  2015-12-03 22:26     ` Harald van Dijk
  0 siblings, 2 replies; 20+ messages in thread
From: Stephane Chazelas @ 2015-12-03 21:17 UTC (permalink / raw)
  To: Harald van Dijk; +Cc: Gioele Barabucci, dash

2015-12-03 22:02:14 +0100, Harald van Dijk:
[....]
>   $ for shell in bash mksh posh zsh; do printf %s: "$shell"; $shell
> -c 'IFS=,; echo a, | { read v; echo "<$v>"; }'; done
>   bash:<a>
>   mksh:<a>
>   posh:<a,>
>   zsh:<a,>
> 
> As far as I can tell, the posh/zsh behaviour is the correct
> behaviour, but I'm not convinced yet my interpretation is correct.
[...]

No, that would be the same as for:

v=a:b:
IFS=:
set -f

set -- $v

It's meant to split into "a" and "b", not "a", "b" and "". As
":" is meant to be treated as a *delimiter* or *terminator*.

That has been discussed a few times on the austin group mailing
list.

zsh and pdksh (and other descendants of the Forsyth shell) treat it as
separator (and are not compliant), mksh (derived from pdksh)
changed it recently. posh (also based on pdksh) still hasn't changed it.

That's a bit counter-intuitive and means for instance you can't
use $IFS to split variables like $PATH/$LD_LIBRARY_PATH...

In the case of read, it even makes less sense because:

~$ echo a:: | sh -c 'IFS=: read a; echo "$a"'
a::
~$ echo a: | sh -c 'IFS=: read a; echo "$a"'
a

But that's how it's specified.

So dash is indeed conformant there.

-- 
Stephane

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: dash: read does not ignore trailing spaces
  2015-12-03 21:17   ` Stephane Chazelas
@ 2015-12-03 21:43     ` Martijn Dekker
  2015-12-03 23:04       ` Stephane Chazelas
  2015-12-03 22:26     ` Harald van Dijk
  1 sibling, 1 reply; 20+ messages in thread
From: Martijn Dekker @ 2015-12-03 21:43 UTC (permalink / raw)
  To: dash

Stephane Chazelas schreef op 03-12-15 om 22:17:
> It's meant to split into "a" and "b", not "a", "b" and "". As
> ":" is meant to be treated as a *delimiter* or *terminator*.

That was my interpretation of the standard, too. So I reported this as a
bug to author of yash, but he reads the standard differently and came up
with a good argument for that. See:

https://osdn.jp/ticket/browse.php?group_id=3863&tid=35283#comment:3863:35283:1435293070

Summarising: POSIX states that "each occurrence in the input of an IFS
character that is not IFS white space, along with any adjacent IFS white
space, shall delimit a field". This *may* be interpreted to read that a
final non-whitespace IFS character denotes an empty final field, because
otherwise that final character wouldn't be delimiting any field, but
only terminating one. It's pretty ambiguous, though.

Based on that discussion I concluded that both interpretations are
defensible, so I didn't report this to the zsh-workers list as a bug.
(In the shell library I'm developing, which you have an earlier copy of,
I gave this quirk the id QRK_IFSFINAL instead of BUG_IFSFINAL to reflect
that it's a quirk.) Maybe this is still worth bringing up on zsh-workers
as a discussion, but lately I haven't had much time.

- Martijn


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: dash: read does not ignore trailing spaces
  2015-12-03 21:17   ` Stephane Chazelas
  2015-12-03 21:43     ` Martijn Dekker
@ 2015-12-03 22:26     ` Harald van Dijk
  2015-12-04 19:51       ` Harald van Dijk
  1 sibling, 1 reply; 20+ messages in thread
From: Harald van Dijk @ 2015-12-03 22:26 UTC (permalink / raw)
  To: Stephane Chazelas; +Cc: Gioele Barabucci, dash

On 03/12/2015 22:17, Stephane Chazelas wrote:
> 2015-12-03 22:02:14 +0100, Harald van Dijk:
> [....]
>>    $ for shell in bash mksh posh zsh; do printf %s: "$shell"; $shell
>> -c 'IFS=,; echo a, | { read v; echo "<$v>"; }'; done
>>    bash:<a>
>>    mksh:<a>
>>    posh:<a,>
>>    zsh:<a,>
>>
>> As far as I can tell, the posh/zsh behaviour is the correct
>> behaviour, but I'm not convinced yet my interpretation is correct.
> [...]
>
> No, that would be the same as for:
>
> v=a:b:
> IFS=:
> set -f
>
> set -- $v
>
> It's meant to split into "a" and "b", not "a", "b" and "". As
> ":" is meant to be treated as a *delimiter* or *terminator*.
>
> That has been discussed a few times on the austin group mailing
> list.
>
> zsh and pdksh (and other descendants of the Forsyth shell) treat it as
> separator (and are not compliant), mksh (derived from pdksh)
> changed it recently. posh (also based on pdksh) still hasn't changed it.

zsh indeed expands this into "a", "b" and "". The same version of posh 
that gives <a,> for my test gives just "a" and "b" for yours though.

I do see your point. Thanks for the clear example, I think I agree with 
you, the description of field splitting mentions that delimiters are 
used as terminators:

   "The shell shall treat each character of the IFS as a delimiter and 
use the delimiters as field terminators to [...]"

It should not be much of a problem to extend the patch I posted to cover 
the rules as you describe them, I will make an attempt at this later.

Cheers,
Harald van Dijk

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: dash: read does not ignore trailing spaces
  2015-12-03 21:43     ` Martijn Dekker
@ 2015-12-03 23:04       ` Stephane Chazelas
  2015-12-03 23:17         ` Stephane Chazelas
  0 siblings, 1 reply; 20+ messages in thread
From: Stephane Chazelas @ 2015-12-03 23:04 UTC (permalink / raw)
  To: Martijn Dekker; +Cc: dash

2015-12-03 22:43:39 +0100, Martijn Dekker:
> Stephane Chazelas schreef op 03-12-15 om 22:17:
> > It's meant to split into "a" and "b", not "a", "b" and "". As
> > ":" is meant to be treated as a *delimiter* or *terminator*.
> 
> That was my interpretation of the standard, too. So I reported this as a
> bug to author of yash, but he reads the standard differently and came up
> with a good argument for that. See:
> 
> https://osdn.jp/ticket/browse.php?group_id=3863&tid=35283#comment:3863:35283:1435293070
> 
> Summarising: POSIX states that "each occurrence in the input of an IFS
> character that is not IFS white space, along with any adjacent IFS white
> space, shall delimit a field". This *may* be interpreted to read that a
> final non-whitespace IFS character denotes an empty final field, because
> otherwise that final character wouldn't be delimiting any field, but
> only terminating one. It's pretty ambiguous, though.
[...]

I agree the spec is not very clear
http://thread.gmane.org/gmane.comp.shells.bash.bugs/4825
http://thread.gmane.org/gmane.comp.shells.bash.bugs/15768

But see this interpretation:
https://standards.ieee.org/findstds/interps/1003.1-2001/1003.1-2001-98.html

I can't find the austin-group discussions, but I'm pretty sure
I've seen several and Chet is refering to one from 2005 over
there.

-- 
Stephane

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: dash: read does not ignore trailing spaces
  2015-12-03 23:04       ` Stephane Chazelas
@ 2015-12-03 23:17         ` Stephane Chazelas
  2015-12-04  0:00           ` Stephane Chazelas
  0 siblings, 1 reply; 20+ messages in thread
From: Stephane Chazelas @ 2015-12-03 23:17 UTC (permalink / raw)
  To: dash

2015-12-03 23:04:31 +0000, Stephane Chazelas:
[...]
> > Summarising: POSIX states that "each occurrence in the input of an IFS
> > character that is not IFS white space, along with any adjacent IFS white
> > space, shall delimit a field". This *may* be interpreted to read that a
> > final non-whitespace IFS character denotes an empty final field, because
> > otherwise that final character wouldn't be delimiting any field, but
> > only terminating one. It's pretty ambiguous, though.
> [...]
> 
> I agree the spec is not very clear
> http://thread.gmane.org/gmane.comp.shells.bash.bugs/4825
> http://thread.gmane.org/gmane.comp.shells.bash.bugs/15768
> 
> But see this interpretation:
> https://standards.ieee.org/findstds/interps/1003.1-2001/1003.1-2001-98.html
> 
> I can't find the austin-group discussions, but I'm pretty sure
> I've seen several and Chet is refering to one from 2005 over
> there.
[...]

See also:
https://groups.google.com/forum/#!topic/comp.unix.shell/krZy2rvnv2g
(Geoff is from the OpenGroup).

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=512438
I actually was the one raising it for posh. I (and the posh
maintainer apparently) failed to notice that it was applying to
"read" there as well.

-- 
Stephane


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: dash: read does not ignore trailing spaces
  2015-12-03 23:17         ` Stephane Chazelas
@ 2015-12-04  0:00           ` Stephane Chazelas
  0 siblings, 0 replies; 20+ messages in thread
From: Stephane Chazelas @ 2015-12-04  0:00 UTC (permalink / raw)
  To: dash

2015-12-03 23:17:58 +0000, Stephane Chazelas:
> 2015-12-03 23:04:31 +0000, Stephane Chazelas:
> [...]
> > > Summarising: POSIX states that "each occurrence in the input of an IFS
> > > character that is not IFS white space, along with any adjacent IFS white
> > > space, shall delimit a field". This *may* be interpreted to read that a
> > > final non-whitespace IFS character denotes an empty final field, because
> > > otherwise that final character wouldn't be delimiting any field, but
> > > only terminating one. It's pretty ambiguous, though.
> > [...]
> > 
> > I agree the spec is not very clear
> > http://thread.gmane.org/gmane.comp.shells.bash.bugs/4825
> > http://thread.gmane.org/gmane.comp.shells.bash.bugs/15768
> > 
> > But see this interpretation:
> > https://standards.ieee.org/findstds/interps/1003.1-2001/1003.1-2001-98.html
> > 
> > I can't find the austin-group discussions, but I'm pretty sure
> > I've seen several and Chet is refering to one from 2005 over
> > there.
> [...]
> 
> See also:
> https://groups.google.com/forum/#!topic/comp.unix.shell/krZy2rvnv2g
> (Geoff is from the OpenGroup).
> 
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=512438
> I actually was the one raising it for posh. I (and the posh
> maintainer apparently) failed to notice that it was applying to
> "read" there as well.
[...]

More:

http://thread.gmane.org/gmane.comp.shells.bash.bugs/9661/focus=9662

Where Chet (I think) refers to:
https://standards.ieee.org/findstds/interps/1003-2-92_int/pasc-1003.2-98.html

The 2005 Austin group discussion can be seen on the web archive:
http://web.archive.org/web/20050301075813/http://www.opengroup.org/austin/mailarchives/

One article there was forwarded to the zsh mailing list:
http://www.zsh.org/mla/workers/2005/msg00306.html
(but not acted upon)

Which leads to https://www.opengroup.org/austin/aardvark/latest/xcubug2.txt
confirming that
https://standards.ieee.org/findstds/interps/1003.1-2001/1003.1-2001-98.html
is contemporary and a follow-up of that 2005 discussion.

-- 
Stephane

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: dash: read does not ignore trailing spaces
  2015-12-03 22:26     ` Harald van Dijk
@ 2015-12-04 19:51       ` Harald van Dijk
  2016-01-29 12:57         ` Martijn Dekker
  2016-06-06  8:48         ` [PATCH v2] builtin: Fix handling of trailing IFS white spaces Herbert Xu
  0 siblings, 2 replies; 20+ messages in thread
From: Harald van Dijk @ 2015-12-04 19:51 UTC (permalink / raw)
  To: Stephane Chazelas; +Cc: Gioele Barabucci, dash

[-- Attachment #1: Type: text/plain, Size: 1349 bytes --]

On 03/12/2015 23:26, Harald van Dijk wrote:
> On 03/12/2015 22:17, Stephane Chazelas wrote:
>> 2015-12-03 22:02:14 +0100, Harald van Dijk:
>> [....]
>>>    $ for shell in bash mksh posh zsh; do printf %s: "$shell"; $shell
>>> -c 'IFS=,; echo a, | { read v; echo "<$v>"; }'; done
>>>    bash:<a>
>>>    mksh:<a>
>>>    posh:<a,>
>>>    zsh:<a,>
>>>
>>> As far as I can tell, the posh/zsh behaviour is the correct
>>> behaviour, but I'm not convinced yet my interpretation is correct.
>> [...]
>>
>> zsh and pdksh (and other descendants of the Forsyth shell) treat it as
>> separator (and are not compliant), mksh (derived from pdksh)
>> changed it recently. posh (also based on pdksh) still hasn't changed it.
>
> [...]
> I do see your point. Thanks for the clear example, I think I agree with
> you, the description of field splitting mentions that delimiters are
> used as terminators:
>
>    "The shell shall treat each character of the IFS as a delimiter and
> use the delimiters as field terminators to [...]"
>
> It should not be much of a problem to extend the patch I posted to cover
> the rules as you describe them, I will make an attempt at this later.

Here it is. Attached is an updated patch that ignores the complete 
terminator if only a single field remains, otherwise ignores only 
trailing IFS whitespace.

Cheers,
Harald van Dijk

[-- Attachment #2: dash-read-ifs.patch --]
[-- Type: text/plain, Size: 4515 bytes --]

diff --git a/src/expand.c b/src/expand.c
index b2d710d..c6e87d5 100644
--- a/src/expand.c
+++ b/src/expand.c
@@ -203,7 +203,7 @@ expandarg(union node *arg, struct arglist *arglist, int flag)
 	 * TODO - EXP_REDIR
 	 */
 	if (flag & EXP_FULL) {
-		ifsbreakup(p, &exparg);
+		ifsbreakup(p, 0, &exparg);
 		*exparg.lastp = NULL;
 		exparg.lastp = &exparg.list;
 		expandmeta(exparg.list, flag);
@@ -1016,9 +1016,11 @@ recordregion(int start, int end, int nulonly)
  * Break the argument string into pieces based upon IFS and add the
  * strings to the argument list.  The regions of the string to be
  * searched for IFS characters have been stored by recordregion.
+ * If maxargs is non-zero, at most maxargs arguments will be created, by
+ * joining together the last arguments.
  */
 void
-ifsbreakup(char *string, struct arglist *arglist)
+ifsbreakup(char *string, int maxargs, struct arglist *arglist)
 {
 	struct ifsregion *ifsp;
 	struct strlist *sp;
@@ -1054,12 +1056,58 @@ ifsbreakup(char *string, struct arglist *arglist)
 						start = p;
 						continue;
 					}
+					/* If only reading one more argument:
+					 * If we have exactly one field, read that field without its terminator.
+					 * If we have more than one field, read all fields including their terminators,
+					 * except for trailing IFS whitespace.
+					 *
+					 * This means that if we have only IFS characters left, and at most one
+					 * of them is non-whitespace, we stop reading here.
+					 * Otherwise, we read all the remaining characters except for trailing
+					 * IFS whitespace.
+					 *
+					 * To keep track of this, the variable s indicates what we've seen so far.
+					 * 0: only IFS whitespace.
+					 * 1: exactly one IFS non-whitespace character.
+					 * 2: non-IFS characters, or multiple IFS non-whitespace characters.
+					 *
+					 * In any case, q indicates the start of the characters to remove, or NULL
+					 * if no characters should be removed.
+					 */
+					if (maxargs == 1) {
+						int s = !ifsspc;
+						q = p;
+						for (;;) {
+							p++;
+							if (p >= string + ifsp->endoff) {
+								if (q)
+									*q = '\0';
+								goto add;
+							}
+							if (*p == (char)CTLESC)
+								p++;
+							if (strchr(ifs, *p)) {
+								if (strchr(defifs, *p)) {
+									if (!q)
+										q = p;
+									continue;
+								}
+								if (s == 0) {
+									s = 1;
+									continue;
+								}
+							}
+							s = 2;
+							q = 0;
+						}
+					}
 					*q = '\0';
 					sp = (struct strlist *)stalloc(sizeof *sp);
 					sp->text = start;
 					*arglist->lastp = sp;
 					arglist->lastp = &sp->next;
 					p++;
+					if (maxargs) maxargs--;
 					if (!nulonly) {
 						for (;;) {
 							if (p >= string + ifsp->endoff) {
diff --git a/src/expand.h b/src/expand.h
index 6a90f67..26dc5b4 100644
--- a/src/expand.h
+++ b/src/expand.h
@@ -69,7 +69,7 @@ char *_rmescapes(char *, int);
 int casematch(union node *, char *);
 void recordregion(int, int, int);
 void removerecordregions(int); 
-void ifsbreakup(char *, struct arglist *);
+void ifsbreakup(char *, int, struct arglist *);
 void ifsfree(void);
 
 /* From arith.y */
diff --git a/src/miscbltin.c b/src/miscbltin.c
index b596fd2..39b9c47 100644
--- a/src/miscbltin.c
+++ b/src/miscbltin.c
@@ -67,28 +67,21 @@
  *  less fields than variables -> remaining variables unset.
  *
  *  @param line complete line of input
+ *  @param ac argument count
  *  @param ap argument (variable) list
  *  @param len length of line including trailing '\0'
  */
 static void
-readcmd_handle_line(char *s, char **ap)
+readcmd_handle_line(char *s, int ac, char **ap)
 {
 	struct arglist arglist;
 	struct strlist *sl;
-	char *backup;
-	char *line;
 
-	/* ifsbreakup will fiddle with stack region... */
-	line = stackblock();
 	s = grabstackstr(s);
 
-	/* need a copy, so that delimiters aren't lost
-	 * in case there are more fields than variables */
-	backup = sstrdup(line);
-
 	arglist.lastp = &arglist.list;
 	
-	ifsbreakup(s, &arglist);
+	ifsbreakup(s, ac, &arglist);
 	*arglist.lastp = NULL;
 	ifsfree();
 
@@ -104,21 +97,6 @@ readcmd_handle_line(char *s, char **ap)
 			return;
 		}
 
-		/* remaining fields present, but no variables left. */
-		if (!ap[1] && sl->next) {
-			size_t offset;
-			char *remainder;
-
-			/* FIXME little bit hacky, assuming that ifsbreakup 
-			 * will not modify the length of the string */
-			offset = sl->text - s;
-			remainder = backup + offset;
-			rmescapes(remainder);
-			setvar(*ap, remainder, 0);

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: dash: read does not ignore trailing spaces
  2015-12-04 19:51       ` Harald van Dijk
@ 2016-01-29 12:57         ` Martijn Dekker
  2016-06-06  8:48         ` [PATCH v2] builtin: Fix handling of trailing IFS white spaces Herbert Xu
  1 sibling, 0 replies; 20+ messages in thread
From: Martijn Dekker @ 2016-01-29 12:57 UTC (permalink / raw)
  To: dash

Harald van Dijk schreef op 04-12-15 om 19:51:
> Here it is. Attached is an updated patch that ignores the complete
> terminator if only a single field remains, otherwise ignores only
> trailing IFS whitespace.

I've tested the patch and it looks like it fixes the bug nicely.

With the patch, dash (like bash, ksh93, mksh and FreeBSD /bin/sh)
completely passes the IFS test suite written by the AT&T ksh93 people:
http://web.archive.org/web/20050414022354/http://www.research.att.com/~gsf/public/ifs.sh
(archive.org link because the AT&T research site is down)

Any chance of getting this patch pushed into the git repo?

Thanks,

- M.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2] builtin: Fix handling of trailing IFS white spaces
  2015-12-04 19:51       ` Harald van Dijk
  2016-01-29 12:57         ` Martijn Dekker
@ 2016-06-06  8:48         ` Herbert Xu
  2016-06-06 20:43           ` Harald van Dijk
  1 sibling, 1 reply; 20+ messages in thread
From: Herbert Xu @ 2016-06-06  8:48 UTC (permalink / raw)
  To: Harald van Dijk; +Cc: Stephane Chazelas, Gioele Barabucci, dash

On Fri, Dec 04, 2015 at 08:51:19PM +0100, Harald van Dijk wrote:
>
> Here it is. Attached is an updated patch that ignores the complete 
> terminator if only a single field remains, otherwise ignores only 
> trailing IFS whitespace.

There is a little bug in the patch if you happened to start with
an escaped IFS, otherwise I think it's good.

However, I think this loop is getting too deep so I've reworked
it to have just one loop over the string instead of three (two
nested loops).

Please test this.

Thanks!

---8<---
The read built-in does not handle trailing IFS white spaces in
the right way, when there are more fields than variables.  Part
of the problem is that this case is handled outside of ifsbreakup.

Harald van Dijk wrote a patch to fix this by moving the magic
into ifsbreakup itself.

This patch further reorganises the ifsbreakup loop by having only
one loop over the whole string.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/src/expand.c b/src/expand.c
index b2d710d..921eecf 100644
--- a/src/expand.c
+++ b/src/expand.c
@@ -50,6 +50,7 @@
 #include <glob.h>
 #endif
 #include <ctype.h>
+#include <stdbool.h>
 
 /*
  * Routines to expand arguments to commands.  We have to deal with
@@ -203,7 +204,7 @@ expandarg(union node *arg, struct arglist *arglist, int flag)
 	 * TODO - EXP_REDIR
 	 */
 	if (flag & EXP_FULL) {
-		ifsbreakup(p, &exparg);
+		ifsbreakup(p, -1, &exparg);
 		*exparg.lastp = NULL;
 		exparg.lastp = &exparg.list;
 		expandmeta(exparg.list, flag);
@@ -1016,15 +1017,18 @@ recordregion(int start, int end, int nulonly)
  * Break the argument string into pieces based upon IFS and add the
  * strings to the argument list.  The regions of the string to be
  * searched for IFS characters have been stored by recordregion.
+ * If maxargs is non-negative, at most maxargs arguments will be created, by
+ * joining together the last arguments.
  */
 void
-ifsbreakup(char *string, struct arglist *arglist)
+ifsbreakup(char *string, int maxargs, struct arglist *arglist)
 {
 	struct ifsregion *ifsp;
 	struct strlist *sp;
 	char *start;
 	char *p;
 	char *q;
+	char *r = NULL;
 	const char *ifs, *realifs;
 	int ifsspc;
 	int nulonly;
@@ -1042,56 +1046,94 @@ ifsbreakup(char *string, struct arglist *arglist)
 			ifs = nulonly ? nullstr : realifs;
 			ifsspc = 0;
 			while (p < string + ifsp->endoff) {
+				int c;
+				bool isifs;
+				bool isdefifs;
+
 				q = p;
-				if (*p == (char)CTLESC)
-					p++;
-				if (strchr(ifs, *p)) {
+				c = *p++;
+				if (c == (char)CTLESC)
+					c = *p++;
+
+				isifs = strchr(ifs, c);
+				isdefifs = false;
+				if (isifs)
+					isdefifs = strchr(defifs, c);
+
+				/* If only reading one more argument:
+				 * If we have exactly one field,
+				 * read that field without its terminator.
+				 * If we have more than one field,
+				 * read all fields including their terminators,
+				 * except for trailing IFS whitespace.
+				 *
+				 * This means that if we have only IFS
+				 * characters left, and at most one
+				 * of them is non-whitespace, we stop
+				 * reading here.
+				 * Otherwise, we read all the remaining
+				 * characters except for trailing
+				 * IFS whitespace.
+				 *
+				 * In any case, r indicates the start
+				 * of the characters to remove, or NULL
+				 * if no characters should be removed.
+				 */
+				if (!maxargs) {
+					if (isdefifs) {
+						if (!r)
+							r = q;
+						continue;
+					}
+
+					if (!(isifs && ifsspc))
+						r = NULL;
+
+					ifsspc = 0;
+					continue;
+				}
+
+				if (ifsspc) {
+					if (isdefifs)
+						continue;
+
+					if (!isifs) {
+						ifsspc = 0;
+						p = q;
+					}
+
+					start = p;
+				}
+
+				if (isifs) {
 					if (!nulonly)
-						ifsspc = (strchr(defifs, *p) != NULL);
+						ifsspc = isdefifs;
 					/* Ignore IFS whitespace at start */
 					if (q == start && ifsspc) {
-						p++;
 						start = p;
 						continue;
 					}
+					if (maxargs > 0 && !--maxargs) {
+						r = q;
+						continue;
+					}
 					*q = '\0';
 					sp = (struct strlist *)stalloc(sizeof *sp);
 					sp->text = start;
 					*arglist->lastp = sp;
 					arglist->lastp = &sp->next;
-					p++;
-					if (!nulonly) {
-						for (;;) {
-							if (p >= string + ifsp->endoff) {
-								break;
-							}
-							q = p;
-							if (*p == (char)CTLESC)
-								p++;
-							if (strchr(ifs, *p) == NULL ) {
-								p = q;
-								break;
-							} else if (strchr(defifs, *p) == NULL) {
-								if (ifsspc) {
-									p++;
-									ifsspc = 0;
-								} else {
-									p = q;
-									break;
-								}
-							} else
-								p++;
-						}
-					}
 					start = p;
 				} else
-					p++;
+					ifsspc = 0;
 			}
 		} while ((ifsp = ifsp->next) != NULL);
 		if (nulonly)
 			goto add;
 	}
 
+	if (r)
+		*r = '\0';
+
 	if (!*start)
 		return;
 
diff --git a/src/expand.h b/src/expand.h
index 6a90f67..26dc5b4 100644
--- a/src/expand.h
+++ b/src/expand.h
@@ -69,7 +69,7 @@ char *_rmescapes(char *, int);
 int casematch(union node *, char *);
 void recordregion(int, int, int);
 void removerecordregions(int); 
-void ifsbreakup(char *, struct arglist *);
+void ifsbreakup(char *, int, struct arglist *);
 void ifsfree(void);
 
 /* From arith.y */
diff --git a/src/miscbltin.c b/src/miscbltin.c
index b596fd2..39b9c47 100644
--- a/src/miscbltin.c
+++ b/src/miscbltin.c
@@ -67,28 +67,21 @@
  *  less fields than variables -> remaining variables unset.
  *
  *  @param line complete line of input
+ *  @param ac argument count
  *  @param ap argument (variable) list
  *  @param len length of line including trailing '\0'
  */
 static void
-readcmd_handle_line(char *s, char **ap)
+readcmd_handle_line(char *s, int ac, char **ap)
 {
 	struct arglist arglist;
 	struct strlist *sl;
-	char *backup;
-	char *line;
 
-	/* ifsbreakup will fiddle with stack region... */
-	line = stackblock();
 	s = grabstackstr(s);
 
-	/* need a copy, so that delimiters aren't lost
-	 * in case there are more fields than variables */
-	backup = sstrdup(line);
-
 	arglist.lastp = &arglist.list;
 	
-	ifsbreakup(s, &arglist);
+	ifsbreakup(s, ac, &arglist);
 	*arglist.lastp = NULL;
 	ifsfree();
 
@@ -104,21 +97,6 @@ readcmd_handle_line(char *s, char **ap)
 			return;
 		}
 
-		/* remaining fields present, but no variables left. */
-		if (!ap[1] && sl->next) {
-			size_t offset;
-			char *remainder;
-
-			/* FIXME little bit hacky, assuming that ifsbreakup 
-			 * will not modify the length of the string */
-			offset = sl->text - s;
-			remainder = backup + offset;
-			rmescapes(remainder);
-			setvar(*ap, remainder, 0);
-
-			return;
-		}
-		
 		/* set variable to field */
 		rmescapes(sl->text);
 		setvar(*ap, sl->text, 0);
@@ -211,7 +189,7 @@ start:
 out:
 	recordregion(startloc, p - (char *)stackblock(), 0);
 	STACKSTRNUL(p);
-	readcmd_handle_line(p + 1, ap);
+	readcmd_handle_line(p + 1, argc - (ap - argv), ap);
 	return status;
 }
 
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v2] builtin: Fix handling of trailing IFS white spaces
  2016-06-06  8:48         ` [PATCH v2] builtin: Fix handling of trailing IFS white spaces Herbert Xu
@ 2016-06-06 20:43           ` Harald van Dijk
  2016-06-07  9:25             ` [PATCH v3] " Herbert Xu
  0 siblings, 1 reply; 20+ messages in thread
From: Harald van Dijk @ 2016-06-06 20:43 UTC (permalink / raw)
  To: Herbert Xu, dash; +Cc: Stephane Chazelas, Gioele Barabucci

On 06/06/16 10:48, Herbert Xu wrote:
> On Fri, Dec 04, 2015 at 08:51:19PM +0100, Harald van Dijk wrote:
>>
>> Here it is. Attached is an updated patch that ignores the complete
>> terminator if only a single field remains, otherwise ignores only
>> trailing IFS whitespace.
>
> There is a little bug in the patch if you happened to start with
> an escaped IFS, otherwise I think it's good.

Thanks. If starting with an escaped IFS, the CTLESC should already have 
been skipped before the if (maxargs == 1) check gets executed. That's 
done at the start of the outer loop, but is not visible in the patch 
because it's unmodified from the original code. Or did I misunderstand 
you here?

> However, I think this loop is getting too deep so I've reworked
> it to have just one loop over the string instead of three (two
> nested loops).

> Please test this.

With the link sent by Martijn Dekker earlier in this thread:

 
<http://web.archive.org/web/20050414022354/http://www.research.att.com/~gsf/public/ifs.sh>

it gives a lot of failures:

# tests 6856 passed 1844 failed 5012

I'll look into it in more detail later, if no one beats me to it.

Cheers,
Harald van Dijk

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v3] builtin: Fix handling of trailing IFS white spaces
  2016-06-06 20:43           ` Harald van Dijk
@ 2016-06-07  9:25             ` Herbert Xu
  2016-06-12 10:35               ` Harald van Dijk
  0 siblings, 1 reply; 20+ messages in thread
From: Herbert Xu @ 2016-06-07  9:25 UTC (permalink / raw)
  To: Harald van Dijk; +Cc: dash, Stephane Chazelas, Gioele Barabucci

On Mon, Jun 06, 2016 at 10:43:39PM +0200, Harald van Dijk wrote:
>
> Thanks. If starting with an escaped IFS, the CTLESC should already
> have been skipped before the if (maxargs == 1) check gets executed.
> That's done at the start of the outer loop, but is not visible in
> the patch because it's unmodified from the original code. Or did I
> misunderstand you here?

If the very first character later gets zeroed, you'd be zeroing
the character after the CTLESC, leaving the CTLESC in the value
to be assigned to the last variable of read.

If you got rid of the very first q=p assignment it should just work.

> With the link sent by Martijn Dekker earlier in this thread:
> 
> 
> <http://web.archive.org/web/20050414022354/http://www.research.att.com/~gsf/public/ifs.sh>
> 
> it gives a lot of failures:
> 
> # tests 6856 passed 1844 failed 5012
> 
> I'll look into it in more detail later, if no one beats me to it.

Thanks.  I've fixed up the buglet causing the failures:

---8<--
The read built-in does not handle trailing IFS white spaces in
the right way, when there are more fields than variables.  Part
of the problem is that this case is handled outside of ifsbreakup.

Harald van Dijk wrote a patch to fix this by moving the magic
into ifsbreakup itself.

This patch further reorganises the ifsbreakup loop by having only
one loop over the whole string.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/src/expand.c b/src/expand.c
index b2d710d..2207eba 100644
--- a/src/expand.c
+++ b/src/expand.c
@@ -50,6 +50,7 @@
 #include <glob.h>
 #endif
 #include <ctype.h>
+#include <stdbool.h>
 
 /*
  * Routines to expand arguments to commands.  We have to deal with
@@ -203,7 +204,7 @@ expandarg(union node *arg, struct arglist *arglist, int flag)
 	 * TODO - EXP_REDIR
 	 */
 	if (flag & EXP_FULL) {
-		ifsbreakup(p, &exparg);
+		ifsbreakup(p, -1, &exparg);
 		*exparg.lastp = NULL;
 		exparg.lastp = &exparg.list;
 		expandmeta(exparg.list, flag);
@@ -1016,15 +1017,18 @@ recordregion(int start, int end, int nulonly)
  * Break the argument string into pieces based upon IFS and add the
  * strings to the argument list.  The regions of the string to be
  * searched for IFS characters have been stored by recordregion.
+ * If maxargs is non-negative, at most maxargs arguments will be created, by
+ * joining together the last arguments.
  */
 void
-ifsbreakup(char *string, struct arglist *arglist)
+ifsbreakup(char *string, int maxargs, struct arglist *arglist)
 {
 	struct ifsregion *ifsp;
 	struct strlist *sp;
 	char *start;
 	char *p;
 	char *q;
+	char *r = NULL;
 	const char *ifs, *realifs;
 	int ifsspc;
 	int nulonly;
@@ -1042,16 +1046,75 @@ ifsbreakup(char *string, struct arglist *arglist)
 			ifs = nulonly ? nullstr : realifs;
 			ifsspc = 0;
 			while (p < string + ifsp->endoff) {
+				int c;
+				bool isifs;
+				bool isdefifs;
+
 				q = p;
-				if (*p == (char)CTLESC)
-					p++;
-				if (strchr(ifs, *p)) {
+				c = *p++;
+				if (c == (char)CTLESC)
+					c = *p++;
+
+				isifs = strchr(ifs, c);
+				isdefifs = false;
+				if (isifs)
+					isdefifs = strchr(defifs, c);
+
+				/* If only reading one more argument:
+				 * If we have exactly one field,
+				 * read that field without its terminator.
+				 * If we have more than one field,
+				 * read all fields including their terminators,
+				 * except for trailing IFS whitespace.
+				 *
+				 * This means that if we have only IFS
+				 * characters left, and at most one
+				 * of them is non-whitespace, we stop
+				 * reading here.
+				 * Otherwise, we read all the remaining
+				 * characters except for trailing
+				 * IFS whitespace.
+				 *
+				 * In any case, r indicates the start
+				 * of the characters to remove, or NULL
+				 * if no characters should be removed.
+				 */
+				if (!maxargs) {
+					if (isdefifs) {
+						if (!r)
+							r = q;
+						continue;
+					}
+
+					if (!(isifs && ifsspc))
+						r = NULL;
+
+					ifsspc = 0;
+					continue;
+				}
+
+				if (ifsspc) {
+					if (isdefifs)
+						continue;
+
+					if (isifs)
+						q = p;
+
+					start = q;
+					isifs = false;
+				}
+
+				if (isifs) {
 					if (!nulonly)
-						ifsspc = (strchr(defifs, *p) != NULL);
+						ifsspc = isdefifs;
 					/* Ignore IFS whitespace at start */
 					if (q == start && ifsspc) {
-						p++;
 						start = p;
+						ifsspc = 0;
+						continue;
+					}
+					if (maxargs > 0 && !--maxargs) {
+						r = q;
 						continue;
 					}
 					*q = '\0';
@@ -1059,39 +1122,20 @@ ifsbreakup(char *string, struct arglist *arglist)
 					sp->text = start;
 					*arglist->lastp = sp;
 					arglist->lastp = &sp->next;
-					p++;
-					if (!nulonly) {
-						for (;;) {
-							if (p >= string + ifsp->endoff) {
-								break;
-							}
-							q = p;
-							if (*p == (char)CTLESC)
-								p++;
-							if (strchr(ifs, *p) == NULL ) {
-								p = q;
-								break;
-							} else if (strchr(defifs, *p) == NULL) {
-								if (ifsspc) {
-									p++;
-									ifsspc = 0;
-								} else {
-									p = q;
-									break;
-								}
-							} else
-								p++;
-						}
-					}
 					start = p;
-				} else
-					p++;
+					continue;
+				}
+
+				ifsspc = 0;
 			}
 		} while ((ifsp = ifsp->next) != NULL);
 		if (nulonly)
 			goto add;
 	}
 
+	if (r)
+		*r = '\0';
+
 	if (!*start)
 		return;
 
diff --git a/src/expand.h b/src/expand.h
index 6a90f67..26dc5b4 100644
--- a/src/expand.h
+++ b/src/expand.h
@@ -69,7 +69,7 @@ char *_rmescapes(char *, int);
 int casematch(union node *, char *);
 void recordregion(int, int, int);
 void removerecordregions(int); 
-void ifsbreakup(char *, struct arglist *);
+void ifsbreakup(char *, int, struct arglist *);
 void ifsfree(void);
 
 /* From arith.y */
diff --git a/src/miscbltin.c b/src/miscbltin.c
index b596fd2..39b9c47 100644
--- a/src/miscbltin.c
+++ b/src/miscbltin.c
@@ -67,28 +67,21 @@
  *  less fields than variables -> remaining variables unset.
  *
  *  @param line complete line of input
+ *  @param ac argument count
  *  @param ap argument (variable) list
  *  @param len length of line including trailing '\0'
  */
 static void
-readcmd_handle_line(char *s, char **ap)
+readcmd_handle_line(char *s, int ac, char **ap)
 {
 	struct arglist arglist;
 	struct strlist *sl;
-	char *backup;
-	char *line;
 
-	/* ifsbreakup will fiddle with stack region... */
-	line = stackblock();
 	s = grabstackstr(s);
 
-	/* need a copy, so that delimiters aren't lost
-	 * in case there are more fields than variables */
-	backup = sstrdup(line);
-
 	arglist.lastp = &arglist.list;
 	
-	ifsbreakup(s, &arglist);
+	ifsbreakup(s, ac, &arglist);
 	*arglist.lastp = NULL;
 	ifsfree();
 
@@ -104,21 +97,6 @@ readcmd_handle_line(char *s, char **ap)
 			return;
 		}
 
-		/* remaining fields present, but no variables left. */
-		if (!ap[1] && sl->next) {
-			size_t offset;
-			char *remainder;
-
-			/* FIXME little bit hacky, assuming that ifsbreakup 
-			 * will not modify the length of the string */
-			offset = sl->text - s;
-			remainder = backup + offset;
-			rmescapes(remainder);
-			setvar(*ap, remainder, 0);
-
-			return;
-		}
-		
 		/* set variable to field */
 		rmescapes(sl->text);
 		setvar(*ap, sl->text, 0);
@@ -211,7 +189,7 @@ start:
 out:
 	recordregion(startloc, p - (char *)stackblock(), 0);
 	STACKSTRNUL(p);
-	readcmd_handle_line(p + 1, ap);
+	readcmd_handle_line(p + 1, argc - (ap - argv), ap);
 	return status;
 }
 
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v3] builtin: Fix handling of trailing IFS white spaces
  2016-06-07  9:25             ` [PATCH v3] " Herbert Xu
@ 2016-06-12 10:35               ` Harald van Dijk
  2016-06-12 11:06                 ` Herbert Xu
  0 siblings, 1 reply; 20+ messages in thread
From: Harald van Dijk @ 2016-06-12 10:35 UTC (permalink / raw)
  To: Herbert Xu, dash; +Cc: Stephane Chazelas, Gioele Barabucci

On 07/06/16 11:25, Herbert Xu wrote:
> If the very first character later gets zeroed, you'd be zeroing
> the character after the CTLESC, leaving the CTLESC in the value
> to be assigned to the last variable of read.

Ah, I see. Thanks for the explanation. While trying to make it misbehave 
I found that rmescapes removes trailing CTLESC, and readcmd_handle_line 
calls rmescapes before the trailing CTLESC gets a chance to cause 
problems. I was now still only able to see the problem when adding some 
extra debugging statements.

> If you got rid of the very first q=p assignment it should just work.

There is a later q=p assignment too that gets performed after CTLESC has 
been skipped. That one also isn't going to cause problems in practice, 
since it only gets executed when a character is both IFS and IFS 
whitespace, but when called from readcmd_handle_line, there's never 
going to be escaped IFS whitespace.

> Thanks.  I've fixed up the buglet causing the failures:

The results are a lot better now, but I did spot a problem:

src/dash -c 'X="x  "; echo $X'

This is supposed to print "x\n", but the IFS breakup of $X generates two 
words, one "x", one " ", meaning "x  \n" gets printed instead. I think 
this is intended to get fixed up in the if (ifsspc) block, but that 
block doesn't get executed when there are no more characters after the 
spaces.

Cheers,
Harald van Dijk

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3] builtin: Fix handling of trailing IFS white spaces
  2016-06-12 10:35               ` Harald van Dijk
@ 2016-06-12 11:06                 ` Herbert Xu
  2016-06-12 11:12                   ` Harald van Dijk
  0 siblings, 1 reply; 20+ messages in thread
From: Herbert Xu @ 2016-06-12 11:06 UTC (permalink / raw)
  To: Harald van Dijk; +Cc: dash, Stephane Chazelas, Gioele Barabucci

On Sun, Jun 12, 2016 at 12:35:15PM +0200, Harald van Dijk wrote:
>
> The results are a lot better now, but I did spot a problem:
> 
> src/dash -c 'X="x  "; echo $X'
> 
> This is supposed to print "x\n", but the IFS breakup of $X generates
> two words, one "x", one " ", meaning "x  \n" gets printed instead. I
> think this is intended to get fixed up in the if (ifsspc) block, but
> that block doesn't get executed when there are no more characters
> after the spaces.

Weird, I can't reproduce this at all:

$ src/dash -c 'X="x "; echo $X' | cat -A
x$
$ 

What am I doing wrong?

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3] builtin: Fix handling of trailing IFS white spaces
  2016-06-12 11:06                 ` Herbert Xu
@ 2016-06-12 11:12                   ` Harald van Dijk
  2016-06-12 12:17                     ` [PATCH v4] " Herbert Xu
  0 siblings, 1 reply; 20+ messages in thread
From: Harald van Dijk @ 2016-06-12 11:12 UTC (permalink / raw)
  To: Herbert Xu, dash; +Cc: Stephane Chazelas, Gioele Barabucci

On 12/06/16 13:06, Herbert Xu wrote:
> On Sun, Jun 12, 2016 at 12:35:15PM +0200, Harald van Dijk wrote:
>>
>> The results are a lot better now, but I did spot a problem:
>>
>> src/dash -c 'X="x  "; echo $X'
>>
>> This is supposed to print "x\n", but the IFS breakup of $X generates
>> two words, one "x", one " ", meaning "x  \n" gets printed instead. I
>> think this is intended to get fixed up in the if (ifsspc) block, but
>> that block doesn't get executed when there are no more characters
>> after the spaces.
>
> Weird, I can't reproduce this at all:
>
> $ src/dash -c 'X="x "; echo $X' | cat -A
> x$
> $
>
> What am I doing wrong?

It needs two spaces after the "x", not just one. With a single space, it 
does work right.

Cheers,
Harald van Dijk

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v4] builtin: Fix handling of trailing IFS white spaces
  2016-06-12 11:12                   ` Harald van Dijk
@ 2016-06-12 12:17                     ` Herbert Xu
  2016-06-19 22:01                       ` Harald van Dijk
  0 siblings, 1 reply; 20+ messages in thread
From: Herbert Xu @ 2016-06-12 12:17 UTC (permalink / raw)
  To: Harald van Dijk; +Cc: dash, Stephane Chazelas, Gioele Barabucci

On Sun, Jun 12, 2016 at 01:12:01PM +0200, Harald van Dijk wrote:
> 
> It needs two spaces after the "x", not just one. With a single
> space, it does work right.

Tricky :) OK here is a new version that should fix this bug.

---8<---
The read built-in does not handle trailing IFS white spaces in
the right way, when there are more fields than variables.  Part
of the problem is that this case is handled outside of ifsbreakup.

Harald van Dijk wrote a patch to fix this by moving the magic
into ifsbreakup itself.

This patch further reorganises the ifsbreakup loop by having only
one loop over the whole string.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/src/expand.c b/src/expand.c
index b2d710d..36bea76 100644
--- a/src/expand.c
+++ b/src/expand.c
@@ -50,6 +50,7 @@
 #include <glob.h>
 #endif
 #include <ctype.h>
+#include <stdbool.h>
 
 /*
  * Routines to expand arguments to commands.  We have to deal with
@@ -203,7 +204,7 @@ expandarg(union node *arg, struct arglist *arglist, int flag)
 	 * TODO - EXP_REDIR
 	 */
 	if (flag & EXP_FULL) {
-		ifsbreakup(p, &exparg);
+		ifsbreakup(p, -1, &exparg);
 		*exparg.lastp = NULL;
 		exparg.lastp = &exparg.list;
 		expandmeta(exparg.list, flag);
@@ -1016,15 +1017,18 @@ recordregion(int start, int end, int nulonly)
  * Break the argument string into pieces based upon IFS and add the
  * strings to the argument list.  The regions of the string to be
  * searched for IFS characters have been stored by recordregion.
+ * If maxargs is non-negative, at most maxargs arguments will be created, by
+ * joining together the last arguments.
  */
 void
-ifsbreakup(char *string, struct arglist *arglist)
+ifsbreakup(char *string, int maxargs, struct arglist *arglist)
 {
 	struct ifsregion *ifsp;
 	struct strlist *sp;
 	char *start;
 	char *p;
 	char *q;
+	char *r = NULL;
 	const char *ifs, *realifs;
 	int ifsspc;
 	int nulonly;
@@ -1042,16 +1046,76 @@ ifsbreakup(char *string, struct arglist *arglist)
 			ifs = nulonly ? nullstr : realifs;
 			ifsspc = 0;
 			while (p < string + ifsp->endoff) {
+				int c;
+				bool isifs;
+				bool isdefifs;
+
 				q = p;
-				if (*p == (char)CTLESC)
-					p++;
-				if (strchr(ifs, *p)) {
+				c = *p++;
+				if (c == (char)CTLESC)
+					c = *p++;
+
+				isifs = strchr(ifs, c);
+				isdefifs = false;
+				if (isifs)
+					isdefifs = strchr(defifs, c);
+
+				/* If only reading one more argument:
+				 * If we have exactly one field,
+				 * read that field without its terminator.
+				 * If we have more than one field,
+				 * read all fields including their terminators,
+				 * except for trailing IFS whitespace.
+				 *
+				 * This means that if we have only IFS
+				 * characters left, and at most one
+				 * of them is non-whitespace, we stop
+				 * reading here.
+				 * Otherwise, we read all the remaining
+				 * characters except for trailing
+				 * IFS whitespace.
+				 *
+				 * In any case, r indicates the start
+				 * of the characters to remove, or NULL
+				 * if no characters should be removed.
+				 */
+				if (!maxargs) {
+					if (isdefifs) {
+						if (!r)
+							r = q;
+						continue;
+					}
+
+					if (!(isifs && ifsspc))
+						r = NULL;
+
+					ifsspc = 0;
+					continue;
+				}
+
+				if (ifsspc) {
+					if (isifs)
+						q = p;
+
+					start = q;
+
+					if (isdefifs)
+						continue;
+
+					isifs = false;
+				}
+
+				if (isifs) {
 					if (!nulonly)
-						ifsspc = (strchr(defifs, *p) != NULL);
+						ifsspc = isdefifs;
 					/* Ignore IFS whitespace at start */
 					if (q == start && ifsspc) {
-						p++;
 						start = p;
+						ifsspc = 0;
+						continue;
+					}
+					if (maxargs > 0 && !--maxargs) {
+						r = q;
 						continue;
 					}
 					*q = '\0';
@@ -1059,39 +1123,20 @@ ifsbreakup(char *string, struct arglist *arglist)
 					sp->text = start;
 					*arglist->lastp = sp;
 					arglist->lastp = &sp->next;
-					p++;
-					if (!nulonly) {
-						for (;;) {
-							if (p >= string + ifsp->endoff) {
-								break;
-							}
-							q = p;
-							if (*p == (char)CTLESC)
-								p++;
-							if (strchr(ifs, *p) == NULL ) {
-								p = q;
-								break;
-							} else if (strchr(defifs, *p) == NULL) {
-								if (ifsspc) {
-									p++;
-									ifsspc = 0;
-								} else {
-									p = q;
-									break;
-								}
-							} else
-								p++;
-						}
-					}
 					start = p;
-				} else
-					p++;
+					continue;
+				}
+
+				ifsspc = 0;
 			}
 		} while ((ifsp = ifsp->next) != NULL);
 		if (nulonly)
 			goto add;
 	}
 
+	if (r)
+		*r = '\0';
+
 	if (!*start)
 		return;
 
diff --git a/src/expand.h b/src/expand.h
index 6a90f67..26dc5b4 100644
--- a/src/expand.h
+++ b/src/expand.h
@@ -69,7 +69,7 @@ char *_rmescapes(char *, int);
 int casematch(union node *, char *);
 void recordregion(int, int, int);
 void removerecordregions(int); 
-void ifsbreakup(char *, struct arglist *);
+void ifsbreakup(char *, int, struct arglist *);
 void ifsfree(void);
 
 /* From arith.y */
diff --git a/src/miscbltin.c b/src/miscbltin.c
index b596fd2..39b9c47 100644
--- a/src/miscbltin.c
+++ b/src/miscbltin.c
@@ -67,28 +67,21 @@
  *  less fields than variables -> remaining variables unset.
  *
  *  @param line complete line of input
+ *  @param ac argument count
  *  @param ap argument (variable) list
  *  @param len length of line including trailing '\0'
  */
 static void
-readcmd_handle_line(char *s, char **ap)
+readcmd_handle_line(char *s, int ac, char **ap)
 {
 	struct arglist arglist;
 	struct strlist *sl;
-	char *backup;
-	char *line;
 
-	/* ifsbreakup will fiddle with stack region... */
-	line = stackblock();
 	s = grabstackstr(s);
 
-	/* need a copy, so that delimiters aren't lost
-	 * in case there are more fields than variables */
-	backup = sstrdup(line);
-
 	arglist.lastp = &arglist.list;
 	
-	ifsbreakup(s, &arglist);
+	ifsbreakup(s, ac, &arglist);
 	*arglist.lastp = NULL;
 	ifsfree();
 
@@ -104,21 +97,6 @@ readcmd_handle_line(char *s, char **ap)
 			return;
 		}
 
-		/* remaining fields present, but no variables left. */
-		if (!ap[1] && sl->next) {
-			size_t offset;
-			char *remainder;
-
-			/* FIXME little bit hacky, assuming that ifsbreakup 
-			 * will not modify the length of the string */
-			offset = sl->text - s;
-			remainder = backup + offset;
-			rmescapes(remainder);
-			setvar(*ap, remainder, 0);
-
-			return;
-		}
-		
 		/* set variable to field */
 		rmescapes(sl->text);
 		setvar(*ap, sl->text, 0);
@@ -211,7 +189,7 @@ start:
 out:
 	recordregion(startloc, p - (char *)stackblock(), 0);
 	STACKSTRNUL(p);
-	readcmd_handle_line(p + 1, ap);
+	readcmd_handle_line(p + 1, argc - (ap - argv), ap);
 	return status;
 }
 
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v4] builtin: Fix handling of trailing IFS white spaces
  2016-06-12 12:17                     ` [PATCH v4] " Herbert Xu
@ 2016-06-19 22:01                       ` Harald van Dijk
  2016-06-20  1:28                         ` Herbert Xu
  0 siblings, 1 reply; 20+ messages in thread
From: Harald van Dijk @ 2016-06-19 22:01 UTC (permalink / raw)
  To: Herbert Xu, dash; +Cc: Stephane Chazelas, Gioele Barabucci

On 12/06/16 14:17, Herbert Xu wrote:
> Tricky :) OK here is a new version that should fix this bug.

FWIW, I've been testing this along with your eval patch, haven't spotted 
any more issues, either from code inspection or from actual use, and 
don't expect to spot any more issues. Very nice.

Cheers,
Harald van Dijk

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4] builtin: Fix handling of trailing IFS white spaces
  2016-06-19 22:01                       ` Harald van Dijk
@ 2016-06-20  1:28                         ` Herbert Xu
  0 siblings, 0 replies; 20+ messages in thread
From: Herbert Xu @ 2016-06-20  1:28 UTC (permalink / raw)
  To: Harald van Dijk; +Cc: dash, Stephane Chazelas, Gioele Barabucci

On Mon, Jun 20, 2016 at 12:01:16AM +0200, Harald van Dijk wrote:
> 
> FWIW, I've been testing this along with your eval patch, haven't
> spotted any more issues, either from code inspection or from actual
> use, and don't expect to spot any more issues. Very nice.

Thanks for testing! I'll add a

Tested-by: Harald van Dijk <harald@gigawatt.nl>

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2016-06-20  1:28 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-02 22:37 dash: read does not ignore trailing spaces Gioele Barabucci
2015-12-02 23:25 ` Jilles Tjoelker
2015-12-03 21:02 ` Harald van Dijk
2015-12-03 21:17   ` Stephane Chazelas
2015-12-03 21:43     ` Martijn Dekker
2015-12-03 23:04       ` Stephane Chazelas
2015-12-03 23:17         ` Stephane Chazelas
2015-12-04  0:00           ` Stephane Chazelas
2015-12-03 22:26     ` Harald van Dijk
2015-12-04 19:51       ` Harald van Dijk
2016-01-29 12:57         ` Martijn Dekker
2016-06-06  8:48         ` [PATCH v2] builtin: Fix handling of trailing IFS white spaces Herbert Xu
2016-06-06 20:43           ` Harald van Dijk
2016-06-07  9:25             ` [PATCH v3] " Herbert Xu
2016-06-12 10:35               ` Harald van Dijk
2016-06-12 11:06                 ` Herbert Xu
2016-06-12 11:12                   ` Harald van Dijk
2016-06-12 12:17                     ` [PATCH v4] " Herbert Xu
2016-06-19 22:01                       ` Harald van Dijk
2016-06-20  1:28                         ` Herbert Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).