dash.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* possible wrong behaviour with patterns using a quoted ^ at the start of a bracket expression
@ 2022-01-12 16:25 Christoph Anton Mitterer
  2022-01-12 17:20 ` Harald van Dijk
  0 siblings, 1 reply; 15+ messages in thread
From: Christoph Anton Mitterer @ 2022-01-12 16:25 UTC (permalink / raw)
  To: dash

Hey.


I recently got some big deal of help from the people at the help-bash
mailing list when I've tried to understand what POSIX mandates with
respect to pattern matching (that is in the sense of [0], not
Basic/Extended Regular Expressions).

I'm still not so sure whether I understand it exactly by the wording of
POSIX itself (which seems a bit odd to me), but what people
explained[1] me at help-bash - and I hope I explain it correctly - is:

In the patterns, even in a bracket expression in a pattern, there may
be quoting (with double and single quotes), and - and this is key -
anything that is quoted is already taken literal with respect to the
pattern.

So when one has e.g. a case compound command:
case $foo in
	(['*?'])
...
is already the literal * and ? within a pattern's bracket expression.


Further, POSIX says:
"If an open bracket introduces a bracket expression as in XBD RE
Bracket Expression, except that the <exclamation-mark> character ( '!'
) shall replace the <circumflex> character ( '^' ) in its role in a
non-matching list in the regular expression notation, it shall
introduce a pattern bracket expression. A bracket expression starting
with an unquoted <circumflex> character produces unspecified results.
Otherwise, '[' shall match the character itself."



I found not the following probably wrong behaviour of dash and busybox'
sh:




$ cat circumflex-test 
case "$1" in
(['^.a'])
 echo match
 ;;
(*)
 echo else
esac


$ cat exclamation-test 
case "$1" in
(['!.a'])
 echo match
 ;;
(*)
 echo else
esac


$ cat run-circumflex 
echo dash:
dash circumflex-test ^
dash circumflex-test .
dash circumflex-test a

echo busybox-sh:
busybox sh circumflex-test ^
busybox sh circumflex-test .
busybox sh circumflex-test a

echo bash:
bash circumflex-test ^
bash circumflex-test .
bash circumflex-test a

echo klibc-sh:
/usr/lib/klibc/bin/sh circumflex-test ^
/usr/lib/klibc/bin/sh circumflex-test .
/usr/lib/klibc/bin/sh circumflex-test a


$ cat run-exlamation 
echo dash:
dash exclamation-test '!'
dash exclamation-test .
dash exclamation-test a

echo busybox-sh:
busybox sh exclamation-test '!'
busybox sh exclamation-test .
busybox sh exclamation-test a

echo bash:
bash exclamation-test '!'
bash exclamation-test .
bash exclamation-test a

echo klibc-sh:
/usr/lib/klibc/bin/sh exclamation-test '!'
/usr/lib/klibc/bin/sh exclamation-test .
/usr/lib/klibc/bin/sh exclamation-test a




When run:
$ sh run-circumflex | paste - - - - | column -t
dash: match else else
busybox-sh: match else else
bash: match match match
klibc-sh: match match match
$
 ^ . a


$ sh run-exlamation | paste - - - - | column -t
dash: match match match
busybox-sh: match match match
bash: match match match
klibc-sh: match match match
$
 ! . a


The results for the run-circumflex seem pretty odd.
Apparently, the ^ is taken literally, but the other two are negated.

$ dash circumflex-test b
match
$ busybox sh circumflex-test b
match

match again (which, AFAIU, they should not).

While POSIX does say:
"A bracket expression starting with an unquoted <circumflex> character
produces unspecified results."
... the circumflex *is* quoted above...


I haven't verified any further unusual patterns like ['$var'] vs.
["$var"], so maybe an eye should be kept open, whether there could be
any issues as well.


Thanks,
Chris.

PS: For reference, the bug[2] I've opened at BusyBox.



[0] https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13
[1] https://lists.gnu.org/archive/html/help-bash/2022-01/msg00000.html
[2] https://bugs.busybox.net/show_bug.cgi?id=14516


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: possible wrong behaviour with patterns using a quoted ^ at the start of a bracket expression
  2022-01-12 16:25 possible wrong behaviour with patterns using a quoted ^ at the start of a bracket expression Christoph Anton Mitterer
@ 2022-01-12 17:20 ` Harald van Dijk
  2022-01-12 17:47   ` Christoph Anton Mitterer
  2022-01-18  6:13   ` [PATCH] expand: Always quote caret when using fnmatch Herbert Xu
  0 siblings, 2 replies; 15+ messages in thread
From: Harald van Dijk @ 2022-01-12 17:20 UTC (permalink / raw)
  To: Christoph Anton Mitterer, dash

Hi,

On 12/01/2022 16:25, Christoph Anton Mitterer wrote:
> The results for the run-circumflex seem pretty odd.
> Apparently, the ^ is taken literally, but the other two are negated.

The ^ is not taken literally. The ^ in the pattern is wrongly taken as 
the negation operator, and the ^ in the argument is then reported as a 
match because it is neither . nor a.

This bug (you're right that it's a bug) is specific to builds that use 
fnmatch(). In dash itself, ^ is always assumed as a literal. In builds 
with --disable-fnmatch you get correct results. In builds with 
--enable-fnmatch, because dash assumes ^ is assumed as a literal, dash 
fails to escape it before passing it on to fnmatch(), and the system 
fnmatch() may choose differently from dash on how to deal with unquoted 
^s. What dash should do to get whatever behaviour the system fnmatch() 
chooses is leave unquoted ^s unquoted, and leave quoted ^s quoted. This 
can be achieved by

--- a/src/mksyntax.c
+++ b/src/mksyntax.c
@@ -178,14 +178,14 @@ main(int argc, char **argv)
         add("$", "CVAR");
         add("}", "CENDVAR");
         /* ':/' for tilde expansion, '-' for [a\-x] pattern ranges */
-       add("!*?[=~:/-]", "CCTL");
+       add("!*?[^=~:/-]", "CCTL");
         print("dqsyntax");
         init();
         fputs("\n/* syntax table used when in single quotes */\n", cfile);
         add("\n", "CNL");
         add("'", "CENDQUOTE");
         /* ':/' for tilde expansion, '-' for [a\-x] pattern ranges */
-       add("!*?[=~:/-]\\", "CCTL");
+       add("!*?[^=~:/-]\\", "CCTL");
         print("sqsyntax");
         init();
         fputs("\n/* syntax table used when in arithmetic */\n", cfile);

However, whether this is the correct approach is a matter of opinion: 
dash could alternatively choose to always take ^ as a literal and always 
escape it before passing it on to fnmatch(), overriding whatever 
decision the libc people had taken.

Cheers,
Harald van Dijk

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: possible wrong behaviour with patterns using a quoted ^ at the start of a bracket expression
  2022-01-12 17:20 ` Harald van Dijk
@ 2022-01-12 17:47   ` Christoph Anton Mitterer
  2022-01-12 18:17     ` Harald van Dijk
  2022-01-18  6:13   ` [PATCH] expand: Always quote caret when using fnmatch Herbert Xu
  1 sibling, 1 reply; 15+ messages in thread
From: Christoph Anton Mitterer @ 2022-01-12 17:47 UTC (permalink / raw)
  To: Harald van Dijk, dash

On Wed, 2022-01-12 at 17:20 +0000, Harald van Dijk wrote:
> However, whether this is the correct approach is a matter of opinion:
> dash could alternatively choose to always take ^ as a literal and
> always 
> escape it before passing it on to fnmatch()

Well I personally think the best would to *always* take ^ literally,
whether quoted or not.

That would match the behaviour of bash and klibc sh, and also seems
more in the spirit of POSIX (which, while saying that an unquoted ^
produces undefined behaviour, also says that:
"( '!' ) shall replace the <circumflex> character ( '^' )""

... "replace", not complement.


Cheers,
Chris.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: possible wrong behaviour with patterns using a quoted ^ at the start of a bracket expression
  2022-01-12 17:47   ` Christoph Anton Mitterer
@ 2022-01-12 18:17     ` Harald van Dijk
  2022-01-12 18:21       ` Christoph Anton Mitterer
  0 siblings, 1 reply; 15+ messages in thread
From: Harald van Dijk @ 2022-01-12 18:17 UTC (permalink / raw)
  To: Christoph Anton Mitterer, dash

On 12/01/2022 17:47, Christoph Anton Mitterer wrote:
> On Wed, 2022-01-12 at 17:20 +0000, Harald van Dijk wrote:
>> However, whether this is the correct approach is a matter of opinion:
>> dash could alternatively choose to always take ^ as a literal and
>> always
>> escape it before passing it on to fnmatch()
> 
> Well I personally think the best would to *always* take ^ literally,
> whether quoted or not.
> 
> That would match the behaviour of bash and klibc sh, and also seems
> more in the spirit of POSIX (which, while saying that an unquoted ^
> produces undefined behaviour, also says that:
> "( '!' ) shall replace the <circumflex> character ( '^' )""
> 
> ... "replace", not complement.

Actually, in bash, unquoted ^ does act as negation. Same in ksh, which 
is the basis for a lot of POSIX. As for the spirit of POSIX, POSIX does 
say in the rationale that it's unspecified specifically in order to 
allow that:

   The restriction on a <circumflex> in a bracket expression is to allow
   implementations that support pattern matching using the <circumflex>
   as the negation character in addition to the <exclamation-mark>.

But personally, I have no opinion on which action dash should take, as 
long as it gets the right results for quoted ^.

Cheers,
Harald van Dijk

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: possible wrong behaviour with patterns using a quoted ^ at the start of a bracket expression
  2022-01-12 18:17     ` Harald van Dijk
@ 2022-01-12 18:21       ` Christoph Anton Mitterer
  0 siblings, 0 replies; 15+ messages in thread
From: Christoph Anton Mitterer @ 2022-01-12 18:21 UTC (permalink / raw)
  To: Harald van Dijk, dash

On Wed, 2022-01-12 at 18:17 +0000, Harald van Dijk wrote:
> Actually, in bash, unquoted ^ does act as negation.

Oh, sure,... I event tested that half an hour before and still wrote it
wrong ^^.

Then I'd prefer if it would use an unquoted (only) ^ as negation. The
standard allows it, and it's better to have some homogeneity.


Cheers,
Chris.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH] expand: Always quote caret when using fnmatch
  2022-01-12 17:20 ` Harald van Dijk
  2022-01-12 17:47   ` Christoph Anton Mitterer
@ 2022-01-18  6:13   ` Herbert Xu
  2022-01-18  8:44     ` Harald van Dijk
  2022-01-18 14:29     ` [PATCH] " Christoph Anton Mitterer
  1 sibling, 2 replies; 15+ messages in thread
From: Herbert Xu @ 2022-01-18  6:13 UTC (permalink / raw)
  To: Harald van Dijk; +Cc: calestyo, dash

Harald van Dijk <harald@gigawatt.nl> wrote:
> 
> On 12/01/2022 16:25, Christoph Anton Mitterer wrote:
>> The results for the run-circumflex seem pretty odd.
>> Apparently, the ^ is taken literally, but the other two are negated.
> 
> The ^ is not taken literally. The ^ in the pattern is wrongly taken as 
> the negation operator, and the ^ in the argument is then reported as a 
> match because it is neither . nor a.
> 
> This bug (you're right that it's a bug) is specific to builds that use 
> fnmatch(). In dash itself, ^ is always assumed as a literal. In builds 
> with --disable-fnmatch you get correct results. In builds with 
> --enable-fnmatch, because dash assumes ^ is assumed as a literal, dash 
> fails to escape it before passing it on to fnmatch(), and the system 
> fnmatch() may choose differently from dash on how to deal with unquoted 
> ^s. What dash should do to get whatever behaviour the system fnmatch() 
> chooses is leave unquoted ^s unquoted, and leave quoted ^s quoted. This 
> can be achieved by
> 
> --- a/src/mksyntax.c
> +++ b/src/mksyntax.c
> @@ -178,14 +178,14 @@ main(int argc, char **argv)
>         add("$", "CVAR");
>         add("}", "CENDVAR");
>         /* ':/' for tilde expansion, '-' for [a\-x] pattern ranges */
> -       add("!*?[=~:/-]", "CCTL");
> +       add("!*?[^=~:/-]", "CCTL");
>         print("dqsyntax");
>         init();
>         fputs("\n/* syntax table used when in single quotes */\n", cfile);
>         add("\n", "CNL");
>         add("'", "CENDQUOTE");
>         /* ':/' for tilde expansion, '-' for [a\-x] pattern ranges */
> -       add("!*?[=~:/-]\\", "CCTL");
> +       add("!*?[^=~:/-]\\", "CCTL");
>         print("sqsyntax");
>         init();
>         fputs("\n/* syntax table used when in arithmetic */\n", cfile);
> 
> However, whether this is the correct approach is a matter of opinion: 
> dash could alternatively choose to always take ^ as a literal and always 
> escape it before passing it on to fnmatch(), overriding whatever 
> decision the libc people had taken.

Yes, this would produce the most consistent result.

This patch forces ^ to be a literal when we use fnmatch.

Fixes: 7638476c18f2 ("shell: Enable fnmatch/glob by default")
Reported-by: Christoph Anton Mitterer <calestyo@scientia.org>
Suggested-by: Harald van Dijk <harald@gigawatt.nl>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/src/expand.c b/src/expand.c
index aea5cc4..04bf8fb 100644
--- a/src/expand.c
+++ b/src/expand.c
@@ -47,6 +47,9 @@
 #include <string.h>
 #ifdef HAVE_FNMATCH
 #include <fnmatch.h>
+#define FNMATCH_IS_ENABLED 1
+#else
+#define FNMATCH_IS_ENABLED 0
 #endif
 #ifdef HAVE_GLOB
 #include <glob.h>
@@ -1693,8 +1696,11 @@ _rmescapes(char *str, int flag)
 			notescaped = 0;
 			goto copy;
 		}
+		if (FNMATCH_IS_ENABLED && *p == '^')
+			goto add_escape;
 		if (*p == (char)CTLESC) {
 			p++;
+add_escape:
 			if (notescaped)
 				*q++ = '\\';
 		}
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] expand: Always quote caret when using fnmatch
  2022-01-18  6:13   ` [PATCH] expand: Always quote caret when using fnmatch Herbert Xu
@ 2022-01-18  8:44     ` Harald van Dijk
  2022-01-19  5:37       ` [v2 PATCH] " Herbert Xu
  2022-01-18 14:29     ` [PATCH] " Christoph Anton Mitterer
  1 sibling, 1 reply; 15+ messages in thread
From: Harald van Dijk @ 2022-01-18  8:44 UTC (permalink / raw)
  To: Herbert Xu; +Cc: calestyo, dash

On 18/01/2022 06:13, Herbert Xu wrote:
> This patch forces ^ to be a literal when we use fnmatch.
> 
> Fixes: 7638476c18f2 ("shell: Enable fnmatch/glob by default")
> Reported-by: Christoph Anton Mitterer <calestyo@scientia.org>
> Suggested-by: Harald van Dijk <harald@gigawatt.nl>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> 
> diff --git a/src/expand.c b/src/expand.c
> index aea5cc4..04bf8fb 100644
> --- a/src/expand.c
> +++ b/src/expand.c
> @@ -47,6 +47,9 @@
>   #include <string.h>
>   #ifdef HAVE_FNMATCH
>   #include <fnmatch.h>
> +#define FNMATCH_IS_ENABLED 1
> +#else
> +#define FNMATCH_IS_ENABLED 0
>   #endif
>   #ifdef HAVE_GLOB
>   #include <glob.h>
> @@ -1693,8 +1696,11 @@ _rmescapes(char *str, int flag)
>   			notescaped = 0;
>   			goto copy;
>   		}
> +		if (FNMATCH_IS_ENABLED && *p == '^')
> +			goto add_escape;
>   		if (*p == (char)CTLESC) {
>   			p++;
> +add_escape:
>   			if (notescaped)
>   				*q++ = '\\';
>   		}
> 

The loop that is modified by this patch is only taken after any qchars 
are seen, so for e.g.

   var=abc
   echo ${var#[^a]}

it has no effect. More importantly though, _rmescapes is used to modify 
strings in place. This patch causes _rmescapes to try and grow strings, 
which cannot ever work. A test case for this is

   case aa in \a[^a]) echo match ;; esac

which fails with a segfault after this patch is applied.

Cheers,
Harald van Dijk

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] expand: Always quote caret when using fnmatch
  2022-01-18  6:13   ` [PATCH] expand: Always quote caret when using fnmatch Herbert Xu
  2022-01-18  8:44     ` Harald van Dijk
@ 2022-01-18 14:29     ` Christoph Anton Mitterer
  2022-01-18 14:54       ` Chet Ramey
  2022-01-18 22:33       ` Herbert Xu
  1 sibling, 2 replies; 15+ messages in thread
From: Christoph Anton Mitterer @ 2022-01-18 14:29 UTC (permalink / raw)
  To: Herbert Xu, Harald van Dijk; +Cc: dash

Hey.

Just for confirmation:

Would that also be an issue in fnmatch() (i.e. should it be forwarded?)
or really just something in dash?

Cheers,
Chris.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] expand: Always quote caret when using fnmatch
  2022-01-18 14:29     ` [PATCH] " Christoph Anton Mitterer
@ 2022-01-18 14:54       ` Chet Ramey
  2022-01-18 22:33       ` Herbert Xu
  1 sibling, 0 replies; 15+ messages in thread
From: Chet Ramey @ 2022-01-18 14:54 UTC (permalink / raw)
  To: Christoph Anton Mitterer, Herbert Xu, Harald van Dijk; +Cc: chet.ramey, dash

On 1/18/22 9:29 AM, Christoph Anton Mitterer wrote:
> Hey.
> 
> Just for confirmation:
> 
> Would that also be an issue in fnmatch() (i.e. should it be forwarded?)
> or really just something in dash?

The behavior of an unquoted carat as the first character in a bracket
expression is unspecified. Dash can't count on any particular behavior.


-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
		 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    chet@case.edu    http://tiswww.cwru.edu/~chet/

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] expand: Always quote caret when using fnmatch
  2022-01-18 14:29     ` [PATCH] " Christoph Anton Mitterer
  2022-01-18 14:54       ` Chet Ramey
@ 2022-01-18 22:33       ` Herbert Xu
  1 sibling, 0 replies; 15+ messages in thread
From: Herbert Xu @ 2022-01-18 22:33 UTC (permalink / raw)
  To: Christoph Anton Mitterer; +Cc: harald, dash

Christoph Anton Mitterer <calestyo@scientia.org> wrote:
> Hey.
> 
> Just for confirmation:
> 
> Would that also be an issue in fnmatch() (i.e. should it be forwarded?)
> or really just something in dash?

No I don't think this is an fnmatch issue.  It's dash's fault
for not quoting the caret before passing it to glibc.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [v2 PATCH] expand: Always quote caret when using fnmatch
  2022-01-18  8:44     ` Harald van Dijk
@ 2022-01-19  5:37       ` Herbert Xu
  2022-02-20  7:15         ` Stephane Chazelas
  0 siblings, 1 reply; 15+ messages in thread
From: Herbert Xu @ 2022-01-19  5:37 UTC (permalink / raw)
  To: Harald van Dijk; +Cc: calestyo, dash

On Tue, Jan 18, 2022 at 08:44:05AM +0000, Harald van Dijk wrote:
>
> The loop that is modified by this patch is only taken after any qchars are
> seen, so for e.g.
> 
>   var=abc
>   echo ${var#[^a]}
> 
> it has no effect. More importantly though, _rmescapes is used to modify

Good catch.  I have modified qchars accordingly.

> strings in place. This patch causes _rmescapes to try and grow strings,
> which cannot ever work. A test case for this is
> 
>   case aa in \a[^a]) echo match ;; esac
> 
> which fails with a segfault after this patch is applied.

Indeed.  I have modified the RMESCAPE_ALLOC to allow growth.

Thanks,

---8<---
This patch forces ^ to be a literal when we use fnmatch.

In order to allow for the extra space to quote the caret, the
function _rmescapes will allocate up to twice the memory if the
flag RMESCAPE_GLOB is set.

Fixes: 7638476c18f2 ("shell: Enable fnmatch/glob by default")
Reported-by: Christoph Anton Mitterer <calestyo@scientia.org>
Suggested-by: Harald van Dijk <harald@gigawatt.nl>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/src/expand.c b/src/expand.c
index aea5cc4..9906d8a 100644
--- a/src/expand.c
+++ b/src/expand.c
@@ -135,8 +135,6 @@ STATIC int pmatch(const char *, const char *);
 #endif
 static size_t cvtnum(intmax_t num, int flags);
 STATIC size_t esclen(const char *, const char *);
-STATIC char *scanleft(char *, char *, char *, char *, int, int);
-STATIC char *scanright(char *, char *, char *, char *, int, int);
 STATIC void varunset(const char *, const char *, const char *, int)
 	__attribute__((__noreturn__));
 
@@ -541,10 +539,8 @@ out:
 }
 
 
-STATIC char *
-scanleft(
-	char *startp, char *rmesc, char *rmescend, char *str, int quotes,
-	int zero
+static char *scanleft(char *startp, char *endp, char *rmesc, char *rmescend,
+		      char *str, int quotes, int zero
 ) {
 	char *loc;
 	char *loc2;
@@ -573,16 +569,14 @@ scanleft(
 }
 
 
-STATIC char *
-scanright(
-	char *startp, char *rmesc, char *rmescend, char *str, int quotes,
-	int zero
+static char *scanright(char *startp, char *endp, char *rmesc, char *rmescend,
+		       char *str, int quotes, int zero
 ) {
 	int esc = 0;
 	char *loc;
 	char *loc2;
 
-	for (loc = str - 1, loc2 = rmescend; loc >= startp; loc2--) {
+	for (loc = endp, loc2 = rmescend; loc >= startp; loc2--) {
 		int match;
 		char c = *loc2;
 		const char *s = loc2;
@@ -618,7 +612,9 @@ static char *subevalvar(char *start, char *str, int strloc, int startloc,
 	long amount;
 	char *rmesc, *rmescend;
 	int zero;
-	char *(*scan)(char *, char *, char *, char *, int , int);
+	char *(*scan)(char *, char *, char *, char *, char *, int , int);
+	int nstrloc = strloc;
+	char *endp;
 	char *p;
 
 	p = argstr(start, (flag & EXP_DISCARD) | EXP_TILDE |
@@ -646,33 +642,40 @@ static char *subevalvar(char *start, char *str, int strloc, int startloc,
 		abort();
 #endif
 
-	rmesc = startp;
 	rmescend = stackblock() + strloc;
+	str = preglob(rmescend, FNMATCH_IS_ENABLED ?
+				RMESCAPE_ALLOC | RMESCAPE_GROW : 0);
+	if (FNMATCH_IS_ENABLED) {
+		startp = stackblock() + startloc;
+		rmescend = stackblock() + strloc;
+		nstrloc = str - (char *)stackblock();
+	}
+
+	rmesc = startp;
 	if (quotes) {
 		rmesc = _rmescapes(startp, RMESCAPE_ALLOC | RMESCAPE_GROW);
-		if (rmesc != startp) {
+		if (rmesc != startp)
 			rmescend = expdest;
-			startp = stackblock() + startloc;
-		}
+		startp = stackblock() + startloc;
+		str = stackblock() + nstrloc;
 	}
 	rmescend--;
-	str = stackblock() + strloc;
-	preglob(str, 0);
 
 	/* zero = subtype == VSTRIMLEFT || subtype == VSTRIMLEFTMAX */
 	zero = subtype >> 1;
 	/* VSTRIMLEFT/VSTRIMRIGHTMAX -> scanleft */
 	scan = (subtype & 1) ^ zero ? scanleft : scanright;
 
-	loc = scan(startp, rmesc, rmescend, str, quotes, zero);
+	endp = stackblock() + strloc - 1;
+	loc = scan(startp, endp, rmesc, rmescend, str, quotes, zero);
 	if (loc) {
 		if (zero) {
-			memmove(startp, loc, str - loc);
-			loc = startp + (str - loc) - 1;
+			memmove(startp, loc, endp - loc);
+			loc = startp + (endp - loc);
 		}
 		*loc = '\0';
 	} else
-		loc = str - 1;
+		loc = endp;
 
 out:
 	amount = loc - expdest;
@@ -1501,7 +1504,9 @@ msort(struct strlist *list, int len)
 STATIC inline int
 patmatch(char *pattern, const char *string)
 {
-	return pmatch(preglob(pattern, 0), string);
+	return pmatch(preglob(pattern, FNMATCH_IS_ENABLED ?
+				       RMESCAPE_ALLOC | RMESCAPE_GROW : 0),
+		      string);
 }
 
 
@@ -1654,15 +1659,22 @@ _rmescapes(char *str, int flag)
 	int notescaped;
 	int globbing;
 
-	p = strpbrk(str, qchars);
+	p = strpbrk(str, cqchars);
 	if (!p) {
 		return str;
 	}
 	q = p;
 	r = str;
+	globbing = flag & RMESCAPE_GLOB;
+
 	if (flag & RMESCAPE_ALLOC) {
 		size_t len = p - str;
-		size_t fulllen = len + strlen(p) + 1;
+		size_t fulllen = strlen(p);
+
+		if (FNMATCH_IS_ENABLED && globbing)
+			fulllen *= 2;
+
+		fulllen += len + 1;
 
 		if (flag & RMESCAPE_GROW) {
 			int strloc = str - (char *)stackblock();
@@ -1680,7 +1692,6 @@ _rmescapes(char *str, int flag)
 			q = mempcpy(q, str, len);
 		}
 	}
-	globbing = flag & RMESCAPE_GLOB;
 	notescaped = globbing;
 	while (*p) {
 		if (*p == (char)CTLQUOTEMARK) {
@@ -1693,8 +1704,11 @@ _rmescapes(char *str, int flag)
 			notescaped = 0;
 			goto copy;
 		}
+		if (FNMATCH_IS_ENABLED && *p == '^')
+			goto add_escape;
 		if (*p == (char)CTLESC) {
 			p++;
+add_escape:
 			if (notescaped)
 				*q++ = '\\';
 		}
diff --git a/src/mystring.c b/src/mystring.c
index de624b8..ed3c8f6 100644
--- a/src/mystring.c
+++ b/src/mystring.c
@@ -62,7 +62,12 @@ const char spcstr[] = " ";
 const char snlfmt[] = "%s\n";
 const char dolatstr[] = { CTLQUOTEMARK, CTLVAR, VSNORMAL, '@', '=',
 			  CTLQUOTEMARK, '\0' };
-const char qchars[] = { CTLESC, CTLQUOTEMARK, 0 };
+const char cqchars[] = {
+#ifdef HAVE_FNMATCH
+	'^',
+#endif
+	CTLESC, CTLQUOTEMARK, 0
+};
 const char illnum[] = "Illegal number: %s";
 const char homestr[] = "HOME";
 
diff --git a/src/mystring.h b/src/mystring.h
index 083ea98..564b911 100644
--- a/src/mystring.h
+++ b/src/mystring.h
@@ -37,11 +37,18 @@
 #include <inttypes.h>
 #include <string.h>
 
+#ifdef HAVE_FNMATCH
+#define FNMATCH_IS_ENABLED 1
+#else
+#define FNMATCH_IS_ENABLED 0
+#endif
+
 extern const char snlfmt[];
 extern const char spcstr[];
 extern const char dolatstr[];
 #define DOLATSTRLEN 6
-extern const char qchars[];
+extern const char cqchars[];
+#define qchars (cqchars + FNMATCH_IS_ENABLED)
 extern const char illnum[];
 extern const char homestr[];
 
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [v2 PATCH] expand: Always quote caret when using fnmatch
  2022-01-19  5:37       ` [v2 PATCH] " Herbert Xu
@ 2022-02-20  7:15         ` Stephane Chazelas
  2022-02-21 16:39           ` Eric Blake
  0 siblings, 1 reply; 15+ messages in thread
From: Stephane Chazelas @ 2022-02-20  7:15 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Harald van Dijk, calestyo, dash

2022-01-19 16:37:54 +1100, Herbert Xu:
[...]
> This patch forces ^ to be a literal when we use fnmatch.
[...]

Hi,

I'm coming here from a discussion at
https://www.austingroupbugs.net/view.php?id=1558 where I'm
trying to get POSIX to mandate (or suggest that it will mandate
in the future) [^x] works like [!x] to negate a bracket
expression set like it does already in most
shell/fnmatch()/glob() implementations, and to remove that
non-sensical (to me at least) discrepancy with regexps and most
other shells / languages.

To me, that patch is a step in the wrong direction.

Other Ash-based shells already made the move (make [^x] an alias
for [!x]) a long time ago (FreeBSD / NetBSD sh in 1997), BSD
fnmatch() in 1994. The GNU libc and GNU shell have always
supported [^x] AFAIK. So has zsh. ksh made the move in 2005.

In dash, the move to fnmatch() in that case was a step in the
right direction, and reverting it would be an unfortunate step
back to me.

If the point of having dash use fnmatch() is to add consistency
with other tools used from the shell (like find -name/-path...),
then this change doesn't help either on the majority of systems.

The "fix" also seems incomplete and causes at least this
regression:

$ ltrace -e fnmatch ./bin/dash -c 'case "\\" in ($1) echo match; esac' sh '[\^]'
dash->fnmatch("[\\\\^]", "\\", 0)                              = 0
match

See how that added \ caused it to be included in the set. Compare with:

+++ exited (status 0) +++
$ ltrace -e fnmatch ./bin/dash -c 'case "\\" in ($1) echo match; esac' sh '[\!]'
dash->fnmatch("[\\!]", "\\", 0)                                = 1
+++ exited (status 0) +++
$ ltrace -e fnmatch ./bin/dash -c 'case "\\" in ($1) echo match; esac' sh '[\x]'
dash->fnmatch("[\\x]", "\\", 0)                                = 1
+++ exited (status 0) +++

(here on Debian with glibc 2.33)

-- 
Stephane

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [v2 PATCH] expand: Always quote caret when using fnmatch
  2022-02-20  7:15         ` Stephane Chazelas
@ 2022-02-21 16:39           ` Eric Blake
  2022-02-21 17:06             ` Harald van Dijk
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Blake @ 2022-02-21 16:39 UTC (permalink / raw)
  To: Stephane Chazelas; +Cc: Herbert Xu, Harald van Dijk, calestyo, dash

On Sun, Feb 20, 2022 at 07:15:44AM +0000, Stephane Chazelas wrote:
> 2022-01-19 16:37:54 +1100, Herbert Xu:
> [...]
> > This patch forces ^ to be a literal when we use fnmatch.
> [...]
> The "fix" also seems incomplete and causes at least this
> regression:
> 
> $ ltrace -e fnmatch ./bin/dash -c 'case "\\" in ($1) echo match; esac' sh '[\^]'
> dash->fnmatch("[\\\\^]", "\\", 0)                              = 0
> match
> 
> See how that added \ caused it to be included in the set. Compare with:
> 
> +++ exited (status 0) +++
> $ ltrace -e fnmatch ./bin/dash -c 'case "\\" in ($1) echo match; esac' sh '[\!]'
> dash->fnmatch("[\\!]", "\\", 0)                                = 1
> +++ exited (status 0) +++
> $ ltrace -e fnmatch ./bin/dash -c 'case "\\" in ($1) echo match; esac' sh '[\x]'
> dash->fnmatch("[\\x]", "\\", 0)                                = 1
> +++ exited (status 0) +++

If you want to escape ^ to always be treated as a literal when calling
fnmatch(), you have to spell it something like [[=^=]...], not [\^...]
(that is, since \^ inside [] does NOT treat the \ as an escape
character, but as yet another literal character as part of the brakcet
group, you instead have to resort to calling out the literal character
^ via alternative spellings such as by a collation equivalent).
Another alternative is to rewrite the [] bracket exprsesion so that ^
is no longer first (but be careful if the character after ^ is -, ],
or !, as the rewrite is not always a trivial swap with the next
character).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [v2 PATCH] expand: Always quote caret when using fnmatch
  2022-02-21 16:39           ` Eric Blake
@ 2022-02-21 17:06             ` Harald van Dijk
  2022-02-21 19:15               ` Stephane Chazelas
  0 siblings, 1 reply; 15+ messages in thread
From: Harald van Dijk @ 2022-02-21 17:06 UTC (permalink / raw)
  To: Eric Blake, Stephane Chazelas; +Cc: Herbert Xu, calestyo, dash

On 21/02/2022 16:39, Eric Blake wrote:
> On Sun, Feb 20, 2022 at 07:15:44AM +0000, Stephane Chazelas wrote:
>> 2022-01-19 16:37:54 +1100, Herbert Xu:
>> [...]
>>> This patch forces ^ to be a literal when we use fnmatch.
>> [...]
>> The "fix" also seems incomplete and causes at least this
>> regression:
>>
>> $ ltrace -e fnmatch ./bin/dash -c 'case "\\" in ($1) echo match; esac' sh '[\^]'
>> dash->fnmatch("[\\\\^]", "\\", 0)                              = 0
>> match
>>
>> See how that added \ caused it to be included in the set. Compare with:
>>
>> +++ exited (status 0) +++
>> $ ltrace -e fnmatch ./bin/dash -c 'case "\\" in ($1) echo match; esac' sh '[\!]'
>> dash->fnmatch("[\\!]", "\\", 0)                                = 1
>> +++ exited (status 0) +++
>> $ ltrace -e fnmatch ./bin/dash -c 'case "\\" in ($1) echo match; esac' sh '[\x]'
>> dash->fnmatch("[\\x]", "\\", 0)                                = 1
>> +++ exited (status 0) +++
> 
> If you want to escape ^ to always be treated as a literal when calling
> fnmatch(), you have to spell it something like [[=^=]...], not [\^...]
> (that is, since \^ inside [] does NOT treat the \ as an escape
> character, but as yet another literal character as part of the brakcet
> group, you instead have to resort to calling out the literal character
> ^ via alternative spellings such as by a collation equivalent).

This is correct for regular expressions, unspecified for shell patterns 
in general after bug 1234 was resolved, and always incorrect for 
fnmatch() after bug 1190 was resolved. In fnmatch(), backslash has to 
work even inside bracket expressions. Whether it had to may not 
previously have been clear in the standard, but it has been made clear 
now and that is also how it works in glibc. That part of the patch is 
not a problem.

Cheers,
Harald van Dijk

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [v2 PATCH] expand: Always quote caret when using fnmatch
  2022-02-21 17:06             ` Harald van Dijk
@ 2022-02-21 19:15               ` Stephane Chazelas
  0 siblings, 0 replies; 15+ messages in thread
From: Stephane Chazelas @ 2022-02-21 19:15 UTC (permalink / raw)
  To: Harald van Dijk; +Cc: Eric Blake, Herbert Xu, calestyo, dash

2022-02-21 17:06:13 +0000, Harald van Dijk:
[...]
> > If you want to escape ^ to always be treated as a literal when calling
> > fnmatch(), you have to spell it something like [[=^=]...], not [\^...]
> > (that is, since \^ inside [] does NOT treat the \ as an escape
> > character, but as yet another literal character as part of the brakcet
> > group, you instead have to resort to calling out the literal character
> > ^ via alternative spellings such as by a collation equivalent).
> 
> This is correct for regular expressions, unspecified for shell patterns in
> general after bug 1234 was resolved, and always incorrect for fnmatch()
> after bug 1190 was resolved. In fnmatch(), backslash has to work even inside
> bracket expressions. Whether it had to may not previously have been clear in
> the standard, but it has been made clear now and that is also how it works
> in glibc. That part of the patch is not a problem.
[...]

Thanks, and sorry about that. I was actually the one raising bug
1234 and misremembered the outcome.

case $var in ([\!x])

is specified, but:

pattern='[\!x]'
case $var in ($pattern)

is not indeed, so, the behaviour  of:

sh -c 'case "\\" in ($1) echo match; esac' sh '[\^x]'

is unspecified, so even though that patch changes the behaviour
in this instance, it's still not a conformance bug.

-- 
Stephane

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2022-02-21 19:15 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-12 16:25 possible wrong behaviour with patterns using a quoted ^ at the start of a bracket expression Christoph Anton Mitterer
2022-01-12 17:20 ` Harald van Dijk
2022-01-12 17:47   ` Christoph Anton Mitterer
2022-01-12 18:17     ` Harald van Dijk
2022-01-12 18:21       ` Christoph Anton Mitterer
2022-01-18  6:13   ` [PATCH] expand: Always quote caret when using fnmatch Herbert Xu
2022-01-18  8:44     ` Harald van Dijk
2022-01-19  5:37       ` [v2 PATCH] " Herbert Xu
2022-02-20  7:15         ` Stephane Chazelas
2022-02-21 16:39           ` Eric Blake
2022-02-21 17:06             ` Harald van Dijk
2022-02-21 19:15               ` Stephane Chazelas
2022-01-18 14:29     ` [PATCH] " Christoph Anton Mitterer
2022-01-18 14:54       ` Chet Ramey
2022-01-18 22:33       ` Herbert Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).