* possible wrong behaviour with patterns using a quoted ^ at the start of a bracket expression @ 2022-01-12 16:25 Christoph Anton Mitterer 2022-01-12 17:20 ` Harald van Dijk 0 siblings, 1 reply; 15+ messages in thread From: Christoph Anton Mitterer @ 2022-01-12 16:25 UTC (permalink / raw) To: dash Hey. I recently got some big deal of help from the people at the help-bash mailing list when I've tried to understand what POSIX mandates with respect to pattern matching (that is in the sense of [0], not Basic/Extended Regular Expressions). I'm still not so sure whether I understand it exactly by the wording of POSIX itself (which seems a bit odd to me), but what people explained[1] me at help-bash - and I hope I explain it correctly - is: In the patterns, even in a bracket expression in a pattern, there may be quoting (with double and single quotes), and - and this is key - anything that is quoted is already taken literal with respect to the pattern. So when one has e.g. a case compound command: case $foo in (['*?']) ... is already the literal * and ? within a pattern's bracket expression. Further, POSIX says: "If an open bracket introduces a bracket expression as in XBD RE Bracket Expression, except that the <exclamation-mark> character ( '!' ) shall replace the <circumflex> character ( '^' ) in its role in a non-matching list in the regular expression notation, it shall introduce a pattern bracket expression. A bracket expression starting with an unquoted <circumflex> character produces unspecified results. Otherwise, '[' shall match the character itself." I found not the following probably wrong behaviour of dash and busybox' sh: $ cat circumflex-test case "$1" in (['^.a']) echo match ;; (*) echo else esac $ cat exclamation-test case "$1" in (['!.a']) echo match ;; (*) echo else esac $ cat run-circumflex echo dash: dash circumflex-test ^ dash circumflex-test . dash circumflex-test a echo busybox-sh: busybox sh circumflex-test ^ busybox sh circumflex-test . busybox sh circumflex-test a echo bash: bash circumflex-test ^ bash circumflex-test . bash circumflex-test a echo klibc-sh: /usr/lib/klibc/bin/sh circumflex-test ^ /usr/lib/klibc/bin/sh circumflex-test . /usr/lib/klibc/bin/sh circumflex-test a $ cat run-exlamation echo dash: dash exclamation-test '!' dash exclamation-test . dash exclamation-test a echo busybox-sh: busybox sh exclamation-test '!' busybox sh exclamation-test . busybox sh exclamation-test a echo bash: bash exclamation-test '!' bash exclamation-test . bash exclamation-test a echo klibc-sh: /usr/lib/klibc/bin/sh exclamation-test '!' /usr/lib/klibc/bin/sh exclamation-test . /usr/lib/klibc/bin/sh exclamation-test a When run: $ sh run-circumflex | paste - - - - | column -t dash: match else else busybox-sh: match else else bash: match match match klibc-sh: match match match $ ^ . a $ sh run-exlamation | paste - - - - | column -t dash: match match match busybox-sh: match match match bash: match match match klibc-sh: match match match $ ! . a The results for the run-circumflex seem pretty odd. Apparently, the ^ is taken literally, but the other two are negated. $ dash circumflex-test b match $ busybox sh circumflex-test b match match again (which, AFAIU, they should not). While POSIX does say: "A bracket expression starting with an unquoted <circumflex> character produces unspecified results." ... the circumflex *is* quoted above... I haven't verified any further unusual patterns like ['$var'] vs. ["$var"], so maybe an eye should be kept open, whether there could be any issues as well. Thanks, Chris. PS: For reference, the bug[2] I've opened at BusyBox. [0] https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13 [1] https://lists.gnu.org/archive/html/help-bash/2022-01/msg00000.html [2] https://bugs.busybox.net/show_bug.cgi?id=14516 ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: possible wrong behaviour with patterns using a quoted ^ at the start of a bracket expression 2022-01-12 16:25 possible wrong behaviour with patterns using a quoted ^ at the start of a bracket expression Christoph Anton Mitterer @ 2022-01-12 17:20 ` Harald van Dijk 2022-01-12 17:47 ` Christoph Anton Mitterer 2022-01-18 6:13 ` [PATCH] expand: Always quote caret when using fnmatch Herbert Xu 0 siblings, 2 replies; 15+ messages in thread From: Harald van Dijk @ 2022-01-12 17:20 UTC (permalink / raw) To: Christoph Anton Mitterer, dash Hi, On 12/01/2022 16:25, Christoph Anton Mitterer wrote: > The results for the run-circumflex seem pretty odd. > Apparently, the ^ is taken literally, but the other two are negated. The ^ is not taken literally. The ^ in the pattern is wrongly taken as the negation operator, and the ^ in the argument is then reported as a match because it is neither . nor a. This bug (you're right that it's a bug) is specific to builds that use fnmatch(). In dash itself, ^ is always assumed as a literal. In builds with --disable-fnmatch you get correct results. In builds with --enable-fnmatch, because dash assumes ^ is assumed as a literal, dash fails to escape it before passing it on to fnmatch(), and the system fnmatch() may choose differently from dash on how to deal with unquoted ^s. What dash should do to get whatever behaviour the system fnmatch() chooses is leave unquoted ^s unquoted, and leave quoted ^s quoted. This can be achieved by --- a/src/mksyntax.c +++ b/src/mksyntax.c @@ -178,14 +178,14 @@ main(int argc, char **argv) add("$", "CVAR"); add("}", "CENDVAR"); /* ':/' for tilde expansion, '-' for [a\-x] pattern ranges */ - add("!*?[=~:/-]", "CCTL"); + add("!*?[^=~:/-]", "CCTL"); print("dqsyntax"); init(); fputs("\n/* syntax table used when in single quotes */\n", cfile); add("\n", "CNL"); add("'", "CENDQUOTE"); /* ':/' for tilde expansion, '-' for [a\-x] pattern ranges */ - add("!*?[=~:/-]\\", "CCTL"); + add("!*?[^=~:/-]\\", "CCTL"); print("sqsyntax"); init(); fputs("\n/* syntax table used when in arithmetic */\n", cfile); However, whether this is the correct approach is a matter of opinion: dash could alternatively choose to always take ^ as a literal and always escape it before passing it on to fnmatch(), overriding whatever decision the libc people had taken. Cheers, Harald van Dijk ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: possible wrong behaviour with patterns using a quoted ^ at the start of a bracket expression 2022-01-12 17:20 ` Harald van Dijk @ 2022-01-12 17:47 ` Christoph Anton Mitterer 2022-01-12 18:17 ` Harald van Dijk 2022-01-18 6:13 ` [PATCH] expand: Always quote caret when using fnmatch Herbert Xu 1 sibling, 1 reply; 15+ messages in thread From: Christoph Anton Mitterer @ 2022-01-12 17:47 UTC (permalink / raw) To: Harald van Dijk, dash On Wed, 2022-01-12 at 17:20 +0000, Harald van Dijk wrote: > However, whether this is the correct approach is a matter of opinion: > dash could alternatively choose to always take ^ as a literal and > always > escape it before passing it on to fnmatch() Well I personally think the best would to *always* take ^ literally, whether quoted or not. That would match the behaviour of bash and klibc sh, and also seems more in the spirit of POSIX (which, while saying that an unquoted ^ produces undefined behaviour, also says that: "( '!' ) shall replace the <circumflex> character ( '^' )"" ... "replace", not complement. Cheers, Chris. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: possible wrong behaviour with patterns using a quoted ^ at the start of a bracket expression 2022-01-12 17:47 ` Christoph Anton Mitterer @ 2022-01-12 18:17 ` Harald van Dijk 2022-01-12 18:21 ` Christoph Anton Mitterer 0 siblings, 1 reply; 15+ messages in thread From: Harald van Dijk @ 2022-01-12 18:17 UTC (permalink / raw) To: Christoph Anton Mitterer, dash On 12/01/2022 17:47, Christoph Anton Mitterer wrote: > On Wed, 2022-01-12 at 17:20 +0000, Harald van Dijk wrote: >> However, whether this is the correct approach is a matter of opinion: >> dash could alternatively choose to always take ^ as a literal and >> always >> escape it before passing it on to fnmatch() > > Well I personally think the best would to *always* take ^ literally, > whether quoted or not. > > That would match the behaviour of bash and klibc sh, and also seems > more in the spirit of POSIX (which, while saying that an unquoted ^ > produces undefined behaviour, also says that: > "( '!' ) shall replace the <circumflex> character ( '^' )"" > > ... "replace", not complement. Actually, in bash, unquoted ^ does act as negation. Same in ksh, which is the basis for a lot of POSIX. As for the spirit of POSIX, POSIX does say in the rationale that it's unspecified specifically in order to allow that: The restriction on a <circumflex> in a bracket expression is to allow implementations that support pattern matching using the <circumflex> as the negation character in addition to the <exclamation-mark>. But personally, I have no opinion on which action dash should take, as long as it gets the right results for quoted ^. Cheers, Harald van Dijk ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: possible wrong behaviour with patterns using a quoted ^ at the start of a bracket expression 2022-01-12 18:17 ` Harald van Dijk @ 2022-01-12 18:21 ` Christoph Anton Mitterer 0 siblings, 0 replies; 15+ messages in thread From: Christoph Anton Mitterer @ 2022-01-12 18:21 UTC (permalink / raw) To: Harald van Dijk, dash On Wed, 2022-01-12 at 18:17 +0000, Harald van Dijk wrote: > Actually, in bash, unquoted ^ does act as negation. Oh, sure,... I event tested that half an hour before and still wrote it wrong ^^. Then I'd prefer if it would use an unquoted (only) ^ as negation. The standard allows it, and it's better to have some homogeneity. Cheers, Chris. ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH] expand: Always quote caret when using fnmatch 2022-01-12 17:20 ` Harald van Dijk 2022-01-12 17:47 ` Christoph Anton Mitterer @ 2022-01-18 6:13 ` Herbert Xu 2022-01-18 8:44 ` Harald van Dijk 2022-01-18 14:29 ` [PATCH] " Christoph Anton Mitterer 1 sibling, 2 replies; 15+ messages in thread From: Herbert Xu @ 2022-01-18 6:13 UTC (permalink / raw) To: Harald van Dijk; +Cc: calestyo, dash Harald van Dijk <harald@gigawatt.nl> wrote: > > On 12/01/2022 16:25, Christoph Anton Mitterer wrote: >> The results for the run-circumflex seem pretty odd. >> Apparently, the ^ is taken literally, but the other two are negated. > > The ^ is not taken literally. The ^ in the pattern is wrongly taken as > the negation operator, and the ^ in the argument is then reported as a > match because it is neither . nor a. > > This bug (you're right that it's a bug) is specific to builds that use > fnmatch(). In dash itself, ^ is always assumed as a literal. In builds > with --disable-fnmatch you get correct results. In builds with > --enable-fnmatch, because dash assumes ^ is assumed as a literal, dash > fails to escape it before passing it on to fnmatch(), and the system > fnmatch() may choose differently from dash on how to deal with unquoted > ^s. What dash should do to get whatever behaviour the system fnmatch() > chooses is leave unquoted ^s unquoted, and leave quoted ^s quoted. This > can be achieved by > > --- a/src/mksyntax.c > +++ b/src/mksyntax.c > @@ -178,14 +178,14 @@ main(int argc, char **argv) > add("$", "CVAR"); > add("}", "CENDVAR"); > /* ':/' for tilde expansion, '-' for [a\-x] pattern ranges */ > - add("!*?[=~:/-]", "CCTL"); > + add("!*?[^=~:/-]", "CCTL"); > print("dqsyntax"); > init(); > fputs("\n/* syntax table used when in single quotes */\n", cfile); > add("\n", "CNL"); > add("'", "CENDQUOTE"); > /* ':/' for tilde expansion, '-' for [a\-x] pattern ranges */ > - add("!*?[=~:/-]\\", "CCTL"); > + add("!*?[^=~:/-]\\", "CCTL"); > print("sqsyntax"); > init(); > fputs("\n/* syntax table used when in arithmetic */\n", cfile); > > However, whether this is the correct approach is a matter of opinion: > dash could alternatively choose to always take ^ as a literal and always > escape it before passing it on to fnmatch(), overriding whatever > decision the libc people had taken. Yes, this would produce the most consistent result. This patch forces ^ to be a literal when we use fnmatch. Fixes: 7638476c18f2 ("shell: Enable fnmatch/glob by default") Reported-by: Christoph Anton Mitterer <calestyo@scientia.org> Suggested-by: Harald van Dijk <harald@gigawatt.nl> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> diff --git a/src/expand.c b/src/expand.c index aea5cc4..04bf8fb 100644 --- a/src/expand.c +++ b/src/expand.c @@ -47,6 +47,9 @@ #include <string.h> #ifdef HAVE_FNMATCH #include <fnmatch.h> +#define FNMATCH_IS_ENABLED 1 +#else +#define FNMATCH_IS_ENABLED 0 #endif #ifdef HAVE_GLOB #include <glob.h> @@ -1693,8 +1696,11 @@ _rmescapes(char *str, int flag) notescaped = 0; goto copy; } + if (FNMATCH_IS_ENABLED && *p == '^') + goto add_escape; if (*p == (char)CTLESC) { p++; +add_escape: if (notescaped) *q++ = '\\'; } -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH] expand: Always quote caret when using fnmatch 2022-01-18 6:13 ` [PATCH] expand: Always quote caret when using fnmatch Herbert Xu @ 2022-01-18 8:44 ` Harald van Dijk 2022-01-19 5:37 ` [v2 PATCH] " Herbert Xu 2022-01-18 14:29 ` [PATCH] " Christoph Anton Mitterer 1 sibling, 1 reply; 15+ messages in thread From: Harald van Dijk @ 2022-01-18 8:44 UTC (permalink / raw) To: Herbert Xu; +Cc: calestyo, dash On 18/01/2022 06:13, Herbert Xu wrote: > This patch forces ^ to be a literal when we use fnmatch. > > Fixes: 7638476c18f2 ("shell: Enable fnmatch/glob by default") > Reported-by: Christoph Anton Mitterer <calestyo@scientia.org> > Suggested-by: Harald van Dijk <harald@gigawatt.nl> > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> > > diff --git a/src/expand.c b/src/expand.c > index aea5cc4..04bf8fb 100644 > --- a/src/expand.c > +++ b/src/expand.c > @@ -47,6 +47,9 @@ > #include <string.h> > #ifdef HAVE_FNMATCH > #include <fnmatch.h> > +#define FNMATCH_IS_ENABLED 1 > +#else > +#define FNMATCH_IS_ENABLED 0 > #endif > #ifdef HAVE_GLOB > #include <glob.h> > @@ -1693,8 +1696,11 @@ _rmescapes(char *str, int flag) > notescaped = 0; > goto copy; > } > + if (FNMATCH_IS_ENABLED && *p == '^') > + goto add_escape; > if (*p == (char)CTLESC) { > p++; > +add_escape: > if (notescaped) > *q++ = '\\'; > } > The loop that is modified by this patch is only taken after any qchars are seen, so for e.g. var=abc echo ${var#[^a]} it has no effect. More importantly though, _rmescapes is used to modify strings in place. This patch causes _rmescapes to try and grow strings, which cannot ever work. A test case for this is case aa in \a[^a]) echo match ;; esac which fails with a segfault after this patch is applied. Cheers, Harald van Dijk ^ permalink raw reply [flat|nested] 15+ messages in thread
* [v2 PATCH] expand: Always quote caret when using fnmatch 2022-01-18 8:44 ` Harald van Dijk @ 2022-01-19 5:37 ` Herbert Xu 2022-02-20 7:15 ` Stephane Chazelas 0 siblings, 1 reply; 15+ messages in thread From: Herbert Xu @ 2022-01-19 5:37 UTC (permalink / raw) To: Harald van Dijk; +Cc: calestyo, dash On Tue, Jan 18, 2022 at 08:44:05AM +0000, Harald van Dijk wrote: > > The loop that is modified by this patch is only taken after any qchars are > seen, so for e.g. > > var=abc > echo ${var#[^a]} > > it has no effect. More importantly though, _rmescapes is used to modify Good catch. I have modified qchars accordingly. > strings in place. This patch causes _rmescapes to try and grow strings, > which cannot ever work. A test case for this is > > case aa in \a[^a]) echo match ;; esac > > which fails with a segfault after this patch is applied. Indeed. I have modified the RMESCAPE_ALLOC to allow growth. Thanks, ---8<--- This patch forces ^ to be a literal when we use fnmatch. In order to allow for the extra space to quote the caret, the function _rmescapes will allocate up to twice the memory if the flag RMESCAPE_GLOB is set. Fixes: 7638476c18f2 ("shell: Enable fnmatch/glob by default") Reported-by: Christoph Anton Mitterer <calestyo@scientia.org> Suggested-by: Harald van Dijk <harald@gigawatt.nl> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> diff --git a/src/expand.c b/src/expand.c index aea5cc4..9906d8a 100644 --- a/src/expand.c +++ b/src/expand.c @@ -135,8 +135,6 @@ STATIC int pmatch(const char *, const char *); #endif static size_t cvtnum(intmax_t num, int flags); STATIC size_t esclen(const char *, const char *); -STATIC char *scanleft(char *, char *, char *, char *, int, int); -STATIC char *scanright(char *, char *, char *, char *, int, int); STATIC void varunset(const char *, const char *, const char *, int) __attribute__((__noreturn__)); @@ -541,10 +539,8 @@ out: } -STATIC char * -scanleft( - char *startp, char *rmesc, char *rmescend, char *str, int quotes, - int zero +static char *scanleft(char *startp, char *endp, char *rmesc, char *rmescend, + char *str, int quotes, int zero ) { char *loc; char *loc2; @@ -573,16 +569,14 @@ scanleft( } -STATIC char * -scanright( - char *startp, char *rmesc, char *rmescend, char *str, int quotes, - int zero +static char *scanright(char *startp, char *endp, char *rmesc, char *rmescend, + char *str, int quotes, int zero ) { int esc = 0; char *loc; char *loc2; - for (loc = str - 1, loc2 = rmescend; loc >= startp; loc2--) { + for (loc = endp, loc2 = rmescend; loc >= startp; loc2--) { int match; char c = *loc2; const char *s = loc2; @@ -618,7 +612,9 @@ static char *subevalvar(char *start, char *str, int strloc, int startloc, long amount; char *rmesc, *rmescend; int zero; - char *(*scan)(char *, char *, char *, char *, int , int); + char *(*scan)(char *, char *, char *, char *, char *, int , int); + int nstrloc = strloc; + char *endp; char *p; p = argstr(start, (flag & EXP_DISCARD) | EXP_TILDE | @@ -646,33 +642,40 @@ static char *subevalvar(char *start, char *str, int strloc, int startloc, abort(); #endif - rmesc = startp; rmescend = stackblock() + strloc; + str = preglob(rmescend, FNMATCH_IS_ENABLED ? + RMESCAPE_ALLOC | RMESCAPE_GROW : 0); + if (FNMATCH_IS_ENABLED) { + startp = stackblock() + startloc; + rmescend = stackblock() + strloc; + nstrloc = str - (char *)stackblock(); + } + + rmesc = startp; if (quotes) { rmesc = _rmescapes(startp, RMESCAPE_ALLOC | RMESCAPE_GROW); - if (rmesc != startp) { + if (rmesc != startp) rmescend = expdest; - startp = stackblock() + startloc; - } + startp = stackblock() + startloc; + str = stackblock() + nstrloc; } rmescend--; - str = stackblock() + strloc; - preglob(str, 0); /* zero = subtype == VSTRIMLEFT || subtype == VSTRIMLEFTMAX */ zero = subtype >> 1; /* VSTRIMLEFT/VSTRIMRIGHTMAX -> scanleft */ scan = (subtype & 1) ^ zero ? scanleft : scanright; - loc = scan(startp, rmesc, rmescend, str, quotes, zero); + endp = stackblock() + strloc - 1; + loc = scan(startp, endp, rmesc, rmescend, str, quotes, zero); if (loc) { if (zero) { - memmove(startp, loc, str - loc); - loc = startp + (str - loc) - 1; + memmove(startp, loc, endp - loc); + loc = startp + (endp - loc); } *loc = '\0'; } else - loc = str - 1; + loc = endp; out: amount = loc - expdest; @@ -1501,7 +1504,9 @@ msort(struct strlist *list, int len) STATIC inline int patmatch(char *pattern, const char *string) { - return pmatch(preglob(pattern, 0), string); + return pmatch(preglob(pattern, FNMATCH_IS_ENABLED ? + RMESCAPE_ALLOC | RMESCAPE_GROW : 0), + string); } @@ -1654,15 +1659,22 @@ _rmescapes(char *str, int flag) int notescaped; int globbing; - p = strpbrk(str, qchars); + p = strpbrk(str, cqchars); if (!p) { return str; } q = p; r = str; + globbing = flag & RMESCAPE_GLOB; + if (flag & RMESCAPE_ALLOC) { size_t len = p - str; - size_t fulllen = len + strlen(p) + 1; + size_t fulllen = strlen(p); + + if (FNMATCH_IS_ENABLED && globbing) + fulllen *= 2; + + fulllen += len + 1; if (flag & RMESCAPE_GROW) { int strloc = str - (char *)stackblock(); @@ -1680,7 +1692,6 @@ _rmescapes(char *str, int flag) q = mempcpy(q, str, len); } } - globbing = flag & RMESCAPE_GLOB; notescaped = globbing; while (*p) { if (*p == (char)CTLQUOTEMARK) { @@ -1693,8 +1704,11 @@ _rmescapes(char *str, int flag) notescaped = 0; goto copy; } + if (FNMATCH_IS_ENABLED && *p == '^') + goto add_escape; if (*p == (char)CTLESC) { p++; +add_escape: if (notescaped) *q++ = '\\'; } diff --git a/src/mystring.c b/src/mystring.c index de624b8..ed3c8f6 100644 --- a/src/mystring.c +++ b/src/mystring.c @@ -62,7 +62,12 @@ const char spcstr[] = " "; const char snlfmt[] = "%s\n"; const char dolatstr[] = { CTLQUOTEMARK, CTLVAR, VSNORMAL, '@', '=', CTLQUOTEMARK, '\0' }; -const char qchars[] = { CTLESC, CTLQUOTEMARK, 0 }; +const char cqchars[] = { +#ifdef HAVE_FNMATCH + '^', +#endif + CTLESC, CTLQUOTEMARK, 0 +}; const char illnum[] = "Illegal number: %s"; const char homestr[] = "HOME"; diff --git a/src/mystring.h b/src/mystring.h index 083ea98..564b911 100644 --- a/src/mystring.h +++ b/src/mystring.h @@ -37,11 +37,18 @@ #include <inttypes.h> #include <string.h> +#ifdef HAVE_FNMATCH +#define FNMATCH_IS_ENABLED 1 +#else +#define FNMATCH_IS_ENABLED 0 +#endif + extern const char snlfmt[]; extern const char spcstr[]; extern const char dolatstr[]; #define DOLATSTRLEN 6 -extern const char qchars[]; +extern const char cqchars[]; +#define qchars (cqchars + FNMATCH_IS_ENABLED) extern const char illnum[]; extern const char homestr[]; -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [v2 PATCH] expand: Always quote caret when using fnmatch 2022-01-19 5:37 ` [v2 PATCH] " Herbert Xu @ 2022-02-20 7:15 ` Stephane Chazelas 2022-02-21 16:39 ` Eric Blake 0 siblings, 1 reply; 15+ messages in thread From: Stephane Chazelas @ 2022-02-20 7:15 UTC (permalink / raw) To: Herbert Xu; +Cc: Harald van Dijk, calestyo, dash 2022-01-19 16:37:54 +1100, Herbert Xu: [...] > This patch forces ^ to be a literal when we use fnmatch. [...] Hi, I'm coming here from a discussion at https://www.austingroupbugs.net/view.php?id=1558 where I'm trying to get POSIX to mandate (or suggest that it will mandate in the future) [^x] works like [!x] to negate a bracket expression set like it does already in most shell/fnmatch()/glob() implementations, and to remove that non-sensical (to me at least) discrepancy with regexps and most other shells / languages. To me, that patch is a step in the wrong direction. Other Ash-based shells already made the move (make [^x] an alias for [!x]) a long time ago (FreeBSD / NetBSD sh in 1997), BSD fnmatch() in 1994. The GNU libc and GNU shell have always supported [^x] AFAIK. So has zsh. ksh made the move in 2005. In dash, the move to fnmatch() in that case was a step in the right direction, and reverting it would be an unfortunate step back to me. If the point of having dash use fnmatch() is to add consistency with other tools used from the shell (like find -name/-path...), then this change doesn't help either on the majority of systems. The "fix" also seems incomplete and causes at least this regression: $ ltrace -e fnmatch ./bin/dash -c 'case "\\" in ($1) echo match; esac' sh '[\^]' dash->fnmatch("[\\\\^]", "\\", 0) = 0 match See how that added \ caused it to be included in the set. Compare with: +++ exited (status 0) +++ $ ltrace -e fnmatch ./bin/dash -c 'case "\\" in ($1) echo match; esac' sh '[\!]' dash->fnmatch("[\\!]", "\\", 0) = 1 +++ exited (status 0) +++ $ ltrace -e fnmatch ./bin/dash -c 'case "\\" in ($1) echo match; esac' sh '[\x]' dash->fnmatch("[\\x]", "\\", 0) = 1 +++ exited (status 0) +++ (here on Debian with glibc 2.33) -- Stephane ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [v2 PATCH] expand: Always quote caret when using fnmatch 2022-02-20 7:15 ` Stephane Chazelas @ 2022-02-21 16:39 ` Eric Blake 2022-02-21 17:06 ` Harald van Dijk 0 siblings, 1 reply; 15+ messages in thread From: Eric Blake @ 2022-02-21 16:39 UTC (permalink / raw) To: Stephane Chazelas; +Cc: Herbert Xu, Harald van Dijk, calestyo, dash On Sun, Feb 20, 2022 at 07:15:44AM +0000, Stephane Chazelas wrote: > 2022-01-19 16:37:54 +1100, Herbert Xu: > [...] > > This patch forces ^ to be a literal when we use fnmatch. > [...] > The "fix" also seems incomplete and causes at least this > regression: > > $ ltrace -e fnmatch ./bin/dash -c 'case "\\" in ($1) echo match; esac' sh '[\^]' > dash->fnmatch("[\\\\^]", "\\", 0) = 0 > match > > See how that added \ caused it to be included in the set. Compare with: > > +++ exited (status 0) +++ > $ ltrace -e fnmatch ./bin/dash -c 'case "\\" in ($1) echo match; esac' sh '[\!]' > dash->fnmatch("[\\!]", "\\", 0) = 1 > +++ exited (status 0) +++ > $ ltrace -e fnmatch ./bin/dash -c 'case "\\" in ($1) echo match; esac' sh '[\x]' > dash->fnmatch("[\\x]", "\\", 0) = 1 > +++ exited (status 0) +++ If you want to escape ^ to always be treated as a literal when calling fnmatch(), you have to spell it something like [[=^=]...], not [\^...] (that is, since \^ inside [] does NOT treat the \ as an escape character, but as yet another literal character as part of the brakcet group, you instead have to resort to calling out the literal character ^ via alternative spellings such as by a collation equivalent). Another alternative is to rewrite the [] bracket exprsesion so that ^ is no longer first (but be careful if the character after ^ is -, ], or !, as the rewrite is not always a trivial swap with the next character). -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [v2 PATCH] expand: Always quote caret when using fnmatch 2022-02-21 16:39 ` Eric Blake @ 2022-02-21 17:06 ` Harald van Dijk 2022-02-21 19:15 ` Stephane Chazelas 0 siblings, 1 reply; 15+ messages in thread From: Harald van Dijk @ 2022-02-21 17:06 UTC (permalink / raw) To: Eric Blake, Stephane Chazelas; +Cc: Herbert Xu, calestyo, dash On 21/02/2022 16:39, Eric Blake wrote: > On Sun, Feb 20, 2022 at 07:15:44AM +0000, Stephane Chazelas wrote: >> 2022-01-19 16:37:54 +1100, Herbert Xu: >> [...] >>> This patch forces ^ to be a literal when we use fnmatch. >> [...] >> The "fix" also seems incomplete and causes at least this >> regression: >> >> $ ltrace -e fnmatch ./bin/dash -c 'case "\\" in ($1) echo match; esac' sh '[\^]' >> dash->fnmatch("[\\\\^]", "\\", 0) = 0 >> match >> >> See how that added \ caused it to be included in the set. Compare with: >> >> +++ exited (status 0) +++ >> $ ltrace -e fnmatch ./bin/dash -c 'case "\\" in ($1) echo match; esac' sh '[\!]' >> dash->fnmatch("[\\!]", "\\", 0) = 1 >> +++ exited (status 0) +++ >> $ ltrace -e fnmatch ./bin/dash -c 'case "\\" in ($1) echo match; esac' sh '[\x]' >> dash->fnmatch("[\\x]", "\\", 0) = 1 >> +++ exited (status 0) +++ > > If you want to escape ^ to always be treated as a literal when calling > fnmatch(), you have to spell it something like [[=^=]...], not [\^...] > (that is, since \^ inside [] does NOT treat the \ as an escape > character, but as yet another literal character as part of the brakcet > group, you instead have to resort to calling out the literal character > ^ via alternative spellings such as by a collation equivalent). This is correct for regular expressions, unspecified for shell patterns in general after bug 1234 was resolved, and always incorrect for fnmatch() after bug 1190 was resolved. In fnmatch(), backslash has to work even inside bracket expressions. Whether it had to may not previously have been clear in the standard, but it has been made clear now and that is also how it works in glibc. That part of the patch is not a problem. Cheers, Harald van Dijk ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [v2 PATCH] expand: Always quote caret when using fnmatch 2022-02-21 17:06 ` Harald van Dijk @ 2022-02-21 19:15 ` Stephane Chazelas 0 siblings, 0 replies; 15+ messages in thread From: Stephane Chazelas @ 2022-02-21 19:15 UTC (permalink / raw) To: Harald van Dijk; +Cc: Eric Blake, Herbert Xu, calestyo, dash 2022-02-21 17:06:13 +0000, Harald van Dijk: [...] > > If you want to escape ^ to always be treated as a literal when calling > > fnmatch(), you have to spell it something like [[=^=]...], not [\^...] > > (that is, since \^ inside [] does NOT treat the \ as an escape > > character, but as yet another literal character as part of the brakcet > > group, you instead have to resort to calling out the literal character > > ^ via alternative spellings such as by a collation equivalent). > > This is correct for regular expressions, unspecified for shell patterns in > general after bug 1234 was resolved, and always incorrect for fnmatch() > after bug 1190 was resolved. In fnmatch(), backslash has to work even inside > bracket expressions. Whether it had to may not previously have been clear in > the standard, but it has been made clear now and that is also how it works > in glibc. That part of the patch is not a problem. [...] Thanks, and sorry about that. I was actually the one raising bug 1234 and misremembered the outcome. case $var in ([\!x]) is specified, but: pattern='[\!x]' case $var in ($pattern) is not indeed, so, the behaviour of: sh -c 'case "\\" in ($1) echo match; esac' sh '[\^x]' is unspecified, so even though that patch changes the behaviour in this instance, it's still not a conformance bug. -- Stephane ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] expand: Always quote caret when using fnmatch 2022-01-18 6:13 ` [PATCH] expand: Always quote caret when using fnmatch Herbert Xu 2022-01-18 8:44 ` Harald van Dijk @ 2022-01-18 14:29 ` Christoph Anton Mitterer 2022-01-18 14:54 ` Chet Ramey 2022-01-18 22:33 ` Herbert Xu 1 sibling, 2 replies; 15+ messages in thread From: Christoph Anton Mitterer @ 2022-01-18 14:29 UTC (permalink / raw) To: Herbert Xu, Harald van Dijk; +Cc: dash Hey. Just for confirmation: Would that also be an issue in fnmatch() (i.e. should it be forwarded?) or really just something in dash? Cheers, Chris. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] expand: Always quote caret when using fnmatch 2022-01-18 14:29 ` [PATCH] " Christoph Anton Mitterer @ 2022-01-18 14:54 ` Chet Ramey 2022-01-18 22:33 ` Herbert Xu 1 sibling, 0 replies; 15+ messages in thread From: Chet Ramey @ 2022-01-18 14:54 UTC (permalink / raw) To: Christoph Anton Mitterer, Herbert Xu, Harald van Dijk; +Cc: chet.ramey, dash On 1/18/22 9:29 AM, Christoph Anton Mitterer wrote: > Hey. > > Just for confirmation: > > Would that also be an issue in fnmatch() (i.e. should it be forwarded?) > or really just something in dash? The behavior of an unquoted carat as the first character in a bracket expression is unspecified. Dash can't count on any particular behavior. -- ``The lyf so short, the craft so long to lerne.'' - Chaucer ``Ars longa, vita brevis'' - Hippocrates Chet Ramey, UTech, CWRU chet@case.edu http://tiswww.cwru.edu/~chet/ ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] expand: Always quote caret when using fnmatch 2022-01-18 14:29 ` [PATCH] " Christoph Anton Mitterer 2022-01-18 14:54 ` Chet Ramey @ 2022-01-18 22:33 ` Herbert Xu 1 sibling, 0 replies; 15+ messages in thread From: Herbert Xu @ 2022-01-18 22:33 UTC (permalink / raw) To: Christoph Anton Mitterer; +Cc: harald, dash Christoph Anton Mitterer <calestyo@scientia.org> wrote: > Hey. > > Just for confirmation: > > Would that also be an issue in fnmatch() (i.e. should it be forwarded?) > or really just something in dash? No I don't think this is an fnmatch issue. It's dash's fault for not quoting the caret before passing it to glibc. Thanks, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2022-02-21 19:15 UTC | newest] Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-01-12 16:25 possible wrong behaviour with patterns using a quoted ^ at the start of a bracket expression Christoph Anton Mitterer 2022-01-12 17:20 ` Harald van Dijk 2022-01-12 17:47 ` Christoph Anton Mitterer 2022-01-12 18:17 ` Harald van Dijk 2022-01-12 18:21 ` Christoph Anton Mitterer 2022-01-18 6:13 ` [PATCH] expand: Always quote caret when using fnmatch Herbert Xu 2022-01-18 8:44 ` Harald van Dijk 2022-01-19 5:37 ` [v2 PATCH] " Herbert Xu 2022-02-20 7:15 ` Stephane Chazelas 2022-02-21 16:39 ` Eric Blake 2022-02-21 17:06 ` Harald van Dijk 2022-02-21 19:15 ` Stephane Chazelas 2022-01-18 14:29 ` [PATCH] " Christoph Anton Mitterer 2022-01-18 14:54 ` Chet Ramey 2022-01-18 22:33 ` Herbert Xu
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).