* BUG: git grep behave oddly with alternatives @ 2023-01-03 9:53 Marco Nenciarini 2023-01-03 16:29 ` René Scharfe 0 siblings, 1 reply; 22+ messages in thread From: Marco Nenciarini @ 2023-01-03 9:53 UTC (permalink / raw) To: git I've installed latest git from brew and suddenly git grep started behaving oddly when using alternatives. ``` $ echo abd > test.file $ git grep --untracked 'a\(b\|c\)d' test.file $ git grep --untracked 'a\(b\)d' test.file test.file:abd ``` It should have matched in both cases. If I switch to exented pattern it starts working again ``` $ git grep --untracked -E 'a(b|c)d' test.file test.file:abd ``` [System Info] git version: git version 2.39.0 cpu: x86_64 no commit associated with this build sizeof-long: 8 sizeof-size_t: 8 shell-path: /bin/sh feature: fsmonitor--daemon uname: Darwin 22.2.0 Darwin Kernel Version 22.2.0: Fri Nov 11 02:08:47 PST 2022; root:xnu-8792.61.2~4/RELEASE_X86_64 x86_64 compiler info: clang: 14.0.0 (clang-1400.0.29.202) libc info: no libc information available $SHELL (typically, interactive shell): /usr/local/bin/bash -- Marco Nenciarini - EnterpriseDB marco.nenciarini@enterprisedb.com | www.enterprisedb.com ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: BUG: git grep behave oddly with alternatives 2023-01-03 9:53 BUG: git grep behave oddly with alternatives Marco Nenciarini @ 2023-01-03 16:29 ` René Scharfe 2023-01-03 18:13 ` Marco Nenciarini 0 siblings, 1 reply; 22+ messages in thread From: René Scharfe @ 2023-01-03 16:29 UTC (permalink / raw) To: Marco Nenciarini, git Am 03.01.23 um 10:53 schrieb Marco Nenciarini: > I've installed latest git from brew and suddenly git grep started behaving oddly when using alternatives. > > ``` > $ echo abd > test.file > $ git grep --untracked 'a\(b\|c\)d' test.file > $ git grep --untracked 'a\(b\)d' test.file > test.file:abd > ``` > > It should have matched in both cases. > > > If I switch to exented pattern it starts working again > > ``` > $ git grep --untracked -E 'a(b|c)d' test.file > test.file:abd > ``` This is expected: Basic regular expressions don't support alternation. Under which circumstances did it work for you without -E or -P? > > [System Info] > git version: > git version 2.39.0 > cpu: x86_64 > no commit associated with this build > sizeof-long: 8 > sizeof-size_t: 8 > shell-path: /bin/sh > feature: fsmonitor--daemon > uname: Darwin 22.2.0 Darwin Kernel Version 22.2.0: Fri Nov 11 02:08:47 PST 2022; root:xnu-8792.61.2~4/RELEASE_X86_64 x86_64 > compiler info: clang: 14.0.0 (clang-1400.0.29.202) > libc info: no libc information available > $SHELL (typically, interactive shell): /usr/local/bin/bash > > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: BUG: git grep behave oddly with alternatives 2023-01-03 16:29 ` René Scharfe @ 2023-01-03 18:13 ` Marco Nenciarini 2023-01-03 20:52 ` René Scharfe 0 siblings, 1 reply; 22+ messages in thread From: Marco Nenciarini @ 2023-01-03 18:13 UTC (permalink / raw) To: René Scharfe, git On 03/01/23 17:29, René Scharfe wrote: > Am 03.01.23 um 10:53 schrieb Marco Nenciarini: >> I've installed latest git from brew and suddenly git grep started behaving oddly when using alternatives. >> >> ``` >> $ echo abd > test.file >> $ git grep --untracked 'a\(b\|c\)d' test.file >> $ git grep --untracked 'a\(b\)d' test.file >> test.file:abd >> ``` >> >> It should have matched in both cases. >> >> >> If I switch to exented pattern it starts working again >> >> ``` >> $ git grep --untracked -E 'a(b|c)d' test.file >> test.file:abd >> ``` > > This is expected: Basic regular expressions don't support alternation. > > Under which circumstances did it work for you without -E or -P? > It has always worked until I installed 2.39.0 on my mac. I've also checked with other developers that are using a mac and they reports the same. On Linux it works regardless the git version. Using grep directly also works, so it is a different behavior between grep and git grep, that is surprising. >> >> [System Info] >> git version: >> git version 2.39.0 >> cpu: x86_64 >> no commit associated with this build >> sizeof-long: 8 >> sizeof-size_t: 8 >> shell-path: /bin/sh >> feature: fsmonitor--daemon >> uname: Darwin 22.2.0 Darwin Kernel Version 22.2.0: Fri Nov 11 02:08:47 PST 2022; root:xnu-8792.61.2~4/RELEASE_X86_64 x86_64 >> compiler info: clang: 14.0.0 (clang-1400.0.29.202) >> libc info: no libc information available >> $SHELL (typically, interactive shell): /usr/local/bin/bash >> >> > -- Marco Nenciarini - EnterpriseDB marco.nenciarini@enterprisedb.com | www.enterprisedb.com ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: BUG: git grep behave oddly with alternatives 2023-01-03 18:13 ` Marco Nenciarini @ 2023-01-03 20:52 ` René Scharfe 2023-01-04 6:13 ` Junio C Hamano 2023-01-04 7:46 ` Jeff King 0 siblings, 2 replies; 22+ messages in thread From: René Scharfe @ 2023-01-03 20:52 UTC (permalink / raw) To: Marco Nenciarini, git Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason Am 03.01.23 um 19:13 schrieb Marco Nenciarini: > On 03/01/23 17:29, René Scharfe wrote: >> Am 03.01.23 um 10:53 schrieb Marco Nenciarini: >>> I've installed latest git from brew and suddenly git grep started behaving oddly when using alternatives. >>> >>> ``` >>> $ echo abd > test.file >>> $ git grep --untracked 'a\(b\|c\)d' test.file >>> $ git grep --untracked 'a\(b\)d' test.file >>> test.file:abd >>> ``` >>> >>> It should have matched in both cases. >>> >>> >>> If I switch to exented pattern it starts working again >>> >>> ``` >>> $ git grep --untracked -E 'a(b|c)d' test.file >>> test.file:abd >>> ``` >> >> This is expected: Basic regular expressions don't support alternation. >> >> Under which circumstances did it work for you without -E or -P? >> > > It has always worked until I installed 2.39.0 on my mac. I've also checked with other developers that are using a mac and they reports the same. On Linux it works regardless the git version. > > Using grep directly also works, so it is a different behavior between grep and git grep, that is surprising. Meaning you used Apple's version of git before? $ echo abd > test.file $ /usr/bin/git grep --untracked 'b\|c' test.file test.file:abd $ /usr/bin/git version git version 2.37.1 (Apple Git-137.1) $ git grep --untracked 'b\|c' test.file $ git version git version 2.39.0 So I can reproduce your findings on macOS Ventura. Same with grep: $ grep 'b\|c' test.file abd $ grep --version grep (BSD grep, GNU compatible) 2.6.0-FreeBSD re_format(7) says: "Obsolete (“basic”) regular expressions differ in several respects. ‘|’ is an ordinary character and there is no equivalent for its functionality.". Under the headline "ENHANCED FEATURES" it continues, however: "When the REG_ENHANCED flag is passed to one of the regcomp() variants, additional features are activated." And: "For enhanced basic REs, ‘+’, ‘?’ and ‘|’ remain regular characters, but ‘\+’, ‘\?’ and ‘\|’ have the same special meaning as the unescaped characters do for extended REs, i.e., one or more matches, zero or one matches and alteration, respectively." So apparently Apple chose a middle ground between basic and extended regular expressions for its grep and git grep. Under "IMPLEMENTATION CHOICES" it says: "The regex implementation in Mac OS X 10.8 and later is based on a heavily modified subset of TRE (http://laurikari.net/tre/)." The manpage of GNU grep says: "grep understands three different versions of regular expression syntax: “basic” (BRE), “extended” (ERE) and “perl” (PCRE). In GNU grep there is no difference in available functionality between basic and extended syntax. In other implementations, basic regular expressions are less powerful." And under the headline "Basic vs Extended Regular Expressions": "In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \)." So GNU grep apparently made the same choice as Apple, probably far earlier. When I compile git with NO_REGEX and thus with the fallback in compat/regex/, the result supports alternation as well: $ ./git grep --untracked 'b\|c' test.file test.file:abd $ nm ./git | grep regcomp 0000000100255978 T _git_regcomp Based on that I suggest: --- >8 --- Subject: grep: use REG_ENHANCED on macOS GNU grep(1) and regcomp(3) use enhanced basic regular expressions by default, which means that it e.g. supports alternation, e.g. "a\|b" matches both "a" and "b". The same is true for our compat/regex/ implementation. On macOS Ventura (and possibly earlier) grep(1) also uses enhanced BREs, but regcomp(3) requires the flag REG_ENHANCED to turn them on. Do that for git grep if possible, for consistency with the platform's grep(1) and our fallback library. It would be simpler to use REG_ENHANCED everywhere it is defined instead of adding a Makefile setting, but it's a non-standard extension and might mean something else on other platforms. Reported-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com> Signed-off-by: René Scharfe <l.s.r@web.de> --- Makefile | 8 ++++++++ config.mak.uname | 1 + grep.c | 4 ++++ 3 files changed, 13 insertions(+) diff --git a/Makefile b/Makefile index db447d0738..15e7edc9d2 100644 --- a/Makefile +++ b/Makefile @@ -289,6 +289,10 @@ include shared.mak # Define NO_REGEX if your C library lacks regex support with REG_STARTEND # feature. # +# Define GIT_GREP_USES_REG_ENHANCED if your C library provides the flag +# REG_ENHANCED to enable enhanced basic regular expressions and you'd +# like to use it in git grep. +# # Define HAVE_DEV_TTY if your system can open /dev/tty to interact with the # user. # @@ -2037,6 +2041,10 @@ endif ifdef NO_REGEX COMPAT_CFLAGS += -Icompat/regex COMPAT_OBJS += compat/regex/regex.o +else +ifdef GIT_GREP_USES_REG_ENHANCED + COMPAT_CFLAGS += -DGIT_GREP_USES_REG_ENHANCED +endif endif ifdef NATIVE_CRLF BASIC_CFLAGS += -DNATIVE_CRLF diff --git a/config.mak.uname b/config.mak.uname index d63629fe80..14c145ae42 100644 --- a/config.mak.uname +++ b/config.mak.uname @@ -147,6 +147,7 @@ ifeq ($(uname_S),Darwin) FREAD_READS_DIRECTORIES = UnfortunatelyYes HAVE_NS_GET_EXECUTABLE_PATH = YesPlease CSPRNG_METHOD = arc4random + GIT_GREP_USES_REG_ENHANCED = YesPlease # Workaround for `gettext` being keg-only and not even being linked via # `brew link --force gettext`, should be obsolete as of diff --git a/grep.c b/grep.c index 06eed69493..a8430daaba 100644 --- a/grep.c +++ b/grep.c @@ -502,6 +502,10 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt) regflags |= REG_ICASE; if (opt->pattern_type_option == GREP_PATTERN_TYPE_ERE) regflags |= REG_EXTENDED; +#if defined(GIT_GREP_USES_REG_ENHANCED) && defined(REG_ENHANCED) + else + regflags |= REG_ENHANCED; +#endif err = regcomp(&p->regexp, p->pattern, regflags); if (err) { char errbuf[1024]; -- 2.39.0 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: BUG: git grep behave oddly with alternatives 2023-01-03 20:52 ` René Scharfe @ 2023-01-04 6:13 ` Junio C Hamano 2023-01-04 7:46 ` Jeff King 1 sibling, 0 replies; 22+ messages in thread From: Junio C Hamano @ 2023-01-04 6:13 UTC (permalink / raw) To: René Scharfe Cc: Marco Nenciarini, git, Ævar Arnfjörð Bjarmason René Scharfe <l.s.r@web.de> writes: > "For enhanced basic REs, ‘+’, ‘?’ and ‘|’ remain regular characters, > but ‘\+’, ‘\?’ and ‘\|’ have the same special meaning as the > unescaped characters do for extended REs, i.e., one or more > matches, zero or one matches and alteration, respectively." > > So apparently Apple chose a middle ground between basic and extended > regular expressions for its grep and git grep. Sounds like GNU extension to me. > So GNU grep apparently made the same choice as Apple, probably far > earlier. Yup. > Based on that I suggest: > ... > > It would be simpler to use REG_ENHANCED everywhere it is defined instead > of adding a Makefile setting, but it's a non-standard extension and > might mean something else on other platforms. OK. Very conservative and good. > Reported-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com> > Signed-off-by: René Scharfe <l.s.r@web.de> > --- > Makefile | 8 ++++++++ > config.mak.uname | 1 + > grep.c | 4 ++++ > 3 files changed, 13 insertions(+) > > diff --git a/Makefile b/Makefile > index db447d0738..15e7edc9d2 100644 > --- a/Makefile > +++ b/Makefile > @@ -289,6 +289,10 @@ include shared.mak > # Define NO_REGEX if your C library lacks regex support with REG_STARTEND > # feature. > # > +# Define GIT_GREP_USES_REG_ENHANCED if your C library provides the flag > +# REG_ENHANCED to enable enhanced basic regular expressions and you'd > +# like to use it in git grep. > +# > # Define HAVE_DEV_TTY if your system can open /dev/tty to interact with the > # user. > # > @@ -2037,6 +2041,10 @@ endif > ifdef NO_REGEX > COMPAT_CFLAGS += -Icompat/regex > COMPAT_OBJS += compat/regex/regex.o > +else > +ifdef GIT_GREP_USES_REG_ENHANCED > + COMPAT_CFLAGS += -DGIT_GREP_USES_REG_ENHANCED > +endif > endif > ifdef NATIVE_CRLF > BASIC_CFLAGS += -DNATIVE_CRLF > diff --git a/config.mak.uname b/config.mak.uname > index d63629fe80..14c145ae42 100644 > --- a/config.mak.uname > +++ b/config.mak.uname > @@ -147,6 +147,7 @@ ifeq ($(uname_S),Darwin) > FREAD_READS_DIRECTORIES = UnfortunatelyYes > HAVE_NS_GET_EXECUTABLE_PATH = YesPlease > CSPRNG_METHOD = arc4random > + GIT_GREP_USES_REG_ENHANCED = YesPlease > > # Workaround for `gettext` being keg-only and not even being linked via > # `brew link --force gettext`, should be obsolete as of > diff --git a/grep.c b/grep.c > index 06eed69493..a8430daaba 100644 > --- a/grep.c > +++ b/grep.c > @@ -502,6 +502,10 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt) > regflags |= REG_ICASE; > if (opt->pattern_type_option == GREP_PATTERN_TYPE_ERE) > regflags |= REG_EXTENDED; > +#if defined(GIT_GREP_USES_REG_ENHANCED) && defined(REG_ENHANCED) > + else > + regflags |= REG_ENHANCED; > +#endif > err = regcomp(&p->regexp, p->pattern, regflags); > if (err) { > char errbuf[1024]; > -- > 2.39.0 ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: BUG: git grep behave oddly with alternatives 2023-01-03 20:52 ` René Scharfe 2023-01-04 6:13 ` Junio C Hamano @ 2023-01-04 7:46 ` Jeff King 2023-01-04 16:36 ` René Scharfe 1 sibling, 1 reply; 22+ messages in thread From: Jeff King @ 2023-01-04 7:46 UTC (permalink / raw) To: René Scharfe Cc: Marco Nenciarini, git, Junio C Hamano, Ævar Arnfjörð Bjarmason On Tue, Jan 03, 2023 at 09:52:27PM +0100, René Scharfe wrote: > diff --git a/Makefile b/Makefile > index db447d0738..15e7edc9d2 100644 > --- a/Makefile > +++ b/Makefile > @@ -289,6 +289,10 @@ include shared.mak > # Define NO_REGEX if your C library lacks regex support with REG_STARTEND > # feature. > # > +# Define GIT_GREP_USES_REG_ENHANCED if your C library provides the flag > +# REG_ENHANCED to enable enhanced basic regular expressions and you'd > +# like to use it in git grep. I didn't test, but just from looking at the patch I'd expect this to affect other parts of Git besides git-grep. E.g., "git log --grep". Which raises two questions: - would a more generalized name be better? USE_REG_ENHANCED or something? That might be _too_ general, but see below. - should this cover other cases? Grepping for "regcomp", would people want this to behave consistently for "git config --get-regexp", or diff funcnames, and so on? If so, then I could envision a USE_REG_ENHANCED which just wraps the system regcomp and adds the REG_ENHANCED flag when REG_EXTENDED is not set? -Peff ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: BUG: git grep behave oddly with alternatives 2023-01-04 7:46 ` Jeff King @ 2023-01-04 16:36 ` René Scharfe 2023-01-06 9:09 ` Jeff King 0 siblings, 1 reply; 22+ messages in thread From: René Scharfe @ 2023-01-04 16:36 UTC (permalink / raw) To: Jeff King Cc: Marco Nenciarini, git, Junio C Hamano, Ævar Arnfjörð Bjarmason Am 04.01.23 um 08:46 schrieb Jeff King: > On Tue, Jan 03, 2023 at 09:52:27PM +0100, René Scharfe wrote: > >> diff --git a/Makefile b/Makefile >> index db447d0738..15e7edc9d2 100644 >> --- a/Makefile >> +++ b/Makefile >> @@ -289,6 +289,10 @@ include shared.mak >> # Define NO_REGEX if your C library lacks regex support with REG_STARTEND >> # feature. >> # >> +# Define GIT_GREP_USES_REG_ENHANCED if your C library provides the flag >> +# REG_ENHANCED to enable enhanced basic regular expressions and you'd >> +# like to use it in git grep. > > I didn't test, but just from looking at the patch I'd expect this to > affect other parts of Git besides git-grep. E.g., "git log --grep". > Which raises two questions: > > - would a more generalized name be better? USE_REG_ENHANCED or > something? That might be _too_ general, but see below. > > - should this cover other cases? Grepping for "regcomp", would people > want this to behave consistently for "git config --get-regexp", or > diff funcnames, and so on? > > If so, then I could envision a USE_REG_ENHANCED which just wraps the > system regcomp and adds the REG_ENHANCED flag when REG_EXTENDED is not > set? Good point. I don't know what people want, though. re_format(7) on macOS/BSD and regex(7) on Linux call basic REs "obsolete" and extended REs "modern", so they seem to push people away from the old kind, enhanced or not. But making a consistent choice for all regex use makes sense -- platforms that use compat/regex/ get the same enhanced flavor everywhere. René ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: BUG: git grep behave oddly with alternatives 2023-01-04 16:36 ` René Scharfe @ 2023-01-06 9:09 ` Jeff King 2023-01-08 0:42 ` René Scharfe 0 siblings, 1 reply; 22+ messages in thread From: Jeff King @ 2023-01-06 9:09 UTC (permalink / raw) To: René Scharfe Cc: Marco Nenciarini, git, Junio C Hamano, Ævar Arnfjörð Bjarmason On Wed, Jan 04, 2023 at 05:36:21PM +0100, René Scharfe wrote: > > I didn't test, but just from looking at the patch I'd expect this to > > affect other parts of Git besides git-grep. E.g., "git log --grep". > > Which raises two questions: > > > > - would a more generalized name be better? USE_REG_ENHANCED or > > something? That might be _too_ general, but see below. > > > > - should this cover other cases? Grepping for "regcomp", would people > > want this to behave consistently for "git config --get-regexp", or > > diff funcnames, and so on? > > > > If so, then I could envision a USE_REG_ENHANCED which just wraps the > > system regcomp and adds the REG_ENHANCED flag when REG_EXTENDED is not > > set? > > Good point. I don't know what people want, though. re_format(7) on > macOS/BSD and regex(7) on Linux call basic REs "obsolete" and extended > REs "modern", so they seem to push people away from the old kind, > enhanced or not. Oh, good point. I was just grepping for regcomp(), but of course any case which is already passing REG_EXTENDED would not be affected anyway. And most places are already using that. E.g., the config code always does so, and it looks like pickaxe "-G" does so. For diffs, we have diff.*.xfuncname, which uses EREs. We do still support regular "funcname" for backwards compatibility, but we only document the extended version. Ironically, that option was introduced because BREs did not portably support things like alternation, even with the "enhanced" syntax. ;) See 45d9414fa5 (diff.*.xfuncname which uses "extended" regex's for hunk header selection, 2008-09-18). So I think we are embracing the "everyone should use EREs" mentality already. The only spots I see that use BREs are: - grep.c, which handles "git grep" and "git log --grep" - line-range.c, presumably for "-L" function matching - deprecated non-ERE funcname patterns Your patch is handling the first, which is by the far most important. I would be OK leaving the others as-is, but I also wouldn't mind a patch that works at the regcomp() level to make things automatically consistent. -Peff ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: BUG: git grep behave oddly with alternatives 2023-01-06 9:09 ` Jeff King @ 2023-01-08 0:42 ` René Scharfe 2023-01-08 1:27 ` Junio C Hamano 2023-01-11 18:56 ` Jeff King 0 siblings, 2 replies; 22+ messages in thread From: René Scharfe @ 2023-01-08 0:42 UTC (permalink / raw) To: Jeff King Cc: Marco Nenciarini, git, Junio C Hamano, Ævar Arnfjörð Bjarmason Am 06.01.23 um 10:09 schrieb Jeff King: > On Wed, Jan 04, 2023 at 05:36:21PM +0100, René Scharfe wrote: > >>> I didn't test, but just from looking at the patch I'd expect this to >>> affect other parts of Git besides git-grep. E.g., "git log --grep". >>> Which raises two questions: >>> >>> - would a more generalized name be better? USE_REG_ENHANCED or >>> something? That might be _too_ general, but see below. >>> >>> - should this cover other cases? Grepping for "regcomp", would people >>> want this to behave consistently for "git config --get-regexp", or >>> diff funcnames, and so on? >>> >>> If so, then I could envision a USE_REG_ENHANCED which just wraps the >>> system regcomp and adds the REG_ENHANCED flag when REG_EXTENDED is not >>> set? >> >> Good point. I don't know what people want, though. re_format(7) on >> macOS/BSD and regex(7) on Linux call basic REs "obsolete" and extended >> REs "modern", so they seem to push people away from the old kind, >> enhanced or not. > > Oh, good point. I was just grepping for regcomp(), but of course any > case which is already passing REG_EXTENDED would not be affected anyway. > And most places are already using that. E.g., the config code always > does so, and it looks like pickaxe "-G" does so. > > For diffs, we have diff.*.xfuncname, which uses EREs. We do still > support regular "funcname" for backwards compatibility, but we only > document the extended version. Ironically, that option was introduced > because BREs did not portably support things like alternation, even with > the "enhanced" syntax. ;) See 45d9414fa5 (diff.*.xfuncname which uses > "extended" regex's for hunk header selection, 2008-09-18). > > So I think we are embracing the "everyone should use EREs" mentality > already. The only spots I see that use BREs are: > > - grep.c, which handles "git grep" and "git log --grep" > > - line-range.c, presumably for "-L" function matching > > - deprecated non-ERE funcname patterns > > Your patch is handling the first, which is by the far most important. I > would be OK leaving the others as-is, but I also wouldn't mind a patch > that works at the regcomp() level to make things automatically > consistent. There's also the code handling "git log -i -S nonäscii" in diffcore_pickaxe(), but it quotes special characters and thus effectively does a fixed-string search, so you're right in omitting it above. REG_ENHANCED on macOS affects REG_EXTENDED as well. It unlocks e.g. non-greedy repetitions and inline comments. Sounds nice, but also potentially surprising. I was unable to find a current version of the re_format(7) manpage online, unfortunately. Apple's latest version of Git sets NO_REGEX and thus uses compat/regex, if I read their source correctly: https://github.com/apple-oss-distributions/Git/blob/Git-128/src/git/Makefile#L1302 The easiest and most consistent option would be to do the same. But we can't do that, because it would break multibyte support, which was fixed by 1819ad327b (grep: fix multibyte regex handling under macOS, 2022-08-26), which started to use the system regex functions again. Which begs the question: Isn't that a problem for the platforms that still have to use NO_REGEX? Shouldn't we fix compat/regex? Anyway, here's an attempt at a more general, but still targeted fix for macOS: --- >8 --- Subject: [PATCH] use enhanced basic regular expressions on macOS When 1819ad327b (grep: fix multibyte regex handling under macOS, 2022-08-26) started to use the native regex library instead of Git's own (compat/regex/), it lost support for alternation in basic regular expressions. Bring it back by enabling the flag REG_ENHANCED on macOS when compiling basic regular expressions. Reported-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com> Suggested-by: Jeff King <peff@peff.net> Signed-off-by: René Scharfe <l.s.r@web.de> --- Makefile | 9 +++++++++ compat/regcomp_enhanced.c | 9 +++++++++ config.mak.uname | 1 + git-compat-util.h | 5 +++++ 4 files changed, 24 insertions(+) create mode 100644 compat/regcomp_enhanced.c diff --git a/Makefile b/Makefile index db447d0738..46e30be673 100644 --- a/Makefile +++ b/Makefile @@ -289,6 +289,10 @@ include shared.mak # Define NO_REGEX if your C library lacks regex support with REG_STARTEND # feature. # +# Define USE_ENHANCED_BASIC_REGULAR_EXPRESSIONS if your C library provides +# the flag REG_ENHANCED and you'd like to use it to enable enhanced basic +# regular expressions. +# # Define HAVE_DEV_TTY if your system can open /dev/tty to interact with the # user. # @@ -2037,6 +2041,11 @@ endif ifdef NO_REGEX COMPAT_CFLAGS += -Icompat/regex COMPAT_OBJS += compat/regex/regex.o +else +ifdef USE_ENHANCED_BASIC_REGULAR_EXPRESSIONS + COMPAT_CFLAGS += -DUSE_ENHANCED_BASIC_REGULAR_EXPRESSIONS + COMPAT_OBJS += compat/regcomp_enhanced.o +endif endif ifdef NATIVE_CRLF BASIC_CFLAGS += -DNATIVE_CRLF diff --git a/compat/regcomp_enhanced.c b/compat/regcomp_enhanced.c new file mode 100644 index 0000000000..84193ce53b --- /dev/null +++ b/compat/regcomp_enhanced.c @@ -0,0 +1,9 @@ +#include "../git-compat-util.h" +#undef regcomp + +int git_regcomp(regex_t *preg, const char *pattern, int cflags) +{ + if (!(cflags & REG_EXTENDED)) + cflags |= REG_ENHANCED; + return regcomp(preg, pattern, cflags); +} diff --git a/config.mak.uname b/config.mak.uname index d63629fe80..7d25995265 100644 --- a/config.mak.uname +++ b/config.mak.uname @@ -147,6 +147,7 @@ ifeq ($(uname_S),Darwin) FREAD_READS_DIRECTORIES = UnfortunatelyYes HAVE_NS_GET_EXECUTABLE_PATH = YesPlease CSPRNG_METHOD = arc4random + USE_ENHANCED_BASIC_REGULAR_EXPRESSIONS = YesPlease # Workaround for `gettext` being keg-only and not even being linked via # `brew link --force gettext`, should be obsolete as of diff --git a/git-compat-util.h b/git-compat-util.h index 76e4b11131..1efa834089 100644 --- a/git-compat-util.h +++ b/git-compat-util.h @@ -1338,6 +1338,11 @@ static inline int regexec_buf(const regex_t *preg, const char *buf, size_t size, return regexec(preg, buf, nmatch, pmatch, eflags | REG_STARTEND); } +#ifdef USE_ENHANCED_BASIC_REGULAR_EXPRESSIONS +int git_regcomp(regex_t *preg, const char *pattern, int cflags); +#define regcomp git_regcomp +#endif + #ifndef DIR_HAS_BSD_GROUP_SEMANTICS # define FORCE_DIR_SET_GID S_ISGID #else -- 2.39.0 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: BUG: git grep behave oddly with alternatives 2023-01-08 0:42 ` René Scharfe @ 2023-01-08 1:27 ` Junio C Hamano 2023-01-11 18:56 ` Jeff King 1 sibling, 0 replies; 22+ messages in thread From: Junio C Hamano @ 2023-01-08 1:27 UTC (permalink / raw) To: René Scharfe Cc: Jeff King, Marco Nenciarini, git, Ævar Arnfjörð Bjarmason René Scharfe <l.s.r@web.de> writes: > diff --git a/Makefile b/Makefile > index db447d0738..46e30be673 100644 > --- a/Makefile > +++ b/Makefile > @@ -289,6 +289,10 @@ include shared.mak > # Define NO_REGEX if your C library lacks regex support with REG_STARTEND > # feature. > # > +# Define USE_ENHANCED_BASIC_REGULAR_EXPRESSIONS if your C library provides > +# the flag REG_ENHANCED and you'd like to use it to enable enhanced basic > +# regular expressions. > +# I wondered if we should mention macOS somewhere in the description to help those users that may be affeced, but it seems that looking for "REG_ENHANCED BSD" with a search engine finds pages that indicate this is available on BSD's in general? > @@ -2037,6 +2041,11 @@ endif > ifdef NO_REGEX > COMPAT_CFLAGS += -Icompat/regex > COMPAT_OBJS += compat/regex/regex.o > +else > +ifdef USE_ENHANCED_BASIC_REGULAR_EXPRESSIONS > + COMPAT_CFLAGS += -DUSE_ENHANCED_BASIC_REGULAR_EXPRESSIONS > + COMPAT_OBJS += compat/regcomp_enhanced.o > +endif OK. > diff --git a/compat/regcomp_enhanced.c b/compat/regcomp_enhanced.c > new file mode 100644 > index 0000000000..84193ce53b > --- /dev/null > +++ b/compat/regcomp_enhanced.c > @@ -0,0 +1,9 @@ > +#include "../git-compat-util.h" > +#undef regcomp > +int git_regcomp(regex_t *preg, const char *pattern, int cflags) > +{ > + if (!(cflags & REG_EXTENDED)) > + cflags |= REG_ENHANCED; > + return regcomp(preg, pattern, cflags); > +} OK. I like the "we only want to affect BRE" bit, that is carefully done. > diff --git a/config.mak.uname b/config.mak.uname > index d63629fe80..7d25995265 100644 > --- a/config.mak.uname > +++ b/config.mak.uname > @@ -147,6 +147,7 @@ ifeq ($(uname_S),Darwin) > FREAD_READS_DIRECTORIES = UnfortunatelyYes > HAVE_NS_GET_EXECUTABLE_PATH = YesPlease > CSPRNG_METHOD = arc4random > + USE_ENHANCED_BASIC_REGULAR_EXPRESSIONS = YesPlease OK. This would give macOS folks who have already been using the enhanced mode (without us asking) with their older libraries the behaviour they are more familiar with. Good. > diff --git a/git-compat-util.h b/git-compat-util.h > index 76e4b11131..1efa834089 100644 > --- a/git-compat-util.h > +++ b/git-compat-util.h > @@ -1338,6 +1338,11 @@ static inline int regexec_buf(const regex_t *preg, const char *buf, size_t size, > return regexec(preg, buf, nmatch, pmatch, eflags | REG_STARTEND); > } > > +#ifdef USE_ENHANCED_BASIC_REGULAR_EXPRESSIONS > +int git_regcomp(regex_t *preg, const char *pattern, int cflags); > +#define regcomp git_regcomp > +#endif OK. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: BUG: git grep behave oddly with alternatives 2023-01-08 0:42 ` René Scharfe 2023-01-08 1:27 ` Junio C Hamano @ 2023-01-11 18:56 ` Jeff King 2023-01-12 17:13 ` René Scharfe 1 sibling, 1 reply; 22+ messages in thread From: Jeff King @ 2023-01-11 18:56 UTC (permalink / raw) To: René Scharfe Cc: Marco Nenciarini, git, Junio C Hamano, Ævar Arnfjörð Bjarmason On Sun, Jan 08, 2023 at 01:42:04AM +0100, René Scharfe wrote: > REG_ENHANCED on macOS affects REG_EXTENDED as well. It unlocks e.g. > non-greedy repetitions and inline comments. Sounds nice, but also > potentially surprising. I was unable to find a current version of > the re_format(7) manpage online, unfortunately. I'm not quite sure what you mean here by "non-greedy repetitions". Something like: # prefer "foo bar" to "foo bar bar"; only matters for colorizing or # --only-matching git grep -E 'foo.*?bar' ? If so, then yeah, that changes the meaning of a bare "?" and people might be surprised by it. > Apple's latest version of Git sets NO_REGEX and thus uses > compat/regex, if I read their source correctly: > > https://github.com/apple-oss-distributions/Git/blob/Git-128/src/git/Makefile#L1302 > > The easiest and most consistent option would be to do the same. But > we can't do that, because it would break multibyte support, which was > fixed by 1819ad327b (grep: fix multibyte regex handling under macOS, > 2022-08-26), which started to use the system regex functions again. Looks like that NO_REGEX line was dropped by 1819ad327b. So I don't think Apple added it themselves; their version is just based on an older version of Git (looks like 2.24.3). > Which begs the question: Isn't that a problem for the platforms that > still have to use NO_REGEX? Shouldn't we fix compat/regex? I'm not sure. I always assumed our fallback was similar-ish to what was in glibc and was thus pretty featureful, but that may not be true (it actually comes from gawk). It looks like we just didn't pull over the multi-byte parts in a997bf423d (compat/regex: get the gawk regex engine to compile within git, 2010-08-17). > Anyway, here's an attempt at a more general, but still targeted fix > for macOS: > > --- >8 --- > Subject: [PATCH] use enhanced basic regular expressions on macOS This seems pretty sensible, regardless of whether we improve multi-byte support for compat/regex. -Peff ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: BUG: git grep behave oddly with alternatives 2023-01-11 18:56 ` Jeff King @ 2023-01-12 17:13 ` René Scharfe 2023-01-12 17:52 ` Ævar Arnfjörð Bjarmason 2023-01-12 21:54 ` Jeff King 0 siblings, 2 replies; 22+ messages in thread From: René Scharfe @ 2023-01-12 17:13 UTC (permalink / raw) To: Jeff King Cc: Marco Nenciarini, git, Junio C Hamano, Ævar Arnfjörð Bjarmason Am 11.01.23 um 19:56 schrieb Jeff King: > On Sun, Jan 08, 2023 at 01:42:04AM +0100, René Scharfe wrote: > >> REG_ENHANCED on macOS affects REG_EXTENDED as well. It unlocks e.g. >> non-greedy repetitions and inline comments. Sounds nice, but also >> potentially surprising. I was unable to find a current version of >> the re_format(7) manpage online, unfortunately. > > I'm not quite sure what you mean here by "non-greedy repetitions". > Something like: > > # prefer "foo bar" to "foo bar bar"; only matters for colorizing or > # --only-matching > git grep -E 'foo.*?bar' > > ? If so, then yeah, that changes the meaning of a bare "?" and people > might be surprised by it. Right. To be fair, question mark is a special character and you'd probably need to quote it anyway if you want to match a literal question mark. Otherwise I get: $ git grep -E 'foo.*?bar' fatal: command line, 'foo.*?bar': repetition-operator operand invalid >> Apple's latest version of Git sets NO_REGEX and thus uses >> compat/regex, if I read their source correctly: >> >> https://github.com/apple-oss-distributions/Git/blob/Git-128/src/git/Makefile#L1302 >> >> The easiest and most consistent option would be to do the same. But >> we can't do that, because it would break multibyte support, which was >> fixed by 1819ad327b (grep: fix multibyte regex handling under macOS, >> 2022-08-26), which started to use the system regex functions again. > > Looks like that NO_REGEX line was dropped by 1819ad327b. So I don't > think Apple added it themselves; their version is just based on an older > version of Git (looks like 2.24.3). Makes sense. >> Which begs the question: Isn't that a problem for the platforms that >> still have to use NO_REGEX? Shouldn't we fix compat/regex? > > I'm not sure. I always assumed our fallback was similar-ish to what was > in glibc and was thus pretty featureful, but that may not be true (it > actually comes from gawk). It looks like we just didn't pull over the > multi-byte parts in a997bf423d (compat/regex: get the gawk regex engine > to compile within git, 2010-08-17). GAWK removed NO_MBSUPPORT, NO_MBSUPPORT and mbsupport.h in the meantime. I guess that means they support multi-byte characters everywhere now. René ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: BUG: git grep behave oddly with alternatives 2023-01-12 17:13 ` René Scharfe @ 2023-01-12 17:52 ` Ævar Arnfjörð Bjarmason 2023-01-12 21:54 ` Jeff King 1 sibling, 0 replies; 22+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2023-01-12 17:52 UTC (permalink / raw) To: René Scharfe; +Cc: Jeff King, Marco Nenciarini, git, Junio C Hamano On Thu, Jan 12 2023, René Scharfe wrote: > Am 11.01.23 um 19:56 schrieb Jeff King: >> On Sun, Jan 08, 2023 at 01:42:04AM +0100, René Scharfe wrote: > [...] >>> Which begs the question: Isn't that a problem for the platforms that >>> still have to use NO_REGEX? Shouldn't we fix compat/regex? >> >> I'm not sure. I always assumed our fallback was similar-ish to what was >> in glibc and was thus pretty featureful, but that may not be true (it >> actually comes from gawk). It looks like we just didn't pull over the >> multi-byte parts in a997bf423d (compat/regex: get the gawk regex engine >> to compile within git, 2010-08-17). > > GAWK removed NO_MBSUPPORT, NO_MBSUPPORT and mbsupport.h in the meantime. > I guess that means they support multi-byte characters everywhere now. > > René Note that GAWK doesn't really have its own regex engine, their is mostly glibc.git's, which they've ported over to their codebase with some difficulty over the years (it not being in gnulib, like most such shared GNU libraries). Ours is then an old copy of gawk.git's, which I attempted to update a few years back, but that patch series ran into some issues I can't remember, so we still have that very old version. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: BUG: git grep behave oddly with alternatives 2023-01-12 17:13 ` René Scharfe 2023-01-12 17:52 ` Ævar Arnfjörð Bjarmason @ 2023-01-12 21:54 ` Jeff King 2023-01-13 8:28 ` Ævar Arnfjörð Bjarmason 1 sibling, 1 reply; 22+ messages in thread From: Jeff King @ 2023-01-12 21:54 UTC (permalink / raw) To: René Scharfe Cc: Marco Nenciarini, git, Junio C Hamano, Ævar Arnfjörð Bjarmason On Thu, Jan 12, 2023 at 06:13:13PM +0100, René Scharfe wrote: > > I'm not quite sure what you mean here by "non-greedy repetitions". > > Something like: > > > > # prefer "foo bar" to "foo bar bar"; only matters for colorizing or > > # --only-matching > > git grep -E 'foo.*?bar' > > > > ? If so, then yeah, that changes the meaning of a bare "?" and people > > might be surprised by it. > > Right. To be fair, question mark is a special character and you'd > probably need to quote it anyway if you want to match a literal > question mark. Otherwise I get: > > $ git grep -E 'foo.*?bar' > fatal: command line, 'foo.*?bar': repetition-operator operand invalid This is on macOS, I assume? With glibc it seems to be quietly ignored: $ git grep -E -o 'foo.*?ba' .clang-format .clang-format:foo, bar, ba So it is not treated literally (as it would be without -E). But nor does it make the match non-greedy (otherwise it would have output "foo, ba", as "git grep -P" does). So it does seem like all bets are off for what people can and should expect here. Which isn't to say we should make things worse. I mostly wondered if REG_ENHANCED might take us closer to what glibc was doing by default, but it doesn't seem like it. -Peff ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: BUG: git grep behave oddly with alternatives 2023-01-12 21:54 ` Jeff King @ 2023-01-13 8:28 ` Ævar Arnfjörð Bjarmason 2023-01-13 17:19 ` Junio C Hamano 2023-01-13 17:24 ` René Scharfe 0 siblings, 2 replies; 22+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2023-01-13 8:28 UTC (permalink / raw) To: Jeff King; +Cc: René Scharfe, Marco Nenciarini, git, Junio C Hamano On Thu, Jan 12 2023, Jeff King wrote: > On Thu, Jan 12, 2023 at 06:13:13PM +0100, René Scharfe wrote: > >> > I'm not quite sure what you mean here by "non-greedy repetitions". >> > Something like: >> > >> > # prefer "foo bar" to "foo bar bar"; only matters for colorizing or >> > # --only-matching >> > git grep -E 'foo.*?bar' >> > >> > ? If so, then yeah, that changes the meaning of a bare "?" and people >> > might be surprised by it. >> >> Right. To be fair, question mark is a special character and you'd >> probably need to quote it anyway if you want to match a literal >> question mark. Otherwise I get: >> >> $ git grep -E 'foo.*?bar' >> fatal: command line, 'foo.*?bar': repetition-operator operand invalid > > This is on macOS, I assume? With glibc it seems to be quietly ignored: > > $ git grep -E -o 'foo.*?ba' .clang-format > .clang-format:foo, bar, ba > > So it is not treated literally (as it would be without -E). But nor does > it make the match non-greedy (otherwise it would have output "foo, ba", > as "git grep -P" does). > > So it does seem like all bets are off for what people can and should > expect here. Which isn't to say we should make things worse. I mostly > wondered if REG_ENHANCED might take us closer to what glibc was doing by > default, but it doesn't seem like it. There's a couple of ways out of this that I don't see in this thread: - Declare it not a problem: We have -G, -E and -P to map to BRE, ERE and PCRE. One view is to say the first two must match POSIX, another is tha whatever the platform thinks they should do is how they should act. Of course that makes git regex invocations "unportable", but it might be acceptable. People can always use PCRE if they want "portability". - Just mandate PCRE for Mac OS, then map -E to -P. We could do this with the pcre2convert() API and PCRE2_CONVERT_POSIX_EXTENDED flag, i.e. PCRE's own "translate this to ERE". But Perl/PCRE syntax is already a superset of ERE syntax, minus things like (*VERB), (?: etc., which people are unlikely to write without intending to get the PCRE semantics. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: BUG: git grep behave oddly with alternatives 2023-01-13 8:28 ` Ævar Arnfjörð Bjarmason @ 2023-01-13 17:19 ` Junio C Hamano 2023-01-14 6:44 ` René Scharfe 2023-01-13 17:24 ` René Scharfe 1 sibling, 1 reply; 22+ messages in thread From: Junio C Hamano @ 2023-01-13 17:19 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Jeff King, René Scharfe, Marco Nenciarini, git Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > On Thu, Jan 12 2023, Jeff King wrote: > >> So it does seem like all bets are off for what people can and should >> expect here. Which isn't to say we should make things worse. I mostly >> wondered if REG_ENHANCED might take us closer to what glibc was doing by >> default, but it doesn't seem like it. I thought that René's "Use enhanced only when doing BRE" was fairly focused, but I am very tempted to accept ... > There's a couple of ways out of this that I don't see in this thread: > > - Declare it not a problem: We have -G, -E and -P to map to BRE, ERE and > PCRE. One view is to say the first two must match POSIX, another is > tha whatever the platform thinks they should do is how they should > act. ... this view. The story "BRE and ERE work via what system libraries provide, and 'git grep' matches what system grep' does" is an easy to understand view. > - Just mandate PCRE for Mac OS, then map -E to -P. We could do this with > the pcre2convert() API and PCRE2_CONVERT_POSIX_EXTENDED flag, > i.e. PCRE's own "translate this to ERE". Presumably this is to ensure -E works identically everywhere? If so, then we should do that everywhere, not just on macOS. But again it makes "git grep -E" slightly incompatible with "grep" on systems (including macOS), so... ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: BUG: git grep behave oddly with alternatives 2023-01-13 17:19 ` Junio C Hamano @ 2023-01-14 6:44 ` René Scharfe 2023-01-14 8:31 ` René Scharfe 0 siblings, 1 reply; 22+ messages in thread From: René Scharfe @ 2023-01-14 6:44 UTC (permalink / raw) To: Junio C Hamano, Ævar Arnfjörð Bjarmason Cc: Jeff King, Marco Nenciarini, git Am 13.01.23 um 18:19 schrieb Junio C Hamano: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > >> On Thu, Jan 12 2023, Jeff King wrote: >> >>> So it does seem like all bets are off for what people can and should >>> expect here. Which isn't to say we should make things worse. I mostly >>> wondered if REG_ENHANCED might take us closer to what glibc was doing by >>> default, but it doesn't seem like it. > > I thought that René's "Use enhanced only when doing BRE" was fairly > focused, but I am very tempted to accept ... > >> There's a couple of ways out of this that I don't see in this thread: >> >> - Declare it not a problem: We have -G, -E and -P to map to BRE, ERE and >> PCRE. One view is to say the first two must match POSIX, another is >> tha whatever the platform thinks they should do is how they should >> act. > > ... this view. The story "BRE and ERE work via what system > libraries provide, and 'git grep' matches what system grep' does" is > an easy to understand view. That was my stance in my first reply as well. But 3632cfc248 (Use compatibility regex library for OSX/Darwin, 2008-09-07) explicitly added alternation support for BREs on macOS, and 1819ad327b (grep: fix multibyte regex handling under macOS, 2022-08-26) removed it seemingly by accident. And grep(1) does support them on macOS 13.1: $ uname -rs Darwin 22.2.0 $ which grep /usr/bin/grep $ grep --version grep (BSD grep, GNU compatible) 2.6.0-FreeBSD $ grep '\(REG_STARTEND\|NeededForASAN\)' Makefile # Define NO_REGEX if your C library lacks regex support with REG_STARTEND NO_REGEX = NeededForASAN René ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: BUG: git grep behave oddly with alternatives 2023-01-14 6:44 ` René Scharfe @ 2023-01-14 8:31 ` René Scharfe 2023-01-14 12:45 ` Diomidis Spinellis 0 siblings, 1 reply; 22+ messages in thread From: René Scharfe @ 2023-01-14 8:31 UTC (permalink / raw) To: Junio C Hamano, Ævar Arnfjörð Bjarmason Cc: Jeff King, Marco Nenciarini, git, Diomidis Spinellis Am 14.01.23 um 07:44 schrieb René Scharfe: > Am 13.01.23 um 18:19 schrieb Junio C Hamano: >> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: >> >>> On Thu, Jan 12 2023, Jeff King wrote: >>> >>>> So it does seem like all bets are off for what people can and should >>>> expect here. Which isn't to say we should make things worse. I mostly >>>> wondered if REG_ENHANCED might take us closer to what glibc was doing by >>>> default, but it doesn't seem like it. >> >> I thought that René's "Use enhanced only when doing BRE" was fairly >> focused, but I am very tempted to accept ... >> >>> There's a couple of ways out of this that I don't see in this thread: >>> >>> - Declare it not a problem: We have -G, -E and -P to map to BRE, ERE and >>> PCRE. One view is to say the first two must match POSIX, another is >>> tha whatever the platform thinks they should do is how they should >>> act. >> >> ... this view. The story "BRE and ERE work via what system >> libraries provide, and 'git grep' matches what system grep' does" is >> an easy to understand view. > > That was my stance in my first reply as well. But 3632cfc248 (Use > compatibility regex library for OSX/Darwin, 2008-09-07) explicitly > added alternation support for BREs on macOS, and 1819ad327b (grep: fix > multibyte regex handling under macOS, 2022-08-26) removed it seemingly > by accident. And grep(1) does support them on macOS 13.1: > > $ uname -rs > Darwin 22.2.0 > $ which grep > /usr/bin/grep > $ grep --version > grep (BSD grep, GNU compatible) 2.6.0-FreeBSD > $ grep '\(REG_STARTEND\|NeededForASAN\)' Makefile > # Define NO_REGEX if your C library lacks regex support with REG_STARTEND > NO_REGEX = NeededForASAN And I neglected to copy the author of 1819ad327b until now. :-| @Diomidis: Here's a link to the start of this thread: https://lore.kernel.org/git/f82ae28a-fb56-8d1f-96c8-550b61439d3a@enterprisedb.com/ René ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: BUG: git grep behave oddly with alternatives 2023-01-14 8:31 ` René Scharfe @ 2023-01-14 12:45 ` Diomidis Spinellis 2023-01-14 16:08 ` Junio C Hamano 0 siblings, 1 reply; 22+ messages in thread From: Diomidis Spinellis @ 2023-01-14 12:45 UTC (permalink / raw) To: René Scharfe, Junio C Hamano, Ævar Arnfjörð Bjarmason Cc: Jeff King, Marco Nenciarini, git On 14-Jan-23 10:31, René Scharfe wrote: > Am 14.01.23 um 07:44 schrieb René Scharfe: >> Am 13.01.23 um 18:19 schrieb Junio C Hamano: >>> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: >>> >>>> On Thu, Jan 12 2023, Jeff King wrote: >>>> There's a couple of ways out of this that I don't see in this thread: >>>> >>>> - Declare it not a problem: We have -G, -E and -P to map to BRE, ERE and >>>> PCRE. One view is to say the first two must match POSIX, another is >>>> tha whatever the platform thinks they should do is how they should >>>> act. >>> >>> ... this view. The story "BRE and ERE work via what system >>> libraries provide, and 'git grep' matches what system grep' does" is >>> an easy to understand view. >> >> That was my stance in my first reply as well. But 3632cfc248 (Use >> compatibility regex library for OSX/Darwin, 2008-09-07) explicitly >> added alternation support for BREs on macOS, and 1819ad327b (grep: fix >> multibyte regex handling under macOS, 2022-08-26) removed it seemingly >> by accident. And grep(1) does support them on macOS 13.1: Indeed, I removed the alternation handling functionality by accident. I was not aware of the BRE-handling difference between the GNU and the macOS native regex library. I think having git-grep behave the same as grep(1) on each platform is consistent with the principle of least astonishment (POLA). This would mean that on macOS plain git-grep should use enhanced basic REs as proposed in René's patch. Diomidis ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: BUG: git grep behave oddly with alternatives 2023-01-14 12:45 ` Diomidis Spinellis @ 2023-01-14 16:08 ` Junio C Hamano 0 siblings, 0 replies; 22+ messages in thread From: Junio C Hamano @ 2023-01-14 16:08 UTC (permalink / raw) To: Diomidis Spinellis Cc: René Scharfe, Ævar Arnfjörð Bjarmason, Jeff King, Marco Nenciarini, git Diomidis Spinellis <dds@aueb.gr> writes: > Indeed, I removed the alternation handling functionality by accident. > I was not aware of the BRE-handling difference between the GNU and the > macOS native regex library. I think having git-grep behave the same > as grep(1) on each platform is consistent with the principle of least > astonishment (POLA). This would mean that on macOS plain git-grep > should use enhanced basic REs as proposed in René's patch. Thanks. I am fine with what René's patch does, which we already queued, then ;-) Let's merge 54463d32 (use enhanced basic regular expressions on macOS, 2023-01-08) down to 'next'. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: BUG: git grep behave oddly with alternatives 2023-01-13 8:28 ` Ævar Arnfjörð Bjarmason 2023-01-13 17:19 ` Junio C Hamano @ 2023-01-13 17:24 ` René Scharfe 2023-01-13 23:03 ` René Scharfe 1 sibling, 1 reply; 22+ messages in thread From: René Scharfe @ 2023-01-13 17:24 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason, Jeff King Cc: Marco Nenciarini, git, Junio C Hamano Am 13.01.23 um 09:28 schrieb Ævar Arnfjörð Bjarmason: > > There's a couple of ways out of this that I don't see in this thread: > > - Declare it not a problem: We have -G, -E and -P to map to BRE, ERE and > PCRE. One view is to say the first two must match POSIX, another is > tha whatever the platform thinks they should do is how they should > act. I like that and it was my first reaction -- alternation is a non-standard extension to BREs. But we supported it explicitly since 3632cfc248 (Use compatibility regex library for OSX/Darwin, 2008-09-07), so not having it anymore is a regression that we should fix, I think. > Of course that makes git regex invocations "unportable", but it might > be acceptable. People can always use PCRE if they want "portability". > > - Just mandate PCRE for Mac OS, then map -E to -P. We could do this with > the pcre2convert() API and PCRE2_CONVERT_POSIX_EXTENDED flag, > i.e. PCRE's own "translate this to ERE". > > But Perl/PCRE syntax is already a superset of ERE syntax, minus things > like (*VERB), (?: etc., which people are unlikely to write without > intending to get the PCRE semantics. PCRE2_CONVERT_POSIX_EXTENDED is a flag for pcre2_pattern_convert(), which allows converting an ERE into a PCRE2 regex (https://pcre.org/current/doc/html/pcre2convert.html). So this feature allows using PCRE for every kind of regexes, right? There may be feature differences for EREs between libc on macOS, glibc and/or GAWK, but I'm not aware of any complaints so far. So I currently don't see the need for that. René ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: BUG: git grep behave oddly with alternatives 2023-01-13 17:24 ` René Scharfe @ 2023-01-13 23:03 ` René Scharfe 0 siblings, 0 replies; 22+ messages in thread From: René Scharfe @ 2023-01-13 23:03 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason, Jeff King Cc: Marco Nenciarini, git, Junio C Hamano Am 13.01.23 um 18:24 schrieb René Scharfe: > Am 13.01.23 um 09:28 schrieb Ævar Arnfjörð Bjarmason: >> >> - Just mandate PCRE for Mac OS, then map -E to -P. We could do this with >> the pcre2convert() API and PCRE2_CONVERT_POSIX_EXTENDED flag, >> i.e. PCRE's own "translate this to ERE". >> >> But Perl/PCRE syntax is already a superset of ERE syntax, minus things >> like (*VERB), (?: etc., which people are unlikely to write without >> intending to get the PCRE semantics. > > PCRE2_CONVERT_POSIX_EXTENDED is a flag for pcre2_pattern_convert(), > which allows converting an ERE into a PCRE2 regex > (https://pcre.org/current/doc/html/pcre2convert.html). So this feature > allows using PCRE for every kind of regexes, right? > > There may be feature differences for EREs between libc on macOS, glibc > and/or GAWK, but I'm not aware of any complaints so far. So I currently > don't see the need for that. Thought that maybe this could help us matching NUL characters and fix the TODO in t7815, because PCRE2 can handle binary files just fine. But no: pcre2_pattern_convert() uses (NUL*) in its result, meaning that NUL is treated as a line separator and . (dot) won't match it. And it's still experimental according to the documentation link I mentioned above. René ^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2023-01-14 16:09 UTC | newest] Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2023-01-03 9:53 BUG: git grep behave oddly with alternatives Marco Nenciarini 2023-01-03 16:29 ` René Scharfe 2023-01-03 18:13 ` Marco Nenciarini 2023-01-03 20:52 ` René Scharfe 2023-01-04 6:13 ` Junio C Hamano 2023-01-04 7:46 ` Jeff King 2023-01-04 16:36 ` René Scharfe 2023-01-06 9:09 ` Jeff King 2023-01-08 0:42 ` René Scharfe 2023-01-08 1:27 ` Junio C Hamano 2023-01-11 18:56 ` Jeff King 2023-01-12 17:13 ` René Scharfe 2023-01-12 17:52 ` Ævar Arnfjörð Bjarmason 2023-01-12 21:54 ` Jeff King 2023-01-13 8:28 ` Ævar Arnfjörð Bjarmason 2023-01-13 17:19 ` Junio C Hamano 2023-01-14 6:44 ` René Scharfe 2023-01-14 8:31 ` René Scharfe 2023-01-14 12:45 ` Diomidis Spinellis 2023-01-14 16:08 ` Junio C Hamano 2023-01-13 17:24 ` René Scharfe 2023-01-13 23:03 ` René Scharfe
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).