From: "D. Ben Knoble" <ben.knoble@gmail.com>
To: git@vger.kernel.org
Subject: RE: grep: fix multibyte regex handling under macOS (1819ad327b7a1f19540a819813b70a0e8a7f798f)
Date: Wed, 1 Feb 2023 10:18:39 -0500
Message-ID: <CALnO6CAZtwfGY4SYeOuKqdP9+e_0EYNf4F703DRQB7UUfd_bUg@mail.gmail.com>
I recently updated to git 2.39.1 and noticed today that `git diff
--word-diff` fails for files with `diff=scheme`. I was able to narrow
the failure down to the inclusion of the non-ASCII bytes \xc0, \xff,
\x80, \xbf by https://github.com/git/git/blob/2fc9e9ca3c7505bc60069f11e7ef09b1aeeee473/userdiff.c#L17
in the definition of the scheme diff pattern (really, all patterns).
I suspect the commit referenced in the subject, given that it messes
with regex handling on macOS.
Relevant environment that I can think of:
```
# locale
LANG="fr_FR.UTF-8"
LC_COLLATE="fr_FR.UTF-8"
LC_CTYPE="fr_FR.UTF-8"
LC_MESSAGES="fr_FR.UTF-8"
LC_MONETARY="fr_FR.UTF-8"
LC_NUMERIC="fr_FR.UTF-8"
LC_TIME="fr_FR.UTF-8"
LC_ALL="fr_FR.UTF-8"
```
I'm on macOS 11.7.
Failure (using Zsh to produce the characters; I think there's a Bash
equivalent):
```
# git diff --word-diff --word-diff-regex=$'[\xc0-\xff][\x80-\xbf]+'
fatal¬†: invalid regular expression: [¿-ˇ][Ä-ø]+
```
(Looks like the output is a bit scrambled; here's the hexdump)
```
# !! |& xxd
00000000: 6661 7461 6cc2 a03a 2069 6e76 616c 6964  fatal..: invalid
00000010: 2072 6567 756c 6172 2065 7870 7265 7373   regular express
00000020: 696f 6e3a 205b c02d ff5d 5b80 2dbf 5d2b  ion: [.-.][.-.]+
00000030: 0a                                       .
```
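For shells where `$'...'` quoting is unavailable, the same pattern bytes can probably be produced with the octal escapes of POSIX `printf`. The following is only a sketch: the helper name `word_regex_bytes` is made up for illustration and is not part of git, and the failure itself has only been observed with the Zsh invocation above.

```shell
# Hypothetical helper (not part of git): emit the raw pattern bytes
# [\xc0-\xff][\x80-\xbf]+ using POSIX printf octal escapes
# (\300 = 0xc0, \377 = 0xff, \200 = 0x80, \277 = 0xbf).
word_regex_bytes() {
    printf '[\300-\377][\200-\277]+'
}

# Dump the bytes so they can be compared with the pattern
# shown in the hexdump of the error message.
word_regex_bytes | od -An -tx1
```

With the helper defined, `git diff --word-diff --word-diff-regex="$(word_regex_bytes)"` should hand git the same bytes as the Zsh command, assuming the shell passes the command substitution through unmodified.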
--
D. Ben Knoble