git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: "Diomidis Spinellis via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, "Eric Sunshine" <sunshine@sunshineco.com>,
	"Diomidis Spinellis" <dds@aueb.gr>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: Re: [PATCH v2] grep: fix multibyte regex handling under macOS
Date: Wed, 24 Aug 2022 11:56:51 -0700	[thread overview]
Message-ID: <xmqq5yih79n0.fsf@gitster.g> (raw)
In-Reply-To: <pull.1313.v2.git.git.1661289990205.gitgitgadget@gmail.com> (Diomidis Spinellis via GitGitGadget's message of "Tue, 23 Aug 2022 21:26:30 +0000")

"Diomidis Spinellis via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Diomidis Spinellis <dds@aueb.gr>
>
> The 2013 commit 29de20504e9 fixed t0070-fundamental.sh under Darwin
> (macOS) by adopting Git's regex library.  However, this library is
> compiled with NO_MBSUPPORT, which causes git-grep to work incorrectly
> on multibyte (e.g. UTF-8) files.  Current macOS versions pass
> t0070-fundamental.sh with the native macOS regex library, which
> also supports multibyte characters.

Very nice to see the last sentence in that paragraph.

> Signed-off-by: Diomidis Spinellis <dds@aueb.gr>
> ---

This part, from here ...

>     grep: fix multibyte regex handling under macOS
>     
>     The 2013 commit 29de20504e9 fixed t0070-fundamental.sh under Darwin
>     (macOS) by adopting Git's regex library. However, this library is
>     compiled with NO_MBSUPPORT, which causes git-grep to work incorrectly on
>     multibyte (e.g. UTF-8) files. Current macOS versions pass
>     t0070-fundamental.sh with the native macOS regex library, which also
>     supports multibyte characters.
>     
>     Adjust the Makefile to use the native regex library, and call
>     setlocale(3) to set CTYPE according to the user's preference. The
>     setlocale(3) call is required on all platforms, but in platforms
>     supporting gettext(3), setlocale(3) is called as a side-effect of
>     initializing gettext(3).
>     
>     To avoid running the new tests on platforms still using the
>     compatibility library, which is compiled without multibyte support,
>     store the corresponding NO_REGEX setting in the GIT-BUILD-OPTIONS file.
>     This makes it available to the test scripts. In addition, adjust the
>     test-tool regex command to work with multibyte regexes to further test a
>     platform's support for them.
>     
>     Signed-off-by: Diomidis Spinellis dds@aueb.gr

... up to here does not need a separate sign-off.  It is primarily
to describe what got changed between v1 and this version.  The
changes can be seen in range-diff, and in some cases what the
range-diff shows is sufficient to understand the difference, but it
is better to either (1) supply explanation in your own words, or (2)
omit almost duplicate description.

> diff --git a/Makefile b/Makefile
> index 04d0fd1fe60..d1a98257150 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1427,7 +1427,6 @@ ifeq ($(uname_S),Darwin)
>  		APPLE_COMMON_CRYPTO = YesPlease
>  		COMPAT_CFLAGS += -DAPPLE_COMMON_CRYPTO
>  	endif
> -	NO_REGEX = YesPlease
>  	PTHREAD_LIBS =
>  endif

I notice that this is the ONLY conditional section in our primary
Makefile that switches upon the value of $(uname_<anything>).  After
the dust settles, we should move this whole part to config.mak.uname
as an unrelated clean-up.

Alternatively, you can choose to do that (without losing NO_REGEX)
as a preliminary clean-up patch and then build this "recent Darwin
has usable regexp library" change on top of it.  I do not have a
preference either way, other than that we do not want to see that
done as part of this change "while we are at it".

> diff --git a/grep.c b/grep.c
> index 82eb7da1022..c31657c3da3 100644
> --- a/grep.c
> +++ b/grep.c
> @@ -10,6 +10,8 @@
>  #include "quote.h"
>  #include "help.h"
>  
> +#include <locale.h>

Don't ever include system headers in generic *.c files.  The system
headers on different platforms have subtle ordering restrictions
among include files and definitions of feature test macros, and all
those subtleties are supposed to be handled in one central place, in
the "git-compat-util.h" file.

A source file specific only to a particular platform is a different
matter; they can include system headers that exist or are needed
only on those systems directly.

So far, "grep.c" has been successfully used for everybody (except
Darwin), and if it needs touching, that means something is still
wrong with Darwin.

Is is because you are not using gettext()?  I thought our
gettext.[ch] did consider some folks/platforms do not use system
gettext() but perhaps its support for such build environment is
slightly lacking and does not make necessary setlocale() calls.

If that is why you are mucking with this file, the right place to
add changes like this is not here specific to grep.c, but to the
start-up sequence that happens before cmd_main() inside
common-main.c or somewhere around there, I think.  Let me summon the
author of 5e9637c6 (i18n: add infrastructure for translating Git
with gettext, 2011-11-18) by Cc'ing.

  reply	other threads:[~2022-08-24 18:56 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-23 19:08 [PATCH] grep: fix multibyte regex handling under macOS Diomidis Spinellis via GitGitGadget
2022-08-23 20:40 ` Eric Sunshine
2022-08-24  5:13   ` Diomidis Spinellis
2022-08-23 21:26 ` [PATCH v2] " Diomidis Spinellis via GitGitGadget
2022-08-24 18:56   ` Junio C Hamano [this message]
2022-08-25  8:23     ` Diomidis Spinellis

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqq5yih79n0.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=avarab@gmail.com \
    --cc=dds@aueb.gr \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=sunshine@sunshineco.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).