All of lore.kernel.org
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: "René Scharfe" <l.s.r@web.de>, "Carlo Arenas" <carenas@gmail.com>,
	"Andreas Schwab" <schwab@linux-m68k.org>,
	"Hamza Mahfooz" <someguy@effective-light.com>,
	git@vger.kernel.org
Subject: Re: [PATCH v13 3/3] grep/pcre2: fix an edge case concerning ascii patterns and UTF-8 data
Date: Fri, 19 Nov 2021 09:11:45 -0800	[thread overview]
Message-ID: <xmqqczmwqdda.fsf@gitster.g> (raw)
In-Reply-To: <211119.86mtm04og8.gmgdl@evledraar.gmail.com> (=?utf-8?B?IsOG?= =?utf-8?B?dmFyIEFybmZqw7Zyw7A=?= Bjarmason"'s message of "Fri, 19 Nov 2021 08:00:30 +0100")

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Thu, Nov 18 2021, René Scharfe wrote:
>>
>> Makes sense to me.  It gives a better starting point to solve the issue
>> afresh without getting entangled in mind-melting boolean expressions.
>
> Yes, agreed. As noted I haven't had time to dig deeply into this, but
> from what I've seen so far there doesn't seem to be any obvious way
> forward in terms of a quick fix.
>
> I thought perhaps your patch would be that (but I haven't looked into it
> carefully enough), but since you're on-board with reverting & retrying.

OK.  Here is what I'll queue 'master', in the hope of later merging
down and be part of v2.34.1.

Thanks, all.

----- >8 --------- >8 --------- >8 --------- >8 -----
Subject: [PATCH] Revert "grep/pcre2: fix an edge case concerning ascii patterns and UTF-8 data"

This reverts commit ae39ba431ab861548eb60b4bd2e1d8b8813db76f, as it
breaks "grep" when looking for a string in non UTF-8 haystack, when
linked with certain versions of PCREv2 library.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 grep.c                          |  6 ++---
 t/t7812-grep-icase-non-ascii.sh | 48 ---------------------------------
 2 files changed, 2 insertions(+), 52 deletions(-)

diff --git a/grep.c b/grep.c
index f6e113e9f0..fe847a0111 100644
--- a/grep.c
+++ b/grep.c
@@ -382,10 +382,8 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
 		}
 		options |= PCRE2_CASELESS;
 	}
-	if ((!opt->ignore_locale && !has_non_ascii(p->pattern)) ||
-	    (!opt->ignore_locale && is_utf8_locale() &&
-	     has_non_ascii(p->pattern) && !(!opt->ignore_case &&
-					    (p->fixed || p->is_fixed))))
+	if (!opt->ignore_locale && is_utf8_locale() && has_non_ascii(p->pattern) &&
+	    !(!opt->ignore_case && (p->fixed || p->is_fixed)))
 		options |= (PCRE2_UTF | PCRE2_MATCH_INVALID_UTF);
 
 #ifdef GIT_PCRE2_VERSION_10_36_OR_HIGHER
diff --git a/t/t7812-grep-icase-non-ascii.sh b/t/t7812-grep-icase-non-ascii.sh
index 22487d90fd..e5d1e4ea68 100755
--- a/t/t7812-grep-icase-non-ascii.sh
+++ b/t/t7812-grep-icase-non-ascii.sh
@@ -53,54 +53,6 @@ test_expect_success REGEX_LOCALE 'pickaxe -i on non-ascii' '
 	test_cmp expected actual
 '
 
-test_expect_success GETTEXT_LOCALE,PCRE 'log --author with an ascii pattern on UTF-8 data' '
-	cat >expected <<-\EOF &&
-	Author: <BOLD;RED>À Ú Thor<RESET> <author@example.com>
-	EOF
-	test_write_lines "forth" >file4 &&
-	git add file4 &&
-	git commit --author="À Ú Thor <author@example.com>" -m sécond &&
-	git log -1 --color=always --perl-regexp --author=".*Thor" >log &&
-	grep Author log >actual.raw &&
-	test_decode_color <actual.raw >actual &&
-	test_cmp expected actual
-'
-
-test_expect_success GETTEXT_LOCALE,PCRE 'log --committer with an ascii pattern on ISO-8859-1 data' '
-	cat >expected <<-\EOF &&
-	Commit:     Ç<BOLD;RED> O Mîtter <committer@example.com><RESET>
-	EOF
-	test_write_lines "fifth" >file5 &&
-	git add file5 &&
-	GIT_COMMITTER_NAME="Ç O Mîtter" &&
-	GIT_COMMITTER_EMAIL="committer@example.com" &&
-	git -c i18n.commitEncoding=latin1 commit -m thïrd &&
-	git -c i18n.logOutputEncoding=latin1 log -1 --pretty=fuller --color=always --perl-regexp --committer=" O.*" >log &&
-	grep Commit: log >actual.raw &&
-	test_decode_color <actual.raw >actual &&
-	test_cmp expected actual
-'
-
-test_expect_success GETTEXT_LOCALE,PCRE 'log --grep with an ascii pattern on UTF-8 data' '
-	cat >expected <<-\EOF &&
-	    sé<BOLD;RED>con<RESET>d
-	EOF
-	git log -1 --color=always --perl-regexp --grep="con" >log &&
-	grep con log >actual.raw &&
-	test_decode_color <actual.raw >actual &&
-	test_cmp expected actual
-'
-
-test_expect_success GETTEXT_LOCALE,PCRE 'log --grep with an ascii pattern on ISO-8859-1 data' '
-	cat >expected <<-\EOF &&
-	    <BOLD;RED>thïrd<RESET>
-	EOF
-	git -c i18n.logOutputEncoding=latin1 log -1 --color=always --perl-regexp --grep="th.*rd" >log &&
-	grep "th.*rd" log >actual.raw &&
-	test_decode_color <actual.raw >actual &&
-	test_cmp expected actual
-'
-
 test_expect_success GETTEXT_LOCALE,LIBPCRE2 'PCRE v2: setup invalid UTF-8 data' '
 	printf "\\200\\n" >invalid-0x80 &&
 	echo "ævar" >expected &&
-- 
2.34.0-202-gd9146917d7


  parent reply	other threads:[~2021-11-19 17:11 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-15 16:13 [PATCH v13 1/3] grep: refactor next_match() and match_one_pattern() for external use Hamza Mahfooz
2021-10-15 16:13 ` [PATCH v13 2/3] pretty: colorize pattern matches in commit messages Hamza Mahfooz
2021-10-15 16:13 ` [PATCH v13 3/3] grep/pcre2: fix an edge case concerning ascii patterns and UTF-8 data Hamza Mahfooz
2021-10-15 20:03   ` Junio C Hamano
2021-10-16 16:25     ` René Scharfe
2021-10-16 17:12       ` Ævar Arnfjörð Bjarmason
2021-10-16 19:44         ` René Scharfe
2021-10-17  6:00           ` Junio C Hamano
2021-10-17  6:55             ` René Scharfe
2021-10-17  9:44               ` Ævar Arnfjörð Bjarmason
2021-11-15 20:43   ` Andreas Schwab
2021-11-15 22:41     ` Ævar Arnfjörð Bjarmason
2021-11-16  2:12       ` Carlo Arenas
2021-11-16  8:41       ` Andreas Schwab
2021-11-16  9:06         ` Carlo Arenas
2021-11-16  9:18           ` Andreas Schwab
2021-11-16  9:29           ` Andreas Schwab
2021-11-16  9:38             ` Carlo Arenas
2021-11-16  9:55               ` Andreas Schwab
2021-11-16 11:00                 ` [PATCH] grep: avoid setting UTF mode when not needed Carlo Marcelo Arenas Belón
2021-11-16 12:32                   ` Ævar Arnfjörð Bjarmason
2021-11-16 13:35                     ` Hamza Mahfooz
2021-11-17 14:31                       ` Ævar Arnfjörð Bjarmason
2021-11-16 18:22                     ` Carlo Arenas
2021-11-16 18:48                     ` Junio C Hamano
2021-11-17 10:23                   ` [PATCH v2] grep: avoid setting UTF mode when dangerous with PCRE Carlo Marcelo Arenas Belón
2021-11-18  7:29                     ` Junio C Hamano
2021-11-18 10:15                       ` Carlo Arenas
2021-11-18 12:49                         ` Hamza Mahfooz
2021-11-19  6:58                           ` Ævar Arnfjörð Bjarmason
2021-11-18 21:21                     ` Hamza Mahfooz
2021-11-17 18:46               ` [PATCH v13 3/3] grep/pcre2: fix an edge case concerning ascii patterns and UTF-8 data René Scharfe
2021-11-17 19:56                 ` Carlo Arenas
2021-11-17 20:59                 ` Ævar Arnfjörð Bjarmason
2021-11-17 21:53                   ` Carlo Arenas
2021-11-18  0:00                     ` Ævar Arnfjörð Bjarmason
2021-11-18 18:17                   ` Junio C Hamano
2021-11-18 20:57                     ` René Scharfe
2021-11-19  7:00                       ` Ævar Arnfjörð Bjarmason
2021-11-19 16:08                         ` René Scharfe
2021-11-19 17:33                           ` Carlo Arenas
2021-11-19 17:11                         ` Junio C Hamano [this message]
2021-10-15 18:05 ` [PATCH v13 1/3] grep: refactor next_match() and match_one_pattern() for external use Junio C Hamano
2021-10-15 18:24   ` Hamza Mahfooz
2021-10-15 19:28     ` Junio C Hamano
2021-10-15 19:40       ` Hamza Mahfooz
2021-10-15 19:49       ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqczmwqdda.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=avarab@gmail.com \
    --cc=carenas@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=l.s.r@web.de \
    --cc=schwab@linux-m68k.org \
    --cc=someguy@effective-light.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.