All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/12] PCREv2 & more
@ 2017-04-08 13:24 Ævar Arnfjörð Bjarmason
  2017-04-08 13:24 ` [PATCH 01/12] grep: add ability to disable threading with --threads=0 or grep.threads=0 Ævar Arnfjörð Bjarmason
                   ` (12 more replies)
  0 siblings, 13 replies; 43+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-04-08 13:24 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton,
	Michał Kiedrowicz, J Smith, Victor Leschuk,
	Nguyễn Thái Ngọc Duy,
	Ævar Arnfjörð Bjarmason

This adds PCRE v2 support, but as I was adding that I kept noticing
other related problems to fix. It's all bundled up into the same
series because much of it conflicts because it modifies the same test
or other code. Notes on each patch below.

Ævar Arnfjörð Bjarmason (12):
  grep: add ability to disable threading with --threads=0 or
    grep.threads=0

This really has nothing to do with the rest except I'm using it to
test non-multithreaded & threaded PCRE more easily.

  grep: remove redundant regflags assignment under PCRE
  Makefile & configure: reword outdated comment about PCRE

Just some trivial cleanups.

  grep: add a test for backreferences in PCRE patterns
  log: add exhaustive tests for pattern style options & config

Yay, more tests!

  log: add -P as a synonym for --perl-regexp

We've had --perl-regexp for years, but not -P like grep, add it.

  grep & rev-list doc: stop promising libpcre for --perl-regexp
  grep: make grep.patternType=[pcre|pcre1] a synonym for "perl"
  test-lib: rename the LIBPCRE prerequisite to PCRE
  grep: change the internal PCRE macro names to be PCRE1
  grep: change the internal PCRE code & header names to be PCRE1
  grep: add support for PCRE v2

These combined add the support for PCRE 2. It's split up for ease of
readability. The last one's still a bit big, and I could e.g. split up
all the Makefile/autoconf stuff into a different patch (which wouldn't
do anything without the code), but I thought on balance doing it this
way made the most sense.

 Documentation/config.txt           |   7 ++
 Documentation/git-grep.txt         |   8 +-
 Documentation/rev-list-options.txt |   6 +-
 Makefile                           |  28 +++++-
 builtin/grep.c                     |  26 +++++-
 configure.ac                       |  61 ++++++++++--
 grep.c                             | 184 ++++++++++++++++++++++++++++++-------
 grep.h                             |  26 ++++--
 revision.c                         |   2 +-
 t/README                           |  16 +++-
 t/t4202-log.sh                     |  76 ++++++++++++++-
 t/t7810-grep.sh                    |  79 +++++++++++++---
 t/t7812-grep-icase-non-ascii.sh    |   4 +-
 t/t7813-grep-icase-iso.sh          |  11 ++-
 t/test-lib.sh                      |   4 +-
 15 files changed, 456 insertions(+), 82 deletions(-)

-- 
2.11.0


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 01/12] grep: add ability to disable threading with --threads=0 or grep.threads=0
  2017-04-08 13:24 [PATCH 00/12] PCREv2 & more Ævar Arnfjörð Bjarmason
@ 2017-04-08 13:24 ` Ævar Arnfjörð Bjarmason
  2017-04-11 10:06   ` Jeff King
  2017-04-08 13:24 ` [PATCH 02/12] grep: remove redundant regflags assignment under PCRE Ævar Arnfjörð Bjarmason
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 43+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-04-08 13:24 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton,
	Michał Kiedrowicz, J Smith, Victor Leschuk,
	Nguyễn Thái Ngọc Duy,
	Ævar Arnfjörð Bjarmason

Add the ability to entirely disable threading by having grep.threads=0
in the config or --threads=0 on the command-line.

This was made configurable in commit 89f09dd34e ("grep: add
--threads=<num> option and grep.threads configuration",
2015-12-15). Before that change there was no way to disable threaded
grep other than to recompile Git.

It's very useful for testing & debugging to be able to entirely
disable threading without recompiling with NO_PTHREADS=YesPlease, so
support setting the value to 0 to disable threading.

There was no reason this wasn't the case already other than an
implementation detail in how OPT_INTEGER() works. When it's used
there's no way to tell the difference between an unset value & the
default value. Use OPT_CALLBACK() instead using the same pattern as in
commit b16a991c1b ("cherry-pick: detect bogus arguments to
--mainline", 2017-03-15).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/git-grep.txt |  4 ++--
 builtin/grep.c             | 26 ++++++++++++++++++++++----
 t/t7810-grep.sh            | 10 ++++++++++
 3 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 71f32f3508..7b52e3fbc4 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -56,8 +56,8 @@ grep.extendedRegexp::
 	other than 'default'.
 
 grep.threads::
-	Number of grep worker threads to use.  If unset (or set to 0),
-	8 threads are used by default (for now).
+	Number of grep worker threads to use.  If unset, 8 threads are
+	used by default (for now). Set to 0 to disable threading.
 
 grep.fullName::
 	If set to true, enable `--full-name` option by default.
diff --git a/builtin/grep.c b/builtin/grep.c
index 65070c52fc..9478ab5dff 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -35,7 +35,7 @@ static int grep_submodule_launch(struct grep_opt *opt,
 				 const struct grep_source *gs);
 
 #define GREP_NUM_THREADS_DEFAULT 8
-static int num_threads;
+static int num_threads = -1;
 
 #ifndef NO_PTHREADS
 static pthread_t *threads;
@@ -897,6 +897,24 @@ static int context_callback(const struct option *opt, const char *arg,
 	return 0;
 }
 
+static int thread_callback(const struct option *opt,
+			   const char *arg, int unset)
+{
+	int *threads = (int*)opt->value;
+	char *end;
+
+	if (unset) {
+		*threads = GREP_NUM_THREADS_DEFAULT;
+		return 0;
+	}
+
+	*threads = strtol(arg, &end, 10);
+	if (*end || *threads < 0)
+		return opterror(opt, "invalid number of threads specified", 0);
+
+	return 0;
+}
+
 static int file_callback(const struct option *opt, const char *arg, int unset)
 {
 	struct grep_opt *grep_opt = opt->value;
@@ -1049,8 +1067,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 			N_("show <n> context lines before matches")),
 		OPT_INTEGER('A', "after-context", &opt.post_context,
 			N_("show <n> context lines after matches")),
-		OPT_INTEGER(0, "threads", &num_threads,
-			N_("use <n> worker threads")),
+		OPT_CALLBACK(0, "threads", &num_threads, N_("n"),
+			N_("use <n> worker threads"), thread_callback),
 		OPT_NUMBER_CALLBACK(&opt, N_("shortcut for -C NUM"),
 			context_callback),
 		OPT_BOOL('p', "show-function", &opt.funcname,
@@ -1222,7 +1240,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 #ifndef NO_PTHREADS
 	if (list.nr || cached || show_in_pager)
 		num_threads = 0;
-	else if (num_threads == 0)
+	else if (num_threads == -1)
 		num_threads = GREP_NUM_THREADS_DEFAULT;
 	else if (num_threads < 0)
 		die(_("invalid number of threads specified (%d)"), num_threads);
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index cee42097b0..53c2ca05c4 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -1505,4 +1505,14 @@ test_expect_success 'grep does not report i-t-a and assume unchanged with -L' '
 	test_cmp expected actual
 '
 
+test_expect_success 'grep with thread options' '
+	git -c grep.threads=4 grep st.*dio &&
+	git grep --threads=4 st.*dio &&
+	git -c grep.threads=4 grep --threads=6 st.*dio &&
+	test_must_fail git -c grep.threads=-1 grep st.*dio &&
+	test_must_fail git -c grep.threads=-1 grep --threads=-1 st.*dio &&
+	test_must_fail git -c grep.threads=-1 grep --threads=1 st.*dio &&
+	test_must_fail git -c grep.threads=1 grep --threads=-1 st.*dio
+'
+
 test_done
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 02/12] grep: remove redundant regflags assignment under PCRE
  2017-04-08 13:24 [PATCH 00/12] PCREv2 & more Ævar Arnfjörð Bjarmason
  2017-04-08 13:24 ` [PATCH 01/12] grep: add ability to disable threading with --threads=0 or grep.threads=0 Ævar Arnfjörð Bjarmason
@ 2017-04-08 13:24 ` Ævar Arnfjörð Bjarmason
  2017-04-11 10:10   ` Jeff King
  2017-04-08 13:24 ` [PATCH 03/12] Makefile & configure: reword outdated comment about PCRE Ævar Arnfjörð Bjarmason
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 43+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-04-08 13:24 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton,
	Michał Kiedrowicz, J Smith, Victor Leschuk,
	Nguyễn Thái Ngọc Duy,
	Ævar Arnfjörð Bjarmason

Remove a redundant assignment to the "regflags" variable. This
variable is only used for POSIX regular expression matching, not when
the PCRE library is used.

This redundant assignment was added as a result of copy/paste
programming in commit 84befcd0a4 ("grep: add a grep.patternType
configuration setting", 2012-08-03). That commit modified already
working code in commit cca2c172e0 ("git-grep: do not die upon -F/-P
when grep.extendedRegexp is set.", 2011-05-09) which didn't assign to
regflags when under PCRE.

Revert back to that behavior, more to reduce "wait this is used under
PCRE how?" confusion when reading the code, than to to save ourselves
trivial CPU cycles by removing one assignment.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 grep.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/grep.c b/grep.c
index 56ef0ecbff..8564fe726d 100644
--- a/grep.c
+++ b/grep.c
@@ -197,7 +197,6 @@ static void grep_set_pattern_type_option(enum grep_pattern_type pattern_type, st
 	case GREP_PATTERN_TYPE_PCRE:
 		opt->fixed = 0;
 		opt->pcre = 1;
-		opt->regflags &= ~REG_EXTENDED;
 		break;
 	}
 }
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 03/12] Makefile & configure: reword outdated comment about PCRE
  2017-04-08 13:24 [PATCH 00/12] PCREv2 & more Ævar Arnfjörð Bjarmason
  2017-04-08 13:24 ` [PATCH 01/12] grep: add ability to disable threading with --threads=0 or grep.threads=0 Ævar Arnfjörð Bjarmason
  2017-04-08 13:24 ` [PATCH 02/12] grep: remove redundant regflags assignment under PCRE Ævar Arnfjörð Bjarmason
@ 2017-04-08 13:24 ` Ævar Arnfjörð Bjarmason
  2017-04-11 10:14   ` Jeff King
  2017-04-08 13:24 ` [PATCH 04/12] grep: add a test for backreferences in PCRE patterns Ævar Arnfjörð Bjarmason
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 43+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-04-08 13:24 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton,
	Michał Kiedrowicz, J Smith, Victor Leschuk,
	Nguyễn Thái Ngọc Duy,
	Ævar Arnfjörð Bjarmason

Reword an outdated comment which suggests that only git-grep can use
PCRE.

This comment was added back when PCRE support was initially added in
commit 63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09), and was true
at the time.

It hasn't been telling the full truth since git-log learned to use
PCRE with --grep in commit 727b6fc3ed ("log --grep: accept
--basic-regexp and --perl-regexp", 2012-10-03), and more importantly
is likely to get more inaccurate over time as more use is made of PCRE
in other areas.

Reword it to be more future-proof, and to more clearly explain that
this enables user-initiated runtime behavior.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Makefile     |  6 ++++--
 configure.ac | 12 ++++++++----
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/Makefile b/Makefile
index 9b36068ac5..23945d87cf 100644
--- a/Makefile
+++ b/Makefile
@@ -24,8 +24,10 @@ all::
 # Define NO_OPENSSL environment variable if you do not have OpenSSL.
 # This also implies BLK_SHA1.
 #
-# Define USE_LIBPCRE if you have and want to use libpcre. git-grep will be
-# able to use Perl-compatible regular expressions.
+# Define USE_LIBPCRE if you have and want to use libpcre. Various
+# commands such as like log, grep offer runtime options to use
+# Perl-compatible regular expressions instead of standard or extended
+# POSIX regular expressions.
 #
 # Define LIBPCREDIR=/foo/bar if your libpcre header and library files are in
 # /foo/bar/include and /foo/bar/lib directories.
diff --git a/configure.ac b/configure.ac
index 128165529f..d09a204a7e 100644
--- a/configure.ac
+++ b/configure.ac
@@ -250,8 +250,10 @@ AS_HELP_STRING([--with-openssl],[use OpenSSL library (default is YES)])
 AS_HELP_STRING([],              [ARG can be prefix for openssl library and headers]),
 GIT_PARSE_WITH([openssl]))
 
-# Define USE_LIBPCRE if you have and want to use libpcre. git-grep will be
-# able to use Perl-compatible regular expressions.
+# Define USE_LIBPCRE if you have and want to use libpcre. Various
+# commands such as like log, grep offer runtime options to use
+# Perl-compatible regular expressions instead of standard or extended
+# POSIX regular expressions.
 #
 # Define LIBPCREDIR=/foo/bar if your libpcre header and library files are in
 # /foo/bar/include and /foo/bar/lib directories.
@@ -499,8 +501,10 @@ GIT_CONF_SUBST([NEEDS_SSL_WITH_CRYPTO])
 GIT_CONF_SUBST([NO_OPENSSL])
 
 #
-# Define USE_LIBPCRE if you have and want to use libpcre. git-grep will be
-# able to use Perl-compatible regular expressions.
+# Define USE_LIBPCRE if you have and want to use libpcre. Various
+# commands such as like log, grep offer runtime options to use
+# Perl-compatible regular expressions instead of standard or extended
+# POSIX regular expressions.
 #
 
 if test -n "$USE_LIBPCRE"; then
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 04/12] grep: add a test for backreferences in PCRE patterns
  2017-04-08 13:24 [PATCH 00/12] PCREv2 & more Ævar Arnfjörð Bjarmason
                   ` (2 preceding siblings ...)
  2017-04-08 13:24 ` [PATCH 03/12] Makefile & configure: reword outdated comment about PCRE Ævar Arnfjörð Bjarmason
@ 2017-04-08 13:24 ` Ævar Arnfjörð Bjarmason
  2017-04-08 13:24 ` [PATCH 05/12] log: add exhaustive tests for pattern style options & config Ævar Arnfjörð Bjarmason
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 43+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-04-08 13:24 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton,
	Michał Kiedrowicz, J Smith, Victor Leschuk,
	Nguyễn Thái Ngọc Duy,
	Ævar Arnfjörð Bjarmason

Add a test for backreferences such as (.)\1 in PCRE patterns. This
test ensures that the PCRE_NO_AUTO_CAPTURE option isn't turned
on. Before this change turning it on would break these sort of
patterns, but wouldn't break any tests.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t7810-grep.sh | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index 53c2ca05c4..83b0ee53be 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -1102,6 +1102,13 @@ test_expect_success LIBPCRE 'grep -P -w pattern' '
 	test_cmp expected actual
 '
 
+test_expect_success PCRE 'grep -P backreferences work (the PCRE NO_AUTO_CAPTURE flag is not set)' '
+	git grep -P -h "(?P<one>.)(?P=one)" hello_world >actual &&
+	test_cmp hello_world actual &&
+	git grep -P -h "(.)\1" hello_world >actual &&
+	test_cmp hello_world actual
+'
+
 test_expect_success 'grep -G invalidpattern properly dies ' '
 	test_must_fail git grep -G "a["
 '
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 05/12] log: add exhaustive tests for pattern style options & config
  2017-04-08 13:24 [PATCH 00/12] PCREv2 & more Ævar Arnfjörð Bjarmason
                   ` (3 preceding siblings ...)
  2017-04-08 13:24 ` [PATCH 04/12] grep: add a test for backreferences in PCRE patterns Ævar Arnfjörð Bjarmason
@ 2017-04-08 13:24 ` Ævar Arnfjörð Bjarmason
  2017-04-11 10:23   ` Jeff King
  2017-04-08 13:25 ` [PATCH 06/12] log: add -P as a synonym for --perl-regexp Ævar Arnfjörð Bjarmason
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 43+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-04-08 13:24 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton,
	Michał Kiedrowicz, J Smith, Victor Leschuk,
	Nguyễn Thái Ngọc Duy,
	Ævar Arnfjörð Bjarmason

Add exhaustive tests for how the different grep.patternType options &
the corresponding command-line options affect git-log.

Before this change it was possible to patch revision.c so that the
--basic-regexp option was synonymous with --extended-regexp, and
--perl-regexp wasn't recognized at all, and still have 100% of the
test suite pass.

This was because the first test being modified here, added in commit
34a4ae55b2 ("log --grep: use the same helper to set -E/-F options as
"git grep"", 2012-10-03), didn't actually check whether we'd enabled
extended regular expressions as distinct from re-toggling non-fixed
string support.

Fix that by changing the pattern to a pattern that'll only match if
--extended-regexp option is provided, but won't match under the
default --basic-regexp option.

Other potential regressions were possible since there were no tests
for the rest of the combinations of grep.patternType configuration
toggles & corresponding git-log command-line options. Add exhaustive
tests for those.

The patterns being passed to fixed/basic/extended/PCRE are carefully
crafted to return the wrong thing if the grep engine were to pick any
other matching method than the one it's told to use.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t4202-log.sh | 67 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 66 insertions(+), 1 deletion(-)

diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index 48b55bfd27..00d0585253 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -261,7 +261,13 @@ test_expect_success 'log --grep -i' '
 
 test_expect_success 'log -F -E --grep=<ere> uses ere' '
 	echo second >expect &&
-	git log -1 --pretty="tformat:%s" -F -E --grep=s.c.nd >actual &&
+	git log -1 --pretty="tformat:%s" -F -E --grep="(s).c.nd" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success LIBPCRE 'log -F -E --perl-regexp --grep=<pcre> uses PCRE' '
+	echo second >expect &&
+	git log -1 --pretty="tformat:%s" -F -E --perl-regexp --grep="s(?=ec)econd" >actual &&
 	test_cmp expect actual
 '
 
@@ -279,6 +285,65 @@ test_expect_success 'log with grep.patternType configuration and command line' '
 	test_cmp expect actual
 '
 
+test_expect_success 'log with various grep.patternType configurations & command-lines' '
+	git init pattern-type &&
+	(
+		cd pattern-type &&
+		test_commit 1 file A &&
+		test_commit "(1|2)" file B &&
+
+		echo "(1|2)" >expect.fixed &&
+		cp expect.fixed expect.basic &&
+		cp expect.fixed expect.extended &&
+		cp expect.fixed expect.perl &&
+
+		git -c grep.patternType=fixed log --pretty=tformat:%s \
+			--grep="(1|2)" >actual.fixed &&
+		git -c grep.patternType=basic log --pretty=tformat:%s \
+			--grep="(.|.)" >actual.basic &&
+		git -c grep.patternType=extended log --pretty=tformat:%s \
+			--grep="\|2" >actual.extended &&
+		if test_have_prereq LIBPCRE
+		then
+			git -c grep.patternType=perl log --pretty=tformat:%s \
+				--grep="\((?=1)" >actual.perl
+		fi &&
+		test_cmp expect.fixed actual.fixed &&
+		test_cmp expect.basic actual.basic &&
+		test_cmp expect.extended actual.extended &&
+		if test_have_prereq LIBPCRE
+		then
+			test_cmp expect.perl actual.perl
+		fi &&
+
+		git log --pretty=tformat:%s -F \
+			--grep="(1|2)" >actual.fixed.short-arg &&
+		git log --pretty=tformat:%s -E \
+			--grep="\|2" >actual.extended.short-arg &&
+		test_cmp expect.fixed actual.fixed.short-arg &&
+		test_cmp expect.extended actual.extended.short-arg &&
+
+		git log --pretty=tformat:%s --fixed-strings \
+			--grep="(1|2)" >actual.fixed.long-arg &&
+		git log --pretty=tformat:%s --basic-regexp \
+			--grep="(.|.)" >actual.basic.long-arg &&
+		git log --pretty=tformat:%s --extended-regexp \
+			--grep="\|2" >actual.extended.long-arg &&
+		if test_have_prereq LIBPCRE
+		then
+			git log --pretty=tformat:%s --perl-regexp \
+				--grep="\((?=1)" >actual.perl.long-arg
+		fi &&
+		test_cmp expect.fixed actual.fixed.long-arg &&
+		test_cmp expect.basic actual.basic.long-arg &&
+		test_cmp expect.extended actual.extended.long-arg &&
+		if test_have_prereq LIBPCRE
+		then
+			test_cmp expect.perl actual.perl.long-arg
+		fi
+	)
+'
+
 cat > expect <<EOF
 * Second
 * sixth
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 06/12] log: add -P as a synonym for --perl-regexp
  2017-04-08 13:24 [PATCH 00/12] PCREv2 & more Ævar Arnfjörð Bjarmason
                   ` (4 preceding siblings ...)
  2017-04-08 13:24 ` [PATCH 05/12] log: add exhaustive tests for pattern style options & config Ævar Arnfjörð Bjarmason
@ 2017-04-08 13:25 ` Ævar Arnfjörð Bjarmason
  2017-04-10  2:39   ` Junio C Hamano
  2017-04-11 10:26   ` Jeff King
  2017-04-08 13:25 ` [PATCH 07/12] grep & rev-list doc: stop promising libpcre " Ævar Arnfjörð Bjarmason
                   ` (6 subsequent siblings)
  12 siblings, 2 replies; 43+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-04-08 13:25 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton,
	Michał Kiedrowicz, J Smith, Victor Leschuk,
	Nguyễn Thái Ngọc Duy,
	Ævar Arnfjörð Bjarmason

Add a short -P option as a synonym for the longer --perl-regexp, for
consistency with the options the corresponding grep invocations
accept.

This was intentionally omitted in commit 727b6fc3ed ("log --grep:
accept --basic-regexp and --perl-regexp", 2012-10-03) for unspecified
future use.

Since nothing has come along in over 4 1/2 years that's wanted to use
it, it's more valuable to make it consistent with "grep" than to keep
it open for future use, and to avoid the confusion of -P meaning
different things for grep & log, as is the case with the -G option.

As noted in the aforementioned commit the --basic-regexp option can't
have a corresponding -G argument, as the log command already uses that
for -G<regex>.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/rev-list-options.txt | 1 +
 revision.c                         | 2 +-
 t/t4202-log.sh                     | 9 +++++++++
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index a02f7324c0..5b7fa49a7f 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -91,6 +91,7 @@ endif::git-rev-list[]
 	Consider the limiting patterns to be fixed strings (don't interpret
 	pattern as a regular expression).
 
+-P::
 --perl-regexp::
 	Consider the limiting patterns to be Perl-compatible regular expressions.
 	Requires libpcre to be compiled in.
diff --git a/revision.c b/revision.c
index 7ff61ff5f7..03a3a012de 100644
--- a/revision.c
+++ b/revision.c
@@ -1995,7 +1995,7 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 		DIFF_OPT_SET(&revs->diffopt, PICKAXE_IGNORE_CASE);
 	} else if (!strcmp(arg, "--fixed-strings") || !strcmp(arg, "-F")) {
 		revs->grep_filter.pattern_type_option = GREP_PATTERN_TYPE_FIXED;
-	} else if (!strcmp(arg, "--perl-regexp")) {
+	} else if (!strcmp(arg, "--perl-regexp") || !strcmp(arg, "-P")) {
 		revs->grep_filter.pattern_type_option = GREP_PATTERN_TYPE_PCRE;
 	} else if (!strcmp(arg, "--all-match")) {
 		revs->grep_filter.all_match = 1;
diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index 00d0585253..6f7c74e027 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -320,8 +320,17 @@ test_expect_success 'log with various grep.patternType configurations & command-
 			--grep="(1|2)" >actual.fixed.short-arg &&
 		git log --pretty=tformat:%s -E \
 			--grep="\|2" >actual.extended.short-arg &&
+		if test_have_prereq PCRE
+		then
+			git log --pretty=tformat:%s -P \
+				--grep="\((?=1)" >actual.perl.short-arg
+		fi &&
 		test_cmp expect.fixed actual.fixed.short-arg &&
 		test_cmp expect.extended actual.extended.short-arg &&
+		if test_have_prereq PCRE
+		then
+			test_cmp expect.perl actual.perl.short-arg
+		fi &&
 
 		git log --pretty=tformat:%s --fixed-strings \
 			--grep="(1|2)" >actual.fixed.long-arg &&
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 07/12] grep & rev-list doc: stop promising libpcre for --perl-regexp
  2017-04-08 13:24 [PATCH 00/12] PCREv2 & more Ævar Arnfjörð Bjarmason
                   ` (5 preceding siblings ...)
  2017-04-08 13:25 ` [PATCH 06/12] log: add -P as a synonym for --perl-regexp Ævar Arnfjörð Bjarmason
@ 2017-04-08 13:25 ` Ævar Arnfjörð Bjarmason
  2017-04-08 13:25 ` [PATCH 08/12] grep: make grep.patternType=[pcre|pcre1] a synonym for "perl" Ævar Arnfjörð Bjarmason
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 43+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-04-08 13:25 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton,
	Michał Kiedrowicz, J Smith, Victor Leschuk,
	Nguyễn Thái Ngọc Duy,
	Ævar Arnfjörð Bjarmason

Stop promising in our grep & rev-list options documentation that we're
always going to be using libpcre when given the --perl-regexp option.

Instead talk about using "Perl-compatible regular expressions" and
using "the PCRE library". Saying "libpcre" strongly suggests that we
might be talking about libpcre.so, which is always going to be v1.

Not only do does the documentation now align more clearly with future
plans, but it should be more easily readable to the layman.

This is for preparation for libpcre2 support, which in the future may
be powering the --perl-regexp option by default, depending on how Git
is compiled.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/git-grep.txt         | 4 ++--
 Documentation/rev-list-options.txt | 5 +++--
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 7b52e3fbc4..1c01154dbe 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -161,8 +161,8 @@ OPTIONS
 
 -P::
 --perl-regexp::
-	Use Perl-compatible regexp for patterns. Requires libpcre to be
-	compiled in.
+	Use Perl-compatible regular expressions for patterns. Uses the
+	PCRE library, which Git needs to be compiled against.
 
 -F::
 --fixed-strings::
diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 5b7fa49a7f..565cde636e 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -93,8 +93,9 @@ endif::git-rev-list[]
 
 -P::
 --perl-regexp::
-	Consider the limiting patterns to be Perl-compatible regular expressions.
-	Requires libpcre to be compiled in.
+	Consider the limiting patterns to be Perl-compatible regular
+	expressions. Uses the PCRE library, which Git needs to be
+	compiled against.
 
 --remove-empty::
 	Stop when a given path disappears from the tree.
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 08/12] grep: make grep.patternType=[pcre|pcre1] a synonym for "perl"
  2017-04-08 13:24 [PATCH 00/12] PCREv2 & more Ævar Arnfjörð Bjarmason
                   ` (6 preceding siblings ...)
  2017-04-08 13:25 ` [PATCH 07/12] grep & rev-list doc: stop promising libpcre " Ævar Arnfjörð Bjarmason
@ 2017-04-08 13:25 ` Ævar Arnfjörð Bjarmason
  2017-04-11 10:30   ` Jeff King
  2017-04-08 13:25 ` [PATCH 09/12] test-lib: rename the LIBPCRE prerequisite to PCRE Ævar Arnfjörð Bjarmason
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 43+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-04-08 13:25 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton,
	Michał Kiedrowicz, J Smith, Victor Leschuk,
	Nguyễn Thái Ngọc Duy,
	Ævar Arnfjörð Bjarmason

Make the pattern types "pcre" & "pcre1" synonyms for long-standing
"perl" grep.patternType.

This change is part of a longer patch series to add pcre2 support to
Git. It's nice to be able to performance test PCRE v1 v.s. v2 without
having to recompile git, and doing that via grep.patternType makes
sense.

However, just adding "pcre2" when we only have "perl" would be
confusing, so start by adding a "pcre" & "pcre1" synonym.

In the future "perl" and "pcre" might be changed to default to "pcre2"
instead of "pcre1", and depending on how Git is compiled the more
specific "pcre1" or "pcre2" pattern types might produce an error.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/config.txt |  9 +++++++++
 grep.c                   |  4 +++-
 t/t7810-grep.sh          | 10 ++++++++++
 3 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 475e874d51..5ef12d0694 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1624,6 +1624,15 @@ grep.patternType::
 	'fixed', or 'perl' will enable the `--basic-regexp`, `--extended-regexp`,
 	`--fixed-strings`, or `--perl-regexp` option accordingly, while the
 	value 'default' will return to the default matching behavior.
++
+The 'pcre' and 'pcre1' values are synonyms for 'perl'. The other
+values starting with 'pcre' are reserved for future use, e.g. if we'd
+like to use 'pcre2' for the PCRE v2 library.
++
+In the future 'perl' and 'pcre' might become synonyms for some other
+implementation or PCRE version, such as 'pcre2', while the more
+specific 'pcre1' & 'pcre2' might throw errors depending on whether git
+is compiled to include those libraries.
 
 grep.extendedRegexp::
 	If set to true, enable `--extended-regexp` option by default. This
diff --git a/grep.c b/grep.c
index 8564fe726d..1575f8f9ed 100644
--- a/grep.c
+++ b/grep.c
@@ -60,7 +60,9 @@ static int parse_pattern_type_arg(const char *opt, const char *arg)
 		return GREP_PATTERN_TYPE_ERE;
 	else if (!strcmp(arg, "fixed"))
 		return GREP_PATTERN_TYPE_FIXED;
-	else if (!strcmp(arg, "perl"))
+	else if (!strcmp(arg, "perl") ||
+		 !strcmp(arg, "pcre") ||
+		 !strcmp(arg, "pcre1"))
 		return GREP_PATTERN_TYPE_PCRE;
 	die("bad %s argument: %s", opt, arg);
 }
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index 83b0ee53be..b50f1dff43 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -1522,4 +1522,14 @@ test_expect_success 'grep with thread options' '
 	test_must_fail git -c grep.threads=1 grep --threads=-1 st.*dio
 '
 
+test_expect_success LIBPCRE "grep with grep.patternType synonyms perl/pcre/pcre1" '
+	echo "#include <stdio.h>" >expected &&
+	git -c grep.patternType=perl  grep -h --no-line-number "st(?=dio)" >actual &&
+	test_cmp expected actual &&
+	git -c grep.patternType=pcre  grep -h --no-line-number "st(?=dio)" >actual &&
+	test_cmp expected actual &&
+	git -c grep.patternType=pcre1 grep -h --no-line-number "st(?=dio)" >actual &&
+	test_cmp expected actual
+'
+
 test_done
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 09/12] test-lib: rename the LIBPCRE prerequisite to PCRE
  2017-04-08 13:24 [PATCH 00/12] PCREv2 & more Ævar Arnfjörð Bjarmason
                   ` (7 preceding siblings ...)
  2017-04-08 13:25 ` [PATCH 08/12] grep: make grep.patternType=[pcre|pcre1] a synonym for "perl" Ævar Arnfjörð Bjarmason
@ 2017-04-08 13:25 ` Ævar Arnfjörð Bjarmason
  2017-04-11 10:31   ` Jeff King
  2017-04-08 13:25 ` [PATCH 10/12] grep: change the internal PCRE macro names to be PCRE1 Ævar Arnfjörð Bjarmason
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 43+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-04-08 13:25 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton,
	Michał Kiedrowicz, J Smith, Victor Leschuk,
	Nguyễn Thái Ngọc Duy,
	Ævar Arnfjörð Bjarmason

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=UTF-8, Size: 9757 bytes --]

Rename the LIBPCRE prerequisite to PCRE. This is for preparation for
libpcre2 support, where having just "LIBPCRE" would be confusing as it
implies v1 of the library.

None of these tests are incompatible between versions 1 & 2 of
libpcre, it's less confusing to give them a more general name to make
it clear that they work on both library versions.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/README                        |  4 ++--
 t/t4202-log.sh                  | 10 +++++-----
 t/t7810-grep.sh                 | 30 +++++++++++++++---------------
 t/t7812-grep-icase-non-ascii.sh |  4 ++--
 t/t7813-grep-icase-iso.sh       |  2 +-
 t/test-lib.sh                   |  2 +-
 6 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/t/README b/t/README
index ab386c3681..a90cb62583 100644
--- a/t/README
+++ b/t/README
@@ -803,9 +803,9 @@ use these, and "test_set_prereq" for how to define your own.
    Test is not run by root user, and an attempt to write to an
    unwritable file is expected to fail correctly.
 
- - LIBPCRE
+ - PCRE
 
-   Git was compiled with USE_LIBPCRE=YesPlease. Wrap any tests
+   Git was compiled with support for PCRE. Wrap any tests
    that use git-grep --perl-regexp or git-grep -P in these.
 
  - CASE_INSENSITIVE_FS
diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index 6f7c74e027..936666bb8a 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -265,7 +265,7 @@ test_expect_success 'log -F -E --grep=<ere> uses ere' '
 	test_cmp expect actual
 '
 
-test_expect_success LIBPCRE 'log -F -E --perl-regexp --grep=<pcre> uses PCRE' '
+test_expect_success PCRE 'log -F -E --perl-regexp --grep=<pcre> uses PCRE' '
 	echo second >expect &&
 	git log -1 --pretty="tformat:%s" -F -E --perl-regexp --grep="s(?=ec)econd" >actual &&
 	test_cmp expect actual
@@ -303,7 +303,7 @@ test_expect_success 'log with various grep.patternType configurations & command-
 			--grep="(.|.)" >actual.basic &&
 		git -c grep.patternType=extended log --pretty=tformat:%s \
 			--grep="\|2" >actual.extended &&
-		if test_have_prereq LIBPCRE
+		if test_have_prereq PCRE
 		then
 			git -c grep.patternType=perl log --pretty=tformat:%s \
 				--grep="\((?=1)" >actual.perl
@@ -311,7 +311,7 @@ test_expect_success 'log with various grep.patternType configurations & command-
 		test_cmp expect.fixed actual.fixed &&
 		test_cmp expect.basic actual.basic &&
 		test_cmp expect.extended actual.extended &&
-		if test_have_prereq LIBPCRE
+		if test_have_prereq PCRE
 		then
 			test_cmp expect.perl actual.perl
 		fi &&
@@ -338,7 +338,7 @@ test_expect_success 'log with various grep.patternType configurations & command-
 			--grep="(.|.)" >actual.basic.long-arg &&
 		git log --pretty=tformat:%s --extended-regexp \
 			--grep="\|2" >actual.extended.long-arg &&
-		if test_have_prereq LIBPCRE
+		if test_have_prereq PCRE
 		then
 			git log --pretty=tformat:%s --perl-regexp \
 				--grep="\((?=1)" >actual.perl.long-arg
@@ -346,7 +346,7 @@ test_expect_success 'log with various grep.patternType configurations & command-
 		test_cmp expect.fixed actual.fixed.long-arg &&
 		test_cmp expect.basic actual.basic.long-arg &&
 		test_cmp expect.extended actual.extended.long-arg &&
-		if test_have_prereq LIBPCRE
+		if test_have_prereq PCRE
 		then
 			test_cmp expect.perl actual.perl.long-arg
 		fi
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index b50f1dff43..46f528183d 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -275,7 +275,7 @@ do
 		test_cmp expected actual
 	'
 
-	test_expect_success LIBPCRE "grep $L with grep.patterntype=perl" '
+	test_expect_success PCRE "grep $L with grep.patterntype=perl" '
 		echo "${HC}ab:a+b*c" >expected &&
 		git -c grep.patterntype=perl grep "a\x{2b}b\x{2a}c" $H ab >actual &&
 		test_cmp expected actual
@@ -1053,12 +1053,12 @@ hello.c:int main(int argc, const char **argv)
 hello.c:	printf("Hello world.\n");
 EOF
 
-test_expect_success LIBPCRE 'grep --perl-regexp pattern' '
+test_expect_success PCRE 'grep --perl-regexp pattern' '
 	git grep --perl-regexp "\p{Ps}.*?\p{Pe}" hello.c >actual &&
 	test_cmp expected actual
 '
 
-test_expect_success LIBPCRE 'grep -P pattern' '
+test_expect_success PCRE 'grep -P pattern' '
 	git grep -P "\p{Ps}.*?\p{Pe}" hello.c >actual &&
 	test_cmp expected actual
 '
@@ -1070,13 +1070,13 @@ test_expect_success 'grep pattern with grep.extendedRegexp=true' '
 	test_cmp empty actual
 '
 
-test_expect_success LIBPCRE 'grep -P pattern with grep.extendedRegexp=true' '
+test_expect_success PCRE 'grep -P pattern with grep.extendedRegexp=true' '
 	git -c grep.extendedregexp=true \
 		grep -P "\p{Ps}.*?\p{Pe}" hello.c >actual &&
 	test_cmp expected actual
 '
 
-test_expect_success LIBPCRE 'grep -P -v pattern' '
+test_expect_success PCRE 'grep -P -v pattern' '
 	{
 		echo "ab:a+b*c"
 		echo "ab:a+bc"
@@ -1085,7 +1085,7 @@ test_expect_success LIBPCRE 'grep -P -v pattern' '
 	test_cmp expected actual
 '
 
-test_expect_success LIBPCRE 'grep -P -i pattern' '
+test_expect_success PCRE 'grep -P -i pattern' '
 	cat >expected <<-EOF &&
 	hello.c:	printf("Hello world.\n");
 	EOF
@@ -1093,7 +1093,7 @@ test_expect_success LIBPCRE 'grep -P -i pattern' '
 	test_cmp expected actual
 '
 
-test_expect_success LIBPCRE 'grep -P -w pattern' '
+test_expect_success PCRE 'grep -P -w pattern' '
 	{
 		echo "hello_world:Hello world"
 		echo "hello_world:HeLLo world"
@@ -1125,11 +1125,11 @@ test_expect_success 'grep invalidpattern properly dies with grep.patternType=ext
 	test_must_fail git -c grep.patterntype=extended grep "a["
 '
 
-test_expect_success LIBPCRE 'grep -P invalidpattern properly dies ' '
+test_expect_success PCRE 'grep -P invalidpattern properly dies ' '
 	test_must_fail git grep -P "a["
 '
 
-test_expect_success LIBPCRE 'grep invalidpattern properly dies with grep.patternType=perl' '
+test_expect_success PCRE 'grep invalidpattern properly dies with grep.patternType=perl' '
 	test_must_fail git -c grep.patterntype=perl grep "a["
 '
 
@@ -1198,13 +1198,13 @@ test_expect_success 'grep pattern with grep.patternType=fixed, =basic, =perl, =e
 	test_cmp expected actual
 '
 
-test_expect_success LIBPCRE 'grep -G -F -E -P pattern' '
+test_expect_success PCRE 'grep -G -F -E -P pattern' '
 	echo "d0:0" >expected &&
 	git grep -G -F -E -P "[\d]" d0 >actual &&
 	test_cmp expected actual
 '
 
-test_expect_success LIBPCRE 'grep pattern with grep.patternType=fixed, =basic, =extended, =perl' '
+test_expect_success PCRE 'grep pattern with grep.patternType=fixed, =basic, =extended, =perl' '
 	echo "d0:0" >expected &&
 	git \
 		-c grep.patterntype=fixed \
@@ -1215,7 +1215,7 @@ test_expect_success LIBPCRE 'grep pattern with grep.patternType=fixed, =basic, =
 	test_cmp expected actual
 '
 
-test_expect_success LIBPCRE 'grep -P pattern with grep.patternType=fixed' '
+test_expect_success PCRE 'grep -P pattern with grep.patternType=fixed' '
 	echo "ab:a+b*c" >expected &&
 	git \
 		-c grep.patterntype=fixed \
@@ -1350,12 +1350,12 @@ space: line with leading space2
 space: line with leading space3
 EOF
 
-test_expect_success LIBPCRE 'grep -E "^ "' '
+test_expect_success PCRE 'grep -E "^ "' '
 	git grep -E "^ " space >actual &&
 	test_cmp expected actual
 '
 
-test_expect_success LIBPCRE 'grep -P "^ "' '
+test_expect_success PCRE 'grep -P "^ "' '
 	git grep -P "^ " space >actual &&
 	test_cmp expected actual
 '
@@ -1522,7 +1522,7 @@ test_expect_success 'grep with thread options' '
 	test_must_fail git -c grep.threads=1 grep --threads=-1 st.*dio
 '
 
-test_expect_success LIBPCRE "grep with grep.patternType synonyms perl/pcre/pcre1" '
+test_expect_success PCRE "grep with grep.patternType synonyms perl/pcre/pcre1" '
 	echo "#include <stdio.h>" >expected &&
 	git -c grep.patternType=perl  grep -h --no-line-number "st(?=dio)" >actual &&
 	test_cmp expected actual &&
diff --git a/t/t7812-grep-icase-non-ascii.sh b/t/t7812-grep-icase-non-ascii.sh
index 169fd8d706..04a61cb8e0 100755
--- a/t/t7812-grep-icase-non-ascii.sh
+++ b/t/t7812-grep-icase-non-ascii.sh
@@ -20,13 +20,13 @@ test_expect_success REGEX_LOCALE 'grep literal string, no -F' '
 	git grep -i "TILRAUN: HALLÓ HEIMUR!"
 '
 
-test_expect_success GETTEXT_LOCALE,LIBPCRE 'grep pcre utf-8 icase' '
+test_expect_success GETTEXT_LOCALE,PCRE 'grep pcre utf-8 icase' '
 	git grep --perl-regexp    "TILRAUN: H.lló Heimur!" &&
 	git grep --perl-regexp -i "TILRAUN: H.lló Heimur!" &&
 	git grep --perl-regexp -i "TILRAUN: H.LLÓ HEIMUR!"
 '
 
-test_expect_success GETTEXT_LOCALE,LIBPCRE 'grep pcre utf-8 string with "+"' '
+test_expect_success GETTEXT_LOCALE,PCRE 'grep pcre utf-8 string with "+"' '
 	test_write_lines "TILRAUN: Hallóó Heimur!" >file2 &&
 	git add file2 &&
 	git grep -l --perl-regexp "TILRAUN: H.lló+ Heimur!" >actual &&
diff --git a/t/t7813-grep-icase-iso.sh b/t/t7813-grep-icase-iso.sh
index efef7fb81f..701e08a8e5 100755
--- a/t/t7813-grep-icase-iso.sh
+++ b/t/t7813-grep-icase-iso.sh
@@ -11,7 +11,7 @@ test_expect_success GETTEXT_ISO_LOCALE 'setup' '
 	export LC_ALL
 '
 
-test_expect_success GETTEXT_ISO_LOCALE,LIBPCRE 'grep pcre string' '
+test_expect_success GETTEXT_ISO_LOCALE,PCRE 'grep pcre string' '
 	git grep --perl-regexp -i "TILRAUN: H.lló Heimur!" &&
 	git grep --perl-regexp -i "TILRAUN: H.LLÓ HEIMUR!"
 '
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 13b5696822..6abf1d1918 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1010,7 +1010,7 @@ esac
 ( COLUMNS=1 && test $COLUMNS = 1 ) && test_set_prereq COLUMNS_CAN_BE_1
 test -z "$NO_PERL" && test_set_prereq PERL
 test -z "$NO_PYTHON" && test_set_prereq PYTHON
-test -n "$USE_LIBPCRE" && test_set_prereq LIBPCRE
+test -n "$USE_LIBPCRE" && test_set_prereq PCRE
 test -z "$NO_GETTEXT" && test_set_prereq GETTEXT
 
 # Can we rely on git's output in the C locale?
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 10/12] grep: change the internal PCRE macro names to be PCRE1
  2017-04-08 13:24 [PATCH 00/12] PCREv2 & more Ævar Arnfjörð Bjarmason
                   ` (8 preceding siblings ...)
  2017-04-08 13:25 ` [PATCH 09/12] test-lib: rename the LIBPCRE prerequisite to PCRE Ævar Arnfjörð Bjarmason
@ 2017-04-08 13:25 ` Ævar Arnfjörð Bjarmason
  2017-04-11 10:35   ` Jeff King
  2017-04-08 13:25 ` [PATCH 11/12] grep: change the internal PCRE code & header " Ævar Arnfjörð Bjarmason
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 43+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-04-08 13:25 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton,
	Michał Kiedrowicz, J Smith, Victor Leschuk,
	Nguyễn Thái Ngọc Duy,
	Ævar Arnfjörð Bjarmason

Change the internal USE_LIBPCRE define, & build options flag to use a
naming convention ending in PCRE1, without changing the long-standing
USE_LIBPCRE Makefile flag which enables this code.

This is for preparation for libpcre2 support where having things like
USE_LIBPCRE and USE_LIBPCRE2 in any more places than we absolutely
need to for backwards compatibility with old Makefile arguments would
be confusing.

In some ways it would be better to change everything that now uses
USE_LIBPCRE to use USE_LIBPCRE1, and to make specifying
USE_LIBPCRE (or --with-pcre) an error. This would impose a one-time
burden on packagers of git to s/USE_LIBPCRE/USE_LIBPCRE1/ in their
build scripts.

However I'd like to leave the door open to making
USE_LIBPCRE=YesPlease eventually mean USE_LIBPCRE2=YesPlease,
i.e. once PCRE v2 is ubiquitous enough that it makes sense to make it
the default.

This code and the USE_LIBPCRE Makefile argument was added in commit
63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09). At the time there was
no indication that the PCRE project would release an entirely new &
incompatible API around 3 years later.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Makefile      | 4 ++--
 grep.c        | 6 +++---
 grep.h        | 2 +-
 t/test-lib.sh | 2 +-
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/Makefile b/Makefile
index 23945d87cf..c8a26087e3 100644
--- a/Makefile
+++ b/Makefile
@@ -1084,7 +1084,7 @@ ifdef NO_LIBGEN_H
 endif
 
 ifdef USE_LIBPCRE
-	BASIC_CFLAGS += -DUSE_LIBPCRE
+	BASIC_CFLAGS += -DUSE_LIBPCRE1
 	ifdef LIBPCREDIR
 		BASIC_CFLAGS += -I$(LIBPCREDIR)/include
 		EXTLIBS += -L$(LIBPCREDIR)/$(lib) $(CC_LD_DYNPATH)$(LIBPCREDIR)/$(lib)
@@ -2235,7 +2235,7 @@ GIT-BUILD-OPTIONS: FORCE
 	@echo TAR=\''$(subst ','\'',$(subst ','\'',$(TAR)))'\' >>$@+
 	@echo NO_CURL=\''$(subst ','\'',$(subst ','\'',$(NO_CURL)))'\' >>$@+
 	@echo NO_EXPAT=\''$(subst ','\'',$(subst ','\'',$(NO_EXPAT)))'\' >>$@+
-	@echo USE_LIBPCRE=\''$(subst ','\'',$(subst ','\'',$(USE_LIBPCRE)))'\' >>$@+
+	@echo USE_LIBPCRE1=\''$(subst ','\'',$(subst ','\'',$(USE_LIBPCRE)))'\' >>$@+
 	@echo NO_PERL=\''$(subst ','\'',$(subst ','\'',$(NO_PERL)))'\' >>$@+
 	@echo NO_PYTHON=\''$(subst ','\'',$(subst ','\'',$(NO_PYTHON)))'\' >>$@+
 	@echo NO_UNIX_SOCKETS=\''$(subst ','\'',$(subst ','\'',$(NO_UNIX_SOCKETS)))'\' >>$@+
diff --git a/grep.c b/grep.c
index 1575f8f9ed..99b9e9447f 100644
--- a/grep.c
+++ b/grep.c
@@ -325,7 +325,7 @@ static NORETURN void compile_regexp_failed(const struct grep_pat *p,
 	die("%s'%s': %s", where, p->pattern, error);
 }
 
-#ifdef USE_LIBPCRE
+#ifdef USE_LIBPCRE1
 static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt)
 {
 	const char *error;
@@ -377,7 +377,7 @@ static void free_pcre_regexp(struct grep_pat *p)
 	pcre_free(p->pcre_extra_info);
 	pcre_free((void *)p->pcre_tables);
 }
-#else /* !USE_LIBPCRE */
+#else /* !USE_LIBPCRE1 */
 static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt)
 {
 	die("cannot use Perl-compatible regexes when not compiled with USE_LIBPCRE");
@@ -392,7 +392,7 @@ static int pcrematch(struct grep_pat *p, const char *line, const char *eol,
 static void free_pcre_regexp(struct grep_pat *p)
 {
 }
-#endif /* !USE_LIBPCRE */
+#endif /* !USE_LIBPCRE1 */
 
 static int is_fixed(const char *s, size_t len)
 {
diff --git a/grep.h b/grep.h
index 267534ca24..073b0e4c92 100644
--- a/grep.h
+++ b/grep.h
@@ -1,7 +1,7 @@
 #ifndef GREP_H
 #define GREP_H
 #include "color.h"
-#ifdef USE_LIBPCRE
+#ifdef USE_LIBPCRE1
 #include <pcre.h>
 #else
 typedef int pcre;
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 6abf1d1918..e5cfbcc36b 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1010,7 +1010,7 @@ esac
 ( COLUMNS=1 && test $COLUMNS = 1 ) && test_set_prereq COLUMNS_CAN_BE_1
 test -z "$NO_PERL" && test_set_prereq PERL
 test -z "$NO_PYTHON" && test_set_prereq PYTHON
-test -n "$USE_LIBPCRE" && test_set_prereq PCRE
+test -n "$USE_LIBPCRE1" && test_set_prereq PCRE
 test -z "$NO_GETTEXT" && test_set_prereq GETTEXT
 
 # Can we rely on git's output in the C locale?
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 11/12] grep: change the internal PCRE code & header names to be PCRE1
  2017-04-08 13:24 [PATCH 00/12] PCREv2 & more Ævar Arnfjörð Bjarmason
                   ` (9 preceding siblings ...)
  2017-04-08 13:25 ` [PATCH 10/12] grep: change the internal PCRE macro names to be PCRE1 Ævar Arnfjörð Bjarmason
@ 2017-04-08 13:25 ` Ævar Arnfjörð Bjarmason
  2017-04-11 10:37   ` Jeff King
  2017-04-08 13:25 ` [PATCH 12/12] grep: add support for PCRE v2 Ævar Arnfjörð Bjarmason
  2017-04-11 10:47 ` [PATCH 00/12] PCREv2 & more Jeff King
  12 siblings, 1 reply; 43+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-04-08 13:25 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton,
	Michał Kiedrowicz, J Smith, Victor Leschuk,
	Nguyễn Thái Ngọc Duy,
	Ævar Arnfjörð Bjarmason

Change the internal PCRE variable & function names to have a "1"
suffix. This is for preparation for libpcre2 support, where having
non-versioned names would be confusing.

The earlier "grep: change the internal PCRE macro names to be PCRE1"
change elaborates on the motivations behind this commit.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/grep.c |  4 ++--
 grep.c         | 56 ++++++++++++++++++++++++++++----------------------------
 grep.h         | 10 +++++-----
 revision.c     |  2 +-
 4 files changed, 36 insertions(+), 36 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 9478ab5dff..dffb9743b8 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -490,7 +490,7 @@ static void compile_submodule_options(const struct grep_opt *opt,
 	case GREP_PATTERN_TYPE_FIXED:
 		argv_array_push(&submodule_options, "-F");
 		break;
-	case GREP_PATTERN_TYPE_PCRE:
+	case GREP_PATTERN_TYPE_PCRE1:
 		argv_array_push(&submodule_options, "-P");
 		break;
 	case GREP_PATTERN_TYPE_UNSPECIFIED:
@@ -1036,7 +1036,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 			    GREP_PATTERN_TYPE_FIXED),
 		OPT_SET_INT('P', "perl-regexp", &pattern_type_arg,
 			    N_("use Perl-compatible regular expressions"),
-			    GREP_PATTERN_TYPE_PCRE),
+			    GREP_PATTERN_TYPE_PCRE1),
 		OPT_GROUP(""),
 		OPT_BOOL('n', "line-number", &opt.linenum, N_("show line numbers")),
 		OPT_NEGBIT('h', NULL, &opt.pathname, N_("don't show filenames"), 1),
diff --git a/grep.c b/grep.c
index 99b9e9447f..ac7d6f9bbf 100644
--- a/grep.c
+++ b/grep.c
@@ -63,7 +63,7 @@ static int parse_pattern_type_arg(const char *opt, const char *arg)
 	else if (!strcmp(arg, "perl") ||
 		 !strcmp(arg, "pcre") ||
 		 !strcmp(arg, "pcre1"))
-		return GREP_PATTERN_TYPE_PCRE;
+		return GREP_PATTERN_TYPE_PCRE1;
 	die("bad %s argument: %s", opt, arg);
 }
 
@@ -180,25 +180,25 @@ static void grep_set_pattern_type_option(enum grep_pattern_type pattern_type, st
 
 	case GREP_PATTERN_TYPE_BRE:
 		opt->fixed = 0;
-		opt->pcre = 0;
+		opt->pcre1 = 0;
 		opt->regflags &= ~REG_EXTENDED;
 		break;
 
 	case GREP_PATTERN_TYPE_ERE:
 		opt->fixed = 0;
-		opt->pcre = 0;
+		opt->pcre1 = 0;
 		opt->regflags |= REG_EXTENDED;
 		break;
 
 	case GREP_PATTERN_TYPE_FIXED:
 		opt->fixed = 1;
-		opt->pcre = 0;
+		opt->pcre1 = 0;
 		opt->regflags &= ~REG_EXTENDED;
 		break;
 
-	case GREP_PATTERN_TYPE_PCRE:
+	case GREP_PATTERN_TYPE_PCRE1:
 		opt->fixed = 0;
-		opt->pcre = 1;
+		opt->pcre1 = 1;
 		break;
 	}
 }
@@ -326,7 +326,7 @@ static NORETURN void compile_regexp_failed(const struct grep_pat *p,
 }
 
 #ifdef USE_LIBPCRE1
-static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt)
+static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt *opt)
 {
 	const char *error;
 	int erroffset;
@@ -334,23 +334,23 @@ static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt)
 
 	if (opt->ignore_case) {
 		if (has_non_ascii(p->pattern))
-			p->pcre_tables = pcre_maketables();
+			p->pcre1_tables = pcre_maketables();
 		options |= PCRE_CASELESS;
 	}
 	if (is_utf8_locale() && has_non_ascii(p->pattern))
 		options |= PCRE_UTF8;
 
-	p->pcre_regexp = pcre_compile(p->pattern, options, &error, &erroffset,
-				      p->pcre_tables);
-	if (!p->pcre_regexp)
+	p->pcre1_regexp = pcre_compile(p->pattern, options, &error, &erroffset,
+				      p->pcre1_tables);
+	if (!p->pcre1_regexp)
 		compile_regexp_failed(p, error);
 
-	p->pcre_extra_info = pcre_study(p->pcre_regexp, 0, &error);
-	if (!p->pcre_extra_info && error)
+	p->pcre1_extra_info = pcre_study(p->pcre1_regexp, 0, &error);
+	if (!p->pcre1_extra_info && error)
 		die("%s", error);
 }
 
-static int pcrematch(struct grep_pat *p, const char *line, const char *eol,
+static int pcre1match(struct grep_pat *p, const char *line, const char *eol,
 		regmatch_t *match, int eflags)
 {
 	int ovector[30], ret, flags = 0;
@@ -358,7 +358,7 @@ static int pcrematch(struct grep_pat *p, const char *line, const char *eol,
 	if (eflags & REG_NOTBOL)
 		flags |= PCRE_NOTBOL;
 
-	ret = pcre_exec(p->pcre_regexp, p->pcre_extra_info, line, eol - line,
+	ret = pcre_exec(p->pcre1_regexp, p->pcre1_extra_info, line, eol - line,
 			0, flags, ovector, ARRAY_SIZE(ovector));
 	if (ret < 0 && ret != PCRE_ERROR_NOMATCH)
 		die("pcre_exec failed with error code %d", ret);
@@ -371,25 +371,25 @@ static int pcrematch(struct grep_pat *p, const char *line, const char *eol,
 	return ret;
 }
 
-static void free_pcre_regexp(struct grep_pat *p)
+static void free_pcre1_regexp(struct grep_pat *p)
 {
-	pcre_free(p->pcre_regexp);
-	pcre_free(p->pcre_extra_info);
-	pcre_free((void *)p->pcre_tables);
+	pcre_free(p->pcre1_regexp);
+	pcre_free(p->pcre1_extra_info);
+	pcre_free((void *)p->pcre1_tables);
 }
 #else /* !USE_LIBPCRE1 */
-static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt)
+static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt *opt)
 {
 	die("cannot use Perl-compatible regexes when not compiled with USE_LIBPCRE");
 }
 
-static int pcrematch(struct grep_pat *p, const char *line, const char *eol,
+static int pcre1match(struct grep_pat *p, const char *line, const char *eol,
 		regmatch_t *match, int eflags)
 {
 	return 1;
 }
 
-static void free_pcre_regexp(struct grep_pat *p)
+static void free_pcre1_regexp(struct grep_pat *p)
 {
 }
 #endif /* !USE_LIBPCRE1 */
@@ -476,8 +476,8 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
 		return;
 	}
 
-	if (opt->pcre) {
-		compile_pcre_regexp(p, opt);
+	if (opt->pcre1) {
+		compile_pcre1_regexp(p, opt);
 		return;
 	}
 
@@ -833,8 +833,8 @@ void free_grep_patterns(struct grep_opt *opt)
 		case GREP_PATTERN_BODY:
 			if (p->kws)
 				kwsfree(p->kws);
-			else if (p->pcre_regexp)
-				free_pcre_regexp(p);
+			else if (p->pcre1_regexp)
+				free_pcre1_regexp(p);
 			else
 				regfree(&p->regexp);
 			free(p->pattern);
@@ -913,8 +913,8 @@ static int patmatch(struct grep_pat *p, char *line, char *eol,
 
 	if (p->fixed)
 		hit = !fixmatch(p, line, eol, match);
-	else if (p->pcre_regexp)
-		hit = !pcrematch(p, line, eol, match, eflags);
+	else if (p->pcre1_regexp)
+		hit = !pcre1match(p, line, eol, match, eflags);
 	else
 		hit = !regexec_buf(&p->regexp, line, eol - line, 1, match,
 				   eflags);
diff --git a/grep.h b/grep.h
index 073b0e4c92..fa2ab9485f 100644
--- a/grep.h
+++ b/grep.h
@@ -46,9 +46,9 @@ struct grep_pat {
 	size_t patternlen;
 	enum grep_header_field field;
 	regex_t regexp;
-	pcre *pcre_regexp;
-	pcre_extra *pcre_extra_info;
-	const unsigned char *pcre_tables;
+	pcre *pcre1_regexp;
+	pcre_extra *pcre1_extra_info;
+	const unsigned char *pcre1_tables;
 	kwset_t kws;
 	unsigned fixed:1;
 	unsigned ignore_case:1;
@@ -68,7 +68,7 @@ enum grep_pattern_type {
 	GREP_PATTERN_TYPE_BRE,
 	GREP_PATTERN_TYPE_ERE,
 	GREP_PATTERN_TYPE_FIXED,
-	GREP_PATTERN_TYPE_PCRE
+	GREP_PATTERN_TYPE_PCRE1
 };
 
 struct grep_expr {
@@ -111,7 +111,7 @@ struct grep_opt {
 	int allow_textconv;
 	int extended;
 	int use_reflog_filter;
-	int pcre;
+	int pcre1;
 	int relative;
 	int pathname;
 	int null_following_name;
diff --git a/revision.c b/revision.c
index 03a3a012de..7a10a8570a 100644
--- a/revision.c
+++ b/revision.c
@@ -1996,7 +1996,7 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 	} else if (!strcmp(arg, "--fixed-strings") || !strcmp(arg, "-F")) {
 		revs->grep_filter.pattern_type_option = GREP_PATTERN_TYPE_FIXED;
 	} else if (!strcmp(arg, "--perl-regexp") || !strcmp(arg, "-P")) {
-		revs->grep_filter.pattern_type_option = GREP_PATTERN_TYPE_PCRE;
+		revs->grep_filter.pattern_type_option = GREP_PATTERN_TYPE_PCRE1;
 	} else if (!strcmp(arg, "--all-match")) {
 		revs->grep_filter.all_match = 1;
 	} else if (!strcmp(arg, "--invert-grep")) {
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 12/12] grep: add support for PCRE v2
  2017-04-08 13:24 [PATCH 00/12] PCREv2 & more Ævar Arnfjörð Bjarmason
                   ` (10 preceding siblings ...)
  2017-04-08 13:25 ` [PATCH 11/12] grep: change the internal PCRE code & header " Ævar Arnfjörð Bjarmason
@ 2017-04-08 13:25 ` Ævar Arnfjörð Bjarmason
  2017-04-11 10:43   ` Jeff King
  2017-04-11 10:47 ` [PATCH 00/12] PCREv2 & more Jeff King
  12 siblings, 1 reply; 43+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-04-08 13:25 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton,
	Michał Kiedrowicz, J Smith, Victor Leschuk,
	Nguyễn Thái Ngọc Duy,
	Ævar Arnfjörð Bjarmason

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=UTF-8, Size: 26053 bytes --]

Add support for v2 of the PCRE API. This is a new major version of
PCRE that came out in early 2015[1]. The regular expression syntax is
the same, but while similar-ish requires a different codepath to
support it.

Git can now be compiled with any combination of
USE_LIBPCRE=[YesPlease|] & USE_LIBPCRE2=[YesPlease|]. If both are
provided the version of the PCRE library can be selected at runtime
with grep.PatternType, but the default (for now) is v1. This table
shows what the various combinations do depending on what libraries Git
is compiled against:

    |------------------+-----+-----+----------|
    | grep.PatternType | v1  | v2  | v1 && v2 |
    |------------------+-----+-----+----------|
    | perl             | v1  | v2  | v1       |
    | pcre             | v1  | v2  | v1       |
    | pcre1            | v1  | ERR | v1       |
    | pcre2            | ERR | v2  | v2       |
    |------------------+-----+-----+----------|

When Git is only compiled with v2 grep.PatternType=perl, --perl-regexp
& -P will use v2. All tests pass with this new PCRE version. When Git
is compiled with both v1 & v2 most of the tests will only test v1, but
there are some v2-specific tests that will be run.

Originally I started work on this series because my ad-hoc patch to
support v2 ("Very promising results with libpcre2",
<CACBZZX6FcHcY7cYs6s_pv=E43cHNmzyUY5wrcuhPWWmUixCL+g@mail.gmail.com>)
showed very promising performance results. E.g. grepping a rare string
on the linux.git tree for 50 iterations gives[2]:

             s/iter    basic extended    pcre1    fixed    pcre2
    basic      2.31       --      -5%      -6%     -36%     -46%
    extended   2.20       5%       --      -1%     -32%     -44%
    pcre1      2.17       6%       1%       --     -32%     -43%
    fixed      1.49      56%      48%      46%       --     -17%
    pcre2      1.24      87%      77%      75%      20%       --

I.e. PCRE v2 is around 20% faster than the previously fastest codepath
to grep for a fixed string, while grepping for a similar regex. A
similar test for 'log --grep' is less promising[3]:

             s/iter extended    basic    pcre2    pcre1    fixed
    extended   8.36       --      -0%      -1%      -6%     -11%
    basic      8.35       0%       --      -1%      -5%     -11%
    pcre2      8.26       1%       1%       --      -4%     -10%
    pcre1      7.90       6%       6%       5%       --      -6%
    fixed      7.44      12%      12%      11%       6%       --

Here PCRE v2 is around 5% slower than v1. I don't know why that
is. Perhaps PCRE v2 is worse at matching shorter strings, commit
messages tend to be shorter than files.

I eventually found out though that the main difference in v1 and v2
performance is because v2 is using the JIT feature, but we never got
around to doing that for v1. The two benchmarks above run without v2
JIT are for "grep" & "log", respectively:

             s/iter    basic extended    pcre1    pcre2    fixed
    basic      2.23       --      -0%      -1%     -15%     -31%
    extended   2.22       0%       --      -1%     -15%     -31%
    pcre1      2.21       1%       1%       --     -15%     -31%
    pcre2      1.89      18%      18%      17%       --     -19%
    fixed      1.53      46%      45%      44%      23%       --

             s/iter    basic    pcre2 extended    pcre1    fixed
    basic      10.9       --     -21%     -24%     -28%     -33%
    pcre2      8.63      26%       --      -4%      -9%     -16%
    extended   8.30      31%       4%       --      -5%     -12%
    pcre1      7.88      38%      10%       5%       --      -8%
    fixed      7.28      50%      19%      14%       8%       --

Here v2 still outperforms v1 on the "grep" test, but by 15% instead of
43%, v1 is still faster on "log", but by 9% instead of 4%.

When both v1 and v2 use JIT v1 will ever so slightly beat v2 by 1% on
the "grep" test:

             s/iter    basic extended    fixed    pcre2    pcre1
    basic      2.22       --      -0%     -33%     -44%     -45%
    extended   2.21       0%       --     -33%     -44%     -45%
    fixed      1.49      49%      49%       --     -17%     -18%
    pcre2      1.24      80%      79%      20%       --      -1%
    pcre1      1.22      82%      81%      22%       1%       --

But v1 is 10% faster than v2 for "log" when both have JIT, as opposed
to 4% faster when v2 has JIT and v1 doesn't:

         s/iter    basic extended    pcre2    pcre1    fixed
basic      9.43       --     -11%     -12%     -21%     -22%
extended   8.39      12%       --      -2%     -11%     -12%
pcre2      8.26      14%       2%       --     -10%     -11%
pcre1      7.45      27%      13%      11%       --      -1%
fixed      7.37      28%      14%      12%       1%       --

The v1 JIT support uses a patch that isn't part of this series
yet. Supporting it is slightly tricker than for v2 because we need to
support PCRE v1 versions that don't have JIT support, the "SIMPLE USE
OF JIT" section in pcrejit(3) documents how to do that.

The PCRE performance improvements suggest that there are numerous
other follow-up projects on grep.c which could yield better
performance:

 - Disable the "fast" path for fixed patterns in compile_regexp() when
   PCRE is being used, at least in some cases. It was added in commit
   9eceddeec6 ("Use kwset in grep", 2011-08-21) as an optimization,
   but clearly results in taking a slower path with PCRE in some
   cases.

 - Similarly -F on a pattern that !is_fixed() (e.g. -F '[f]oo') might
   be better of escaping the regex characters & feeding them to PCRE
   than using kwset.

 - Other things could be farmed out to the regex engine rather than
   being handled by our own custom code. E.g. the support for -w in
   match_one_pattern() added in commit 7839a25eab ("builtin-grep:
   support -w (--word-regexp).", 2006-05-02) can instead by done by
   munging the PATTERN to \bPATTERN\b.

   This would make it easy to support e.g. -x by similarly prefixing
   the line with "^(?:" & suffixing it with ")$". See how pcre2grep.c
   handles -w, -x & -F by simply munging the pattern.

 - The look-ahead support added in commit a26345b608 ("grep: optimize
   built-in grep by skipping lines that do not hit", 2010-01-10) might
   need tweaking to either match in smaller or bigger
   batches. Possibly with some heuristic that tries to detect if the
   pattern has been matching a lot ("." v.s. "rare.*string") so far.

A brief note on thread safety: As noted in pcre2api(3) & pcre2jit(3)
the compiled pattern can be shared between threads, but not some of
the JIT context, however the grep threading support does all pattern &
JIT compilation in separate threads, so this code doesn't need to
concern itself with thread safety.

See commit 63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09) for the
initial addition of PCRE v1. This change follows some of the same
patterns it did (and which were discussed on list at the time),
e.g. mocking up types with typedef instead of ifdef-ing them out when
USE_LIBPCRE2 isn't defined. This adds some trivial memory use to the
program, but makes the code look nicer.

1. https://lists.exim.org/lurker/message/20150105.162835.0666407a.en.html

2. PF=~/g/git/ perl -MBenchmark=cmpthese -wE 'cmpthese(50, { fixed => sub { system "$ENV{PF}git grep -F avarasu >/dev/null" }, basic => sub { system "$ENV{PF}git grep -G avara?su >/dev/null" }, extended => sub { system "$ENV{PF}git grep -E avara?su >/dev/null" }, pcre1 => sub { system "$ENV{PF}git -c grep.patternType=pcre1 grep avara?su >/dev/null" }, pcre2 => sub { system "$ENV{PF}git -c grep.patternType=pcre2 grep avara?su >/dev/null" } })'

3. PF=~/g/git/ perl -MBenchmark=cmpthese -wE 'cmpthese(10, { fixed => sub { system "$ENV{PF}git log -F --grep=avarasu >/dev/null" }, basic => sub { system "$ENV{PF}git log --basic-regexp --grep=avara?su >/dev/null" }, extended => sub { system "$ENV{PF}git log --extended-regexp --grep=avara?su >/dev/null" }, pcre1 => sub { system "$ENV{PF}git -c grep.patternType=pcre1 log --grep=avara?su >/dev/null" }, pcre2 => sub { system "$ENV{PF}git -c grep.patternType=pcre2 log --grep=avara?su >/dev/null" } })'

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/config.txt  |  14 +++---
 Makefile                  |  18 +++++++
 builtin/grep.c            |   4 +-
 configure.ac              |  49 +++++++++++++++++-
 grep.c                    | 125 +++++++++++++++++++++++++++++++++++++++++++++-
 grep.h                    |  16 +++++-
 revision.c                |   2 +-
 t/README                  |  12 +++++
 t/t7810-grep.sh           |  30 +++++++++--
 t/t7813-grep-icase-iso.sh |  11 ++--
 t/test-lib.sh             |   4 +-
 11 files changed, 261 insertions(+), 24 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 5ef12d0694..a5fc482495 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1625,14 +1625,12 @@ grep.patternType::
 	`--fixed-strings`, or `--perl-regexp` option accordingly, while the
 	value 'default' will return to the default matching behavior.
 +
-The 'pcre' and 'pcre1' values are synonyms for 'perl'. The other
-values starting with 'pcre' are reserved for future use, e.g. if we'd
-like to use 'pcre2' for the PCRE v2 library.
-+
-In the future 'perl' and 'pcre' might become synonyms for some other
-implementation or PCRE version, such as 'pcre2', while the more
-specific 'pcre1' & 'pcre2' might throw errors depending on whether git
-is compiled to include those libraries.
+The 'perl' and 'pcre' values are synonyms. Depending on which PCRE
+library Git was compiled with either or both of 'pcre1' and 'pcre2'
+might also be available.
++
+If both are available Git currently defaults to 'pcre1', but this
+might change in future versions.
 
 grep.extendedRegexp::
 	If set to true, enable `--extended-regexp` option by default. This
diff --git a/Makefile b/Makefile
index c8a26087e3..bf8db4bae9 100644
--- a/Makefile
+++ b/Makefile
@@ -32,6 +32,14 @@ all::
 # Define LIBPCREDIR=/foo/bar if your libpcre header and library files are in
 # /foo/bar/include and /foo/bar/lib directories.
 #
+# Define USE_LIBPCRE if you have and want to use libpcre2. Various
+# commands such as like log, grep offer runtime options to use
+# Perl-compatible regular expressions instead of standard or extended
+# POSIX regular expressions.
+#
+# Define LIBPCRE2DIR=/foo/bar if your libpcre2 header and library
+# files are in /foo/bar/include and /foo/bar/lib directories.
+#
 # Define HAVE_ALLOCA_H if you have working alloca(3) defined in that header.
 #
 # Define NO_CURL if you do not have libcurl installed.  git-http-fetch and
@@ -1092,6 +1100,15 @@ ifdef USE_LIBPCRE
 	EXTLIBS += -lpcre
 endif
 
+ifdef USE_LIBPCRE2
+	BASIC_CFLAGS += -DUSE_LIBPCRE2
+	ifdef LIBPCRE2DIR
+		BASIC_CFLAGS += -I$(LIBPCREDIR)/include
+		EXTLIBS += -L$(LIBPCREDIR)/$(lib) $(CC_LD_DYNPATH)$(LIBPCREDIR)/$(lib)
+	endif
+	EXTLIBS += -lpcre2-8
+endif
+
 ifdef HAVE_ALLOCA_H
 	BASIC_CFLAGS += -DHAVE_ALLOCA_H
 endif
@@ -2236,6 +2253,7 @@ GIT-BUILD-OPTIONS: FORCE
 	@echo NO_CURL=\''$(subst ','\'',$(subst ','\'',$(NO_CURL)))'\' >>$@+
 	@echo NO_EXPAT=\''$(subst ','\'',$(subst ','\'',$(NO_EXPAT)))'\' >>$@+
 	@echo USE_LIBPCRE1=\''$(subst ','\'',$(subst ','\'',$(USE_LIBPCRE)))'\' >>$@+
+	@echo USE_LIBPCRE2=\''$(subst ','\'',$(subst ','\'',$(USE_LIBPCRE2)))'\' >>$@+
 	@echo NO_PERL=\''$(subst ','\'',$(subst ','\'',$(NO_PERL)))'\' >>$@+
 	@echo NO_PYTHON=\''$(subst ','\'',$(subst ','\'',$(NO_PYTHON)))'\' >>$@+
 	@echo NO_UNIX_SOCKETS=\''$(subst ','\'',$(subst ','\'',$(NO_UNIX_SOCKETS)))'\' >>$@+
diff --git a/builtin/grep.c b/builtin/grep.c
index dffb9743b8..9478ab5dff 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -490,7 +490,7 @@ static void compile_submodule_options(const struct grep_opt *opt,
 	case GREP_PATTERN_TYPE_FIXED:
 		argv_array_push(&submodule_options, "-F");
 		break;
-	case GREP_PATTERN_TYPE_PCRE1:
+	case GREP_PATTERN_TYPE_PCRE:
 		argv_array_push(&submodule_options, "-P");
 		break;
 	case GREP_PATTERN_TYPE_UNSPECIFIED:
@@ -1036,7 +1036,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 			    GREP_PATTERN_TYPE_FIXED),
 		OPT_SET_INT('P', "perl-regexp", &pattern_type_arg,
 			    N_("use Perl-compatible regular expressions"),
-			    GREP_PATTERN_TYPE_PCRE1),
+			    GREP_PATTERN_TYPE_PCRE),
 		OPT_GROUP(""),
 		OPT_BOOL('n', "line-number", &opt.linenum, N_("show line numbers")),
 		OPT_NEGBIT('h', NULL, &opt.pathname, N_("don't show filenames"), 1),
diff --git a/configure.ac b/configure.ac
index d09a204a7e..6c99c4c285 100644
--- a/configure.ac
+++ b/configure.ac
@@ -259,8 +259,8 @@ GIT_PARSE_WITH([openssl]))
 # /foo/bar/include and /foo/bar/lib directories.
 #
 AC_ARG_WITH(libpcre,
-AS_HELP_STRING([--with-libpcre],[support Perl-compatible regexes (default is NO)])
-AS_HELP_STRING([],           [ARG can be also prefix for libpcre library and headers]),
+AS_HELP_STRING([--with-libpcre],[support Perl-compatible regexes via libpcre1 (default is NO)])
+AS_HELP_STRING([],           [ARG can be also prefix for libpcre1 library and headers]),
     if test "$withval" = "no"; then
 	USE_LIBPCRE=
     elif test "$withval" = "yes"; then
@@ -273,6 +273,30 @@ AS_HELP_STRING([],           [ARG can be also prefix for libpcre library and hea
         dnl it yet.
 	GIT_CONF_SUBST([LIBPCREDIR])
     fi)
+
+# Define USE_LIBPCRE2 if you have and want to use libpcre2. Various
+# commands such as like log, grep offer runtime options to use
+# Perl-compatible regular expressions instead of standard or extended
+# POSIX regular expressions.
+#
+# Define LIBPCR2EDIR=/foo/bar if your libpcre2 header and library
+# files are in /foo/bar/include and /foo/bar/lib directories.
+#
+AC_ARG_WITH(libpcre2,
+AS_HELP_STRING([--with-libpcre2],[support Perl-compatible regexes via libpcre2 (default is NO)])
+AS_HELP_STRING([],           [ARG can be also prefix for libpcre library and headers]),
+    if test "$withval" = "no"; then
+	USE_LIBPCRE2=
+    elif test "$withval" = "yes"; then
+	USE_LIBPCRE2=YesPlease
+    else
+	USE_LIBPCRE2=YesPlease
+	LIBPCRE2DIR=$withval
+	AC_MSG_NOTICE([Setting LIBPCRE2DIR to $LIBPCREDIR])
+        dnl USE_LIBPCRE2 can still be modified below, so don't substitute
+        dnl it yet.
+	GIT_CONF_SUBST([LIBPCRE2DIR])
+    fi)
 #
 # Define HAVE_ALLOCA_H if you have working alloca(3) defined in that header.
 AC_FUNC_ALLOCA
@@ -522,6 +546,27 @@ GIT_CONF_SUBST([USE_LIBPCRE])
 fi
 
 #
+# Define USE_LIBPCRE2 if you have and want to use libpcre. Various
+# commands such as like log, grep offer runtime options to use
+# Perl-compatible regular expressions instead of standard or extended
+# POSIX regular expressions.
+#
+
+if test -n "$USE_LIBPCRE2"; then
+
+GIT_STASH_FLAGS($LIBPCRE2DIR)
+
+AC_CHECK_LIB([pcre2-8], [pcre2_config_8],
+[USE_LIBPCRE2=YesPlease],
+[USE_LIBPCRE2=])
+
+GIT_UNSTASH_FLAGS($LIBPCRE2DIR)
+
+GIT_CONF_SUBST([USE_LIBPCRE2])
+
+fi
+
+#
 # Define NO_CURL if you do not have libcurl installed.  git-http-pull and
 # git-http-push are not built, and you cannot use http:// and https://
 # transports.
diff --git a/grep.c b/grep.c
index ac7d6f9bbf..1797807f9c 100644
--- a/grep.c
+++ b/grep.c
@@ -61,9 +61,12 @@ static int parse_pattern_type_arg(const char *opt, const char *arg)
 	else if (!strcmp(arg, "fixed"))
 		return GREP_PATTERN_TYPE_FIXED;
 	else if (!strcmp(arg, "perl") ||
-		 !strcmp(arg, "pcre") ||
-		 !strcmp(arg, "pcre1"))
+		 !strcmp(arg, "pcre"))
+		return GREP_PATTERN_TYPE_PCRE;
+	else if (!strcmp(arg, "pcre1"))
 		return GREP_PATTERN_TYPE_PCRE1;
+	else if (!strcmp(arg, "pcre2"))
+		return GREP_PATTERN_TYPE_PCRE2;
 	die("bad %s argument: %s", opt, arg);
 }
 
@@ -181,24 +184,48 @@ static void grep_set_pattern_type_option(enum grep_pattern_type pattern_type, st
 	case GREP_PATTERN_TYPE_BRE:
 		opt->fixed = 0;
 		opt->pcre1 = 0;
+		opt->pcre2 = 0;
 		opt->regflags &= ~REG_EXTENDED;
 		break;
 
 	case GREP_PATTERN_TYPE_ERE:
 		opt->fixed = 0;
 		opt->pcre1 = 0;
+		opt->pcre2 = 0;
 		opt->regflags |= REG_EXTENDED;
 		break;
 
 	case GREP_PATTERN_TYPE_FIXED:
 		opt->fixed = 1;
 		opt->pcre1 = 0;
+		opt->pcre2 = 0;
 		opt->regflags &= ~REG_EXTENDED;
 		break;
 
+	case GREP_PATTERN_TYPE_PCRE:
+		opt->fixed = 0;
+		/* We default to pcre1 in the prescience of both
+		 * versions. This may change in future versions.
+		 */
+#ifdef USE_LIBPCRE1
+		opt->pcre1 = 1;
+		opt->pcre2 = 0;
+#elif USE_LIBPCRE2
+		opt->pcre1 = 0;
+		opt->pcre2 = 1;
+#endif
+		break;
+
 	case GREP_PATTERN_TYPE_PCRE1:
 		opt->fixed = 0;
 		opt->pcre1 = 1;
+		opt->pcre2 = 0;
+		break;
+
+	case GREP_PATTERN_TYPE_PCRE2:
+		opt->fixed = 0;
+		opt->pcre1 = 0;
+		opt->pcre2 = 1;
 		break;
 	}
 }
@@ -394,6 +421,93 @@ static void free_pcre1_regexp(struct grep_pat *p)
 }
 #endif /* !USE_LIBPCRE1 */
 
+#ifdef USE_LIBPCRE2
+static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt)
+{
+	int error;
+	PCRE2_UCHAR errbuf[256];
+	PCRE2_SIZE erroffset;
+	int options = PCRE2_MULTILINE;
+	const uint8_t *character_tables = NULL;
+	uint32_t canjit;
+	int jitret;
+
+	p->pcre2_ccontext = NULL;
+
+	if (opt->ignore_case) {
+		if (has_non_ascii(p->pattern)) {
+			character_tables = pcre2_maketables(NULL);
+			p->pcre2_ccontext = pcre2_compile_context_create(NULL);
+			pcre2_set_character_tables(p->pcre2_ccontext, character_tables);
+		}
+		options |= PCRE2_CASELESS;
+	}
+	if (is_utf8_locale() && has_non_ascii(p->pattern))
+		options |= PCRE2_UTF;
+
+	p->pcre2_pattern = pcre2_compile((PCRE2_SPTR)p->pattern,
+					 p->patternlen, options, &error, &erroffset,
+					 p->pcre2_ccontext);
+	if (!p->pcre2_pattern) {
+		pcre2_get_error_message(error, errbuf, sizeof(errbuf));
+		compile_regexp_failed(p, (const char *)&errbuf);
+	}
+
+	pcre2_config(PCRE2_CONFIG_JIT, &canjit);
+	if (canjit == 1) {
+		jitret = pcre2_jit_compile(p->pcre2_pattern, PCRE2_JIT_COMPLETE);
+		if (jitret)
+			die("BUG: Couldn't JIT the PCRE pattern '%s', got '%d'\n", p->pattern, jitret);
+	}
+}
+
+static int pcre2match(struct grep_pat *p, const char *line, const char *eol,
+		regmatch_t *match, int eflags)
+{
+	int ret, flags = 0;
+	PCRE2_SIZE *ovector;
+
+	if (eflags & REG_NOTBOL)
+		flags |= PCRE2_NOTBOL;
+
+	p->pcre2_match_data = pcre2_match_data_create_from_pattern(p->pcre2_pattern, NULL);
+	ret = pcre2_match(p->pcre2_pattern, (unsigned char *)line, eol - line,
+			  0, flags, p->pcre2_match_data, NULL);
+	if (ret < 0 && ret != PCRE2_ERROR_NOMATCH)
+		die("pcre_exec failed with error code %d", ret);
+	if (ret > 0) {
+		ovector = pcre2_get_ovector_pointer(p->pcre2_match_data);
+		ret = 0;
+		match->rm_so = (int)ovector[0];
+		match->rm_eo = (int)ovector[1];
+	}
+
+	return ret;
+}
+
+static void free_pcre2_pattern(struct grep_pat *p)
+{
+	pcre2_code_free(p->pcre2_pattern);
+	pcre2_match_data_free(p->pcre2_match_data);
+	pcre2_compile_context_free(p->pcre2_ccontext);
+}
+#else /* !USE_LIBPCRE2 */
+static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt)
+{
+	die("cannot use Perl-compatible regexes when not compiled with USE_LIBPCRE2");
+}
+
+static int pcre2match(struct grep_pat *p, const char *line, const char *eol,
+		regmatch_t *match, int eflags)
+{
+	return 1;
+}
+
+static void free_pcre2_pattern(struct grep_pat *p)
+{
+}
+#endif /* !USE_LIBPCRE2 */
+
 static int is_fixed(const char *s, size_t len)
 {
 	size_t i;
@@ -481,6 +595,11 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
 		return;
 	}
 
+	if (opt->pcre2) {
+		compile_pcre2_pattern(p, opt);
+		return;
+	}
+
 	err = regcomp(&p->regexp, p->pattern, opt->regflags);
 	if (err) {
 		char errbuf[1024];
@@ -915,6 +1034,8 @@ static int patmatch(struct grep_pat *p, char *line, char *eol,
 		hit = !fixmatch(p, line, eol, match);
 	else if (p->pcre1_regexp)
 		hit = !pcre1match(p, line, eol, match, eflags);
+	else if (p->pcre2_pattern)
+		hit = !pcre2match(p, line, eol, match, eflags);
 	else
 		hit = !regexec_buf(&p->regexp, line, eol - line, 1, match,
 				   eflags);
diff --git a/grep.h b/grep.h
index fa2ab9485f..5c7f4dc799 100644
--- a/grep.h
+++ b/grep.h
@@ -7,6 +7,14 @@
 typedef int pcre;
 typedef int pcre_extra;
 #endif
+#ifdef USE_LIBPCRE2
+#define PCRE2_CODE_UNIT_WIDTH 8
+#include <pcre2.h>
+#else
+typedef int pcre2_code;
+typedef int pcre2_match_data;
+typedef int pcre2_compile_context;
+#endif
 #include "kwset.h"
 #include "thread-utils.h"
 #include "userdiff.h"
@@ -49,6 +57,9 @@ struct grep_pat {
 	pcre *pcre1_regexp;
 	pcre_extra *pcre1_extra_info;
 	const unsigned char *pcre1_tables;
+	pcre2_code *pcre2_pattern;
+	pcre2_match_data *pcre2_match_data;
+	pcre2_compile_context *pcre2_ccontext;
 	kwset_t kws;
 	unsigned fixed:1;
 	unsigned ignore_case:1;
@@ -68,7 +79,9 @@ enum grep_pattern_type {
 	GREP_PATTERN_TYPE_BRE,
 	GREP_PATTERN_TYPE_ERE,
 	GREP_PATTERN_TYPE_FIXED,
-	GREP_PATTERN_TYPE_PCRE1
+	GREP_PATTERN_TYPE_PCRE,
+	GREP_PATTERN_TYPE_PCRE1,
+	GREP_PATTERN_TYPE_PCRE2
 };
 
 struct grep_expr {
@@ -112,6 +125,7 @@ struct grep_opt {
 	int extended;
 	int use_reflog_filter;
 	int pcre1;
+	int pcre2;
 	int relative;
 	int pathname;
 	int null_following_name;
diff --git a/revision.c b/revision.c
index 7a10a8570a..03a3a012de 100644
--- a/revision.c
+++ b/revision.c
@@ -1996,7 +1996,7 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 	} else if (!strcmp(arg, "--fixed-strings") || !strcmp(arg, "-F")) {
 		revs->grep_filter.pattern_type_option = GREP_PATTERN_TYPE_FIXED;
 	} else if (!strcmp(arg, "--perl-regexp") || !strcmp(arg, "-P")) {
-		revs->grep_filter.pattern_type_option = GREP_PATTERN_TYPE_PCRE1;
+		revs->grep_filter.pattern_type_option = GREP_PATTERN_TYPE_PCRE;
 	} else if (!strcmp(arg, "--all-match")) {
 		revs->grep_filter.all_match = 1;
 	} else if (!strcmp(arg, "--invert-grep")) {
diff --git a/t/README b/t/README
index a90cb62583..547b06e700 100644
--- a/t/README
+++ b/t/README
@@ -808,6 +808,18 @@ use these, and "test_set_prereq" for how to define your own.
    Git was compiled with support for PCRE. Wrap any tests
    that use git-grep --perl-regexp or git-grep -P in these.
 
+ - LIBPCRE1
+
+   Git was compiled with PCRE v1 support via
+   USE_LIBPCRE=YesPlease. Wrap any PCRE using tests that for some
+   reason need v1 of the PCRE library instead of v2 in these.
+
+ - LIBPCRE2
+
+   Git was compiled with PCRE v2 support via
+   USE_LIBPCRE2=YesPlease. Wrap any PCRE using tests that for some
+   reason need v2 of the PCRE library instead of v1 in these.
+
  - CASE_INSENSITIVE_FS
 
    Test is run on a case insensitive file system.
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index 46f528183d..6e9e38cec9 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -1063,6 +1063,16 @@ test_expect_success PCRE 'grep -P pattern' '
 	test_cmp expected actual
 '
 
+test_expect_success LIBPCRE1 'grep libpcre1 pattern' '
+	git -c grep.patternType=pcre1 grep "\p{Ps}.*?\p{Pe}" hello.c >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success LIBPCRE2 'grep libpcre2 pattern' '
+	git -c grep.patternType=pcre2 grep "\p{Ps}.*?\p{Pe}" hello.c >actual &&
+	test_cmp expected actual
+'
+
 test_expect_success 'grep pattern with grep.extendedRegexp=true' '
 	>empty &&
 	test_must_fail git -c grep.extendedregexp=true \
@@ -1522,14 +1532,28 @@ test_expect_success 'grep with thread options' '
 	test_must_fail git -c grep.threads=1 grep --threads=-1 st.*dio
 '
 
-test_expect_success PCRE "grep with grep.patternType synonyms perl/pcre/pcre1" '
+test_expect_success PCRE "grep with grep.patternType synonyms perl/pcre" '
 	echo "#include <stdio.h>" >expected &&
 	git -c grep.patternType=perl  grep -h --no-line-number "st(?=dio)" >actual &&
 	test_cmp expected actual &&
 	git -c grep.patternType=pcre  grep -h --no-line-number "st(?=dio)" >actual &&
-	test_cmp expected actual &&
-	git -c grep.patternType=pcre1 grep -h --no-line-number "st(?=dio)" >actual &&
 	test_cmp expected actual
 '
 
+test_expect_success LIBPCRE1 "grep with grep.patternType=pcre1" '
+	echo "#include <stdio.h>" >expected &&
+	git -c grep.patternType=pcre1 grep -h --no-line-number "st(?=dio)" >actual &&
+	test_cmp expected actual &&
+	test_must_fail git -c grep.patternType=pcre1 grep "foo(?+bar)" 2>error &&
+	test_i18ngrep -q "digit expected after" error
+'
+
+test_expect_success LIBPCRE2 "grep with grep.patternType=pcre2" '
+	echo "#include <stdio.h>" >expected &&
+	git -c grep.patternType=pcre2 grep -h --no-line-number "st(?=dio)" >actual &&
+	test_cmp expected actual &&
+	test_must_fail git -c grep.patternType=pcre2 grep "foo(?+bar)" 2>error &&
+	test_i18ngrep -q "digit expected after" error
+'
+
 test_done
diff --git a/t/t7813-grep-icase-iso.sh b/t/t7813-grep-icase-iso.sh
index 701e08a8e5..e16570690c 100755
--- a/t/t7813-grep-icase-iso.sh
+++ b/t/t7813-grep-icase-iso.sh
@@ -11,9 +11,12 @@ test_expect_success GETTEXT_ISO_LOCALE 'setup' '
 	export LC_ALL
 '
 
-test_expect_success GETTEXT_ISO_LOCALE,PCRE 'grep pcre string' '
-	git grep --perl-regexp -i "TILRAUN: H.lló Heimur!" &&
-	git grep --perl-regexp -i "TILRAUN: H.LLÓ HEIMUR!"
-'
+for pcrev in 1 2
+do
+	test_expect_success GETTEXT_ISO_LOCALE,LIBPCRE$pcrev "grep -i with i18n string using libpcre$pcrev" "
+		git -c grep.patternType=pcre$pcrev grep -i \"TILRAUN: H.lló Heimur!\" &&
+		git -c grep.patternType=pcre$pcrev grep -i \"TILRAUN: H.LLÓ HEIMUR!\"
+	"
+done
 
 test_done
diff --git a/t/test-lib.sh b/t/test-lib.sh
index e5cfbcc36b..6f873de3e7 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1010,7 +1010,9 @@ esac
 ( COLUMNS=1 && test $COLUMNS = 1 ) && test_set_prereq COLUMNS_CAN_BE_1
 test -z "$NO_PERL" && test_set_prereq PERL
 test -z "$NO_PYTHON" && test_set_prereq PYTHON
-test -n "$USE_LIBPCRE1" && test_set_prereq PCRE
+test -n "$USE_LIBPCRE1$USE_LIBPCRE2" && test_set_prereq PCRE
+test -n "$USE_LIBPCRE1" && test_set_prereq LIBPCRE1
+test -n "$USE_LIBPCRE2" && test_set_prereq LIBPCRE2
 test -z "$NO_GETTEXT" && test_set_prereq GETTEXT
 
 # Can we rely on git's output in the C locale?
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH 06/12] log: add -P as a synonym for --perl-regexp
  2017-04-08 13:25 ` [PATCH 06/12] log: add -P as a synonym for --perl-regexp Ævar Arnfjörð Bjarmason
@ 2017-04-10  2:39   ` Junio C Hamano
  2017-04-11 10:26   ` Jeff King
  1 sibling, 0 replies; 43+ messages in thread
From: Junio C Hamano @ 2017-04-10  2:39 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git Mailing List, Jeff King, Jeffrey Walton,
	Michał Kiedrowicz, J Smith, Victor Leschuk,
	Nguyễn Thái Ngọc Duy

On Sat, Apr 8, 2017 at 10:25 PM, Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> Add a short -P option as a synonym for the longer --perl-regexp, for
> consistency with the options the corresponding grep invocations
> accept.
>
> This was intentionally omitted in commit 727b6fc3ed ("log --grep:
> accept --basic-regexp and --perl-regexp", 2012-10-03) for unspecified
> future use.
>
> Since nothing has come along in over 4 1/2 years that's wanted to use
> it, it's more valuable to make it consistent with "grep" than to keep
> it open for future use, and to avoid the confusion of -P meaning
> different things for grep & log, as is the case with the -G option.

I initially had a strong reaction to the above "4 1/2 years entitles us to
do anything that might inconvenience future developers" reasoning, but
as long as we already have -E for extended regexp usable, even if we
will never be able to use -G for basic regexp, I am OK with giving -P for
pcre to be consistent with "git grep". I'd be even supportive if others
agree with this change.

Thanks.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 01/12] grep: add ability to disable threading with --threads=0 or grep.threads=0
  2017-04-08 13:24 ` [PATCH 01/12] grep: add ability to disable threading with --threads=0 or grep.threads=0 Ævar Arnfjörð Bjarmason
@ 2017-04-11 10:06   ` Jeff King
  2017-04-11 20:20     ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 43+ messages in thread
From: Jeff King @ 2017-04-11 10:06 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy

On Sat, Apr 08, 2017 at 01:24:55PM +0000, Ævar Arnfjörð Bjarmason wrote:

> Add the ability to entirely disable threading by having grep.threads=0
> in the config or --threads=0 on the command-line.

In pack-objects and index-pack, --threads=0 means "auto-detect". It
seems like we should try to keep this consistent.

Wouldn't --threads=1 be a better way to disable threading? Pack-objects
is smart enough to just use the non-threaded code path entirely in that
case (rather than wasting time spawning a single worker thread). Grep
should probably do the same.

> +static int thread_callback(const struct option *opt,
> +			   const char *arg, int unset)
> +{
> +	int *threads = (int*)opt->value;
> +	char *end;
> +
> +	if (unset) {
> +		*threads = GREP_NUM_THREADS_DEFAULT;
> +		return 0;
> +	}

This means that "--no-threads" will use 8 threads. Which _kind of_ makes
sense in that it cancels any previous "--threads", but I wonder if it
should be the same as "--threads=1".

This isn't really a change in behavior from the existing code, though
(OPT_INTEGER will set it to 0 in that case, too, and we'd later pick up
the default value).

-Peff

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 02/12] grep: remove redundant regflags assignment under PCRE
  2017-04-08 13:24 ` [PATCH 02/12] grep: remove redundant regflags assignment under PCRE Ævar Arnfjörð Bjarmason
@ 2017-04-11 10:10   ` Jeff King
  0 siblings, 0 replies; 43+ messages in thread
From: Jeff King @ 2017-04-11 10:10 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy

On Sat, Apr 08, 2017 at 01:24:56PM +0000, Ævar Arnfjörð Bjarmason wrote:

> Remove a redundant assignment to the "regflags" variable. This
> variable is only used for POSIX regular expression matching, not when
> the PCRE library is used.
> 
> This redundant assignment was added as a result of copy/paste
> programming in commit 84befcd0a4 ("grep: add a grep.patternType
> configuration setting", 2012-08-03). That commit modified already
> working code in commit cca2c172e0 ("git-grep: do not die upon -F/-P
> when grep.extendedRegexp is set.", 2011-05-09) which didn't assign to
> regflags when under PCRE.
> 
> Revert back to that behavior, more to reduce "wait this is used under
> PCRE how?" confusion when reading the code, than to to save ourselves
> trivial CPU cycles by removing one assignment.

Right, this makes sense. I wondered if there might be an obscure case
where we set the flag, but I couldn't find one.

-Peff

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 03/12] Makefile & configure: reword outdated comment about PCRE
  2017-04-08 13:24 ` [PATCH 03/12] Makefile & configure: reword outdated comment about PCRE Ævar Arnfjörð Bjarmason
@ 2017-04-11 10:14   ` Jeff King
  2017-04-15 12:10     ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 43+ messages in thread
From: Jeff King @ 2017-04-11 10:14 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy

On Sat, Apr 08, 2017 at 01:24:57PM +0000, Ævar Arnfjörð Bjarmason wrote:

> Reword an outdated comment which suggests that only git-grep can use
> PCRE.

Makes sense.

> -# Define USE_LIBPCRE if you have and want to use libpcre. git-grep will be
> -# able to use Perl-compatible regular expressions.
> +# Define USE_LIBPCRE if you have and want to use libpcre. Various
> +# commands such as like log, grep offer runtime options to use
> +# Perl-compatible regular expressions instead of standard or extended
> +# POSIX regular expressions.

s/such as like log, grep/such as log and grep/ ?

> diff --git a/configure.ac b/configure.ac
> [...]
> -# Define USE_LIBPCRE if you have and want to use libpcre. git-grep will be
> -# able to use Perl-compatible regular expressions.
> +# Define USE_LIBPCRE if you have and want to use libpcre. Various
> +# commands such as like log, grep offer runtime options to use
> +# Perl-compatible regular expressions instead of standard or extended
> +# POSIX regular expressions.

Ditto.

> @@ -499,8 +501,10 @@ GIT_CONF_SUBST([NEEDS_SSL_WITH_CRYPTO])
>  GIT_CONF_SUBST([NO_OPENSSL])
>  
>  #
> -# Define USE_LIBPCRE if you have and want to use libpcre. git-grep will be
> -# able to use Perl-compatible regular expressions.
> +# Define USE_LIBPCRE if you have and want to use libpcre. Various
> +# commands such as like log, grep offer runtime options to use
> +# Perl-compatible regular expressions instead of standard or extended
> +# POSIX regular expressions.

And again. It's weird that the text appears twice in configure.ac.
Apparently you can use --with-libpcre or USE_LIBPCRE in the environment?
Configure is weird.

-Peff

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 05/12] log: add exhaustive tests for pattern style options & config
  2017-04-08 13:24 ` [PATCH 05/12] log: add exhaustive tests for pattern style options & config Ævar Arnfjörð Bjarmason
@ 2017-04-11 10:23   ` Jeff King
  2017-04-11 10:32     ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 43+ messages in thread
From: Jeff King @ 2017-04-11 10:23 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy

On Sat, Apr 08, 2017 at 01:24:59PM +0000, Ævar Arnfjörð Bjarmason wrote:

> Add exhaustive tests for how the different grep.patternType options &
> the corresponding command-line options affect git-log.
> 
> Before this change it was possible to patch revision.c so that the
> --basic-regexp option was synonymous with --extended-regexp, and
> --perl-regexp wasn't recognized at all, and still have 100% of the
> test suite pass.

I thought we _did_ have good coerage here, but I think it is only for
grep (via t7810). It makes sense to cover this for "log", too.

> The patterns being passed to fixed/basic/extended/PCRE are carefully
> crafted to return the wrong thing if the grep engine were to pick any
> other matching method than the one it's told to use.

This can be tricky since POSIX allows implementations to add arbitrary
extensions for otherwise invalid syntax.

See my recent 7675c7bd0 (t7810: avoid assumption about invalid regex
syntax, 2017-01-11). In particular:

> +		if test_have_prereq LIBPCRE
> +		then
> +			git -c grep.patternType=perl log --pretty=tformat:%s \
> +				--grep="\((?=1)" >actual.perl
> +		fi &&

I'd have to double-check POSIX, but I suspect that it may allow (?=1) to
work in an ERE (since it's otherwise bogus to have "?" without a prior
element to match).

-Peff

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 06/12] log: add -P as a synonym for --perl-regexp
  2017-04-08 13:25 ` [PATCH 06/12] log: add -P as a synonym for --perl-regexp Ævar Arnfjörð Bjarmason
  2017-04-10  2:39   ` Junio C Hamano
@ 2017-04-11 10:26   ` Jeff King
  1 sibling, 0 replies; 43+ messages in thread
From: Jeff King @ 2017-04-11 10:26 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy

On Sat, Apr 08, 2017 at 01:25:00PM +0000, Ævar Arnfjörð Bjarmason wrote:

> Add a short -P option as a synonym for the longer --perl-regexp, for
> consistency with the options the corresponding grep invocations
> accept.
> 
> This was intentionally omitted in commit 727b6fc3ed ("log --grep:
> accept --basic-regexp and --perl-regexp", 2012-10-03) for unspecified
> future use.
> 
> Since nothing has come along in over 4 1/2 years that's wanted to use
> it, it's more valuable to make it consistent with "grep" than to keep
> it open for future use, and to avoid the confusion of -P meaning
> different things for grep & log, as is the case with the -G option.

Like Junio, the "well, nothing else has come along" triggered my
initial reflexes.  But I think the consistency argument you make here is
a good one. The "-P" option is tainted whether we do this or not.

-Peff

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 08/12] grep: make grep.patternType=[pcre|pcre1] a synonym for "perl"
  2017-04-08 13:25 ` [PATCH 08/12] grep: make grep.patternType=[pcre|pcre1] a synonym for "perl" Ævar Arnfjörð Bjarmason
@ 2017-04-11 10:30   ` Jeff King
  0 siblings, 0 replies; 43+ messages in thread
From: Jeff King @ 2017-04-11 10:30 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy

On Sat, Apr 08, 2017 at 01:25:02PM +0000, Ævar Arnfjörð Bjarmason wrote:

> Make the pattern types "pcre" & "pcre1" synonyms for long-standing
> "perl" grep.patternType.
> 
> This change is part of a longer patch series to add pcre2 support to
> Git. It's nice to be able to performance test PCRE v1 v.s. v2 without
> having to recompile git, and doing that via grep.patternType makes
> sense.
> 
> However, just adding "pcre2" when we only have "perl" would be
> confusing, so start by adding a "pcre" & "pcre1" synonym.
> 
> In the future "perl" and "pcre" might be changed to default to "pcre2"
> instead of "pcre1", and depending on how Git is compiled the more
> specific "pcre1" or "pcre2" pattern types might produce an error.

I think this makes sense.

> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 475e874d51..5ef12d0694 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -1624,6 +1624,15 @@ grep.patternType::
>  	'fixed', or 'perl' will enable the `--basic-regexp`, `--extended-regexp`,
>  	`--fixed-strings`, or `--perl-regexp` option accordingly, while the
>  	value 'default' will return to the default matching behavior.
> ++
> +The 'pcre' and 'pcre1' values are synonyms for 'perl'. The other
> +values starting with 'pcre' are reserved for future use, e.g. if we'd
> +like to use 'pcre2' for the PCRE v2 library.

Do you want to define this the other way around (that perl and pcre are
synonyms for pcre1)?

I know that hey are being added now as synonyms, but from the
perspective of the user (and this is the user-facing documentation), the
end-game we want is "you can pick pcre1 or pcre2, and the others are
floating synonyms that may change".

I think it would be OK to continue to refer to "perl" elsewhere. Even
though it's a synonym, it's the one we'd expect people to use.

-Peff

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 09/12] test-lib: rename the LIBPCRE prerequisite to PCRE
  2017-04-08 13:25 ` [PATCH 09/12] test-lib: rename the LIBPCRE prerequisite to PCRE Ævar Arnfjörð Bjarmason
@ 2017-04-11 10:31   ` Jeff King
  0 siblings, 0 replies; 43+ messages in thread
From: Jeff King @ 2017-04-11 10:31 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy

On Sat, Apr 08, 2017 at 01:25:03PM +0000, Ævar Arnfjörð Bjarmason wrote:

> Rename the LIBPCRE prerequisite to PCRE. This is for preparation for
> libpcre2 support, where having just "LIBPCRE" would be confusing as it
> implies v1 of the library.
> 
> None of these tests are incompatible between versions 1 & 2 of
> libpcre, it's less confusing to give them a more general name to make
> it clear that they work on both library versions.

OK. I don't think LIBPCRE is all that confusing, but it doesn't hurt to
be on the careful side.

-Peff

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 05/12] log: add exhaustive tests for pattern style options & config
  2017-04-11 10:23   ` Jeff King
@ 2017-04-11 10:32     ` Ævar Arnfjörð Bjarmason
  2017-04-11 10:51       ` Jeff King
  0 siblings, 1 reply; 43+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-04-11 10:32 UTC (permalink / raw)
  To: Jeff King
  Cc: Git Mailing List, Junio C Hamano, Jeffrey Walton,
	Michał Kiedrowicz, J Smith, Victor Leschuk,
	Nguyễn Thái Ngọc Duy

On Tue, Apr 11, 2017 at 12:23 PM, Jeff King <peff@peff.net> wrote:
> On Sat, Apr 08, 2017 at 01:24:59PM +0000, Ævar Arnfjörð Bjarmason wrote:
>
>> Add exhaustive tests for how the different grep.patternType options &
>> the corresponding command-line options affect git-log.
>>
>> Before this change it was possible to patch revision.c so that the
>> --basic-regexp option was synonymous with --extended-regexp, and
>> --perl-regexp wasn't recognized at all, and still have 100% of the
>> test suite pass.
>
> I thought we _did_ have good coerage here, but I think it is only for
> grep (via t7810). It makes sense to cover this for "log", too.
>
>> The patterns being passed to fixed/basic/extended/PCRE are carefully
>> crafted to return the wrong thing if the grep engine were to pick any
>> other matching method than the one it's told to use.
>
> This can be tricky since POSIX allows implementations to add arbitrary
> extensions for otherwise invalid syntax.

For POSIX basic v.s. extended I'm relying on (|) not being
metacharacters in basic but metachars needing quoting in extended. I
very much doubt any regex implementation doesn't conform to that, but
maybe some crazy implementation does...

> See my recent 7675c7bd0 (t7810: avoid assumption about invalid regex
> syntax, 2017-01-11). In particular:
>
>> +             if test_have_prereq LIBPCRE
>> +             then
>> +                     git -c grep.patternType=perl log --pretty=tformat:%s \
>> +                             --grep="\((?=1)" >actual.perl
>> +             fi &&
>
> I'd have to double-check POSIX, but I suspect that it may allow (?=1) to
> work in an ERE (since it's otherwise bogus to have "?" without a prior
> element to match).

Distinguishing PCRE from the rest is much easier, I'll add some more
obscure PCRE feature to that which definitely doesn't exist in any
POSIX rx library, e.g. (*COMMIT) or something.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 10/12] grep: change the internal PCRE macro names to be PCRE1
  2017-04-08 13:25 ` [PATCH 10/12] grep: change the internal PCRE macro names to be PCRE1 Ævar Arnfjörð Bjarmason
@ 2017-04-11 10:35   ` Jeff King
  2017-04-11 10:51     ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 43+ messages in thread
From: Jeff King @ 2017-04-11 10:35 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy

On Sat, Apr 08, 2017 at 01:25:04PM +0000, Ævar Arnfjörð Bjarmason wrote:

> Change the internal USE_LIBPCRE define, & build options flag to use a
> naming convention ending in PCRE1, without changing the long-standing
> USE_LIBPCRE Makefile flag which enables this code.
> 
> This is for preparation for libpcre2 support where having things like
> USE_LIBPCRE and USE_LIBPCRE2 in any more places than we absolutely
> need to for backwards compatibility with old Makefile arguments would
> be confusing.
> 
> In some ways it would be better to change everything that now uses
> USE_LIBPCRE to use USE_LIBPCRE1, and to make specifying
> USE_LIBPCRE (or --with-pcre) an error. This would impose a one-time
> burden on packagers of git to s/USE_LIBPCRE/USE_LIBPCRE1/ in their
> build scripts.
> 
> However I'd like to leave the door open to making
> USE_LIBPCRE=YesPlease eventually mean USE_LIBPCRE2=YesPlease,
> i.e. once PCRE v2 is ubiquitous enough that it makes sense to make it
> the default.

Yeah, I think it's nice to keep the build-time knobs compatible. In the
long run I assume we'll want to add a USE_LIBPCRE1 flag and USE_LIBPCRE
just becomes a synonym for it (in fact, we could do that in this
commit).

I suspect we won't ever want to make it a synonym for USE_LIBPCRE2.
Unlike the run-time synonyms, where the expressions themselves are
backwards compatible, builders need to care which one they're using.

-Peff

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 11/12] grep: change the internal PCRE code & header names to be PCRE1
  2017-04-08 13:25 ` [PATCH 11/12] grep: change the internal PCRE code & header " Ævar Arnfjörð Bjarmason
@ 2017-04-11 10:37   ` Jeff King
  2017-04-11 10:45     ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 43+ messages in thread
From: Jeff King @ 2017-04-11 10:37 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy

On Sat, Apr 08, 2017 at 01:25:05PM +0000, Ævar Arnfjörð Bjarmason wrote:

> diff --git a/builtin/grep.c b/builtin/grep.c
> index 9478ab5dff..dffb9743b8 100644
> --- a/builtin/grep.c
> +++ b/builtin/grep.c
> @@ -490,7 +490,7 @@ static void compile_submodule_options(const struct grep_opt *opt,
>  	case GREP_PATTERN_TYPE_FIXED:
>  		argv_array_push(&submodule_options, "-F");
>  		break;
> -	case GREP_PATTERN_TYPE_PCRE:
> +	case GREP_PATTERN_TYPE_PCRE1:
>  		argv_array_push(&submodule_options, "-P");
>  		break;

Hmm. This isn't a problem yet, but wouldn't this need to pass some
pcre1-specific option instead of just "-P"?

-Peff

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 12/12] grep: add support for PCRE v2
  2017-04-08 13:25 ` [PATCH 12/12] grep: add support for PCRE v2 Ævar Arnfjörð Bjarmason
@ 2017-04-11 10:43   ` Jeff King
  0 siblings, 0 replies; 43+ messages in thread
From: Jeff King @ 2017-04-11 10:43 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy

On Sat, Apr 08, 2017 at 01:25:06PM +0000, Ævar Arnfjörð Bjarmason wrote:

> --- a/Makefile
> +++ b/Makefile
> @@ -32,6 +32,14 @@ all::
>  # Define LIBPCREDIR=/foo/bar if your libpcre header and library files are in
>  # /foo/bar/include and /foo/bar/lib directories.
>  #
> +# Define USE_LIBPCRE if you have and want to use libpcre2. Various
> +# commands such as like log, grep offer runtime options to use
> +# Perl-compatible regular expressions instead of standard or extended
> +# POSIX regular expressions.

USE_LIBPCRE2 here, right?

I'll admit I skimmed a bit on reading this patch, as I don't know the
pcre2 API at all. I didn't see anything that looked obviously wrong,
though.

-Peff

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 11/12] grep: change the internal PCRE code & header names to be PCRE1
  2017-04-11 10:37   ` Jeff King
@ 2017-04-11 10:45     ` Ævar Arnfjörð Bjarmason
  2017-04-11 10:48       ` Jeff King
  0 siblings, 1 reply; 43+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-04-11 10:45 UTC (permalink / raw)
  To: Jeff King
  Cc: Git Mailing List, Junio C Hamano, Jeffrey Walton,
	Michał Kiedrowicz, J Smith, Victor Leschuk,
	Nguyễn Thái Ngọc Duy, Brandon Williams

On Tue, Apr 11, 2017 at 12:37 PM, Jeff King <peff@peff.net> wrote:
> On Sat, Apr 08, 2017 at 01:25:05PM +0000, Ævar Arnfjörð Bjarmason wrote:
>
>> diff --git a/builtin/grep.c b/builtin/grep.c
>> index 9478ab5dff..dffb9743b8 100644
>> --- a/builtin/grep.c
>> +++ b/builtin/grep.c
>> @@ -490,7 +490,7 @@ static void compile_submodule_options(const struct grep_opt *opt,
>>       case GREP_PATTERN_TYPE_FIXED:
>>               argv_array_push(&submodule_options, "-F");
>>               break;
>> -     case GREP_PATTERN_TYPE_PCRE:
>> +     case GREP_PATTERN_TYPE_PCRE1:
>>               argv_array_push(&submodule_options, "-P");
>>               break;
>
> Hmm. This isn't a problem yet, but wouldn't this need to pass some
> pcre1-specific option instead of just "-P"?

Yes, this is a bug. I'll need to add a git_options along with
submodule_options and pass -c grep.patternType=....

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 00/12] PCREv2 & more
  2017-04-08 13:24 [PATCH 00/12] PCREv2 & more Ævar Arnfjörð Bjarmason
                   ` (11 preceding siblings ...)
  2017-04-08 13:25 ` [PATCH 12/12] grep: add support for PCRE v2 Ævar Arnfjörð Bjarmason
@ 2017-04-11 10:47 ` Jeff King
  2017-04-15  8:11   ` Junio C Hamano
  12 siblings, 1 reply; 43+ messages in thread
From: Jeff King @ 2017-04-11 10:47 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy

On Sat, Apr 08, 2017 at 01:24:54PM +0000, Ævar Arnfjörð Bjarmason wrote:

> This adds PCRE v2 support, but as I was adding that I kept noticing
> other related problems to fix. It's all bundled up into the same
> series because much of it conflicts because it modifies the same test
> or other code. Notes on each patch below.

Overall, the series looks OK to me.

I'm not sure if it is worth all the complexity to carry pcre1/pcre2 as
run-time options. That does make it easier to do back-to-back
comparisons, but it makes the code a lot more complicated. In particular
I'm worried about subtle cases where we pcre1 turns into pcre2 (or vice
versa) by use of the aliases. That shouldn't matter to a user for
correctness, but it would throw off the benchmarking.

If we literally just added USE_LIBPCRE2 and built against one or the
other, then all the complexity would be limited to a few #ifdefs. The
big drawback AFAICT is that anybody doing timing tests would have to
recompile in between.

So I dunno. I had hoped libpcre2 would be a hands-down win on timing,
but it sounds like there's a little back-and-forth depending on the
context. Is there a ticking clock on pcre1 going away? I suspect it will
be around for many years to come, but if they'll continue optimizing
pcre2 but not pcre1, then it may make sense to hitch our wagon to the
one that upstream is actually working on.

-Peff

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 11/12] grep: change the internal PCRE code & header names to be PCRE1
  2017-04-11 10:45     ` Ævar Arnfjörð Bjarmason
@ 2017-04-11 10:48       ` Jeff King
  2017-04-11 11:02         ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 43+ messages in thread
From: Jeff King @ 2017-04-11 10:48 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git Mailing List, Junio C Hamano, Jeffrey Walton,
	Michał Kiedrowicz, J Smith, Victor Leschuk,
	Nguyễn Thái Ngọc Duy, Brandon Williams

On Tue, Apr 11, 2017 at 12:45:55PM +0200, Ævar Arnfjörð Bjarmason wrote:

> On Tue, Apr 11, 2017 at 12:37 PM, Jeff King <peff@peff.net> wrote:
> > On Sat, Apr 08, 2017 at 01:25:05PM +0000, Ævar Arnfjörð Bjarmason wrote:
> >
> >> diff --git a/builtin/grep.c b/builtin/grep.c
> >> index 9478ab5dff..dffb9743b8 100644
> >> --- a/builtin/grep.c
> >> +++ b/builtin/grep.c
> >> @@ -490,7 +490,7 @@ static void compile_submodule_options(const struct grep_opt *opt,
> >>       case GREP_PATTERN_TYPE_FIXED:
> >>               argv_array_push(&submodule_options, "-F");
> >>               break;
> >> -     case GREP_PATTERN_TYPE_PCRE:
> >> +     case GREP_PATTERN_TYPE_PCRE1:
> >>               argv_array_push(&submodule_options, "-P");
> >>               break;
> >
> > Hmm. This isn't a problem yet, but wouldn't this need to pass some
> > pcre1-specific option instead of just "-P"?
> 
> Yes, this is a bug. I'll need to add a git_options along with
> submodule_options and pass -c grep.patternType=....

Maybe that's an indication we should have --pcre1-regexp and
--pcre2-regexp, so we don't have to resort to config tweaking.

-Peff

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 10/12] grep: change the internal PCRE macro names to be PCRE1
  2017-04-11 10:35   ` Jeff King
@ 2017-04-11 10:51     ` Ævar Arnfjörð Bjarmason
  2017-04-11 10:53       ` Jeff King
  0 siblings, 1 reply; 43+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-04-11 10:51 UTC (permalink / raw)
  To: Jeff King
  Cc: Git Mailing List, Junio C Hamano, Jeffrey Walton,
	Michał Kiedrowicz, J Smith, Victor Leschuk,
	Nguyễn Thái Ngọc Duy

On Tue, Apr 11, 2017 at 12:35 PM, Jeff King <peff@peff.net> wrote:
> On Sat, Apr 08, 2017 at 01:25:04PM +0000, Ævar Arnfjörð Bjarmason wrote:
>
>> Change the internal USE_LIBPCRE define, & build options flag to use a
>> naming convention ending in PCRE1, without changing the long-standing
>> USE_LIBPCRE Makefile flag which enables this code.
>>
>> This is for preparation for libpcre2 support where having things like
>> USE_LIBPCRE and USE_LIBPCRE2 in any more places than we absolutely
>> need to for backwards compatibility with old Makefile arguments would
>> be confusing.
>>
>> In some ways it would be better to change everything that now uses
>> USE_LIBPCRE to use USE_LIBPCRE1, and to make specifying
>> USE_LIBPCRE (or --with-pcre) an error. This would impose a one-time
>> burden on packagers of git to s/USE_LIBPCRE/USE_LIBPCRE1/ in their
>> build scripts.
>>
>> However I'd like to leave the door open to making
>> USE_LIBPCRE=YesPlease eventually mean USE_LIBPCRE2=YesPlease,
>> i.e. once PCRE v2 is ubiquitous enough that it makes sense to make it
>> the default.
>
> Yeah, I think it's nice to keep the build-time knobs compatible. In the
> long run I assume we'll want to add a USE_LIBPCRE1 flag and USE_LIBPCRE
> just becomes a synonym for it (in fact, we could do that in this
> commit).

I could just add that. Hints for how to do that without entirely
copy/pasting the "ifdef USE_LIBPCRE" in the Makefile welcome. The
Makefile syntax doesn't support some form of "ifdef X || ifdef Y"
AFAICT, so it looks like I'll need to copy those lines...

> I suspect we won't ever want to make it a synonym for USE_LIBPCRE2.
> Unlike the run-time synonyms, where the expressions themselves are
> backwards compatible, builders need to care which one they're using.

Maybe, that's a problem for another day,  at some point we might want
to turn PCRE on by default, and then maybe v2 will be prolific enough
to make that the default...

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 05/12] log: add exhaustive tests for pattern style options & config
  2017-04-11 10:32     ` Ævar Arnfjörð Bjarmason
@ 2017-04-11 10:51       ` Jeff King
  0 siblings, 0 replies; 43+ messages in thread
From: Jeff King @ 2017-04-11 10:51 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git Mailing List, Junio C Hamano, Jeffrey Walton,
	Michał Kiedrowicz, J Smith, Victor Leschuk,
	Nguyễn Thái Ngọc Duy

On Tue, Apr 11, 2017 at 12:32:57PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > This can be tricky since POSIX allows implementations to add arbitrary
> > extensions for otherwise invalid syntax.
> 
> For POSIX basic v.s. extended I'm relying on (|) not being
> metacharacters in basic but metachars needing quoting in extended. I
> very much doubt any regex implementation doesn't conform to that, but
> maybe some crazy implementation does...

Yeah, I think BRE vs ERE (vs fixed) is fine, because POSIX specifies the
differences. It's really PCRE, because ERE implementations have a
surprising amount of flexibility according to the standard (basically
anything invalid is listed as "undefined" and implementations are
allowed to do what they please).

> >> +             if test_have_prereq LIBPCRE
> >> +             then
> >> +                     git -c grep.patternType=perl log --pretty=tformat:%s \
> >> +                             --grep="\((?=1)" >actual.perl
> >> +             fi &&
> >
> > I'd have to double-check POSIX, but I suspect that it may allow (?=1) to
> > work in an ERE (since it's otherwise bogus to have "?" without a prior
> > element to match).
> 
> Distinguishing PCRE from the rest is much easier, I'll add some more
> obscure PCRE feature to that which definitely doesn't exist in any
> POSIX rx library, e.g. (*COMMIT) or something.

I think (*) would be "undefined" by POSIX, and thus allowed for
extensions (though in practice it's probably fine). The "[\d]" trick was
the simplest one I could find that is reliably defined to be different
between ERE and PCRE.

-Peff

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 10/12] grep: change the internal PCRE macro names to be PCRE1
  2017-04-11 10:51     ` Ævar Arnfjörð Bjarmason
@ 2017-04-11 10:53       ` Jeff King
  0 siblings, 0 replies; 43+ messages in thread
From: Jeff King @ 2017-04-11 10:53 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git Mailing List, Junio C Hamano, Jeffrey Walton,
	Michał Kiedrowicz, J Smith, Victor Leschuk,
	Nguyễn Thái Ngọc Duy

On Tue, Apr 11, 2017 at 12:51:09PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > Yeah, I think it's nice to keep the build-time knobs compatible. In the
> > long run I assume we'll want to add a USE_LIBPCRE1 flag and USE_LIBPCRE
> > just becomes a synonym for it (in fact, we could do that in this
> > commit).
> 
> I could just add that. Hints for how to do that without entirely
> copy/pasting the "ifdef USE_LIBPCRE" in the Makefile welcome. The
> Makefile syntax doesn't support some form of "ifdef X || ifdef Y"
> AFAICT, so it looks like I'll need to copy those lines...

Maybe:

  USE_LIBPCRE1 ?= $(USE_LIBPCRE)
  ifdef USE_LIBPCRE1
  ...
  endif

-Peff

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 11/12] grep: change the internal PCRE code & header names to be PCRE1
  2017-04-11 10:48       ` Jeff King
@ 2017-04-11 11:02         ` Ævar Arnfjörð Bjarmason
  2017-04-11 12:57           ` Jeff King
  0 siblings, 1 reply; 43+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-04-11 11:02 UTC (permalink / raw)
  To: Jeff King
  Cc: Git Mailing List, Junio C Hamano, Jeffrey Walton,
	Michał Kiedrowicz, J Smith, Victor Leschuk,
	Nguyễn Thái Ngọc Duy, Brandon Williams

On Tue, Apr 11, 2017 at 12:48 PM, Jeff King <peff@peff.net> wrote:
> On Tue, Apr 11, 2017 at 12:45:55PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> On Tue, Apr 11, 2017 at 12:37 PM, Jeff King <peff@peff.net> wrote:
>> > On Sat, Apr 08, 2017 at 01:25:05PM +0000, Ævar Arnfjörð Bjarmason wrote:
>> >
>> >> diff --git a/builtin/grep.c b/builtin/grep.c
>> >> index 9478ab5dff..dffb9743b8 100644
>> >> --- a/builtin/grep.c
>> >> +++ b/builtin/grep.c
>> >> @@ -490,7 +490,7 @@ static void compile_submodule_options(const struct grep_opt *opt,
>> >>       case GREP_PATTERN_TYPE_FIXED:
>> >>               argv_array_push(&submodule_options, "-F");
>> >>               break;
>> >> -     case GREP_PATTERN_TYPE_PCRE:
>> >> +     case GREP_PATTERN_TYPE_PCRE1:
>> >>               argv_array_push(&submodule_options, "-P");
>> >>               break;
>> >
>> > Hmm. This isn't a problem yet, but wouldn't this need to pass some
>> > pcre1-specific option instead of just "-P"?
>>
>> Yes, this is a bug. I'll need to add a git_options along with
>> submodule_options and pass -c grep.patternType=....
>
> Maybe that's an indication we should have --pcre1-regexp and
> --pcre2-regexp, so we don't have to resort to config tweaking.

I'd rather not. To reply to both your
<20170411103018.dkq5gangx3vcxhp4@sigill.intra.peff.net> & this, one
thing I was trying to do in this series (and I don't think I went far
enough in "grep & rev-list doc: stop promising libpcre for
--perl-regexp") was to stop promising some specific version of PCRE.

I.e. I think we should have the likes of core.patternType=perl & -P
for the user, but whether we implement that with pcre/pcre2, or even
libperl itself, or any of the other PCRE reimplementations around
(there's one for java, one for C#, cpython has one I think...), or
maybe even re2 should be an implementation detail.

I.e. as far as the user is concerned they just want perl-y regexes,
but they most likely don't care about the 1% featureset of those
regexes where the various implementations of "perl-y regex" actually
differ, because those cases tend to be really obscure syntax.

Having core.patternType=pcre2 is a neccesary evil for our own
implementation & testing, e.g. for the submodule grep it would be
really bizarro if some edge case where the implementations differ
causes us to produce different grep results, since one side would use
pcre2 & the other pcre1.

But I don't think we should take it to the level of having documented
--pcreN-regexp options.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 11/12] grep: change the internal PCRE code & header names to be PCRE1
  2017-04-11 11:02         ` Ævar Arnfjörð Bjarmason
@ 2017-04-11 12:57           ` Jeff King
  2017-04-11 16:51             ` Brandon Williams
  0 siblings, 1 reply; 43+ messages in thread
From: Jeff King @ 2017-04-11 12:57 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git Mailing List, Junio C Hamano, Jeffrey Walton,
	Michał Kiedrowicz, J Smith, Victor Leschuk,
	Nguyễn Thái Ngọc Duy, Brandon Williams

On Tue, Apr 11, 2017 at 01:02:56PM +0200, Ævar Arnfjörð Bjarmason wrote:

> >> Yes, this is a bug. I'll need to add a git_options along with
> >> submodule_options and pass -c grep.patternType=....
> >
> > Maybe that's an indication we should have --pcre1-regexp and
> > --pcre2-regexp, so we don't have to resort to config tweaking.
> 
> I'd rather not. To reply to both your
> <20170411103018.dkq5gangx3vcxhp4@sigill.intra.peff.net> & this, one
> thing I was trying to do in this series (and I don't think I went far
> enough in "grep & rev-list doc: stop promising libpcre for
> --perl-regexp") was to stop promising some specific version of PCRE.

We don't necessarily have to document them. This is just in the general
rule of "if there's config, there should be command-line to override
it". Because without that, you get this exact situation where you have
to bolt on "-c" options to another part of the command line, which gets
awkward.

I'm also not sure it would be strictly correct, if the sub-program runs
other sub-programs. Providing "-c" affects all child processes, whereas
command-line options are propagated manually. So imagine you have a
process tree like:

  grep
   \-grep
      \-textconv

I.e., grep recurses to a submodule which then has to kick off a textconv
filter for one of the files. If you use "-c" to pass options to the
second grep, then those options will continue to have an effect inside
the textconv filter. Which _probably_ doesn't run git commands that
would care, but technically it could do anything.

> I.e. as far as the user is concerned they just want perl-y regexes,
> but they most likely don't care about the 1% featureset of those
> regexes where the various implementations of "perl-y regex" actually
> differ, because those cases tend to be really obscure syntax.

Yeah, that's what led me to the "why are we even worrying about run-time
switching" direction. I'd think a build-time switch would be enough for
people to test, and it makes all of this type of complexity go away.

-Peff

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 11/12] grep: change the internal PCRE code & header names to be PCRE1
  2017-04-11 12:57           ` Jeff King
@ 2017-04-11 16:51             ` Brandon Williams
  2017-04-11 18:31               ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 43+ messages in thread
From: Brandon Williams @ 2017-04-11 16:51 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, Git Mailing List,
	Junio C Hamano, Jeffrey Walton, Michał Kiedrowicz, J Smith,
	Victor Leschuk, Nguyễn Thái Ngọc Duy

On 04/11, Jeff King wrote:
> On Tue, Apr 11, 2017 at 01:02:56PM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> > >> Yes, this is a bug. I'll need to add a git_options along with
> > >> submodule_options and pass -c grep.patternType=....
> > >
> > > Maybe that's an indication we should have --pcre1-regexp and
> > > --pcre2-regexp, so we don't have to resort to config tweaking.
> > 
> > I'd rather not. To reply to both your
> > <20170411103018.dkq5gangx3vcxhp4@sigill.intra.peff.net> & this, one
> > thing I was trying to do in this series (and I don't think I went far
> > enough in "grep & rev-list doc: stop promising libpcre for
> > --perl-regexp") was to stop promising some specific version of PCRE.
> 
> We don't necessarily have to document them. This is just in the general
> rule of "if there's config, there should be command-line to override
> it". Because without that, you get this exact situation where you have
> to bolt on "-c" options to another part of the command line, which gets
> awkward.
> 
> I'm also not sure it would be strictly correct, if the sub-program runs
> other sub-programs. Providing "-c" affects all child processes, whereas
> command-line options are propagated manually. So imagine you have a
> process tree like:
> 
>   grep
>    \-grep
>       \-textconv
> 
> I.e., grep recurses to a submodule which then has to kick off a textconv
> filter for one of the files. If you use "-c" to pass options to the
> second grep, then those options will continue to have an effect inside
> the textconv filter. Which _probably_ doesn't run git commands that
> would care, but technically it could do anything.
> 
> > I.e. as far as the user is concerned they just want perl-y regexes,
> > but they most likely don't care about the 1% featureset of those
> > regexes where the various implementations of "perl-y regex" actually
> > differ, because those cases tend to be really obscure syntax.
> 
> Yeah, that's what led me to the "why are we even worrying about run-time
> switching" direction. I'd think a build-time switch would be enough for
> people to test, and it makes all of this type of complexity go away.

Yeah I agree with Jeff that we should probably avoid needing to pass a
config option down in addition to a command line switch to do perl
regex's.  I didn't take too hard of a look at how that would be done in
the grep code, but it might be slightly more involved than just changing
the enum name.

From [12/12] it looks like the main purpose of this series is to use a
more preferment version of PCRE, if all else is equal it doesn't really
make much sense to have both versions to be select-able at runtime.  Is
there any benefit of being able to do that, that I'm missing?

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 11/12] grep: change the internal PCRE code & header names to be PCRE1
  2017-04-11 16:51             ` Brandon Williams
@ 2017-04-11 18:31               ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 43+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-04-11 18:31 UTC (permalink / raw)
  To: Brandon Williams
  Cc: Jeff King, Git Mailing List, Junio C Hamano, Jeffrey Walton,
	Michał Kiedrowicz, J Smith, Victor Leschuk,
	Nguyễn Thái Ngọc Duy

On Tue, Apr 11, 2017 at 6:51 PM, Brandon Williams <bmwill@google.com> wrote:
> On 04/11, Jeff King wrote:
>> On Tue, Apr 11, 2017 at 01:02:56PM +0200, Ęvar Arnfjörš Bjarmason wrote:
>>
>> > >> Yes, this is a bug. I'll need to add a git_options along with
>> > >> submodule_options and pass -c grep.patternType=....
>> > >
>> > > Maybe that's an indication we should have --pcre1-regexp and
>> > > --pcre2-regexp, so we don't have to resort to config tweaking.
>> >
>> > I'd rather not. To reply to both your
>> > <20170411103018.dkq5gangx3vcxhp4@sigill.intra.peff.net> & this, one
>> > thing I was trying to do in this series (and I don't think I went far
>> > enough in "grep & rev-list doc: stop promising libpcre for
>> > --perl-regexp") was to stop promising some specific version of PCRE.
>>
>> We don't necessarily have to document them. This is just in the general
>> rule of "if there's config, there should be command-line to override
>> it". Because without that, you get this exact situation where you have
>> to bolt on "-c" options to another part of the command line, which gets
>> awkward.
>>
>> I'm also not sure it would be strictly correct, if the sub-program runs
>> other sub-programs. Providing "-c" affects all child processes, whereas
>> command-line options are propagated manually. So imagine you have a
>> process tree like:
>>
>>   grep
>>    \-grep
>>       \-textconv
>>
>> I.e., grep recurses to a submodule which then has to kick off a textconv
>> filter for one of the files. If you use "-c" to pass options to the
>> second grep, then those options will continue to have an effect inside
>> the textconv filter. Which _probably_ doesn't run git commands that
>> would care, but technically it could do anything.
>>
>> > I.e. as far as the user is concerned they just want perl-y regexes,
>> > but they most likely don't care about the 1% featureset of those
>> > regexes where the various implementations of "perl-y regex" actually
>> > differ, because those cases tend to be really obscure syntax.
>>
>> Yeah, that's what led me to the "why are we even worrying about run-time
>> switching" direction. I'd think a build-time switch would be enough for
>> people to test, and it makes all of this type of complexity go away.
>
> Yeah I agree with Jeff that we should probably avoid needing to pass a
> config option down in addition to a command line switch to do perl
> regex's.  I didn't take too hard of a look at how that would be done in
> the grep code, but it might be slightly more involved than just changing
> the enum name.
>
> From [12/12] it looks like the main purpose of this series is to use a
> more preferment version of PCRE, if all else is equal it doesn't really
> make much sense to have both versions to be select-able at runtime.  Is
> there any benefit of being able to do that, that I'm missing?

Not really no. I don't think any git user is ever going to be using
both pcre1 & pcre2 at runtime.

This grew more organically out of how I started to hack the code. Due
to how different the two APIs are it's much less messier to have a new
set of wrapper functions than to ifdef around v1 & v2. Once I had all
the code & config flags it was easy to shimmy it up so I could switch
between the two, and it was handy for performance testing.

But the guy developing it is hardly the main target audience for a
feature like this, but on the other hand it's easy to support...

I'm hacking up a v2 of this series. It includes some extra goodies
like bugfixes, v1 JIT support, and I'll try to tack a patch at the end
that removes this facility to switch between the two at runtime, and
we can see if that looks better than not having it.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 01/12] grep: add ability to disable threading with --threads=0 or grep.threads=0
  2017-04-11 10:06   ` Jeff King
@ 2017-04-11 20:20     ` Ævar Arnfjörð Bjarmason
  2017-04-11 20:34       ` Jeff King
  0 siblings, 1 reply; 43+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-04-11 20:20 UTC (permalink / raw)
  To: Jeff King
  Cc: Git Mailing List, Junio C Hamano, Jeffrey Walton,
	Michał Kiedrowicz, J Smith, Victor Leschuk,
	Nguyễn Thái Ngọc Duy, Thomas Rast,
	Fredrik Kuivinen

On Tue, Apr 11, 2017 at 12:06 PM, Jeff King <peff@peff.net> wrote:
> On Sat, Apr 08, 2017 at 01:24:55PM +0000, Ævar Arnfjörð Bjarmason wrote:
>
>> Add the ability to entirely disable threading by having grep.threads=0
>> in the config or --threads=0 on the command-line.
>
> In pack-objects and index-pack, --threads=0 means "auto-detect". It
> seems like we should try to keep this consistent.
>
> Wouldn't --threads=1 be a better way to disable threading? Pack-objects
> is smart enough to just use the non-threaded code path entirely in that
> case (rather than wasting time spawning a single worker thread). Grep
> should probably do the same.

I'm struggling to find a use-case where threading makes sense at all.
The example in the initial introduction in 5b594f457a is always slower
with >0 for me, and since then in 0579f91dd7 it got disabled entirely
for non-worktree cases.

But assuming it works for someone out there, then 0 threads is clearly
not the same as 1. On linux.git with pcre2 grepping for [q]werty for
example[1]

        Rate P_12  P_8  P_4  P_1  P_2  P_0
P_12 0.861/s   --  -7% -22% -27% -30% -43%
P_8  0.924/s   7%   -- -16% -22% -25% -39%
P_4   1.10/s  28%  19%   --  -7% -10% -27%
P_1   1.19/s  38%  29%   8%   --  -3% -21%
P_2   1.23/s  43%  33%  12%   4%   -- -18%
P_0   1.51/s  75%  63%  37%  27%  22%   --

And for [a]var on git.git:

       Rate P_12  P_8  P_4  P_2  P_1  P_0
P_12 15.6/s   --  -5% -15% -17% -21% -42%
P_8  16.4/s   5%   -- -11% -12% -17% -39%
P_4  18.4/s  18%  13%   --  -1%  -7% -32%
P_2  18.7/s  20%  14%   1%   --  -6% -31%
P_1  19.8/s  27%  21%   8%   6%   -- -27%
P_0  27.0/s  73%  65%  46%  44%  36%   --

Tthere's a >20% performance difference between 0 and 1 threads. The
more threads I add the slower it gets, but if there's some case where
we have the inverse of that and you have e.g. 2 cores, then presumably
you really want 1 thread, and not 0, or 2.

>> +static int thread_callback(const struct option *opt,
>> +                        const char *arg, int unset)
>> +{
>> +     int *threads = (int*)opt->value;
>> +     char *end;
>> +
>> +     if (unset) {
>> +             *threads = GREP_NUM_THREADS_DEFAULT;
>> +             return 0;
>> +     }
>
> This means that "--no-threads" will use 8 threads. Which _kind of_ makes
> sense in that it cancels any previous "--threads", but I wonder if it
> should be the same as "--threads=1".

Dear lazyweb, how do you distinguish --no-foo from no --foo=X being
provided in this API?

> This isn't really a change in behavior from the existing code, though
> (OPT_INTEGER will set it to 0 in that case, too, and we'd later pick up
> the default value).

1. =~/g/git/ perl -MBenchmark=cmpthese -wE 'cmpthese(10, { map { my $t
= $_; +("P_$t" => sub { system "$ENV{PF}git -c grep.patternType=pcre2
grep --threads=$t [q]werty >/dev/null" }) } 0,1,2,4,8,12 })'

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 01/12] grep: add ability to disable threading with --threads=0 or grep.threads=0
  2017-04-11 20:20     ` Ævar Arnfjörð Bjarmason
@ 2017-04-11 20:34       ` Jeff King
  2017-04-11 20:56         ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 43+ messages in thread
From: Jeff King @ 2017-04-11 20:34 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git Mailing List, Junio C Hamano, Jeffrey Walton,
	Michał Kiedrowicz, J Smith, Victor Leschuk,
	Nguyễn Thái Ngọc Duy, Thomas Rast,
	Fredrik Kuivinen

On Tue, Apr 11, 2017 at 10:20:59PM +0200, Ævar Arnfjörð Bjarmason wrote:

> I'm struggling to find a use-case where threading makes sense at all.
> The example in the initial introduction in 5b594f457a is always slower
> with >0 for me, and since then in 0579f91dd7 it got disabled entirely
> for non-worktree cases.

It's a big win for me in worktree greps of linux.git:

  $ best-of-five git grep --threads=1 '[q]werty'
  Attempt 1: 0.713
  Attempt 2: 0.708
  Attempt 3: 0.689
  Attempt 4: 0.695
  Attempt 5: 0.7

  real	0m0.689s
  user	0m0.560s
  sys	0m0.248s

  $ best-of-five git grep --threads=8 '[q]werty'
  Attempt 1: 0.238
  Attempt 2: 0.225
  Attempt 3: 0.222
  Attempt 4: 0.221
  Attempt 5: 0.225

  real	0m0.221s
  user	0m0.936s
  sys	0m0.356s

In non-worktree cases most of the time goes to accessing objects, which
happens under a lock. So you don't get any real parallelism, just
overhead.

> But assuming it works for someone out there, then 0 threads is clearly
> not the same as 1. On linux.git with pcre2 grepping for [q]werty for
> example[1]

Right, my suggestion was to teach "grep" to treat --threads=1 as "do not
spawn any other threads". I.e., to make it like the "0" case you were
proposing, and then leave "0" as "auto-detect". There would be no way to
spawn a _single_ thread and feed it. But why would you want to do that?
It's always going to be strictly worse than not threading at all.

-Peff

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 01/12] grep: add ability to disable threading with --threads=0 or grep.threads=0
  2017-04-11 20:34       ` Jeff King
@ 2017-04-11 20:56         ` Ævar Arnfjörð Bjarmason
  2017-04-14 21:23           ` Jeff King
  0 siblings, 1 reply; 43+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-04-11 20:56 UTC (permalink / raw)
  To: Jeff King
  Cc: Git Mailing List, Junio C Hamano, Jeffrey Walton,
	Michał Kiedrowicz, J Smith, Victor Leschuk,
	Nguyễn Thái Ngọc Duy, Thomas Rast,
	Fredrik Kuivinen

On Tue, Apr 11, 2017 at 10:34 PM, Jeff King <peff@peff.net> wrote:
> On Tue, Apr 11, 2017 at 10:20:59PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> I'm struggling to find a use-case where threading makes sense at all.
>> The example in the initial introduction in 5b594f457a is always slower
>> with >0 for me, and since then in 0579f91dd7 it got disabled entirely
>> for non-worktree cases.
>
> It's a big win for me in worktree greps of linux.git:
>
>   $ best-of-five git grep --threads=1 '[q]werty'
>   Attempt 1: 0.713
>   Attempt 2: 0.708
>   Attempt 3: 0.689
>   Attempt 4: 0.695
>   Attempt 5: 0.7
>
>   real  0m0.689s
>   user  0m0.560s
>   sys   0m0.248s
>
>   $ best-of-five git grep --threads=8 '[q]werty'
>   Attempt 1: 0.238
>   Attempt 2: 0.225
>   Attempt 3: 0.222
>   Attempt 4: 0.221
>   Attempt 5: 0.225
>
>   real  0m0.221s
>   user  0m0.936s
>   sys   0m0.356s
>
> In non-worktree cases most of the time goes to accessing objects, which
> happens under a lock. So you don't get any real parallelism, just
> overhead.
>
>> But assuming it works for someone out there, then 0 threads is clearly
>> not the same as 1. On linux.git with pcre2 grepping for [q]werty for
>> example[1]
>
> Right, my suggestion was to teach "grep" to treat --threads=1 as "do not
> spawn any other threads". I.e., to make it like the "0" case you were
> proposing, and then leave "0" as "auto-detect". There would be no way to
> spawn a _single_ thread and feed it. But why would you want to do that?
> It's always going to be strictly worse than not threading at all.

I understand, but given the two profiles we've posted it seems clear
that there's cases where if we did that, we'd be locking people out of
their optimal thread configuration, which would be --thread=1 with my
patch, but wouldn't exist with this proposed change.

If you see better timings with 8 threads than 1 on linux.git, but I
see strictly worse, then if you apply my patch doesn't --threads=0
look worse than --threads=1, which looks worse than --threads=2 etc,
until you reach some number where you either run out of CPU or I/O
throughput.

Anyway, I really don't care about this feature much, I just wanted a
way to disable threading, but looking at the perf profiles I wonder if
doing your proposed change would cause a regression in some cases
where someone really wanted /one/ thread.

But of course my patch breaks the long documented grep.threads=0 for
"give me threads that you auto detect" to now mean "you get none".

Also doesn't --thread=1 right now mean "one thread, but two workers?".
I haven't dug into the grep worker/thread code, but it compiles the
the pattern twice, so isn't both the non-thread main process & the
sole thread it spawns on --thread=1 doing work, so in some other
universe it's synonymous with --workers=2?

If so do pack-objects & index-pack also behave like that? If so this
whole thing is very confusing for users, because some will read 1
thread and think "one worker", whereas it really means "two workers,
one using a thread, if you want three workers spawn two threads".

Bah!

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 01/12] grep: add ability to disable threading with --threads=0 or grep.threads=0
  2017-04-11 20:56         ` Ævar Arnfjörð Bjarmason
@ 2017-04-14 21:23           ` Jeff King
  2017-04-16 22:25             ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 43+ messages in thread
From: Jeff King @ 2017-04-14 21:23 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git Mailing List, Junio C Hamano, Jeffrey Walton,
	Michał Kiedrowicz, J Smith, Victor Leschuk,
	Nguyễn Thái Ngọc Duy, Thomas Rast,
	Fredrik Kuivinen

On Tue, Apr 11, 2017 at 10:56:01PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > Right, my suggestion was to teach "grep" to treat --threads=1 as "do not
> > spawn any other threads". I.e., to make it like the "0" case you were
> > proposing, and then leave "0" as "auto-detect". There would be no way to
> > spawn a _single_ thread and feed it. But why would you want to do that?
> > It's always going to be strictly worse than not threading at all.
> 
> I understand, but given the two profiles we've posted it seems clear
> that there's cases where if we did that, we'd be locking people out of
> their optimal thread configuration, which would be --thread=1 with my
> patch, but wouldn't exist with this proposed change.

Maybe I don't understand your profiles. For a single-core machine you
probably want fewer threads, right? There is no such thing as "0"
threads, as you always have the original main thread in which we would
do the work.  So the lowest you can go is "1" (it's a separate question
of what --threads=0 should "mean"; I think we should keep it as
"auto-detect" for compatibility).

We could implement the single-thread case by spawning off one worker
thread (and effectively having 2 threads, but one is just sitting in
pthread_join()). And I think that's how it's implemented now in
git-grep. But we can optimize out the creation of the second thread
entirely, and just do the work in the main thread.

That saves a little bit of thread-spawning overhead, and it also makes
debugging much more pleasant.  For --threads=2, you'd always have to
kick in the thread-spawning code, and you'd spawn two worker threads
(and the main thread just sits there).

IOW, I am just proposing something like this:

diff --git a/builtin/grep.c b/builtin/grep.c
index 65070c52f..76ce38404 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -326,7 +326,7 @@ static int grep_oid(struct grep_opt *opt, const struct object_id *oid,
 	}
 
 #ifndef NO_PTHREADS
-	if (num_threads) {
+	if (num_threads > 1) {
 		add_work(opt, GREP_SOURCE_SHA1, pathbuf.buf, path, oid);
 		strbuf_release(&pathbuf);
 		return 0;
@@ -360,7 +360,7 @@ static int grep_file(struct grep_opt *opt, const char *filename)
 	}
 
 #ifndef NO_PTHREADS
-	if (num_threads) {
+	if (num_threads > 1) {
 		add_work(opt, GREP_SOURCE_FILE, buf.buf, filename, filename);
 		strbuf_release(&buf);
 		return 0;

where we fall back to the same code that NO_PTHREADS uses when there is
only a single thread requested.

> Anyway, I really don't care about this feature much, I just wanted a
> way to disable threading, but looking at the perf profiles I wonder if
> doing your proposed change would cause a regression in some cases
> where someone really wanted /one/ thread.

I'm not sure what would regress. Right now --threads=1 only does
work in a single worker thread. And --threads=2 does it in 2, and so on.
In all cases, the original main-thread is just farming out work and
waiting on pthread_join() (let's call that the controller thread).  So
why would you ever want the "controller plus one worker" setup? It's
strictly worse than "controller just does the work".

> But of course my patch breaks the long documented grep.threads=0 for
> "give me threads that you auto detect" to now mean "you get none".

Right, that's what I'm concerned about.

> Also doesn't --thread=1 right now mean "one thread, but two workers?".
> I haven't dug into the grep worker/thread code, but it compiles the
> the pattern twice, so isn't both the non-thread main process & the
> sole thread it spawns on --thread=1 doing work, so in some other
> universe it's synonymous with --workers=2?

I think --threads=1 right now means "one worker thread". I think the
main program calls compile_grep_patterns() to make sure they are sane,
before it even considers whether and how to thread.  And then each
worker thread duplicates them and re-compiles them itself (IIRC, this is
because the regex code may not be thread-safe).

> If so do pack-objects & index-pack also behave like that? If so this
> whole thing is very confusing for users, because some will read 1
> thread and think "one worker", whereas it really means "two workers,
> one using a thread, if you want three workers spawn two threads".

No, I think --threads is "this many workers". If you have more than one
worker, you may have an extra thread farming out work to them, but that
isn't counted (and is mostly dormant).

-Peff

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH 00/12] PCREv2 & more
  2017-04-11 10:47 ` [PATCH 00/12] PCREv2 & more Jeff King
@ 2017-04-15  8:11   ` Junio C Hamano
  2017-04-15  9:50     ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 43+ messages in thread
From: Junio C Hamano @ 2017-04-15  8:11 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, git, Jeffrey Walton,
	Michał Kiedrowicz, J Smith, Victor Leschuk,
	Nguyễn Thái Ngọc Duy

Jeff King <peff@peff.net> writes:

> On Sat, Apr 08, 2017 at 01:24:54PM +0000, Ævar Arnfjörð Bjarmason wrote:
>
>> This adds PCRE v2 support, but as I was adding that I kept noticing
>> other related problems to fix. It's all bundled up into the same
>> series because much of it conflicts because it modifies the same test
>> or other code. Notes on each patch below.
>
> Overall, the series looks OK to me.
>
> I'm not sure if it is worth all the complexity to carry pcre1/pcre2 as
> run-time options. That does make it easier to do back-to-back
> comparisons, but it makes the code a lot more complicated. In particular
> I'm worried about subtle cases where we pcre1 turns into pcre2 (or vice
> versa) by use of the aliases. That shouldn't matter to a user for
> correctness, but it would throw off the benchmarking.
>
> If we literally just added USE_LIBPCRE2 and built against one or the
> other, then all the complexity would be limited to a few #ifdefs. The
> big drawback AFAICT is that anybody doing timing tests would have to
> recompile in between.

Yeah, having to dl two libs at runtime, even when you would ever use
just one in a single run, is less than ideal.  A small downside
inflicted on everybody will add up to million times more than a
larger downside only suffered by developers, so I tend to agree with
you that we probably should simplify to choose just one (or zero) at
compile time.

> So I dunno. I had hoped libpcre2 would be a hands-down win on timing,
> but it sounds like there's a little back-and-forth depending on the
> context. Is there a ticking clock on pcre1 going away? I suspect it will
> be around for many years to come, but if they'll continue optimizing
> pcre2 but not pcre1, then it may make sense to hitch our wagon to the
> one that upstream is actually working on.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 00/12] PCREv2 & more
  2017-04-15  8:11   ` Junio C Hamano
@ 2017-04-15  9:50     ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 43+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-04-15  9:50 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, Git Mailing List, Jeffrey Walton,
	Michał Kiedrowicz, J Smith, Victor Leschuk,
	Nguyễn Thái Ngọc Duy

On Sat, Apr 15, 2017 at 10:11 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Jeff King <peff@peff.net> writes:
>
>> On Sat, Apr 08, 2017 at 01:24:54PM +0000, Ævar Arnfjörð Bjarmason wrote:
>>
>>> This adds PCRE v2 support, but as I was adding that I kept noticing
>>> other related problems to fix. It's all bundled up into the same
>>> series because much of it conflicts because it modifies the same test
>>> or other code. Notes on each patch below.
>>
>> Overall, the series looks OK to me.
>>
>> I'm not sure if it is worth all the complexity to carry pcre1/pcre2 as
>> run-time options. That does make it easier to do back-to-back
>> comparisons, but it makes the code a lot more complicated. In particular
>> I'm worried about subtle cases where we pcre1 turns into pcre2 (or vice
>> versa) by use of the aliases. That shouldn't matter to a user for
>> correctness, but it would throw off the benchmarking.
>>
>> If we literally just added USE_LIBPCRE2 and built against one or the
>> other, then all the complexity would be limited to a few #ifdefs. The
>> big drawback AFAICT is that anybody doing timing tests would have to
>> recompile in between.
>
> Yeah, having to dl two libs at runtime, even when you would ever use
> just one in a single run, is less than ideal.  A small downside
> inflicted on everybody will add up to million times more than a
> larger downside only suffered by developers, so I tend to agree with
> you that we probably should simplify to choose just one (or zero) at
> compile time.

I'll document & clarify this in v2, but I don't expect / want anyone
who's distributing git to link to both v1 & v2, more details in my
<CACBZZX6HLDmWSGiQ+cJ-p0Ak6SQHcmECaGqsfVz-Js4q7aSEwg@mail.gmail.com>.

It's just something we already have 95% of the code to support anyway,
and doing the remaining 5% makes it easier to test & benchmark it for
us devs without incurring any real maintenance or tech burden.

But as noted elsewhere in that message I'll include a patch to only
add the ability to use one PCRE version. So we can just review &
discuss the tradeoffs of doing that then.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 03/12] Makefile & configure: reword outdated comment about PCRE
  2017-04-11 10:14   ` Jeff King
@ 2017-04-15 12:10     ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 43+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-04-15 12:10 UTC (permalink / raw)
  To: Jeff King
  Cc: Git Mailing List, Junio C Hamano, Jeffrey Walton,
	Michał Kiedrowicz, J Smith, Victor Leschuk,
	Nguyễn Thái Ngọc Duy

On Tue, Apr 11, 2017 at 12:14 PM, Jeff King <peff@peff.net> wrote:
> On Sat, Apr 08, 2017 at 01:24:57PM +0000, Ævar Arnfjörð Bjarmason wrote:
>
>> Reword an outdated comment which suggests that only git-grep can use
>> PCRE.
>
> Makes sense.
>
>> -# Define USE_LIBPCRE if you have and want to use libpcre. git-grep will be
>> -# able to use Perl-compatible regular expressions.
>> +# Define USE_LIBPCRE if you have and want to use libpcre. Various
>> +# commands such as like log, grep offer runtime options to use
>> +# Perl-compatible regular expressions instead of standard or extended
>> +# POSIX regular expressions.
>
> s/such as like log, grep/such as log and grep/ ?
Thanks.

>> diff --git a/configure.ac b/configure.ac
>> [...]
>> -# Define USE_LIBPCRE if you have and want to use libpcre. git-grep will be
>> -# able to use Perl-compatible regular expressions.
>> +# Define USE_LIBPCRE if you have and want to use libpcre. Various
>> +# commands such as like log, grep offer runtime options to use
>> +# Perl-compatible regular expressions instead of standard or extended
>> +# POSIX regular expressions.
>
> Ditto.
>
>> @@ -499,8 +501,10 @@ GIT_CONF_SUBST([NEEDS_SSL_WITH_CRYPTO])
>>  GIT_CONF_SUBST([NO_OPENSSL])
>>
>>  #
>> -# Define USE_LIBPCRE if you have and want to use libpcre. git-grep will be
>> -# able to use Perl-compatible regular expressions.
>> +# Define USE_LIBPCRE if you have and want to use libpcre. Various
>> +# commands such as like log, grep offer runtime options to use
>> +# Perl-compatible regular expressions instead of standard or extended
>> +# POSIX regular expressions.
>
> And again. It's weird that the text appears twice in configure.ac.
> Apparently you can use --with-libpcre or USE_LIBPCRE in the environment?
> Configure is weird.

You can't, I've added this to the commit message:

    Copy/pasting this so much in configure.ac is lame, these Makefile-like
    flags aren't even used by autoconf, just the corresponding
    --with[out]-* options. But copy/pasting the comments that make sense
    for the Makefile to configure.ac where they make less sence is the
    pattern everything else follows in that file. I'm not going to war
    against that as part of this change, just following the existing
    pattern.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 01/12] grep: add ability to disable threading with --threads=0 or grep.threads=0
  2017-04-14 21:23           ` Jeff King
@ 2017-04-16 22:25             ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 43+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-04-16 22:25 UTC (permalink / raw)
  To: Jeff King
  Cc: Git Mailing List, Junio C Hamano, Jeffrey Walton,
	Michał Kiedrowicz, J Smith, Victor Leschuk,
	Nguyễn Thái Ngọc Duy, Thomas Rast,
	Fredrik Kuivinen

On Fri, Apr 14, 2017 at 11:23 PM, Jeff King <peff@peff.net> wrote:
> On Tue, Apr 11, 2017 at 10:56:01PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> > Right, my suggestion was to teach "grep" to treat --threads=1 as "do not
>> > spawn any other threads". I.e., to make it like the "0" case you were
>> > proposing, and then leave "0" as "auto-detect". There would be no way to
>> > spawn a _single_ thread and feed it. But why would you want to do that?
>> > It's always going to be strictly worse than not threading at all.
>>
>> I understand, but given the two profiles we've posted it seems clear
>> that there's cases where if we did that, we'd be locking people out of
>> their optimal thread configuration, which would be --thread=1 with my
>> patch, but wouldn't exist with this proposed change.
>
> Maybe I don't understand your profiles. For a single-core machine you
> probably want fewer threads, right? There is no such thing as "0"
> threads, as you always have the original main thread in which we would
> do the work.  So the lowest you can go is "1" (it's a separate question
> of what --threads=0 should "mean"; I think we should keep it as
> "auto-detect" for compatibility).
>
> We could implement the single-thread case by spawning off one worker
> thread (and effectively having 2 threads, but one is just sitting in
> pthread_join()). And I think that's how it's implemented now in
> git-grep. But we can optimize out the creation of the second thread
> entirely, and just do the work in the main thread.
>
> That saves a little bit of thread-spawning overhead, and it also makes
> debugging much more pleasant.  For --threads=2, you'd always have to
> kick in the thread-spawning code, and you'd spawn two worker threads
> (and the main thread just sits there).
>
> IOW, I am just proposing something like this:
>
> diff --git a/builtin/grep.c b/builtin/grep.c
> index 65070c52f..76ce38404 100644
> --- a/builtin/grep.c
> +++ b/builtin/grep.c
> @@ -326,7 +326,7 @@ static int grep_oid(struct grep_opt *opt, const struct object_id *oid,
>         }
>
>  #ifndef NO_PTHREADS
> -       if (num_threads) {
> +       if (num_threads > 1) {
>                 add_work(opt, GREP_SOURCE_SHA1, pathbuf.buf, path, oid);
>                 strbuf_release(&pathbuf);
>                 return 0;
> @@ -360,7 +360,7 @@ static int grep_file(struct grep_opt *opt, const char *filename)
>         }
>
>  #ifndef NO_PTHREADS
> -       if (num_threads) {
> +       if (num_threads > 1) {
>                 add_work(opt, GREP_SOURCE_FILE, buf.buf, filename, filename);
>                 strbuf_release(&buf);
>                 return 0;
>
> where we fall back to the same code that NO_PTHREADS uses when there is
> only a single thread requested.
>
>> Anyway, I really don't care about this feature much, I just wanted a
>> way to disable threading, but looking at the perf profiles I wonder if
>> doing your proposed change would cause a regression in some cases
>> where someone really wanted /one/ thread.
>
> I'm not sure what would regress. Right now --threads=1 only does
> work in a single worker thread. And --threads=2 does it in 2, and so on.
> In all cases, the original main-thread is just farming out work and
> waiting on pthread_join() (let's call that the controller thread).  So
> why would you ever want the "controller plus one worker" setup? It's
> strictly worse than "controller just does the work".
>
>> But of course my patch breaks the long documented grep.threads=0 for
>> "give me threads that you auto detect" to now mean "you get none".
>
> Right, that's what I'm concerned about.
>
>> Also doesn't --thread=1 right now mean "one thread, but two workers?".
>> I haven't dug into the grep worker/thread code, but it compiles the
>> the pattern twice, so isn't both the non-thread main process & the
>> sole thread it spawns on --thread=1 doing work, so in some other
>> universe it's synonymous with --workers=2?
>
> I think --threads=1 right now means "one worker thread". I think the
> main program calls compile_grep_patterns() to make sure they are sane,
> before it even considers whether and how to thread.  And then each
> worker thread duplicates them and re-compiles them itself (IIRC, this is
> because the regex code may not be thread-safe).
>
>> If so do pack-objects & index-pack also behave like that? If so this
>> whole thing is very confusing for users, because some will read 1
>> thread and think "one worker", whereas it really means "two workers,
>> one using a thread, if you want three workers spawn two threads".
>
> No, I think --threads is "this many workers". If you have more than one
> worker, you may have an extra thread farming out work to them, but that
> isn't counted (and is mostly dormant).

I submitted a v2 of this patch series, see "[PATCH v2 0/8] grep
threading cleanup & tests", as noted there this whole confusing v1 +
thread is me misunderstanding how grep actually worked. Thanks for
being patient with me, that patch series fixes the odd handling of
grep patterns which led me down the garden path of thinking that
workers != threads in the case of grep (which in retrospect couldn't
have made any sense either, my mental model of C threading was wrong).

^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2017-04-16 22:26 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-08 13:24 [PATCH 00/12] PCREv2 & more Ævar Arnfjörð Bjarmason
2017-04-08 13:24 ` [PATCH 01/12] grep: add ability to disable threading with --threads=0 or grep.threads=0 Ævar Arnfjörð Bjarmason
2017-04-11 10:06   ` Jeff King
2017-04-11 20:20     ` Ævar Arnfjörð Bjarmason
2017-04-11 20:34       ` Jeff King
2017-04-11 20:56         ` Ævar Arnfjörð Bjarmason
2017-04-14 21:23           ` Jeff King
2017-04-16 22:25             ` Ævar Arnfjörð Bjarmason
2017-04-08 13:24 ` [PATCH 02/12] grep: remove redundant regflags assignment under PCRE Ævar Arnfjörð Bjarmason
2017-04-11 10:10   ` Jeff King
2017-04-08 13:24 ` [PATCH 03/12] Makefile & configure: reword outdated comment about PCRE Ævar Arnfjörð Bjarmason
2017-04-11 10:14   ` Jeff King
2017-04-15 12:10     ` Ævar Arnfjörð Bjarmason
2017-04-08 13:24 ` [PATCH 04/12] grep: add a test for backreferences in PCRE patterns Ævar Arnfjörð Bjarmason
2017-04-08 13:24 ` [PATCH 05/12] log: add exhaustive tests for pattern style options & config Ævar Arnfjörð Bjarmason
2017-04-11 10:23   ` Jeff King
2017-04-11 10:32     ` Ævar Arnfjörð Bjarmason
2017-04-11 10:51       ` Jeff King
2017-04-08 13:25 ` [PATCH 06/12] log: add -P as a synonym for --perl-regexp Ævar Arnfjörð Bjarmason
2017-04-10  2:39   ` Junio C Hamano
2017-04-11 10:26   ` Jeff King
2017-04-08 13:25 ` [PATCH 07/12] grep & rev-list doc: stop promising libpcre " Ævar Arnfjörð Bjarmason
2017-04-08 13:25 ` [PATCH 08/12] grep: make grep.patternType=[pcre|pcre1] a synonym for "perl" Ævar Arnfjörð Bjarmason
2017-04-11 10:30   ` Jeff King
2017-04-08 13:25 ` [PATCH 09/12] test-lib: rename the LIBPCRE prerequisite to PCRE Ævar Arnfjörð Bjarmason
2017-04-11 10:31   ` Jeff King
2017-04-08 13:25 ` [PATCH 10/12] grep: change the internal PCRE macro names to be PCRE1 Ævar Arnfjörð Bjarmason
2017-04-11 10:35   ` Jeff King
2017-04-11 10:51     ` Ævar Arnfjörð Bjarmason
2017-04-11 10:53       ` Jeff King
2017-04-08 13:25 ` [PATCH 11/12] grep: change the internal PCRE code & header " Ævar Arnfjörð Bjarmason
2017-04-11 10:37   ` Jeff King
2017-04-11 10:45     ` Ævar Arnfjörð Bjarmason
2017-04-11 10:48       ` Jeff King
2017-04-11 11:02         ` Ævar Arnfjörð Bjarmason
2017-04-11 12:57           ` Jeff King
2017-04-11 16:51             ` Brandon Williams
2017-04-11 18:31               ` Ævar Arnfjörð Bjarmason
2017-04-08 13:25 ` [PATCH 12/12] grep: add support for PCRE v2 Ævar Arnfjörð Bjarmason
2017-04-11 10:43   ` Jeff King
2017-04-11 10:47 ` [PATCH 00/12] PCREv2 & more Jeff King
2017-04-15  8:11   ` Junio C Hamano
2017-04-15  9:50     ` Ævar Arnfjörð Bjarmason

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.