* [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch]
From: Ævar Arnfjörð Bjarmason @ 2021-02-03  3:27 UTC
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

This is based on next. I was wondering if/how to split this up; it
should arguably be a few series, but let's see if it gets some
traction in reviews like this. Comments below:

Ævar Arnfjörð Bjarmason (25):
  grep/pcre2 tests: reword comments referring to kwset
  grep/pcre2: drop needless assignment + assert() on opt->pcre2
  grep/pcre2: drop needless assignment to NULL
  grep/pcre2: correct reference to grep_init() in comment
  grep/pcre2: prepare to add debugging to pcre2_malloc()
  grep/pcre2: add GREP_PCRE2_DEBUG_MALLOC debug mode
  grep/pcre2: use compile-time PCREv2 version test
  grep/pcre2: use pcre2_maketables_free() function
  grep/pcre2: actually make pcre2 use custom allocator
  grep/pcre2: move back to thread-only PCREv2 structures
  grep/pcre2: move definitions of pcre2_{malloc,free}

PCRE v2 code cleanups, and fix up bugs in our pcre2_{malloc,free}()
handling.

  pickaxe tests: refactor to use test_commit --append
  pickaxe -S: support content with NULs under --pickaxe-regex
  pickaxe -S: remove redundant "sz" check in while-loop
  pickaxe/style: consolidate declarations and assignments
  pickaxe tests: add test for diffgrep_consume() internals
  pickaxe tests: add test for "log -S" not being a regex
  perf: add performance test for pickaxe

Various test prep for pickaxe.

  pickaxe -G: set -U0 for diff generation

Turns out feeding "log -G" -U0 output makes it faster.

  grep.h: make patmatch() a public function
  pickaxe: use PCREv2 for -G and -S
  Remove unused kwset.[ch]

At long last, kwset.[ch] is gone!

  xdiff-interface: allow early return from xdiff_emit_{line,hunk}_fn
  xdiff-interface: support early exit in xdiff_outf()
  pickaxe -G: terminate early on matching lines

Solve an ancient todo item in pickaxe by extending our xdiff interface
so you can early exit from hunk/line handlers.
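
The early-exit idea can be pictured as a line-emitting loop whose callback returns an int: zero means "keep going", non-zero means "stop now", and the loop hands that value back instead of visiting the remaining lines. This is a minimal sketch under that assumption only; `demo_emit_line_fn`, `demo_emit_lines()` and `demo_find_match()` are hypothetical names rather than the actual xdiff-interface API, and the "match" is a plain substring test instead of a regex.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical callback type (not the real xdiff_emit_line_fn):
 * return 0 to keep going, non-zero to stop early.
 */
typedef int (*demo_emit_line_fn)(void *priv, const char *line, size_t len);

/* Feed each line to the callback, stopping as soon as it asks us to. */
static int demo_emit_lines(const char **lines, size_t nr,
			   demo_emit_line_fn fn, void *priv)
{
	size_t i;

	assert(fn);
	for (i = 0; i < nr; i++) {
		int ret = fn(priv, lines[i], strlen(lines[i]));
		if (ret)
			return ret; /* early exit: remaining lines are skipped */
	}
	return 0;
}

struct demo_match_state {
	const char *needle;
	size_t lines_seen;
};

/* Stop at the first line containing the needle (plain substring test). */
static int demo_find_match(void *priv, const char *line, size_t len)
{
	struct demo_match_state *st = priv;

	(void)len; /* demo lines are NUL-terminated, so strstr() is enough */
	st->lines_seen++;
	return strstr(line, st->needle) != NULL;
}
```

With the lines "+foo", "+bar" and "+baz" and a needle of "bar", the callback runs twice and the third line is never examined.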

 Makefile                       |   2 -
 builtin/grep.c                 |   1 -
 combine-diff.c                 |   9 +-
 compat/obstack.c               | 413 ------------------
 compat/obstack.h               | 511 ----------------------
 ctype.c                        |  36 --
 diff.c                         |  39 +-
 diff.h                         |   4 +
 diffcore-pickaxe.c             | 184 ++++----
 git-compat-util.h              |   3 -
 grep.c                         | 103 ++---
 grep.h                         |  11 +-
 kwset.c                        | 775 ---------------------------------
 kwset.h                        |  65 ---
 range-diff.c                   |   8 +-
 t/perf/p4209-pickaxe.sh        |  82 ++++
 t/t4209-log-pickaxe.sh         |  64 ++-
 t/t7816-grep-binary-pattern.sh |   4 +-
 xdiff-interface.c              |  26 +-
 xdiff-interface.h              |  36 +-
 20 files changed, 351 insertions(+), 2025 deletions(-)
 delete mode 100644 compat/obstack.c
 delete mode 100644 compat/obstack.h
 delete mode 100644 kwset.c
 delete mode 100644 kwset.h
 create mode 100755 t/perf/p4209-pickaxe.sh

-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH 01/25] grep/pcre2 tests: reword comments referring to kwset
From: Ævar Arnfjörð Bjarmason @ 2021-02-03  3:27 UTC
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

The kwset optimization has not been used by grep since
48de2a768cf (grep: remove the kwset optimization, 2019-07-01).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t7816-grep-binary-pattern.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/t/t7816-grep-binary-pattern.sh b/t/t7816-grep-binary-pattern.sh
index 60bab291e4..9d67a5fc4c 100755
--- a/t/t7816-grep-binary-pattern.sh
+++ b/t/t7816-grep-binary-pattern.sh
@@ -59,7 +59,7 @@ test_expect_success 'setup' "
 	git commit -m.
 "
 
-# Simple fixed-string matching that can use kwset (no -i && non-ASCII)
+# Simple fixed-string matching
 nul_match P P P '-F' 'yQf'
 nul_match P P P '-F' 'yQx'
 nul_match P P P '-Fi' 'YQf'
@@ -78,7 +78,7 @@ nul_match P P P '-Fi' '[Y]QF'
 nul_match P P P '-F' 'æQ[ð]'
 nul_match P P P '-F' '[æ]Qð'
 
-# The -F kwset codepath can't handle -i && non-ASCII...
+# Matching pattern and subject case with -i
 nul_match P 1 1 '-i' '[æ]Qð'
 
 # ...PCRE v2 only matches non-ASCII with -i casefolding under UTF-8
-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH 02/25] grep/pcre2: drop needless assignment + assert() on opt->pcre2
From: Ævar Arnfjörð Bjarmason @ 2021-02-03  3:27 UTC
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Drop an assignment added in b65abcafc7a (grep: use PCRE v2 for
optimized fixed-string search, 2019-07-01) and the overly cautious
assert() I added in 94da9193a6e (grep: add support for PCRE v2,
2017-06-01).

There was never a good reason for this; it's just a relic from when I
initially wrote the PCREv2 support. We're not going to be confused
about compile_pcre2_pattern() being called when it shouldn't be just
because we forgot to cargo-cult this opt->pcre2 option. Since "opt"
is (mostly) used for the options the user supplied, let's avoid the
pattern of needlessly assigning to it.

With my in-flight removal of PCREv1 [1] ("Remove support for v1 of the
PCRE library", 2021-01-24) there'll be even less confusion around what
we call where in these codepaths, which is one more reason to remove
this.

1. https://lore.kernel.org/git/xmqqmtwy29x8.fsf@gitster.c.googlers.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 grep.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/grep.c b/grep.c
index aabfaaa4c3..816e23f17e 100644
--- a/grep.c
+++ b/grep.c
@@ -373,8 +373,6 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
 	int patinforet;
 	size_t jitsizearg;
 
-	assert(opt->pcre2);
-
 	p->pcre2_compile_context = NULL;
 
 	/* pcre2_global_context is initialized in append_grep_pattern */
@@ -555,7 +553,6 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
 #endif
 	if (p->fixed || p->is_fixed) {
 #ifdef USE_LIBPCRE2
-		opt->pcre2 = 1;
 		if (p->is_fixed) {
 			compile_pcre2_pattern(p, opt);
 		} else {
-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH 03/25] grep/pcre2: drop needless assignment to NULL
From: Ævar Arnfjörð Bjarmason @ 2021-02-03  3:27 UTC
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Remove a redundant assignment of pcre2_compile_context dating back to
my 94da9193a6e (grep: add support for PCRE v2, 2017-06-01). In
create_grep_pat() we xcalloc() the "grep_pat" struct, so there's no
need to NULL out individual members here.

I think this was probably something left over from an earlier
development version of mine.
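
The reasoning rests on the allocator handing back zeroed memory. A minimal sketch of that pattern, with plain calloc() standing in for Git's xcalloc() wrapper, and the hypothetical `struct demo_zeroed_pat` / `demo_create_pat()` standing in for grep_pat and create_grep_pat(). One caveat: calloc() returns all-bits-zero memory, which is a null pointer on mainstream platforms, although ISO C does not strictly guarantee that representation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct grep_pat. */
struct demo_zeroed_pat {
	const char *pattern;
	void *compile_context; /* plays the role of pcre2_compile_context */
};

/* Like create_grep_pat(): calloc() zeroes every member up front. */
static struct demo_zeroed_pat *demo_create_pat(const char *pattern)
{
	struct demo_zeroed_pat *p = calloc(1, sizeof(*p));

	if (!p)
		abort(); /* xcalloc() would die() instead */
	p->pattern = pattern;
	/* No "p->compile_context = NULL;" needed: it is already zeroed. */
	assert(p->compile_context == NULL);
	return p;
}
```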

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 grep.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/grep.c b/grep.c
index 816e23f17e..f27c5de7f5 100644
--- a/grep.c
+++ b/grep.c
@@ -373,8 +373,6 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
 	int patinforet;
 	size_t jitsizearg;
 
-	p->pcre2_compile_context = NULL;
-
 	/* pcre2_global_context is initialized in append_grep_pattern */
 	if (opt->ignore_case) {
 		if (!opt->ignore_locale && has_non_ascii(p->pattern)) {
-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH 04/25] grep/pcre2: correct reference to grep_init() in comment
From: Ævar Arnfjörð Bjarmason @ 2021-02-03  3:27 UTC
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Correct a comment added in 513f2b0bbd4 (grep: make PCRE2 aware of
custom allocator, 2019-10-16). This comment was never correct in
git.git, but was consistent with an older version of the patch[1].

1. https://lore.kernel.org/git/20190806163658.66932-3-carenas@gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 grep.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/grep.c b/grep.c
index f27c5de7f5..b9adcd83e7 100644
--- a/grep.c
+++ b/grep.c
@@ -373,7 +373,7 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
 	int patinforet;
 	size_t jitsizearg;
 
-	/* pcre2_global_context is initialized in append_grep_pattern */
+	/* pcre2_global_context is initialized in grep_init */
 	if (opt->ignore_case) {
 		if (!opt->ignore_locale && has_non_ascii(p->pattern)) {
 			if (!pcre2_global_context)
-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH 05/25] grep/pcre2: prepare to add debugging to pcre2_malloc()
From: Ævar Arnfjörð Bjarmason @ 2021-02-03  3:27 UTC
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Change pcre2_malloc() in a way that'll make it easier for a debugging
fprintf() to spew out the allocated pointer. This doesn't introduce
any functional change, it just makes a subsequent commit's diff easier
to read. Changes code added in 513f2b0bbd4 (grep: make PCRE2 aware of
custom allocator, 2019-10-16).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 grep.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/grep.c b/grep.c
index b9adcd83e7..f96d86c929 100644
--- a/grep.c
+++ b/grep.c
@@ -45,7 +45,8 @@ static pcre2_general_context *pcre2_global_context;
 
 static void *pcre2_malloc(PCRE2_SIZE size, MAYBE_UNUSED void *memory_data)
 {
-	return malloc(size);
+	void *pointer = malloc(size);
+	return pointer;
 }
 
 static void pcre2_free(void *pointer, MAYBE_UNUSED void *memory_data)
-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH 06/25] grep/pcre2: add GREP_PCRE2_DEBUG_MALLOC debug mode
From: Ævar Arnfjörð Bjarmason @ 2021-02-03  3:27 UTC
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Add optional printing of PCREv2 allocations to stderr for a developer
who manually changes the GREP_PCRE2_DEBUG_MALLOC definition to
"1".

This will be referenced in a subsequent commit, and is generally
useful for manually seeing what's going on with PCREv2 allocations
while working on that code.
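
The same logging idea can be tried outside of PCREv2 with a small self-contained sketch. The two callbacks have the (size, memory_data) / (pointer, memory_data) shape that pcre2_general_context_create() takes, but use size_t instead of PCRE2_SIZE so that pcre2.h isn't needed; the `demo_*` names and the counters are hypothetical. The size is cast to uintmax_t and printed with PRIuMAX, which avoids assuming that `%lu` matches size_t (it doesn't on 64-bit Windows, for example).

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>

#define DEMO_DEBUG_MALLOC 1 /* flip to 0 to silence the logging */

static int demo_alloc_count;
static int demo_free_count;

static void *demo_malloc(size_t size, void *memory_data)
{
	void *pointer = malloc(size);

	(void)memory_data;
	demo_alloc_count++;
#if DEMO_DEBUG_MALLOC
	fprintf(stderr, "DEMO:%p -> #%02d: alloc(%" PRIuMAX ")\n",
		pointer, demo_alloc_count, (uintmax_t)size);
#endif
	return pointer;
}

static void demo_free(void *pointer, void *memory_data)
{
	(void)memory_data;
	if (pointer) {
		demo_free_count++;
		/* Sanity check: never free more than we allocated. */
		assert(demo_free_count <= demo_alloc_count);
#if DEMO_DEBUG_MALLOC
		fprintf(stderr, "DEMO:%p -> #%02d: free()\n",
			pointer, demo_free_count);
#endif
	}
	free(pointer);
}
```

Two allocations followed by freeing both pointers (plus a free of NULL, which is neither logged nor counted) leave both counters at 2.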

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 grep.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/grep.c b/grep.c
index f96d86c929..7d262a23d8 100644
--- a/grep.c
+++ b/grep.c
@@ -42,15 +42,25 @@ static struct grep_opt grep_defaults = {
 
 #ifdef USE_LIBPCRE2
 static pcre2_general_context *pcre2_global_context;
+#define GREP_PCRE2_DEBUG_MALLOC 0
 
 static void *pcre2_malloc(PCRE2_SIZE size, MAYBE_UNUSED void *memory_data)
 {
 	void *pointer = malloc(size);
+#if GREP_PCRE2_DEBUG_MALLOC
+	static int count = 1;
+	fprintf(stderr, "PCRE2:%p -> #%02d: alloc(%"PRIuMAX")\n", pointer, count++, (uintmax_t)size);
+#endif
 	return pointer;
 }
 
 static void pcre2_free(void *pointer, MAYBE_UNUSED void *memory_data)
 {
+#if GREP_PCRE2_DEBUG_MALLOC
+	static int count = 1;
+	if (pointer)
+		fprintf(stderr, "PCRE2:%p -> #%02d: free()\n", pointer, count++);
+#endif
 	free(pointer);
 }
 #endif
-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH 07/25] grep/pcre2: use compile-time PCREv2 version test
From: Ævar Arnfjörð Bjarmason @ 2021-02-03  3:27 UTC
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Replace a use of pcre2_config(PCRE2_CONFIG_VERSION, ...) which I added
in 95ca1f987ed (grep/pcre2: better support invalid UTF-8 haystacks,
2021-01-24) with the same test done at compile-time.

It might be cuter to do this at runtime, since then we don't have to
do the "major >= 11 || (major >= 10 && ...)" test. But in the next
commit we'll add another version comparison that absolutely needs to
be done at compile-time, so we're better off being consistent across
the board.
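
Both forms of the version test can be compared side by side in a small sketch. `DEMO_PCRE2_MAJOR`/`DEMO_PCRE2_MINOR` are hypothetical stand-ins for the PCRE2_MAJOR/PCRE2_MINOR macros from pcre2.h, set here to simulate 10.35. Note the direction of the check: the runtime code being replaced added PCRE2_NO_START_OPTIMIZE only when versioncmp() said the library was older than 10.36, so the compile-time equivalent has to be a negative, #ifndef-style guard.

```c
#include <assert.h>

/* Hypothetical stand-ins for PCRE2_MAJOR/PCRE2_MINOR from pcre2.h. */
#define DEMO_PCRE2_MAJOR 10
#define DEMO_PCRE2_MINOR 35

/* Compile-time form, same shape as the test added to grep.h. */
#if (DEMO_PCRE2_MAJOR >= 10 && DEMO_PCRE2_MINOR >= 36) || DEMO_PCRE2_MAJOR >= 11
#define DEMO_PCRE2_VERSION_10_36_OR_HIGHER
#endif

#ifndef DEMO_PCRE2_VERSION_10_36_OR_HIGHER
static const int demo_built_with_workaround = 1; /* older than 10.36 */
#else
static const int demo_built_with_workaround = 0;
#endif

/*
 * Runtime mirror of the same test. The short form is only correct
 * because PCREv2 version numbers start at 10.x.
 */
static int demo_version_10_36_or_higher(int major, int minor)
{
	assert(major >= 10);
	return (major >= 10 && minor >= 36) || major >= 11;
}

/* The workaround for bug 2642 is only wanted before 10.36. */
static int demo_wants_no_start_optimize(int major, int minor)
{
	return !demo_version_10_36_or_higher(major, minor);
}
```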

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 grep.c | 18 ++++--------------
 grep.h |  3 +++
 2 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/grep.c b/grep.c
index 7d262a23d8..e58044474d 100644
--- a/grep.c
+++ b/grep.c
@@ -400,21 +400,11 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
 	    !(!opt->ignore_case && (p->fixed || p->is_fixed)))
 		options |= (PCRE2_UTF | PCRE2_MATCH_INVALID_UTF);
 
+#ifndef GIT_PCRE2_VERSION_10_36_OR_HIGHER
 	/* Work around https://bugs.exim.org/show_bug.cgi?id=2642 fixed in 10.36 */
-	if (PCRE2_MATCH_INVALID_UTF && options & (PCRE2_UTF | PCRE2_CASELESS)) {
-		struct strbuf buf;
-		int len;
-		int err;
-
-		if ((len = pcre2_config(PCRE2_CONFIG_VERSION, NULL)) < 0)
-			BUG("pcre2_config(..., NULL) failed: %d", len);
-		strbuf_init(&buf, len + 1);
-		if ((err = pcre2_config(PCRE2_CONFIG_VERSION, buf.buf)) < 0)
-			BUG("pcre2_config(..., buf.buf) failed: %d", err);
-		if (versioncmp(buf.buf, "10.36") < 0)
-			options |= PCRE2_NO_START_OPTIMIZE;
-		strbuf_release(&buf);
-	}
+	if (PCRE2_MATCH_INVALID_UTF && options & (PCRE2_UTF | PCRE2_CASELESS))
+		options |= PCRE2_NO_START_OPTIMIZE;
+#endif
 
 	p->pcre2_pattern = pcre2_compile((PCRE2_SPTR)p->pattern,
 					 p->patternlen, options, &error, &erroffset,
diff --git a/grep.h b/grep.h
index ae89d6254b..54e52042cb 100644
--- a/grep.h
+++ b/grep.h
@@ -4,6 +4,9 @@
 #ifdef USE_LIBPCRE2
 #define PCRE2_CODE_UNIT_WIDTH 8
 #include <pcre2.h>
+#if (PCRE2_MAJOR >= 10 && PCRE2_MINOR >= 36) || PCRE2_MAJOR >= 11
+#define GIT_PCRE2_VERSION_10_36_OR_HIGHER
+#endif
 #else
 typedef int pcre2_code;
 typedef int pcre2_match_data;
-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH 08/25] grep/pcre2: use pcre2_maketables_free() function
From: Ævar Arnfjörð Bjarmason @ 2021-02-03  3:27 UTC
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Make use of the pcre2_maketables_free() function to free the memory
allocated by pcre2_maketables(). At first sight it's strange that
10da030ab75 (grep: avoid leak of chartables in PCRE2, 2019-10-16),
which added the free() call here, doesn't make use of the pcre2_free()
the author introduced in the preceding commit, 513f2b0bbd4 (grep:
make PCRE2 aware of custom allocator, 2019-10-16).

The reason is that at the time the function didn't exist. It was first
introduced in PCREv2 version 10.34, released on 2019-11-21.

Let's make use of it behind a macro. I don't think this matters for
anything to do with custom allocators, but it makes our use of PCREv2
more discoverable. At some distant point in the future we'll be able
to drop the version guard, as nobody will be running a version older
than 10.34.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 grep.c | 4 ++++
 grep.h | 3 +++
 2 files changed, 7 insertions(+)

diff --git a/grep.c b/grep.c
index e58044474d..c63dbff4b2 100644
--- a/grep.c
+++ b/grep.c
@@ -490,7 +490,11 @@ static void free_pcre2_pattern(struct grep_pat *p)
 	pcre2_compile_context_free(p->pcre2_compile_context);
 	pcre2_code_free(p->pcre2_pattern);
 	pcre2_match_data_free(p->pcre2_match_data);
+#ifdef GIT_PCRE2_VERSION_10_34_OR_HIGHER
+	pcre2_maketables_free(pcre2_global_context, p->pcre2_tables);
+#else
 	free((void *)p->pcre2_tables);
+#endif
 }
 #else /* !USE_LIBPCRE2 */
 static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt)
diff --git a/grep.h b/grep.h
index 54e52042cb..64666e9204 100644
--- a/grep.h
+++ b/grep.h
@@ -7,6 +7,9 @@
 #if (PCRE2_MAJOR >= 10 && PCRE2_MINOR >= 36) || PCRE2_MAJOR >= 11
 #define GIT_PCRE2_VERSION_10_36_OR_HIGHER
 #endif
+#if (PCRE2_MAJOR >= 10 && PCRE2_MINOR >= 34) || PCRE2_MAJOR >= 11
+#define GIT_PCRE2_VERSION_10_34_OR_HIGHER
+#endif
 #else
 typedef int pcre2_code;
 typedef int pcre2_match_data;
-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH 09/25] grep/pcre2: actually make pcre2 use custom allocator
From: Ævar Arnfjörð Bjarmason @ 2021-02-03  3:27 UTC
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Continue work started in 513f2b0bbd4 (grep: make PCRE2 aware of custom
allocator, 2019-10-16) and make PCREv2 use our pcre2_{malloc,free}()
functions for allocation. We'll now use them for all PCREv2 allocations.

513f2b0bbd4 worked as a bugfix for the USE_NED_ALLOCATOR issue
because it managed to target pretty much the one allocation we freed
via our own free() rather than via a pcre2_*free() function, i.e. the
pcre2_maketables() and pcre2_maketables_free() pair. For most of the
rest we continued allocating with stock malloc() inside PCREv2 itself,
but didn't segfault because we'd use its corresponding free().

In a preceding commit of mine I changed the free() to
pcre2_maketables_free() on versions of PCREv2 10.34 and newer. So as
far as fixing the segfault goes, we could revert 513f2b0bbd4. But then
we wouldn't be using the desired allocator, so let's use it
consistently instead.

Before this patch, running e.g.:

    grep --threads=1 -iP æ.*var.*xyz

would only use pcre2_{malloc,free}() for 2 malloc() calls and 2
corresponding free() calls. Now it's 12 calls to each. This can be
observed with the GREP_PCRE2_DEBUG_MALLOC debug mode.

Reading the history of how this bug got introduced shows that it
wasn't present in Johannes's original patch[1] to fix the issue.

My reading of that thread is that the approach pursued by the
follow-up patches to Johannes's original was based on a
misunderstanding of how the PCREv2 API works. In particular this part
of [2]:

    "most of the time (like when using UTF-8) the chartable (and
    therefore the global context) is not needed (even when using
    alternate allocators)"

That's simply not how PCREv2 memory allocation works. It's easy to see
how the misunderstanding came about: as noted above, the issue was
noticed because of our use of free() in our own grep.c for freeing the
memory allocated by pcre2_maketables().

Thus the misunderstanding that PCREv2's general context is something
only needed for pcre2_maketables(), and e.g. an aborted earlier
attempt[3] to only set it up when we ourselves called
pcre2_maketables().

That's not what PCREv2's general context is. To quote PCREv2's
documentation:

    "This context just contains pointers to (and data for) external
    memory management functions that are called from several places in
    the PCRE2 library."

Thus the failed attempts to go down the route of only creating the
general context in cases where we ourselves call pcre2_maketables(),
before finally settling on the approach 513f2b0bbd4 took of always
creating it.

Instead we should always create it, and then pass the general context
to those functions that accept it, so that they'll consistently use
our preferred memory allocation functions.
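
A minimal model of what a general context is for: a struct of allocator callbacks that every allocating function accepts, with NULL meaning "use stock malloc()/free()". This only illustrates the behaviour described above and is not PCREv2's internals; `struct demo_general_context`, `demo_alloc()`, `demo_release()` and the counting callbacks are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of a general context: just allocator hooks. */
struct demo_general_context {
	void *(*malloc_fn)(size_t size, void *memory_data);
	void (*free_fn)(void *pointer, void *memory_data);
	void *memory_data;
};

/* Every allocation site routes through here; NULL means stock malloc(). */
static void *demo_alloc(struct demo_general_context *gcontext, size_t size)
{
	if (!gcontext)
		return malloc(size);
	assert(gcontext->malloc_fn && gcontext->free_fn);
	return gcontext->malloc_fn(size, gcontext->memory_data);
}

/* Frees must go through the same allocator that did the allocation. */
static void demo_release(struct demo_general_context *gcontext, void *pointer)
{
	if (!gcontext) {
		free(pointer);
		return;
	}
	gcontext->free_fn(pointer, gcontext->memory_data);
}

/* A counting allocator standing in for a custom one like nedmalloc. */
static void *counting_malloc(size_t size, void *memory_data)
{
	int *live = memory_data;

	(*live)++;
	return malloc(size);
}

static void counting_free(void *pointer, void *memory_data)
{
	int *live = memory_data;

	if (pointer)
		(*live)--;
	free(pointer);
}
```

In this model a call that is handed NULL (as the pre-patch pcre2_compile_context_create(NULL) and pcre2_match_data_create_from_pattern(..., NULL) calls were) silently bypasses the custom allocator, which is why the context has to be passed everywhere it is accepted.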

1. https://public-inbox.org/git/3397e6797f872aedd18c6d795f4976e1c579514b.1565005867.git.gitgitgadget@gmail.com/
2. https://lore.kernel.org/git/CAPUEsphMh_ZqcH3M7PXC9jHTfEdQN3mhTAK2JDkdvKBp53YBoA@mail.gmail.com/
3. https://lore.kernel.org/git/20190806085014.47776-3-carenas@gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 grep.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/grep.c b/grep.c
index c63dbff4b2..0116ff5f09 100644
--- a/grep.c
+++ b/grep.c
@@ -390,7 +390,7 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
 			if (!pcre2_global_context)
 				BUG("pcre2_global_context uninitialized");
 			p->pcre2_tables = pcre2_maketables(pcre2_global_context);
-			p->pcre2_compile_context = pcre2_compile_context_create(NULL);
+			p->pcre2_compile_context = pcre2_compile_context_create(pcre2_global_context);
 			pcre2_set_character_tables(p->pcre2_compile_context,
 							p->pcre2_tables);
 		}
@@ -411,7 +411,7 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
 					 p->pcre2_compile_context);
 
 	if (p->pcre2_pattern) {
-		p->pcre2_match_data = pcre2_match_data_create_from_pattern(p->pcre2_pattern, NULL);
+		p->pcre2_match_data = pcre2_match_data_create_from_pattern(p->pcre2_pattern, pcre2_global_context);
 		if (!p->pcre2_match_data)
 			die("Couldn't allocate PCRE2 match data");
 	} else {
-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH 10/25] grep/pcre2: move back to thread-only PCREv2 structures
From: Ævar Arnfjörð Bjarmason @ 2021-02-03  3:27 UTC
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Change the setup of the "pcre2_general_context" to happen per-thread
in compile_pcre2_pattern() instead of in grep_init(), as happens with
all the rest of the pcre2_* members of the grep_pat structure.

As noted in the preceding commit the approach 513f2b0bbd4 (grep: make
PCRE2 aware of custom allocator, 2019-10-16) took to allocate the
pcre2_general_context seems to have been initially based on a
misunderstanding of how PCREv2 memory allocation works.

This approach of creating a global context is just added complexity
for almost zero gain. On my system it saves 24 bytes per thread; for
comparison, PCREv2 will then go on to allocate some kilobytes for its
own thread-local state.

As noted in 6d423dd542f (grep: don't redundantly compile throwaway
patterns under threading, 2017-05-25) the grep code is intentionally
not trying to micro-optimize allocations by e.g. sharing some PCREv2
structures globally, while making others thread-local.

So let's remove this special case and make all of them thread-local
for simplicity again.

See also the discussion in 94da9193a6 (grep: add support for PCRE v2,
2017-06-01) about thread safety, and Johannes's comments[1] to the
effect that we should be doing what this patch is doing.
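
The resulting lifecycle can be sketched as each pattern owning its own context: create it before anything else, allocate through it, and free it last. All names are hypothetical, calloc()/malloc()/free() stand in for the PCREv2 create/free calls, and a counter stands in for the allocations a real context would route. Because nothing is shared between patterns in this sketch, there is also nothing to coordinate between threads.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a pcre2_general_context. */
struct demo_context {
	int live_allocations; /* allocations made "through" this context */
};

/* Hypothetical stand-in for struct grep_pat. */
struct demo_thread_pat {
	struct demo_context *general_context; /* owned by this pattern */
	void *match_data;
};

static void demo_compile_pattern(struct demo_thread_pat *p)
{
	/* Create the context first: everything else is allocated via it. */
	p->general_context = calloc(1, sizeof(*p->general_context));
	if (!p->general_context)
		abort();
	p->match_data = malloc(64);
	if (!p->match_data)
		abort();
	p->general_context->live_allocations++;
}

static void demo_free_pattern(struct demo_thread_pat *p)
{
	free(p->match_data);
	p->match_data = NULL;
	p->general_context->live_allocations--;
	/* Free the context last, once nothing allocated via it remains. */
	assert(p->general_context->live_allocations == 0);
	free(p->general_context);
	p->general_context = NULL;
}
```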

1. https://lore.kernel.org/git/nycvar.QRO.7.76.6.1908052120302.46@tvgsbejvaqbjf.bet/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/grep.c |  1 -
 grep.c         | 41 +++++++++++++++--------------------------
 grep.h         |  3 ++-
 3 files changed, 17 insertions(+), 28 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 55d06c9513..c69fe99340 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -1175,6 +1175,5 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		run_pager(&opt, prefix);
 	clear_pathspec(&pathspec);
 	free_grep_patterns(&opt);
-	grep_destroy();
 	return !hit;
 }
diff --git a/grep.c b/grep.c
index 0116ff5f09..2599f329cd 100644
--- a/grep.c
+++ b/grep.c
@@ -41,7 +41,6 @@ static struct grep_opt grep_defaults = {
 };
 
 #ifdef USE_LIBPCRE2
-static pcre2_general_context *pcre2_global_context;
 #define GREP_PCRE2_DEBUG_MALLOC 0
 
 static void *pcre2_malloc(PCRE2_SIZE size, MAYBE_UNUSED void *memory_data)
@@ -163,20 +162,9 @@ int grep_config(const char *var, const char *value, void *cb)
  * Initialize one instance of grep_opt and copy the
  * default values from the template we read the configuration
  * information in an earlier call to git_config(grep_config).
- *
- * If using PCRE, make sure that the library is configured
- * to use the same allocator as Git (e.g. nedmalloc on Windows).
- *
- * Any allocated memory needs to be released in grep_destroy().
  */
 void grep_init(struct grep_opt *opt, struct repository *repo, const char *prefix)
 {
-#if defined(USE_LIBPCRE2)
-	if (!pcre2_global_context)
-		pcre2_global_context = pcre2_general_context_create(
-					pcre2_malloc, pcre2_free, NULL);
-#endif
-
 	*opt = grep_defaults;
 
 	opt->repo = repo;
@@ -186,13 +174,6 @@ void grep_init(struct grep_opt *opt, struct repository *repo, const char *prefix
 	opt->header_tail = &opt->header_list;
 }
 
-void grep_destroy(void)
-{
-#ifdef USE_LIBPCRE2
-	pcre2_general_context_free(pcre2_global_context);
-#endif
-}
-
 static void grep_set_pattern_type_option(enum grep_pattern_type pattern_type, struct grep_opt *opt)
 {
 	/*
@@ -384,13 +365,20 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
 	int patinforet;
 	size_t jitsizearg;
 
-	/* pcre2_global_context is initialized in grep_init */
+	/*
+	 * Call pcre2_general_context_create() before calling any
+	 * other pcre2_*(). It sets up our malloc()/free() functions
+	 * with which everything else is allocated.
+	 */
+	p->pcre2_general_context = pcre2_general_context_create(
+		pcre2_malloc, pcre2_free, NULL);
+	if (!p->pcre2_general_context)
+		die("Couldn't allocate PCRE2 general context");
+
 	if (opt->ignore_case) {
 		if (!opt->ignore_locale && has_non_ascii(p->pattern)) {
-			if (!pcre2_global_context)
-				BUG("pcre2_global_context uninitialized");
-			p->pcre2_tables = pcre2_maketables(pcre2_global_context);
-			p->pcre2_compile_context = pcre2_compile_context_create(pcre2_global_context);
+			p->pcre2_tables = pcre2_maketables(p->pcre2_general_context);
+			p->pcre2_compile_context = pcre2_compile_context_create(p->pcre2_general_context);
 			pcre2_set_character_tables(p->pcre2_compile_context,
 							p->pcre2_tables);
 		}
@@ -411,7 +399,7 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
 					 p->pcre2_compile_context);
 
 	if (p->pcre2_pattern) {
-		p->pcre2_match_data = pcre2_match_data_create_from_pattern(p->pcre2_pattern, pcre2_global_context);
+		p->pcre2_match_data = pcre2_match_data_create_from_pattern(p->pcre2_pattern, p->pcre2_general_context);
 		if (!p->pcre2_match_data)
 			die("Couldn't allocate PCRE2 match data");
 	} else {
@@ -491,10 +479,11 @@ static void free_pcre2_pattern(struct grep_pat *p)
 	pcre2_code_free(p->pcre2_pattern);
 	pcre2_match_data_free(p->pcre2_match_data);
 #ifdef GIT_PCRE2_VERSION_10_34_OR_HIGHER
-	pcre2_maketables_free(pcre2_global_context, p->pcre2_tables);
+	pcre2_maketables_free(p->pcre2_general_context, p->pcre2_tables);
 #else
 	free((void *)p->pcre2_tables);
 #endif
+	pcre2_general_context_free(p->pcre2_general_context);
 }
 #else /* !USE_LIBPCRE2 */
 static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt)
diff --git a/grep.h b/grep.h
index 64666e9204..72f82b1e30 100644
--- a/grep.h
+++ b/grep.h
@@ -14,6 +14,7 @@
 typedef int pcre2_code;
 typedef int pcre2_match_data;
 typedef int pcre2_compile_context;
+typedef int pcre2_general_context;
 #endif
 #ifndef PCRE2_MATCH_INVALID_UTF
 /* PCRE2_MATCH_* dummy also with !USE_LIBPCRE2, for test-pcre2-config.c */
@@ -75,6 +76,7 @@ struct grep_pat {
 	pcre2_code *pcre2_pattern;
 	pcre2_match_data *pcre2_match_data;
 	pcre2_compile_context *pcre2_compile_context;
+	pcre2_general_context *pcre2_general_context;
 	const uint8_t *pcre2_tables;
 	uint32_t pcre2_jit_on;
 	unsigned fixed:1;
@@ -167,7 +169,6 @@ struct grep_opt {
 
 int grep_config(const char *var, const char *value, void *);
 void grep_init(struct grep_opt *, struct repository *repo, const char *prefix);
-void grep_destroy(void);
 void grep_commit_pattern_type(enum grep_pattern_type, struct grep_opt *opt);
 
 void append_grep_pat(struct grep_opt *opt, const char *pat, size_t patlen, const char *origin, int no, enum grep_pat_token t);
-- 
2.30.0.284.gd98b1dd5eaa7


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 11/25] grep/pcre2: move definitions of pcre2_{malloc,free}
  2021-02-03  3:27 [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch] Ævar Arnfjörð Bjarmason
                   ` (9 preceding siblings ...)
  2021-02-03  3:27 ` [PATCH 10/25] grep/pcre2: move back to thread-only PCREv2 structures Ævar Arnfjörð Bjarmason
@ 2021-02-03  3:27 ` Ævar Arnfjörð Bjarmason
  2021-02-03  3:27 ` [PATCH 12/25] pickaxe tests: refactor to use test_commit --append Ævar Arnfjörð Bjarmason
                   ` (37 subsequent siblings)
  48 siblings, 0 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-03  3:27 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Move the definitions of the pcre2_{malloc,free} functions above the
compile_pcre2_pattern() function they're used in. Before the preceding
commit they needed to be defined earlier, but now we can move them to
be adjacent to the other PCREv2 functions.
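
For reference, the allocator pair being moved has the callback shape
that pcre2_general_context_create() expects: a malloc-like and a
free-like function, each also receiving an opaque memory_data pointer
(grep.c passes NULL for it). A minimal standalone sketch of that shape,
which does not link against PCRE2; the alloc_stats struct and the
function names are hypothetical, and size_t stands in for PCRE2_SIZE:

```c
#include <stdlib.h>

/* Hypothetical bookkeeping passed through the opaque memory_data pointer. */
struct alloc_stats {
	unsigned long allocs;
	unsigned long frees;
};

/* Same shape as a PCRE2 private_malloc callback, with size_t for PCRE2_SIZE. */
static void *counting_malloc(size_t size, void *memory_data)
{
	struct alloc_stats *stats = memory_data;
	void *pointer = malloc(size);

	/* memory_data may be NULL, as it is in grep.c */
	if (pointer && stats)
		stats->allocs++;
	return pointer;
}

/* Same shape as a PCRE2 private_free callback; free(NULL) is a no-op. */
static void counting_free(void *pointer, void *memory_data)
{
	struct alloc_stats *stats = memory_data;

	if (pointer && stats)
		stats->frees++;
	free(pointer);
}
```

Passing a pointer to a struct alloc_stats as memory_data would let such
a pair count allocations without the static counters that
GREP_PCRE2_DEBUG_MALLOC uses.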

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 grep.c | 46 ++++++++++++++++++++++------------------------
 1 file changed, 22 insertions(+), 24 deletions(-)

diff --git a/grep.c b/grep.c
index 2599f329cd..636ac48bf0 100644
--- a/grep.c
+++ b/grep.c
@@ -40,30 +40,6 @@ static struct grep_opt grep_defaults = {
 	.output = std_output,
 };
 
-#ifdef USE_LIBPCRE2
-#define GREP_PCRE2_DEBUG_MALLOC 0
-
-static void *pcre2_malloc(PCRE2_SIZE size, MAYBE_UNUSED void *memory_data)
-{
-	void *pointer = malloc(size);
-#if GREP_PCRE2_DEBUG_MALLOC
-	static int count = 1;
-	fprintf(stderr, "PCRE2:%p -> #%02d: alloc(%lu)\n", pointer, count++, size);
-#endif
-	return pointer;
-}
-
-static void pcre2_free(void *pointer, MAYBE_UNUSED void *memory_data)
-{
-#if GREP_PCRE2_DEBUG_MALLOC
-	static int count = 1;
-	if (pointer)
-		fprintf(stderr, "PCRE2:%p -> #%02d: free()\n", pointer, count++);
-#endif
-	free(pointer);
-}
-#endif
-
 static const char *color_grep_slots[] = {
 	[GREP_COLOR_CONTEXT]	    = "context",
 	[GREP_COLOR_FILENAME]	    = "filename",
@@ -355,6 +331,28 @@ static int is_fixed(const char *s, size_t len)
 }
 
 #ifdef USE_LIBPCRE2
+#define GREP_PCRE2_DEBUG_MALLOC 0
+
+static void *pcre2_malloc(PCRE2_SIZE size, MAYBE_UNUSED void *memory_data)
+{
+	void *pointer = malloc(size);
+#if GREP_PCRE2_DEBUG_MALLOC
+	static int count = 1;
+	fprintf(stderr, "PCRE2:%p -> #%02d: alloc(%lu)\n", pointer, count++, size);
+#endif
+	return pointer;
+}
+
+static void pcre2_free(void *pointer, MAYBE_UNUSED void *memory_data)
+{
+#if GREP_PCRE2_DEBUG_MALLOC
+	static int count = 1;
+	if (pointer)
+		fprintf(stderr, "PCRE2:%p -> #%02d: free()\n", pointer, count++);
+#endif
+	free(pointer);
+}
+
 static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt)
 {
 	int error;
-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH 12/25] pickaxe tests: refactor to use test_commit --append
  2021-02-03  3:27 [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch] Ævar Arnfjörð Bjarmason
                   ` (10 preceding siblings ...)
  2021-02-03  3:27 ` [PATCH 11/25] grep/pcre2: move definitions of pcre2_{malloc,free} Ævar Arnfjörð Bjarmason
@ 2021-02-03  3:27 ` Ævar Arnfjörð Bjarmason
  2021-02-03  3:27 ` [PATCH 13/25] pickaxe -S: support content with NULs under --pickaxe-regex Ævar Arnfjörð Bjarmason
                   ` (36 subsequent siblings)
  48 siblings, 0 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-03  3:27 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Refactor existing tests added in e0e7cb8080c (log -G: ignore binary
files, 2018-12-14) to use the --append option I added in
3373518cc8b (test-lib functions: add an --append option to
test_commit, 2021-01-12).

See also f5d79bf7dd6 (tests: refactor a few tests to use "test_commit
--append", 2021-01-12) for prior similar refactoring.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t4209-log-pickaxe.sh | 30 ++++++++++++++----------------
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/t/t4209-log-pickaxe.sh b/t/t4209-log-pickaxe.sh
index 5d06f5f45e..21d22b2a18 100755
--- a/t/t4209-log-pickaxe.sh
+++ b/t/t4209-log-pickaxe.sh
@@ -107,37 +107,35 @@ test_expect_success 'log -S --no-textconv (missing textconv tool)' '
 '
 
 test_expect_success 'setup log -[GS] binary & --text' '
-	git checkout --orphan GS-binary-and-text &&
-	git read-tree --empty &&
-	printf "a\na\0a\n" >data.bin &&
-	git add data.bin &&
-	git commit -m "create binary file" data.bin &&
-	printf "a\na\0a\n" >>data.bin &&
-	git commit -m "modify binary file" data.bin &&
-	git rm data.bin &&
-	git commit -m "delete binary file" data.bin &&
-	git log >full-log
+	test_create_repo GS-bin-txt &&
+	test_commit -C GS-bin-txt --append A data.bin "a\na\0a\n" &&
+	test_commit -C GS-bin-txt --append B data.bin "a\na\0a\n" &&
+	test_commit -C GS-bin-txt C data.bin "" &&
+	git -C GS-bin-txt log >full-log
 '
 
 test_expect_success 'log -G ignores binary files' '
-	git log -Ga >log &&
+	git -C GS-bin-txt log -Ga >log &&
 	test_must_be_empty log
 '
 
 test_expect_success 'log -G looks into binary files with -a' '
-	git log -a -Ga >log &&
+	git -C GS-bin-txt log -a -Ga >log &&
 	test_cmp log full-log
 '
 
 test_expect_success 'log -G looks into binary files with textconv filter' '
-	test_when_finished "rm .gitattributes" &&
-	echo "* diff=bin" >.gitattributes &&
-	git -c diff.bin.textconv=cat log -Ga >log &&
+	(
+		cd GS-bin-txt &&
+		test_when_finished "rm .gitattributes" &&
+		echo "* diff=bin" >.gitattributes &&
+		git -c diff.bin.textconv=cat log -Ga >../log
+	) &&
 	test_cmp log full-log
 '
 
 test_expect_success 'log -S looks into binary files' '
-	git log -Sa >log &&
+	git -C GS-bin-txt log -Sa >log &&
 	test_cmp log full-log
 '
 
-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH 13/25] pickaxe -S: support content with NULs under --pickaxe-regex
  2021-02-03  3:27 [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch] Ævar Arnfjörð Bjarmason
                   ` (11 preceding siblings ...)
  2021-02-03  3:27 ` [PATCH 12/25] pickaxe tests: refactor to use test_commit --append Ævar Arnfjörð Bjarmason
@ 2021-02-03  3:27 ` Ævar Arnfjörð Bjarmason
  2021-02-03  3:28 ` [PATCH 14/25] pickaxe -S: remove redundant "sz" check in while-loop Ævar Arnfjörð Bjarmason
                   ` (35 subsequent siblings)
  48 siblings, 0 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-03  3:27 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Fix a bug in the matching routine powering -S<rx> --pickaxe-regex so
that we won't abort early on content that has NULs in it.

We've had a hard requirement on REG_STARTEND since 2f8952250a8 (regex:
add regexec_buf() that can work on a non NUL-terminated string,
2016-09-21), but this sanity check dates back to d01d8c67828 (Support
for pickaxe matching regular expressions, 2006-03-29).

It wasn't needed anymore, and as the now-passing test shows, actively
getting in our way.
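
The loop can handle NUL-containing buffers because the match bounds
come from the caller rather than from strlen(). A minimal sketch of a
similar counting loop written directly against POSIX regexec() with
the REG_STARTEND extension (available in glibc and the BSDs; git's
regexec_buf() wraps the same idea). The function name is hypothetical,
and the loop only roughly mirrors the post-patch contains():

```c
#include <regex.h>
#include <stddef.h>

/*
 * Count non-overlapping matches of `re` in buf[0..size), which may
 * contain NUL bytes. Relies on REG_STARTEND: pmatch[0] carries the
 * search range in, and the match offsets out.
 */
static unsigned int count_matches(const regex_t *re, const char *buf, size_t size)
{
	unsigned int cnt = 0;
	const char *data = buf;
	size_t sz = size;
	int flags = 0;
	regmatch_t m;

	while (sz) {
		m.rm_so = 0;
		m.rm_eo = (regoff_t)sz;
		if (regexec(re, data, 1, &m, flags | REG_STARTEND))
			break;
		cnt++;
		flags |= REG_NOTBOL; /* later searches don't start a line */
		data += m.rm_eo;
		sz -= m.rm_eo;
		/* Step over an empty match so the loop always makes progress. */
		if (sz && m.rm_so == m.rm_eo) {
			data++;
			sz--;
		}
	}
	return cnt;
}
```

On the "a\na\0a\n" content used by the test, this counts all three
"a"s, whereas a strlen()-bounded search would stop at the NUL after
the second one.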

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 diffcore-pickaxe.c     | 4 ++--
 t/t4209-log-pickaxe.sh | 8 ++++++++
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index a9c6d60df2..208177bb40 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -82,12 +82,12 @@ static unsigned int contains(mmfile_t *mf, regex_t *regexp, kwset_t kws)
 		regmatch_t regmatch;
 		int flags = 0;
 
-		while (sz && *data &&
+		while (sz &&
 		       !regexec_buf(regexp, data, sz, 1, &regmatch, flags)) {
 			flags |= REG_NOTBOL;
 			data += regmatch.rm_eo;
 			sz -= regmatch.rm_eo;
-			if (sz && *data && regmatch.rm_so == regmatch.rm_eo) {
+			if (sz && regmatch.rm_so == regmatch.rm_eo) {
 				data++;
 				sz--;
 			}
diff --git a/t/t4209-log-pickaxe.sh b/t/t4209-log-pickaxe.sh
index 21d22b2a18..bd42848871 100755
--- a/t/t4209-log-pickaxe.sh
+++ b/t/t4209-log-pickaxe.sh
@@ -139,4 +139,12 @@ test_expect_success 'log -S looks into binary files' '
 	test_cmp log full-log
 '
 
+test_expect_success 'log -S --pickaxe-regex looks into binary files' '
+	git -C GS-bin-txt log --pickaxe-regex -Sa >log &&
+	test_cmp log full-log &&
+
+	git -C GS-bin-txt log --pickaxe-regex -S[a] >log &&
+	test_cmp log full-log
+'
+
 test_done
-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH 14/25] pickaxe -S: remove redundant "sz" check in while-loop
  2021-02-03  3:27 [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch] Ævar Arnfjörð Bjarmason
                   ` (12 preceding siblings ...)
  2021-02-03  3:27 ` [PATCH 13/25] pickaxe -S: support content with NULs under --pickaxe-regex Ævar Arnfjörð Bjarmason
@ 2021-02-03  3:28 ` Ævar Arnfjörð Bjarmason
  2021-02-04 16:16   ` René Scharfe
  2021-02-03  3:28 ` [PATCH 15/25] pickaxe/style: consolidate declarations and assignments Ævar Arnfjörð Bjarmason
                   ` (34 subsequent siblings)
  48 siblings, 1 reply; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-03  3:28 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

If we walk to the end of the string we just won't match the rest of
the regex. This removes an optimization for simplicity's sake. In
subsequent commits we'll alter this code more, and not having to think
about this condition makes it easier to read.

If we look at the context of what we're doing here, the last thing we
need to be worried about is one extra regex match. The real problem is
that we keep matching after it's clear that the number of contains()
for "A" and "B" is different. So we could be much smarter here.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 diffcore-pickaxe.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index 208177bb40..8df76afb6e 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -82,12 +82,11 @@ static unsigned int contains(mmfile_t *mf, regex_t *regexp, kwset_t kws)
 		regmatch_t regmatch;
 		int flags = 0;
 
-		while (sz &&
-		       !regexec_buf(regexp, data, sz, 1, &regmatch, flags)) {
+		while (!regexec_buf(regexp, data, sz, 1, &regmatch, flags)) {
 			flags |= REG_NOTBOL;
 			data += regmatch.rm_eo;
 			sz -= regmatch.rm_eo;
-			if (sz && regmatch.rm_so == regmatch.rm_eo) {
+			if (regmatch.rm_so == regmatch.rm_eo) {
 				data++;
 				sz--;
 			}
-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH 15/25] pickaxe/style: consolidate declarations and assignments
  2021-02-03  3:27 [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch] Ævar Arnfjörð Bjarmason
                   ` (13 preceding siblings ...)
  2021-02-03  3:28 ` [PATCH 14/25] pickaxe -S: remove redundant "sz" check in while-loop Ævar Arnfjörð Bjarmason
@ 2021-02-03  3:28 ` Ævar Arnfjörð Bjarmason
  2021-02-03  3:28 ` [PATCH 16/25] pickaxe tests: add test for diffgrep_consume() internals Ævar Arnfjörð Bjarmason
                   ` (33 subsequent siblings)
  48 siblings, 0 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-03  3:28 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Refactor contains() to do its assignments at the same time that it
does its declarations.

This code could have been refactored in ef90ab66e8e (pickaxe: use
textconv for -S counting, 2012-10-28) when a function call between the
declarations and assignments was removed.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 diffcore-pickaxe.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index 8df76afb6e..cb865c8b29 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -70,13 +70,9 @@ static int diff_grep(mmfile_t *one, mmfile_t *two,
 
 static unsigned int contains(mmfile_t *mf, regex_t *regexp, kwset_t kws)
 {
-	unsigned int cnt;
-	unsigned long sz;
-	const char *data;
-
-	sz = mf->size;
-	data = mf->ptr;
-	cnt = 0;
+	unsigned int cnt = 0;
+	unsigned long sz = mf->size;
+	const char *data = mf->ptr;
 
 	if (regexp) {
 		regmatch_t regmatch;
-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH 16/25] pickaxe tests: add test for diffgrep_consume() internals
  2021-02-03  3:27 [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch] Ævar Arnfjörð Bjarmason
                   ` (14 preceding siblings ...)
  2021-02-03  3:28 ` [PATCH 15/25] pickaxe/style: consolidate declarations and assignments Ævar Arnfjörð Bjarmason
@ 2021-02-03  3:28 ` Ævar Arnfjörð Bjarmason
  2021-02-03  3:28 ` [PATCH 17/25] pickaxe tests: add test for "log -S" not being a regex Ævar Arnfjörð Bjarmason
                   ` (32 subsequent siblings)
  48 siblings, 0 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-03  3:28 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

In diffgrep_consume() we generate a diff, and then advance past the
"+" or "-" at the start of the line for matching. This has been done
ever since the code was added in f506b8e8b5f (git log/diff: add
-G<regexp> that greps in the patch text, 2010-08-23).

If we match "line" instead of "line + 1", no tests fail, i.e. we've got
zero coverage for whether any of our searches match the beginning of
the line or not. Let's add a test for this.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t4209-log-pickaxe.sh | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/t/t4209-log-pickaxe.sh b/t/t4209-log-pickaxe.sh
index bd42848871..ebd51f498b 100755
--- a/t/t4209-log-pickaxe.sh
+++ b/t/t4209-log-pickaxe.sh
@@ -106,6 +106,21 @@ test_expect_success 'log -S --no-textconv (missing textconv tool)' '
 	rm .gitattributes
 '
 
+test_expect_success 'setup log -[GS] plain' '
+	test_create_repo GS-plain &&
+	test_commit -C GS-plain --append A data.bin "a" &&
+	test_commit -C GS-plain --append B data.bin "a a" &&
+	test_commit -C GS-plain C data.bin "" &&
+	git -C GS-plain log >full-log
+'
+
+test_expect_success 'log -G trims diff new/old [-+]' '
+	git -C GS-plain log -G"[+-]a" >log &&
+	test_must_be_empty log &&
+	git -C GS-plain log -G"^a" >log &&
+	test_cmp log full-log
+'
+
 test_expect_success 'setup log -[GS] binary & --text' '
 	test_create_repo GS-bin-txt &&
 	test_commit -C GS-bin-txt --append A data.bin "a\na\0a\n" &&
-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH 17/25] pickaxe tests: add test for "log -S" not being a regex
  2021-02-03  3:27 [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch] Ævar Arnfjörð Bjarmason
                   ` (15 preceding siblings ...)
  2021-02-03  3:28 ` [PATCH 16/25] pickaxe tests: add test for diffgrep_consume() internals Ævar Arnfjörð Bjarmason
@ 2021-02-03  3:28 ` Ævar Arnfjörð Bjarmason
  2021-02-03  3:28 ` [PATCH 18/25] perf: add performance test for pickaxe Ævar Arnfjörð Bjarmason
                   ` (31 subsequent siblings)
  48 siblings, 0 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-03  3:28 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

No test in our test suite checked for "log -S<pat>" being a fixed
string, as opposed to "log -S<pat> --pickaxe-regex". Let's test for
it.
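
The distinction being tested, sketched outside of git: without
--pickaxe-regex the -S argument is counted as a literal string, so
"[a]" only matches the three bytes "[a]", while as an ERE it matches
any "a". Here strstr() is a stand-in for git's own fixed-string
matcher, both helpers are hypothetical, and since strstr() stops at a
NUL this only suits text content:

```c
#include <regex.h>
#include <string.h>

/* Count non-overlapping occurrences of the fixed string `needle`. */
static unsigned int count_fixed(const char *hay, const char *needle)
{
	unsigned int cnt = 0;
	size_t nlen = strlen(needle);
	const char *p = hay;

	if (!nlen)
		return 0;
	while ((p = strstr(p, needle))) {
		cnt++;
		p += nlen; /* skip past this occurrence */
	}
	return cnt;
}

/* Return 1 if the ERE `pattern` matches `hay`, 0 if not, -1 if it won't compile. */
static int regex_matches(const char *hay, const char *pattern)
{
	regex_t re;
	int hit;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB))
		return -1;
	hit = !regexec(&re, hay, 0, NULL, 0);
	regfree(&re);
	return hit;
}
```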

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t4209-log-pickaxe.sh | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/t/t4209-log-pickaxe.sh b/t/t4209-log-pickaxe.sh
index ebd51f498b..b59aaecc68 100755
--- a/t/t4209-log-pickaxe.sh
+++ b/t/t4209-log-pickaxe.sh
@@ -121,6 +121,17 @@ test_expect_success 'log -G trims diff new/old [-+]' '
 	test_cmp log full-log
 '
 
+test_expect_success 'log -S<pat> is not a regex, but -S<pat> --pickaxe-regex is' '
+	git -C GS-plain log -S"a" >log &&
+	test_cmp log full-log &&
+
+	git -C GS-plain log -S"[a]" >log &&
+	test_must_be_empty log &&
+
+	git -C GS-plain log -S"[a]" --pickaxe-regex >log &&
+	test_cmp log full-log
+'
+
 test_expect_success 'setup log -[GS] binary & --text' '
 	test_create_repo GS-bin-txt &&
 	test_commit -C GS-bin-txt --append A data.bin "a\na\0a\n" &&
-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH 18/25] perf: add performance test for pickaxe
  2021-02-03  3:27 [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch] Ævar Arnfjörð Bjarmason
                   ` (16 preceding siblings ...)
  2021-02-03  3:28 ` [PATCH 17/25] pickaxe tests: add test for "log -S" not being a regex Ævar Arnfjörð Bjarmason
@ 2021-02-03  3:28 ` Ævar Arnfjörð Bjarmason
  2021-02-03  3:28 ` [PATCH 19/25] pickaxe -G: set -U0 for diff generation Ævar Arnfjörð Bjarmason
                   ` (30 subsequent siblings)
  48 siblings, 0 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-03  3:28 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Add a test for the -G and -S pickaxe options and related options. This
test supports being run with GIT_PERF_EXTRA=1 to turn on the full set
of tests, as well as GIT_TEST_LONG=1 to opt in to a full history walk.
By default I'm limiting the walk to 500 commits, which seems to hit a
good spot on git.git of around 0.5s per iteration.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/perf/p4209-pickaxe.sh | 82 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 82 insertions(+)
 create mode 100755 t/perf/p4209-pickaxe.sh

diff --git a/t/perf/p4209-pickaxe.sh b/t/perf/p4209-pickaxe.sh
new file mode 100755
index 0000000000..011a287d3b
--- /dev/null
+++ b/t/perf/p4209-pickaxe.sh
@@ -0,0 +1,82 @@
+#!/bin/sh
+
+test_description="Test pickaxe performance"
+
+. ./perf-lib.sh
+
+test_perf_default_repo
+
+# Not --max-count, as that's the number of matching commit, so it's
+# unbounded. We want to limit our revision walk here.
+from_rev_desc=
+from_rev=
+if ! test_have_prereq EXPENSIVE
+then
+	max_count=500
+	from_rev=" $(git rev-list HEAD | head -n $max_count | tail -n 1).."
+	from_rev_desc=" <limit-rev>.."
+fi
+
+for icase in \
+	'' \
+	'-i '
+do
+	# -S (no regex)
+	for pattern in \
+		'a' \
+		'uncommon'\
+		'ö'
+	do
+		for opts in \
+			'-S'
+		do
+			continue
+			test_perf "git log $icase$opts'$pattern'$from_rev_desc" "
+				git log --pretty=format:%H $icase$opts'$pattern'$from_rev
+			"
+		done
+	done
+
+	# -S (regex)
+	for pattern in  \
+		'[þæö]'
+	do
+		for opts in \
+			'--pickaxe-regex -S'
+		do
+			test_perf "git log $icase$opts'$pattern'$from_rev_desc" "
+				git log --pretty=format:%H $icase$opts'$pattern'$from_rev
+			"
+		done
+	done
+
+	# -G
+	for pattern in  \
+		'a' \
+		'uncommon' \
+		'[þæö]'
+	do
+		for opts in \
+			'-G' \
+			'--pickaxe-regex -S'
+		do
+			test_perf "git log $icase$opts'$pattern'$from_rev_desc" "
+				git log --pretty=format:%H $icase$opts'$pattern'$from_rev
+			"
+		done
+
+		# -G extra
+		for opts in \
+			'--text -G' \
+			'--text --pickaxe-all -G' \
+			'--pickaxe-all -G' \
+			'--pickaxe-all --pickaxe-regex -S'
+		do
+			test_perf PERF_EXTRA "git log $icase$opts'$pattern'$from_rev_desc" "
+				git log --pretty=format:%H $icase$opts'$pattern'$from_rev
+			"
+		done
+	done
+done
+
+test_done
-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH 19/25] pickaxe -G: set -U0 for diff generation
  2021-02-03  3:27 [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch] Ævar Arnfjörð Bjarmason
                   ` (17 preceding siblings ...)
  2021-02-03  3:28 ` [PATCH 18/25] perf: add performance test for pickaxe Ævar Arnfjörð Bjarmason
@ 2021-02-03  3:28 ` Ævar Arnfjörð Bjarmason
  2021-02-03 14:26   ` Ævar Arnfjörð Bjarmason
  2021-02-03  3:28 ` [PATCH 20/25] grep.h: make patmatch() a public function Ævar Arnfjörð Bjarmason
                   ` (29 subsequent siblings)
  48 siblings, 1 reply; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-03  3:28 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Set the equivalent of -U0 when generating diffs for "git log -G". As
seen in diffgrep_consume() we ignore any lines that aren't the "+" and
"-" lines, so the rest of the output wasn't being used.

It turns out that we spent quite a bit of CPU just on this[1]:

    Test                                             HEAD~             HEAD
    -----------------------------------------------------------------------------------------
    4209.2: git log -G'a' <limit-rev>..              0.60(0.54+0.06)   0.52(0.46+0.05) -13.3%
    4209.8: git log -G'uncommon' <limit-rev>..       0.61(0.54+0.07)   0.53(0.47+0.06) -13.1%
    4209.14: git log -G'[þæö]' <limit-rev>..         0.60(0.55+0.04)   0.56(0.48+0.04) -6.7%
    4209.21: git log -i -G'a' <limit-rev>..          0.63(0.56+0.03)   0.54(0.48+0.05) -14.3%
    4209.27: git log -i -G'uncommon' <limit-rev>..   0.61(0.55+0.05)   0.53(0.47+0.06) -13.1%
    4209.33: git log -i -G'[þæö]' <limit-rev>..      0.61(0.53+0.07)   0.53(0.47+0.05) -13.1%

I also experimented with setting diff.interHunkContext to 10, 100
etc. As noted above it's useless for -G to have non-"+" and non-"-"
lines for the matching itself, but there's going to be some sweet spot
where if we can be handed bigger hunks at a time our matching might be
faster.

But alas, the results of that were:

    Test                                             HEAD~2            HEAD~                    HEAD
    ------------------------------------------------------------------------------------------------------------------
    4209.2: git log -G'a' <limit-rev>..              0.61(0.53+0.07)   0.51(0.46+0.05) -16.4%   0.51(0.46+0.05) -16.4%
    4209.8: git log -G'uncommon' <limit-rev>..       0.66(0.55+0.05)   0.53(0.48+0.04) -19.7%   0.52(0.49+0.03) -21.2%
    4209.14: git log -G'[þæö]' <limit-rev>..         0.63(0.54+0.06)   0.51(0.44+0.07) -19.0%   0.52(0.46+0.06) -17.5%
    4209.21: git log -i -G'a' <limit-rev>..          0.62(0.54+0.07)   0.51(0.46+0.04) -17.7%   0.53(0.45+0.07) -14.5%
    4209.27: git log -i -G'uncommon' <limit-rev>..   0.62(0.56+0.06)   0.53(0.48+0.05) -14.5%   0.53(0.46+0.07) -14.5%
    4209.33: git log -i -G'[þæö]' <limit-rev>..      0.63(0.57+0.03)   0.58(0.46+0.06) -7.9%    0.53(0.46+0.06) -15.9%

I.e. maybe it's faster in some cases, but probably slower in general.

Those results are going to be crappy because we're matching a line at
a time, as opposed to some version of /m matching across the whole
diff (if possible). So that approach might be worth revisiting in the
future.
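
Since diffgrep_consume() never looks at context lines, dropping them
cannot change which commits -G selects. A simplified sketch of that
filtering over an already-rendered diff body; git works on xdiff line
callbacks rather than rendered text, and the helper name and the
assumption that hunk and file headers are already stripped are
hypothetical:

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return 1 if any "+" or "-" line of a unified-diff body matches `re`,
 * 0 if none does, -1 on allocation failure. Context (" ") lines are
 * skipped, so the answer is the same with or without them.
 */
static int diff_body_hits(const regex_t *re, const char *diff)
{
	const char *line = diff;

	while (*line) {
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);

		if (len && (line[0] == '+' || line[0] == '-')) {
			/* Copy the content after the marker so regexec() sees a C string. */
			char *buf = malloc(len);
			int hit;

			if (!buf)
				return -1;
			memcpy(buf, line + 1, len - 1);
			buf[len - 1] = '\0';
			hit = !regexec(re, buf, 0, NULL, 0);
			free(buf);
			if (hit)
				return 1;
		}
		if (!eol)
			break;
		line = eol + 1;
	}
	return 0;
}
```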

1. GIT_SKIP_TESTS="p4209.[1379] p4209.15 p4209.2[028] p4209.34" GIT_PERF_EXTRA= GIT_PERF_REPO=~/g/git/ GIT_PERF_REPEAT_COUNT=5 GIT_PERF_MAKE_OPTS='-j8 USE_LIBPCRE=Y CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre2/inst' ./run HEAD~ HEAD -- p4209-pickaxe.sh

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 diffcore-pickaxe.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index cb865c8b29..5161c81057 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -60,7 +60,7 @@ static int diff_grep(mmfile_t *one, mmfile_t *two,
 	memset(&xecfg, 0, sizeof(xecfg));
 	ecbdata.regexp = regexp;
 	ecbdata.hit = 0;
-	xecfg.ctxlen = o->context;
+	xecfg.ctxlen = 0;
 	xecfg.interhunkctxlen = o->interhunkcontext;
 	if (xdi_diff_outf(one, two, discard_hunk_line, diffgrep_consume,
 			  &ecbdata, &xpp, &xecfg))
-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH 20/25] grep.h: make patmatch() a public function
  2021-02-03  3:27 [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch] Ævar Arnfjörð Bjarmason
                   ` (18 preceding siblings ...)
  2021-02-03  3:28 ` [PATCH 19/25] pickaxe -G: set -U0 for diff generation Ævar Arnfjörð Bjarmason
@ 2021-02-03  3:28 ` Ævar Arnfjörð Bjarmason
  2021-02-03  3:28 ` [PATCH 21/25] pickaxe: use PCREv2 for -G and -S Ævar Arnfjörð Bjarmason
                   ` (28 subsequent siblings)
  48 siblings, 0 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-03  3:28 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

I'd like to use the PCRE & ERE etc. code in grep.c for more things in
git, starting with diffcore-pickaxe.c.

The current API just exposes grep_{source,buffer}() for that
purpose. I could use those, but they're very fat entry points into the
entire set of bells and whistles that grep.c supports for "git
grep". I just want the equivalent of a light regexec() wrapper for my
compiled patterns.

So let's expose patmatch() for that purpose. It's not perfect, in
particular it's a bit ugly that we need to pop a pattern off the
opt->pattern_list if all we've got is the "grep_opt" wrapper struct,
but it'll do for now.
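
As an illustration of the kind of "light regexec() wrapper" meant
here: a hypothetical, much-reduced analogue of patmatch() that matches
one compiled ERE against a [line, eol) range. The real patmatch() also
dispatches to PCRE2 when a pattern was compiled that way; this sketch
relies on the REG_STARTEND extension (glibc, BSDs):

```c
#include <regex.h>

/* Hypothetical stand-in for struct grep_pat: just one compiled ERE. */
struct mini_pat {
	regex_t regexp;
};

/*
 * Match `p` against the bytes in [line, eol), which need not be
 * NUL-terminated. Returns 1 on a hit and fills *match with offsets
 * relative to `line`, 0 otherwise.
 */
static int mini_patmatch(struct mini_pat *p, const char *line, const char *eol,
			 regmatch_t *match, int eflags)
{
	/* REG_STARTEND reads the search range from match[0] */
	match->rm_so = 0;
	match->rm_eo = (regoff_t)(eol - line);
	return !regexec(&p->regexp, line, 1, match, eflags | REG_STARTEND);
}
```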

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 grep.c | 4 ++--
 grep.h | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/grep.c b/grep.c
index 636ac48bf0..8d84313d6e 100644
--- a/grep.c
+++ b/grep.c
@@ -906,8 +906,8 @@ static void show_name(struct grep_opt *opt, const char *name)
 	opt->output(opt, opt->null_following_name ? "\0" : "\n", 1);
 }
 
-static int patmatch(struct grep_pat *p, char *line, char *eol,
-		    regmatch_t *match, int eflags)
+int patmatch(struct grep_pat *p, char *line, char *eol,
+	     regmatch_t *match, int eflags)
 {
 	int hit;
 
diff --git a/grep.h b/grep.h
index 72f82b1e30..66e2ee37f3 100644
--- a/grep.h
+++ b/grep.h
@@ -205,6 +205,8 @@ void grep_source_load_driver(struct grep_source *gs,
 
 
 int grep_source(struct grep_opt *opt, struct grep_source *gs);
+int patmatch(struct grep_pat *p, char *line, char *eol,
+	     regmatch_t *match, int eflags);
 
 struct grep_opt *grep_opt_dup(const struct grep_opt *opt);
 int grep_threads_ok(const struct grep_opt *opt);
-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH 21/25] pickaxe: use PCREv2 for -G and -S
  2021-02-03  3:27 [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch] Ævar Arnfjörð Bjarmason
                   ` (19 preceding siblings ...)
  2021-02-03  3:28 ` [PATCH 20/25] grep.h: make patmatch() a public function Ævar Arnfjörð Bjarmason
@ 2021-02-03  3:28 ` Ævar Arnfjörð Bjarmason
  2021-02-03 20:44   ` Ævar Arnfjörð Bjarmason
  2021-02-04 18:22   ` Junio C Hamano
  2021-02-03  3:28 ` [PATCH 22/25] Remove unused kwset.[ch] Ævar Arnfjörð Bjarmason
                   ` (27 subsequent siblings)
  48 siblings, 2 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-03  3:28 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Follow up on b65abcafc7a (grep: use PCRE v2 for optimized fixed-string
search, 2019-07-01) by removing the use of kwset in the pickaxe code
for fixed-string search, in favor of optimistically using PCRE v2.

This does mean that the semantics of the -G option subtly change:
before, it was an ERE, whereas now it'll be a PCRE if we're compiled
with PCRE. Since PCRE syntax is very nearly a strict superset of ERE
syntax, I think this is OK.

Now when running the newly added t/perf/p4209-pickaxe.sh[1] and the
latest PCRE v2, we'll get the following performance improvements (well,
mostly improvements):

    Test                                                                           origin/next       HEAD
    -----------------------------------------------------------------------------------------------------------------------
    4209.1: git log --pickaxe-regex -S'[þæö]' <limit-rev>..                        0.41(0.37+0.03)   0.42(0.37+0.05) +2.4%
    4209.2: git log -G'a' <limit-rev>..                                            0.61(0.52+0.08)   0.50(0.45+0.05) -18.0%
    4209.3: git log --pickaxe-regex -S'a' <limit-rev>..                            0.73(0.66+0.07)   0.42(0.37+0.05) -42.5%
    4209.4: git log --text -G'a' <limit-rev>..                                     0.61(0.54+0.06)   0.50(0.44+0.05) -18.0%
    4209.5: git log --text --pickaxe-all -G'a' <limit-rev>..                       0.44(0.37+0.06)   0.46(0.36+0.06) +4.5%
    4209.6: git log --pickaxe-all -G'a' <limit-rev>..                              0.46(0.39+0.07)   0.38(0.33+0.05) -17.4%
    4209.7: git log --pickaxe-all --pickaxe-regex -S'a' <limit-rev>..              0.55(0.48+0.06)   0.35(0.30+0.04) -36.4%
    4209.8: git log -G'uncommon' <limit-rev>..                                     0.68(0.60+0.07)   0.59(0.53+0.05) -13.2%
    4209.9: git log --pickaxe-regex -S'uncommon' <limit-rev>..                     0.48(0.45+0.03)   0.36(0.31+0.05) -25.0%
    4209.10: git log --text -G'uncommon' <limit-rev>..                             0.66(0.58+0.07)   0.58(0.52+0.04) -12.1%
    4209.11: git log --text --pickaxe-all -G'uncommon' <limit-rev>..               0.67(0.61+0.05)   0.57(0.52+0.05) -14.9%
    4209.12: git log --pickaxe-all -G'uncommon' <limit-rev>..                      0.62(0.55+0.06)   0.52(0.46+0.06) -16.1%
    4209.13: git log --pickaxe-all --pickaxe-regex -S'uncommon' <limit-rev>..      0.49(0.43+0.06)   0.31(0.26+0.04) -36.7%
    4209.14: git log -G'[þæö]' <limit-rev>..                                       0.71(0.64+0.07)   0.51(0.47+0.04) -28.2%
    4209.15: git log --pickaxe-regex -S'[þæö]' <limit-rev>..                       0.45(0.40+0.05)   0.42(0.37+0.04) -6.7%
    4209.16: git log --text -G'[þæö]' <limit-rev>..                                0.64(0.56+0.07)   0.50(0.44+0.06) -21.9%
    4209.17: git log --text --pickaxe-all -G'[þæö]' <limit-rev>..                  0.64(0.54+0.09)   0.50(0.47+0.03) -21.9%
    4209.18: git log --pickaxe-all -G'[þæö]' <limit-rev>..                         0.66(0.59+0.07)   0.50(0.45+0.05) -24.2%
    4209.19: git log --pickaxe-all --pickaxe-regex -S'[þæö]' <limit-rev>..         0.42(0.38+0.04)   0.41(0.37+0.04) -2.4%
    4209.20: git log -i --pickaxe-regex -S'[þæö]' <limit-rev>..                    0.41(0.38+0.03)   0.49(0.43+0.05) +19.5%
    4209.21: git log -i -G'a' <limit-rev>..                                        0.63(0.61+0.02)   0.50(0.45+0.05) -20.6%
    4209.22: git log -i --pickaxe-regex -S'a' <limit-rev>..                        0.81(0.75+0.06)   0.60(0.58+0.02) -25.9%
    4209.23: git log -i --text -G'a' <limit-rev>..                                 0.64(0.54+0.10)   0.49(0.44+0.05) -23.4%
    4209.24: git log -i --text --pickaxe-all -G'a' <limit-rev>..                   0.47(0.43+0.04)   0.37(0.31+0.06) -21.3%
    4209.25: git log -i --pickaxe-all -G'a' <limit-rev>..                          0.51(0.43+0.08)   0.37(0.31+0.06) -27.5%
    4209.26: git log -i --pickaxe-all --pickaxe-regex -S'a' <limit-rev>..          0.62(0.55+0.05)   0.46(0.41+0.04) -25.8%
    4209.27: git log -i -G'uncommon' <limit-rev>..                                 0.64(0.58+0.05)   0.51(0.48+0.03) -20.3%
    4209.28: git log -i --pickaxe-regex -S'uncommon' <limit-rev>..                 0.48(0.44+0.04)   0.47(0.42+0.05) -2.1%
    4209.29: git log -i --text -G'uncommon' <limit-rev>..                          0.62(0.53+0.08)   0.51(0.44+0.07) -17.7%
    4209.30: git log -i --text --pickaxe-all -G'uncommon' <limit-rev>..            0.61(0.53+0.08)   0.51(0.45+0.06) -16.4%
    4209.31: git log -i --pickaxe-all -G'uncommon' <limit-rev>..                   0.63(0.57+0.05)   0.51(0.47+0.04) -19.0%
    4209.32: git log -i --pickaxe-all --pickaxe-regex -S'uncommon' <limit-rev>..   0.47(0.42+0.04)   0.47(0.45+0.01) +0.0%
    4209.33: git log -i -G'[þæö]' <limit-rev>..                                    0.60(0.54+0.05)   0.51(0.47+0.04) -15.0%
    4209.34: git log -i --pickaxe-regex -S'[þæö]' <limit-rev>..                    0.39(0.33+0.05)   0.49(0.45+0.03) +25.6%
    4209.35: git log -i --text -G'[þæö]' <limit-rev>..                             0.59(0.53+0.06)   0.51(0.47+0.02) -13.6%
    4209.36: git log -i --text --pickaxe-all -G'[þæö]' <limit-rev>..               0.62(0.56+0.05)   0.51(0.45+0.04) -17.7%
    4209.37: git log -i --pickaxe-all -G'[þæö]' <limit-rev>..                      0.61(0.55+0.06)   0.51(0.47+0.04) -16.4%
    4209.38: git log -i --pickaxe-all --pickaxe-regex -S'[þæö]' <limit-rev>..      0.42(0.39+0.02)   0.49(0.46+0.03) +16.7%

1. With these options:

    GIT_PERF_EXTRA=1 GIT_PERF_REPEAT_COUNT=10 GIT_PERF_MAKE_OPTS='-j8 USE_LIBPCRE=Y CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre2/inst' ./run origin/next HEAD -- p4209-pickaxe.sh

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 diff.h             |   4 ++
 diffcore-pickaxe.c | 143 ++++++++++++++++++---------------------------
 2 files changed, 61 insertions(+), 86 deletions(-)

diff --git a/diff.h b/diff.h
index 2ff2b1c7f2..2f369c162b 100644
--- a/diff.h
+++ b/diff.h
@@ -365,6 +365,8 @@ struct diff_options {
 
 	struct repository *repo;
 	struct option *parseopts;
+
+	struct grep_opt *grep_filter;
 };
 
 unsigned diff_filter_bit(char status);
@@ -520,6 +522,8 @@ int git_config_rename(const char *var, const char *value);
 #define DIFF_PICKAXE_KIND_G	8 /* grep in the patch */
 #define DIFF_PICKAXE_KIND_OBJFIND	16 /* specific object IDs */
 
+#define DIFF_PICKAXE_KIND_GS_MASK (DIFF_PICKAXE_KIND_S | \
+				   DIFF_PICKAXE_KIND_G)
 #define DIFF_PICKAXE_KINDS_MASK (DIFF_PICKAXE_KIND_S | \
 				 DIFF_PICKAXE_KIND_G | \
 				 DIFF_PICKAXE_KIND_OBJFIND)
diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index 5161c81057..25ab1b2427 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -6,16 +6,16 @@
 #include "diff.h"
 #include "diffcore.h"
 #include "xdiff-interface.h"
-#include "kwset.h"
+#include "grep.h"
 #include "commit.h"
 #include "quote.h"
 
 typedef int (*pickaxe_fn)(mmfile_t *one, mmfile_t *two,
 			  struct diff_options *o,
-			  regex_t *regexp, kwset_t kws);
+			  struct grep_opt *grep_filter);
 
 struct diffgrep_cb {
-	regex_t *regexp;
+	struct grep_opt	*grep_filter;
 	int hit;
 };
 
@@ -23,6 +23,8 @@ static void diffgrep_consume(void *priv, char *line, unsigned long len)
 {
 	struct diffgrep_cb *data = priv;
 	regmatch_t regmatch;
+	struct grep_opt *grep_filter = data->grep_filter;
+	struct grep_pat *grep_pat = grep_filter->pattern_list;
 
 	if (line[0] != '+' && line[0] != '-')
 		return;
@@ -32,25 +34,25 @@ static void diffgrep_consume(void *priv, char *line, unsigned long len)
 		 * caller early.
 		 */
 		return;
-	data->hit = !regexec_buf(data->regexp, line + 1, len - 1, 1,
-				 &regmatch, 0);
+	data->hit = patmatch(grep_pat, line + 1, line + len, &regmatch, 0);
 }
 
 static int diff_grep(mmfile_t *one, mmfile_t *two,
 		     struct diff_options *o,
-		     regex_t *regexp, kwset_t kws)
+		     struct grep_opt *grep_filter)
 {
-	regmatch_t regmatch;
 	struct diffgrep_cb ecbdata;
 	xpparam_t xpp;
 	xdemitconf_t xecfg;
+	regmatch_t regmatch;
+	struct grep_pat *grep_pat = grep_filter->pattern_list;
 
 	if (!one)
-		return !regexec_buf(regexp, two->ptr, two->size,
-				    1, &regmatch, 0);
+		return patmatch(grep_pat, two->ptr, two->ptr + two->size,
+				&regmatch, 0);
 	if (!two)
-		return !regexec_buf(regexp, one->ptr, one->size,
-				    1, &regmatch, 0);
+		return patmatch(grep_pat, one->ptr, one->ptr + one->size,
+				&regmatch, 0);
 
 	/*
 	 * We have both sides; need to run textual diff and see if
@@ -58,7 +60,7 @@ static int diff_grep(mmfile_t *one, mmfile_t *two,
 	 */
 	memset(&xpp, 0, sizeof(xpp));
 	memset(&xecfg, 0, sizeof(xecfg));
-	ecbdata.regexp = regexp;
+	ecbdata.grep_filter = grep_filter;
 	ecbdata.hit = 0;
 	xecfg.ctxlen = 0;
 	xecfg.interhunkctxlen = o->interhunkcontext;
@@ -68,52 +70,40 @@ static int diff_grep(mmfile_t *one, mmfile_t *two,
 	return ecbdata.hit;
 }
 
-static unsigned int contains(mmfile_t *mf, regex_t *regexp, kwset_t kws)
+static unsigned int contains(mmfile_t *mf, struct grep_opt *grep_filter)
 {
+
 	unsigned int cnt = 0;
 	unsigned long sz = mf->size;
-	const char *data = mf->ptr;
-
-	if (regexp) {
-		regmatch_t regmatch;
-		int flags = 0;
-
-		while (!regexec_buf(regexp, data, sz, 1, &regmatch, flags)) {
-			flags |= REG_NOTBOL;
-			data += regmatch.rm_eo;
-			sz -= regmatch.rm_eo;
-			if (regmatch.rm_so == regmatch.rm_eo) {
-				data++;
-				sz--;
-			}
-			cnt++;
-		}
-
-	} else { /* Classic exact string match */
-		while (sz) {
-			struct kwsmatch kwsm;
-			size_t offset = kwsexec(kws, data, sz, &kwsm);
-			if (offset == -1)
-				break;
-			sz -= offset + kwsm.size[0];
-			data += offset + kwsm.size[0];
-			cnt++;
+	char *data = mf->ptr;
+	regmatch_t regmatch;
+	int flags = 0;
+	struct grep_pat *grep_pat = grep_filter->pattern_list;
+
+	while (patmatch(grep_pat, data, data + sz, &regmatch, flags)) {
+		flags |= REG_NOTBOL;
+		data += regmatch.rm_eo;
+		sz -= regmatch.rm_eo;
+		if (regmatch.rm_so == regmatch.rm_eo) {
+			data++;
+			sz--;
 		}
+		cnt++;
 	}
 	return cnt;
 }
 
 static int has_changes(mmfile_t *one, mmfile_t *two,
 		       struct diff_options *o,
-		       regex_t *regexp, kwset_t kws)
+		       struct grep_opt *grep_filter)
 {
-	unsigned int one_contains = one ? contains(one, regexp, kws) : 0;
-	unsigned int two_contains = two ? contains(two, regexp, kws) : 0;
+	unsigned int one_contains = one ? contains(one, grep_filter) : 0;
+	unsigned int two_contains = two ? contains(two, grep_filter) : 0;
 	return one_contains != two_contains;
 }
 
 static int pickaxe_match(struct diff_filepair *p, struct diff_options *o,
-			 regex_t *regexp, kwset_t kws, pickaxe_fn fn)
+			 struct grep_opt *grep_filter, pickaxe_fn fn)
 {
 	struct userdiff_driver *textconv_one = NULL;
 	struct userdiff_driver *textconv_two = NULL;
@@ -160,7 +150,7 @@ static int pickaxe_match(struct diff_filepair *p, struct diff_options *o,
 
 	ret = fn(DIFF_FILE_VALID(p->one) ? &mf1 : NULL,
 		 DIFF_FILE_VALID(p->two) ? &mf2 : NULL,
-		 o, regexp, kws);
+		 o, grep_filter);
 
 	if (textconv_one)
 		free(mf1.ptr);
@@ -173,7 +163,7 @@ static int pickaxe_match(struct diff_filepair *p, struct diff_options *o,
 }
 
 static void pickaxe(struct diff_queue_struct *q, struct diff_options *o,
-		    regex_t *regexp, kwset_t kws, pickaxe_fn fn)
+		    struct grep_opt *grep_filter, pickaxe_fn fn)
 {
 	int i;
 	struct diff_queue_struct outq;
@@ -184,7 +174,7 @@ static void pickaxe(struct diff_queue_struct *q, struct diff_options *o,
 		/* Showing the whole changeset if needle exists */
 		for (i = 0; i < q->nr; i++) {
 			struct diff_filepair *p = q->queue[i];
-			if (pickaxe_match(p, o, regexp, kws, fn))
+			if (pickaxe_match(p, o, grep_filter, fn))
 				return; /* do not munge the queue */
 		}
 
@@ -199,7 +189,7 @@ static void pickaxe(struct diff_queue_struct *q, struct diff_options *o,
 		/* Showing only the filepairs that has the needle */
 		for (i = 0; i < q->nr; i++) {
 			struct diff_filepair *p = q->queue[i];
-			if (pickaxe_match(p, o, regexp, kws, fn))
+			if (pickaxe_match(p, o, grep_filter, fn))
 				diff_q(&outq, p);
 			else
 				diff_free_filepair(p);
@@ -210,54 +200,35 @@ static void pickaxe(struct diff_queue_struct *q, struct diff_options *o,
 	*q = outq;
 }
 
-static void regcomp_or_die(regex_t *regex, const char *needle, int cflags)
-{
-	int err = regcomp(regex, needle, cflags);
-	if (err) {
-		/* The POSIX.2 people are surely sick */
-		char errbuf[1024];
-		regerror(err, regex, errbuf, 1024);
-		die("invalid regex: %s", errbuf);
-	}
-}
-
 void diffcore_pickaxe(struct diff_options *o)
 {
 	const char *needle = o->pickaxe;
 	int opts = o->pickaxe_opts;
-	regex_t regex, *regexp = NULL;
-	kwset_t kws = NULL;
+	struct grep_opt opt;
+
+	if (opts & (DIFF_PICKAXE_REGEX | DIFF_PICKAXE_KIND_GS_MASK)) {
+		grep_init(&opt, the_repository, NULL);
+#ifdef USE_LIBPCRE2
+		grep_commit_pattern_type(GREP_PATTERN_TYPE_PCRE, &opt);
+#else
+		grep_commit_pattern_type(GREP_PATTERN_TYPE_ERE, &opt);
+#endif
 
-	if (opts & (DIFF_PICKAXE_REGEX | DIFF_PICKAXE_KIND_G)) {
-		int cflags = REG_EXTENDED | REG_NEWLINE;
 		if (o->pickaxe_opts & DIFF_PICKAXE_IGNORE_CASE)
-			cflags |= REG_ICASE;
-		regcomp_or_die(&regex, needle, cflags);
-		regexp = &regex;
-	} else if (opts & DIFF_PICKAXE_KIND_S) {
-		if (o->pickaxe_opts & DIFF_PICKAXE_IGNORE_CASE &&
-		    has_non_ascii(needle)) {
-			struct strbuf sb = STRBUF_INIT;
-			int cflags = REG_NEWLINE | REG_ICASE;
-
-			basic_regex_quote_buf(&sb, needle);
-			regcomp_or_die(&regex, sb.buf, cflags);
-			strbuf_release(&sb);
-			regexp = &regex;
-		} else {
-			kws = kwsalloc(o->pickaxe_opts & DIFF_PICKAXE_IGNORE_CASE
-				       ? tolower_trans_tbl : NULL);
-			kwsincr(kws, needle, strlen(needle));
-			kwsprep(kws);
-		}
+			opt.ignore_case = 1;
+		if (opts & DIFF_PICKAXE_KIND_S &&
+		    !(opts & DIFF_PICKAXE_REGEX))
+			opt.fixed = 1;
+
+		append_grep_pattern(&opt, needle, "diffcore-pickaxe", 0, GREP_PATTERN);
+		compile_grep_patterns(&opt);
 	}
 
-	pickaxe(&diff_queued_diff, o, regexp, kws,
+	pickaxe(&diff_queued_diff, o, &opt,
 		(opts & DIFF_PICKAXE_KIND_G) ? diff_grep : has_changes);
 
-	if (regexp)
-		regfree(regexp);
-	if (kws)
-		kwsfree(kws);
+	if (opts & ~DIFF_PICKAXE_KIND_OBJFIND)
+		free_grep_patterns(&opt);
+
 	return;
 }
-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH 22/25] Remove unused kwset.[ch]
From: Ævar Arnfjörð Bjarmason @ 2021-02-03  3:28 UTC
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Remove the unused kwset matching engine. It has not been used by grep
itself since 48de2a768cf (grep: remove the kwset optimization,
2019-07-01), and in a previous commit we removed its last remaining
use in diffcore-pickaxe.c.

This matching engine is much faster than the C library's regex
matching for fixed strings, but slower than PCREv2, which we now use
in the common case of a build with USE_LIBPCRE2.

It's also an increasingly bitrotting copy of GPLv2 code whose
upstream has moved on to the incompatible GPLv3, so no longer having
to maintain a permanent fork of it is a good thing.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Makefile          |   2 -
 compat/obstack.c  | 413 ------------------------
 compat/obstack.h  | 511 ------------------------------
 ctype.c           |  36 ---
 git-compat-util.h |   3 -
 kwset.c           | 775 ----------------------------------------------
 kwset.h           |  65 ----
 7 files changed, 1805 deletions(-)
 delete mode 100644 compat/obstack.c
 delete mode 100644 compat/obstack.h
 delete mode 100644 kwset.c
 delete mode 100644 kwset.h

diff --git a/Makefile b/Makefile
index 5a239cac20..570b78a528 100644
--- a/Makefile
+++ b/Makefile
@@ -840,7 +840,6 @@ LIB_OBJS += combine-diff.o
 LIB_OBJS += commit-graph.o
 LIB_OBJS += commit-reach.o
 LIB_OBJS += commit.o
-LIB_OBJS += compat/obstack.o
 LIB_OBJS += compat/terminal.o
 LIB_OBJS += config.o
 LIB_OBJS += connect.o
@@ -888,7 +887,6 @@ LIB_OBJS += help.o
 LIB_OBJS += hex.o
 LIB_OBJS += ident.o
 LIB_OBJS += json-writer.o
-LIB_OBJS += kwset.o
 LIB_OBJS += levenshtein.o
 LIB_OBJS += line-log.o
 LIB_OBJS += line-range.o
diff --git a/compat/obstack.c b/compat/obstack.c
deleted file mode 100644
index 27cd5c1ea1..0000000000
--- a/compat/obstack.c
+++ /dev/null
@@ -1,413 +0,0 @@
-/* obstack.c - subroutines used implicitly by object stack macros
-   Copyright (C) 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1996, 1997, 1998,
-   1999, 2000, 2001, 2002, 2003, 2004, 2005 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#include "git-compat-util.h"
-#include <gettext.h>
-#include "obstack.h"
-
-/* NOTE BEFORE MODIFYING THIS FILE: This version number must be
-   incremented whenever callers compiled using an old obstack.h can no
-   longer properly call the functions in this obstack.c.  */
-#define OBSTACK_INTERFACE_VERSION 1
-
-/* Comment out all this code if we are using the GNU C Library, and are not
-   actually compiling the library itself, and the installed library
-   supports the same library interface we do.  This code is part of the GNU
-   C Library, but also included in many other GNU distributions.  Compiling
-   and linking in this code is a waste when using the GNU C library
-   (especially if it is a shared library).  Rather than having every GNU
-   program understand `configure --with-gnu-libc' and omit the object
-   files, it is simpler to just do this in the source for each such file.  */
-
-#include <stdio.h>		/* Random thing to get __GNU_LIBRARY__.  */
-#if !defined _LIBC && defined __GNU_LIBRARY__ && __GNU_LIBRARY__ > 1
-# include <gnu-versions.h>
-# if _GNU_OBSTACK_INTERFACE_VERSION == OBSTACK_INTERFACE_VERSION
-#  define ELIDE_CODE
-# endif
-#endif
-
-#include <stddef.h>
-
-#ifndef ELIDE_CODE
-
-
-# if HAVE_INTTYPES_H
-#  include <inttypes.h>
-# endif
-# if HAVE_STDINT_H || defined _LIBC
-#  include <stdint.h>
-# endif
-
-/* Determine default alignment.  */
-union fooround
-{
-  uintmax_t i;
-  long double d;
-  void *p;
-};
-struct fooalign
-{
-  char c;
-  union fooround u;
-};
-/* If malloc were really smart, it would round addresses to DEFAULT_ALIGNMENT.
-   But in fact it might be less smart and round addresses to as much as
-   DEFAULT_ROUNDING.  So we prepare for it to do that.  */
-enum
-  {
-    DEFAULT_ALIGNMENT = offsetof (struct fooalign, u),
-    DEFAULT_ROUNDING = sizeof (union fooround)
-  };
-
-/* When we copy a long block of data, this is the unit to do it with.
-   On some machines, copying successive ints does not work;
-   in such a case, redefine COPYING_UNIT to `long' (if that works)
-   or `char' as a last resort.  */
-# ifndef COPYING_UNIT
-#  define COPYING_UNIT int
-# endif
-
-
-/* The functions allocating more room by calling `obstack_chunk_alloc'
-   jump to the handler pointed to by `obstack_alloc_failed_handler'.
-   This can be set to a user defined function which should either
-   abort gracefully or use longjump - but shouldn't return.  This
-   variable by default points to the internal function
-   `print_and_abort'.  */
-static void print_and_abort (void);
-void (*obstack_alloc_failed_handler) (void) = print_and_abort;
-
-# ifdef _LIBC
-#  if SHLIB_COMPAT (libc, GLIBC_2_0, GLIBC_2_3_4)
-/* A looong time ago (before 1994, anyway; we're not sure) this global variable
-   was used by non-GNU-C macros to avoid multiple evaluation.  The GNU C
-   library still exports it because somebody might use it.  */
-struct obstack *_obstack_compat;
-compat_symbol (libc, _obstack_compat, _obstack, GLIBC_2_0);
-#  endif
-# endif
-
-/* Define a macro that either calls functions with the traditional malloc/free
-   calling interface, or calls functions with the mmalloc/mfree interface
-   (that adds an extra first argument), based on the state of use_extra_arg.
-   For free, do not use ?:, since some compilers, like the MIPS compilers,
-   do not allow (expr) ? void : void.  */
-
-# define CALL_CHUNKFUN(h, size) \
-  (((h) -> use_extra_arg) \
-   ? (*(h)->chunkfun.extra) ((h)->extra_arg, (size)) \
-   : (*(h)->chunkfun.plain) ((size)))
-
-# define CALL_FREEFUN(h, old_chunk) \
-  do { \
-    if ((h) -> use_extra_arg) \
-      (*(h)->freefun.extra) ((h)->extra_arg, (old_chunk)); \
-    else \
-      (*(h)->freefun.plain) ((old_chunk)); \
-  } while (0)
-
-\f
-/* Initialize an obstack H for use.  Specify chunk size SIZE (0 means default).
-   Objects start on multiples of ALIGNMENT (0 means use default).
-   CHUNKFUN is the function to use to allocate chunks,
-   and FREEFUN the function to free them.
-
-   Return nonzero if successful, calls obstack_alloc_failed_handler if
-   allocation fails.  */
-
-int
-_obstack_begin (struct obstack *h,
-		int size, int alignment,
-		void *(*chunkfun) (long),
-		void (*freefun) (void *))
-{
-  register struct _obstack_chunk *chunk; /* points to new chunk */
-
-  if (alignment == 0)
-    alignment = DEFAULT_ALIGNMENT;
-  if (size == 0)
-    /* Default size is what GNU malloc can fit in a 4096-byte block.  */
-    {
-      /* 12 is sizeof (mhead) and 4 is EXTRA from GNU malloc.
-	 Use the values for range checking, because if range checking is off,
-	 the extra bytes won't be missed terribly, but if range checking is on
-	 and we used a larger request, a whole extra 4096 bytes would be
-	 allocated.
-
-	 These number are irrelevant to the new GNU malloc.  I suspect it is
-	 less sensitive to the size of the request.  */
-      int extra = ((((12 + DEFAULT_ROUNDING - 1) & ~(DEFAULT_ROUNDING - 1))
-		    + 4 + DEFAULT_ROUNDING - 1)
-		   & ~(DEFAULT_ROUNDING - 1));
-      size = 4096 - extra;
-    }
-
-  h->chunkfun.plain = chunkfun;
-  h->freefun.plain = freefun;
-  h->chunk_size = size;
-  h->alignment_mask = alignment - 1;
-  h->use_extra_arg = 0;
-
-  chunk = h->chunk = CALL_CHUNKFUN (h, h -> chunk_size);
-  if (!chunk)
-    (*obstack_alloc_failed_handler) ();
-  h->next_free = h->object_base = __PTR_ALIGN ((char *) chunk, chunk->contents,
-					       alignment - 1);
-  h->chunk_limit = chunk->limit
-    = (char *) chunk + h->chunk_size;
-  chunk->prev = NULL;
-  /* The initial chunk now contains no empty object.  */
-  h->maybe_empty_object = 0;
-  h->alloc_failed = 0;
-  return 1;
-}
-
-int
-_obstack_begin_1 (struct obstack *h, int size, int alignment,
-		  void *(*chunkfun) (void *, long),
-		  void (*freefun) (void *, void *),
-		  void *arg)
-{
-  register struct _obstack_chunk *chunk; /* points to new chunk */
-
-  if (alignment == 0)
-    alignment = DEFAULT_ALIGNMENT;
-  if (size == 0)
-    /* Default size is what GNU malloc can fit in a 4096-byte block.  */
-    {
-      /* 12 is sizeof (mhead) and 4 is EXTRA from GNU malloc.
-	 Use the values for range checking, because if range checking is off,
-	 the extra bytes won't be missed terribly, but if range checking is on
-	 and we used a larger request, a whole extra 4096 bytes would be
-	 allocated.
-
-	 These number are irrelevant to the new GNU malloc.  I suspect it is
-	 less sensitive to the size of the request.  */
-      int extra = ((((12 + DEFAULT_ROUNDING - 1) & ~(DEFAULT_ROUNDING - 1))
-		    + 4 + DEFAULT_ROUNDING - 1)
-		   & ~(DEFAULT_ROUNDING - 1));
-      size = 4096 - extra;
-    }
-
-  h->chunkfun.extra = (struct _obstack_chunk * (*)(void *,long)) chunkfun;
-  h->freefun.extra = (void (*) (void *, struct _obstack_chunk *)) freefun;
-
-  h->chunk_size = size;
-  h->alignment_mask = alignment - 1;
-  h->extra_arg = arg;
-  h->use_extra_arg = 1;
-
-  chunk = h->chunk = CALL_CHUNKFUN (h, h -> chunk_size);
-  if (!chunk)
-    (*obstack_alloc_failed_handler) ();
-  h->next_free = h->object_base = __PTR_ALIGN ((char *) chunk, chunk->contents,
-					       alignment - 1);
-  h->chunk_limit = chunk->limit
-    = (char *) chunk + h->chunk_size;
-  chunk->prev = NULL;
-  /* The initial chunk now contains no empty object.  */
-  h->maybe_empty_object = 0;
-  h->alloc_failed = 0;
-  return 1;
-}
-
-/* Allocate a new current chunk for the obstack *H
-   on the assumption that LENGTH bytes need to be added
-   to the current object, or a new object of length LENGTH allocated.
-   Copies any partial object from the end of the old chunk
-   to the beginning of the new one.  */
-
-void
-_obstack_newchunk (struct obstack *h, int length)
-{
-  register struct _obstack_chunk *old_chunk = h->chunk;
-  register struct _obstack_chunk *new_chunk;
-  register long	new_size;
-  register long obj_size = h->next_free - h->object_base;
-  register long i;
-  long already;
-  char *object_base;
-
-  /* Compute size for new chunk.  */
-  new_size = (obj_size + length) + (obj_size >> 3) + h->alignment_mask + 100;
-  if (new_size < h->chunk_size)
-    new_size = h->chunk_size;
-
-  /* Allocate and initialize the new chunk.  */
-  new_chunk = CALL_CHUNKFUN (h, new_size);
-  if (!new_chunk)
-    (*obstack_alloc_failed_handler) ();
-  h->chunk = new_chunk;
-  new_chunk->prev = old_chunk;
-  new_chunk->limit = h->chunk_limit = (char *) new_chunk + new_size;
-
-  /* Compute an aligned object_base in the new chunk */
-  object_base =
-    __PTR_ALIGN ((char *) new_chunk, new_chunk->contents, h->alignment_mask);
-
-  /* Move the existing object to the new chunk.
-     Word at a time is fast and is safe if the object
-     is sufficiently aligned.  */
-  if (h->alignment_mask + 1 >= DEFAULT_ALIGNMENT)
-    {
-      for (i = obj_size / sizeof (COPYING_UNIT) - 1;
-	   i >= 0; i--)
-	((COPYING_UNIT *)object_base)[i]
-	  = ((COPYING_UNIT *)h->object_base)[i];
-      /* We used to copy the odd few remaining bytes as one extra COPYING_UNIT,
-	 but that can cross a page boundary on a machine
-	 which does not do strict alignment for COPYING_UNITS.  */
-      already = obj_size / sizeof (COPYING_UNIT) * sizeof (COPYING_UNIT);
-    }
-  else
-    already = 0;
-  /* Copy remaining bytes one by one.  */
-  for (i = already; i < obj_size; i++)
-    object_base[i] = h->object_base[i];
-
-  /* If the object just copied was the only data in OLD_CHUNK,
-     free that chunk and remove it from the chain.
-     But not if that chunk might contain an empty object.  */
-  if (! h->maybe_empty_object
-      && (h->object_base
-	  == __PTR_ALIGN ((char *) old_chunk, old_chunk->contents,
-			  h->alignment_mask)))
-    {
-      new_chunk->prev = old_chunk->prev;
-      CALL_FREEFUN (h, old_chunk);
-    }
-
-  h->object_base = object_base;
-  h->next_free = h->object_base + obj_size;
-  /* The new chunk certainly contains no empty object yet.  */
-  h->maybe_empty_object = 0;
-}
-# ifdef _LIBC
-libc_hidden_def (_obstack_newchunk)
-# endif
-
-/* Return nonzero if object OBJ has been allocated from obstack H.
-   This is here for debugging.
-   If you use it in a program, you are probably losing.  */
-
-/* Suppress -Wmissing-prototypes warning.  We don't want to declare this in
-   obstack.h because it is just for debugging.  */
-int _obstack_allocated_p (struct obstack *h, void *obj);
-
-int
-_obstack_allocated_p (struct obstack *h, void *obj)
-{
-  register struct _obstack_chunk *lp;	/* below addr of any objects in this chunk */
-  register struct _obstack_chunk *plp;	/* point to previous chunk if any */
-
-  lp = (h)->chunk;
-  /* We use >= rather than > since the object cannot be exactly at
-     the beginning of the chunk but might be an empty object exactly
-     at the end of an adjacent chunk.  */
-  while (lp != NULL && ((void *) lp >= obj || (void *) (lp)->limit < obj))
-    {
-      plp = lp->prev;
-      lp = plp;
-    }
-  return lp != NULL;
-}
-\f
-/* Free objects in obstack H, including OBJ and everything allocate
-   more recently than OBJ.  If OBJ is zero, free everything in H.  */
-
-# undef obstack_free
-
-void
-obstack_free (struct obstack *h, void *obj)
-{
-  register struct _obstack_chunk *lp;	/* below addr of any objects in this chunk */
-  register struct _obstack_chunk *plp;	/* point to previous chunk if any */
-
-  lp = h->chunk;
-  /* We use >= because there cannot be an object at the beginning of a chunk.
-     But there can be an empty object at that address
-     at the end of another chunk.  */
-  while (lp != NULL && ((void *) lp >= obj || (void *) (lp)->limit < obj))
-    {
-      plp = lp->prev;
-      CALL_FREEFUN (h, lp);
-      lp = plp;
-      /* If we switch chunks, we can't tell whether the new current
-	 chunk contains an empty object, so assume that it may.  */
-      h->maybe_empty_object = 1;
-    }
-  if (lp)
-    {
-      h->object_base = h->next_free = (char *) (obj);
-      h->chunk_limit = lp->limit;
-      h->chunk = lp;
-    }
-  else if (obj != NULL)
-    /* obj is not in any of the chunks! */
-    abort ();
-}
-
-# ifdef _LIBC
-/* Older versions of libc used a function _obstack_free intended to be
-   called by non-GCC compilers.  */
-strong_alias (obstack_free, _obstack_free)
-# endif
-\f
-int
-_obstack_memory_used (struct obstack *h)
-{
-  register struct _obstack_chunk* lp;
-  register int nbytes = 0;
-
-  for (lp = h->chunk; lp != NULL; lp = lp->prev)
-    {
-      nbytes += lp->limit - (char *) lp;
-    }
-  return nbytes;
-}
-\f
-# ifdef _LIBC
-#  include <libio/iolibio.h>
-# endif
-
-# ifndef __attribute__
-/* This feature is available in gcc versions 2.5 and later.  */
-#  if __GNUC__ < 2 || (__GNUC__ == 2 && __GNUC_MINOR__ < 5)
-#   define __attribute__(Spec) /* empty */
-#  endif
-# endif
-
-static void
-print_and_abort (void)
-{
-  /* Don't change any of these strings.  Yes, it would be possible to add
-     the newline to the string and use fputs or so.  But this must not
-     happen because the "memory exhausted" message appears in other places
-     like this and the translation should be reused instead of creating
-     a very similar string which requires a separate translation.  */
-# ifdef _LIBC
-  (void) __fxprintf (NULL, "%s\n", _("memory exhausted"));
-# else
-  fprintf (stderr, "%s\n", _("memory exhausted"));
-# endif
-  exit (1);
-}
-
-#endif	/* !ELIDE_CODE */
diff --git a/compat/obstack.h b/compat/obstack.h
deleted file mode 100644
index f90a46d9b9..0000000000
--- a/compat/obstack.h
+++ /dev/null
@@ -1,511 +0,0 @@
-/* obstack.h - object stack macros
-   Copyright (C) 1988-1994,1996-1999,2003,2004,2005,2009
-	Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-/* Summary:
-
-All the apparent functions defined here are macros. The idea
-is that you would use these pre-tested macros to solve a
-very specific set of problems, and they would run fast.
-Caution: no side-effects in arguments please!! They may be
-evaluated MANY times!!
-
-These macros operate a stack of objects.  Each object starts life
-small, and may grow to maturity.  (Consider building a word syllable
-by syllable.)  An object can move while it is growing.  Once it has
-been "finished" it never changes address again.  So the "top of the
-stack" is typically an immature growing object, while the rest of the
-stack is of mature, fixed size and fixed address objects.
-
-These routines grab large chunks of memory, using a function you
-supply, called `obstack_chunk_alloc'.  On occasion, they free chunks,
-by calling `obstack_chunk_free'.  You must define them and declare
-them before using any obstack macros.
-
-Each independent stack is represented by a `struct obstack'.
-Each of the obstack macros expects a pointer to such a structure
-as the first argument.
-
-One motivation for this package is the problem of growing char strings
-in symbol tables.  Unless you are "fascist pig with a read-only mind"
---Gosper's immortal quote from HAKMEM item 154, out of context--you
-would not like to put any arbitrary upper limit on the length of your
-symbols.
-
-In practice this often means you will build many short symbols and a
-few long symbols.  At the time you are reading a symbol you don't know
-how long it is.  One traditional method is to read a symbol into a
-buffer, realloc()ating the buffer every time you try to read a symbol
-that is longer than the buffer.  This is beaut, but you still will
-want to copy the symbol from the buffer to a more permanent
-symbol-table entry say about half the time.
-
-With obstacks, you can work differently.  Use one obstack for all symbol
-names.  As you read a symbol, grow the name in the obstack gradually.
-When the name is complete, finalize it.  Then, if the symbol exists already,
-free the newly read name.
-
-The way we do this is to take a large chunk, allocating memory from
-low addresses.  When you want to build a symbol in the chunk you just
-add chars above the current "high water mark" in the chunk.  When you
-have finished adding chars, because you got to the end of the symbol,
-you know how long the chars are, and you can create a new object.
-Mostly the chars will not burst over the highest address of the chunk,
-because you would typically expect a chunk to be (say) 100 times as
-long as an average object.
-
-In case that isn't clear, when we have enough chars to make up
-the object, THEY ARE ALREADY CONTIGUOUS IN THE CHUNK (guaranteed)
-so we just point to it where it lies.  No moving of chars is
-needed and this is the second win: potentially long strings need
-never be explicitly shuffled. Once an object is formed, it does not
-change its address during its lifetime.
-
-When the chars burst over a chunk boundary, we allocate a larger
-chunk, and then copy the partly formed object from the end of the old
-chunk to the beginning of the new larger chunk.  We then carry on
-accrediting characters to the end of the object as we normally would.
-
-A special macro is provided to add a single char at a time to a
-growing object.  This allows the use of register variables, which
-break the ordinary 'growth' macro.
-
-Summary:
-	We allocate large chunks.
-	We carve out one object at a time from the current chunk.
-	Once carved, an object never moves.
-	We are free to append data of any size to the currently
-	  growing object.
-	Exactly one object is growing in an obstack at any one time.
-	You can run one obstack per control block.
-	You may have as many control blocks as you dare.
-	Because of the way we do it, you can `unwind' an obstack
-	  back to a previous state. (You may remove objects much
-	  as you would with a stack.)
-*/
-
-
-/* Don't do the contents of this file more than once.  */
-
-#ifndef _OBSTACK_H
-#define _OBSTACK_H 1
-
-#ifdef __cplusplus
-extern "C" {
-#endif
-\f
-/* We need the type of a pointer subtraction.  If __PTRDIFF_TYPE__ is
-   defined, as with GNU C, use that; that way we don't pollute the
-   namespace with <stddef.h>'s symbols.  Otherwise, include <stddef.h>
-   and use ptrdiff_t.  */
-
-#ifdef __PTRDIFF_TYPE__
-# define PTR_INT_TYPE __PTRDIFF_TYPE__
-#else
-# include <stddef.h>
-# define PTR_INT_TYPE ptrdiff_t
-#endif
-
-/* If B is the base of an object addressed by P, return the result of
-   aligning P to the next multiple of A + 1.  B and P must be of type
-   char *.  A + 1 must be a power of 2.  */
-
-#define __BPTR_ALIGN(B, P, A) ((B) + (((P) - (B) + (A)) & ~(A)))
-
-/* Similar to _BPTR_ALIGN (B, P, A), except optimize the common case
-   where pointers can be converted to integers, aligned as integers,
-   and converted back again.  If PTR_INT_TYPE is narrower than a
-   pointer (e.g., the AS/400), play it safe and compute the alignment
-   relative to B.  Otherwise, use the faster strategy of computing the
-   alignment relative to 0.  */
-
-#define __PTR_ALIGN(B, P, A)						    \
-  (sizeof (PTR_INT_TYPE) < sizeof(void *) ?                                 \
-   __BPTR_ALIGN((B), (P), (A)) :                                            \
-   (void *)__BPTR_ALIGN((PTR_INT_TYPE)(void *)0, (PTR_INT_TYPE)(P), (A))            \
-  )
-
-#include <string.h>
-
-struct _obstack_chunk		/* Lives at front of each chunk. */
-{
-  char  *limit;			/* 1 past end of this chunk */
-  struct _obstack_chunk *prev;	/* address of prior chunk or NULL */
-  char	contents[4];		/* objects begin here */
-};
-
-struct obstack		/* control current object in current chunk */
-{
-  long	chunk_size;		/* preferred size to allocate chunks in */
-  struct _obstack_chunk *chunk;	/* address of current struct obstack_chunk */
-  char	*object_base;		/* address of object we are building */
-  char	*next_free;		/* where to add next char to current object */
-  char	*chunk_limit;		/* address of char after current chunk */
-  union
-  {
-    PTR_INT_TYPE tempint;
-    void *tempptr;
-  } temp;			/* Temporary for some macros.  */
-  int   alignment_mask;		/* Mask of alignment for each object. */
-  /* These prototypes vary based on `use_extra_arg'. */
-  union {
-    void *(*plain) (long);
-    struct _obstack_chunk *(*extra) (void *, long);
-  } chunkfun;
-  union {
-    void (*plain) (void *);
-    void (*extra) (void *, struct _obstack_chunk *);
-  } freefun;
-  void *extra_arg;		/* first arg for chunk alloc/dealloc funcs */
-  unsigned use_extra_arg:1;	/* chunk alloc/dealloc funcs take extra arg */
-  unsigned maybe_empty_object:1;/* There is a possibility that the current
-				   chunk contains a zero-length object.  This
-				   prevents freeing the chunk if we allocate
-				   a bigger chunk to replace it. */
-  unsigned alloc_failed:1;	/* No longer used, as we now call the failed
-				   handler on error, but retained for binary
-				   compatibility.  */
-};
-
-/* Declare the external functions we use; they are in obstack.c.  */
-
-extern void _obstack_newchunk (struct obstack *, int);
-extern int _obstack_begin (struct obstack *, int, int,
-			    void *(*) (long), void (*) (void *));
-extern int _obstack_begin_1 (struct obstack *, int, int,
-			     void *(*) (void *, long),
-			     void (*) (void *, void *), void *);
-extern int _obstack_memory_used (struct obstack *);
-
-void obstack_free (struct obstack *, void *);
-
-\f
-/* Error handler called when `obstack_chunk_alloc' failed to allocate
-   more memory.  This can be set to a user defined function which
-   should either abort gracefully or use longjump - but shouldn't
-   return.  The default action is to print a message and abort.  */
-extern void (*obstack_alloc_failed_handler) (void);
-\f
-/* Pointer to beginning of object being allocated or to be allocated next.
-   Note that this might not be the final address of the object
-   because a new chunk might be needed to hold the final size.  */
-
-#define obstack_base(h) ((void *) (h)->object_base)
-
-/* Size for allocating ordinary chunks.  */
-
-#define obstack_chunk_size(h) ((h)->chunk_size)
-
-/* Pointer to next byte not yet allocated in current chunk.  */
-
-#define obstack_next_free(h)	((h)->next_free)
-
-/* Mask specifying low bits that should be clear in address of an object.  */
-
-#define obstack_alignment_mask(h) ((h)->alignment_mask)
-
-/* To prevent prototype warnings provide complete argument list.  */
-#define obstack_init(h)						\
-  _obstack_begin ((h), 0, 0,					\
-		  (void *(*) (long)) obstack_chunk_alloc,	\
-		  (void (*) (void *)) obstack_chunk_free)
-
-#define obstack_begin(h, size)					\
-  _obstack_begin ((h), (size), 0,				\
-		  (void *(*) (long)) obstack_chunk_alloc,	\
-		  (void (*) (void *)) obstack_chunk_free)
-
-#define obstack_specify_allocation(h, size, alignment, chunkfun, freefun)  \
-  _obstack_begin ((h), (size), (alignment),				   \
-		  (void *(*) (long)) (chunkfun),			   \
-		  (void (*) (void *)) (freefun))
-
-#define obstack_specify_allocation_with_arg(h, size, alignment, chunkfun, freefun, arg) \
-  _obstack_begin_1 ((h), (size), (alignment),				\
-		    (void *(*) (void *, long)) (chunkfun),		\
-		    (void (*) (void *, void *)) (freefun), (arg))
-
-#define obstack_chunkfun(h, newchunkfun) \
-  ((h)->chunkfun.extra = (struct _obstack_chunk *(*)(void *, long)) (newchunkfun))
-
-#define obstack_freefun(h, newfreefun) \
-  ((h)->freefun.extra = (void (*)(void *, struct _obstack_chunk *)) (newfreefun))
-
-#define obstack_1grow_fast(h,achar) (*((h)->next_free)++ = (achar))
-
-#define obstack_blank_fast(h,n) ((h)->next_free += (n))
-
-#define obstack_memory_used(h) _obstack_memory_used (h)
-\f
-#if defined __GNUC__ && defined __STDC__ && __STDC__
-/* NextStep 2.0 cc is really gcc 1.93 but it defines __GNUC__ = 2 and
-   does not implement __extension__.  But that compiler doesn't define
-   __GNUC_MINOR__.  */
-# if __GNUC__ < 2 || (__NeXT__ && !__GNUC_MINOR__)
-#  define __extension__
-# endif
-
-/* For GNU C, if not -traditional,
-   we can define these macros to compute all args only once
-   without using a global variable.
-   Also, we can avoid using the `temp' slot, to make faster code.  */
-
-# define obstack_object_size(OBSTACK)					\
-  __extension__								\
-  ({ struct obstack const *__o = (OBSTACK);				\
-     (unsigned) (__o->next_free - __o->object_base); })
-
-# define obstack_room(OBSTACK)						\
-  __extension__								\
-  ({ struct obstack const *__o = (OBSTACK);				\
-     (unsigned) (__o->chunk_limit - __o->next_free); })
-
-# define obstack_make_room(OBSTACK,length)				\
-__extension__								\
-({ struct obstack *__o = (OBSTACK);					\
-   int __len = (length);						\
-   if (__o->chunk_limit - __o->next_free < __len)			\
-     _obstack_newchunk (__o, __len);					\
-   (void) 0; })
-
-# define obstack_empty_p(OBSTACK)					\
-  __extension__								\
-  ({ struct obstack const *__o = (OBSTACK);				\
-     (__o->chunk->prev == 0						\
-      && __o->next_free == __PTR_ALIGN ((char *) __o->chunk,		\
-					__o->chunk->contents,		\
-					__o->alignment_mask)); })
-
-# define obstack_grow(OBSTACK,where,length)				\
-__extension__								\
-({ struct obstack *__o = (OBSTACK);					\
-   int __len = (length);						\
-   if (__o->next_free + __len > __o->chunk_limit)			\
-     _obstack_newchunk (__o, __len);					\
-   memcpy (__o->next_free, where, __len);				\
-   __o->next_free += __len;						\
-   (void) 0; })
-
-# define obstack_grow0(OBSTACK,where,length)				\
-__extension__								\
-({ struct obstack *__o = (OBSTACK);					\
-   int __len = (length);						\
-   if (__o->next_free + __len + 1 > __o->chunk_limit)			\
-     _obstack_newchunk (__o, __len + 1);				\
-   memcpy (__o->next_free, where, __len);				\
-   __o->next_free += __len;						\
-   *(__o->next_free)++ = 0;						\
-   (void) 0; })
-
-# define obstack_1grow(OBSTACK,datum)					\
-__extension__								\
-({ struct obstack *__o = (OBSTACK);					\
-   if (__o->next_free + 1 > __o->chunk_limit)				\
-     _obstack_newchunk (__o, 1);					\
-   obstack_1grow_fast (__o, datum);					\
-   (void) 0; })
-
-/* These assume that the obstack alignment is good enough for pointers
-   or ints, and that the data added so far to the current object
-   shares that much alignment.  */
-
-# define obstack_ptr_grow(OBSTACK,datum)				\
-__extension__								\
-({ struct obstack *__o = (OBSTACK);					\
-   if (__o->next_free + sizeof (void *) > __o->chunk_limit)		\
-     _obstack_newchunk (__o, sizeof (void *));				\
-   obstack_ptr_grow_fast (__o, datum); })				\
-
-# define obstack_int_grow(OBSTACK,datum)				\
-__extension__								\
-({ struct obstack *__o = (OBSTACK);					\
-   if (__o->next_free + sizeof (int) > __o->chunk_limit)		\
-     _obstack_newchunk (__o, sizeof (int));				\
-   obstack_int_grow_fast (__o, datum); })
-
-# define obstack_ptr_grow_fast(OBSTACK,aptr)				\
-__extension__								\
-({ struct obstack *__o1 = (OBSTACK);					\
-   *(const void **) __o1->next_free = (aptr);				\
-   __o1->next_free += sizeof (const void *);				\
-   (void) 0; })
-
-# define obstack_int_grow_fast(OBSTACK,aint)				\
-__extension__								\
-({ struct obstack *__o1 = (OBSTACK);					\
-   *(int *) __o1->next_free = (aint);					\
-   __o1->next_free += sizeof (int);					\
-   (void) 0; })
-
-# define obstack_blank(OBSTACK,length)					\
-__extension__								\
-({ struct obstack *__o = (OBSTACK);					\
-   int __len = (length);						\
-   if (__o->chunk_limit - __o->next_free < __len)			\
-     _obstack_newchunk (__o, __len);					\
-   obstack_blank_fast (__o, __len);					\
-   (void) 0; })
-
-# define obstack_alloc(OBSTACK,length)					\
-__extension__								\
-({ struct obstack *__h = (OBSTACK);					\
-   obstack_blank (__h, (length));					\
-   obstack_finish (__h); })
-
-# define obstack_copy(OBSTACK,where,length)				\
-__extension__								\
-({ struct obstack *__h = (OBSTACK);					\
-   obstack_grow (__h, (where), (length));				\
-   obstack_finish (__h); })
-
-# define obstack_copy0(OBSTACK,where,length)				\
-__extension__								\
-({ struct obstack *__h = (OBSTACK);					\
-   obstack_grow0 (__h, (where), (length));				\
-   obstack_finish (__h); })
-
-/* The local variable is named __o1 to avoid a name conflict
-   when obstack_blank is called.  */
-# define obstack_finish(OBSTACK)					\
-__extension__								\
-({ struct obstack *__o1 = (OBSTACK);					\
-   void *__value = (void *) __o1->object_base;				\
-   if (__o1->next_free == __value)					\
-     __o1->maybe_empty_object = 1;					\
-   __o1->next_free							\
-     = __PTR_ALIGN (__o1->object_base, __o1->next_free,			\
-		    __o1->alignment_mask);				\
-   if (__o1->next_free - (char *)__o1->chunk				\
-       > __o1->chunk_limit - (char *)__o1->chunk)			\
-     __o1->next_free = __o1->chunk_limit;				\
-   __o1->object_base = __o1->next_free;					\
-   __value; })
-
-# define obstack_free(OBSTACK, OBJ)					\
-__extension__								\
-({ struct obstack *__o = (OBSTACK);					\
-   void *__obj = (OBJ);							\
-   if (__obj > (void *)__o->chunk && __obj < (void *)__o->chunk_limit)  \
-     __o->next_free = __o->object_base = (char *)__obj;			\
-   else (obstack_free) (__o, __obj); })
-\f
-#else /* not __GNUC__ or not __STDC__ */
-
-# define obstack_object_size(h) \
- (unsigned) ((h)->next_free - (h)->object_base)
-
-# define obstack_room(h)		\
- (unsigned) ((h)->chunk_limit - (h)->next_free)
-
-# define obstack_empty_p(h) \
- ((h)->chunk->prev == 0							\
-  && (h)->next_free == __PTR_ALIGN ((char *) (h)->chunk,		\
-				    (h)->chunk->contents,		\
-				    (h)->alignment_mask))
-
-/* Note that the call to _obstack_newchunk is enclosed in (..., 0)
-   so that we can avoid having void expressions
-   in the arms of the conditional expression.
-   Casting the third operand to void was tried before,
-   but some compilers won't accept it.  */
-
-# define obstack_make_room(h,length)					\
-( (h)->temp.tempint = (length),						\
-  (((h)->next_free + (h)->temp.tempint > (h)->chunk_limit)		\
-   ? (_obstack_newchunk ((h), (h)->temp.tempint), 0) : 0))
-
-# define obstack_grow(h,where,length)					\
-( (h)->temp.tempint = (length),						\
-  (((h)->next_free + (h)->temp.tempint > (h)->chunk_limit)		\
-   ? (_obstack_newchunk ((h), (h)->temp.tempint), 0) : 0),		\
-  memcpy ((h)->next_free, where, (h)->temp.tempint),			\
-  (h)->next_free += (h)->temp.tempint)
-
-# define obstack_grow0(h,where,length)					\
-( (h)->temp.tempint = (length),						\
-  (((h)->next_free + (h)->temp.tempint + 1 > (h)->chunk_limit)		\
-   ? (_obstack_newchunk ((h), (h)->temp.tempint + 1), 0) : 0),		\
-  memcpy ((h)->next_free, where, (h)->temp.tempint),			\
-  (h)->next_free += (h)->temp.tempint,					\
-  *((h)->next_free)++ = 0)
-
-# define obstack_1grow(h,datum)						\
-( (((h)->next_free + 1 > (h)->chunk_limit)				\
-   ? (_obstack_newchunk ((h), 1), 0) : 0),				\
-  obstack_1grow_fast (h, datum))
-
-# define obstack_ptr_grow(h,datum)					\
-( (((h)->next_free + sizeof (char *) > (h)->chunk_limit)		\
-   ? (_obstack_newchunk ((h), sizeof (char *)), 0) : 0),		\
-  obstack_ptr_grow_fast (h, datum))
-
-# define obstack_int_grow(h,datum)					\
-( (((h)->next_free + sizeof (int) > (h)->chunk_limit)			\
-   ? (_obstack_newchunk ((h), sizeof (int)), 0) : 0),			\
-  obstack_int_grow_fast (h, datum))
-
-# define obstack_ptr_grow_fast(h,aptr)					\
-  (((const void **) ((h)->next_free += sizeof (void *)))[-1] = (aptr))
-
-# define obstack_int_grow_fast(h,aint)					\
-  (((int *) ((h)->next_free += sizeof (int)))[-1] = (aint))
-
-# define obstack_blank(h,length)					\
-( (h)->temp.tempint = (length),						\
-  (((h)->chunk_limit - (h)->next_free < (h)->temp.tempint)		\
-   ? (_obstack_newchunk ((h), (h)->temp.tempint), 0) : 0),		\
-  obstack_blank_fast (h, (h)->temp.tempint))
-
-# define obstack_alloc(h,length)					\
- (obstack_blank ((h), (length)), obstack_finish ((h)))
-
-# define obstack_copy(h,where,length)					\
- (obstack_grow ((h), (where), (length)), obstack_finish ((h)))
-
-# define obstack_copy0(h,where,length)					\
- (obstack_grow0 ((h), (where), (length)), obstack_finish ((h)))
-
-# define obstack_finish(h)						\
-( ((h)->next_free == (h)->object_base					\
-   ? (((h)->maybe_empty_object = 1), 0)					\
-   : 0),								\
-  (h)->temp.tempptr = (h)->object_base,					\
-  (h)->next_free							\
-    = __PTR_ALIGN ((h)->object_base, (h)->next_free,			\
-		   (h)->alignment_mask),				\
-  (((h)->next_free - (char *) (h)->chunk				\
-    > (h)->chunk_limit - (char *) (h)->chunk)				\
-   ? ((h)->next_free = (h)->chunk_limit) : 0),				\
-  (h)->object_base = (h)->next_free,					\
-  (h)->temp.tempptr)
-
-# define obstack_free(h,obj)						\
-( (h)->temp.tempint = (char *) (obj) - (char *) (h)->chunk,		\
-  ((((h)->temp.tempint > 0						\
-    && (h)->temp.tempint < (h)->chunk_limit - (char *) (h)->chunk))	\
-   ? (ptrdiff_t) ((h)->next_free = (h)->object_base				\
-	    = (h)->temp.tempint + (char *) (h)->chunk)			\
-   : (((obstack_free) ((h), (h)->temp.tempint + (char *) (h)->chunk), 0), 0)))
-
-#endif /* not __GNUC__ or not __STDC__ */
-
-#ifdef __cplusplus
-}	/* C++ */
-#endif
-
-#endif /* obstack.h */
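The comment at the top of the removed obstack.h describes the core trick. An object grows at the chunk's high-water mark. If it overflows the chunk, the partly formed object is copied to the start of a new, larger chunk, so the finished object is always contiguous and never moves afterwards.

Below is a minimal sketch of that idea. It assumes a hypothetical `struct arena` with `arena_grow()` / `arena_finish()` helpers; these are illustrative names, not the GNU obstack API. It also deliberately simplifies the real thing:

- It does no alignment.
- It has no obstack_free()-style unwinding.
- It keeps every old chunk until `arena_free_all()`. The real obstack frees an old chunk that held nothing but the partial object.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical chunk header, loosely modelled on struct _obstack_chunk. */
struct chunk {
	struct chunk *prev;  /* previously allocated chunk, or NULL */
	size_t size;         /* usable bytes in data[] */
	char data[];         /* objects are carved out of here */
};

struct arena {
	struct chunk *cur;   /* current chunk */
	char *object_base;   /* start of the object being grown */
	char *next_free;     /* where the next byte of that object goes */
	char *limit;         /* one past the end of cur->data */
};

/* Allocate a chunk big enough for the partial object plus NEED more
   bytes, and move the partial object to the start of it. */
static void new_chunk(struct arena *a, size_t need)
{
	size_t partial = a->cur ? (size_t)(a->next_free - a->object_base) : 0;
	size_t size = 4096;
	struct chunk *c;

	while (size < partial + need)
		size *= 2;
	c = malloc(sizeof(*c) + size);
	if (!c)
		abort();
	c->prev = a->cur;
	c->size = size;
	if (partial)
		memcpy(c->data, a->object_base, partial);
	a->cur = c;
	a->object_base = c->data;
	a->next_free = c->data + partial;
	a->limit = c->data + size;
}

static void arena_init(struct arena *a)
{
	a->cur = NULL;
	a->object_base = a->next_free = a->limit = NULL;
	new_chunk(a, 0);
}

/* Append LEN bytes to the object currently being grown. */
static void arena_grow(struct arena *a, const void *src, size_t len)
{
	assert(a->cur != NULL); /* arena_init() must have been called */
	if ((size_t)(a->limit - a->next_free) < len)
		new_chunk(a, len);
	memcpy(a->next_free, src, len);
	a->next_free += len;
}

/* Close the growing object and return its (now fixed) address. */
static char *arena_finish(struct arena *a)
{
	char *obj = a->object_base;
	a->object_base = a->next_free; /* the next object starts here */
	return obj;
}

/* Release every chunk; all objects become invalid. */
static void arena_free_all(struct arena *a)
{
	while (a->cur) {
		struct chunk *prev = a->cur->prev;
		free(a->cur);
		a->cur = prev;
	}
}
```

A symbol can be read piecewise with repeated `arena_grow()` calls and sealed with `arena_finish()`. Suppose a later object spills past the 4096-byte first chunk. Its partial bytes are copied into a larger chunk, while previously finished symbols stay where they were.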
diff --git a/ctype.c b/ctype.c
index fc0225cebd..3451745550 100644
--- a/ctype.c
+++ b/ctype.c
@@ -28,39 +28,3 @@ const unsigned char sane_ctype[256] = {
 	A, A, A, A, A, A, A, A, A, A, A, R, R, U, P, X,		/* 112..127 */
 	/* Nothing in the 128.. range */
 };
-
-/* For case-insensitive kwset */
-const unsigned char tolower_trans_tbl[256] = {
-	0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
-	0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
-	0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
-	0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f,
-	 ' ',  '!',  '"',  '#',  '$',  '%',  '&', 0x27,
-	 '(',  ')',  '*',  '+',  ',',  '-',  '.',  '/',
-	 '0',  '1',  '2',  '3',  '4',  '5',  '6',  '7',
-	 '8',  '9',  ':',  ';',  '<',  '=',  '>',  '?',
-	 '@',  'a',  'b',  'c',  'd',  'e',  'f',  'g',
-	 'h',  'i',  'j',  'k',  'l',  'm',  'n',  'o',
-	 'p',  'q',  'r',  's',  't',  'u',  'v',  'w',
-	 'x',  'y',  'z',  '[', 0x5c,  ']',  '^',  '_',
-	 '`',  'a',  'b',  'c',  'd',  'e',  'f',  'g',
-	 'h',  'i',  'j',  'k',  'l',  'm',  'n',  'o',
-	 'p',  'q',  'r',  's',  't',  'u',  'v',  'w',
-	 'x',  'y',  'z',  '{',  '|',  '}',  '~', 0x7f,
-	0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
-	0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
-	0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
-	0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f,
-	0xa0, 0xa1, 0xa2, 0xa3, 0xa4, 0xa5, 0xa6, 0xa7,
-	0xa8, 0xa9, 0xaa, 0xab, 0xac, 0xad, 0xae, 0xaf,
-	0xb0, 0xb1, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7,
-	0xb8, 0xb9, 0xba, 0xbb, 0xbc, 0xbd, 0xbe, 0xbf,
-	0xc0, 0xc1, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0xc7,
-	0xc8, 0xc9, 0xca, 0xcb, 0xcc, 0xcd, 0xce, 0xcf,
-	0xd0, 0xd1, 0xd2, 0xd3, 0xd4, 0xd5, 0xd6, 0xd7,
-	0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0xdd, 0xde, 0xdf,
-	0xe0, 0xe1, 0xe2, 0xe3, 0xe4, 0xe5, 0xe6, 0xe7,
-	0xe8, 0xe9, 0xea, 0xeb, 0xec, 0xed, 0xee, 0xef,
-	0xf0, 0xf1, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0xf7,
-	0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0xff,
-};
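The removed `tolower_trans_tbl` maps ASCII `A`..`Z` to `a`..`z` and every other byte, including 0x80..0xff, to itself. kwset looked up `trans[U(c)]` on both the keywords and the searched text to make matching case-insensitive.

The sketch below builds an equivalent table at run time and uses it for a case-insensitive buffer comparison. `build_tolower_table()` and `trans_memeq()` are hypothetical names for illustration, not Git functions, and the code assumes an ASCII execution character set.

```c
#include <stddef.h>

/* Lowercase ASCII A-Z and map every other byte to itself,
   matching the contents of the removed tolower_trans_tbl. */
static void build_tolower_table(unsigned char tbl[256])
{
	for (int i = 0; i < 256; i++)
		tbl[i] = (i >= 'A' && i <= 'Z')
			? (unsigned char)(i - 'A' + 'a')
			: (unsigned char)i;
}

/* Compare LEN bytes after passing each one through TRANS, the way
   kwset translated both keyword and text bytes before comparing. */
static int trans_memeq(const unsigned char *trans,
		       const char *a, const char *b, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (trans[(unsigned char)a[i]] != trans[(unsigned char)b[i]])
			return 0;
	return 1;
}
```

The `(unsigned char)` casts play the same role as kwset's `U()` macro. They keep bytes above 0x7f from indexing the table with a negative value where `char` is signed.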
diff --git a/git-compat-util.h b/git-compat-util.h
index 5d5e47fbe2..1f4e740773 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -986,9 +986,6 @@ int xsnprintf(char *dst, size_t max, const char *fmt, ...);
 
 int xgethostname(char *buf, size_t len);
 
-/* in ctype.c, for kwset users */
-extern const unsigned char tolower_trans_tbl[256];
-
 /* Sane ctype - no locale, and works with signed chars */
 #undef isascii
 #undef isspace
diff --git a/kwset.c b/kwset.c
deleted file mode 100644
index fc439e0667..0000000000
--- a/kwset.c
+++ /dev/null
@@ -1,775 +0,0 @@
-/*
- * This file has been copied from commit e7ac713d^ in the GNU grep git
- * repository. A few small changes have been made to adapt the code to
- * Git.
- */
-
-/* kwset.c - search for any of a set of keywords.
-   Copyright 1989, 1998, 2000, 2005 Free Software Foundation, Inc.
-
-   This program is free software; you can redistribute it and/or modify
-   it under the terms of the GNU General Public License as published by
-   the Free Software Foundation; either version 2, or (at your option)
-   any later version.
-
-   This program is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-   GNU General Public License for more details.
-
-   You should have received a copy of the GNU General Public License
-   along with this program; if not, see <http://www.gnu.org/licenses/>. */
-
-/* Written August 1989 by Mike Haertel.
-   The author may be reached (Email) at the address mike@ai.mit.edu,
-   or (US mail) as Mike Haertel c/o Free Software Foundation. */
-
-/* The algorithm implemented by these routines bears a startling resemblance
-   to one discovered by Beate Commentz-Walter, although it is not identical.
-   See "A String Matching Algorithm Fast on the Average," Technical Report,
-   IBM-Germany, Scientific Center Heidelberg, Tiergartenstrasse 15, D-6900
-   Heidelberg, Germany.  See also Aho, A.V., and M. Corasick, "Efficient
-   String Matching:  An Aid to Bibliographic Search," CACM June 1975,
-   Vol. 18, No. 6, which describes the failure function used below. */
-
-#include "cache.h"
-
-#include "kwset.h"
-#include "compat/obstack.h"
-
-#define NCHAR (UCHAR_MAX + 1)
-/* adapter for `xmalloc()`, which takes `size_t`, not `long` */
-static void *obstack_chunk_alloc(long size)
-{
-	if (size < 0)
-		BUG("Cannot allocate a negative amount: %ld", size);
-	return xmalloc(size);
-}
-#define obstack_chunk_free free
-
-#define U(c) ((unsigned char) (c))
-
-/* Balanced tree of edges and labels leaving a given trie node. */
-struct tree
-{
-  struct tree *llink;		/* Left link; MUST be first field. */
-  struct tree *rlink;		/* Right link (to larger labels). */
-  struct trie *trie;		/* Trie node pointed to by this edge. */
-  unsigned char label;		/* Label on this edge. */
-  char balance;			/* Difference in depths of subtrees. */
-};
-
-/* Node of a trie representing a set of reversed keywords. */
-struct trie
-{
-  unsigned int accepting;	/* Word index of accepted word, or zero. */
-  struct tree *links;		/* Tree of edges leaving this node. */
-  struct trie *parent;		/* Parent of this node. */
-  struct trie *next;		/* List of all trie nodes in level order. */
-  struct trie *fail;		/* Aho-Corasick failure function. */
-  int depth;			/* Depth of this node from the root. */
-  int shift;			/* Shift function for search failures. */
-  int maxshift;			/* Max shift of self and descendants. */
-};
-
-/* Structure returned opaquely to the caller, containing everything. */
-struct kwset
-{
-  struct obstack obstack;	/* Obstack for node allocation. */
-  int words;			/* Number of words in the trie. */
-  struct trie *trie;		/* The trie itself. */
-  int mind;			/* Minimum depth of an accepting node. */
-  int maxd;			/* Maximum depth of any node. */
-  unsigned char delta[NCHAR];	/* Delta table for rapid search. */
-  struct trie *next[NCHAR];	/* Table of children of the root. */
-  char *target;			/* Target string if there's only one. */
-  int mind2;			/* Used in Boyer-Moore search for one string. */
-  unsigned char const *trans;  /* Character translation table. */
-};
-
-/* Allocate and initialize a keyword set object, returning an opaque
-   pointer to it.  Return NULL if memory is not available. */
-kwset_t
-kwsalloc (unsigned char const *trans)
-{
-  struct kwset *kwset;
-
-  kwset = (struct kwset *) xmalloc(sizeof (struct kwset));
-
-  obstack_init(&kwset->obstack);
-  kwset->words = 0;
-  kwset->trie
-    = (struct trie *) obstack_alloc(&kwset->obstack, sizeof (struct trie));
-  if (!kwset->trie)
-    {
-      kwsfree((kwset_t) kwset);
-      return NULL;
-    }
-  kwset->trie->accepting = 0;
-  kwset->trie->links = NULL;
-  kwset->trie->parent = NULL;
-  kwset->trie->next = NULL;
-  kwset->trie->fail = NULL;
-  kwset->trie->depth = 0;
-  kwset->trie->shift = 0;
-  kwset->mind = INT_MAX;
-  kwset->maxd = -1;
-  kwset->target = NULL;
-  kwset->trans = trans;
-
-  return (kwset_t) kwset;
-}
-
-/* This upper bound is valid for CHAR_BIT >= 4 and
-   exact for CHAR_BIT in { 4..11, 13, 15, 17, 19 }. */
-#define DEPTH_SIZE (CHAR_BIT + CHAR_BIT/2)
-
-/* Add the given string to the contents of the keyword set.  Return NULL
-   for success, an error message otherwise. */
-const char *
-kwsincr (kwset_t kws, char const *text, size_t len)
-{
-  struct kwset *kwset;
-  register struct trie *trie;
-  register unsigned char label;
-  register struct tree *link;
-  register int depth;
-  struct tree *links[DEPTH_SIZE];
-  enum { L, R } dirs[DEPTH_SIZE];
-  struct tree *t, *r, *l, *rl, *lr;
-
-  kwset = (struct kwset *) kws;
-  trie = kwset->trie;
-  text += len;
-
-  /* Descend the trie (built of reversed keywords) character-by-character,
-     installing new nodes when necessary. */
-  while (len--)
-    {
-      label = kwset->trans ? kwset->trans[U(*--text)] : *--text;
-
-      /* Descend the tree of outgoing links for this trie node,
-	 looking for the current character and keeping track
-	 of the path followed. */
-      link = trie->links;
-      links[0] = (struct tree *) &trie->links;
-      dirs[0] = L;
-      depth = 1;
-
-      while (link && label != link->label)
-	{
-	  links[depth] = link;
-	  if (label < link->label)
-	    dirs[depth++] = L, link = link->llink;
-	  else
-	    dirs[depth++] = R, link = link->rlink;
-	}
-
-      /* The current character doesn't have an outgoing link at
-	 this trie node, so build a new trie node and install
-	 a link in the current trie node's tree. */
-      if (!link)
-	{
-	  link = (struct tree *) obstack_alloc(&kwset->obstack,
-					       sizeof (struct tree));
-	  if (!link)
-	    return "memory exhausted";
-	  link->llink = NULL;
-	  link->rlink = NULL;
-	  link->trie = (struct trie *) obstack_alloc(&kwset->obstack,
-						     sizeof (struct trie));
-	  if (!link->trie)
-	    {
-	      obstack_free(&kwset->obstack, link);
-	      return "memory exhausted";
-	    }
-	  link->trie->accepting = 0;
-	  link->trie->links = NULL;
-	  link->trie->parent = trie;
-	  link->trie->next = NULL;
-	  link->trie->fail = NULL;
-	  link->trie->depth = trie->depth + 1;
-	  link->trie->shift = 0;
-	  link->label = label;
-	  link->balance = 0;
-
-	  /* Install the new tree node in its parent. */
-	  if (dirs[--depth] == L)
-	    links[depth]->llink = link;
-	  else
-	    links[depth]->rlink = link;
-
-	  /* Back up the tree fixing the balance flags. */
-	  while (depth && !links[depth]->balance)
-	    {
-	      if (dirs[depth] == L)
-		--links[depth]->balance;
-	      else
-		++links[depth]->balance;
-	      --depth;
-	    }
-
-	  /* Rebalance the tree by pointer rotations if necessary. */
-	  if (depth && ((dirs[depth] == L && --links[depth]->balance)
-			|| (dirs[depth] == R && ++links[depth]->balance)))
-	    {
-	      switch (links[depth]->balance)
-		{
-		case (char) -2:
-		  switch (dirs[depth + 1])
-		    {
-		    case L:
-		      r = links[depth], t = r->llink, rl = t->rlink;
-		      t->rlink = r, r->llink = rl;
-		      t->balance = r->balance = 0;
-		      break;
-		    case R:
-		      r = links[depth], l = r->llink, t = l->rlink;
-		      rl = t->rlink, lr = t->llink;
-		      t->llink = l, l->rlink = lr, t->rlink = r, r->llink = rl;
-		      l->balance = t->balance != 1 ? 0 : -1;
-		      r->balance = t->balance != (char) -1 ? 0 : 1;
-		      t->balance = 0;
-		      break;
-		    default:
-		      abort ();
-		    }
-		  break;
-		case 2:
-		  switch (dirs[depth + 1])
-		    {
-		    case R:
-		      l = links[depth], t = l->rlink, lr = t->llink;
-		      t->llink = l, l->rlink = lr;
-		      t->balance = l->balance = 0;
-		      break;
-		    case L:
-		      l = links[depth], r = l->rlink, t = r->llink;
-		      lr = t->llink, rl = t->rlink;
-		      t->llink = l, l->rlink = lr, t->rlink = r, r->llink = rl;
-		      l->balance = t->balance != 1 ? 0 : -1;
-		      r->balance = t->balance != (char) -1 ? 0 : 1;
-		      t->balance = 0;
-		      break;
-		    default:
-		      abort ();
-		    }
-		  break;
-		default:
-		  abort ();
-		}
-
-	      if (dirs[depth - 1] == L)
-		links[depth - 1]->llink = t;
-	      else
-		links[depth - 1]->rlink = t;
-	    }
-	}
-
-      trie = link->trie;
-    }
-
-  /* Mark the node we finally reached as accepting, encoding the
-     index number of this word in the keyword set so far. */
-  if (!trie->accepting)
-    trie->accepting = 1 + 2 * kwset->words;
-  ++kwset->words;
-
-  /* Keep track of the longest and shortest string of the keyword set. */
-  if (trie->depth < kwset->mind)
-    kwset->mind = trie->depth;
-  if (trie->depth > kwset->maxd)
-    kwset->maxd = trie->depth;
-
-  return NULL;
-}
-
-/* Enqueue the trie nodes referenced from the given tree in the
-   given queue. */
-static void
-enqueue (struct tree *tree, struct trie **last)
-{
-  if (!tree)
-    return;
-  enqueue(tree->llink, last);
-  enqueue(tree->rlink, last);
-  (*last) = (*last)->next = tree->trie;
-}
-
-/* Compute the Aho-Corasick failure function for the trie nodes referenced
-   from the given tree, given the failure function for their parent as
-   well as a last resort failure node. */
-static void
-treefails (register struct tree const *tree, struct trie const *fail,
-	   struct trie *recourse)
-{
-  register struct tree *link;
-
-  if (!tree)
-    return;
-
-  treefails(tree->llink, fail, recourse);
-  treefails(tree->rlink, fail, recourse);
-
-  /* Find, in the chain of fails going back to the root, the first
-     node that has a descendant on the current label. */
-  while (fail)
-    {
-      link = fail->links;
-      while (link && tree->label != link->label)
-	if (tree->label < link->label)
-	  link = link->llink;
-	else
-	  link = link->rlink;
-      if (link)
-	{
-	  tree->trie->fail = link->trie;
-	  return;
-	}
-      fail = fail->fail;
-    }
-
-  tree->trie->fail = recourse;
-}
-
-/* Set delta entries for the links of the given tree such that
-   the preexisting delta value is larger than the current depth. */
-static void
-treedelta (register struct tree const *tree,
-	   register unsigned int depth,
-	   unsigned char delta[])
-{
-  if (!tree)
-    return;
-  treedelta(tree->llink, depth, delta);
-  treedelta(tree->rlink, depth, delta);
-  if (depth < delta[tree->label])
-    delta[tree->label] = depth;
-}
-
-/* Return true if A has every label in B. */
-static int
-hasevery (register struct tree const *a, register struct tree const *b)
-{
-  if (!b)
-    return 1;
-  if (!hasevery(a, b->llink))
-    return 0;
-  if (!hasevery(a, b->rlink))
-    return 0;
-  while (a && b->label != a->label)
-    if (b->label < a->label)
-      a = a->llink;
-    else
-      a = a->rlink;
-  return !!a;
-}
-
-/* Compute a vector, indexed by character code, of the trie nodes
-   referenced from the given tree. */
-static void
-treenext (struct tree const *tree, struct trie *next[])
-{
-  if (!tree)
-    return;
-  treenext(tree->llink, next);
-  treenext(tree->rlink, next);
-  next[tree->label] = tree->trie;
-}
-
-/* Compute the shift for each trie node, as well as the delta
-   table and next cache for the given keyword set. */
-const char *
-kwsprep (kwset_t kws)
-{
-  register struct kwset *kwset;
-  register int i;
-  register struct trie *curr;
-  register unsigned char const *trans;
-  unsigned char delta[NCHAR];
-
-  kwset = (struct kwset *) kws;
-
-  /* Initial values for the delta table; will be changed later.  The
-     delta entry for a given character is the smallest depth of any
-     node at which an outgoing edge is labeled by that character. */
-  memset(delta, kwset->mind < UCHAR_MAX ? kwset->mind : UCHAR_MAX, NCHAR);
-
-  /* Check if we can use the simple boyer-moore algorithm, instead
-     of the hairy commentz-walter algorithm. */
-  if (kwset->words == 1 && kwset->trans == NULL)
-    {
-      char c;
-
-      /* Looking for just one string.  Extract it from the trie. */
-      kwset->target = obstack_alloc(&kwset->obstack, kwset->mind);
-      if (!kwset->target)
-	return "memory exhausted";
-      for (i = kwset->mind - 1, curr = kwset->trie; i >= 0; --i)
-	{
-	  kwset->target[i] = curr->links->label;
-	  curr = curr->links->trie;
-	}
-      /* Build the Boyer Moore delta.  Boy that's easy compared to CW. */
-      for (i = 0; i < kwset->mind; ++i)
-	delta[U(kwset->target[i])] = kwset->mind - (i + 1);
-      /* Find the minimal delta2 shift that we might make after
-	 a backwards match has failed. */
-      c = kwset->target[kwset->mind - 1];
-      for (i = kwset->mind - 2; i >= 0; --i)
-	if (kwset->target[i] == c)
-	  break;
-      kwset->mind2 = kwset->mind - (i + 1);
-    }
-  else
-    {
-      register struct trie *fail;
-      struct trie *last, *next[NCHAR];
-
-      /* Traverse the nodes of the trie in level order, simultaneously
-	 computing the delta table, failure function, and shift function. */
-      for (curr = last = kwset->trie; curr; curr = curr->next)
-	{
-	  /* Enqueue the immediate descendants in the level order queue. */
-	  enqueue(curr->links, &last);
-
-	  curr->shift = kwset->mind;
-	  curr->maxshift = kwset->mind;
-
-	  /* Update the delta table for the descendants of this node. */
-	  treedelta(curr->links, curr->depth, delta);
-
-	  /* Compute the failure function for the descendants of this node. */
-	  treefails(curr->links, curr->fail, kwset->trie);
-
-	  /* Update the shifts at each node in the current node's chain
-	     of fails back to the root. */
-	  for (fail = curr->fail; fail; fail = fail->fail)
-	    {
-	      /* If the current node has some outgoing edge that the fail
-		 doesn't, then the shift at the fail should be no larger
-		 than the difference of their depths. */
-	      if (!hasevery(fail->links, curr->links))
-		if (curr->depth - fail->depth < fail->shift)
-		  fail->shift = curr->depth - fail->depth;
-
-	      /* If the current node is accepting then the shift at the
-		 fail and its descendants should be no larger than the
-		 difference of their depths. */
-	      if (curr->accepting && fail->maxshift > curr->depth - fail->depth)
-		fail->maxshift = curr->depth - fail->depth;
-	    }
-	}
-
-      /* Traverse the trie in level order again, fixing up all nodes whose
-	 shift exceeds their inherited maxshift. */
-      for (curr = kwset->trie->next; curr; curr = curr->next)
-	{
-	  if (curr->maxshift > curr->parent->maxshift)
-	    curr->maxshift = curr->parent->maxshift;
-	  if (curr->shift > curr->maxshift)
-	    curr->shift = curr->maxshift;
-	}
-
-      /* Create a vector, indexed by character code, of the outgoing links
-	 from the root node. */
-      for (i = 0; i < NCHAR; ++i)
-	next[i] = NULL;
-      treenext(kwset->trie->links, next);
-
-      if ((trans = kwset->trans) != NULL)
-	for (i = 0; i < NCHAR; ++i)
-	  kwset->next[i] = next[U(trans[i])];
-      else
-	COPY_ARRAY(kwset->next, next, NCHAR);
-    }
-
-  /* Fix things up for any translation table. */
-  if ((trans = kwset->trans) != NULL)
-    for (i = 0; i < NCHAR; ++i)
-      kwset->delta[i] = delta[U(trans[i])];
-  else
-    memcpy(kwset->delta, delta, NCHAR);
-
-  return NULL;
-}
-
-/* Fast boyer-moore search. */
-static size_t
-bmexec (kwset_t kws, char const *text, size_t size)
-{
-  struct kwset const *kwset;
-  register unsigned char const *d1;
-  register char const *ep, *sp, *tp;
-  register int d, gc, i, len, md2;
-
-  kwset = (struct kwset const *) kws;
-  len = kwset->mind;
-
-  if (len == 0)
-    return 0;
-  if (len > size)
-    return -1;
-  if (len == 1)
-    {
-      tp = memchr (text, kwset->target[0], size);
-      return tp ? tp - text : -1;
-    }
-
-  d1 = kwset->delta;
-  sp = kwset->target + len;
-  gc = U(sp[-2]);
-  md2 = kwset->mind2;
-  tp = text + len;
-
-  /* Significance of 12: 1 (initial offset) + 10 (skip loop) + 1 (md2). */
-  if (size > 12 * len)
-    /* 11 is not a bug, the initial offset happens only once. */
-    for (ep = text + size - 11 * len;;)
-      {
-	while (tp <= ep)
-	  {
-	    d = d1[U(tp[-1])], tp += d;
-	    d = d1[U(tp[-1])], tp += d;
-	    if (d == 0)
-	      goto found;
-	    d = d1[U(tp[-1])], tp += d;
-	    d = d1[U(tp[-1])], tp += d;
-	    d = d1[U(tp[-1])], tp += d;
-	    if (d == 0)
-	      goto found;
-	    d = d1[U(tp[-1])], tp += d;
-	    d = d1[U(tp[-1])], tp += d;
-	    d = d1[U(tp[-1])], tp += d;
-	    if (d == 0)
-	      goto found;
-	    d = d1[U(tp[-1])], tp += d;
-	    d = d1[U(tp[-1])], tp += d;
-	  }
-	break;
-      found:
-	if (U(tp[-2]) == gc)
-	  {
-	    for (i = 3; i <= len && U(tp[-i]) == U(sp[-i]); ++i)
-	      ;
-	    if (i > len)
-	      return tp - len - text;
-	  }
-	tp += md2;
-      }
-
-  /* Now we have only a few characters left to search.  We
-     carefully avoid ever producing an out-of-bounds pointer. */
-  ep = text + size;
-  d = d1[U(tp[-1])];
-  while (d <= ep - tp)
-    {
-      d = d1[U((tp += d)[-1])];
-      if (d != 0)
-	continue;
-      if (U(tp[-2]) == gc)
-	{
-	  for (i = 3; i <= len && U(tp[-i]) == U(sp[-i]); ++i)
-	    ;
-	  if (i > len)
-	    return tp - len - text;
-	}
-      d = md2;
-    }
-
-  return -1;
-}
-
-/* Hairy multiple string search. */
-static size_t
-cwexec (kwset_t kws, char const *text, size_t len, struct kwsmatch *kwsmatch)
-{
-  struct kwset const *kwset;
-  struct trie * const *next;
-  struct trie const *trie;
-  struct trie const *accept;
-  char const *beg, *lim, *mch, *lmch;
-  register unsigned char c;
-  register unsigned char const *delta;
-  register int d;
-  register char const *end, *qlim;
-  register struct tree const *tree;
-  register unsigned char const *trans;
-
-  accept = NULL;
-
-  /* Initialize register copies and look for easy ways out. */
-  kwset = (struct kwset *) kws;
-  if (len < kwset->mind)
-    return -1;
-  next = kwset->next;
-  delta = kwset->delta;
-  trans = kwset->trans;
-  lim = text + len;
-  end = text;
-  if ((d = kwset->mind) != 0)
-    mch = NULL;
-  else
-    {
-      mch = text, accept = kwset->trie;
-      goto match;
-    }
-
-  if (len >= 4 * kwset->mind)
-    qlim = lim - 4 * kwset->mind;
-  else
-    qlim = NULL;
-
-  while (lim - end >= d)
-    {
-      if (qlim && end <= qlim)
-	{
-	  end += d - 1;
-	  while ((d = delta[c = *end]) && end < qlim)
-	    {
-	      end += d;
-	      end += delta[U(*end)];
-	      end += delta[U(*end)];
-	    }
-	  ++end;
-	}
-      else
-	d = delta[c = (end += d)[-1]];
-      if (d)
-	continue;
-      beg = end - 1;
-      trie = next[c];
-      if (trie->accepting)
-	{
-	  mch = beg;
-	  accept = trie;
-	}
-      d = trie->shift;
-      while (beg > text)
-	{
-	  c = trans ? trans[U(*--beg)] : *--beg;
-	  tree = trie->links;
-	  while (tree && c != tree->label)
-	    if (c < tree->label)
-	      tree = tree->llink;
-	    else
-	      tree = tree->rlink;
-	  if (tree)
-	    {
-	      trie = tree->trie;
-	      if (trie->accepting)
-		{
-		  mch = beg;
-		  accept = trie;
-		}
-	    }
-	  else
-	    break;
-	  d = trie->shift;
-	}
-      if (mch)
-	goto match;
-    }
-  return -1;
-
- match:
-  /* Given a known match, find the longest possible match anchored
-     at or before its starting point.  This is nearly a verbatim
-     copy of the preceding main search loops. */
-  if (lim - mch > kwset->maxd)
-    lim = mch + kwset->maxd;
-  lmch = NULL;
-  d = 1;
-  while (lim - end >= d)
-    {
-      if ((d = delta[c = (end += d)[-1]]) != 0)
-	continue;
-      beg = end - 1;
-      if (!(trie = next[c]))
-	{
-	  d = 1;
-	  continue;
-	}
-      if (trie->accepting && beg <= mch)
-	{
-	  lmch = beg;
-	  accept = trie;
-	}
-      d = trie->shift;
-      while (beg > text)
-	{
-	  c = trans ? trans[U(*--beg)] : *--beg;
-	  tree = trie->links;
-	  while (tree && c != tree->label)
-	    if (c < tree->label)
-	      tree = tree->llink;
-	    else
-	      tree = tree->rlink;
-	  if (tree)
-	    {
-	      trie = tree->trie;
-	      if (trie->accepting && beg <= mch)
-		{
-		  lmch = beg;
-		  accept = trie;
-		}
-	    }
-	  else
-	    break;
-	  d = trie->shift;
-	}
-      if (lmch)
-	{
-	  mch = lmch;
-	  goto match;
-	}
-      if (!d)
-	d = 1;
-    }
-
-  if (kwsmatch)
-    {
-      kwsmatch->index = accept->accepting / 2;
-      kwsmatch->offset[0] = mch - text;
-      kwsmatch->size[0] = accept->depth;
-    }
-  return mch - text;
-}
-
-/* Search through the given text for a match of any member of the
-   given keyword set.  Return a pointer to the first character of
-   the matching substring, or NULL if no match is found.  If FOUNDLEN
-   is non-NULL store in the referenced location the length of the
-   matching substring.  Similarly, if FOUNDIDX is non-NULL, store
-   in the referenced location the index number of the particular
-   keyword matched. */
-size_t
-kwsexec (kwset_t kws, char const *text, size_t size,
-	 struct kwsmatch *kwsmatch)
-{
-  struct kwset const *kwset = (struct kwset *) kws;
-  if (kwset->words == 1 && kwset->trans == NULL)
-    {
-      size_t ret = bmexec (kws, text, size);
-      if (kwsmatch != NULL && ret != (size_t) -1)
-	{
-	  kwsmatch->index = 0;
-	  kwsmatch->offset[0] = ret;
-	  kwsmatch->size[0] = kwset->mind;
-	}
-      return ret;
-    }
-  else
-    return cwexec(kws, text, size, kwsmatch);
-}
-
-/* Free the components of the given keyword set. */
-void
-kwsfree (kwset_t kws)
-{
-  struct kwset *kwset;
-
-  kwset = (struct kwset *) kws;
-  obstack_free(&kwset->obstack, NULL);
-  free(kws);
-}
diff --git a/kwset.h b/kwset.h
deleted file mode 100644
index f50ecae573..0000000000
--- a/kwset.h
+++ /dev/null
@@ -1,65 +0,0 @@
-#ifndef KWSET_H
-#define KWSET_H
-
-/* This file has been copied from commit e7ac713d^ in the GNU grep git
- * repository. A few small changes have been made to adapt the code to
- * Git.
- */
-
-/* kwset.h - header declaring the keyword set library.
-   Copyright (C) 1989, 1998, 2005 Free Software Foundation, Inc.
-
-   This program is free software; you can redistribute it and/or modify
-   it under the terms of the GNU General Public License as published by
-   the Free Software Foundation; either version 2, or (at your option)
-   any later version.
-
-   This program is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-   GNU General Public License for more details.
-
-   You should have received a copy of the GNU General Public License
-   along with this program; if not, see <http://www.gnu.org/licenses/>. */
-
-/* Written August 1989 by Mike Haertel.
-   The author may be reached (Email) at the address mike@ai.mit.edu,
-   or (US mail) as Mike Haertel c/o Free Software Foundation. */
-
-struct kwsmatch
-{
-  int index;			/* Index number of matching keyword. */
-  size_t offset[1];		/* Offset of each submatch. */
-  size_t size[1];		/* Length of each submatch. */
-};
-
-struct kwset_t;
-typedef struct kwset_t* kwset_t;
-
-/* Return an opaque pointer to a newly allocated keyword set, or NULL
-   if enough memory cannot be obtained.  The argument if non-NULL
-   specifies a table of character translations to be applied to all
-   pattern and search text. */
-kwset_t kwsalloc(unsigned char const *);
-
-/* Incrementally extend the keyword set to include the given string.
-   Return NULL for success, or an error message.  Remember an index
-   number for each keyword included in the set. */
-const char *kwsincr(kwset_t, char const *, size_t);
-
-/* When the keyword set has been completely built, prepare it for
-   use.  Return NULL for success, or an error message. */
-const char *kwsprep(kwset_t);
-
-/* Search through the given buffer for a member of the keyword set.
-   Return a pointer to the leftmost longest match found, or NULL if
-   no match is found.  If foundlen is non-NULL, store the length of
-   the matching substring in the integer it points to.  Similarly,
-   if foundindex is non-NULL, store the index of the particular
-   keyword found therein. */
-size_t kwsexec(kwset_t, char const *, size_t, struct kwsmatch *);
-
-/* Deallocate the given keyword set and all its associated storage. */
-void kwsfree(kwset_t);
-
-#endif /* KWSET_H */
-- 
2.30.0.284.gd98b1dd5eaa7


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 23/25] xdiff-interface: allow early return from xdiff_emit_{line,hunk}_fn
  2021-02-03  3:27 [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch] Ævar Arnfjörð Bjarmason
                   ` (21 preceding siblings ...)
  2021-02-03  3:28 ` [PATCH 22/25] Remove unused kwset.[ch] Ævar Arnfjörð Bjarmason
@ 2021-02-03  3:28 ` Ævar Arnfjörð Bjarmason
  2021-02-03  3:28 ` [PATCH 24/25] xdiff-interface: support early exit in xdiff_outf() Ævar Arnfjörð Bjarmason
                   ` (25 subsequent siblings)
  48 siblings, 0 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-03  3:28 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Change the function prototype of xdiff_emit_{line,hunk}_fn to return
an int instead of void. This will allow for returning early from hunk
& diff consumers that want some of the data, but not all of it.

No behavior is being changed here: we just replace the equivalent of
"return" with "return 0", and nothing acts on the changed return
values yet.

There was some work in this area of xdiff-interface.[ch] recently with
3b40a090fd4 (diff: avoid generating unused hunk header lines,
2018-11-02) and 7c61e25fbf1 (diff: use hunk callback for word-diff,
2018-11-02).

In combination those two changes allow us to skip doing any work on
the hunks and diff at all, but they didn't change the status quo for
consumers that e.g. want the diff lines, but might want to abort
early.

With this change we can instead abort e.g. on the first "-line" of a
1000 line diff if that's all we need.
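
A minimal sketch of the new contract, under simplifying assumptions:
emit_lines(), stop_at_first_minus() and struct first_minus are
hypothetical names, not Git's API, and lines are plain NUL-terminated
strings rather than xdiff buffers. The driver hands each line to the
callback and returns -1 as soon as the callback returns non-zero, so a
consumer that only wants the first "-" line never sees the rest.

```c
#include <assert.h>
#include <string.h>

/* Same shape as the new callback type: non-zero means "stop". */
typedef int (*line_fn)(void *priv, const char *line, unsigned long len);

/* Hypothetical driver: returns -1 if the callback aborted, else 0. */
static int emit_lines(const char **lines, int nr, line_fn fn, void *priv)
{
	int i;

	assert(fn);
	for (i = 0; i < nr; i++)
		if (fn(priv, lines[i], strlen(lines[i])))
			return -1; /* stop handing out lines */
	return 0;
}

struct first_minus {
	int seen;  /* lines handed to the callback so far */
	int found; /* set once a "-" line has been seen */
};

/* Hypothetical consumer that only cares about the first "-" line. */
static int stop_at_first_minus(void *priv, const char *line, unsigned long len)
{
	struct first_minus *data = priv;

	data->seen++;
	if (len && line[0] == '-') {
		data->found = 1;
		return -1; /* we have what we need */
	}
	return 0;
}
```

With the input { " context", "-old line", "+new line", " more" } the
callback is called twice and the driver returns -1; the last two lines
are never looked at.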

This interface is rather scary, as noted in the comment being added to
xdiff-interface.h here, but it will be useful for diffcore-pickaxe.c
in a subsequent commit. A future change could e.g. add more exit
codes, and hack xdl_emit_diff() and friends to ignore or skip things
more selectively as a result.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 combine-diff.c     |  9 ++++++---
 diff.c             | 39 +++++++++++++++++++++++----------------
 diffcore-pickaxe.c |  7 ++++---
 range-diff.c       |  8 +++++---
 xdiff-interface.c  | 10 ++++++----
 xdiff-interface.h  | 31 +++++++++++++++++++++++--------
 6 files changed, 67 insertions(+), 37 deletions(-)

diff --git a/combine-diff.c b/combine-diff.c
index 9228aebc16..6590c4b5fb 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -369,7 +369,7 @@ struct combine_diff_state {
 	struct sline *lost_bucket;
 };
 
-static void consume_hunk(void *state_,
+static int consume_hunk(void *state_,
 			 long ob, long on,
 			 long nb, long nn,
 			 const char *funcline, long funclen)
@@ -401,13 +401,15 @@ static void consume_hunk(void *state_,
 		state->sline[state->nb-1].p_lno =
 			xcalloc(state->num_parent, sizeof(unsigned long));
 	state->sline[state->nb-1].p_lno[state->n] = state->ob;
+
+	return 0;
 }
 
-static void consume_line(void *state_, char *line, unsigned long len)
+static int consume_line(void *state_, char *line, unsigned long len)
 {
 	struct combine_diff_state *state = state_;
 	if (!state->lost_bucket)
-		return; /* not in any hunk yet */
+		return 0; /* not in any hunk yet */
 	switch (line[0]) {
 	case '-':
 		append_lost(state->lost_bucket, state->n, line+1, len-1);
@@ -417,6 +419,7 @@ static void consume_line(void *state_, char *line, unsigned long len)
 		state->lno++;
 		break;
 	}
+	return 0;
 }
 
 static void combine_diff(struct repository *r,
diff --git a/diff.c b/diff.c
index 69e3bc00ed..bdedf9fdfa 100644
--- a/diff.c
+++ b/diff.c
@@ -1996,10 +1996,10 @@ static int color_words_output_graph_prefix(struct diff_words_data *diff_words)
 	}
 }
 
-static void fn_out_diff_words_aux(void *priv,
-				  long minus_first, long minus_len,
-				  long plus_first, long plus_len,
-				  const char *func, long funclen)
+static int fn_out_diff_words_aux(void *priv,
+				 long minus_first, long minus_len,
+				 long plus_first, long plus_len,
+				 const char *func, long funclen)
 {
 	struct diff_words_data *diff_words = priv;
 	struct diff_words_style *style = diff_words->style;
@@ -2047,6 +2047,8 @@ static void fn_out_diff_words_aux(void *priv,
 
 	diff_words->current_plus = plus_end;
 	diff_words->last_minus = minus_first;
+
+	return 0;
 }
 
 /* This function starts looking at *begin, and returns 0 iff a word was found. */
@@ -2338,7 +2340,7 @@ static void find_lno(const char *line, struct emit_callback *ecbdata)
 	ecbdata->lno_in_postimage = strtol(p + 1, NULL, 10);
 }
 
-static void fn_out_consume(void *priv, char *line, unsigned long len)
+static int fn_out_consume(void *priv, char *line, unsigned long len)
 {
 	struct emit_callback *ecbdata = priv;
 	struct diff_options *o = ecbdata->opt;
@@ -2374,7 +2376,7 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 		len = sane_truncate_line(line, len);
 		find_lno(line, ecbdata);
 		emit_hunk_header(ecbdata, line, len);
-		return;
+		return 0;
 	}
 
 	if (ecbdata->diff_words) {
@@ -2384,11 +2386,11 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 		if (line[0] == '-') {
 			diff_words_append(line, len,
 					  &ecbdata->diff_words->minus);
-			return;
+			return 0;
 		} else if (line[0] == '+') {
 			diff_words_append(line, len,
 					  &ecbdata->diff_words->plus);
-			return;
+			return 0;
 		} else if (starts_with(line, "\\ ")) {
 			/*
 			 * Eat the "no newline at eof" marker as if we
@@ -2397,11 +2399,11 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 			 * defer processing. If this is the end of
 			 * preimage, more "+" lines may come after it.
 			 */
-			return;
+			return 0;
 		}
 		diff_words_flush(ecbdata);
 		emit_diff_symbol(o, s, line, len, 0);
-		return;
+		return 0;
 	}
 
 	switch (line[0]) {
@@ -2425,6 +2427,7 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 				 line, len, 0);
 		break;
 	}
+	return 0;
 }
 
 static void pprint_rename(struct strbuf *name, const char *a, const char *b)
@@ -2524,7 +2527,7 @@ static struct diffstat_file *diffstat_add(struct diffstat_t *diffstat,
 	return x;
 }
 
-static void diffstat_consume(void *priv, char *line, unsigned long len)
+static int diffstat_consume(void *priv, char *line, unsigned long len)
 {
 	struct diffstat_t *diffstat = priv;
 	struct diffstat_file *x = diffstat->files[diffstat->nr - 1];
@@ -2533,6 +2536,7 @@ static void diffstat_consume(void *priv, char *line, unsigned long len)
 		x->added++;
 	else if (line[0] == '-')
 		x->deleted++;
+	return 0;
 }
 
 const char mime_boundary_leader[] = "------------";
@@ -3201,16 +3205,17 @@ static int is_conflict_marker(const char *line, int marker_size, unsigned long l
 	return 1;
 }
 
-static void checkdiff_consume_hunk(void *priv,
+static int checkdiff_consume_hunk(void *priv,
 				   long ob, long on, long nb, long nn,
 				   const char *func, long funclen)
 
 {
 	struct checkdiff_t *data = priv;
 	data->lineno = nb - 1;
+	return 0;
 }
 
-static void checkdiff_consume(void *priv, char *line, unsigned long len)
+static int checkdiff_consume(void *priv, char *line, unsigned long len)
 {
 	struct checkdiff_t *data = priv;
 	int marker_size = data->conflict_marker_size;
@@ -3234,7 +3239,7 @@ static void checkdiff_consume(void *priv, char *line, unsigned long len)
 		}
 		bad = ws_check(line + 1, len - 1, data->ws_rule);
 		if (!bad)
-			return;
+			return 0;
 		data->status |= bad;
 		err = whitespace_error_string(bad);
 		fprintf(data->o->file, "%s%s:%d: %s.\n",
@@ -3246,6 +3251,7 @@ static void checkdiff_consume(void *priv, char *line, unsigned long len)
 	} else if (line[0] == ' ') {
 		data->lineno++;
 	}
+	return 0;
 }
 
 static unsigned char *deflate_it(char *data,
@@ -6098,17 +6104,18 @@ void flush_one_hunk(struct object_id *result, git_hash_ctx *ctx)
 	}
 }
 
-static void patch_id_consume(void *priv, char *line, unsigned long len)
+static int patch_id_consume(void *priv, char *line, unsigned long len)
 {
 	struct patch_id_t *data = priv;
 	int new_len;
 
 	if (len > 12 && starts_with(line, "\\ "))
-		return;
+		return 0;
 	new_len = remove_space(line, len);
 
 	the_hash_algo->update_fn(data->ctx, line, new_len);
 	data->patchlen += new_len;
+	return 0;
 }
 
 static void patch_id_add_string(git_hash_ctx *ctx, const char *str)
diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index 25ab1b2427..21f9d66b6a 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -19,7 +19,7 @@ struct diffgrep_cb {
 	int hit;
 };
 
-static void diffgrep_consume(void *priv, char *line, unsigned long len)
+static int diffgrep_consume(void *priv, char *line, unsigned long len)
 {
 	struct diffgrep_cb *data = priv;
 	regmatch_t regmatch;
@@ -27,14 +27,15 @@ static void diffgrep_consume(void *priv, char *line, unsigned long len)
 	struct grep_pat *grep_pat = grep_filter->pattern_list;
 
 	if (line[0] != '+' && line[0] != '-')
-		return;
+		return 0;
 	if (data->hit)
 		/*
 		 * NEEDSWORK: we should have a way to terminate the
 		 * caller early.
 		 */
-		return;
+		return 0;
 	data->hit = patmatch(grep_pat, line + 1, line + len + 1, &regmatch, 0);
+	return 0;
 }
 
 static int diff_grep(mmfile_t *one, mmfile_t *two,
diff --git a/range-diff.c b/range-diff.c
index b9950f10c8..cb40f67a26 100644
--- a/range-diff.c
+++ b/range-diff.c
@@ -267,15 +267,17 @@ static void find_exact_matches(struct string_list *a, struct string_list *b)
 	hashmap_clear(&map);
 }
 
-static void diffsize_consume(void *data, char *line, unsigned long len)
+static int diffsize_consume(void *data, char *line, unsigned long len)
 {
 	(*(int *)data)++;
+	return 0;
 }
 
-static void diffsize_hunk(void *data, long ob, long on, long nb, long nn,
-			  const char *funcline, long funclen)
+static int diffsize_hunk(void *data, long ob, long on, long nb, long nn,
+			 const char *funcline, long funclen)
 {
 	diffsize_consume(data, NULL, 0);
+	return 0;
 }
 
 static int diffsize(const char *a, const char *b)
diff --git a/xdiff-interface.c b/xdiff-interface.c
index 4d20069302..ef557dc4e6 100644
--- a/xdiff-interface.c
+++ b/xdiff-interface.c
@@ -31,7 +31,7 @@ static int xdiff_out_hunk(void *priv_,
 	return 0;
 }
 
-static void consume_one(void *priv_, char *s, unsigned long size)
+static int consume_one(void *priv_, char *s, unsigned long size)
 {
 	struct xdiff_emit_state *priv = priv_;
 	char *ep;
@@ -43,6 +43,7 @@ static void consume_one(void *priv_, char *s, unsigned long size)
 		size -= this_size;
 		s += this_size;
 	}
+	return 0;
 }
 
 static int xdiff_outf(void *priv_, mmbuffer_t *mb, int nbuf)
@@ -115,10 +116,11 @@ int xdi_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp, xdemitconf_t co
 	return xdl_diff(&a, &b, xpp, xecfg, xecb);
 }
 
-void discard_hunk_line(void *priv,
-		       long ob, long on, long nb, long nn,
-		       const char *func, long funclen)
+int discard_hunk_line(void *priv,
+		      long ob, long on, long nb, long nn,
+		      const char *func, long funclen)
 {
+	return 0;
 }
 
 int xdi_diff_outf(mmfile_t *mf1, mmfile_t *mf2,
diff --git a/xdiff-interface.h b/xdiff-interface.h
index 93df26900c..1b27d6104c 100644
--- a/xdiff-interface.h
+++ b/xdiff-interface.h
@@ -11,11 +11,26 @@
  */
 #define MAX_XDIFF_SIZE (1024UL * 1024 * 1023)
 
-typedef void (*xdiff_emit_line_fn)(void *, char *, unsigned long);
-typedef void (*xdiff_emit_hunk_fn)(void *data,
-				   long old_begin, long old_nr,
-				   long new_begin, long new_nr,
-				   const char *func, long funclen);
+/*
+ * The xdiff_emit_{line,hunk}_fn consumers can return -1 to abort
+ * early, or 0 to continue processing. Note that doing so is an
+ * all-or-nothing affair, as returning -1 will return all the way to
+ * the top-level, e.g. the xdi_diff_outf() call to generate the diff.
+ *
+ * Thus returning -1 from a hunk header callback means you won't be
+ * getting any more hunks, or diffs, and likewise returning from a
+ * line callback means you won't be getting anymore lines.
+ *
+ * We may extend the interface in the future to understand other more
+ * granular return values, but for now use it carefully, or consider
+ * e.g. using discard_hunk_line() if you say just don't care about
+ * hunk headers.
+ */
+typedef int (*xdiff_emit_line_fn)(void *, char *, unsigned long);
+typedef int (*xdiff_emit_hunk_fn)(void *data,
+				  long old_begin, long old_nr,
+				  long new_begin, long new_nr,
+				  const char *func, long funclen);
 
 int xdi_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp, xdemitconf_t const *xecfg, xdemitcb_t *ecb);
 int xdi_diff_outf(mmfile_t *mf1, mmfile_t *mf2,
@@ -36,9 +51,9 @@ extern int git_xmerge_style;
  * Can be used as a no-op hunk_fn for xdi_diff_outf(), since a NULL
  * one just sends the hunk line to the line_fn callback).
  */
-void discard_hunk_line(void *priv,
-		       long ob, long on, long nb, long nn,
-		       const char *func, long funclen);
+int discard_hunk_line(void *priv,
+		      long ob, long on, long nb, long nn,
+		      const char *func, long funclen);
 
 /*
  * Compare the strings l1 with l2 which are of size s1 and s2 respectively.
-- 
2.30.0.284.gd98b1dd5eaa7


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 24/25] xdiff-interface: support early exit in xdiff_outf()
  2021-02-03  3:27 [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch] Ævar Arnfjörð Bjarmason
                   ` (22 preceding siblings ...)
  2021-02-03  3:28 ` [PATCH 23/25] xdiff-interface: allow early return from xdiff_emit_{line,hunk}_fn Ævar Arnfjörð Bjarmason
@ 2021-02-03  3:28 ` Ævar Arnfjörð Bjarmason
  2021-02-04 18:16   ` Junio C Hamano
  2021-02-03  3:28 ` [PATCH 25/25] pickaxe -G: terminate early on matching lines Ævar Arnfjörð Bjarmason
                   ` (24 subsequent siblings)
  48 siblings, 1 reply; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-03  3:28 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Bridge the gap between the preceding "xdiff-interface: allow early
return from xdiff_emit_{line,hunk}_fn" change and the public
interface. This change was split off from the rest as it wasn't a
purely mechanical addition of "return 0".

Here we want to be able to abort early, but do so in a way that
doesn't skip the appropriate strbuf_reset() invocations.

Using -1 as the return value for an early return follows the
convention of the xdiff codebase, as we'll see more of in subsequent
commits.
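
The strbuf_reset() concern can be sketched in isolation. This is a
simplified model, not the real xdiff_outf(): struct emit_state, outf()
and stop_at_minus() are hypothetical, a fixed-size array stands in for
the strbuf, and chunks are NUL-terminated strings. Chunks that don't
end in a newline are buffered, and the buffered remainder is cleared
on every path out, including an early stop.

```c
#include <assert.h>
#include <string.h>

typedef int (*line_fn)(void *priv, const char *line, size_t len);

/* Hypothetical state; a fixed array stands in for a strbuf. */
struct emit_state {
	char remainder[256];
	size_t remainder_len;
	line_fn fn;
	void *priv;
};

static void remainder_add(struct emit_state *st, const char *s, size_t len)
{
	/* This sketch does not grow the buffer. */
	assert(st->remainder_len + len <= sizeof(st->remainder));
	memcpy(st->remainder + st->remainder_len, s, len);
	st->remainder_len += len;
}

/* Hand each '\n'-terminated line in s to the callback; -1 if it stops. */
static int consume_one(struct emit_state *st, const char *s, size_t size)
{
	while (size) {
		const char *ep = memchr(s, '\n', size);
		size_t this_size = ep ? (size_t)(ep - s + 1) : size;

		if (st->fn(st->priv, s, this_size))
			return -1;
		size -= this_size;
		s += this_size;
	}
	return 0;
}

static int outf(struct emit_state *st, const char **bufs, int nbuf)
{
	int i, stop = 0;

	for (i = 0; i < nbuf && !stop; i++) {
		size_t len = strlen(bufs[i]);

		if (!len)
			continue;
		if (bufs[i][len - 1] != '\n') {
			remainder_add(st, bufs[i], len); /* incomplete line */
			continue;
		}
		if (!st->remainder_len) {
			stop = consume_one(st, bufs[i], len);
			continue;
		}
		remainder_add(st, bufs[i], len);
		stop = consume_one(st, st->remainder, st->remainder_len);
		st->remainder_len = 0; /* reset even when stopping */
	}
	if (!stop && st->remainder_len)
		stop = consume_one(st, st->remainder, st->remainder_len);
	st->remainder_len = 0; /* never leave a stale partial line behind */
	return stop ? -1 : 0;
}

struct counter {
	int seen;
};

/* Example consumer: count lines, stop at the first "-" line. */
static int stop_at_minus(void *priv, const char *line, size_t len)
{
	struct counter *c = priv;

	c->seen++;
	return (len && line[0] == '-') ? -1 : 0;
}
```

Feeding the chunks "+a\n", "-b", "\n", " c\n" makes the callback see
"+a\n" and then the reassembled "-b\n", at which point it stops;
" c\n" is never delivered and the remainder buffer is left empty.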

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 xdiff-interface.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/xdiff-interface.c b/xdiff-interface.c
index ef557dc4e6..d066442470 100644
--- a/xdiff-interface.c
+++ b/xdiff-interface.c
@@ -39,7 +39,8 @@ static int consume_one(void *priv_, char *s, unsigned long size)
 		unsigned long this_size;
 		ep = memchr(s, '\n', size);
 		this_size = (ep == NULL) ? size : (ep - s + 1);
-		priv->line_fn(priv->consume_callback_data, s, this_size);
+		if (priv->line_fn(priv->consume_callback_data, s, this_size))
+			return -1;
 		size -= this_size;
 		s += this_size;
 	}
@@ -50,11 +51,14 @@ static int xdiff_outf(void *priv_, mmbuffer_t *mb, int nbuf)
 {
 	struct xdiff_emit_state *priv = priv_;
 	int i;
+	int stop = 0;
 
 	if (!priv->line_fn)
 		return 0;
 
 	for (i = 0; i < nbuf; i++) {
+		if (stop)
+			return -1;
 		if (mb[i].ptr[mb[i].size-1] != '\n') {
 			/* Incomplete line */
 			strbuf_add(&priv->remainder, mb[i].ptr, mb[i].size);
@@ -63,17 +67,21 @@ static int xdiff_outf(void *priv_, mmbuffer_t *mb, int nbuf)
 
 		/* we have a complete line */
 		if (!priv->remainder.len) {
-			consume_one(priv, mb[i].ptr, mb[i].size);
+			stop = consume_one(priv, mb[i].ptr, mb[i].size);
 			continue;
 		}
 		strbuf_add(&priv->remainder, mb[i].ptr, mb[i].size);
-		consume_one(priv, priv->remainder.buf, priv->remainder.len);
+		stop = consume_one(priv, priv->remainder.buf, priv->remainder.len);
 		strbuf_reset(&priv->remainder);
 	}
+	if (stop)
+		return -1;
 	if (priv->remainder.len) {
-		consume_one(priv, priv->remainder.buf, priv->remainder.len);
+		stop = consume_one(priv, priv->remainder.buf, priv->remainder.len);
 		strbuf_reset(&priv->remainder);
 	}
+	if (stop)
+		return -1;
 	return 0;
 }
 
-- 
2.30.0.284.gd98b1dd5eaa7


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 25/25] pickaxe -G: terminate early on matching lines
  2021-02-03  3:27 [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch] Ævar Arnfjörð Bjarmason
                   ` (23 preceding siblings ...)
  2021-02-03  3:28 ` [PATCH 24/25] xdiff-interface: support early exit in xdiff_outf() Ævar Arnfjörð Bjarmason
@ 2021-02-03  3:28 ` Ævar Arnfjörð Bjarmason
  2021-02-03 12:38 ` [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch] Ævar Arnfjörð Bjarmason
                   ` (23 subsequent siblings)
  48 siblings, 0 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-03  3:28 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Solve a long-standing TODO item for "git log -Grx": when we e.g. found
"+ str" in the diff and noted that we had a "hit", xdiff would still
diligently continue to generate and spew the rest of the diff at us.

The TODO item has been there since "git log -G" was implemented. See
f506b8e8b5f (git log/diff: add -G<regexp> that greps in the patch
text, 2010-08-23).

Our xdiff interface has also had the limitation of not being able to
abort early since its beginning, see d9ea73e0564 (combine-diff:
refactor built-in xdiff interface., 2006-04-05). At that time,
however, "xdiff_emit_line_fn" was called "xdiff_emit_consume_fn", and
"xdiff_emit_hunk_fn" didn't exist yet.

But now, with the support added in the preceding "xdiff-interface:
allow early return from xdiff_emit_{line,hunk}_fn" commit, we can
return early. We furthermore test the functionality of the new
early-exit xdiff-interface by having a BUG() call here that dies if it
ever starts handing us needless work again.
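
The way diff_grep() tells an early return apart from an xdiff error
can be sketched as follows. All names here are hypothetical stand-ins:
run_diff() models a driver that returns -1 both when a callback aborts
and on an internal error (simulated by a NULL line), and strstr()
replaces patmatch(). The callback records the match in its callback
data before returning -1, so the caller checks that flag first and
only treats a non-zero return as an error when no hit was recorded.

```c
#include <assert.h>
#include <string.h>

typedef int (*line_fn)(void *priv, const char *line, size_t len);

/* Hypothetical driver: -1 on callback abort *or* on an internal error. */
static int run_diff(const char **lines, int nr, line_fn fn, void *priv)
{
	int i;

	for (i = 0; i < nr; i++) {
		if (!lines[i])
			return -1; /* stand-in for an error inside xdiff */
		if (fn(priv, lines[i], strlen(lines[i])))
			return -1;
	}
	return 0;
}

struct grep_cb {
	const char *needle;
	int hit;
};

/* Only look at added/removed lines; note the hit, then abort. */
static int grep_consume(void *priv, const char *line, size_t len)
{
	struct grep_cb *data = priv;

	if (!len || (line[0] != '+' && line[0] != '-'))
		return 0;
	if (strstr(line + 1, data->needle)) {
		data->hit = 1;
		return -1; /* early return, not an error */
	}
	return 0;
}

/* 1 on a match, 0 on no match, -1 on a real error. */
static int diff_grep(const char **lines, int nr, const char *needle)
{
	struct grep_cb data = { needle, 0 };
	int ret = run_diff(lines, nr, grep_consume, &data);

	if (data.hit)
		return 1; /* the -1 was ours, not an error */
	return ret;
}
```

Checking data.hit before ret is the important ordering: the driver's
-1 alone cannot tell the two cases apart.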

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 diffcore-pickaxe.c | 29 +++++++++++++++++++----------
 xdiff-interface.h  |  5 +++++
 2 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index 21f9d66b6a..e773fa69a2 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -29,12 +29,11 @@ static int diffgrep_consume(void *priv, char *line, unsigned long len)
 	if (line[0] != '+' && line[0] != '-')
 		return 0;
 	if (data->hit)
-		/*
-		 * NEEDSWORK: we should have a way to terminate the
-		 * caller early.
-		 */
-		return 0;
-	data->hit = patmatch(grep_pat, line + 1, line + len + 1, &regmatch, 0);
+		BUG("Already matched in diffgrep_consume! Broken xdiff_emit_line_fn?");
+	if (patmatch(grep_pat, line + 1, line + len + 1, &regmatch, 0)) {
+		data->hit = 1;
+		return -1;
+	}
 	return 0;
 }
 
@@ -47,6 +46,7 @@ static int diff_grep(mmfile_t *one, mmfile_t *two,
 	xdemitconf_t xecfg;
 	regmatch_t regmatch;
 	struct grep_pat *grep_pat = grep_filter->pattern_list;
+	int ret;
 
 	if (!one)
 		return patmatch(grep_pat, two->ptr, two->ptr + two->size,
@@ -65,10 +65,19 @@ static int diff_grep(mmfile_t *one, mmfile_t *two,
 	ecbdata.hit = 0;
 	xecfg.ctxlen = 0;
 	xecfg.interhunkctxlen = o->interhunkcontext;
-	if (xdi_diff_outf(one, two, discard_hunk_line, diffgrep_consume,
-			  &ecbdata, &xpp, &xecfg))
-		return 0;
-	return ecbdata.hit;
+
+	/*
+	 * An xdiff error might be our "data->hit" from above. See the
+	 * comment for xdiff_emit_{line,hunk}_fn in xdiff-interface.h
+	 * for why.
+	 */
+	ret = xdi_diff_outf(one, two, discard_hunk_line, diffgrep_consume,
+			    &ecbdata, &xpp, &xecfg);
+	if (ecbdata.hit)
+		return 1;
+	if (ret)
+		return ret;
+	return 0;
 }
 
 static unsigned int contains(mmfile_t *mf, struct grep_opt *grep_filter)
diff --git a/xdiff-interface.h b/xdiff-interface.h
index 1b27d6104c..347d8a4425 100644
--- a/xdiff-interface.h
+++ b/xdiff-interface.h
@@ -25,6 +25,11 @@
  * granular return values, but for now use it carefully, or consider
  * e.g. using discard_hunk_line() if you say just don't care about
  * hunk headers.
+ *
+ * Note that just returning -1 will make your early return
+ * indistinguishable from an error internal to xdiff. See "diff_grep"
+ * in diffcore-pickaxe.c for a trick to work around this, i.e. using
+ * the "consume_callback_data" to note the desired early return.
  */
 typedef int (*xdiff_emit_line_fn)(void *, char *, unsigned long);
 typedef int (*xdiff_emit_hunk_fn)(void *data,
-- 
2.30.0.284.gd98b1dd5eaa7
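
The trick described in the new xdiff-interface.h comment can be illustrated outside of git. The following is a minimal, self-contained C sketch, not git's code. `emit_lines()`, `consume_line()`, `struct grep_state` and `diff_grep_like()` are hypothetical stand-ins for xdi_diff_outf(), diffgrep_consume(), the callback data and diff_grep(). The callback records the match in its private data before returning -1, so the caller can tell a deliberate early stop apart from an emitter error.

```c
#include <stddef.h>
#include <string.h>

/* Callback shape modelled on xdiff_emit_line_fn: nonzero means "stop". */
typedef int (*emit_line_fn)(void *priv, const char *line, size_t len);

/*
 * Hypothetical stand-in for xdi_diff_outf(): hands each line of `buf`
 * to `fn` and returns -1 as soon as `fn` asks to stop. A real emitter
 * would also return -1 for its own internal errors.
 */
static int emit_lines(const char *buf, emit_line_fn fn, void *priv)
{
	const char *p = buf;

	while (*p) {
		const char *eol = strchr(p, '\n');
		size_t len = eol ? (size_t)(eol - p) : strlen(p);

		if (fn(priv, p, len))
			return -1;
		p += len + (eol ? 1 : 0);
	}
	return 0;
}

struct grep_state {
	const char *needle;
	int hit;        /* set before the callback asks to stop */
	int lines_seen; /* how many lines the emitter delivered */
};

/* Bounded substring search, since `line` is not NUL-terminated at `len`. */
static int has_needle(const char *s, size_t len, const char *needle)
{
	size_t n = strlen(needle);

	for (size_t i = 0; i + n <= len; i++)
		if (!memcmp(s + i, needle, n))
			return 1;
	return 0;
}

/* Like diffgrep_consume(): only '+'/'-' lines count; stop at first hit. */
static int consume_line(void *priv, const char *line, size_t len)
{
	struct grep_state *st = priv;

	st->lines_seen++;
	if (!len || (line[0] != '+' && line[0] != '-'))
		return 0;
	if (has_needle(line + 1, len - 1, st->needle)) {
		st->hit = 1;
		return -1;
	}
	return 0;
}

/* Like diff_grep(): check our own flag before treating -1 as an error. */
static int diff_grep_like(const char *diff, const char *needle, int *lines_seen)
{
	struct grep_state st = { needle, 0, 0 };
	int ret = emit_lines(diff, consume_line, &st);

	if (lines_seen)
		*lines_seen = st.lines_seen;
	if (st.hit)
		return 1;   /* the -1 was our deliberate early stop */
	if (ret)
		return ret; /* a genuine emitter error */
	return 0;
}
```

The ordering in the caller is the whole point: the emitter's -1 on its own says nothing about why it stopped, so the flag has to be consulted first.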


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* Re: [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch]
  2021-02-03  3:27 [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch] Ævar Arnfjörð Bjarmason
                   ` (24 preceding siblings ...)
  2021-02-03  3:28 ` [PATCH 25/25] pickaxe -G: terminate early on matching lines Ævar Arnfjörð Bjarmason
@ 2021-02-03 12:38 ` Ævar Arnfjörð Bjarmason
  2021-02-16 11:57 ` [PATCH v2 00/22] pickaxe: test and refactoring for follow-up changes Ævar Arnfjörð Bjarmason
                   ` (22 subsequent siblings)
  48 siblings, 0 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-03 12:38 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason


I'm aware of a CI failure related to this series:

On Wed, Feb 03 2021, Ævar Arnfjörð Bjarmason wrote:

>   pickaxe tests: refactor to use test_commit --append

It's because here I fed "\0" etc. to "echo" instead of "printf", which
isn't portable. I've got a fix for this locally, but want to wait for
more comments before sending a re-roll.


* Re: [PATCH 22/25] Remove unused kwset.[ch]
       [not found]   ` <CAPUEspgBmuTBHVZWY9fRtjbHWBRr0zHravLL1Czepc6jmib4HA@mail.gmail.com>
@ 2021-02-03 14:13     ` Ævar Arnfjörð Bjarmason
       [not found]       ` <CAPUEsphN7QuSVsC1Tr4xE8yQgPTtpF7wL7zbk1crQU3n-5g6JQ@mail.gmail.com>
  0 siblings, 1 reply; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-03 14:13 UTC (permalink / raw)
  To: Carlo Arenas; +Cc: git, Junio C Hamano, Jeff King, Johannes Schindelin


On Wed, Feb 03 2021, Carlo Arenas wrote:

> this is still being used in Linux without Glibc (ex: alpine or void that
> use musl), and other OS that rely on the compat layer by default for legacy
> reasons (ex: macOS and Windows) or where PCRE2 is not widely used/available
> (ex: OpenBSD, NetBSD)

Are you perhaps confusing kwset with NO_REGEX=1 and compat/regex/*? I
only ask because that's a common "Linux without Glibc" && musl
fallback.

The kwset is not a fallback: until this series we use it
unconditionally on all platforms, regardless of libc etc.

Anyway, as far as PCRE v2 and compatibility go, there are no
compatibility concerns here that we haven't already addressed and
shipped in in-the-wild git releases since 48de2a768c (grep: remove the
kwset optimization, 2019-07-01) and b65abcafc7 (grep: use PCRE v2 for
optimized fixed-string search, 2019-07-01).

I.e. we haven't used kwset at all in the more commonly used grep code,
just C library regex + PCRE if it's available.

FWIW I think the commonly used packages on Windows and MacOS build with
PCRE v2, but that's just from memory, I don't use those myself.

> PS. hadn't yet tested this series, but thought it was a good idea to at
> least mention this as a FYI to make sure tradeoffs are well known and
> testing done as well.

Indeed, and a glance at the log & past list traffic shows we had a lot
of such portability fixes around pcre2 in the past.

But in this case we're just directing the pickaxe to use the existing
grep codepath, so I think we should be fine, apart from small stuff
like my screwing up shell portability in v1 (I sent a reply to the
cover letter about this).


* Re: [PATCH 19/25] pickaxe -G: set -U0 for diff generation
  2021-02-03  3:28 ` [PATCH 19/25] pickaxe -G: set -U0 for diff generation Ævar Arnfjörð Bjarmason
@ 2021-02-03 14:26   ` Ævar Arnfjörð Bjarmason
  2021-02-03 19:42     ` Junio C Hamano
  0 siblings, 1 reply; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-03 14:26 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason


On Wed, Feb 03 2021, Ævar Arnfjörð Bjarmason wrote:

> Set the equivalent of -U0 when generating diffs for "git log -G". As
> seen in diffgrep_consume() we ignore any lines that aren't the "+" and
> "-" lines, so the rest of the output wasn't being used.
>
> It turns out that we spent quite a bit of CPU just on this[1]:
>
>     Test                                             HEAD~             HEAD
>     -----------------------------------------------------------------------------------------
>     4209.2: git log -G'a' <limit-rev>..              0.60(0.54+0.06)   0.52(0.46+0.05) -13.3%
>     4209.8: git log -G'uncommon' <limit-rev>..       0.61(0.54+0.07)   0.53(0.47+0.06) -13.1%
>     4209.14: git log -G'[þæö]' <limit-rev>..         0.60(0.55+0.04)   0.56(0.48+0.04) -6.7%
>     4209.21: git log -i -G'a' <limit-rev>..          0.63(0.56+0.03)   0.54(0.48+0.05) -14.3%
>     4209.27: git log -i -G'uncommon' <limit-rev>..   0.61(0.55+0.05)   0.53(0.47+0.06) -13.1%
>     4209.33: git log -i -G'[þæö]' <limit-rev>..      0.61(0.53+0.07)   0.53(0.47+0.05) -13.1%
>
> I also experimented with setting diff.interHunkContext to 10, 100
> etc. As noted above it's useless for -G to have non-"+" and non-"-"
> lines for the matching itself, but there's going to be some sweet spot
> where if we can be handed bigger hunks at a time our matching might be
> faster.
>
> But alas, the results of that were:
>
>     Test                                             HEAD~2            HEAD~                    HEAD
>     ------------------------------------------------------------------------------------------------------------------
>     4209.2: git log -G'a' <limit-rev>..              0.61(0.53+0.07)   0.51(0.46+0.05) -16.4%   0.51(0.46+0.05) -16.4%
>     4209.8: git log -G'uncommon' <limit-rev>..       0.66(0.55+0.05)   0.53(0.48+0.04) -19.7%   0.52(0.49+0.03) -21.2%
>     4209.14: git log -G'[þæö]' <limit-rev>..         0.63(0.54+0.06)   0.51(0.44+0.07) -19.0%   0.52(0.46+0.06) -17.5%
>     4209.21: git log -i -G'a' <limit-rev>..          0.62(0.54+0.07)   0.51(0.46+0.04) -17.7%   0.53(0.45+0.07) -14.5%
>     4209.27: git log -i -G'uncommon' <limit-rev>..   0.62(0.56+0.06)   0.53(0.48+0.05) -14.5%   0.53(0.46+0.07) -14.5%
>     4209.33: git log -i -G'[þæö]' <limit-rev>..      0.63(0.57+0.03)   0.58(0.46+0.06) -7.9%    0.53(0.46+0.06) -15.9%
>
> I.e. maybe it's faster in some cases, but probably slower in general.
>
> Those results are going to be crappy because we're matching a line at
> a time, as opposed to some version of /m matching across the whole
> diff (if possible). So that approach might be worth revisiting in the
> future.
>
> 1. GIT_SKIP_TESTS="p4209.[1379] p4209.15 p4209.2[028] p4209.34" GIT_PERF_EXTRA= GIT_PERF_REPO=~/g/git/ GIT_PERF_REPEAT_COUNT=5 GIT_PERF_MAKE_OPTS='-j8 USE_LIBPCRE=Y CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre2/inst' ./run HEAD~ HEAD -- p4209-pickaxe.sh
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  diffcore-pickaxe.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
> index cb865c8b29..5161c81057 100644
> --- a/diffcore-pickaxe.c
> +++ b/diffcore-pickaxe.c
> @@ -60,7 +60,7 @@ static int diff_grep(mmfile_t *one, mmfile_t *two,
>  	memset(&xecfg, 0, sizeof(xecfg));
>  	ecbdata.regexp = regexp;
>  	ecbdata.hit = 0;
> -	xecfg.ctxlen = o->context;
> +	xecfg.ctxlen = 0;
>  	xecfg.interhunkctxlen = o->interhunkcontext;
>  	if (xdi_diff_outf(one, two, discard_hunk_line, diffgrep_consume,
>  			  &ecbdata, &xpp, &xecfg))

I've since discovered Junio's f01cae918f (diff: teach --stat/--numstat
to honor -U$num, 2011-09-22) (as an aside, we have no test for that
behavior).

I haven't looked carefully, but I don't think we'll have the same issue
here, as pickaxe currently doesn't care whether something is on the
"+" or "-" line. From a brief look at the diffstat edge cases, that
seems to be what differs based on -U<n> for the diffstat.


* Re: [PATCH 22/25] Remove unused kwset.[ch]
       [not found]       ` <CAPUEsphN7QuSVsC1Tr4xE8yQgPTtpF7wL7zbk1crQU3n-5g6JQ@mail.gmail.com>
@ 2021-02-03 16:45         ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-03 16:45 UTC (permalink / raw)
  To: Carlo Arenas; +Cc: git, Junio C Hamano, Jeff King, Johannes Schindelin


On Wed, Feb 03 2021, Carlo Arenas wrote:

> On Wed, Feb 3, 2021 at 6:13 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> wrote:
>
>>
>> On Wed, Feb 03 2021, Carlo Arenas wrote:
>>
>> > this is still being used in Linux without Glibc (ex: alpine or void that
>> > use musl), and other OS that rely on the compat layer by default for
>> legacy
>> > reasons (ex: macOS and Windows) or where PCRE2 is not widely
>> used/available
>> > (ex: OpenBSD, NetBSD)
>>
>> Are you perhaps confusing kwset with NO_REGEX=1 and compat/regex/*? I
>> just say that because that's a common "Linux without Glibc" && musl
>> fallback.
>>
>
> Indeed, my bad!
>
> FWIW I was not arguing against this patchset with my comment, and I agree
> with you that the less old unmaintainable code git has, the better.
> will give this patchset a spin as time allows and hopefully report back
> with some more useful feedback.

No worries, the whole fallback mechanism is quite confusing. One thing
we've got going for us is a hard dependency on REG_STARTEND.

I'm not doing this in this series, but I do have my eye on eventually
migrating other things that use regexec_buf() (and thus might need to
match across a \0) to the grep API, and dropping compat/regex/ in favor
of some version of compat/pcre2/.
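
For readers unfamiliar with REG_STARTEND: it tells regexec() to take the search bounds from pmatch[0] instead of calling strlen(), which is what lets a regex see past an embedded NUL. A minimal sketch, assuming a C library that defines REG_STARTEND (glibc does, for example; the #error guards against one that doesn't). `regexec_buf_sketch()` is a hypothetical name; it only mirrors the shape of git's own regexec_buf() wrapper.

```c
#include <regex.h>
#include <stddef.h>

#ifndef REG_STARTEND
#error "this sketch needs a regexec() that supports REG_STARTEND"
#endif

/*
 * Hypothetical helper with the same shape as git's regexec_buf():
 * pmatch[0] carries the bounds, so `buf` need not be NUL-terminated
 * and may contain NULs. `nmatch` must be at least 1.
 */
static int regexec_buf_sketch(const regex_t *re, const char *buf, size_t size,
			      size_t nmatch, regmatch_t pmatch[], int eflags)
{
	pmatch[0].rm_so = 0;
	pmatch[0].rm_eo = (regoff_t)size;
	return regexec(re, buf, nmatch, pmatch, eflags | REG_STARTEND);
}
```

With a plain regexec() call the haystack "a\0b" ends at the NUL, so a search for "b" fails; passing an explicit size of 3 through the helper finds it at offset 2.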


* Re: [PATCH 19/25] pickaxe -G: set -U0 for diff generation
  2021-02-03 14:26   ` Ævar Arnfjörð Bjarmason
@ 2021-02-03 19:42     ` Junio C Hamano
  0 siblings, 0 replies; 99+ messages in thread
From: Junio C Hamano @ 2021-02-03 19:42 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Schindelin, Carlo Marcelo Arenas Belón

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> I since discovered Junio's f01cae918f (diff: teach --stat/--numstat to
> honor -U$num, 2011-09-22) (as an aside we have no test for that
> behavior).
>
> I haven't looked carefully, but I don't think we'll have the same issue
> here, as pickaxe currently doesn't care about whether something is on
> the + or - line, when briefly looking at the diffstat edge cases it
> seems that's what differs based on -U<n> for the diffstat.

With -U0 or different <n> in general, the matching between preimage
and postimage may become different, and both -U3 (usual) and -U0 may
express the same change "correctly" from the point of view of a
program like "git apply", but humans would see them as different
patches, and "diffstat" that counts number of +/- would give
different results.  The patch IDs may also be different.  The old
commit was to pessimize the logic (because we do not need context
just to count +/- lines for the purpose of diffstat) to match human
expectations.  They expect "'diffstat' must be counting 'diff -p'
output" and we were counting "diff -p -U0" instead, resulting in
different numbers.

With internally using -U0, the updated "pickaxe -G" is likely to get
the same complaints: "'pickaxe -G<token>' found this commit, but in
the 'git show' output, the token does not seem to be affected".

You'd respond to "try 'git show -U0' and now you'd see the <token>",
but again that is probably breaking human expectations.


* Re: [PATCH 21/25] pickaxe: use PCREv2 for -G and -S
  2021-02-03  3:28 ` [PATCH 21/25] pickaxe: use PCREv2 for -G and -S Ævar Arnfjörð Bjarmason
@ 2021-02-03 20:44   ` Ævar Arnfjörð Bjarmason
  2021-02-04 18:11     ` Junio C Hamano
  2021-02-04 18:22   ` Junio C Hamano
  1 sibling, 1 reply; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-03 20:44 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason


On Wed, Feb 03 2021, Ævar Arnfjörð Bjarmason wrote:

>  void diffcore_pickaxe(struct diff_options *o)
>  {
>  	const char *needle = o->pickaxe;
>  	int opts = o->pickaxe_opts;
> -	regex_t regex, *regexp = NULL;
> -	kwset_t kws = NULL;
> +	struct grep_opt opt;
> +
> +	if (opts & (DIFF_PICKAXE_REGEX | DIFF_PICKAXE_KIND_GS_MASK)) {
> +		grep_init(&opt, the_repository, NULL);
> +#ifdef USE_LIBPCRE2
> +		grep_commit_pattern_type(GREP_PATTERN_TYPE_PCRE, &opt);
> +#else
> +		grep_commit_pattern_type(GREP_PATTERN_TYPE_ERE, &opt);
> +#endif
>  
> -	if (opts & (DIFF_PICKAXE_REGEX | DIFF_PICKAXE_KIND_G)) {
> -		int cflags = REG_EXTENDED | REG_NEWLINE;
>  		if (o->pickaxe_opts & DIFF_PICKAXE_IGNORE_CASE)
> -			cflags |= REG_ICASE;
> -		regcomp_or_die(&regex, needle, cflags);
> -		regexp = &regex;
> -	} else if (opts & DIFF_PICKAXE_KIND_S) {
> -		if (o->pickaxe_opts & DIFF_PICKAXE_IGNORE_CASE &&
> -		    has_non_ascii(needle)) {
> -			struct strbuf sb = STRBUF_INIT;
> -			int cflags = REG_NEWLINE | REG_ICASE;
> -
> -			basic_regex_quote_buf(&sb, needle);
> -			regcomp_or_die(&regex, sb.buf, cflags);
> -			strbuf_release(&sb);
> -			regexp = &regex;
> -		} else {
> -			kws = kwsalloc(o->pickaxe_opts & DIFF_PICKAXE_IGNORE_CASE
> -				       ? tolower_trans_tbl : NULL);
> -			kwsincr(kws, needle, strlen(needle));
> -			kwsprep(kws);
> -		}
> +			opt.ignore_case = 1;
> +		if (opts & DIFF_PICKAXE_KIND_S &&
> +		    !(opts & DIFF_PICKAXE_REGEX))
> +			opt.fixed = 1;
> +
> +		append_grep_pattern(&opt, needle, "diffcore-pickaxe", 0, GREP_PATTERN);
> +		compile_grep_patterns(&opt);
>  	}
>  
> -	pickaxe(&diff_queued_diff, o, regexp, kws,
> +	pickaxe(&diff_queued_diff, o, &opt,
>  		(opts & DIFF_PICKAXE_KIND_G) ? diff_grep : has_changes);
>  
> -	if (regexp)
> -		regfree(regexp);
> -	if (kws)
> -		kwsfree(kws);
> +	if (opts & ~DIFF_PICKAXE_KIND_OBJFIND)
> +		free_grep_patterns(&opt);
> +
>  	return;
>  }


There's a bug here where different things are now wrongly dispatched to
either the -S or -G codepath; I've fixed it in my local version.

Anyway, between this and the -U0 change, it's interesting how little
(if any) test coverage we have for some of this. I'm trying to address
that with preceding patches in v2.
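
The intended mapping from command-line options to matcher behaviour, as read from the quoted hunk, is small enough to write down as one function. This is a hypothetical sketch: the PICKAXE_* flag values and `struct matcher_cfg` are invented for illustration and are not git's DIFF_PICKAXE_* definitions. It encodes only what the hunk shows: -i sets ignore-case, -S without --pickaxe-regex is a fixed-string search, and -G selects the diff_grep() path while everything else uses has_changes().

```c
/* Hypothetical flag bits; git's diff.h defines its own DIFF_PICKAXE_* values. */
enum {
	PICKAXE_KIND_S      = 1 << 0, /* -S<string> */
	PICKAXE_KIND_G      = 1 << 1, /* -G<regex> */
	PICKAXE_REGEX       = 1 << 2, /* --pickaxe-regex */
	PICKAXE_IGNORE_CASE = 1 << 3, /* -i / --regexp-ignore-case */
};

struct matcher_cfg {
	int fixed;       /* needle is a literal string, not a regex */
	int ignore_case;
	int grep_diff;   /* 1: match changed lines (diff_grep); 0: compare counts (has_changes) */
};

static struct matcher_cfg pick_matcher(unsigned int opts)
{
	struct matcher_cfg cfg = { 0, 0, 0 };

	cfg.ignore_case = !!(opts & PICKAXE_IGNORE_CASE);
	/* Only -S without --pickaxe-regex is a fixed-string search. */
	cfg.fixed = (opts & PICKAXE_KIND_S) && !(opts & PICKAXE_REGEX);
	/* Mirrors: (opts & DIFF_PICKAXE_KIND_G) ? diff_grep : has_changes */
	cfg.grep_diff = !!(opts & PICKAXE_KIND_G);
	return cfg;
}
```

Writing the decision down in one place like this also makes it easy to test each option combination directly, which is the kind of coverage gap discussed above.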


* Re: [PATCH 14/25] pickaxe -S: remove redundant "sz" check in while-loop
  2021-02-03  3:28 ` [PATCH 14/25] pickaxe -S: remove redundant "sz" check in while-loop Ævar Arnfjörð Bjarmason
@ 2021-02-04 16:16   ` René Scharfe
  2021-02-04 17:56     ` Junio C Hamano
  0 siblings, 1 reply; 99+ messages in thread
From: René Scharfe @ 2021-02-04 16:16 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón

Am 03.02.21 um 04:28 schrieb Ævar Arnfjörð Bjarmason:
> If we walk to the end of the string we just won't match the rest of
> the regex. This removes an optimization for simplicity's sake. In
> subsequent commits we'll alter this code more, and not having to think
> about this condition makes it easier to read.
>
> If we look at the context of what we're doing here the last thing we
> need to be worried about is one extra regex match. The real problem is
> that we keep matching after it's clear that the number of contains()
> for "A" and "B" is different. So we could be much smarter here.
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  diffcore-pickaxe.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
> index 208177bb40..8df76afb6e 100644
> --- a/diffcore-pickaxe.c
> +++ b/diffcore-pickaxe.c
> @@ -82,12 +82,11 @@ static unsigned int contains(mmfile_t *mf, regex_t *regexp, kwset_t kws)
>  		regmatch_t regmatch;
>  		int flags = 0;
>
> -		while (sz &&
> -		       !regexec_buf(regexp, data, sz, 1, &regmatch, flags)) {
> +		while (!regexec_buf(regexp, data, sz, 1, &regmatch, flags)) {

This will loop forever for regexes that match an empty string.  An
example would be /$/.  Silly, perhaps, but still I understand this check
less as an optimization and more as a correctness/robustness thing.

>  			flags |= REG_NOTBOL;
>  			data += regmatch.rm_eo;
>  			sz -= regmatch.rm_eo;
> -			if (sz && regmatch.rm_so == regmatch.rm_eo) {
> +			if (regmatch.rm_so == regmatch.rm_eo) {
>  				data++;
>  				sz--;
>  			}

Before, if the match was an empty string and there was more data after
it, then the code would consume a character anyway, in order to avoid
matching the same empty string again.  With the patch, that character
is consumed even if there is no more data.  This leaves 'data'
pointing beyond the buffer and 'sz' rolls over to ULONG_MAX.  Oops. :(

René
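
The two failure modes described above can be seen in a standalone version of the loop. This is a sketch, not the patched code: it uses plain POSIX regexec() on a NUL-terminated string rather than git's regexec_buf(), and `count_matches()`/`count_pattern()` are hypothetical names. Both `sz` checks are kept; dropping the one in the loop condition makes /$/ match the empty tail forever, and dropping the one before `data++` lets `sz` wrap around.

```c
#include <regex.h>
#include <string.h>

/*
 * Count matches of `re` in `data`, following the loop shape of
 * contains() in diffcore-pickaxe.c. Plain regexec() stops at the first
 * NUL, so `sz` starts as strlen(data).
 */
static unsigned int count_matches(const regex_t *re, const char *data)
{
	size_t sz = strlen(data);
	unsigned int cnt = 0;
	int flags = 0;
	regmatch_t m;

	/* "sz &&": an empty match at the very end would otherwise repeat forever. */
	while (sz && !regexec(re, data, 1, &m, flags)) {
		flags |= REG_NOTBOL;
		data += m.rm_eo;
		sz -= m.rm_eo;
		/* Step over an empty match, but never past the end of the data. */
		if (sz && m.rm_so == m.rm_eo) {
			data++;
			sz--;
		}
		cnt++;
	}
	return cnt;
}

/* Hypothetical wrapper: returns -1 if the pattern fails to compile. */
static int count_pattern(const char *pattern, const char *data)
{
	regex_t re;
	unsigned int n;

	if (regcomp(&re, pattern, REG_EXTENDED))
		return -1;
	n = count_matches(&re, data);
	regfree(&re);
	return (int)n;
}
```

For example, /$/ against "abc" matches once at offset 3, after which `sz` is 0 and both guards stop the loop cleanly.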



* Re: [PATCH 14/25] pickaxe -S: remove redundant "sz" check in while-loop
  2021-02-04 16:16   ` René Scharfe
@ 2021-02-04 17:56     ` Junio C Hamano
  2021-02-04 21:13       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 99+ messages in thread
From: Junio C Hamano @ 2021-02-04 17:56 UTC (permalink / raw)
  To: René Scharfe
  Cc: Ævar Arnfjörð Bjarmason, git, Jeff King,
	Johannes Schindelin, Carlo Marcelo Arenas Belón

René Scharfe <l.s.r@web.de> writes:

>> -		while (sz &&
>> -		       !regexec_buf(regexp, data, sz, 1, &regmatch, flags)) {
>> +		while (!regexec_buf(regexp, data, sz, 1, &regmatch, flags)) {
>
> This will loop forever for regexes that match an empty string.  An
> example would be /$/.  Silly, perhaps, but still I understand this check
> less as an optimization and more as a correctness/robustness thing.
>
>>  			flags |= REG_NOTBOL;
>>  			data += regmatch.rm_eo;
>>  			sz -= regmatch.rm_eo;
>> -			if (sz && regmatch.rm_so == regmatch.rm_eo) {
>> +			if (regmatch.rm_so == regmatch.rm_eo) {
>>  				data++;
>>  				sz--;
>>  			}
>
> Before, if the match was an empty string and there was more data after
> it, then the code would consume a character anyway, in order to avoid
> matching the same empty string again.  With the patch, that character
> is consumed even if there is no more data.  This leaves 'data'
> pointing beyond the buffer and 'sz' rolls over to ULONG_MAX.  Oops. :(

While I do not care too much about NUL in the haystack, I do not
mind [13/25] either.  But this is bad.

This whole thing reminds me of f53c5de2 (pickaxe: fix segfault with
'-S<...> --pickaxe-regex', 2017-03-18), by the way.

Thanks.

* Re: [PATCH 21/25] pickaxe: use PCREv2 for -G and -S
  2021-02-03 20:44   ` Ævar Arnfjörð Bjarmason
@ 2021-02-04 18:11     ` Junio C Hamano
  0 siblings, 0 replies; 99+ messages in thread
From: Junio C Hamano @ 2021-02-04 18:11 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Schindelin, Carlo Marcelo Arenas Belón

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> There's a bug here where now different things are dispatched to either
> the -S or -G codepath wrongly, I've fixed it in my local version.
>
> Anyway, it's interesting between this and the -U0 change that we have
> little/no coverage for some/all of this. I'm trying to address that in
> preceding patches in v2.

Back when I did the original -S<fixed>, we weren't as focused on
good test coverage as we are now.  We may have added a lot more by
the time when we introduced -G and -S<regexp>, but your effort that
resulted in noticing the lack of basic coverage and rectifying the
situation is very much appreciated.

Thanks.


* Re: [PATCH 24/25] xdiff-interface: support early exit in xdiff_outf()
  2021-02-03  3:28 ` [PATCH 24/25] xdiff-interface: support early exit in xdiff_outf() Ævar Arnfjörð Bjarmason
@ 2021-02-04 18:16   ` Junio C Hamano
  0 siblings, 0 replies; 99+ messages in thread
From: Junio C Hamano @ 2021-02-04 18:16 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Schindelin, Carlo Marcelo Arenas Belón

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Bridge the gap between the preceding "xdiff-interface: allow early
> return from xdiff_emit_{line,hunk}_fn" change and the public
> interface. This change was split off from the rest as it wasn't a
> purely mechanical addition of "return 0".
>
> Here we want to be able to abort early, but do so in a way that
> doesn't skip the appropriate strbuf_reset() invocations.

Nice.

> The use of -1 as a return value in the xdiff codebase for early
> return, as we'll see more of in subsequent commits.

This is a non-sentence without a verb.  

I started reading the sentence and expected to see "The use of -1,
as opposed to something else, is because of these deep reasons"
explained, but perhaps you forgot to conclude the sentence with such
an explanation?
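
The property praised above, aborting early without skipping the buffer reset, can be sketched with a simplified line splitter. This is hypothetical code in the spirit of xdiff_outf(), not a copy of it: a fixed-size array stands in for the strbuf, and `struct line_splitter`, `splitter_feed()` and `count_until_x()` are invented names. The reset happens before the stop request is honoured, so an early return cannot leave stale partial data behind.

```c
#include <stddef.h>
#include <string.h>

typedef int (*line_fn)(void *priv, const char *line, size_t len);

/* Hypothetical splitter; the fixed array stands in for a strbuf. */
struct line_splitter {
	char buf[256]; /* sketch only: longer lines are truncated */
	size_t len;
	line_fn fn;
	void *priv;
};

/*
 * Feed an arbitrary chunk. Each complete line goes to `fn`; a nonzero
 * return from `fn` stops processing, but only after the buffer has
 * been reset, so no partial data survives the early return.
 */
static int splitter_feed(struct line_splitter *s, const char *chunk, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (s->len < sizeof(s->buf))
			s->buf[s->len++] = chunk[i];
		if (chunk[i] == '\n') {
			int ret = s->fn(s->priv, s->buf, s->len);

			s->len = 0; /* reset before honouring the stop request */
			if (ret)
				return -1;
		}
	}
	return 0;
}

/* Example callback: count lines and ask to stop at one containing 'X'. */
static int count_until_x(void *priv, const char *line, size_t len)
{
	int *count = priv;

	(*count)++;
	return memchr(line, 'X', len) != NULL;
}
```

A partial line ("cd" with no newline yet) stays buffered across calls, but once the callback asks to stop, the splitter is left empty and safe to reuse.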



* Re: [PATCH 21/25] pickaxe: use PCREv2 for -G and -S
  2021-02-03  3:28 ` [PATCH 21/25] pickaxe: use PCREv2 for -G and -S Ævar Arnfjörð Bjarmason
  2021-02-03 20:44   ` Ævar Arnfjörð Bjarmason
@ 2021-02-04 18:22   ` Junio C Hamano
  1 sibling, 0 replies; 99+ messages in thread
From: Junio C Hamano @ 2021-02-04 18:22 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Schindelin, Carlo Marcelo Arenas Belón

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Follow-up b65abcafc7a (grep: use PCRE v2 for optimized fixed-string
> search, 2019-07-01) and remove the use of kwset in the pickaxe code
> for fixed-string search, in favor of optimistically using PCRE v2.
>
> This does mean that the semantics of the -G option subtly change,
> before it's an ERE, whereas now it'll be a PCRE if we're compiled with
> PCRE. Since PCRE is almost entirely a strict superset of ERE syntax I
> think this is OK.

Since Git is no longer a tool shared among 100 developers who are at
least acquaintances, there will be people who are bothered by the
differences, and such a change deserves a backward-compatibility
warning in the release notes, at least.

Recently, I discovered that I've been building my personal copy of
Git without LIBPCRE support at all for a long time.  It is possible
I've never built with LIBPCRE.  This series may give me incentive to
start using "git grep -P" ;-)

Thanks.


* Re: [PATCH 14/25] pickaxe -S: remove redundant "sz" check in while-loop
  2021-02-04 17:56     ` Junio C Hamano
@ 2021-02-04 21:13       ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-04 21:13 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: René Scharfe, git, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón


On Thu, Feb 04 2021, Junio C Hamano wrote:

> René Scharfe <l.s.r@web.de> writes:
>
>>> -		while (sz &&
>>> -		       !regexec_buf(regexp, data, sz, 1, &regmatch, flags)) {
>>> +		while (!regexec_buf(regexp, data, sz, 1, &regmatch, flags)) {
>>
>> This will loop forever for regexes that match an empty string.  An
>> example would be /$/.  Silly, perhaps, but still I understand this check
>> less as an optimization and more as a correctness/robustness thing.
>>
>>>  			flags |= REG_NOTBOL;
>>>  			data += regmatch.rm_eo;
>>>  			sz -= regmatch.rm_eo;
>>> -			if (sz && regmatch.rm_so == regmatch.rm_eo) {
>>> +			if (regmatch.rm_so == regmatch.rm_eo) {
>>>  				data++;
>>>  				sz--;
>>>  			}
>>
>> Before, if the match was an empty string and there was more data after
>> it, then the code would consume a character anyway, in order to avoid
>> matching the same empty string again.  With the patch, that character
>> is consumed even if there is no more data.  This leaves 'data'
>> pointing beyond the buffer and 'sz' rolls over to ULONG_MAX.  Oops. :(
>
> While I do not care too much about NUL in the haystack, I do not
> mind [13/25] either.  But this is bad.
>
> This whole thing reminds me of f53c5de2 (pickaxe: fix segfault with
> '-S<...> --pickaxe-regex', 2017-03-18), by the way.

René: Well spotted, thanks, and Oops.

I've just sent a separate series with 01-10 of this one. I'll hold on
to the diffcore-pickaxe patches for a while. I've got local fixes for a
lot of issues in them, and will fix this one too.

I've optimized the PCRE v2 codepath a lot more in my local
version. Current results are:

    4209.1: git log -S'int main' <limit-rev>..                                0.38(0.36+0.01)   0.37(0.33+0.04) -2.6%
    4209.2: git log -S'æ' <limit-rev>..                                       0.51(0.47+0.04)   0.32(0.27+0.05) -37.3%
    4209.3: git log --pickaxe-regex -S'(int|void|null)' <limit-rev>..         0.72(0.68+0.03)   0.57(0.54+0.03) -20.8%
    4209.4: git log --pickaxe-regex -S'if *\([^ ]+ & ' <limit-rev>..          0.60(0.55+0.02)   0.39(0.34+0.05) -35.0%
    4209.5: git log --pickaxe-regex -S'[àáâãäåæñøùúûüýþ]' <limit-rev>..       0.43(0.40+0.03)   0.50(0.44+0.06) +16.3%
    4209.6: git log -G'(int|void|null)' <limit-rev>..                         0.64(0.55+0.09)   0.63(0.56+0.05) -1.6%
    4209.7: git log -G'if *\([^ ]+ & ' <limit-rev>..                          0.64(0.59+0.05)   0.63(0.56+0.06) -1.6%
    4209.8: git log -G'[àáâãäåæñøùúûüýþ]' <limit-rev>..                       0.63(0.54+0.08)   0.62(0.55+0.06) -1.6%
    4209.9: git log -i -S'int main' <limit-rev>..                             0.39(0.35+0.03)   0.38(0.35+0.02) -2.6%
    4209.10: git log -i -S'æ' <limit-rev>..                                   0.39(0.33+0.06)   0.32(0.28+0.04) -17.9%
    4209.11: git log -i --pickaxe-regex -S'(int|void|null)' <limit-rev>..     0.90(0.84+0.05)   0.58(0.53+0.04) -35.6%
    4209.12: git log -i --pickaxe-regex -S'if *\([^ ]+ & ' <limit-rev>..      0.71(0.64+0.06)   0.40(0.37+0.03) -43.7%
    4209.13: git log -i --pickaxe-regex -S'[àáâãäåæñøùúûüýþ]' <limit-rev>..   0.43(0.40+0.03)   0.50(0.46+0.04) +16.3%
    4209.14: git log -i -G'(int|void|null)' <limit-rev>..                     0.64(0.57+0.06)   0.62(0.56+0.05) -3.1%
    4209.15: git log -i -G'if *\([^ ]+ & ' <limit-rev>..                      0.65(0.59+0.06)   0.63(0.54+0.08) -3.1%
    4209.16: git log -i -G'[àáâãäåæñøùúûüýþ]' <limit-rev>..                   0.63(0.55+0.08)   0.62(0.56+0.05) -1.6%

The main optimization was just moving the compilation of the pattern up
the stack into the diff_options struct; the current version in this
thread re-compiles it every time.


* [PATCH v2 00/22] pickaxe: test and refactoring for follow-up changes
  2021-02-03  3:27 [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch] Ævar Arnfjörð Bjarmason
                   ` (25 preceding siblings ...)
  2021-02-03 12:38 ` [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch] Ævar Arnfjörð Bjarmason
@ 2021-02-16 11:57 ` Ævar Arnfjörð Bjarmason
  2021-02-16 22:23   ` Junio C Hamano
  2021-04-12 17:15   ` [PATCH v3 00/22] pickaxe: test and refactoring for future PCRE backend Ævar Arnfjörð Bjarmason
  2021-02-16 11:57 ` [PATCH v2 01/22] grep/pcre2 tests: reword comments referring to kwset Ævar Arnfjörð Bjarmason
                   ` (21 subsequent siblings)
  48 siblings, 2 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-16 11:57 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

This is a smaller v2 of the series to remove the kwset backend and
make pickaxe use PCRE v2[1].

That's not being done here yet. These are mostly small
refactoring/test fixes. The most significant work is a new xdiff
interface at the end of the series.

It's based on next where some preparatory work already landed[2].

The endless loop bug in v1 pointed out by René Scharfe[3] is gone. We
should still have a test for that, but I didn't have time to write one,
and figured this was already getting large enough.

I'll do some more improvements of test coverage in a follow-up
series. I'm aware of various blind spots in pickaxe test coverage, but
none of it should hide a bug in this refactoring from us.

It's things like how we deal with REG_NEWLINE, "^" matches etc., but
all the matching logic for that stays the same in this series.

1. https://lore.kernel.org/git/20210203032811.14979-1-avarab@gmail.com/
2. https://lore.kernel.org/git/20210204210556.25242-1-avarab@gmail.com/
3. https://lore.kernel.org/git/4ef09db7-34f2-2fb5-b9e9-be69c7102787@web.de/

Ævar Arnfjörð Bjarmason (22):
  grep/pcre2 tests: reword comments referring to kwset
  test-lib-functions: document and test test_commit --no-tag
  test-lib-functions: reword "test_commit --append" docs
  test-lib functions: add --printf option to test_commit
  pickaxe tests: refactor to use test_commit --append --printf
  pickaxe tests: add test for diffgrep_consume() internals
  pickaxe tests: add test for "log -S" not being a regex
  pickaxe tests: test for -G, -S and --find-object incompatibility
  pickaxe: die when -G and --pickaxe-regex are combined
  pickaxe: die when --find-object and --pickaxe-all are combined
  diff.h: move pickaxe fields together again
  pickaxe/style: consolidate declarations and assignments
  perf: add performance test for pickaxe
  pickaxe: refactor function selection in diffcore-pickaxe()
  pickaxe: assert that we must have a needle under -G or -S
  pickaxe -S: support content with NULs under --pickaxe-regex
  pickaxe: rename variables in has_changes() for brevity
  pickaxe -S: slightly optimize contains()
  xdiff-interface: allow early return from xdiff_emit_{line,hunk}_fn
  xdiff-interface: support early exit in xdiff_outf()
  pickaxe -G: terminate early on matching lines
  pickaxe -G: don't special-case create/delete

 combine-diff.c                 |   9 ++-
 diff.c                         |  45 +++++++++------
 diff.h                         |   7 ++-
 diffcore-pickaxe.c             |  99 ++++++++++++++++++--------------
 range-diff.c                   |   8 ++-
 t/perf/p4209-pickaxe.sh        |  70 +++++++++++++++++++++++
 t/t0000-basic.sh               |  19 +++++++
 t/t1307-config-blob.sh         |   4 +-
 t/t2030-unresolve-info.sh      |   3 +-
 t/t4006-diff-mode.sh           |   6 +-
 t/t4030-diff-textconv.sh       |   8 +--
 t/t4209-log-pickaxe.sh         | 100 +++++++++++++++++++++++++++------
 t/t5520-pull.sh                |  10 +---
 t/t7816-grep-binary-pattern.sh |   4 +-
 t/test-lib-functions.sh        |  23 ++++++--
 xdiff-interface.c              |  26 ++++++---
 xdiff-interface.h              |  36 +++++++++---
 17 files changed, 349 insertions(+), 128 deletions(-)
 create mode 100755 t/perf/p4209-pickaxe.sh

-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH v2 01/22] grep/pcre2 tests: reword comments referring to kwset
  2021-02-03  3:27 [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch] Ævar Arnfjörð Bjarmason
                   ` (26 preceding siblings ...)
  2021-02-16 11:57 ` [PATCH v2 00/22] pickaxe: test and refactoring for follow-up changes Ævar Arnfjörð Bjarmason
@ 2021-02-16 11:57 ` Ævar Arnfjörð Bjarmason
  2021-02-16 11:57 ` [PATCH v2 02/22] test-lib-functions: document and test test_commit --no-tag Ævar Arnfjörð Bjarmason
                   ` (20 subsequent siblings)
  48 siblings, 0 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-16 11:57 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

The kwset optimization has not been used by grep since
48de2a768cf (grep: remove the kwset optimization, 2019-07-01).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t7816-grep-binary-pattern.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/t/t7816-grep-binary-pattern.sh b/t/t7816-grep-binary-pattern.sh
index 60bab291e49..9d67a5fc4cf 100755
--- a/t/t7816-grep-binary-pattern.sh
+++ b/t/t7816-grep-binary-pattern.sh
@@ -59,7 +59,7 @@ test_expect_success 'setup' "
 	git commit -m.
 "
 
-# Simple fixed-string matching that can use kwset (no -i && non-ASCII)
+# Simple fixed-string matching
 nul_match P P P '-F' 'yQf'
 nul_match P P P '-F' 'yQx'
 nul_match P P P '-Fi' 'YQf'
@@ -78,7 +78,7 @@ nul_match P P P '-Fi' '[Y]QF'
 nul_match P P P '-F' 'æQ[ð]'
 nul_match P P P '-F' '[æ]Qð'
 
-# The -F kwset codepath can't handle -i && non-ASCII...
+# Matching pattern and subject case with -i
 nul_match P 1 1 '-i' '[æ]Qð'
 
 # ...PCRE v2 only matches non-ASCII with -i casefolding under UTF-8
-- 
2.30.0.284.gd98b1dd5eaa7


* [PATCH v2 02/22] test-lib-functions: document and test test_commit --no-tag
  2021-02-03  3:27 [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch] Ævar Arnfjörð Bjarmason
                   ` (27 preceding siblings ...)
  2021-02-16 11:57 ` [PATCH v2 01/22] grep/pcre2 tests: reword comments referring to kwset Ævar Arnfjörð Bjarmason
@ 2021-02-16 11:57 ` Ævar Arnfjörð Bjarmason
  2021-03-30 23:14   ` Junio C Hamano
  2021-02-16 11:57 ` [PATCH v2 03/22] test-lib-functions: reword "test_commit --append" docs Ævar Arnfjörð Bjarmason
                   ` (19 subsequent siblings)
  48 siblings, 1 reply; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-16 11:57 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

In 76b8b8d05c (test-lib functions: document arguments to test_commit,
2021-01-12) I added missing documentation to test_commit, but less
than a month later 3803a3a099 (t: add --no-tag option to test_commit,
2021-02-09) added another undocumented option.

Let's fix that, and while we're at it expand on my
e7884b353b (test-lib-functions: assert correct parameter count,
2021-02-12) and assert that you shouldn't be passing the optional
"<tag>" argument under "test_commit --no-tag".
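
To illustrate the rule in isolation, here is a hypothetical sketch
(the name commit_tag_action is made up, and it only prints what
test_commit would do instead of running git). It joins the two
conditions with "&&" between separate "test" commands, which is what
Git's CodingGuidelines ask for instead of "test -a".

```sh
# Hypothetical, simplified model of how test_commit treats a 4th
# "<tag>" argument after this patch. It only prints what would happen;
# it does not run git.
commit_tag_action () {
	no_tag=
	if test "$1" = --no-tag
	then
		no_tag=yes
		shift
	fi

	if test -n "$no_tag" && test $# -eq 4
	then
		# The real function calls BUG here, aborting the test script.
		echo "BUG: expect no <tag> parameter with --no-tag" >&2
		return 1
	elif test -z "$no_tag"
	then
		# Tag with <tag> if it was given, otherwise with <message>.
		echo "tag ${4:-$1}"
	else
		echo "no tag"
	fi
}
```

Called as "commit_tag_action --no-tag message4 file4 contents4 tag4"
it reports the BUG and returns 1, mirroring "setup #4" in the new
t0000-basic.sh test.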

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t0000-basic.sh        | 19 +++++++++++++++++++
 t/test-lib-functions.sh |  8 +++++++-
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/t/t0000-basic.sh b/t/t0000-basic.sh
index a6e570d674a..6ee98fd0695 100755
--- a/t/t0000-basic.sh
+++ b/t/t0000-basic.sh
@@ -1324,4 +1324,23 @@ test_expect_success 'test_must_fail rejects a non-git command with env' '
 	grep -F "test_must_fail: only '"'"'git'"'"' is allowed" err
 '
 
+test_expect_success 'test_commit --no-tag fails with a <tag> argument' '
+	run_sub_test_lib_test_err \
+		test_commit-bug "test_commit-bug with --no-tag" <<-\EOF &&
+	test_expect_success "setup #1" "test_commit message1 file1 contents1"
+	test_expect_success "setup #2" "test_commit message2 file2 contents2 tag2"
+	test_expect_success "setup #3" "test_commit --no-tag message3 file3 contents3"
+	test_expect_success "setup #4" "test_commit --no-tag message4 file4 contents4 tag4"
+	test_done
+	EOF
+	check_sub_test_lib_test_err test_commit-bug \
+		<<-\EOF_OUT 3<<-\EOF_ERR
+	ok 1 - setup #1
+	ok 2 - setup #2
+	ok 3 - setup #3
+	EOF_OUT
+	error: bug in the test script: expect no <tag> parameter with --no-tag
+	EOF_ERR
+'
+
 test_done
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 6348e8d7339..1eb75d0d733 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -178,6 +178,9 @@ debug () {
 #	Invoke "git commit" with --signoff
 #   --author <author>
 #	Invoke "git commit" with --author <author>
+#   --no-tag
+#	Do not tag the resulting commit. If this option is given,
+#	passing the optional "<tag>" argument is an error.
 #
 # This will commit a file with the given contents and the given commit
 # message, and tag the resulting commit with the given tag name.
@@ -242,7 +245,10 @@ test_commit () {
 	git ${indir:+ -C "$indir"} commit \
 	    ${author:+ --author "$author"} \
 	    $signoff -m "$1" &&
-	if test -z "$no_tag"
+	if test -n "$no_tag" && test $# -eq 4
+	then
+		BUG "expect no <tag> parameter with --no-tag"
+	elif test -z "$no_tag"
 	then
 		git ${indir:+ -C "$indir"} tag "${4:-$1}"
 	fi
-- 
2.30.0.284.gd98b1dd5eaa7


* [PATCH v2 03/22] test-lib-functions: reword "test_commit --append" docs
  2021-02-03  3:27 [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch] Ævar Arnfjörð Bjarmason
                   ` (28 preceding siblings ...)
  2021-02-16 11:57 ` [PATCH v2 02/22] test-lib-functions: document and test test_commit --no-tag Ævar Arnfjörð Bjarmason
@ 2021-02-16 11:57 ` Ævar Arnfjörð Bjarmason
  2021-02-16 11:57 ` [PATCH v2 04/22] test-lib functions: add --printf option to test_commit Ævar Arnfjörð Bjarmason
                   ` (18 subsequent siblings)
  48 siblings, 0 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-16 11:57 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Reword the documentation for "test_commit --append" added in my
3373518cc8 (test-lib functions: add an --append option to test_commit,
2021-01-12).

A follow-up commit will make the "echo" part of this configurable, and
in any case saying "echo >>" rather than ">>" was redundant.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/test-lib-functions.sh | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 1eb75d0d733..5af92347123 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -172,8 +172,7 @@ debug () {
 #   --notick
 #	Do not call test_tick before making a commit
 #   --append
-#	Use "echo >>" instead of "echo >" when writing "<contents>" to
-#	"<file>"
+#	Use ">>" instead of ">" when writing "<contents>" to "<file>"
 #   --signoff
 #	Invoke "git commit" with --signoff
 #   --author <author>
-- 
2.30.0.284.gd98b1dd5eaa7


* [PATCH v2 04/22] test-lib functions: add --printf option to test_commit
  2021-02-03  3:27 [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch] Ævar Arnfjörð Bjarmason
                   ` (29 preceding siblings ...)
  2021-02-16 11:57 ` [PATCH v2 03/22] test-lib-functions: reword "test_commit --append" docs Ævar Arnfjörð Bjarmason
@ 2021-02-16 11:57 ` Ævar Arnfjörð Bjarmason
  2021-03-30 23:11   ` Junio C Hamano
  2021-02-16 11:57 ` [PATCH v2 05/22] pickaxe tests: refactor to use test_commit --append --printf Ævar Arnfjörð Bjarmason
                   ` (17 subsequent siblings)
  48 siblings, 1 reply; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-16 11:57 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Add a --printf option to test_commit to allow writing to the file with
"printf" instead of "echo".

This is useful for writing "\n", "\0" etc., in particular in
combination with the --append option added in 3373518cc8 (test-lib
functions: add an --append option to test_commit, 2021-01-12).
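
A hypothetical helper (write_contents is not part of git's test
library) can sketch the "$echo" indirection: the writer is chosen once
and then used for both the ">" and the ">>" (--append) case. As in the
patch, with --printf the "<contents>" becomes printf's FORMAT, so a
stray "%" in it would be interpreted too.

```sh
# Hypothetical helper (not part of git's test library) modelling the
# "$echo" indirection that --printf adds to test_commit.
write_contents () {
	echo=echo
	append=
	while test $# -gt 0
	do
		case "$1" in
		--printf)
			echo=printf
			;;
		--append)
			append=yes
			;;
		*)
			break
			;;
		esac
		shift
	done
	file=$1
	contents=$2
	# With --printf, <contents> is printf's FORMAT: escapes such as
	# "\0" and "\n" are interpreted and no newline is appended.
	# With plain echo a trailing newline is added.
	if test -n "$append"
	then
		$echo "$contents" >>"$file"
	else
		$echo "$contents" >"$file"
	fi
}
```

For example, "write_contents --printf f 'a\0c'" writes the three bytes
a, NUL, c with no trailing newline, and a following
"write_contents --printf --append f '\n'" adds one.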

I'm converting a few tests to use the new option rather than a manual
printf/add/commit combination to demonstrate its usefulness. While I'm
at it, use "test_create_repo" where appropriate, and give the
first/second commit a meaningful/more conventional log message in
cases where no test cared about that message.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1307-config-blob.sh    |  4 +---
 t/t2030-unresolve-info.sh |  3 +--
 t/t4006-diff-mode.sh      |  6 ++----
 t/t4030-diff-textconv.sh  |  8 ++------
 t/t5520-pull.sh           | 10 ++--------
 t/test-lib-functions.sh   | 12 ++++++++++--
 6 files changed, 18 insertions(+), 25 deletions(-)

diff --git a/t/t1307-config-blob.sh b/t/t1307-config-blob.sh
index 002e6d3388e..930dce06f0f 100755
--- a/t/t1307-config-blob.sh
+++ b/t/t1307-config-blob.sh
@@ -65,9 +65,7 @@ test_expect_success 'parse errors in blobs are properly attributed' '
 '
 
 test_expect_success 'can parse blob ending with CR' '
-	printf "[some]key = value\\r" >config &&
-	git add config &&
-	git commit -m CR &&
+	test_commit --printf CR config "[some]key = value\\r" &&
 	echo value >expect &&
 	git config --blob=HEAD:config some.key >actual &&
 	test_cmp expect actual
diff --git a/t/t2030-unresolve-info.sh b/t/t2030-unresolve-info.sh
index be6c84c52a2..bad28d29de5 100755
--- a/t/t2030-unresolve-info.sh
+++ b/t/t2030-unresolve-info.sh
@@ -179,8 +179,7 @@ test_expect_success 'rerere and rerere forget (subdirectory)' '
 
 test_expect_success 'rerere forget (binary)' '
 	git checkout -f side &&
-	printf "a\0c" >binary &&
-	git commit -a -m binary &&
+	test_commit --printf binary binary "a\0c" &&
 	test_must_fail git merge second &&
 	git rerere forget binary
 '
diff --git a/t/t4006-diff-mode.sh b/t/t4006-diff-mode.sh
index 03489aff14e..8fd60424142 100755
--- a/t/t4006-diff-mode.sh
+++ b/t/t4006-diff-mode.sh
@@ -26,10 +26,8 @@ test_expect_success 'chmod' '
 '
 
 test_expect_success 'prepare binary file' '
-	git commit -m rezrov &&
-	printf "\00\01\02\03\04\05\06" >binbin &&
-	git add binbin &&
-	git commit -m binbin
+	git commit -m one &&
+	test_commit --printf two binbin "\00\01\02\03\04\05\06"
 '
 
 test_expect_success '--stat output after text chmod' '
diff --git a/t/t4030-diff-textconv.sh b/t/t4030-diff-textconv.sh
index 4cb9f0e523d..6e1e7e38ff4 100755
--- a/t/t4030-diff-textconv.sh
+++ b/t/t4030-diff-textconv.sh
@@ -26,12 +26,8 @@ EOF
 chmod +x hexdump
 
 test_expect_success 'setup binary file with history' '
-	printf "\\0\\n" >file &&
-	git add file &&
-	git commit -m one &&
-	printf "\\01\\n" >>file &&
-	git add file &&
-	git commit -m two
+	test_commit --printf one file "\\0\\n" &&
+	test_commit --printf --append two file "\\01\\n"
 '
 
 test_expect_success 'file is considered binary by porcelain' '
diff --git a/t/t5520-pull.sh b/t/t5520-pull.sh
index a09411327f9..e2c0c510222 100755
--- a/t/t5520-pull.sh
+++ b/t/t5520-pull.sh
@@ -746,14 +746,8 @@ test_expect_success 'pull --rebase fails on corrupt HEAD' '
 '
 
 test_expect_success 'setup for detecting upstreamed changes' '
-	mkdir src &&
-	(
-		cd src &&
-		git init &&
-		printf "1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n" > stuff &&
-		git add stuff &&
-		git commit -m "Initial revision"
-	) &&
+	test_create_repo src &&
+	test_commit -C src --printf one stuff "1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n" &&
 	git clone src dst &&
 	(
 		cd src &&
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 5af92347123..5e49dd6b864 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -173,6 +173,10 @@ debug () {
 #	Do not call test_tick before making a commit
 #   --append
 #	Use ">>" instead of ">" when writing "<contents>" to "<file>"
+#   --printf
+#	Use "printf" instead of "echo" when writing "<contents>" to
+#	"<file>". You will need to provide your own trailing "\n". You
+#	can only supply the FORMAT for printf(1), not its ARGUMENT(s).
 #   --signoff
 #	Invoke "git commit" with --signoff
 #   --author <author>
@@ -188,6 +192,7 @@ debug () {
 
 test_commit () {
 	notick= &&
+	echo=echo &&
 	append= &&
 	author= &&
 	signoff= &&
@@ -199,6 +204,9 @@ test_commit () {
 		--notick)
 			notick=yes
 			;;
+		--printf)
+			echo=printf
+			;;
 		--append)
 			append=yes
 			;;
@@ -232,9 +240,9 @@ test_commit () {
 	file=${2:-"$1.t"} &&
 	if test -n "$append"
 	then
-		echo "${3-$1}" >>"$indir$file"
+		$echo "${3-$1}" >>"$indir$file"
 	else
-		echo "${3-$1}" >"$indir$file"
+		$echo "${3-$1}" >"$indir$file"
 	fi &&
 	git ${indir:+ -C "$indir"} add "$file" &&
 	if test -z "$notick"
-- 
2.30.0.284.gd98b1dd5eaa7


* [PATCH v2 05/22] pickaxe tests: refactor to use test_commit --append --printf
  2021-02-03  3:27 [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch] Ævar Arnfjörð Bjarmason
                   ` (30 preceding siblings ...)
  2021-02-16 11:57 ` [PATCH v2 04/22] test-lib functions: add --printf option to test_commit Ævar Arnfjörð Bjarmason
@ 2021-02-16 11:57 ` Ævar Arnfjörð Bjarmason
  2021-03-30 23:26   ` Junio C Hamano
  2021-02-16 11:57 ` [PATCH v2 06/22] pickaxe tests: add test for diffgrep_consume() internals Ævar Arnfjörð Bjarmason
                   ` (16 subsequent siblings)
  48 siblings, 1 reply; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-16 11:57 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Refactor existing tests added in e0e7cb8080c (log -G: ignore binary
files, 2018-12-14) to use the --append option I added in
3373518cc8b (test-lib functions: add an --append option to
test_commit, 2021-01-12) and the --printf option added in a preceding
commit.

See also f5d79bf7dd6 (tests: refactor a few tests to use "test_commit
--append", 2021-01-12) for prior similar refactoring.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t4209-log-pickaxe.sh | 30 ++++++++++++++----------------
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/t/t4209-log-pickaxe.sh b/t/t4209-log-pickaxe.sh
index 5d06f5f45ea..298b25265f4 100755
--- a/t/t4209-log-pickaxe.sh
+++ b/t/t4209-log-pickaxe.sh
@@ -107,37 +107,35 @@ test_expect_success 'log -S --no-textconv (missing textconv tool)' '
 '
 
 test_expect_success 'setup log -[GS] binary & --text' '
-	git checkout --orphan GS-binary-and-text &&
-	git read-tree --empty &&
-	printf "a\na\0a\n" >data.bin &&
-	git add data.bin &&
-	git commit -m "create binary file" data.bin &&
-	printf "a\na\0a\n" >>data.bin &&
-	git commit -m "modify binary file" data.bin &&
-	git rm data.bin &&
-	git commit -m "delete binary file" data.bin &&
-	git log >full-log
+	test_create_repo GS-bin-txt &&
+	test_commit -C GS-bin-txt --append --printf A data.bin "a\na\0a\n" &&
+	test_commit -C GS-bin-txt --append --printf B data.bin "a\na\0a\n" &&
+	test_commit -C GS-bin-txt C data.bin "" &&
+	git -C GS-bin-txt log >full-log
 '
 
 test_expect_success 'log -G ignores binary files' '
-	git log -Ga >log &&
+	git -C GS-bin-txt log -Ga >log &&
 	test_must_be_empty log
 '
 
 test_expect_success 'log -G looks into binary files with -a' '
-	git log -a -Ga >log &&
+	git -C GS-bin-txt log -a -Ga >log &&
 	test_cmp log full-log
 '
 
 test_expect_success 'log -G looks into binary files with textconv filter' '
-	test_when_finished "rm .gitattributes" &&
-	echo "* diff=bin" >.gitattributes &&
-	git -c diff.bin.textconv=cat log -Ga >log &&
+	test_when_finished "rm GS-bin-txt/.gitattributes" &&
+	(
+		cd GS-bin-txt &&
+		echo "* diff=bin" >.gitattributes &&
+		git -c diff.bin.textconv=cat log -Ga >../log
+	) &&
 	test_cmp log full-log
 '
 
 test_expect_success 'log -S looks into binary files' '
-	git log -Sa >log &&
+	git -C GS-bin-txt log -Sa >log &&
 	test_cmp log full-log
 '
 
-- 
2.30.0.284.gd98b1dd5eaa7


* [PATCH v2 06/22] pickaxe tests: add test for diffgrep_consume() internals
  2021-02-03  3:27 [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch] Ævar Arnfjörð Bjarmason
                   ` (31 preceding siblings ...)
  2021-02-16 11:57 ` [PATCH v2 05/22] pickaxe tests: refactor to use test_commit --append --printf Ævar Arnfjörð Bjarmason
@ 2021-02-16 11:57 ` Ævar Arnfjörð Bjarmason
  2021-02-16 11:57 ` [PATCH v2 07/22] pickaxe tests: add test for "log -S" not being a regex Ævar Arnfjörð Bjarmason
                   ` (15 subsequent siblings)
  48 siblings, 0 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-16 11:57 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

In diffgrep_consume() we generate a diff, and then advance past the
"+" or "-" at the start of the line for matching. This has been done
ever since the code was added in f506b8e8b5f (git log/diff: add
-G<regexp> that greps in the patch text, 2010-08-23).

If we match "line" instead of "line + 1", no tests fail, i.e. we've got
zero coverage for whether any of our searches match the beginning of
the line or not. Let's add a test for this.
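
What diffgrep_consume() does with each line can be approximated in
shell. This is a hypothetical sketch, not git's implementation: it
reads hunk lines (without the "---"/"+++" file headers) on stdin,
skips everything that isn't an added or removed line, drops the
leading "+" or "-" (the "line + 1" in the C code) and greps the rest
with "grep -E".

```sh
# Hypothetical sketch (not git's code) of matching a -G regex against
# added/removed diff lines with their leading "+" or "-" stripped.
# Reads hunk lines on stdin; returns 0 on the first match, 1 otherwise.
diff_lines_match () {
	regex=$1
	while IFS= read -r line
	do
		case "$line" in
		[-+]*)
			;;
		*)
			continue # context lines, "@@" hunk headers, ...
			;;
		esac
		# "${line#?}" drops the first character, like "line + 1".
		if printf '%s\n' "${line#?}" | grep -E -q -e "$regex"
		then
			return 0
		fi
	done
	return 1
}
```

With the input lines "-a" and "+a a", the regex "^a" matches while
"[+-]a" doesn't, which is what the new test asserts against real
"git log -G" output.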

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t4209-log-pickaxe.sh | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/t/t4209-log-pickaxe.sh b/t/t4209-log-pickaxe.sh
index 298b25265f4..e5fa84816a5 100755
--- a/t/t4209-log-pickaxe.sh
+++ b/t/t4209-log-pickaxe.sh
@@ -106,6 +106,21 @@ test_expect_success 'log -S --no-textconv (missing textconv tool)' '
 	rm .gitattributes
 '
 
+test_expect_success 'setup log -[GS] plain' '
+	test_create_repo GS-plain &&
+	test_commit -C GS-plain --append A data.txt "a" &&
+	test_commit -C GS-plain --append B data.txt "a a" &&
+	test_commit -C GS-plain C data.txt "" &&
+	git -C GS-plain log >full-log
+'
+
+test_expect_success 'log -G trims diff new/old [-+]' '
+	git -C GS-plain log -G"[+-]a" >log &&
+	test_must_be_empty log &&
+	git -C GS-plain log -G"^a" >log &&
+	test_cmp log full-log
+'
+
 test_expect_success 'setup log -[GS] binary & --text' '
 	test_create_repo GS-bin-txt &&
 	test_commit -C GS-bin-txt --append --printf A data.bin "a\na\0a\n" &&
-- 
2.30.0.284.gd98b1dd5eaa7


* [PATCH v2 07/22] pickaxe tests: add test for "log -S" not being a regex
  2021-02-03  3:27 [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch] Ævar Arnfjörð Bjarmason
                   ` (32 preceding siblings ...)
  2021-02-16 11:57 ` [PATCH v2 06/22] pickaxe tests: add test for diffgrep_consume() internals Ævar Arnfjörð Bjarmason
@ 2021-02-16 11:57 ` Ævar Arnfjörð Bjarmason
  2021-02-16 11:57 ` [PATCH v2 08/22] pickaxe tests: test for -G, -S and --find-object incompatibility Ævar Arnfjörð Bjarmason
                   ` (14 subsequent siblings)
  48 siblings, 0 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-16 11:57 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

No test in our test suite checked for "log -S<pat>" being a fixed
string, as opposed to "log -S<pat> --pickaxe-regex". Let's test for
it.
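
-S reports a commit when the number of occurrences of <pat> differs
between the old and new version of a file, so the two modes differ
only in how occurrences are counted. The hypothetical helpers below
(not git's code) count literally and with an extended regex via awk's
gsub(), respectively.

```sh
# Hypothetical helpers (not git's code) contrasting how "-S<pat>" and
# "-S<pat> --pickaxe-regex" count occurrences in a file's contents.

# count_fixed TEXT NEEDLE: non-overlapping literal occurrences.
count_fixed () {
	text=$1
	needle=$2
	n=0
	if test -z "$needle"
	then
		echo 0
		return
	fi
	while :
	do
		case "$text" in
		*"$needle"*)
			# Quoting makes "[", "*" etc. match literally.
			text=${text#*"$needle"}
			n=$((n + 1))
			;;
		*)
			break
			;;
		esac
	done
	echo "$n"
}

# count_regex TEXT ERE: gsub() returns how many substitutions it made
# on each line; the per-line counts are summed.
count_regex () {
	printf '%s\n' "$1" |
	awk -v re="$2" '{ n += gsub(re, "") } END { print n + 0 }'
}
```

In the test below, commit C appends "b" and commit D appends "[b]":
the literal count of "[b]" only changes in D (0 to 1), while the regex
count changes in both C (0 to 1) and D (1 to 2).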

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t4209-log-pickaxe.sh | 30 +++++++++++++++++++++++++++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/t/t4209-log-pickaxe.sh b/t/t4209-log-pickaxe.sh
index e5fa84816a5..c6b4751d5b6 100755
--- a/t/t4209-log-pickaxe.sh
+++ b/t/t4209-log-pickaxe.sh
@@ -106,11 +106,18 @@ test_expect_success 'log -S --no-textconv (missing textconv tool)' '
 	rm .gitattributes
 '
 
-test_expect_success 'setup log -[GS] plain' '
+test_expect_success 'setup log -[GS] plain & regex' '
 	test_create_repo GS-plain &&
 	test_commit -C GS-plain --append A data.txt "a" &&
 	test_commit -C GS-plain --append B data.txt "a a" &&
-	test_commit -C GS-plain C data.txt "" &&
+	test_commit -C GS-plain --append C data.txt "b" &&
+	test_commit -C GS-plain --append D data.txt "[b]" &&
+	test_commit -C GS-plain E data.txt "" &&
+
+	# We also include E, the deletion commit
+	git -C GS-plain log --grep="[ABE]" >A-to-B-then-E-log &&
+	git -C GS-plain log --grep="[CDE]" >C-to-D-then-E-log &&
+	git -C GS-plain log --grep="[DE]" >D-then-E-log &&
 	git -C GS-plain log >full-log
 '
 
@@ -118,7 +125,24 @@ test_expect_success 'log -G trims diff new/old [-+]' '
 	git -C GS-plain log -G"[+-]a" >log &&
 	test_must_be_empty log &&
 	git -C GS-plain log -G"^a" >log &&
-	test_cmp log full-log
+	test_cmp log A-to-B-then-E-log
+'
+
+test_expect_success 'log -S<pat> is not a regex, but -S<pat> --pickaxe-regex is' '
+	git -C GS-plain log -S"a" >log &&
+	test_cmp log A-to-B-then-E-log &&
+
+	git -C GS-plain log -S"[a]" >log &&
+	test_must_be_empty log &&
+
+	git -C GS-plain log -S"[a]" --pickaxe-regex >log &&
+	test_cmp log A-to-B-then-E-log &&
+
+	git -C GS-plain log -S"[b]" >log &&
+	test_cmp log D-then-E-log &&
+
+	git -C GS-plain log -S"[b]" --pickaxe-regex >log &&
+	test_cmp log C-to-D-then-E-log
 '
 
 test_expect_success 'setup log -[GS] binary & --text' '
-- 
2.30.0.284.gd98b1dd5eaa7


* [PATCH v2 08/22] pickaxe tests: test for -G, -S and --find-object incompatibility
  2021-02-03  3:27 [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch] Ævar Arnfjörð Bjarmason
                   ` (33 preceding siblings ...)
  2021-02-16 11:57 ` [PATCH v2 07/22] pickaxe tests: add test for "log -S" not being a regex Ævar Arnfjörð Bjarmason
@ 2021-02-16 11:57 ` Ævar Arnfjörð Bjarmason
  2021-03-30 23:32   ` Junio C Hamano
  2021-02-16 11:57 ` [PATCH v2 09/22] pickaxe: die when -G and --pickaxe-regex are combined Ævar Arnfjörð Bjarmason
                   ` (13 subsequent siblings)
  48 siblings, 1 reply; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-16 11:57 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Add a test for the options sanity check added in 5e505257f2 (diff:
properly error out when combining multiple pickaxe options,
2018-01-04).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t4209-log-pickaxe.sh | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/t/t4209-log-pickaxe.sh b/t/t4209-log-pickaxe.sh
index c6b4751d5b6..5ad4fad964c 100755
--- a/t/t4209-log-pickaxe.sh
+++ b/t/t4209-log-pickaxe.sh
@@ -55,6 +55,17 @@ test_expect_success setup '
 	git rev-parse --verify HEAD >expect_second
 '
 
+test_expect_success 'usage' '
+	test_expect_code 128 git log -Gregex -Sstring 2>err &&
+	test_i18ngrep "mutually exclusive" err &&
+
+	test_expect_code 128 git log -Gregex --find-object=HEAD 2>err &&
+	test_i18ngrep "mutually exclusive" err &&
+
+	test_expect_code 128 git log -Gstring --find-object=HEAD 2>err &&
+	test_i18ngrep "mutually exclusive" err
+'
+
 test_log	expect_initial	--grep initial
 test_log	expect_nomatch	--grep InItial
 test_log_icase	expect_initial	--grep InItial
-- 
2.30.0.284.gd98b1dd5eaa7


* [PATCH v2 09/22] pickaxe: die when -G and --pickaxe-regex are combined
  2021-02-03  3:27 [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch] Ævar Arnfjörð Bjarmason
                   ` (34 preceding siblings ...)
  2021-02-16 11:57 ` [PATCH v2 08/22] pickaxe tests: test for -G, -S and --find-object incompatibility Ævar Arnfjörð Bjarmason
@ 2021-02-16 11:57 ` Ævar Arnfjörð Bjarmason
  2021-03-30 23:36   ` Junio C Hamano
  2021-02-16 11:57 ` [PATCH v2 10/22] pickaxe: die when --find-object and --pickaxe-all " Ævar Arnfjörð Bjarmason
                   ` (12 subsequent siblings)
  48 siblings, 1 reply; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-16 11:57 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

When the -G and --pickaxe-regex options are combined, we simply ignore
the --pickaxe-regex option. Let's die instead as suggested by our
documentation, since -G is always a regex.

When --pickaxe-regex was added in d01d8c6782 (Support for pickaxe
matching regular expressions, 2006-03-29) only the -S option
existed. Then when -G was added in f506b8e8b5 (git log/diff: add
-G<regexp> that greps in the patch text, 2010-08-23), neither was the
documentation for --pickaxe-regex updated accordingly, nor was
something like this assertion added.

Since 5bc3f0b567 (diffcore-pickaxe doc: document -S and -G properly,
2013-05-31) we've claimed that --pickaxe-regex should only be used
with -S, but have silently tolerated combining it with -G. Let's die
instead.
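
The check relies on HAS_MULTI_BITS(), i.e. "i & (i - 1)", which is
non-zero only when more than one bit of i is set; masking
pickaxe_opts first means only the -G and --pickaxe-regex bits are
looked at. A shell sketch of the same arithmetic follows; the flag
values are assumed to mirror diff.h and are illustrative only.

```sh
# Illustrative flag values, assumed to mirror the DIFF_PICKAXE_* bits
# in diff.h (only DIFF_PICKAXE_IGNORE_CASE's 32 is visible here).
DIFF_PICKAXE_REGEX=2
DIFF_PICKAXE_KIND_S=4
DIFF_PICKAXE_KIND_G=8
DIFF_PICKAXE_KINDS_G_REGEX_MASK=$((DIFF_PICKAXE_KIND_G | DIFF_PICKAXE_REGEX))

# Shell version of HAS_MULTI_BITS(i): "i & (i - 1)" clears the lowest
# set bit, so the result is non-zero iff more than one bit was set.
has_multi_bits () {
	test $(( $1 & ($1 - 1) )) -ne 0
}

# Succeeds when a pickaxe_opts value combines -G with --pickaxe-regex,
# i.e. when diff_setup_done() would now die.
g_with_pickaxe_regex () {
	has_multi_bits $(( $1 & DIFF_PICKAXE_KINDS_G_REGEX_MASK ))
}
```

-S with --pickaxe-regex leaves a single bit (the REGEX one) after
masking and is accepted, while -G with --pickaxe-regex leaves two and
is rejected.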

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 diff.c                 | 3 +++
 diff.h                 | 2 ++
 t/t4209-log-pickaxe.sh | 3 +++
 3 files changed, 8 insertions(+)

diff --git a/diff.c b/diff.c
index bf2cbf15e77..96da1fe6167 100644
--- a/diff.c
+++ b/diff.c
@@ -4630,6 +4630,9 @@ void diff_setup_done(struct diff_options *options)
 	if (HAS_MULTI_BITS(options->pickaxe_opts & DIFF_PICKAXE_KINDS_MASK))
 		die(_("-G, -S and --find-object are mutually exclusive"));
 
+	if (HAS_MULTI_BITS(options->pickaxe_opts & DIFF_PICKAXE_KINDS_G_REGEX_MASK))
+		die(_("-G and --pickaxe-regex are mutually exclusive, use --pickaxe-regex with -S"));
+
 	/*
 	 * Most of the time we can say "there are changes"
 	 * only by checking if there are changed paths, but
diff --git a/diff.h b/diff.h
index 527fb56d851..668d496d7a5 100644
--- a/diff.h
+++ b/diff.h
@@ -535,6 +535,8 @@ int git_config_rename(const char *var, const char *value);
 #define DIFF_PICKAXE_KINDS_MASK (DIFF_PICKAXE_KIND_S | \
 				 DIFF_PICKAXE_KIND_G | \
 				 DIFF_PICKAXE_KIND_OBJFIND)
+#define DIFF_PICKAXE_KINDS_G_REGEX_MASK (DIFF_PICKAXE_KIND_G | \
+					 DIFF_PICKAXE_REGEX)
 
 #define DIFF_PICKAXE_IGNORE_CASE	32
 
diff --git a/t/t4209-log-pickaxe.sh b/t/t4209-log-pickaxe.sh
index 5ad4fad964c..46dc5f14b3b 100755
--- a/t/t4209-log-pickaxe.sh
+++ b/t/t4209-log-pickaxe.sh
@@ -59,6 +59,9 @@ test_expect_success 'usage' '
 	test_expect_code 128 git log -Gregex -Sstring 2>err &&
 	test_i18ngrep "mutually exclusive" err &&
 
+	test_expect_code 128 git log -Gregex --pickaxe-regex 2>err &&
+	test_i18ngrep "mutually exclusive" err &&
+
 	test_expect_code 128 git log -Gregex --find-object=HEAD 2>err &&
 	test_i18ngrep "mutually exclusive" err &&
 
-- 
2.30.0.284.gd98b1dd5eaa7


* [PATCH v2 10/22] pickaxe: die when --find-object and --pickaxe-all are combined
  2021-02-03  3:27 [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch] Ævar Arnfjörð Bjarmason
                   ` (35 preceding siblings ...)
  2021-02-16 11:57 ` [PATCH v2 09/22] pickaxe: die when -G and --pickaxe-regex are combined Ævar Arnfjörð Bjarmason
@ 2021-02-16 11:57 ` Ævar Arnfjörð Bjarmason
  2021-02-16 11:57 ` [PATCH v2 11/22] diff.h: move pickaxe fields together again Ævar Arnfjörð Bjarmason
                   ` (11 subsequent siblings)
  48 siblings, 0 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-16 11:57 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Neither the --pickaxe-all documentation nor --find-object's has ever
suggested that you can combine the two. See f506b8e8b5 (git log/diff:
add -G<regexp> that greps in the patch text, 2010-08-23) and
15af58c1ad (diffcore: add a pickaxe option to find a specific blob,
2018-01-04).

But we've silently tolerated it, which makes the logic in
diffcore_pickaxe() harder to reason about. Let's assert that we won't
have the two combined.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 diff.c                 | 3 +++
 diff.h                 | 2 ++
 t/t4209-log-pickaxe.sh | 3 +++
 3 files changed, 8 insertions(+)

diff --git a/diff.c b/diff.c
index 96da1fe6167..63c49ecaef1 100644
--- a/diff.c
+++ b/diff.c
@@ -4633,6 +4633,9 @@ void diff_setup_done(struct diff_options *options)
 	if (HAS_MULTI_BITS(options->pickaxe_opts & DIFF_PICKAXE_KINDS_G_REGEX_MASK))
 		die(_("-G and --pickaxe-regex are mutually exclusive, use --pickaxe-regex with -S"));
 
+	if (HAS_MULTI_BITS(options->pickaxe_opts & DIFF_PICKAXE_KINDS_ALL_OBJFIND_MASK))
+		die(_("--pickaxe-all and --find-object are mutually exclusive, use --pickaxe-all with -G and -S"));
+
 	/*
 	 * Most of the time we can say "there are changes"
 	 * only by checking if there are changed paths, but
diff --git a/diff.h b/diff.h
index 668d496d7a5..8f0dc7ef43b 100644
--- a/diff.h
+++ b/diff.h
@@ -537,6 +537,8 @@ int git_config_rename(const char *var, const char *value);
 				 DIFF_PICKAXE_KIND_OBJFIND)
 #define DIFF_PICKAXE_KINDS_G_REGEX_MASK (DIFF_PICKAXE_KIND_G | \
 					 DIFF_PICKAXE_REGEX)
+#define DIFF_PICKAXE_KINDS_ALL_OBJFIND_MASK (DIFF_PICKAXE_ALL | \
+					     DIFF_PICKAXE_KIND_OBJFIND)
 
 #define DIFF_PICKAXE_IGNORE_CASE	32
 
diff --git a/t/t4209-log-pickaxe.sh b/t/t4209-log-pickaxe.sh
index 46dc5f14b3b..bcaca7e882c 100755
--- a/t/t4209-log-pickaxe.sh
+++ b/t/t4209-log-pickaxe.sh
@@ -66,6 +66,9 @@ test_expect_success 'usage' '
 	test_i18ngrep "mutually exclusive" err &&
 
 	test_expect_code 128 git log -Gstring --find-object=HEAD 2>err &&
+	test_i18ngrep "mutually exclusive" err &&
+
+	test_expect_code 128 git log --pickaxe-all --find-object=HEAD 2>err &&
 	test_i18ngrep "mutually exclusive" err
 '
 
-- 
2.30.0.284.gd98b1dd5eaa7


* [PATCH v2 11/22] diff.h: move pickaxe fields together again
  2021-02-03  3:27 [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch] Ævar Arnfjörð Bjarmason
                   ` (36 preceding siblings ...)
  2021-02-16 11:57 ` [PATCH v2 10/22] pickaxe: die when --find-object and --pickaxe-all " Ævar Arnfjörð Bjarmason
@ 2021-02-16 11:57 ` Ævar Arnfjörð Bjarmason
  2021-02-16 11:57 ` [PATCH v2 12/22] pickaxe/style: consolidate declarations and assignments Ævar Arnfjörð Bjarmason
                   ` (10 subsequent siblings)
  48 siblings, 0 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-16 11:57 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Move the pickaxe and pickaxe_opts fields next to each other again. In
a past life they'd been on adjacent lines, but when they got moved
from a global variable to the diff_options struct in 6b5ee137e5 (Diff
clean-up., 2005-09-21) they got split apart.

That split made sense at the time: the "char*" and "int" (flags)
options were being grouped, but we've long since abandoned that
pattern in the diff_options struct, and now it makes more sense to
group these together again.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 diff.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/diff.h b/diff.h
index 8f0dc7ef43b..5f33e7e4f76 100644
--- a/diff.h
+++ b/diff.h
@@ -244,6 +244,7 @@ struct diff_options {
 	 * postimage of the diff_queue.
 	 */
 	const char *pickaxe;
+	unsigned pickaxe_opts;
 
 	/* -I<regex> */
 	regex_t **ignore_regex;
@@ -283,8 +284,6 @@ struct diff_options {
 	/* The output format used when `diff_flush()` is run. */
 	int output_format;
 
-	unsigned pickaxe_opts;
-
 	/* Affects the way detection logic for complete rewrites, renames and
 	 * copies.
 	 */
-- 
2.30.0.284.gd98b1dd5eaa7


* [PATCH v2 12/22] pickaxe/style: consolidate declarations and assignments
  2021-02-03  3:27 [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch] Ævar Arnfjörð Bjarmason
                   ` (37 preceding siblings ...)
  2021-02-16 11:57 ` [PATCH v2 11/22] diff.h: move pickaxe fields together again Ævar Arnfjörð Bjarmason
@ 2021-02-16 11:57 ` Ævar Arnfjörð Bjarmason
  2021-02-16 11:57 ` [PATCH v2 13/22] perf: add performance test for pickaxe Ævar Arnfjörð Bjarmason
                   ` (9 subsequent siblings)
  48 siblings, 0 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-16 11:57 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Refactor contains() to do its assignments at the same time that it
does its declarations.

This code could have been refactored in ef90ab66e8e (pickaxe: use
textconv for -S counting, 2012-10-28) when a function call between the
declarations and assignments was removed.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 diffcore-pickaxe.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index a9c6d60df22..a278b9b71d9 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -70,13 +70,9 @@ static int diff_grep(mmfile_t *one, mmfile_t *two,
 
 static unsigned int contains(mmfile_t *mf, regex_t *regexp, kwset_t kws)
 {
-	unsigned int cnt;
-	unsigned long sz;
-	const char *data;
-
-	sz = mf->size;
-	data = mf->ptr;
-	cnt = 0;
+	unsigned int cnt = 0;
+	unsigned long sz = mf->size;
+	const char *data = mf->ptr;
 
 	if (regexp) {
 		regmatch_t regmatch;
-- 
2.30.0.284.gd98b1dd5eaa7


* [PATCH v2 13/22] perf: add performance test for pickaxe
  2021-02-03  3:27 [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch] Ævar Arnfjörð Bjarmason
                   ` (38 preceding siblings ...)
  2021-02-16 11:57 ` [PATCH v2 12/22] pickaxe/style: consolidate declarations and assignments Ævar Arnfjörð Bjarmason
@ 2021-02-16 11:57 ` Ævar Arnfjörð Bjarmason
  2021-02-16 11:57 ` [PATCH v2 14/22] pickaxe: refactor function selection in diffcore-pickaxe() Ævar Arnfjörð Bjarmason
                   ` (8 subsequent siblings)
  48 siblings, 0 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-16 11:57 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Add a test for the -G and -S pickaxe options and related options.

This test supports being run with GIT_TEST_LONG=1 to raise the limit
on the number of commits walked from 1k to 10k. The 1k limit seems to
hit a good spot on git.git.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/perf/p4209-pickaxe.sh | 70 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 70 insertions(+)
 create mode 100755 t/perf/p4209-pickaxe.sh

diff --git a/t/perf/p4209-pickaxe.sh b/t/perf/p4209-pickaxe.sh
new file mode 100755
index 00000000000..f585a4465ae
--- /dev/null
+++ b/t/perf/p4209-pickaxe.sh
@@ -0,0 +1,70 @@
+#!/bin/sh
+
+test_description="Test pickaxe performance"
+
+. ./perf-lib.sh
+
+test_perf_default_repo
+
+# Not --max-count, as that's the number of matching commits, so it's
+# unbounded. We want to limit our revision walk here.
+from_rev_desc=
+from_rev=
+max_count=1000
+if test_have_prereq EXPENSIVE
+then
+	max_count=10000
+fi
+from_rev=" $(git rev-list HEAD | head -n $max_count | tail -n 1).."
+from_rev_desc=" <limit-rev>.."
+
+for icase in \
+	'' \
+	'-i '
+do
+	# -S (no regex)
+	for pattern in \
+		'int main' \
+		'æ'
+	do
+		for opts in \
+			'-S'
+		do
+			test_perf "git log $icase$opts'$pattern'$from_rev_desc" "
+				git log --pretty=format:%H $icase$opts'$pattern'$from_rev
+			"
+		done
+	done
+
+	# -S (regex)
+	for pattern in  \
+		'(int|void|null)' \
+		'if *\([^ ]+ & ' \
+		'[àáâãäåæñøùúûüýþ]'
+	do
+		for opts in \
+			'--pickaxe-regex -S'
+		do
+			test_perf "git log $icase$opts'$pattern'$from_rev_desc" "
+				git log --pretty=format:%H $icase$opts'$pattern'$from_rev
+			"
+		done
+	done
+
+	# -G
+	for pattern in  \
+		'(int|void|null)' \
+		'if *\([^ ]+ & ' \
+		'[àáâãäåæñøùúûüýþ]'
+	do
+		for opts in \
+			'-G'
+		do
+			test_perf "git log $icase$opts'$pattern'$from_rev_desc" "
+				git log --pretty=format:%H $icase$opts'$pattern'$from_rev
+			"
+		done
+	done
+done
+
+test_done
-- 
2.30.0.284.gd98b1dd5eaa7


* [PATCH v2 14/22] pickaxe: refactor function selection in diffcore-pickaxe()
  2021-02-03  3:27 [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch] Ævar Arnfjörð Bjarmason
                   ` (39 preceding siblings ...)
  2021-02-16 11:57 ` [PATCH v2 13/22] perf: add performance test for pickaxe Ævar Arnfjörð Bjarmason
@ 2021-02-16 11:57 ` Ævar Arnfjörð Bjarmason
  2021-03-30 23:45   ` Junio C Hamano
  2021-02-16 11:57 ` [PATCH v2 15/22] pickaxe: assert that we must have a needle under -G or -S Ævar Arnfjörð Bjarmason
                   ` (7 subsequent siblings)
  48 siblings, 1 reply; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-16 11:57 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

It's hard to read this codepath at a glance and reason about exactly
what combination of -G and -S will compile either regexes or kwset,
and whether we'll then dispatch to "diff_grep" or "has_changes".

Furthermore, in the "--find-object" case we don't use the callback
function at all, but we were previously passing down "has_changes".

Refactor this code to exhaustively check "opts"; it's now more obvious
which callback function (if any) we want under each mode.
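A minimal standalone sketch of the exhaustive selection described
above. The flag names and values, the two stub matchers, bug() and
select_fn() are all hypothetical stand-ins, not git's definitions; the
point is that every mode maps to exactly one callback (or to none),
and an unexpected flag combination aborts instead of silently falling
through to a default.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical flag values; git's real ones live in diff.h. */
#define PICKAXE_KIND_S       (1u << 0)
#define PICKAXE_KIND_G       (1u << 1)
#define PICKAXE_KIND_OBJFIND (1u << 2)
#define PICKAXE_REGEX        (1u << 3)

typedef int (*pickaxe_fn)(const char *one, const char *two);

/* Stubs standing in for diff_grep() and has_changes(). */
static int grep_stub(const char *one, const char *two)
{
	(void)one; (void)two;
	return 1;
}

static int count_stub(const char *one, const char *two)
{
	(void)one; (void)two;
	return 2;
}

static void bug(const char *msg)
{
	fprintf(stderr, "BUG: %s\n", msg);
	abort();
}

static pickaxe_fn select_fn(unsigned opts)
{
	if (opts & (PICKAXE_REGEX | PICKAXE_KIND_G)) {
		if (opts & PICKAXE_KIND_G)
			return grep_stub;   /* -G */
		return count_stub;          /* -S with --pickaxe-regex */
	} else if (opts & PICKAXE_KIND_S) {
		return count_stub;          /* -S with a fixed string */
	} else if (opts & PICKAXE_KIND_OBJFIND) {
		return NULL;                /* --find-object: no callback */
	}
	bug("unknown pickaxe flags");
	return NULL;                        /* not reached */
}
```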

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 diffcore-pickaxe.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index a278b9b71d9..cff46f9f8f7 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -228,6 +228,7 @@ void diffcore_pickaxe(struct diff_options *o)
 	int opts = o->pickaxe_opts;
 	regex_t regex, *regexp = NULL;
 	kwset_t kws = NULL;
+	pickaxe_fn fn;
 
 	if (opts & (DIFF_PICKAXE_REGEX | DIFF_PICKAXE_KIND_G)) {
 		int cflags = REG_EXTENDED | REG_NEWLINE;
@@ -235,6 +236,14 @@ void diffcore_pickaxe(struct diff_options *o)
 			cflags |= REG_ICASE;
 		regcomp_or_die(&regex, needle, cflags);
 		regexp = &regex;
+
+		/* diff.c errors on -G and --pickaxe-regex for us */
+		if (opts & DIFF_PICKAXE_KIND_G)
+			fn = diff_grep;
+		else if (opts & DIFF_PICKAXE_REGEX)
+			fn = has_changes;
+		else
+			BUG("unreachable");
 	} else if (opts & DIFF_PICKAXE_KIND_S) {
 		if (o->pickaxe_opts & DIFF_PICKAXE_IGNORE_CASE &&
 		    has_non_ascii(needle)) {
@@ -251,10 +260,14 @@ void diffcore_pickaxe(struct diff_options *o)
 			kwsincr(kws, needle, strlen(needle));
 			kwsprep(kws);
 		}
+		fn = has_changes;
+	} else if (opts & DIFF_PICKAXE_KIND_OBJFIND) {
+		fn = NULL;
+	} else {
+		BUG("unknown pickaxe_opts flag");
 	}
 
-	pickaxe(&diff_queued_diff, o, regexp, kws,
-		(opts & DIFF_PICKAXE_KIND_G) ? diff_grep : has_changes);
+	pickaxe(&diff_queued_diff, o, regexp, kws, fn);
 
 	if (regexp)
 		regfree(regexp);
-- 
2.30.0.284.gd98b1dd5eaa7


* [PATCH v2 15/22] pickaxe: assert that we must have a needle under -G or -S
  2021-02-03  3:27 [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch] Ævar Arnfjörð Bjarmason
                   ` (40 preceding siblings ...)
  2021-02-16 11:57 ` [PATCH v2 14/22] pickaxe: refactor function selection in diffcore-pickaxe() Ævar Arnfjörð Bjarmason
@ 2021-02-16 11:57 ` Ævar Arnfjörð Bjarmason
  2021-03-30 23:50   ` Junio C Hamano
  2021-02-16 11:57 ` [PATCH v2 16/22] pickaxe -S: support content with NULs under --pickaxe-regex Ævar Arnfjörð Bjarmason
                   ` (6 subsequent siblings)
  48 siblings, 1 reply; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-16 11:57 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Assert early in diffcore_pickaxe() that we've got a needle to work
with under -G and -S.

This check is redundant with the one that -G and -S already get from
parse-options.c's get_arg(), for which I'm adding a test.

This check dates back to e1b161161d (diffcore-pickaxe: fix infinite
loop on zero-length needle, 2007-01-25) when "git log -S" could send
this code into an infinite loop.

It was later refactored in 8fa4b09fb1 (pickaxe: hoist empty
needle check, 2012-10-28) into its current form, but it seemingly
wasn't noticed that in the meantime a move to the parse-options.c API
in dea007fb4c (diff: parse separate options like -S foo, 2010-08-05)
had made it redundant.

Let's retain some of the paranoia here with a BUG(), but there's no
need to be checking this in the pickaxe_match() inner loop.
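To illustrate the hazard e1b161161d guarded against, here is a
hypothetical count_occurrences() (not git's code) that advances past
each match by the needle's length. With an empty needle that advance
is zero and strstr() matches again at the same position, so without
the guard the loop would never terminate.

```c
#include <assert.h>
#include <string.h>

/*
 * Count non-overlapping occurrences of needle in haystack. Each match
 * advances the cursor by strlen(needle); for "" that advance is zero,
 * so the early return below is what keeps the loop finite.
 */
static unsigned count_occurrences(const char *haystack, const char *needle)
{
	size_t len = strlen(needle);
	unsigned cnt = 0;
	const char *p = haystack;

	if (!len)
		return 0; /* empty-needle guard */

	while ((p = strstr(p, needle)) != NULL) {
		cnt++;
		p += len;
	}
	return cnt;
}
```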

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 diffcore-pickaxe.c     | 5 ++---
 t/t4209-log-pickaxe.sh | 6 ++++++
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index cff46f9f8f7..dd1b5c72332 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -132,9 +132,6 @@ static int pickaxe_match(struct diff_filepair *p, struct diff_options *o,
 			 oidset_contains(o->objfind, &p->two->oid));
 	}
 
-	if (!o->pickaxe[0])
-		return 0;
-
 	if (o->flags.allow_textconv) {
 		textconv_one = get_textconv(o->repo, p->one);
 		textconv_two = get_textconv(o->repo, p->two);
@@ -230,6 +227,8 @@ void diffcore_pickaxe(struct diff_options *o)
 	kwset_t kws = NULL;
 	pickaxe_fn fn;
 
+	if (opts & ~DIFF_PICKAXE_KIND_OBJFIND && !needle)
+		BUG("should have needle under -G or -S");
 	if (opts & (DIFF_PICKAXE_REGEX | DIFF_PICKAXE_KIND_G)) {
 		int cflags = REG_EXTENDED | REG_NEWLINE;
 		if (o->pickaxe_opts & DIFF_PICKAXE_IGNORE_CASE)
diff --git a/t/t4209-log-pickaxe.sh b/t/t4209-log-pickaxe.sh
index bcaca7e882c..4b65b89e7a5 100755
--- a/t/t4209-log-pickaxe.sh
+++ b/t/t4209-log-pickaxe.sh
@@ -56,6 +56,12 @@ test_expect_success setup '
 '
 
 test_expect_success 'usage' '
+	test_expect_code 129 git log -S 2>err &&
+	test_i18ngrep "switch.*requires a value" err &&
+
+	test_expect_code 129 git log -G 2>err &&
+	test_i18ngrep "switch.*requires a value" err &&
+
 	test_expect_code 128 git log -Gregex -Sstring 2>err &&
 	test_i18ngrep "mutually exclusive" err &&
 
-- 
2.30.0.284.gd98b1dd5eaa7


* [PATCH v2 16/22] pickaxe -S: support content with NULs under --pickaxe-regex
  2021-02-03  3:27 [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch] Ævar Arnfjörð Bjarmason
                   ` (41 preceding siblings ...)
  2021-02-16 11:57 ` [PATCH v2 15/22] pickaxe: assert that we must have a needle under -G or -S Ævar Arnfjörð Bjarmason
@ 2021-02-16 11:57 ` Ævar Arnfjörð Bjarmason
  2021-03-30 23:54   ` Junio C Hamano
  2021-02-16 11:57 ` [PATCH v2 17/22] pickaxe: rename variables in has_changes() for brevity Ævar Arnfjörð Bjarmason
                   ` (5 subsequent siblings)
  48 siblings, 1 reply; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-16 11:57 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Fix a bug in the matching routine powering -S<rx> --pickaxe-regex so
that we won't abort early on content that has NULs in it.

We've had a hard requirement on REG_STARTEND since 2f8952250a8 (regex:
add regexec_buf() that can work on a non NUL-terminated string,
2016-09-21), but this sanity check dates back to d01d8c67828 (Support
for pickaxe matching regular expressions, 2006-03-29).

It isn't needed anymore and, as the now-passing test shows, was
actively getting in our way.
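A simplified sketch of the difference, assuming a single-byte match in
place of the regex; count_byte() and its stop_at_nul toggle are
hypothetical. With stop_at_nul set it mimics the old "sz && *data"
condition and stops at the first NUL; without it, the length alone
bounds the scan, which is what regexec_buf() with REG_STARTEND allows.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count occurrences of byte c in buf[0..sz). When stop_at_nul is
 * non-zero the scan also ends at the first NUL byte, even though sz
 * says there is more content after it.
 */
static unsigned count_byte(const char *buf, size_t sz, char c,
			   int stop_at_nul)
{
	unsigned cnt = 0;

	while (sz && (!stop_at_nul || *buf)) {
		if (*buf == c)
			cnt++;
		buf++;
		sz--;
	}
	return cnt;
}
```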

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 diffcore-pickaxe.c     | 4 ++--
 t/t4209-log-pickaxe.sh | 8 ++++++++
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index dd1b5c72332..0bf50a2f595 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -78,12 +78,12 @@ static unsigned int contains(mmfile_t *mf, regex_t *regexp, kwset_t kws)
 		regmatch_t regmatch;
 		int flags = 0;
 
-		while (sz && *data &&
+		while (sz &&
 		       !regexec_buf(regexp, data, sz, 1, &regmatch, flags)) {
 			flags |= REG_NOTBOL;
 			data += regmatch.rm_eo;
 			sz -= regmatch.rm_eo;
-			if (sz && *data && regmatch.rm_so == regmatch.rm_eo) {
+			if (sz && regmatch.rm_so == regmatch.rm_eo) {
 				data++;
 				sz--;
 			}
diff --git a/t/t4209-log-pickaxe.sh b/t/t4209-log-pickaxe.sh
index 4b65b89e7a5..6ea1f02d142 100755
--- a/t/t4209-log-pickaxe.sh
+++ b/t/t4209-log-pickaxe.sh
@@ -201,4 +201,12 @@ test_expect_success 'log -S looks into binary files' '
 	test_cmp log full-log
 '
 
+test_expect_success 'log -S --pickaxe-regex looks into binary files' '
+	git -C GS-bin-txt log --pickaxe-regex -Sa >log &&
+	test_cmp log full-log &&
+
+	git -C GS-bin-txt log --pickaxe-regex -S[a] >log &&
+	test_cmp log full-log
+'
+
 test_done
-- 
2.30.0.284.gd98b1dd5eaa7


* [PATCH v2 17/22] pickaxe: rename variables in has_changes() for brevity
  2021-02-03  3:27 [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch] Ævar Arnfjörð Bjarmason
                   ` (42 preceding siblings ...)
  2021-02-16 11:57 ` [PATCH v2 16/22] pickaxe -S: support content with NULs under --pickaxe-regex Ævar Arnfjörð Bjarmason
@ 2021-02-16 11:57 ` Ævar Arnfjörð Bjarmason
  2021-02-16 11:57 ` [PATCH v2 18/22] pickaxe -S: slightly optimize contains() Ævar Arnfjörð Bjarmason
                   ` (4 subsequent siblings)
  48 siblings, 0 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-16 11:57 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Rename the {one,two}_contains variables to c{1,2}. This will make a
follow-up change easier to read.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 diffcore-pickaxe.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index 0bf50a2f595..66e34d254f1 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -108,9 +108,9 @@ static int has_changes(mmfile_t *one, mmfile_t *two,
 		       struct diff_options *o,
 		       regex_t *regexp, kwset_t kws)
 {
-	unsigned int one_contains = one ? contains(one, regexp, kws) : 0;
-	unsigned int two_contains = two ? contains(two, regexp, kws) : 0;
-	return one_contains != two_contains;
+	unsigned int c1 = one ? contains(one, regexp, kws) : 0;
+	unsigned int c2 = two ? contains(two, regexp, kws) : 0;
+	return c1 != c2;
 }
 
 static int pickaxe_match(struct diff_filepair *p, struct diff_options *o,
-- 
2.30.0.284.gd98b1dd5eaa7


* [PATCH v2 18/22] pickaxe -S: slightly optimize contains()
  2021-02-03  3:27 [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch] Ævar Arnfjörð Bjarmason
                   ` (43 preceding siblings ...)
  2021-02-16 11:57 ` [PATCH v2 17/22] pickaxe: rename variables in has_changes() for brevity Ævar Arnfjörð Bjarmason
@ 2021-02-16 11:57 ` Ævar Arnfjörð Bjarmason
  2021-03-30 23:58   ` Junio C Hamano
  2021-02-16 11:57 ` [PATCH v2 19/22] xdiff-interface: allow early return from xdiff_emit_{line,hunk}_fn Ævar Arnfjörð Bjarmason
                   ` (3 subsequent siblings)
  48 siblings, 1 reply; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-16 11:57 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

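The "log -S<pat>" switch counts occurrences of <pat> in the pre-image
and post-image of a change. As soon as we know we had e.g. 1 before
and 2 now, we can stop; we don't need to keep counting past 2. More
generally, once the post-image count reaches c1 + 1 (where c1 is the
pre-image count), further counting can only make it larger, so the
comparison is already settled and that's the limit we pass to
contains() for the post-image.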
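A minimal sketch of the early-exit count, assuming plain substring
matching via strstr() in place of git's regex/kwset matchers;
contains_limited() and count_changed() are hypothetical names. As in
the patch, a limit of 0 means "no limit", and the post-image is
counted only up to c1 + 1.

```c
#include <assert.h>
#include <string.h>

/*
 * Count non-overlapping occurrences of a non-empty needle in buf,
 * stopping once limit is reached (limit == 0 means no limit).
 */
static unsigned contains_limited(const char *buf, const char *needle,
				 unsigned limit)
{
	size_t len = strlen(needle);
	unsigned cnt = 0;

	assert(len > 0);
	while ((buf = strstr(buf, needle)) != NULL) {
		buf += len;
		cnt++;
		if (limit && cnt == limit)
			break;
	}
	return cnt;
}

/*
 * Did the number of occurrences change? Any post-image count of at
 * least c1 + 1 already differs from c1, so stop counting there.
 */
static int count_changed(const char *pre, const char *post,
			 const char *needle)
{
	unsigned c1 = contains_limited(pre, needle, 0);
	unsigned c2 = contains_limited(post, needle, c1 + 1);

	return c1 != c2;
}
```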

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 diffcore-pickaxe.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index 66e34d254f1..76c178bae2b 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -68,7 +68,8 @@ static int diff_grep(mmfile_t *one, mmfile_t *two,
 	return ecbdata.hit;
 }
 
-static unsigned int contains(mmfile_t *mf, regex_t *regexp, kwset_t kws)
+static unsigned int contains(mmfile_t *mf, regex_t *regexp, kwset_t kws,
+			     unsigned int limit)
 {
 	unsigned int cnt = 0;
 	unsigned long sz = mf->size;
@@ -88,6 +89,9 @@ static unsigned int contains(mmfile_t *mf, regex_t *regexp, kwset_t kws)
 				sz--;
 			}
 			cnt++;
+
+			if (limit && cnt == limit)
+				return cnt;
 		}
 
 	} else { /* Classic exact string match */
@@ -99,6 +103,9 @@ static unsigned int contains(mmfile_t *mf, regex_t *regexp, kwset_t kws)
 			sz -= offset + kwsm.size[0];
 			data += offset + kwsm.size[0];
 			cnt++;
+
+			if (limit && cnt == limit)
+				return cnt;
 		}
 	}
 	return cnt;
@@ -108,8 +115,8 @@ static int has_changes(mmfile_t *one, mmfile_t *two,
 		       struct diff_options *o,
 		       regex_t *regexp, kwset_t kws)
 {
-	unsigned int c1 = one ? contains(one, regexp, kws) : 0;
-	unsigned int c2 = two ? contains(two, regexp, kws) : 0;
+	unsigned int c1 = one ? contains(one, regexp, kws, 0) : 0;
+	unsigned int c2 = two ? contains(two, regexp, kws, c1 + 1) : 0;
 	return c1 != c2;
 }
 
-- 
2.30.0.284.gd98b1dd5eaa7


* [PATCH v2 19/22] xdiff-interface: allow early return from xdiff_emit_{line,hunk}_fn
  2021-02-03  3:27 [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch] Ævar Arnfjörð Bjarmason
                   ` (44 preceding siblings ...)
  2021-02-16 11:57 ` [PATCH v2 18/22] pickaxe -S: slightly optimize contains() Ævar Arnfjörð Bjarmason
@ 2021-02-16 11:57 ` Ævar Arnfjörð Bjarmason
  2021-03-31  0:04   ` Junio C Hamano
  2021-02-16 11:57 ` [PATCH v2 20/22] xdiff-interface: support early exit in xdiff_outf() Ævar Arnfjörð Bjarmason
                   ` (2 subsequent siblings)
  48 siblings, 1 reply; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-16 11:57 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Change the function prototype of xdiff_emit_{line,hunk}_fn to return
an int instead of void. This will allow for returning early from hunk
& diff consumers that want some of the data, but not all of it.

No behavior is being changed here; we're just replacing the equivalent
of "return" with "return 0", and nothing acts on the changed return
values yet.

There was some work in this area of xdiff-interface.[ch] recently with
3b40a090fd4 (diff: avoid generating unused hunk header lines,
2018-11-02) and 7c61e25fbf1 (diff: use hunk callback for word-diff,
2018-11-02).

In combination, those two changes allow us to skip doing any work on
the hunks and diff at all, but they didn't change the status quo with
regard to consumers that e.g. want the diff lines, but might want to
abort early.

With this change we'll soon be able to abort, e.g. on the first
"-line" of a 1000-line diff, if that's all we need.

As noted in the comment being added to xdiff-interface.h, this
interface is rather scary, but it will be useful for
diffcore-pickaxe.c in a subsequent commit. A future change could
e.g. add more exit codes, and hack xdl_emit_diff() and friends to
ignore or skip things more selectively as a result.
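A minimal sketch of the callback contract, loosely modeled on
consume_one() in xdiff-interface.c; line_fn, emit_lines() and the
find_first_minus() consumer are hypothetical. Returning 0 keeps lines
coming, while a non-zero return stops the walk and is propagated to
the caller as -1.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical line callback: return 0 to continue, non-zero to stop. */
typedef int (*line_fn)(void *priv, const char *line, size_t len);

/*
 * Feed buf to fn one '\n'-terminated line at a time; a non-zero
 * return from fn means no further lines are delivered.
 */
static int emit_lines(const char *buf, size_t size, line_fn fn, void *priv)
{
	assert(fn != NULL);
	while (size) {
		const char *ep = memchr(buf, '\n', size);
		size_t this_size = ep ? (size_t)(ep - buf + 1) : size;

		if (fn(priv, buf, this_size))
			return -1;
		buf += this_size;
		size -= this_size;
	}
	return 0;
}

/* Example consumer: stop at the first removed ("-") line. */
struct first_minus {
	int seen;  /* lines delivered so far */
	int found; /* 1-based index of the first "-" line, or 0 */
};

static int find_first_minus(void *priv, const char *line, size_t len)
{
	struct first_minus *st = priv;

	st->seen++;
	if (len && line[0] == '-') {
		st->found = st->seen;
		return -1; /* we have what we need; abort the walk */
	}
	return 0;
}
```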

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 combine-diff.c     |  9 ++++++---
 diff.c             | 39 +++++++++++++++++++++++----------------
 diffcore-pickaxe.c |  7 ++++---
 range-diff.c       |  8 +++++---
 xdiff-interface.c  | 10 ++++++----
 xdiff-interface.h  | 31 +++++++++++++++++++++++--------
 6 files changed, 67 insertions(+), 37 deletions(-)

diff --git a/combine-diff.c b/combine-diff.c
index 9228aebc16b..6590c4b5fb7 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -369,7 +369,7 @@ struct combine_diff_state {
 	struct sline *lost_bucket;
 };
 
-static void consume_hunk(void *state_,
+static int consume_hunk(void *state_,
 			 long ob, long on,
 			 long nb, long nn,
 			 const char *funcline, long funclen)
@@ -401,13 +401,15 @@ static void consume_hunk(void *state_,
 		state->sline[state->nb-1].p_lno =
 			xcalloc(state->num_parent, sizeof(unsigned long));
 	state->sline[state->nb-1].p_lno[state->n] = state->ob;
+
+	return 0;
 }
 
-static void consume_line(void *state_, char *line, unsigned long len)
+static int consume_line(void *state_, char *line, unsigned long len)
 {
 	struct combine_diff_state *state = state_;
 	if (!state->lost_bucket)
-		return; /* not in any hunk yet */
+		return 0; /* not in any hunk yet */
 	switch (line[0]) {
 	case '-':
 		append_lost(state->lost_bucket, state->n, line+1, len-1);
@@ -417,6 +419,7 @@ static void consume_line(void *state_, char *line, unsigned long len)
 		state->lno++;
 		break;
 	}
+	return 0;
 }
 
 static void combine_diff(struct repository *r,
diff --git a/diff.c b/diff.c
index 63c49ecaef1..4d554281147 100644
--- a/diff.c
+++ b/diff.c
@@ -1996,10 +1996,10 @@ static int color_words_output_graph_prefix(struct diff_words_data *diff_words)
 	}
 }
 
-static void fn_out_diff_words_aux(void *priv,
-				  long minus_first, long minus_len,
-				  long plus_first, long plus_len,
-				  const char *func, long funclen)
+static int fn_out_diff_words_aux(void *priv,
+				 long minus_first, long minus_len,
+				 long plus_first, long plus_len,
+				 const char *func, long funclen)
 {
 	struct diff_words_data *diff_words = priv;
 	struct diff_words_style *style = diff_words->style;
@@ -2047,6 +2047,8 @@ static void fn_out_diff_words_aux(void *priv,
 
 	diff_words->current_plus = plus_end;
 	diff_words->last_minus = minus_first;
+
+	return 0;
 }
 
 /* This function starts looking at *begin, and returns 0 iff a word was found. */
@@ -2338,7 +2340,7 @@ static void find_lno(const char *line, struct emit_callback *ecbdata)
 	ecbdata->lno_in_postimage = strtol(p + 1, NULL, 10);
 }
 
-static void fn_out_consume(void *priv, char *line, unsigned long len)
+static int fn_out_consume(void *priv, char *line, unsigned long len)
 {
 	struct emit_callback *ecbdata = priv;
 	struct diff_options *o = ecbdata->opt;
@@ -2374,7 +2376,7 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 		len = sane_truncate_line(line, len);
 		find_lno(line, ecbdata);
 		emit_hunk_header(ecbdata, line, len);
-		return;
+		return 0;
 	}
 
 	if (ecbdata->diff_words) {
@@ -2384,11 +2386,11 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 		if (line[0] == '-') {
 			diff_words_append(line, len,
 					  &ecbdata->diff_words->minus);
-			return;
+			return 0;
 		} else if (line[0] == '+') {
 			diff_words_append(line, len,
 					  &ecbdata->diff_words->plus);
-			return;
+			return 0;
 		} else if (starts_with(line, "\\ ")) {
 			/*
 			 * Eat the "no newline at eof" marker as if we
@@ -2397,11 +2399,11 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 			 * defer processing. If this is the end of
 			 * preimage, more "+" lines may come after it.
 			 */
-			return;
+			return 0;
 		}
 		diff_words_flush(ecbdata);
 		emit_diff_symbol(o, s, line, len, 0);
-		return;
+		return 0;
 	}
 
 	switch (line[0]) {
@@ -2425,6 +2427,7 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 				 line, len, 0);
 		break;
 	}
+	return 0;
 }
 
 static void pprint_rename(struct strbuf *name, const char *a, const char *b)
@@ -2524,7 +2527,7 @@ static struct diffstat_file *diffstat_add(struct diffstat_t *diffstat,
 	return x;
 }
 
-static void diffstat_consume(void *priv, char *line, unsigned long len)
+static int diffstat_consume(void *priv, char *line, unsigned long len)
 {
 	struct diffstat_t *diffstat = priv;
 	struct diffstat_file *x = diffstat->files[diffstat->nr - 1];
@@ -2533,6 +2536,7 @@ static void diffstat_consume(void *priv, char *line, unsigned long len)
 		x->added++;
 	else if (line[0] == '-')
 		x->deleted++;
+	return 0;
 }
 
 const char mime_boundary_leader[] = "------------";
@@ -3201,16 +3205,17 @@ static int is_conflict_marker(const char *line, int marker_size, unsigned long l
 	return 1;
 }
 
-static void checkdiff_consume_hunk(void *priv,
+static int checkdiff_consume_hunk(void *priv,
 				   long ob, long on, long nb, long nn,
 				   const char *func, long funclen)
 
 {
 	struct checkdiff_t *data = priv;
 	data->lineno = nb - 1;
+	return 0;
 }
 
-static void checkdiff_consume(void *priv, char *line, unsigned long len)
+static int checkdiff_consume(void *priv, char *line, unsigned long len)
 {
 	struct checkdiff_t *data = priv;
 	int marker_size = data->conflict_marker_size;
@@ -3234,7 +3239,7 @@ static void checkdiff_consume(void *priv, char *line, unsigned long len)
 		}
 		bad = ws_check(line + 1, len - 1, data->ws_rule);
 		if (!bad)
-			return;
+			return 0;
 		data->status |= bad;
 		err = whitespace_error_string(bad);
 		fprintf(data->o->file, "%s%s:%d: %s.\n",
@@ -3246,6 +3251,7 @@ static void checkdiff_consume(void *priv, char *line, unsigned long len)
 	} else if (line[0] == ' ') {
 		data->lineno++;
 	}
+	return 0;
 }
 
 static unsigned char *deflate_it(char *data,
@@ -6104,17 +6110,18 @@ void flush_one_hunk(struct object_id *result, git_hash_ctx *ctx)
 	}
 }
 
-static void patch_id_consume(void *priv, char *line, unsigned long len)
+static int patch_id_consume(void *priv, char *line, unsigned long len)
 {
 	struct patch_id_t *data = priv;
 	int new_len;
 
 	if (len > 12 && starts_with(line, "\\ "))
-		return;
+		return 0;
 	new_len = remove_space(line, len);
 
 	the_hash_algo->update_fn(data->ctx, line, new_len);
 	data->patchlen += new_len;
+	return 0;
 }
 
 static void patch_id_add_string(git_hash_ctx *ctx, const char *str)
diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index 76c178bae2b..94601072bde 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -19,21 +19,22 @@ struct diffgrep_cb {
 	int hit;
 };
 
-static void diffgrep_consume(void *priv, char *line, unsigned long len)
+static int diffgrep_consume(void *priv, char *line, unsigned long len)
 {
 	struct diffgrep_cb *data = priv;
 	regmatch_t regmatch;
 
 	if (line[0] != '+' && line[0] != '-')
-		return;
+		return 0;
 	if (data->hit)
 		/*
 		 * NEEDSWORK: we should have a way to terminate the
 		 * caller early.
 		 */
-		return;
+		return 0;
 	data->hit = !regexec_buf(data->regexp, line + 1, len - 1, 1,
 				 &regmatch, 0);
+	return 0;
 }
 
 static int diff_grep(mmfile_t *one, mmfile_t *two,
diff --git a/range-diff.c b/range-diff.c
index a3cc7c94a3d..f51c6a67712 100644
--- a/range-diff.c
+++ b/range-diff.c
@@ -274,15 +274,17 @@ static void find_exact_matches(struct string_list *a, struct string_list *b)
 	hashmap_clear(&map);
 }
 
-static void diffsize_consume(void *data, char *line, unsigned long len)
+static int diffsize_consume(void *data, char *line, unsigned long len)
 {
 	(*(int *)data)++;
+	return 0;
 }
 
-static void diffsize_hunk(void *data, long ob, long on, long nb, long nn,
-			  const char *funcline, long funclen)
+static int diffsize_hunk(void *data, long ob, long on, long nb, long nn,
+			 const char *funcline, long funclen)
 {
 	diffsize_consume(data, NULL, 0);
+	return 0;
 }
 
 static int diffsize(const char *a, const char *b)
diff --git a/xdiff-interface.c b/xdiff-interface.c
index 4d20069302b..ef557dc4e63 100644
--- a/xdiff-interface.c
+++ b/xdiff-interface.c
@@ -31,7 +31,7 @@ static int xdiff_out_hunk(void *priv_,
 	return 0;
 }
 
-static void consume_one(void *priv_, char *s, unsigned long size)
+static int consume_one(void *priv_, char *s, unsigned long size)
 {
 	struct xdiff_emit_state *priv = priv_;
 	char *ep;
@@ -43,6 +43,7 @@ static void consume_one(void *priv_, char *s, unsigned long size)
 		size -= this_size;
 		s += this_size;
 	}
+	return 0;
 }
 
 static int xdiff_outf(void *priv_, mmbuffer_t *mb, int nbuf)
@@ -115,10 +116,11 @@ int xdi_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp, xdemitconf_t co
 	return xdl_diff(&a, &b, xpp, xecfg, xecb);
 }
 
-void discard_hunk_line(void *priv,
-		       long ob, long on, long nb, long nn,
-		       const char *func, long funclen)
+int discard_hunk_line(void *priv,
+		      long ob, long on, long nb, long nn,
+		      const char *func, long funclen)
 {
+	return 0;
 }
 
 int xdi_diff_outf(mmfile_t *mf1, mmfile_t *mf2,
diff --git a/xdiff-interface.h b/xdiff-interface.h
index 93df26900c2..1b27d6104ce 100644
--- a/xdiff-interface.h
+++ b/xdiff-interface.h
@@ -11,11 +11,26 @@
  */
 #define MAX_XDIFF_SIZE (1024UL * 1024 * 1023)
 
-typedef void (*xdiff_emit_line_fn)(void *, char *, unsigned long);
-typedef void (*xdiff_emit_hunk_fn)(void *data,
-				   long old_begin, long old_nr,
-				   long new_begin, long new_nr,
-				   const char *func, long funclen);
+/*
+ * The xdiff_emit_{line,hunk}_fn consumers can return -1 to abort
+ * early, or 0 to continue processing. Note that doing so is an
+ * all-or-nothing affair, as returning -1 will return all the way to
+ * the top-level, e.g. the xdi_diff_outf() call to generate the diff.
+ *
+ * Thus returning -1 from a hunk header callback means you won't be
+ * getting any more hunks, or diffs, and likewise returning -1 from a
+ * line callback means you won't be getting any more lines.
+ *
+ * We may extend the interface in the future to understand other more
+ * granular return values, but for now use it carefully, or consider
+ * e.g. using discard_hunk_line() if you just don't care about
+ * hunk headers.
+ */
+typedef int (*xdiff_emit_line_fn)(void *, char *, unsigned long);
+typedef int (*xdiff_emit_hunk_fn)(void *data,
+				  long old_begin, long old_nr,
+				  long new_begin, long new_nr,
+				  const char *func, long funclen);
 
 int xdi_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp, xdemitconf_t const *xecfg, xdemitcb_t *ecb);
 int xdi_diff_outf(mmfile_t *mf1, mmfile_t *mf2,
@@ -36,9 +51,9 @@ extern int git_xmerge_style;
  * Can be used as a no-op hunk_fn for xdi_diff_outf(), since a NULL
  * one just sends the hunk line to the line_fn callback).
  */
-void discard_hunk_line(void *priv,
-		       long ob, long on, long nb, long nn,
-		       const char *func, long funclen);
+int discard_hunk_line(void *priv,
+		      long ob, long on, long nb, long nn,
+		      const char *func, long funclen);
 
 /*
  * Compare the strings l1 with l2 which are of size s1 and s2 respectively.
-- 
2.30.0.284.gd98b1dd5eaa7


* [PATCH v2 20/22] xdiff-interface: support early exit in xdiff_outf()
  2021-02-03  3:27 [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch] Ævar Arnfjörð Bjarmason
                   ` (45 preceding siblings ...)
  2021-02-16 11:57 ` [PATCH v2 19/22] xdiff-interface: allow early return from xdiff_emit_{line,hunk}_fn Ævar Arnfjörð Bjarmason
@ 2021-02-16 11:57 ` Ævar Arnfjörð Bjarmason
  2021-02-16 11:58 ` [PATCH v2 21/22] pickaxe -G: terminate early on matching lines Ævar Arnfjörð Bjarmason
  2021-02-16 11:58 ` [PATCH v2 22/22] pickaxe -G: don't special-case create/delete Ævar Arnfjörð Bjarmason
  48 siblings, 0 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-16 11:57 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Bridge the gap between the preceding "xdiff-interface: allow early
return from xdiff_emit_{line,hunk}_fn" change and the public
interface.

This change was split off from the preceding commit as it wasn't a
purely mechanical addition of "return 0".

Here we want to be able to abort early, but do so in a way that
doesn't skip the appropriate strbuf_reset() invocations.

The use of -1 as a return value is consistent with the rest of the
xdiff codebase, where doing so signals an abort or error that'll
propagate up the stack.
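A simplified sketch of the shape of this change, assuming each chunk
carries at most one line (unlike the real mmbuffer_t input, which
consume_one() splits further). The types, the fixed-size remainder
buffer standing in for a strbuf, and the collect() consumer are
hypothetical. The point is that a stop request ends the loop only
after any stashed partial line has been cleared.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical line callback: 0 means continue, non-zero means stop. */
typedef int (*line_cb)(void *priv, const char *line, size_t len);

struct chunk {
	const char *ptr;
	size_t size; /* assumed non-zero; xdiff_outf() reads ptr[size-1] too */
};

struct emit_state {
	char remainder[256]; /* fixed-size stand-in for a strbuf */
	size_t rlen;
	line_cb fn;
	void *priv;
};

static void stash(struct emit_state *st, const struct chunk *c)
{
	assert(st->rlen + c->size <= sizeof(st->remainder));
	memcpy(st->remainder + st->rlen, c->ptr, c->size);
	st->rlen += c->size;
}

static int deliver(struct emit_state *st, const char *line, size_t len)
{
	return st->fn(st->priv, line, len) ? -1 : 0;
}

static int outf(struct emit_state *st, const struct chunk *mb, int nbuf)
{
	int stop = 0;

	for (int i = 0; i < nbuf && !stop; i++) {
		const struct chunk *c = &mb[i];

		assert(c->size > 0);
		if (c->ptr[c->size - 1] != '\n') {
			stash(st, c); /* incomplete line: wait for the rest */
			continue;
		}
		if (!st->rlen) {
			stop = deliver(st, c->ptr, c->size);
			continue;
		}
		/* Finish the stashed line, deliver it, and always reset. */
		stash(st, c);
		stop = deliver(st, st->remainder, st->rlen);
		st->rlen = 0;
	}
	if (stop)
		return -1;
	if (st->rlen) {
		/* Trailing line without a final newline. */
		stop = deliver(st, st->remainder, st->rlen);
		st->rlen = 0;
	}
	return stop;
}

/* Example consumer: remember the last line, stop after max lines. */
struct collector {
	int n, max;
	char last[64];
};

static int collect(void *priv, const char *line, size_t len)
{
	struct collector *col = priv;

	if (len >= sizeof(col->last))
		len = sizeof(col->last) - 1;
	memcpy(col->last, line, len);
	col->last[len] = '\0';
	return ++col->n >= col->max;
}
```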

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 xdiff-interface.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/xdiff-interface.c b/xdiff-interface.c
index ef557dc4e63..d066442470f 100644
--- a/xdiff-interface.c
+++ b/xdiff-interface.c
@@ -39,7 +39,8 @@ static int consume_one(void *priv_, char *s, unsigned long size)
 		unsigned long this_size;
 		ep = memchr(s, '\n', size);
 		this_size = (ep == NULL) ? size : (ep - s + 1);
-		priv->line_fn(priv->consume_callback_data, s, this_size);
+		if (priv->line_fn(priv->consume_callback_data, s, this_size))
+			return -1;
 		size -= this_size;
 		s += this_size;
 	}
@@ -50,11 +51,14 @@ static int xdiff_outf(void *priv_, mmbuffer_t *mb, int nbuf)
 {
 	struct xdiff_emit_state *priv = priv_;
 	int i;
+	int stop = 0;
 
 	if (!priv->line_fn)
 		return 0;
 
 	for (i = 0; i < nbuf; i++) {
+		if (stop)
+			return -1;
 		if (mb[i].ptr[mb[i].size-1] != '\n') {
 			/* Incomplete line */
 			strbuf_add(&priv->remainder, mb[i].ptr, mb[i].size);
@@ -63,17 +67,21 @@ static int xdiff_outf(void *priv_, mmbuffer_t *mb, int nbuf)
 
 		/* we have a complete line */
 		if (!priv->remainder.len) {
-			consume_one(priv, mb[i].ptr, mb[i].size);
+			stop = consume_one(priv, mb[i].ptr, mb[i].size);
 			continue;
 		}
 		strbuf_add(&priv->remainder, mb[i].ptr, mb[i].size);
-		consume_one(priv, priv->remainder.buf, priv->remainder.len);
+		stop = consume_one(priv, priv->remainder.buf, priv->remainder.len);
 		strbuf_reset(&priv->remainder);
 	}
+	if (stop)
+		return -1;
 	if (priv->remainder.len) {
-		consume_one(priv, priv->remainder.buf, priv->remainder.len);
+		stop = consume_one(priv, priv->remainder.buf, priv->remainder.len);
 		strbuf_reset(&priv->remainder);
 	}
+	if (stop)
+		return -1;
 	return 0;
 }
 
-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH v2 21/22] pickaxe -G: terminate early on matching lines
  2021-02-03  3:27 [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch] Ævar Arnfjörð Bjarmason
                   ` (46 preceding siblings ...)
  2021-02-16 11:57 ` [PATCH v2 20/22] xdiff-interface: support early exit in xdiff_outf() Ævar Arnfjörð Bjarmason
@ 2021-02-16 11:58 ` Ævar Arnfjörð Bjarmason
  2021-03-31  0:11   ` Junio C Hamano
  2021-02-16 11:58 ` [PATCH v2 22/22] pickaxe -G: don't special-case create/delete Ævar Arnfjörð Bjarmason
  48 siblings, 1 reply; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-16 11:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Solve a long-standing TODO item for "git log -Grx": we'd e.g. find "+
str" in the diff output and note that we had a "hit", but xdiff would
diligently continue to generate and spew the rest of the diff at us.

The TODO item has been there since "git log -G" was implemented. See
f506b8e8b5f (git log/diff: add -G<regexp> that greps in the patch
text, 2010-08-23).

Our xdiff interface has also lacked the ability to abort early since
its beginning, see d9ea73e0564 (combine-diff: refactor built-in xdiff
interface., 2006-04-05), although at that time "xdiff_emit_line_fn"
was called "xdiff_emit_consume_fn", and "xdiff_emit_hunk_fn" didn't
exist yet.

But now with the support added in the preceding "xdiff-interface:
allow early return from xdiff_emit_{line,hunk}_fn" commit we can
return early, and furthermore test the functionality of the new
early-exit xdiff-interface by having a BUG() call here to die if it
ever starts handing us needless work again.
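A hedged sketch of the pattern this patch relies on: the callback records the match in its callback data before returning -1, so the caller can tell "stopped because we matched" apart from a genuine xdiff error. drive(), struct grep_cb, grep_consume() and grep_lines() are hypothetical stand-ins for the xdiff driver, struct diffgrep_cb, diffgrep_consume() and diff_grep(), and strstr() stands in for regexec_buf().

```c
#include <assert.h>
#include <string.h>

typedef int (*line_fn_t)(void *data, const char *line, size_t len);

/* Toy driver: returns -1 as soon as the callback asks to stop. */
static int drive(const char **lines, int n, line_fn_t fn, void *data)
{
	for (int i = 0; i < n; i++)
		if (fn(data, lines[i], strlen(lines[i])))
			return -1;
	return 0;
}

struct grep_cb {
	const char *needle;
	int hit;
};

/* Only look at '+'/'-' lines; stop the driver on the first hit. */
static int grep_consume(void *priv, const char *line, size_t len)
{
	struct grep_cb *data = priv;

	if (!len || (line[0] != '+' && line[0] != '-'))
		return 0;
	/* After a hit the driver must not call us again. */
	assert(!data->hit);
	if (strstr(line + 1, data->needle)) {
		data->hit = 1;
		return -1;
	}
	return 0;
}

static int grep_lines(const char **lines, int n, const char *needle)
{
	struct grep_cb cb = { needle, 0 };
	int ret = drive(lines, n, grep_consume, &cb);

	if (cb.hit)
		return 1;	/* early return because we matched */
	if (ret)
		return -1;	/* genuine driver error (never happens in this toy) */
	return 0;
}
```

For the lines " context str", "-old", "+new str" and "+more str", searching for "str" returns 1 after consuming the third line; the context line is skipped and the fourth line is never seen.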

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 diffcore-pickaxe.c | 31 ++++++++++++++++++++-----------
 xdiff-interface.h  |  5 +++++
 2 files changed, 25 insertions(+), 11 deletions(-)

diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index 94601072bde..f11b38b7121 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -27,13 +27,12 @@ static int diffgrep_consume(void *priv, char *line, unsigned long len)
 	if (line[0] != '+' && line[0] != '-')
 		return 0;
 	if (data->hit)
-		/*
-		 * NEEDSWORK: we should have a way to terminate the
-		 * caller early.
-		 */
-		return 0;
-	data->hit = !regexec_buf(data->regexp, line + 1, len - 1, 1,
-				 &regmatch, 0);
+		BUG("Already matched in diffgrep_consume! Broken xdiff_emit_line_fn?");
+	if (!regexec_buf(data->regexp, line + 1, len - 1, 1,
+			 &regmatch, 0)) {
+		data->hit = 1;
+		return -1;
+	}
 	return 0;
 }
 
@@ -45,6 +44,7 @@ static int diff_grep(mmfile_t *one, mmfile_t *two,
 	struct diffgrep_cb ecbdata;
 	xpparam_t xpp;
 	xdemitconf_t xecfg;
+	int ret;
 
 	if (!one)
 		return !regexec_buf(regexp, two->ptr, two->size,
@@ -63,10 +63,19 @@ static int diff_grep(mmfile_t *one, mmfile_t *two,
 	ecbdata.hit = 0;
 	xecfg.ctxlen = o->context;
 	xecfg.interhunkctxlen = o->interhunkcontext;
-	if (xdi_diff_outf(one, two, discard_hunk_line, diffgrep_consume,
-			  &ecbdata, &xpp, &xecfg))
-		return 0;
-	return ecbdata.hit;
+
+	/*
+	 * An xdiff error might be our "data->hit" from above. See the
+	 * comment for xdiff_emit_{line,hunk}_fn in xdiff-interface.h
+	 * for why.
+	 */
+	ret = xdi_diff_outf(one, two, discard_hunk_line, diffgrep_consume,
+			    &ecbdata, &xpp, &xecfg);
+	if (ecbdata.hit)
+		return 1;
+	if (ret)
+		return ret;
+	return 0;
 }
 
 static unsigned int contains(mmfile_t *mf, regex_t *regexp, kwset_t kws,
diff --git a/xdiff-interface.h b/xdiff-interface.h
index 1b27d6104ce..347d8a4425f 100644
--- a/xdiff-interface.h
+++ b/xdiff-interface.h
@@ -25,6 +25,11 @@
  * granular return values, but for now use it carefully, or consider
  * e.g. using discard_hunk_line() if you say just don't care about
  * hunk headers.
+ *
+ * Note that just returning -1 will make your early return
+ * indistinguishable from an error internal to xdiff. See "diff_grep"
+ * in diffcore-pickaxe.c for a trick to work around this, i.e. using
+ * the "consume_callback_data" to note the desired early return.
  */
 typedef int (*xdiff_emit_line_fn)(void *, char *, unsigned long);
 typedef int (*xdiff_emit_hunk_fn)(void *data,
-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH v2 22/22] pickaxe -G: don't special-case create/delete
  2021-02-03  3:27 [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch] Ævar Arnfjörð Bjarmason
                   ` (47 preceding siblings ...)
  2021-02-16 11:58 ` [PATCH v2 21/22] pickaxe -G: terminate early on matching lines Ævar Arnfjörð Bjarmason
@ 2021-02-16 11:58 ` Ævar Arnfjörð Bjarmason
  2021-03-31  0:14   ` Junio C Hamano
  48 siblings, 1 reply; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-16 11:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Instead of special-casing creations and deletions, let's just generate
a diff for them.

This logic of not running a diff under -G if we don't have both sides
dates back to the original implementation of -S in
52e9578985f ([PATCH] Introducing software archaeologist's tool
"pickaxe"., 2005-05-21).

In the case of -S we were not working with the xdiff interface and
needed to do this, but when -G was implemented in f506b8e8b5f (git
log/diff: add -G<regexp> that greps in the patch text, 2010-08-23)
this logic was diligently copied over.

But as the performance test added earlier in this series shows, this
does not make much of a difference. Running:

    time GIT_TEST_LONG= GIT_PERF_REPEAT_COUNT=10 GIT_PERF_MAKE_OPTS='-j8 CFLAGS=-O3' ./run origin/next HEAD~ HEAD -- p4209-pickaxe.sh

with the HEAD~ commit being the preceding "pickaxe -G: terminate early
on matching lines", we get these results. Note that only the -G
codepaths are relevant to this change:

    Test                                                                      origin/next       HEAD~                   HEAD
    -----------------------------------------------------------------------------------------------------------------------------------------
    4209.1: git log -S'int main' <limit-rev>..                                0.35(0.32+0.03)   0.35(0.33+0.02) +0.0%   0.35(0.30+0.05) +0.0%
    4209.2: git log -S'æ' <limit-rev>..                                       0.46(0.42+0.04)   0.46(0.41+0.05) +0.0%   0.46(0.42+0.04) +0.0%
    4209.3: git log --pickaxe-regex -S'(int|void|null)' <limit-rev>..         0.65(0.62+0.02)   0.64(0.61+0.02) -1.5%   0.64(0.60+0.04) -1.5%
    4209.4: git log --pickaxe-regex -S'if *\([^ ]+ & ' <limit-rev>..          0.52(0.45+0.06)   0.52(0.50+0.01) +0.0%   0.54(0.47+0.04) +3.8%
    4209.5: git log --pickaxe-regex -S'[àáâãäåæñøùúûüýþ]' <limit-rev>..       0.39(0.34+0.05)   0.39(0.34+0.04) +0.0%   0.39(0.36+0.03) +0.0%
    4209.6: git log -G'(int|void|null)' <limit-rev>..                         0.60(0.55+0.04)   0.58(0.54+0.03) -3.3%   0.58(0.49+0.08) -3.3%
    4209.7: git log -G'if *\([^ ]+ & ' <limit-rev>..                          0.61(0.52+0.06)   0.59(0.53+0.05) -3.3%   0.59(0.54+0.05) -3.3%
    4209.8: git log -G'[àáâãäåæñøùúûüýþ]' <limit-rev>..                       0.61(0.51+0.07)   0.58(0.54+0.04) -4.9%   0.57(0.51+0.06) -6.6%
    4209.9: git log -i -S'int main' <limit-rev>..                             0.36(0.31+0.04)   0.36(0.34+0.02) +0.0%   0.35(0.32+0.03) -2.8%
    4209.10: git log -i -S'æ' <limit-rev>..                                   0.36(0.33+0.03)   0.39(0.34+0.01) +8.3%   0.36(0.32+0.03) +0.0%
    4209.11: git log -i --pickaxe-regex -S'(int|void|null)' <limit-rev>..     0.83(0.77+0.05)   0.82(0.77+0.05) -1.2%   0.80(0.75+0.04) -3.6%
    4209.12: git log -i --pickaxe-regex -S'if *\([^ ]+ & ' <limit-rev>..      0.67(0.61+0.03)   0.64(0.61+0.03) -4.5%   0.63(0.61+0.02) -6.0%
    4209.13: git log -i --pickaxe-regex -S'[àáâãäåæñøùúûüýþ]' <limit-rev>..   0.40(0.37+0.02)   0.40(0.37+0.03) +0.0%   0.40(0.36+0.04) +0.0%
    4209.14: git log -i -G'(int|void|null)' <limit-rev>..                     0.58(0.51+0.07)   0.59(0.52+0.06) +1.7%   0.58(0.52+0.05) +0.0%
    4209.15: git log -i -G'if *\([^ ]+ & ' <limit-rev>..                      0.60(0.54+0.05)   0.60(0.54+0.06) +0.0%   0.60(0.56+0.03) +0.0%
    4209.16: git log -i -G'[àáâãäåæñøùúûüýþ]' <limit-rev>..                   0.58(0.51+0.06)   0.57(0.52+0.05) -1.7%   0.60(0.48+0.09) +3.4%

This small simplification really doesn't buy us much now, but I've got
plans to both convert the pickaxe code to using a PCREv2 backend[1]
and to implement additional pickaxe modes to do custom searches
through the diff[2]. Always having the diff available under -G is
going to help to simplify both of those changes.

1. https://lore.kernel.org/git/20210203032811.14979-22-avarab@gmail.com/
2. https://lore.kernel.org/git/20190424152215.16251-3-avarab@gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 diffcore-pickaxe.c | 12 +-----------
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index f11b38b7121..94d3890e669 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -40,19 +40,11 @@ static int diff_grep(mmfile_t *one, mmfile_t *two,
 		     struct diff_options *o,
 		     regex_t *regexp, kwset_t kws)
 {
-	regmatch_t regmatch;
 	struct diffgrep_cb ecbdata;
 	xpparam_t xpp;
 	xdemitconf_t xecfg;
 	int ret;
 
-	if (!one)
-		return !regexec_buf(regexp, two->ptr, two->size,
-				    1, &regmatch, 0);
-	if (!two)
-		return !regexec_buf(regexp, one->ptr, one->size,
-				    1, &regmatch, 0);
-
 	/*
 	 * We have both sides; need to run textual diff and see if
 	 * the pattern appears on added/deleted lines.
@@ -173,9 +165,7 @@ static int pickaxe_match(struct diff_filepair *p, struct diff_options *o,
 	mf1.size = fill_textconv(o->repo, textconv_one, p->one, &mf1.ptr);
 	mf2.size = fill_textconv(o->repo, textconv_two, p->two, &mf2.ptr);
 
-	ret = fn(DIFF_FILE_VALID(p->one) ? &mf1 : NULL,
-		 DIFF_FILE_VALID(p->two) ? &mf2 : NULL,
-		 o, regexp, kws);
+	ret = fn(&mf1, &mf2, o, regexp, kws);
 
 	if (textconv_one)
 		free(mf1.ptr);
-- 
2.30.0.284.gd98b1dd5eaa7



* Re: [PATCH v2 00/22] pickaxe: test and refactoring for follow-up changes
  2021-02-16 11:57 ` [PATCH v2 00/22] pickaxe: test and refactoring for follow-up changes Ævar Arnfjörð Bjarmason
@ 2021-02-16 22:23   ` Junio C Hamano
  2021-02-17  1:19     ` Junio C Hamano
  2021-04-12 17:15   ` [PATCH v3 00/22] pickaxe: test and refactoring for future PCRE backend Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 99+ messages in thread
From: Junio C Hamano @ 2021-02-16 22:23 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Schindelin, Carlo Marcelo Arenas Belón

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> This is a smaller v2 of the series to remove the kwset backend and
> make pickaxe use PCRE v2[1].
>
> That's not being done here yet. These are mostly small
> refactoring/test fixes. The most significant work is a new xdiff
> interface at the end of the series.
>
> It's based on next where some preparatory work already landed[2].

Do you really mean <20210204210556.25242-1-avarab@gmail.com>?

  grep/pcre2: drop needless assignment + assert() on opt->pcre2
  grep/pcre2: drop needless assignment to NULL
  grep/pcre2: correct reference to grep_init() in comment
  grep/pcre2: prepare to add debugging to pcre2_malloc()
  grep/pcre2: add GREP_PCRE2_DEBUG_MALLOC debug mode
  grep/pcre2: use compile-time PCREv2 version test
  grep/pcre2: use pcre2_maketables_free() function
  grep/pcre2: actually make pcre2 use custom allocator
  grep/pcre2: move back to thread-only PCREv2 structures
  grep/pcre2: move definitions of pcre2_{malloc,free}

I do not think we have that many patches whose title begin with
grep/pcre2 in 'next'.

In any case, I'd rather not see things done directly on 'next';
targeting a selected few topics merged on top of 'master' would
not be bad, though.

Thanks.



* Re: [PATCH v2 00/22] pickaxe: test and refactoring for follow-up changes
  2021-02-16 22:23   ` Junio C Hamano
@ 2021-02-17  1:19     ` Junio C Hamano
  0 siblings, 0 replies; 99+ messages in thread
From: Junio C Hamano @ 2021-02-17  1:19 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Schindelin, Carlo Marcelo Arenas Belón

Junio C Hamano <gitster@pobox.com> writes:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> This is a smaller v2 of the series to remove the kwset backend and
>> make pickaxe use PCRE v2[1].
>>
>> That's not being done here yet. These are mostly small
>> refactoring/test fixes. The most significant work is a new xdiff
>> interface at the end of the series.
>>
>> It's based on next where some preparatory work already landed[2].
>
> Do you really mean <20210204210556.25242-1-avarab@gmail.com>?
> ...
> I do not think we have that many patches whose title begin with
> grep/pcre2 in 'next'.
>
> In any case, I'd rather not see things done directly on 'next';
> targeting a selected few topics merged on top of 'master' would
> not be bad, though.

It turns out that these patches textually depend on two topics in
'next', and both are to be merged to 'master' hopefully by the end
of the week, so perhaps that is a good time to queue this topic to
'seen' on top of the then-current master.

Thanks.


* Re: [PATCH v2 04/22] test-lib functions: add --printf option to test_commit
  2021-02-16 11:57 ` [PATCH v2 04/22] test-lib functions: add --printf option to test_commit Ævar Arnfjörð Bjarmason
@ 2021-03-30 23:11   ` Junio C Hamano
  2021-04-12 13:19     ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 99+ messages in thread
From: Junio C Hamano @ 2021-03-30 23:11 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Schindelin, Carlo Marcelo Arenas Belón

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

>  test_expect_success 'can parse blob ending with CR' '
> -	printf "[some]key = value\\r" >config &&
> -	git add config &&
> -	git commit -m CR &&
> +	test_commit --printf CR config "[some]key = value\\r" &&

OK, the first arg becomes the commit title, and the second one is
used for the filename, and the next arg is the string given to
printf, I guess.

>  test_expect_success 'rerere forget (binary)' '
>  	git checkout -f side &&
> -	printf "a\0c" >binary &&
> -	git commit -a -m binary &&
> +	test_commit binary binary "a\0c" &&

This lacks --printf.  Are we breaking the test but "test_must_fail"
is hiding the breakage here?

>  	test_must_fail git merge second &&
>  	git rerere forget binary
>  '



* Re: [PATCH v2 02/22] test-lib-functions: document and test test_commit --no-tag
  2021-02-16 11:57 ` [PATCH v2 02/22] test-lib-functions: document and test test_commit --no-tag Ævar Arnfjörð Bjarmason
@ 2021-03-30 23:14   ` Junio C Hamano
  0 siblings, 0 replies; 99+ messages in thread
From: Junio C Hamano @ 2021-03-30 23:14 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Schindelin, Carlo Marcelo Arenas Belón

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> @@ -242,7 +245,10 @@ test_commit () {
>  	git ${indir:+ -C "$indir"} commit \
>  	    ${author:+ --author "$author"} \
>  	    $signoff -m "$1" &&
> -	if test -z "$no_tag"
> +	if test -n "$no_tag" -a $# -eq 4

Let's spell it

	test -n "$no_tag" && test $# = 4

to keep clueless newbies from cargo-culting the use of '-o' and '-a'
with test in an unsafe context, even though the way you used it here
is perfectly safe.

> +	then
> +		BUG "expect no <tag> parameter with --no-tag"
> +	elif test -z "$no_tag"
>  	then
>  		git ${indir:+ -C "$indir"} tag "${4:-$1}"
>  	fi


* Re: [PATCH v2 05/22] pickaxe tests: refactor to use test_commit --append --printf
  2021-02-16 11:57 ` [PATCH v2 05/22] pickaxe tests: refactor to use test_commit --append --printf Ævar Arnfjörð Bjarmason
@ 2021-03-30 23:26   ` Junio C Hamano
  0 siblings, 0 replies; 99+ messages in thread
From: Junio C Hamano @ 2021-03-30 23:26 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Schindelin, Carlo Marcelo Arenas Belón

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Refactor existing tests added in e0e7cb8080c (log -G: ignore binary
> files, 2018-12-14) to use the --append option I added in
> 3373518cc8b (test-lib functions: add an --append option to
> test_commit, 2021-01-12) and the --printf option added in a preceding
> commit.
>
> See also f5d79bf7dd6 (tests: refactor a few tests to use "test_commit
> --append", 2021-01-12) for prior similar refactoring.

This does not exactly look like a "refactoring".  There are at least
two differences, and it would be courteous to readers to note them
and explain why these differences do not impact the correctness of
the tests, I would say.

 * The original uses a dedicated branch, while the rewritten uses a
   dedicated repository for these tests.  This does not impact the
   correctness as long as all the tests that originally used the
   branch are made to run in this new repository with "git -C".

 * The youngest change removes the binary sample file, while the
   rewritten only truncates it.  This does not impact the
   correctness, because a removed file and a file whose lines are
   all removed behave in the same way with respect to -G/-S.  Both
   will reduce the number of occurrences of -S<needle>, and make
   "removal" (-) lines of -G<needle> appear in the patch.

>  test_expect_success 'setup log -[GS] binary & --text' '
> -	git checkout --orphan GS-binary-and-text &&
> -	git read-tree --empty &&
> -	printf "a\na\0a\n" >data.bin &&
> -	git add data.bin &&
> -	git commit -m "create binary file" data.bin &&
> -	printf "a\na\0a\n" >>data.bin &&
> -	git commit -m "modify binary file" data.bin &&
> -	git rm data.bin &&
> -	git commit -m "delete binary file" data.bin &&
> -	git log >full-log
> +	test_create_repo GS-bin-txt &&
> +	test_commit -C GS-bin-txt --append --printf A data.bin "a\na\0a\n" &&

The --append on the first one does not make any difference ;-)

> +	test_commit -C GS-bin-txt --append --printf B data.bin "a\na\0a\n" &&
> +	test_commit -C GS-bin-txt C data.bin "" &&

The original removes the file at the end, while this merely
truncates it.

> +	git -C GS-bin-txt log >full-log
>  '


* Re: [PATCH v2 08/22] pickaxe tests: test for -G, -S and --find-object incompatibility
  2021-02-16 11:57 ` [PATCH v2 08/22] pickaxe tests: test for -G, -S and --find-object incompatibility Ævar Arnfjörð Bjarmason
@ 2021-03-30 23:32   ` Junio C Hamano
  0 siblings, 0 replies; 99+ messages in thread
From: Junio C Hamano @ 2021-03-30 23:32 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Schindelin, Carlo Marcelo Arenas Belón

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Add a test for the options sanity check added in 5e505257f2 (diff:
> properly error out when combining multiple pickaxe options,
> 2018-01-04).
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  t/t4209-log-pickaxe.sh | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/t/t4209-log-pickaxe.sh b/t/t4209-log-pickaxe.sh
> index c6b4751d5b6..5ad4fad964c 100755
> --- a/t/t4209-log-pickaxe.sh
> +++ b/t/t4209-log-pickaxe.sh
> @@ -55,6 +55,17 @@ test_expect_success setup '
>  	git rev-parse --verify HEAD >expect_second
>  '
>  
> +test_expect_success 'usage' '
> +	test_expect_code 128 git log -Gregex -Sstring 2>err &&
> +	test_i18ngrep "mutually exclusive" err &&
> +
> +	test_expect_code 128 git log -Gregex --find-object=HEAD 2>err &&
> +	test_i18ngrep "mutually exclusive" err &&
> +
> +	test_expect_code 128 git log -Gstring --find-object=HEAD 2>err &&
> +	test_i18ngrep "mutually exclusive" err

Didn't you mean -Sstring for the third one?  Otherwise we'd be
testing -G/--find-object combination twice.

> +'
> +
>  test_log	expect_initial	--grep initial
>  test_log	expect_nomatch	--grep InItial
>  test_log_icase	expect_initial	--grep InItial


* Re: [PATCH v2 09/22] pickaxe: die when -G and --pickaxe-regex are combined
  2021-02-16 11:57 ` [PATCH v2 09/22] pickaxe: die when -G and --pickaxe-regex are combined Ævar Arnfjörð Bjarmason
@ 2021-03-30 23:36   ` Junio C Hamano
  0 siblings, 0 replies; 99+ messages in thread
From: Junio C Hamano @ 2021-03-30 23:36 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Schindelin, Carlo Marcelo Arenas Belón

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> When the -G and --pickaxe-regex options are combined we simply ignore
> the --pickaxe-regex option. Let's die instead as suggested by our
> documentation, since -G is always a regex.
>
> When --pickaxe-regex was added in d01d8c6782 (Support for pickaxe
> matching regular expressions, 2006-03-29) only the -S option
> existed. Then when -G was added in f506b8e8b5 (git log/diff: add
> -G<regexp> that greps in the patch text, 2010-08-23) neither the
> documentation for --pickaxe-regex was updater accordingly, nor was

s/updater/updated/;

> something like this assertion added.
>
> Since 5bc3f0b567 (diffcore-pickaxe doc: document -S and -G properly,
> 2013-05-31) we've claimed that --pickaxe-regex should only be used
> with -S, but have silently toileted combining it with -G, let's die

toilet?  tolerate?

> instead.

Hmph.  I've always hated that -G can only take a regexp and users
cannot ask for a fixed-string match.  It may not be a bad idea to
keep this as-is, as that would leave the door open for those who are
motivated enough to later introduce --no-pickaxe-regex, so that
-G<string> would naturally work in combination with it.



* Re: [PATCH v2 14/22] pickaxe: refactor function selection in diffcore-pickaxe()
  2021-02-16 11:57 ` [PATCH v2 14/22] pickaxe: refactor function selection in diffcore-pickaxe() Ævar Arnfjörð Bjarmason
@ 2021-03-30 23:45   ` Junio C Hamano
  0 siblings, 0 replies; 99+ messages in thread
From: Junio C Hamano @ 2021-03-30 23:45 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Schindelin, Carlo Marcelo Arenas Belón

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> +	pickaxe_fn fn;
>  
>  	if (opts & (DIFF_PICKAXE_REGEX | DIFF_PICKAXE_KIND_G)) {
>  		int cflags = REG_EXTENDED | REG_NEWLINE;
> @@ -235,6 +236,14 @@ void diffcore_pickaxe(struct diff_options *o)
>  			cflags |= REG_ICASE;
>  		regcomp_or_die(&regex, needle, cflags);
>  		regexp = &regex;
> +
> +		/* diff.c errors on -G and --pickaxe-regex for us */

I had to read this twice; I am guessing that the comment wants to
say that this if/else if/else cascade is correct because KIND_G and
PICKAXE_REGEX are mutually incompatible (ensured in diff.c).  And I
think that is true (but as I said, I am not sure if we want to cast
in stone that kind-g and regex are mutually exclusive---rather, I'd
want to see them eventually orthogonal).

> +		if (opts & DIFF_PICKAXE_KIND_G)
> +			fn = diff_grep;
> +		else if (opts & DIFF_PICKAXE_REGEX)
> +			fn = has_changes;
> +		else
> +			BUG("unreachable");
>  	} else if (opts & DIFF_PICKAXE_KIND_S) {
>  		if (o->pickaxe_opts & DIFF_PICKAXE_IGNORE_CASE &&
>  		    has_non_ascii(needle)) {
> @@ -251,10 +260,14 @@ void diffcore_pickaxe(struct diff_options *o)
>  			kwsincr(kws, needle, strlen(needle));
>  			kwsprep(kws);
>  		}
> +		fn = has_changes;
> +	} else if (opts & DIFF_PICKAXE_KIND_OBJFIND) {
> +		fn = NULL;

This is the most valuable line in this patch ;-)  It makes tons of sense.

> +	} else {
> +		BUG("unknown pickaxe_opts flag");
>  	}
>  
> -	pickaxe(&diff_queued_diff, o, regexp, kws,
> -		(opts & DIFF_PICKAXE_KIND_G) ? diff_grep : has_changes);
> +	pickaxe(&diff_queued_diff, o, regexp, kws, fn);
>  
>  	if (regexp)
>  		regfree(regexp);


* Re: [PATCH v2 15/22] pickaxe: assert that we must have a needle under -G or -S
  2021-02-16 11:57 ` [PATCH v2 15/22] pickaxe: assert that we must have a needle under -G or -S Ævar Arnfjörð Bjarmason
@ 2021-03-30 23:50   ` Junio C Hamano
  0 siblings, 0 replies; 99+ messages in thread
From: Junio C Hamano @ 2021-03-30 23:50 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Schindelin, Carlo Marcelo Arenas Belón

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
> index cff46f9f8f7..dd1b5c72332 100644
> --- a/diffcore-pickaxe.c
> +++ b/diffcore-pickaxe.c
> @@ -132,9 +132,6 @@ static int pickaxe_match(struct diff_filepair *p, struct diff_options *o,
>  			 oidset_contains(o->objfind, &p->two->oid));
>  	}
>  
> -	if (!o->pickaxe[0])
> -		return 0;

So -S"" could pass o->pickaxe a non-NULL pointer, but the string
pointed by that pointer could be 0-length.  And that is not what we
want to see happen.

>  	if (o->flags.allow_textconv) {
>  		textconv_one = get_textconv(o->repo, p->one);
>  		textconv_two = get_textconv(o->repo, p->two);
> @@ -230,6 +227,8 @@ void diffcore_pickaxe(struct diff_options *o)
>  	kwset_t kws = NULL;
>  	pickaxe_fn fn;
>  
> +	if (opts & ~DIFF_PICKAXE_KIND_OBJFIND && !needle)
> +		BUG("should have needle under -G or -S");

Here, needle was picked up at the beginning of this function like
so:

        void diffcore_pickaxe(struct diff_options *o)
        {
                const char *needle = o->pickaxe;

The two checks seem to be protecting from different things.
Shouldn't the new BUG() condition be more like

	if ((opts & ~DIFF_PICKAXE_KIND_OBJFIND) && (!needle || !*needle))

instead?
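A minimal sketch of the combined condition suggested above, showing that it rejects both a NULL needle (no argument at all) and an empty one (e.g. -S""). The KIND_* flag values and the missing_needle() helper are made up for illustration; they are not git's DIFF_PICKAXE_* definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values, for illustration only. */
enum {
	KIND_S = 1 << 0,
	KIND_G = 1 << 1,
	KIND_OBJFIND = 1 << 2,
};

/*
 * Return 1 if the options need a needle (anything other than
 * --find-object) but the needle is NULL or the empty string.
 */
static int missing_needle(unsigned opts, const char *needle)
{
	return (opts & ~KIND_OBJFIND) && (!needle || !*needle);
}
```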

>  	if (opts & (DIFF_PICKAXE_REGEX | DIFF_PICKAXE_KIND_G)) {
>  		int cflags = REG_EXTENDED | REG_NEWLINE;
>  		if (o->pickaxe_opts & DIFF_PICKAXE_IGNORE_CASE)
> diff --git a/t/t4209-log-pickaxe.sh b/t/t4209-log-pickaxe.sh
> index bcaca7e882c..4b65b89e7a5 100755
> --- a/t/t4209-log-pickaxe.sh
> +++ b/t/t4209-log-pickaxe.sh
> @@ -56,6 +56,12 @@ test_expect_success setup '
>  '
>  
>  test_expect_success 'usage' '
> +	test_expect_code 129 git log -S 2>err &&
> +	test_i18ngrep "switch.*requires a value" err &&
> +
> +	test_expect_code 129 git log -G 2>err &&
> +	test_i18ngrep "switch.*requires a value" err &&
> +
>  	test_expect_code 128 git log -Gregex -Sstring 2>err &&
>  	test_i18ngrep "mutually exclusive" err &&


* Re: [PATCH v2 16/22] pickaxe -S: support content with NULs under --pickaxe-regex
  2021-02-16 11:57 ` [PATCH v2 16/22] pickaxe -S: support content with NULs under --pickaxe-regex Ævar Arnfjörð Bjarmason
@ 2021-03-30 23:54   ` Junio C Hamano
  0 siblings, 0 replies; 99+ messages in thread
From: Junio C Hamano @ 2021-03-30 23:54 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Schindelin, Carlo Marcelo Arenas Belón

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Fix a bug in the matching routine powering -S<rx> --pickaxe-regex so
> that we won't abort early on content that has NULs in it.
>
> We've had a hard requirement on REG_STARTEND since 2f8952250a8 (regex:
> add regexec_buf() that can work on a non NUL-terminated string,
> 2016-09-21), but this sanity check dates back to d01d8c67828 (Support
> for pickaxe matching regular expressions, 2006-03-29).
>
> It wasn't needed anymore, and as the now-passing test shows, it was
> actively getting in our way.
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  diffcore-pickaxe.c     | 4 ++--
>  t/t4209-log-pickaxe.sh | 8 ++++++++
>  2 files changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
> index dd1b5c72332..0bf50a2f595 100644
> --- a/diffcore-pickaxe.c
> +++ b/diffcore-pickaxe.c
> @@ -78,12 +78,12 @@ static unsigned int contains(mmfile_t *mf, regex_t *regexp, kwset_t kws)
>  		regmatch_t regmatch;
>  		int flags = 0;
>  
> -		while (sz && *data &&
> +		while (sz &&
>  		       !regexec_buf(regexp, data, sz, 1, &regmatch, flags)) {
>  			flags |= REG_NOTBOL;
>  			data += regmatch.rm_eo;
>  			sz -= regmatch.rm_eo;
> -			if (sz && *data && regmatch.rm_so == regmatch.rm_eo) {
> +			if (sz && regmatch.rm_so == regmatch.rm_eo) {

OK.  As we always require start-end support, we do not need to stop
at NULs, and we shouldn't if we are dealing with a haystack with NUL
in it.  The needle may be behind that NUL.

Makes sense.
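A small model of why the "*data" check had to go, using the same "a\na\0a\n" sample as the tests in this series. count_matches() is a hypothetical helper, and a byte-wise literal search stands in for regexec_buf(), so the empty-match (rm_so == rm_eo) case does not arise here. Passing stop_at_nul=1 mimics the old loop condition, which re-checked *data after every match.

```c
#include <assert.h>
#include <string.h>

/*
 * Count non-overlapping occurrences of needle in the first sz bytes
 * of data.  The search itself is bounded by length, like regexec_buf()
 * with REG_STARTEND; stop_at_nul additionally gives up as soon as the
 * position after a match is a NUL byte, as the old "*data" check did.
 */
static unsigned count_matches(const char *data, size_t sz,
			      const char *needle, int stop_at_nul)
{
	size_t nlen;
	unsigned cnt = 0;

	assert(needle);
	nlen = strlen(needle);
	if (!nlen)
		return 0;
	while (sz && (!stop_at_nul || *data)) {
		const char *hit = NULL;

		/* find the next match within [data, data + sz) */
		for (size_t i = 0; i + nlen <= sz; i++) {
			if (!memcmp(data + i, needle, nlen)) {
				hit = data + i;
				break;
			}
		}
		if (!hit)
			break;
		sz -= (size_t)(hit - data) + nlen;
		data = hit + nlen;
		cnt++;
	}
	return cnt;
}
```

On the 6-byte buffer "a\na\0a\n" the old behaviour counts only 2 occurrences of "a", because it stops at the NUL; the length-bounded loop finds all 3.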

>  				data++;
>  				sz--;
>  			}
> diff --git a/t/t4209-log-pickaxe.sh b/t/t4209-log-pickaxe.sh
> index 4b65b89e7a5..6ea1f02d142 100755
> --- a/t/t4209-log-pickaxe.sh
> +++ b/t/t4209-log-pickaxe.sh
> @@ -201,4 +201,12 @@ test_expect_success 'log -S looks into binary files' '
>  	test_cmp log full-log
>  '
>  
> +test_expect_success 'log -S --pickaxe-regex looks into binary files' '
> +	git -C GS-bin-txt log --pickaxe-regex -Sa >log &&
> +	test_cmp log full-log &&
> +
> +	git -C GS-bin-txt log --pickaxe-regex -S[a] >log &&

Please quote this so that readers do not have to look around to see
if there _could_ have been a file with such a name that matches the
[glob], causing the needle not to be passed literally to us.

> +	test_cmp log full-log
> +'
> +
>  test_done


* Re: [PATCH v2 18/22] pickaxe -S: slightly optimize contains()
  2021-02-16 11:57 ` [PATCH v2 18/22] pickaxe -S: slightly optimize contains() Ævar Arnfjörð Bjarmason
@ 2021-03-30 23:58   ` Junio C Hamano
  0 siblings, 0 replies; 99+ messages in thread
From: Junio C Hamano @ 2021-03-30 23:58 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Schindelin, Carlo Marcelo Arenas Belón

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> The "log -S<pat>" switch counts occurrences of <pat> on the
> pre-image and post-image of a change. As soon as we know we had e.g. 1
> before and 2 now we can stop; we don't need to keep counting past 2.

Logical.

I do not know how much difference this would make in practice,
though.  With this optimization, "diff A B" and "diff B A" with the
same pickaxe -Sneedle would have asymmetric performance
characteristics, which is a bit curious (but not incorrect).

I wonder if there is a good heuristic to decide which of the two
blobs, one or two, to start counting.  Obviously, we want to scan
first the one that is likely to contain fewer occurrences of the
needle, so that we can then employ this optimization on the other
side.

>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  diffcore-pickaxe.c | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
> index 66e34d254f1..76c178bae2b 100644
> --- a/diffcore-pickaxe.c
> +++ b/diffcore-pickaxe.c
> @@ -68,7 +68,8 @@ static int diff_grep(mmfile_t *one, mmfile_t *two,
>  	return ecbdata.hit;
>  }
>  
> -static unsigned int contains(mmfile_t *mf, regex_t *regexp, kwset_t kws)
> +static unsigned int contains(mmfile_t *mf, regex_t *regexp, kwset_t kws,
> +			     unsigned int limit)
>  {
>  	unsigned int cnt = 0;
>  	unsigned long sz = mf->size;
> @@ -88,6 +89,9 @@ static unsigned int contains(mmfile_t *mf, regex_t *regexp, kwset_t kws)
>  				sz--;
>  			}
>  			cnt++;
> +
> +			if (limit && cnt == limit)
> +				return cnt;
>  		}
>  
>  	} else { /* Classic exact string match */
> @@ -99,6 +103,9 @@ static unsigned int contains(mmfile_t *mf, regex_t *regexp, kwset_t kws)
>  			sz -= offset + kwsm.size[0];
>  			data += offset + kwsm.size[0];
>  			cnt++;
> +
> +			if (limit && cnt == limit)
> +				return cnt;
>  		}
>  	}
>  	return cnt;
> @@ -108,8 +115,8 @@ static int has_changes(mmfile_t *one, mmfile_t *two,
>  		       struct diff_options *o,
>  		       regex_t *regexp, kwset_t kws)
>  {
> -	unsigned int c1 = one ? contains(one, regexp, kws) : 0;
> -	unsigned int c2 = two ? contains(two, regexp, kws) : 0;
> +	unsigned int c1 = one ? contains(one, regexp, kws, 0) : 0;
> +	unsigned int c2 = two ? contains(two, regexp, kws, c1 + 1) : 0;
>  	return c1 != c2;
>  }
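A minimal sketch of the early exit in the patch above and of the asymmetry it introduces. It assumes NUL-terminated strings and a naive memcmp() scan in place of kwsexec()/regexec_buf(); contains_sketch(), has_changes_sketch(), has_changes_shorter_first() and the positions_tried counter are hypothetical, demo-only names. The last function tries one possible answer to the heuristic question above by using length as a cheap proxy for "likely to contain fewer occurrences"; that proxy is an assumption, not something the patch does.

```c
#include <stddef.h>
#include <string.h>

/* Demo-only instrumentation: number of candidate positions tried. */
static size_t positions_tried;

/*
 * Count non-overlapping occurrences of needle in data[0..sz), returning
 * early once "limit" matches have been seen (limit == 0: count them all).
 */
static unsigned int contains_sketch(const char *data, size_t sz,
				    const char *needle, size_t nlen,
				    unsigned int limit)
{
	unsigned int cnt = 0;

	if (!nlen)
		return 0;
	while (sz >= nlen) {
		positions_tried++;
		if (!memcmp(data, needle, nlen)) {
			cnt++;
			if (limit && cnt == limit)
				return cnt; /* caller only needs to know we hit limit */
			data += nlen;
			sz -= nlen;
		} else {
			data++;
			sz--;
		}
	}
	return cnt;
}

/* Same order as the patch: count "one" fully, "two" only up to c1 + 1. */
static int has_changes_sketch(const char *one, const char *two,
			      const char *needle)
{
	size_t nlen = strlen(needle);
	unsigned int c1 = one ?
		contains_sketch(one, strlen(one), needle, nlen, 0) : 0;
	unsigned int c2 = two ?
		contains_sketch(two, strlen(two), needle, nlen, c1 + 1) : 0;

	return c1 != c2;
}

/*
 * Hypothetical heuristic: count the shorter side first. The answer is
 * unchanged because only whether the two counts differ matters.
 */
static int has_changes_shorter_first(const char *one, const char *two,
				     const char *needle)
{
	size_t len1 = one ? strlen(one) : 0;
	size_t len2 = two ? strlen(two) : 0;

	if (len1 > len2)
		return has_changes_sketch(two, one, needle);
	return has_changes_sketch(one, two, needle);
}
```

For example, has_changes_sketch("x", "xxxxxxxx", "x") tries 3 positions, while swapping the two arguments tries 9; has_changes_shorter_first() tries 3 in both orders.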


* Re: [PATCH v2 19/22] xdiff-interface: allow early return from xdiff_emit_{line,hunk}_fn
  2021-02-16 11:57 ` [PATCH v2 19/22] xdiff-interface: allow early return from xdiff_emit_{line,hunk}_fn Ævar Arnfjörð Bjarmason
@ 2021-03-31  0:04   ` Junio C Hamano
  0 siblings, 0 replies; 99+ messages in thread
From: Junio C Hamano @ 2021-03-31  0:04 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Schindelin, Carlo Marcelo Arenas Belón

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Change the function prototype of xdiff_emit_{line,hunk}_fn to return
> an int instead of void. This will allow for returning early from hunk
> & diff consumers that want some of the data, but not all of it.
>
> No behavior is being changed here, just replacing the equivalent of
> "return" with "return 0", nothing acts on the changed return values
> yet.

So, is this "allowing" early return yet?  I am looking at the patch
title and then reading the above paragraph and wondering.

Or is this step "preparation for allowing" and the real fun is left
for later steps?
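To illustrate the distinction: changing the return type only "allows" early return once some caller acts on the value. Below is a minimal sketch of such a caller, modeled loosely on the consume_one() loop visible in the range-diff below. feed_lines(), count_lines() and first_line_only() are hypothetical names; emit_line_fn only mirrors the shape of the new xdiff_emit_line_fn typedef.

```c
#include <stddef.h>
#include <string.h>

/* Same shape as the int-returning xdiff_emit_line_fn. */
typedef int (*emit_line_fn)(void *priv, char *line, unsigned long len);

/*
 * Split buf into '\n'-terminated lines and hand each one to fn. A
 * non-zero return from fn stops the walk and is passed back up.
 */
static int feed_lines(char *buf, unsigned long size,
		      emit_line_fn fn, void *priv)
{
	while (size) {
		char *ep = memchr(buf, '\n', size);
		unsigned long this_size =
			ep ? (unsigned long)(ep - buf + 1) : size;
		int ret = fn(priv, buf, this_size);

		if (ret)
			return ret;
		size -= this_size;
		buf += this_size;
	}
	return 0;
}

/* Like every callback in this step: always "return 0", i.e. continue. */
static int count_lines(void *priv, char *line, unsigned long len)
{
	(void)line;
	(void)len;
	(*(int *)priv)++;
	return 0;
}

/* A consumer that asks to stop after the first line it sees. */
static int first_line_only(void *priv, char *line, unsigned long len)
{
	(void)line;
	(void)len;
	(*(int *)priv)++;
	return 1;
}
```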


* Re: [PATCH v2 21/22] pickaxe -G: terminate early on matching lines
  2021-02-16 11:58 ` [PATCH v2 21/22] pickaxe -G: terminate early on matching lines Ævar Arnfjörð Bjarmason
@ 2021-03-31  0:11   ` Junio C Hamano
  0 siblings, 0 replies; 99+ messages in thread
From: Junio C Hamano @ 2021-03-31  0:11 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Schindelin, Carlo Marcelo Arenas Belón

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Solve a long-standing issue with "git log -Grx": we'd e.g. find "+
> str" in the diff context and note that we had a "hit", but xdiff
> would diligently continue to generate and spew the rest of the diff at us.
>

Nice.  "git log -Gpattern" without "-p" has no reason to finish
computing the diff once it has found that the pattern would have
appeared in the output had "-p" been given.

It actually is a bit of a shame that "git log -Gpattern -p" still
needs to run two diffs, instead of taking advantage of the diff that
it needs to generate anyway (to show to the user of "log") and
pattern matching in it.

>  	if (data->hit)
> +		BUG("Already matched in diffgrep_consume! Broken xdiff_emit_line_fn?");

Hmph, an obvious alternative would be to silently return -1 here,
which probably would not hurt, either.  I do not mind the check
being stricter, though.

> +	if (!regexec_buf(data->regexp, line + 1, len - 1, 1,
> +			 &regmatch, 0)) {
> +		data->hit = 1;
> +		return -1;
> +	}
>  	return 0;
>  }

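A minimal sketch of the early-terminating -G consumer pattern discussed above. It assumes POSIX regcomp()/regexec() on NUL-terminated lines instead of git's regexec_buf(); grep_consume_sketch(), walk_diff() and struct grep_state are hypothetical names. Only "+" and "-" lines are searched (an assumed filter), the prefix character is skipped like the line + 1 in the patch, and the first hit returns -1 so no further lines are examined.

```c
#include <regex.h>
#include <stddef.h>

struct grep_state {
	regex_t *regexp;
	int hit;
	int lines_seen; /* demo-only: how many lines the callback received */
};

/* Search one diff line; return -1 on the first hit to stop the walk. */
static int grep_consume_sketch(struct grep_state *st, const char *line)
{
	st->lines_seen++;
	if (line[0] != '+' && line[0] != '-')
		return 0; /* context lines and hunk headers are ignored */
	if (!regexec(st->regexp, line + 1, 0, NULL, 0)) {
		st->hit = 1;
		return -1; /* the answer can no longer change */
	}
	return 0;
}

/* Feed diff lines to the consumer until it asks to stop. */
static int walk_diff(const char **lines, size_t nr, struct grep_state *st)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (grep_consume_sketch(st, lines[i]))
			return -1;
	return 0;
}
```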

* Re: [PATCH v2 22/22] pickaxe -G: don't special-case create/delete
  2021-02-16 11:58 ` [PATCH v2 22/22] pickaxe -G: don't special-case create/delete Ævar Arnfjörð Bjarmason
@ 2021-03-31  0:14   ` Junio C Hamano
  0 siblings, 0 replies; 99+ messages in thread
From: Junio C Hamano @ 2021-03-31  0:14 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Schindelin, Carlo Marcelo Arenas Belón

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Instead of special-casing creations and deletions let's just generate
> a diff for them.
>
> This logic of not running a diff under -G if we don't have both sides
> dates back to the original implementation of -S in
> 52e9578985f ([PATCH] Introducing software archaeologist's tool
> "pickaxe"., 2005-05-21).
>
> In the case of -S we were not working with the xdiff interface and
> needed to do this, but when -G was implemented in f506b8e8b5f (git
> log/diff: add -G<regexp> that greps in the patch text, 2010-08-23)
> this logic was diligently copied over.

Nicely analyzed.  Yes, I agree that -G special-casing deletion and
creation is just being silly, mimicking what -S did without
thinking.  I can imagine that running grep over diff output in the
normal case, and having to run grep over a single side in the edge
cases, would require unnecessary code duplication.



* Re: [PATCH v2 04/22] test-lib functions: add --printf option to test_commit
  2021-03-30 23:11   ` Junio C Hamano
@ 2021-04-12 13:19     ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-12 13:19 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Jeff King, Johannes Schindelin, Carlo Marcelo Arenas Belón


On Wed, Mar 31 2021, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>>  test_expect_success 'can parse blob ending with CR' '
>> -	printf "[some]key = value\\r" >config &&
>> -	git add config &&
>> -	git commit -m CR &&
>> +	test_commit --printf CR config "[some]key = value\\r" &&
>
> OK, the first arg becomes the commit title, and the second one is
> used for the filename, and the next arg is the string given to
> printf, I guess.
>
>>  test_expect_success 'rerere forget (binary)' '
>>  	git checkout -f side &&
>> -	printf "a\0c" >binary &&
>> -	git commit -a -m binary &&
>> +	test_commit binary binary "a\0c" &&
>
> This lacks --printf.  Are we breaking the test, with "test_must_fail"
> hiding the breakage here?

Yes, well spotted. FWIW in splitting this out into another series I
fixed this bug in the re-roll:
https://lore.kernel.org/git/cover-00.16-00000000000-20210412T110456Z-avarab@gmail.com/T/#ma9ef67d8198c203adc05aab44f87aa753a3df993


* [PATCH v3 00/22] pickaxe: test and refactoring for future PCRE backend
  2021-02-16 11:57 ` [PATCH v2 00/22] pickaxe: test and refactoring for follow-up changes Ævar Arnfjörð Bjarmason
  2021-02-16 22:23   ` Junio C Hamano
@ 2021-04-12 17:15   ` Ævar Arnfjörð Bjarmason
  2021-04-12 17:15     ` [PATCH v3 01/22] grep/pcre2 tests: reword comments referring to kwset Ævar Arnfjörð Bjarmason
                       ` (21 more replies)
  1 sibling, 22 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-12 17:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

This much-delayed re-roll of v2[1] is a test and refactoring change to
diffcore-pickaxe.c to allow for the eventual removal of the kwset.[ch]
files and the addition of a PCRE backend.

This series is now based on my "test-lib.sh: new test_commit args,
simplification & fixes" series[2]. The trivial dependency between the
two is the use of the new test_commit --printf option.

I won't summarize the range-diff in detail, but since v2 I've
addressed all outstanding feedback from Junio. There are two new
patches at the end of the series that change the existing early-return
feature, which currently uses "discard_hunk_line", to use a flag
instead.

In v2 I had what I realized was a WIP migration of that to using
return values in the callback instead, but unlike for
xdiff_emit_line_fn, I don't think that makes sense for
xdiff_emit_hunk_fn.

1. https://lore.kernel.org/git/20210216115801.4773-1-avarab@gmail.com/
2. https://lore.kernel.org/git/cover-00.16-00000000000-20210412T110456Z-avarab@gmail.com/

Ævar Arnfjörð Bjarmason (22):
  grep/pcre2 tests: reword comments referring to kwset
  pickaxe tests: refactor to use test_commit --append --printf
  pickaxe tests: add test for diffgrep_consume() internals
  pickaxe tests: add test for "log -S" not being a regex
  pickaxe tests: test for -G, -S and --find-object incompatibility
  pickaxe tests: add missing test for --no-pickaxe-regex being an error
  pickaxe: die when -G and --pickaxe-regex are combined
  pickaxe: die when --find-object and --pickaxe-all are combined
  diff.h: move pickaxe fields together again
  pickaxe/style: consolidate declarations and assignments
  perf: add performance test for pickaxe
  pickaxe: refactor function selection in diffcore-pickaxe()
  pickaxe: assert that we must have a needle under -G or -S
  pickaxe -S: support content with NULs under --pickaxe-regex
  pickaxe: rename variables in has_changes() for brevity
  pickaxe -S: slightly optimize contains()
  xdiff-interface: prepare for allowing early return
  xdiff-interface: allow early return from xdiff_emit_line_fn
  pickaxe -G: terminate early on matching lines
  pickaxe -G: don't special-case create/delete
  xdiff users: use designated initializers for out_line
  xdiff-interface: replace discard_hunk_line() with a flag

 builtin/merge-tree.c           |   5 +-
 builtin/rerere.c               |   4 +-
 combine-diff.c                 |   5 +-
 diff.c                         |  39 +++++++----
 diff.h                         |   7 +-
 diffcore-pickaxe.c             | 106 ++++++++++++++++++------------
 range-diff.c                   |   3 +-
 t/perf/p4209-pickaxe.sh        |  70 ++++++++++++++++++++
 t/t4209-log-pickaxe.sh         | 114 ++++++++++++++++++++++++++++-----
 t/t7816-grep-binary-pattern.sh |   4 +-
 xdiff-interface.c              |  27 ++++----
 xdiff-interface.h              |  31 ++++++---
 xdiff/xdiff.h                  |   1 +
 xdiff/xemit.c                  |   3 +-
 14 files changed, 312 insertions(+), 107 deletions(-)
 create mode 100755 t/perf/p4209-pickaxe.sh

Range-diff against v2:
 1:  75bfc8eba13 =  1:  cfe934d6081 grep/pcre2 tests: reword comments referring to kwset
 2:  cddb1f75f6c <  -:  ----------- test-lib-functions: document and test test_commit --no-tag
 3:  44540f6039e <  -:  ----------- test-lib-functions: reword "test_commit --append" docs
 4:  0e1f133476f <  -:  ----------- test-lib functions: add --printf option to test_commit
 5:  2a814e8d53a !  2:  413a330d3d6 pickaxe tests: refactor to use test_commit --append --printf
    @@ Metadata
      ## Commit message ##
         pickaxe tests: refactor to use test_commit --append --printf
     
    -    Refactor existing tests added in e0e7cb8080c (log -G: ignore binary
    -    files, 2018-12-14) to use the --append option I added in
    +    Refactor the existing tests added in e0e7cb8080c (log -G: ignore
    +    binary files, 2018-12-14) to use the --append option I added in
         3373518cc8b (test-lib functions: add an --append option to
    -    test_commit, 2021-01-12) and the --printf option added in a preceding
    -    commit.
    +    test_commit, 2021-01-12) and the --printf option added as part of an
    +    in-flight topic of mine this commit depends on.
    +
    +    While I'm at it change some of the setup of the test to use a more
    +    sensible pattern, e.g. setting up a temporary repo instead of creating
    +    an orphan branch.
    +
    +    Since the -G and -S options will behave the same way with truncated
    +    and removed content also change the "git rm" to emptying data.bin,
    +    that's just catering to how test_commit works. The resulting test is
    +    shorter.
     
         See also f5d79bf7dd6 (tests: refactor a few tests to use "test_commit
         --append", 2021-01-12) for prior similar refactoring.
    @@ t/t4209-log-pickaxe.sh: test_expect_success 'log -S --no-textconv (missing textc
     -	git commit -m "delete binary file" data.bin &&
     -	git log >full-log
     +	test_create_repo GS-bin-txt &&
    -+	test_commit -C GS-bin-txt --append --printf A data.bin "a\na\0a\n" &&
    ++	test_commit -C GS-bin-txt --printf A data.bin "a\na\0a\n" &&
     +	test_commit -C GS-bin-txt --append --printf B data.bin "a\na\0a\n" &&
     +	test_commit -C GS-bin-txt C data.bin "" &&
     +	git -C GS-bin-txt log >full-log
 6:  f49eb6c95e5 !  3:  ddd2224836b pickaxe tests: add test for diffgrep_consume() internals
    @@ t/t4209-log-pickaxe.sh: test_expect_success 'log -S --no-textconv (missing textc
     +
      test_expect_success 'setup log -[GS] binary & --text' '
      	test_create_repo GS-bin-txt &&
    - 	test_commit -C GS-bin-txt --append --printf A data.bin "a\na\0a\n" &&
    + 	test_commit -C GS-bin-txt --printf A data.bin "a\na\0a\n" &&
 7:  80c62fb0448 =  4:  ca6340c1fa7 pickaxe tests: add test for "log -S" not being a regex
 8:  6d329e0c3b1 !  5:  0c4657189a8 pickaxe tests: test for -G, -S and --find-object incompatibility
    @@ t/t4209-log-pickaxe.sh: test_expect_success setup '
      
     +test_expect_success 'usage' '
     +	test_expect_code 128 git log -Gregex -Sstring 2>err &&
    -+	test_i18ngrep "mutually exclusive" err &&
    ++	grep "mutually exclusive" err &&
     +
     +	test_expect_code 128 git log -Gregex --find-object=HEAD 2>err &&
    -+	test_i18ngrep "mutually exclusive" err &&
    ++	grep "mutually exclusive" err &&
     +
    -+	test_expect_code 128 git log -Gstring --find-object=HEAD 2>err &&
    -+	test_i18ngrep "mutually exclusive" err
    ++	test_expect_code 128 git log -Sstring --find-object=HEAD 2>err &&
    ++	grep "mutually exclusive" err
     +'
     +
      test_log	expect_initial	--grep initial
 -:  ----------- >  6:  1696076bb09 pickaxe tests: add missing test for --no-pickaxe-regex being an error
 9:  bd0c3b7e3b0 !  7:  83e7b4793b6 pickaxe: die when -G and --pickaxe-regex are combined
    @@ Commit message
         matching regular expressions, 2006-03-29) only the -S option
         existed. Then when -G was added in f506b8e8b5 (git log/diff: add
         -G<regexp> that greps in the patch text, 2010-08-23) neither the
    -    documentation for --pickaxe-regex was updater accordingly, nor was
    +    documentation for --pickaxe-regex was updated accordingly, nor was
         something like this assertion added.
     
         Since 5bc3f0b567 (diffcore-pickaxe doc: document -S and -G properly,
         2013-05-31) we've claimed that --pickaxe-regex should only be used
    -    with -S, but have silently toileted combining it with -G, let's die
    +    with -S, but have silently tolerated combining it with -G, let's die
         instead.
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
    @@ diff.h: int git_config_rename(const char *var, const char *value);
     
      ## t/t4209-log-pickaxe.sh ##
     @@ t/t4209-log-pickaxe.sh: test_expect_success 'usage' '
    - 	test_expect_code 128 git log -Gregex -Sstring 2>err &&
    - 	test_i18ngrep "mutually exclusive" err &&
    + 	grep "mutually exclusive" err
    + '
      
    ++test_expect_success 'usage: --pickaxe-regex' '
     +	test_expect_code 128 git log -Gregex --pickaxe-regex 2>err &&
    -+	test_i18ngrep "mutually exclusive" err &&
    ++	grep "mutually exclusive" err
    ++'
     +
    - 	test_expect_code 128 git log -Gregex --find-object=HEAD 2>err &&
    - 	test_i18ngrep "mutually exclusive" err &&
    - 
    + test_expect_success 'usage: --no-pickaxe-regex' '
    + 	cat >expect <<-\EOF &&
    + 	fatal: unrecognized argument: --no-pickaxe-regex
10:  72075874f5c !  8:  749c3ca3f98 pickaxe: die when --find-object and --pickaxe-all are combined
    @@ diff.h: int git_config_rename(const char *var, const char *value);
     
      ## t/t4209-log-pickaxe.sh ##
     @@ t/t4209-log-pickaxe.sh: test_expect_success 'usage' '
    - 	test_i18ngrep "mutually exclusive" err &&
    + 	grep "mutually exclusive" err &&
      
    - 	test_expect_code 128 git log -Gstring --find-object=HEAD 2>err &&
    -+	test_i18ngrep "mutually exclusive" err &&
    + 	test_expect_code 128 git log -Sstring --find-object=HEAD 2>err &&
    ++	grep "mutually exclusive" err &&
     +
     +	test_expect_code 128 git log --pickaxe-all --find-object=HEAD 2>err &&
    - 	test_i18ngrep "mutually exclusive" err
    + 	grep "mutually exclusive" err
      '
      
11:  f8116a2b814 =  9:  fe4e75c39d2 diff.h: move pickaxe fields together again
12:  4778357cfc7 = 10:  afe70b163a2 pickaxe/style: consolidate declarations and assignments
13:  7449a59b104 = 11:  97616d741c7 perf: add performance test for pickaxe
14:  160f7d8b0f2 ! 12:  c29deb428b1 pickaxe: refactor function selection in diffcore-pickaxe()
    @@ diffcore-pickaxe.c: void diffcore_pickaxe(struct diff_options *o)
      		regcomp_or_die(&regex, needle, cflags);
      		regexp = &regex;
     +
    -+		/* diff.c errors on -G and --pickaxe-regex for us */
     +		if (opts & DIFF_PICKAXE_KIND_G)
     +			fn = diff_grep;
     +		else if (opts & DIFF_PICKAXE_REGEX)
     +			fn = has_changes;
     +		else
    ++			/*
    ++			 * We don't need to check the combination of
    ++			 * -G and --pickaxe-regex, by the time we get
    ++			 * here diff.c has already died if they're
    ++			 * combined. See the usage tests in
    ++			 * t4209-log-pickaxe.sh.
    ++			 */
     +			BUG("unreachable");
      	} else if (opts & DIFF_PICKAXE_KIND_S) {
      		if (o->pickaxe_opts & DIFF_PICKAXE_IGNORE_CASE &&
15:  05fc57e54e1 ! 13:  115a369d067 pickaxe: assert that we must have a needle under -G or -S
    @@ diffcore-pickaxe.c: void diffcore_pickaxe(struct diff_options *o)
      	kwset_t kws = NULL;
      	pickaxe_fn fn;
      
    -+	if (opts & ~DIFF_PICKAXE_KIND_OBJFIND && !needle)
    ++	if (opts & ~DIFF_PICKAXE_KIND_OBJFIND &&
    ++	    (!needle || !*needle))
     +		BUG("should have needle under -G or -S");
      	if (opts & (DIFF_PICKAXE_REGEX | DIFF_PICKAXE_KIND_G)) {
      		int cflags = REG_EXTENDED | REG_NEWLINE;
    @@ t/t4209-log-pickaxe.sh: test_expect_success setup '
     +	test_i18ngrep "switch.*requires a value" err &&
     +
      	test_expect_code 128 git log -Gregex -Sstring 2>err &&
    - 	test_i18ngrep "mutually exclusive" err &&
    + 	grep "mutually exclusive" err &&
      
16:  550620ec13b ! 14:  a86032792b6 pickaxe -S: support content with NULs under --pickaxe-regex
    @@ Commit message
         for pickaxe matching regular expressions, 2006-03-29).
     
         It wasn't needed anymore, and as the now-passing test shows, actively
    -    getting in our way.
    +    getting in our way. Since we always require REG_STARTEND support we do
    +    not need to stop at NULs. If we are dealing with a haystack with NUL
    +    in it. The needle may be behind that NUL.
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    @@ t/t4209-log-pickaxe.sh: test_expect_success 'log -S looks into binary files' '
     +	git -C GS-bin-txt log --pickaxe-regex -Sa >log &&
     +	test_cmp log full-log &&
     +
    -+	git -C GS-bin-txt log --pickaxe-regex -S[a] >log &&
    ++	git -C GS-bin-txt log --pickaxe-regex -S"[a]" >log &&
     +	test_cmp log full-log
     +'
     +
17:  985e077d561 = 15:  10f85edcff7 pickaxe: rename variables in has_changes() for brevity
18:  648e6e5f11b ! 16:  ed83c3add89 pickaxe -S: slightly optimize contains()
    @@ Commit message
         pre-image and post-image of a change. As soon as we know we had e.g. 1
         before and 2 now we can stop, we don't need to keep counting past 2.
     
    +    With this change a diff between A and B may have different performance
    +    characteristics than between B and A. That's OK in this case, since
    +    we'll emit the same output, and the effect is to make one of them
    +    better.
    +
    +    I'm picking a check of "one" first on the assumption that it's a more
    +    common case to have files grow over time than not.
    +
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## diffcore-pickaxe.c ##
19:  b991660fbf5 ! 17:  62c306275c7 xdiff-interface: allow early return from xdiff_emit_{line,hunk}_fn
    @@ Metadata
     Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## Commit message ##
    -    xdiff-interface: allow early return from xdiff_emit_{line,hunk}_fn
    +    xdiff-interface: prepare for allowing early return
     
    -    Change the function prototype of xdiff_emit_{line,hunk}_fn to return
    -    an int instead of void. This will allow for returning early from hunk
    -    & diff consumers that want some of the data, but not all of it.
    +    Change the function prototype of xdiff_emit_line_fn to return an "int"
    +    instead of "void". Change all of those functions to "return 0",
    +    nothing checks those return values yet, and no behavior is being
    +    changed.
     
    -    No behavior is being changed here, just replacing the equivalent of
    -    "return" with "return 0", nothing acts on the changed return values
    -    yet.
    -
    -    There was some work in this area of xdiff-interface.[ch] recently with
    -    3b40a090fd4 (diff: avoid generating unused hunk header lines,
    -    2018-11-02) and 7c61e25fbf1 (diff: use hunk callback for word-diff,
    -    2018-11-02).
    -
    -    In combination those two changes allow us to not do any work on the
    -    hunks and diff at all, but didn't change the status quo with regards
    -    to consumers that e.g. want the diff lines, but might want to abort
    -    early.
    -
    -    Whereas soon we can abort e.g. on the first "-line" of a 1000 line
    -    diff if that's all we needed.
    -
    -    This interface is rather scary as noted in the comment to
    -    xdiff-interface.h being added here, but it will be useful for
    -    diffcore-pickaxe.c in a subsequent commit. A future change could
    -    e.g. add more exit codes, and hack xdl_emit_diff() and friends to
    -    ignore or skip things more selectively as a result.
    +    In subsequent commits the interface will be changed to allow early
    +    return via this new return value.
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## combine-diff.c ##
    -@@ combine-diff.c: struct combine_diff_state {
    - 	struct sline *lost_bucket;
    - };
    - 
    --static void consume_hunk(void *state_,
    -+static int consume_hunk(void *state_,
    - 			 long ob, long on,
    - 			 long nb, long nn,
    - 			 const char *funcline, long funclen)
     @@ combine-diff.c: static void consume_hunk(void *state_,
    - 		state->sline[state->nb-1].p_lno =
    - 			xcalloc(state->num_parent, sizeof(unsigned long));
      	state->sline[state->nb-1].p_lno[state->n] = state->ob;
    -+
    -+	return 0;
      }
      
     -static void consume_line(void *state_, char *line, unsigned long len)
    @@ combine-diff.c: static void consume_line(void *state_, char *line, unsigned long
      static void combine_diff(struct repository *r,
     
      ## diff.c ##
    -@@ diff.c: static int color_words_output_graph_prefix(struct diff_words_data *diff_words)
    - 	}
    - }
    - 
    --static void fn_out_diff_words_aux(void *priv,
    --				  long minus_first, long minus_len,
    --				  long plus_first, long plus_len,
    --				  const char *func, long funclen)
    -+static int fn_out_diff_words_aux(void *priv,
    -+				 long minus_first, long minus_len,
    -+				 long plus_first, long plus_len,
    -+				 const char *func, long funclen)
    - {
    - 	struct diff_words_data *diff_words = priv;
    - 	struct diff_words_style *style = diff_words->style;
    -@@ diff.c: static void fn_out_diff_words_aux(void *priv,
    - 
    - 	diff_words->current_plus = plus_end;
    - 	diff_words->last_minus = minus_first;
    -+
    -+	return 0;
    - }
    - 
    - /* This function starts looking at *begin, and returns 0 iff a word was found. */
     @@ diff.c: static void find_lno(const char *line, struct emit_callback *ecbdata)
      	ecbdata->lno_in_postimage = strtol(p + 1, NULL, 10);
      }
    @@ diff.c: static void diffstat_consume(void *priv, char *line, unsigned long len)
      }
      
      const char mime_boundary_leader[] = "------------";
    -@@ diff.c: static int is_conflict_marker(const char *line, int marker_size, unsigned long l
    - 	return 1;
    - }
    - 
    --static void checkdiff_consume_hunk(void *priv,
    -+static int checkdiff_consume_hunk(void *priv,
    - 				   long ob, long on, long nb, long nn,
    - 				   const char *func, long funclen)
    - 
    - {
    - 	struct checkdiff_t *data = priv;
    +@@ diff.c: static void checkdiff_consume_hunk(void *priv,
      	data->lineno = nb - 1;
    -+	return 0;
      }
      
     -static void checkdiff_consume(void *priv, char *line, unsigned long len)
    @@ range-diff.c: static void find_exact_matches(struct string_list *a, struct strin
     +	return 0;
      }
      
    --static void diffsize_hunk(void *data, long ob, long on, long nb, long nn,
    --			  const char *funcline, long funclen)
    -+static int diffsize_hunk(void *data, long ob, long on, long nb, long nn,
    -+			 const char *funcline, long funclen)
    - {
    - 	diffsize_consume(data, NULL, 0);
    -+	return 0;
    - }
    - 
    - static int diffsize(const char *a, const char *b)
    + static void diffsize_hunk(void *data, long ob, long on, long nb, long nn,
     
      ## xdiff-interface.c ##
     @@ xdiff-interface.c: static int xdiff_out_hunk(void *priv_,
    @@ xdiff-interface.c: static void consume_one(void *priv_, char *s, unsigned long s
      }
      
      static int xdiff_outf(void *priv_, mmbuffer_t *mb, int nbuf)
    -@@ xdiff-interface.c: int xdi_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp, xdemitconf_t co
    - 	return xdl_diff(&a, &b, xpp, xecfg, xecb);
    - }
    - 
    --void discard_hunk_line(void *priv,
    --		       long ob, long on, long nb, long nn,
    --		       const char *func, long funclen)
    -+int discard_hunk_line(void *priv,
    -+		      long ob, long on, long nb, long nn,
    -+		      const char *func, long funclen)
    - {
    -+	return 0;
    - }
    - 
    - int xdi_diff_outf(mmfile_t *mf1, mmfile_t *mf2,
     
      ## xdiff-interface.h ##
     @@
    @@ xdiff-interface.h
      #define MAX_XDIFF_SIZE (1024UL * 1024 * 1023)
      
     -typedef void (*xdiff_emit_line_fn)(void *, char *, unsigned long);
    --typedef void (*xdiff_emit_hunk_fn)(void *data,
    --				   long old_begin, long old_nr,
    --				   long new_begin, long new_nr,
    --				   const char *func, long funclen);
    -+/*
    -+ * The xdiff_emit_{line,hunk}_fn consumers can return -1 to abort
    -+ * early, or 0 to continue processing. Note that doing so is an
    -+ * all-or-nothing affair, as returning -1 will return all the way to
    -+ * the top-level, e.g. the xdi_diff_outf() call to generate the diff.
    -+ *
    -+ * Thus returning -1 from a hunk header callback means you won't be
    -+ * getting any more hunks, or diffs, and likewise returning from a
    -+ * line callback means you won't be getting anymore lines.
    -+ *
    -+ * We may extend the interface in the future to understand other more
    -+ * granular return values, but for now use it carefully, or consider
    -+ * e.g. using discard_hunk_line() if you say just don't care about
    -+ * hunk headers.
    -+ */
     +typedef int (*xdiff_emit_line_fn)(void *, char *, unsigned long);
    -+typedef int (*xdiff_emit_hunk_fn)(void *data,
    -+				  long old_begin, long old_nr,
    -+				  long new_begin, long new_nr,
    -+				  const char *func, long funclen);
    - 
    - int xdi_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp, xdemitconf_t const *xecfg, xdemitcb_t *ecb);
    - int xdi_diff_outf(mmfile_t *mf1, mmfile_t *mf2,
    -@@ xdiff-interface.h: extern int git_xmerge_style;
    -  * Can be used as a no-op hunk_fn for xdi_diff_outf(), since a NULL
    -  * one just sends the hunk line to the line_fn callback).
    -  */
    --void discard_hunk_line(void *priv,
    --		       long ob, long on, long nb, long nn,
    --		       const char *func, long funclen);
    -+int discard_hunk_line(void *priv,
    -+		      long ob, long on, long nb, long nn,
    -+		      const char *func, long funclen);
    - 
    - /*
    -  * Compare the strings l1 with l2 which are of size s1 and s2 respectively.
    + typedef void (*xdiff_emit_hunk_fn)(void *data,
    + 				   long old_begin, long old_nr,
    + 				   long new_begin, long new_nr,
20:  69b061832b3 ! 18:  76d667f152f xdiff-interface: support early exit in xdiff_outf()
    @@ Metadata
     Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## Commit message ##
    -    xdiff-interface: support early exit in xdiff_outf()
    +    xdiff-interface: allow early return from xdiff_emit_line_fn
     
    -    Bridge the gap between the preceding "xdiff-interface: allow early
    -    return from xdiff_emit_{line,hunk}_fn" change and the public
    -    interface.
    +    Finish the change started in the preceding commit and allow an early
    +    return from "xdiff_emit_line_fn" callbacks, this will allows
    +    diffcore-pickaxe.c to save itself redundant work.
     
    -    This change was split off from the preceding commit as it wasn't a
    -    purely mechanical addition of "return 0".
    +    Our xdiff interface also had the limitation of not being able to abort
    +    early since the beginning, see d9ea73e0564 (combine-diff: refactor
    +    built-in xdiff interface., 2006-04-05). Although at that time
    +    "xdiff_emit_line_fn" was called "xdiff_emit_consume_fn", and
    +    "xdiff_emit_hunk_fn" didn't exist yet.
     
    -    Here we want to be able to abort early, but do so in a way that
    -    doesn't skip the appropriate strbuf_reset() invocations.
    +    There was some work in this area of xdiff-interface.[ch] recently with
    +    3b40a090fd4 (diff: avoid generating unused hunk header lines,
    +    2018-11-02) and 7c61e25fbf1 (diff: use hunk callback for word-diff,
    +    2018-11-02).
    +
    +    In combination those two changes allow us to not do any work on the
    +    hunks and diff at all, but didn't change the status quo with regards
    +    to consumers that e.g. want the diff lines, but might want to abort
    +    early.
    +
    +    Whereas now we can abort e.g. on the first "-line" of a 1000 line diff
    +    if that's all we needed.
    +
    +    This interface is rather scary as noted in the comment to
    +    xdiff-interface.h being added here, as noted there a future change
    +    could add more exit codes, and hack xdl_emit_diff() and friends to
    +    ignore or skip things more selectively as a result.
    +
    +    I did not see an inherent reason for why xdl_emit_{diffrec,record}()
    +    could not be changed to ferry the "xdiff_emit_line_fn" error code
    +    upwards instead of returning -1 on all "ret < 0".
     
    -    The use of -1 as a return value is consistent with the rest of the
    -    xdiff codebase, where doing so signals an abort or error that'll
    -    propagate up the stack.
    +    But doing so would require corresponding changes in xdl_emit_diff()
    +    and xdl_diff(). I didn't see any issue with narrowly doing that to
    +    accomplish what I needed here, but it would leave xdiff's own return
    +    values in an inconsistent state.
    +
    +    Instead I've left it at returning a more conventional (for git's own
    +    codebase) 1 for an early return, and translating it (or rather, all
    +    non-zero) to -1 for xdiff's consumption.
    +
    +    The reason for most of the "stop" complexity in xdiff_outf() is
    +    because we want to be able to abort early, but do so in a way that
    +    doesn't skip the appropriate strbuf_reset() invocations.
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## xdiff-interface.c ##
     @@ xdiff-interface.c: static int consume_one(void *priv_, char *s, unsigned long size)
    + 	char *ep;
    + 	while (size) {
      		unsigned long this_size;
    ++		int ret;
      		ep = memchr(s, '\n', size);
      		this_size = (ep == NULL) ? size : (ep - s + 1);
     -		priv->line_fn(priv->consume_callback_data, s, this_size);
    -+		if (priv->line_fn(priv->consume_callback_data, s, this_size))
    -+			return -1;
    ++		ret = priv->line_fn(priv->consume_callback_data, s, this_size);
    ++		if (ret)
    ++			return ret;
      		size -= this_size;
      		s += this_size;
      	}
    @@ xdiff-interface.c: static int xdiff_outf(void *priv_, mmbuffer_t *mb, int nbuf)
      
      	for (i = 0; i < nbuf; i++) {
     +		if (stop)
    -+			return -1;
    ++			return 1;
      		if (mb[i].ptr[mb[i].size-1] != '\n') {
      			/* Incomplete line */
      			strbuf_add(&priv->remainder, mb[i].ptr, mb[i].size);
    @@ xdiff-interface.c: static int xdiff_outf(void *priv_, mmbuffer_t *mb, int nbuf)
      	return 0;
      }
      
    +
    + ## xdiff-interface.h ##
    +@@
    +  */
    + #define MAX_XDIFF_SIZE (1024UL * 1024 * 1023)
    + 
    ++/**
    ++ * The `xdiff_emit_line_fn` function can return 1 to abort early, or 0
    ++ * to continue processing. Note that doing so is an all-or-nothing
    ++ * affair, as returning 1 will return all the way to the top-level,
    ++ * e.g. the xdi_diff_outf() call to generate the diff.
    ++ *
    ++ * Thus returning 1 means you won't be getting any more diff lines. If
    ++ * you need something in-between those two options you'll need to use
    ++ * `xdl_emit_hunk_consume_func_t` and implement your own version of
    ++ * xdl_emit_diff().
    ++ *
    ++ * We may extend the interface in the future to understand other more
    ++ * granular return values. While you should return 1 to exit early,
    ++ * doing so will currently make your early return indistinguishable
    ++ * from an error internal to xdiff; xdiff itself will see that
    ++ * non-zero return and translate it to -1.
    ++ */
    + typedef int (*xdiff_emit_line_fn)(void *, char *, unsigned long);
    + typedef void (*xdiff_emit_hunk_fn)(void *data,
    + 				   long old_begin, long old_nr,
21:  fc0aa61d093 ! 19:  53e9405f849 pickaxe -G: terminate early on matching lines
    @@ Commit message
     
         Solve a long-standing item for "git log -Grx" of us e.g. finding "+
         str" in the diff context and noting that we had a "hit", but xdiff
    -    diligently continuing to generate and spew the rest of the diff at us.
    +    diligently continuing to generate and spew the rest of the diff at
    +    us. This makes use of a new "early return" xdiff interface added by
    +    preceding commits.
     
    -    The TODO item has been there since "git log -G" was implemented. See
    -    f506b8e8b5f (git log/diff: add -G<regexp> that greps in the patch
    -    text, 2010-08-23).
    +    The TODO item (or, the NEEDSWORK comment) has been there since "git
    +    log -G" was implemented. See f506b8e8b5f (git log/diff: add -G<regexp>
    +    that greps in the patch text, 2010-08-23).
     
    -    Our xdiff interface also had the limitation of not being able to abort
    -    early since the beginning, see d9ea73e0564 (combine-diff: refactor
    -    built-in xdiff interface., 2006-04-05). Although at that time
    -    "xdiff_emit_line_fn" was called "xdiff_emit_consume_fn", and
    -    "xdiff_emit_hunk_fn" didn't exist yet.
    -
    -    But now with the support added in the preceding ""xdiff-interface:
    -    allow early return from xdiff_emit_{line,hunk}_fn" commit we can
    -    return early, and furthermore test the functionality of the new
    -    early-exit xdiff-interface by having a BUG() call here to die if it
    -    ever starts handing us needless work again.
    +    But now with the support added in the preceding changes to the
    +    xdiff-interface we can return early. Let's assert the behavior of that
    +    new early-return xdiff-interface by having a BUG() call here to die if
    +    it ever starts handing us needless work again.
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    @@ diffcore-pickaxe.c: static int diffgrep_consume(void *priv, char *line, unsigned
     +	if (!regexec_buf(data->regexp, line + 1, len - 1, 1,
     +			 &regmatch, 0)) {
     +		data->hit = 1;
    -+		return -1;
    ++		return 1;
     +	}
      	return 0;
      }
    @@ diffcore-pickaxe.c: static int diff_grep(mmfile_t *one, mmfile_t *two,
     +
     +	/*
     +	 * An xdiff error might be our "data->hit" from above. See the
    -+	 * comment for xdiff_emit_{line,hunk}_fn in xdiff-interface.h
    -+	 * for why.
    ++	 * comment for xdiff_emit_line_fn in xdiff-interface.h
     +	 */
     +	ret = xdi_diff_outf(one, two, discard_hunk_line, diffgrep_consume,
     +			    &ecbdata, &xpp, &xecfg);
    @@ diffcore-pickaxe.c: static int diff_grep(mmfile_t *one, mmfile_t *two,
     
      ## xdiff-interface.h ##
     @@
    -  * granular return values, but for now use it carefully, or consider
    -  * e.g. using discard_hunk_line() if you say just don't care about
    -  * hunk headers.
    +  * doing so will currently make your early return indistinguishable
    +  * from an error internal to xdiff; xdiff itself will see that
    +  * non-zero return and translate it to -1.
     + *
    -+ * Note that just returning -1 will make your early return
    -+ * indistinguishable from an error internal to xdiff. See "diff_grep"
    -+ * in diffcore-pickaxe.c for a trick to work around this, i.e. using
    -+ * the "consume_callback_data" to note the desired early return.
    ++ * See "diff_grep" in diffcore-pickaxe.c for a trick to work around
    ++ * this, i.e. using the "consume_callback_data" to note the desired
    ++ * early return.
       */
      typedef int (*xdiff_emit_line_fn)(void *, char *, unsigned long);
    - typedef int (*xdiff_emit_hunk_fn)(void *data,
    + typedef void (*xdiff_emit_hunk_fn)(void *data,
22:  c81b18ca4c7 = 20:  76de6ebc8b8 pickaxe -G: don't special-case create/delete
 -:  ----------- > 21:  9bb7ac910f3 xdiff users: use designated initializers for out_line
 -:  ----------- > 22:  1178956fb3d xdiff-interface: replace discard_hunk_line() with a flag
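The early-return protocol described in the range-diff above (a line callback returns 1 to stop, xdiff turns any non-zero return into -1, and the caller uses its own callback data to tell a deliberate stop from a real error) can be modelled in a small self-contained sketch. All names here (`emit_lines`, `line_fn`, `grep_data`, `grep_consume`, `diff_has_hit`) are hypothetical stand-ins, not git's xdiff-interface API, and strstr() stands in for the regex match.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for xdiff_emit_line_fn: 0 = continue, 1 = stop. */
typedef int (*line_fn)(void *data, const char *line, size_t len);

/*
 * Split buf into '\n'-terminated lines and feed them to fn, in the
 * spirit of consume_one(). Like xdiff, any non-zero callback return
 * is reported upwards as -1.
 */
static int emit_lines(const char *buf, size_t size, line_fn fn, void *data)
{
	while (size) {
		const char *ep = memchr(buf, '\n', size);
		size_t this_size = ep ? (size_t)(ep - buf + 1) : size;

		assert(this_size > 0); /* every iteration makes progress */
		if (fn(data, buf, this_size))
			return -1; /* early return and real errors look alike */
		size -= this_size;
		buf += this_size;
	}
	return 0;
}

struct grep_data {
	const char *needle;
	int hit;        /* set when we stopped on purpose */
	int lines_seen; /* how many lines the callback was handed */
};

/* Stop at the first added or removed line containing the needle. */
static int grep_consume(void *priv, const char *line, size_t len)
{
	struct grep_data *d = priv;
	char buf[256];

	d->lines_seen++;
	if (len < 1 || (line[0] != '+' && line[0] != '-'))
		return 0; /* context line */
	/* skip the leading '+'/'-', like "line + 1" in diffgrep_consume() */
	line++;
	len--;
	if (len >= sizeof(buf))
		len = sizeof(buf) - 1;
	memcpy(buf, line, len);
	buf[len] = '\0';
	if (strstr(buf, d->needle)) {
		d->hit = 1;
		return 1;
	}
	return 0;
}

/* Returns 1 on a hit, 0 on no hit, -1 on a genuine error. */
static int diff_has_hit(const char *diff, const char *needle, int *lines_seen)
{
	struct grep_data d = { needle, 0, 0 };
	int ret = emit_lines(diff, strlen(diff), grep_consume, &d);

	*lines_seen = d.lines_seen;
	if (d.hit)
		return 1; /* the -1 was our own early return */
	return ret < 0 ? -1 : 0;
}
```

For the input " ctx\n-old\n+new needle\n+more\n" the callback is invoked three times and the fourth line is never looked at, which is the saving the "pickaxe -G: terminate early" change is after.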
-- 
2.31.1.639.g3d04783866f



* [PATCH v3 01/22] grep/pcre2 tests: reword comments referring to kwset
  2021-04-12 17:15   ` [PATCH v3 00/22] pickaxe: test and refactoring for future PCRE backend Ævar Arnfjörð Bjarmason
@ 2021-04-12 17:15     ` Ævar Arnfjörð Bjarmason
  2021-04-12 17:15     ` [PATCH v3 02/22] pickaxe tests: refactor to use test_commit --append --printf Ævar Arnfjörð Bjarmason
                       ` (20 subsequent siblings)
  21 siblings, 0 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-12 17:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

The kwset optimization has not been used by grep since
48de2a768cf (grep: remove the kwset optimization, 2019-07-01).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t7816-grep-binary-pattern.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/t/t7816-grep-binary-pattern.sh b/t/t7816-grep-binary-pattern.sh
index 60bab291e49..9d67a5fc4cf 100755
--- a/t/t7816-grep-binary-pattern.sh
+++ b/t/t7816-grep-binary-pattern.sh
@@ -59,7 +59,7 @@ test_expect_success 'setup' "
 	git commit -m.
 "
 
-# Simple fixed-string matching that can use kwset (no -i && non-ASCII)
+# Simple fixed-string matching
 nul_match P P P '-F' 'yQf'
 nul_match P P P '-F' 'yQx'
 nul_match P P P '-Fi' 'YQf'
@@ -78,7 +78,7 @@ nul_match P P P '-Fi' '[Y]QF'
 nul_match P P P '-F' 'æQ[ð]'
 nul_match P P P '-F' '[æ]Qð'
 
-# The -F kwset codepath can't handle -i && non-ASCII...
+# Matching pattern and subject case with -i
 nul_match P 1 1 '-i' '[æ]Qð'
 
 # ...PCRE v2 only matches non-ASCII with -i casefolding under UTF-8
-- 
2.31.1.639.g3d04783866f



* [PATCH v3 02/22] pickaxe tests: refactor to use test_commit --append --printf
  2021-04-12 17:15   ` [PATCH v3 00/22] pickaxe: test and refactoring for future PCRE backend Ævar Arnfjörð Bjarmason
  2021-04-12 17:15     ` [PATCH v3 01/22] grep/pcre2 tests: reword comments referring to kwset Ævar Arnfjörð Bjarmason
@ 2021-04-12 17:15     ` Ævar Arnfjörð Bjarmason
  2021-04-12 17:15     ` [PATCH v3 03/22] pickaxe tests: add test for diffgrep_consume() internals Ævar Arnfjörð Bjarmason
                       ` (19 subsequent siblings)
  21 siblings, 0 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-12 17:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Refactor the existing tests added in e0e7cb8080c (log -G: ignore
binary files, 2018-12-14) to use the --append option I added in
3373518cc8b (test-lib functions: add an --append option to
test_commit, 2021-01-12) and the --printf option added as part of an
in-flight topic of mine this commit depends on.

While I'm at it change some of the setup of the test to use a more
sensible pattern, e.g. setting up a temporary repo instead of creating
an orphan branch.

Since the -G and -S options behave the same way with truncated and
removed content, also change the "git rm" to emptying data.bin; that
just caters to how test_commit works. The resulting test is shorter.

See also f5d79bf7dd6 (tests: refactor a few tests to use "test_commit
--append", 2021-01-12) for prior similar refactoring.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t4209-log-pickaxe.sh | 30 ++++++++++++++----------------
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/t/t4209-log-pickaxe.sh b/t/t4209-log-pickaxe.sh
index 5d06f5f45ea..ad45d8cfd0a 100755
--- a/t/t4209-log-pickaxe.sh
+++ b/t/t4209-log-pickaxe.sh
@@ -107,37 +107,35 @@ test_expect_success 'log -S --no-textconv (missing textconv tool)' '
 '
 
 test_expect_success 'setup log -[GS] binary & --text' '
-	git checkout --orphan GS-binary-and-text &&
-	git read-tree --empty &&
-	printf "a\na\0a\n" >data.bin &&
-	git add data.bin &&
-	git commit -m "create binary file" data.bin &&
-	printf "a\na\0a\n" >>data.bin &&
-	git commit -m "modify binary file" data.bin &&
-	git rm data.bin &&
-	git commit -m "delete binary file" data.bin &&
-	git log >full-log
+	test_create_repo GS-bin-txt &&
+	test_commit -C GS-bin-txt --printf A data.bin "a\na\0a\n" &&
+	test_commit -C GS-bin-txt --append --printf B data.bin "a\na\0a\n" &&
+	test_commit -C GS-bin-txt C data.bin "" &&
+	git -C GS-bin-txt log >full-log
 '
 
 test_expect_success 'log -G ignores binary files' '
-	git log -Ga >log &&
+	git -C GS-bin-txt log -Ga >log &&
 	test_must_be_empty log
 '
 
 test_expect_success 'log -G looks into binary files with -a' '
-	git log -a -Ga >log &&
+	git -C GS-bin-txt log -a -Ga >log &&
 	test_cmp log full-log
 '
 
 test_expect_success 'log -G looks into binary files with textconv filter' '
-	test_when_finished "rm .gitattributes" &&
-	echo "* diff=bin" >.gitattributes &&
-	git -c diff.bin.textconv=cat log -Ga >log &&
+	test_when_finished "rm GS-bin-txt/.gitattributes" &&
+	(
+		cd GS-bin-txt &&
+		echo "* diff=bin" >.gitattributes &&
+		git -c diff.bin.textconv=cat log -Ga >../log
+	) &&
 	test_cmp log full-log
 '
 
 test_expect_success 'log -S looks into binary files' '
-	git log -Sa >log &&
+	git -C GS-bin-txt log -Sa >log &&
 	test_cmp log full-log
 '
 
-- 
2.31.1.639.g3d04783866f



* [PATCH v3 03/22] pickaxe tests: add test for diffgrep_consume() internals
  2021-04-12 17:15   ` [PATCH v3 00/22] pickaxe: test and refactoring for future PCRE backend Ævar Arnfjörð Bjarmason
  2021-04-12 17:15     ` [PATCH v3 01/22] grep/pcre2 tests: reword comments referring to kwset Ævar Arnfjörð Bjarmason
  2021-04-12 17:15     ` [PATCH v3 02/22] pickaxe tests: refactor to use test_commit --append --printf Ævar Arnfjörð Bjarmason
@ 2021-04-12 17:15     ` Ævar Arnfjörð Bjarmason
  2021-04-12 17:15     ` [PATCH v3 04/22] pickaxe tests: add test for "log -S" not being a regex Ævar Arnfjörð Bjarmason
                       ` (18 subsequent siblings)
  21 siblings, 0 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-12 17:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

In diffgrep_consume() we generate a diff, and then advance past the
"+" or "-" at the start of the line for matching. This has been done
ever since the code was added in f506b8e8b5f (git log/diff: add
-G<regexp> that greps in the patch text, 2010-08-23).

If we match "line" instead of "line + 1", no tests fail, i.e. we've got
zero coverage for whether any of our searches match the beginning of
the line or not. Let's add a test for this.
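To see why matching against "line + 1" matters, here is a minimal sketch using POSIX regcomp()/regexec() in place of git's regexec_buf(). The helper diff_line_matches() is hypothetical, not git code: it skips the leading '+' or '-' of a diff line before matching, so an anchored "^a" matches the added line "+a" while "[+-]a" cannot, which is the behaviour the new test pins down. Compiling with REG_EXTENDED is an assumption of the sketch; both patterns mean the same thing as basic regexes.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: does the pattern match the content of a single
 * NUL-terminated diff line such as "+a"? Context lines never match, and
 * the '+'/'-' marker is not part of the subject (we match line + 1).
 * Returns 1 on a match, 0 on no match, -1 on an invalid pattern.
 */
static int diff_line_matches(const char *pattern, const char *line)
{
	regex_t re;
	int matched;

	if (line[0] != '+' && line[0] != '-')
		return 0; /* context line */
	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB))
		return -1;
	matched = !regexec(&re, line + 1, 0, NULL, 0);
	regfree(&re);
	return matched;
}
```

Matching against "line" instead would flip both results, which is the regression the test is meant to catch.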

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t4209-log-pickaxe.sh | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/t/t4209-log-pickaxe.sh b/t/t4209-log-pickaxe.sh
index ad45d8cfd0a..eacb9f0a1b5 100755
--- a/t/t4209-log-pickaxe.sh
+++ b/t/t4209-log-pickaxe.sh
@@ -106,6 +106,21 @@ test_expect_success 'log -S --no-textconv (missing textconv tool)' '
 	rm .gitattributes
 '
 
+test_expect_success 'setup log -[GS] plain' '
+	test_create_repo GS-plain &&
+	test_commit -C GS-plain --append A data.txt "a" &&
+	test_commit -C GS-plain --append B data.txt "a a" &&
+	test_commit -C GS-plain C data.txt "" &&
+	git -C GS-plain log >full-log
+'
+
+test_expect_success 'log -G trims diff new/old [-+]' '
+	git -C GS-plain log -G"[+-]a" >log &&
+	test_must_be_empty log &&
+	git -C GS-plain log -G"^a" >log &&
+	test_cmp log full-log
+'
+
 test_expect_success 'setup log -[GS] binary & --text' '
 	test_create_repo GS-bin-txt &&
 	test_commit -C GS-bin-txt --printf A data.bin "a\na\0a\n" &&
-- 
2.31.1.639.g3d04783866f



* [PATCH v3 04/22] pickaxe tests: add test for "log -S" not being a regex
  2021-04-12 17:15   ` [PATCH v3 00/22] pickaxe: test and refactoring for future PCRE backend Ævar Arnfjörð Bjarmason
                       ` (2 preceding siblings ...)
  2021-04-12 17:15     ` [PATCH v3 03/22] pickaxe tests: add test for diffgrep_consume() internals Ævar Arnfjörð Bjarmason
@ 2021-04-12 17:15     ` Ævar Arnfjörð Bjarmason
  2021-04-12 17:15     ` [PATCH v3 05/22] pickaxe tests: test for -G, -S and --find-object incompatibility Ævar Arnfjörð Bjarmason
                       ` (17 subsequent siblings)
  21 siblings, 0 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-12 17:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

No test in our test suite checked for "log -S<pat>" being a fixed
string, as opposed to "log -S<pat> --pickaxe-regex". Let's test for
it.
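The distinction the new test checks can be sketched as two counting functions: one treats the pattern as a fixed string (plain -S), the other as a POSIX extended regex (-S with --pickaxe-regex). Both are hypothetical simplifications of what contains() in diffcore-pickaxe.c does; -S reports a commit when this count differs between the pre- and post-image.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Count non-overlapping occurrences of a fixed, non-empty string. */
static unsigned count_fixed(const char *hay, const char *needle)
{
	unsigned cnt = 0;
	size_t len = strlen(needle);

	assert(len > 0);
	while ((hay = strstr(hay, needle))) {
		cnt++;
		hay += len;
	}
	return cnt;
}

/* Count non-overlapping matches of an extended regex. */
static unsigned count_regex(const char *hay, const char *pattern)
{
	regex_t re;
	regmatch_t m;
	unsigned cnt = 0;
	int flags = 0;

	if (regcomp(&re, pattern, REG_EXTENDED))
		return 0;
	while (*hay && !regexec(&re, hay, 1, &m, flags)) {
		cnt++;
		/* move past the match; step one char on an empty match */
		hay += m.rm_eo > m.rm_so ? m.rm_eo : m.rm_so + 1;
		flags = REG_NOTBOL; /* we are no longer at the start */
	}
	regfree(&re);
	return cnt;
}
```

With the GS-plain data, the fixed string "[b]" only changes count in D (and in E, which removes it), while the regex "[b]" also changes count in C, which is why the two expected logs differ.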

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t4209-log-pickaxe.sh | 30 +++++++++++++++++++++++++++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/t/t4209-log-pickaxe.sh b/t/t4209-log-pickaxe.sh
index eacb9f0a1b5..9fa770b5fbd 100755
--- a/t/t4209-log-pickaxe.sh
+++ b/t/t4209-log-pickaxe.sh
@@ -106,11 +106,18 @@ test_expect_success 'log -S --no-textconv (missing textconv tool)' '
 	rm .gitattributes
 '
 
-test_expect_success 'setup log -[GS] plain' '
+test_expect_success 'setup log -[GS] plain & regex' '
 	test_create_repo GS-plain &&
 	test_commit -C GS-plain --append A data.txt "a" &&
 	test_commit -C GS-plain --append B data.txt "a a" &&
-	test_commit -C GS-plain C data.txt "" &&
+	test_commit -C GS-plain --append C data.txt "b" &&
+	test_commit -C GS-plain --append D data.txt "[b]" &&
+	test_commit -C GS-plain E data.txt "" &&
+
+	# We also include E, the deletion commit
+	git -C GS-plain log --grep="[ABE]" >A-to-B-then-E-log &&
+	git -C GS-plain log --grep="[CDE]" >C-to-D-then-E-log &&
+	git -C GS-plain log --grep="[DE]" >D-then-E-log &&
 	git -C GS-plain log >full-log
 '
 
@@ -118,7 +125,24 @@ test_expect_success 'log -G trims diff new/old [-+]' '
 	git -C GS-plain log -G"[+-]a" >log &&
 	test_must_be_empty log &&
 	git -C GS-plain log -G"^a" >log &&
-	test_cmp log full-log
+	test_cmp log A-to-B-then-E-log
+'
+
+test_expect_success 'log -S<pat> is not a regex, but -S<pat> --pickaxe-regex is' '
+	git -C GS-plain log -S"a" >log &&
+	test_cmp log A-to-B-then-E-log &&
+
+	git -C GS-plain log -S"[a]" >log &&
+	test_must_be_empty log &&
+
+	git -C GS-plain log -S"[a]" --pickaxe-regex >log &&
+	test_cmp log A-to-B-then-E-log &&
+
+	git -C GS-plain log -S"[b]" >log &&
+	test_cmp log D-then-E-log &&
+
+	git -C GS-plain log -S"[b]" --pickaxe-regex >log &&
+	test_cmp log C-to-D-then-E-log
 '
 
 test_expect_success 'setup log -[GS] binary & --text' '
-- 
2.31.1.639.g3d04783866f



* [PATCH v3 05/22] pickaxe tests: test for -G, -S and --find-object incompatibility
  2021-04-12 17:15   ` [PATCH v3 00/22] pickaxe: test and refactoring for future PCRE backend Ævar Arnfjörð Bjarmason
                       ` (3 preceding siblings ...)
  2021-04-12 17:15     ` [PATCH v3 04/22] pickaxe tests: add test for "log -S" not being a regex Ævar Arnfjörð Bjarmason
@ 2021-04-12 17:15     ` Ævar Arnfjörð Bjarmason
  2021-04-12 17:15     ` [PATCH v3 06/22] pickaxe tests: add missing test for --no-pickaxe-regex being an error Ævar Arnfjörð Bjarmason
                       ` (16 subsequent siblings)
  21 siblings, 0 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-12 17:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Add a test for the options sanity check added in 5e505257f2 (diff:
properly error out when combining multiple pickaxe options,
2018-01-04).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t4209-log-pickaxe.sh | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/t/t4209-log-pickaxe.sh b/t/t4209-log-pickaxe.sh
index 9fa770b5fbd..21e22af1e7e 100755
--- a/t/t4209-log-pickaxe.sh
+++ b/t/t4209-log-pickaxe.sh
@@ -55,6 +55,17 @@ test_expect_success setup '
 	git rev-parse --verify HEAD >expect_second
 '
 
+test_expect_success 'usage' '
+	test_expect_code 128 git log -Gregex -Sstring 2>err &&
+	grep "mutually exclusive" err &&
+
+	test_expect_code 128 git log -Gregex --find-object=HEAD 2>err &&
+	grep "mutually exclusive" err &&
+
+	test_expect_code 128 git log -Sstring --find-object=HEAD 2>err &&
+	grep "mutually exclusive" err
+'
+
 test_log	expect_initial	--grep initial
 test_log	expect_nomatch	--grep InItial
 test_log_icase	expect_initial	--grep InItial
-- 
2.31.1.639.g3d04783866f



* [PATCH v3 06/22] pickaxe tests: add missing test for --no-pickaxe-regex being an error
  2021-04-12 17:15   ` [PATCH v3 00/22] pickaxe: test and refactoring for future PCRE backend Ævar Arnfjörð Bjarmason
                       ` (4 preceding siblings ...)
  2021-04-12 17:15     ` [PATCH v3 05/22] pickaxe tests: test for -G, -S and --find-object incompatibility Ævar Arnfjörð Bjarmason
@ 2021-04-12 17:15     ` Ævar Arnfjörð Bjarmason
  2021-04-12 17:15     ` [PATCH v3 07/22] pickaxe: die when -G and --pickaxe-regex are combined Ævar Arnfjörð Bjarmason
                       ` (15 subsequent siblings)
  21 siblings, 0 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-12 17:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Add a missing test for --no-pickaxe-regex. This has been an error ever
since before the -S or -G options were added, or since
7ae0b0cb65f (git-log (internal): more options., 2006-03-01).

The reason for adding this test is that Junio suggested in [1] in
response to a later test addition in this series that it might be good
to support --no-pickaxe-regex in combination with -G. This would allow
for fixed-string searching with -G, similar to grep's --fixed-strings
mode.

I agree that that would make sense if anyone would like to implement
it, but since it dies right now let's first add this test to assert
the existing long-standing behavior. We can always add support for
--[no-]pickaxe-regex in combination with -G at some later date.

1. http://lore.kernel.org/git/xmqqwnto9pt7.fsf@gitster.g

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t4209-log-pickaxe.sh | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/t/t4209-log-pickaxe.sh b/t/t4209-log-pickaxe.sh
index 21e22af1e7e..532bb875f02 100755
--- a/t/t4209-log-pickaxe.sh
+++ b/t/t4209-log-pickaxe.sh
@@ -66,6 +66,18 @@ test_expect_success 'usage' '
 	grep "mutually exclusive" err
 '
 
+test_expect_success 'usage: --no-pickaxe-regex' '
+	cat >expect <<-\EOF &&
+	fatal: unrecognized argument: --no-pickaxe-regex
+	EOF
+
+	test_expect_code 128 git log -Sstring --no-pickaxe-regex 2>actual &&
+	test_cmp expect actual &&
+
+	test_expect_code 128 git log -Gstring --no-pickaxe-regex 2>actual &&
+	test_cmp expect actual
+'
+
 test_log	expect_initial	--grep initial
 test_log	expect_nomatch	--grep InItial
 test_log_icase	expect_initial	--grep InItial
-- 
2.31.1.639.g3d04783866f



* [PATCH v3 07/22] pickaxe: die when -G and --pickaxe-regex are combined
  2021-04-12 17:15   ` [PATCH v3 00/22] pickaxe: test and refactoring for future PCRE backend Ævar Arnfjörð Bjarmason
                       ` (5 preceding siblings ...)
  2021-04-12 17:15     ` [PATCH v3 06/22] pickaxe tests: add missing test for --no-pickaxe-regex being an error Ævar Arnfjörð Bjarmason
@ 2021-04-12 17:15     ` Ævar Arnfjörð Bjarmason
  2021-04-12 17:15     ` [PATCH v3 08/22] pickaxe: die when --find-object and --pickaxe-all " Ævar Arnfjörð Bjarmason
                       ` (14 subsequent siblings)
  21 siblings, 0 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-12 17:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

When the -G and --pickaxe-regex options are combined we simply ignore
the --pickaxe-regex option. Let's die instead as suggested by our
documentation, since -G is always a regex.

When --pickaxe-regex was added in d01d8c6782 (Support for pickaxe
matching regular expressions, 2006-03-29) only the -S option
existed. Then when -G was added in f506b8e8b5 (git log/diff: add
-G<regexp> that greps in the patch text, 2010-08-23) neither the
documentation for --pickaxe-regex was updated accordingly, nor was
something like this assertion added.

Since 5bc3f0b567 (diffcore-pickaxe doc: document -S and -G properly,
2013-05-31) we've claimed that --pickaxe-regex should only be used
with -S, but have silently tolerated combining it with -G. Let's die
instead.
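The new check relies on HAS_MULTI_BITS(), which git-compat-util.h defines as ((i) & ((i) - 1)): non-zero exactly when more than one bit is set. Masking pickaxe_opts first restricts the test to the options that may not be combined. The sketch below reproduces that idea; the numeric flag values are assumptions for illustration (only DIFF_PICKAXE_IGNORE_CASE = 32 is visible in the patch), and the two helper functions are hypothetical.

```c
#include <assert.h>

/* Same expression as git's HAS_MULTI_BITS(): more than one bit set? */
#define HAS_MULTI_BITS(i) ((i) & ((i) - 1))

/* Illustrative single-bit flag values (assumed, not copied from diff.h). */
#define DIFF_PICKAXE_ALL          1u
#define DIFF_PICKAXE_REGEX        2u
#define DIFF_PICKAXE_KIND_S       4u
#define DIFF_PICKAXE_KIND_G       8u
#define DIFF_PICKAXE_KIND_OBJFIND 16u

#define DIFF_PICKAXE_KINDS_MASK (DIFF_PICKAXE_KIND_S | \
				 DIFF_PICKAXE_KIND_G | \
				 DIFF_PICKAXE_KIND_OBJFIND)
#define DIFF_PICKAXE_KINDS_G_REGEX_MASK (DIFF_PICKAXE_KIND_G | \
					 DIFF_PICKAXE_REGEX)

/* Hypothetical: would "-G, -S and --find-object" be rejected? */
static int pickaxe_kinds_conflict(unsigned opts)
{
	return HAS_MULTI_BITS(opts & DIFF_PICKAXE_KINDS_MASK) != 0;
}

/* Hypothetical: would "-G and --pickaxe-regex" be rejected? */
static int pickaxe_g_regex_conflict(unsigned opts)
{
	return HAS_MULTI_BITS(opts & DIFF_PICKAXE_KINDS_G_REGEX_MASK) != 0;
}
```

Because the mask keeps only the bits under test, unrelated options such as DIFF_PICKAXE_ALL never trigger a false positive.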

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 diff.c                 | 3 +++
 diff.h                 | 2 ++
 t/t4209-log-pickaxe.sh | 5 +++++
 3 files changed, 10 insertions(+)

diff --git a/diff.c b/diff.c
index 4acccd9d7ed..f9e86bca04e 100644
--- a/diff.c
+++ b/diff.c
@@ -4628,6 +4628,9 @@ void diff_setup_done(struct diff_options *options)
 	if (HAS_MULTI_BITS(options->pickaxe_opts & DIFF_PICKAXE_KINDS_MASK))
 		die(_("-G, -S and --find-object are mutually exclusive"));
 
+	if (HAS_MULTI_BITS(options->pickaxe_opts & DIFF_PICKAXE_KINDS_G_REGEX_MASK))
+		die(_("-G and --pickaxe-regex are mutually exclusive, use --pickaxe-regex with -S"));
+
 	/*
 	 * Most of the time we can say "there are changes"
 	 * only by checking if there are changed paths, but
diff --git a/diff.h b/diff.h
index c8f3faea8aa..5e110d349be 100644
--- a/diff.h
+++ b/diff.h
@@ -556,6 +556,8 @@ int git_config_rename(const char *var, const char *value);
 #define DIFF_PICKAXE_KINDS_MASK (DIFF_PICKAXE_KIND_S | \
 				 DIFF_PICKAXE_KIND_G | \
 				 DIFF_PICKAXE_KIND_OBJFIND)
+#define DIFF_PICKAXE_KINDS_G_REGEX_MASK (DIFF_PICKAXE_KIND_G | \
+					 DIFF_PICKAXE_REGEX)
 
 #define DIFF_PICKAXE_IGNORE_CASE	32
 
diff --git a/t/t4209-log-pickaxe.sh b/t/t4209-log-pickaxe.sh
index 532bb875f02..772c6c1a7c8 100755
--- a/t/t4209-log-pickaxe.sh
+++ b/t/t4209-log-pickaxe.sh
@@ -66,6 +66,11 @@ test_expect_success 'usage' '
 	grep "mutually exclusive" err
 '
 
+test_expect_success 'usage: --pickaxe-regex' '
+	test_expect_code 128 git log -Gregex --pickaxe-regex 2>err &&
+	grep "mutually exclusive" err
+'
+
 test_expect_success 'usage: --no-pickaxe-regex' '
 	cat >expect <<-\EOF &&
 	fatal: unrecognized argument: --no-pickaxe-regex
-- 
2.31.1.639.g3d04783866f



* [PATCH v3 08/22] pickaxe: die when --find-object and --pickaxe-all are combined
  2021-04-12 17:15   ` [PATCH v3 00/22] pickaxe: test and refactoring for future PCRE backend Ævar Arnfjörð Bjarmason
                       ` (6 preceding siblings ...)
  2021-04-12 17:15     ` [PATCH v3 07/22] pickaxe: die when -G and --pickaxe-regex are combined Ævar Arnfjörð Bjarmason
@ 2021-04-12 17:15     ` Ævar Arnfjörð Bjarmason
  2021-04-12 17:15     ` [PATCH v3 09/22] diff.h: move pickaxe fields together again Ævar Arnfjörð Bjarmason
                       ` (13 subsequent siblings)
  21 siblings, 0 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-12 17:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Neither the --pickaxe-all documentation nor --find-object's has ever
suggested that you can combine the two. See f506b8e8b5 (git log/diff:
add -G<regexp> that greps in the patch text, 2010-08-23) and
15af58c1ad (diffcore: add a pickaxe option to find a specific blob,
2018-01-04).

But we've silently tolerated it, which makes the logic in
diffcore_pickaxe() harder to reason about. Let's assert that we won't
have the two combined.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 diff.c                 | 3 +++
 diff.h                 | 2 ++
 t/t4209-log-pickaxe.sh | 3 +++
 3 files changed, 8 insertions(+)

diff --git a/diff.c b/diff.c
index f9e86bca04e..c1f47a7f013 100644
--- a/diff.c
+++ b/diff.c
@@ -4631,6 +4631,9 @@ void diff_setup_done(struct diff_options *options)
 	if (HAS_MULTI_BITS(options->pickaxe_opts & DIFF_PICKAXE_KINDS_G_REGEX_MASK))
 		die(_("-G and --pickaxe-regex are mutually exclusive, use --pickaxe-regex with -S"));
 
+	if (HAS_MULTI_BITS(options->pickaxe_opts & DIFF_PICKAXE_KINDS_ALL_OBJFIND_MASK))
+		die(_("--pickaxe-all and --find-object are mutually exclusive, use --pickaxe-all with -G and -S"));
+
 	/*
 	 * Most of the time we can say "there are changes"
 	 * only by checking if there are changed paths, but
diff --git a/diff.h b/diff.h
index 5e110d349be..82254396f95 100644
--- a/diff.h
+++ b/diff.h
@@ -558,6 +558,8 @@ int git_config_rename(const char *var, const char *value);
 				 DIFF_PICKAXE_KIND_OBJFIND)
 #define DIFF_PICKAXE_KINDS_G_REGEX_MASK (DIFF_PICKAXE_KIND_G | \
 					 DIFF_PICKAXE_REGEX)
+#define DIFF_PICKAXE_KINDS_ALL_OBJFIND_MASK (DIFF_PICKAXE_ALL | \
+					     DIFF_PICKAXE_KIND_OBJFIND)
 
 #define DIFF_PICKAXE_IGNORE_CASE	32
 
diff --git a/t/t4209-log-pickaxe.sh b/t/t4209-log-pickaxe.sh
index 772c6c1a7c8..16166ffd3e6 100755
--- a/t/t4209-log-pickaxe.sh
+++ b/t/t4209-log-pickaxe.sh
@@ -63,6 +63,9 @@ test_expect_success 'usage' '
 	grep "mutually exclusive" err &&
 
 	test_expect_code 128 git log -Sstring --find-object=HEAD 2>err &&
+	grep "mutually exclusive" err &&
+
+	test_expect_code 128 git log --pickaxe-all --find-object=HEAD 2>err &&
 	grep "mutually exclusive" err
 '
 
-- 
2.31.1.639.g3d04783866f



* [PATCH v3 09/22] diff.h: move pickaxe fields together again
  2021-04-12 17:15   ` [PATCH v3 00/22] pickaxe: test and refactoring for future PCRE backend Ævar Arnfjörð Bjarmason
                       ` (7 preceding siblings ...)
  2021-04-12 17:15     ` [PATCH v3 08/22] pickaxe: die when --find-object and --pickaxe-all " Ævar Arnfjörð Bjarmason
@ 2021-04-12 17:15     ` Ævar Arnfjörð Bjarmason
  2021-04-12 17:15     ` [PATCH v3 10/22] pickaxe/style: consolidate declarations and assignments Ævar Arnfjörð Bjarmason
                       ` (12 subsequent siblings)
  21 siblings, 0 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-12 17:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Move the pickaxe and pickaxe_opts fields next to each other again. In
a past life they'd been on adjacent lines, but when they got moved
from a global variable to the diff_options struct in 6b5ee137e5 (Diff
clean-up., 2005-09-21) they got split apart.

That split made sense at the time, as the "char*" and "int" (flags)
options were being grouped, but we've long since abandoned that
pattern in the diff_options struct, and now it makes more sense to
group these together again.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 diff.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/diff.h b/diff.h
index 82254396f95..8ba85c5e605 100644
--- a/diff.h
+++ b/diff.h
@@ -265,6 +265,7 @@ struct diff_options {
 	 * postimage of the diff_queue.
 	 */
 	const char *pickaxe;
+	unsigned pickaxe_opts;
 
 	/* -I<regex> */
 	regex_t **ignore_regex;
@@ -304,8 +305,6 @@ struct diff_options {
 	/* The output format used when `diff_flush()` is run. */
 	int output_format;
 
-	unsigned pickaxe_opts;
-
 	/* Affects the way detection logic for complete rewrites, renames and
 	 * copies.
 	 */
-- 
2.31.1.639.g3d04783866f



* [PATCH v3 10/22] pickaxe/style: consolidate declarations and assignments
  2021-04-12 17:15   ` [PATCH v3 00/22] pickaxe: test and refactoring for future PCRE backend Ævar Arnfjörð Bjarmason
                       ` (8 preceding siblings ...)
  2021-04-12 17:15     ` [PATCH v3 09/22] diff.h: move pickaxe fields together again Ævar Arnfjörð Bjarmason
@ 2021-04-12 17:15     ` Ævar Arnfjörð Bjarmason
  2021-04-12 17:15     ` [PATCH v3 11/22] perf: add performance test for pickaxe Ævar Arnfjörð Bjarmason
                       ` (11 subsequent siblings)
  21 siblings, 0 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-12 17:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Refactor contains() to do its assignments at the same time that it
does its declarations.

This code could have been refactored in ef90ab66e8e (pickaxe: use
textconv for -S counting, 2012-10-28) when a function call between the
declarations and assignments was removed.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 diffcore-pickaxe.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index a9c6d60df22..a278b9b71d9 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -70,13 +70,9 @@ static int diff_grep(mmfile_t *one, mmfile_t *two,
 
 static unsigned int contains(mmfile_t *mf, regex_t *regexp, kwset_t kws)
 {
-	unsigned int cnt;
-	unsigned long sz;
-	const char *data;
-
-	sz = mf->size;
-	data = mf->ptr;
-	cnt = 0;
+	unsigned int cnt = 0;
+	unsigned long sz = mf->size;
+	const char *data = mf->ptr;
 
 	if (regexp) {
 		regmatch_t regmatch;
-- 
2.31.1.639.g3d04783866f



* [PATCH v3 11/22] perf: add performance test for pickaxe
  2021-04-12 17:15   ` [PATCH v3 00/22] pickaxe: test and refactoring for future PCRE backend Ævar Arnfjörð Bjarmason
                       ` (9 preceding siblings ...)
  2021-04-12 17:15     ` [PATCH v3 10/22] pickaxe/style: consolidate declarations and assignments Ævar Arnfjörð Bjarmason
@ 2021-04-12 17:15     ` Ævar Arnfjörð Bjarmason
  2021-04-12 17:15     ` [PATCH v3 12/22] pickaxe: refactor function selection in diffcore-pickaxe() Ævar Arnfjörð Bjarmason
                       ` (10 subsequent siblings)
  21 siblings, 0 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-12 17:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Add a test for the -G and -S pickaxe options and related options.

This test supports being run with GIT_TEST_LONG=1 to raise the limit
on the number of commits walked from 1k to 10k. The default 1k limit
seems to hit a good spot on git.git.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/perf/p4209-pickaxe.sh | 70 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 70 insertions(+)
 create mode 100755 t/perf/p4209-pickaxe.sh

diff --git a/t/perf/p4209-pickaxe.sh b/t/perf/p4209-pickaxe.sh
new file mode 100755
index 00000000000..f585a4465ae
--- /dev/null
+++ b/t/perf/p4209-pickaxe.sh
@@ -0,0 +1,70 @@
+#!/bin/sh
+
+test_description="Test pickaxe performance"
+
+. ./perf-lib.sh
+
+test_perf_default_repo
+
+# Not --max-count, as that's the number of matching commits, so it's
+# unbounded. We want to limit our revision walk here.
+from_rev_desc=
+from_rev=
+max_count=1000
+if test_have_prereq EXPENSIVE
+then
+	max_count=10000
+fi
+from_rev=" $(git rev-list HEAD | head -n $max_count | tail -n 1).."
+from_rev_desc=" <limit-rev>.."
+
+for icase in \
+	'' \
+	'-i '
+do
+	# -S (no regex)
+	for pattern in \
+		'int main' \
+		'æ'
+	do
+		for opts in \
+			'-S'
+		do
+			test_perf "git log $icase$opts'$pattern'$from_rev_desc" "
+				git log --pretty=format:%H $icase$opts'$pattern'$from_rev
+			"
+		done
+	done
+
+	# -S (regex)
+	for pattern in  \
+		'(int|void|null)' \
+		'if *\([^ ]+ & ' \
+		'[àáâãäåæñøùúûüýþ]'
+	do
+		for opts in \
+			'--pickaxe-regex -S'
+		do
+			test_perf "git log $icase$opts'$pattern'$from_rev_desc" "
+				git log --pretty=format:%H $icase$opts'$pattern'$from_rev
+			"
+		done
+	done
+
+	# -G
+	for pattern in  \
+		'(int|void|null)' \
+		'if *\([^ ]+ & ' \
+		'[àáâãäåæñøùúûüýþ]'
+	do
+		for opts in \
+			'-G'
+		do
+			test_perf "git log $icase$opts'$pattern'$from_rev_desc" "
+				git log --pretty=format:%H $icase$opts'$pattern'$from_rev
+			"
+		done
+	done
+done
+
+test_done
-- 
2.31.1.639.g3d04783866f



* [PATCH v3 12/22] pickaxe: refactor function selection in diffcore-pickaxe()
  2021-04-12 17:15   ` [PATCH v3 00/22] pickaxe: test and refactoring for future PCRE backend Ævar Arnfjörð Bjarmason
                       ` (10 preceding siblings ...)
  2021-04-12 17:15     ` [PATCH v3 11/22] perf: add performance test for pickaxe Ævar Arnfjörð Bjarmason
@ 2021-04-12 17:15     ` Ævar Arnfjörð Bjarmason
  2021-04-12 17:15     ` [PATCH v3 13/22] pickaxe: assert that we must have a needle under -G or -S Ævar Arnfjörð Bjarmason
                       ` (9 subsequent siblings)
  21 siblings, 0 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-12 17:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

It's hard to read this codepath at a glance and reason about exactly
which combination of -G and -S will compile either a regex or a kwset,
and whether we'll then dispatch to "diff_grep" or "has_changes".

Furthermore, in the "--find-object" case we don't use the callback
function at all, but we were previously passing down "has_changes"
anyway.

Refactor this code to exhaustively check "opts", so that it's now
more obvious which callback function (or none) we want in each mode.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 diffcore-pickaxe.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index a278b9b71d9..953b6ec1b4a 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -228,6 +228,7 @@ void diffcore_pickaxe(struct diff_options *o)
 	int opts = o->pickaxe_opts;
 	regex_t regex, *regexp = NULL;
 	kwset_t kws = NULL;
+	pickaxe_fn fn;
 
 	if (opts & (DIFF_PICKAXE_REGEX | DIFF_PICKAXE_KIND_G)) {
 		int cflags = REG_EXTENDED | REG_NEWLINE;
@@ -235,6 +236,20 @@ void diffcore_pickaxe(struct diff_options *o)
 			cflags |= REG_ICASE;
 		regcomp_or_die(&regex, needle, cflags);
 		regexp = &regex;
+
+		if (opts & DIFF_PICKAXE_KIND_G)
+			fn = diff_grep;
+		else if (opts & DIFF_PICKAXE_REGEX)
+			fn = has_changes;
+		else
+			/*
+			 * We don't need to check the combination of
+			 * -G and --pickaxe-regex, by the time we get
+			 * here diff.c has already died if they're
+			 * combined. See the usage tests in
+			 * t4209-log-pickaxe.sh.
+			 */
+			BUG("unreachable");
 	} else if (opts & DIFF_PICKAXE_KIND_S) {
 		if (o->pickaxe_opts & DIFF_PICKAXE_IGNORE_CASE &&
 		    has_non_ascii(needle)) {
@@ -251,10 +266,14 @@ void diffcore_pickaxe(struct diff_options *o)
 			kwsincr(kws, needle, strlen(needle));
 			kwsprep(kws);
 		}
+		fn = has_changes;
+	} else if (opts & DIFF_PICKAXE_KIND_OBJFIND) {
+		fn = NULL;
+	} else {
+		BUG("unknown pickaxe_opts flag");
 	}
 
-	pickaxe(&diff_queued_diff, o, regexp, kws,
-		(opts & DIFF_PICKAXE_KIND_G) ? diff_grep : has_changes);
+	pickaxe(&diff_queued_diff, o, regexp, kws, fn);
 
 	if (regexp)
 		regfree(regexp);
-- 
2.31.1.639.g3d04783866f
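
The exhaustive selection this patch introduces can be sketched in
isolation. This is a simplified model, not the diffcore-pickaxe.c
code: the PK_* flags, the two stub matchers and bug() are hypothetical
stand-ins for DIFF_PICKAXE_*, diff_grep()/has_changes() and git's
BUG(). It shows the shape of the refactor: each supported mode either
picks a callback, deliberately picks none (--find-object), or fails
loudly, instead of falling through to a default.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the DIFF_PICKAXE_* bits in diff.h. */
enum {
	PK_KIND_S       = 1 << 0, /* -S<needle> */
	PK_KIND_G       = 1 << 1, /* -G<regex> */
	PK_KIND_OBJFIND = 1 << 2, /* --find-object=<oid> */
	PK_REGEX        = 1 << 3, /* --pickaxe-regex */
};

typedef int (*match_fn)(const char *old_text, const char *new_text);

/* Stubs standing in for diff_grep() and has_changes(). */
static int grep_diff(const char *a, const char *b) { (void)a; (void)b; return 1; }
static int count_changes(const char *a, const char *b) { (void)a; (void)b; return 0; }

static void bug(const char *msg)
{
	fprintf(stderr, "BUG: %s\n", msg);
	abort();
}

/*
 * Every supported mode either picks a matcher, deliberately picks
 * none, or fails loudly; nothing silently falls through to a default.
 */
static match_fn select_matcher(unsigned opts)
{
	/* Assumed to be rejected earlier, as diff.c does for -G + --pickaxe-regex. */
	assert(!((opts & PK_KIND_G) && (opts & PK_REGEX)));

	if (opts & PK_KIND_G)
		return grep_diff;       /* -G<regex> */
	if (opts & PK_KIND_S)
		return count_changes;   /* -S, with or without --pickaxe-regex */
	if (opts & PK_KIND_OBJFIND)
		return NULL;            /* --find-object: no callback needed */
	bug("unknown pickaxe option bits");
	return NULL;                    /* not reached */
}
```

Returning NULL for --find-object makes "no callback" an explicit
decision rather than an unused "has_changes" being passed along.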



* [PATCH v3 13/22] pickaxe: assert that we must have a needle under -G or -S
  2021-04-12 17:15   ` [PATCH v3 00/22] pickaxe: test and refactoring for future PCRE backend Ævar Arnfjörð Bjarmason
                       ` (11 preceding siblings ...)
  2021-04-12 17:15     ` [PATCH v3 12/22] pickaxe: refactor function selection in diffcore-pickaxe() Ævar Arnfjörð Bjarmason
@ 2021-04-12 17:15     ` Ævar Arnfjörð Bjarmason
  2021-04-12 17:15     ` [PATCH v3 14/22] pickaxe -S: support content with NULs under --pickaxe-regex Ævar Arnfjörð Bjarmason
                       ` (8 subsequent siblings)
  21 siblings, 0 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-12 17:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Assert early in diffcore_pickaxe() that we've got a needle to work
with under -G and -S.

This code is redundant with the check that -G and -S get from
parse-options.c's get_arg(), for which I'm adding a test.

This check dates back to e1b161161d (diffcore-pickaxe: fix infinite
loop on zero-length needle, 2007-01-25) when "git log -S" could send
this code into an infinite loop.

It was then later refactored in 8fa4b09fb1 (pickaxe: hoist empty
needle check, 2012-10-28) into its current form, but it seemingly
wasn't noticed that in the meantime a move to the parse-options.c API
in dea007fb4c (diff: parse separate options like -S foo, 2010-08-05)
had made it redundant.

Let's retain some of the paranoia here with a BUG(), but there's no
need to be checking this in the pickaxe_match() inner loop.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 diffcore-pickaxe.c     | 6 +++---
 t/t4209-log-pickaxe.sh | 6 ++++++
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index 953b6ec1b4a..88b6ca840f6 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -132,9 +132,6 @@ static int pickaxe_match(struct diff_filepair *p, struct diff_options *o,
 			 oidset_contains(o->objfind, &p->two->oid));
 	}
 
-	if (!o->pickaxe[0])
-		return 0;
-
 	if (o->flags.allow_textconv) {
 		textconv_one = get_textconv(o->repo, p->one);
 		textconv_two = get_textconv(o->repo, p->two);
@@ -230,6 +227,9 @@ void diffcore_pickaxe(struct diff_options *o)
 	kwset_t kws = NULL;
 	pickaxe_fn fn;
 
+	if (opts & ~DIFF_PICKAXE_KIND_OBJFIND &&
+	    (!needle || !*needle))
+		BUG("should have needle under -G or -S");
 	if (opts & (DIFF_PICKAXE_REGEX | DIFF_PICKAXE_KIND_G)) {
 		int cflags = REG_EXTENDED | REG_NEWLINE;
 		if (o->pickaxe_opts & DIFF_PICKAXE_IGNORE_CASE)
diff --git a/t/t4209-log-pickaxe.sh b/t/t4209-log-pickaxe.sh
index 16166ffd3e6..3f9aad0fdb0 100755
--- a/t/t4209-log-pickaxe.sh
+++ b/t/t4209-log-pickaxe.sh
@@ -56,6 +56,12 @@ test_expect_success setup '
 '
 
 test_expect_success 'usage' '
+	test_expect_code 129 git log -S 2>err &&
+	test_i18ngrep "switch.*requires a value" err &&
+
+	test_expect_code 129 git log -G 2>err &&
+	test_i18ngrep "switch.*requires a value" err &&
+
 	test_expect_code 128 git log -Gregex -Sstring 2>err &&
 	grep "mutually exclusive" err &&
 
-- 
2.31.1.639.g3d04783866f



* [PATCH v3 14/22] pickaxe -S: support content with NULs under --pickaxe-regex
  2021-04-12 17:15   ` [PATCH v3 00/22] pickaxe: test and refactoring for future PCRE backend Ævar Arnfjörð Bjarmason
                       ` (12 preceding siblings ...)
  2021-04-12 17:15     ` [PATCH v3 13/22] pickaxe: assert that we must have a needle under -G or -S Ævar Arnfjörð Bjarmason
@ 2021-04-12 17:15     ` Ævar Arnfjörð Bjarmason
  2021-04-12 17:15     ` [PATCH v3 15/22] pickaxe: rename variables in has_changes() for brevity Ævar Arnfjörð Bjarmason
                       ` (7 subsequent siblings)
  21 siblings, 0 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-12 17:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Fix a bug in the matching routine powering -S<rx> --pickaxe-regex so
that we won't abort early on content that has NULs in it.

We've had a hard requirement on REG_STARTEND since 2f8952250a8 (regex:
add regexec_buf() that can work on a non NUL-terminated string,
2016-09-21), but this sanity check dates back to d01d8c67828 (Support
for pickaxe matching regular expressions, 2006-03-29).

It isn't needed anymore and, as the now-passing test shows, was
actively getting in our way. Since we always require REG_STARTEND
support, we do not need to stop at NULs: if we are dealing with a
haystack that has a NUL in it, the needle may come after that NUL.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 diffcore-pickaxe.c     | 4 ++--
 t/t4209-log-pickaxe.sh | 8 ++++++++
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index 88b6ca840f6..be0dd683b63 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -78,12 +78,12 @@ static unsigned int contains(mmfile_t *mf, regex_t *regexp, kwset_t kws)
 		regmatch_t regmatch;
 		int flags = 0;
 
-		while (sz && *data &&
+		while (sz &&
 		       !regexec_buf(regexp, data, sz, 1, &regmatch, flags)) {
 			flags |= REG_NOTBOL;
 			data += regmatch.rm_eo;
 			sz -= regmatch.rm_eo;
-			if (sz && *data && regmatch.rm_so == regmatch.rm_eo) {
+			if (sz && regmatch.rm_so == regmatch.rm_eo) {
 				data++;
 				sz--;
 			}
diff --git a/t/t4209-log-pickaxe.sh b/t/t4209-log-pickaxe.sh
index 3f9aad0fdb0..75795d0b492 100755
--- a/t/t4209-log-pickaxe.sh
+++ b/t/t4209-log-pickaxe.sh
@@ -215,4 +215,12 @@ test_expect_success 'log -S looks into binary files' '
 	test_cmp log full-log
 '
 
+test_expect_success 'log -S --pickaxe-regex looks into binary files' '
+	git -C GS-bin-txt log --pickaxe-regex -Sa >log &&
+	test_cmp log full-log &&
+
+	git -C GS-bin-txt log --pickaxe-regex -S"[a]" >log &&
+	test_cmp log full-log
+'
+
 test_done
-- 
2.31.1.639.g3d04783866f
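
Why the removed "*data" condition mattered can be shown without a
regex engine. The hypothetical count_occurrences() below counts
non-overlapping fixed-string matches in a length-delimited buffer
with memcmp() instead of regexec_buf(); the stop_at_nul flag models
the old loop condition. With REG_STARTEND, regexec_buf() is bounded by
the size alone, which corresponds to stop_at_nul == 0.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Count non-overlapping occurrences of needle[0..nlen) in data[0..sz).
 * If stop_at_nul is set, give up at the first NUL byte, as the old
 * "while (sz && *data && ...)" loop in contains() did.
 */
static unsigned count_occurrences(const char *data, size_t sz,
				  const char *needle, size_t nlen,
				  int stop_at_nul)
{
	unsigned cnt = 0;

	assert(nlen > 0);
	while (sz >= nlen) {
		if (stop_at_nul && !*data)
			break; /* the removed "*data" check */
		if (!memcmp(data, needle, nlen)) {
			cnt++;
			data += nlen;
			sz -= nlen;
		} else {
			data++;
			sz--;
		}
	}
	return cnt;
}
```

For a three-byte buffer "a", NUL, "a" the NUL-stopping variant reports
one match while the size-bounded one reports two, which is the kind of
binary content the new "log -S --pickaxe-regex looks into binary
files" test exercises.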



* [PATCH v3 15/22] pickaxe: rename variables in has_changes() for brevity
  2021-04-12 17:15   ` [PATCH v3 00/22] pickaxe: test and refactoring for future PCRE backend Ævar Arnfjörð Bjarmason
                       ` (13 preceding siblings ...)
  2021-04-12 17:15     ` [PATCH v3 14/22] pickaxe -S: support content with NULs under --pickaxe-regex Ævar Arnfjörð Bjarmason
@ 2021-04-12 17:15     ` Ævar Arnfjörð Bjarmason
  2021-04-12 17:15     ` [PATCH v3 16/22] pickaxe -S: slightly optimize contains() Ævar Arnfjörð Bjarmason
                       ` (6 subsequent siblings)
  21 siblings, 0 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-12 17:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Rename the {one,two}_contains variables to c{1,2}. This will make a
follow-up change easier to read.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 diffcore-pickaxe.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index be0dd683b63..23362a23597 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -108,9 +108,9 @@ static int has_changes(mmfile_t *one, mmfile_t *two,
 		       struct diff_options *o,
 		       regex_t *regexp, kwset_t kws)
 {
-	unsigned int one_contains = one ? contains(one, regexp, kws) : 0;
-	unsigned int two_contains = two ? contains(two, regexp, kws) : 0;
-	return one_contains != two_contains;
+	unsigned int c1 = one ? contains(one, regexp, kws) : 0;
+	unsigned int c2 = two ? contains(two, regexp, kws) : 0;
+	return c1 != c2;
 }
 
 static int pickaxe_match(struct diff_filepair *p, struct diff_options *o,
-- 
2.31.1.639.g3d04783866f



* [PATCH v3 16/22] pickaxe -S: slightly optimize contains()
  2021-04-12 17:15   ` [PATCH v3 00/22] pickaxe: test and refactoring for future PCRE backend Ævar Arnfjörð Bjarmason
                       ` (14 preceding siblings ...)
  2021-04-12 17:15     ` [PATCH v3 15/22] pickaxe: rename variables in has_changes() for brevity Ævar Arnfjörð Bjarmason
@ 2021-04-12 17:15     ` Ævar Arnfjörð Bjarmason
  2021-04-12 17:15     ` [PATCH v3 17/22] xdiff-interface: prepare for allowing early return Ævar Arnfjörð Bjarmason
                       ` (5 subsequent siblings)
  21 siblings, 0 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-12 17:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

The "log -S<pat>" switch counts occurrences of <pat> in the
pre-image and post-image of a change. As soon as we know we had
e.g. 1 before and 2 now, we can stop; we don't need to keep counting
past 2.

With this change, a diff between A and B may have different
performance characteristics than one between B and A. That's OK in
this case, since we'll emit the same output, and the effect is to
make one of them faster.

I'm counting in "one" first, and capping the count in "two", on the
assumption that it's more common for files to grow over time than to
shrink.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 diffcore-pickaxe.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index 23362a23597..b7494fdf89c 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -68,7 +68,8 @@ static int diff_grep(mmfile_t *one, mmfile_t *two,
 	return ecbdata.hit;
 }
 
-static unsigned int contains(mmfile_t *mf, regex_t *regexp, kwset_t kws)
+static unsigned int contains(mmfile_t *mf, regex_t *regexp, kwset_t kws,
+			     unsigned int limit)
 {
 	unsigned int cnt = 0;
 	unsigned long sz = mf->size;
@@ -88,6 +89,9 @@ static unsigned int contains(mmfile_t *mf, regex_t *regexp, kwset_t kws)
 				sz--;
 			}
 			cnt++;
+
+			if (limit && cnt == limit)
+				return cnt;
 		}
 
 	} else { /* Classic exact string match */
@@ -99,6 +103,9 @@ static unsigned int contains(mmfile_t *mf, regex_t *regexp, kwset_t kws)
 			sz -= offset + kwsm.size[0];
 			data += offset + kwsm.size[0];
 			cnt++;
+
+			if (limit && cnt == limit)
+				return cnt;
 		}
 	}
 	return cnt;
@@ -108,8 +115,8 @@ static int has_changes(mmfile_t *one, mmfile_t *two,
 		       struct diff_options *o,
 		       regex_t *regexp, kwset_t kws)
 {
-	unsigned int c1 = one ? contains(one, regexp, kws) : 0;
-	unsigned int c2 = two ? contains(two, regexp, kws) : 0;
+	unsigned int c1 = one ? contains(one, regexp, kws, 0) : 0;
+	unsigned int c2 = two ? contains(two, regexp, kws, c1 + 1) : 0;
 	return c1 != c2;
 }
 
-- 
2.31.1.639.g3d04783866f
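
The limit works because has_changes() only needs to know whether the
two counts differ: once the post-image count reaches one more than the
pre-image count, the answer cannot change. A standalone sketch of that
idea follows. count_upto() and counts_differ() are hypothetical
stand-ins for contains() and has_changes(), and they use strstr() on
NUL-terminated strings for brevity, so unlike the real code they do
not handle embedded NULs.

```c
#include <assert.h>
#include <string.h>

/*
 * Count non-overlapping occurrences of needle in hay, stopping once
 * "limit" is reached; limit == 0 means "count them all".
 */
static unsigned count_upto(const char *hay, const char *needle, unsigned limit)
{
	unsigned cnt = 0;
	size_t nlen = strlen(needle);
	const char *p = hay;

	assert(nlen > 0); /* cf. the "should have needle" BUG() */
	while ((p = strstr(p, needle))) {
		cnt++;
		if (limit && cnt == limit)
			break;
		p += nlen;
	}
	return cnt;
}

/* Nonzero if the number of occurrences differs between the two sides. */
static int counts_differ(const char *one, const char *two, const char *needle)
{
	unsigned c1 = one ? count_upto(one, needle, 0) : 0;
	/* Counting past c1 + 1 cannot change the outcome of c1 != c2. */
	unsigned c2 = two ? count_upto(two, needle, c1 + 1) : 0;

	return c1 != c2;
}
```

The asymmetry described above is visible here: only the second side
is capped, so a post-image with many more matches than the pre-image
is where the work is saved.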



* [PATCH v3 17/22] xdiff-interface: prepare for allowing early return
  2021-04-12 17:15   ` [PATCH v3 00/22] pickaxe: test and refactoring for future PCRE backend Ævar Arnfjörð Bjarmason
                       ` (15 preceding siblings ...)
  2021-04-12 17:15     ` [PATCH v3 16/22] pickaxe -S: slightly optimize contains() Ævar Arnfjörð Bjarmason
@ 2021-04-12 17:15     ` Ævar Arnfjörð Bjarmason
  2021-04-12 17:15     ` [PATCH v3 18/22] xdiff-interface: allow early return from xdiff_emit_line_fn Ævar Arnfjörð Bjarmason
                       ` (4 subsequent siblings)
  21 siblings, 0 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-12 17:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Change the function prototype of xdiff_emit_line_fn to return an "int"
instead of "void". Change all of those functions to "return 0";
nothing checks those return values yet, and no behavior is being
changed.

In subsequent commits the interface will be changed to allow early
return via this new return value.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 combine-diff.c     |  5 +++--
 diff.c             | 26 +++++++++++++++-----------
 diffcore-pickaxe.c |  7 ++++---
 range-diff.c       |  3 ++-
 xdiff-interface.c  |  3 ++-
 xdiff-interface.h  |  2 +-
 6 files changed, 27 insertions(+), 19 deletions(-)

diff --git a/combine-diff.c b/combine-diff.c
index 06635f91bc2..a12d3bc0d9c 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -403,11 +403,11 @@ static void consume_hunk(void *state_,
 	state->sline[state->nb-1].p_lno[state->n] = state->ob;
 }
 
-static void consume_line(void *state_, char *line, unsigned long len)
+static int consume_line(void *state_, char *line, unsigned long len)
 {
 	struct combine_diff_state *state = state_;
 	if (!state->lost_bucket)
-		return; /* not in any hunk yet */
+		return 0; /* not in any hunk yet */
 	switch (line[0]) {
 	case '-':
 		append_lost(state->lost_bucket, state->n, line+1, len-1);
@@ -417,6 +417,7 @@ static void consume_line(void *state_, char *line, unsigned long len)
 		state->lno++;
 		break;
 	}
+	return 0;
 }
 
 static void combine_diff(struct repository *r,
diff --git a/diff.c b/diff.c
index c1f47a7f013..7a03c581c79 100644
--- a/diff.c
+++ b/diff.c
@@ -2336,7 +2336,7 @@ static void find_lno(const char *line, struct emit_callback *ecbdata)
 	ecbdata->lno_in_postimage = strtol(p + 1, NULL, 10);
 }
 
-static void fn_out_consume(void *priv, char *line, unsigned long len)
+static int fn_out_consume(void *priv, char *line, unsigned long len)
 {
 	struct emit_callback *ecbdata = priv;
 	struct diff_options *o = ecbdata->opt;
@@ -2372,7 +2372,7 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 		len = sane_truncate_line(line, len);
 		find_lno(line, ecbdata);
 		emit_hunk_header(ecbdata, line, len);
-		return;
+		return 0;
 	}
 
 	if (ecbdata->diff_words) {
@@ -2382,11 +2382,11 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 		if (line[0] == '-') {
 			diff_words_append(line, len,
 					  &ecbdata->diff_words->minus);
-			return;
+			return 0;
 		} else if (line[0] == '+') {
 			diff_words_append(line, len,
 					  &ecbdata->diff_words->plus);
-			return;
+			return 0;
 		} else if (starts_with(line, "\\ ")) {
 			/*
 			 * Eat the "no newline at eof" marker as if we
@@ -2395,11 +2395,11 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 			 * defer processing. If this is the end of
 			 * preimage, more "+" lines may come after it.
 			 */
-			return;
+			return 0;
 		}
 		diff_words_flush(ecbdata);
 		emit_diff_symbol(o, s, line, len, 0);
-		return;
+		return 0;
 	}
 
 	switch (line[0]) {
@@ -2423,6 +2423,7 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 				 line, len, 0);
 		break;
 	}
+	return 0;
 }
 
 static void pprint_rename(struct strbuf *name, const char *a, const char *b)
@@ -2522,7 +2523,7 @@ static struct diffstat_file *diffstat_add(struct diffstat_t *diffstat,
 	return x;
 }
 
-static void diffstat_consume(void *priv, char *line, unsigned long len)
+static int diffstat_consume(void *priv, char *line, unsigned long len)
 {
 	struct diffstat_t *diffstat = priv;
 	struct diffstat_file *x = diffstat->files[diffstat->nr - 1];
@@ -2531,6 +2532,7 @@ static void diffstat_consume(void *priv, char *line, unsigned long len)
 		x->added++;
 	else if (line[0] == '-')
 		x->deleted++;
+	return 0;
 }
 
 const char mime_boundary_leader[] = "------------";
@@ -3208,7 +3210,7 @@ static void checkdiff_consume_hunk(void *priv,
 	data->lineno = nb - 1;
 }
 
-static void checkdiff_consume(void *priv, char *line, unsigned long len)
+static int checkdiff_consume(void *priv, char *line, unsigned long len)
 {
 	struct checkdiff_t *data = priv;
 	int marker_size = data->conflict_marker_size;
@@ -3232,7 +3234,7 @@ static void checkdiff_consume(void *priv, char *line, unsigned long len)
 		}
 		bad = ws_check(line + 1, len - 1, data->ws_rule);
 		if (!bad)
-			return;
+			return 0;
 		data->status |= bad;
 		err = whitespace_error_string(bad);
 		fprintf(data->o->file, "%s%s:%d: %s.\n",
@@ -3244,6 +3246,7 @@ static void checkdiff_consume(void *priv, char *line, unsigned long len)
 	} else if (line[0] == ' ') {
 		data->lineno++;
 	}
+	return 0;
 }
 
 static unsigned char *deflate_it(char *data,
@@ -6121,17 +6124,18 @@ void flush_one_hunk(struct object_id *result, git_hash_ctx *ctx)
 	}
 }
 
-static void patch_id_consume(void *priv, char *line, unsigned long len)
+static int patch_id_consume(void *priv, char *line, unsigned long len)
 {
 	struct patch_id_t *data = priv;
 	int new_len;
 
 	if (len > 12 && starts_with(line, "\\ "))
-		return;
+		return 0;
 	new_len = remove_space(line, len);
 
 	the_hash_algo->update_fn(data->ctx, line, new_len);
 	data->patchlen += new_len;
+	return 0;
 }
 
 static void patch_id_add_string(git_hash_ctx *ctx, const char *str)
diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index b7494fdf89c..27aa20be350 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -19,21 +19,22 @@ struct diffgrep_cb {
 	int hit;
 };
 
-static void diffgrep_consume(void *priv, char *line, unsigned long len)
+static int diffgrep_consume(void *priv, char *line, unsigned long len)
 {
 	struct diffgrep_cb *data = priv;
 	regmatch_t regmatch;
 
 	if (line[0] != '+' && line[0] != '-')
-		return;
+		return 0;
 	if (data->hit)
 		/*
 		 * NEEDSWORK: we should have a way to terminate the
 		 * caller early.
 		 */
-		return;
+		return 0;
 	data->hit = !regexec_buf(data->regexp, line + 1, len - 1, 1,
 				 &regmatch, 0);
+	return 0;
 }
 
 static int diff_grep(mmfile_t *one, mmfile_t *two,
diff --git a/range-diff.c b/range-diff.c
index 116fb0735c6..83c90f946ea 100644
--- a/range-diff.c
+++ b/range-diff.c
@@ -274,9 +274,10 @@ static void find_exact_matches(struct string_list *a, struct string_list *b)
 	hashmap_clear(&map);
 }
 
-static void diffsize_consume(void *data, char *line, unsigned long len)
+static int diffsize_consume(void *data, char *line, unsigned long len)
 {
 	(*(int *)data)++;
+	return 0;
 }
 
 static void diffsize_hunk(void *data, long ob, long on, long nb, long nn,
diff --git a/xdiff-interface.c b/xdiff-interface.c
index 4d20069302b..5d8c8c67dc2 100644
--- a/xdiff-interface.c
+++ b/xdiff-interface.c
@@ -31,7 +31,7 @@ static int xdiff_out_hunk(void *priv_,
 	return 0;
 }
 
-static void consume_one(void *priv_, char *s, unsigned long size)
+static int consume_one(void *priv_, char *s, unsigned long size)
 {
 	struct xdiff_emit_state *priv = priv_;
 	char *ep;
@@ -43,6 +43,7 @@ static void consume_one(void *priv_, char *s, unsigned long size)
 		size -= this_size;
 		s += this_size;
 	}
+	return 0;
 }
 
 static int xdiff_outf(void *priv_, mmbuffer_t *mb, int nbuf)
diff --git a/xdiff-interface.h b/xdiff-interface.h
index 93df26900c2..0198f9632f5 100644
--- a/xdiff-interface.h
+++ b/xdiff-interface.h
@@ -11,7 +11,7 @@
  */
 #define MAX_XDIFF_SIZE (1024UL * 1024 * 1023)
 
-typedef void (*xdiff_emit_line_fn)(void *, char *, unsigned long);
+typedef int (*xdiff_emit_line_fn)(void *, char *, unsigned long);
 typedef void (*xdiff_emit_hunk_fn)(void *data,
 				   long old_begin, long old_nr,
 				   long new_begin, long new_nr,
-- 
2.31.1.639.g3d04783866f



* [PATCH v3 18/22] xdiff-interface: allow early return from xdiff_emit_line_fn
  2021-04-12 17:15   ` [PATCH v3 00/22] pickaxe: test and refactoring for future PCRE backend Ævar Arnfjörð Bjarmason
                       ` (16 preceding siblings ...)
  2021-04-12 17:15     ` [PATCH v3 17/22] xdiff-interface: prepare for allowing early return Ævar Arnfjörð Bjarmason
@ 2021-04-12 17:15     ` Ævar Arnfjörð Bjarmason
  2021-04-12 17:15     ` [PATCH v3 19/22] pickaxe -G: terminate early on matching lines Ævar Arnfjörð Bjarmason
                       ` (3 subsequent siblings)
  21 siblings, 0 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-12 17:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Finish the change started in the preceding commit and allow an early
return from "xdiff_emit_line_fn" callbacks. This will allow
diffcore-pickaxe.c to save itself redundant work.

Our xdiff interface has had the limitation of not being able to abort
early since the beginning; see d9ea73e0564 (combine-diff: refactor
built-in xdiff interface., 2006-04-05). At that time
"xdiff_emit_line_fn" was called "xdiff_emit_consume_fn", and
"xdiff_emit_hunk_fn" didn't exist yet.

There was some work in this area of xdiff-interface.[ch] recently with
3b40a090fd4 (diff: avoid generating unused hunk header lines,
2018-11-02) and 7c61e25fbf1 (diff: use hunk callback for word-diff,
2018-11-02).

In combination, those two changes allow us to avoid doing any work on
the hunks and diff at all, but they didn't change the status quo with
regard to consumers that e.g. want the diff lines, but might want to
abort early.

Now we can instead abort, e.g. on the first "-" line of a 1000-line
diff, if that's all we needed.

This interface is rather scary, as noted in the comment being added
to xdiff-interface.h here. As noted there, a future change could add
more exit codes, and hack xdl_emit_diff() and friends to ignore or
skip things more selectively as a result.

I did not see an inherent reason why xdl_emit_{diffrec,record}()
could not be changed to ferry the "xdiff_emit_line_fn" error code
upwards instead of returning -1 on all "ret < 0".

But doing so would require corresponding changes in xdl_emit_diff()
and xdl_diff(). I didn't see any issue with narrowly doing that to
accomplish what I needed here, but it would leave xdiff's own return
values in an inconsistent state.

Instead I've left it at returning a more conventional (for git's own
codebase) 1 for an early return, and translating it (or rather, any
non-zero value) to -1 for xdiff's consumption.

Most of the "stop" complexity in xdiff_outf() exists because we want
to be able to abort early, but do so in a way that doesn't skip the
appropriate strbuf_reset() invocations.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 xdiff-interface.c | 18 ++++++++++++++----
 xdiff-interface.h | 17 +++++++++++++++++
 2 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/xdiff-interface.c b/xdiff-interface.c
index 5d8c8c67dc2..50c0ef759dd 100644
--- a/xdiff-interface.c
+++ b/xdiff-interface.c
@@ -37,9 +37,12 @@ static int consume_one(void *priv_, char *s, unsigned long size)
 	char *ep;
 	while (size) {
 		unsigned long this_size;
+		int ret;
 		ep = memchr(s, '\n', size);
 		this_size = (ep == NULL) ? size : (ep - s + 1);
-		priv->line_fn(priv->consume_callback_data, s, this_size);
+		ret = priv->line_fn(priv->consume_callback_data, s, this_size);
+		if (ret)
+			return ret;
 		size -= this_size;
 		s += this_size;
 	}
@@ -50,11 +53,14 @@ static int xdiff_outf(void *priv_, mmbuffer_t *mb, int nbuf)
 {
 	struct xdiff_emit_state *priv = priv_;
 	int i;
+	int stop = 0;
 
 	if (!priv->line_fn)
 		return 0;
 
 	for (i = 0; i < nbuf; i++) {
+		if (stop)
+			return 1;
 		if (mb[i].ptr[mb[i].size-1] != '\n') {
 			/* Incomplete line */
 			strbuf_add(&priv->remainder, mb[i].ptr, mb[i].size);
@@ -63,17 +69,21 @@ static int xdiff_outf(void *priv_, mmbuffer_t *mb, int nbuf)
 
 		/* we have a complete line */
 		if (!priv->remainder.len) {
-			consume_one(priv, mb[i].ptr, mb[i].size);
+			stop = consume_one(priv, mb[i].ptr, mb[i].size);
 			continue;
 		}
 		strbuf_add(&priv->remainder, mb[i].ptr, mb[i].size);
-		consume_one(priv, priv->remainder.buf, priv->remainder.len);
+		stop = consume_one(priv, priv->remainder.buf, priv->remainder.len);
 		strbuf_reset(&priv->remainder);
 	}
+	if (stop)
+		return -1;
 	if (priv->remainder.len) {
-		consume_one(priv, priv->remainder.buf, priv->remainder.len);
+		stop = consume_one(priv, priv->remainder.buf, priv->remainder.len);
 		strbuf_reset(&priv->remainder);
 	}
+	if (stop)
+		return -1;
 	return 0;
 }
 
diff --git a/xdiff-interface.h b/xdiff-interface.h
index 0198f9632f5..7d1724abb64 100644
--- a/xdiff-interface.h
+++ b/xdiff-interface.h
@@ -11,6 +11,23 @@
  */
 #define MAX_XDIFF_SIZE (1024UL * 1024 * 1023)
 
+/**
+ * The `xdiff_emit_line_fn` function can return 1 to abort early, or 0
+ * to continue processing. Note that doing so is an all-or-nothing
+ * affair, as returning 1 will return all the way to the top-level,
+ * e.g. the xdi_diff_outf() call to generate the diff.
+ *
+ * Thus returning 1 means you won't be getting any more diff lines. If
+ * you need something in-between those two options you'll need to use
+ * `xdl_emit_hunk_consume_func_t` and implement your own version of
+ * xdl_emit_diff().
+ *
+ * We may extend the interface in the future to understand other more
+ * granular return values. While you should return 1 to exit early,
+ * doing so will currently make your early return indistinguishable
+ * from an error internal to xdiff, xdiff itself will see that
+ * non-zero return and translate it to -1.
+ */
 typedef int (*xdiff_emit_line_fn)(void *, char *, unsigned long);
 typedef void (*xdiff_emit_hunk_fn)(void *data,
 				   long old_begin, long old_nr,
-- 
2.31.1.639.g3d04783866f
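
The early-return contract documented in the xdiff-interface.h comment above can be illustrated with a small standalone sketch. It does not use git's real xdiff types: `line_cb`, `feed_lines()`, `struct counter` and `count_until()` are hypothetical names, and the incomplete-line buffering that xdiff_outf() does via priv->remainder is left out. It only shows the two properties the comment describes. Once the callback returns non-zero, no further lines are delivered, and the caller sees -1, the same value an internal xdiff error would produce.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for xdiff_emit_line_fn: 0 = continue, non-zero = stop. */
typedef int (*line_cb)(void *priv, const char *line, unsigned long len);

/*
 * Feed complete lines to cb one at a time. As with the "stop" handling
 * added to xdiff_outf(), a non-zero return means no further lines are
 * delivered, and the early exit is reported to our caller as -1.
 */
static int feed_lines(const char **lines, size_t nr, line_cb cb, void *priv)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (cb(priv, lines[i], (unsigned long)strlen(lines[i])))
			return -1;
	return 0;
}

struct counter {
	int seen;  /* how many lines the callback has been handed */
	int limit; /* ask to stop once this many have been seen */
};

/* Example callback: count lines, returning 1 once "limit" is reached. */
static int count_until(void *priv, const char *line, unsigned long len)
{
	struct counter *c = priv;

	(void)line;
	(void)len;
	c->seen++;
	return c->seen >= c->limit;
}
```

With three lines and a limit of 2, feed_lines() returns -1 and the callback has seen exactly two lines; with a limit above 3 it returns 0 after seeing all three.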


* [PATCH v3 19/22] pickaxe -G: terminate early on matching lines
  2021-04-12 17:15   ` [PATCH v3 00/22] pickaxe: test and refactoring for future PCRE backend Ævar Arnfjörð Bjarmason
                       ` (17 preceding siblings ...)
  2021-04-12 17:15     ` [PATCH v3 18/22] xdiff-interface: allow early return from xdiff_emit_line_fn Ævar Arnfjörð Bjarmason
@ 2021-04-12 17:15     ` Ævar Arnfjörð Bjarmason
  2021-04-12 17:15     ` [PATCH v3 20/22] pickaxe -G: don't special-case create/delete Ævar Arnfjörð Bjarmason
                       ` (2 subsequent siblings)
  21 siblings, 0 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-12 17:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Solve a long-standing TODO item for "git log -Grx": we would find
e.g. "+ str" in the diff and note that we had a "hit", but xdiff
would diligently continue to generate and spew the rest of the diff
at us. This makes use of a new "early return" xdiff interface added
by preceding commits.

The TODO item (or, the NEEDSWORK comment) has been there since "git
log -G" was implemented. See f506b8e8b5f (git log/diff: add -G<regexp>
that greps in the patch text, 2010-08-23).

But now with the support added in the preceding changes to the
xdiff-interface we can return early. Let's assert the behavior of that
new early-return xdiff-interface by having a BUG() call here to die if
it ever starts handing us needless work again.
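
The trick this patch relies on (record the match in the callback data, return 1 to stop, then consult that record before treating xdiff's -1 as an error) can be sketched in isolation. This is a hypothetical, simplified model: `struct grep_cb`, `grep_consume()`, `drive()` and `has_match()` are illustrative names, and a plain strstr() substring search stands in for the regexec_buf() call in diffgrep_consume().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback data, modelled on struct diffgrep_cb. */
struct grep_cb {
	const char *needle; /* plain substring standing in for the regex */
	int hit;            /* set when an added/removed line matches */
};

/* Like diffgrep_consume(): only look at '+'/'-' lines, stop on the first match. */
static int grep_consume(struct grep_cb *data, const char *line)
{
	if (line[0] != '+' && line[0] != '-')
		return 0;
	if (strstr(line + 1, data->needle)) {
		data->hit = 1;
		return 1; /* ask the driver to stop */
	}
	return 0;
}

/* Driver: any non-zero callback return becomes -1, as xdiff does. */
static int drive(const char **lines, size_t nr, struct grep_cb *data)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (grep_consume(data, lines[i]))
			return -1;
	return 0;
}

/*
 * Like the patched diff_grep(): check the hit flag before looking at
 * the return value, since -1 may just mean "we stopped early".
 */
static int has_match(const char **lines, size_t nr, const char *needle)
{
	struct grep_cb data = { needle, 0 };
	int ret = drive(lines, nr, &data);

	if (data.hit)
		return 1;
	return ret;
}
```

For the lines " context str", "-old", "+new str" and "+later str", has_match() with "str" returns 1 after stopping at "+new str" (the context line is ignored, as in diffgrep_consume()), and it returns 0 for a needle that appears on no added or removed line.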

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 diffcore-pickaxe.c | 30 +++++++++++++++++++-----------
 xdiff-interface.h  |  4 ++++
 2 files changed, 23 insertions(+), 11 deletions(-)

diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index 27aa20be350..2147afef722 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -27,13 +27,12 @@ static int diffgrep_consume(void *priv, char *line, unsigned long len)
 	if (line[0] != '+' && line[0] != '-')
 		return 0;
 	if (data->hit)
-		/*
-		 * NEEDSWORK: we should have a way to terminate the
-		 * caller early.
-		 */
-		return 0;
-	data->hit = !regexec_buf(data->regexp, line + 1, len - 1, 1,
-				 &regmatch, 0);
+		BUG("Already matched in diffgrep_consume! Broken xdiff_emit_line_fn?");
+	if (!regexec_buf(data->regexp, line + 1, len - 1, 1,
+			 &regmatch, 0)) {
+		data->hit = 1;
+		return 1;
+	}
 	return 0;
 }
 
@@ -45,6 +44,7 @@ static int diff_grep(mmfile_t *one, mmfile_t *two,
 	struct diffgrep_cb ecbdata;
 	xpparam_t xpp;
 	xdemitconf_t xecfg;
+	int ret;
 
 	if (!one)
 		return !regexec_buf(regexp, two->ptr, two->size,
@@ -63,10 +63,18 @@ static int diff_grep(mmfile_t *one, mmfile_t *two,
 	ecbdata.hit = 0;
 	xecfg.ctxlen = o->context;
 	xecfg.interhunkctxlen = o->interhunkcontext;
-	if (xdi_diff_outf(one, two, discard_hunk_line, diffgrep_consume,
-			  &ecbdata, &xpp, &xecfg))
-		return 0;
-	return ecbdata.hit;
+
+	/*
+	 * An xdiff error might be our "data->hit" from above. See the
+	 * comment for xdiff_emit_line_fn in xdiff-interface.h
+	 */
+	ret = xdi_diff_outf(one, two, discard_hunk_line, diffgrep_consume,
+			    &ecbdata, &xpp, &xecfg);
+	if (ecbdata.hit)
+		return 1;
+	if (ret)
+		return ret;
+	return 0;
 }
 
 static unsigned int contains(mmfile_t *mf, regex_t *regexp, kwset_t kws,
diff --git a/xdiff-interface.h b/xdiff-interface.h
index 7d1724abb64..3b6819586da 100644
--- a/xdiff-interface.h
+++ b/xdiff-interface.h
@@ -27,6 +27,10 @@
  * doing so will currently make your early return indistinguishable
  * from an error internal to xdiff, xdiff itself will see that
  * non-zero return and translate it to -1.
+ *
+ * See "diff_grep" in diffcore-pickaxe.c for a trick to work around
+ * this, i.e. using the "consume_callback_data" to note the desired
+ * early return.
  */
 typedef int (*xdiff_emit_line_fn)(void *, char *, unsigned long);
 typedef void (*xdiff_emit_hunk_fn)(void *data,
-- 
2.31.1.639.g3d04783866f


* [PATCH v3 20/22] pickaxe -G: don't special-case create/delete
  2021-04-12 17:15   ` [PATCH v3 00/22] pickaxe: test and refactoring for future PCRE backend Ævar Arnfjörð Bjarmason
                       ` (18 preceding siblings ...)
  2021-04-12 17:15     ` [PATCH v3 19/22] pickaxe -G: terminate early on matching lines Ævar Arnfjörð Bjarmason
@ 2021-04-12 17:15     ` Ævar Arnfjörð Bjarmason
  2021-04-12 17:15     ` [PATCH v3 21/22] xdiff users: use designated initializers for out_line Ævar Arnfjörð Bjarmason
  2021-04-12 17:15     ` [PATCH v3 22/22] xdiff-interface: replace discard_hunk_line() with a flag Ævar Arnfjörð Bjarmason
  21 siblings, 0 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-12 17:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Instead of special-casing creations and deletions let's just generate
a diff for them.

This logic of not running a diff under -G if we don't have both sides
dates back to the original implementation of -S in
52e9578985f ([PATCH] Introducing software archaeologist's tool
"pickaxe"., 2005-05-21).

In the case of -S we were not working with the xdiff interface and
needed to do this, but when -G was implemented in f506b8e8b5f (git
log/diff: add -G<regexp> that greps in the patch text, 2010-08-23)
this logic was diligently copied over.

But as the performance test added earlier in this series shows, this
does not make much of a difference. Running:

    time GIT_TEST_LONG= GIT_PERF_REPEAT_COUNT=10 GIT_PERF_MAKE_OPTS='-j8 CFLAGS=-O3' ./run origin/next HEAD~ HEAD -- p4209-pickaxe.sh

with HEAD~ being the preceding "pickaxe -G: terminate early on
matching lines" commit, we get the results below. Note that only the
-G codepaths are relevant to this change:

    Test                                                                      origin/next       HEAD~                   HEAD
    -----------------------------------------------------------------------------------------------------------------------------------------
    4209.1: git log -S'int main' <limit-rev>..                                0.35(0.32+0.03)   0.35(0.33+0.02) +0.0%   0.35(0.30+0.05) +0.0%
    4209.2: git log -S'æ' <limit-rev>..                                       0.46(0.42+0.04)   0.46(0.41+0.05) +0.0%   0.46(0.42+0.04) +0.0%
    4209.3: git log --pickaxe-regex -S'(int|void|null)' <limit-rev>..         0.65(0.62+0.02)   0.64(0.61+0.02) -1.5%   0.64(0.60+0.04) -1.5%
    4209.4: git log --pickaxe-regex -S'if *\([^ ]+ & ' <limit-rev>..          0.52(0.45+0.06)   0.52(0.50+0.01) +0.0%   0.54(0.47+0.04) +3.8%
    4209.5: git log --pickaxe-regex -S'[àáâãäåæñøùúûüýþ]' <limit-rev>..       0.39(0.34+0.05)   0.39(0.34+0.04) +0.0%   0.39(0.36+0.03) +0.0%
    4209.6: git log -G'(int|void|null)' <limit-rev>..                         0.60(0.55+0.04)   0.58(0.54+0.03) -3.3%   0.58(0.49+0.08) -3.3%
    4209.7: git log -G'if *\([^ ]+ & ' <limit-rev>..                          0.61(0.52+0.06)   0.59(0.53+0.05) -3.3%   0.59(0.54+0.05) -3.3%
    4209.8: git log -G'[àáâãäåæñøùúûüýþ]' <limit-rev>..                       0.61(0.51+0.07)   0.58(0.54+0.04) -4.9%   0.57(0.51+0.06) -6.6%
    4209.9: git log -i -S'int main' <limit-rev>..                             0.36(0.31+0.04)   0.36(0.34+0.02) +0.0%   0.35(0.32+0.03) -2.8%
    4209.10: git log -i -S'æ' <limit-rev>..                                   0.36(0.33+0.03)   0.39(0.34+0.01) +8.3%   0.36(0.32+0.03) +0.0%
    4209.11: git log -i --pickaxe-regex -S'(int|void|null)' <limit-rev>..     0.83(0.77+0.05)   0.82(0.77+0.05) -1.2%   0.80(0.75+0.04) -3.6%
    4209.12: git log -i --pickaxe-regex -S'if *\([^ ]+ & ' <limit-rev>..      0.67(0.61+0.03)   0.64(0.61+0.03) -4.5%   0.63(0.61+0.02) -6.0%
    4209.13: git log -i --pickaxe-regex -S'[àáâãäåæñøùúûüýþ]' <limit-rev>..   0.40(0.37+0.02)   0.40(0.37+0.03) +0.0%   0.40(0.36+0.04) +0.0%
    4209.14: git log -i -G'(int|void|null)' <limit-rev>..                     0.58(0.51+0.07)   0.59(0.52+0.06) +1.7%   0.58(0.52+0.05) +0.0%
    4209.15: git log -i -G'if *\([^ ]+ & ' <limit-rev>..                      0.60(0.54+0.05)   0.60(0.54+0.06) +0.0%   0.60(0.56+0.03) +0.0%
    4209.16: git log -i -G'[àáâãäåæñøùúûüýþ]' <limit-rev>..                   0.58(0.51+0.06)   0.57(0.52+0.05) -1.7%   0.60(0.48+0.09) +3.4%
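
The percentages in the table can be checked by hand. A sketch, assuming each percentage is 100 * (t - base) / base printed with a sign and one decimal, where t is the first number in the cell (the elapsed time; the parenthesized numbers are user and system time) and base is the elapsed time in the first (origin/next) column. This reading reproduces every percentage in the table, but it is inferred from the numbers rather than taken from the perf framework's source. `fmt_pct()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format the change of elapsed time t relative to the baseline base
 * (the origin/next column) as e.g. "-3.3%": sign, one decimal, '%'.
 */
static void fmt_pct(char *buf, size_t size, double base, double t)
{
	snprintf(buf, size, "%+.1f%%", 100.0 * (t - base) / base);
}
```

For example, 4209.8 under HEAD is 0.57 against 0.61 on origin/next, which formats as "-6.6%", and 4209.10 under HEAD~ is 0.39 against 0.36, which formats as "+8.3%".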

This small simplification really doesn't buy us much now, but I've got
plans to both convert the pickaxe code to using a PCREv2 backend[1]
and to implement additional pickaxe modes to do custom searches
through the diff[2]. Always having the diff available under -G is
going to help to simplify both of those changes.

1. https://lore.kernel.org/git/20210203032811.14979-22-avarab@gmail.com/
2. https://lore.kernel.org/git/20190424152215.16251-3-avarab@gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 diffcore-pickaxe.c | 12 +-----------
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index 2147afef722..96183f4cfab 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -40,19 +40,11 @@ static int diff_grep(mmfile_t *one, mmfile_t *two,
 		     struct diff_options *o,
 		     regex_t *regexp, kwset_t kws)
 {
-	regmatch_t regmatch;
 	struct diffgrep_cb ecbdata;
 	xpparam_t xpp;
 	xdemitconf_t xecfg;
 	int ret;
 
-	if (!one)
-		return !regexec_buf(regexp, two->ptr, two->size,
-				    1, &regmatch, 0);
-	if (!two)
-		return !regexec_buf(regexp, one->ptr, one->size,
-				    1, &regmatch, 0);
-
 	/*
 	 * We have both sides; need to run textual diff and see if
 	 * the pattern appears on added/deleted lines.
@@ -172,9 +164,7 @@ static int pickaxe_match(struct diff_filepair *p, struct diff_options *o,
 	mf1.size = fill_textconv(o->repo, textconv_one, p->one, &mf1.ptr);
 	mf2.size = fill_textconv(o->repo, textconv_two, p->two, &mf2.ptr);
 
-	ret = fn(DIFF_FILE_VALID(p->one) ? &mf1 : NULL,
-		 DIFF_FILE_VALID(p->two) ? &mf2 : NULL,
-		 o, regexp, kws);
+	ret = fn(&mf1, &mf2, o, regexp, kws);
 
 	if (textconv_one)
 		free(mf1.ptr);
-- 
2.31.1.639.g3d04783866f


* [PATCH v3 21/22] xdiff users: use designated initializers for out_line
  2021-04-12 17:15   ` [PATCH v3 00/22] pickaxe: test and refactoring for future PCRE backend Ævar Arnfjörð Bjarmason
                       ` (19 preceding siblings ...)
  2021-04-12 17:15     ` [PATCH v3 20/22] pickaxe -G: don't special-case create/delete Ævar Arnfjörð Bjarmason
@ 2021-04-12 17:15     ` Ævar Arnfjörð Bjarmason
  2021-04-12 17:15     ` [PATCH v3 22/22] xdiff-interface: replace discard_hunk_line() with a flag Ævar Arnfjörð Bjarmason
  21 siblings, 0 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-12 17:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Amend the code added in 611e42a5980 (xdiff: provide a separate emit
callback for hunks, 2018-11-02) to be more readable by using
designated initializers.

This changes "priv" in rerere.c to be initialized to NULL as we did in
merge-tree.c. That's not needed as we'll only use it if the callback
is defined, but being consistent here is better and less verbose.
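
What the designated initializers rely on: in C, members not named in a designated initializer are initialized as if they had static storage duration, so pointer members become NULL. That is why `xdemitcb_t ecb = { .out_line = show_outf };` leaves out_hunk and priv as NULL without the explicit assignments. The sketch below uses a hypothetical `struct emit_cb` whose fields are modelled on, but not identical to, xdemitcb_t.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback struct, loosely modelled on xdemitcb_t. */
struct emit_cb {
	void *priv;
	int (*out_hunk)(void *priv, long old_begin, long old_nr,
			long new_begin, long new_nr);
	int (*out_line)(void *priv, const char *line, unsigned long len);
};

/* Example line callback; returns 0 to keep going. */
static int show_line(void *priv, const char *line, unsigned long len)
{
	(void)priv;
	(void)line;
	(void)len;
	return 0;
}

static struct emit_cb make_cb(void)
{
	/*
	 * Only out_line is named; priv and out_hunk are zero-initialized,
	 * i.e. NULL, just as if they had been assigned explicitly.
	 */
	struct emit_cb ecb = { .out_line = show_line };

	return ecb;
}
```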

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/merge-tree.c | 5 +----
 builtin/rerere.c     | 4 +---
 2 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/builtin/merge-tree.c b/builtin/merge-tree.c
index de8520778d2..5dc94d6f880 100644
--- a/builtin/merge-tree.c
+++ b/builtin/merge-tree.c
@@ -107,15 +107,12 @@ static void show_diff(struct merge_list *entry)
 	mmfile_t src, dst;
 	xpparam_t xpp;
 	xdemitconf_t xecfg;
-	xdemitcb_t ecb;
+	xdemitcb_t ecb = { .out_line = show_outf };
 
 	memset(&xpp, 0, sizeof(xpp));
 	xpp.flags = 0;
 	memset(&xecfg, 0, sizeof(xecfg));
 	xecfg.ctxlen = 3;
-	ecb.out_hunk = NULL;
-	ecb.out_line = show_outf;
-	ecb.priv = NULL;
 
 	src.ptr = origin(entry, &size);
 	if (!src.ptr)
diff --git a/builtin/rerere.c b/builtin/rerere.c
index fd3be17b976..83d7a778e37 100644
--- a/builtin/rerere.c
+++ b/builtin/rerere.c
@@ -28,7 +28,7 @@ static int diff_two(const char *file1, const char *label1,
 {
 	xpparam_t xpp;
 	xdemitconf_t xecfg;
-	xdemitcb_t ecb;
+	xdemitcb_t ecb = { .out_line = outf };
 	mmfile_t minus, plus;
 	int ret;
 
@@ -41,8 +41,6 @@ static int diff_two(const char *file1, const char *label1,
 	xpp.flags = 0;
 	memset(&xecfg, 0, sizeof(xecfg));
 	xecfg.ctxlen = 3;
-	ecb.out_hunk = NULL;
-	ecb.out_line = outf;
 	ret = xdi_diff(&minus, &plus, &xpp, &xecfg, &ecb);
 
 	free(minus.ptr);
-- 
2.31.1.639.g3d04783866f


* [PATCH v3 22/22] xdiff-interface: replace discard_hunk_line() with a flag
  2021-04-12 17:15   ` [PATCH v3 00/22] pickaxe: test and refactoring for future PCRE backend Ævar Arnfjörð Bjarmason
                       ` (20 preceding siblings ...)
  2021-04-12 17:15     ` [PATCH v3 21/22] xdiff users: use designated initializers for out_line Ævar Arnfjörð Bjarmason
@ 2021-04-12 17:15     ` Ævar Arnfjörð Bjarmason
  21 siblings, 0 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-12 17:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin,
	Carlo Marcelo Arenas Belón,
	Ævar Arnfjörð Bjarmason

Remove the dummy discard_hunk_line() function added in
3b40a090fd4 (diff: avoid generating unused hunk header lines,
2018-11-02) in favor of having a new XDL_EMIT_NO_HUNK_HDR flag, for
use along with the two existing and similar XDL_EMIT_* flags.

Unlike the recently amended xdiff_emit_line_fn interface, which is
called in a loop in xdl_emit_diff(), the hunk header is only emitted
once per hunk.

It makes more sense to pass this as a flag than to provide a dummy
callback, because xdl_emit_diff() may be able to skip doing certain
work if it knows the caller is doing nothing with the hunk header.

It would be possible to do so in the case of -U0 now, but the benefit
of doing so is so small that I haven't bothered. But this leaves the
door open to that, and more importantly makes the API use more
intuitive.

The reason we're putting a flag in the gap between 1<<0 and 1<<2 is
that the old 1<<1 flag was removed in 907681e940d (xdiff: drop
XDL_EMIT_COMMON, 2016-02-23) without re-ordering the remaining flags.
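
A sketch of how the new flag is meant to be consumed: the header is skipped for each hunk by testing a bit in the flags word, and the three XDL_EMIT_* values occupy distinct bits so they can be combined. The flag values are copied from the xdiff/xdiff.h hunk below; `emit_hunks()` is a hypothetical, heavily simplified stand-in for the loop in xdl_emit_diff() that just counts what would be emitted.

```c
#include <assert.h>

/* Flag values as in the patch's xdiff/xdiff.h hunk. */
#define XDL_EMIT_FUNCNAMES (1 << 0)
#define XDL_EMIT_NO_HUNK_HDR (1 << 1)
#define XDL_EMIT_FUNCCONTEXT (1 << 2)

/*
 * For each of nr_hunks hunks, "emit" an optional header line and then
 * lines_per_hunk diff lines; return how many records were emitted.
 */
static int emit_hunks(unsigned long flags, int nr_hunks, int lines_per_hunk)
{
	int emitted = 0;
	int h;

	for (h = 0; h < nr_hunks; h++) {
		if (!(flags & XDL_EMIT_NO_HUNK_HDR))
			emitted++; /* the "@@ -a,b +c,d @@" header line */
		emitted += lines_per_hunk;
	}
	return emitted;
}
```

Three hunks of four lines yield 15 records normally and 12 with XDL_EMIT_NO_HUNK_HDR set; the flag can be OR-ed with the other XDL_EMIT_* flags without colliding.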

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 diff.c             | 7 ++++---
 diffcore-pickaxe.c | 3 ++-
 xdiff-interface.c  | 6 ------
 xdiff-interface.h  | 8 --------
 xdiff/xdiff.h      | 1 +
 xdiff/xemit.c      | 3 ++-
 6 files changed, 9 insertions(+), 19 deletions(-)

diff --git a/diff.c b/diff.c
index 7a03c581c79..fe3abac79fe 100644
--- a/diff.c
+++ b/diff.c
@@ -3725,7 +3725,8 @@ static void builtin_diffstat(const char *name_a, const char *name_b,
 		xpp.anchors_nr = o->anchors_nr;
 		xecfg.ctxlen = o->context;
 		xecfg.interhunkctxlen = o->interhunkcontext;
-		if (xdi_diff_outf(&mf1, &mf2, discard_hunk_line,
+		xecfg.flags = XDL_EMIT_NO_HUNK_HDR;
+		if (xdi_diff_outf(&mf1, &mf2, NULL,
 				  diffstat_consume, diffstat, &xpp, &xecfg))
 			die("unable to generate diffstat for %s", one->path);
 
@@ -6233,8 +6234,8 @@ static int diff_get_patch_id(struct diff_options *options, struct object_id *oid
 
 		xpp.flags = 0;
 		xecfg.ctxlen = 3;
-		xecfg.flags = 0;
-		if (xdi_diff_outf(&mf1, &mf2, discard_hunk_line,
+		xecfg.flags = XDL_EMIT_NO_HUNK_HDR;
+		if (xdi_diff_outf(&mf1, &mf2, NULL,
 				  patch_id_consume, &data, &xpp, &xecfg))
 			return error("unable to generate patch-id diff for %s",
 				     p->one->path);
diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index 96183f4cfab..c88e50c6329 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -53,6 +53,7 @@ static int diff_grep(mmfile_t *one, mmfile_t *two,
 	memset(&xecfg, 0, sizeof(xecfg));
 	ecbdata.regexp = regexp;
 	ecbdata.hit = 0;
+	xecfg.flags = XDL_EMIT_NO_HUNK_HDR;
 	xecfg.ctxlen = o->context;
 	xecfg.interhunkctxlen = o->interhunkcontext;
 
@@ -60,7 +61,7 @@ static int diff_grep(mmfile_t *one, mmfile_t *two,
 	 * An xdiff error might be our "data->hit" from above. See the
 	 * comment for xdiff_emit_line_fn in xdiff-interface.h
 	 */
-	ret = xdi_diff_outf(one, two, discard_hunk_line, diffgrep_consume,
+	ret = xdi_diff_outf(one, two, NULL, diffgrep_consume,
 			    &ecbdata, &xpp, &xecfg);
 	if (ecbdata.hit)
 		return 1;
diff --git a/xdiff-interface.c b/xdiff-interface.c
index 50c0ef759dd..95f13a93ff9 100644
--- a/xdiff-interface.c
+++ b/xdiff-interface.c
@@ -126,12 +126,6 @@ int xdi_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp, xdemitconf_t co
 	return xdl_diff(&a, &b, xpp, xecfg, xecb);
 }
 
-void discard_hunk_line(void *priv,
-		       long ob, long on, long nb, long nn,
-		       const char *func, long funclen)
-{
-}
-
 int xdi_diff_outf(mmfile_t *mf1, mmfile_t *mf2,
 		  xdiff_emit_hunk_fn hunk_fn,
 		  xdiff_emit_line_fn line_fn,
diff --git a/xdiff-interface.h b/xdiff-interface.h
index 3b6819586da..4301a7eef27 100644
--- a/xdiff-interface.h
+++ b/xdiff-interface.h
@@ -53,14 +53,6 @@ void xdiff_clear_find_func(xdemitconf_t *xecfg);
 int git_xmerge_config(const char *var, const char *value, void *cb);
 extern int git_xmerge_style;
 
-/*
- * Can be used as a no-op hunk_fn for xdi_diff_outf(), since a NULL
- * one just sends the hunk line to the line_fn callback).
- */
-void discard_hunk_line(void *priv,
-		       long ob, long on, long nb, long nn,
-		       const char *func, long funclen);
-
 /*
  * Compare the strings l1 with l2 which are of size s1 and s2 respectively.
  * Returns 1 if the strings are deemed equal, 0 otherwise.
diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index 7a046051468..b29deca5de8 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -50,6 +50,7 @@ extern "C" {
 
 /* xdemitconf_t.flags */
 #define XDL_EMIT_FUNCNAMES (1 << 0)
+#define XDL_EMIT_NO_HUNK_HDR (1 << 1)
 #define XDL_EMIT_FUNCCONTEXT (1 << 2)
 
 /* merge simplification levels */
diff --git a/xdiff/xemit.c b/xdiff/xemit.c
index 9d7d6c50874..1cbf2b9829e 100644
--- a/xdiff/xemit.c
+++ b/xdiff/xemit.c
@@ -278,7 +278,8 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
 				      s1 - 1, funclineprev);
 			funclineprev = s1 - 1;
 		}
-		if (xdl_emit_hunk_hdr(s1 + 1, e1 - s1, s2 + 1, e2 - s2,
+		if (!(xecfg->flags & XDL_EMIT_NO_HUNK_HDR) &&
+		    xdl_emit_hunk_hdr(s1 + 1, e1 - s1, s2 + 1, e2 - s2,
 				      func_line.buf, func_line.len, ecb) < 0)
 			return -1;
 
-- 
2.31.1.639.g3d04783866f


end of thread, other threads:[~2021-04-12 17:16 UTC | newest]

Thread overview: 99+ messages
2021-02-03  3:27 [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch] Ævar Arnfjörð Bjarmason
2021-02-03  3:27 ` [PATCH 01/25] grep/pcre2 tests: reword comments referring to kwset Ævar Arnfjörð Bjarmason
2021-02-03  3:27 ` [PATCH 02/25] grep/pcre2: drop needless assignment + assert() on opt->pcre2 Ævar Arnfjörð Bjarmason
2021-02-03  3:27 ` [PATCH 03/25] grep/pcre2: drop needless assignment to NULL Ævar Arnfjörð Bjarmason
2021-02-03  3:27 ` [PATCH 04/25] grep/pcre2: correct reference to grep_init() in comment Ævar Arnfjörð Bjarmason
2021-02-03  3:27 ` [PATCH 05/25] grep/pcre2: prepare to add debugging to pcre2_malloc() Ævar Arnfjörð Bjarmason
2021-02-03  3:27 ` [PATCH 06/25] grep/pcre2: add GREP_PCRE2_DEBUG_MALLOC debug mode Ævar Arnfjörð Bjarmason
2021-02-03  3:27 ` [PATCH 07/25] grep/pcre2: use compile-time PCREv2 version test Ævar Arnfjörð Bjarmason
2021-02-03  3:27 ` [PATCH 08/25] grep/pcre2: use pcre2_maketables_free() function Ævar Arnfjörð Bjarmason
2021-02-03  3:27 ` [PATCH 09/25] grep/pcre2: actually make pcre2 use custom allocator Ævar Arnfjörð Bjarmason
2021-02-03  3:27 ` [PATCH 10/25] grep/pcre2: move back to thread-only PCREv2 structures Ævar Arnfjörð Bjarmason
2021-02-03  3:27 ` [PATCH 11/25] grep/pcre2: move definitions of pcre2_{malloc,free} Ævar Arnfjörð Bjarmason
2021-02-03  3:27 ` [PATCH 12/25] pickaxe tests: refactor to use test_commit --append Ævar Arnfjörð Bjarmason
2021-02-03  3:27 ` [PATCH 13/25] pickaxe -S: support content with NULs under --pickaxe-regex Ævar Arnfjörð Bjarmason
2021-02-03  3:28 ` [PATCH 14/25] pickaxe -S: remove redundant "sz" check in while-loop Ævar Arnfjörð Bjarmason
2021-02-04 16:16   ` René Scharfe
2021-02-04 17:56     ` Junio C Hamano
2021-02-04 21:13       ` Ævar Arnfjörð Bjarmason
2021-02-03  3:28 ` [PATCH 15/25] pickaxe/style: consolidate declarations and assignments Ævar Arnfjörð Bjarmason
2021-02-03  3:28 ` [PATCH 16/25] pickaxe tests: add test for diffgrep_consume() internals Ævar Arnfjörð Bjarmason
2021-02-03  3:28 ` [PATCH 17/25] pickaxe tests: add test for "log -S" not being a regex Ævar Arnfjörð Bjarmason
2021-02-03  3:28 ` [PATCH 18/25] perf: add performance test for pickaxe Ævar Arnfjörð Bjarmason
2021-02-03  3:28 ` [PATCH 19/25] pickaxe -G: set -U0 for diff generation Ævar Arnfjörð Bjarmason
2021-02-03 14:26   ` Ævar Arnfjörð Bjarmason
2021-02-03 19:42     ` Junio C Hamano
2021-02-03  3:28 ` [PATCH 20/25] grep.h: make patmatch() a public function Ævar Arnfjörð Bjarmason
2021-02-03  3:28 ` [PATCH 21/25] pickaxe: use PCREv2 for -G and -S Ævar Arnfjörð Bjarmason
2021-02-03 20:44   ` Ævar Arnfjörð Bjarmason
2021-02-04 18:11     ` Junio C Hamano
2021-02-04 18:22   ` Junio C Hamano
2021-02-03  3:28 ` [PATCH 22/25] Remove unused kwset.[ch] Ævar Arnfjörð Bjarmason
     [not found]   ` <CAPUEspgBmuTBHVZWY9fRtjbHWBRr0zHravLL1Czepc6jmib4HA@mail.gmail.com>
2021-02-03 14:13     ` Ævar Arnfjörð Bjarmason
     [not found]       ` <CAPUEsphN7QuSVsC1Tr4xE8yQgPTtpF7wL7zbk1crQU3n-5g6JQ@mail.gmail.com>
2021-02-03 16:45         ` Ævar Arnfjörð Bjarmason
2021-02-03  3:28 ` [PATCH 23/25] xdiff-interface: allow early return from xdiff_emit_{line,hunk}_fn Ævar Arnfjörð Bjarmason
2021-02-03  3:28 ` [PATCH 24/25] xdiff-interface: support early exit in xdiff_outf() Ævar Arnfjörð Bjarmason
2021-02-04 18:16   ` Junio C Hamano
2021-02-03  3:28 ` [PATCH 25/25] pickaxe -G: terminate early on matching lines Ævar Arnfjörð Bjarmason
2021-02-03 12:38 ` [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch] Ævar Arnfjörð Bjarmason
2021-02-16 11:57 ` [PATCH v2 00/22] pickaxe: test and refactoring for follow-up changes Ævar Arnfjörð Bjarmason
2021-02-16 22:23   ` Junio C Hamano
2021-02-17  1:19     ` Junio C Hamano
2021-04-12 17:15   ` [PATCH v3 00/22] pickaxe: test and refactoring for future PCRE backend Ævar Arnfjörð Bjarmason
2021-04-12 17:15     ` [PATCH v3 01/22] grep/pcre2 tests: reword comments referring to kwset Ævar Arnfjörð Bjarmason
2021-04-12 17:15     ` [PATCH v3 02/22] pickaxe tests: refactor to use test_commit --append --printf Ævar Arnfjörð Bjarmason
2021-04-12 17:15     ` [PATCH v3 03/22] pickaxe tests: add test for diffgrep_consume() internals Ævar Arnfjörð Bjarmason
2021-04-12 17:15     ` [PATCH v3 04/22] pickaxe tests: add test for "log -S" not being a regex Ævar Arnfjörð Bjarmason
2021-04-12 17:15     ` [PATCH v3 05/22] pickaxe tests: test for -G, -S and --find-object incompatibility Ævar Arnfjörð Bjarmason
2021-04-12 17:15     ` [PATCH v3 06/22] pickaxe tests: add missing test for --no-pickaxe-regex being an error Ævar Arnfjörð Bjarmason
2021-04-12 17:15     ` [PATCH v3 07/22] pickaxe: die when -G and --pickaxe-regex are combined Ævar Arnfjörð Bjarmason
2021-04-12 17:15     ` [PATCH v3 08/22] pickaxe: die when --find-object and --pickaxe-all " Ævar Arnfjörð Bjarmason
2021-04-12 17:15     ` [PATCH v3 09/22] diff.h: move pickaxe fields together again Ævar Arnfjörð Bjarmason
2021-04-12 17:15     ` [PATCH v3 10/22] pickaxe/style: consolidate declarations and assignments Ævar Arnfjörð Bjarmason
2021-04-12 17:15     ` [PATCH v3 11/22] perf: add performance test for pickaxe Ævar Arnfjörð Bjarmason
2021-04-12 17:15     ` [PATCH v3 12/22] pickaxe: refactor function selection in diffcore-pickaxe() Ævar Arnfjörð Bjarmason
2021-04-12 17:15     ` [PATCH v3 13/22] pickaxe: assert that we must have a needle under -G or -S Ævar Arnfjörð Bjarmason
2021-04-12 17:15     ` [PATCH v3 14/22] pickaxe -S: support content with NULs under --pickaxe-regex Ævar Arnfjörð Bjarmason
2021-04-12 17:15     ` [PATCH v3 15/22] pickaxe: rename variables in has_changes() for brevity Ævar Arnfjörð Bjarmason
2021-04-12 17:15     ` [PATCH v3 16/22] pickaxe -S: slightly optimize contains() Ævar Arnfjörð Bjarmason
2021-04-12 17:15     ` [PATCH v3 17/22] xdiff-interface: prepare for allowing early return Ævar Arnfjörð Bjarmason
2021-04-12 17:15     ` [PATCH v3 18/22] xdiff-interface: allow early return from xdiff_emit_line_fn Ævar Arnfjörð Bjarmason
2021-04-12 17:15     ` [PATCH v3 19/22] pickaxe -G: terminate early on matching lines Ævar Arnfjörð Bjarmason
2021-04-12 17:15     ` [PATCH v3 20/22] pickaxe -G: don't special-case create/delete Ævar Arnfjörð Bjarmason
2021-04-12 17:15     ` [PATCH v3 21/22] xdiff users: use designated initializers for out_line Ævar Arnfjörð Bjarmason
2021-04-12 17:15     ` [PATCH v3 22/22] xdiff-interface: replace discard_hunk_line() with a flag Ævar Arnfjörð Bjarmason
2021-02-16 11:57 ` [PATCH v2 01/22] grep/pcre2 tests: reword comments referring to kwset Ævar Arnfjörð Bjarmason
2021-02-16 11:57 ` [PATCH v2 02/22] test-lib-functions: document and test test_commit --no-tag Ævar Arnfjörð Bjarmason
2021-03-30 23:14   ` Junio C Hamano
2021-02-16 11:57 ` [PATCH v2 03/22] test-lib-functions: reword "test_commit --append" docs Ævar Arnfjörð Bjarmason
2021-02-16 11:57 ` [PATCH v2 04/22] test-lib functions: add --printf option to test_commit Ævar Arnfjörð Bjarmason
2021-03-30 23:11   ` Junio C Hamano
2021-04-12 13:19     ` Ævar Arnfjörð Bjarmason
2021-02-16 11:57 ` [PATCH v2 05/22] pickaxe tests: refactor to use test_commit --append --printf Ævar Arnfjörð Bjarmason
2021-03-30 23:26   ` Junio C Hamano
2021-02-16 11:57 ` [PATCH v2 06/22] pickaxe tests: add test for diffgrep_consume() internals Ævar Arnfjörð Bjarmason
2021-02-16 11:57 ` [PATCH v2 07/22] pickaxe tests: add test for "log -S" not being a regex Ævar Arnfjörð Bjarmason
2021-02-16 11:57 ` [PATCH v2 08/22] pickaxe tests: test for -G, -S and --find-object incompatibility Ævar Arnfjörð Bjarmason
2021-03-30 23:32   ` Junio C Hamano
2021-02-16 11:57 ` [PATCH v2 09/22] pickaxe: die when -G and --pickaxe-regex are combined Ævar Arnfjörð Bjarmason
2021-03-30 23:36   ` Junio C Hamano
2021-02-16 11:57 ` [PATCH v2 10/22] pickaxe: die when --find-object and --pickaxe-all " Ævar Arnfjörð Bjarmason
2021-02-16 11:57 ` [PATCH v2 11/22] diff.h: move pickaxe fields together again Ævar Arnfjörð Bjarmason
2021-02-16 11:57 ` [PATCH v2 12/22] pickaxe/style: consolidate declarations and assignments Ævar Arnfjörð Bjarmason
2021-02-16 11:57 ` [PATCH v2 13/22] perf: add performance test for pickaxe Ævar Arnfjörð Bjarmason
2021-02-16 11:57 ` [PATCH v2 14/22] pickaxe: refactor function selection in diffcore-pickaxe() Ævar Arnfjörð Bjarmason
2021-03-30 23:45   ` Junio C Hamano
2021-02-16 11:57 ` [PATCH v2 15/22] pickaxe: assert that we must have a needle under -G or -S Ævar Arnfjörð Bjarmason
2021-03-30 23:50   ` Junio C Hamano
2021-02-16 11:57 ` [PATCH v2 16/22] pickaxe -S: support content with NULs under --pickaxe-regex Ævar Arnfjörð Bjarmason
2021-03-30 23:54   ` Junio C Hamano
2021-02-16 11:57 ` [PATCH v2 17/22] pickaxe: rename variables in has_changes() for brevity Ævar Arnfjörð Bjarmason
2021-02-16 11:57 ` [PATCH v2 18/22] pickaxe -S: slightly optimize contains() Ævar Arnfjörð Bjarmason
2021-03-30 23:58   ` Junio C Hamano
2021-02-16 11:57 ` [PATCH v2 19/22] xdiff-interface: allow early return from xdiff_emit_{line,hunk}_fn Ævar Arnfjörð Bjarmason
2021-03-31  0:04   ` Junio C Hamano
2021-02-16 11:57 ` [PATCH v2 20/22] xdiff-interface: support early exit in xdiff_outf() Ævar Arnfjörð Bjarmason
2021-02-16 11:58 ` [PATCH v2 21/22] pickaxe -G: terminate early on matching lines Ævar Arnfjörð Bjarmason
2021-03-31  0:11   ` Junio C Hamano
2021-02-16 11:58 ` [PATCH v2 22/22] pickaxe -G: don't special-case create/delete Ævar Arnfjörð Bjarmason
2021-03-31  0:14   ` Junio C Hamano
