All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>, "Jeff King" <peff@peff.net>,
	"Johannes Schindelin" <johannes.schindelin@gmx.de>,
	"Carlo Marcelo Arenas Belón" <carenas@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: [PATCH v3 00/22] pickaxe: test and refactoring for future PCRE backend
Date: Mon, 12 Apr 2021 19:15:07 +0200	[thread overview]
Message-ID: <cover-00.22-00000000000-20210412T170457Z-avarab@gmail.com> (raw)
In-Reply-To: <20210216115801.4773-1-avarab@gmail.com>

This much-delayed re-roll of v2[2] is a test and refactoring change to
diffcore-pickaxe.c to allow an eventual removal of the kwset.[ch]
files and to add a PCRE backend.

This series is now based on my "test-lib.sh: new test_commit args,
simplification & fixes" series. The trivial dependency between the two
is using the new test_commit --printf option.

I'll not summarize the range-diff too much, but since v2 I addressed
all outstanding feedback from Junio. There's 2x new patches at the end
of the series to change the existing early return feature now using
"discard_hunk_line" to use a flag.

In v2 I had what I realized was a WIP migration of that to using
return values in the callback instead, but unlike the
xdiff_emit_line_fn I don't think that makes sense in
xdiff_emit_hunk_fn.

1. https://lore.kernel.org/git/20210216115801.4773-1-avarab@gmail.com/
2. https://lore.kernel.org/git/cover-00.16-00000000000-20210412T110456Z-avarab@gmail.com/

Ævar Arnfjörð Bjarmason (22):
  grep/pcre2 tests: reword comments referring to kwset
  pickaxe tests: refactor to use test_commit --append --printf
  pickaxe tests: add test for diffgrep_consume() internals
  pickaxe tests: add test for "log -S" not being a regex
  pickaxe tests: test for -G, -S and --find-object incompatibility
  pickaxe tests: add missing test for --no-pickaxe-regex being an error
  pickaxe: die when -G and --pickaxe-regex are combined
  pickaxe: die when --find-object and --pickaxe-all are combined
  diff.h: move pickaxe fields together again
  pickaxe/style: consolidate declarations and assignments
  perf: add performance test for pickaxe
  pickaxe: refactor function selection in diffcore-pickaxe()
  pickaxe: assert that we must have a needle under -G or -S
  pickaxe -S: support content with NULs under --pickaxe-regex
  pickaxe: rename variables in has_changes() for brevity
  pickaxe -S: slightly optimize contains()
  xdiff-interface: prepare for allowing early return
  xdiff-interface: allow early return from xdiff_emit_line_fn
  pickaxe -G: terminate early on matching lines
  pickaxe -G: don't special-case create/delete
  xdiff users: use designated initializers for out_line
  xdiff-interface: replace discard_hunk_line() with a flag

 builtin/merge-tree.c           |   5 +-
 builtin/rerere.c               |   4 +-
 combine-diff.c                 |   5 +-
 diff.c                         |  39 +++++++----
 diff.h                         |   7 +-
 diffcore-pickaxe.c             | 106 ++++++++++++++++++------------
 range-diff.c                   |   3 +-
 t/perf/p4209-pickaxe.sh        |  70 ++++++++++++++++++++
 t/t4209-log-pickaxe.sh         | 114 ++++++++++++++++++++++++++++-----
 t/t7816-grep-binary-pattern.sh |   4 +-
 xdiff-interface.c              |  27 ++++----
 xdiff-interface.h              |  31 ++++++---
 xdiff/xdiff.h                  |   1 +
 xdiff/xemit.c                  |   3 +-
 14 files changed, 312 insertions(+), 107 deletions(-)
 create mode 100755 t/perf/p4209-pickaxe.sh

Range-diff against v2:
 1:  75bfc8eba13 =  1:  cfe934d6081 grep/pcre2 tests: reword comments referring to kwset
 2:  cddb1f75f6c <  -:  ----------- test-lib-functions: document and test test_commit --no-tag
 3:  44540f6039e <  -:  ----------- test-lib-functions: reword "test_commit --append" docs
 4:  0e1f133476f <  -:  ----------- test-lib functions: add --printf option to test_commit
 5:  2a814e8d53a !  2:  413a330d3d6 pickaxe tests: refactor to use test_commit --append --printf
    @@ Metadata
      ## Commit message ##
         pickaxe tests: refactor to use test_commit --append --printf
     
    -    Refactor existing tests added in e0e7cb8080c (log -G: ignore binary
    -    files, 2018-12-14) to use the --append option I added in
    +    Refactor the existing tests added in e0e7cb8080c (log -G: ignore
    +    binary files, 2018-12-14) to use the --append option I added in
         3373518cc8b (test-lib functions: add an --append option to
    -    test_commit, 2021-01-12) and the --printf option added in a preceding
    -    commit.
    +    test_commit, 2021-01-12) and the --printf option added as part of an
    +    in-flight topic of mine this commit depends on.
    +
    +    While I'm at it change some of the setup of the test to use a more
    +    sensible pattern, e.g. setting up a temporary repo instead of creating
    +    an orphan branch.
    +
    +    Since the -G and -S options will behave the same way with truncated
    +    and removed content also change the "git rm" to emptying data.bin,
    +    that's just catering to how test_commit works. The resulting test is
    +    shorter.
     
         See also f5d79bf7dd6 (tests: refactor a few tests to use "test_commit
         --append", 2021-01-12) for prior similar refactoring.
    @@ t/t4209-log-pickaxe.sh: test_expect_success 'log -S --no-textconv (missing textc
     -	git commit -m "delete binary file" data.bin &&
     -	git log >full-log
     +	test_create_repo GS-bin-txt &&
    -+	test_commit -C GS-bin-txt --append --printf A data.bin "a\na\0a\n" &&
    ++	test_commit -C GS-bin-txt --printf A data.bin "a\na\0a\n" &&
     +	test_commit -C GS-bin-txt --append --printf B data.bin "a\na\0a\n" &&
     +	test_commit -C GS-bin-txt C data.bin "" &&
     +	git -C GS-bin-txt log >full-log
 6:  f49eb6c95e5 !  3:  ddd2224836b pickaxe tests: add test for diffgrep_consume() internals
    @@ t/t4209-log-pickaxe.sh: test_expect_success 'log -S --no-textconv (missing textc
     +
      test_expect_success 'setup log -[GS] binary & --text' '
      	test_create_repo GS-bin-txt &&
    - 	test_commit -C GS-bin-txt --append --printf A data.bin "a\na\0a\n" &&
    + 	test_commit -C GS-bin-txt --printf A data.bin "a\na\0a\n" &&
 7:  80c62fb0448 =  4:  ca6340c1fa7 pickaxe tests: add test for "log -S" not being a regex
 8:  6d329e0c3b1 !  5:  0c4657189a8 pickaxe tests: test for -G, -S and --find-object incompatibility
    @@ t/t4209-log-pickaxe.sh: test_expect_success setup '
      
     +test_expect_success 'usage' '
     +	test_expect_code 128 git log -Gregex -Sstring 2>err &&
    -+	test_i18ngrep "mutually exclusive" err &&
    ++	grep "mutually exclusive" err &&
     +
     +	test_expect_code 128 git log -Gregex --find-object=HEAD 2>err &&
    -+	test_i18ngrep "mutually exclusive" err &&
    ++	grep "mutually exclusive" err &&
     +
    -+	test_expect_code 128 git log -Gstring --find-object=HEAD 2>err &&
    -+	test_i18ngrep "mutually exclusive" err
    ++	test_expect_code 128 git log -Sstring --find-object=HEAD 2>err &&
    ++	grep "mutually exclusive" err
     +'
     +
      test_log	expect_initial	--grep initial
 -:  ----------- >  6:  1696076bb09 pickaxe tests: add missing test for --no-pickaxe-regex being an error
 9:  bd0c3b7e3b0 !  7:  83e7b4793b6 pickaxe: die when -G and --pickaxe-regex are combined
    @@ Commit message
         matching regular expressions, 2006-03-29) only the -S option
         existed. Then when -G was added in f506b8e8b5 (git log/diff: add
         -G<regexp> that greps in the patch text, 2010-08-23) neither the
    -    documentation for --pickaxe-regex was updater accordingly, nor was
    +    documentation for --pickaxe-regex was updated accordingly, nor was
         something like this assertion added.
     
         Since 5bc3f0b567 (diffcore-pickaxe doc: document -S and -G properly,
         2013-05-31) we've claimed that --pickaxe-regex should only be used
    -    with -S, but have silently toileted combining it with -G, let's die
    +    with -S, but have silently tolerated combining it with -G, let's die
         instead.
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
    @@ diff.h: int git_config_rename(const char *var, const char *value);
     
      ## t/t4209-log-pickaxe.sh ##
     @@ t/t4209-log-pickaxe.sh: test_expect_success 'usage' '
    - 	test_expect_code 128 git log -Gregex -Sstring 2>err &&
    - 	test_i18ngrep "mutually exclusive" err &&
    + 	grep "mutually exclusive" err
    + '
      
    ++test_expect_success 'usage: --pickaxe-regex' '
     +	test_expect_code 128 git log -Gregex --pickaxe-regex 2>err &&
    -+	test_i18ngrep "mutually exclusive" err &&
    ++	grep "mutually exclusive" err
    ++'
     +
    - 	test_expect_code 128 git log -Gregex --find-object=HEAD 2>err &&
    - 	test_i18ngrep "mutually exclusive" err &&
    - 
    + test_expect_success 'usage: --no-pickaxe-regex' '
    + 	cat >expect <<-\EOF &&
    + 	fatal: unrecognized argument: --no-pickaxe-regex
10:  72075874f5c !  8:  749c3ca3f98 pickaxe: die when --find-object and --pickaxe-all are combined
    @@ diff.h: int git_config_rename(const char *var, const char *value);
     
      ## t/t4209-log-pickaxe.sh ##
     @@ t/t4209-log-pickaxe.sh: test_expect_success 'usage' '
    - 	test_i18ngrep "mutually exclusive" err &&
    + 	grep "mutually exclusive" err &&
      
    - 	test_expect_code 128 git log -Gstring --find-object=HEAD 2>err &&
    -+	test_i18ngrep "mutually exclusive" err &&
    + 	test_expect_code 128 git log -Sstring --find-object=HEAD 2>err &&
    ++	grep "mutually exclusive" err &&
     +
     +	test_expect_code 128 git log --pickaxe-all --find-object=HEAD 2>err &&
    - 	test_i18ngrep "mutually exclusive" err
    + 	grep "mutually exclusive" err
      '
      
11:  f8116a2b814 =  9:  fe4e75c39d2 diff.h: move pickaxe fields together again
12:  4778357cfc7 = 10:  afe70b163a2 pickaxe/style: consolidate declarations and assignments
13:  7449a59b104 = 11:  97616d741c7 perf: add performance test for pickaxe
14:  160f7d8b0f2 ! 12:  c29deb428b1 pickaxe: refactor function selection in diffcore-pickaxe()
    @@ diffcore-pickaxe.c: void diffcore_pickaxe(struct diff_options *o)
      		regcomp_or_die(&regex, needle, cflags);
      		regexp = &regex;
     +
    -+		/* diff.c errors on -G and --pickaxe-regex for us */
     +		if (opts & DIFF_PICKAXE_KIND_G)
     +			fn = diff_grep;
     +		else if (opts & DIFF_PICKAXE_REGEX)
     +			fn = has_changes;
     +		else
    ++			/*
    ++			 * We don't need to check the combination of
    ++			 * -G and --pickaxe-regex, by the time we get
    ++			 * here diff.c has already died if they're
    ++			 * combined. See the usage tests in
    ++			 * t4209-log-pickaxe.sh.
    ++			 */
     +			BUG("unreachable");
      	} else if (opts & DIFF_PICKAXE_KIND_S) {
      		if (o->pickaxe_opts & DIFF_PICKAXE_IGNORE_CASE &&
15:  05fc57e54e1 ! 13:  115a369d067 pickaxe: assert that we must have a needle under -G or -S
    @@ diffcore-pickaxe.c: void diffcore_pickaxe(struct diff_options *o)
      	kwset_t kws = NULL;
      	pickaxe_fn fn;
      
    -+	if (opts & ~DIFF_PICKAXE_KIND_OBJFIND && !needle)
    ++	if (opts & ~DIFF_PICKAXE_KIND_OBJFIND &&
    ++	    (!needle || !*needle))
     +		BUG("should have needle under -G or -S");
      	if (opts & (DIFF_PICKAXE_REGEX | DIFF_PICKAXE_KIND_G)) {
      		int cflags = REG_EXTENDED | REG_NEWLINE;
    @@ t/t4209-log-pickaxe.sh: test_expect_success setup '
     +	test_i18ngrep "switch.*requires a value" err &&
     +
      	test_expect_code 128 git log -Gregex -Sstring 2>err &&
    - 	test_i18ngrep "mutually exclusive" err &&
    + 	grep "mutually exclusive" err &&
      
16:  550620ec13b ! 14:  a86032792b6 pickaxe -S: support content with NULs under --pickaxe-regex
    @@ Commit message
         for pickaxe matching regular expressions, 2006-03-29).
     
         It wasn't needed anymore, and as the now-passing test shows, actively
    -    getting in our way.
    +    getting in our way. Since we always require REG_STARTEND support we do
    +    not need to stop at NULs. If we are dealing with a haystack with NUL
    +    in it. The needle may be behind that NUL.
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    @@ t/t4209-log-pickaxe.sh: test_expect_success 'log -S looks into binary files' '
     +	git -C GS-bin-txt log --pickaxe-regex -Sa >log &&
     +	test_cmp log full-log &&
     +
    -+	git -C GS-bin-txt log --pickaxe-regex -S[a] >log &&
    ++	git -C GS-bin-txt log --pickaxe-regex -S"[a]" >log &&
     +	test_cmp log full-log
     +'
     +
17:  985e077d561 = 15:  10f85edcff7 pickaxe: rename variables in has_changes() for brevity
18:  648e6e5f11b ! 16:  ed83c3add89 pickaxe -S: slightly optimize contains()
    @@ Commit message
         pre-image and post-image of a change. As soon as we know we had e.g. 1
         before and 2 now we can stop, we don't need to keep counting past 2.
     
    +    With this change a diff between A and B may have different performance
    +    characteristics than between B and A. That's OK in this case, since
    +    we'll emit the same output, and the effect is to make one of them
    +    better.
    +
    +    I'm picking a check of "one" first on the assumption that it's a more
    +    common case to have files grow over time than not.
    +
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## diffcore-pickaxe.c ##
19:  b991660fbf5 ! 17:  62c306275c7 xdiff-interface: allow early return from xdiff_emit_{line,hunk}_fn
    @@ Metadata
     Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## Commit message ##
    -    xdiff-interface: allow early return from xdiff_emit_{line,hunk}_fn
    +    xdiff-interface: prepare for allowing early return
     
    -    Change the function prototype of xdiff_emit_{line,hunk}_fn to return
    -    an int instead of void. This will allow for returning early from hunk
    -    & diff consumers that want some of the data, but not all of it.
    +    Change the function prototype of xdiff_emit_line_fn to return an "int"
    +    instead of "void". Change all of those functions to "return 0",
    +    nothing checks those return values yet, and no behavior is being
    +    changed.
     
    -    No behavior is being changed here, just replacing the equivalent of
    -    "return" with "return 0", nothing acts on the changed return values
    -    yet.
    -
    -    There was some work in this area of xdiff-interface.[ch] recently with
    -    3b40a090fd4 (diff: avoid generating unused hunk header lines,
    -    2018-11-02) and 7c61e25fbf1 (diff: use hunk callback for word-diff,
    -    2018-11-02).
    -
    -    In combination those two changes allow us to not do any work on the
    -    hunks and diff at all, but didn't change the status quo with regards
    -    to consumers that e.g. want the diff lines, but might want to abort
    -    early.
    -
    -    Whereas soon we can abort e.g. on the first "-line" of a 1000 line
    -    diff if that's all we needed.
    -
    -    This interface is rather scary as noted in the comment to
    -    xdiff-interface.h being added here, but it will be useful for
    -    diffcore-pickaxe.c in a subsequent commit. A future change could
    -    e.g. add more exit codes, and hack xdl_emit_diff() and friends to
    -    ignore or skip things more selectively as a result.
    +    In subsequent commits the interface will be changed to allow early
    +    return via this new return value.
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## combine-diff.c ##
    -@@ combine-diff.c: struct combine_diff_state {
    - 	struct sline *lost_bucket;
    - };
    - 
    --static void consume_hunk(void *state_,
    -+static int consume_hunk(void *state_,
    - 			 long ob, long on,
    - 			 long nb, long nn,
    - 			 const char *funcline, long funclen)
     @@ combine-diff.c: static void consume_hunk(void *state_,
    - 		state->sline[state->nb-1].p_lno =
    - 			xcalloc(state->num_parent, sizeof(unsigned long));
      	state->sline[state->nb-1].p_lno[state->n] = state->ob;
    -+
    -+	return 0;
      }
      
     -static void consume_line(void *state_, char *line, unsigned long len)
    @@ combine-diff.c: static void consume_line(void *state_, char *line, unsigned long
      static void combine_diff(struct repository *r,
     
      ## diff.c ##
    -@@ diff.c: static int color_words_output_graph_prefix(struct diff_words_data *diff_words)
    - 	}
    - }
    - 
    --static void fn_out_diff_words_aux(void *priv,
    --				  long minus_first, long minus_len,
    --				  long plus_first, long plus_len,
    --				  const char *func, long funclen)
    -+static int fn_out_diff_words_aux(void *priv,
    -+				 long minus_first, long minus_len,
    -+				 long plus_first, long plus_len,
    -+				 const char *func, long funclen)
    - {
    - 	struct diff_words_data *diff_words = priv;
    - 	struct diff_words_style *style = diff_words->style;
    -@@ diff.c: static void fn_out_diff_words_aux(void *priv,
    - 
    - 	diff_words->current_plus = plus_end;
    - 	diff_words->last_minus = minus_first;
    -+
    -+	return 0;
    - }
    - 
    - /* This function starts looking at *begin, and returns 0 iff a word was found. */
     @@ diff.c: static void find_lno(const char *line, struct emit_callback *ecbdata)
      	ecbdata->lno_in_postimage = strtol(p + 1, NULL, 10);
      }
    @@ diff.c: static void diffstat_consume(void *priv, char *line, unsigned long len)
      }
      
      const char mime_boundary_leader[] = "------------";
    -@@ diff.c: static int is_conflict_marker(const char *line, int marker_size, unsigned long l
    - 	return 1;
    - }
    - 
    --static void checkdiff_consume_hunk(void *priv,
    -+static int checkdiff_consume_hunk(void *priv,
    - 				   long ob, long on, long nb, long nn,
    - 				   const char *func, long funclen)
    - 
    - {
    - 	struct checkdiff_t *data = priv;
    +@@ diff.c: static void checkdiff_consume_hunk(void *priv,
      	data->lineno = nb - 1;
    -+	return 0;
      }
      
     -static void checkdiff_consume(void *priv, char *line, unsigned long len)
    @@ range-diff.c: static void find_exact_matches(struct string_list *a, struct strin
     +	return 0;
      }
      
    --static void diffsize_hunk(void *data, long ob, long on, long nb, long nn,
    --			  const char *funcline, long funclen)
    -+static int diffsize_hunk(void *data, long ob, long on, long nb, long nn,
    -+			 const char *funcline, long funclen)
    - {
    - 	diffsize_consume(data, NULL, 0);
    -+	return 0;
    - }
    - 
    - static int diffsize(const char *a, const char *b)
    + static void diffsize_hunk(void *data, long ob, long on, long nb, long nn,
     
      ## xdiff-interface.c ##
     @@ xdiff-interface.c: static int xdiff_out_hunk(void *priv_,
    @@ xdiff-interface.c: static void consume_one(void *priv_, char *s, unsigned long s
      }
      
      static int xdiff_outf(void *priv_, mmbuffer_t *mb, int nbuf)
    -@@ xdiff-interface.c: int xdi_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp, xdemitconf_t co
    - 	return xdl_diff(&a, &b, xpp, xecfg, xecb);
    - }
    - 
    --void discard_hunk_line(void *priv,
    --		       long ob, long on, long nb, long nn,
    --		       const char *func, long funclen)
    -+int discard_hunk_line(void *priv,
    -+		      long ob, long on, long nb, long nn,
    -+		      const char *func, long funclen)
    - {
    -+	return 0;
    - }
    - 
    - int xdi_diff_outf(mmfile_t *mf1, mmfile_t *mf2,
     
      ## xdiff-interface.h ##
     @@
    @@ xdiff-interface.h
      #define MAX_XDIFF_SIZE (1024UL * 1024 * 1023)
      
     -typedef void (*xdiff_emit_line_fn)(void *, char *, unsigned long);
    --typedef void (*xdiff_emit_hunk_fn)(void *data,
    --				   long old_begin, long old_nr,
    --				   long new_begin, long new_nr,
    --				   const char *func, long funclen);
    -+/*
    -+ * The xdiff_emit_{line,hunk}_fn consumers can return -1 to abort
    -+ * early, or 0 to continue processing. Note that doing so is an
    -+ * all-or-nothing affair, as returning -1 will return all the way to
    -+ * the top-level, e.g. the xdi_diff_outf() call to generate the diff.
    -+ *
    -+ * Thus returning -1 from a hunk header callback means you won't be
    -+ * getting any more hunks, or diffs, and likewise returning from a
    -+ * line callback means you won't be getting anymore lines.
    -+ *
    -+ * We may extend the interface in the future to understand other more
    -+ * granular return values, but for now use it carefully, or consider
    -+ * e.g. using discard_hunk_line() if you say just don't care about
    -+ * hunk headers.
    -+ */
     +typedef int (*xdiff_emit_line_fn)(void *, char *, unsigned long);
    -+typedef int (*xdiff_emit_hunk_fn)(void *data,
    -+				  long old_begin, long old_nr,
    -+				  long new_begin, long new_nr,
    -+				  const char *func, long funclen);
    - 
    - int xdi_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp, xdemitconf_t const *xecfg, xdemitcb_t *ecb);
    - int xdi_diff_outf(mmfile_t *mf1, mmfile_t *mf2,
    -@@ xdiff-interface.h: extern int git_xmerge_style;
    -  * Can be used as a no-op hunk_fn for xdi_diff_outf(), since a NULL
    -  * one just sends the hunk line to the line_fn callback).
    -  */
    --void discard_hunk_line(void *priv,
    --		       long ob, long on, long nb, long nn,
    --		       const char *func, long funclen);
    -+int discard_hunk_line(void *priv,
    -+		      long ob, long on, long nb, long nn,
    -+		      const char *func, long funclen);
    - 
    - /*
    -  * Compare the strings l1 with l2 which are of size s1 and s2 respectively.
    + typedef void (*xdiff_emit_hunk_fn)(void *data,
    + 				   long old_begin, long old_nr,
    + 				   long new_begin, long new_nr,
20:  69b061832b3 ! 18:  76d667f152f xdiff-interface: support early exit in xdiff_outf()
    @@ Metadata
     Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## Commit message ##
    -    xdiff-interface: support early exit in xdiff_outf()
    +    xdiff-interface: allow early return from xdiff_emit_line_fn
     
    -    Bridge the gap between the preceding "xdiff-interface: allow early
    -    return from xdiff_emit_{line,hunk}_fn" change and the public
    -    interface.
    +    Finish the change started in the preceding commit and allow an early
    +    return from "xdiff_emit_line_fn" callbacks, this will allows
    +    diffcore-pickaxe.c to save itself redundant work.
     
    -    This change was split off from the preceding commit as it wasn't a
    -    purely mechanical addition of "return 0".
    +    Our xdiff interface also had the limitation of not being able to abort
    +    early since the beginning, see d9ea73e0564 (combine-diff: refactor
    +    built-in xdiff interface., 2006-04-05). Although at that time
    +    "xdiff_emit_line_fn" was called "xdiff_emit_consume_fn", and
    +    "xdiff_emit_hunk_fn" didn't exist yet.
     
    -    Here we want to be able to abort early, but do so in a way that
    -    doesn't skip the appropriate strbuf_reset() invocations.
    +    There was some work in this area of xdiff-interface.[ch] recently with
    +    3b40a090fd4 (diff: avoid generating unused hunk header lines,
    +    2018-11-02) and 7c61e25fbf1 (diff: use hunk callback for word-diff,
    +    2018-11-02).
    +
    +    In combination those two changes allow us to not do any work on the
    +    hunks and diff at all, but didn't change the status quo with regards
    +    to consumers that e.g. want the diff lines, but might want to abort
    +    early.
    +
    +    Whereas now we can abort e.g. on the first "-line" of a 1000 line diff
    +    if that's all we needed.
    +
    +    This interface is rather scary as noted in the comment to
    +    xdiff-interface.h being added here, as noted there a future change
    +    could add more exit codes, and hack xdl_emit_diff() and friends to
    +    ignore or skip things more selectively as a result.
    +
    +    I did not see an inherent reason for why xdl_emit_{diffrec,record}()
    +    could not be changed to ferry the "xdiff_emit_line_fn" error code
    +    upwards instead of returning -1 on all "ret < 0".
     
    -    The use of -1 as a return value is consistent with the rest of the
    -    xdiff codebase, where doing so signals an abort or error that'll
    -    propagate up the stack.
    +    But doing so would require corresponding changes in xdl_emit_diff(),
    +    xdl_diff(). I didn't see any issue with narrowly doing that to
    +    accomplish what I needed here, but it would leave xdiff's own return
    +    values in an inconsistent state.
    +
    +    Instead I've left it at returning a more conventional (for git's own
    +    codebase) 1 for an early return, and translating it (or rather, all
    +    non-zero) to -1 for xdiff's consumption.
    +
    +    The reason for most of the "stop" complexity in xdiff_outf() is
    +    because we want to be able to abort early, but do so in a way that
    +    doesn't skip the appropriate strbuf_reset() invocations.
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## xdiff-interface.c ##
     @@ xdiff-interface.c: static int consume_one(void *priv_, char *s, unsigned long size)
    + 	char *ep;
    + 	while (size) {
      		unsigned long this_size;
    ++		int ret;
      		ep = memchr(s, '\n', size);
      		this_size = (ep == NULL) ? size : (ep - s + 1);
     -		priv->line_fn(priv->consume_callback_data, s, this_size);
    -+		if (priv->line_fn(priv->consume_callback_data, s, this_size))
    -+			return -1;
    ++		ret = priv->line_fn(priv->consume_callback_data, s, this_size);
    ++		if (ret)
    ++			return ret;
      		size -= this_size;
      		s += this_size;
      	}
    @@ xdiff-interface.c: static int xdiff_outf(void *priv_, mmbuffer_t *mb, int nbuf)
      
      	for (i = 0; i < nbuf; i++) {
     +		if (stop)
    -+			return -1;
    ++			return 1;
      		if (mb[i].ptr[mb[i].size-1] != '\n') {
      			/* Incomplete line */
      			strbuf_add(&priv->remainder, mb[i].ptr, mb[i].size);
    @@ xdiff-interface.c: static int xdiff_outf(void *priv_, mmbuffer_t *mb, int nbuf)
      	return 0;
      }
      
    +
    + ## xdiff-interface.h ##
    +@@
    +  */
    + #define MAX_XDIFF_SIZE (1024UL * 1024 * 1023)
    + 
    ++/**
    ++ * The `xdiff_emit_line_fn` function can return 1 to abort early, or 0
    ++ * to continue processing. Note that doing so is an all-or-nothing
    ++ * affair, as returning 1 will return all the way to the top-level,
    ++ * e.g. the xdi_diff_outf() call to generate the diff.
    ++ *
    ++ * Thus returning 1 means you won't be getting any more diff lines. If
    ++ * you need something in-between those two options you'll to use
    ++ * `xdl_emit_hunk_consume_func_t` and implement your own version of
    ++ * xdl_emit_diff().
    ++ *
    ++ * We may extend the interface in the future to understand other more
    ++ * granular return values. While you should return 1 to exit early,
    ++ * doing so will currently make your early return indistinguishable
    ++ * from an error internal to xdiff, xdiff itself will see that
    ++ * non-zero return and translate it to -1.
    ++ */
    + typedef int (*xdiff_emit_line_fn)(void *, char *, unsigned long);
    + typedef void (*xdiff_emit_hunk_fn)(void *data,
    + 				   long old_begin, long old_nr,
21:  fc0aa61d093 ! 19:  53e9405f849 pickaxe -G: terminate early on matching lines
    @@ Commit message
     
         Solve a long-standing item for "git log -Grx" of us e.g. finding "+
         str" in the diff context and noting that we had a "hit", but xdiff
    -    diligently continuing to generate and spew the rest of the diff at us.
    +    diligently continuing to generate and spew the rest of the diff at
    +    us. This makes use of a new "early return" xdiff interface added by
    +    preceding commits.
     
    -    The TODO item has been there since "git log -G" was implemented. See
    -    f506b8e8b5f (git log/diff: add -G<regexp> that greps in the patch
    -    text, 2010-08-23).
    +    The TODO item (or, the NEEDSWORK comment) has been there since "git
    +    log -G" was implemented. See f506b8e8b5f (git log/diff: add -G<regexp>
    +    that greps in the patch text, 2010-08-23).
     
    -    Our xdiff interface also had the limitation of not being able to abort
    -    early since the beginning, see d9ea73e0564 (combine-diff: refactor
    -    built-in xdiff interface., 2006-04-05). Although at that time
    -    "xdiff_emit_line_fn" was called "xdiff_emit_consume_fn", and
    -    "xdiff_emit_hunk_fn" didn't exist yet.
    -
    -    But now with the support added in the preceding ""xdiff-interface:
    -    allow early return from xdiff_emit_{line,hunk}_fn" commit we can
    -    return early, and furthermore test the functionality of the new
    -    early-exit xdiff-interface by having a BUG() call here to die if it
    -    ever starts handing us needless work again.
    +    But now with the support added in the preceding changes to the
    +    xdiff-interface we can return early. Let's assert the behavior of that
    +    new early-return xdiff-interface by having a BUG() call here to die if
    +    it ever starts handing us needless work again.
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    @@ diffcore-pickaxe.c: static int diffgrep_consume(void *priv, char *line, unsigned
     +	if (!regexec_buf(data->regexp, line + 1, len - 1, 1,
     +			 &regmatch, 0)) {
     +		data->hit = 1;
    -+		return -1;
    ++		return 1;
     +	}
      	return 0;
      }
    @@ diffcore-pickaxe.c: static int diff_grep(mmfile_t *one, mmfile_t *two,
     +
     +	/*
     +	 * An xdiff error might be our "data->hit" from above. See the
    -+	 * comment for xdiff_emit_{line,hunk}_fn in xdiff-interface.h
    -+	 * for why.
    ++	 * comment for xdiff_emit_line_fn in xdiff-interface.h
     +	 */
     +	ret = xdi_diff_outf(one, two, discard_hunk_line, diffgrep_consume,
     +			    &ecbdata, &xpp, &xecfg);
    @@ diffcore-pickaxe.c: static int diff_grep(mmfile_t *one, mmfile_t *two,
     
      ## xdiff-interface.h ##
     @@
    -  * granular return values, but for now use it carefully, or consider
    -  * e.g. using discard_hunk_line() if you say just don't care about
    -  * hunk headers.
    +  * doing so will currently make your early return indistinguishable
    +  * from an error internal to xdiff, xdiff itself will see that
    +  * non-zero return and translate it to -1.
     + *
    -+ * Note that just returning -1 will make your early return
    -+ * indistinguishable from an error internal to xdiff. See "diff_grep"
    -+ * in diffcore-pickaxe.c for a trick to work around this, i.e. using
    -+ * the "consume_callback_data" to note the desired early return.
    ++ * See "diff_grep" in diffcore-pickaxe.c for a trick to work around
    ++ * this, i.e. using the "consume_callback_data" to note the desired
    ++ * early return.
       */
      typedef int (*xdiff_emit_line_fn)(void *, char *, unsigned long);
    - typedef int (*xdiff_emit_hunk_fn)(void *data,
    + typedef void (*xdiff_emit_hunk_fn)(void *data,
22:  c81b18ca4c7 = 20:  76de6ebc8b8 pickaxe -G: don't special-case create/delete
 -:  ----------- > 21:  9bb7ac910f3 xdiff users: use designated initializers for out_line
 -:  ----------- > 22:  1178956fb3d xdiff-interface: replace discard_hunk_line() with a flag
-- 
2.31.1.639.g3d04783866f


  parent reply	other threads:[~2021-04-12 17:15 UTC|newest]

Thread overview: 99+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-03  3:27 [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch] Ævar Arnfjörð Bjarmason
2021-02-03  3:27 ` [PATCH 01/25] grep/pcre2 tests: reword comments referring to kwset Ævar Arnfjörð Bjarmason
2021-02-03  3:27 ` [PATCH 02/25] grep/pcre2: drop needless assignment + assert() on opt->pcre2 Ævar Arnfjörð Bjarmason
2021-02-03  3:27 ` [PATCH 03/25] grep/pcre2: drop needless assignment to NULL Ævar Arnfjörð Bjarmason
2021-02-03  3:27 ` [PATCH 04/25] grep/pcre2: correct reference to grep_init() in comment Ævar Arnfjörð Bjarmason
2021-02-03  3:27 ` [PATCH 05/25] grep/pcre2: prepare to add debugging to pcre2_malloc() Ævar Arnfjörð Bjarmason
2021-02-03  3:27 ` [PATCH 06/25] grep/pcre2: add GREP_PCRE2_DEBUG_MALLOC debug mode Ævar Arnfjörð Bjarmason
2021-02-03  3:27 ` [PATCH 07/25] grep/pcre2: use compile-time PCREv2 version test Ævar Arnfjörð Bjarmason
2021-02-03  3:27 ` [PATCH 08/25] grep/pcre2: use pcre2_maketables_free() function Ævar Arnfjörð Bjarmason
2021-02-03  3:27 ` [PATCH 09/25] grep/pcre2: actually make pcre2 use custom allocator Ævar Arnfjörð Bjarmason
2021-02-03  3:27 ` [PATCH 10/25] grep/pcre2: move back to thread-only PCREv2 structures Ævar Arnfjörð Bjarmason
2021-02-03  3:27 ` [PATCH 11/25] grep/pcre2: move definitions of pcre2_{malloc,free} Ævar Arnfjörð Bjarmason
2021-02-03  3:27 ` [PATCH 12/25] pickaxe tests: refactor to use test_commit --append Ævar Arnfjörð Bjarmason
2021-02-03  3:27 ` [PATCH 13/25] pickaxe -S: support content with NULs under --pickaxe-regex Ævar Arnfjörð Bjarmason
2021-02-03  3:28 ` [PATCH 14/25] pickaxe -S: remove redundant "sz" check in while-loop Ævar Arnfjörð Bjarmason
2021-02-04 16:16   ` René Scharfe
2021-02-04 17:56     ` Junio C Hamano
2021-02-04 21:13       ` Ævar Arnfjörð Bjarmason
2021-02-03  3:28 ` [PATCH 15/25] pickaxe/style: consolidate declarations and assignments Ævar Arnfjörð Bjarmason
2021-02-03  3:28 ` [PATCH 16/25] pickaxe tests: add test for diffgrep_consume() internals Ævar Arnfjörð Bjarmason
2021-02-03  3:28 ` [PATCH 17/25] pickaxe tests: add test for "log -S" not being a regex Ævar Arnfjörð Bjarmason
2021-02-03  3:28 ` [PATCH 18/25] perf: add performance test for pickaxe Ævar Arnfjörð Bjarmason
2021-02-03  3:28 ` [PATCH 19/25] pickaxe -G: set -U0 for diff generation Ævar Arnfjörð Bjarmason
2021-02-03 14:26   ` Ævar Arnfjörð Bjarmason
2021-02-03 19:42     ` Junio C Hamano
2021-02-03  3:28 ` [PATCH 20/25] grep.h: make patmatch() a public function Ævar Arnfjörð Bjarmason
2021-02-03  3:28 ` [PATCH 21/25] pickaxe: use PCREv2 for -G and -S Ævar Arnfjörð Bjarmason
2021-02-03 20:44   ` Ævar Arnfjörð Bjarmason
2021-02-04 18:11     ` Junio C Hamano
2021-02-04 18:22   ` Junio C Hamano
2021-02-03  3:28 ` [PATCH 22/25] Remove unused kwset.[ch] Ævar Arnfjörð Bjarmason
     [not found]   ` <CAPUEspgBmuTBHVZWY9fRtjbHWBRr0zHravLL1Czepc6jmib4HA@mail.gmail.com>
2021-02-03 14:13     ` Ævar Arnfjörð Bjarmason
     [not found]       ` <CAPUEsphN7QuSVsC1Tr4xE8yQgPTtpF7wL7zbk1crQU3n-5g6JQ@mail.gmail.com>
2021-02-03 16:45         ` Ævar Arnfjörð Bjarmason
2021-02-03  3:28 ` [PATCH 23/25] xdiff-interface: allow early return from xdiff_emit_{line,hunk}_fn Ævar Arnfjörð Bjarmason
2021-02-03  3:28 ` [PATCH 24/25] xdiff-interface: support early exit in xdiff_outf() Ævar Arnfjörð Bjarmason
2021-02-04 18:16   ` Junio C Hamano
2021-02-03  3:28 ` [PATCH 25/25] pickaxe -G: terminate early on matching lines Ævar Arnfjörð Bjarmason
2021-02-03 12:38 ` [PATCH 00/25] grep: PCREv2 fixes, remove kwset.[ch] Ævar Arnfjörð Bjarmason
2021-02-16 11:57 ` [PATCH v2 00/22] pickaxe: test and refactoring for follow-up changes Ævar Arnfjörð Bjarmason
2021-02-16 22:23   ` Junio C Hamano
2021-02-17  1:19     ` Junio C Hamano
2021-04-12 17:15   ` Ævar Arnfjörð Bjarmason [this message]
2021-04-12 17:15     ` [PATCH v3 01/22] grep/pcre2 tests: reword comments referring to kwset Ævar Arnfjörð Bjarmason
2021-04-12 17:15     ` [PATCH v3 02/22] pickaxe tests: refactor to use test_commit --append --printf Ævar Arnfjörð Bjarmason
2021-04-12 17:15     ` [PATCH v3 03/22] pickaxe tests: add test for diffgrep_consume() internals Ævar Arnfjörð Bjarmason
2021-04-12 17:15     ` [PATCH v3 04/22] pickaxe tests: add test for "log -S" not being a regex Ævar Arnfjörð Bjarmason
2021-04-12 17:15     ` [PATCH v3 05/22] pickaxe tests: test for -G, -S and --find-object incompatibility Ævar Arnfjörð Bjarmason
2021-04-12 17:15     ` [PATCH v3 06/22] pickaxe tests: add missing test for --no-pickaxe-regex being an error Ævar Arnfjörð Bjarmason
2021-04-12 17:15     ` [PATCH v3 07/22] pickaxe: die when -G and --pickaxe-regex are combined Ævar Arnfjörð Bjarmason
2021-04-12 17:15     ` [PATCH v3 08/22] pickaxe: die when --find-object and --pickaxe-all " Ævar Arnfjörð Bjarmason
2021-04-12 17:15     ` [PATCH v3 09/22] diff.h: move pickaxe fields together again Ævar Arnfjörð Bjarmason
2021-04-12 17:15     ` [PATCH v3 10/22] pickaxe/style: consolidate declarations and assignments Ævar Arnfjörð Bjarmason
2021-04-12 17:15     ` [PATCH v3 11/22] perf: add performance test for pickaxe Ævar Arnfjörð Bjarmason
2021-04-12 17:15     ` [PATCH v3 12/22] pickaxe: refactor function selection in diffcore-pickaxe() Ævar Arnfjörð Bjarmason
2021-04-12 17:15     ` [PATCH v3 13/22] pickaxe: assert that we must have a needle under -G or -S Ævar Arnfjörð Bjarmason
2021-04-12 17:15     ` [PATCH v3 14/22] pickaxe -S: support content with NULs under --pickaxe-regex Ævar Arnfjörð Bjarmason
2021-04-12 17:15     ` [PATCH v3 15/22] pickaxe: rename variables in has_changes() for brevity Ævar Arnfjörð Bjarmason
2021-04-12 17:15     ` [PATCH v3 16/22] pickaxe -S: slightly optimize contains() Ævar Arnfjörð Bjarmason
2021-04-12 17:15     ` [PATCH v3 17/22] xdiff-interface: prepare for allowing early return Ævar Arnfjörð Bjarmason
2021-04-12 17:15     ` [PATCH v3 18/22] xdiff-interface: allow early return from xdiff_emit_line_fn Ævar Arnfjörð Bjarmason
2021-04-12 17:15     ` [PATCH v3 19/22] pickaxe -G: terminate early on matching lines Ævar Arnfjörð Bjarmason
2021-04-12 17:15     ` [PATCH v3 20/22] pickaxe -G: don't special-case create/delete Ævar Arnfjörð Bjarmason
2021-04-12 17:15     ` [PATCH v3 21/22] xdiff users: use designated initializers for out_line Ævar Arnfjörð Bjarmason
2021-04-12 17:15     ` [PATCH v3 22/22] xdiff-interface: replace discard_hunk_line() with a flag Ævar Arnfjörð Bjarmason
2021-02-16 11:57 ` [PATCH v2 01/22] grep/pcre2 tests: reword comments referring to kwset Ævar Arnfjörð Bjarmason
2021-02-16 11:57 ` [PATCH v2 02/22] test-lib-functions: document and test test_commit --no-tag Ævar Arnfjörð Bjarmason
2021-03-30 23:14   ` Junio C Hamano
2021-02-16 11:57 ` [PATCH v2 03/22] test-lib-functions: reword "test_commit --append" docs Ævar Arnfjörð Bjarmason
2021-02-16 11:57 ` [PATCH v2 04/22] test-lib functions: add --printf option to test_commit Ævar Arnfjörð Bjarmason
2021-03-30 23:11   ` Junio C Hamano
2021-04-12 13:19     ` Ævar Arnfjörð Bjarmason
2021-02-16 11:57 ` [PATCH v2 05/22] pickaxe tests: refactor to use test_commit --append --printf Ævar Arnfjörð Bjarmason
2021-03-30 23:26   ` Junio C Hamano
2021-02-16 11:57 ` [PATCH v2 06/22] pickaxe tests: add test for diffgrep_consume() internals Ævar Arnfjörð Bjarmason
2021-02-16 11:57 ` [PATCH v2 07/22] pickaxe tests: add test for "log -S" not being a regex Ævar Arnfjörð Bjarmason
2021-02-16 11:57 ` [PATCH v2 08/22] pickaxe tests: test for -G, -S and --find-object incompatibility Ævar Arnfjörð Bjarmason
2021-03-30 23:32   ` Junio C Hamano
2021-02-16 11:57 ` [PATCH v2 09/22] pickaxe: die when -G and --pickaxe-regex are combined Ævar Arnfjörð Bjarmason
2021-03-30 23:36   ` Junio C Hamano
2021-02-16 11:57 ` [PATCH v2 10/22] pickaxe: die when --find-object and --pickaxe-all " Ævar Arnfjörð Bjarmason
2021-02-16 11:57 ` [PATCH v2 11/22] diff.h: move pickaxe fields together again Ævar Arnfjörð Bjarmason
2021-02-16 11:57 ` [PATCH v2 12/22] pickaxe/style: consolidate declarations and assignments Ævar Arnfjörð Bjarmason
2021-02-16 11:57 ` [PATCH v2 13/22] perf: add performance test for pickaxe Ævar Arnfjörð Bjarmason
2021-02-16 11:57 ` [PATCH v2 14/22] pickaxe: refactor function selection in diffcore-pickaxe() Ævar Arnfjörð Bjarmason
2021-03-30 23:45   ` Junio C Hamano
2021-02-16 11:57 ` [PATCH v2 15/22] pickaxe: assert that we must have a needle under -G or -S Ævar Arnfjörð Bjarmason
2021-03-30 23:50   ` Junio C Hamano
2021-02-16 11:57 ` [PATCH v2 16/22] pickaxe -S: support content with NULs under --pickaxe-regex Ævar Arnfjörð Bjarmason
2021-03-30 23:54   ` Junio C Hamano
2021-02-16 11:57 ` [PATCH v2 17/22] pickaxe: rename variables in has_changes() for brevity Ævar Arnfjörð Bjarmason
2021-02-16 11:57 ` [PATCH v2 18/22] pickaxe -S: slightly optimize contains() Ævar Arnfjörð Bjarmason
2021-03-30 23:58   ` Junio C Hamano
2021-02-16 11:57 ` [PATCH v2 19/22] xdiff-interface: allow early return from xdiff_emit_{line,hunk}_fn Ævar Arnfjörð Bjarmason
2021-03-31  0:04   ` Junio C Hamano
2021-02-16 11:57 ` [PATCH v2 20/22] xdiff-interface: support early exit in xdiff_outf() Ævar Arnfjörð Bjarmason
2021-02-16 11:58 ` [PATCH v2 21/22] pickaxe -G: terminate early on matching lines Ævar Arnfjörð Bjarmason
2021-03-31  0:11   ` Junio C Hamano
2021-02-16 11:58 ` [PATCH v2 22/22] pickaxe -G: don't special-case create/delete Ævar Arnfjörð Bjarmason
2021-03-31  0:14   ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cover-00.22-00000000000-20210412T170457Z-avarab@gmail.com \
    --to=avarab@gmail.com \
    --cc=carenas@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=johannes.schindelin@gmx.de \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.