git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] diff: add -I<regex> that ignores matching changes
@ 2020-10-01 12:06 Michał Kępień
  2020-10-01 12:06 ` [PATCH 1/2] " Michał Kępień
                   ` (3 more replies)
  0 siblings, 4 replies; 57+ messages in thread
From: Michał Kępień @ 2020-10-01 12:06 UTC (permalink / raw)
  To: git

This patch series adds a new diff option that enables ignoring changes
whose all lines (changed, removed, and added) match a given regular
expression.  This is similar to the -I option in standalone diff
utilities and can be used e.g. to look for unrelated changes in commits
containing a large number of automatically applied modifications (e.g. a
tree-wide string replacement).  The difference between -G/-S and the new
-I option is that the latter filters output on a per-change basis.

I personally found this feature handy quite a few times while reviewing
patches and a quick web search yields some questions from other people
looking for something like this, so I thought I would take a shot at
submitting it upstream.

Please note that while this patch series in its current shape builds,
seems to pass tests, and appears to work as I would expect it to, this
is my first attempt at contributing to Git and I did come across a few
stumbling blocks:

  - Instead of adding an extra flag for 'xdl_opts', would it be better
    to make sure xpparam_t structures are always zero-initialized before
    xdl_diff() is called (see commit message for patch 1 for context)?

  - If this patch series is accepted, should xdl_mark_ignorable() be
    renamed to something like xdl_mark_ignorable_blank() for slightly
    improved code clarity?

  - Would a more thorough explanation of the option help or hurt?

  - Given that hunk emitting rules are the same for the -I option
    proposed in this patch series as for --ignore-blank-lines, is it
    necessary to duplicate the (impressively extensive!)
    --inter-hunk-context tests in t/t4015-diff-whitespace.sh for -I?  I
    decided against doing that for the time being (though I did add some
    basic tests for such scenarios in patch 2, see the "with -U1" test).

  - Should tests be added in a separate commit?  This is what I did as I
    thought it would help with readability, but...

Thanks in advance for taking a look!

Michał Kępień (2):
  diff: add -I<regex> that ignores matching changes
  t: add -I<regex> tests

 Documentation/diff-options.txt |   3 +
 diff.c                         |  16 ++
 diff.h                         |   2 +
 t/t4069-diff-ignore-regex.sh   | 330 +++++++++++++++++++++++++++++++++
 xdiff/xdiff.h                  |   2 +
 xdiff/xdiffi.c                 |  36 ++++
 6 files changed, 389 insertions(+)
 create mode 100755 t/t4069-diff-ignore-regex.sh

-- 
2.28.0


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH 1/2] diff: add -I<regex> that ignores matching changes
  2020-10-01 12:06 [PATCH 0/2] diff: add -I<regex> that ignores matching changes Michał Kępień
@ 2020-10-01 12:06 ` Michał Kępień
  2020-10-01 18:21   ` Junio C Hamano
  2020-10-01 12:06 ` [PATCH 2/2] t: add -I<regex> tests Michał Kępień
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 57+ messages in thread
From: Michał Kępień @ 2020-10-01 12:06 UTC (permalink / raw)
  To: git

Add a new diff option that enables ignoring changes whose all lines
(changed, removed, and added) match a given regular expression.  This is
similar to the -I option in standalone diff utilities and can be used
e.g. to look for unrelated changes in commits containing a large number
of automatically applied modifications (e.g. a tree-wide string
replacement).  The difference between -G/-S and the new -I option is
that the latter filters output on a per-change basis.

Use the 'ignore' field of xdchange_t for marking a change as ignored or
not.  Since the same field is used by --ignore-blank-lines, identical
hunk emitting rules apply for --ignore-blank-lines and -I.  These two
options can also be used together at the same time.

Apart from adding a new field to struct diff_options, also define a new
flag, XDF_IGNORE_REGEX, for the 'xdl_opts' field of that structure.
This is done because the xpparam_t structure (which is used for passing
around the regular expression supplied to -I) is not zeroed out in all
call stacks involving xdl_diff() and thus only performing a NULL check
on xpp->ignore_regex could result in xdl_mark_ignorable_regex() treating
garbage on the stack as a regular expression.  As the 'flags' field of
xpparam_t is initialized in all call stacks involving xdl_diff(), adding
a flag check avoids that problem.

Signed-off-by: Michał Kępień <michal@isc.org>
---
 Documentation/diff-options.txt |  3 +++
 diff.c                         | 16 +++++++++++++++
 diff.h                         |  2 ++
 xdiff/xdiff.h                  |  2 ++
 xdiff/xdiffi.c                 | 36 ++++++++++++++++++++++++++++++++++
 5 files changed, 59 insertions(+)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 573fb9bb71..a7ef3f5645 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -687,6 +687,9 @@ endif::git-format-patch[]
 --ignore-blank-lines::
 	Ignore changes whose lines are all blank.
 
+-I<regex>::
+	Ignore changes whose all lines match <regex>.
+
 --inter-hunk-context=<lines>::
 	Show the context between diff hunks, up to the specified number
 	of lines, thereby fusing hunks that are close to each other.
diff --git a/diff.c b/diff.c
index 2bb2f8f57e..c9603c941e 100644
--- a/diff.c
+++ b/diff.c
@@ -3587,6 +3587,7 @@ static void builtin_diff(const char *name_a,
 		if (header.len && !o->flags.suppress_diff_headers)
 			ecbdata.header = &header;
 		xpp.flags = o->xdl_opts;
+		xpp.ignore_regex = o->ignore_regex;
 		xpp.anchors = o->anchors;
 		xpp.anchors_nr = o->anchors_nr;
 		xecfg.ctxlen = o->context;
@@ -3716,6 +3717,7 @@ static void builtin_diffstat(const char *name_a, const char *name_b,
 		memset(&xpp, 0, sizeof(xpp));
 		memset(&xecfg, 0, sizeof(xecfg));
 		xpp.flags = o->xdl_opts;
+		xpp.ignore_regex = o->ignore_regex;
 		xpp.anchors = o->anchors;
 		xpp.anchors_nr = o->anchors_nr;
 		xecfg.ctxlen = o->context;
@@ -5203,6 +5205,17 @@ static int diff_opt_patience(const struct option *opt,
 	return 0;
 }
 
+static int diff_opt_ignore_regex(const struct option *opt,
+				 const char *arg, int unset)
+{
+	struct diff_options *options = opt->value;
+
+	BUG_ON_OPT_NEG(unset);
+	options->xdl_opts |= XDF_IGNORE_REGEX;
+	options->ignore_regex = arg;
+	return 0;
+}
+
 static int diff_opt_pickaxe_regex(const struct option *opt,
 				  const char *arg, int unset)
 {
@@ -5491,6 +5504,9 @@ static void prep_parse_options(struct diff_options *options)
 		OPT_BIT_F(0, "ignore-blank-lines", &options->xdl_opts,
 			  N_("ignore changes whose lines are all blank"),
 			  XDF_IGNORE_BLANK_LINES, PARSE_OPT_NONEG),
+		OPT_CALLBACK_F('I', NULL, options, N_("<regex>"),
+			       N_("ignore changes whose all lines match <regex>"),
+			       0, diff_opt_ignore_regex),
 		OPT_BIT(0, "indent-heuristic", &options->xdl_opts,
 			N_("heuristic to shift diff hunk boundaries for easy reading"),
 			XDF_INDENT_HEURISTIC),
diff --git a/diff.h b/diff.h
index 3de343270f..0b8871c0c1 100644
--- a/diff.h
+++ b/diff.h
@@ -234,6 +234,8 @@ struct diff_options {
 	 */
 	const char *pickaxe;
 
+	const char *ignore_regex;
+
 	const char *single_follow;
 	const char *a_prefix, *b_prefix;
 	const char *line_prefix;
diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index 032e3a9f41..db28055a5e 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -40,6 +40,7 @@ extern "C" {
 			      XDF_IGNORE_CR_AT_EOL)
 
 #define XDF_IGNORE_BLANK_LINES (1 << 7)
+#define XDF_IGNORE_REGEX (1 << 8)
 
 #define XDF_PATIENCE_DIFF (1 << 14)
 #define XDF_HISTOGRAM_DIFF (1 << 15)
@@ -78,6 +79,7 @@ typedef struct s_mmbuffer {
 
 typedef struct s_xpparam {
 	unsigned long flags;
+	const char *ignore_regex;
 
 	/* See Documentation/diff-options.txt. */
 	char **anchors;
diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index bd035139f9..13f7ab95ac 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -1019,6 +1019,39 @@ static void xdl_mark_ignorable(xdchange_t *xscr, xdfenv_t *xe, long flags)
 	}
 }
 
+static void
+xdl_mark_ignorable_regex(xdchange_t *xscr, const xdfenv_t *xe,
+			 const char *ignore_regex)
+{
+	xdchange_t *xch;
+	regex_t regex;
+
+	if (regcomp(&regex, ignore_regex, REG_EXTENDED | REG_NEWLINE))
+		die("invalid regex: %s", ignore_regex);
+
+	for (xch = xscr; xch; xch = xch->next) {
+		regmatch_t regmatch;
+		xrecord_t **rec;
+		int ignore = 1;
+		long i;
+
+		rec = &xe->xdf1.recs[xch->i1];
+		for (i = 0; i < xch->chg1 && ignore; i++)
+			ignore = !regexec_buf(&regex, rec[i]->ptr, rec[i]->size,
+					      1, &regmatch, 0);
+
+		rec = &xe->xdf2.recs[xch->i2];
+		for (i = 0; i < xch->chg2 && ignore; i++)
+			ignore = !regexec_buf(&regex, rec[i]->ptr, rec[i]->size,
+					      1, &regmatch, 0);
+
+		/*
+		 * Do not override --ignore-blank-lines.
+		 */
+		xch->ignore |= ignore;
+	}
+}
+
 int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
 	     xdemitconf_t const *xecfg, xdemitcb_t *ecb) {
 	xdchange_t *xscr;
@@ -1040,6 +1073,9 @@ int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
 		if (xpp->flags & XDF_IGNORE_BLANK_LINES)
 			xdl_mark_ignorable(xscr, &xe, xpp->flags);
 
+		if ((xpp->flags & XDF_IGNORE_REGEX) && xpp->ignore_regex)
+			xdl_mark_ignorable_regex(xscr, &xe, xpp->ignore_regex);
+
 		if (ef(&xe, xscr, ecb, xecfg) < 0) {
 
 			xdl_free_script(xscr);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 2/2] t: add -I<regex> tests
  2020-10-01 12:06 [PATCH 0/2] diff: add -I<regex> that ignores matching changes Michał Kępień
  2020-10-01 12:06 ` [PATCH 1/2] " Michał Kępień
@ 2020-10-01 12:06 ` Michał Kępień
  2020-10-01 17:02 ` [PATCH 0/2] diff: add -I<regex> that ignores matching changes Junio C Hamano
  2020-10-12  9:17 ` [PATCH v2 0/3] " Michał Kępień
  3 siblings, 0 replies; 57+ messages in thread
From: Michał Kępień @ 2020-10-01 12:06 UTC (permalink / raw)
  To: git

Exercise the new 'git diff -I<regex>' option in various scenarios to
ensure it behaves as expected.

Signed-off-by: Michał Kępień <michal@isc.org>
---
 t/t4069-diff-ignore-regex.sh | 330 +++++++++++++++++++++++++++++++++++
 1 file changed, 330 insertions(+)
 create mode 100755 t/t4069-diff-ignore-regex.sh

diff --git a/t/t4069-diff-ignore-regex.sh b/t/t4069-diff-ignore-regex.sh
new file mode 100755
index 0000000000..323687f1dc
--- /dev/null
+++ b/t/t4069-diff-ignore-regex.sh
@@ -0,0 +1,330 @@
+#!/bin/sh
+
+test_description='Test diff -I<regex>'
+
+. ./test-lib.sh
+. "$TEST_DIRECTORY"/diff-lib.sh
+
+test_expect_success setup '
+	test_seq 20 >x &&
+	git update-index --add x
+'
+
+test_expect_success 'one line changed' '
+	test_seq 20 | sed "s/10/100/" >x &&
+
+	# Get plain diff
+	git diff >plain &&
+	cat >expected <<-EOF &&
+	diff --git a/x b/x
+	--- a/x
+	+++ b/x
+	@@ -7,7 +7,7 @@
+	 7
+	 8
+	 9
+	-10
+	+100
+	 11
+	 12
+	 13
+	EOF
+	compare_diff_patch expected plain &&
+
+	# Both old and new line match regex - ignore change
+	git diff -I "^10" >actual &&
+	test_must_be_empty actual &&
+
+	# Only old line matches regex - do not ignore change
+	git diff -I "^10\$" >actual &&
+	compare_diff_patch plain actual &&
+
+	# Only new line matches regex - do not ignore change
+	git diff -I "^100" >actual &&
+	compare_diff_patch plain actual
+'
+
+test_expect_success 'one line removed' '
+	test_seq 20 | sed "10d" >x &&
+
+	# Get plain diff
+	git diff >plain &&
+	cat >expected <<-EOF &&
+	diff --git a/x b/x
+	--- a/x
+	+++ b/x
+	@@ -7,7 +7,6 @@
+	 7
+	 8
+	 9
+	-10
+	 11
+	 12
+	 13
+	EOF
+	compare_diff_patch expected plain &&
+
+	# Removed line matches regex - ignore change
+	git diff -I "^10" >actual &&
+	test_must_be_empty actual &&
+
+	# Removed line does not match regex - do not ignore change
+	git diff -I "^101" >actual &&
+	compare_diff_patch plain actual
+'
+
+test_expect_success 'one line added' '
+	test_seq 21 >x &&
+
+	# Get plain diff
+	git diff >plain &&
+	cat >expected <<-EOF &&
+	diff --git a/x b/x
+	--- a/x
+	+++ b/x
+	@@ -18,3 +18,4 @@
+	 18
+	 19
+	 20
+	+21
+	EOF
+	compare_diff_patch expected plain &&
+
+	# Added line matches regex - ignore change
+	git diff -I "^21" >actual &&
+	test_must_be_empty actual &&
+
+	# Added line does not match regex - do not ignore change
+	git diff -I "^212" >actual &&
+	compare_diff_patch plain actual
+'
+
+test_expect_success 'last two lines changed' '
+	test_seq 20 | sed "s/19/21/; s/20/22/" >x &&
+
+	# Get plain diff
+	git diff >plain &&
+	cat >expected <<-EOF &&
+	diff --git a/x b/x
+	--- a/x
+	+++ b/x
+	@@ -16,5 +16,5 @@
+	 16
+	 17
+	 18
+	-19
+	-20
+	+21
+	+22
+	EOF
+	compare_diff_patch expected plain &&
+
+	# All changed lines match regex - ignore change
+	git diff -I "^[12]" >actual &&
+	test_must_be_empty actual &&
+
+	# Not all changed lines match regex - do not ignore change
+	git diff -I "^2" >actual &&
+	compare_diff_patch plain actual
+'
+
+test_expect_success 'two non-adjacent lines removed in the same hunk' '
+	test_seq 20 | sed "1d; 3d" >x &&
+
+	# Get plain diff
+	git diff >plain &&
+	cat >expected <<-EOF &&
+	diff --git a/x b/x
+	--- a/x
+	+++ b/x
+	@@ -1,6 +1,4 @@
+	-1
+	 2
+	-3
+	 4
+	 5
+	 6
+	EOF
+	compare_diff_patch expected plain &&
+
+	# Both removed lines match regex - ignore hunk
+	git diff -I "^[1-3]" >actual &&
+	test_must_be_empty actual &&
+
+	# First removed line does not match regex - do not ignore hunk
+	git diff -I "^[2-3]" >actual &&
+	compare_diff_patch plain actual &&
+
+	# Second removed line does not match regex - do not ignore hunk
+	git diff -I "^[1-2]" >actual &&
+	compare_diff_patch plain actual
+'
+
+test_expect_success 'two non-adjacent lines removed in the same hunk, with -U1' '
+	test_seq 20 | sed "1d; 3d" >x &&
+
+	# Get plain diff
+	git diff -U1 >plain &&
+	cat >expected <<-EOF &&
+	diff --git a/x b/x
+	--- a/x
+	+++ b/x
+	@@ -1,4 +1,2 @@
+	-1
+	 2
+	-3
+	 4
+	EOF
+	compare_diff_patch expected plain &&
+
+	# Both removed lines match regex - ignore hunk
+	git diff -U1 -I "^[1-3]" >actual &&
+	test_must_be_empty actual &&
+
+	# First removed line does not match regex, but is out of context - ignore second change
+	git diff -U1 -I "^[2-3]" >actual &&
+	cat >second-change-ignored <<-EOF &&
+	diff --git a/x b/x
+	--- a/x
+	+++ b/x
+	@@ -1,2 +1 @@
+	-1
+	 2
+	EOF
+	compare_diff_patch second-change-ignored actual &&
+
+	# Second removed line does not match regex, but is out of context - ignore first change
+	git diff -U1 -I "^[1-2]" >actual &&
+	cat >first-change-ignored <<-EOF &&
+	diff --git a/x b/x
+	--- a/x
+	+++ b/x
+	@@ -2,3 +1,2 @@
+	 2
+	-3
+	 4
+	EOF
+	compare_diff_patch first-change-ignored actual
+'
+
+test_expect_success 'multiple hunks' '
+	test_seq 20 | sed "1d; 20d" >x &&
+
+	# Get plain diff
+	git diff >plain &&
+	cat >expected <<-EOF &&
+	diff --git a/x b/x
+	--- a/x
+	+++ b/x
+	@@ -1,4 +1,3 @@
+	-1
+	 2
+	 3
+	 4
+	@@ -17,4 +16,3 @@
+	 17
+	 18
+	 19
+	-20
+	EOF
+	compare_diff_patch expected plain &&
+
+	# Ignore both hunks
+	git diff -I "^[12]" >actual &&
+	test_must_be_empty actual &&
+
+	# Only ignore first hunk
+	git diff -I "^1" >actual &&
+	cat >first-hunk-ignored <<-EOF &&
+	diff --git a/x b/x
+	--- a/x
+	+++ b/x
+	@@ -17,4 +16,3 @@
+	 17
+	 18
+	 19
+	-20
+	EOF
+	compare_diff_patch first-hunk-ignored actual &&
+
+	# Only ignore second hunk
+	git diff -I "^2" >actual &&
+	cat >second-hunk-ignored <<-EOF &&
+	diff --git a/x b/x
+	--- a/x
+	+++ b/x
+	@@ -1,4 +1,3 @@
+	-1
+	 2
+	 3
+	 4
+	EOF
+	compare_diff_patch second-hunk-ignored actual
+'
+
+test_expect_success 'multiple hunks, with --ignore-blank-lines' '
+	echo >x &&
+	test_seq 21 >>x &&
+
+	# Get plain diff
+	git diff >plain &&
+	cat >expected <<-EOF &&
+	diff --git a/x b/x
+	--- a/x
+	+++ b/x
+	@@ -1,3 +1,4 @@
+	+
+	 1
+	 2
+	 3
+	@@ -18,3 +19,4 @@
+	 18
+	 19
+	 20
+	+21
+	EOF
+	compare_diff_patch expected plain &&
+
+	# -I does not override --ignore-blank-lines - ignore both hunks
+	git diff --ignore-blank-lines -I ^21 >actual &&
+	test_must_be_empty actual
+'
+
+test_expect_success 'diffstat' '
+	test_seq 20 | sed "s/^5/0/p; s/^15/10/; 16d" >x &&
+
+	# Get plain diffstat
+	git diff --stat >actual &&
+	cat >expected <<-EOF &&
+	 x | 6 +++---
+	 1 file changed, 3 insertions(+), 3 deletions(-)
+	EOF
+	test_cmp expected actual &&
+
+	# Ignore both hunks
+	git diff --stat -I "^[0-5]" >actual &&
+	test_must_be_empty actual &&
+
+	# Only ignore first hunk
+	git diff --stat -I "^[05]" >actual &&
+	cat >expected <<-EOF &&
+	 x | 3 +--
+	 1 file changed, 1 insertion(+), 2 deletions(-)
+	EOF
+	test_cmp expected actual &&
+
+	# Only ignore second hunk
+	git diff --stat -I "^1" >actual &&
+	cat >expected <<-EOF &&
+	 x | 3 ++-
+	 1 file changed, 2 insertions(+), 1 deletion(-)
+	EOF
+	test_cmp expected actual
+'
+
+test_expect_success 'invalid regex' '
+	>x &&
+	git diff -I "^[1" 2>&1 | grep "invalid regex: "
+'
+
+test_done
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH 0/2] diff: add -I<regex> that ignores matching changes
  2020-10-01 12:06 [PATCH 0/2] diff: add -I<regex> that ignores matching changes Michał Kępień
  2020-10-01 12:06 ` [PATCH 1/2] " Michał Kępień
  2020-10-01 12:06 ` [PATCH 2/2] t: add -I<regex> tests Michał Kępień
@ 2020-10-01 17:02 ` Junio C Hamano
  2020-10-12  9:17 ` [PATCH v2 0/3] " Michał Kępień
  3 siblings, 0 replies; 57+ messages in thread
From: Junio C Hamano @ 2020-10-01 17:02 UTC (permalink / raw)
  To: Michał Kępień; +Cc: git

Michał Kępień <michal@isc.org> writes:

> This patch series adds a new diff option that enables ignoring changes
> whose all lines (changed, removed, and added) match a given regular
> expression.  This is similar to the -I option in standalone diff
> utilities and can be used e.g. to look for unrelated changes in commits
> containing a large number of automatically applied modifications (e.g. a
> tree-wide string replacement).  The difference between -G/-S and the new
> -I option is that the latter filters output on a per-change basis.

I am uncomfortably excited to see a change to quite a low-level code
of the diff machinery.  It does not happen every day ;-)


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] diff: add -I<regex> that ignores matching changes
  2020-10-01 12:06 ` [PATCH 1/2] " Michał Kępień
@ 2020-10-01 18:21   ` Junio C Hamano
  2020-10-07 19:48     ` Michał Kępień
  0 siblings, 1 reply; 57+ messages in thread
From: Junio C Hamano @ 2020-10-01 18:21 UTC (permalink / raw)
  To: Michał Kępień; +Cc: git

Michał Kępień <michal@isc.org> writes:

> Apart from adding a new field to struct diff_options, also define a new
> flag, XDF_IGNORE_REGEX, for the 'xdl_opts' field of that structure.
> This is done because the xpparam_t structure (which is used for passing
> around the regular expression supplied to -I) is not zeroed out in all
> call stacks involving xdl_diff() and thus only performing a NULL check
> on xpp->ignore_regex could result in xdl_mark_ignorable_regex() treating
> garbage on the stack as a regular expression.  As the 'flags' field of
> xpparam_t is initialized in all call stacks involving xdl_diff(), adding
> a flag check avoids that problem.

You mentioned in your cover letter about this, and I tend to agree
that this feels like a hack, at least from the first glance.  The
next feature that wants to have an optional pointer in xpparam_t and
have the code behave differently with the data pointed by it would
need to waste another bit the same way.  Do you already know (read:
I am not asking you to dig to find out immediately---just asking if
you already know the answer) if there is an inherent reason why they
cannot be memset(0) before use?  It seems like a much better
approach to ensure that we clear the structure.  Doesn't existing
anchors array share the same issue (at least anchors_nr must be
cleared if there is no interesting anchors) already?  IOW, should
"git grep anchors_nr" be a valid way to identify _all_ places where
you need to clear the ignore_regex field?

> +-I<regex>::
> +	Ignore changes whose all lines match <regex>.
> +

The implementation seems to allow only one regex, but I suspect we'd
want to mimic

    $ printf "%s\n" a a a >test_a
    $ printf "%s\n" b b b >test_b
    $ diff -Ia     test_a test_b
    $ diff     -Ib test_a test_b
    $ diff -Ia -Ib test_a test_b

and until that happens, the explanation needs to say something about
earlier <regex> being discarded when this option is given more than
once.

> @@ -5203,6 +5205,17 @@ static int diff_opt_patience(const struct option *opt,
>  	return 0;
>  }
>  
> +static int diff_opt_ignore_regex(const struct option *opt,
> +				 const char *arg, int unset)
> +{
> +	struct diff_options *options = opt->value;
> +
> +	BUG_ON_OPT_NEG(unset);
> +	options->xdl_opts |= XDF_IGNORE_REGEX;
> +	options->ignore_regex = arg;

When given twice or more, the earlier value gets lost (it does not
leak, luckily, though).

> +	return 0;
> +}

If we somehow can lose the use of XDF_IGNORE_REGEX bit, we do not
have to have this as a callback.  Instead, it can be OPT_STRING with
the current semantics of "only the last one is used", or we can use
OPT_STRING_LIST to keep all the expressions.

On the other hand, I wonder if it would be a valid approach to make
the new field(s) in diff_options a "regex_t *ignore_regex" (and add
an "int ignore_regex_nr" next to it, if we shoot for multiple -I
options), instead of "const char *".  For that, we would need a
callback even without XDF_IGNORE_REGEX bit having to futz with
xdl_opts field.

Doing so would give us a chance to compile and notice a malformed
regular expression in diff.c, before it hits xdiff/ layer, which may
or may not be a good thing.

> @@ -1019,6 +1019,39 @@ static void xdl_mark_ignorable(xdchange_t *xscr, xdfenv_t *xe, long flags)
>  	}
>  }

I agree with what you said in the cover letter that it would be a
good idea to name the existing xdl_mark_ignorable() and the new one
in such a way that they look similar and parallel, by renaming the
former to xdl_mark_ignorable_lines or something.

> +static void
> +xdl_mark_ignorable_regex(xdchange_t *xscr, const xdfenv_t *xe,
> +			 const char *ignore_regex)

I know some coding standard do chomp line immediately before the
function name (I grew up with one), but that is not what this
project uses, and judging from the surrounding code, it is not what
the upstream xdiff source we borrowed uses, either.

> +{
> +	xdchange_t *xch;
> +	regex_t regex;
> +
> +	if (regcomp(&regex, ignore_regex, REG_EXTENDED | REG_NEWLINE))
> +		die("invalid regex: %s", ignore_regex);

If we compile in diff.c layer and pass regex_t down here, we won't
have to fail here this deep in the callchain.

> +	for (xch = xscr; xch; xch = xch->next) {
> +		regmatch_t regmatch;
> +		xrecord_t **rec;
> +		int ignore = 1;
> +		long i;
> +
> +		rec = &xe->xdf1.recs[xch->i1];
> +		for (i = 0; i < xch->chg1 && ignore; i++)
> +			ignore = !regexec_buf(&regex, rec[i]->ptr, rec[i]->size,
> +					      1, &regmatch, 0);
> +		rec = &xe->xdf2.recs[xch->i2];
> +		for (i = 0; i < xch->chg2 && ignore; i++)
> +			ignore = !regexec_buf(&regex, rec[i]->ptr, rec[i]->size,
> +					      1, &regmatch, 0);
> +
> +		/*
> +		 * Do not override --ignore-blank-lines.
> +		 */
> +		xch->ignore |= ignore;

Well, you could optimize out the whole regexp matching by adding

		if (xch->ignore)
			continue;

before the two loops try to find a non-matching line, no?

> +	}
> +}
> +
>  int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
>  	     xdemitconf_t const *xecfg, xdemitcb_t *ecb) {
>  	xdchange_t *xscr;
> @@ -1040,6 +1073,9 @@ int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
>  		if (xpp->flags & XDF_IGNORE_BLANK_LINES)
>  			xdl_mark_ignorable(xscr, &xe, xpp->flags);
>  
> +		if ((xpp->flags & XDF_IGNORE_REGEX) && xpp->ignore_regex)
> +			xdl_mark_ignorable_regex(xscr, &xe, xpp->ignore_regex);
> +
>  		if (ef(&xe, xscr, ecb, xecfg) < 0) {
>  
>  			xdl_free_script(xscr);

Thanks for an exciting read ;-)


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] diff: add -I<regex> that ignores matching changes
  2020-10-01 18:21   ` Junio C Hamano
@ 2020-10-07 19:48     ` Michał Kępień
  2020-10-07 20:08       ` Junio C Hamano
  0 siblings, 1 reply; 57+ messages in thread
From: Michał Kępień @ 2020-10-07 19:48 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

> > Apart from adding a new field to struct diff_options, also define a new
> > flag, XDF_IGNORE_REGEX, for the 'xdl_opts' field of that structure.
> > This is done because the xpparam_t structure (which is used for passing
> > around the regular expression supplied to -I) is not zeroed out in all
> > call stacks involving xdl_diff() and thus only performing a NULL check
> > on xpp->ignore_regex could result in xdl_mark_ignorable_regex() treating
> > garbage on the stack as a regular expression.  As the 'flags' field of
> > xpparam_t is initialized in all call stacks involving xdl_diff(), adding
> > a flag check avoids that problem.
> 
> You mentioned in your cover letter about this, and I tend to agree
> that this feels like a hack, at least from the first glance.  The
> next feature that wants to have an optional pointer in xpparam_t and
> have the code behave differently with the data pointed by it would
> need to waste another bit the same way.

Agreed.

> Do you already know (read:
> I am not asking you to dig to find out immediately---just asking if
> you already know the answer) if there is an inherent reason why they
> cannot be memset(0) before use?  It seems like a much better
> approach to ensure that we clear the structure.

I do not know of any reason for xpparam_t structures to not be
zero-initialized.  In fact, they _are_ zero-initialized most of the
time; AFAICT, there are just three places in the tree in which they are
not.

Would you like me to address that in a separate patch in this patch
series?

> Doesn't existing
> anchors array share the same issue (at least anchors_nr must be
> cleared if there is no interesting anchors) already?  IOW, should
> "git grep anchors_nr" be a valid way to identify _all_ places where
> you need to clear the ignore_regex field?

Yes, the 'anchors' and 'anchors_nr' fields of xpparam_t are also
affected by the same problem, but the symptoms are more benign in their
case because these fields are only used in xdiff/xpatience.c, i.e. when
XDF_PATIENCE_DIFF is set in 'flags' - and as I already mentioned in
commit messages, that field is always initialized properly, so there is
currently no practical possibility of stack garbage being interpreted as
e.g. a pointer to the anchor array.

> > +-I<regex>::
> > +	Ignore changes whose all lines match <regex>.
> > +
> 
> The implementation seems to allow only one regex, but I suspect we'd
> want to mimic
> 
>     $ printf "%s\n" a a a >test_a
>     $ printf "%s\n" b b b >test_b
>     $ diff -Ia     test_a test_b
>     $ diff     -Ib test_a test_b
>     $ diff -Ia -Ib test_a test_b

Ah, sure.  After skimming through various man pages available online, I
was not sure whether all standalone versions of diff which support -I
allow that switch to be used multiple times and I thought it would be
better to start with the simplest thing possible.  If you would rather
have the above implemented immediately, I can sure try that in v2.

> and until that happens, the explanation needs to say something about
> earlier <regex> being discarded when this option is given more than
> once.

Sorry, where do I look for that explanation?  I tried to make -I behave
similarly to -G and -S, each of which can be specified more than once,
but the last value provided overrides the earlier ones - and I could not
find any mention of that behavior in the docs.  Please help?

> > @@ -5203,6 +5205,17 @@ static int diff_opt_patience(const struct option *opt,
> >  	return 0;
> >  }
> >  
> > +static int diff_opt_ignore_regex(const struct option *opt,
> > +				 const char *arg, int unset)
> > +{
> > +	struct diff_options *options = opt->value;
> > +
> > +	BUG_ON_OPT_NEG(unset);
> > +	options->xdl_opts |= XDF_IGNORE_REGEX;
> > +	options->ignore_regex = arg;
> 
> When given twice or more, the earlier value gets lost (it does not
> leak, luckily, though).

Right, as I mentioned above, I just wanted to start with something
simple that resembles -G/-S.  I will revise this part in v2 if you would
like to see support for multiple regular expressions in this patch
series.

> > +	return 0;
> > +}
> 
> If we somehow can lose the use of XDF_IGNORE_REGEX bit, we do not
> have to have this as a callback.  Instead, it can be OPT_STRING with
> the current semantics of "only the last one is used", or we can use
> OPT_STRING_LIST to keep all the expressions.

I see, thanks for the hints!

> On the other hand, I wonder if it would be a valid approach to make
> the new field(s) in diff_options a "regex_t *ignore_regex" (and add
> an "int ignore_regex_nr" next to it, if we shoot for multiple -I
> options), instead of "const char *".  For that, we would need a
> callback even without XDF_IGNORE_REGEX bit having to futz with
> xdl_opts field.
> 
> Doing so would give us a chance to compile and notice a malformed
> regular expression in diff.c, before it hits xdiff/ layer, which may
> or may not be a good thing.

I have not thought about this approach before, but it sounds elegant to
me, at least at first glance - option parsing code sounds like the right
place for sanitizing user-provided parameters.  Doubly so if we take the
multiple -I options approach.  I will try this in v2.

> > @@ -1019,6 +1019,39 @@ static void xdl_mark_ignorable(xdchange_t *xscr, xdfenv_t *xe, long flags)
> >  	}
> >  }
> 
> I agree with what you said in the cover letter that it would be a
> good idea to name the existing xdl_mark_ignorable() and the new one
> in such a way that they look similar and parallel, by renaming the
> former to xdl_mark_ignorable_lines or something.

Then I will do that in v2.  Separate patch?

> > +static void
> > +xdl_mark_ignorable_regex(xdchange_t *xscr, const xdfenv_t *xe,
> > +			 const char *ignore_regex)
> 
> I know some coding standard do chomp line immediately before the
> function name (I grew up with one), but that is not what this
> project uses, and judging from the surrounding code, it is not what
> the upstream xdiff source we borrowed uses, either.

Oh, right, sorry - force of habit.  I will fix this in v2.

> > +{
> > +	xdchange_t *xch;
> > +	regex_t regex;
> > +
> > +	if (regcomp(&regex, ignore_regex, REG_EXTENDED | REG_NEWLINE))
> > +		die("invalid regex: %s", ignore_regex);
> 
> If we compile in diff.c layer and pass regex_t down here, we won't
> have to fail here this deep in the callchain.

Agreed - I already responded to this remark above.

> > +	for (xch = xscr; xch; xch = xch->next) {
> > +		regmatch_t regmatch;
> > +		xrecord_t **rec;
> > +		int ignore = 1;
> > +		long i;
> > +
> > +		rec = &xe->xdf1.recs[xch->i1];
> > +		for (i = 0; i < xch->chg1 && ignore; i++)
> > +			ignore = !regexec_buf(&regex, rec[i]->ptr, rec[i]->size,
> > +					      1, &regmatch, 0);
> > +		rec = &xe->xdf2.recs[xch->i2];
> > +		for (i = 0; i < xch->chg2 && ignore; i++)
> > +			ignore = !regexec_buf(&regex, rec[i]->ptr, rec[i]->size,
> > +					      1, &regmatch, 0);
> > +
> > +		/*
> > +		 * Do not override --ignore-blank-lines.
> > +		 */
> > +		xch->ignore |= ignore;
> 
> Well, you could optimize out the whole regexp matching by adding
> 
> 		if (xch->ignore)
> 			continue;
> 
> before the two loops try to find a non-matching line, no?

Ah, of course, it looks so obvious now that you pointed it out :-)  I
guess I was copy-past^W^W^Wtrying to make xdl_mark_ignorable_regex()
look similar to xdl_mark_ignorable().  I will use your suggestion in v2.

> > +	}
> > +}
> > +
> >  int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
> >  	     xdemitconf_t const *xecfg, xdemitcb_t *ecb) {
> >  	xdchange_t *xscr;
> > @@ -1040,6 +1073,9 @@ int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
> >  		if (xpp->flags & XDF_IGNORE_BLANK_LINES)
> >  			xdl_mark_ignorable(xscr, &xe, xpp->flags);
> >  
> > +		if ((xpp->flags & XDF_IGNORE_REGEX) && xpp->ignore_regex)
> > +			xdl_mark_ignorable_regex(xscr, &xe, xpp->ignore_regex);
> > +
> >  		if (ef(&xe, xscr, ecb, xecfg) < 0) {
> >  
> >  			xdl_free_script(xscr);
> 
> Thanks for an exciting read ;-)

My pleasure ;-)  And thanks for taking a look.  Sorry about the long
turnaround time, but unfortunately this is the best I can do at the
moment.

-- 
Best regards,
Michał Kępień

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] diff: add -I<regex> that ignores matching changes
  2020-10-07 19:48     ` Michał Kępień
@ 2020-10-07 20:08       ` Junio C Hamano
  0 siblings, 0 replies; 57+ messages in thread
From: Junio C Hamano @ 2020-10-07 20:08 UTC (permalink / raw)
  To: Michał Kępień; +Cc: git

Michał Kępień <michal@isc.org> writes:

>> cannot be memset(0) before use?  It seems like a much better
>> approach to ensure that we clear the structure.
>
> I do not know of any reason for xpparam_t structures to not be
> zero-initialized.  In fact, they _are_ zero-initialized most of the
> time; AFAICT, there are just three places in the tree in which they are
> not.
>
> Would you like me to address that in a separate patch in this patch
> series?

Yes, as a preliminary clean-up patch before the real/fun stuff ;-)

>> > +-I<regex>::
>> > +	Ignore changes whose all lines match <regex>.
>> > +
>> 
>> The implementation seems to allow only one regex, but I suspect we'd
>> want to mimic
>> 
>>     $ printf "%s\n" a a a >test_a
>>     $ printf "%s\n" b b b >test_b
>>     $ diff -Ia     test_a test_b
>>     $ diff     -Ib test_a test_b
>>     $ diff -Ia -Ib test_a test_b
>
> Ah, sure.  After skimming through various man pages available online, I
> was not sure whether all standalone versions of diff which support -I
> allow that switch to be used multiple times and I thought it would be
> better to start with the simplest thing possible.  If you would rather
> have the above implemented immediately, I can sure try that in v2.
>
>> and until that happens, the explanation needs to say something about
>> earlier <regex> being discarded when this option is given more than
>> once.
>
> Sorry, where do I look for that explanation? 

You wrote "Ignore changes whose all lines match <regex>"; I was
suggesting that we also need "when given more than once, all but the
last <regex> are ignored" after that sentence, because that is not
what people who know how -I works in versions of "diff" that support
it expect.

But I think it should be trivial to make it a list of patterns and
try to match against them in a loop.  So let's support multiple
patterns from the get-go.

> I have not thought about this approach before, but it sounds elegant to
> me, at least at first glance - option parsing code sounds like the right
> place for sanitizing user-provided parameters.  Doubly so if we take the
> multiple -I options approach.  I will try this in v2.

Sounds good.

>> I agree with what you said in the cover letter that it would be a
>> good idea to name the existing xdl_mark_ignorable() and the new one
>> in such a way that they look similar and parallel, by renaming the
>> former to xdl_mark_ignorable_lines or something.
>
> Then I will do that in v2.  Separate patch?

Given that the function has only one caller, I think it is better
done within the same patch.  xdl_mark_ignorable() as the name of the
function is not wrong per-se, so it does not deserve to be renamed
standalone in a preliminary clean-up patch---there is nothing to be
cleaned up.  The fact that we introduce a similarly tasked function
makes its current name less desirable, so adjusting to the new world
order is better done as part of the same patch.

> My pleasure ;-)  And thanks for taking a look.  Sorry about the long
> turnaround time, but unfortunately this is the best I can do at the
> moment.

Pleasure is mutual ;-)

We've survived without -I for 15 years.  Even a few more months
would not hurt anybody.  Take time, aim for a quality job, and most
importantly, have fun.

Thanks.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH v2 0/3] diff: add -I<regex> that ignores matching changes
  2020-10-01 12:06 [PATCH 0/2] diff: add -I<regex> that ignores matching changes Michał Kępień
                   ` (2 preceding siblings ...)
  2020-10-01 17:02 ` [PATCH 0/2] diff: add -I<regex> that ignores matching changes Junio C Hamano
@ 2020-10-12  9:17 ` Michał Kępień
  2020-10-12  9:17   ` [PATCH v2 1/3] merge-base, xdiff: zero out xpparam_t structures Michał Kępień
                     ` (3 more replies)
  3 siblings, 4 replies; 57+ messages in thread
From: Michał Kępień @ 2020-10-12  9:17 UTC (permalink / raw)
  To: git

This patch series adds a new diff option that enables ignoring changes
whose all lines (changed, removed, and added) match a given regular
expression.  This is similar to the -I option in standalone diff
utilities and can be used e.g. to ignore changes which only affect code
comments or to look for unrelated changes in commits containing a large
number of automatically applied modifications (e.g. a tree-wide string
replacement).  The difference between -G/-S and the new -I option is
that the latter filters output on a per-change basis.

Changes from v1:

  - Add a new preliminary cleanup patch which ensures xpparam_t
    structures are always zero-initialized.  (This was a prerequisite
    for the next change below.)

  - Do not add a new 'xdl_opts' flag to check whether -I was used;
    instead, just check whether the array of regular expressions to
    match against is non-NULL.

  - Enable the -I option to be used multiple times.  As a consequence of
    this, regular expressions are now "pre-compiled" in the option's
    callback (and passed around as an array of regex_t structures)
    rather than deep down in xdiff code.  Add test cases exercising use
    of multiple -I options in the same git invocation.  Update
    documentation accordingly.

  - Rename xdl_mark_ignorable() to xdl_mark_ignorable_lines(), to
    indicate that it is logically a "sibling" of
    xdl_mark_ignorable_regex() rather than its "parent".

  - Optimize xdl_mark_ignorable_regex() by making it immediately skip
    changes already marked as ignored by xdl_mark_ignorable_lines().

  - Fix coding style issue in the prototype part of the definition of
    xdl_mark_ignorable_regex().

  - Add "/* see Documentation/diff-options.txt */" comments for the
    fields added to struct diff_options and xpparam_t, mimicking the
    comments used for 'anchors', 'anchors_nr', and 'anchors_alloc'.

  - Revise commit log messages to reflect all of the above.

Michał Kępień (3):
  merge-base, xdiff: zero out xpparam_t structures
  diff: add -I<regex> that ignores matching changes
  t: add -I<regex> tests

 Documentation/diff-options.txt |   4 +
 builtin/merge-tree.c           |   1 +
 diff.c                         |  23 ++
 diff.h                         |   4 +
 t/t4069-diff-ignore-regex.sh   | 426 +++++++++++++++++++++++++++++++++
 xdiff/xdiff.h                  |   4 +
 xdiff/xdiffi.c                 |  47 +++-
 xdiff/xhistogram.c             |   2 +
 xdiff/xpatience.c              |   2 +
 9 files changed, 511 insertions(+), 2 deletions(-)
 create mode 100755 t/t4069-diff-ignore-regex.sh

-- 
2.28.0


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH v2 1/3] merge-base, xdiff: zero out xpparam_t structures
  2020-10-12  9:17 ` [PATCH v2 0/3] " Michał Kępień
@ 2020-10-12  9:17   ` Michał Kępień
  2020-10-12 11:14     ` Johannes Schindelin
  2020-10-12 19:52     ` Junio C Hamano
  2020-10-12  9:17   ` [PATCH v2 2/3] diff: add -I<regex> that ignores matching changes Michał Kępień
                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 57+ messages in thread
From: Michał Kępień @ 2020-10-12  9:17 UTC (permalink / raw)
  To: git

xpparam_t structures are usually zero-initialized before their specific
fields are assigned to, but there are three locations in the tree where
that does not happen.  Add the missing memset() calls in order to make
initialization of xpparam_t structures consistent tree-wide and to
prevent stack garbage from being used as field values.

Signed-off-by: Michał Kępień <michal@isc.org>
---
 builtin/merge-tree.c | 1 +
 xdiff/xhistogram.c   | 2 ++
 xdiff/xpatience.c    | 2 ++
 3 files changed, 5 insertions(+)

diff --git a/builtin/merge-tree.c b/builtin/merge-tree.c
index e72714a5a8..de8520778d 100644
--- a/builtin/merge-tree.c
+++ b/builtin/merge-tree.c
@@ -109,6 +109,7 @@ static void show_diff(struct merge_list *entry)
 	xdemitconf_t xecfg;
 	xdemitcb_t ecb;
 
+	memset(&xpp, 0, sizeof(xpp));
 	xpp.flags = 0;
 	memset(&xecfg, 0, sizeof(xecfg));
 	xecfg.ctxlen = 3;
diff --git a/xdiff/xhistogram.c b/xdiff/xhistogram.c
index c7b35a9667..e694bfd9e3 100644
--- a/xdiff/xhistogram.c
+++ b/xdiff/xhistogram.c
@@ -235,6 +235,8 @@ static int fall_back_to_classic_diff(xpparam_t const *xpp, xdfenv_t *env,
 		int line1, int count1, int line2, int count2)
 {
 	xpparam_t xpparam;
+
+	memset(&xpparam, 0, sizeof(xpparam));
 	xpparam.flags = xpp->flags & ~XDF_DIFF_ALGORITHM_MASK;
 
 	return xdl_fall_back_diff(env, &xpparam,
diff --git a/xdiff/xpatience.c b/xdiff/xpatience.c
index 3c5601b602..20699a6f60 100644
--- a/xdiff/xpatience.c
+++ b/xdiff/xpatience.c
@@ -318,6 +318,8 @@ static int fall_back_to_classic_diff(struct hashmap *map,
 		int line1, int count1, int line2, int count2)
 {
 	xpparam_t xpp;
+
+	memset(&xpp, 0, sizeof(xpp));
 	xpp.flags = map->xpp->flags & ~XDF_DIFF_ALGORITHM_MASK;
 
 	return xdl_fall_back_diff(map->env, &xpp,
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v2 2/3] diff: add -I<regex> that ignores matching changes
  2020-10-12  9:17 ` [PATCH v2 0/3] " Michał Kępień
  2020-10-12  9:17   ` [PATCH v2 1/3] merge-base, xdiff: zero out xpparam_t structures Michał Kępień
@ 2020-10-12  9:17   ` Michał Kępień
  2020-10-12 11:20     ` Johannes Schindelin
                       ` (2 more replies)
  2020-10-12  9:17   ` [PATCH v2 3/3] t: add -I<regex> tests Michał Kępień
  2020-10-15  7:24   ` [PATCH v3 0/2] diff: add -I<regex> that ignores matching changes Michał Kępień
  3 siblings, 3 replies; 57+ messages in thread
From: Michał Kępień @ 2020-10-12  9:17 UTC (permalink / raw)
  To: git

Add a new diff option that enables ignoring changes whose all lines
(changed, removed, and added) match a given regular expression.  This is
similar to the -I option in standalone diff utilities and can be used
e.g. to ignore changes which only affect code comments or to look for
unrelated changes in commits containing a large number of automatically
applied modifications (e.g. a tree-wide string replacement).  The
difference between -G/-S and the new -I option is that the latter
filters output on a per-change basis.

Use the 'ignore' field of xdchange_t for marking a change as ignored or
not.  Since the same field is used by --ignore-blank-lines, identical
hunk emitting rules apply for --ignore-blank-lines and -I.  These two
options can also be used together in the same git invocation (they are
complementary to each other).

Rename xdl_mark_ignorable() to xdl_mark_ignorable_lines(), to indicate
that it is logically a "sibling" of xdl_mark_ignorable_regex() rather
than its "parent".

Signed-off-by: Michał Kępień <michal@isc.org>
---
 Documentation/diff-options.txt |  4 +++
 diff.c                         | 23 +++++++++++++++++
 diff.h                         |  4 +++
 xdiff/xdiff.h                  |  4 +++
 xdiff/xdiffi.c                 | 47 ++++++++++++++++++++++++++++++++--
 5 files changed, 80 insertions(+), 2 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 573fb9bb71..f8ccf85173 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -687,6 +687,10 @@ endif::git-format-patch[]
 --ignore-blank-lines::
 	Ignore changes whose lines are all blank.
 
+-I<regex>::
+	Ignore changes whose all lines match <regex>.  This option may
+	be specified more than once.
+
 --inter-hunk-context=<lines>::
 	Show the context between diff hunks, up to the specified number
 	of lines, thereby fusing hunks that are close to each other.
diff --git a/diff.c b/diff.c
index 2bb2f8f57e..c5d4920fee 100644
--- a/diff.c
+++ b/diff.c
@@ -3587,6 +3587,8 @@ static void builtin_diff(const char *name_a,
 		if (header.len && !o->flags.suppress_diff_headers)
 			ecbdata.header = &header;
 		xpp.flags = o->xdl_opts;
+		xpp.ignore_regex = o->ignore_regex;
+		xpp.ignore_regex_nr = o->ignore_regex_nr;
 		xpp.anchors = o->anchors;
 		xpp.anchors_nr = o->anchors_nr;
 		xecfg.ctxlen = o->context;
@@ -3716,6 +3718,8 @@ static void builtin_diffstat(const char *name_a, const char *name_b,
 		memset(&xpp, 0, sizeof(xpp));
 		memset(&xecfg, 0, sizeof(xecfg));
 		xpp.flags = o->xdl_opts;
+		xpp.ignore_regex = o->ignore_regex;
+		xpp.ignore_regex_nr = o->ignore_regex_nr;
 		xpp.anchors = o->anchors;
 		xpp.anchors_nr = o->anchors_nr;
 		xecfg.ctxlen = o->context;
@@ -5203,6 +5207,22 @@ static int diff_opt_patience(const struct option *opt,
 	return 0;
 }
 
+static int diff_opt_ignore_regex(const struct option *opt,
+				 const char *arg, int unset)
+{
+	struct diff_options *options = opt->value;
+	regex_t *regex;
+
+	BUG_ON_OPT_NEG(unset);
+	regex = xcalloc(1, sizeof(*regex));
+	if (regcomp(regex, arg, REG_EXTENDED | REG_NEWLINE))
+		die("invalid regex: %s", arg);
+	ALLOC_GROW(options->ignore_regex, options->ignore_regex_nr + 1,
+		   options->ignore_regex_alloc);
+	options->ignore_regex[options->ignore_regex_nr++] = regex;
+	return 0;
+}
+
 static int diff_opt_pickaxe_regex(const struct option *opt,
 				  const char *arg, int unset)
 {
@@ -5491,6 +5511,9 @@ static void prep_parse_options(struct diff_options *options)
 		OPT_BIT_F(0, "ignore-blank-lines", &options->xdl_opts,
 			  N_("ignore changes whose lines are all blank"),
 			  XDF_IGNORE_BLANK_LINES, PARSE_OPT_NONEG),
+		OPT_CALLBACK_F('I', NULL, options, N_("<regex>"),
+			       N_("ignore changes whose all lines match <regex>"),
+			       0, diff_opt_ignore_regex),
 		OPT_BIT(0, "indent-heuristic", &options->xdl_opts,
 			N_("heuristic to shift diff hunk boundaries for easy reading"),
 			XDF_INDENT_HEURISTIC),
diff --git a/diff.h b/diff.h
index 11de52e9e9..80dbd3dfdc 100644
--- a/diff.h
+++ b/diff.h
@@ -234,6 +234,10 @@ struct diff_options {
 	 */
 	const char *pickaxe;
 
+	/* see Documentation/diff-options.txt */
+	regex_t **ignore_regex;
+	size_t ignore_regex_nr, ignore_regex_alloc;
+
 	const char *single_follow;
 	const char *a_prefix, *b_prefix;
 	const char *line_prefix;
diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index 032e3a9f41..883a0d770e 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -79,6 +79,10 @@ typedef struct s_mmbuffer {
 typedef struct s_xpparam {
 	unsigned long flags;
 
+	/* See Documentation/diff-options.txt. */
+	regex_t **ignore_regex;
+	size_t ignore_regex_nr;
+
 	/* See Documentation/diff-options.txt. */
 	char **anchors;
 	size_t anchors_nr;
diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index bd035139f9..380eb728ed 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -998,7 +998,7 @@ static int xdl_call_hunk_func(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
 	return 0;
 }
 
-static void xdl_mark_ignorable(xdchange_t *xscr, xdfenv_t *xe, long flags)
+static void xdl_mark_ignorable_lines(xdchange_t *xscr, xdfenv_t *xe, long flags)
 {
 	xdchange_t *xch;
 
@@ -1019,6 +1019,46 @@ static void xdl_mark_ignorable(xdchange_t *xscr, xdfenv_t *xe, long flags)
 	}
 }
 
+static int record_matches_regex(xrecord_t *rec, xpparam_t const *xpp) {
+	regmatch_t regmatch;
+	int i;
+
+	for (i = 0; i < xpp->ignore_regex_nr; i++)
+		if (!regexec_buf(xpp->ignore_regex[i], rec->ptr, rec->size, 1,
+				 &regmatch, 0))
+			return 1;
+
+	return 0;
+}
+
+static void xdl_mark_ignorable_regex(xdchange_t *xscr, const xdfenv_t *xe,
+				     xpparam_t const *xpp)
+{
+	xdchange_t *xch;
+
+	for (xch = xscr; xch; xch = xch->next) {
+		xrecord_t **rec;
+		int ignore = 1;
+		long i;
+
+		/*
+		 * Do not override --ignore-blank-lines.
+		 */
+		if (xch->ignore)
+			continue;
+
+		rec = &xe->xdf1.recs[xch->i1];
+		for (i = 0; i < xch->chg1 && ignore; i++)
+			ignore = record_matches_regex(rec[i], xpp);
+
+		rec = &xe->xdf2.recs[xch->i2];
+		for (i = 0; i < xch->chg2 && ignore; i++)
+			ignore = record_matches_regex(rec[i], xpp);
+
+		xch->ignore = ignore;
+	}
+}
+
 int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
 	     xdemitconf_t const *xecfg, xdemitcb_t *ecb) {
 	xdchange_t *xscr;
@@ -1038,7 +1078,10 @@ int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
 	}
 	if (xscr) {
 		if (xpp->flags & XDF_IGNORE_BLANK_LINES)
-			xdl_mark_ignorable(xscr, &xe, xpp->flags);
+			xdl_mark_ignorable_lines(xscr, &xe, xpp->flags);
+
+		if (xpp->ignore_regex)
+			xdl_mark_ignorable_regex(xscr, &xe, xpp);
 
 		if (ef(&xe, xscr, ecb, xecfg) < 0) {
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v2 3/3] t: add -I<regex> tests
  2020-10-12  9:17 ` [PATCH v2 0/3] " Michał Kępień
  2020-10-12  9:17   ` [PATCH v2 1/3] merge-base, xdiff: zero out xpparam_t structures Michał Kępień
  2020-10-12  9:17   ` [PATCH v2 2/3] diff: add -I<regex> that ignores matching changes Michał Kępień
@ 2020-10-12  9:17   ` Michał Kępień
  2020-10-12 11:49     ` Johannes Schindelin
  2020-10-15  7:24   ` [PATCH v3 0/2] diff: add -I<regex> that ignores matching changes Michał Kępień
  3 siblings, 1 reply; 57+ messages in thread
From: Michał Kępień @ 2020-10-12  9:17 UTC (permalink / raw)
  To: git

Exercise the new -I<regex> diff option in various scenarios to ensure it
behaves as expected.

Signed-off-by: Michał Kępień <michal@isc.org>
---
 t/t4069-diff-ignore-regex.sh | 426 +++++++++++++++++++++++++++++++++++
 1 file changed, 426 insertions(+)
 create mode 100755 t/t4069-diff-ignore-regex.sh

diff --git a/t/t4069-diff-ignore-regex.sh b/t/t4069-diff-ignore-regex.sh
new file mode 100755
index 0000000000..4ddf5c67ae
--- /dev/null
+++ b/t/t4069-diff-ignore-regex.sh
@@ -0,0 +1,426 @@
+#!/bin/sh
+
+test_description='Test diff -I<regex>'
+
+. ./test-lib.sh
+. "$TEST_DIRECTORY"/diff-lib.sh
+
+test_expect_success setup '
+	test_seq 20 >x &&
+	git update-index --add x
+'
+
+test_expect_success 'one line changed' '
+	test_seq 20 | sed "s/10/100/" >x &&
+
+	# Get plain diff
+	git diff >plain &&
+	cat >expected <<-EOF &&
+	diff --git a/x b/x
+	--- a/x
+	+++ b/x
+	@@ -7,7 +7,7 @@
+	 7
+	 8
+	 9
+	-10
+	+100
+	 11
+	 12
+	 13
+	EOF
+	compare_diff_patch expected plain &&
+
+	# Both old and new line match regex - ignore change
+	git diff -I "^10" >actual &&
+	test_must_be_empty actual &&
+
+	# Both old and new line match some regex - ignore change
+	git diff -I "^10\$" -I "^100" >actual &&
+	test_must_be_empty actual &&
+
+	# Only old line matches regex - do not ignore change
+	git diff -I "^10\$" >actual &&
+	compare_diff_patch plain actual &&
+
+	# Only new line matches regex - do not ignore change
+	git diff -I "^100" >actual &&
+	compare_diff_patch plain actual &&
+
+	# Only old line matches some regex - do not ignore change
+	git diff -I "^10\$" -I "^101" >actual &&
+	compare_diff_patch plain actual &&
+
+	# Only new line matches some regex - do not ignore change
+	git diff -I "^11\$" -I "^100" >actual &&
+	compare_diff_patch plain actual
+'
+
+test_expect_success 'one line removed' '
+	test_seq 20 | sed "10d" >x &&
+
+	# Get plain diff
+	git diff >plain &&
+	cat >expected <<-EOF &&
+	diff --git a/x b/x
+	--- a/x
+	+++ b/x
+	@@ -7,7 +7,6 @@
+	 7
+	 8
+	 9
+	-10
+	 11
+	 12
+	 13
+	EOF
+	compare_diff_patch expected plain &&
+
+	# Removed line matches regex - ignore change
+	git diff -I "^10" >actual &&
+	test_must_be_empty actual &&
+
+	# Removed line matches some regex - ignore change
+	git diff -I "^10" -I "^100" >actual &&
+	test_must_be_empty actual &&
+
+	# Removed line does not match regex - do not ignore change
+	git diff -I "^101" >actual &&
+	compare_diff_patch plain actual &&
+
+	# Removed line does not match any regex - do not ignore change
+	git diff -I "^100" -I "^101" >actual &&
+	compare_diff_patch plain actual
+'
+
+test_expect_success 'one line added' '
+	test_seq 21 >x &&
+
+	# Get plain diff
+	git diff >plain &&
+	cat >expected <<-EOF &&
+	diff --git a/x b/x
+	--- a/x
+	+++ b/x
+	@@ -18,3 +18,4 @@
+	 18
+	 19
+	 20
+	+21
+	EOF
+	compare_diff_patch expected plain &&
+
+	# Added line matches regex - ignore change
+	git diff -I "^21" >actual &&
+	test_must_be_empty actual &&
+
+	# Added line matches some regex - ignore change
+	git diff -I "^21" -I "^22" >actual &&
+	test_must_be_empty actual &&
+
+	# Added line does not match regex - do not ignore change
+	git diff -I "^212" >actual &&
+	compare_diff_patch plain actual &&
+
+	# Added line does not match any regex - do not ignore change
+	git diff -I "^211" -I "^212" >actual &&
+	compare_diff_patch plain actual
+'
+
+test_expect_success 'last two lines changed' '
+	test_seq 20 | sed "s/19/21/; s/20/22/" >x &&
+
+	# Get plain diff
+	git diff >plain &&
+	cat >expected <<-EOF &&
+	diff --git a/x b/x
+	--- a/x
+	+++ b/x
+	@@ -16,5 +16,5 @@
+	 16
+	 17
+	 18
+	-19
+	-20
+	+21
+	+22
+	EOF
+	compare_diff_patch expected plain &&
+
+	# All changed lines match regex - ignore change
+	git diff -I "^[12]" >actual &&
+	test_must_be_empty actual &&
+
+	# All changed lines match some regex - ignore change
+	git diff -I "^1" -I "^2" >actual &&
+	test_must_be_empty actual &&
+
+	# Not all changed lines match regex - do not ignore change
+	git diff -I "^2" >actual &&
+	compare_diff_patch plain actual &&
+
+	# Not all changed lines match some regex - do not ignore change
+	git diff -I "^2" -I "^3" >actual &&
+	compare_diff_patch plain actual
+'
+
+test_expect_success 'two non-adjacent lines removed in the same hunk' '
+	test_seq 20 | sed "1d; 3d" >x &&
+
+	# Get plain diff
+	git diff >plain &&
+	cat >expected <<-EOF &&
+	diff --git a/x b/x
+	--- a/x
+	+++ b/x
+	@@ -1,6 +1,4 @@
+	-1
+	 2
+	-3
+	 4
+	 5
+	 6
+	EOF
+	compare_diff_patch expected plain &&
+
+	# Both removed lines match regex - ignore hunk
+	git diff -I "^[1-3]" >actual &&
+	test_must_be_empty actual &&
+
+	# Both removed lines match some regex - ignore hunk
+	git diff -I "^1" -I "^3" >actual &&
+	test_must_be_empty actual &&
+
+	# First removed line does not match regex - do not ignore hunk
+	git diff -I "^[2-3]" >actual &&
+	compare_diff_patch plain actual &&
+
+	# First removed line does not match any regex - do not ignore hunk
+	git diff -I "^2" -I "^3" >actual &&
+	compare_diff_patch plain actual &&
+
+	# Second removed line does not match regex - do not ignore hunk
+	git diff -I "^[1-2]" >actual &&
+	compare_diff_patch plain actual &&
+
+	# Second removed line does not match any regex - do not ignore hunk
+	git diff -I "^1" -I "^2" >actual &&
+	compare_diff_patch plain actual
+'
+
+test_expect_success 'two non-adjacent lines removed in the same hunk, with -U1' '
+	test_seq 20 | sed "1d; 3d" >x &&
+
+	# Get plain diff
+	git diff -U1 >plain &&
+	cat >expected <<-EOF &&
+	diff --git a/x b/x
+	--- a/x
+	+++ b/x
+	@@ -1,4 +1,2 @@
+	-1
+	 2
+	-3
+	 4
+	EOF
+	compare_diff_patch expected plain &&
+
+	# Both removed lines match regex - ignore hunk
+	git diff -U1 -I "^[1-3]" >actual &&
+	test_must_be_empty actual &&
+
+	# Both removed lines match some regex - ignore hunk
+	git diff -U1 -I "^1" -I "^3" >actual &&
+	test_must_be_empty actual &&
+
+	# First removed line does not match regex, but is out of context - ignore second change
+	git diff -U1 -I "^[2-3]" >actual &&
+	cat >second-change-ignored <<-EOF &&
+	diff --git a/x b/x
+	--- a/x
+	+++ b/x
+	@@ -1,2 +1 @@
+	-1
+	 2
+	EOF
+	compare_diff_patch second-change-ignored actual &&
+
+	# First removed line does not match any regex, but is out of context - ignore second change
+	git diff -U1 -I "^2" -I "^3" >actual &&
+	compare_diff_patch second-change-ignored actual &&
+
+	# Second removed line does not match regex, but is out of context - ignore first change
+	git diff -U1 -I "^[1-2]" >actual &&
+	cat >first-change-ignored <<-EOF &&
+	diff --git a/x b/x
+	--- a/x
+	+++ b/x
+	@@ -2,3 +1,2 @@
+	 2
+	-3
+	 4
+	EOF
+	compare_diff_patch first-change-ignored actual &&
+
+	# Second removed line does not match any regex, but is out of context - ignore first change
+	git diff -U1 -I "^1" -I "^2" >actual &&
+	compare_diff_patch first-change-ignored actual
+'
+
+test_expect_success 'multiple hunks' '
+	test_seq 20 | sed "1d; 20d" >x &&
+
+	# Get plain diff
+	git diff >plain &&
+	cat >expected <<-EOF &&
+	diff --git a/x b/x
+	--- a/x
+	+++ b/x
+	@@ -1,4 +1,3 @@
+	-1
+	 2
+	 3
+	 4
+	@@ -17,4 +16,3 @@
+	 17
+	 18
+	 19
+	-20
+	EOF
+	compare_diff_patch expected plain &&
+
+	# Ignore both hunks (single regex)
+	git diff -I "^[12]" >actual &&
+	test_must_be_empty actual &&
+
+	# Ignore both hunks (multiple regexes)
+	git diff -I "^1" -I "^2" >actual &&
+	test_must_be_empty actual &&
+
+	# Only ignore first hunk (single regex)
+	git diff -I "^1" >actual &&
+	cat >first-hunk-ignored <<-EOF &&
+	diff --git a/x b/x
+	--- a/x
+	+++ b/x
+	@@ -17,4 +16,3 @@
+	 17
+	 18
+	 19
+	-20
+	EOF
+	compare_diff_patch first-hunk-ignored actual &&
+
+	# Only ignore first hunk (multiple regexes)
+	git diff -I "^0" -I "^1" >actual &&
+	compare_diff_patch first-hunk-ignored actual &&
+
+	# Only ignore second hunk (single regex)
+	git diff -I "^2" >actual &&
+	cat >second-hunk-ignored <<-EOF &&
+	diff --git a/x b/x
+	--- a/x
+	+++ b/x
+	@@ -1,4 +1,3 @@
+	-1
+	 2
+	 3
+	 4
+	EOF
+	compare_diff_patch second-hunk-ignored actual &&
+
+	# Only ignore second hunk (multiple regexes)
+	git diff -I "^2" -I "^3" >actual &&
+	compare_diff_patch second-hunk-ignored actual
+'
+
+test_expect_success 'multiple hunks, with --ignore-blank-lines' '
+	echo >x &&
+	test_seq 21 >>x &&
+
+	# Get plain diff
+	git diff >plain &&
+	cat >expected <<-EOF &&
+	diff --git a/x b/x
+	--- a/x
+	+++ b/x
+	@@ -1,3 +1,4 @@
+	+
+	 1
+	 2
+	 3
+	@@ -18,3 +19,4 @@
+	 18
+	 19
+	 20
+	+21
+	EOF
+	compare_diff_patch expected plain &&
+
+	# -I does not override --ignore-blank-lines - ignore both hunks (single regex)
+	git diff --ignore-blank-lines -I "^21" >actual &&
+	test_must_be_empty actual &&
+
+	# -I does not override --ignore-blank-lines - ignore both hunks (multiple regexes)
+	git diff --ignore-blank-lines -I "^21" -I "^12" >actual &&
+	test_must_be_empty actual
+'
+
+test_expect_success 'diffstat' '
+	test_seq 20 | sed "s/^5/0/p; s/^15/10/; 16d" >x &&
+
+	# Get plain diffstat
+	git diff --stat >actual &&
+	cat >expected <<-EOF &&
+	 x | 6 +++---
+	 1 file changed, 3 insertions(+), 3 deletions(-)
+	EOF
+	test_cmp expected actual &&
+
+	# Ignore both hunks (single regex)
+	git diff --stat -I "^[0-5]" >actual &&
+	test_must_be_empty actual &&
+
+	# Ignore both hunks (multiple regexes)
+	git diff --stat -I "^0" -I "^1" -I "^5" >actual &&
+	test_must_be_empty actual &&
+
+	# Only ignore first hunk (single regex)
+	git diff --stat -I "^[05]" >actual &&
+	cat >expected <<-EOF &&
+	 x | 3 +--
+	 1 file changed, 1 insertion(+), 2 deletions(-)
+	EOF
+	test_cmp expected actual &&
+
+	# Only ignore first hunk (multiple regexes)
+	git diff --stat -I "^0" -I "^5" >actual &&
+	test_cmp expected actual &&
+
+	# Only ignore second hunk (single regex)
+	git diff --stat -I "^1" >actual &&
+	cat >expected <<-EOF &&
+	 x | 3 ++-
+	 1 file changed, 2 insertions(+), 1 deletion(-)
+	EOF
+	test_cmp expected actual &&
+
+	# Only ignore second hunk (multiple regexes)
+	git diff --stat -I "^1" -I "^2" >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'invalid regexes' '
+	>x &&
+
+	# Single invalid regex
+	git diff -I "^[1" 2>&1 | grep "invalid regex: " &&
+
+	# Two regexes: first invalid, second valid
+	git diff -I "^[1" -I "^1" 2>&1 | grep "invalid regex: " &&
+
+	# Two invalid regexes
+	git diff -I "^[1" -I "^[2" 2>&1 | grep "invalid regex: "
+'
+
+test_done
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 1/3] merge-base, xdiff: zero out xpparam_t structures
  2020-10-12  9:17   ` [PATCH v2 1/3] merge-base, xdiff: zero out xpparam_t structures Michał Kępień
@ 2020-10-12 11:14     ` Johannes Schindelin
  2020-10-12 17:09       ` Junio C Hamano
  2020-10-12 19:52     ` Junio C Hamano
  1 sibling, 1 reply; 57+ messages in thread
From: Johannes Schindelin @ 2020-10-12 11:14 UTC (permalink / raw)
  To: Michał Kępień; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 1262 bytes --]

Hi Michał,

On Mon, 12 Oct 2020, Michał Kępień wrote:


> diff --git a/xdiff/xhistogram.c b/xdiff/xhistogram.c
> index c7b35a9667..e694bfd9e3 100644
> --- a/xdiff/xhistogram.c
> +++ b/xdiff/xhistogram.c
> @@ -235,6 +235,8 @@ static int fall_back_to_classic_diff(xpparam_t const *xpp, xdfenv_t *env,
>  		int line1, int count1, int line2, int count2)
>  {
>  	xpparam_t xpparam;
> +
> +	memset(&xpparam, 0, sizeof(xpparam));

A slightly more elegant way to do this would be

	xpparam_t xpparam = { 0l };

Or even

	xpparam_t xpparam = XPPARAM_T_INIT;

with

	#define XPPARAM_T_INIT { 0l, NULL, 0 }

in `xdiff/xdiff.h`.

Thanks,
Dscho

>  	xpparam.flags = xpp->flags & ~XDF_DIFF_ALGORITHM_MASK;
>
>  	return xdl_fall_back_diff(env, &xpparam,
> diff --git a/xdiff/xpatience.c b/xdiff/xpatience.c
> index 3c5601b602..20699a6f60 100644
> --- a/xdiff/xpatience.c
> +++ b/xdiff/xpatience.c
> @@ -318,6 +318,8 @@ static int fall_back_to_classic_diff(struct hashmap *map,
>  		int line1, int count1, int line2, int count2)
>  {
>  	xpparam_t xpp;
> +
> +	memset(&xpp, 0, sizeof(xpp));
>  	xpp.flags = map->xpp->flags & ~XDF_DIFF_ALGORITHM_MASK;
>
>  	return xdl_fall_back_diff(map->env, &xpp,
> --
> 2.28.0
>
>
>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 2/3] diff: add -I<regex> that ignores matching changes
  2020-10-12  9:17   ` [PATCH v2 2/3] diff: add -I<regex> that ignores matching changes Michał Kępień
@ 2020-10-12 11:20     ` Johannes Schindelin
  2020-10-12 20:00       ` Junio C Hamano
  2020-10-13  6:36       ` Michał Kępień
  2020-10-12 18:01     ` Junio C Hamano
  2020-10-12 20:04     ` Junio C Hamano
  2 siblings, 2 replies; 57+ messages in thread
From: Johannes Schindelin @ 2020-10-12 11:20 UTC (permalink / raw)
  To: Michał Kępień; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 4652 bytes --]

Hi Michał,

On Mon, 12 Oct 2020, Michał Kępień wrote:

> @@ -5203,6 +5207,22 @@ static int diff_opt_patience(const struct option *opt,
>  	return 0;
>  }
>
> +static int diff_opt_ignore_regex(const struct option *opt,
> +				 const char *arg, int unset)
> +{
> +	struct diff_options *options = opt->value;
> +	regex_t *regex;
> +
> +	BUG_ON_OPT_NEG(unset);
> +	regex = xcalloc(1, sizeof(*regex));
> +	if (regcomp(regex, arg, REG_EXTENDED | REG_NEWLINE))
> +		die("invalid regex: %s", arg);
> +	ALLOC_GROW(options->ignore_regex, options->ignore_regex_nr + 1,
> +		   options->ignore_regex_alloc);
> +	options->ignore_regex[options->ignore_regex_nr++] = regex;

A slightly more elegant way would be to have `ignore_regex` have the type
`regex_t *` and use `ALLOC_GROW_BY()` (which zeroes the newly-added
elements automagically).

> +	return 0;
> +}
> +
>  static int diff_opt_pickaxe_regex(const struct option *opt,
>  				  const char *arg, int unset)
>  {
> [...]
> @@ -5491,6 +5511,9 @@ static void prep_parse_options(struct diff_options *options)
>  		OPT_BIT_F(0, "ignore-blank-lines", &options->xdl_opts,
>  			  N_("ignore changes whose lines are all blank"),
>  			  XDF_IGNORE_BLANK_LINES, PARSE_OPT_NONEG),
> +		OPT_CALLBACK_F('I', NULL, options, N_("<regex>"),
> +			       N_("ignore changes whose all lines match <regex>"),
> +			       0, diff_opt_ignore_regex),
>  		OPT_BIT(0, "indent-heuristic", &options->xdl_opts,
>  			N_("heuristic to shift diff hunk boundaries for easy reading"),
>  			XDF_INDENT_HEURISTIC),

Are we releasing the `ignore_regex` anywhere?

Thanks,
Dscho

> diff --git a/diff.h b/diff.h
> index 11de52e9e9..80dbd3dfdc 100644
> --- a/diff.h
> +++ b/diff.h
> @@ -234,6 +234,10 @@ struct diff_options {
>  	 */
>  	const char *pickaxe;
>
> +	/* see Documentation/diff-options.txt */
> +	regex_t **ignore_regex;
> +	size_t ignore_regex_nr, ignore_regex_alloc;
> +
>  	const char *single_follow;
>  	const char *a_prefix, *b_prefix;
>  	const char *line_prefix;
> diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
> index 032e3a9f41..883a0d770e 100644
> --- a/xdiff/xdiff.h
> +++ b/xdiff/xdiff.h
> @@ -79,6 +79,10 @@ typedef struct s_mmbuffer {
>  typedef struct s_xpparam {
>  	unsigned long flags;
>
> +	/* See Documentation/diff-options.txt. */
> +	regex_t **ignore_regex;
> +	size_t ignore_regex_nr;
> +
>  	/* See Documentation/diff-options.txt. */
>  	char **anchors;
>  	size_t anchors_nr;
> diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
> index bd035139f9..380eb728ed 100644
> --- a/xdiff/xdiffi.c
> +++ b/xdiff/xdiffi.c
> @@ -998,7 +998,7 @@ static int xdl_call_hunk_func(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
>  	return 0;
>  }
>
> -static void xdl_mark_ignorable(xdchange_t *xscr, xdfenv_t *xe, long flags)
> +static void xdl_mark_ignorable_lines(xdchange_t *xscr, xdfenv_t *xe, long flags)
>  {
>  	xdchange_t *xch;
>
> @@ -1019,6 +1019,46 @@ static void xdl_mark_ignorable(xdchange_t *xscr, xdfenv_t *xe, long flags)
>  	}
>  }
>
> +static int record_matches_regex(xrecord_t *rec, xpparam_t const *xpp) {
> +	regmatch_t regmatch;
> +	int i;
> +
> +	for (i = 0; i < xpp->ignore_regex_nr; i++)
> +		if (!regexec_buf(xpp->ignore_regex[i], rec->ptr, rec->size, 1,
> +				 &regmatch, 0))
> +			return 1;
> +
> +	return 0;
> +}
> +
> +static void xdl_mark_ignorable_regex(xdchange_t *xscr, const xdfenv_t *xe,
> +				     xpparam_t const *xpp)
> +{
> +	xdchange_t *xch;
> +
> +	for (xch = xscr; xch; xch = xch->next) {
> +		xrecord_t **rec;
> +		int ignore = 1;
> +		long i;
> +
> +		/*
> +		 * Do not override --ignore-blank-lines.
> +		 */
> +		if (xch->ignore)
> +			continue;
> +
> +		rec = &xe->xdf1.recs[xch->i1];
> +		for (i = 0; i < xch->chg1 && ignore; i++)
> +			ignore = record_matches_regex(rec[i], xpp);
> +
> +		rec = &xe->xdf2.recs[xch->i2];
> +		for (i = 0; i < xch->chg2 && ignore; i++)
> +			ignore = record_matches_regex(rec[i], xpp);
> +
> +		xch->ignore = ignore;
> +	}
> +}
> +
>  int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
>  	     xdemitconf_t const *xecfg, xdemitcb_t *ecb) {
>  	xdchange_t *xscr;
> @@ -1038,7 +1078,10 @@ int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
>  	}
>  	if (xscr) {
>  		if (xpp->flags & XDF_IGNORE_BLANK_LINES)
> -			xdl_mark_ignorable(xscr, &xe, xpp->flags);
> +			xdl_mark_ignorable_lines(xscr, &xe, xpp->flags);
> +
> +		if (xpp->ignore_regex)
> +			xdl_mark_ignorable_regex(xscr, &xe, xpp);
>
>  		if (ef(&xe, xscr, ecb, xecfg) < 0) {
>
> --
> 2.28.0
>
>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 3/3] t: add -I<regex> tests
  2020-10-12  9:17   ` [PATCH v2 3/3] t: add -I<regex> tests Michał Kępień
@ 2020-10-12 11:49     ` Johannes Schindelin
  2020-10-13  6:38       ` Michał Kępień
  0 siblings, 1 reply; 57+ messages in thread
From: Johannes Schindelin @ 2020-10-12 11:49 UTC (permalink / raw)
  To: Michał Kępień; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 13600 bytes --]

Hi Michał,

On Mon, 12 Oct 2020, Michał Kępień wrote:

> Exercise the new -I<regex> diff option in various scenarios to ensure it
> behaves as expected.

Excellent. I was actually looking for a test in patch 2/3 and almost
commented about that.

>
> Signed-off-by: Michał Kępień <michal@isc.org>
> ---
>  t/t4069-diff-ignore-regex.sh | 426 +++++++++++++++++++++++++++++++++++

Hmm. I wonder whether we could do with a much more concise test script.
The test suite already takes a quite long time to run, which is not a
laughing matter: we had issues in the past where contributors would skip
running it because it took too long, and this test is sure to exacerbate
that problem.

I could imagine, for example, that it would be plenty enough to do
something like this instead:

-- snip --
diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
index 5c7b0122b4f..bf158be137f 100755
--- a/t/t4013-diff-various.sh
+++ b/t/t4013-diff-various.sh
@@ -6,6 +6,7 @@
 test_description='Various diff formatting options'

 . ./test-lib.sh
+. "$TEST_DIRECTORY"/diff-lib.sh

 test_expect_success setup '

@@ -473,4 +474,24 @@ test_expect_success 'diff-tree --stdin with log formatting' '
 	test_cmp expect actual
 '

+test_expect_success '-I<regex>' '
+	seq 50 >I.txt &&
+	sed -e "s/13/ten and three/" -e "/7\$/d" <I.txt >J.txt &&
+	test_must_fail git diff --no-index -I"ten.*e" -I"^[124-9]" I.txt J.txt >actual &&
+	cat >expect <<-\EOF &&
+	diff --git a/I.txt b/J.txt
+	--- a/I.txt
+	+++ b/J.txt
+	@@ -34,7 +31,6 @@
+	 34
+	 35
+	 36
+	-37
+	 38
+	 39
+	 40
+	EOF
+	compare_diff_patch expect actual
+'
+
 test_done
-- snap --

Note how it tests various things in one go?

Thanks,
Dscho

P.S.: My main interest in the `-I` option is its use case for `git
range-diff` in Git's own context: if you want to compare your patches to
what entered the `seen` branch, there will _always_ be a difference
because Junio adds their DCO. Something like this can help that:

git range-diff \
	-I'^    Signed-off-by: Junio C Hamano <gitster@pobox.com>$' \
	<my-patch-range> <the-equivalent-in-seen>

>  1 file changed, 426 insertions(+)
>  create mode 100755 t/t4069-diff-ignore-regex.sh
>
> diff --git a/t/t4069-diff-ignore-regex.sh b/t/t4069-diff-ignore-regex.sh
> new file mode 100755
> index 0000000000..4ddf5c67ae
> --- /dev/null
> +++ b/t/t4069-diff-ignore-regex.sh
> @@ -0,0 +1,426 @@
> +#!/bin/sh
> +
> +test_description='Test diff -I<regex>'
> +
> +. ./test-lib.sh
> +. "$TEST_DIRECTORY"/diff-lib.sh
> +
> +test_expect_success setup '
> +	test_seq 20 >x &&
> +	git update-index --add x
> +'
> +
> +test_expect_success 'one line changed' '
> +	test_seq 20 | sed "s/10/100/" >x &&
> +
> +	# Get plain diff
> +	git diff >plain &&
> +	cat >expected <<-EOF &&
> +	diff --git a/x b/x
> +	--- a/x
> +	+++ b/x
> +	@@ -7,7 +7,7 @@
> +	 7
> +	 8
> +	 9
> +	-10
> +	+100
> +	 11
> +	 12
> +	 13
> +	EOF
> +	compare_diff_patch expected plain &&
> +
> +	# Both old and new line match regex - ignore change
> +	git diff -I "^10" >actual &&
> +	test_must_be_empty actual &&
> +
> +	# Both old and new line match some regex - ignore change
> +	git diff -I "^10\$" -I "^100" >actual &&
> +	test_must_be_empty actual &&
> +
> +	# Only old line matches regex - do not ignore change
> +	git diff -I "^10\$" >actual &&
> +	compare_diff_patch plain actual &&
> +
> +	# Only new line matches regex - do not ignore change
> +	git diff -I "^100" >actual &&
> +	compare_diff_patch plain actual &&
> +
> +	# Only old line matches some regex - do not ignore change
> +	git diff -I "^10\$" -I "^101" >actual &&
> +	compare_diff_patch plain actual &&
> +
> +	# Only new line matches some regex - do not ignore change
> +	git diff -I "^11\$" -I "^100" >actual &&
> +	compare_diff_patch plain actual
> +'
> +
> +test_expect_success 'one line removed' '
> +	test_seq 20 | sed "10d" >x &&
> +
> +	# Get plain diff
> +	git diff >plain &&
> +	cat >expected <<-EOF &&
> +	diff --git a/x b/x
> +	--- a/x
> +	+++ b/x
> +	@@ -7,7 +7,6 @@
> +	 7
> +	 8
> +	 9
> +	-10
> +	 11
> +	 12
> +	 13
> +	EOF
> +	compare_diff_patch expected plain &&
> +
> +	# Removed line matches regex - ignore change
> +	git diff -I "^10" >actual &&
> +	test_must_be_empty actual &&
> +
> +	# Removed line matches some regex - ignore change
> +	git diff -I "^10" -I "^100" >actual &&
> +	test_must_be_empty actual &&
> +
> +	# Removed line does not match regex - do not ignore change
> +	git diff -I "^101" >actual &&
> +	compare_diff_patch plain actual &&
> +
> +	# Removed line does not match any regex - do not ignore change
> +	git diff -I "^100" -I "^101" >actual &&
> +	compare_diff_patch plain actual
> +'
> +
> +test_expect_success 'one line added' '
> +	test_seq 21 >x &&
> +
> +	# Get plain diff
> +	git diff >plain &&
> +	cat >expected <<-EOF &&
> +	diff --git a/x b/x
> +	--- a/x
> +	+++ b/x
> +	@@ -18,3 +18,4 @@
> +	 18
> +	 19
> +	 20
> +	+21
> +	EOF
> +	compare_diff_patch expected plain &&
> +
> +	# Added line matches regex - ignore change
> +	git diff -I "^21" >actual &&
> +	test_must_be_empty actual &&
> +
> +	# Added line matches some regex - ignore change
> +	git diff -I "^21" -I "^22" >actual &&
> +	test_must_be_empty actual &&
> +
> +	# Added line does not match regex - do not ignore change
> +	git diff -I "^212" >actual &&
> +	compare_diff_patch plain actual &&
> +
> +	# Added line does not match any regex - do not ignore change
> +	git diff -I "^211" -I "^212" >actual &&
> +	compare_diff_patch plain actual
> +'
> +
> +test_expect_success 'last two lines changed' '
> +	test_seq 20 | sed "s/19/21/; s/20/22/" >x &&
> +
> +	# Get plain diff
> +	git diff >plain &&
> +	cat >expected <<-EOF &&
> +	diff --git a/x b/x
> +	--- a/x
> +	+++ b/x
> +	@@ -16,5 +16,5 @@
> +	 16
> +	 17
> +	 18
> +	-19
> +	-20
> +	+21
> +	+22
> +	EOF
> +	compare_diff_patch expected plain &&
> +
> +	# All changed lines match regex - ignore change
> +	git diff -I "^[12]" >actual &&
> +	test_must_be_empty actual &&
> +
> +	# All changed lines match some regex - ignore change
> +	git diff -I "^1" -I "^2" >actual &&
> +	test_must_be_empty actual &&
> +
> +	# Not all changed lines match regex - do not ignore change
> +	git diff -I "^2" >actual &&
> +	compare_diff_patch plain actual &&
> +
> +	# Not all changed lines match some regex - do not ignore change
> +	git diff -I "^2" -I "^3" >actual &&
> +	compare_diff_patch plain actual
> +'
> +
> +test_expect_success 'two non-adjacent lines removed in the same hunk' '
> +	test_seq 20 | sed "1d; 3d" >x &&
> +
> +	# Get plain diff
> +	git diff >plain &&
> +	cat >expected <<-EOF &&
> +	diff --git a/x b/x
> +	--- a/x
> +	+++ b/x
> +	@@ -1,6 +1,4 @@
> +	-1
> +	 2
> +	-3
> +	 4
> +	 5
> +	 6
> +	EOF
> +	compare_diff_patch expected plain &&
> +
> +	# Both removed lines match regex - ignore hunk
> +	git diff -I "^[1-3]" >actual &&
> +	test_must_be_empty actual &&
> +
> +	# Both removed lines match some regex - ignore hunk
> +	git diff -I "^1" -I "^3" >actual &&
> +	test_must_be_empty actual &&
> +
> +	# First removed line does not match regex - do not ignore hunk
> +	git diff -I "^[2-3]" >actual &&
> +	compare_diff_patch plain actual &&
> +
> +	# First removed line does not match any regex - do not ignore hunk
> +	git diff -I "^2" -I "^3" >actual &&
> +	compare_diff_patch plain actual &&
> +
> +	# Second removed line does not match regex - do not ignore hunk
> +	git diff -I "^[1-2]" >actual &&
> +	compare_diff_patch plain actual &&
> +
> +	# Second removed line does not match any regex - do not ignore hunk
> +	git diff -I "^1" -I "^2" >actual &&
> +	compare_diff_patch plain actual
> +'
> +
> +test_expect_success 'two non-adjacent lines removed in the same hunk, with -U1' '
> +	test_seq 20 | sed "1d; 3d" >x &&
> +
> +	# Get plain diff
> +	git diff -U1 >plain &&
> +	cat >expected <<-EOF &&
> +	diff --git a/x b/x
> +	--- a/x
> +	+++ b/x
> +	@@ -1,4 +1,2 @@
> +	-1
> +	 2
> +	-3
> +	 4
> +	EOF
> +	compare_diff_patch expected plain &&
> +
> +	# Both removed lines match regex - ignore hunk
> +	git diff -U1 -I "^[1-3]" >actual &&
> +	test_must_be_empty actual &&
> +
> +	# Both removed lines match some regex - ignore hunk
> +	git diff -U1 -I "^1" -I "^3" >actual &&
> +	test_must_be_empty actual &&
> +
> +	# First removed line does not match regex, but is out of context - ignore second change
> +	git diff -U1 -I "^[2-3]" >actual &&
> +	cat >second-change-ignored <<-EOF &&
> +	diff --git a/x b/x
> +	--- a/x
> +	+++ b/x
> +	@@ -1,2 +1 @@
> +	-1
> +	 2
> +	EOF
> +	compare_diff_patch second-change-ignored actual &&
> +
> +	# First removed line does not match any regex, but is out of context - ignore second change
> +	git diff -U1 -I "^2" -I "^3" >actual &&
> +	compare_diff_patch second-change-ignored actual &&
> +
> +	# Second removed line does not match regex, but is out of context - ignore first change
> +	git diff -U1 -I "^[1-2]" >actual &&
> +	cat >first-change-ignored <<-EOF &&
> +	diff --git a/x b/x
> +	--- a/x
> +	+++ b/x
> +	@@ -2,3 +1,2 @@
> +	 2
> +	-3
> +	 4
> +	EOF
> +	compare_diff_patch first-change-ignored actual &&
> +
> +	# Second removed line does not match any regex, but is out of context - ignore first change
> +	git diff -U1 -I "^1" -I "^2" >actual &&
> +	compare_diff_patch first-change-ignored actual
> +'
> +
> +test_expect_success 'multiple hunks' '
> +	test_seq 20 | sed "1d; 20d" >x &&
> +
> +	# Get plain diff
> +	git diff >plain &&
> +	cat >expected <<-EOF &&
> +	diff --git a/x b/x
> +	--- a/x
> +	+++ b/x
> +	@@ -1,4 +1,3 @@
> +	-1
> +	 2
> +	 3
> +	 4
> +	@@ -17,4 +16,3 @@
> +	 17
> +	 18
> +	 19
> +	-20
> +	EOF
> +	compare_diff_patch expected plain &&
> +
> +	# Ignore both hunks (single regex)
> +	git diff -I "^[12]" >actual &&
> +	test_must_be_empty actual &&
> +
> +	# Ignore both hunks (multiple regexes)
> +	git diff -I "^1" -I "^2" >actual &&
> +	test_must_be_empty actual &&
> +
> +	# Only ignore first hunk (single regex)
> +	git diff -I "^1" >actual &&
> +	cat >first-hunk-ignored <<-EOF &&
> +	diff --git a/x b/x
> +	--- a/x
> +	+++ b/x
> +	@@ -17,4 +16,3 @@
> +	 17
> +	 18
> +	 19
> +	-20
> +	EOF
> +	compare_diff_patch first-hunk-ignored actual &&
> +
> +	# Only ignore first hunk (multiple regexes)
> +	git diff -I "^0" -I "^1" >actual &&
> +	compare_diff_patch first-hunk-ignored actual &&
> +
> +	# Only ignore second hunk (single regex)
> +	git diff -I "^2" >actual &&
> +	cat >second-hunk-ignored <<-EOF &&
> +	diff --git a/x b/x
> +	--- a/x
> +	+++ b/x
> +	@@ -1,4 +1,3 @@
> +	-1
> +	 2
> +	 3
> +	 4
> +	EOF
> +	compare_diff_patch second-hunk-ignored actual &&
> +
> +	# Only ignore second hunk (multiple regexes)
> +	git diff -I "^2" -I "^3" >actual &&
> +	compare_diff_patch second-hunk-ignored actual
> +'
> +
> +test_expect_success 'multiple hunks, with --ignore-blank-lines' '
> +	echo >x &&
> +	test_seq 21 >>x &&
> +
> +	# Get plain diff
> +	git diff >plain &&
> +	cat >expected <<-EOF &&
> +	diff --git a/x b/x
> +	--- a/x
> +	+++ b/x
> +	@@ -1,3 +1,4 @@
> +	+
> +	 1
> +	 2
> +	 3
> +	@@ -18,3 +19,4 @@
> +	 18
> +	 19
> +	 20
> +	+21
> +	EOF
> +	compare_diff_patch expected plain &&
> +
> +	# -I does not override --ignore-blank-lines - ignore both hunks (single regex)
> +	git diff --ignore-blank-lines -I "^21" >actual &&
> +	test_must_be_empty actual &&
> +
> +	# -I does not override --ignore-blank-lines - ignore both hunks (multiple regexes)
> +	git diff --ignore-blank-lines -I "^21" -I "^12" >actual &&
> +	test_must_be_empty actual
> +'
> +
> +test_expect_success 'diffstat' '
> +	test_seq 20 | sed "s/^5/0/p; s/^15/10/; 16d" >x &&
> +
> +	# Get plain diffstat
> +	git diff --stat >actual &&
> +	cat >expected <<-EOF &&
> +	 x | 6 +++---
> +	 1 file changed, 3 insertions(+), 3 deletions(-)
> +	EOF
> +	test_cmp expected actual &&
> +
> +	# Ignore both hunks (single regex)
> +	git diff --stat -I "^[0-5]" >actual &&
> +	test_must_be_empty actual &&
> +
> +	# Ignore both hunks (multiple regexes)
> +	git diff --stat -I "^0" -I "^1" -I "^5" >actual &&
> +	test_must_be_empty actual &&
> +
> +	# Only ignore first hunk (single regex)
> +	git diff --stat -I "^[05]" >actual &&
> +	cat >expected <<-EOF &&
> +	 x | 3 +--
> +	 1 file changed, 1 insertion(+), 2 deletions(-)
> +	EOF
> +	test_cmp expected actual &&
> +
> +	# Only ignore first hunk (multiple regexes)
> +	git diff --stat -I "^0" -I "^5" >actual &&
> +	test_cmp expected actual &&
> +
> +	# Only ignore second hunk (single regex)
> +	git diff --stat -I "^1" >actual &&
> +	cat >expected <<-EOF &&
> +	 x | 3 ++-
> +	 1 file changed, 2 insertions(+), 1 deletion(-)
> +	EOF
> +	test_cmp expected actual &&
> +
> +	# Only ignore second hunk (multiple regexes)
> +	git diff --stat -I "^1" -I "^2" >actual &&
> +	test_cmp expected actual
> +'
> +
> +test_expect_success 'invalid regexes' '
> +	>x &&
> +
> +	# Single invalid regex
> +	git diff -I "^[1" 2>&1 | grep "invalid regex: " &&
> +
> +	# Two regexes: first invalid, second valid
> +	git diff -I "^[1" -I "^1" 2>&1 | grep "invalid regex: " &&
> +
> +	# Two invalid regexes
> +	git diff -I "^[1" -I "^[2" 2>&1 | grep "invalid regex: "
> +'
> +
> +test_done
> --
> 2.28.0
>
>

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 1/3] merge-base, xdiff: zero out xpparam_t structures
  2020-10-12 11:14     ` Johannes Schindelin
@ 2020-10-12 17:09       ` Junio C Hamano
  0 siblings, 0 replies; 57+ messages in thread
From: Junio C Hamano @ 2020-10-12 17:09 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Michał Kępień, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>>  	xpparam_t xpparam;
>> +
>> +	memset(&xpparam, 0, sizeof(xpparam));
>
> A slightly more elegant way to do this would be
>
> 	xpparam_t xpparam = { 0l };
>
> Or even
>
> 	xpparam_t xpparam = XPPARAM_T_INIT;
>
> with
>
> 	#define XPPARAM_T_INIT { 0l, NULL, 0 }
>
> in `xdiff/xdiff.h`.

Let's not suggest any of the above.

I think we had a recent thread [*1*] on which we agreed that "{ 0 }"
is the way to spell "a naturally zero" initialization like this one,
if we were to spell it out.

TBH, I'd say that memset() is also OK as-is in this codebase as an
established way (git grep 'memset([^,]*, 0, sizeof' gives too many
hits).

Thanks.


[Reference]

*1* https://lore.kernel.org/git/xmqq4knca7c6.fsf@gitster.c.googlers.com/

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 2/3] diff: add -I<regex> that ignores matching changes
  2020-10-12  9:17   ` [PATCH v2 2/3] diff: add -I<regex> that ignores matching changes Michał Kępień
  2020-10-12 11:20     ` Johannes Schindelin
@ 2020-10-12 18:01     ` Junio C Hamano
  2020-10-13  6:38       ` Michał Kępień
  2020-10-12 20:04     ` Junio C Hamano
  2 siblings, 1 reply; 57+ messages in thread
From: Junio C Hamano @ 2020-10-12 18:01 UTC (permalink / raw)
  To: Michał Kępień; +Cc: git

Michał Kępień <michal@isc.org> writes:

> Add a new diff option that enables ignoring changes whose all lines
> (changed, removed, and added) match a given regular expression.  This is
> similar to the -I option in standalone diff utilities and can be used
> e.g. to ignore changes which only affect code comments or to look for
> unrelated changes in commits containing a large number of automatically
> applied modifications (e.g. a tree-wide string replacement).  The
> difference between -G/-S and the new -I option is that the latter
> filters output on a per-change basis.
>
> Use the 'ignore' field of xdchange_t for marking a change as ignored or
> not.  Since the same field is used by --ignore-blank-lines, identical
> hunk emitting rules apply for --ignore-blank-lines and -I.  These two
> options can also be used together in the same git invocation (they are
> complementary to each other).
>
> Rename xdl_mark_ignorable() to xdl_mark_ignorable_lines(), to indicate
> that it is logically a "sibling" of xdl_mark_ignorable_regex() rather
> than its "parent".
>
> Signed-off-by: Michał Kępień <michal@isc.org>

Well explained.

> +-I<regex>::

Since we are mimicking other folks' feature, giving also the

--ignore-matching-lines=<regex>

synonym to that their users are familiar with would be a good idea,
no?

> +	Ignore changes whose all lines match <regex>.  This option may
> +	be specified more than once.
> +
>  --inter-hunk-context=<lines>::
>  	Show the context between diff hunks, up to the specified number
>  	of lines, thereby fusing hunks that are close to each other.
> diff --git a/diff.c b/diff.c
> index 2bb2f8f57e..c5d4920fee 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -3587,6 +3587,8 @@ static void builtin_diff(const char *name_a,
>  		if (header.len && !o->flags.suppress_diff_headers)
>  			ecbdata.header = &header;
>  		xpp.flags = o->xdl_opts;
> +		xpp.ignore_regex = o->ignore_regex;
> +		xpp.ignore_regex_nr = o->ignore_regex_nr;
>  		xpp.anchors = o->anchors;
>  		xpp.anchors_nr = o->anchors_nr;
>  		xecfg.ctxlen = o->context;
> @@ -3716,6 +3718,8 @@ static void builtin_diffstat(const char *name_a, const char *name_b,
>  		memset(&xpp, 0, sizeof(xpp));
>  		memset(&xecfg, 0, sizeof(xecfg));
>  		xpp.flags = o->xdl_opts;
> +		xpp.ignore_regex = o->ignore_regex;
> +		xpp.ignore_regex_nr = o->ignore_regex_nr;
>  		xpp.anchors = o->anchors;
>  		xpp.anchors_nr = o->anchors_nr;
>  		xecfg.ctxlen = o->context;
> @@ -5203,6 +5207,22 @@ static int diff_opt_patience(const struct option *opt,
>  	return 0;
>  }
>  
> +static int diff_opt_ignore_regex(const struct option *opt,
> +				 const char *arg, int unset)
> +{
> +	struct diff_options *options = opt->value;
> +	regex_t *regex;
> +
> +	BUG_ON_OPT_NEG(unset);
> +	regex = xcalloc(1, sizeof(*regex));
> +	if (regcomp(regex, arg, REG_EXTENDED | REG_NEWLINE))
> +		die("invalid regex: %s", arg);
> +	ALLOC_GROW(options->ignore_regex, options->ignore_regex_nr + 1,
> +		   options->ignore_regex_alloc);
> +	options->ignore_regex[options->ignore_regex_nr++] = regex;
> +	return 0;
> +}

Don't these parse-options callback functions have a way to tell the
caller die instead of them themselves dying like this?  Would it
work better to "return error(... message ...)", or would that give
two error messages?  In anycase, this is end-user facing, so we'd
want it to be localizable, e.g.

	die(_("invalid regexp given to -I: '%s'", arg));

probably.

>  static int diff_opt_pickaxe_regex(const struct option *opt,
>  				  const char *arg, int unset)
>  {

Other than that, nicely done.

Thanks.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 1/3] merge-base, xdiff: zero out xpparam_t structures
  2020-10-12  9:17   ` [PATCH v2 1/3] merge-base, xdiff: zero out xpparam_t structures Michał Kępień
  2020-10-12 11:14     ` Johannes Schindelin
@ 2020-10-12 19:52     ` Junio C Hamano
  2020-10-13  6:35       ` Michał Kępień
  1 sibling, 1 reply; 57+ messages in thread
From: Junio C Hamano @ 2020-10-12 19:52 UTC (permalink / raw)
  To: Michał Kępień; +Cc: git

Michał Kępień <michal@isc.org> writes:

> diff --git a/builtin/merge-tree.c b/builtin/merge-tree.c
> index e72714a5a8..de8520778d 100644
> --- a/builtin/merge-tree.c
> +++ b/builtin/merge-tree.c
> @@ -109,6 +109,7 @@ static void show_diff(struct merge_list *entry)
>  	xdemitconf_t xecfg;
>  	xdemitcb_t ecb;
>  
> +	memset(&xpp, 0, sizeof(xpp));
>  	xpp.flags = 0;
>  	memset(&xecfg, 0, sizeof(xecfg));
>  	xecfg.ctxlen = 3;

The existing "xpp.flags=0" can then go away, and as Dscho hinted,

	xpparam_t xpp = {
		.flags = 0,
	};

may also be a valid way to write this.  What it says is that we want
the entirety of xpp to be reasonably initialized, but we do care
that its .flags field to be exactly zero.

> diff --git a/xdiff/xhistogram.c b/xdiff/xhistogram.c
> index c7b35a9667..e694bfd9e3 100644
> --- a/xdiff/xhistogram.c
> +++ b/xdiff/xhistogram.c
> @@ -235,6 +235,8 @@ static int fall_back_to_classic_diff(xpparam_t const *xpp, xdfenv_t *env,
>  		int line1, int count1, int line2, int count2)
>  {
>  	xpparam_t xpparam;
> +
> +	memset(&xpparam, 0, sizeof(xpparam));
>  	xpparam.flags = xpp->flags & ~XDF_DIFF_ALGORITHM_MASK;

Likewise.  Giving an initializer to the local variable def, e.g.

	xpparam_t xpparam = {
		.flags = xpp->flags & ~XDF_DIFF_ALGORITHM_MASK,
	};

would allow us to lose one line.

>  	return xdl_fall_back_diff(env, &xpparam,
> diff --git a/xdiff/xpatience.c b/xdiff/xpatience.c
> index 3c5601b602..20699a6f60 100644
> --- a/xdiff/xpatience.c
> +++ b/xdiff/xpatience.c
> @@ -318,6 +318,8 @@ static int fall_back_to_classic_diff(struct hashmap *map,
>  		int line1, int count1, int line2, int count2)
>  {
>  	xpparam_t xpp;
> +
> +	memset(&xpp, 0, sizeof(xpp));
>  	xpp.flags = map->xpp->flags & ~XDF_DIFF_ALGORITHM_MASK;

Likewise.

>  	return xdl_fall_back_diff(map->env, &xpp,

But the patch is good as-is, given especially the way how xecfg is
cleared the same way in builtin/merge-tree.c::show_diff().

Will queue.  Thanks.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 2/3] diff: add -I<regex> that ignores matching changes
  2020-10-12 11:20     ` Johannes Schindelin
@ 2020-10-12 20:00       ` Junio C Hamano
  2020-10-12 20:39         ` Johannes Schindelin
  2020-10-13  6:36       ` Michał Kępień
  1 sibling, 1 reply; 57+ messages in thread
From: Junio C Hamano @ 2020-10-12 20:00 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Michał Kępień, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Hi Michał,
>
> On Mon, 12 Oct 2020, Michał Kępień wrote:
>
>> @@ -5203,6 +5207,22 @@ static int diff_opt_patience(const struct option *opt,
>>  	return 0;
>>  }
>>
>> +static int diff_opt_ignore_regex(const struct option *opt,
>> +				 const char *arg, int unset)
>> +{
>> +	struct diff_options *options = opt->value;
>> +	regex_t *regex;
>> +
>> +	BUG_ON_OPT_NEG(unset);
>> +	regex = xcalloc(1, sizeof(*regex));
>> +	if (regcomp(regex, arg, REG_EXTENDED | REG_NEWLINE))
>> +		die("invalid regex: %s", arg);
>> +	ALLOC_GROW(options->ignore_regex, options->ignore_regex_nr + 1,
>> +		   options->ignore_regex_alloc);
>> +	options->ignore_regex[options->ignore_regex_nr++] = regex;
>
> A slightly more elegant way would be to have `ignore_regex` have the type
> `regex_t *` and use `ALLOC_GROW_BY()` (which zeroes the newly-added
> elements automagically).

It may be "elegant", but we we know if it is "correct" on
everybody's implementation of regex_t?

A struct like

	struct foo {
		char *str;
		char in_place_buffer[10];
	};

where str points at in_place_buffer[] only when it needs to point at
a very short string, and points at an allocated memory on heap, can
not be safely copied and used, and an array of such a struct breaks
when ALLOC_GROW_BY() needs to call realloc().  Keeping an array of
pointers to regex_t and growing it would not break even for such a
structure type.

Since we cannot rely on the implementation detail of regex_t on a
single platform, I'd rather not to see anybody suggest such an
"elegant" approach.

Thanks.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 2/3] diff: add -I<regex> that ignores matching changes
  2020-10-12  9:17   ` [PATCH v2 2/3] diff: add -I<regex> that ignores matching changes Michał Kępień
  2020-10-12 11:20     ` Johannes Schindelin
  2020-10-12 18:01     ` Junio C Hamano
@ 2020-10-12 20:04     ` Junio C Hamano
  2020-10-13  6:38       ` Michał Kępień
  2 siblings, 1 reply; 57+ messages in thread
From: Junio C Hamano @ 2020-10-12 20:04 UTC (permalink / raw)
  To: Michał Kępień; +Cc: git

Michał Kępień <michal@isc.org> writes:

> +	/* see Documentation/diff-options.txt */

This comment adds negative value.

If it were

	/* "-I<regexp>" */

the readers won't have to switch to the file only to find out that
the comment didn't tell them where in the file to look at. 

> +	regex_t **ignore_regex;
> +	size_t ignore_regex_nr, ignore_regex_alloc;
> +
>  	const char *single_follow;
>  	const char *a_prefix, *b_prefix;
>  	const char *line_prefix;
> diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
> index 032e3a9f41..883a0d770e 100644
> --- a/xdiff/xdiff.h
> +++ b/xdiff/xdiff.h
> @@ -79,6 +79,10 @@ typedef struct s_mmbuffer {
>  typedef struct s_xpparam {
>  	unsigned long flags;
>  
> +	/* See Documentation/diff-options.txt. */

Likewise.

> +	regex_t **ignore_regex;
> +	size_t ignore_regex_nr;
> +
>  	/* See Documentation/diff-options.txt. */
>  	char **anchors;
>  	size_t anchors_nr;

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 2/3] diff: add -I<regex> that ignores matching changes
  2020-10-12 20:00       ` Junio C Hamano
@ 2020-10-12 20:39         ` Johannes Schindelin
  2020-10-12 21:43           ` Junio C Hamano
  0 siblings, 1 reply; 57+ messages in thread
From: Johannes Schindelin @ 2020-10-12 20:39 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Michał Kępień, git

[-- Attachment #1: Type: text/plain, Size: 2158 bytes --]

Hi Junio,

On Mon, 12 Oct 2020, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > Hi Michał,
> >
> > On Mon, 12 Oct 2020, Michał Kępień wrote:
> >
> >> @@ -5203,6 +5207,22 @@ static int diff_opt_patience(const struct option *opt,
> >>  	return 0;
> >>  }
> >>
> >> +static int diff_opt_ignore_regex(const struct option *opt,
> >> +				 const char *arg, int unset)
> >> +{
> >> +	struct diff_options *options = opt->value;
> >> +	regex_t *regex;
> >> +
> >> +	BUG_ON_OPT_NEG(unset);
> >> +	regex = xcalloc(1, sizeof(*regex));
> >> +	if (regcomp(regex, arg, REG_EXTENDED | REG_NEWLINE))
> >> +		die("invalid regex: %s", arg);
> >> +	ALLOC_GROW(options->ignore_regex, options->ignore_regex_nr + 1,
> >> +		   options->ignore_regex_alloc);
> >> +	options->ignore_regex[options->ignore_regex_nr++] = regex;
> >
> > A slightly more elegant way would be to have `ignore_regex` have the type
> > `regex_t *` and use `ALLOC_GROW_BY()` (which zeroes the newly-added
> > elements automagically).
>
> It may be "elegant", but we we know if it is "correct" on
> everybody's implementation of regex_t?
>
> A struct like
>
> 	struct foo {
> 		char *str;
> 		char in_place_buffer[10];
> 	};
>
> where str points at in_place_buffer[] only when it needs to point at
> a very short string, and points at an allocated memory on heap, can
> not be safely copied and used, and an array of such a struct breaks
> when ALLOC_GROW_BY() needs to call realloc().  Keeping an array of
> pointers to regex_t and growing it would not break even for such a
> structure type.
>
> Since we cannot rely on the implementation detail of regex_t on a
> single platform, I'd rather not to see anybody suggest such an
> "elegant" approach.

True, such an implementation detail is totally conceivable.

Thanks for the sanity check.

Having said that, my suggestion to use `ALLOC_GROW_BY()` was triggered by
the use of `xcalloc()`, and now that I think further about it, an
`xmalloc()` should be plenty enough: `regcomp()` does not need that struct
to be initialized to all zero.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 2/3] diff: add -I<regex> that ignores matching changes
  2020-10-12 20:39         ` Johannes Schindelin
@ 2020-10-12 21:43           ` Junio C Hamano
  2020-10-13  6:37             ` Michał Kępień
  0 siblings, 1 reply; 57+ messages in thread
From: Junio C Hamano @ 2020-10-12 21:43 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Michał Kępień, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> the use of `xcalloc()`, and now that I think further about it, an
> `xmalloc()` should be plenty enough: `regcomp()` does not need that struct
> to be initialized to all zero.

Yup, I think it makes sense.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 1/3] merge-base, xdiff: zero out xpparam_t structures
  2020-10-12 19:52     ` Junio C Hamano
@ 2020-10-13  6:35       ` Michał Kępień
  0 siblings, 0 replies; 57+ messages in thread
From: Michał Kępień @ 2020-10-13  6:35 UTC (permalink / raw)
  To: Junio C Hamano, Johannes Schindelin; +Cc: git

Junio, Johannes,

Thank you for your feedback.

> But the patch is good as-is, given especially the way how xecfg is
> cleared the same way in builtin/merge-tree.c::show_diff().

I just wanted to explain that I initially planned to take the "= { 0 }"
approach and I started checking whether all xpparam_t structures in the
source tree are stack-allocated, but I quickly noticed that, as Junio
pointed out, existing code showed an inclination towards memset(), so I
settled for that instead.

> Will queue.  Thanks.

I have a process question: since the other two patches in this patch
series will need a v3, would you like me to keep bundling patch 1 in
this series or should I only send revised version of patches 2 & 3 in
v3?

-- 
Best regards,
Michał Kępień

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 2/3] diff: add -I<regex> that ignores matching changes
  2020-10-12 11:20     ` Johannes Schindelin
  2020-10-12 20:00       ` Junio C Hamano
@ 2020-10-13  6:36       ` Michał Kępień
  2020-10-13 12:02         ` Johannes Schindelin
  1 sibling, 1 reply; 57+ messages in thread
From: Michał Kępień @ 2020-10-13  6:36 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Hi Johannes,

> > @@ -5491,6 +5511,9 @@ static void prep_parse_options(struct diff_options *options)
> >  		OPT_BIT_F(0, "ignore-blank-lines", &options->xdl_opts,
> >  			  N_("ignore changes whose lines are all blank"),
> >  			  XDF_IGNORE_BLANK_LINES, PARSE_OPT_NONEG),
> > +		OPT_CALLBACK_F('I', NULL, options, N_("<regex>"),
> > +			       N_("ignore changes whose all lines match <regex>"),
> > +			       0, diff_opt_ignore_regex),
> >  		OPT_BIT(0, "indent-heuristic", &options->xdl_opts,
> >  			N_("heuristic to shift diff hunk boundaries for easy reading"),
> >  			XDF_INDENT_HEURISTIC),
> 
> Are we releasing the `ignore_regex` anywhere?

Oops, I tried to mimic what is done for 'anchors', and I failed to
notice that apparently the elements of the options->anchors array are
only free()'d when --patience is also used and the array pointer itself
is never free()'d at all.  Given this, I believe I need to fix
diff_opt_ignore_regex() in patch 2 and also make sure that the memory
allocated in diff_opt_anchored() gets properly released - in another
preliminary clean-up patch?

At first glance, diff_flush() - specifically the part below the
'free_queue' label - looks like a sane place to free() things.  Am I
mistaken?

-- 
Best regards,
Michał Kępień

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 2/3] diff: add -I<regex> that ignores matching changes
  2020-10-12 21:43           ` Junio C Hamano
@ 2020-10-13  6:37             ` Michał Kępień
  2020-10-13 15:49               ` Junio C Hamano
  0 siblings, 1 reply; 57+ messages in thread
From: Michał Kępień @ 2020-10-13  6:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin, git

> > the use of `xcalloc()`, and now that I think further about it, an
> > `xmalloc()` should be plenty enough: `regcomp()` does not need that struct
> > to be initialized to all zero.
> 
> Yup, I think it makes sense.

Sure thing, I will switch to xmalloc() in v3.  xcalloc()'s semantics
("allocate one structure of this size") appealed to me, but there is
indeed no need to zero-initialize the allocated memory before passing it
to regcomp().

-- 
Best regards,
Michał Kępień

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 2/3] diff: add -I<regex> that ignores matching changes
  2020-10-12 18:01     ` Junio C Hamano
@ 2020-10-13  6:38       ` Michał Kępień
  0 siblings, 0 replies; 57+ messages in thread
From: Michał Kępień @ 2020-10-13  6:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

> > +-I<regex>::
> 
> Since we are mimicking other folks' feature, giving also the
> 
> --ignore-matching-lines=<regex>
> 
> synonym to that their users are familiar with would be a good idea,
> no?

Some diff implementations (e.g. the OpenBSD one) lack the long option
and I was aiming for the "greatest common divisor" for all diff
implementations, but sure, I do not see how adding the synonym could
hurt.  I will do that in v3.

> > @@ -5203,6 +5207,22 @@ static int diff_opt_patience(const struct option *opt,
> >  	return 0;
> >  }
> >  
> > +static int diff_opt_ignore_regex(const struct option *opt,
> > +				 const char *arg, int unset)
> > +{
> > +	struct diff_options *options = opt->value;
> > +	regex_t *regex;
> > +
> > +	BUG_ON_OPT_NEG(unset);
> > +	regex = xcalloc(1, sizeof(*regex));
> > +	if (regcomp(regex, arg, REG_EXTENDED | REG_NEWLINE))
> > +		die("invalid regex: %s", arg);
> > +	ALLOC_GROW(options->ignore_regex, options->ignore_regex_nr + 1,
> > +		   options->ignore_regex_alloc);
> > +	options->ignore_regex[options->ignore_regex_nr++] = regex;
> > +	return 0;
> > +}
> 
> Don't these parse-options callback functions have a way to tell the
> caller die instead of them themselves dying like this?  Would it
> work better to "return error(... message ...)", or would that give
> two error messages?

Yes, that would indeed look better, thanks - just one error message is
generated when "return error(...)" is used.  I just failed to notice
that it is a thing.  I will use it in v3.

> In anycase, this is end-user facing, so we'd
> want it to be localizable, e.g.
> 
> 	die(_("invalid regexp given to -I: '%s'", arg));
> 
> probably.

Absolutely, I will fix that in v3.  I tried to mimic what other similar
code in the tree does and a quick `git grep 'die.*regex'` yielded mostly
non-localized strings, so I settled for one as well.

-- 
Best regards,
Michał Kępień

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 2/3] diff: add -I<regex> that ignores matching changes
  2020-10-12 20:04     ` Junio C Hamano
@ 2020-10-13  6:38       ` Michał Kępień
  0 siblings, 0 replies; 57+ messages in thread
From: Michał Kępień @ 2020-10-13  6:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

> > +	/* see Documentation/diff-options.txt */
> 
> This comment adds negative value.
> 
> If it were
> 
> 	/* "-I<regexp>" */
> 
> the readers won't have to switch to the file only to find out that
> the comment didn't tell them where in the file to look at. 

Agreed, thanks, I will fix this in v3.

> > +	regex_t **ignore_regex;
> > +	size_t ignore_regex_nr, ignore_regex_alloc;
> > +
> >  	const char *single_follow;
> >  	const char *a_prefix, *b_prefix;
> >  	const char *line_prefix;
> > diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
> > index 032e3a9f41..883a0d770e 100644
> > --- a/xdiff/xdiff.h
> > +++ b/xdiff/xdiff.h
> > @@ -79,6 +79,10 @@ typedef struct s_mmbuffer {
> >  typedef struct s_xpparam {
> >  	unsigned long flags;
> >  
> > +	/* See Documentation/diff-options.txt. */
> 
> Likewise.

I will fix this in v3.

-- 
Best regards,
Michał Kępień

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 3/3] t: add -I<regex> tests
  2020-10-12 11:49     ` Johannes Schindelin
@ 2020-10-13  6:38       ` Michał Kępień
  2020-10-13 12:00         ` Johannes Schindelin
  0 siblings, 1 reply; 57+ messages in thread
From: Michał Kępień @ 2020-10-13  6:38 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Hi Johannes,

> > Exercise the new -I<regex> diff option in various scenarios to ensure it
> > behaves as expected.
> 
> Excellent. I was actually looking for a test in patch 2/3 and almost
> commented about that.

Right, I expressed my doubts in this area at the end of the cover letter
for v1:

>>   - Should tests be added in a separate commit?  This is what I did as I
>>     thought it would help with readability, but...

I will be glad to follow any guidance provided, I just picked one of the
two possible routes for v1.

> Hmm. I wonder whether we could do with a much more concise test script.
> The test suite already takes a quite long time to run, which is not a
> laughing matter: we had issues in the past where contributors would skip
> running it because it took too long, and this test is sure to exacerbate
> that problem.

First, let me say that the goal of minimizing the run time of a test
suite is close to my heart (it is an issue at my day job).  Yet, I
assumed that this new test would not be detrimental to test suite run
times as it takes about half a second to run t4069-diff-ignore-regex.sh
on my machine - and (I hope) its contents are in line with the "tests
are the best documentation" proverb.  That being said, I realize that
the hosts used in various test environments may have different
processing capabilities.  I tried preparing something exhaustive and
well-commented, so that it is clear what to expect from the new feature.
Yet, if you would rather have me cut some things out, I am certainly not
particularly attached to the tests from patch 3 and I will be glad to
rip them out if that is the recommendation :-)

> I could imagine, for example, that it would be plenty enough to do
> something like this instead:
> 
> -- snip --
> diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
> index 5c7b0122b4f..bf158be137f 100755
> --- a/t/t4013-diff-various.sh
> +++ b/t/t4013-diff-various.sh
> @@ -6,6 +6,7 @@
>  test_description='Various diff formatting options'
> 
>  . ./test-lib.sh
> +. "$TEST_DIRECTORY"/diff-lib.sh
> 
>  test_expect_success setup '
> 
> @@ -473,4 +474,24 @@ test_expect_success 'diff-tree --stdin with log formatting' '
>  	test_cmp expect actual
>  '
> 
> +test_expect_success '-I<regex>' '
> +	seq 50 >I.txt &&
> +	sed -e "s/13/ten and three/" -e "/7\$/d" <I.txt >J.txt &&
> +	test_must_fail git diff --no-index -I"ten.*e" -I"^[124-9]" I.txt J.txt >actual &&
> +	cat >expect <<-\EOF &&
> +	diff --git a/I.txt b/J.txt
> +	--- a/I.txt
> +	+++ b/J.txt
> +	@@ -34,7 +31,6 @@
> +	 34
> +	 35
> +	 36
> +	-37
> +	 38
> +	 39
> +	 40
> +	EOF
> +	compare_diff_patch expect actual
> +'
> +
>  test_done
> -- snap --
> 
> Note how it tests various things in one go?

Right, neat, though this does not (yet) test:

  - the interaction between -I and --ignore-blank-lines (this is visible
    in code coverage),

  - whether the list of hunks emitted varies for different -U<n> values,

  - diffstat with -I<regex>,

  - invalid regular expressions.

Would you like me to add these tests to your proposal or to skip them,
given that -I uses the same field for marking changes as ignored as
--ignore-blank-lines does?

> P.S.: My main interest in the `-I` option is its use case for `git
> range-diff` in Git's own context: if you want to compare your patches to
> what entered the `seen` branch, there will _always_ be a difference
> because Junio adds their DCO. Something like this can help that:
> 
> git range-diff \
> 	-I'^    Signed-off-by: Junio C Hamano <gitster@pobox.com>$' \
> 	<my-patch-range> <the-equivalent-in-seen>

Right, makes sense, I have not thought of that use case.

-- 
Best regards,
Michał Kępień

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 3/3] t: add -I<regex> tests
  2020-10-13  6:38       ` Michał Kępień
@ 2020-10-13 12:00         ` Johannes Schindelin
  2020-10-13 16:00           ` Junio C Hamano
  2020-10-13 19:01           ` Michał Kępień
  0 siblings, 2 replies; 57+ messages in thread
From: Johannes Schindelin @ 2020-10-13 12:00 UTC (permalink / raw)
  To: Michał Kępień; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 3249 bytes --]

Hi Michał,

On Tue, 13 Oct 2020, Michał Kępień wrote:

> > Hmm. I wonder whether we could do with a much more concise test script.
> > The test suite already takes a quite long time to run, which is not a
> > laughing matter: we had issues in the past where contributors would skip
> > running it because it took too long, and this test is sure to exacerbate
> > that problem.
>
> First, let me say that the goal of minimizing the run time of a test
> suite is close to my heart (it is an issue at my day job).  Yet, I
> assumed that this new test would not be detrimental to test suite run
> times as it takes about half a second to run t4069-diff-ignore-regex.sh
> on my machine - and (I hope) its contents are in line with the "tests
> are the best documentation" proverb.

Sadly, the test is not quite as fast on Windows. I just ran this (on a not
quite idle machine, admittedly) and it ended in this:

	# passed all 11 test(s)
	1..11

	real    0m51.470s
	user    0m0.046s
	sys     0m0.015s

Yes, that's almost a minute.

> > I could imagine, for example, that it would be plenty enough to do
> > something like this instead:
> >
> > -- snip --
> > diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
> > index 5c7b0122b4f..bf158be137f 100755
> > --- a/t/t4013-diff-various.sh
> > +++ b/t/t4013-diff-various.sh
> > @@ -6,6 +6,7 @@
> >  test_description='Various diff formatting options'
> >
> >  . ./test-lib.sh
> > +. "$TEST_DIRECTORY"/diff-lib.sh
> >
> >  test_expect_success setup '
> >
> > @@ -473,4 +474,24 @@ test_expect_success 'diff-tree --stdin with log formatting' '
> >  	test_cmp expect actual
> >  '
> >
> > +test_expect_success '-I<regex>' '
> > +	seq 50 >I.txt &&
> > +	sed -e "s/13/ten and three/" -e "/7\$/d" <I.txt >J.txt &&
> > +	test_must_fail git diff --no-index -I"ten.*e" -I"^[124-9]" I.txt J.txt >actual &&
> > +	cat >expect <<-\EOF &&
> > +	diff --git a/I.txt b/J.txt
> > +	--- a/I.txt
> > +	+++ b/J.txt
> > +	@@ -34,7 +31,6 @@
> > +	 34
> > +	 35
> > +	 36
> > +	-37
> > +	 38
> > +	 39
> > +	 40
> > +	EOF
> > +	compare_diff_patch expect actual
> > +'
> > +
> >  test_done
> > -- snap --
> >
> > Note how it tests various things in one go?
>
> Right, neat, though this does not (yet) test:
>
>   - the interaction between -I and --ignore-blank-lines (this is visible
>     in code coverage),

Right. Any chance you can finagle that in, e.g. by yet another `-e`
argument to the `sed` call?

>   - whether the list of hunks emitted varies for different -U<n> values,

I am not worried about that.

>   - diffstat with -I<regex>,

I am not worried about that, either, as `diffstat` consumes `xdiff`'s
output, therefore if one consumer works, another consumer will work, too.

>   - invalid regular expressions.

Right, that should be super easy (and quick) to test.

> Would you like me to add these tests to your proposal or to skip them,
> given that -I uses the same field for marking changes as ignored as
> --ignore-blank-lines does?

I'd say it depends how easily (read: in how small a test case or
modifications to an existing test case) you can add a test for that
interaction.

Thanks,
Dscho

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 2/3] diff: add -I<regex> that ignores matching changes
  2020-10-13  6:36       ` Michał Kępień
@ 2020-10-13 12:02         ` Johannes Schindelin
  2020-10-13 15:53           ` Junio C Hamano
  2020-10-13 18:45           ` Michał Kępień
  0 siblings, 2 replies; 57+ messages in thread
From: Johannes Schindelin @ 2020-10-13 12:02 UTC (permalink / raw)
  To: Michał Kępień; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 1599 bytes --]

Hi Michał,

On Tue, 13 Oct 2020, Michał Kępień wrote:

> Hi Johannes,
>
> > > @@ -5491,6 +5511,9 @@ static void prep_parse_options(struct diff_options *options)
> > >  		OPT_BIT_F(0, "ignore-blank-lines", &options->xdl_opts,
> > >  			  N_("ignore changes whose lines are all blank"),
> > >  			  XDF_IGNORE_BLANK_LINES, PARSE_OPT_NONEG),
> > > +		OPT_CALLBACK_F('I', NULL, options, N_("<regex>"),
> > > +			       N_("ignore changes whose all lines match <regex>"),
> > > +			       0, diff_opt_ignore_regex),
> > >  		OPT_BIT(0, "indent-heuristic", &options->xdl_opts,
> > >  			N_("heuristic to shift diff hunk boundaries for easy reading"),
> > >  			XDF_INDENT_HEURISTIC),
> >
> > Are we releasing the `ignore_regex` anywhere?
>
> Oops, I tried to mimic what is done for 'anchors', and I failed to
> notice that apparently the elements of the options->anchors array are
> only free()'d when --patience is also used and the array pointer itself
> is never free()'d at all.  Given this, I believe I need to fix
> diff_opt_ignore_regex() in patch 2 and also make sure that the memory
> allocated in diff_opt_anchored() gets properly released - in another
> preliminary clean-up patch?
>
> At first glance, diff_flush() - specifically the part below the
> 'free_queue' label - looks like a sane place to free() things.  Am I
> mistaken?

Oh wow, from a cursory look it seems as if the diff machinery was not
exactly careful with releasing memory. I might be mistaken, but if I am
not, then this would deserve a separate patch series, methinks.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 2/3] diff: add -I<regex> that ignores matching changes
  2020-10-13  6:37             ` Michał Kępień
@ 2020-10-13 15:49               ` Junio C Hamano
  0 siblings, 0 replies; 57+ messages in thread
From: Junio C Hamano @ 2020-10-13 15:49 UTC (permalink / raw)
  To: Michał Kępień; +Cc: Johannes Schindelin, git

Michał Kępień <michal@isc.org> writes:

> ...  xcalloc()'s semantics
> ("allocate one structure of this size") appealed to me, ...

FWIW, it appeals to me, too ;-)  But if we are going to pass it to
regcomp(), there is no point clearing it beforehand.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 2/3] diff: add -I<regex> that ignores matching changes
  2020-10-13 12:02         ` Johannes Schindelin
@ 2020-10-13 15:53           ` Junio C Hamano
  2020-10-13 18:45           ` Michał Kępień
  1 sibling, 0 replies; 57+ messages in thread
From: Junio C Hamano @ 2020-10-13 15:53 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Michał Kępień, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Oh wow, from a cursory look it seems as if the diff machinery was not
> exactly careful with releasing memory. I might be mistaken, but if I am
> not, then this would deserve a separate patch series, methinks.

I wouldn't be surprised if newer parts of it is much less careful
than the older parts of the machinery.  Most callers of the diff
machinery, including "log -p" that repeatedly generates patches,
would however run just a single setup for the entire series of diff
invocations before tearing it down AFIAR, so leaking the result of
parsing the command line option may not be such a big deal to these
callers.  But some callers added recently may not follow the access
pattern, in which case the machinery may have to be made more
leakproof.

Thanks.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 3/3] t: add -I<regex> tests
  2020-10-13 12:00         ` Johannes Schindelin
@ 2020-10-13 16:00           ` Junio C Hamano
  2020-10-13 19:01           ` Michał Kępień
  1 sibling, 0 replies; 57+ messages in thread
From: Junio C Hamano @ 2020-10-13 16:00 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Michał Kępień, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>>   - diffstat with -I<regex>,
>
> I am not worried about that, either, as `diffstat` consumes `xdiff`'s
> output, therefore if one consumer works, another consumer will work, too.

Careful.  Such a "we know what happens in the code" transparent-box
testing attitude is laying a minefield for later our selves.

As we learned in a recent bug in sequencer, some corners of
implementation do the same thing in different codepaths as
optimization.  The really bad part of the story is that such an
implementation detail can and will change over time, since that is
the kind of thing we do when optimizing code.

In other words, we only know what happens in the current code.  And
automated tests protect us from the future, when done right.  If
written with too intimate knowledge of how the current code works,
well, what are we really testing?  It's a balancing act and there is
no single "right" answer, but I'd draw the line on a bit more
careful side than you are drawing here.

Thanks.


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 2/3] diff: add -I<regex> that ignores matching changes
  2020-10-13 12:02         ` Johannes Schindelin
  2020-10-13 15:53           ` Junio C Hamano
@ 2020-10-13 18:45           ` Michał Kępień
  1 sibling, 0 replies; 57+ messages in thread
From: Michał Kępień @ 2020-10-13 18:45 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Hi Johannes,

> Oh wow, from a cursory look it seems as if the diff machinery was not
> exactly careful with releasing memory. I might be mistaken, but if I am
> not, then this would deserve a separate patch series, methinks.

Oh, okay.  Since that sounds like a non-trivial task, I should probably
finish this patch series first, making sure it releases memory properly,
and only then move on to looking for memory leaks in diff.c.

-- 
Best regards,
Michał Kępień

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 3/3] t: add -I<regex> tests
  2020-10-13 12:00         ` Johannes Schindelin
  2020-10-13 16:00           ` Junio C Hamano
@ 2020-10-13 19:01           ` Michał Kępień
  2020-10-15 11:45             ` Johannes Schindelin
  1 sibling, 1 reply; 57+ messages in thread
From: Michał Kępień @ 2020-10-13 19:01 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Hi Johannes,

> > First, let me say that the goal of minimizing the run time of a test
> > suite is close to my heart (it is an issue at my day job).  Yet, I
> > assumed that this new test would not be detrimental to test suite run
> > times as it takes about half a second to run t4069-diff-ignore-regex.sh
> > on my machine - and (I hope) its contents are in line with the "tests
> > are the best documentation" proverb.
> 
> Sadly, the test is not quite as fast on Windows. I just ran this (on a not
> quite idle machine, admittedly) and it ended in this:
> 
> 	# passed all 11 test(s)
> 	1..11
> 
> 	real    0m51.470s
> 	user    0m0.046s
> 	sys     0m0.015s
> 
> Yes, that's almost a minute.

Out of curiosity: is that under Cygwin?  I have seen shell-based tests
finishing in 15 *seconds* on Unix-like systems and in 15 *minutes* under
Cygwin, which would be in line with your measurements provided above.

> > Right, neat, though this does not (yet) test:
> >
> >   - the interaction between -I and --ignore-blank-lines (this is visible
> >     in code coverage),
> 
> Right. Any chance you can finagle that in, e.g. by yet another `-e`
> argument to the `sed` call?

I will try in v3 (while also looking at what I can do for other missing
-I<regex> tests I pointed out).

-- 
Best regards,
Michał Kępień

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH v3 0/2] diff: add -I<regex> that ignores matching changes
  2020-10-12  9:17 ` [PATCH v2 0/3] " Michał Kępień
                     ` (2 preceding siblings ...)
  2020-10-12  9:17   ` [PATCH v2 3/3] t: add -I<regex> tests Michał Kępień
@ 2020-10-15  7:24   ` Michał Kępień
  2020-10-15  7:24     ` [PATCH v3 1/2] merge-base, xdiff: zero out xpparam_t structures Michał Kępień
                       ` (3 more replies)
  3 siblings, 4 replies; 57+ messages in thread
From: Michał Kępień @ 2020-10-15  7:24 UTC (permalink / raw)
  To: git

This patch series adds a new diff option that enables ignoring changes
whose all lines (changed, removed, and added) match a given regular
expression.  This is similar to the -I/--ignore-matching-lines option in
standalone diff utilities and can be used e.g. to ignore changes which
only affect code comments or to look for unrelated changes in commits
containing a large number of automatically applied modifications (e.g. a
tree-wide string replacement).  The difference between -G/-S and the new
-I option is that the latter filters output on a per-change basis.

Changes from v2:

  - Add a long option for -I (--ignore-matching-lines) as it is
    commonplace in standalone diff utilities.  Update documentation and
    commit log messages accordingly.

  - Use xmalloc() instead of xcalloc() for allocating regex_t
    structures in diff_opt_ignore_regex().

  - Ensure the memory allocated in diff_opt_ignore_regex() gets
    released.

  - Use "return error(...)" instead of die() in the -I option callback.
    Make the relevant error message localizable.

  - Drastically reduce the number of -I<regex> tests due to excessive
    run time of t/t4069-diff-ignore-regex.sh from v1/v2 on some
    platforms (notably Windows).  Use a tweaked version of a test
    suggested by Johannes Schindelin (thanks!).  Given its reduction in
    size, squash patch 3 (which contained the tests) into patch 2.

  - Replace "see Documentation/diff-options.txt" with "-I<regex>" in the
    comments for the added structure fields, in order to make these
    comments more useful.

Changes from v1:

  - Add a new preliminary cleanup patch which ensures xpparam_t
    structures are always zero-initialized.  (This was a prerequisite
    for the next change below.)

  - Do not add a new 'xdl_opts' flag to check whether -I was used;
    instead, just check whether the array of regular expressions to
    match against is non-NULL.

  - Enable the -I option to be used multiple times.  As a consequence of
    this, regular expressions are now "pre-compiled" in the option's
    callback (and passed around as an array of regex_t structures)
    rather than deep down in xdiff code.  Add test cases exercising use
    of multiple -I options in the same git invocation.  Update
    documentation accordingly.

  - Rename xdl_mark_ignorable() to xdl_mark_ignorable_lines(), to
    indicate that it is logically a "sibling" of
    xdl_mark_ignorable_regex() rather than its "parent".

  - Optimize xdl_mark_ignorable_regex() by making it immediately skip
    changes already marked as ignored by xdl_mark_ignorable_lines().

  - Fix coding style issue in the prototype part of the definition of
    xdl_mark_ignorable_regex().

  - Add "/* see Documentation/diff-options.txt */" comments for the
    fields added to struct diff_options and xpparam_t, mimicking the
    comments used for 'anchors', 'anchors_nr', and 'anchors_alloc'.

  - Revise commit log messages to reflect all of the above.

Michał Kępień (2):
  merge-base, xdiff: zero out xpparam_t structures
  diff: add -I<regex> that ignores matching changes

 Documentation/diff-options.txt |  5 ++++
 builtin/merge-tree.c           |  1 +
 diff.c                         | 28 ++++++++++++++++++++
 diff.h                         |  4 +++
 t/t4013-diff-various.sh        | 33 ++++++++++++++++++++++++
 xdiff/xdiff.h                  |  4 +++
 xdiff/xdiffi.c                 | 47 ++++++++++++++++++++++++++++++++--
 xdiff/xhistogram.c             |  2 ++
 xdiff/xpatience.c              |  2 ++
 9 files changed, 124 insertions(+), 2 deletions(-)

-- 
2.28.0


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH v3 1/2] merge-base, xdiff: zero out xpparam_t structures
  2020-10-15  7:24   ` [PATCH v3 0/2] diff: add -I<regex> that ignores matching changes Michał Kępień
@ 2020-10-15  7:24     ` Michał Kępień
  2020-10-15  7:24     ` [PATCH v3 2/2] diff: add -I<regex> that ignores matching changes Michał Kępień
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 57+ messages in thread
From: Michał Kępień @ 2020-10-15  7:24 UTC (permalink / raw)
  To: git

xpparam_t structures are usually zero-initialized before their specific
fields are assigned to, but there are three locations in the tree where
that does not happen.  Add the missing memset() calls in order to make
initialization of xpparam_t structures consistent tree-wide and to
prevent stack garbage from being used as field values.

Signed-off-by: Michał Kępień <michal@isc.org>
---
 builtin/merge-tree.c | 1 +
 xdiff/xhistogram.c   | 2 ++
 xdiff/xpatience.c    | 2 ++
 3 files changed, 5 insertions(+)

diff --git a/builtin/merge-tree.c b/builtin/merge-tree.c
index e72714a5a8..de8520778d 100644
--- a/builtin/merge-tree.c
+++ b/builtin/merge-tree.c
@@ -109,6 +109,7 @@ static void show_diff(struct merge_list *entry)
 	xdemitconf_t xecfg;
 	xdemitcb_t ecb;
 
+	memset(&xpp, 0, sizeof(xpp));
 	xpp.flags = 0;
 	memset(&xecfg, 0, sizeof(xecfg));
 	xecfg.ctxlen = 3;
diff --git a/xdiff/xhistogram.c b/xdiff/xhistogram.c
index c7b35a9667..e694bfd9e3 100644
--- a/xdiff/xhistogram.c
+++ b/xdiff/xhistogram.c
@@ -235,6 +235,8 @@ static int fall_back_to_classic_diff(xpparam_t const *xpp, xdfenv_t *env,
 		int line1, int count1, int line2, int count2)
 {
 	xpparam_t xpparam;
+
+	memset(&xpparam, 0, sizeof(xpparam));
 	xpparam.flags = xpp->flags & ~XDF_DIFF_ALGORITHM_MASK;
 
 	return xdl_fall_back_diff(env, &xpparam,
diff --git a/xdiff/xpatience.c b/xdiff/xpatience.c
index 3c5601b602..20699a6f60 100644
--- a/xdiff/xpatience.c
+++ b/xdiff/xpatience.c
@@ -318,6 +318,8 @@ static int fall_back_to_classic_diff(struct hashmap *map,
 		int line1, int count1, int line2, int count2)
 {
 	xpparam_t xpp;
+
+	memset(&xpp, 0, sizeof(xpp));
 	xpp.flags = map->xpp->flags & ~XDF_DIFF_ALGORITHM_MASK;
 
 	return xdl_fall_back_diff(map->env, &xpp,
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v3 2/2] diff: add -I<regex> that ignores matching changes
  2020-10-15  7:24   ` [PATCH v3 0/2] diff: add -I<regex> that ignores matching changes Michał Kępień
  2020-10-15  7:24     ` [PATCH v3 1/2] merge-base, xdiff: zero out xpparam_t structures Michał Kępień
@ 2020-10-15  7:24     ` Michał Kępień
  2020-10-16 15:32       ` Phillip Wood
  2020-10-16 18:16       ` Junio C Hamano
  2020-10-16 10:00     ` [PATCH v3 0/2] " Johannes Schindelin
  2020-10-20  6:48     ` [PATCH v4 " Michał Kępień
  3 siblings, 2 replies; 57+ messages in thread
From: Michał Kępień @ 2020-10-15  7:24 UTC (permalink / raw)
  To: git

Add a new diff option that enables ignoring changes whose all lines
(changed, removed, and added) match a given regular expression.  This is
similar to the -I/--ignore-matching-lines option in standalone diff
utilities and can be used e.g. to ignore changes which only affect code
comments or to look for unrelated changes in commits containing a large
number of automatically applied modifications (e.g. a tree-wide string
replacement).  The difference between -G/-S and the new -I option is
that the latter filters output on a per-change basis.

Use the 'ignore' field of xdchange_t for marking a change as ignored or
not.  Since the same field is used by --ignore-blank-lines, identical
hunk emitting rules apply for --ignore-blank-lines and -I.  These two
options can also be used together in the same git invocation (they are
complementary to each other).

Rename xdl_mark_ignorable() to xdl_mark_ignorable_lines(), to indicate
that it is logically a "sibling" of xdl_mark_ignorable_regex() rather
than its "parent".

Signed-off-by: Michał Kępień <michal@isc.org>
---
 Documentation/diff-options.txt |  5 ++++
 diff.c                         | 28 ++++++++++++++++++++
 diff.h                         |  4 +++
 t/t4013-diff-various.sh        | 33 ++++++++++++++++++++++++
 xdiff/xdiff.h                  |  4 +++
 xdiff/xdiffi.c                 | 47 ++++++++++++++++++++++++++++++++--
 6 files changed, 119 insertions(+), 2 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 573fb9bb71..ee52b65e46 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -687,6 +687,11 @@ endif::git-format-patch[]
 --ignore-blank-lines::
 	Ignore changes whose lines are all blank.
 
+-I<regex>::
+--ignore-matching-lines=<regex>::
+	Ignore changes whose all lines match <regex>.  This option may
+	be specified more than once.
+
 --inter-hunk-context=<lines>::
 	Show the context between diff hunks, up to the specified number
 	of lines, thereby fusing hunks that are close to each other.
diff --git a/diff.c b/diff.c
index 2bb2f8f57e..677de23352 100644
--- a/diff.c
+++ b/diff.c
@@ -3587,6 +3587,8 @@ static void builtin_diff(const char *name_a,
 		if (header.len && !o->flags.suppress_diff_headers)
 			ecbdata.header = &header;
 		xpp.flags = o->xdl_opts;
+		xpp.ignore_regex = o->ignore_regex;
+		xpp.ignore_regex_nr = o->ignore_regex_nr;
 		xpp.anchors = o->anchors;
 		xpp.anchors_nr = o->anchors_nr;
 		xecfg.ctxlen = o->context;
@@ -3716,6 +3718,8 @@ static void builtin_diffstat(const char *name_a, const char *name_b,
 		memset(&xpp, 0, sizeof(xpp));
 		memset(&xecfg, 0, sizeof(xecfg));
 		xpp.flags = o->xdl_opts;
+		xpp.ignore_regex = o->ignore_regex;
+		xpp.ignore_regex_nr = o->ignore_regex_nr;
 		xpp.anchors = o->anchors;
 		xpp.anchors_nr = o->anchors_nr;
 		xecfg.ctxlen = o->context;
@@ -5203,6 +5207,22 @@ static int diff_opt_patience(const struct option *opt,
 	return 0;
 }
 
+static int diff_opt_ignore_regex(const struct option *opt,
+				 const char *arg, int unset)
+{
+	struct diff_options *options = opt->value;
+	regex_t *regex;
+
+	BUG_ON_OPT_NEG(unset);
+	regex = xmalloc(sizeof(*regex));
+	if (regcomp(regex, arg, REG_EXTENDED | REG_NEWLINE))
+		return error(_("invalid regex given to -I: '%s'"), arg);
+	ALLOC_GROW(options->ignore_regex, options->ignore_regex_nr + 1,
+		   options->ignore_regex_alloc);
+	options->ignore_regex[options->ignore_regex_nr++] = regex;
+	return 0;
+}
+
 static int diff_opt_pickaxe_regex(const struct option *opt,
 				  const char *arg, int unset)
 {
@@ -5491,6 +5511,9 @@ static void prep_parse_options(struct diff_options *options)
 		OPT_BIT_F(0, "ignore-blank-lines", &options->xdl_opts,
 			  N_("ignore changes whose lines are all blank"),
 			  XDF_IGNORE_BLANK_LINES, PARSE_OPT_NONEG),
+		OPT_CALLBACK_F('I', "ignore-matching-lines", options, N_("<regex>"),
+			       N_("ignore changes whose all lines match <regex>"),
+			       0, diff_opt_ignore_regex),
 		OPT_BIT(0, "indent-heuristic", &options->xdl_opts,
 			N_("heuristic to shift diff hunk boundaries for easy reading"),
 			XDF_INDENT_HEURISTIC),
@@ -6405,6 +6428,11 @@ void diff_flush(struct diff_options *options)
 	DIFF_QUEUE_CLEAR(q);
 	if (options->close_file)
 		fclose(options->file);
+	for (i = 0; i < options->ignore_regex_nr; i++) {
+		regfree(options->ignore_regex[i]);
+		free(options->ignore_regex[i]);
+	}
+	free(options->ignore_regex);
 
 	/*
 	 * Report the content-level differences with HAS_CHANGES;
diff --git a/diff.h b/diff.h
index 11de52e9e9..a402227b80 100644
--- a/diff.h
+++ b/diff.h
@@ -234,6 +234,10 @@ struct diff_options {
 	 */
 	const char *pickaxe;
 
+	/* -I<regex> */
+	regex_t **ignore_regex;
+	size_t ignore_regex_nr, ignore_regex_alloc;
+
 	const char *single_follow;
 	const char *a_prefix, *b_prefix;
 	const char *line_prefix;
diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
index 5c7b0122b4..efaaee2ef0 100755
--- a/t/t4013-diff-various.sh
+++ b/t/t4013-diff-various.sh
@@ -6,6 +6,7 @@
 test_description='Various diff formatting options'
 
 . ./test-lib.sh
+. "$TEST_DIRECTORY"/diff-lib.sh
 
 test_expect_success setup '
 
@@ -473,4 +474,36 @@ test_expect_success 'diff-tree --stdin with log formatting' '
 	test_cmp expect actual
 '
 
+test_expect_success 'diff -I<regex>' '
+	test_seq 50 >I.txt &&
+	sed -e "s/13/ten and three/" -e "/7\$/d" <I.txt >J.txt &&
+	echo >>J.txt &&
+
+	test_expect_code 1 git diff --no-index --ignore-blank-lines -I"ten.*e" -I"^[124-9]" I.txt J.txt >actual &&
+	cat >expect <<-\EOF &&
+	diff --git a/I.txt b/J.txt
+	--- a/I.txt
+	+++ b/J.txt
+	@@ -34,7 +31,6 @@
+	 34
+	 35
+	 36
+	-37
+	 38
+	 39
+	 40
+	EOF
+	compare_diff_patch expect actual &&
+
+	test_expect_code 1 git diff --stat --no-index --ignore-blank-lines -I"ten.*e" -I"^[124-9]" I.txt J.txt >actual &&
+	cat >expect <<-\EOF &&
+	 I.txt => J.txt | 1 -
+	 1 file changed, 1 deletion(-)
+	EOF
+	test_cmp expect actual &&
+
+	test_expect_code 129 git diff --no-index --ignore-matching-lines="^[124-9]" --ignore-matching-lines="^[124-9" I.txt J.txt >output 2>&1 &&
+	test_i18ngrep "invalid regex given to -I: " output
+'
+
 test_done
diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index 032e3a9f41..7a04605146 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -79,6 +79,10 @@ typedef struct s_mmbuffer {
 typedef struct s_xpparam {
 	unsigned long flags;
 
+	/* -I<regex> */
+	regex_t **ignore_regex;
+	size_t ignore_regex_nr;
+
 	/* See Documentation/diff-options.txt. */
 	char **anchors;
 	size_t anchors_nr;
diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index bd035139f9..380eb728ed 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -998,7 +998,7 @@ static int xdl_call_hunk_func(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
 	return 0;
 }
 
-static void xdl_mark_ignorable(xdchange_t *xscr, xdfenv_t *xe, long flags)
+static void xdl_mark_ignorable_lines(xdchange_t *xscr, xdfenv_t *xe, long flags)
 {
 	xdchange_t *xch;
 
@@ -1019,6 +1019,46 @@ static void xdl_mark_ignorable(xdchange_t *xscr, xdfenv_t *xe, long flags)
 	}
 }
 
+static int record_matches_regex(xrecord_t *rec, xpparam_t const *xpp) {
+	regmatch_t regmatch;
+	int i;
+
+	for (i = 0; i < xpp->ignore_regex_nr; i++)
+		if (!regexec_buf(xpp->ignore_regex[i], rec->ptr, rec->size, 1,
+				 &regmatch, 0))
+			return 1;
+
+	return 0;
+}
+
+static void xdl_mark_ignorable_regex(xdchange_t *xscr, const xdfenv_t *xe,
+				     xpparam_t const *xpp)
+{
+	xdchange_t *xch;
+
+	for (xch = xscr; xch; xch = xch->next) {
+		xrecord_t **rec;
+		int ignore = 1;
+		long i;
+
+		/*
+		 * Do not override --ignore-blank-lines.
+		 */
+		if (xch->ignore)
+			continue;
+
+		rec = &xe->xdf1.recs[xch->i1];
+		for (i = 0; i < xch->chg1 && ignore; i++)
+			ignore = record_matches_regex(rec[i], xpp);
+
+		rec = &xe->xdf2.recs[xch->i2];
+		for (i = 0; i < xch->chg2 && ignore; i++)
+			ignore = record_matches_regex(rec[i], xpp);
+
+		xch->ignore = ignore;
+	}
+}
+
 int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
 	     xdemitconf_t const *xecfg, xdemitcb_t *ecb) {
 	xdchange_t *xscr;
@@ -1038,7 +1078,10 @@ int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
 	}
 	if (xscr) {
 		if (xpp->flags & XDF_IGNORE_BLANK_LINES)
-			xdl_mark_ignorable(xscr, &xe, xpp->flags);
+			xdl_mark_ignorable_lines(xscr, &xe, xpp->flags);
+
+		if (xpp->ignore_regex)
+			xdl_mark_ignorable_regex(xscr, &xe, xpp);
 
 		if (ef(&xe, xscr, ecb, xecfg) < 0) {
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 3/3] t: add -I<regex> tests
  2020-10-13 19:01           ` Michał Kępień
@ 2020-10-15 11:45             ` Johannes Schindelin
  0 siblings, 0 replies; 57+ messages in thread
From: Johannes Schindelin @ 2020-10-15 11:45 UTC (permalink / raw)
  To: Michał Kępień; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 1146 bytes --]

Hi Michał,

On Tue, 13 Oct 2020, Michał Kępień wrote:

> > > First, let me say that the goal of minimizing the run time of a test
> > > suite is close to my heart (it is an issue at my day job).  Yet, I
> > > assumed that this new test would not be detrimental to test suite run
> > > times as it takes about half a second to run t4069-diff-ignore-regex.sh
> > > on my machine - and (I hope) its contents are in line with the "tests
> > > are the best documentation" proverb.
> >
> > Sadly, the test is not quite as fast on Windows. I just ran this (on a not
> > quite idle machine, admittedly) and it ended in this:
> >
> > 	# passed all 11 test(s)
> > 	1..11
> >
> > 	real    0m51.470s
> > 	user    0m0.046s
> > 	sys     0m0.015s
> >
> > Yes, that's almost a minute.
>
> Out of curiosity: is that under Cygwin?  I have seen shell-based tests
> finishing in 15 *seconds* on Unix-like systems and in 15 *minutes* under
> Cygwin, which would be in line with your measurements provided above.

Close enough: it's an MSYS2 Bash (and MSYS2 is a close fork of Cygwin). It
is what Git for Windows uses.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 0/2] diff: add -I<regex> that ignores matching changes
  2020-10-15  7:24   ` [PATCH v3 0/2] diff: add -I<regex> that ignores matching changes Michał Kępień
  2020-10-15  7:24     ` [PATCH v3 1/2] merge-base, xdiff: zero out xpparam_t structures Michał Kępień
  2020-10-15  7:24     ` [PATCH v3 2/2] diff: add -I<regex> that ignores matching changes Michał Kępień
@ 2020-10-16 10:00     ` Johannes Schindelin
  2020-10-20  6:48     ` [PATCH v4 " Michał Kępień
  3 siblings, 0 replies; 57+ messages in thread
From: Johannes Schindelin @ 2020-10-16 10:00 UTC (permalink / raw)
  To: Michał Kępień; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 1831 bytes --]

Hi Michał,

On Thu, 15 Oct 2020, Michał Kępień wrote:

> This patch series adds a new diff option that enables ignoring changes
> whose all lines (changed, removed, and added) match a given regular
> expression.  This is similar to the -I/--ignore-matching-lines option in
> standalone diff utilities and can be used e.g. to ignore changes which
> only affect code comments or to look for unrelated changes in commits
> containing a large number of automatically applied modifications (e.g. a
> tree-wide string replacement).  The difference between -G/-S and the new
> -I option is that the latter filters output on a per-change basis.
>
> Changes from v2:
>
>   - Add a long option for -I (--ignore-matching-lines) as it is
>     commonplace in standalone diff utilities.  Update documentation and
>     commit log messages accordingly.
>
>   - Use xmalloc() instead of xcalloc() for allocating regex_t
>     structures in diff_opt_ignore_regex().
>
>   - Ensure the memory allocated in diff_opt_ignore_regex() gets
>     released.
>
>   - Use "return error(...)" instead of die() in the -I option callback.
>     Make the relevant error message localizable.
>
>   - Drastically reduce the number of -I<regex> tests due to excessive
>     run time of t/t4069-diff-ignore-regex.sh from v1/v2 on some
>     platforms (notably Windows).  Use a tweaked version of a test
>     suggested by Johannes Schindelin (thanks!).  Given its reduction in
>     size, squash patch 3 (which contained the tests) into patch 2.
>
>   - Replace "see Documentation/diff-options.txt" with "-I<regex>" in the
>     comments for the added structure fields, in order to make these
>     comments more useful.

Thank you for this diligent work! I looked over the patches and like them
a lot!

Thanks,
Dscho

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 2/2] diff: add -I<regex> that ignores matching changes
  2020-10-15  7:24     ` [PATCH v3 2/2] diff: add -I<regex> that ignores matching changes Michał Kępień
@ 2020-10-16 15:32       ` Phillip Wood
  2020-10-16 18:04         ` Junio C Hamano
  2020-10-16 18:16       ` Junio C Hamano
  1 sibling, 1 reply; 57+ messages in thread
From: Phillip Wood @ 2020-10-16 15:32 UTC (permalink / raw)
  To: Michał Kępień, git

Hi Michał

Thanks for working on this, it will be a useful addition. Unfortunately 
there's a use-after-free error see below

On 15/10/2020 08:24, Michał Kępień wrote:
> Add a new diff option that enables ignoring changes whose all lines
> (changed, removed, and added) match a given regular expression.  This is
> similar to the -I/--ignore-matching-lines option in standalone diff
> utilities and can be used e.g. to ignore changes which only affect code
> comments or to look for unrelated changes in commits containing a large
> number of automatically applied modifications (e.g. a tree-wide string
> replacement).  The difference between -G/-S and the new -I option is
> that the latter filters output on a per-change basis.
> 
> Use the 'ignore' field of xdchange_t for marking a change as ignored or
> not.  Since the same field is used by --ignore-blank-lines, identical
> hunk emitting rules apply for --ignore-blank-lines and -I.  These two
> options can also be used together in the same git invocation (they are
> complementary to each other).
> 
> Rename xdl_mark_ignorable() to xdl_mark_ignorable_lines(), to indicate
> that it is logically a "sibling" of xdl_mark_ignorable_regex() rather
> than its "parent".
> 
> Signed-off-by: Michał Kępień <michal@isc.org>
> ---
>   Documentation/diff-options.txt |  5 ++++
>   diff.c                         | 28 ++++++++++++++++++++
>   diff.h                         |  4 +++
>   t/t4013-diff-various.sh        | 33 ++++++++++++++++++++++++
>   xdiff/xdiff.h                  |  4 +++
>   xdiff/xdiffi.c                 | 47 ++++++++++++++++++++++++++++++++--
>   6 files changed, 119 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> index 573fb9bb71..ee52b65e46 100644
> --- a/Documentation/diff-options.txt
> +++ b/Documentation/diff-options.txt
> @@ -687,6 +687,11 @@ endif::git-format-patch[]
>   --ignore-blank-lines::
>   	Ignore changes whose lines are all blank.
>   
> +-I<regex>::
> +--ignore-matching-lines=<regex>::
> +	Ignore changes whose all lines match <regex>.  This option may
> +	be specified more than once.
> +
>   --inter-hunk-context=<lines>::
>   	Show the context between diff hunks, up to the specified number
>   	of lines, thereby fusing hunks that are close to each other.
> diff --git a/diff.c b/diff.c
> index 2bb2f8f57e..677de23352 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -3587,6 +3587,8 @@ static void builtin_diff(const char *name_a,
>   		if (header.len && !o->flags.suppress_diff_headers)
>   			ecbdata.header = &header;
>   		xpp.flags = o->xdl_opts;
> +		xpp.ignore_regex = o->ignore_regex;
> +		xpp.ignore_regex_nr = o->ignore_regex_nr;
>   		xpp.anchors = o->anchors;
>   		xpp.anchors_nr = o->anchors_nr;
>   		xecfg.ctxlen = o->context;
> @@ -3716,6 +3718,8 @@ static void builtin_diffstat(const char *name_a, const char *name_b,
>   		memset(&xpp, 0, sizeof(xpp));
>   		memset(&xecfg, 0, sizeof(xecfg));
>   		xpp.flags = o->xdl_opts;
> +		xpp.ignore_regex = o->ignore_regex;
> +		xpp.ignore_regex_nr = o->ignore_regex_nr;
>   		xpp.anchors = o->anchors;
>   		xpp.anchors_nr = o->anchors_nr;
>   		xecfg.ctxlen = o->context;
> @@ -5203,6 +5207,22 @@ static int diff_opt_patience(const struct option *opt,
>   	return 0;
>   }
>   
> +static int diff_opt_ignore_regex(const struct option *opt,
> +				 const char *arg, int unset)
> +{
> +	struct diff_options *options = opt->value;
> +	regex_t *regex;
> +
> +	BUG_ON_OPT_NEG(unset);
> +	regex = xmalloc(sizeof(*regex));
> +	if (regcomp(regex, arg, REG_EXTENDED | REG_NEWLINE))
> +		return error(_("invalid regex given to -I: '%s'"), arg);
> +	ALLOC_GROW(options->ignore_regex, options->ignore_regex_nr + 1,
> +		   options->ignore_regex_alloc);
> +	options->ignore_regex[options->ignore_regex_nr++] = regex;
> +	return 0;
> +}
> +
>   static int diff_opt_pickaxe_regex(const struct option *opt,
>   				  const char *arg, int unset)
>   {
> @@ -5491,6 +5511,9 @@ static void prep_parse_options(struct diff_options *options)
>   		OPT_BIT_F(0, "ignore-blank-lines", &options->xdl_opts,
>   			  N_("ignore changes whose lines are all blank"),
>   			  XDF_IGNORE_BLANK_LINES, PARSE_OPT_NONEG),
> +		OPT_CALLBACK_F('I', "ignore-matching-lines", options, N_("<regex>"),
> +			       N_("ignore changes whose all lines match <regex>"),
> +			       0, diff_opt_ignore_regex),
>   		OPT_BIT(0, "indent-heuristic", &options->xdl_opts,
>   			N_("heuristic to shift diff hunk boundaries for easy reading"),
>   			XDF_INDENT_HEURISTIC),
> @@ -6405,6 +6428,11 @@ void diff_flush(struct diff_options *options)
>   	DIFF_QUEUE_CLEAR(q);
>   	if (options->close_file)
>   		fclose(options->file);
> +	for (i = 0; i < options->ignore_regex_nr; i++) {
> +		regfree(options->ignore_regex[i]);
> +		free(options->ignore_regex[i]);
> +	}
> +	free(options->ignore_regex);

If I run `git log -p -I foo` then the address sanitizer reports

AddressSanitizer: heap-use-after-free xdiff/xdiffi.c:1027 in 
record_matches_regex

after it has printed the diff for the first commit. I think freeing the 
regex here is the cause of the problem.

Best Wishes

Phillip

>   	/*
>   	 * Report the content-level differences with HAS_CHANGES;
> diff --git a/diff.h b/diff.h
> index 11de52e9e9..a402227b80 100644
> --- a/diff.h
> +++ b/diff.h
> @@ -234,6 +234,10 @@ struct diff_options {
>   	 */
>   	const char *pickaxe;
>   
> +	/* -I<regex> */
> +	regex_t **ignore_regex;
> +	size_t ignore_regex_nr, ignore_regex_alloc;
> +
>   	const char *single_follow;
>   	const char *a_prefix, *b_prefix;
>   	const char *line_prefix;
> diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
> index 5c7b0122b4..efaaee2ef0 100755
> --- a/t/t4013-diff-various.sh
> +++ b/t/t4013-diff-various.sh
> @@ -6,6 +6,7 @@
>   test_description='Various diff formatting options'
>   
>   . ./test-lib.sh
> +. "$TEST_DIRECTORY"/diff-lib.sh
>   
>   test_expect_success setup '
>   
> @@ -473,4 +474,36 @@ test_expect_success 'diff-tree --stdin with log formatting' '
>   	test_cmp expect actual
>   '
>   
> +test_expect_success 'diff -I<regex>' '
> +	test_seq 50 >I.txt &&
> +	sed -e "s/13/ten and three/" -e "/7\$/d" <I.txt >J.txt &&
> +	echo >>J.txt &&
> +
> +	test_expect_code 1 git diff --no-index --ignore-blank-lines -I"ten.*e" -I"^[124-9]" I.txt J.txt >actual &&
> +	cat >expect <<-\EOF &&
> +	diff --git a/I.txt b/J.txt
> +	--- a/I.txt
> +	+++ b/J.txt
> +	@@ -34,7 +31,6 @@
> +	 34
> +	 35
> +	 36
> +	-37
> +	 38
> +	 39
> +	 40
> +	EOF
> +	compare_diff_patch expect actual &&
> +
> +	test_expect_code 1 git diff --stat --no-index --ignore-blank-lines -I"ten.*e" -I"^[124-9]" I.txt J.txt >actual &&
> +	cat >expect <<-\EOF &&
> +	 I.txt => J.txt | 1 -
> +	 1 file changed, 1 deletion(-)
> +	EOF
> +	test_cmp expect actual &&
> +
> +	test_expect_code 129 git diff --no-index --ignore-matching-lines="^[124-9]" --ignore-matching-lines="^[124-9" I.txt J.txt >output 2>&1 &&
> +	test_i18ngrep "invalid regex given to -I: " output
> +'
> +
>   test_done
> diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
> index 032e3a9f41..7a04605146 100644
> --- a/xdiff/xdiff.h
> +++ b/xdiff/xdiff.h
> @@ -79,6 +79,10 @@ typedef struct s_mmbuffer {
>   typedef struct s_xpparam {
>   	unsigned long flags;
>   
> +	/* -I<regex> */
> +	regex_t **ignore_regex;
> +	size_t ignore_regex_nr;
> +
>   	/* See Documentation/diff-options.txt. */
>   	char **anchors;
>   	size_t anchors_nr;
> diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
> index bd035139f9..380eb728ed 100644
> --- a/xdiff/xdiffi.c
> +++ b/xdiff/xdiffi.c
> @@ -998,7 +998,7 @@ static int xdl_call_hunk_func(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
>   	return 0;
>   }
>   
> -static void xdl_mark_ignorable(xdchange_t *xscr, xdfenv_t *xe, long flags)
> +static void xdl_mark_ignorable_lines(xdchange_t *xscr, xdfenv_t *xe, long flags)
>   {
>   	xdchange_t *xch;
>   
> @@ -1019,6 +1019,46 @@ static void xdl_mark_ignorable(xdchange_t *xscr, xdfenv_t *xe, long flags)
>   	}
>   }
>   
> +static int record_matches_regex(xrecord_t *rec, xpparam_t const *xpp) {
> +	regmatch_t regmatch;
> +	int i;
> +
> +	for (i = 0; i < xpp->ignore_regex_nr; i++)
> +		if (!regexec_buf(xpp->ignore_regex[i], rec->ptr, rec->size, 1,
> +				 &regmatch, 0))
> +			return 1;
> +
> +	return 0;
> +}
> +
> +static void xdl_mark_ignorable_regex(xdchange_t *xscr, const xdfenv_t *xe,
> +				     xpparam_t const *xpp)
> +{
> +	xdchange_t *xch;
> +
> +	for (xch = xscr; xch; xch = xch->next) {
> +		xrecord_t **rec;
> +		int ignore = 1;
> +		long i;
> +
> +		/*
> +		 * Do not override --ignore-blank-lines.
> +		 */
> +		if (xch->ignore)
> +			continue;
> +
> +		rec = &xe->xdf1.recs[xch->i1];
> +		for (i = 0; i < xch->chg1 && ignore; i++)
> +			ignore = record_matches_regex(rec[i], xpp);
> +
> +		rec = &xe->xdf2.recs[xch->i2];
> +		for (i = 0; i < xch->chg2 && ignore; i++)
> +			ignore = record_matches_regex(rec[i], xpp);
> +
> +		xch->ignore = ignore;
> +	}
> +}
> +
>   int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
>   	     xdemitconf_t const *xecfg, xdemitcb_t *ecb) {
>   	xdchange_t *xscr;
> @@ -1038,7 +1078,10 @@ int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
>   	}
>   	if (xscr) {
>   		if (xpp->flags & XDF_IGNORE_BLANK_LINES)
> -			xdl_mark_ignorable(xscr, &xe, xpp->flags);
> +			xdl_mark_ignorable_lines(xscr, &xe, xpp->flags);
> +
> +		if (xpp->ignore_regex)
> +			xdl_mark_ignorable_regex(xscr, &xe, xpp);
>   
>   		if (ef(&xe, xscr, ecb, xecfg) < 0) {
>   
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 2/2] diff: add -I<regex> that ignores matching changes
  2020-10-16 15:32       ` Phillip Wood
@ 2020-10-16 18:04         ` Junio C Hamano
  2020-10-19  9:48           ` Michał Kępień
  0 siblings, 1 reply; 57+ messages in thread
From: Junio C Hamano @ 2020-10-16 18:04 UTC (permalink / raw)
  To: Phillip Wood; +Cc: Michał Kępień, git

Phillip Wood <phillip.wood123@gmail.com> writes:

>> @@ -6405,6 +6428,11 @@ void diff_flush(struct diff_options *options)
>>   	DIFF_QUEUE_CLEAR(q);
>>   	if (options->close_file)
>>   		fclose(options->file);
>> +	for (i = 0; i < options->ignore_regex_nr; i++) {
>> +		regfree(options->ignore_regex[i]);
>> +		free(options->ignore_regex[i]);
>> +	}
>> +	free(options->ignore_regex);
>
> If I run `git log -p -I foo` then the address sanitizer reports
>
> AddressSanitizer: heap-use-after-free xdiff/xdiffi.c:1027 in
> record_matches_regex
>
> after it has printed the diff for the first commit. I think freeing
> the regex here is the cause of the problem.

Yes, this is a good location to clear the diff_queue, which is
per-invocation.  But diff_options->ignore_regex[] can and should be
shared across multiple invocations of the diff machinery, as we are
parsing upfront in the command line option parser just once to be
used throughout the life of the program.  It is wrong to free them
early like this.

I really hate to suggest this, one common API call made into the
diff machinery from various consumers, when they are absolutely done
with diff_options, is probably diff_result_code().  Its purpose is
to collect bits computed to participate in the overall result into
the final result code and anybody who ran one or more diff and wants
to learn the outcome must call it, so it is unlikely that callers
that matter are missing a call to finalize their uses of the diff
machinery.  So if freeing .ignore_regex[] is really necessary, it
could be used to tell us where to do so [*1*].

Unlike per-invocation resources that MUST be freed for repeated
invocations of the diff machinery made in "git log" family,
resources held for and repeatedly used during the entire session
like this may not have to be freed, to be honest, though.

Thanks.


[Footnote]

*1* Adding calls to free() in diff_result_code() directly is
    probably not a good idea.  Some callers may not even need result
    code (e.g.  "blame") and do not call it at all, but we may want
    to free the resource in diff_options that is not per-invocation.
    So if we were to do this, we probably need a new API function,
    perhaps diff_options_clear() or something, and call that from
    diff_result_code(), and then find callers that do not care about
    the result code and call diff_options_clear() directly from
    them.

    If a caller does a single diff_setup_done(), makes repeated
    calls to the diff machinery, and calls diff_result_code() once
    per each iteration (e.g. imagine "git log -p" that detects the
    status for each commit), then having the diff_options_clear()
    call in diff_result_code() will break the machinery, so if we
    were to follow the outline given in the previous paragraph, we
    need to make sure there is no such caller.  But I see that
    diff_result_code() does not clear the fields it looks at in the
    diff_options fields after it uses them, so the second and
    subsequent calls into the diff machinery by such a caller may
    already be broken (not in the "resource leakage" sense, but in
    the correctness sense).
    

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 2/2] diff: add -I<regex> that ignores matching changes
  2020-10-15  7:24     ` [PATCH v3 2/2] diff: add -I<regex> that ignores matching changes Michał Kępień
  2020-10-16 15:32       ` Phillip Wood
@ 2020-10-16 18:16       ` Junio C Hamano
  2020-10-19  9:55         ` Michał Kępień
  1 sibling, 1 reply; 57+ messages in thread
From: Junio C Hamano @ 2020-10-16 18:16 UTC (permalink / raw)
  To: Michał Kępień; +Cc: git

Michał Kępień <michal@isc.org> writes:

> +test_expect_success 'diff -I<regex>' '
> +	test_seq 50 >I.txt &&
> +	sed -e "s/13/ten and three/" -e "/7\$/d" <I.txt >J.txt &&
> +	echo >>J.txt &&
> +
> +	test_expect_code 1 git diff --no-index --ignore-blank-lines -I"ten.*e" -I"^[124-9]" I.txt J.txt >actual &&

Please wrap an overly long line like this oned, perhaps like:

	test_expect_code 1 git diff --no-index --ignore-blank-lines \
		-I"ten.*e" -I"^[124-9]" I.txt J.txt >actual &&

> +	test_expect_code 1 git diff --stat --no-index --ignore-blank-lines -I"ten.*e" -I"^[124-9]" I.txt J.txt >actual &&

> +	test_expect_code 129 git diff --no-index --ignore-matching-lines="^[124-9]" --ignore-matching-lines="^[124-9" I.txt J.txt >output 2>&1 &&

Cramming unrelated tests into a single one made me puzzled, staring
at this line for longer than necessary before realizing that this is
an attempt to catch a malformed regexp.  If this were in a separate
test with its own title, e.g.

	test_expect_success 'diff -I<regex>: setup' '
		... set up the sample input, perhaps sample history ...
		... perhaps prepare the expected output in "expect" ...
	'

	test_expect_success 'diff -I<regex> -p' '
		git diff --I"ten.*e" ... >actual &&
		test_cmp expect actual
	'

	test_expect_success 'diff -I<regex> --stat' '
		git diff --stat --I"ten.*e" ... >actual &&
		test_cmp expect actual
	'

	test_expect_success 'diff -I<regex>: detect malformed regex'
		test_must_fail git diff -I="[^124-9" ... 2>error &&
		test_i18ngrep "invalid regex" error
	'
		
It would have been much easier to follow.

It also is curious that it only tests --no-index, which is a bolted
on feature that may not exercise the codepath the users depend on
working correctly.  If this were tested with "git log -p" for a few
commits, the early destruction of regexp may have been caught,
especially with ASAN build.

Other than that, and also the premature destruction of regexp
pointed out by Phillip already, looks reasonably done.

Thanks.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 2/2] diff: add -I<regex> that ignores matching changes
  2020-10-16 18:04         ` Junio C Hamano
@ 2020-10-19  9:48           ` Michał Kępień
  0 siblings, 0 replies; 57+ messages in thread
From: Michał Kępień @ 2020-10-19  9:48 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Phillip Wood, git

First, thanks to Phillip for testing this.

> Yes, this is a good location to clear the diff_queue, which is
> per-invocation.  But diff_options->ignore_regex[] can and should be
> shared across multiple invocations of the diff machinery, as we are
> parsing upfront in the command line option parser just once to be
> used throughout the life of the program.  It is wrong to free them
> early like this.

Ugh, sorry, I am only now starting to see the big picture.

> I really hate to suggest this, one common API call made into the
> diff machinery from various consumers, when they are absolutely done
> with diff_options, is probably diff_result_code().  Its purpose is
> to collect bits computed to participate in the overall result into
> the final result code and anybody who ran one or more diff and wants
> to learn the outcome must call it, so it is unlikely that callers
> that matter are missing a call to finalize their uses of the diff
> machinery.  So if freeing .ignore_regex[] is really necessary, it
> could be used to tell us where to do so [*1*].
> 
> Unlike per-invocation resources that MUST be freed for repeated
> invocations of the diff machinery made in "git log" family,
> resources held for and repeatedly used during the entire session
> like this may not have to be freed, to be honest, though.

After reading your message, taking a closer look, and doing a few
Valgrind runs, I tend to agree that perhaps there is no need to free the
'ignore_regex' array after all - it does *not* get allocated repeatedly
during each invocation of the diff machinery and when Git exits, the OS
will reclaim any memory allocated by the process anyway.  On my laptop,
a Valgrind run done for a -DSUPPRESS_ANNOTATED_LEAKS build claims that
'ignore_regex' is one of 1,300+ memory blocks still reachable at exit
time, which amount to about 2.5 MB of memory in total.  Given this, it
feels slightly weird to single 'ignore_regex' out.

I will revert the diff_flush() changes in v4 unless somebody disagrees
with the above reasoning.

-- 
Best regards,
Michał Kępień

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 2/2] diff: add -I<regex> that ignores matching changes
  2020-10-16 18:16       ` Junio C Hamano
@ 2020-10-19  9:55         ` Michał Kępień
  2020-10-19 17:29           ` Junio C Hamano
  0 siblings, 1 reply; 57+ messages in thread
From: Michał Kępień @ 2020-10-19  9:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

> Michał Kępień <michal@isc.org> writes:
> 
> > +test_expect_success 'diff -I<regex>' '
> > +	test_seq 50 >I.txt &&
> > +	sed -e "s/13/ten and three/" -e "/7\$/d" <I.txt >J.txt &&
> > +	echo >>J.txt &&
> > +
> > +	test_expect_code 1 git diff --no-index --ignore-blank-lines -I"ten.*e" -I"^[124-9]" I.txt J.txt >actual &&
> 
> Please wrap an overly long line like this oned, perhaps like:
> 
> 	test_expect_code 1 git diff --no-index --ignore-blank-lines \
> 		-I"ten.*e" -I"^[124-9]" I.txt J.txt >actual &&

Yeah, thanks, I was unsure what the line length limit for test code is.
I will fix this in v4.

> > +	test_expect_code 1 git diff --stat --no-index --ignore-blank-lines -I"ten.*e" -I"^[124-9]" I.txt J.txt >actual &&
> 
> > +	test_expect_code 129 git diff --no-index --ignore-matching-lines="^[124-9]" --ignore-matching-lines="^[124-9" I.txt J.txt >output 2>&1 &&
> 
> Cramming unrelated tests into a single one made me puzzled, staring
> at this line for longer than necessary before realizing that this is
> an attempt to catch a malformed regexp.  If this were in a separate
> test with its own title, e.g.
> 
> (...)
> 		
> It would have been much easier to follow.
> 
> It also is curious that it only tests --no-index, which is a bolted
> on feature that may not exercise the codepath the users depend on
> working correctly.  If this were tested with "git log -p" for a few
> commits, the early destruction of regexp may have been caught,
> especially with ASAN build.

Sorry, I think I got a bit carried away trying to use the test suggested
by Johannes while also extending it with the missing bits.  I will try
to come up with something more balanced in v4.

-- 
Best regards,
Michał Kępień

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 2/2] diff: add -I<regex> that ignores matching changes
  2020-10-19  9:55         ` Michał Kępień
@ 2020-10-19 17:29           ` Junio C Hamano
  0 siblings, 0 replies; 57+ messages in thread
From: Junio C Hamano @ 2020-10-19 17:29 UTC (permalink / raw)
  To: Michał Kępień; +Cc: git

Michał Kępień <michal@isc.org> writes:

>> Cramming unrelated tests into a single one made me puzzled, staring
>> at this line for longer than necessary before realizing that this is
>> an attempt to catch a malformed regexp.  If this were in a separate
>> test with its own title, e.g.
>> 		
>> It would have been much easier to follow.
> ... I will try
> to come up with something more balanced in v4.

We want to avoid going overboard and doing a full test matrix.  

But having it tested fully with one variant while testing just the
basic for other variants may give us a good balance, e.g. "git log
-p -I<regex>" is tested but we do not bother testing interactions
with other options", while we try different combinations when
testing "git diff --no-index -I<regex>", like multiple -I options
etc.

Thanks.



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH v4 0/2] diff: add -I<regex> that ignores matching changes
  2020-10-15  7:24   ` [PATCH v3 0/2] diff: add -I<regex> that ignores matching changes Michał Kępień
                       ` (2 preceding siblings ...)
  2020-10-16 10:00     ` [PATCH v3 0/2] " Johannes Schindelin
@ 2020-10-20  6:48     ` Michał Kępień
  2020-10-20  6:48       ` [PATCH v4 1/2] merge-base, xdiff: zero out xpparam_t structures Michał Kępień
                         ` (3 more replies)
  3 siblings, 4 replies; 57+ messages in thread
From: Michał Kępień @ 2020-10-20  6:48 UTC (permalink / raw)
  To: git

This patch series adds a new diff option that enables ignoring changes
whose all lines (changed, removed, and added) match a given regular
expression.  This is similar to the -I/--ignore-matching-lines option in
standalone diff utilities and can be used e.g. to ignore changes which
only affect code comments or to look for unrelated changes in commits
containing a large number of automatically applied modifications (e.g. a
tree-wide string replacement).  The difference between -G/-S and the new
-I option is that the latter filters output on a per-change basis.

Changes from v3:

  - Do not release the memory allocated in diff_opt_ignore_regex() after
    all as the free() calls added in v2 were triggering use-after-free
    errors when the diff machinery was invoked multiple times throughout
    the lifetime of a single Git command (e.g. by "git log -p").

  - Further test improvements: split various -I<regex> checks into
    multiple tests; add a check for "git log -p -I<regex>" to trigger
    the issue mentioned above; avoid using --no-index.

Changes from v2:

  - Add a long option for -I (--ignore-matching-lines) as it is
    commonplace in standalone diff utilities.  Update documentation and
    commit log messages accordingly.

  - Use xmalloc() instead of xcalloc() for allocating regex_t
    structures in diff_opt_ignore_regex().

  - Ensure the memory allocated in diff_opt_ignore_regex() gets
    released.

  - Use "return error(...)" instead of die() in the -I option callback.
    Make sure the error message is localizable.

  - Drastically reduce the number of -I<regex> tests due to excessive
    run time of t/t4069-diff-ignore-regex.sh from v1/v2 on some
    platforms (notably Windows).  Use a tweaked version of a test
    suggested by Johannes Schindelin (thanks!).  Squash patch 3 into
    patch 2.

  - Replace "see Documentation/diff-options.txt" with "-I<regex>" in the
    comments for the added structure fields, in order to make these
    comments more useful.

Changes from v1:

  - Add a new preliminary cleanup patch which ensures xpparam_t
    structures are always zero-initialized.  (This was a prerequisite
    for the next change below.)

  - Do not add a new 'xdl_opts' flag to check whether -I was used;
    instead, just check whether the array of regular expressions to
    match against is non-NULL.

  - Enable the -I option to be used multiple times.  As a consequence of
    this, regular expressions are now "pre-compiled" in the option's
    callback (and passed around as an array of regex_t structures)
    rather than deep down in xdiff code.  Add test cases exercising use
    of multiple -I options in the same git invocation.  Update
    documentation accordingly.

  - Rename xdl_mark_ignorable() to xdl_mark_ignorable_lines(), to
    indicate that it is logically a "sibling" of
    xdl_mark_ignorable_regex() rather than its "parent".

  - Optimize xdl_mark_ignorable_regex() by making it immediately skip
    changes already marked as ignored by xdl_mark_ignorable_lines().

  - Fix coding style issue in the prototype part of the definition of
    xdl_mark_ignorable_regex().

  - Add "/* see Documentation/diff-options.txt */" comments for the
    fields added to struct diff_options and xpparam_t, mimicking the
    comments used for 'anchors', 'anchors_nr', and 'anchors_alloc'.

  - Revise commit log messages to reflect all of the above.

Michał Kępień (2):
  merge-base, xdiff: zero out xpparam_t structures
  diff: add -I<regex> that ignores matching changes

 Documentation/diff-options.txt             |  5 ++
 builtin/merge-tree.c                       |  1 +
 diff.c                                     | 23 +++++
 diff.h                                     |  4 +
 t/t4013-diff-various.sh                    | 41 +++++++++
 t/t4013/diff.log_-IA_-IB_-I1_-I2_-p_master | 99 ++++++++++++++++++++++
 xdiff/xdiff.h                              |  4 +
 xdiff/xdiffi.c                             | 47 +++++++++-
 xdiff/xhistogram.c                         |  2 +
 xdiff/xpatience.c                          |  2 +
 10 files changed, 226 insertions(+), 2 deletions(-)
 create mode 100644 t/t4013/diff.log_-IA_-IB_-I1_-I2_-p_master

-- 
2.28.0


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH v4 1/2] merge-base, xdiff: zero out xpparam_t structures
  2020-10-20  6:48     ` [PATCH v4 " Michał Kępień
@ 2020-10-20  6:48       ` Michał Kępień
  2020-10-20  6:48       ` [PATCH v4 2/2] diff: add -I<regex> that ignores matching changes Michał Kępień
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 57+ messages in thread
From: Michał Kępień @ 2020-10-20  6:48 UTC (permalink / raw)
  To: git

xpparam_t structures are usually zero-initialized before their specific
fields are assigned to, but there are three locations in the tree where
that does not happen.  Add the missing memset() calls in order to make
initialization of xpparam_t structures consistent tree-wide and to
prevent stack garbage from being used as field values.

Signed-off-by: Michał Kępień <michal@isc.org>
---
 builtin/merge-tree.c | 1 +
 xdiff/xhistogram.c   | 2 ++
 xdiff/xpatience.c    | 2 ++
 3 files changed, 5 insertions(+)

diff --git a/builtin/merge-tree.c b/builtin/merge-tree.c
index e72714a5a8..de8520778d 100644
--- a/builtin/merge-tree.c
+++ b/builtin/merge-tree.c
@@ -109,6 +109,7 @@ static void show_diff(struct merge_list *entry)
 	xdemitconf_t xecfg;
 	xdemitcb_t ecb;
 
+	memset(&xpp, 0, sizeof(xpp));
 	xpp.flags = 0;
 	memset(&xecfg, 0, sizeof(xecfg));
 	xecfg.ctxlen = 3;
diff --git a/xdiff/xhistogram.c b/xdiff/xhistogram.c
index c7b35a9667..e694bfd9e3 100644
--- a/xdiff/xhistogram.c
+++ b/xdiff/xhistogram.c
@@ -235,6 +235,8 @@ static int fall_back_to_classic_diff(xpparam_t const *xpp, xdfenv_t *env,
 		int line1, int count1, int line2, int count2)
 {
 	xpparam_t xpparam;
+
+	memset(&xpparam, 0, sizeof(xpparam));
 	xpparam.flags = xpp->flags & ~XDF_DIFF_ALGORITHM_MASK;
 
 	return xdl_fall_back_diff(env, &xpparam,
diff --git a/xdiff/xpatience.c b/xdiff/xpatience.c
index 3c5601b602..20699a6f60 100644
--- a/xdiff/xpatience.c
+++ b/xdiff/xpatience.c
@@ -318,6 +318,8 @@ static int fall_back_to_classic_diff(struct hashmap *map,
 		int line1, int count1, int line2, int count2)
 {
 	xpparam_t xpp;
+
+	memset(&xpp, 0, sizeof(xpp));
 	xpp.flags = map->xpp->flags & ~XDF_DIFF_ALGORITHM_MASK;
 
 	return xdl_fall_back_diff(map->env, &xpp,
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v4 2/2] diff: add -I<regex> that ignores matching changes
  2020-10-20  6:48     ` [PATCH v4 " Michał Kępień
  2020-10-20  6:48       ` [PATCH v4 1/2] merge-base, xdiff: zero out xpparam_t structures Michał Kępień
@ 2020-10-20  6:48       ` Michał Kępień
  2021-02-05 14:13       ` [PATCH 1/2] diff: add an API for deferred freeing Ævar Arnfjörð Bjarmason
  2021-02-05 14:13       ` [PATCH " Ævar Arnfjörð Bjarmason
  3 siblings, 0 replies; 57+ messages in thread
From: Michał Kępień @ 2020-10-20  6:48 UTC (permalink / raw)
  To: git

Add a new diff option that enables ignoring changes whose all lines
(changed, removed, and added) match a given regular expression.  This is
similar to the -I/--ignore-matching-lines option in standalone diff
utilities and can be used e.g. to ignore changes which only affect code
comments or to look for unrelated changes in commits containing a large
number of automatically applied modifications (e.g. a tree-wide string
replacement).  The difference between -G/-S and the new -I option is
that the latter filters output on a per-change basis.

Use the 'ignore' field of xdchange_t for marking a change as ignored or
not.  Since the same field is used by --ignore-blank-lines, identical
hunk emitting rules apply for --ignore-blank-lines and -I.  These two
options can also be used together in the same git invocation (they are
complementary to each other).

Rename xdl_mark_ignorable() to xdl_mark_ignorable_lines(), to indicate
that it is logically a "sibling" of xdl_mark_ignorable_regex() rather
than its "parent".

Signed-off-by: Michał Kępień <michal@isc.org>
---
 Documentation/diff-options.txt             |  5 ++
 diff.c                                     | 23 +++++
 diff.h                                     |  4 +
 t/t4013-diff-various.sh                    | 41 +++++++++
 t/t4013/diff.log_-IA_-IB_-I1_-I2_-p_master | 99 ++++++++++++++++++++++
 xdiff/xdiff.h                              |  4 +
 xdiff/xdiffi.c                             | 47 +++++++++-
 7 files changed, 221 insertions(+), 2 deletions(-)
 create mode 100644 t/t4013/diff.log_-IA_-IB_-I1_-I2_-p_master

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 573fb9bb71..ee52b65e46 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -687,6 +687,11 @@ endif::git-format-patch[]
 --ignore-blank-lines::
 	Ignore changes whose lines are all blank.
 
+-I<regex>::
+--ignore-matching-lines=<regex>::
+	Ignore changes whose all lines match <regex>.  This option may
+	be specified more than once.
+
 --inter-hunk-context=<lines>::
 	Show the context between diff hunks, up to the specified number
 	of lines, thereby fusing hunks that are close to each other.
diff --git a/diff.c b/diff.c
index 2bb2f8f57e..d24f47df99 100644
--- a/diff.c
+++ b/diff.c
@@ -3587,6 +3587,8 @@ static void builtin_diff(const char *name_a,
 		if (header.len && !o->flags.suppress_diff_headers)
 			ecbdata.header = &header;
 		xpp.flags = o->xdl_opts;
+		xpp.ignore_regex = o->ignore_regex;
+		xpp.ignore_regex_nr = o->ignore_regex_nr;
 		xpp.anchors = o->anchors;
 		xpp.anchors_nr = o->anchors_nr;
 		xecfg.ctxlen = o->context;
@@ -3716,6 +3718,8 @@ static void builtin_diffstat(const char *name_a, const char *name_b,
 		memset(&xpp, 0, sizeof(xpp));
 		memset(&xecfg, 0, sizeof(xecfg));
 		xpp.flags = o->xdl_opts;
+		xpp.ignore_regex = o->ignore_regex;
+		xpp.ignore_regex_nr = o->ignore_regex_nr;
 		xpp.anchors = o->anchors;
 		xpp.anchors_nr = o->anchors_nr;
 		xecfg.ctxlen = o->context;
@@ -5203,6 +5207,22 @@ static int diff_opt_patience(const struct option *opt,
 	return 0;
 }
 
+static int diff_opt_ignore_regex(const struct option *opt,
+				 const char *arg, int unset)
+{
+	struct diff_options *options = opt->value;
+	regex_t *regex;
+
+	BUG_ON_OPT_NEG(unset);
+	regex = xmalloc(sizeof(*regex));
+	if (regcomp(regex, arg, REG_EXTENDED | REG_NEWLINE))
+		return error(_("invalid regex given to -I: '%s'"), arg);
+	ALLOC_GROW(options->ignore_regex, options->ignore_regex_nr + 1,
+		   options->ignore_regex_alloc);
+	options->ignore_regex[options->ignore_regex_nr++] = regex;
+	return 0;
+}
+
 static int diff_opt_pickaxe_regex(const struct option *opt,
 				  const char *arg, int unset)
 {
@@ -5491,6 +5511,9 @@ static void prep_parse_options(struct diff_options *options)
 		OPT_BIT_F(0, "ignore-blank-lines", &options->xdl_opts,
 			  N_("ignore changes whose lines are all blank"),
 			  XDF_IGNORE_BLANK_LINES, PARSE_OPT_NONEG),
+		OPT_CALLBACK_F('I', "ignore-matching-lines", options, N_("<regex>"),
+			       N_("ignore changes whose all lines match <regex>"),
+			       0, diff_opt_ignore_regex),
 		OPT_BIT(0, "indent-heuristic", &options->xdl_opts,
 			N_("heuristic to shift diff hunk boundaries for easy reading"),
 			XDF_INDENT_HEURISTIC),
diff --git a/diff.h b/diff.h
index 11de52e9e9..a402227b80 100644
--- a/diff.h
+++ b/diff.h
@@ -234,6 +234,10 @@ struct diff_options {
 	 */
 	const char *pickaxe;
 
+	/* -I<regex> */
+	regex_t **ignore_regex;
+	size_t ignore_regex_nr, ignore_regex_alloc;
+
 	const char *single_follow;
 	const char *a_prefix, *b_prefix;
 	const char *line_prefix;
diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
index 5c7b0122b4..f72d456d3b 100755
--- a/t/t4013-diff-various.sh
+++ b/t/t4013-diff-various.sh
@@ -6,6 +6,7 @@
 test_description='Various diff formatting options'
 
 . ./test-lib.sh
+. "$TEST_DIRECTORY"/diff-lib.sh
 
 test_expect_success setup '
 
@@ -333,6 +334,7 @@ log -SF master --max-count=2
 log -GF master
 log -GF -p master
 log -GF -p --pickaxe-all master
+log -IA -IB -I1 -I2 -p master
 log --decorate --all
 log --decorate=full --all
 
@@ -473,4 +475,43 @@ test_expect_success 'diff-tree --stdin with log formatting' '
 	test_cmp expect actual
 '
 
+test_expect_success 'diff -I<regex>: setup' '
+	git checkout master &&
+	test_seq 50 >file0 &&
+	git commit -m "Set up -I<regex> test file" file0 &&
+	test_seq 50 | sed -e "s/13/ten and three/" -e "/7\$/d" >file0 &&
+	echo >>file0
+'
+test_expect_success 'diff -I<regex>' '
+	git diff --ignore-blank-lines -I"ten.*e" -I"^[124-9]" >actual &&
+	cat >expect <<-\EOF &&
+	diff --git a/file0 b/file0
+	--- a/file0
+	+++ b/file0
+	@@ -34,7 +31,6 @@
+	 34
+	 35
+	 36
+	-37
+	 38
+	 39
+	 40
+	EOF
+	compare_diff_patch expect actual
+'
+
+test_expect_success 'diff -I<regex> --stat' '
+	git diff --stat --ignore-blank-lines -I"ten.*e" -I"^[124-9]" >actual &&
+	cat >expect <<-\EOF &&
+	 file0 | 1 -
+	 1 file changed, 1 deletion(-)
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'diff -I<regex>: detect malformed regex' '
+	test_expect_code 129 git diff --ignore-matching-lines="^[124-9" 2>error &&
+	test_i18ngrep "invalid regex given to -I: " error
+'
+
 test_done
diff --git a/t/t4013/diff.log_-IA_-IB_-I1_-I2_-p_master b/t/t4013/diff.log_-IA_-IB_-I1_-I2_-p_master
new file mode 100644
index 0000000000..929f35a05b
--- /dev/null
+++ b/t/t4013/diff.log_-IA_-IB_-I1_-I2_-p_master
@@ -0,0 +1,99 @@
+$ git log -IA -IB -I1 -I2 -p master
+commit 59d314ad6f356dd08601a4cd5e530381da3e3c64
+Merge: 9a6d494 c7a2ab9
+Author: A U Thor <author@example.com>
+Date:   Mon Jun 26 00:04:00 2006 +0000
+
+    Merge branch 'side'
+
+commit c7a2ab9e8eac7b117442a607d5a9b3950ae34d5a
+Author: A U Thor <author@example.com>
+Date:   Mon Jun 26 00:03:00 2006 +0000
+
+    Side
+
+diff --git a/file0 b/file0
+index 01e79c3..f4615da 100644
+--- a/file0
++++ b/file0
+@@ -1,3 +1,6 @@
+ 1
+ 2
+ 3
++A
++B
++C
+diff --git a/file3 b/file3
+new file mode 100644
+index 0000000..7289e35
+
+commit 9a6d4949b6b76956d9d5e26f2791ec2ceff5fdc0
+Author: A U Thor <author@example.com>
+Date:   Mon Jun 26 00:02:00 2006 +0000
+
+    Third
+
+diff --git a/dir/sub b/dir/sub
+index 8422d40..cead32e 100644
+--- a/dir/sub
++++ b/dir/sub
+@@ -2,3 +2,5 @@ A
+ B
+ C
+ D
++E
++F
+diff --git a/file1 b/file1
+new file mode 100644
+index 0000000..b1e6722
+--- /dev/null
++++ b/file1
+@@ -0,0 +1,3 @@
++A
++B
++C
+
+commit 1bde4ae5f36c8d9abe3a0fce0c6aab3c4a12fe44
+Author: A U Thor <author@example.com>
+Date:   Mon Jun 26 00:01:00 2006 +0000
+
+    Second
+    
+    This is the second commit.
+
+diff --git a/dir/sub b/dir/sub
+index 35d242b..8422d40 100644
+--- a/dir/sub
++++ b/dir/sub
+@@ -1,2 +1,4 @@
+ A
+ B
++C
++D
+diff --git a/file0 b/file0
+index 01e79c3..b414108 100644
+--- a/file0
++++ b/file0
+@@ -1,3 +1,6 @@
+ 1
+ 2
+ 3
++4
++5
++6
+diff --git a/file2 b/file2
+deleted file mode 100644
+index 01e79c3..0000000
+--- a/file2
++++ /dev/null
+@@ -1,3 +0,0 @@
+-1
+-2
+-3
+
+commit 444ac553ac7612cc88969031b02b3767fb8a353a
+Author: A U Thor <author@example.com>
+Date:   Mon Jun 26 00:00:00 2006 +0000
+
+    Initial
+$
diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index 032e3a9f41..7a04605146 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -79,6 +79,10 @@ typedef struct s_mmbuffer {
 typedef struct s_xpparam {
 	unsigned long flags;
 
+	/* -I<regex> */
+	regex_t **ignore_regex;
+	size_t ignore_regex_nr;
+
 	/* See Documentation/diff-options.txt. */
 	char **anchors;
 	size_t anchors_nr;
diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index bd035139f9..380eb728ed 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -998,7 +998,7 @@ static int xdl_call_hunk_func(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
 	return 0;
 }
 
-static void xdl_mark_ignorable(xdchange_t *xscr, xdfenv_t *xe, long flags)
+static void xdl_mark_ignorable_lines(xdchange_t *xscr, xdfenv_t *xe, long flags)
 {
 	xdchange_t *xch;
 
@@ -1019,6 +1019,46 @@ static void xdl_mark_ignorable(xdchange_t *xscr, xdfenv_t *xe, long flags)
 	}
 }
 
+static int record_matches_regex(xrecord_t *rec, xpparam_t const *xpp) {
+	regmatch_t regmatch;
+	int i;
+
+	for (i = 0; i < xpp->ignore_regex_nr; i++)
+		if (!regexec_buf(xpp->ignore_regex[i], rec->ptr, rec->size, 1,
+				 &regmatch, 0))
+			return 1;
+
+	return 0;
+}
+
+static void xdl_mark_ignorable_regex(xdchange_t *xscr, const xdfenv_t *xe,
+				     xpparam_t const *xpp)
+{
+	xdchange_t *xch;
+
+	for (xch = xscr; xch; xch = xch->next) {
+		xrecord_t **rec;
+		int ignore = 1;
+		long i;
+
+		/*
+		 * Do not override --ignore-blank-lines.
+		 */
+		if (xch->ignore)
+			continue;
+
+		rec = &xe->xdf1.recs[xch->i1];
+		for (i = 0; i < xch->chg1 && ignore; i++)
+			ignore = record_matches_regex(rec[i], xpp);
+
+		rec = &xe->xdf2.recs[xch->i2];
+		for (i = 0; i < xch->chg2 && ignore; i++)
+			ignore = record_matches_regex(rec[i], xpp);
+
+		xch->ignore = ignore;
+	}
+}
+
 int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
 	     xdemitconf_t const *xecfg, xdemitcb_t *ecb) {
 	xdchange_t *xscr;
@@ -1038,7 +1078,10 @@ int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
 	}
 	if (xscr) {
 		if (xpp->flags & XDF_IGNORE_BLANK_LINES)
-			xdl_mark_ignorable(xscr, &xe, xpp->flags);
+			xdl_mark_ignorable_lines(xscr, &xe, xpp->flags);
+
+		if (xpp->ignore_regex)
+			xdl_mark_ignorable_regex(xscr, &xe, xpp);
 
 		if (ef(&xe, xscr, ecb, xecfg) < 0) {
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 1/2] diff: add an API for deferred freeing
  2020-10-20  6:48     ` [PATCH v4 " Michał Kępień
  2020-10-20  6:48       ` [PATCH v4 1/2] merge-base, xdiff: zero out xpparam_t structures Michał Kępień
  2020-10-20  6:48       ` [PATCH v4 2/2] diff: add -I<regex> that ignores matching changes Michał Kępień
@ 2021-02-05 14:13       ` Ævar Arnfjörð Bjarmason
  2021-02-10 16:00         ` Johannes Schindelin
  2021-02-05 14:13       ` [PATCH " Ævar Arnfjörð Bjarmason
  3 siblings, 1 reply; 57+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-05 14:13 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Michał Kępień,
	Johannes Schindelin, Phillip Wood,
	Ævar Arnfjörð Bjarmason

Add a diff_free() function to free anything we may have allocated in
the "diff_options" struct, and the ability to make calling it a noop
by setting "no_free" in "diff_options".

This is required because when e.g. "git diff" is run we'll allocate
things in that struct, use the diff machinery once, and then exit, but
if we run e.g. "git log -p" we're going to re-use what we allocated
across multiple diff_flush() calls, and only want to free things at
the end.

We've thus ended up with features like the recently added "diff -I"[1]
where we'll leak memory. As it turns out it could have simply used the
pattern established in 6ea57703f6 (log: prepare log/log-tree to reuse
the diffopt.close_file attribute, 2016-06-22).

Manually adding more such flags to things log_tree_commit() every time
we need to allocate something would be tedious.

Let's instead move that fclose() code it to a new diff_free(), in
anticipation of freeing more things in that function in follow-up
commits. I'm renaming the "close_file" struct member to "fclose_file"
for the ease of validating this, we can be certain that these are all
the relevant callsites.

Some functions such as log_tree_commit() need an idiom of optionally
retaining a previous "no_free", as they may either free the memory
themselves, or their caller may do so. I'm keeping that idiom in
log_show_early() even though I don't think it's currently called in
this manner, since it also gets passed an existing "struct rev_info"..

1. 296d4a94e7 (diff: add -I<regex> that ignores matching changes,
   2020-10-20)

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/add.c |  2 +-
 builtin/am.c  |  6 ++++--
 builtin/log.c | 27 ++++++++++++++-------------
 diff.c        | 18 +++++++++++++-----
 diff.h        | 14 +++++++++++++-
 log-tree.c    | 10 ++++++----
 wt-status.c   |  2 +-
 7 files changed, 52 insertions(+), 27 deletions(-)

diff --git a/builtin/add.c b/builtin/add.c
index a825887c50..6319710186 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -282,7 +282,7 @@ static int edit_patch(int argc, const char **argv, const char *prefix)
 	if (out < 0)
 		die(_("Could not open '%s' for writing."), file);
 	rev.diffopt.file = xfdopen(out, "w");
-	rev.diffopt.close_file = 1;
+	rev.diffopt.fclose_file = 1;
 	if (run_diff_files(&rev, 0))
 		die(_("Could not write patch"));
 
diff --git a/builtin/am.c b/builtin/am.c
index 8355e3566f..157d264583 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1315,7 +1315,7 @@ static void write_commit_patch(const struct am_state *state, struct commit *comm
 	rev_info.diffopt.flags.full_index = 1;
 	rev_info.diffopt.use_color = 0;
 	rev_info.diffopt.file = fp;
-	rev_info.diffopt.close_file = 1;
+	rev_info.diffopt.fclose_file = 1; /* log_tree_commit() sets .no_free=1 */
 	add_pending_object(&rev_info, &commit->object, "");
 	diff_setup_done(&rev_info.diffopt);
 	log_tree_commit(&rev_info, commit);
@@ -1347,10 +1347,12 @@ static void write_index_patch(const struct am_state *state)
 	rev_info.diffopt.output_format = DIFF_FORMAT_PATCH;
 	rev_info.diffopt.use_color = 0;
 	rev_info.diffopt.file = fp;
-	rev_info.diffopt.close_file = 1;
+	rev_info.diffopt.fclose_file = 1;
+	rev_info.diffopt.no_free = 1;
 	add_pending_object(&rev_info, &tree->object, "");
 	diff_setup_done(&rev_info.diffopt);
 	run_diff_index(&rev_info, 1);
+	diff_free(&rev_info.diffopt);
 }
 
 /**
diff --git a/builtin/log.c b/builtin/log.c
index fd282def43..604ee29ec0 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -306,10 +306,11 @@ static struct itimerval early_output_timer;
 
 static void log_show_early(struct rev_info *revs, struct commit_list *list)
 {
-	int i = revs->early_output, close_file = revs->diffopt.close_file;
+	int i = revs->early_output;
 	int show_header = 1;
+	int no_free = revs->diffopt.no_free;
 
-	revs->diffopt.close_file = 0;
+	revs->diffopt.no_free = 0;
 	sort_in_topological_order(&list, revs->sort_order);
 	while (list && i) {
 		struct commit *commit = list->item;
@@ -326,8 +327,8 @@ static void log_show_early(struct rev_info *revs, struct commit_list *list)
 		case commit_ignore:
 			break;
 		case commit_error:
-			if (close_file)
-				fclose(revs->diffopt.file);
+			revs->diffopt.no_free = no_free;
+			diff_free(&revs->diffopt);
 			return;
 		}
 		list = list->next;
@@ -335,8 +336,8 @@ static void log_show_early(struct rev_info *revs, struct commit_list *list)
 
 	/* Did we already get enough commits for the early output? */
 	if (!i) {
-		if (close_file)
-			fclose(revs->diffopt.file);
+		revs->diffopt.no_free = 0;
+		diff_free(&revs->diffopt);
 		return;
 	}
 
@@ -400,7 +401,7 @@ static int cmd_log_walk(struct rev_info *rev)
 {
 	struct commit *commit;
 	int saved_nrl = 0;
-	int saved_dcctc = 0, close_file = rev->diffopt.close_file;
+	int saved_dcctc = 0;
 
 	if (rev->early_output)
 		setup_early_output();
@@ -416,7 +417,7 @@ static int cmd_log_walk(struct rev_info *rev)
 	 * and HAS_CHANGES being accumulated in rev->diffopt, so be careful to
 	 * retain that state information if replacing rev->diffopt in this loop
 	 */
-	rev->diffopt.close_file = 0;
+	rev->diffopt.no_free = 1;
 	while ((commit = get_revision(rev)) != NULL) {
 		if (!log_tree_commit(rev, commit) && rev->max_count >= 0)
 			/*
@@ -441,8 +442,8 @@ static int cmd_log_walk(struct rev_info *rev)
 	}
 	rev->diffopt.degraded_cc_to_c = saved_dcctc;
 	rev->diffopt.needed_rename_limit = saved_nrl;
-	if (close_file)
-		fclose(rev->diffopt.file);
+	rev->diffopt.no_free = 0;
+	diff_free(&rev->diffopt);
 
 	if (rev->diffopt.output_format & DIFF_FORMAT_CHECKDIFF &&
 	    rev->diffopt.flags.check_failed) {
@@ -1952,18 +1953,18 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix)
 	if (rev.show_notes)
 		load_display_notes(&rev.notes_opt);
 
-	if (use_stdout + rev.diffopt.close_file + !!output_directory > 1)
+	if (use_stdout + rev.diffopt.fclose_file + !!output_directory > 1)
 		die(_("--stdout, --output, and --output-directory are mutually exclusive"));
 
 	if (use_stdout) {
 		setup_pager();
-	} else if (rev.diffopt.close_file) {
+	} else if (rev.diffopt.fclose_file) {
 		/*
 		 * The diff code parsed --output; it has already opened the
 		 * file, but but we must instruct it not to close after each
 		 * diff.
 		 */
-		rev.diffopt.close_file = 0;
+		rev.diffopt.no_free = 1;
 	} else {
 		int saved;
 
diff --git a/diff.c b/diff.c
index 69e3bc00ed..3e6f8f0a71 100644
--- a/diff.c
+++ b/diff.c
@@ -5187,7 +5187,7 @@ static enum parse_opt_result diff_opt_output(struct parse_opt_ctx_t *ctx,
 	BUG_ON_OPT_NEG(unset);
 	path = prefix_filename(ctx->prefix, arg);
 	options->file = xfopen(path, "w");
-	options->close_file = 1;
+	options->fclose_file = 1;
 	if (options->use_color != GIT_COLOR_ALWAYS)
 		options->use_color = GIT_COLOR_NEVER;
 	free(path);
@@ -6399,10 +6399,10 @@ void diff_flush(struct diff_options *options)
 		 * options->file to /dev/null should be safe, because we
 		 * aren't supposed to produce any output anyway.
 		 */
-		if (options->close_file)
+		if (options->fclose_file)
 			fclose(options->file);
 		options->file = xfopen("/dev/null", "w");
-		options->close_file = 1;
+		options->fclose_file = 1;
 		options->color_moved = 0;
 		for (i = 0; i < q->nr; i++) {
 			struct diff_filepair *p = q->queue[i];
@@ -6433,8 +6433,7 @@ void diff_flush(struct diff_options *options)
 free_queue:
 	free(q->queue);
 	DIFF_QUEUE_CLEAR(q);
-	if (options->close_file)
-		fclose(options->file);
+	diff_free(options);
 
 	/*
 	 * Report the content-level differences with HAS_CHANGES;
@@ -6449,6 +6448,15 @@ void diff_flush(struct diff_options *options)
 	}
 }
 
+void diff_free(struct diff_options *options)
+{
+	if (options->no_free)
+		return;
+	if (options->fclose_file)
+		fclose(options->file);
+}
+	
+
 static int match_filter(const struct diff_options *options, const struct diff_filepair *p)
 {
 	return (((p->status == DIFF_STATUS_MODIFIED) &&
diff --git a/diff.h b/diff.h
index 2ff2b1c7f2..d1d74c3a9e 100644
--- a/diff.h
+++ b/diff.h
@@ -49,7 +49,16 @@
  * - Once you finish feeding the pairs of files, call `diffcore_std()`.
  * This will tell the diffcore library to go ahead and do its work.
  *
+ * - The `diff_opt_parse()` etc. functions might allocate memory in
+ *  `struct diff_options`. When running the API `N > 1` set `.no_free
+ *  = 1` to make the `diff_free()` invoked by `diff_flush()` below a
+ *  noop.
+ *
  * - Calling `diff_flush()` will produce the output.
+ *
+ * - If you set `.no_free = 1` before set it to `0` and call
+ *   `diff_free()` again. If `.no_free = 1` was not set there's no
+ *   need to call `diff_free()`, `diff_flush()` will call it.
  */
 
 struct combine_diff_path;
@@ -328,7 +337,7 @@ struct diff_options {
 	void (*set_default)(struct diff_options *);
 
 	FILE *file;
-	int close_file;
+	int fclose_file;
 
 #define OUTPUT_INDICATOR_NEW 0
 #define OUTPUT_INDICATOR_OLD 1
@@ -365,6 +374,8 @@ struct diff_options {
 
 	struct repository *repo;
 	struct option *parseopts;
+
+	int no_free;
 };
 
 unsigned diff_filter_bit(char status);
@@ -559,6 +570,7 @@ void diffcore_fix_diff_index(void);
 
 int diff_queue_is_empty(void);
 void diff_flush(struct diff_options*);
+void diff_free(struct diff_options*);
 void diff_warn_rename_limit(const char *varname, int needed, int degraded_cc);
 
 /* diff-raw status letters */
diff --git a/log-tree.c b/log-tree.c
index fd0dde97ec..bb946f15f1 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -952,12 +952,14 @@ static int log_tree_diff(struct rev_info *opt, struct commit *commit, struct log
 int log_tree_commit(struct rev_info *opt, struct commit *commit)
 {
 	struct log_info log;
-	int shown, close_file = opt->diffopt.close_file;
+	int shown;
+	/* maybe called by e.g. cmd_log_walk(), maybe stand-alone */
+	int no_free = opt->diffopt.no_free;
 
 	log.commit = commit;
 	log.parent = NULL;
 	opt->loginfo = &log;
-	opt->diffopt.close_file = 0;
+	opt->diffopt.no_free = 1;
 
 	if (opt->line_level_traverse)
 		return line_log_print(opt, commit);
@@ -974,7 +976,7 @@ int log_tree_commit(struct rev_info *opt, struct commit *commit)
 		fprintf(opt->diffopt.file, "\n%s\n", opt->break_bar);
 	opt->loginfo = NULL;
 	maybe_flush_or_die(opt->diffopt.file, "stdout");
-	if (close_file)
-		fclose(opt->diffopt.file);
+	opt->diffopt.no_free = no_free;
+	diff_free(&opt->diffopt);
 	return shown;
 }
diff --git a/wt-status.c b/wt-status.c
index 0c8287a023..41b08474e5 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1052,7 +1052,7 @@ static void wt_longstatus_print_verbose(struct wt_status *s)
 	rev.diffopt.rename_limit = s->rename_limit >= 0 ? s->rename_limit : rev.diffopt.rename_limit;
 	rev.diffopt.rename_score = s->rename_score >= 0 ? s->rename_score : rev.diffopt.rename_score;
 	rev.diffopt.file = s->fp;
-	rev.diffopt.close_file = 0;
+	rev.diffopt.fclose_file = 0; /* wt_status owns the s->fp */
 	/*
 	 * If we're not going to stdout, then we definitely don't
 	 * want color, since we are going to the commit message
-- 
2.30.0.284.gd98b1dd5eaa7


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 2/2] diff: plug memory leak from regcomp() on {log,diff} -I
  2020-10-20  6:48     ` [PATCH v4 " Michał Kępień
                         ` (2 preceding siblings ...)
  2021-02-05 14:13       ` [PATCH 1/2] diff: add an API for deferred freeing Ævar Arnfjörð Bjarmason
@ 2021-02-05 14:13       ` Ævar Arnfjörð Bjarmason
  3 siblings, 0 replies; 57+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-05 14:13 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Michał Kępień,
	Johannes Schindelin, Phillip Wood,
	Ævar Arnfjörð Bjarmason

Fix a memory leak in 296d4a94e7 (diff: add -I<regex> that ignores
matching changes, 2020-10-20) by freeing the memory it allocates in
the newly introduced diff_free(). See the previous commit for details
on that.

This memory leak was intentionally introduced in 296d4a94e7, see the
discussion on a previous iteration of it in
https://lore.kernel.org/git/xmqqeelycajx.fsf@gitster.c.googlers.com/

At that time freeing the memory was somewhat tedious, but since it
isn't anymore with the newly introduced diff_free() let's use it.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 diff.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/diff.c b/diff.c
index 3e6f8f0a71..24257759bb 100644
--- a/diff.c
+++ b/diff.c
@@ -6450,10 +6450,17 @@ void diff_flush(struct diff_options *options)
 
 void diff_free(struct diff_options *options)
 {
+	int i;
 	if (options->no_free)
 		return;
 	if (options->fclose_file)
 		fclose(options->file);
+
+	for (i = 0; i < options->ignore_regex_nr; i++) {
+		regfree(options->ignore_regex[i]);
+		free(options->ignore_regex[i]);
+	}
+	free(options->ignore_regex);
 }
 	
 
-- 
2.30.0.284.gd98b1dd5eaa7


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] diff: add an API for deferred freeing
  2021-02-05 14:13       ` [PATCH 1/2] diff: add an API for deferred freeing Ævar Arnfjörð Bjarmason
@ 2021-02-10 16:00         ` Johannes Schindelin
  2021-02-11  3:00           ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 57+ messages in thread
From: Johannes Schindelin @ 2021-02-10 16:00 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Michał Kępień, Phillip Wood

[-- Attachment #1: Type: text/plain, Size: 12871 bytes --]

Hi Ævar,

On Fri, 5 Feb 2021, Ævar Arnfjörð Bjarmason wrote:

> Add a diff_free() function to free anything we may have allocated in
> the "diff_options" struct, and the ability to make calling it a noop
> by setting "no_free" in "diff_options".

Why do we need a `no_free` flag? Why not simply set the `free()`d (or
`fclose()`d) attributes to `NULL`?

> This is required because when e.g. "git diff" is run we'll allocate
> things in that struct, use the diff machinery once, and then exit, but
> if we run e.g. "git log -p" we're going to re-use what we allocated
> across multiple diff_flush() calls, and only want to free things at
> the end.
>
> We've thus ended up with features like the recently added "diff -I"[1]
> where we'll leak memory. As it turns out it could have simply used the
> pattern established in 6ea57703f6 (log: prepare log/log-tree to reuse
> the diffopt.close_file attribute, 2016-06-22).
>
> Manually adding more such flags to things log_tree_commit() every time
> we need to allocate something would be tedious.
>
> Let's instead move that fclose() code it to a new diff_free(), in
> anticipation of freeing more things in that function in follow-up
> commits. I'm renaming the "close_file" struct member to "fclose_file"
> for the ease of validating this, we can be certain that these are all
> the relevant callsites.

I like that idea a lot.

> Some functions such as log_tree_commit() need an idiom of optionally
> retaining a previous "no_free", as they may either free the memory
> themselves, or their caller may do so. I'm keeping that idiom in
> log_show_early() even though I don't think it's currently called in
> this manner, since it also gets passed an existing "struct rev_info"..
>
> 1. 296d4a94e7 (diff: add -I<regex> that ignores matching changes,
>    2020-10-20)
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  builtin/add.c |  2 +-
>  builtin/am.c  |  6 ++++--
>  builtin/log.c | 27 ++++++++++++++-------------
>  diff.c        | 18 +++++++++++++-----
>  diff.h        | 14 +++++++++++++-
>  log-tree.c    | 10 ++++++----
>  wt-status.c   |  2 +-
>  7 files changed, 52 insertions(+), 27 deletions(-)
>
> diff --git a/builtin/add.c b/builtin/add.c
> index a825887c50..6319710186 100644
> --- a/builtin/add.c
> +++ b/builtin/add.c
> @@ -282,7 +282,7 @@ static int edit_patch(int argc, const char **argv, const char *prefix)
>  	if (out < 0)
>  		die(_("Could not open '%s' for writing."), file);
>  	rev.diffopt.file = xfdopen(out, "w");
> -	rev.diffopt.close_file = 1;
> +	rev.diffopt.fclose_file = 1;

This rename makes the patch unnecessarily tedious to review, and I do not
even agree with leaking the implementation detail that we need to
`fclose()` the file.

Let's just not?

>  	if (run_diff_files(&rev, 0))
>  		die(_("Could not write patch"));
>
> diff --git a/builtin/am.c b/builtin/am.c
> index 8355e3566f..157d264583 100644
> --- a/builtin/am.c
> +++ b/builtin/am.c
> @@ -1315,7 +1315,7 @@ static void write_commit_patch(const struct am_state *state, struct commit *comm
>  	rev_info.diffopt.flags.full_index = 1;
>  	rev_info.diffopt.use_color = 0;
>  	rev_info.diffopt.file = fp;
> -	rev_info.diffopt.close_file = 1;
> +	rev_info.diffopt.fclose_file = 1; /* log_tree_commit() sets .no_free=1 */
>  	add_pending_object(&rev_info, &commit->object, "");
>  	diff_setup_done(&rev_info.diffopt);
>  	log_tree_commit(&rev_info, commit);
> @@ -1347,10 +1347,12 @@ static void write_index_patch(const struct am_state *state)
>  	rev_info.diffopt.output_format = DIFF_FORMAT_PATCH;
>  	rev_info.diffopt.use_color = 0;
>  	rev_info.diffopt.file = fp;
> -	rev_info.diffopt.close_file = 1;
> +	rev_info.diffopt.fclose_file = 1;
> +	rev_info.diffopt.no_free = 1;
>  	add_pending_object(&rev_info, &tree->object, "");
>  	diff_setup_done(&rev_info.diffopt);
>  	run_diff_index(&rev_info, 1);
> +	diff_free(&rev_info.diffopt);
>  }
>
>  /**
> diff --git a/builtin/log.c b/builtin/log.c
> index fd282def43..604ee29ec0 100644
> --- a/builtin/log.c
> +++ b/builtin/log.c
> @@ -306,10 +306,11 @@ static struct itimerval early_output_timer;
>
>  static void log_show_early(struct rev_info *revs, struct commit_list *list)
>  {
> -	int i = revs->early_output, close_file = revs->diffopt.close_file;
> +	int i = revs->early_output;
>  	int show_header = 1;
> +	int no_free = revs->diffopt.no_free;
>
> -	revs->diffopt.close_file = 0;
> +	revs->diffopt.no_free = 0;
>  	sort_in_topological_order(&list, revs->sort_order);
>  	while (list && i) {
>  		struct commit *commit = list->item;
> @@ -326,8 +327,8 @@ static void log_show_early(struct rev_info *revs, struct commit_list *list)
>  		case commit_ignore:
>  			break;
>  		case commit_error:
> -			if (close_file)
> -				fclose(revs->diffopt.file);
> +			revs->diffopt.no_free = no_free;
> +			diff_free(&revs->diffopt);
>  			return;
>  		}
>  		list = list->next;
> @@ -335,8 +336,8 @@ static void log_show_early(struct rev_info *revs, struct commit_list *list)
>
>  	/* Did we already get enough commits for the early output? */
>  	if (!i) {
> -		if (close_file)
> -			fclose(revs->diffopt.file);
> +		revs->diffopt.no_free = 0;
> +		diff_free(&revs->diffopt);
>  		return;
>  	}
>
> @@ -400,7 +401,7 @@ static int cmd_log_walk(struct rev_info *rev)
>  {
>  	struct commit *commit;
>  	int saved_nrl = 0;
> -	int saved_dcctc = 0, close_file = rev->diffopt.close_file;
> +	int saved_dcctc = 0;
>
>  	if (rev->early_output)
>  		setup_early_output();
> @@ -416,7 +417,7 @@ static int cmd_log_walk(struct rev_info *rev)
>  	 * and HAS_CHANGES being accumulated in rev->diffopt, so be careful to
>  	 * retain that state information if replacing rev->diffopt in this loop
>  	 */
> -	rev->diffopt.close_file = 0;
> +	rev->diffopt.no_free = 1;
>  	while ((commit = get_revision(rev)) != NULL) {
>  		if (!log_tree_commit(rev, commit) && rev->max_count >= 0)
>  			/*
> @@ -441,8 +442,8 @@ static int cmd_log_walk(struct rev_info *rev)
>  	}
>  	rev->diffopt.degraded_cc_to_c = saved_dcctc;
>  	rev->diffopt.needed_rename_limit = saved_nrl;
> -	if (close_file)
> -		fclose(rev->diffopt.file);
> +	rev->diffopt.no_free = 0;
> +	diff_free(&rev->diffopt);
>
>  	if (rev->diffopt.output_format & DIFF_FORMAT_CHECKDIFF &&
>  	    rev->diffopt.flags.check_failed) {
> @@ -1952,18 +1953,18 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix)
>  	if (rev.show_notes)
>  		load_display_notes(&rev.notes_opt);
>
> -	if (use_stdout + rev.diffopt.close_file + !!output_directory > 1)
> +	if (use_stdout + rev.diffopt.fclose_file + !!output_directory > 1)
>  		die(_("--stdout, --output, and --output-directory are mutually exclusive"));
>
>  	if (use_stdout) {
>  		setup_pager();
> -	} else if (rev.diffopt.close_file) {
> +	} else if (rev.diffopt.fclose_file) {
>  		/*
>  		 * The diff code parsed --output; it has already opened the
>  		 * file, but but we must instruct it not to close after each
>  		 * diff.
>  		 */
> -		rev.diffopt.close_file = 0;
> +		rev.diffopt.no_free = 1;
>  	} else {
>  		int saved;
>
> diff --git a/diff.c b/diff.c
> index 69e3bc00ed..3e6f8f0a71 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -5187,7 +5187,7 @@ static enum parse_opt_result diff_opt_output(struct parse_opt_ctx_t *ctx,
>  	BUG_ON_OPT_NEG(unset);
>  	path = prefix_filename(ctx->prefix, arg);
>  	options->file = xfopen(path, "w");
> -	options->close_file = 1;
> +	options->fclose_file = 1;
>  	if (options->use_color != GIT_COLOR_ALWAYS)
>  		options->use_color = GIT_COLOR_NEVER;
>  	free(path);
> @@ -6399,10 +6399,10 @@ void diff_flush(struct diff_options *options)
>  		 * options->file to /dev/null should be safe, because we
>  		 * aren't supposed to produce any output anyway.
>  		 */
> -		if (options->close_file)
> +		if (options->fclose_file)
>  			fclose(options->file);

And at this stage, we should set `options->file = NULL`.

>  		options->file = xfopen("/dev/null", "w");
> -		options->close_file = 1;
> +		options->fclose_file = 1;
>  		options->color_moved = 0;
>  		for (i = 0; i < q->nr; i++) {
>  			struct diff_filepair *p = q->queue[i];
> @@ -6433,8 +6433,7 @@ void diff_flush(struct diff_options *options)
>  free_queue:
>  	free(q->queue);
>  	DIFF_QUEUE_CLEAR(q);
> -	if (options->close_file)
> -		fclose(options->file);
> +	diff_free(options);
>
>  	/*
>  	 * Report the content-level differences with HAS_CHANGES;
> @@ -6449,6 +6448,15 @@ void diff_flush(struct diff_options *options)
>  	}
>  }
>
> +void diff_free(struct diff_options *options)
> +{
> +	if (options->no_free)
> +		return;
> +	if (options->fclose_file)
> +		fclose(options->file);

And at this stage, we should set `options->file = NULL`.

> +}
> +
> +
>  static int match_filter(const struct diff_options *options, const struct diff_filepair *p)
>  {
>  	return (((p->status == DIFF_STATUS_MODIFIED) &&
> diff --git a/diff.h b/diff.h
> index 2ff2b1c7f2..d1d74c3a9e 100644
> --- a/diff.h
> +++ b/diff.h
> @@ -49,7 +49,16 @@
>   * - Once you finish feeding the pairs of files, call `diffcore_std()`.
>   * This will tell the diffcore library to go ahead and do its work.
>   *
> + * - The `diff_opt_parse()` etc. functions might allocate memory in
> + *  `struct diff_options`. When running the API `N > 1` set `.no_free
> + *  = 1` to make the `diff_free()` invoked by `diff_flush()` below a
> + *  noop.

I have serious trouble parsing that last sentence. Would you mind
rephrasing it?

> + *
>   * - Calling `diff_flush()` will produce the output.
> + *
> + * - If you set `.no_free = 1` before set it to `0` and call
> + *   `diff_free()` again. If `.no_free = 1` was not set there's no
> + *   need to call `diff_free()`, `diff_flush()` will call it.

Again, as I mentioned before, the indicator whether things need to be
released should be whether the attribute is `NULL` or not. And once
released, ot should be set to `NULL`.

Other than that, it looks fine to me. And I definitely like the idea of
introducing a function to release all of the stuff in `struct diffopt`.

Thanks,
Dscho

>   */
>
>  struct combine_diff_path;
> @@ -328,7 +337,7 @@ struct diff_options {
>  	void (*set_default)(struct diff_options *);
>
>  	FILE *file;
> -	int close_file;
> +	int fclose_file;
>
>  #define OUTPUT_INDICATOR_NEW 0
>  #define OUTPUT_INDICATOR_OLD 1
> @@ -365,6 +374,8 @@ struct diff_options {
>
>  	struct repository *repo;
>  	struct option *parseopts;
> +
> +	int no_free;
>  };
>
>  unsigned diff_filter_bit(char status);
> @@ -559,6 +570,7 @@ void diffcore_fix_diff_index(void);
>
>  int diff_queue_is_empty(void);
>  void diff_flush(struct diff_options*);
> +void diff_free(struct diff_options*);
>  void diff_warn_rename_limit(const char *varname, int needed, int degraded_cc);
>
>  /* diff-raw status letters */
> diff --git a/log-tree.c b/log-tree.c
> index fd0dde97ec..bb946f15f1 100644
> --- a/log-tree.c
> +++ b/log-tree.c
> @@ -952,12 +952,14 @@ static int log_tree_diff(struct rev_info *opt, struct commit *commit, struct log
>  int log_tree_commit(struct rev_info *opt, struct commit *commit)
>  {
>  	struct log_info log;
> -	int shown, close_file = opt->diffopt.close_file;
> +	int shown;
> +	/* maybe called by e.g. cmd_log_walk(), maybe stand-alone */
> +	int no_free = opt->diffopt.no_free;
>
>  	log.commit = commit;
>  	log.parent = NULL;
>  	opt->loginfo = &log;
> -	opt->diffopt.close_file = 0;
> +	opt->diffopt.no_free = 1;
>
>  	if (opt->line_level_traverse)
>  		return line_log_print(opt, commit);
> @@ -974,7 +976,7 @@ int log_tree_commit(struct rev_info *opt, struct commit *commit)
>  		fprintf(opt->diffopt.file, "\n%s\n", opt->break_bar);
>  	opt->loginfo = NULL;
>  	maybe_flush_or_die(opt->diffopt.file, "stdout");
> -	if (close_file)
> -		fclose(opt->diffopt.file);
> +	opt->diffopt.no_free = no_free;
> +	diff_free(&opt->diffopt);
>  	return shown;
>  }
> diff --git a/wt-status.c b/wt-status.c
> index 0c8287a023..41b08474e5 100644
> --- a/wt-status.c
> +++ b/wt-status.c
> @@ -1052,7 +1052,7 @@ static void wt_longstatus_print_verbose(struct wt_status *s)
>  	rev.diffopt.rename_limit = s->rename_limit >= 0 ? s->rename_limit : rev.diffopt.rename_limit;
>  	rev.diffopt.rename_score = s->rename_score >= 0 ? s->rename_score : rev.diffopt.rename_score;
>  	rev.diffopt.file = s->fp;
> -	rev.diffopt.close_file = 0;
> +	rev.diffopt.fclose_file = 0; /* wt_status owns the s->fp */
>  	/*
>  	 * If we're not going to stdout, then we definitely don't
>  	 * want color, since we are going to the commit message
> --
> 2.30.0.284.gd98b1dd5eaa7
>
>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] diff: add an API for deferred freeing
  2021-02-10 16:00         ` Johannes Schindelin
@ 2021-02-11  3:00           ` Ævar Arnfjörð Bjarmason
  2021-02-11  9:40             ` Johannes Schindelin
  0 siblings, 1 reply; 57+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-11  3:00 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: git, Junio C Hamano, Michał Kępień, Phillip Wood


On Wed, Feb 10 2021, Johannes Schindelin wrote:

> Hi Ævar,
>
> On Fri, 5 Feb 2021, Ævar Arnfjörð Bjarmason wrote:
>
>> Add a diff_free() function to free anything we may have allocated in
>> the "diff_options" struct, and the ability to make calling it a noop
>> by setting "no_free" in "diff_options".
>
> Why do we need a `no_free` flag? Why not simply set the `free()`d (or
> `fclose()`d) attributes to `NULL`?

Because we're calling the diff API N times, so we need to not have those
be NULL while we're using them in a loop, and we need to not free()
them, and then flip "no_free = 0" at the end and free() them.

>> diff --git a/builtin/add.c b/builtin/add.c
>> index a825887c50..6319710186 100644
>> --- a/builtin/add.c
>> +++ b/builtin/add.c
>> @@ -282,7 +282,7 @@ static int edit_patch(int argc, const char **argv, const char *prefix)
>>  	if (out < 0)
>>  		die(_("Could not open '%s' for writing."), file);
>>  	rev.diffopt.file = xfdopen(out, "w");
>> -	rev.diffopt.close_file = 1;
>> +	rev.diffopt.fclose_file = 1;
>
> This rename makes the patch unnecessarily tedious to review, and I do not
> even agree with leaking the implementation detail that we need to
> `fclose()` the file.
>
> Let's just not?

Fair enough, I figured it would be easier for reviewers to reason about
to be guaranteed to see all the uses of the flag, but I'll drop it.

>> @@ -6399,10 +6399,10 @@ void diff_flush(struct diff_options *options)
>>  		 * options->file to /dev/null should be safe, because we
>>  		 * aren't supposed to produce any output anyway.
>>  		 */
>> -		if (options->close_file)
>> +		if (options->fclose_file)
>>  			fclose(options->file);
>
> And at this stage, we should set `options->file = NULL`.

Sure, why not, but unless we get rid of the need for "close_file"
there's not much point other than ease of inspection in a
debugger...[cont].

>> [...]
>> +void diff_free(struct diff_options *options)
>> +{
>> +	if (options->no_free)
>> +		return;
>> +	if (options->fclose_file)
>> +		fclose(options->file);
>
> And at this stage, we should set `options->file = NULL`.

...ditto...[cont]

>> +}
>> +
>> +
>>  static int match_filter(const struct diff_options *options, const struct diff_filepair *p)
>>  {
>>  	return (((p->status == DIFF_STATUS_MODIFIED) &&
>> diff --git a/diff.h b/diff.h
>> index 2ff2b1c7f2..d1d74c3a9e 100644
>> --- a/diff.h
>> +++ b/diff.h
>> @@ -49,7 +49,16 @@
>>   * - Once you finish feeding the pairs of files, call `diffcore_std()`.
>>   * This will tell the diffcore library to go ahead and do its work.
>>   *
>> + * - The `diff_opt_parse()` etc. functions might allocate memory in
>> + *  `struct diff_options`. When running the API `N > 1` set `.no_free
>> + *  = 1` to make the `diff_free()` invoked by `diff_flush()` below a
>> + *  noop.
>
> I have serious trouble parsing that last sentence. Would you mind
> rephrasing it?

Willdo.

>> + *
>>   * - Calling `diff_flush()` will produce the output.
>> + *
>> + * - If you set `.no_free = 1` before set it to `0` and call
>> + *   `diff_free()` again. If `.no_free = 1` was not set there's no
>> + *   need to call `diff_free()`, `diff_flush()` will call it.
>
> Again, as I mentioned before, the indicator whether things need to be
> released should be whether the attribute is `NULL` or not. And once
> released, ot should be set to `NULL`.
>
> Other than that, it looks fine to me. And I definitely like the idea of
> introducing a function to release all of the stuff in `struct diffopt`.

[cont]...I don't think it's possible in anything like the current API to
do that.

We don't want to "if (file) fclose(file)", instead we need an opt-in "if
(close_file) fclose(file)". That's because usually the "file" is
stdout. So close_file=1 is only set when we're opening a "real" file,
/dev/null or whatever.

That along with the "no_free" flag as noted above means we need both a
"no_free" and "close_file" flags.

Unless there's some way of re-arranging this that I'm missing.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] diff: add an API for deferred freeing
  2021-02-11  3:00           ` Ævar Arnfjörð Bjarmason
@ 2021-02-11  9:40             ` Johannes Schindelin
  2021-02-11 10:21               ` Jeff King
  0 siblings, 1 reply; 57+ messages in thread
From: Johannes Schindelin @ 2021-02-11  9:40 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Michał Kępień, Phillip Wood

[-- Attachment #1: Type: text/plain, Size: 934 bytes --]

Hi Ævar,

On Thu, 11 Feb 2021, Ævar Arnfjörð Bjarmason wrote:

> On Wed, Feb 10 2021, Johannes Schindelin wrote:
>
> > On Fri, 5 Feb 2021, Ævar Arnfjörð Bjarmason wrote:
> >
> >> Add a diff_free() function to free anything we may have allocated in
> >> the "diff_options" struct, and the ability to make calling it a noop
> >> by setting "no_free" in "diff_options".
> >
> > Why do we need a `no_free` flag? Why not simply set the `free()`d (or
> > `fclose()`d) attributes to `NULL`?

Hmm. That was not even clear to me until I read this reply.

Doesn't this indicate that the closing is done at the wrong layer? If we
want to call a function N times and only at the last iteration should it
clean up our resources, doesn't that indicate that the clean-up should be
pulled out from that function?

I thought that's exactly what you did, but I must have glanced over
something obvious...

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] diff: add an API for deferred freeing
  2021-02-11  9:40             ` Johannes Schindelin
@ 2021-02-11 10:21               ` Jeff King
  2021-02-11 10:45                 ` [PATCH v2 0/2] " Ævar Arnfjörð Bjarmason
                                   ` (2 more replies)
  0 siblings, 3 replies; 57+ messages in thread
From: Jeff King @ 2021-02-11 10:21 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Ævar Arnfjörð Bjarmason, git, Junio C Hamano,
	Michał Kępień,
	Phillip Wood

On Thu, Feb 11, 2021 at 10:40:45AM +0100, Johannes Schindelin wrote:

> On Thu, 11 Feb 2021, Ævar Arnfjörð Bjarmason wrote:
> 
> > On Wed, Feb 10 2021, Johannes Schindelin wrote:
> >
> > > On Fri, 5 Feb 2021, Ævar Arnfjörð Bjarmason wrote:
> > >
> > >> Add a diff_free() function to free anything we may have allocated in
> > >> the "diff_options" struct, and the ability to make calling it a noop
> > >> by setting "no_free" in "diff_options".
> > >
> > > Why do we need a `no_free` flag? Why not simply set the `free()`d (or
> > > `fclose()`d) attributes to `NULL`?
> 
> Hmm. That was not even clear to me until I read this reply.
> 
> Doesn't this indicate that the closing is done at the wrong layer? If we
> want to call a function N times and only at the last iteration should it
> clean up our resources, doesn't that indicate that the clean-up should be
> pulled out from that function?
> 
> I thought that's exactly what you did, but I must have glanced over
> something obvious...

I think the issue then is that every caller must now be modified to do
the clean up (which otherwise happens automatically when flushing the
diff).

So I definitely agree that the current API is weird and backwards. But
flipping it now may introduce new leaks in existing callers.

-Peff

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH v2 0/2] diff: add an API for deferred freeing
  2021-02-11 10:21               ` Jeff King
@ 2021-02-11 10:45                 ` Ævar Arnfjörð Bjarmason
  2021-02-11 10:45                 ` [PATCH v2 1/2] " Ævar Arnfjörð Bjarmason
  2021-02-11 10:45                 ` [PATCH v2 2/2] diff: plug memory leak from regcomp() on {log,diff} -I Ævar Arnfjörð Bjarmason
  2 siblings, 0 replies; 57+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-11 10:45 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Michał Kępień,
	Phillip Wood, Jeff King, Ævar Arnfjörð Bjarmason

I skipped the rename of the "close_file" flag, and updated the code &
commit messages in response to Johannes's feedback on v1. Hopefully
this is all better now.

Ævar Arnfjörð Bjarmason (2):
  diff: add an API for deferred freeing
  diff: plug memory leak from regcomp() on {log,diff} -I

 builtin/log.c | 23 ++++++++++++-----------
 diff.c        | 32 ++++++++++++++++++++++++++++----
 diff.h        | 15 ++++++++++++++-
 log-tree.c    | 10 ++++++----
 4 files changed, 60 insertions(+), 20 deletions(-)

Range-diff:
1:  531fed77f4c ! 1:  045d3f72d15 diff: add an API for deferred freeing
    @@ Commit message
         by setting "no_free" in "diff_options".
     
         This is required because when e.g. "git diff" is run we'll allocate
    -    things in that struct, use the diff machinery once, and then exit, but
    -    if we run e.g. "git log -p" we're going to re-use what we allocated
    -    across multiple diff_flush() calls, and only want to free things at
    -    the end.
    +    things in that struct, use the diff machinery once, and then exit.
    +
    +    But if we run e.g. "git log -p" we're going to re-use what we
    +    allocated across multiple diff_flush() calls, and only want to free
    +    things at the end.
     
         We've thus ended up with features like the recently added "diff -I"[1]
         where we'll leak memory. As it turns out it could have simply used the
    @@ Commit message
         the diffopt.close_file attribute, 2016-06-22).
     
         Manually adding more such flags to things log_tree_commit() every time
    -    we need to allocate something would be tedious.
    -
    -    Let's instead move that fclose() code it to a new diff_free(), in
    -    anticipation of freeing more things in that function in follow-up
    -    commits. I'm renaming the "close_file" struct member to "fclose_file"
    -    for the ease of validating this, we can be certain that these are all
    -    the relevant callsites.
    +    we need to allocate something would be tedious. Let's instead move
    +    that fclose() code it to a new diff_free(), in anticipation of freeing
    +    more things in that function in follow-up commits.
     
         Some functions such as log_tree_commit() need an idiom of optionally
         retaining a previous "no_free", as they may either free the memory
         themselves, or their caller may do so. I'm keeping that idiom in
    -    log_show_early() even though I don't think it's currently called in
    -    this manner, since it also gets passed an existing "struct rev_info"..
    +    log_show_early() for good measure, even though I don't think it's
    +    currently called in this manner. It also gets passed an existing
    +    "struct rev_info", so future callers may want to set the "no_free"
    +    flag.
    +
    +    This change is a bit hard to read because while the freeing pattern
    +    we're introducing isn't unusual, the "file" member is a special
    +    snowflake. We usually don't want to fclose() it. This is because
    +    "file" is usually stdout, in which case we don't want to fclose()
    +    it. We only want to opt-in to closing it when we e.g. open a file on
    +    the filesystem. Thus the opt-in "close_file" flag.
    +
    +    So the API in general just needs a "no_free" flag to defer freeing,
    +    but the "file" member still needs its "close_file" flag. This is made
    +    more confusing because while refactoring this code we could replace
    +    some "close_file=0" with "no_free=1", whereas others need to set both
    +    flags.
    +
    +    This is because there were some cases where an existing "close_file=0"
    +    meant "let's defer deallocation", and others where it meant "we don't
    +    want to close this file handle at all".
     
         1. 296d4a94e7 (diff: add -I<regex> that ignores matching changes,
            2020-10-20)
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    - ## builtin/add.c ##
    -@@ builtin/add.c: static int edit_patch(int argc, const char **argv, const char *prefix)
    - 	if (out < 0)
    - 		die(_("Could not open '%s' for writing."), file);
    - 	rev.diffopt.file = xfdopen(out, "w");
    --	rev.diffopt.close_file = 1;
    -+	rev.diffopt.fclose_file = 1;
    - 	if (run_diff_files(&rev, 0))
    - 		die(_("Could not write patch"));
    - 
    -
    - ## builtin/am.c ##
    -@@ builtin/am.c: static void write_commit_patch(const struct am_state *state, struct commit *comm
    - 	rev_info.diffopt.flags.full_index = 1;
    - 	rev_info.diffopt.use_color = 0;
    - 	rev_info.diffopt.file = fp;
    --	rev_info.diffopt.close_file = 1;
    -+	rev_info.diffopt.fclose_file = 1; /* log_tree_commit() sets .no_free=1 */
    - 	add_pending_object(&rev_info, &commit->object, "");
    - 	diff_setup_done(&rev_info.diffopt);
    - 	log_tree_commit(&rev_info, commit);
    -@@ builtin/am.c: static void write_index_patch(const struct am_state *state)
    - 	rev_info.diffopt.output_format = DIFF_FORMAT_PATCH;
    - 	rev_info.diffopt.use_color = 0;
    - 	rev_info.diffopt.file = fp;
    --	rev_info.diffopt.close_file = 1;
    -+	rev_info.diffopt.fclose_file = 1;
    -+	rev_info.diffopt.no_free = 1;
    - 	add_pending_object(&rev_info, &tree->object, "");
    - 	diff_setup_done(&rev_info.diffopt);
    - 	run_diff_index(&rev_info, 1);
    -+	diff_free(&rev_info.diffopt);
    - }
    - 
    - /**
    -
      ## builtin/log.c ##
     @@ builtin/log.c: static struct itimerval early_output_timer;
      
    @@ builtin/log.c: static int cmd_log_walk(struct rev_info *rev)
      	if (rev->diffopt.output_format & DIFF_FORMAT_CHECKDIFF &&
      	    rev->diffopt.flags.check_failed) {
     @@ builtin/log.c: int cmd_format_patch(int argc, const char **argv, const char *prefix)
    - 	if (rev.show_notes)
    - 		load_display_notes(&rev.notes_opt);
    - 
    --	if (use_stdout + rev.diffopt.close_file + !!output_directory > 1)
    -+	if (use_stdout + rev.diffopt.fclose_file + !!output_directory > 1)
    - 		die(_("--stdout, --output, and --output-directory are mutually exclusive"));
    - 
    - 	if (use_stdout) {
    - 		setup_pager();
    --	} else if (rev.diffopt.close_file) {
    -+	} else if (rev.diffopt.fclose_file) {
    - 		/*
    - 		 * The diff code parsed --output; it has already opened the
      		 * file, but but we must instruct it not to close after each
      		 * diff.
      		 */
    @@ builtin/log.c: int cmd_format_patch(int argc, const char **argv, const char *pre
      
     
      ## diff.c ##
    -@@ diff.c: static enum parse_opt_result diff_opt_output(struct parse_opt_ctx_t *ctx,
    - 	BUG_ON_OPT_NEG(unset);
    - 	path = prefix_filename(ctx->prefix, arg);
    - 	options->file = xfopen(path, "w");
    --	options->close_file = 1;
    -+	options->fclose_file = 1;
    - 	if (options->use_color != GIT_COLOR_ALWAYS)
    - 		options->use_color = GIT_COLOR_NEVER;
    - 	free(path);
    +@@ diff.c: static void diff_flush_patch_all_file_pairs(struct diff_options *o)
    + 	}
    + }
    + 
    ++static void diff_free_file(struct diff_options *options)
    ++{
    ++	if (options->close_file)
    ++		fclose(options->file);
    ++}
    ++
    ++void diff_free(struct diff_options *options)
    ++{
    ++	if (options->no_free)
    ++		return;
    ++
    ++	diff_free_file(options);
    ++}
    ++
    + void diff_flush(struct diff_options *options)
    + {
    + 	struct diff_queue_struct *q = &diff_queued_diff;
     @@ diff.c: void diff_flush(struct diff_options *options)
      		 * options->file to /dev/null should be safe, because we
      		 * aren't supposed to produce any output anyway.
      		 */
     -		if (options->close_file)
    -+		if (options->fclose_file)
    - 			fclose(options->file);
    +-			fclose(options->file);
    ++		diff_free_file(options);
      		options->file = xfopen("/dev/null", "w");
    --		options->close_file = 1;
    -+		options->fclose_file = 1;
    + 		options->close_file = 1;
      		options->color_moved = 0;
    - 		for (i = 0; i < q->nr; i++) {
    - 			struct diff_filepair *p = q->queue[i];
     @@ diff.c: void diff_flush(struct diff_options *options)
      free_queue:
      	free(q->queue);
    @@ diff.c: void diff_flush(struct diff_options *options)
      
      	/*
      	 * Report the content-level differences with HAS_CHANGES;
    -@@ diff.c: void diff_flush(struct diff_options *options)
    - 	}
    - }
    - 
    -+void diff_free(struct diff_options *options)
    -+{
    -+	if (options->no_free)
    -+		return;
    -+	if (options->fclose_file)
    -+		fclose(options->file);
    -+}
    -+	
    -+
    - static int match_filter(const struct diff_options *options, const struct diff_filepair *p)
    - {
    - 	return (((p->status == DIFF_STATUS_MODIFIED) &&
     
      ## diff.h ##
     @@
       * - Once you finish feeding the pairs of files, call `diffcore_std()`.
       * This will tell the diffcore library to go ahead and do its work.
       *
    -+ * - The `diff_opt_parse()` etc. functions might allocate memory in
    -+ *  `struct diff_options`. When running the API `N > 1` set `.no_free
    -+ *  = 1` to make the `diff_free()` invoked by `diff_flush()` below a
    -+ *  noop.
    -+ *
    -  * - Calling `diff_flush()` will produce the output.
    +- * - Calling `diff_flush()` will produce the output.
    ++ * - Calling `diff_flush()` will produce the output, it will call
    ++ *   `diff_free()` to free any resources, e.g. those allocated in
    ++ *   `diff_opt_parse()`.
     + *
    -+ * - If you set `.no_free = 1` before set it to `0` and call
    -+ *   `diff_free()` again. If `.no_free = 1` was not set there's no
    -+ *   need to call `diff_free()`, `diff_flush()` will call it.
    ++ * - Set `.no_free = 1` before calling `diff_flush()` to defer the
    ++ *   freeing of allocated memory in diff_options. This is useful when
    ++ *   `diff_flush()` is being called in a loop, rather than as a
    ++ *   one-off. When setting `.no_free = 1` you must ensure that
    ++ *   `diff_free()` is called at the end, either by flipping the flag
    ++ *   before the last `diff_flush()` call, or by flipping it before
    ++ *   calling `diff_free()` yourself.
       */
      
      struct combine_diff_path;
     @@ diff.h: struct diff_options {
    - 	void (*set_default)(struct diff_options *);
    - 
    - 	FILE *file;
    --	int close_file;
    -+	int fclose_file;
    - 
    - #define OUTPUT_INDICATOR_NEW 0
    - #define OUTPUT_INDICATOR_OLD 1
    -@@ diff.h: struct diff_options {
      
      	struct repository *repo;
      	struct option *parseopts;
    @@ log-tree.c: int log_tree_commit(struct rev_info *opt, struct commit *commit)
     +	diff_free(&opt->diffopt);
      	return shown;
      }
    -
    - ## wt-status.c ##
    -@@ wt-status.c: static void wt_longstatus_print_verbose(struct wt_status *s)
    - 	rev.diffopt.rename_limit = s->rename_limit >= 0 ? s->rename_limit : rev.diffopt.rename_limit;
    - 	rev.diffopt.rename_score = s->rename_score >= 0 ? s->rename_score : rev.diffopt.rename_score;
    - 	rev.diffopt.file = s->fp;
    --	rev.diffopt.close_file = 0;
    -+	rev.diffopt.fclose_file = 0; /* wt_status owns the s->fp */
    - 	/*
    - 	 * If we're not going to stdout, then we definitely don't
    - 	 * want color, since we are going to the commit message
2:  7192cf01e71 ! 2:  f571524e6d8 diff: plug memory leak from regcomp() on {log,diff} -I
    @@ Commit message
         At that time freeing the memory was somewhat tedious, but since it
         isn't anymore with the newly introduced diff_free() let's use it.
     
    +    Let's retain the pattern for diff_free_file() and add a
    +    diff_free_ignore_regex(), even though (unlike "diff_free_file") we
    +    don't need to call it elsewhere. I think this'll make for more
    +    readable code than gradually accumulating a giant diff_free()
    +    function, sharing "int i" across unrelated code etc.
    +
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## diff.c ##
    -@@ diff.c: void diff_flush(struct diff_options *options)
    +@@ diff.c: static void diff_free_file(struct diff_options *options)
    + 		fclose(options->file);
    + }
      
    - void diff_free(struct diff_options *options)
    - {
    ++static void diff_free_ignore_regex(struct diff_options *options)
    ++{
     +	int i;
    - 	if (options->no_free)
    - 		return;
    - 	if (options->fclose_file)
    - 		fclose(options->file);
     +
     +	for (i = 0; i < options->ignore_regex_nr; i++) {
     +		regfree(options->ignore_regex[i]);
     +		free(options->ignore_regex[i]);
     +	}
     +	free(options->ignore_regex);
    ++}
    ++
    + void diff_free(struct diff_options *options)
    + {
    + 	if (options->no_free)
    + 		return;
    + 
    + 	diff_free_file(options);
    ++	diff_free_ignore_regex(options);
      }
    - 	
      
    + void diff_flush(struct diff_options *options)
-- 
2.30.0.284.gd98b1dd5eaa7


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH v2 1/2] diff: add an API for deferred freeing
  2021-02-11 10:21               ` Jeff King
  2021-02-11 10:45                 ` [PATCH v2 0/2] " Ævar Arnfjörð Bjarmason
@ 2021-02-11 10:45                 ` Ævar Arnfjörð Bjarmason
  2021-02-11 10:45                 ` [PATCH v2 2/2] diff: plug memory leak from regcomp() on {log,diff} -I Ævar Arnfjörð Bjarmason
  2 siblings, 0 replies; 57+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-11 10:45 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Michał Kępień,
	Phillip Wood, Jeff King, Ævar Arnfjörð Bjarmason

Add a diff_free() function to free anything we may have allocated in
the "diff_options" struct, and the ability to make calling it a noop
by setting "no_free" in "diff_options".

This is required because when e.g. "git diff" is run we'll allocate
things in that struct, use the diff machinery once, and then exit.

But if we run e.g. "git log -p" we're going to re-use what we
allocated across multiple diff_flush() calls, and only want to free
things at the end.

We've thus ended up with features like the recently added "diff -I"[1]
where we'll leak memory. As it turns out it could have simply used the
pattern established in 6ea57703f6 (log: prepare log/log-tree to reuse
the diffopt.close_file attribute, 2016-06-22).

Manually adding more such flags to things log_tree_commit() every time
we need to allocate something would be tedious. Let's instead move
that fclose() code it to a new diff_free(), in anticipation of freeing
more things in that function in follow-up commits.

Some functions such as log_tree_commit() need an idiom of optionally
retaining a previous "no_free", as they may either free the memory
themselves, or their caller may do so. I'm keeping that idiom in
log_show_early() for good measure, even though I don't think it's
currently called in this manner. It also gets passed an existing
"struct rev_info", so future callers may want to set the "no_free"
flag.

This change is a bit hard to read because while the freeing pattern
we're introducing isn't unusual, the "file" member is a special
snowflake. We usually don't want to fclose() it. This is because
"file" is usually stdout, in which case we don't want to fclose()
it. We only want to opt-in to closing it when we e.g. open a file on
the filesystem. Thus the opt-in "close_file" flag.

So the API in general just needs a "no_free" flag to defer freeing,
but the "file" member still needs its "close_file" flag. This is made
more confusing because while refactoring this code we could replace
some "close_file=0" with "no_free=1", whereas others need to set both
flags.

This is because there were some cases where an existing "close_file=0"
meant "let's defer deallocation", and others where it meant "we don't
want to close this file handle at all".

1. 296d4a94e7 (diff: add -I<regex> that ignores matching changes,
   2020-10-20)

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/log.c | 23 ++++++++++++-----------
 diff.c        | 20 ++++++++++++++++----
 diff.h        | 15 ++++++++++++++-
 log-tree.c    | 10 ++++++----
 4 files changed, 48 insertions(+), 20 deletions(-)

diff --git a/builtin/log.c b/builtin/log.c
index d0cbaaf68a0..fffaf51970d 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -307,10 +307,11 @@ static struct itimerval early_output_timer;
 
 static void log_show_early(struct rev_info *revs, struct commit_list *list)
 {
-	int i = revs->early_output, close_file = revs->diffopt.close_file;
+	int i = revs->early_output;
 	int show_header = 1;
+	int no_free = revs->diffopt.no_free;
 
-	revs->diffopt.close_file = 0;
+	revs->diffopt.no_free = 0;
 	sort_in_topological_order(&list, revs->sort_order);
 	while (list && i) {
 		struct commit *commit = list->item;
@@ -327,8 +328,8 @@ static void log_show_early(struct rev_info *revs, struct commit_list *list)
 		case commit_ignore:
 			break;
 		case commit_error:
-			if (close_file)
-				fclose(revs->diffopt.file);
+			revs->diffopt.no_free = no_free;
+			diff_free(&revs->diffopt);
 			return;
 		}
 		list = list->next;
@@ -336,8 +337,8 @@ static void log_show_early(struct rev_info *revs, struct commit_list *list)
 
 	/* Did we already get enough commits for the early output? */
 	if (!i) {
-		if (close_file)
-			fclose(revs->diffopt.file);
+		revs->diffopt.no_free = 0;
+		diff_free(&revs->diffopt);
 		return;
 	}
 
@@ -401,7 +402,7 @@ static int cmd_log_walk(struct rev_info *rev)
 {
 	struct commit *commit;
 	int saved_nrl = 0;
-	int saved_dcctc = 0, close_file = rev->diffopt.close_file;
+	int saved_dcctc = 0;
 
 	if (rev->early_output)
 		setup_early_output();
@@ -417,7 +418,7 @@ static int cmd_log_walk(struct rev_info *rev)
 	 * and HAS_CHANGES being accumulated in rev->diffopt, so be careful to
 	 * retain that state information if replacing rev->diffopt in this loop
 	 */
-	rev->diffopt.close_file = 0;
+	rev->diffopt.no_free = 1;
 	while ((commit = get_revision(rev)) != NULL) {
 		if (!log_tree_commit(rev, commit) && rev->max_count >= 0)
 			/*
@@ -442,8 +443,8 @@ static int cmd_log_walk(struct rev_info *rev)
 	}
 	rev->diffopt.degraded_cc_to_c = saved_dcctc;
 	rev->diffopt.needed_rename_limit = saved_nrl;
-	if (close_file)
-		fclose(rev->diffopt.file);
+	rev->diffopt.no_free = 0;
+	diff_free(&rev->diffopt);
 
 	if (rev->diffopt.output_format & DIFF_FORMAT_CHECKDIFF &&
 	    rev->diffopt.flags.check_failed) {
@@ -1955,7 +1956,7 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix)
 		 * file, but but we must instruct it not to close after each
 		 * diff.
 		 */
-		rev.diffopt.close_file = 0;
+		rev.diffopt.no_free = 1;
 	} else {
 		int saved;
 
diff --git a/diff.c b/diff.c
index 69e3bc00ed8..a63c9ecae79 100644
--- a/diff.c
+++ b/diff.c
@@ -6336,6 +6336,20 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 	}
 }
 
+static void diff_free_file(struct diff_options *options)
+{
+	if (options->close_file)
+		fclose(options->file);
+}
+
+void diff_free(struct diff_options *options)
+{
+	if (options->no_free)
+		return;
+
+	diff_free_file(options);
+}
+
 void diff_flush(struct diff_options *options)
 {
 	struct diff_queue_struct *q = &diff_queued_diff;
@@ -6399,8 +6413,7 @@ void diff_flush(struct diff_options *options)
 		 * options->file to /dev/null should be safe, because we
 		 * aren't supposed to produce any output anyway.
 		 */
-		if (options->close_file)
-			fclose(options->file);
+		diff_free_file(options);
 		options->file = xfopen("/dev/null", "w");
 		options->close_file = 1;
 		options->color_moved = 0;
@@ -6433,8 +6446,7 @@ void diff_flush(struct diff_options *options)
 free_queue:
 	free(q->queue);
 	DIFF_QUEUE_CLEAR(q);
-	if (options->close_file)
-		fclose(options->file);
+	diff_free(options);
 
 	/*
 	 * Report the content-level differences with HAS_CHANGES;
diff --git a/diff.h b/diff.h
index 2ff2b1c7f2c..527fb56d851 100644
--- a/diff.h
+++ b/diff.h
@@ -49,7 +49,17 @@
  * - Once you finish feeding the pairs of files, call `diffcore_std()`.
  * This will tell the diffcore library to go ahead and do its work.
  *
- * - Calling `diff_flush()` will produce the output.
+ * - Calling `diff_flush()` will produce the output, it will call
+ *   `diff_free()` to free any resources, e.g. those allocated in
+ *   `diff_opt_parse()`.
+ *
+ * - Set `.no_free = 1` before calling `diff_flush()` to defer the
+ *   freeing of allocated memory in diff_options. This is useful when
+ *   `diff_flush()` is being called in a loop, rather than as a
+ *   one-off. When setting `.no_free = 1` you must ensure that
+ *   `diff_free()` is called at the end, either by flipping the flag
+ *   before the last `diff_flush()` call, or by flipping it before
+ *   calling `diff_free()` yourself.
  */
 
 struct combine_diff_path;
@@ -365,6 +375,8 @@ struct diff_options {
 
 	struct repository *repo;
 	struct option *parseopts;
+
+	int no_free;
 };
 
 unsigned diff_filter_bit(char status);
@@ -559,6 +571,7 @@ void diffcore_fix_diff_index(void);
 
 int diff_queue_is_empty(void);
 void diff_flush(struct diff_options*);
+void diff_free(struct diff_options*);
 void diff_warn_rename_limit(const char *varname, int needed, int degraded_cc);
 
 /* diff-raw status letters */
diff --git a/log-tree.c b/log-tree.c
index e0484676507..e7fcd70ba17 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -958,12 +958,14 @@ static int log_tree_diff(struct rev_info *opt, struct commit *commit, struct log
 int log_tree_commit(struct rev_info *opt, struct commit *commit)
 {
 	struct log_info log;
-	int shown, close_file = opt->diffopt.close_file;
+	int shown;
+	/* maybe called by e.g. cmd_log_walk(), maybe stand-alone */
+	int no_free = opt->diffopt.no_free;
 
 	log.commit = commit;
 	log.parent = NULL;
 	opt->loginfo = &log;
-	opt->diffopt.close_file = 0;
+	opt->diffopt.no_free = 1;
 
 	if (opt->line_level_traverse)
 		return line_log_print(opt, commit);
@@ -980,7 +982,7 @@ int log_tree_commit(struct rev_info *opt, struct commit *commit)
 		fprintf(opt->diffopt.file, "\n%s\n", opt->break_bar);
 	opt->loginfo = NULL;
 	maybe_flush_or_die(opt->diffopt.file, "stdout");
-	if (close_file)
-		fclose(opt->diffopt.file);
+	opt->diffopt.no_free = no_free;
+	diff_free(&opt->diffopt);
 	return shown;
 }
-- 
2.30.0.284.gd98b1dd5eaa7


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v2 2/2] diff: plug memory leak from regcomp() on {log,diff} -I
  2021-02-11 10:21               ` Jeff King
  2021-02-11 10:45                 ` [PATCH v2 0/2] " Ævar Arnfjörð Bjarmason
  2021-02-11 10:45                 ` [PATCH v2 1/2] " Ævar Arnfjörð Bjarmason
@ 2021-02-11 10:45                 ` Ævar Arnfjörð Bjarmason
  2 siblings, 0 replies; 57+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-11 10:45 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Michał Kępień,
	Phillip Wood, Jeff King, Ævar Arnfjörð Bjarmason

Fix a memory leak in 296d4a94e7 (diff: add -I<regex> that ignores
matching changes, 2020-10-20) by freeing the memory it allocates in
the newly introduced diff_free(). See the previous commit for details
on that.

This memory leak was intentionally introduced in 296d4a94e7, see the
discussion on a previous iteration of it in
https://lore.kernel.org/git/xmqqeelycajx.fsf@gitster.c.googlers.com/

At that time freeing the memory was somewhat tedious, but since it
isn't anymore with the newly introduced diff_free() let's use it.

Let's retain the pattern for diff_free_file() and add a
diff_free_ignore_regex(), even though (unlike "diff_free_file") we
don't need to call it elsewhere. I think this'll make for more
readable code than gradually accumulating a giant diff_free()
function, sharing "int i" across unrelated code etc.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 diff.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/diff.c b/diff.c
index a63c9ecae79..bf2cbf15e77 100644
--- a/diff.c
+++ b/diff.c
@@ -6342,12 +6342,24 @@ static void diff_free_file(struct diff_options *options)
 		fclose(options->file);
 }
 
+static void diff_free_ignore_regex(struct diff_options *options)
+{
+	int i;
+
+	for (i = 0; i < options->ignore_regex_nr; i++) {
+		regfree(options->ignore_regex[i]);
+		free(options->ignore_regex[i]);
+	}
+	free(options->ignore_regex);
+}
+
 void diff_free(struct diff_options *options)
 {
 	if (options->no_free)
 		return;
 
 	diff_free_file(options);
+	diff_free_ignore_regex(options);
 }
 
 void diff_flush(struct diff_options *options)
-- 
2.30.0.284.gd98b1dd5eaa7


^ permalink raw reply related	[flat|nested] 57+ messages in thread

end of thread, other threads:[~2021-02-11 10:50 UTC | newest]

Thread overview: 57+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-01 12:06 [PATCH 0/2] diff: add -I<regex> that ignores matching changes Michał Kępień
2020-10-01 12:06 ` [PATCH 1/2] " Michał Kępień
2020-10-01 18:21   ` Junio C Hamano
2020-10-07 19:48     ` Michał Kępień
2020-10-07 20:08       ` Junio C Hamano
2020-10-01 12:06 ` [PATCH 2/2] t: add -I<regex> tests Michał Kępień
2020-10-01 17:02 ` [PATCH 0/2] diff: add -I<regex> that ignores matching changes Junio C Hamano
2020-10-12  9:17 ` [PATCH v2 0/3] " Michał Kępień
2020-10-12  9:17   ` [PATCH v2 1/3] merge-base, xdiff: zero out xpparam_t structures Michał Kępień
2020-10-12 11:14     ` Johannes Schindelin
2020-10-12 17:09       ` Junio C Hamano
2020-10-12 19:52     ` Junio C Hamano
2020-10-13  6:35       ` Michał Kępień
2020-10-12  9:17   ` [PATCH v2 2/3] diff: add -I<regex> that ignores matching changes Michał Kępień
2020-10-12 11:20     ` Johannes Schindelin
2020-10-12 20:00       ` Junio C Hamano
2020-10-12 20:39         ` Johannes Schindelin
2020-10-12 21:43           ` Junio C Hamano
2020-10-13  6:37             ` Michał Kępień
2020-10-13 15:49               ` Junio C Hamano
2020-10-13  6:36       ` Michał Kępień
2020-10-13 12:02         ` Johannes Schindelin
2020-10-13 15:53           ` Junio C Hamano
2020-10-13 18:45           ` Michał Kępień
2020-10-12 18:01     ` Junio C Hamano
2020-10-13  6:38       ` Michał Kępień
2020-10-12 20:04     ` Junio C Hamano
2020-10-13  6:38       ` Michał Kępień
2020-10-12  9:17   ` [PATCH v2 3/3] t: add -I<regex> tests Michał Kępień
2020-10-12 11:49     ` Johannes Schindelin
2020-10-13  6:38       ` Michał Kępień
2020-10-13 12:00         ` Johannes Schindelin
2020-10-13 16:00           ` Junio C Hamano
2020-10-13 19:01           ` Michał Kępień
2020-10-15 11:45             ` Johannes Schindelin
2020-10-15  7:24   ` [PATCH v3 0/2] diff: add -I<regex> that ignores matching changes Michał Kępień
2020-10-15  7:24     ` [PATCH v3 1/2] merge-base, xdiff: zero out xpparam_t structures Michał Kępień
2020-10-15  7:24     ` [PATCH v3 2/2] diff: add -I<regex> that ignores matching changes Michał Kępień
2020-10-16 15:32       ` Phillip Wood
2020-10-16 18:04         ` Junio C Hamano
2020-10-19  9:48           ` Michał Kępień
2020-10-16 18:16       ` Junio C Hamano
2020-10-19  9:55         ` Michał Kępień
2020-10-19 17:29           ` Junio C Hamano
2020-10-16 10:00     ` [PATCH v3 0/2] " Johannes Schindelin
2020-10-20  6:48     ` [PATCH v4 " Michał Kępień
2020-10-20  6:48       ` [PATCH v4 1/2] merge-base, xdiff: zero out xpparam_t structures Michał Kępień
2020-10-20  6:48       ` [PATCH v4 2/2] diff: add -I<regex> that ignores matching changes Michał Kępień
2021-02-05 14:13       ` [PATCH 1/2] diff: add an API for deferred freeing Ævar Arnfjörð Bjarmason
2021-02-10 16:00         ` Johannes Schindelin
2021-02-11  3:00           ` Ævar Arnfjörð Bjarmason
2021-02-11  9:40             ` Johannes Schindelin
2021-02-11 10:21               ` Jeff King
2021-02-11 10:45                 ` [PATCH v2 0/2] " Ævar Arnfjörð Bjarmason
2021-02-11 10:45                 ` [PATCH v2 1/2] " Ævar Arnfjörð Bjarmason
2021-02-11 10:45                 ` [PATCH v2 2/2] diff: plug memory leak from regcomp() on {log,diff} -I Ævar Arnfjörð Bjarmason
2021-02-05 14:13       ` [PATCH " Ævar Arnfjörð Bjarmason

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).