* [PATCH 0/2] diff: add -I<regex> that ignores matching changes @ 2020-10-01 12:06 Michał Kępień 2020-10-01 12:06 ` [PATCH 1/2] " Michał Kępień ` (3 more replies) 0 siblings, 4 replies; 57+ messages in thread From: Michał Kępień @ 2020-10-01 12:06 UTC (permalink / raw) To: git This patch series adds a new diff option that enables ignoring changes whose all lines (changed, removed, and added) match a given regular expression. This is similar to the -I option in standalone diff utilities and can be used e.g. to look for unrelated changes in commits containing a large number of automatically applied modifications (e.g. a tree-wide string replacement). The difference between -G/-S and the new -I option is that the latter filters output on a per-change basis. I personally found this feature handy quite a few times while reviewing patches and a quick web search yields some questions from other people looking for something like this, so I thought I would take a shot at submitting it upstream. Please note that while this patch series in its current shape builds, seems to pass tests, and appears to work as I would expect it to, this is my first attempt at contributing to Git and I did come across a few stumbling blocks: - Instead of adding an extra flag for 'xdl_opts', would it be better to make sure xpparam_t structures are always zero-initialized before xdl_diff() is called (see commit message for patch 1 for context)? - If this patch series is accepted, should xdl_mark_ignorable() be renamed to something like xdl_mark_ignorable_blank() for slightly improved code clarity? - Would a more thorough explanation of the option help or hurt? - Given that hunk emitting rules are the same for the -I option proposed in this patch series as for --ignore-blank-lines, is it necessary to duplicate the (impressively extensive!) --inter-hunk-context tests in t/t4015-diff-whitespace.sh for -I? I decided against doing that for the time being (though I did add some basic tests for such scenarios in patch 2, see the "with -U1" test). - Should tests be added in a separate commit? This is what I did as I thought it would help with readability, but... Thanks in advance for taking a look! Michał Kępień (2): diff: add -I<regex> that ignores matching changes t: add -I<regex> tests Documentation/diff-options.txt | 3 + diff.c | 16 ++ diff.h | 2 + t/t4069-diff-ignore-regex.sh | 330 +++++++++++++++++++++++++++++++++ xdiff/xdiff.h | 2 + xdiff/xdiffi.c | 36 ++++ 6 files changed, 389 insertions(+) create mode 100755 t/t4069-diff-ignore-regex.sh -- 2.28.0 ^ permalink raw reply [flat|nested] 57+ messages in thread
* [PATCH 1/2] diff: add -I<regex> that ignores matching changes 2020-10-01 12:06 [PATCH 0/2] diff: add -I<regex> that ignores matching changes Michał Kępień @ 2020-10-01 12:06 ` Michał Kępień 2020-10-01 18:21 ` Junio C Hamano 2020-10-01 12:06 ` [PATCH 2/2] t: add -I<regex> tests Michał Kępień ` (2 subsequent siblings) 3 siblings, 1 reply; 57+ messages in thread From: Michał Kępień @ 2020-10-01 12:06 UTC (permalink / raw) To: git Add a new diff option that enables ignoring changes whose all lines (changed, removed, and added) match a given regular expression. This is similar to the -I option in standalone diff utilities and can be used e.g. to look for unrelated changes in commits containing a large number of automatically applied modifications (e.g. a tree-wide string replacement). The difference between -G/-S and the new -I option is that the latter filters output on a per-change basis. Use the 'ignore' field of xdchange_t for marking a change as ignored or not. Since the same field is used by --ignore-blank-lines, identical hunk emitting rules apply for --ignore-blank-lines and -I. These two options can also be used together at the same time. Apart from adding a new field to struct diff_options, also define a new flag, XDF_IGNORE_REGEX, for the 'xdl_opts' field of that structure. This is done because the xpparam_t structure (which is used for passing around the regular expression supplied to -I) is not zeroed out in all call stacks involving xdl_diff() and thus only performing a NULL check on xpp->ignore_regex could result in xdl_mark_ignorable_regex() treating garbage on the stack as a regular expression. As the 'flags' field of xpparam_t is initialized in all call stacks involving xdl_diff(), adding a flag check avoids that problem. Signed-off-by: Michał Kępień <michal@isc.org> --- Documentation/diff-options.txt | 3 +++ diff.c | 16 +++++++++++++++ diff.h | 2 ++ xdiff/xdiff.h | 2 ++ xdiff/xdiffi.c | 36 ++++++++++++++++++++++++++++++++++ 5 files changed, 59 insertions(+) diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt index 573fb9bb71..a7ef3f5645 100644 --- a/Documentation/diff-options.txt +++ b/Documentation/diff-options.txt @@ -687,6 +687,9 @@ endif::git-format-patch[] --ignore-blank-lines:: Ignore changes whose lines are all blank. +-I<regex>:: + Ignore changes whose all lines match <regex>. + --inter-hunk-context=<lines>:: Show the context between diff hunks, up to the specified number of lines, thereby fusing hunks that are close to each other. diff --git a/diff.c b/diff.c index 2bb2f8f57e..c9603c941e 100644 --- a/diff.c +++ b/diff.c @@ -3587,6 +3587,7 @@ static void builtin_diff(const char *name_a, if (header.len && !o->flags.suppress_diff_headers) ecbdata.header = &header; xpp.flags = o->xdl_opts; + xpp.ignore_regex = o->ignore_regex; xpp.anchors = o->anchors; xpp.anchors_nr = o->anchors_nr; xecfg.ctxlen = o->context; @@ -3716,6 +3717,7 @@ static void builtin_diffstat(const char *name_a, const char *name_b, memset(&xpp, 0, sizeof(xpp)); memset(&xecfg, 0, sizeof(xecfg)); xpp.flags = o->xdl_opts; + xpp.ignore_regex = o->ignore_regex; xpp.anchors = o->anchors; xpp.anchors_nr = o->anchors_nr; xecfg.ctxlen = o->context; @@ -5203,6 +5205,17 @@ static int diff_opt_patience(const struct option *opt, return 0; } +static int diff_opt_ignore_regex(const struct option *opt, + const char *arg, int unset) +{ + struct diff_options *options = opt->value; + + BUG_ON_OPT_NEG(unset); + options->xdl_opts |= XDF_IGNORE_REGEX; + options->ignore_regex = arg; + return 0; +} + static int diff_opt_pickaxe_regex(const struct option *opt, const char *arg, int unset) { @@ -5491,6 +5504,9 @@ static void prep_parse_options(struct diff_options *options) OPT_BIT_F(0, "ignore-blank-lines", &options->xdl_opts, N_("ignore changes whose lines are all blank"), XDF_IGNORE_BLANK_LINES, PARSE_OPT_NONEG), + OPT_CALLBACK_F('I', NULL, options, N_("<regex>"), + N_("ignore changes whose all lines match <regex>"), + 0, diff_opt_ignore_regex), OPT_BIT(0, "indent-heuristic", &options->xdl_opts, N_("heuristic to shift diff hunk boundaries for easy reading"), XDF_INDENT_HEURISTIC), diff --git a/diff.h b/diff.h index 3de343270f..0b8871c0c1 100644 --- a/diff.h +++ b/diff.h @@ -234,6 +234,8 @@ struct diff_options { */ const char *pickaxe; + const char *ignore_regex; + const char *single_follow; const char *a_prefix, *b_prefix; const char *line_prefix; diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h index 032e3a9f41..db28055a5e 100644 --- a/xdiff/xdiff.h +++ b/xdiff/xdiff.h @@ -40,6 +40,7 @@ extern "C" { XDF_IGNORE_CR_AT_EOL) #define XDF_IGNORE_BLANK_LINES (1 << 7) +#define XDF_IGNORE_REGEX (1 << 8) #define XDF_PATIENCE_DIFF (1 << 14) #define XDF_HISTOGRAM_DIFF (1 << 15) @@ -78,6 +79,7 @@ typedef struct s_mmbuffer { typedef struct s_xpparam { unsigned long flags; + const char *ignore_regex; /* See Documentation/diff-options.txt. */ char **anchors; diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c index bd035139f9..13f7ab95ac 100644 --- a/xdiff/xdiffi.c +++ b/xdiff/xdiffi.c @@ -1019,6 +1019,39 @@ static void xdl_mark_ignorable(xdchange_t *xscr, xdfenv_t *xe, long flags) } } +static void +xdl_mark_ignorable_regex(xdchange_t *xscr, const xdfenv_t *xe, + const char *ignore_regex) +{ + xdchange_t *xch; + regex_t regex; + + if (regcomp(®ex, ignore_regex, REG_EXTENDED | REG_NEWLINE)) + die("invalid regex: %s", ignore_regex); + + for (xch = xscr; xch; xch = xch->next) { + regmatch_t regmatch; + xrecord_t **rec; + int ignore = 1; + long i; + + rec = &xe->xdf1.recs[xch->i1]; + for (i = 0; i < xch->chg1 && ignore; i++) + ignore = !regexec_buf(®ex, rec[i]->ptr, rec[i]->size, + 1, ®match, 0); + + rec = &xe->xdf2.recs[xch->i2]; + for (i = 0; i < xch->chg2 && ignore; i++) + ignore = !regexec_buf(®ex, rec[i]->ptr, rec[i]->size, + 1, ®match, 0); + + /* + * Do not override --ignore-blank-lines. + */ + xch->ignore |= ignore; + } +} + int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp, xdemitconf_t const *xecfg, xdemitcb_t *ecb) { xdchange_t *xscr; @@ -1040,6 +1073,9 @@ int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp, if (xpp->flags & XDF_IGNORE_BLANK_LINES) xdl_mark_ignorable(xscr, &xe, xpp->flags); + if ((xpp->flags & XDF_IGNORE_REGEX) && xpp->ignore_regex) + xdl_mark_ignorable_regex(xscr, &xe, xpp->ignore_regex); + if (ef(&xe, xscr, ecb, xecfg) < 0) { xdl_free_script(xscr); -- 2.28.0 ^ permalink raw reply related [flat|nested] 57+ messages in thread
* Re: [PATCH 1/2] diff: add -I<regex> that ignores matching changes 2020-10-01 12:06 ` [PATCH 1/2] " Michał Kępień @ 2020-10-01 18:21 ` Junio C Hamano 2020-10-07 19:48 ` Michał Kępień 0 siblings, 1 reply; 57+ messages in thread From: Junio C Hamano @ 2020-10-01 18:21 UTC (permalink / raw) To: Michał Kępień; +Cc: git Michał Kępień <michal@isc.org> writes: > Apart from adding a new field to struct diff_options, also define a new > flag, XDF_IGNORE_REGEX, for the 'xdl_opts' field of that structure. > This is done because the xpparam_t structure (which is used for passing > around the regular expression supplied to -I) is not zeroed out in all > call stacks involving xdl_diff() and thus only performing a NULL check > on xpp->ignore_regex could result in xdl_mark_ignorable_regex() treating > garbage on the stack as a regular expression. As the 'flags' field of > xpparam_t is initialized in all call stacks involving xdl_diff(), adding > a flag check avoids that problem. You mentioned in your cover letter about this, and I tend to agree that this feels like a hack, at least from the first glance. The next feature that wants to have an optional pointer in xpparam_t and have the code behave differently with the data pointed by it would need to waste another bit the same way. Do you already know (read: I am not asking you to dig to find out immediately---just asking if you already know the answer) if there is an inherent reason why they cannot be memset(0) before use? It seems like a much better approach to ensure that we clear the structure. Doesn't existing anchors array share the same issue (at least anchors_nr must be cleared if there is no interesting anchors) already? IOW, should "git grep anchors_nr" be a valid way to identify _all_ places where you need to clear the ignore_regex field? > +-I<regex>:: > + Ignore changes whose all lines match <regex>. > + The implementation seems to allow only one regex, but I suspect we'd want to mimic $ printf "%s\n" a a a >test_a $ printf "%s\n" b b b >test_b $ diff -Ia test_a test_b $ diff -Ib test_a test_b $ diff -Ia -Ib test_a test_b and until that happens, the explanation needs to say something about earlier <regex> being discarded when this option is given more than once. > @@ -5203,6 +5205,17 @@ static int diff_opt_patience(const struct option *opt, > return 0; > } > > +static int diff_opt_ignore_regex(const struct option *opt, > + const char *arg, int unset) > +{ > + struct diff_options *options = opt->value; > + > + BUG_ON_OPT_NEG(unset); > + options->xdl_opts |= XDF_IGNORE_REGEX; > + options->ignore_regex = arg; When given twice or more, the earlier value gets lost (it does not leak, luckily, though). > + return 0; > +} If we somehow can lose the use of XDF_IGNORE_REGEX bit, we do not have to have this as a callback. Instead, it can be OPT_STRING with the current semantics of "only the last one is used", or we can use OPT_STRING_LIST to keep all the expressions. On the other hand, I wonder if it would be a valid approach to make the new field(s) in diff_options a "regex_t *ignore_regex" (and add an "int ignore_regex_nr" next to it, if we shoot for multiple -I options), instead of "const char *". For that, we would need a callback even without XDF_IGNORE_REGEX bit having to futz with xdl_opts field. Doing so would give us a chance to compile and notice a malformed regular expression in diff.c, before it hits xdiff/ layer, which may or may not be a good thing. > @@ -1019,6 +1019,39 @@ static void xdl_mark_ignorable(xdchange_t *xscr, xdfenv_t *xe, long flags) > } > } I agree with what you said in the cover letter that it would be a good idea to name the existing xdl_mark_ignorable() and the new one in such a way that they look similar and parallel, by renaming the former to xdl_mark_ignorable_lines or something. > +static void > +xdl_mark_ignorable_regex(xdchange_t *xscr, const xdfenv_t *xe, > + const char *ignore_regex) I know some coding standard do chomp line immediately before the function name (I grew up with one), but that is not what this project uses, and judging from the surrounding code, it is not what the upstream xdiff source we borrowed uses, either. > +{ > + xdchange_t *xch; > + regex_t regex; > + > + if (regcomp(®ex, ignore_regex, REG_EXTENDED | REG_NEWLINE)) > + die("invalid regex: %s", ignore_regex); If we compile in diff.c layer and pass regex_t down here, we won't have to fail here this deep in the callchain. > + for (xch = xscr; xch; xch = xch->next) { > + regmatch_t regmatch; > + xrecord_t **rec; > + int ignore = 1; > + long i; > + > + rec = &xe->xdf1.recs[xch->i1]; > + for (i = 0; i < xch->chg1 && ignore; i++) > + ignore = !regexec_buf(®ex, rec[i]->ptr, rec[i]->size, > + 1, ®match, 0); > + rec = &xe->xdf2.recs[xch->i2]; > + for (i = 0; i < xch->chg2 && ignore; i++) > + ignore = !regexec_buf(®ex, rec[i]->ptr, rec[i]->size, > + 1, ®match, 0); > + > + /* > + * Do not override --ignore-blank-lines. > + */ > + xch->ignore |= ignore; Well, you could optimize out the whole regexp matching by adding if (xch->ignore) continue; before the two loops try to find a non-matching line, no? > + } > +} > + > int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp, > xdemitconf_t const *xecfg, xdemitcb_t *ecb) { > xdchange_t *xscr; > @@ -1040,6 +1073,9 @@ int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp, > if (xpp->flags & XDF_IGNORE_BLANK_LINES) > xdl_mark_ignorable(xscr, &xe, xpp->flags); > > + if ((xpp->flags & XDF_IGNORE_REGEX) && xpp->ignore_regex) > + xdl_mark_ignorable_regex(xscr, &xe, xpp->ignore_regex); > + > if (ef(&xe, xscr, ecb, xecfg) < 0) { > > xdl_free_script(xscr); Thanks for an exciting read ;-) ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH 1/2] diff: add -I<regex> that ignores matching changes 2020-10-01 18:21 ` Junio C Hamano @ 2020-10-07 19:48 ` Michał Kępień 2020-10-07 20:08 ` Junio C Hamano 0 siblings, 1 reply; 57+ messages in thread From: Michał Kępień @ 2020-10-07 19:48 UTC (permalink / raw) To: Junio C Hamano; +Cc: git > > Apart from adding a new field to struct diff_options, also define a new > > flag, XDF_IGNORE_REGEX, for the 'xdl_opts' field of that structure. > > This is done because the xpparam_t structure (which is used for passing > > around the regular expression supplied to -I) is not zeroed out in all > > call stacks involving xdl_diff() and thus only performing a NULL check > > on xpp->ignore_regex could result in xdl_mark_ignorable_regex() treating > > garbage on the stack as a regular expression. As the 'flags' field of > > xpparam_t is initialized in all call stacks involving xdl_diff(), adding > > a flag check avoids that problem. > > You mentioned in your cover letter about this, and I tend to agree > that this feels like a hack, at least from the first glance. The > next feature that wants to have an optional pointer in xpparam_t and > have the code behave differently with the data pointed by it would > need to waste another bit the same way. Agreed. > Do you already know (read: > I am not asking you to dig to find out immediately---just asking if > you already know the answer) if there is an inherent reason why they > cannot be memset(0) before use? It seems like a much better > approach to ensure that we clear the structure. I do not know of any reason for xpparam_t structures to not be zero-initialized. In fact, they _are_ zero-initialized most of the time; AFAICT, there are just three places in the tree in which they are not. Would you like me to address that in a separate patch in this patch series? > Doesn't existing > anchors array share the same issue (at least anchors_nr must be > cleared if there is no interesting anchors) already? IOW, should > "git grep anchors_nr" be a valid way to identify _all_ places where > you need to clear the ignore_regex field? Yes, the 'anchors' and 'anchors_nr' fields of xpparam_t are also affected by the same problem, but the symptoms are more benign in their case because these fields are only used in xdiff/xpatience.c, i.e. when XDF_PATIENCE_DIFF is set in 'flags' - and as I already mentioned in commit messages, that field is always initialized properly, so there is currently no practical possibility of stack garbage being interpreted as e.g. a pointer to the anchor array. > > +-I<regex>:: > > + Ignore changes whose all lines match <regex>. > > + > > The implementation seems to allow only one regex, but I suspect we'd > want to mimic > > $ printf "%s\n" a a a >test_a > $ printf "%s\n" b b b >test_b > $ diff -Ia test_a test_b > $ diff -Ib test_a test_b > $ diff -Ia -Ib test_a test_b Ah, sure. After skimming through various man pages available online, I was not sure whether all standalone versions of diff which support -I allow that switch to be used multiple times and I thought it would be better to start with the simplest thing possible. If you would rather have the above implemented immediately, I can sure try that in v2. > and until that happens, the explanation needs to say something about > earlier <regex> being discarded when this option is given more than > once. Sorry, where do I look for that explanation? I tried to make -I behave similarly to -G and -S, each of which can be specified more than once, but the last value provided overrides the earlier ones - and I could not find any mention of that behavior in the docs. Please help? > > @@ -5203,6 +5205,17 @@ static int diff_opt_patience(const struct option *opt, > > return 0; > > } > > > > +static int diff_opt_ignore_regex(const struct option *opt, > > + const char *arg, int unset) > > +{ > > + struct diff_options *options = opt->value; > > + > > + BUG_ON_OPT_NEG(unset); > > + options->xdl_opts |= XDF_IGNORE_REGEX; > > + options->ignore_regex = arg; > > When given twice or more, the earlier value gets lost (it does not > leak, luckily, though). Right, as I mentioned above, I just wanted to start with something simple that resembles -G/-S. I will revise this part in v2 if you would like to see support for multiple regular expressions in this patch series. > > + return 0; > > +} > > If we somehow can lose the use of XDF_IGNORE_REGEX bit, we do not > have to have this as a callback. Instead, it can be OPT_STRING with > the current semantics of "only the last one is used", or we can use > OPT_STRING_LIST to keep all the expressions. I see, thanks for the hints! > On the other hand, I wonder if it would be a valid approach to make > the new field(s) in diff_options a "regex_t *ignore_regex" (and add > an "int ignore_regex_nr" next to it, if we shoot for multiple -I > options), instead of "const char *". For that, we would need a > callback even without XDF_IGNORE_REGEX bit having to futz with > xdl_opts field. > > Doing so would give us a chance to compile and notice a malformed > regular expression in diff.c, before it hits xdiff/ layer, which may > or may not be a good thing. I have not thought about this approach before, but it sounds elegant to me, at least at first glance - option parsing code sounds like the right place for sanitizing user-provided parameters. Doubly so if we take the multiple -I options approach. I will try this in v2. > > @@ -1019,6 +1019,39 @@ static void xdl_mark_ignorable(xdchange_t *xscr, xdfenv_t *xe, long flags) > > } > > } > > I agree with what you said in the cover letter that it would be a > good idea to name the existing xdl_mark_ignorable() and the new one > in such a way that they look similar and parallel, by renaming the > former to xdl_mark_ignorable_lines or something. Then I will do that in v2. Separate patch? > > +static void > > +xdl_mark_ignorable_regex(xdchange_t *xscr, const xdfenv_t *xe, > > + const char *ignore_regex) > > I know some coding standard do chomp line immediately before the > function name (I grew up with one), but that is not what this > project uses, and judging from the surrounding code, it is not what > the upstream xdiff source we borrowed uses, either. Oh, right, sorry - force of habit. I will fix this in v2. > > +{ > > + xdchange_t *xch; > > + regex_t regex; > > + > > + if (regcomp(®ex, ignore_regex, REG_EXTENDED | REG_NEWLINE)) > > + die("invalid regex: %s", ignore_regex); > > If we compile in diff.c layer and pass regex_t down here, we won't > have to fail here this deep in the callchain. Agreed - I already responded to this remark above. > > + for (xch = xscr; xch; xch = xch->next) { > > + regmatch_t regmatch; > > + xrecord_t **rec; > > + int ignore = 1; > > + long i; > > + > > + rec = &xe->xdf1.recs[xch->i1]; > > + for (i = 0; i < xch->chg1 && ignore; i++) > > + ignore = !regexec_buf(®ex, rec[i]->ptr, rec[i]->size, > > + 1, ®match, 0); > > + rec = &xe->xdf2.recs[xch->i2]; > > + for (i = 0; i < xch->chg2 && ignore; i++) > > + ignore = !regexec_buf(®ex, rec[i]->ptr, rec[i]->size, > > + 1, ®match, 0); > > + > > + /* > > + * Do not override --ignore-blank-lines. > > + */ > > + xch->ignore |= ignore; > > Well, you could optimize out the whole regexp matching by adding > > if (xch->ignore) > continue; > > before the two loops try to find a non-matching line, no? Ah, of course, it looks so obvious now that you pointed it out :-) I guess I was copy-past^W^W^Wtrying to make xdl_mark_ignorable_regex() look similar to xdl_mark_ignorable(). I will use your suggestion in v2. > > + } > > +} > > + > > int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp, > > xdemitconf_t const *xecfg, xdemitcb_t *ecb) { > > xdchange_t *xscr; > > @@ -1040,6 +1073,9 @@ int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp, > > if (xpp->flags & XDF_IGNORE_BLANK_LINES) > > xdl_mark_ignorable(xscr, &xe, xpp->flags); > > > > + if ((xpp->flags & XDF_IGNORE_REGEX) && xpp->ignore_regex) > > + xdl_mark_ignorable_regex(xscr, &xe, xpp->ignore_regex); > > + > > if (ef(&xe, xscr, ecb, xecfg) < 0) { > > > > xdl_free_script(xscr); > > Thanks for an exciting read ;-) My pleasure ;-) And thanks for taking a look. Sorry about the long turnaround time, but unfortunately this is the best I can do at the moment. -- Best regards, Michał Kępień ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH 1/2] diff: add -I<regex> that ignores matching changes 2020-10-07 19:48 ` Michał Kępień @ 2020-10-07 20:08 ` Junio C Hamano 0 siblings, 0 replies; 57+ messages in thread From: Junio C Hamano @ 2020-10-07 20:08 UTC (permalink / raw) To: Michał Kępień; +Cc: git Michał Kępień <michal@isc.org> writes: >> cannot be memset(0) before use? It seems like a much better >> approach to ensure that we clear the structure. > > I do not know of any reason for xpparam_t structures to not be > zero-initialized. In fact, they _are_ zero-initialized most of the > time; AFAICT, there are just three places in the tree in which they are > not. > > Would you like me to address that in a separate patch in this patch > series? Yes, as a preliminary clean-up patch before the real/fun stuff ;-) >> > +-I<regex>:: >> > + Ignore changes whose all lines match <regex>. >> > + >> >> The implementation seems to allow only one regex, but I suspect we'd >> want to mimic >> >> $ printf "%s\n" a a a >test_a >> $ printf "%s\n" b b b >test_b >> $ diff -Ia test_a test_b >> $ diff -Ib test_a test_b >> $ diff -Ia -Ib test_a test_b > > Ah, sure. After skimming through various man pages available online, I > was not sure whether all standalone versions of diff which support -I > allow that switch to be used multiple times and I thought it would be > better to start with the simplest thing possible. If you would rather > have the above implemented immediately, I can sure try that in v2. > >> and until that happens, the explanation needs to say something about >> earlier <regex> being discarded when this option is given more than >> once. > > Sorry, where do I look for that explanation? You wrote "Ignore changes whose all lines match <regex>"; I was suggesting that we also need "when given more than once, all but the last <regex> are ignored" after that sentence, because that is not what people who know how -I works in versions of "diff" that support it expect. But I think it should be trivial to make it a list of patterns and try to match against them in a loop. So let's support multiple patterns from the get-go. > I have not thought about this approach before, but it sounds elegant to > me, at least at first glance - option parsing code sounds like the right > place for sanitizing user-provided parameters. Doubly so if we take the > multiple -I options approach. I will try this in v2. Sounds good. >> I agree with what you said in the cover letter that it would be a >> good idea to name the existing xdl_mark_ignorable() and the new one >> in such a way that they look similar and parallel, by renaming the >> former to xdl_mark_ignorable_lines or something. > > Then I will do that in v2. Separate patch? Given that the function has only one caller, I think it is better done within the same patch. xdl_mark_ignorable() as the name of the function is not wrong per-se, so it does not deserve to be renamed standalone in a preliminary clean-up patch---there is nothing to be cleaned up. The fact that we introduce a similarly tasked function makes its current name less desirable, so adjusting to the new world order is better done as part of the same patch. > My pleasure ;-) And thanks for taking a look. Sorry about the long > turnaround time, but unfortunately this is the best I can do at the > moment. Pleasure is mutual ;-) We've survived without -I for 15 years. Even a few more months would not hurt anybody. Take time, aim for a quality job, and most importantly, have fun. Thanks. ^ permalink raw reply [flat|nested] 57+ messages in thread
* [PATCH 2/2] t: add -I<regex> tests 2020-10-01 12:06 [PATCH 0/2] diff: add -I<regex> that ignores matching changes Michał Kępień 2020-10-01 12:06 ` [PATCH 1/2] " Michał Kępień @ 2020-10-01 12:06 ` Michał Kępień 2020-10-01 17:02 ` [PATCH 0/2] diff: add -I<regex> that ignores matching changes Junio C Hamano 2020-10-12 9:17 ` [PATCH v2 0/3] " Michał Kępień 3 siblings, 0 replies; 57+ messages in thread From: Michał Kępień @ 2020-10-01 12:06 UTC (permalink / raw) To: git Exercise the new 'git diff -I<regex>' option in various scenarios to ensure it behaves as expected. Signed-off-by: Michał Kępień <michal@isc.org> --- t/t4069-diff-ignore-regex.sh | 330 +++++++++++++++++++++++++++++++++++ 1 file changed, 330 insertions(+) create mode 100755 t/t4069-diff-ignore-regex.sh diff --git a/t/t4069-diff-ignore-regex.sh b/t/t4069-diff-ignore-regex.sh new file mode 100755 index 0000000000..323687f1dc --- /dev/null +++ b/t/t4069-diff-ignore-regex.sh @@ -0,0 +1,330 @@ +#!/bin/sh + +test_description='Test diff -I<regex>' + +. ./test-lib.sh +. "$TEST_DIRECTORY"/diff-lib.sh + +test_expect_success setup ' + test_seq 20 >x && + git update-index --add x +' + +test_expect_success 'one line changed' ' + test_seq 20 | sed "s/10/100/" >x && + + # Get plain diff + git diff >plain && + cat >expected <<-EOF && + diff --git a/x b/x + --- a/x + +++ b/x + @@ -7,7 +7,7 @@ + 7 + 8 + 9 + -10 + +100 + 11 + 12 + 13 + EOF + compare_diff_patch expected plain && + + # Both old and new line match regex - ignore change + git diff -I "^10" >actual && + test_must_be_empty actual && + + # Only old line matches regex - do not ignore change + git diff -I "^10\$" >actual && + compare_diff_patch plain actual && + + # Only new line matches regex - do not ignore change + git diff -I "^100" >actual && + compare_diff_patch plain actual +' + +test_expect_success 'one line removed' ' + test_seq 20 | sed "10d" >x && + + # Get plain diff + git diff >plain && + cat >expected <<-EOF && + diff --git a/x b/x + --- a/x + +++ b/x + @@ -7,7 +7,6 @@ + 7 + 8 + 9 + -10 + 11 + 12 + 13 + EOF + compare_diff_patch expected plain && + + # Removed line matches regex - ignore change + git diff -I "^10" >actual && + test_must_be_empty actual && + + # Removed line does not match regex - do not ignore change + git diff -I "^101" >actual && + compare_diff_patch plain actual +' + +test_expect_success 'one line added' ' + test_seq 21 >x && + + # Get plain diff + git diff >plain && + cat >expected <<-EOF && + diff --git a/x b/x + --- a/x + +++ b/x + @@ -18,3 +18,4 @@ + 18 + 19 + 20 + +21 + EOF + compare_diff_patch expected plain && + + # Added line matches regex - ignore change + git diff -I "^21" >actual && + test_must_be_empty actual && + + # Added line does not match regex - do not ignore change + git diff -I "^212" >actual && + compare_diff_patch plain actual +' + +test_expect_success 'last two lines changed' ' + test_seq 20 | sed "s/19/21/; s/20/22/" >x && + + # Get plain diff + git diff >plain && + cat >expected <<-EOF && + diff --git a/x b/x + --- a/x + +++ b/x + @@ -16,5 +16,5 @@ + 16 + 17 + 18 + -19 + -20 + +21 + +22 + EOF + compare_diff_patch expected plain && + + # All changed lines match regex - ignore change + git diff -I "^[12]" >actual && + test_must_be_empty actual && + + # Not all changed lines match regex - do not ignore change + git diff -I "^2" >actual && + compare_diff_patch plain actual +' + +test_expect_success 'two non-adjacent lines removed in the same hunk' ' + test_seq 20 | sed "1d; 3d" >x && + + # Get plain diff + git diff >plain && + cat >expected <<-EOF && + diff --git a/x b/x + --- a/x + +++ b/x + @@ -1,6 +1,4 @@ + -1 + 2 + -3 + 4 + 5 + 6 + EOF + compare_diff_patch expected plain && + + # Both removed lines match regex - ignore hunk + git diff -I "^[1-3]" >actual && + test_must_be_empty actual && + + # First removed line does not match regex - do not ignore hunk + git diff -I "^[2-3]" >actual && + compare_diff_patch plain actual && + + # Second removed line does not match regex - do not ignore hunk + git diff -I "^[1-2]" >actual && + compare_diff_patch plain actual +' + +test_expect_success 'two non-adjacent lines removed in the same hunk, with -U1' ' + test_seq 20 | sed "1d; 3d" >x && + + # Get plain diff + git diff -U1 >plain && + cat >expected <<-EOF && + diff --git a/x b/x + --- a/x + +++ b/x + @@ -1,4 +1,2 @@ + -1 + 2 + -3 + 4 + EOF + compare_diff_patch expected plain && + + # Both removed lines match regex - ignore hunk + git diff -U1 -I "^[1-3]" >actual && + test_must_be_empty actual && + + # First removed line does not match regex, but is out of context - ignore second change + git diff -U1 -I "^[2-3]" >actual && + cat >second-change-ignored <<-EOF && + diff --git a/x b/x + --- a/x + +++ b/x + @@ -1,2 +1 @@ + -1 + 2 + EOF + compare_diff_patch second-change-ignored actual && + + # Second removed line does not match regex, but is out of context - ignore first change + git diff -U1 -I "^[1-2]" >actual && + cat >first-change-ignored <<-EOF && + diff --git a/x b/x + --- a/x + +++ b/x + @@ -2,3 +1,2 @@ + 2 + -3 + 4 + EOF + compare_diff_patch first-change-ignored actual +' + +test_expect_success 'multiple hunks' ' + test_seq 20 | sed "1d; 20d" >x && + + # Get plain diff + git diff >plain && + cat >expected <<-EOF && + diff --git a/x b/x + --- a/x + +++ b/x + @@ -1,4 +1,3 @@ + -1 + 2 + 3 + 4 + @@ -17,4 +16,3 @@ + 17 + 18 + 19 + -20 + EOF + compare_diff_patch expected plain && + + # Ignore both hunks + git diff -I "^[12]" >actual && + test_must_be_empty actual && + + # Only ignore first hunk + git diff -I "^1" >actual && + cat >first-hunk-ignored <<-EOF && + diff --git a/x b/x + --- a/x + +++ b/x + @@ -17,4 +16,3 @@ + 17 + 18 + 19 + -20 + EOF + compare_diff_patch first-hunk-ignored actual && + + # Only ignore second hunk + git diff -I "^2" >actual && + cat >second-hunk-ignored <<-EOF && + diff --git a/x b/x + --- a/x + +++ b/x + @@ -1,4 +1,3 @@ + -1 + 2 + 3 + 4 + EOF + compare_diff_patch second-hunk-ignored actual +' + +test_expect_success 'multiple hunks, with --ignore-blank-lines' ' + echo >x && + test_seq 21 >>x && + + # Get plain diff + git diff >plain && + cat >expected <<-EOF && + diff --git a/x b/x + --- a/x + +++ b/x + @@ -1,3 +1,4 @@ + + + 1 + 2 + 3 + @@ -18,3 +19,4 @@ + 18 + 19 + 20 + +21 + EOF + compare_diff_patch expected plain && + + # -I does not override --ignore-blank-lines - ignore both hunks + git diff --ignore-blank-lines -I ^21 >actual && + test_must_be_empty actual +' + +test_expect_success 'diffstat' ' + test_seq 20 | sed "s/^5/0/p; s/^15/10/; 16d" >x && + + # Get plain diffstat + git diff --stat >actual && + cat >expected <<-EOF && + x | 6 +++--- + 1 file changed, 3 insertions(+), 3 deletions(-) + EOF + test_cmp expected actual && + + # Ignore both hunks + git diff --stat -I "^[0-5]" >actual && + test_must_be_empty actual && + + # Only ignore first hunk + git diff --stat -I "^[05]" >actual && + cat >expected <<-EOF && + x | 3 +-- + 1 file changed, 1 insertion(+), 2 deletions(-) + EOF + test_cmp expected actual && + + # Only ignore second hunk + git diff --stat -I "^1" >actual && + cat >expected <<-EOF && + x | 3 ++- + 1 file changed, 2 insertions(+), 1 deletion(-) + EOF + test_cmp expected actual +' + +test_expect_success 'invalid regex' ' + >x && + git diff -I "^[1" 2>&1 | grep "invalid regex: " +' + +test_done -- 2.28.0 ^ permalink raw reply related [flat|nested] 57+ messages in thread
* Re: [PATCH 0/2] diff: add -I<regex> that ignores matching changes 2020-10-01 12:06 [PATCH 0/2] diff: add -I<regex> that ignores matching changes Michał Kępień 2020-10-01 12:06 ` [PATCH 1/2] " Michał Kępień 2020-10-01 12:06 ` [PATCH 2/2] t: add -I<regex> tests Michał Kępień @ 2020-10-01 17:02 ` Junio C Hamano 2020-10-12 9:17 ` [PATCH v2 0/3] " Michał Kępień 3 siblings, 0 replies; 57+ messages in thread From: Junio C Hamano @ 2020-10-01 17:02 UTC (permalink / raw) To: Michał Kępień; +Cc: git Michał Kępień <michal@isc.org> writes: > This patch series adds a new diff option that enables ignoring changes > whose all lines (changed, removed, and added) match a given regular > expression. This is similar to the -I option in standalone diff > utilities and can be used e.g. to look for unrelated changes in commits > containing a large number of automatically applied modifications (e.g. a > tree-wide string replacement). The difference between -G/-S and the new > -I option is that the latter filters output on a per-change basis. I am uncomfortably excited to see a change to quite a low-level code of the diff machinery. It does not happen every day ;-) ^ permalink raw reply [flat|nested] 57+ messages in thread
* [PATCH v2 0/3] diff: add -I<regex> that ignores matching changes 2020-10-01 12:06 [PATCH 0/2] diff: add -I<regex> that ignores matching changes Michał Kępień ` (2 preceding siblings ...) 2020-10-01 17:02 ` [PATCH 0/2] diff: add -I<regex> that ignores matching changes Junio C Hamano @ 2020-10-12 9:17 ` Michał Kępień 2020-10-12 9:17 ` [PATCH v2 1/3] merge-base, xdiff: zero out xpparam_t structures Michał Kępień ` (3 more replies) 3 siblings, 4 replies; 57+ messages in thread From: Michał Kępień @ 2020-10-12 9:17 UTC (permalink / raw) To: git This patch series adds a new diff option that enables ignoring changes whose all lines (changed, removed, and added) match a given regular expression. This is similar to the -I option in standalone diff utilities and can be used e.g. to ignore changes which only affect code comments or to look for unrelated changes in commits containing a large number of automatically applied modifications (e.g. a tree-wide string replacement). The difference between -G/-S and the new -I option is that the latter filters output on a per-change basis. Changes from v1: - Add a new preliminary cleanup patch which ensures xpparam_t structures are always zero-initialized. (This was a prerequisite for the next change below.) - Do not add a new 'xdl_opts' flag to check whether -I was used; instead, just check whether the array of regular expressions to match against is non-NULL. - Enable the -I option to be used multiple times. As a consequence of this, regular expressions are now "pre-compiled" in the option's callback (and passed around as an array of regex_t structures) rather than deep down in xdiff code. Add test cases exercising use of multiple -I options in the same git invocation. Update documentation accordingly. - Rename xdl_mark_ignorable() to xdl_mark_ignorable_lines(), to indicate that it is logically a "sibling" of xdl_mark_ignorable_regex() rather than its "parent". - Optimize xdl_mark_ignorable_regex() by making it immediately skip changes already marked as ignored by xdl_mark_ignorable_lines(). - Fix coding style issue in the prototype part of the definition of xdl_mark_ignorable_regex(). - Add "/* see Documentation/diff-options.txt */" comments for the fields added to struct diff_options and xpparam_t, mimicking the comments used for 'anchors', 'anchors_nr', and 'anchors_alloc'. - Revise commit log messages to reflect all of the above. Michał Kępień (3): merge-base, xdiff: zero out xpparam_t structures diff: add -I<regex> that ignores matching changes t: add -I<regex> tests Documentation/diff-options.txt | 4 + builtin/merge-tree.c | 1 + diff.c | 23 ++ diff.h | 4 + t/t4069-diff-ignore-regex.sh | 426 +++++++++++++++++++++++++++++++++ xdiff/xdiff.h | 4 + xdiff/xdiffi.c | 47 +++- xdiff/xhistogram.c | 2 + xdiff/xpatience.c | 2 + 9 files changed, 511 insertions(+), 2 deletions(-) create mode 100755 t/t4069-diff-ignore-regex.sh -- 2.28.0 ^ permalink raw reply [flat|nested] 57+ messages in thread
* [PATCH v2 1/3] merge-base, xdiff: zero out xpparam_t structures 2020-10-12 9:17 ` [PATCH v2 0/3] " Michał Kępień @ 2020-10-12 9:17 ` Michał Kępień 2020-10-12 11:14 ` Johannes Schindelin 2020-10-12 19:52 ` Junio C Hamano 2020-10-12 9:17 ` [PATCH v2 2/3] diff: add -I<regex> that ignores matching changes Michał Kępień ` (2 subsequent siblings) 3 siblings, 2 replies; 57+ messages in thread From: Michał Kępień @ 2020-10-12 9:17 UTC (permalink / raw) To: git xpparam_t structures are usually zero-initialized before their specific fields are assigned to, but there are three locations in the tree where that does not happen. Add the missing memset() calls in order to make initialization of xpparam_t structures consistent tree-wide and to prevent stack garbage from being used as field values. Signed-off-by: Michał Kępień <michal@isc.org> --- builtin/merge-tree.c | 1 + xdiff/xhistogram.c | 2 ++ xdiff/xpatience.c | 2 ++ 3 files changed, 5 insertions(+) diff --git a/builtin/merge-tree.c b/builtin/merge-tree.c index e72714a5a8..de8520778d 100644 --- a/builtin/merge-tree.c +++ b/builtin/merge-tree.c @@ -109,6 +109,7 @@ static void show_diff(struct merge_list *entry) xdemitconf_t xecfg; xdemitcb_t ecb; + memset(&xpp, 0, sizeof(xpp)); xpp.flags = 0; memset(&xecfg, 0, sizeof(xecfg)); xecfg.ctxlen = 3; diff --git a/xdiff/xhistogram.c b/xdiff/xhistogram.c index c7b35a9667..e694bfd9e3 100644 --- a/xdiff/xhistogram.c +++ b/xdiff/xhistogram.c @@ -235,6 +235,8 @@ static int fall_back_to_classic_diff(xpparam_t const *xpp, xdfenv_t *env, int line1, int count1, int line2, int count2) { xpparam_t xpparam; + + memset(&xpparam, 0, sizeof(xpparam)); xpparam.flags = xpp->flags & ~XDF_DIFF_ALGORITHM_MASK; return xdl_fall_back_diff(env, &xpparam, diff --git a/xdiff/xpatience.c b/xdiff/xpatience.c index 3c5601b602..20699a6f60 100644 --- a/xdiff/xpatience.c +++ b/xdiff/xpatience.c @@ -318,6 +318,8 @@ static int fall_back_to_classic_diff(struct hashmap *map, int line1, int count1, int line2, int count2) { xpparam_t xpp; + + memset(&xpp, 0, sizeof(xpp)); xpp.flags = map->xpp->flags & ~XDF_DIFF_ALGORITHM_MASK; return xdl_fall_back_diff(map->env, &xpp, -- 2.28.0 ^ permalink raw reply related [flat|nested] 57+ messages in thread
* Re: [PATCH v2 1/3] merge-base, xdiff: zero out xpparam_t structures 2020-10-12 9:17 ` [PATCH v2 1/3] merge-base, xdiff: zero out xpparam_t structures Michał Kępień @ 2020-10-12 11:14 ` Johannes Schindelin 2020-10-12 17:09 ` Junio C Hamano 2020-10-12 19:52 ` Junio C Hamano 1 sibling, 1 reply; 57+ messages in thread From: Johannes Schindelin @ 2020-10-12 11:14 UTC (permalink / raw) To: Michał Kępień; +Cc: git [-- Attachment #1: Type: text/plain, Size: 1262 bytes --] Hi Michał, On Mon, 12 Oct 2020, Michał Kępień wrote: > diff --git a/xdiff/xhistogram.c b/xdiff/xhistogram.c > index c7b35a9667..e694bfd9e3 100644 > --- a/xdiff/xhistogram.c > +++ b/xdiff/xhistogram.c > @@ -235,6 +235,8 @@ static int fall_back_to_classic_diff(xpparam_t const *xpp, xdfenv_t *env, > int line1, int count1, int line2, int count2) > { > xpparam_t xpparam; > + > + memset(&xpparam, 0, sizeof(xpparam)); A slightly more elegant way to do this would be xpparam_t xpparam = { 0l }; Or even xpparam_t xpparam = XPPARAM_T_INIT; with #define XPPARAM_T_INIT { 0l, NULL, 0 } in `xdiff/xdiff.h`. Thanks, Dscho > xpparam.flags = xpp->flags & ~XDF_DIFF_ALGORITHM_MASK; > > return xdl_fall_back_diff(env, &xpparam, > diff --git a/xdiff/xpatience.c b/xdiff/xpatience.c > index 3c5601b602..20699a6f60 100644 > --- a/xdiff/xpatience.c > +++ b/xdiff/xpatience.c > @@ -318,6 +318,8 @@ static int fall_back_to_classic_diff(struct hashmap *map, > int line1, int count1, int line2, int count2) > { > xpparam_t xpp; > + > + memset(&xpp, 0, sizeof(xpp)); > xpp.flags = map->xpp->flags & ~XDF_DIFF_ALGORITHM_MASK; > > return xdl_fall_back_diff(map->env, &xpp, > -- > 2.28.0 > > > ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH v2 1/3] merge-base, xdiff: zero out xpparam_t structures 2020-10-12 11:14 ` Johannes Schindelin @ 2020-10-12 17:09 ` Junio C Hamano 0 siblings, 0 replies; 57+ messages in thread From: Junio C Hamano @ 2020-10-12 17:09 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Michał Kępień, git Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: >> xpparam_t xpparam; >> + >> + memset(&xpparam, 0, sizeof(xpparam)); > > A slightly more elegant way to do this would be > > xpparam_t xpparam = { 0l }; > > Or even > > xpparam_t xpparam = XPPARAM_T_INIT; > > with > > #define XPPARAM_T_INIT { 0l, NULL, 0 } > > in `xdiff/xdiff.h`. Let's not suggest any of the above. I think we had a recent thread [*1*] on which we agreed that "{ 0 }" is the way to spell "a naturally zero" initialization like this one, if we were to spell it out. TBH, I'd say that memset() is also OK as-is in this codebase as an established way (git grep 'memset([^,]*, 0, sizeof' gives too many hits). Thanks. [Reference] *1* https://lore.kernel.org/git/xmqq4knca7c6.fsf@gitster.c.googlers.com/ ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH v2 1/3] merge-base, xdiff: zero out xpparam_t structures 2020-10-12 9:17 ` [PATCH v2 1/3] merge-base, xdiff: zero out xpparam_t structures Michał Kępień 2020-10-12 11:14 ` Johannes Schindelin @ 2020-10-12 19:52 ` Junio C Hamano 2020-10-13 6:35 ` Michał Kępień 1 sibling, 1 reply; 57+ messages in thread From: Junio C Hamano @ 2020-10-12 19:52 UTC (permalink / raw) To: Michał Kępień; +Cc: git Michał Kępień <michal@isc.org> writes: > diff --git a/builtin/merge-tree.c b/builtin/merge-tree.c > index e72714a5a8..de8520778d 100644 > --- a/builtin/merge-tree.c > +++ b/builtin/merge-tree.c > @@ -109,6 +109,7 @@ static void show_diff(struct merge_list *entry) > xdemitconf_t xecfg; > xdemitcb_t ecb; > > + memset(&xpp, 0, sizeof(xpp)); > xpp.flags = 0; > memset(&xecfg, 0, sizeof(xecfg)); > xecfg.ctxlen = 3; The existing "xpp.flags=0" can then go away, and as Dscho hinted, xpparam_t xpp = { .flags = 0, }; may also be a valid way to write this. What it says is that we want the entirety of xpp to be reasonably initialized, but we do care that its .flags field to be exactly zero. > diff --git a/xdiff/xhistogram.c b/xdiff/xhistogram.c > index c7b35a9667..e694bfd9e3 100644 > --- a/xdiff/xhistogram.c > +++ b/xdiff/xhistogram.c > @@ -235,6 +235,8 @@ static int fall_back_to_classic_diff(xpparam_t const *xpp, xdfenv_t *env, > int line1, int count1, int line2, int count2) > { > xpparam_t xpparam; > + > + memset(&xpparam, 0, sizeof(xpparam)); > xpparam.flags = xpp->flags & ~XDF_DIFF_ALGORITHM_MASK; Likewise. Giving an initializer to the local variable def, e.g. xpparam_t xpparam = { .flags = xpp->flags & ~XDF_DIFF_ALGORITHM_MASK, }; would allow us to lose one line. > return xdl_fall_back_diff(env, &xpparam, > diff --git a/xdiff/xpatience.c b/xdiff/xpatience.c > index 3c5601b602..20699a6f60 100644 > --- a/xdiff/xpatience.c > +++ b/xdiff/xpatience.c > @@ -318,6 +318,8 @@ static int fall_back_to_classic_diff(struct hashmap *map, > int line1, int count1, int line2, int count2) > { > xpparam_t xpp; > + > + memset(&xpp, 0, sizeof(xpp)); > xpp.flags = map->xpp->flags & ~XDF_DIFF_ALGORITHM_MASK; Likewise. > return xdl_fall_back_diff(map->env, &xpp, But the patch is good as-is, given especially the way how xecfg is cleared the same way in builtin/merge-tree.c::show_diff(). Will queue. Thanks. ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH v2 1/3] merge-base, xdiff: zero out xpparam_t structures 2020-10-12 19:52 ` Junio C Hamano @ 2020-10-13 6:35 ` Michał Kępień 0 siblings, 0 replies; 57+ messages in thread From: Michał Kępień @ 2020-10-13 6:35 UTC (permalink / raw) To: Junio C Hamano, Johannes Schindelin; +Cc: git Junio, Johannes, Thank you for your feedback. > But the patch is good as-is, given especially the way how xecfg is > cleared the same way in builtin/merge-tree.c::show_diff(). I just wanted to explain that I initially planned to take the "= { 0 }" approach and I started checking whether all xpparam_t structures in the source tree are stack-allocated, but I quickly noticed that, as Junio pointed out, existing code showed an inclination towards memset(), so I settled for that instead. > Will queue. Thanks. I have a process question: since the other two patches in this patch series will need a v3, would you like me to keep bundling patch 1 in this series or should I only send revised version of patches 2 & 3 in v3? -- Best regards, Michał Kępień ^ permalink raw reply [flat|nested] 57+ messages in thread
* [PATCH v2 2/3] diff: add -I<regex> that ignores matching changes 2020-10-12 9:17 ` [PATCH v2 0/3] " Michał Kępień 2020-10-12 9:17 ` [PATCH v2 1/3] merge-base, xdiff: zero out xpparam_t structures Michał Kępień @ 2020-10-12 9:17 ` Michał Kępień 2020-10-12 11:20 ` Johannes Schindelin ` (2 more replies) 2020-10-12 9:17 ` [PATCH v2 3/3] t: add -I<regex> tests Michał Kępień 2020-10-15 7:24 ` [PATCH v3 0/2] diff: add -I<regex> that ignores matching changes Michał Kępień 3 siblings, 3 replies; 57+ messages in thread From: Michał Kępień @ 2020-10-12 9:17 UTC (permalink / raw) To: git Add a new diff option that enables ignoring changes whose all lines (changed, removed, and added) match a given regular expression. This is similar to the -I option in standalone diff utilities and can be used e.g. to ignore changes which only affect code comments or to look for unrelated changes in commits containing a large number of automatically applied modifications (e.g. a tree-wide string replacement). The difference between -G/-S and the new -I option is that the latter filters output on a per-change basis. Use the 'ignore' field of xdchange_t for marking a change as ignored or not. Since the same field is used by --ignore-blank-lines, identical hunk emitting rules apply for --ignore-blank-lines and -I. These two options can also be used together in the same git invocation (they are complementary to each other). Rename xdl_mark_ignorable() to xdl_mark_ignorable_lines(), to indicate that it is logically a "sibling" of xdl_mark_ignorable_regex() rather than its "parent". Signed-off-by: Michał Kępień <michal@isc.org> --- Documentation/diff-options.txt | 4 +++ diff.c | 23 +++++++++++++++++ diff.h | 4 +++ xdiff/xdiff.h | 4 +++ xdiff/xdiffi.c | 47 ++++++++++++++++++++++++++++++++-- 5 files changed, 80 insertions(+), 2 deletions(-) diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt index 573fb9bb71..f8ccf85173 100644 --- a/Documentation/diff-options.txt +++ b/Documentation/diff-options.txt @@ -687,6 +687,10 @@ endif::git-format-patch[] --ignore-blank-lines:: Ignore changes whose lines are all blank. +-I<regex>:: + Ignore changes whose all lines match <regex>. This option may + be specified more than once. + --inter-hunk-context=<lines>:: Show the context between diff hunks, up to the specified number of lines, thereby fusing hunks that are close to each other. diff --git a/diff.c b/diff.c index 2bb2f8f57e..c5d4920fee 100644 --- a/diff.c +++ b/diff.c @@ -3587,6 +3587,8 @@ static void builtin_diff(const char *name_a, if (header.len && !o->flags.suppress_diff_headers) ecbdata.header = &header; xpp.flags = o->xdl_opts; + xpp.ignore_regex = o->ignore_regex; + xpp.ignore_regex_nr = o->ignore_regex_nr; xpp.anchors = o->anchors; xpp.anchors_nr = o->anchors_nr; xecfg.ctxlen = o->context; @@ -3716,6 +3718,8 @@ static void builtin_diffstat(const char *name_a, const char *name_b, memset(&xpp, 0, sizeof(xpp)); memset(&xecfg, 0, sizeof(xecfg)); xpp.flags = o->xdl_opts; + xpp.ignore_regex = o->ignore_regex; + xpp.ignore_regex_nr = o->ignore_regex_nr; xpp.anchors = o->anchors; xpp.anchors_nr = o->anchors_nr; xecfg.ctxlen = o->context; @@ -5203,6 +5207,22 @@ static int diff_opt_patience(const struct option *opt, return 0; } +static int diff_opt_ignore_regex(const struct option *opt, + const char *arg, int unset) +{ + struct diff_options *options = opt->value; + regex_t *regex; + + BUG_ON_OPT_NEG(unset); + regex = xcalloc(1, sizeof(*regex)); + if (regcomp(regex, arg, REG_EXTENDED | REG_NEWLINE)) + die("invalid regex: %s", arg); + ALLOC_GROW(options->ignore_regex, options->ignore_regex_nr + 1, + options->ignore_regex_alloc); + options->ignore_regex[options->ignore_regex_nr++] = regex; + return 0; +} + static int diff_opt_pickaxe_regex(const struct option *opt, const char *arg, int unset) { @@ -5491,6 +5511,9 @@ static void prep_parse_options(struct diff_options *options) OPT_BIT_F(0, "ignore-blank-lines", &options->xdl_opts, N_("ignore changes whose lines are all blank"), XDF_IGNORE_BLANK_LINES, PARSE_OPT_NONEG), + OPT_CALLBACK_F('I', NULL, options, N_("<regex>"), + N_("ignore changes whose all lines match <regex>"), + 0, diff_opt_ignore_regex), OPT_BIT(0, "indent-heuristic", &options->xdl_opts, N_("heuristic to shift diff hunk boundaries for easy reading"), XDF_INDENT_HEURISTIC), diff --git a/diff.h b/diff.h index 11de52e9e9..80dbd3dfdc 100644 --- a/diff.h +++ b/diff.h @@ -234,6 +234,10 @@ struct diff_options { */ const char *pickaxe; + /* see Documentation/diff-options.txt */ + regex_t **ignore_regex; + size_t ignore_regex_nr, ignore_regex_alloc; + const char *single_follow; const char *a_prefix, *b_prefix; const char *line_prefix; diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h index 032e3a9f41..883a0d770e 100644 --- a/xdiff/xdiff.h +++ b/xdiff/xdiff.h @@ -79,6 +79,10 @@ typedef struct s_mmbuffer { typedef struct s_xpparam { unsigned long flags; + /* See Documentation/diff-options.txt. */ + regex_t **ignore_regex; + size_t ignore_regex_nr; + /* See Documentation/diff-options.txt. */ char **anchors; size_t anchors_nr; diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c index bd035139f9..380eb728ed 100644 --- a/xdiff/xdiffi.c +++ b/xdiff/xdiffi.c @@ -998,7 +998,7 @@ static int xdl_call_hunk_func(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb, return 0; } -static void xdl_mark_ignorable(xdchange_t *xscr, xdfenv_t *xe, long flags) +static void xdl_mark_ignorable_lines(xdchange_t *xscr, xdfenv_t *xe, long flags) { xdchange_t *xch; @@ -1019,6 +1019,46 @@ static void xdl_mark_ignorable(xdchange_t *xscr, xdfenv_t *xe, long flags) } } +static int record_matches_regex(xrecord_t *rec, xpparam_t const *xpp) { + regmatch_t regmatch; + int i; + + for (i = 0; i < xpp->ignore_regex_nr; i++) + if (!regexec_buf(xpp->ignore_regex[i], rec->ptr, rec->size, 1, + ®match, 0)) + return 1; + + return 0; +} + +static void xdl_mark_ignorable_regex(xdchange_t *xscr, const xdfenv_t *xe, + xpparam_t const *xpp) +{ + xdchange_t *xch; + + for (xch = xscr; xch; xch = xch->next) { + xrecord_t **rec; + int ignore = 1; + long i; + + /* + * Do not override --ignore-blank-lines. + */ + if (xch->ignore) + continue; + + rec = &xe->xdf1.recs[xch->i1]; + for (i = 0; i < xch->chg1 && ignore; i++) + ignore = record_matches_regex(rec[i], xpp); + + rec = &xe->xdf2.recs[xch->i2]; + for (i = 0; i < xch->chg2 && ignore; i++) + ignore = record_matches_regex(rec[i], xpp); + + xch->ignore = ignore; + } +} + int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp, xdemitconf_t const *xecfg, xdemitcb_t *ecb) { xdchange_t *xscr; @@ -1038,7 +1078,10 @@ int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp, } if (xscr) { if (xpp->flags & XDF_IGNORE_BLANK_LINES) - xdl_mark_ignorable(xscr, &xe, xpp->flags); + xdl_mark_ignorable_lines(xscr, &xe, xpp->flags); + + if (xpp->ignore_regex) + xdl_mark_ignorable_regex(xscr, &xe, xpp); if (ef(&xe, xscr, ecb, xecfg) < 0) { -- 2.28.0 ^ permalink raw reply related [flat|nested] 57+ messages in thread
* Re: [PATCH v2 2/3] diff: add -I<regex> that ignores matching changes 2020-10-12 9:17 ` [PATCH v2 2/3] diff: add -I<regex> that ignores matching changes Michał Kępień @ 2020-10-12 11:20 ` Johannes Schindelin 2020-10-12 20:00 ` Junio C Hamano 2020-10-13 6:36 ` Michał Kępień 2020-10-12 18:01 ` Junio C Hamano 2020-10-12 20:04 ` Junio C Hamano 2 siblings, 2 replies; 57+ messages in thread From: Johannes Schindelin @ 2020-10-12 11:20 UTC (permalink / raw) To: Michał Kępień; +Cc: git [-- Attachment #1: Type: text/plain, Size: 4652 bytes --] Hi Michał, On Mon, 12 Oct 2020, Michał Kępień wrote: > @@ -5203,6 +5207,22 @@ static int diff_opt_patience(const struct option *opt, > return 0; > } > > +static int diff_opt_ignore_regex(const struct option *opt, > + const char *arg, int unset) > +{ > + struct diff_options *options = opt->value; > + regex_t *regex; > + > + BUG_ON_OPT_NEG(unset); > + regex = xcalloc(1, sizeof(*regex)); > + if (regcomp(regex, arg, REG_EXTENDED | REG_NEWLINE)) > + die("invalid regex: %s", arg); > + ALLOC_GROW(options->ignore_regex, options->ignore_regex_nr + 1, > + options->ignore_regex_alloc); > + options->ignore_regex[options->ignore_regex_nr++] = regex; A slightly more elegant way would be to have `ignore_regex` have the type `regex_t *` and use `ALLOC_GROW_BY()` (which zeroes the newly-added elements automagically). > + return 0; > +} > + > static int diff_opt_pickaxe_regex(const struct option *opt, > const char *arg, int unset) > { > [...] > @@ -5491,6 +5511,9 @@ static void prep_parse_options(struct diff_options *options) > OPT_BIT_F(0, "ignore-blank-lines", &options->xdl_opts, > N_("ignore changes whose lines are all blank"), > XDF_IGNORE_BLANK_LINES, PARSE_OPT_NONEG), > + OPT_CALLBACK_F('I', NULL, options, N_("<regex>"), > + N_("ignore changes whose all lines match <regex>"), > + 0, diff_opt_ignore_regex), > OPT_BIT(0, "indent-heuristic", &options->xdl_opts, > N_("heuristic to shift diff hunk boundaries for easy reading"), > XDF_INDENT_HEURISTIC), Are we releasing the `ignore_regex` anywhere? Thanks, Dscho > diff --git a/diff.h b/diff.h > index 11de52e9e9..80dbd3dfdc 100644 > --- a/diff.h > +++ b/diff.h > @@ -234,6 +234,10 @@ struct diff_options { > */ > const char *pickaxe; > > + /* see Documentation/diff-options.txt */ > + regex_t **ignore_regex; > + size_t ignore_regex_nr, ignore_regex_alloc; > + > const char *single_follow; > const char *a_prefix, *b_prefix; > const char *line_prefix; > diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h > index 032e3a9f41..883a0d770e 100644 > --- a/xdiff/xdiff.h > +++ b/xdiff/xdiff.h > @@ -79,6 +79,10 @@ typedef struct s_mmbuffer { > typedef struct s_xpparam { > unsigned long flags; > > + /* See Documentation/diff-options.txt. */ > + regex_t **ignore_regex; > + size_t ignore_regex_nr; > + > /* See Documentation/diff-options.txt. */ > char **anchors; > size_t anchors_nr; > diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c > index bd035139f9..380eb728ed 100644 > --- a/xdiff/xdiffi.c > +++ b/xdiff/xdiffi.c > @@ -998,7 +998,7 @@ static int xdl_call_hunk_func(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb, > return 0; > } > > -static void xdl_mark_ignorable(xdchange_t *xscr, xdfenv_t *xe, long flags) > +static void xdl_mark_ignorable_lines(xdchange_t *xscr, xdfenv_t *xe, long flags) > { > xdchange_t *xch; > > @@ -1019,6 +1019,46 @@ static void xdl_mark_ignorable(xdchange_t *xscr, xdfenv_t *xe, long flags) > } > } > > +static int record_matches_regex(xrecord_t *rec, xpparam_t const *xpp) { > + regmatch_t regmatch; > + int i; > + > + for (i = 0; i < xpp->ignore_regex_nr; i++) > + if (!regexec_buf(xpp->ignore_regex[i], rec->ptr, rec->size, 1, > + ®match, 0)) > + return 1; > + > + return 0; > +} > + > +static void xdl_mark_ignorable_regex(xdchange_t *xscr, const xdfenv_t *xe, > + xpparam_t const *xpp) > +{ > + xdchange_t *xch; > + > + for (xch = xscr; xch; xch = xch->next) { > + xrecord_t **rec; > + int ignore = 1; > + long i; > + > + /* > + * Do not override --ignore-blank-lines. > + */ > + if (xch->ignore) > + continue; > + > + rec = &xe->xdf1.recs[xch->i1]; > + for (i = 0; i < xch->chg1 && ignore; i++) > + ignore = record_matches_regex(rec[i], xpp); > + > + rec = &xe->xdf2.recs[xch->i2]; > + for (i = 0; i < xch->chg2 && ignore; i++) > + ignore = record_matches_regex(rec[i], xpp); > + > + xch->ignore = ignore; > + } > +} > + > int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp, > xdemitconf_t const *xecfg, xdemitcb_t *ecb) { > xdchange_t *xscr; > @@ -1038,7 +1078,10 @@ int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp, > } > if (xscr) { > if (xpp->flags & XDF_IGNORE_BLANK_LINES) > - xdl_mark_ignorable(xscr, &xe, xpp->flags); > + xdl_mark_ignorable_lines(xscr, &xe, xpp->flags); > + > + if (xpp->ignore_regex) > + xdl_mark_ignorable_regex(xscr, &xe, xpp); > > if (ef(&xe, xscr, ecb, xecfg) < 0) { > > -- > 2.28.0 > > ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH v2 2/3] diff: add -I<regex> that ignores matching changes 2020-10-12 11:20 ` Johannes Schindelin @ 2020-10-12 20:00 ` Junio C Hamano 2020-10-12 20:39 ` Johannes Schindelin 2020-10-13 6:36 ` Michał Kępień 1 sibling, 1 reply; 57+ messages in thread From: Junio C Hamano @ 2020-10-12 20:00 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Michał Kępień, git Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > Hi Michał, > > On Mon, 12 Oct 2020, Michał Kępień wrote: > >> @@ -5203,6 +5207,22 @@ static int diff_opt_patience(const struct option *opt, >> return 0; >> } >> >> +static int diff_opt_ignore_regex(const struct option *opt, >> + const char *arg, int unset) >> +{ >> + struct diff_options *options = opt->value; >> + regex_t *regex; >> + >> + BUG_ON_OPT_NEG(unset); >> + regex = xcalloc(1, sizeof(*regex)); >> + if (regcomp(regex, arg, REG_EXTENDED | REG_NEWLINE)) >> + die("invalid regex: %s", arg); >> + ALLOC_GROW(options->ignore_regex, options->ignore_regex_nr + 1, >> + options->ignore_regex_alloc); >> + options->ignore_regex[options->ignore_regex_nr++] = regex; > > A slightly more elegant way would be to have `ignore_regex` have the type > `regex_t *` and use `ALLOC_GROW_BY()` (which zeroes the newly-added > elements automagically). It may be "elegant", but we we know if it is "correct" on everybody's implementation of regex_t? A struct like struct foo { char *str; char in_place_buffer[10]; }; where str points at in_place_buffer[] only when it needs to point at a very short string, and points at an allocated memory on heap, can not be safely copied and used, and an array of such a struct breaks when ALLOC_GROW_BY() needs to call realloc(). Keeping an array of pointers to regex_t and growing it would not break even for such a structure type. Since we cannot rely on the implementation detail of regex_t on a single platform, I'd rather not to see anybody suggest such an "elegant" approach. Thanks. ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH v2 2/3] diff: add -I<regex> that ignores matching changes 2020-10-12 20:00 ` Junio C Hamano @ 2020-10-12 20:39 ` Johannes Schindelin 2020-10-12 21:43 ` Junio C Hamano 0 siblings, 1 reply; 57+ messages in thread From: Johannes Schindelin @ 2020-10-12 20:39 UTC (permalink / raw) To: Junio C Hamano; +Cc: Michał Kępień, git [-- Attachment #1: Type: text/plain, Size: 2158 bytes --] Hi Junio, On Mon, 12 Oct 2020, Junio C Hamano wrote: > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > > > Hi Michał, > > > > On Mon, 12 Oct 2020, Michał Kępień wrote: > > > >> @@ -5203,6 +5207,22 @@ static int diff_opt_patience(const struct option *opt, > >> return 0; > >> } > >> > >> +static int diff_opt_ignore_regex(const struct option *opt, > >> + const char *arg, int unset) > >> +{ > >> + struct diff_options *options = opt->value; > >> + regex_t *regex; > >> + > >> + BUG_ON_OPT_NEG(unset); > >> + regex = xcalloc(1, sizeof(*regex)); > >> + if (regcomp(regex, arg, REG_EXTENDED | REG_NEWLINE)) > >> + die("invalid regex: %s", arg); > >> + ALLOC_GROW(options->ignore_regex, options->ignore_regex_nr + 1, > >> + options->ignore_regex_alloc); > >> + options->ignore_regex[options->ignore_regex_nr++] = regex; > > > > A slightly more elegant way would be to have `ignore_regex` have the type > > `regex_t *` and use `ALLOC_GROW_BY()` (which zeroes the newly-added > > elements automagically). > > It may be "elegant", but we we know if it is "correct" on > everybody's implementation of regex_t? > > A struct like > > struct foo { > char *str; > char in_place_buffer[10]; > }; > > where str points at in_place_buffer[] only when it needs to point at > a very short string, and points at an allocated memory on heap, can > not be safely copied and used, and an array of such a struct breaks > when ALLOC_GROW_BY() needs to call realloc(). Keeping an array of > pointers to regex_t and growing it would not break even for such a > structure type. > > Since we cannot rely on the implementation detail of regex_t on a > single platform, I'd rather not to see anybody suggest such an > "elegant" approach. True, such an implementation detail is totally conceivable. Thanks for the sanity check. Having said that, my suggestion to use `ALLOC_GROW_BY()` was triggered by the use of `xcalloc()`, and now that I think further about it, an `xmalloc()` should be plenty enough: `regcomp()` does not need that struct to be initialized to all zero. Ciao, Dscho ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH v2 2/3] diff: add -I<regex> that ignores matching changes 2020-10-12 20:39 ` Johannes Schindelin @ 2020-10-12 21:43 ` Junio C Hamano 2020-10-13 6:37 ` Michał Kępień 0 siblings, 1 reply; 57+ messages in thread From: Junio C Hamano @ 2020-10-12 21:43 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Michał Kępień, git Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > the use of `xcalloc()`, and now that I think further about it, an > `xmalloc()` should be plenty enough: `regcomp()` does not need that struct > to be initialized to all zero. Yup, I think it makes sense. ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH v2 2/3] diff: add -I<regex> that ignores matching changes 2020-10-12 21:43 ` Junio C Hamano @ 2020-10-13 6:37 ` Michał Kępień 2020-10-13 15:49 ` Junio C Hamano 0 siblings, 1 reply; 57+ messages in thread From: Michał Kępień @ 2020-10-13 6:37 UTC (permalink / raw) To: Junio C Hamano; +Cc: Johannes Schindelin, git > > the use of `xcalloc()`, and now that I think further about it, an > > `xmalloc()` should be plenty enough: `regcomp()` does not need that struct > > to be initialized to all zero. > > Yup, I think it makes sense. Sure thing, I will switch to xmalloc() in v3. xcalloc()'s semantics ("allocate one structure of this size") appealed to me, but there is indeed no need to zero-initialize the allocated memory before passing it to regcomp(). -- Best regards, Michał Kępień ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH v2 2/3] diff: add -I<regex> that ignores matching changes 2020-10-13 6:37 ` Michał Kępień @ 2020-10-13 15:49 ` Junio C Hamano 0 siblings, 0 replies; 57+ messages in thread From: Junio C Hamano @ 2020-10-13 15:49 UTC (permalink / raw) To: Michał Kępień; +Cc: Johannes Schindelin, git Michał Kępień <michal@isc.org> writes: > ... xcalloc()'s semantics > ("allocate one structure of this size") appealed to me, ... FWIW, it appeals to me, too ;-) But if we are going to pass it to regcomp(), there is no point clearing it beforehand. ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH v2 2/3] diff: add -I<regex> that ignores matching changes 2020-10-12 11:20 ` Johannes Schindelin 2020-10-12 20:00 ` Junio C Hamano @ 2020-10-13 6:36 ` Michał Kępień 2020-10-13 12:02 ` Johannes Schindelin 1 sibling, 1 reply; 57+ messages in thread From: Michał Kępień @ 2020-10-13 6:36 UTC (permalink / raw) To: Johannes Schindelin; +Cc: git Hi Johannes, > > @@ -5491,6 +5511,9 @@ static void prep_parse_options(struct diff_options *options) > > OPT_BIT_F(0, "ignore-blank-lines", &options->xdl_opts, > > N_("ignore changes whose lines are all blank"), > > XDF_IGNORE_BLANK_LINES, PARSE_OPT_NONEG), > > + OPT_CALLBACK_F('I', NULL, options, N_("<regex>"), > > + N_("ignore changes whose all lines match <regex>"), > > + 0, diff_opt_ignore_regex), > > OPT_BIT(0, "indent-heuristic", &options->xdl_opts, > > N_("heuristic to shift diff hunk boundaries for easy reading"), > > XDF_INDENT_HEURISTIC), > > Are we releasing the `ignore_regex` anywhere? Oops, I tried to mimic what is done for 'anchors', and I failed to notice that apparently the elements of the options->anchors array are only free()'d when --patience is also used and the array pointer itself is never free()'d at all. Given this, I believe I need to fix diff_opt_ignore_regex() in patch 2 and also make sure that the memory allocated in diff_opt_anchored() gets properly released - in another preliminary clean-up patch? At first glance, diff_flush() - specifically the part below the 'free_queue' label - looks like a sane place to free() things. Am I mistaken? -- Best regards, Michał Kępień ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH v2 2/3] diff: add -I<regex> that ignores matching changes 2020-10-13 6:36 ` Michał Kępień @ 2020-10-13 12:02 ` Johannes Schindelin 2020-10-13 15:53 ` Junio C Hamano 2020-10-13 18:45 ` Michał Kępień 0 siblings, 2 replies; 57+ messages in thread From: Johannes Schindelin @ 2020-10-13 12:02 UTC (permalink / raw) To: Michał Kępień; +Cc: git [-- Attachment #1: Type: text/plain, Size: 1599 bytes --] Hi Michał, On Tue, 13 Oct 2020, Michał Kępień wrote: > Hi Johannes, > > > > @@ -5491,6 +5511,9 @@ static void prep_parse_options(struct diff_options *options) > > > OPT_BIT_F(0, "ignore-blank-lines", &options->xdl_opts, > > > N_("ignore changes whose lines are all blank"), > > > XDF_IGNORE_BLANK_LINES, PARSE_OPT_NONEG), > > > + OPT_CALLBACK_F('I', NULL, options, N_("<regex>"), > > > + N_("ignore changes whose all lines match <regex>"), > > > + 0, diff_opt_ignore_regex), > > > OPT_BIT(0, "indent-heuristic", &options->xdl_opts, > > > N_("heuristic to shift diff hunk boundaries for easy reading"), > > > XDF_INDENT_HEURISTIC), > > > > Are we releasing the `ignore_regex` anywhere? > > Oops, I tried to mimic what is done for 'anchors', and I failed to > notice that apparently the elements of the options->anchors array are > only free()'d when --patience is also used and the array pointer itself > is never free()'d at all. Given this, I believe I need to fix > diff_opt_ignore_regex() in patch 2 and also make sure that the memory > allocated in diff_opt_anchored() gets properly released - in another > preliminary clean-up patch? > > At first glance, diff_flush() - specifically the part below the > 'free_queue' label - looks like a sane place to free() things. Am I > mistaken? Oh wow, from a cursory look it seems as if the diff machinery was not exactly careful with releasing memory. I might be mistaken, but if I am not, then this would deserve a separate patch series, methinks. Ciao, Dscho ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH v2 2/3] diff: add -I<regex> that ignores matching changes 2020-10-13 12:02 ` Johannes Schindelin @ 2020-10-13 15:53 ` Junio C Hamano 2020-10-13 18:45 ` Michał Kępień 1 sibling, 0 replies; 57+ messages in thread From: Junio C Hamano @ 2020-10-13 15:53 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Michał Kępień, git Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > Oh wow, from a cursory look it seems as if the diff machinery was not > exactly careful with releasing memory. I might be mistaken, but if I am > not, then this would deserve a separate patch series, methinks. I wouldn't be surprised if newer parts of it is much less careful than the older parts of the machinery. Most callers of the diff machinery, including "log -p" that repeatedly generates patches, would however run just a single setup for the entire series of diff invocations before tearing it down AFIAR, so leaking the result of parsing the command line option may not be such a big deal to these callers. But some callers added recently may not follow the access pattern, in which case the machinery may have to be made more leakproof. Thanks. ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH v2 2/3] diff: add -I<regex> that ignores matching changes 2020-10-13 12:02 ` Johannes Schindelin 2020-10-13 15:53 ` Junio C Hamano @ 2020-10-13 18:45 ` Michał Kępień 1 sibling, 0 replies; 57+ messages in thread From: Michał Kępień @ 2020-10-13 18:45 UTC (permalink / raw) To: Johannes Schindelin; +Cc: git Hi Johannes, > Oh wow, from a cursory look it seems as if the diff machinery was not > exactly careful with releasing memory. I might be mistaken, but if I am > not, then this would deserve a separate patch series, methinks. Oh, okay. Since that sounds like a non-trivial task, I should probably finish this patch series first, making sure it releases memory properly, and only then move on to looking for memory leaks in diff.c. -- Best regards, Michał Kępień ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH v2 2/3] diff: add -I<regex> that ignores matching changes 2020-10-12 9:17 ` [PATCH v2 2/3] diff: add -I<regex> that ignores matching changes Michał Kępień 2020-10-12 11:20 ` Johannes Schindelin @ 2020-10-12 18:01 ` Junio C Hamano 2020-10-13 6:38 ` Michał Kępień 2020-10-12 20:04 ` Junio C Hamano 2 siblings, 1 reply; 57+ messages in thread From: Junio C Hamano @ 2020-10-12 18:01 UTC (permalink / raw) To: Michał Kępień; +Cc: git Michał Kępień <michal@isc.org> writes: > Add a new diff option that enables ignoring changes whose all lines > (changed, removed, and added) match a given regular expression. This is > similar to the -I option in standalone diff utilities and can be used > e.g. to ignore changes which only affect code comments or to look for > unrelated changes in commits containing a large number of automatically > applied modifications (e.g. a tree-wide string replacement). The > difference between -G/-S and the new -I option is that the latter > filters output on a per-change basis. > > Use the 'ignore' field of xdchange_t for marking a change as ignored or > not. Since the same field is used by --ignore-blank-lines, identical > hunk emitting rules apply for --ignore-blank-lines and -I. These two > options can also be used together in the same git invocation (they are > complementary to each other). > > Rename xdl_mark_ignorable() to xdl_mark_ignorable_lines(), to indicate > that it is logically a "sibling" of xdl_mark_ignorable_regex() rather > than its "parent". > > Signed-off-by: Michał Kępień <michal@isc.org> Well explained. > +-I<regex>:: Since we are mimicking other folks' feature, giving also the --ignore-matching-lines=<regex> synonym to that their users are familiar with would be a good idea, no? > + Ignore changes whose all lines match <regex>. This option may > + be specified more than once. > + > --inter-hunk-context=<lines>:: > Show the context between diff hunks, up to the specified number > of lines, thereby fusing hunks that are close to each other. > diff --git a/diff.c b/diff.c > index 2bb2f8f57e..c5d4920fee 100644 > --- a/diff.c > +++ b/diff.c > @@ -3587,6 +3587,8 @@ static void builtin_diff(const char *name_a, > if (header.len && !o->flags.suppress_diff_headers) > ecbdata.header = &header; > xpp.flags = o->xdl_opts; > + xpp.ignore_regex = o->ignore_regex; > + xpp.ignore_regex_nr = o->ignore_regex_nr; > xpp.anchors = o->anchors; > xpp.anchors_nr = o->anchors_nr; > xecfg.ctxlen = o->context; > @@ -3716,6 +3718,8 @@ static void builtin_diffstat(const char *name_a, const char *name_b, > memset(&xpp, 0, sizeof(xpp)); > memset(&xecfg, 0, sizeof(xecfg)); > xpp.flags = o->xdl_opts; > + xpp.ignore_regex = o->ignore_regex; > + xpp.ignore_regex_nr = o->ignore_regex_nr; > xpp.anchors = o->anchors; > xpp.anchors_nr = o->anchors_nr; > xecfg.ctxlen = o->context; > @@ -5203,6 +5207,22 @@ static int diff_opt_patience(const struct option *opt, > return 0; > } > > +static int diff_opt_ignore_regex(const struct option *opt, > + const char *arg, int unset) > +{ > + struct diff_options *options = opt->value; > + regex_t *regex; > + > + BUG_ON_OPT_NEG(unset); > + regex = xcalloc(1, sizeof(*regex)); > + if (regcomp(regex, arg, REG_EXTENDED | REG_NEWLINE)) > + die("invalid regex: %s", arg); > + ALLOC_GROW(options->ignore_regex, options->ignore_regex_nr + 1, > + options->ignore_regex_alloc); > + options->ignore_regex[options->ignore_regex_nr++] = regex; > + return 0; > +} Don't these parse-options callback functions have a way to tell the caller die instead of them themselves dying like this? Would it work better to "return error(... message ...)", or would that give two error messages? In anycase, this is end-user facing, so we'd want it to be localizable, e.g. die(_("invalid regexp given to -I: '%s'", arg)); probably. > static int diff_opt_pickaxe_regex(const struct option *opt, > const char *arg, int unset) > { Other than that, nicely done. Thanks. ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH v2 2/3] diff: add -I<regex> that ignores matching changes 2020-10-12 18:01 ` Junio C Hamano @ 2020-10-13 6:38 ` Michał Kępień 0 siblings, 0 replies; 57+ messages in thread From: Michał Kępień @ 2020-10-13 6:38 UTC (permalink / raw) To: Junio C Hamano; +Cc: git > > +-I<regex>:: > > Since we are mimicking other folks' feature, giving also the > > --ignore-matching-lines=<regex> > > synonym to that their users are familiar with would be a good idea, > no? Some diff implementations (e.g. the OpenBSD one) lack the long option and I was aiming for the "greatest common divisor" for all diff implementations, but sure, I do not see how adding the synonym could hurt. I will do that in v3. > > @@ -5203,6 +5207,22 @@ static int diff_opt_patience(const struct option *opt, > > return 0; > > } > > > > +static int diff_opt_ignore_regex(const struct option *opt, > > + const char *arg, int unset) > > +{ > > + struct diff_options *options = opt->value; > > + regex_t *regex; > > + > > + BUG_ON_OPT_NEG(unset); > > + regex = xcalloc(1, sizeof(*regex)); > > + if (regcomp(regex, arg, REG_EXTENDED | REG_NEWLINE)) > > + die("invalid regex: %s", arg); > > + ALLOC_GROW(options->ignore_regex, options->ignore_regex_nr + 1, > > + options->ignore_regex_alloc); > > + options->ignore_regex[options->ignore_regex_nr++] = regex; > > + return 0; > > +} > > Don't these parse-options callback functions have a way to tell the > caller die instead of them themselves dying like this? Would it > work better to "return error(... message ...)", or would that give > two error messages? Yes, that would indeed look better, thanks - just one error message is generated when "return error(...)" is used. I just failed to notice that it is a thing. I will use it in v3. > In anycase, this is end-user facing, so we'd > want it to be localizable, e.g. > > die(_("invalid regexp given to -I: '%s'", arg)); > > probably. Absolutely, I will fix that in v3. I tried to mimic what other similar code in the tree does and a quick `git grep 'die.*regex'` yielded mostly non-localized strings, so I settled for one as well. -- Best regards, Michał Kępień ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH v2 2/3] diff: add -I<regex> that ignores matching changes 2020-10-12 9:17 ` [PATCH v2 2/3] diff: add -I<regex> that ignores matching changes Michał Kępień 2020-10-12 11:20 ` Johannes Schindelin 2020-10-12 18:01 ` Junio C Hamano @ 2020-10-12 20:04 ` Junio C Hamano 2020-10-13 6:38 ` Michał Kępień 2 siblings, 1 reply; 57+ messages in thread From: Junio C Hamano @ 2020-10-12 20:04 UTC (permalink / raw) To: Michał Kępień; +Cc: git Michał Kępień <michal@isc.org> writes: > + /* see Documentation/diff-options.txt */ This comment adds negative value. If it were /* "-I<regexp>" */ the readers won't have to switch to the file only to find out that the comment didn't tell them where in the file to look at. > + regex_t **ignore_regex; > + size_t ignore_regex_nr, ignore_regex_alloc; > + > const char *single_follow; > const char *a_prefix, *b_prefix; > const char *line_prefix; > diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h > index 032e3a9f41..883a0d770e 100644 > --- a/xdiff/xdiff.h > +++ b/xdiff/xdiff.h > @@ -79,6 +79,10 @@ typedef struct s_mmbuffer { > typedef struct s_xpparam { > unsigned long flags; > > + /* See Documentation/diff-options.txt. */ Likewise. > + regex_t **ignore_regex; > + size_t ignore_regex_nr; > + > /* See Documentation/diff-options.txt. */ > char **anchors; > size_t anchors_nr; ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH v2 2/3] diff: add -I<regex> that ignores matching changes 2020-10-12 20:04 ` Junio C Hamano @ 2020-10-13 6:38 ` Michał Kępień 0 siblings, 0 replies; 57+ messages in thread From: Michał Kępień @ 2020-10-13 6:38 UTC (permalink / raw) To: Junio C Hamano; +Cc: git > > + /* see Documentation/diff-options.txt */ > > This comment adds negative value. > > If it were > > /* "-I<regexp>" */ > > the readers won't have to switch to the file only to find out that > the comment didn't tell them where in the file to look at. Agreed, thanks, I will fix this in v3. > > + regex_t **ignore_regex; > > + size_t ignore_regex_nr, ignore_regex_alloc; > > + > > const char *single_follow; > > const char *a_prefix, *b_prefix; > > const char *line_prefix; > > diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h > > index 032e3a9f41..883a0d770e 100644 > > --- a/xdiff/xdiff.h > > +++ b/xdiff/xdiff.h > > @@ -79,6 +79,10 @@ typedef struct s_mmbuffer { > > typedef struct s_xpparam { > > unsigned long flags; > > > > + /* See Documentation/diff-options.txt. */ > > Likewise. I will fix this in v3. -- Best regards, Michał Kępień ^ permalink raw reply [flat|nested] 57+ messages in thread
* [PATCH v2 3/3] t: add -I<regex> tests 2020-10-12 9:17 ` [PATCH v2 0/3] " Michał Kępień 2020-10-12 9:17 ` [PATCH v2 1/3] merge-base, xdiff: zero out xpparam_t structures Michał Kępień 2020-10-12 9:17 ` [PATCH v2 2/3] diff: add -I<regex> that ignores matching changes Michał Kępień @ 2020-10-12 9:17 ` Michał Kępień 2020-10-12 11:49 ` Johannes Schindelin 2020-10-15 7:24 ` [PATCH v3 0/2] diff: add -I<regex> that ignores matching changes Michał Kępień 3 siblings, 1 reply; 57+ messages in thread From: Michał Kępień @ 2020-10-12 9:17 UTC (permalink / raw) To: git Exercise the new -I<regex> diff option in various scenarios to ensure it behaves as expected. Signed-off-by: Michał Kępień <michal@isc.org> --- t/t4069-diff-ignore-regex.sh | 426 +++++++++++++++++++++++++++++++++++ 1 file changed, 426 insertions(+) create mode 100755 t/t4069-diff-ignore-regex.sh diff --git a/t/t4069-diff-ignore-regex.sh b/t/t4069-diff-ignore-regex.sh new file mode 100755 index 0000000000..4ddf5c67ae --- /dev/null +++ b/t/t4069-diff-ignore-regex.sh @@ -0,0 +1,426 @@ +#!/bin/sh + +test_description='Test diff -I<regex>' + +. ./test-lib.sh +. "$TEST_DIRECTORY"/diff-lib.sh + +test_expect_success setup ' + test_seq 20 >x && + git update-index --add x +' + +test_expect_success 'one line changed' ' + test_seq 20 | sed "s/10/100/" >x && + + # Get plain diff + git diff >plain && + cat >expected <<-EOF && + diff --git a/x b/x + --- a/x + +++ b/x + @@ -7,7 +7,7 @@ + 7 + 8 + 9 + -10 + +100 + 11 + 12 + 13 + EOF + compare_diff_patch expected plain && + + # Both old and new line match regex - ignore change + git diff -I "^10" >actual && + test_must_be_empty actual && + + # Both old and new line match some regex - ignore change + git diff -I "^10\$" -I "^100" >actual && + test_must_be_empty actual && + + # Only old line matches regex - do not ignore change + git diff -I "^10\$" >actual && + compare_diff_patch plain actual && + + # Only new line matches regex - do not ignore change + git diff -I "^100" >actual && + compare_diff_patch plain actual && + + # Only old line matches some regex - do not ignore change + git diff -I "^10\$" -I "^101" >actual && + compare_diff_patch plain actual && + + # Only new line matches some regex - do not ignore change + git diff -I "^11\$" -I "^100" >actual && + compare_diff_patch plain actual +' + +test_expect_success 'one line removed' ' + test_seq 20 | sed "10d" >x && + + # Get plain diff + git diff >plain && + cat >expected <<-EOF && + diff --git a/x b/x + --- a/x + +++ b/x + @@ -7,7 +7,6 @@ + 7 + 8 + 9 + -10 + 11 + 12 + 13 + EOF + compare_diff_patch expected plain && + + # Removed line matches regex - ignore change + git diff -I "^10" >actual && + test_must_be_empty actual && + + # Removed line matches some regex - ignore change + git diff -I "^10" -I "^100" >actual && + test_must_be_empty actual && + + # Removed line does not match regex - do not ignore change + git diff -I "^101" >actual && + compare_diff_patch plain actual && + + # Removed line does not match any regex - do not ignore change + git diff -I "^100" -I "^101" >actual && + compare_diff_patch plain actual +' + +test_expect_success 'one line added' ' + test_seq 21 >x && + + # Get plain diff + git diff >plain && + cat >expected <<-EOF && + diff --git a/x b/x + --- a/x + +++ b/x + @@ -18,3 +18,4 @@ + 18 + 19 + 20 + +21 + EOF + compare_diff_patch expected plain && + + # Added line matches regex - ignore change + git diff -I "^21" >actual && + test_must_be_empty actual && + + # Added line matches some regex - ignore change + git diff -I "^21" -I "^22" >actual && + test_must_be_empty actual && + + # Added line does not match regex - do not ignore change + git diff -I "^212" >actual && + compare_diff_patch plain actual && + + # Added line does not match any regex - do not ignore change + git diff -I "^211" -I "^212" >actual && + compare_diff_patch plain actual +' + +test_expect_success 'last two lines changed' ' + test_seq 20 | sed "s/19/21/; s/20/22/" >x && + + # Get plain diff + git diff >plain && + cat >expected <<-EOF && + diff --git a/x b/x + --- a/x + +++ b/x + @@ -16,5 +16,5 @@ + 16 + 17 + 18 + -19 + -20 + +21 + +22 + EOF + compare_diff_patch expected plain && + + # All changed lines match regex - ignore change + git diff -I "^[12]" >actual && + test_must_be_empty actual && + + # All changed lines match some regex - ignore change + git diff -I "^1" -I "^2" >actual && + test_must_be_empty actual && + + # Not all changed lines match regex - do not ignore change + git diff -I "^2" >actual && + compare_diff_patch plain actual && + + # Not all changed lines match some regex - do not ignore change + git diff -I "^2" -I "^3" >actual && + compare_diff_patch plain actual +' + +test_expect_success 'two non-adjacent lines removed in the same hunk' ' + test_seq 20 | sed "1d; 3d" >x && + + # Get plain diff + git diff >plain && + cat >expected <<-EOF && + diff --git a/x b/x + --- a/x + +++ b/x + @@ -1,6 +1,4 @@ + -1 + 2 + -3 + 4 + 5 + 6 + EOF + compare_diff_patch expected plain && + + # Both removed lines match regex - ignore hunk + git diff -I "^[1-3]" >actual && + test_must_be_empty actual && + + # Both removed lines match some regex - ignore hunk + git diff -I "^1" -I "^3" >actual && + test_must_be_empty actual && + + # First removed line does not match regex - do not ignore hunk + git diff -I "^[2-3]" >actual && + compare_diff_patch plain actual && + + # First removed line does not match any regex - do not ignore hunk + git diff -I "^2" -I "^3" >actual && + compare_diff_patch plain actual && + + # Second removed line does not match regex - do not ignore hunk + git diff -I "^[1-2]" >actual && + compare_diff_patch plain actual && + + # Second removed line does not match any regex - do not ignore hunk + git diff -I "^1" -I "^2" >actual && + compare_diff_patch plain actual +' + +test_expect_success 'two non-adjacent lines removed in the same hunk, with -U1' ' + test_seq 20 | sed "1d; 3d" >x && + + # Get plain diff + git diff -U1 >plain && + cat >expected <<-EOF && + diff --git a/x b/x + --- a/x + +++ b/x + @@ -1,4 +1,2 @@ + -1 + 2 + -3 + 4 + EOF + compare_diff_patch expected plain && + + # Both removed lines match regex - ignore hunk + git diff -U1 -I "^[1-3]" >actual && + test_must_be_empty actual && + + # Both removed lines match some regex - ignore hunk + git diff -U1 -I "^1" -I "^3" >actual && + test_must_be_empty actual && + + # First removed line does not match regex, but is out of context - ignore second change + git diff -U1 -I "^[2-3]" >actual && + cat >second-change-ignored <<-EOF && + diff --git a/x b/x + --- a/x + +++ b/x + @@ -1,2 +1 @@ + -1 + 2 + EOF + compare_diff_patch second-change-ignored actual && + + # First removed line does not match any regex, but is out of context - ignore second change + git diff -U1 -I "^2" -I "^3" >actual && + compare_diff_patch second-change-ignored actual && + + # Second removed line does not match regex, but is out of context - ignore first change + git diff -U1 -I "^[1-2]" >actual && + cat >first-change-ignored <<-EOF && + diff --git a/x b/x + --- a/x + +++ b/x + @@ -2,3 +1,2 @@ + 2 + -3 + 4 + EOF + compare_diff_patch first-change-ignored actual && + + # Second removed line does not match any regex, but is out of context - ignore first change + git diff -U1 -I "^1" -I "^2" >actual && + compare_diff_patch first-change-ignored actual +' + +test_expect_success 'multiple hunks' ' + test_seq 20 | sed "1d; 20d" >x && + + # Get plain diff + git diff >plain && + cat >expected <<-EOF && + diff --git a/x b/x + --- a/x + +++ b/x + @@ -1,4 +1,3 @@ + -1 + 2 + 3 + 4 + @@ -17,4 +16,3 @@ + 17 + 18 + 19 + -20 + EOF + compare_diff_patch expected plain && + + # Ignore both hunks (single regex) + git diff -I "^[12]" >actual && + test_must_be_empty actual && + + # Ignore both hunks (multiple regexes) + git diff -I "^1" -I "^2" >actual && + test_must_be_empty actual && + + # Only ignore first hunk (single regex) + git diff -I "^1" >actual && + cat >first-hunk-ignored <<-EOF && + diff --git a/x b/x + --- a/x + +++ b/x + @@ -17,4 +16,3 @@ + 17 + 18 + 19 + -20 + EOF + compare_diff_patch first-hunk-ignored actual && + + # Only ignore first hunk (multiple regexes) + git diff -I "^0" -I "^1" >actual && + compare_diff_patch first-hunk-ignored actual && + + # Only ignore second hunk (single regex) + git diff -I "^2" >actual && + cat >second-hunk-ignored <<-EOF && + diff --git a/x b/x + --- a/x + +++ b/x + @@ -1,4 +1,3 @@ + -1 + 2 + 3 + 4 + EOF + compare_diff_patch second-hunk-ignored actual && + + # Only ignore second hunk (multiple regexes) + git diff -I "^2" -I "^3" >actual && + compare_diff_patch second-hunk-ignored actual +' + +test_expect_success 'multiple hunks, with --ignore-blank-lines' ' + echo >x && + test_seq 21 >>x && + + # Get plain diff + git diff >plain && + cat >expected <<-EOF && + diff --git a/x b/x + --- a/x + +++ b/x + @@ -1,3 +1,4 @@ + + + 1 + 2 + 3 + @@ -18,3 +19,4 @@ + 18 + 19 + 20 + +21 + EOF + compare_diff_patch expected plain && + + # -I does not override --ignore-blank-lines - ignore both hunks (single regex) + git diff --ignore-blank-lines -I "^21" >actual && + test_must_be_empty actual && + + # -I does not override --ignore-blank-lines - ignore both hunks (multiple regexes) + git diff --ignore-blank-lines -I "^21" -I "^12" >actual && + test_must_be_empty actual +' + +test_expect_success 'diffstat' ' + test_seq 20 | sed "s/^5/0/p; s/^15/10/; 16d" >x && + + # Get plain diffstat + git diff --stat >actual && + cat >expected <<-EOF && + x | 6 +++--- + 1 file changed, 3 insertions(+), 3 deletions(-) + EOF + test_cmp expected actual && + + # Ignore both hunks (single regex) + git diff --stat -I "^[0-5]" >actual && + test_must_be_empty actual && + + # Ignore both hunks (multiple regexes) + git diff --stat -I "^0" -I "^1" -I "^5" >actual && + test_must_be_empty actual && + + # Only ignore first hunk (single regex) + git diff --stat -I "^[05]" >actual && + cat >expected <<-EOF && + x | 3 +-- + 1 file changed, 1 insertion(+), 2 deletions(-) + EOF + test_cmp expected actual && + + # Only ignore first hunk (multiple regexes) + git diff --stat -I "^0" -I "^5" >actual && + test_cmp expected actual && + + # Only ignore second hunk (single regex) + git diff --stat -I "^1" >actual && + cat >expected <<-EOF && + x | 3 ++- + 1 file changed, 2 insertions(+), 1 deletion(-) + EOF + test_cmp expected actual && + + # Only ignore second hunk (multiple regexes) + git diff --stat -I "^1" -I "^2" >actual && + test_cmp expected actual +' + +test_expect_success 'invalid regexes' ' + >x && + + # Single invalid regex + git diff -I "^[1" 2>&1 | grep "invalid regex: " && + + # Two regexes: first invalid, second valid + git diff -I "^[1" -I "^1" 2>&1 | grep "invalid regex: " && + + # Two invalid regexes + git diff -I "^[1" -I "^[2" 2>&1 | grep "invalid regex: " +' + +test_done -- 2.28.0 ^ permalink raw reply related [flat|nested] 57+ messages in thread
* Re: [PATCH v2 3/3] t: add -I<regex> tests 2020-10-12 9:17 ` [PATCH v2 3/3] t: add -I<regex> tests Michał Kępień @ 2020-10-12 11:49 ` Johannes Schindelin 2020-10-13 6:38 ` Michał Kępień 0 siblings, 1 reply; 57+ messages in thread From: Johannes Schindelin @ 2020-10-12 11:49 UTC (permalink / raw) To: Michał Kępień; +Cc: git [-- Attachment #1: Type: text/plain, Size: 13600 bytes --] Hi Michał, On Mon, 12 Oct 2020, Michał Kępień wrote: > Exercise the new -I<regex> diff option in various scenarios to ensure it > behaves as expected. Excellent. I was actually looking for a test in patch 2/3 and almost commented about that. > > Signed-off-by: Michał Kępień <michal@isc.org> > --- > t/t4069-diff-ignore-regex.sh | 426 +++++++++++++++++++++++++++++++++++ Hmm. I wonder whether we could do with a much more concise test script. The test suite already takes a quite long time to run, which is not a laughing matter: we had issues in the past where contributors would skip running it because it took too long, and this test is sure to exacerbate that problem. I could imagine, for example, that it would be plenty enough to do something like this instead: -- snip -- diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh index 5c7b0122b4f..bf158be137f 100755 --- a/t/t4013-diff-various.sh +++ b/t/t4013-diff-various.sh @@ -6,6 +6,7 @@ test_description='Various diff formatting options' . ./test-lib.sh +. "$TEST_DIRECTORY"/diff-lib.sh test_expect_success setup ' @@ -473,4 +474,24 @@ test_expect_success 'diff-tree --stdin with log formatting' ' test_cmp expect actual ' +test_expect_success '-I<regex>' ' + seq 50 >I.txt && + sed -e "s/13/ten and three/" -e "/7\$/d" <I.txt >J.txt && + test_must_fail git diff --no-index -I"ten.*e" -I"^[124-9]" I.txt J.txt >actual && + cat >expect <<-\EOF && + diff --git a/I.txt b/J.txt + --- a/I.txt + +++ b/J.txt + @@ -34,7 +31,6 @@ + 34 + 35 + 36 + -37 + 38 + 39 + 40 + EOF + compare_diff_patch expect actual +' + test_done -- snap -- Note how it tests various things in one go? Thanks, Dscho P.S.: My main interest in the `-I` option is its use case for `git range-diff` in Git's own context: if you want to compare your patches to what entered the `seen` branch, there will _always_ be a difference because Junio adds their DCO. Something like this can help that: git range-diff \ -I'^ Signed-off-by: Junio C Hamano <gitster@pobox.com>$' \ <my-patch-range> <the-equivalent-in-seen> > 1 file changed, 426 insertions(+) > create mode 100755 t/t4069-diff-ignore-regex.sh > > diff --git a/t/t4069-diff-ignore-regex.sh b/t/t4069-diff-ignore-regex.sh > new file mode 100755 > index 0000000000..4ddf5c67ae > --- /dev/null > +++ b/t/t4069-diff-ignore-regex.sh > @@ -0,0 +1,426 @@ > +#!/bin/sh > + > +test_description='Test diff -I<regex>' > + > +. ./test-lib.sh > +. "$TEST_DIRECTORY"/diff-lib.sh > + > +test_expect_success setup ' > + test_seq 20 >x && > + git update-index --add x > +' > + > +test_expect_success 'one line changed' ' > + test_seq 20 | sed "s/10/100/" >x && > + > + # Get plain diff > + git diff >plain && > + cat >expected <<-EOF && > + diff --git a/x b/x > + --- a/x > + +++ b/x > + @@ -7,7 +7,7 @@ > + 7 > + 8 > + 9 > + -10 > + +100 > + 11 > + 12 > + 13 > + EOF > + compare_diff_patch expected plain && > + > + # Both old and new line match regex - ignore change > + git diff -I "^10" >actual && > + test_must_be_empty actual && > + > + # Both old and new line match some regex - ignore change > + git diff -I "^10\$" -I "^100" >actual && > + test_must_be_empty actual && > + > + # Only old line matches regex - do not ignore change > + git diff -I "^10\$" >actual && > + compare_diff_patch plain actual && > + > + # Only new line matches regex - do not ignore change > + git diff -I "^100" >actual && > + compare_diff_patch plain actual && > + > + # Only old line matches some regex - do not ignore change > + git diff -I "^10\$" -I "^101" >actual && > + compare_diff_patch plain actual && > + > + # Only new line matches some regex - do not ignore change > + git diff -I "^11\$" -I "^100" >actual && > + compare_diff_patch plain actual > +' > + > +test_expect_success 'one line removed' ' > + test_seq 20 | sed "10d" >x && > + > + # Get plain diff > + git diff >plain && > + cat >expected <<-EOF && > + diff --git a/x b/x > + --- a/x > + +++ b/x > + @@ -7,7 +7,6 @@ > + 7 > + 8 > + 9 > + -10 > + 11 > + 12 > + 13 > + EOF > + compare_diff_patch expected plain && > + > + # Removed line matches regex - ignore change > + git diff -I "^10" >actual && > + test_must_be_empty actual && > + > + # Removed line matches some regex - ignore change > + git diff -I "^10" -I "^100" >actual && > + test_must_be_empty actual && > + > + # Removed line does not match regex - do not ignore change > + git diff -I "^101" >actual && > + compare_diff_patch plain actual && > + > + # Removed line does not match any regex - do not ignore change > + git diff -I "^100" -I "^101" >actual && > + compare_diff_patch plain actual > +' > + > +test_expect_success 'one line added' ' > + test_seq 21 >x && > + > + # Get plain diff > + git diff >plain && > + cat >expected <<-EOF && > + diff --git a/x b/x > + --- a/x > + +++ b/x > + @@ -18,3 +18,4 @@ > + 18 > + 19 > + 20 > + +21 > + EOF > + compare_diff_patch expected plain && > + > + # Added line matches regex - ignore change > + git diff -I "^21" >actual && > + test_must_be_empty actual && > + > + # Added line matches some regex - ignore change > + git diff -I "^21" -I "^22" >actual && > + test_must_be_empty actual && > + > + # Added line does not match regex - do not ignore change > + git diff -I "^212" >actual && > + compare_diff_patch plain actual && > + > + # Added line does not match any regex - do not ignore change > + git diff -I "^211" -I "^212" >actual && > + compare_diff_patch plain actual > +' > + > +test_expect_success 'last two lines changed' ' > + test_seq 20 | sed "s/19/21/; s/20/22/" >x && > + > + # Get plain diff > + git diff >plain && > + cat >expected <<-EOF && > + diff --git a/x b/x > + --- a/x > + +++ b/x > + @@ -16,5 +16,5 @@ > + 16 > + 17 > + 18 > + -19 > + -20 > + +21 > + +22 > + EOF > + compare_diff_patch expected plain && > + > + # All changed lines match regex - ignore change > + git diff -I "^[12]" >actual && > + test_must_be_empty actual && > + > + # All changed lines match some regex - ignore change > + git diff -I "^1" -I "^2" >actual && > + test_must_be_empty actual && > + > + # Not all changed lines match regex - do not ignore change > + git diff -I "^2" >actual && > + compare_diff_patch plain actual && > + > + # Not all changed lines match some regex - do not ignore change > + git diff -I "^2" -I "^3" >actual && > + compare_diff_patch plain actual > +' > + > +test_expect_success 'two non-adjacent lines removed in the same hunk' ' > + test_seq 20 | sed "1d; 3d" >x && > + > + # Get plain diff > + git diff >plain && > + cat >expected <<-EOF && > + diff --git a/x b/x > + --- a/x > + +++ b/x > + @@ -1,6 +1,4 @@ > + -1 > + 2 > + -3 > + 4 > + 5 > + 6 > + EOF > + compare_diff_patch expected plain && > + > + # Both removed lines match regex - ignore hunk > + git diff -I "^[1-3]" >actual && > + test_must_be_empty actual && > + > + # Both removed lines match some regex - ignore hunk > + git diff -I "^1" -I "^3" >actual && > + test_must_be_empty actual && > + > + # First removed line does not match regex - do not ignore hunk > + git diff -I "^[2-3]" >actual && > + compare_diff_patch plain actual && > + > + # First removed line does not match any regex - do not ignore hunk > + git diff -I "^2" -I "^3" >actual && > + compare_diff_patch plain actual && > + > + # Second removed line does not match regex - do not ignore hunk > + git diff -I "^[1-2]" >actual && > + compare_diff_patch plain actual && > + > + # Second removed line does not match any regex - do not ignore hunk > + git diff -I "^1" -I "^2" >actual && > + compare_diff_patch plain actual > +' > + > +test_expect_success 'two non-adjacent lines removed in the same hunk, with -U1' ' > + test_seq 20 | sed "1d; 3d" >x && > + > + # Get plain diff > + git diff -U1 >plain && > + cat >expected <<-EOF && > + diff --git a/x b/x > + --- a/x > + +++ b/x > + @@ -1,4 +1,2 @@ > + -1 > + 2 > + -3 > + 4 > + EOF > + compare_diff_patch expected plain && > + > + # Both removed lines match regex - ignore hunk > + git diff -U1 -I "^[1-3]" >actual && > + test_must_be_empty actual && > + > + # Both removed lines match some regex - ignore hunk > + git diff -U1 -I "^1" -I "^3" >actual && > + test_must_be_empty actual && > + > + # First removed line does not match regex, but is out of context - ignore second change > + git diff -U1 -I "^[2-3]" >actual && > + cat >second-change-ignored <<-EOF && > + diff --git a/x b/x > + --- a/x > + +++ b/x > + @@ -1,2 +1 @@ > + -1 > + 2 > + EOF > + compare_diff_patch second-change-ignored actual && > + > + # First removed line does not match any regex, but is out of context - ignore second change > + git diff -U1 -I "^2" -I "^3" >actual && > + compare_diff_patch second-change-ignored actual && > + > + # Second removed line does not match regex, but is out of context - ignore first change > + git diff -U1 -I "^[1-2]" >actual && > + cat >first-change-ignored <<-EOF && > + diff --git a/x b/x > + --- a/x > + +++ b/x > + @@ -2,3 +1,2 @@ > + 2 > + -3 > + 4 > + EOF > + compare_diff_patch first-change-ignored actual && > + > + # Second removed line does not match any regex, but is out of context - ignore first change > + git diff -U1 -I "^1" -I "^2" >actual && > + compare_diff_patch first-change-ignored actual > +' > + > +test_expect_success 'multiple hunks' ' > + test_seq 20 | sed "1d; 20d" >x && > + > + # Get plain diff > + git diff >plain && > + cat >expected <<-EOF && > + diff --git a/x b/x > + --- a/x > + +++ b/x > + @@ -1,4 +1,3 @@ > + -1 > + 2 > + 3 > + 4 > + @@ -17,4 +16,3 @@ > + 17 > + 18 > + 19 > + -20 > + EOF > + compare_diff_patch expected plain && > + > + # Ignore both hunks (single regex) > + git diff -I "^[12]" >actual && > + test_must_be_empty actual && > + > + # Ignore both hunks (multiple regexes) > + git diff -I "^1" -I "^2" >actual && > + test_must_be_empty actual && > + > + # Only ignore first hunk (single regex) > + git diff -I "^1" >actual && > + cat >first-hunk-ignored <<-EOF && > + diff --git a/x b/x > + --- a/x > + +++ b/x > + @@ -17,4 +16,3 @@ > + 17 > + 18 > + 19 > + -20 > + EOF > + compare_diff_patch first-hunk-ignored actual && > + > + # Only ignore first hunk (multiple regexes) > + git diff -I "^0" -I "^1" >actual && > + compare_diff_patch first-hunk-ignored actual && > + > + # Only ignore second hunk (single regex) > + git diff -I "^2" >actual && > + cat >second-hunk-ignored <<-EOF && > + diff --git a/x b/x > + --- a/x > + +++ b/x > + @@ -1,4 +1,3 @@ > + -1 > + 2 > + 3 > + 4 > + EOF > + compare_diff_patch second-hunk-ignored actual && > + > + # Only ignore second hunk (multiple regexes) > + git diff -I "^2" -I "^3" >actual && > + compare_diff_patch second-hunk-ignored actual > +' > + > +test_expect_success 'multiple hunks, with --ignore-blank-lines' ' > + echo >x && > + test_seq 21 >>x && > + > + # Get plain diff > + git diff >plain && > + cat >expected <<-EOF && > + diff --git a/x b/x > + --- a/x > + +++ b/x > + @@ -1,3 +1,4 @@ > + + > + 1 > + 2 > + 3 > + @@ -18,3 +19,4 @@ > + 18 > + 19 > + 20 > + +21 > + EOF > + compare_diff_patch expected plain && > + > + # -I does not override --ignore-blank-lines - ignore both hunks (single regex) > + git diff --ignore-blank-lines -I "^21" >actual && > + test_must_be_empty actual && > + > + # -I does not override --ignore-blank-lines - ignore both hunks (multiple regexes) > + git diff --ignore-blank-lines -I "^21" -I "^12" >actual && > + test_must_be_empty actual > +' > + > +test_expect_success 'diffstat' ' > + test_seq 20 | sed "s/^5/0/p; s/^15/10/; 16d" >x && > + > + # Get plain diffstat > + git diff --stat >actual && > + cat >expected <<-EOF && > + x | 6 +++--- > + 1 file changed, 3 insertions(+), 3 deletions(-) > + EOF > + test_cmp expected actual && > + > + # Ignore both hunks (single regex) > + git diff --stat -I "^[0-5]" >actual && > + test_must_be_empty actual && > + > + # Ignore both hunks (multiple regexes) > + git diff --stat -I "^0" -I "^1" -I "^5" >actual && > + test_must_be_empty actual && > + > + # Only ignore first hunk (single regex) > + git diff --stat -I "^[05]" >actual && > + cat >expected <<-EOF && > + x | 3 +-- > + 1 file changed, 1 insertion(+), 2 deletions(-) > + EOF > + test_cmp expected actual && > + > + # Only ignore first hunk (multiple regexes) > + git diff --stat -I "^0" -I "^5" >actual && > + test_cmp expected actual && > + > + # Only ignore second hunk (single regex) > + git diff --stat -I "^1" >actual && > + cat >expected <<-EOF && > + x | 3 ++- > + 1 file changed, 2 insertions(+), 1 deletion(-) > + EOF > + test_cmp expected actual && > + > + # Only ignore second hunk (multiple regexes) > + git diff --stat -I "^1" -I "^2" >actual && > + test_cmp expected actual > +' > + > +test_expect_success 'invalid regexes' ' > + >x && > + > + # Single invalid regex > + git diff -I "^[1" 2>&1 | grep "invalid regex: " && > + > + # Two regexes: first invalid, second valid > + git diff -I "^[1" -I "^1" 2>&1 | grep "invalid regex: " && > + > + # Two invalid regexes > + git diff -I "^[1" -I "^[2" 2>&1 | grep "invalid regex: " > +' > + > +test_done > -- > 2.28.0 > > ^ permalink raw reply related [flat|nested] 57+ messages in thread
* Re: [PATCH v2 3/3] t: add -I<regex> tests 2020-10-12 11:49 ` Johannes Schindelin @ 2020-10-13 6:38 ` Michał Kępień 2020-10-13 12:00 ` Johannes Schindelin 0 siblings, 1 reply; 57+ messages in thread From: Michał Kępień @ 2020-10-13 6:38 UTC (permalink / raw) To: Johannes Schindelin; +Cc: git Hi Johannes, > > Exercise the new -I<regex> diff option in various scenarios to ensure it > > behaves as expected. > > Excellent. I was actually looking for a test in patch 2/3 and almost > commented about that. Right, I expressed my doubts in this area at the end of the cover letter for v1: >> - Should tests be added in a separate commit? This is what I did as I >> thought it would help with readability, but... I will be glad to follow any guidance provided, I just picked one of the two possible routes for v1. > Hmm. I wonder whether we could do with a much more concise test script. > The test suite already takes a quite long time to run, which is not a > laughing matter: we had issues in the past where contributors would skip > running it because it took too long, and this test is sure to exacerbate > that problem. First, let me say that the goal of minimizing the run time of a test suite is close to my heart (it is an issue at my day job). Yet, I assumed that this new test would not be detrimental to test suite run times as it takes about half a second to run t4069-diff-ignore-regex.sh on my machine - and (I hope) its contents are in line with the "tests are the best documentation" proverb. That being said, I realize that the hosts used in various test environments may have different processing capabilities. I tried preparing something exhaustive and well-commented, so that it is clear what to expect from the new feature. Yet, if you would rather have me cut some things out, I am certainly not particularly attached to the tests from patch 3 and I will be glad to rip them out if that is the recommendation :-) > I could imagine, for example, that it would be plenty enough to do > something like this instead: > > -- snip -- > diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh > index 5c7b0122b4f..bf158be137f 100755 > --- a/t/t4013-diff-various.sh > +++ b/t/t4013-diff-various.sh > @@ -6,6 +6,7 @@ > test_description='Various diff formatting options' > > . ./test-lib.sh > +. "$TEST_DIRECTORY"/diff-lib.sh > > test_expect_success setup ' > > @@ -473,4 +474,24 @@ test_expect_success 'diff-tree --stdin with log formatting' ' > test_cmp expect actual > ' > > +test_expect_success '-I<regex>' ' > + seq 50 >I.txt && > + sed -e "s/13/ten and three/" -e "/7\$/d" <I.txt >J.txt && > + test_must_fail git diff --no-index -I"ten.*e" -I"^[124-9]" I.txt J.txt >actual && > + cat >expect <<-\EOF && > + diff --git a/I.txt b/J.txt > + --- a/I.txt > + +++ b/J.txt > + @@ -34,7 +31,6 @@ > + 34 > + 35 > + 36 > + -37 > + 38 > + 39 > + 40 > + EOF > + compare_diff_patch expect actual > +' > + > test_done > -- snap -- > > Note how it tests various things in one go? Right, neat, though this does not (yet) test: - the interaction between -I and --ignore-blank-lines (this is visible in code coverage), - whether the list of hunks emitted varies for different -U<n> values, - diffstat with -I<regex>, - invalid regular expressions. Would you like me to add these tests to your proposal or to skip them, given that -I uses the same field for marking changes as ignored as --ignore-blank-lines does? > P.S.: My main interest in the `-I` option is its use case for `git > range-diff` in Git's own context: if you want to compare your patches to > what entered the `seen` branch, there will _always_ be a difference > because Junio adds their DCO. Something like this can help that: > > git range-diff \ > -I'^ Signed-off-by: Junio C Hamano <gitster@pobox.com>$' \ > <my-patch-range> <the-equivalent-in-seen> Right, makes sense, I have not thought of that use case. -- Best regards, Michał Kępień ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH v2 3/3] t: add -I<regex> tests 2020-10-13 6:38 ` Michał Kępień @ 2020-10-13 12:00 ` Johannes Schindelin 2020-10-13 16:00 ` Junio C Hamano 2020-10-13 19:01 ` Michał Kępień 0 siblings, 2 replies; 57+ messages in thread From: Johannes Schindelin @ 2020-10-13 12:00 UTC (permalink / raw) To: Michał Kępień; +Cc: git [-- Attachment #1: Type: text/plain, Size: 3249 bytes --] Hi Michał, On Tue, 13 Oct 2020, Michał Kępień wrote: > > Hmm. I wonder whether we could do with a much more concise test script. > > The test suite already takes a quite long time to run, which is not a > > laughing matter: we had issues in the past where contributors would skip > > running it because it took too long, and this test is sure to exacerbate > > that problem. > > First, let me say that the goal of minimizing the run time of a test > suite is close to my heart (it is an issue at my day job). Yet, I > assumed that this new test would not be detrimental to test suite run > times as it takes about half a second to run t4069-diff-ignore-regex.sh > on my machine - and (I hope) its contents are in line with the "tests > are the best documentation" proverb. Sadly, the test is not quite as fast on Windows. I just ran this (on a not quite idle machine, admittedly) and it ended in this: # passed all 11 test(s) 1..11 real 0m51.470s user 0m0.046s sys 0m0.015s Yes, that's almost a minute. > > I could imagine, for example, that it would be plenty enough to do > > something like this instead: > > > > -- snip -- > > diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh > > index 5c7b0122b4f..bf158be137f 100755 > > --- a/t/t4013-diff-various.sh > > +++ b/t/t4013-diff-various.sh > > @@ -6,6 +6,7 @@ > > test_description='Various diff formatting options' > > > > . ./test-lib.sh > > +. "$TEST_DIRECTORY"/diff-lib.sh > > > > test_expect_success setup ' > > > > @@ -473,4 +474,24 @@ test_expect_success 'diff-tree --stdin with log formatting' ' > > test_cmp expect actual > > ' > > > > +test_expect_success '-I<regex>' ' > > + seq 50 >I.txt && > > + sed -e "s/13/ten and three/" -e "/7\$/d" <I.txt >J.txt && > > + test_must_fail git diff --no-index -I"ten.*e" -I"^[124-9]" I.txt J.txt >actual && > > + cat >expect <<-\EOF && > > + diff --git a/I.txt b/J.txt > > + --- a/I.txt > > + +++ b/J.txt > > + @@ -34,7 +31,6 @@ > > + 34 > > + 35 > > + 36 > > + -37 > > + 38 > > + 39 > > + 40 > > + EOF > > + compare_diff_patch expect actual > > +' > > + > > test_done > > -- snap -- > > > > Note how it tests various things in one go? > > Right, neat, though this does not (yet) test: > > - the interaction between -I and --ignore-blank-lines (this is visible > in code coverage), Right. Any chance you can finagle that in, e.g. by yet another `-e` argument to the `sed` call? > - whether the list of hunks emitted varies for different -U<n> values, I am not worried about that. > - diffstat with -I<regex>, I am not worried about that, either, as `diffstat` consumes `xdiff`'s output, therefore if one consumer works, another consumer will work, too. > - invalid regular expressions. Right, that should be super easy (and quick) to test. > Would you like me to add these tests to your proposal or to skip them, > given that -I uses the same field for marking changes as ignored as > --ignore-blank-lines does? I'd say it depends how easily (read: in how small a test case or modifications to an existing test case) you can add a test for that interaction. Thanks, Dscho ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH v2 3/3] t: add -I<regex> tests 2020-10-13 12:00 ` Johannes Schindelin @ 2020-10-13 16:00 ` Junio C Hamano 2020-10-13 19:01 ` Michał Kępień 1 sibling, 0 replies; 57+ messages in thread From: Junio C Hamano @ 2020-10-13 16:00 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Michał Kępień, git Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: >> - diffstat with -I<regex>, > > I am not worried about that, either, as `diffstat` consumes `xdiff`'s > output, therefore if one consumer works, another consumer will work, too. Careful. Such a "we know what happens in the code" transparent-box testing attitude is laying a minefield for later our selves. As we learned in a recent bug in sequencer, some corners of implementation do the same thing in different codepaths as optimization. The really bad part of the story is that such an implementation detail can and will change over time, since that is the kind of thing we do when optimizing code. In other words, we only know what happens in the current code. And automated tests protect us from the future, when done right. If written with too intimate knowledge of how the current code works, well, what are we really testing? It's a balancing act and there is no single "right" answer, but I'd draw the line on a bit more careful side than you are drawing here. Thanks. ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH v2 3/3] t: add -I<regex> tests 2020-10-13 12:00 ` Johannes Schindelin 2020-10-13 16:00 ` Junio C Hamano @ 2020-10-13 19:01 ` Michał Kępień 2020-10-15 11:45 ` Johannes Schindelin 1 sibling, 1 reply; 57+ messages in thread From: Michał Kępień @ 2020-10-13 19:01 UTC (permalink / raw) To: Johannes Schindelin; +Cc: git Hi Johannes, > > First, let me say that the goal of minimizing the run time of a test > > suite is close to my heart (it is an issue at my day job). Yet, I > > assumed that this new test would not be detrimental to test suite run > > times as it takes about half a second to run t4069-diff-ignore-regex.sh > > on my machine - and (I hope) its contents are in line with the "tests > > are the best documentation" proverb. > > Sadly, the test is not quite as fast on Windows. I just ran this (on a not > quite idle machine, admittedly) and it ended in this: > > # passed all 11 test(s) > 1..11 > > real 0m51.470s > user 0m0.046s > sys 0m0.015s > > Yes, that's almost a minute. Out of curiosity: is that under Cygwin? I have seen shell-based tests finishing in 15 *seconds* on Unix-like systems and in 15 *minutes* under Cygwin, which would be in line with your measurements provided above. > > Right, neat, though this does not (yet) test: > > > > - the interaction between -I and --ignore-blank-lines (this is visible > > in code coverage), > > Right. Any chance you can finagle that in, e.g. by yet another `-e` > argument to the `sed` call? I will try in v3 (while also looking at what I can do for other missing -I<regex> tests I pointed out). -- Best regards, Michał Kępień ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH v2 3/3] t: add -I<regex> tests 2020-10-13 19:01 ` Michał Kępień @ 2020-10-15 11:45 ` Johannes Schindelin 0 siblings, 0 replies; 57+ messages in thread From: Johannes Schindelin @ 2020-10-15 11:45 UTC (permalink / raw) To: Michał Kępień; +Cc: git [-- Attachment #1: Type: text/plain, Size: 1146 bytes --] Hi Michał, On Tue, 13 Oct 2020, Michał Kępień wrote: > > > First, let me say that the goal of minimizing the run time of a test > > > suite is close to my heart (it is an issue at my day job). Yet, I > > > assumed that this new test would not be detrimental to test suite run > > > times as it takes about half a second to run t4069-diff-ignore-regex.sh > > > on my machine - and (I hope) its contents are in line with the "tests > > > are the best documentation" proverb. > > > > Sadly, the test is not quite as fast on Windows. I just ran this (on a not > > quite idle machine, admittedly) and it ended in this: > > > > # passed all 11 test(s) > > 1..11 > > > > real 0m51.470s > > user 0m0.046s > > sys 0m0.015s > > > > Yes, that's almost a minute. > > Out of curiosity: is that under Cygwin? I have seen shell-based tests > finishing in 15 *seconds* on Unix-like systems and in 15 *minutes* under > Cygwin, which would be in line with your measurements provided above. Close enough: it's an MSYS2 Bash (and MSYS2 is a close fork of Cygwin). It is what Git for Windows uses. Ciao, Dscho ^ permalink raw reply [flat|nested] 57+ messages in thread
* [PATCH v3 0/2] diff: add -I<regex> that ignores matching changes 2020-10-12 9:17 ` [PATCH v2 0/3] " Michał Kępień ` (2 preceding siblings ...) 2020-10-12 9:17 ` [PATCH v2 3/3] t: add -I<regex> tests Michał Kępień @ 2020-10-15 7:24 ` Michał Kępień 2020-10-15 7:24 ` [PATCH v3 1/2] merge-base, xdiff: zero out xpparam_t structures Michał Kępień ` (3 more replies) 3 siblings, 4 replies; 57+ messages in thread From: Michał Kępień @ 2020-10-15 7:24 UTC (permalink / raw) To: git This patch series adds a new diff option that enables ignoring changes whose all lines (changed, removed, and added) match a given regular expression. This is similar to the -I/--ignore-matching-lines option in standalone diff utilities and can be used e.g. to ignore changes which only affect code comments or to look for unrelated changes in commits containing a large number of automatically applied modifications (e.g. a tree-wide string replacement). The difference between -G/-S and the new -I option is that the latter filters output on a per-change basis. Changes from v2: - Add a long option for -I (--ignore-matching-lines) as it is commonplace in standalone diff utilities. Update documentation and commit log messages accordingly. - Use xmalloc() instead of xcalloc() for allocating regex_t structures in diff_opt_ignore_regex(). - Ensure the memory allocated in diff_opt_ignore_regex() gets released. - Use "return error(...)" instead of die() in the -I option callback. Make the relevant error message localizable. - Drastically reduce the number of -I<regex> tests due to excessive run time of t/t4069-diff-ignore-regex.sh from v1/v2 on some platforms (notably Windows). Use a tweaked version of a test suggested by Johannes Schindelin (thanks!). Given its reduction in size, squash patch 3 (which contained the tests) into patch 2. - Replace "see Documentation/diff-options.txt" with "-I<regex>" in the comments for the added structure fields, in order to make these comments more useful. Changes from v1: - Add a new preliminary cleanup patch which ensures xpparam_t structures are always zero-initialized. (This was a prerequisite for the next change below.) - Do not add a new 'xdl_opts' flag to check whether -I was used; instead, just check whether the array of regular expressions to match against is non-NULL. - Enable the -I option to be used multiple times. As a consequence of this, regular expressions are now "pre-compiled" in the option's callback (and passed around as an array of regex_t structures) rather than deep down in xdiff code. Add test cases exercising use of multiple -I options in the same git invocation. Update documentation accordingly. - Rename xdl_mark_ignorable() to xdl_mark_ignorable_lines(), to indicate that it is logically a "sibling" of xdl_mark_ignorable_regex() rather than its "parent". - Optimize xdl_mark_ignorable_regex() by making it immediately skip changes already marked as ignored by xdl_mark_ignorable_lines(). - Fix coding style issue in the prototype part of the definition of xdl_mark_ignorable_regex(). - Add "/* see Documentation/diff-options.txt */" comments for the fields added to struct diff_options and xpparam_t, mimicking the comments used for 'anchors', 'anchors_nr', and 'anchors_alloc'. - Revise commit log messages to reflect all of the above. Michał Kępień (2): merge-base, xdiff: zero out xpparam_t structures diff: add -I<regex> that ignores matching changes Documentation/diff-options.txt | 5 ++++ builtin/merge-tree.c | 1 + diff.c | 28 ++++++++++++++++++++ diff.h | 4 +++ t/t4013-diff-various.sh | 33 ++++++++++++++++++++++++ xdiff/xdiff.h | 4 +++ xdiff/xdiffi.c | 47 ++++++++++++++++++++++++++++++++-- xdiff/xhistogram.c | 2 ++ xdiff/xpatience.c | 2 ++ 9 files changed, 124 insertions(+), 2 deletions(-) -- 2.28.0 ^ permalink raw reply [flat|nested] 57+ messages in thread
* [PATCH v3 1/2] merge-base, xdiff: zero out xpparam_t structures 2020-10-15 7:24 ` [PATCH v3 0/2] diff: add -I<regex> that ignores matching changes Michał Kępień @ 2020-10-15 7:24 ` Michał Kępień 2020-10-15 7:24 ` [PATCH v3 2/2] diff: add -I<regex> that ignores matching changes Michał Kępień ` (2 subsequent siblings) 3 siblings, 0 replies; 57+ messages in thread From: Michał Kępień @ 2020-10-15 7:24 UTC (permalink / raw) To: git xpparam_t structures are usually zero-initialized before their specific fields are assigned to, but there are three locations in the tree where that does not happen. Add the missing memset() calls in order to make initialization of xpparam_t structures consistent tree-wide and to prevent stack garbage from being used as field values. Signed-off-by: Michał Kępień <michal@isc.org> --- builtin/merge-tree.c | 1 + xdiff/xhistogram.c | 2 ++ xdiff/xpatience.c | 2 ++ 3 files changed, 5 insertions(+) diff --git a/builtin/merge-tree.c b/builtin/merge-tree.c index e72714a5a8..de8520778d 100644 --- a/builtin/merge-tree.c +++ b/builtin/merge-tree.c @@ -109,6 +109,7 @@ static void show_diff(struct merge_list *entry) xdemitconf_t xecfg; xdemitcb_t ecb; + memset(&xpp, 0, sizeof(xpp)); xpp.flags = 0; memset(&xecfg, 0, sizeof(xecfg)); xecfg.ctxlen = 3; diff --git a/xdiff/xhistogram.c b/xdiff/xhistogram.c index c7b35a9667..e694bfd9e3 100644 --- a/xdiff/xhistogram.c +++ b/xdiff/xhistogram.c @@ -235,6 +235,8 @@ static int fall_back_to_classic_diff(xpparam_t const *xpp, xdfenv_t *env, int line1, int count1, int line2, int count2) { xpparam_t xpparam; + + memset(&xpparam, 0, sizeof(xpparam)); xpparam.flags = xpp->flags & ~XDF_DIFF_ALGORITHM_MASK; return xdl_fall_back_diff(env, &xpparam, diff --git a/xdiff/xpatience.c b/xdiff/xpatience.c index 3c5601b602..20699a6f60 100644 --- a/xdiff/xpatience.c +++ b/xdiff/xpatience.c @@ -318,6 +318,8 @@ static int fall_back_to_classic_diff(struct hashmap *map, int line1, int count1, int line2, int count2) { xpparam_t xpp; + + memset(&xpp, 0, sizeof(xpp)); xpp.flags = map->xpp->flags & ~XDF_DIFF_ALGORITHM_MASK; return xdl_fall_back_diff(map->env, &xpp, -- 2.28.0 ^ permalink raw reply related [flat|nested] 57+ messages in thread
* [PATCH v3 2/2] diff: add -I<regex> that ignores matching changes 2020-10-15 7:24 ` [PATCH v3 0/2] diff: add -I<regex> that ignores matching changes Michał Kępień 2020-10-15 7:24 ` [PATCH v3 1/2] merge-base, xdiff: zero out xpparam_t structures Michał Kępień @ 2020-10-15 7:24 ` Michał Kępień 2020-10-16 15:32 ` Phillip Wood 2020-10-16 18:16 ` Junio C Hamano 2020-10-16 10:00 ` [PATCH v3 0/2] " Johannes Schindelin 2020-10-20 6:48 ` [PATCH v4 " Michał Kępień 3 siblings, 2 replies; 57+ messages in thread From: Michał Kępień @ 2020-10-15 7:24 UTC (permalink / raw) To: git Add a new diff option that enables ignoring changes whose all lines (changed, removed, and added) match a given regular expression. This is similar to the -I/--ignore-matching-lines option in standalone diff utilities and can be used e.g. to ignore changes which only affect code comments or to look for unrelated changes in commits containing a large number of automatically applied modifications (e.g. a tree-wide string replacement). The difference between -G/-S and the new -I option is that the latter filters output on a per-change basis. Use the 'ignore' field of xdchange_t for marking a change as ignored or not. Since the same field is used by --ignore-blank-lines, identical hunk emitting rules apply for --ignore-blank-lines and -I. These two options can also be used together in the same git invocation (they are complementary to each other). Rename xdl_mark_ignorable() to xdl_mark_ignorable_lines(), to indicate that it is logically a "sibling" of xdl_mark_ignorable_regex() rather than its "parent". Signed-off-by: Michał Kępień <michal@isc.org> --- Documentation/diff-options.txt | 5 ++++ diff.c | 28 ++++++++++++++++++++ diff.h | 4 +++ t/t4013-diff-various.sh | 33 ++++++++++++++++++++++++ xdiff/xdiff.h | 4 +++ xdiff/xdiffi.c | 47 ++++++++++++++++++++++++++++++++-- 6 files changed, 119 insertions(+), 2 deletions(-) diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt index 573fb9bb71..ee52b65e46 100644 --- a/Documentation/diff-options.txt +++ b/Documentation/diff-options.txt @@ -687,6 +687,11 @@ endif::git-format-patch[] --ignore-blank-lines:: Ignore changes whose lines are all blank. +-I<regex>:: +--ignore-matching-lines=<regex>:: + Ignore changes whose all lines match <regex>. This option may + be specified more than once. + --inter-hunk-context=<lines>:: Show the context between diff hunks, up to the specified number of lines, thereby fusing hunks that are close to each other. diff --git a/diff.c b/diff.c index 2bb2f8f57e..677de23352 100644 --- a/diff.c +++ b/diff.c @@ -3587,6 +3587,8 @@ static void builtin_diff(const char *name_a, if (header.len && !o->flags.suppress_diff_headers) ecbdata.header = &header; xpp.flags = o->xdl_opts; + xpp.ignore_regex = o->ignore_regex; + xpp.ignore_regex_nr = o->ignore_regex_nr; xpp.anchors = o->anchors; xpp.anchors_nr = o->anchors_nr; xecfg.ctxlen = o->context; @@ -3716,6 +3718,8 @@ static void builtin_diffstat(const char *name_a, const char *name_b, memset(&xpp, 0, sizeof(xpp)); memset(&xecfg, 0, sizeof(xecfg)); xpp.flags = o->xdl_opts; + xpp.ignore_regex = o->ignore_regex; + xpp.ignore_regex_nr = o->ignore_regex_nr; xpp.anchors = o->anchors; xpp.anchors_nr = o->anchors_nr; xecfg.ctxlen = o->context; @@ -5203,6 +5207,22 @@ static int diff_opt_patience(const struct option *opt, return 0; } +static int diff_opt_ignore_regex(const struct option *opt, + const char *arg, int unset) +{ + struct diff_options *options = opt->value; + regex_t *regex; + + BUG_ON_OPT_NEG(unset); + regex = xmalloc(sizeof(*regex)); + if (regcomp(regex, arg, REG_EXTENDED | REG_NEWLINE)) + return error(_("invalid regex given to -I: '%s'"), arg); + ALLOC_GROW(options->ignore_regex, options->ignore_regex_nr + 1, + options->ignore_regex_alloc); + options->ignore_regex[options->ignore_regex_nr++] = regex; + return 0; +} + static int diff_opt_pickaxe_regex(const struct option *opt, const char *arg, int unset) { @@ -5491,6 +5511,9 @@ static void prep_parse_options(struct diff_options *options) OPT_BIT_F(0, "ignore-blank-lines", &options->xdl_opts, N_("ignore changes whose lines are all blank"), XDF_IGNORE_BLANK_LINES, PARSE_OPT_NONEG), + OPT_CALLBACK_F('I', "ignore-matching-lines", options, N_("<regex>"), + N_("ignore changes whose all lines match <regex>"), + 0, diff_opt_ignore_regex), OPT_BIT(0, "indent-heuristic", &options->xdl_opts, N_("heuristic to shift diff hunk boundaries for easy reading"), XDF_INDENT_HEURISTIC), @@ -6405,6 +6428,11 @@ void diff_flush(struct diff_options *options) DIFF_QUEUE_CLEAR(q); if (options->close_file) fclose(options->file); + for (i = 0; i < options->ignore_regex_nr; i++) { + regfree(options->ignore_regex[i]); + free(options->ignore_regex[i]); + } + free(options->ignore_regex); /* * Report the content-level differences with HAS_CHANGES; diff --git a/diff.h b/diff.h index 11de52e9e9..a402227b80 100644 --- a/diff.h +++ b/diff.h @@ -234,6 +234,10 @@ struct diff_options { */ const char *pickaxe; + /* -I<regex> */ + regex_t **ignore_regex; + size_t ignore_regex_nr, ignore_regex_alloc; + const char *single_follow; const char *a_prefix, *b_prefix; const char *line_prefix; diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh index 5c7b0122b4..efaaee2ef0 100755 --- a/t/t4013-diff-various.sh +++ b/t/t4013-diff-various.sh @@ -6,6 +6,7 @@ test_description='Various diff formatting options' . ./test-lib.sh +. "$TEST_DIRECTORY"/diff-lib.sh test_expect_success setup ' @@ -473,4 +474,36 @@ test_expect_success 'diff-tree --stdin with log formatting' ' test_cmp expect actual ' +test_expect_success 'diff -I<regex>' ' + test_seq 50 >I.txt && + sed -e "s/13/ten and three/" -e "/7\$/d" <I.txt >J.txt && + echo >>J.txt && + + test_expect_code 1 git diff --no-index --ignore-blank-lines -I"ten.*e" -I"^[124-9]" I.txt J.txt >actual && + cat >expect <<-\EOF && + diff --git a/I.txt b/J.txt + --- a/I.txt + +++ b/J.txt + @@ -34,7 +31,6 @@ + 34 + 35 + 36 + -37 + 38 + 39 + 40 + EOF + compare_diff_patch expect actual && + + test_expect_code 1 git diff --stat --no-index --ignore-blank-lines -I"ten.*e" -I"^[124-9]" I.txt J.txt >actual && + cat >expect <<-\EOF && + I.txt => J.txt | 1 - + 1 file changed, 1 deletion(-) + EOF + test_cmp expect actual && + + test_expect_code 129 git diff --no-index --ignore-matching-lines="^[124-9]" --ignore-matching-lines="^[124-9" I.txt J.txt >output 2>&1 && + test_i18ngrep "invalid regex given to -I: " output +' + test_done diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h index 032e3a9f41..7a04605146 100644 --- a/xdiff/xdiff.h +++ b/xdiff/xdiff.h @@ -79,6 +79,10 @@ typedef struct s_mmbuffer { typedef struct s_xpparam { unsigned long flags; + /* -I<regex> */ + regex_t **ignore_regex; + size_t ignore_regex_nr; + /* See Documentation/diff-options.txt. */ char **anchors; size_t anchors_nr; diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c index bd035139f9..380eb728ed 100644 --- a/xdiff/xdiffi.c +++ b/xdiff/xdiffi.c @@ -998,7 +998,7 @@ static int xdl_call_hunk_func(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb, return 0; } -static void xdl_mark_ignorable(xdchange_t *xscr, xdfenv_t *xe, long flags) +static void xdl_mark_ignorable_lines(xdchange_t *xscr, xdfenv_t *xe, long flags) { xdchange_t *xch; @@ -1019,6 +1019,46 @@ static void xdl_mark_ignorable(xdchange_t *xscr, xdfenv_t *xe, long flags) } } +static int record_matches_regex(xrecord_t *rec, xpparam_t const *xpp) { + regmatch_t regmatch; + int i; + + for (i = 0; i < xpp->ignore_regex_nr; i++) + if (!regexec_buf(xpp->ignore_regex[i], rec->ptr, rec->size, 1, + ®match, 0)) + return 1; + + return 0; +} + +static void xdl_mark_ignorable_regex(xdchange_t *xscr, const xdfenv_t *xe, + xpparam_t const *xpp) +{ + xdchange_t *xch; + + for (xch = xscr; xch; xch = xch->next) { + xrecord_t **rec; + int ignore = 1; + long i; + + /* + * Do not override --ignore-blank-lines. + */ + if (xch->ignore) + continue; + + rec = &xe->xdf1.recs[xch->i1]; + for (i = 0; i < xch->chg1 && ignore; i++) + ignore = record_matches_regex(rec[i], xpp); + + rec = &xe->xdf2.recs[xch->i2]; + for (i = 0; i < xch->chg2 && ignore; i++) + ignore = record_matches_regex(rec[i], xpp); + + xch->ignore = ignore; + } +} + int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp, xdemitconf_t const *xecfg, xdemitcb_t *ecb) { xdchange_t *xscr; @@ -1038,7 +1078,10 @@ int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp, } if (xscr) { if (xpp->flags & XDF_IGNORE_BLANK_LINES) - xdl_mark_ignorable(xscr, &xe, xpp->flags); + xdl_mark_ignorable_lines(xscr, &xe, xpp->flags); + + if (xpp->ignore_regex) + xdl_mark_ignorable_regex(xscr, &xe, xpp); if (ef(&xe, xscr, ecb, xecfg) < 0) { -- 2.28.0 ^ permalink raw reply related [flat|nested] 57+ messages in thread
* Re: [PATCH v3 2/2] diff: add -I<regex> that ignores matching changes 2020-10-15 7:24 ` [PATCH v3 2/2] diff: add -I<regex> that ignores matching changes Michał Kępień @ 2020-10-16 15:32 ` Phillip Wood 2020-10-16 18:04 ` Junio C Hamano 2020-10-16 18:16 ` Junio C Hamano 1 sibling, 1 reply; 57+ messages in thread From: Phillip Wood @ 2020-10-16 15:32 UTC (permalink / raw) To: Michał Kępień, git Hi Michał Thanks for working on this, it will be a useful addition. Unfortunately there's a use-after-free error see below On 15/10/2020 08:24, Michał Kępień wrote: > Add a new diff option that enables ignoring changes whose all lines > (changed, removed, and added) match a given regular expression. This is > similar to the -I/--ignore-matching-lines option in standalone diff > utilities and can be used e.g. to ignore changes which only affect code > comments or to look for unrelated changes in commits containing a large > number of automatically applied modifications (e.g. a tree-wide string > replacement). The difference between -G/-S and the new -I option is > that the latter filters output on a per-change basis. > > Use the 'ignore' field of xdchange_t for marking a change as ignored or > not. Since the same field is used by --ignore-blank-lines, identical > hunk emitting rules apply for --ignore-blank-lines and -I. These two > options can also be used together in the same git invocation (they are > complementary to each other). > > Rename xdl_mark_ignorable() to xdl_mark_ignorable_lines(), to indicate > that it is logically a "sibling" of xdl_mark_ignorable_regex() rather > than its "parent". > > Signed-off-by: Michał Kępień <michal@isc.org> > --- > Documentation/diff-options.txt | 5 ++++ > diff.c | 28 ++++++++++++++++++++ > diff.h | 4 +++ > t/t4013-diff-various.sh | 33 ++++++++++++++++++++++++ > xdiff/xdiff.h | 4 +++ > xdiff/xdiffi.c | 47 ++++++++++++++++++++++++++++++++-- > 6 files changed, 119 insertions(+), 2 deletions(-) > > diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt > index 573fb9bb71..ee52b65e46 100644 > --- a/Documentation/diff-options.txt > +++ b/Documentation/diff-options.txt > @@ -687,6 +687,11 @@ endif::git-format-patch[] > --ignore-blank-lines:: > Ignore changes whose lines are all blank. > > +-I<regex>:: > +--ignore-matching-lines=<regex>:: > + Ignore changes whose all lines match <regex>. This option may > + be specified more than once. > + > --inter-hunk-context=<lines>:: > Show the context between diff hunks, up to the specified number > of lines, thereby fusing hunks that are close to each other. > diff --git a/diff.c b/diff.c > index 2bb2f8f57e..677de23352 100644 > --- a/diff.c > +++ b/diff.c > @@ -3587,6 +3587,8 @@ static void builtin_diff(const char *name_a, > if (header.len && !o->flags.suppress_diff_headers) > ecbdata.header = &header; > xpp.flags = o->xdl_opts; > + xpp.ignore_regex = o->ignore_regex; > + xpp.ignore_regex_nr = o->ignore_regex_nr; > xpp.anchors = o->anchors; > xpp.anchors_nr = o->anchors_nr; > xecfg.ctxlen = o->context; > @@ -3716,6 +3718,8 @@ static void builtin_diffstat(const char *name_a, const char *name_b, > memset(&xpp, 0, sizeof(xpp)); > memset(&xecfg, 0, sizeof(xecfg)); > xpp.flags = o->xdl_opts; > + xpp.ignore_regex = o->ignore_regex; > + xpp.ignore_regex_nr = o->ignore_regex_nr; > xpp.anchors = o->anchors; > xpp.anchors_nr = o->anchors_nr; > xecfg.ctxlen = o->context; > @@ -5203,6 +5207,22 @@ static int diff_opt_patience(const struct option *opt, > return 0; > } > > +static int diff_opt_ignore_regex(const struct option *opt, > + const char *arg, int unset) > +{ > + struct diff_options *options = opt->value; > + regex_t *regex; > + > + BUG_ON_OPT_NEG(unset); > + regex = xmalloc(sizeof(*regex)); > + if (regcomp(regex, arg, REG_EXTENDED | REG_NEWLINE)) > + return error(_("invalid regex given to -I: '%s'"), arg); > + ALLOC_GROW(options->ignore_regex, options->ignore_regex_nr + 1, > + options->ignore_regex_alloc); > + options->ignore_regex[options->ignore_regex_nr++] = regex; > + return 0; > +} > + > static int diff_opt_pickaxe_regex(const struct option *opt, > const char *arg, int unset) > { > @@ -5491,6 +5511,9 @@ static void prep_parse_options(struct diff_options *options) > OPT_BIT_F(0, "ignore-blank-lines", &options->xdl_opts, > N_("ignore changes whose lines are all blank"), > XDF_IGNORE_BLANK_LINES, PARSE_OPT_NONEG), > + OPT_CALLBACK_F('I', "ignore-matching-lines", options, N_("<regex>"), > + N_("ignore changes whose all lines match <regex>"), > + 0, diff_opt_ignore_regex), > OPT_BIT(0, "indent-heuristic", &options->xdl_opts, > N_("heuristic to shift diff hunk boundaries for easy reading"), > XDF_INDENT_HEURISTIC), > @@ -6405,6 +6428,11 @@ void diff_flush(struct diff_options *options) > DIFF_QUEUE_CLEAR(q); > if (options->close_file) > fclose(options->file); > + for (i = 0; i < options->ignore_regex_nr; i++) { > + regfree(options->ignore_regex[i]); > + free(options->ignore_regex[i]); > + } > + free(options->ignore_regex); If I run `git log -p -I foo` then the address sanitizer reports AddressSanitizer: heap-use-after-free xdiff/xdiffi.c:1027 in record_matches_regex after it has printed the diff for the first commit. I think freeing the regex here is the cause of the problem. Best Wishes Phillip > /* > * Report the content-level differences with HAS_CHANGES; > diff --git a/diff.h b/diff.h > index 11de52e9e9..a402227b80 100644 > --- a/diff.h > +++ b/diff.h > @@ -234,6 +234,10 @@ struct diff_options { > */ > const char *pickaxe; > > + /* -I<regex> */ > + regex_t **ignore_regex; > + size_t ignore_regex_nr, ignore_regex_alloc; > + > const char *single_follow; > const char *a_prefix, *b_prefix; > const char *line_prefix; > diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh > index 5c7b0122b4..efaaee2ef0 100755 > --- a/t/t4013-diff-various.sh > +++ b/t/t4013-diff-various.sh > @@ -6,6 +6,7 @@ > test_description='Various diff formatting options' > > . ./test-lib.sh > +. "$TEST_DIRECTORY"/diff-lib.sh > > test_expect_success setup ' > > @@ -473,4 +474,36 @@ test_expect_success 'diff-tree --stdin with log formatting' ' > test_cmp expect actual > ' > > +test_expect_success 'diff -I<regex>' ' > + test_seq 50 >I.txt && > + sed -e "s/13/ten and three/" -e "/7\$/d" <I.txt >J.txt && > + echo >>J.txt && > + > + test_expect_code 1 git diff --no-index --ignore-blank-lines -I"ten.*e" -I"^[124-9]" I.txt J.txt >actual && > + cat >expect <<-\EOF && > + diff --git a/I.txt b/J.txt > + --- a/I.txt > + +++ b/J.txt > + @@ -34,7 +31,6 @@ > + 34 > + 35 > + 36 > + -37 > + 38 > + 39 > + 40 > + EOF > + compare_diff_patch expect actual && > + > + test_expect_code 1 git diff --stat --no-index --ignore-blank-lines -I"ten.*e" -I"^[124-9]" I.txt J.txt >actual && > + cat >expect <<-\EOF && > + I.txt => J.txt | 1 - > + 1 file changed, 1 deletion(-) > + EOF > + test_cmp expect actual && > + > + test_expect_code 129 git diff --no-index --ignore-matching-lines="^[124-9]" --ignore-matching-lines="^[124-9" I.txt J.txt >output 2>&1 && > + test_i18ngrep "invalid regex given to -I: " output > +' > + > test_done > diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h > index 032e3a9f41..7a04605146 100644 > --- a/xdiff/xdiff.h > +++ b/xdiff/xdiff.h > @@ -79,6 +79,10 @@ typedef struct s_mmbuffer { > typedef struct s_xpparam { > unsigned long flags; > > + /* -I<regex> */ > + regex_t **ignore_regex; > + size_t ignore_regex_nr; > + > /* See Documentation/diff-options.txt. */ > char **anchors; > size_t anchors_nr; > diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c > index bd035139f9..380eb728ed 100644 > --- a/xdiff/xdiffi.c > +++ b/xdiff/xdiffi.c > @@ -998,7 +998,7 @@ static int xdl_call_hunk_func(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb, > return 0; > } > > -static void xdl_mark_ignorable(xdchange_t *xscr, xdfenv_t *xe, long flags) > +static void xdl_mark_ignorable_lines(xdchange_t *xscr, xdfenv_t *xe, long flags) > { > xdchange_t *xch; > > @@ -1019,6 +1019,46 @@ static void xdl_mark_ignorable(xdchange_t *xscr, xdfenv_t *xe, long flags) > } > } > > +static int record_matches_regex(xrecord_t *rec, xpparam_t const *xpp) { > + regmatch_t regmatch; > + int i; > + > + for (i = 0; i < xpp->ignore_regex_nr; i++) > + if (!regexec_buf(xpp->ignore_regex[i], rec->ptr, rec->size, 1, > + ®match, 0)) > + return 1; > + > + return 0; > +} > + > +static void xdl_mark_ignorable_regex(xdchange_t *xscr, const xdfenv_t *xe, > + xpparam_t const *xpp) > +{ > + xdchange_t *xch; > + > + for (xch = xscr; xch; xch = xch->next) { > + xrecord_t **rec; > + int ignore = 1; > + long i; > + > + /* > + * Do not override --ignore-blank-lines. > + */ > + if (xch->ignore) > + continue; > + > + rec = &xe->xdf1.recs[xch->i1]; > + for (i = 0; i < xch->chg1 && ignore; i++) > + ignore = record_matches_regex(rec[i], xpp); > + > + rec = &xe->xdf2.recs[xch->i2]; > + for (i = 0; i < xch->chg2 && ignore; i++) > + ignore = record_matches_regex(rec[i], xpp); > + > + xch->ignore = ignore; > + } > +} > + > int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp, > xdemitconf_t const *xecfg, xdemitcb_t *ecb) { > xdchange_t *xscr; > @@ -1038,7 +1078,10 @@ int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp, > } > if (xscr) { > if (xpp->flags & XDF_IGNORE_BLANK_LINES) > - xdl_mark_ignorable(xscr, &xe, xpp->flags); > + xdl_mark_ignorable_lines(xscr, &xe, xpp->flags); > + > + if (xpp->ignore_regex) > + xdl_mark_ignorable_regex(xscr, &xe, xpp); > > if (ef(&xe, xscr, ecb, xecfg) < 0) { > > ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH v3 2/2] diff: add -I<regex> that ignores matching changes 2020-10-16 15:32 ` Phillip Wood @ 2020-10-16 18:04 ` Junio C Hamano 2020-10-19 9:48 ` Michał Kępień 0 siblings, 1 reply; 57+ messages in thread From: Junio C Hamano @ 2020-10-16 18:04 UTC (permalink / raw) To: Phillip Wood; +Cc: Michał Kępień, git Phillip Wood <phillip.wood123@gmail.com> writes: >> @@ -6405,6 +6428,11 @@ void diff_flush(struct diff_options *options) >> DIFF_QUEUE_CLEAR(q); >> if (options->close_file) >> fclose(options->file); >> + for (i = 0; i < options->ignore_regex_nr; i++) { >> + regfree(options->ignore_regex[i]); >> + free(options->ignore_regex[i]); >> + } >> + free(options->ignore_regex); > > If I run `git log -p -I foo` then the address sanitizer reports > > AddressSanitizer: heap-use-after-free xdiff/xdiffi.c:1027 in > record_matches_regex > > after it has printed the diff for the first commit. I think freeing > the regex here is the cause of the problem. Yes, this is a good location to clear the diff_queue, which is per-invocation. But diff_options->ignore_regex[] can and should be shared across multiple invocations of the diff machinery, as we are parsing upfront in the command line option parser just once to be used throughout the life of the program. It is wrong to free them early like this. I really hate to suggest this, one common API call made into the diff machinery from various consumers, when they are absolutely done with diff_options, is probably diff_result_code(). Its purpose is to collect bits computed to participate in the overall result into the final result code and anybody who ran one or more diff and wants to learn the outcome must call it, so it is unlikely that callers that matter are missing a call to finalize their uses of the diff machinery. So if freeing .ignore_regex[] is really necessary, it could be used to tell us where to do so [*1*]. Unlike per-invocation resources that MUST be freed for repeated invocations of the diff machinery made in "git log" family, resources held for and repeatedly used during the entire session like this may not have to be freed, to be honest, though. Thanks. [Footnote] *1* Adding calls to free() in diff_result_code() directly is probably not a good idea. Some callers may not even need result code (e.g. "blame") and do not call it at all, but we may want to free the resource in diff_options that is not per-invocation. So if we were to do this, we probably need a new API function, perhaps diff_options_clear() or something, and call that from diff_result_code(), and then find callers that do not care about the result code and call diff_options_clear() directly from them. If a caller does a single diff_setup_done(), makes repeated calls to the diff machinery, and calls diff_result_code() once per each iteration (e.g. imagine "git log -p" that detects the status for each commit), then having the diff_options_clear() call in diff_result_code() will break the machinery, so if we were to follow the outline given in the previous paragraph, we need to make sure there is no such caller. But I see that diff_result_code() does not clear the fields it looks at in the diff_options fields after it uses them, so the second and subsequent calls into the diff machinery by such a caller may already be broken (not in the "resource leakage" sense, but in the correctness sense). ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH v3 2/2] diff: add -I<regex> that ignores matching changes 2020-10-16 18:04 ` Junio C Hamano @ 2020-10-19 9:48 ` Michał Kępień 0 siblings, 0 replies; 57+ messages in thread From: Michał Kępień @ 2020-10-19 9:48 UTC (permalink / raw) To: Junio C Hamano; +Cc: Phillip Wood, git First, thanks to Phillip for testing this. > Yes, this is a good location to clear the diff_queue, which is > per-invocation. But diff_options->ignore_regex[] can and should be > shared across multiple invocations of the diff machinery, as we are > parsing upfront in the command line option parser just once to be > used throughout the life of the program. It is wrong to free them > early like this. Ugh, sorry, I am only now starting to see the big picture. > I really hate to suggest this, one common API call made into the > diff machinery from various consumers, when they are absolutely done > with diff_options, is probably diff_result_code(). Its purpose is > to collect bits computed to participate in the overall result into > the final result code and anybody who ran one or more diff and wants > to learn the outcome must call it, so it is unlikely that callers > that matter are missing a call to finalize their uses of the diff > machinery. So if freeing .ignore_regex[] is really necessary, it > could be used to tell us where to do so [*1*]. > > Unlike per-invocation resources that MUST be freed for repeated > invocations of the diff machinery made in "git log" family, > resources held for and repeatedly used during the entire session > like this may not have to be freed, to be honest, though. After reading your message, taking a closer look, and doing a few Valgrind runs, I tend to agree that perhaps there is no need to free the 'ignore_regex' array after all - it does *not* get allocated repeatedly during each invocation of the diff machinery and when Git exits, the OS will reclaim any memory allocated by the process anyway. On my laptop, a Valgrind run done for a -DSUPPRESS_ANNOTATED_LEAKS build claims that 'ignore_regex' is one of 1,300+ memory blocks still reachable at exit time, which amount to about 2.5 MB of memory in total. Given this, it feels slightly weird to single 'ignore_regex' out. I will revert the diff_flush() changes in v4 unless somebody disagrees with the above reasoning. -- Best regards, Michał Kępień ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH v3 2/2] diff: add -I<regex> that ignores matching changes 2020-10-15 7:24 ` [PATCH v3 2/2] diff: add -I<regex> that ignores matching changes Michał Kępień 2020-10-16 15:32 ` Phillip Wood @ 2020-10-16 18:16 ` Junio C Hamano 2020-10-19 9:55 ` Michał Kępień 1 sibling, 1 reply; 57+ messages in thread From: Junio C Hamano @ 2020-10-16 18:16 UTC (permalink / raw) To: Michał Kępień; +Cc: git Michał Kępień <michal@isc.org> writes: > +test_expect_success 'diff -I<regex>' ' > + test_seq 50 >I.txt && > + sed -e "s/13/ten and three/" -e "/7\$/d" <I.txt >J.txt && > + echo >>J.txt && > + > + test_expect_code 1 git diff --no-index --ignore-blank-lines -I"ten.*e" -I"^[124-9]" I.txt J.txt >actual && Please wrap an overly long line like this oned, perhaps like: test_expect_code 1 git diff --no-index --ignore-blank-lines \ -I"ten.*e" -I"^[124-9]" I.txt J.txt >actual && > + test_expect_code 1 git diff --stat --no-index --ignore-blank-lines -I"ten.*e" -I"^[124-9]" I.txt J.txt >actual && > + test_expect_code 129 git diff --no-index --ignore-matching-lines="^[124-9]" --ignore-matching-lines="^[124-9" I.txt J.txt >output 2>&1 && Cramming unrelated tests into a single one made me puzzled, staring at this line for longer than necessary before realizing that this is an attempt to catch a malformed regexp. If this were in a separate test with its own title, e.g. test_expect_success 'diff -I<regex>: setup' ' ... set up the sample input, perhaps sample history ... ... perhaps prepare the expected output in "expect" ... ' test_expect_success 'diff -I<regex> -p' ' git diff --I"ten.*e" ... >actual && test_cmp expect actual ' test_expect_success 'diff -I<regex> --stat' ' git diff --stat --I"ten.*e" ... >actual && test_cmp expect actual ' test_expect_success 'diff -I<regex>: detect malformed regex' test_must_fail git diff -I="[^124-9" ... 2>error && test_i18ngrep "invalid regex" error ' It would have been much easier to follow. It also is curious that it only tests --no-index, which is a bolted on feature that may not exercise the codepath the users depend on working correctly. If this were tested with "git log -p" for a few commits, the early destruction of regexp may have been caught, especially with ASAN build. Other than that, and also the premature destruction of regexp pointed out by Phillip already, looks reasonably done. Thanks. ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH v3 2/2] diff: add -I<regex> that ignores matching changes 2020-10-16 18:16 ` Junio C Hamano @ 2020-10-19 9:55 ` Michał Kępień 2020-10-19 17:29 ` Junio C Hamano 0 siblings, 1 reply; 57+ messages in thread From: Michał Kępień @ 2020-10-19 9:55 UTC (permalink / raw) To: Junio C Hamano; +Cc: git > Michał Kępień <michal@isc.org> writes: > > > +test_expect_success 'diff -I<regex>' ' > > + test_seq 50 >I.txt && > > + sed -e "s/13/ten and three/" -e "/7\$/d" <I.txt >J.txt && > > + echo >>J.txt && > > + > > + test_expect_code 1 git diff --no-index --ignore-blank-lines -I"ten.*e" -I"^[124-9]" I.txt J.txt >actual && > > Please wrap an overly long line like this oned, perhaps like: > > test_expect_code 1 git diff --no-index --ignore-blank-lines \ > -I"ten.*e" -I"^[124-9]" I.txt J.txt >actual && Yeah, thanks, I was unsure what the line length limit for test code is. I will fix this in v4. > > + test_expect_code 1 git diff --stat --no-index --ignore-blank-lines -I"ten.*e" -I"^[124-9]" I.txt J.txt >actual && > > > + test_expect_code 129 git diff --no-index --ignore-matching-lines="^[124-9]" --ignore-matching-lines="^[124-9" I.txt J.txt >output 2>&1 && > > Cramming unrelated tests into a single one made me puzzled, staring > at this line for longer than necessary before realizing that this is > an attempt to catch a malformed regexp. If this were in a separate > test with its own title, e.g. > > (...) > > It would have been much easier to follow. > > It also is curious that it only tests --no-index, which is a bolted > on feature that may not exercise the codepath the users depend on > working correctly. If this were tested with "git log -p" for a few > commits, the early destruction of regexp may have been caught, > especially with ASAN build. Sorry, I think I got a bit carried away trying to use the test suggested by Johannes while also extending it with the missing bits. I will try to come up with something more balanced in v4. -- Best regards, Michał Kępień ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH v3 2/2] diff: add -I<regex> that ignores matching changes 2020-10-19 9:55 ` Michał Kępień @ 2020-10-19 17:29 ` Junio C Hamano 0 siblings, 0 replies; 57+ messages in thread From: Junio C Hamano @ 2020-10-19 17:29 UTC (permalink / raw) To: Michał Kępień; +Cc: git Michał Kępień <michal@isc.org> writes: >> Cramming unrelated tests into a single one made me puzzled, staring >> at this line for longer than necessary before realizing that this is >> an attempt to catch a malformed regexp. If this were in a separate >> test with its own title, e.g. >> >> It would have been much easier to follow. > ... I will try > to come up with something more balanced in v4. We want to avoid going overboard and doing a full test matrix. But having it tested fully with one variant while testing just the basic for other variants may give us a good balance, e.g. "git log -p -I<regex>" is tested but we do not bother testing interactions with other options", while we try different combinations when testing "git diff --no-index -I<regex>", like multiple -I options etc. Thanks. ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH v3 0/2] diff: add -I<regex> that ignores matching changes 2020-10-15 7:24 ` [PATCH v3 0/2] diff: add -I<regex> that ignores matching changes Michał Kępień 2020-10-15 7:24 ` [PATCH v3 1/2] merge-base, xdiff: zero out xpparam_t structures Michał Kępień 2020-10-15 7:24 ` [PATCH v3 2/2] diff: add -I<regex> that ignores matching changes Michał Kępień @ 2020-10-16 10:00 ` Johannes Schindelin 2020-10-20 6:48 ` [PATCH v4 " Michał Kępień 3 siblings, 0 replies; 57+ messages in thread From: Johannes Schindelin @ 2020-10-16 10:00 UTC (permalink / raw) To: Michał Kępień; +Cc: git [-- Attachment #1: Type: text/plain, Size: 1831 bytes --] Hi Michał, On Thu, 15 Oct 2020, Michał Kępień wrote: > This patch series adds a new diff option that enables ignoring changes > whose all lines (changed, removed, and added) match a given regular > expression. This is similar to the -I/--ignore-matching-lines option in > standalone diff utilities and can be used e.g. to ignore changes which > only affect code comments or to look for unrelated changes in commits > containing a large number of automatically applied modifications (e.g. a > tree-wide string replacement). The difference between -G/-S and the new > -I option is that the latter filters output on a per-change basis. > > Changes from v2: > > - Add a long option for -I (--ignore-matching-lines) as it is > commonplace in standalone diff utilities. Update documentation and > commit log messages accordingly. > > - Use xmalloc() instead of xcalloc() for allocating regex_t > structures in diff_opt_ignore_regex(). > > - Ensure the memory allocated in diff_opt_ignore_regex() gets > released. > > - Use "return error(...)" instead of die() in the -I option callback. > Make the relevant error message localizable. > > - Drastically reduce the number of -I<regex> tests due to excessive > run time of t/t4069-diff-ignore-regex.sh from v1/v2 on some > platforms (notably Windows). Use a tweaked version of a test > suggested by Johannes Schindelin (thanks!). Given its reduction in > size, squash patch 3 (which contained the tests) into patch 2. > > - Replace "see Documentation/diff-options.txt" with "-I<regex>" in the > comments for the added structure fields, in order to make these > comments more useful. Thank you for this diligent work! I looked over the patches and like them a lot! Thanks, Dscho ^ permalink raw reply [flat|nested] 57+ messages in thread
* [PATCH v4 0/2] diff: add -I<regex> that ignores matching changes 2020-10-15 7:24 ` [PATCH v3 0/2] diff: add -I<regex> that ignores matching changes Michał Kępień ` (2 preceding siblings ...) 2020-10-16 10:00 ` [PATCH v3 0/2] " Johannes Schindelin @ 2020-10-20 6:48 ` Michał Kępień 2020-10-20 6:48 ` [PATCH v4 1/2] merge-base, xdiff: zero out xpparam_t structures Michał Kępień ` (3 more replies) 3 siblings, 4 replies; 57+ messages in thread From: Michał Kępień @ 2020-10-20 6:48 UTC (permalink / raw) To: git This patch series adds a new diff option that enables ignoring changes whose all lines (changed, removed, and added) match a given regular expression. This is similar to the -I/--ignore-matching-lines option in standalone diff utilities and can be used e.g. to ignore changes which only affect code comments or to look for unrelated changes in commits containing a large number of automatically applied modifications (e.g. a tree-wide string replacement). The difference between -G/-S and the new -I option is that the latter filters output on a per-change basis. Changes from v3: - Do not release the memory allocated in diff_opt_ignore_regex() after all as the free() calls added in v2 were triggering use-after-free errors when the diff machinery was invoked multiple times throughout the lifetime of a single Git command (e.g. by "git log -p"). - Further test improvements: split various -I<regex> checks into multiple tests; add a check for "git log -p -I<regex>" to trigger the issue mentioned above; avoid using --no-index. Changes from v2: - Add a long option for -I (--ignore-matching-lines) as it is commonplace in standalone diff utilities. Update documentation and commit log messages accordingly. - Use xmalloc() instead of xcalloc() for allocating regex_t structures in diff_opt_ignore_regex(). - Ensure the memory allocated in diff_opt_ignore_regex() gets released. - Use "return error(...)" instead of die() in the -I option callback. Make sure the error message is localizable. - Drastically reduce the number of -I<regex> tests due to excessive run time of t/t4069-diff-ignore-regex.sh from v1/v2 on some platforms (notably Windows). Use a tweaked version of a test suggested by Johannes Schindelin (thanks!). Squash patch 3 into patch 2. - Replace "see Documentation/diff-options.txt" with "-I<regex>" in the comments for the added structure fields, in order to make these comments more useful. Changes from v1: - Add a new preliminary cleanup patch which ensures xpparam_t structures are always zero-initialized. (This was a prerequisite for the next change below.) - Do not add a new 'xdl_opts' flag to check whether -I was used; instead, just check whether the array of regular expressions to match against is non-NULL. - Enable the -I option to be used multiple times. As a consequence of this, regular expressions are now "pre-compiled" in the option's callback (and passed around as an array of regex_t structures) rather than deep down in xdiff code. Add test cases exercising use of multiple -I options in the same git invocation. Update documentation accordingly. - Rename xdl_mark_ignorable() to xdl_mark_ignorable_lines(), to indicate that it is logically a "sibling" of xdl_mark_ignorable_regex() rather than its "parent". - Optimize xdl_mark_ignorable_regex() by making it immediately skip changes already marked as ignored by xdl_mark_ignorable_lines(). - Fix coding style issue in the prototype part of the definition of xdl_mark_ignorable_regex(). - Add "/* see Documentation/diff-options.txt */" comments for the fields added to struct diff_options and xpparam_t, mimicking the comments used for 'anchors', 'anchors_nr', and 'anchors_alloc'. - Revise commit log messages to reflect all of the above. Michał Kępień (2): merge-base, xdiff: zero out xpparam_t structures diff: add -I<regex> that ignores matching changes Documentation/diff-options.txt | 5 ++ builtin/merge-tree.c | 1 + diff.c | 23 +++++ diff.h | 4 + t/t4013-diff-various.sh | 41 +++++++++ t/t4013/diff.log_-IA_-IB_-I1_-I2_-p_master | 99 ++++++++++++++++++++++ xdiff/xdiff.h | 4 + xdiff/xdiffi.c | 47 +++++++++- xdiff/xhistogram.c | 2 + xdiff/xpatience.c | 2 + 10 files changed, 226 insertions(+), 2 deletions(-) create mode 100644 t/t4013/diff.log_-IA_-IB_-I1_-I2_-p_master -- 2.28.0 ^ permalink raw reply [flat|nested] 57+ messages in thread
* [PATCH v4 1/2] merge-base, xdiff: zero out xpparam_t structures 2020-10-20 6:48 ` [PATCH v4 " Michał Kępień @ 2020-10-20 6:48 ` Michał Kępień 2020-10-20 6:48 ` [PATCH v4 2/2] diff: add -I<regex> that ignores matching changes Michał Kępień ` (2 subsequent siblings) 3 siblings, 0 replies; 57+ messages in thread From: Michał Kępień @ 2020-10-20 6:48 UTC (permalink / raw) To: git xpparam_t structures are usually zero-initialized before their specific fields are assigned to, but there are three locations in the tree where that does not happen. Add the missing memset() calls in order to make initialization of xpparam_t structures consistent tree-wide and to prevent stack garbage from being used as field values. Signed-off-by: Michał Kępień <michal@isc.org> --- builtin/merge-tree.c | 1 + xdiff/xhistogram.c | 2 ++ xdiff/xpatience.c | 2 ++ 3 files changed, 5 insertions(+) diff --git a/builtin/merge-tree.c b/builtin/merge-tree.c index e72714a5a8..de8520778d 100644 --- a/builtin/merge-tree.c +++ b/builtin/merge-tree.c @@ -109,6 +109,7 @@ static void show_diff(struct merge_list *entry) xdemitconf_t xecfg; xdemitcb_t ecb; + memset(&xpp, 0, sizeof(xpp)); xpp.flags = 0; memset(&xecfg, 0, sizeof(xecfg)); xecfg.ctxlen = 3; diff --git a/xdiff/xhistogram.c b/xdiff/xhistogram.c index c7b35a9667..e694bfd9e3 100644 --- a/xdiff/xhistogram.c +++ b/xdiff/xhistogram.c @@ -235,6 +235,8 @@ static int fall_back_to_classic_diff(xpparam_t const *xpp, xdfenv_t *env, int line1, int count1, int line2, int count2) { xpparam_t xpparam; + + memset(&xpparam, 0, sizeof(xpparam)); xpparam.flags = xpp->flags & ~XDF_DIFF_ALGORITHM_MASK; return xdl_fall_back_diff(env, &xpparam, diff --git a/xdiff/xpatience.c b/xdiff/xpatience.c index 3c5601b602..20699a6f60 100644 --- a/xdiff/xpatience.c +++ b/xdiff/xpatience.c @@ -318,6 +318,8 @@ static int fall_back_to_classic_diff(struct hashmap *map, int line1, int count1, int line2, int count2) { xpparam_t xpp; + + memset(&xpp, 0, sizeof(xpp)); xpp.flags = map->xpp->flags & ~XDF_DIFF_ALGORITHM_MASK; return xdl_fall_back_diff(map->env, &xpp, -- 2.28.0 ^ permalink raw reply related [flat|nested] 57+ messages in thread
* [PATCH v4 2/2] diff: add -I<regex> that ignores matching changes 2020-10-20 6:48 ` [PATCH v4 " Michał Kępień 2020-10-20 6:48 ` [PATCH v4 1/2] merge-base, xdiff: zero out xpparam_t structures Michał Kępień @ 2020-10-20 6:48 ` Michał Kępień 2021-02-05 14:13 ` [PATCH 1/2] diff: add an API for deferred freeing Ævar Arnfjörð Bjarmason 2021-02-05 14:13 ` [PATCH " Ævar Arnfjörð Bjarmason 3 siblings, 0 replies; 57+ messages in thread From: Michał Kępień @ 2020-10-20 6:48 UTC (permalink / raw) To: git Add a new diff option that enables ignoring changes whose all lines (changed, removed, and added) match a given regular expression. This is similar to the -I/--ignore-matching-lines option in standalone diff utilities and can be used e.g. to ignore changes which only affect code comments or to look for unrelated changes in commits containing a large number of automatically applied modifications (e.g. a tree-wide string replacement). The difference between -G/-S and the new -I option is that the latter filters output on a per-change basis. Use the 'ignore' field of xdchange_t for marking a change as ignored or not. Since the same field is used by --ignore-blank-lines, identical hunk emitting rules apply for --ignore-blank-lines and -I. These two options can also be used together in the same git invocation (they are complementary to each other). Rename xdl_mark_ignorable() to xdl_mark_ignorable_lines(), to indicate that it is logically a "sibling" of xdl_mark_ignorable_regex() rather than its "parent". Signed-off-by: Michał Kępień <michal@isc.org> --- Documentation/diff-options.txt | 5 ++ diff.c | 23 +++++ diff.h | 4 + t/t4013-diff-various.sh | 41 +++++++++ t/t4013/diff.log_-IA_-IB_-I1_-I2_-p_master | 99 ++++++++++++++++++++++ xdiff/xdiff.h | 4 + xdiff/xdiffi.c | 47 +++++++++- 7 files changed, 221 insertions(+), 2 deletions(-) create mode 100644 t/t4013/diff.log_-IA_-IB_-I1_-I2_-p_master diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt index 573fb9bb71..ee52b65e46 100644 --- a/Documentation/diff-options.txt +++ b/Documentation/diff-options.txt @@ -687,6 +687,11 @@ endif::git-format-patch[] --ignore-blank-lines:: Ignore changes whose lines are all blank. +-I<regex>:: +--ignore-matching-lines=<regex>:: + Ignore changes whose all lines match <regex>. This option may + be specified more than once. + --inter-hunk-context=<lines>:: Show the context between diff hunks, up to the specified number of lines, thereby fusing hunks that are close to each other. diff --git a/diff.c b/diff.c index 2bb2f8f57e..d24f47df99 100644 --- a/diff.c +++ b/diff.c @@ -3587,6 +3587,8 @@ static void builtin_diff(const char *name_a, if (header.len && !o->flags.suppress_diff_headers) ecbdata.header = &header; xpp.flags = o->xdl_opts; + xpp.ignore_regex = o->ignore_regex; + xpp.ignore_regex_nr = o->ignore_regex_nr; xpp.anchors = o->anchors; xpp.anchors_nr = o->anchors_nr; xecfg.ctxlen = o->context; @@ -3716,6 +3718,8 @@ static void builtin_diffstat(const char *name_a, const char *name_b, memset(&xpp, 0, sizeof(xpp)); memset(&xecfg, 0, sizeof(xecfg)); xpp.flags = o->xdl_opts; + xpp.ignore_regex = o->ignore_regex; + xpp.ignore_regex_nr = o->ignore_regex_nr; xpp.anchors = o->anchors; xpp.anchors_nr = o->anchors_nr; xecfg.ctxlen = o->context; @@ -5203,6 +5207,22 @@ static int diff_opt_patience(const struct option *opt, return 0; } +static int diff_opt_ignore_regex(const struct option *opt, + const char *arg, int unset) +{ + struct diff_options *options = opt->value; + regex_t *regex; + + BUG_ON_OPT_NEG(unset); + regex = xmalloc(sizeof(*regex)); + if (regcomp(regex, arg, REG_EXTENDED | REG_NEWLINE)) + return error(_("invalid regex given to -I: '%s'"), arg); + ALLOC_GROW(options->ignore_regex, options->ignore_regex_nr + 1, + options->ignore_regex_alloc); + options->ignore_regex[options->ignore_regex_nr++] = regex; + return 0; +} + static int diff_opt_pickaxe_regex(const struct option *opt, const char *arg, int unset) { @@ -5491,6 +5511,9 @@ static void prep_parse_options(struct diff_options *options) OPT_BIT_F(0, "ignore-blank-lines", &options->xdl_opts, N_("ignore changes whose lines are all blank"), XDF_IGNORE_BLANK_LINES, PARSE_OPT_NONEG), + OPT_CALLBACK_F('I', "ignore-matching-lines", options, N_("<regex>"), + N_("ignore changes whose all lines match <regex>"), + 0, diff_opt_ignore_regex), OPT_BIT(0, "indent-heuristic", &options->xdl_opts, N_("heuristic to shift diff hunk boundaries for easy reading"), XDF_INDENT_HEURISTIC), diff --git a/diff.h b/diff.h index 11de52e9e9..a402227b80 100644 --- a/diff.h +++ b/diff.h @@ -234,6 +234,10 @@ struct diff_options { */ const char *pickaxe; + /* -I<regex> */ + regex_t **ignore_regex; + size_t ignore_regex_nr, ignore_regex_alloc; + const char *single_follow; const char *a_prefix, *b_prefix; const char *line_prefix; diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh index 5c7b0122b4..f72d456d3b 100755 --- a/t/t4013-diff-various.sh +++ b/t/t4013-diff-various.sh @@ -6,6 +6,7 @@ test_description='Various diff formatting options' . ./test-lib.sh +. "$TEST_DIRECTORY"/diff-lib.sh test_expect_success setup ' @@ -333,6 +334,7 @@ log -SF master --max-count=2 log -GF master log -GF -p master log -GF -p --pickaxe-all master +log -IA -IB -I1 -I2 -p master log --decorate --all log --decorate=full --all @@ -473,4 +475,43 @@ test_expect_success 'diff-tree --stdin with log formatting' ' test_cmp expect actual ' +test_expect_success 'diff -I<regex>: setup' ' + git checkout master && + test_seq 50 >file0 && + git commit -m "Set up -I<regex> test file" file0 && + test_seq 50 | sed -e "s/13/ten and three/" -e "/7\$/d" >file0 && + echo >>file0 +' +test_expect_success 'diff -I<regex>' ' + git diff --ignore-blank-lines -I"ten.*e" -I"^[124-9]" >actual && + cat >expect <<-\EOF && + diff --git a/file0 b/file0 + --- a/file0 + +++ b/file0 + @@ -34,7 +31,6 @@ + 34 + 35 + 36 + -37 + 38 + 39 + 40 + EOF + compare_diff_patch expect actual +' + +test_expect_success 'diff -I<regex> --stat' ' + git diff --stat --ignore-blank-lines -I"ten.*e" -I"^[124-9]" >actual && + cat >expect <<-\EOF && + file0 | 1 - + 1 file changed, 1 deletion(-) + EOF + test_cmp expect actual +' + +test_expect_success 'diff -I<regex>: detect malformed regex' ' + test_expect_code 129 git diff --ignore-matching-lines="^[124-9" 2>error && + test_i18ngrep "invalid regex given to -I: " error +' + test_done diff --git a/t/t4013/diff.log_-IA_-IB_-I1_-I2_-p_master b/t/t4013/diff.log_-IA_-IB_-I1_-I2_-p_master new file mode 100644 index 0000000000..929f35a05b --- /dev/null +++ b/t/t4013/diff.log_-IA_-IB_-I1_-I2_-p_master @@ -0,0 +1,99 @@ +$ git log -IA -IB -I1 -I2 -p master +commit 59d314ad6f356dd08601a4cd5e530381da3e3c64 +Merge: 9a6d494 c7a2ab9 +Author: A U Thor <author@example.com> +Date: Mon Jun 26 00:04:00 2006 +0000 + + Merge branch 'side' + +commit c7a2ab9e8eac7b117442a607d5a9b3950ae34d5a +Author: A U Thor <author@example.com> +Date: Mon Jun 26 00:03:00 2006 +0000 + + Side + +diff --git a/file0 b/file0 +index 01e79c3..f4615da 100644 +--- a/file0 ++++ b/file0 +@@ -1,3 +1,6 @@ + 1 + 2 + 3 ++A ++B ++C +diff --git a/file3 b/file3 +new file mode 100644 +index 0000000..7289e35 + +commit 9a6d4949b6b76956d9d5e26f2791ec2ceff5fdc0 +Author: A U Thor <author@example.com> +Date: Mon Jun 26 00:02:00 2006 +0000 + + Third + +diff --git a/dir/sub b/dir/sub +index 8422d40..cead32e 100644 +--- a/dir/sub ++++ b/dir/sub +@@ -2,3 +2,5 @@ A + B + C + D ++E ++F +diff --git a/file1 b/file1 +new file mode 100644 +index 0000000..b1e6722 +--- /dev/null ++++ b/file1 +@@ -0,0 +1,3 @@ ++A ++B ++C + +commit 1bde4ae5f36c8d9abe3a0fce0c6aab3c4a12fe44 +Author: A U Thor <author@example.com> +Date: Mon Jun 26 00:01:00 2006 +0000 + + Second + + This is the second commit. + +diff --git a/dir/sub b/dir/sub +index 35d242b..8422d40 100644 +--- a/dir/sub ++++ b/dir/sub +@@ -1,2 +1,4 @@ + A + B ++C ++D +diff --git a/file0 b/file0 +index 01e79c3..b414108 100644 +--- a/file0 ++++ b/file0 +@@ -1,3 +1,6 @@ + 1 + 2 + 3 ++4 ++5 ++6 +diff --git a/file2 b/file2 +deleted file mode 100644 +index 01e79c3..0000000 +--- a/file2 ++++ /dev/null +@@ -1,3 +0,0 @@ +-1 +-2 +-3 + +commit 444ac553ac7612cc88969031b02b3767fb8a353a +Author: A U Thor <author@example.com> +Date: Mon Jun 26 00:00:00 2006 +0000 + + Initial +$ diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h index 032e3a9f41..7a04605146 100644 --- a/xdiff/xdiff.h +++ b/xdiff/xdiff.h @@ -79,6 +79,10 @@ typedef struct s_mmbuffer { typedef struct s_xpparam { unsigned long flags; + /* -I<regex> */ + regex_t **ignore_regex; + size_t ignore_regex_nr; + /* See Documentation/diff-options.txt. */ char **anchors; size_t anchors_nr; diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c index bd035139f9..380eb728ed 100644 --- a/xdiff/xdiffi.c +++ b/xdiff/xdiffi.c @@ -998,7 +998,7 @@ static int xdl_call_hunk_func(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb, return 0; } -static void xdl_mark_ignorable(xdchange_t *xscr, xdfenv_t *xe, long flags) +static void xdl_mark_ignorable_lines(xdchange_t *xscr, xdfenv_t *xe, long flags) { xdchange_t *xch; @@ -1019,6 +1019,46 @@ static void xdl_mark_ignorable(xdchange_t *xscr, xdfenv_t *xe, long flags) } } +static int record_matches_regex(xrecord_t *rec, xpparam_t const *xpp) { + regmatch_t regmatch; + int i; + + for (i = 0; i < xpp->ignore_regex_nr; i++) + if (!regexec_buf(xpp->ignore_regex[i], rec->ptr, rec->size, 1, + ®match, 0)) + return 1; + + return 0; +} + +static void xdl_mark_ignorable_regex(xdchange_t *xscr, const xdfenv_t *xe, + xpparam_t const *xpp) +{ + xdchange_t *xch; + + for (xch = xscr; xch; xch = xch->next) { + xrecord_t **rec; + int ignore = 1; + long i; + + /* + * Do not override --ignore-blank-lines. + */ + if (xch->ignore) + continue; + + rec = &xe->xdf1.recs[xch->i1]; + for (i = 0; i < xch->chg1 && ignore; i++) + ignore = record_matches_regex(rec[i], xpp); + + rec = &xe->xdf2.recs[xch->i2]; + for (i = 0; i < xch->chg2 && ignore; i++) + ignore = record_matches_regex(rec[i], xpp); + + xch->ignore = ignore; + } +} + int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp, xdemitconf_t const *xecfg, xdemitcb_t *ecb) { xdchange_t *xscr; @@ -1038,7 +1078,10 @@ int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp, } if (xscr) { if (xpp->flags & XDF_IGNORE_BLANK_LINES) - xdl_mark_ignorable(xscr, &xe, xpp->flags); + xdl_mark_ignorable_lines(xscr, &xe, xpp->flags); + + if (xpp->ignore_regex) + xdl_mark_ignorable_regex(xscr, &xe, xpp); if (ef(&xe, xscr, ecb, xecfg) < 0) { -- 2.28.0 ^ permalink raw reply related [flat|nested] 57+ messages in thread
* [PATCH 1/2] diff: add an API for deferred freeing 2020-10-20 6:48 ` [PATCH v4 " Michał Kępień 2020-10-20 6:48 ` [PATCH v4 1/2] merge-base, xdiff: zero out xpparam_t structures Michał Kępień 2020-10-20 6:48 ` [PATCH v4 2/2] diff: add -I<regex> that ignores matching changes Michał Kępień @ 2021-02-05 14:13 ` Ævar Arnfjörð Bjarmason 2021-02-10 16:00 ` Johannes Schindelin 2021-02-05 14:13 ` [PATCH " Ævar Arnfjörð Bjarmason 3 siblings, 1 reply; 57+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-02-05 14:13 UTC (permalink / raw) To: git Cc: Junio C Hamano, Michał Kępień, Johannes Schindelin, Phillip Wood, Ævar Arnfjörð Bjarmason Add a diff_free() function to free anything we may have allocated in the "diff_options" struct, and the ability to make calling it a noop by setting "no_free" in "diff_options". This is required because when e.g. "git diff" is run we'll allocate things in that struct, use the diff machinery once, and then exit, but if we run e.g. "git log -p" we're going to re-use what we allocated across multiple diff_flush() calls, and only want to free things at the end. We've thus ended up with features like the recently added "diff -I"[1] where we'll leak memory. As it turns out it could have simply used the pattern established in 6ea57703f6 (log: prepare log/log-tree to reuse the diffopt.close_file attribute, 2016-06-22). Manually adding more such flags to things log_tree_commit() every time we need to allocate something would be tedious. Let's instead move that fclose() code it to a new diff_free(), in anticipation of freeing more things in that function in follow-up commits. I'm renaming the "close_file" struct member to "fclose_file" for the ease of validating this, we can be certain that these are all the relevant callsites. Some functions such as log_tree_commit() need an idiom of optionally retaining a previous "no_free", as they may either free the memory themselves, or their caller may do so. I'm keeping that idiom in log_show_early() even though I don't think it's currently called in this manner, since it also gets passed an existing "struct rev_info".. 1. 296d4a94e7 (diff: add -I<regex> that ignores matching changes, 2020-10-20) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- builtin/add.c | 2 +- builtin/am.c | 6 ++++-- builtin/log.c | 27 ++++++++++++++------------- diff.c | 18 +++++++++++++----- diff.h | 14 +++++++++++++- log-tree.c | 10 ++++++---- wt-status.c | 2 +- 7 files changed, 52 insertions(+), 27 deletions(-) diff --git a/builtin/add.c b/builtin/add.c index a825887c50..6319710186 100644 --- a/builtin/add.c +++ b/builtin/add.c @@ -282,7 +282,7 @@ static int edit_patch(int argc, const char **argv, const char *prefix) if (out < 0) die(_("Could not open '%s' for writing."), file); rev.diffopt.file = xfdopen(out, "w"); - rev.diffopt.close_file = 1; + rev.diffopt.fclose_file = 1; if (run_diff_files(&rev, 0)) die(_("Could not write patch")); diff --git a/builtin/am.c b/builtin/am.c index 8355e3566f..157d264583 100644 --- a/builtin/am.c +++ b/builtin/am.c @@ -1315,7 +1315,7 @@ static void write_commit_patch(const struct am_state *state, struct commit *comm rev_info.diffopt.flags.full_index = 1; rev_info.diffopt.use_color = 0; rev_info.diffopt.file = fp; - rev_info.diffopt.close_file = 1; + rev_info.diffopt.fclose_file = 1; /* log_tree_commit() sets .no_free=1 */ add_pending_object(&rev_info, &commit->object, ""); diff_setup_done(&rev_info.diffopt); log_tree_commit(&rev_info, commit); @@ -1347,10 +1347,12 @@ static void write_index_patch(const struct am_state *state) rev_info.diffopt.output_format = DIFF_FORMAT_PATCH; rev_info.diffopt.use_color = 0; rev_info.diffopt.file = fp; - rev_info.diffopt.close_file = 1; + rev_info.diffopt.fclose_file = 1; + rev_info.diffopt.no_free = 1; add_pending_object(&rev_info, &tree->object, ""); diff_setup_done(&rev_info.diffopt); run_diff_index(&rev_info, 1); + diff_free(&rev_info.diffopt); } /** diff --git a/builtin/log.c b/builtin/log.c index fd282def43..604ee29ec0 100644 --- a/builtin/log.c +++ b/builtin/log.c @@ -306,10 +306,11 @@ static struct itimerval early_output_timer; static void log_show_early(struct rev_info *revs, struct commit_list *list) { - int i = revs->early_output, close_file = revs->diffopt.close_file; + int i = revs->early_output; int show_header = 1; + int no_free = revs->diffopt.no_free; - revs->diffopt.close_file = 0; + revs->diffopt.no_free = 0; sort_in_topological_order(&list, revs->sort_order); while (list && i) { struct commit *commit = list->item; @@ -326,8 +327,8 @@ static void log_show_early(struct rev_info *revs, struct commit_list *list) case commit_ignore: break; case commit_error: - if (close_file) - fclose(revs->diffopt.file); + revs->diffopt.no_free = no_free; + diff_free(&revs->diffopt); return; } list = list->next; @@ -335,8 +336,8 @@ static void log_show_early(struct rev_info *revs, struct commit_list *list) /* Did we already get enough commits for the early output? */ if (!i) { - if (close_file) - fclose(revs->diffopt.file); + revs->diffopt.no_free = 0; + diff_free(&revs->diffopt); return; } @@ -400,7 +401,7 @@ static int cmd_log_walk(struct rev_info *rev) { struct commit *commit; int saved_nrl = 0; - int saved_dcctc = 0, close_file = rev->diffopt.close_file; + int saved_dcctc = 0; if (rev->early_output) setup_early_output(); @@ -416,7 +417,7 @@ static int cmd_log_walk(struct rev_info *rev) * and HAS_CHANGES being accumulated in rev->diffopt, so be careful to * retain that state information if replacing rev->diffopt in this loop */ - rev->diffopt.close_file = 0; + rev->diffopt.no_free = 1; while ((commit = get_revision(rev)) != NULL) { if (!log_tree_commit(rev, commit) && rev->max_count >= 0) /* @@ -441,8 +442,8 @@ static int cmd_log_walk(struct rev_info *rev) } rev->diffopt.degraded_cc_to_c = saved_dcctc; rev->diffopt.needed_rename_limit = saved_nrl; - if (close_file) - fclose(rev->diffopt.file); + rev->diffopt.no_free = 0; + diff_free(&rev->diffopt); if (rev->diffopt.output_format & DIFF_FORMAT_CHECKDIFF && rev->diffopt.flags.check_failed) { @@ -1952,18 +1953,18 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix) if (rev.show_notes) load_display_notes(&rev.notes_opt); - if (use_stdout + rev.diffopt.close_file + !!output_directory > 1) + if (use_stdout + rev.diffopt.fclose_file + !!output_directory > 1) die(_("--stdout, --output, and --output-directory are mutually exclusive")); if (use_stdout) { setup_pager(); - } else if (rev.diffopt.close_file) { + } else if (rev.diffopt.fclose_file) { /* * The diff code parsed --output; it has already opened the * file, but but we must instruct it not to close after each * diff. */ - rev.diffopt.close_file = 0; + rev.diffopt.no_free = 1; } else { int saved; diff --git a/diff.c b/diff.c index 69e3bc00ed..3e6f8f0a71 100644 --- a/diff.c +++ b/diff.c @@ -5187,7 +5187,7 @@ static enum parse_opt_result diff_opt_output(struct parse_opt_ctx_t *ctx, BUG_ON_OPT_NEG(unset); path = prefix_filename(ctx->prefix, arg); options->file = xfopen(path, "w"); - options->close_file = 1; + options->fclose_file = 1; if (options->use_color != GIT_COLOR_ALWAYS) options->use_color = GIT_COLOR_NEVER; free(path); @@ -6399,10 +6399,10 @@ void diff_flush(struct diff_options *options) * options->file to /dev/null should be safe, because we * aren't supposed to produce any output anyway. */ - if (options->close_file) + if (options->fclose_file) fclose(options->file); options->file = xfopen("/dev/null", "w"); - options->close_file = 1; + options->fclose_file = 1; options->color_moved = 0; for (i = 0; i < q->nr; i++) { struct diff_filepair *p = q->queue[i]; @@ -6433,8 +6433,7 @@ void diff_flush(struct diff_options *options) free_queue: free(q->queue); DIFF_QUEUE_CLEAR(q); - if (options->close_file) - fclose(options->file); + diff_free(options); /* * Report the content-level differences with HAS_CHANGES; @@ -6449,6 +6448,15 @@ void diff_flush(struct diff_options *options) } } +void diff_free(struct diff_options *options) +{ + if (options->no_free) + return; + if (options->fclose_file) + fclose(options->file); +} + + static int match_filter(const struct diff_options *options, const struct diff_filepair *p) { return (((p->status == DIFF_STATUS_MODIFIED) && diff --git a/diff.h b/diff.h index 2ff2b1c7f2..d1d74c3a9e 100644 --- a/diff.h +++ b/diff.h @@ -49,7 +49,16 @@ * - Once you finish feeding the pairs of files, call `diffcore_std()`. * This will tell the diffcore library to go ahead and do its work. * + * - The `diff_opt_parse()` etc. functions might allocate memory in + * `struct diff_options`. When running the API `N > 1` set `.no_free + * = 1` to make the `diff_free()` invoked by `diff_flush()` below a + * noop. + * * - Calling `diff_flush()` will produce the output. + * + * - If you set `.no_free = 1` before set it to `0` and call + * `diff_free()` again. If `.no_free = 1` was not set there's no + * need to call `diff_free()`, `diff_flush()` will call it. */ struct combine_diff_path; @@ -328,7 +337,7 @@ struct diff_options { void (*set_default)(struct diff_options *); FILE *file; - int close_file; + int fclose_file; #define OUTPUT_INDICATOR_NEW 0 #define OUTPUT_INDICATOR_OLD 1 @@ -365,6 +374,8 @@ struct diff_options { struct repository *repo; struct option *parseopts; + + int no_free; }; unsigned diff_filter_bit(char status); @@ -559,6 +570,7 @@ void diffcore_fix_diff_index(void); int diff_queue_is_empty(void); void diff_flush(struct diff_options*); +void diff_free(struct diff_options*); void diff_warn_rename_limit(const char *varname, int needed, int degraded_cc); /* diff-raw status letters */ diff --git a/log-tree.c b/log-tree.c index fd0dde97ec..bb946f15f1 100644 --- a/log-tree.c +++ b/log-tree.c @@ -952,12 +952,14 @@ static int log_tree_diff(struct rev_info *opt, struct commit *commit, struct log int log_tree_commit(struct rev_info *opt, struct commit *commit) { struct log_info log; - int shown, close_file = opt->diffopt.close_file; + int shown; + /* maybe called by e.g. cmd_log_walk(), maybe stand-alone */ + int no_free = opt->diffopt.no_free; log.commit = commit; log.parent = NULL; opt->loginfo = &log; - opt->diffopt.close_file = 0; + opt->diffopt.no_free = 1; if (opt->line_level_traverse) return line_log_print(opt, commit); @@ -974,7 +976,7 @@ int log_tree_commit(struct rev_info *opt, struct commit *commit) fprintf(opt->diffopt.file, "\n%s\n", opt->break_bar); opt->loginfo = NULL; maybe_flush_or_die(opt->diffopt.file, "stdout"); - if (close_file) - fclose(opt->diffopt.file); + opt->diffopt.no_free = no_free; + diff_free(&opt->diffopt); return shown; } diff --git a/wt-status.c b/wt-status.c index 0c8287a023..41b08474e5 100644 --- a/wt-status.c +++ b/wt-status.c @@ -1052,7 +1052,7 @@ static void wt_longstatus_print_verbose(struct wt_status *s) rev.diffopt.rename_limit = s->rename_limit >= 0 ? s->rename_limit : rev.diffopt.rename_limit; rev.diffopt.rename_score = s->rename_score >= 0 ? s->rename_score : rev.diffopt.rename_score; rev.diffopt.file = s->fp; - rev.diffopt.close_file = 0; + rev.diffopt.fclose_file = 0; /* wt_status owns the s->fp */ /* * If we're not going to stdout, then we definitely don't * want color, since we are going to the commit message -- 2.30.0.284.gd98b1dd5eaa7 ^ permalink raw reply related [flat|nested] 57+ messages in thread
* Re: [PATCH 1/2] diff: add an API for deferred freeing 2021-02-05 14:13 ` [PATCH 1/2] diff: add an API for deferred freeing Ævar Arnfjörð Bjarmason @ 2021-02-10 16:00 ` Johannes Schindelin 2021-02-11 3:00 ` Ævar Arnfjörð Bjarmason 0 siblings, 1 reply; 57+ messages in thread From: Johannes Schindelin @ 2021-02-10 16:00 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: git, Junio C Hamano, Michał Kępień, Phillip Wood [-- Attachment #1: Type: text/plain, Size: 12871 bytes --] Hi Ævar, On Fri, 5 Feb 2021, Ævar Arnfjörð Bjarmason wrote: > Add a diff_free() function to free anything we may have allocated in > the "diff_options" struct, and the ability to make calling it a noop > by setting "no_free" in "diff_options". Why do we need a `no_free` flag? Why not simply set the `free()`d (or `fclose()`d) attributes to `NULL`? > This is required because when e.g. "git diff" is run we'll allocate > things in that struct, use the diff machinery once, and then exit, but > if we run e.g. "git log -p" we're going to re-use what we allocated > across multiple diff_flush() calls, and only want to free things at > the end. > > We've thus ended up with features like the recently added "diff -I"[1] > where we'll leak memory. As it turns out it could have simply used the > pattern established in 6ea57703f6 (log: prepare log/log-tree to reuse > the diffopt.close_file attribute, 2016-06-22). > > Manually adding more such flags to things log_tree_commit() every time > we need to allocate something would be tedious. > > Let's instead move that fclose() code it to a new diff_free(), in > anticipation of freeing more things in that function in follow-up > commits. I'm renaming the "close_file" struct member to "fclose_file" > for the ease of validating this, we can be certain that these are all > the relevant callsites. I like that idea a lot. > Some functions such as log_tree_commit() need an idiom of optionally > retaining a previous "no_free", as they may either free the memory > themselves, or their caller may do so. I'm keeping that idiom in > log_show_early() even though I don't think it's currently called in > this manner, since it also gets passed an existing "struct rev_info".. > > 1. 296d4a94e7 (diff: add -I<regex> that ignores matching changes, > 2020-10-20) > > Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> > --- > builtin/add.c | 2 +- > builtin/am.c | 6 ++++-- > builtin/log.c | 27 ++++++++++++++------------- > diff.c | 18 +++++++++++++----- > diff.h | 14 +++++++++++++- > log-tree.c | 10 ++++++---- > wt-status.c | 2 +- > 7 files changed, 52 insertions(+), 27 deletions(-) > > diff --git a/builtin/add.c b/builtin/add.c > index a825887c50..6319710186 100644 > --- a/builtin/add.c > +++ b/builtin/add.c > @@ -282,7 +282,7 @@ static int edit_patch(int argc, const char **argv, const char *prefix) > if (out < 0) > die(_("Could not open '%s' for writing."), file); > rev.diffopt.file = xfdopen(out, "w"); > - rev.diffopt.close_file = 1; > + rev.diffopt.fclose_file = 1; This rename makes the patch unnecessarily tedious to review, and I do not even agree with leaking the implementation detail that we need to `fclose()` the file. Let's just not? > if (run_diff_files(&rev, 0)) > die(_("Could not write patch")); > > diff --git a/builtin/am.c b/builtin/am.c > index 8355e3566f..157d264583 100644 > --- a/builtin/am.c > +++ b/builtin/am.c > @@ -1315,7 +1315,7 @@ static void write_commit_patch(const struct am_state *state, struct commit *comm > rev_info.diffopt.flags.full_index = 1; > rev_info.diffopt.use_color = 0; > rev_info.diffopt.file = fp; > - rev_info.diffopt.close_file = 1; > + rev_info.diffopt.fclose_file = 1; /* log_tree_commit() sets .no_free=1 */ > add_pending_object(&rev_info, &commit->object, ""); > diff_setup_done(&rev_info.diffopt); > log_tree_commit(&rev_info, commit); > @@ -1347,10 +1347,12 @@ static void write_index_patch(const struct am_state *state) > rev_info.diffopt.output_format = DIFF_FORMAT_PATCH; > rev_info.diffopt.use_color = 0; > rev_info.diffopt.file = fp; > - rev_info.diffopt.close_file = 1; > + rev_info.diffopt.fclose_file = 1; > + rev_info.diffopt.no_free = 1; > add_pending_object(&rev_info, &tree->object, ""); > diff_setup_done(&rev_info.diffopt); > run_diff_index(&rev_info, 1); > + diff_free(&rev_info.diffopt); > } > > /** > diff --git a/builtin/log.c b/builtin/log.c > index fd282def43..604ee29ec0 100644 > --- a/builtin/log.c > +++ b/builtin/log.c > @@ -306,10 +306,11 @@ static struct itimerval early_output_timer; > > static void log_show_early(struct rev_info *revs, struct commit_list *list) > { > - int i = revs->early_output, close_file = revs->diffopt.close_file; > + int i = revs->early_output; > int show_header = 1; > + int no_free = revs->diffopt.no_free; > > - revs->diffopt.close_file = 0; > + revs->diffopt.no_free = 0; > sort_in_topological_order(&list, revs->sort_order); > while (list && i) { > struct commit *commit = list->item; > @@ -326,8 +327,8 @@ static void log_show_early(struct rev_info *revs, struct commit_list *list) > case commit_ignore: > break; > case commit_error: > - if (close_file) > - fclose(revs->diffopt.file); > + revs->diffopt.no_free = no_free; > + diff_free(&revs->diffopt); > return; > } > list = list->next; > @@ -335,8 +336,8 @@ static void log_show_early(struct rev_info *revs, struct commit_list *list) > > /* Did we already get enough commits for the early output? */ > if (!i) { > - if (close_file) > - fclose(revs->diffopt.file); > + revs->diffopt.no_free = 0; > + diff_free(&revs->diffopt); > return; > } > > @@ -400,7 +401,7 @@ static int cmd_log_walk(struct rev_info *rev) > { > struct commit *commit; > int saved_nrl = 0; > - int saved_dcctc = 0, close_file = rev->diffopt.close_file; > + int saved_dcctc = 0; > > if (rev->early_output) > setup_early_output(); > @@ -416,7 +417,7 @@ static int cmd_log_walk(struct rev_info *rev) > * and HAS_CHANGES being accumulated in rev->diffopt, so be careful to > * retain that state information if replacing rev->diffopt in this loop > */ > - rev->diffopt.close_file = 0; > + rev->diffopt.no_free = 1; > while ((commit = get_revision(rev)) != NULL) { > if (!log_tree_commit(rev, commit) && rev->max_count >= 0) > /* > @@ -441,8 +442,8 @@ static int cmd_log_walk(struct rev_info *rev) > } > rev->diffopt.degraded_cc_to_c = saved_dcctc; > rev->diffopt.needed_rename_limit = saved_nrl; > - if (close_file) > - fclose(rev->diffopt.file); > + rev->diffopt.no_free = 0; > + diff_free(&rev->diffopt); > > if (rev->diffopt.output_format & DIFF_FORMAT_CHECKDIFF && > rev->diffopt.flags.check_failed) { > @@ -1952,18 +1953,18 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix) > if (rev.show_notes) > load_display_notes(&rev.notes_opt); > > - if (use_stdout + rev.diffopt.close_file + !!output_directory > 1) > + if (use_stdout + rev.diffopt.fclose_file + !!output_directory > 1) > die(_("--stdout, --output, and --output-directory are mutually exclusive")); > > if (use_stdout) { > setup_pager(); > - } else if (rev.diffopt.close_file) { > + } else if (rev.diffopt.fclose_file) { > /* > * The diff code parsed --output; it has already opened the > * file, but but we must instruct it not to close after each > * diff. > */ > - rev.diffopt.close_file = 0; > + rev.diffopt.no_free = 1; > } else { > int saved; > > diff --git a/diff.c b/diff.c > index 69e3bc00ed..3e6f8f0a71 100644 > --- a/diff.c > +++ b/diff.c > @@ -5187,7 +5187,7 @@ static enum parse_opt_result diff_opt_output(struct parse_opt_ctx_t *ctx, > BUG_ON_OPT_NEG(unset); > path = prefix_filename(ctx->prefix, arg); > options->file = xfopen(path, "w"); > - options->close_file = 1; > + options->fclose_file = 1; > if (options->use_color != GIT_COLOR_ALWAYS) > options->use_color = GIT_COLOR_NEVER; > free(path); > @@ -6399,10 +6399,10 @@ void diff_flush(struct diff_options *options) > * options->file to /dev/null should be safe, because we > * aren't supposed to produce any output anyway. > */ > - if (options->close_file) > + if (options->fclose_file) > fclose(options->file); And at this stage, we should set `options->file = NULL`. > options->file = xfopen("/dev/null", "w"); > - options->close_file = 1; > + options->fclose_file = 1; > options->color_moved = 0; > for (i = 0; i < q->nr; i++) { > struct diff_filepair *p = q->queue[i]; > @@ -6433,8 +6433,7 @@ void diff_flush(struct diff_options *options) > free_queue: > free(q->queue); > DIFF_QUEUE_CLEAR(q); > - if (options->close_file) > - fclose(options->file); > + diff_free(options); > > /* > * Report the content-level differences with HAS_CHANGES; > @@ -6449,6 +6448,15 @@ void diff_flush(struct diff_options *options) > } > } > > +void diff_free(struct diff_options *options) > +{ > + if (options->no_free) > + return; > + if (options->fclose_file) > + fclose(options->file); And at this stage, we should set `options->file = NULL`. > +} > + > + > static int match_filter(const struct diff_options *options, const struct diff_filepair *p) > { > return (((p->status == DIFF_STATUS_MODIFIED) && > diff --git a/diff.h b/diff.h > index 2ff2b1c7f2..d1d74c3a9e 100644 > --- a/diff.h > +++ b/diff.h > @@ -49,7 +49,16 @@ > * - Once you finish feeding the pairs of files, call `diffcore_std()`. > * This will tell the diffcore library to go ahead and do its work. > * > + * - The `diff_opt_parse()` etc. functions might allocate memory in > + * `struct diff_options`. When running the API `N > 1` set `.no_free > + * = 1` to make the `diff_free()` invoked by `diff_flush()` below a > + * noop. I have serious trouble parsing that last sentence. Would you mind rephrasing it? > + * > * - Calling `diff_flush()` will produce the output. > + * > + * - If you set `.no_free = 1` before set it to `0` and call > + * `diff_free()` again. If `.no_free = 1` was not set there's no > + * need to call `diff_free()`, `diff_flush()` will call it. Again, as I mentioned before, the indicator whether things need to be released should be whether the attribute is `NULL` or not. And once released, ot should be set to `NULL`. Other than that, it looks fine to me. And I definitely like the idea of introducing a function to release all of the stuff in `struct diffopt`. Thanks, Dscho > */ > > struct combine_diff_path; > @@ -328,7 +337,7 @@ struct diff_options { > void (*set_default)(struct diff_options *); > > FILE *file; > - int close_file; > + int fclose_file; > > #define OUTPUT_INDICATOR_NEW 0 > #define OUTPUT_INDICATOR_OLD 1 > @@ -365,6 +374,8 @@ struct diff_options { > > struct repository *repo; > struct option *parseopts; > + > + int no_free; > }; > > unsigned diff_filter_bit(char status); > @@ -559,6 +570,7 @@ void diffcore_fix_diff_index(void); > > int diff_queue_is_empty(void); > void diff_flush(struct diff_options*); > +void diff_free(struct diff_options*); > void diff_warn_rename_limit(const char *varname, int needed, int degraded_cc); > > /* diff-raw status letters */ > diff --git a/log-tree.c b/log-tree.c > index fd0dde97ec..bb946f15f1 100644 > --- a/log-tree.c > +++ b/log-tree.c > @@ -952,12 +952,14 @@ static int log_tree_diff(struct rev_info *opt, struct commit *commit, struct log > int log_tree_commit(struct rev_info *opt, struct commit *commit) > { > struct log_info log; > - int shown, close_file = opt->diffopt.close_file; > + int shown; > + /* maybe called by e.g. cmd_log_walk(), maybe stand-alone */ > + int no_free = opt->diffopt.no_free; > > log.commit = commit; > log.parent = NULL; > opt->loginfo = &log; > - opt->diffopt.close_file = 0; > + opt->diffopt.no_free = 1; > > if (opt->line_level_traverse) > return line_log_print(opt, commit); > @@ -974,7 +976,7 @@ int log_tree_commit(struct rev_info *opt, struct commit *commit) > fprintf(opt->diffopt.file, "\n%s\n", opt->break_bar); > opt->loginfo = NULL; > maybe_flush_or_die(opt->diffopt.file, "stdout"); > - if (close_file) > - fclose(opt->diffopt.file); > + opt->diffopt.no_free = no_free; > + diff_free(&opt->diffopt); > return shown; > } > diff --git a/wt-status.c b/wt-status.c > index 0c8287a023..41b08474e5 100644 > --- a/wt-status.c > +++ b/wt-status.c > @@ -1052,7 +1052,7 @@ static void wt_longstatus_print_verbose(struct wt_status *s) > rev.diffopt.rename_limit = s->rename_limit >= 0 ? s->rename_limit : rev.diffopt.rename_limit; > rev.diffopt.rename_score = s->rename_score >= 0 ? s->rename_score : rev.diffopt.rename_score; > rev.diffopt.file = s->fp; > - rev.diffopt.close_file = 0; > + rev.diffopt.fclose_file = 0; /* wt_status owns the s->fp */ > /* > * If we're not going to stdout, then we definitely don't > * want color, since we are going to the commit message > -- > 2.30.0.284.gd98b1dd5eaa7 > > ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH 1/2] diff: add an API for deferred freeing 2021-02-10 16:00 ` Johannes Schindelin @ 2021-02-11 3:00 ` Ævar Arnfjörð Bjarmason 2021-02-11 9:40 ` Johannes Schindelin 0 siblings, 1 reply; 57+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-02-11 3:00 UTC (permalink / raw) To: Johannes Schindelin Cc: git, Junio C Hamano, Michał Kępień, Phillip Wood On Wed, Feb 10 2021, Johannes Schindelin wrote: > Hi Ævar, > > On Fri, 5 Feb 2021, Ævar Arnfjörð Bjarmason wrote: > >> Add a diff_free() function to free anything we may have allocated in >> the "diff_options" struct, and the ability to make calling it a noop >> by setting "no_free" in "diff_options". > > Why do we need a `no_free` flag? Why not simply set the `free()`d (or > `fclose()`d) attributes to `NULL`? Because we're calling the diff API N times, so we need to not have those be NULL while we're using them in a loop, and we need to not free() them, and then flip "no_free = 0" at the end and free() them. >> diff --git a/builtin/add.c b/builtin/add.c >> index a825887c50..6319710186 100644 >> --- a/builtin/add.c >> +++ b/builtin/add.c >> @@ -282,7 +282,7 @@ static int edit_patch(int argc, const char **argv, const char *prefix) >> if (out < 0) >> die(_("Could not open '%s' for writing."), file); >> rev.diffopt.file = xfdopen(out, "w"); >> - rev.diffopt.close_file = 1; >> + rev.diffopt.fclose_file = 1; > > This rename makes the patch unnecessarily tedious to review, and I do not > even agree with leaking the implementation detail that we need to > `fclose()` the file. > > Let's just not? Fair enough, I figured it would be easier for reviewers to reason about to be guaranteed to see all the uses of the flag, but I'll drop it. >> @@ -6399,10 +6399,10 @@ void diff_flush(struct diff_options *options) >> * options->file to /dev/null should be safe, because we >> * aren't supposed to produce any output anyway. >> */ >> - if (options->close_file) >> + if (options->fclose_file) >> fclose(options->file); > > And at this stage, we should set `options->file = NULL`. Sure, why not, but unless we get rid of the need for "close_file" there's not much point other than ease of inspection in a debugger...[cont]. >> [...] >> +void diff_free(struct diff_options *options) >> +{ >> + if (options->no_free) >> + return; >> + if (options->fclose_file) >> + fclose(options->file); > > And at this stage, we should set `options->file = NULL`. ...ditto...[cont] >> +} >> + >> + >> static int match_filter(const struct diff_options *options, const struct diff_filepair *p) >> { >> return (((p->status == DIFF_STATUS_MODIFIED) && >> diff --git a/diff.h b/diff.h >> index 2ff2b1c7f2..d1d74c3a9e 100644 >> --- a/diff.h >> +++ b/diff.h >> @@ -49,7 +49,16 @@ >> * - Once you finish feeding the pairs of files, call `diffcore_std()`. >> * This will tell the diffcore library to go ahead and do its work. >> * >> + * - The `diff_opt_parse()` etc. functions might allocate memory in >> + * `struct diff_options`. When running the API `N > 1` set `.no_free >> + * = 1` to make the `diff_free()` invoked by `diff_flush()` below a >> + * noop. > > I have serious trouble parsing that last sentence. Would you mind > rephrasing it? Willdo. >> + * >> * - Calling `diff_flush()` will produce the output. >> + * >> + * - If you set `.no_free = 1` before set it to `0` and call >> + * `diff_free()` again. If `.no_free = 1` was not set there's no >> + * need to call `diff_free()`, `diff_flush()` will call it. > > Again, as I mentioned before, the indicator whether things need to be > released should be whether the attribute is `NULL` or not. And once > released, ot should be set to `NULL`. > > Other than that, it looks fine to me. And I definitely like the idea of > introducing a function to release all of the stuff in `struct diffopt`. [cont]...I don't think it's possible in anything like the current API to do that. We don't want to "if (file) fclose(file)", instead we need an opt-in "if (close_file) fclose(file)". That's because usually the "file" is stdout. So close_file=1 is only set when we're opening a "real" file, /dev/null or whatever. That along with the "no_free" flag as noted above means we need both a "no_free" and "close_file" flags. Unless there's some way of re-arranging this that I'm missing. ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH 1/2] diff: add an API for deferred freeing 2021-02-11 3:00 ` Ævar Arnfjörð Bjarmason @ 2021-02-11 9:40 ` Johannes Schindelin 2021-02-11 10:21 ` Jeff King 0 siblings, 1 reply; 57+ messages in thread From: Johannes Schindelin @ 2021-02-11 9:40 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: git, Junio C Hamano, Michał Kępień, Phillip Wood [-- Attachment #1: Type: text/plain, Size: 934 bytes --] Hi Ævar, On Thu, 11 Feb 2021, Ævar Arnfjörð Bjarmason wrote: > On Wed, Feb 10 2021, Johannes Schindelin wrote: > > > On Fri, 5 Feb 2021, Ævar Arnfjörð Bjarmason wrote: > > > >> Add a diff_free() function to free anything we may have allocated in > >> the "diff_options" struct, and the ability to make calling it a noop > >> by setting "no_free" in "diff_options". > > > > Why do we need a `no_free` flag? Why not simply set the `free()`d (or > > `fclose()`d) attributes to `NULL`? Hmm. That was not even clear to me until I read this reply. Doesn't this indicate that the closing is done at the wrong layer? If we want to call a function N times and only at the last iteration should it clean up our resources, doesn't that indicate that the clean-up should be pulled out from that function? I thought that's exactly what you did, but I must have glanced over something obvious... Ciao, Dscho ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH 1/2] diff: add an API for deferred freeing 2021-02-11 9:40 ` Johannes Schindelin @ 2021-02-11 10:21 ` Jeff King 2021-02-11 10:45 ` [PATCH v2 0/2] " Ævar Arnfjörð Bjarmason ` (2 more replies) 0 siblings, 3 replies; 57+ messages in thread From: Jeff King @ 2021-02-11 10:21 UTC (permalink / raw) To: Johannes Schindelin Cc: Ævar Arnfjörð Bjarmason, git, Junio C Hamano, Michał Kępień, Phillip Wood On Thu, Feb 11, 2021 at 10:40:45AM +0100, Johannes Schindelin wrote: > On Thu, 11 Feb 2021, Ævar Arnfjörð Bjarmason wrote: > > > On Wed, Feb 10 2021, Johannes Schindelin wrote: > > > > > On Fri, 5 Feb 2021, Ævar Arnfjörð Bjarmason wrote: > > > > > >> Add a diff_free() function to free anything we may have allocated in > > >> the "diff_options" struct, and the ability to make calling it a noop > > >> by setting "no_free" in "diff_options". > > > > > > Why do we need a `no_free` flag? Why not simply set the `free()`d (or > > > `fclose()`d) attributes to `NULL`? > > Hmm. That was not even clear to me until I read this reply. > > Doesn't this indicate that the closing is done at the wrong layer? If we > want to call a function N times and only at the last iteration should it > clean up our resources, doesn't that indicate that the clean-up should be > pulled out from that function? > > I thought that's exactly what you did, but I must have glanced over > something obvious... I think the issue then is that every caller must now be modified to do the clean up (which otherwise happens automatically when flushing the diff). So I definitely agree that the current API is weird and backwards. But flipping it now may introduce new leaks in existing callers. -Peff ^ permalink raw reply [flat|nested] 57+ messages in thread
* [PATCH v2 0/2] diff: add an API for deferred freeing 2021-02-11 10:21 ` Jeff King @ 2021-02-11 10:45 ` Ævar Arnfjörð Bjarmason 2021-02-11 10:45 ` [PATCH v2 1/2] " Ævar Arnfjörð Bjarmason 2021-02-11 10:45 ` [PATCH v2 2/2] diff: plug memory leak from regcomp() on {log,diff} -I Ævar Arnfjörð Bjarmason 2 siblings, 0 replies; 57+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-02-11 10:45 UTC (permalink / raw) To: git Cc: Junio C Hamano, Johannes Schindelin, Michał Kępień, Phillip Wood, Jeff King, Ævar Arnfjörð Bjarmason I skipped the rename of the "close_file" flag, and updated the code & commit messages in response to Johannes's feedback on v1. Hopefully this is all better now. Ævar Arnfjörð Bjarmason (2): diff: add an API for deferred freeing diff: plug memory leak from regcomp() on {log,diff} -I builtin/log.c | 23 ++++++++++++----------- diff.c | 32 ++++++++++++++++++++++++++++---- diff.h | 15 ++++++++++++++- log-tree.c | 10 ++++++---- 4 files changed, 60 insertions(+), 20 deletions(-) Range-diff: 1: 531fed77f4c ! 1: 045d3f72d15 diff: add an API for deferred freeing @@ Commit message by setting "no_free" in "diff_options". This is required because when e.g. "git diff" is run we'll allocate - things in that struct, use the diff machinery once, and then exit, but - if we run e.g. "git log -p" we're going to re-use what we allocated - across multiple diff_flush() calls, and only want to free things at - the end. + things in that struct, use the diff machinery once, and then exit. + + But if we run e.g. "git log -p" we're going to re-use what we + allocated across multiple diff_flush() calls, and only want to free + things at the end. We've thus ended up with features like the recently added "diff -I"[1] where we'll leak memory. As it turns out it could have simply used the @@ Commit message the diffopt.close_file attribute, 2016-06-22). Manually adding more such flags to things log_tree_commit() every time - we need to allocate something would be tedious. - - Let's instead move that fclose() code it to a new diff_free(), in - anticipation of freeing more things in that function in follow-up - commits. I'm renaming the "close_file" struct member to "fclose_file" - for the ease of validating this, we can be certain that these are all - the relevant callsites. + we need to allocate something would be tedious. Let's instead move + that fclose() code it to a new diff_free(), in anticipation of freeing + more things in that function in follow-up commits. Some functions such as log_tree_commit() need an idiom of optionally retaining a previous "no_free", as they may either free the memory themselves, or their caller may do so. I'm keeping that idiom in - log_show_early() even though I don't think it's currently called in - this manner, since it also gets passed an existing "struct rev_info".. + log_show_early() for good measure, even though I don't think it's + currently called in this manner. It also gets passed an existing + "struct rev_info", so future callers may want to set the "no_free" + flag. + + This change is a bit hard to read because while the freeing pattern + we're introducing isn't unusual, the "file" member is a special + snowflake. We usually don't want to fclose() it. This is because + "file" is usually stdout, in which case we don't want to fclose() + it. We only want to opt-in to closing it when we e.g. open a file on + the filesystem. Thus the opt-in "close_file" flag. + + So the API in general just needs a "no_free" flag to defer freeing, + but the "file" member still needs its "close_file" flag. This is made + more confusing because while refactoring this code we could replace + some "close_file=0" with "no_free=1", whereas others need to set both + flags. + + This is because there were some cases where an existing "close_file=0" + meant "let's defer deallocation", and others where it meant "we don't + want to close this file handle at all". 1. 296d4a94e7 (diff: add -I<regex> that ignores matching changes, 2020-10-20) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> - ## builtin/add.c ## -@@ builtin/add.c: static int edit_patch(int argc, const char **argv, const char *prefix) - if (out < 0) - die(_("Could not open '%s' for writing."), file); - rev.diffopt.file = xfdopen(out, "w"); -- rev.diffopt.close_file = 1; -+ rev.diffopt.fclose_file = 1; - if (run_diff_files(&rev, 0)) - die(_("Could not write patch")); - - - ## builtin/am.c ## -@@ builtin/am.c: static void write_commit_patch(const struct am_state *state, struct commit *comm - rev_info.diffopt.flags.full_index = 1; - rev_info.diffopt.use_color = 0; - rev_info.diffopt.file = fp; -- rev_info.diffopt.close_file = 1; -+ rev_info.diffopt.fclose_file = 1; /* log_tree_commit() sets .no_free=1 */ - add_pending_object(&rev_info, &commit->object, ""); - diff_setup_done(&rev_info.diffopt); - log_tree_commit(&rev_info, commit); -@@ builtin/am.c: static void write_index_patch(const struct am_state *state) - rev_info.diffopt.output_format = DIFF_FORMAT_PATCH; - rev_info.diffopt.use_color = 0; - rev_info.diffopt.file = fp; -- rev_info.diffopt.close_file = 1; -+ rev_info.diffopt.fclose_file = 1; -+ rev_info.diffopt.no_free = 1; - add_pending_object(&rev_info, &tree->object, ""); - diff_setup_done(&rev_info.diffopt); - run_diff_index(&rev_info, 1); -+ diff_free(&rev_info.diffopt); - } - - /** - ## builtin/log.c ## @@ builtin/log.c: static struct itimerval early_output_timer; @@ builtin/log.c: static int cmd_log_walk(struct rev_info *rev) if (rev->diffopt.output_format & DIFF_FORMAT_CHECKDIFF && rev->diffopt.flags.check_failed) { @@ builtin/log.c: int cmd_format_patch(int argc, const char **argv, const char *prefix) - if (rev.show_notes) - load_display_notes(&rev.notes_opt); - -- if (use_stdout + rev.diffopt.close_file + !!output_directory > 1) -+ if (use_stdout + rev.diffopt.fclose_file + !!output_directory > 1) - die(_("--stdout, --output, and --output-directory are mutually exclusive")); - - if (use_stdout) { - setup_pager(); -- } else if (rev.diffopt.close_file) { -+ } else if (rev.diffopt.fclose_file) { - /* - * The diff code parsed --output; it has already opened the * file, but but we must instruct it not to close after each * diff. */ @@ builtin/log.c: int cmd_format_patch(int argc, const char **argv, const char *pre ## diff.c ## -@@ diff.c: static enum parse_opt_result diff_opt_output(struct parse_opt_ctx_t *ctx, - BUG_ON_OPT_NEG(unset); - path = prefix_filename(ctx->prefix, arg); - options->file = xfopen(path, "w"); -- options->close_file = 1; -+ options->fclose_file = 1; - if (options->use_color != GIT_COLOR_ALWAYS) - options->use_color = GIT_COLOR_NEVER; - free(path); +@@ diff.c: static void diff_flush_patch_all_file_pairs(struct diff_options *o) + } + } + ++static void diff_free_file(struct diff_options *options) ++{ ++ if (options->close_file) ++ fclose(options->file); ++} ++ ++void diff_free(struct diff_options *options) ++{ ++ if (options->no_free) ++ return; ++ ++ diff_free_file(options); ++} ++ + void diff_flush(struct diff_options *options) + { + struct diff_queue_struct *q = &diff_queued_diff; @@ diff.c: void diff_flush(struct diff_options *options) * options->file to /dev/null should be safe, because we * aren't supposed to produce any output anyway. */ - if (options->close_file) -+ if (options->fclose_file) - fclose(options->file); +- fclose(options->file); ++ diff_free_file(options); options->file = xfopen("/dev/null", "w"); -- options->close_file = 1; -+ options->fclose_file = 1; + options->close_file = 1; options->color_moved = 0; - for (i = 0; i < q->nr; i++) { - struct diff_filepair *p = q->queue[i]; @@ diff.c: void diff_flush(struct diff_options *options) free_queue: free(q->queue); @@ diff.c: void diff_flush(struct diff_options *options) /* * Report the content-level differences with HAS_CHANGES; -@@ diff.c: void diff_flush(struct diff_options *options) - } - } - -+void diff_free(struct diff_options *options) -+{ -+ if (options->no_free) -+ return; -+ if (options->fclose_file) -+ fclose(options->file); -+} -+ -+ - static int match_filter(const struct diff_options *options, const struct diff_filepair *p) - { - return (((p->status == DIFF_STATUS_MODIFIED) && ## diff.h ## @@ * - Once you finish feeding the pairs of files, call `diffcore_std()`. * This will tell the diffcore library to go ahead and do its work. * -+ * - The `diff_opt_parse()` etc. functions might allocate memory in -+ * `struct diff_options`. When running the API `N > 1` set `.no_free -+ * = 1` to make the `diff_free()` invoked by `diff_flush()` below a -+ * noop. -+ * - * - Calling `diff_flush()` will produce the output. +- * - Calling `diff_flush()` will produce the output. ++ * - Calling `diff_flush()` will produce the output, it will call ++ * `diff_free()` to free any resources, e.g. those allocated in ++ * `diff_opt_parse()`. + * -+ * - If you set `.no_free = 1` before set it to `0` and call -+ * `diff_free()` again. If `.no_free = 1` was not set there's no -+ * need to call `diff_free()`, `diff_flush()` will call it. ++ * - Set `.no_free = 1` before calling `diff_flush()` to defer the ++ * freeing of allocated memory in diff_options. This is useful when ++ * `diff_flush()` is being called in a loop, rather than as a ++ * one-off. When setting `.no_free = 1` you must ensure that ++ * `diff_free()` is called at the end, either by flipping the flag ++ * before the last `diff_flush()` call, or by flipping it before ++ * calling `diff_free()` yourself. */ struct combine_diff_path; @@ diff.h: struct diff_options { - void (*set_default)(struct diff_options *); - - FILE *file; -- int close_file; -+ int fclose_file; - - #define OUTPUT_INDICATOR_NEW 0 - #define OUTPUT_INDICATOR_OLD 1 -@@ diff.h: struct diff_options { struct repository *repo; struct option *parseopts; @@ log-tree.c: int log_tree_commit(struct rev_info *opt, struct commit *commit) + diff_free(&opt->diffopt); return shown; } - - ## wt-status.c ## -@@ wt-status.c: static void wt_longstatus_print_verbose(struct wt_status *s) - rev.diffopt.rename_limit = s->rename_limit >= 0 ? s->rename_limit : rev.diffopt.rename_limit; - rev.diffopt.rename_score = s->rename_score >= 0 ? s->rename_score : rev.diffopt.rename_score; - rev.diffopt.file = s->fp; -- rev.diffopt.close_file = 0; -+ rev.diffopt.fclose_file = 0; /* wt_status owns the s->fp */ - /* - * If we're not going to stdout, then we definitely don't - * want color, since we are going to the commit message 2: 7192cf01e71 ! 2: f571524e6d8 diff: plug memory leak from regcomp() on {log,diff} -I @@ Commit message At that time freeing the memory was somewhat tedious, but since it isn't anymore with the newly introduced diff_free() let's use it. + Let's retain the pattern for diff_free_file() and add a + diff_free_ignore_regex(), even though (unlike "diff_free_file") we + don't need to call it elsewhere. I think this'll make for more + readable code than gradually accumulating a giant diff_free() + function, sharing "int i" across unrelated code etc. + Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> ## diff.c ## -@@ diff.c: void diff_flush(struct diff_options *options) +@@ diff.c: static void diff_free_file(struct diff_options *options) + fclose(options->file); + } - void diff_free(struct diff_options *options) - { ++static void diff_free_ignore_regex(struct diff_options *options) ++{ + int i; - if (options->no_free) - return; - if (options->fclose_file) - fclose(options->file); + + for (i = 0; i < options->ignore_regex_nr; i++) { + regfree(options->ignore_regex[i]); + free(options->ignore_regex[i]); + } + free(options->ignore_regex); ++} ++ + void diff_free(struct diff_options *options) + { + if (options->no_free) + return; + + diff_free_file(options); ++ diff_free_ignore_regex(options); } - + void diff_flush(struct diff_options *options) -- 2.30.0.284.gd98b1dd5eaa7 ^ permalink raw reply [flat|nested] 57+ messages in thread
* [PATCH v2 1/2] diff: add an API for deferred freeing 2021-02-11 10:21 ` Jeff King 2021-02-11 10:45 ` [PATCH v2 0/2] " Ævar Arnfjörð Bjarmason @ 2021-02-11 10:45 ` Ævar Arnfjörð Bjarmason 2021-02-11 10:45 ` [PATCH v2 2/2] diff: plug memory leak from regcomp() on {log,diff} -I Ævar Arnfjörð Bjarmason 2 siblings, 0 replies; 57+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-02-11 10:45 UTC (permalink / raw) To: git Cc: Junio C Hamano, Johannes Schindelin, Michał Kępień, Phillip Wood, Jeff King, Ævar Arnfjörð Bjarmason Add a diff_free() function to free anything we may have allocated in the "diff_options" struct, and the ability to make calling it a noop by setting "no_free" in "diff_options". This is required because when e.g. "git diff" is run we'll allocate things in that struct, use the diff machinery once, and then exit. But if we run e.g. "git log -p" we're going to re-use what we allocated across multiple diff_flush() calls, and only want to free things at the end. We've thus ended up with features like the recently added "diff -I"[1] where we'll leak memory. As it turns out it could have simply used the pattern established in 6ea57703f6 (log: prepare log/log-tree to reuse the diffopt.close_file attribute, 2016-06-22). Manually adding more such flags to things log_tree_commit() every time we need to allocate something would be tedious. Let's instead move that fclose() code it to a new diff_free(), in anticipation of freeing more things in that function in follow-up commits. Some functions such as log_tree_commit() need an idiom of optionally retaining a previous "no_free", as they may either free the memory themselves, or their caller may do so. I'm keeping that idiom in log_show_early() for good measure, even though I don't think it's currently called in this manner. It also gets passed an existing "struct rev_info", so future callers may want to set the "no_free" flag. This change is a bit hard to read because while the freeing pattern we're introducing isn't unusual, the "file" member is a special snowflake. We usually don't want to fclose() it. This is because "file" is usually stdout, in which case we don't want to fclose() it. We only want to opt-in to closing it when we e.g. open a file on the filesystem. Thus the opt-in "close_file" flag. So the API in general just needs a "no_free" flag to defer freeing, but the "file" member still needs its "close_file" flag. This is made more confusing because while refactoring this code we could replace some "close_file=0" with "no_free=1", whereas others need to set both flags. This is because there were some cases where an existing "close_file=0" meant "let's defer deallocation", and others where it meant "we don't want to close this file handle at all". 1. 296d4a94e7 (diff: add -I<regex> that ignores matching changes, 2020-10-20) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- builtin/log.c | 23 ++++++++++++----------- diff.c | 20 ++++++++++++++++---- diff.h | 15 ++++++++++++++- log-tree.c | 10 ++++++---- 4 files changed, 48 insertions(+), 20 deletions(-) diff --git a/builtin/log.c b/builtin/log.c index d0cbaaf68a0..fffaf51970d 100644 --- a/builtin/log.c +++ b/builtin/log.c @@ -307,10 +307,11 @@ static struct itimerval early_output_timer; static void log_show_early(struct rev_info *revs, struct commit_list *list) { - int i = revs->early_output, close_file = revs->diffopt.close_file; + int i = revs->early_output; int show_header = 1; + int no_free = revs->diffopt.no_free; - revs->diffopt.close_file = 0; + revs->diffopt.no_free = 0; sort_in_topological_order(&list, revs->sort_order); while (list && i) { struct commit *commit = list->item; @@ -327,8 +328,8 @@ static void log_show_early(struct rev_info *revs, struct commit_list *list) case commit_ignore: break; case commit_error: - if (close_file) - fclose(revs->diffopt.file); + revs->diffopt.no_free = no_free; + diff_free(&revs->diffopt); return; } list = list->next; @@ -336,8 +337,8 @@ static void log_show_early(struct rev_info *revs, struct commit_list *list) /* Did we already get enough commits for the early output? */ if (!i) { - if (close_file) - fclose(revs->diffopt.file); + revs->diffopt.no_free = 0; + diff_free(&revs->diffopt); return; } @@ -401,7 +402,7 @@ static int cmd_log_walk(struct rev_info *rev) { struct commit *commit; int saved_nrl = 0; - int saved_dcctc = 0, close_file = rev->diffopt.close_file; + int saved_dcctc = 0; if (rev->early_output) setup_early_output(); @@ -417,7 +418,7 @@ static int cmd_log_walk(struct rev_info *rev) * and HAS_CHANGES being accumulated in rev->diffopt, so be careful to * retain that state information if replacing rev->diffopt in this loop */ - rev->diffopt.close_file = 0; + rev->diffopt.no_free = 1; while ((commit = get_revision(rev)) != NULL) { if (!log_tree_commit(rev, commit) && rev->max_count >= 0) /* @@ -442,8 +443,8 @@ static int cmd_log_walk(struct rev_info *rev) } rev->diffopt.degraded_cc_to_c = saved_dcctc; rev->diffopt.needed_rename_limit = saved_nrl; - if (close_file) - fclose(rev->diffopt.file); + rev->diffopt.no_free = 0; + diff_free(&rev->diffopt); if (rev->diffopt.output_format & DIFF_FORMAT_CHECKDIFF && rev->diffopt.flags.check_failed) { @@ -1955,7 +1956,7 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix) * file, but but we must instruct it not to close after each * diff. */ - rev.diffopt.close_file = 0; + rev.diffopt.no_free = 1; } else { int saved; diff --git a/diff.c b/diff.c index 69e3bc00ed8..a63c9ecae79 100644 --- a/diff.c +++ b/diff.c @@ -6336,6 +6336,20 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o) } } +static void diff_free_file(struct diff_options *options) +{ + if (options->close_file) + fclose(options->file); +} + +void diff_free(struct diff_options *options) +{ + if (options->no_free) + return; + + diff_free_file(options); +} + void diff_flush(struct diff_options *options) { struct diff_queue_struct *q = &diff_queued_diff; @@ -6399,8 +6413,7 @@ void diff_flush(struct diff_options *options) * options->file to /dev/null should be safe, because we * aren't supposed to produce any output anyway. */ - if (options->close_file) - fclose(options->file); + diff_free_file(options); options->file = xfopen("/dev/null", "w"); options->close_file = 1; options->color_moved = 0; @@ -6433,8 +6446,7 @@ void diff_flush(struct diff_options *options) free_queue: free(q->queue); DIFF_QUEUE_CLEAR(q); - if (options->close_file) - fclose(options->file); + diff_free(options); /* * Report the content-level differences with HAS_CHANGES; diff --git a/diff.h b/diff.h index 2ff2b1c7f2c..527fb56d851 100644 --- a/diff.h +++ b/diff.h @@ -49,7 +49,17 @@ * - Once you finish feeding the pairs of files, call `diffcore_std()`. * This will tell the diffcore library to go ahead and do its work. * - * - Calling `diff_flush()` will produce the output. + * - Calling `diff_flush()` will produce the output, it will call + * `diff_free()` to free any resources, e.g. those allocated in + * `diff_opt_parse()`. + * + * - Set `.no_free = 1` before calling `diff_flush()` to defer the + * freeing of allocated memory in diff_options. This is useful when + * `diff_flush()` is being called in a loop, rather than as a + * one-off. When setting `.no_free = 1` you must ensure that + * `diff_free()` is called at the end, either by flipping the flag + * before the last `diff_flush()` call, or by flipping it before + * calling `diff_free()` yourself. */ struct combine_diff_path; @@ -365,6 +375,8 @@ struct diff_options { struct repository *repo; struct option *parseopts; + + int no_free; }; unsigned diff_filter_bit(char status); @@ -559,6 +571,7 @@ void diffcore_fix_diff_index(void); int diff_queue_is_empty(void); void diff_flush(struct diff_options*); +void diff_free(struct diff_options*); void diff_warn_rename_limit(const char *varname, int needed, int degraded_cc); /* diff-raw status letters */ diff --git a/log-tree.c b/log-tree.c index e0484676507..e7fcd70ba17 100644 --- a/log-tree.c +++ b/log-tree.c @@ -958,12 +958,14 @@ static int log_tree_diff(struct rev_info *opt, struct commit *commit, struct log int log_tree_commit(struct rev_info *opt, struct commit *commit) { struct log_info log; - int shown, close_file = opt->diffopt.close_file; + int shown; + /* maybe called by e.g. cmd_log_walk(), maybe stand-alone */ + int no_free = opt->diffopt.no_free; log.commit = commit; log.parent = NULL; opt->loginfo = &log; - opt->diffopt.close_file = 0; + opt->diffopt.no_free = 1; if (opt->line_level_traverse) return line_log_print(opt, commit); @@ -980,7 +982,7 @@ int log_tree_commit(struct rev_info *opt, struct commit *commit) fprintf(opt->diffopt.file, "\n%s\n", opt->break_bar); opt->loginfo = NULL; maybe_flush_or_die(opt->diffopt.file, "stdout"); - if (close_file) - fclose(opt->diffopt.file); + opt->diffopt.no_free = no_free; + diff_free(&opt->diffopt); return shown; } -- 2.30.0.284.gd98b1dd5eaa7 ^ permalink raw reply related [flat|nested] 57+ messages in thread
* [PATCH v2 2/2] diff: plug memory leak from regcomp() on {log,diff} -I 2021-02-11 10:21 ` Jeff King 2021-02-11 10:45 ` [PATCH v2 0/2] " Ævar Arnfjörð Bjarmason 2021-02-11 10:45 ` [PATCH v2 1/2] " Ævar Arnfjörð Bjarmason @ 2021-02-11 10:45 ` Ævar Arnfjörð Bjarmason 2 siblings, 0 replies; 57+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-02-11 10:45 UTC (permalink / raw) To: git Cc: Junio C Hamano, Johannes Schindelin, Michał Kępień, Phillip Wood, Jeff King, Ævar Arnfjörð Bjarmason Fix a memory leak in 296d4a94e7 (diff: add -I<regex> that ignores matching changes, 2020-10-20) by freeing the memory it allocates in the newly introduced diff_free(). See the previous commit for details on that. This memory leak was intentionally introduced in 296d4a94e7, see the discussion on a previous iteration of it in https://lore.kernel.org/git/xmqqeelycajx.fsf@gitster.c.googlers.com/ At that time freeing the memory was somewhat tedious, but since it isn't anymore with the newly introduced diff_free() let's use it. Let's retain the pattern for diff_free_file() and add a diff_free_ignore_regex(), even though (unlike "diff_free_file") we don't need to call it elsewhere. I think this'll make for more readable code than gradually accumulating a giant diff_free() function, sharing "int i" across unrelated code etc. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- diff.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/diff.c b/diff.c index a63c9ecae79..bf2cbf15e77 100644 --- a/diff.c +++ b/diff.c @@ -6342,12 +6342,24 @@ static void diff_free_file(struct diff_options *options) fclose(options->file); } +static void diff_free_ignore_regex(struct diff_options *options) +{ + int i; + + for (i = 0; i < options->ignore_regex_nr; i++) { + regfree(options->ignore_regex[i]); + free(options->ignore_regex[i]); + } + free(options->ignore_regex); +} + void diff_free(struct diff_options *options) { if (options->no_free) return; diff_free_file(options); + diff_free_ignore_regex(options); } void diff_flush(struct diff_options *options) -- 2.30.0.284.gd98b1dd5eaa7 ^ permalink raw reply related [flat|nested] 57+ messages in thread
* [PATCH 2/2] diff: plug memory leak from regcomp() on {log,diff} -I 2020-10-20 6:48 ` [PATCH v4 " Michał Kępień ` (2 preceding siblings ...) 2021-02-05 14:13 ` [PATCH 1/2] diff: add an API for deferred freeing Ævar Arnfjörð Bjarmason @ 2021-02-05 14:13 ` Ævar Arnfjörð Bjarmason 3 siblings, 0 replies; 57+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-02-05 14:13 UTC (permalink / raw) To: git Cc: Junio C Hamano, Michał Kępień, Johannes Schindelin, Phillip Wood, Ævar Arnfjörð Bjarmason Fix a memory leak in 296d4a94e7 (diff: add -I<regex> that ignores matching changes, 2020-10-20) by freeing the memory it allocates in the newly introduced diff_free(). See the previous commit for details on that. This memory leak was intentionally introduced in 296d4a94e7, see the discussion on a previous iteration of it in https://lore.kernel.org/git/xmqqeelycajx.fsf@gitster.c.googlers.com/ At that time freeing the memory was somewhat tedious, but since it isn't anymore with the newly introduced diff_free() let's use it. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- diff.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/diff.c b/diff.c index 3e6f8f0a71..24257759bb 100644 --- a/diff.c +++ b/diff.c @@ -6450,10 +6450,17 @@ void diff_flush(struct diff_options *options) void diff_free(struct diff_options *options) { + int i; if (options->no_free) return; if (options->fclose_file) fclose(options->file); + + for (i = 0; i < options->ignore_regex_nr; i++) { + regfree(options->ignore_regex[i]); + free(options->ignore_regex[i]); + } + free(options->ignore_regex); } -- 2.30.0.284.gd98b1dd5eaa7 ^ permalink raw reply related [flat|nested] 57+ messages in thread
end of thread, other threads:[~2021-02-11 10:50 UTC | newest] Thread overview: 57+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-10-01 12:06 [PATCH 0/2] diff: add -I<regex> that ignores matching changes Michał Kępień 2020-10-01 12:06 ` [PATCH 1/2] " Michał Kępień 2020-10-01 18:21 ` Junio C Hamano 2020-10-07 19:48 ` Michał Kępień 2020-10-07 20:08 ` Junio C Hamano 2020-10-01 12:06 ` [PATCH 2/2] t: add -I<regex> tests Michał Kępień 2020-10-01 17:02 ` [PATCH 0/2] diff: add -I<regex> that ignores matching changes Junio C Hamano 2020-10-12 9:17 ` [PATCH v2 0/3] " Michał Kępień 2020-10-12 9:17 ` [PATCH v2 1/3] merge-base, xdiff: zero out xpparam_t structures Michał Kępień 2020-10-12 11:14 ` Johannes Schindelin 2020-10-12 17:09 ` Junio C Hamano 2020-10-12 19:52 ` Junio C Hamano 2020-10-13 6:35 ` Michał Kępień 2020-10-12 9:17 ` [PATCH v2 2/3] diff: add -I<regex> that ignores matching changes Michał Kępień 2020-10-12 11:20 ` Johannes Schindelin 2020-10-12 20:00 ` Junio C Hamano 2020-10-12 20:39 ` Johannes Schindelin 2020-10-12 21:43 ` Junio C Hamano 2020-10-13 6:37 ` Michał Kępień 2020-10-13 15:49 ` Junio C Hamano 2020-10-13 6:36 ` Michał Kępień 2020-10-13 12:02 ` Johannes Schindelin 2020-10-13 15:53 ` Junio C Hamano 2020-10-13 18:45 ` Michał Kępień 2020-10-12 18:01 ` Junio C Hamano 2020-10-13 6:38 ` Michał Kępień 2020-10-12 20:04 ` Junio C Hamano 2020-10-13 6:38 ` Michał Kępień 2020-10-12 9:17 ` [PATCH v2 3/3] t: add -I<regex> tests Michał Kępień 2020-10-12 11:49 ` Johannes Schindelin 2020-10-13 6:38 ` Michał Kępień 2020-10-13 12:00 ` Johannes Schindelin 2020-10-13 16:00 ` Junio C Hamano 2020-10-13 19:01 ` Michał Kępień 2020-10-15 11:45 ` Johannes Schindelin 2020-10-15 7:24 ` [PATCH v3 0/2] diff: add -I<regex> that ignores matching changes Michał Kępień 2020-10-15 7:24 ` [PATCH v3 1/2] merge-base, xdiff: zero out xpparam_t structures Michał Kępień 2020-10-15 7:24 ` [PATCH v3 2/2] diff: add -I<regex> that ignores matching changes Michał Kępień 2020-10-16 15:32 ` Phillip Wood 2020-10-16 18:04 ` Junio C Hamano 2020-10-19 9:48 ` Michał Kępień 2020-10-16 18:16 ` Junio C Hamano 2020-10-19 9:55 ` Michał Kępień 2020-10-19 17:29 ` Junio C Hamano 2020-10-16 10:00 ` [PATCH v3 0/2] " Johannes Schindelin 2020-10-20 6:48 ` [PATCH v4 " Michał Kępień 2020-10-20 6:48 ` [PATCH v4 1/2] merge-base, xdiff: zero out xpparam_t structures Michał Kępień 2020-10-20 6:48 ` [PATCH v4 2/2] diff: add -I<regex> that ignores matching changes Michał Kępień 2021-02-05 14:13 ` [PATCH 1/2] diff: add an API for deferred freeing Ævar Arnfjörð Bjarmason 2021-02-10 16:00 ` Johannes Schindelin 2021-02-11 3:00 ` Ævar Arnfjörð Bjarmason 2021-02-11 9:40 ` Johannes Schindelin 2021-02-11 10:21 ` Jeff King 2021-02-11 10:45 ` [PATCH v2 0/2] " Ævar Arnfjörð Bjarmason 2021-02-11 10:45 ` [PATCH v2 1/2] " Ævar Arnfjörð Bjarmason 2021-02-11 10:45 ` [PATCH v2 2/2] diff: plug memory leak from regcomp() on {log,diff} -I Ævar Arnfjörð Bjarmason 2021-02-05 14:13 ` [PATCH " Ævar Arnfjörð Bjarmason
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).