All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Michał Kępień" <michal@isc.org>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 1/2] diff: add -I<regex> that ignores matching changes
Date: Wed, 7 Oct 2020 21:48:21 +0200	[thread overview]
Message-ID: <20201007194821.GA20549@larwa.hq.kempniu.pl> (raw)
In-Reply-To: <xmqqy2kpbye9.fsf@gitster.c.googlers.com>

> > Apart from adding a new field to struct diff_options, also define a new
> > flag, XDF_IGNORE_REGEX, for the 'xdl_opts' field of that structure.
> > This is done because the xpparam_t structure (which is used for passing
> > around the regular expression supplied to -I) is not zeroed out in all
> > call stacks involving xdl_diff() and thus only performing a NULL check
> > on xpp->ignore_regex could result in xdl_mark_ignorable_regex() treating
> > garbage on the stack as a regular expression.  As the 'flags' field of
> > xpparam_t is initialized in all call stacks involving xdl_diff(), adding
> > a flag check avoids that problem.
> 
> You mentioned in your cover letter about this, and I tend to agree
> that this feels like a hack, at least from the first glance.  The
> next feature that wants to have an optional pointer in xpparam_t and
> have the code behave differently with the data pointed by it would
> need to waste another bit the same way.

Agreed.

> Do you already know (read:
> I am not asking you to dig to find out immediately---just asking if
> you already know the answer) if there is an inherent reason why they
> cannot be memset(0) before use?  It seems like a much better
> approach to ensure that we clear the structure.

I do not know of any reason for xpparam_t structures to not be
zero-initialized.  In fact, they _are_ zero-initialized most of the
time; AFAICT, there are just three places in the tree in which they are
not.

Would you like me to address that in a separate patch in this patch
series?

> Doesn't existing
> anchors array share the same issue (at least anchors_nr must be
> cleared if there is no interesting anchors) already?  IOW, should
> "git grep anchors_nr" be a valid way to identify _all_ places where
> you need to clear the ignore_regex field?

Yes, the 'anchors' and 'anchors_nr' fields of xpparam_t are also
affected by the same problem, but the symptoms are more benign in their
case because these fields are only used in xdiff/xpatience.c, i.e. when
XDF_PATIENCE_DIFF is set in 'flags' - and as I already mentioned in
commit messages, that field is always initialized properly, so there is
currently no practical possibility of stack garbage being interpreted as
e.g. a pointer to the anchor array.

> > +-I<regex>::
> > +	Ignore changes whose all lines match <regex>.
> > +
> 
> The implementation seems to allow only one regex, but I suspect we'd
> want to mimic
> 
>     $ printf "%s\n" a a a >test_a
>     $ printf "%s\n" b b b >test_b
>     $ diff -Ia     test_a test_b
>     $ diff     -Ib test_a test_b
>     $ diff -Ia -Ib test_a test_b

Ah, sure.  After skimming through various man pages available online, I
was not sure whether all standalone versions of diff which support -I
allow that switch to be used multiple times and I thought it would be
better to start with the simplest thing possible.  If you would rather
have the above implemented immediately, I can sure try that in v2.

> and until that happens, the explanation needs to say something about
> earlier <regex> being discarded when this option is given more than
> once.

Sorry, where do I look for that explanation?  I tried to make -I behave
similarly to -G and -S, each of which can be specified more than once,
but the last value provided overrides the earlier ones - and I could not
find any mention of that behavior in the docs.  Please help?

> > @@ -5203,6 +5205,17 @@ static int diff_opt_patience(const struct option *opt,
> >  	return 0;
> >  }
> >  
> > +static int diff_opt_ignore_regex(const struct option *opt,
> > +				 const char *arg, int unset)
> > +{
> > +	struct diff_options *options = opt->value;
> > +
> > +	BUG_ON_OPT_NEG(unset);
> > +	options->xdl_opts |= XDF_IGNORE_REGEX;
> > +	options->ignore_regex = arg;
> 
> When given twice or more, the earlier value gets lost (it does not
> leak, luckily, though).

Right, as I mentioned above, I just wanted to start with something
simple that resembles -G/-S.  I will revise this part in v2 if you would
like to see support for multiple regular expressions in this patch
series.

> > +	return 0;
> > +}
> 
> If we somehow can lose the use of XDF_IGNORE_REGEX bit, we do not
> have to have this as a callback.  Instead, it can be OPT_STRING with
> the current semantics of "only the last one is used", or we can use
> OPT_STRING_LIST to keep all the expressions.

I see, thanks for the hints!

> On the other hand, I wonder if it would be a valid approach to make
> the new field(s) in diff_options a "regex_t *ignore_regex" (and add
> an "int ignore_regex_nr" next to it, if we shoot for multiple -I
> options), instead of "const char *".  For that, we would need a
> callback even without XDF_IGNORE_REGEX bit having to futz with
> xdl_opts field.
> 
> Doing so would give us a chance to compile and notice a malformed
> regular expression in diff.c, before it hits xdiff/ layer, which may
> or may not be a good thing.

I have not thought about this approach before, but it sounds elegant to
me, at least at first glance - option parsing code sounds like the right
place for sanitizing user-provided parameters.  Doubly so if we take the
multiple -I options approach.  I will try this in v2.

> > @@ -1019,6 +1019,39 @@ static void xdl_mark_ignorable(xdchange_t *xscr, xdfenv_t *xe, long flags)
> >  	}
> >  }
> 
> I agree with what you said in the cover letter that it would be a
> good idea to name the existing xdl_mark_ignorable() and the new one
> in such a way that they look similar and parallel, by renaming the
> former to xdl_mark_ignorable_lines or something.

Then I will do that in v2.  Separate patch?

> > +static void
> > +xdl_mark_ignorable_regex(xdchange_t *xscr, const xdfenv_t *xe,
> > +			 const char *ignore_regex)
> 
> I know some coding standard do chomp line immediately before the
> function name (I grew up with one), but that is not what this
> project uses, and judging from the surrounding code, it is not what
> the upstream xdiff source we borrowed uses, either.

Oh, right, sorry - force of habit.  I will fix this in v2.

> > +{
> > +	xdchange_t *xch;
> > +	regex_t regex;
> > +
> > +	if (regcomp(&regex, ignore_regex, REG_EXTENDED | REG_NEWLINE))
> > +		die("invalid regex: %s", ignore_regex);
> 
> If we compile in diff.c layer and pass regex_t down here, we won't
> have to fail here this deep in the callchain.

Agreed - I already responded to this remark above.

> > +	for (xch = xscr; xch; xch = xch->next) {
> > +		regmatch_t regmatch;
> > +		xrecord_t **rec;
> > +		int ignore = 1;
> > +		long i;
> > +
> > +		rec = &xe->xdf1.recs[xch->i1];
> > +		for (i = 0; i < xch->chg1 && ignore; i++)
> > +			ignore = !regexec_buf(&regex, rec[i]->ptr, rec[i]->size,
> > +					      1, &regmatch, 0);
> > +		rec = &xe->xdf2.recs[xch->i2];
> > +		for (i = 0; i < xch->chg2 && ignore; i++)
> > +			ignore = !regexec_buf(&regex, rec[i]->ptr, rec[i]->size,
> > +					      1, &regmatch, 0);
> > +
> > +		/*
> > +		 * Do not override --ignore-blank-lines.
> > +		 */
> > +		xch->ignore |= ignore;
> 
> Well, you could optimize out the whole regexp matching by adding
> 
> 		if (xch->ignore)
> 			continue;
> 
> before the two loops try to find a non-matching line, no?

Ah, of course, it looks so obvious now that you pointed it out :-)  I
guess I was copy-past^W^W^Wtrying to make xdl_mark_ignorable_regex()
look similar to xdl_mark_ignorable().  I will use your suggestion in v2.

> > +	}
> > +}
> > +
> >  int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
> >  	     xdemitconf_t const *xecfg, xdemitcb_t *ecb) {
> >  	xdchange_t *xscr;
> > @@ -1040,6 +1073,9 @@ int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
> >  		if (xpp->flags & XDF_IGNORE_BLANK_LINES)
> >  			xdl_mark_ignorable(xscr, &xe, xpp->flags);
> >  
> > +		if ((xpp->flags & XDF_IGNORE_REGEX) && xpp->ignore_regex)
> > +			xdl_mark_ignorable_regex(xscr, &xe, xpp->ignore_regex);
> > +
> >  		if (ef(&xe, xscr, ecb, xecfg) < 0) {
> >  
> >  			xdl_free_script(xscr);
> 
> Thanks for an exciting read ;-)

My pleasure ;-)  And thanks for taking a look.  Sorry about the long
turnaround time, but unfortunately this is the best I can do at the
moment.

-- 
Best regards,
Michał Kępień

  reply	other threads:[~2020-10-07 19:48 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-01 12:06 [PATCH 0/2] diff: add -I<regex> that ignores matching changes Michał Kępień
2020-10-01 12:06 ` [PATCH 1/2] " Michał Kępień
2020-10-01 18:21   ` Junio C Hamano
2020-10-07 19:48     ` Michał Kępień [this message]
2020-10-07 20:08       ` Junio C Hamano
2020-10-01 12:06 ` [PATCH 2/2] t: add -I<regex> tests Michał Kępień
2020-10-01 17:02 ` [PATCH 0/2] diff: add -I<regex> that ignores matching changes Junio C Hamano
2020-10-12  9:17 ` [PATCH v2 0/3] " Michał Kępień
2020-10-12  9:17   ` [PATCH v2 1/3] merge-base, xdiff: zero out xpparam_t structures Michał Kępień
2020-10-12 11:14     ` Johannes Schindelin
2020-10-12 17:09       ` Junio C Hamano
2020-10-12 19:52     ` Junio C Hamano
2020-10-13  6:35       ` Michał Kępień
2020-10-12  9:17   ` [PATCH v2 2/3] diff: add -I<regex> that ignores matching changes Michał Kępień
2020-10-12 11:20     ` Johannes Schindelin
2020-10-12 20:00       ` Junio C Hamano
2020-10-12 20:39         ` Johannes Schindelin
2020-10-12 21:43           ` Junio C Hamano
2020-10-13  6:37             ` Michał Kępień
2020-10-13 15:49               ` Junio C Hamano
2020-10-13  6:36       ` Michał Kępień
2020-10-13 12:02         ` Johannes Schindelin
2020-10-13 15:53           ` Junio C Hamano
2020-10-13 18:45           ` Michał Kępień
2020-10-12 18:01     ` Junio C Hamano
2020-10-13  6:38       ` Michał Kępień
2020-10-12 20:04     ` Junio C Hamano
2020-10-13  6:38       ` Michał Kępień
2020-10-12  9:17   ` [PATCH v2 3/3] t: add -I<regex> tests Michał Kępień
2020-10-12 11:49     ` Johannes Schindelin
2020-10-13  6:38       ` Michał Kępień
2020-10-13 12:00         ` Johannes Schindelin
2020-10-13 16:00           ` Junio C Hamano
2020-10-13 19:01           ` Michał Kępień
2020-10-15 11:45             ` Johannes Schindelin
2020-10-15  7:24   ` [PATCH v3 0/2] diff: add -I<regex> that ignores matching changes Michał Kępień
2020-10-15  7:24     ` [PATCH v3 1/2] merge-base, xdiff: zero out xpparam_t structures Michał Kępień
2020-10-15  7:24     ` [PATCH v3 2/2] diff: add -I<regex> that ignores matching changes Michał Kępień
2020-10-16 15:32       ` Phillip Wood
2020-10-16 18:04         ` Junio C Hamano
2020-10-19  9:48           ` Michał Kępień
2020-10-16 18:16       ` Junio C Hamano
2020-10-19  9:55         ` Michał Kępień
2020-10-19 17:29           ` Junio C Hamano
2020-10-16 10:00     ` [PATCH v3 0/2] " Johannes Schindelin
2020-10-20  6:48     ` [PATCH v4 " Michał Kępień
2020-10-20  6:48       ` [PATCH v4 1/2] merge-base, xdiff: zero out xpparam_t structures Michał Kępień
2020-10-20  6:48       ` [PATCH v4 2/2] diff: add -I<regex> that ignores matching changes Michał Kępień
2021-02-05 14:13       ` [PATCH 1/2] diff: add an API for deferred freeing Ævar Arnfjörð Bjarmason
2021-02-10 16:00         ` Johannes Schindelin
2021-02-11  3:00           ` Ævar Arnfjörð Bjarmason
2021-02-11  9:40             ` Johannes Schindelin
2021-02-11 10:21               ` Jeff King
2021-02-11 10:45                 ` [PATCH v2 0/2] " Ævar Arnfjörð Bjarmason
2021-02-11 10:45                 ` [PATCH v2 1/2] " Ævar Arnfjörð Bjarmason
2021-02-11 10:45                 ` [PATCH v2 2/2] diff: plug memory leak from regcomp() on {log,diff} -I Ævar Arnfjörð Bjarmason
2021-02-05 14:13       ` [PATCH " Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201007194821.GA20549@larwa.hq.kempniu.pl \
    --to=michal@isc.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.