* git diff -I<regex> does not work with empty +/- lines
@ 2021-11-28  9:15 Johannes Altmanninger
  2021-11-28 23:26 ` Junio C Hamano
  0 siblings, 1 reply; 3+ messages in thread
From: Johannes Altmanninger @ 2021-11-28  9:15 UTC (permalink / raw)
  To: git; +Cc: Michał Kępień

diff -I<regex> suppresses hunks where all +/- lines match <regex>.
It is useful for filtering out boilerplate changes.

Unfortunately, it doesn't help if a hunk has a blank line, because the one
obvious way to filter out blank lines (^$) matches *every* line, surprisingly.
This is because for a line "01\n" we get a zero-width match of ^$ at
offset 3, i.e. on the empty string after the trailing newline.
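
To illustrate that zero-width match outside of git, here is a minimal
standalone sketch using POSIX regcomp()/regexec(). It assumes the pattern
is compiled with REG_EXTENDED | REG_NEWLINE; first_match_offset() is a
hypothetical helper for this example, not something from git's source.

```c
#include <assert.h>
#include <regex.h>

/*
 * Hypothetical helper (not git code): return the byte offset of the
 * first match of `pattern` in the NUL-terminated `text`, or -1 if
 * there is no match or the pattern fails to compile.
 */
static long first_match_offset(const char *pattern, const char *text)
{
	regex_t re;
	regmatch_t m;
	long off = -1;

	assert(pattern && text);

	/* REG_NEWLINE lets ^ match right after a newline and $ right before one. */
	if (regcomp(&re, pattern, REG_EXTENDED | REG_NEWLINE))
		return -1;

	/* Ask for one match slot so we can report where the match starts. */
	if (!regexec(&re, text, 1, &m, 0))
		off = (long)m.rm_so;

	regfree(&re);
	return off;
}
```

With this helper, first_match_offset("^$", "01\n") should return 3 (the
empty string after the newline), while first_match_offset("^$", "01")
should return -1 because no position satisfies both anchors.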

IOW, while we successfully ignore deleted blank lines

	printf '\n' | git diff --no-index - /dev/null -I'^$'
	diff --git a/- b/-
	deleted file mode 100644

we also ignore non-blank lines (very surprising)

	printf 'non-blank-line\n' | git diff --no-index - /dev/null -I'^$'
	diff --git a/- b/-
	deleted file mode 100644

unless they don't end in a newline (special case)

	printf 'line without ending newline' | git diff --no-index - /dev/null -I'^$'
	diff --git a/- b/-
	deleted file mode 100644
	--- a/-
	+++ /dev/null
	@@ -1 +0,0 @@
	-line without ending newline
	\ No newline at end of file

This patch fixes the second example. Is this the right direction?
Do we want to honor core.eol, so we preserve the \r when we have Unix endings?

In any case, -I<regex> won't be able to distinguish between "line\n" and
"line", but that's not important to me.

diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index a4542c05b6..23325022b9 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -1016,10 +1016,17 @@ static void xdl_mark_ignorable_lines(xdchange_t *xscr, xdfenv_t *xe, long flags)
 static int record_matches_regex(xrecord_t *rec, xpparam_t const *xpp) {
 	regmatch_t regmatch;
 	int i;
+	const char *end = rec->ptr + rec->size;
+
+	if (rec->size >= 2 && end[-2] == '\r' && end[-1] == '\n') {
+		end -= 2;
+	} else if (rec->size && end[-1] == '\n') {
+		end -= 1;
+	}
 
 	for (i = 0; i < xpp->ignore_regex_nr; i++)
-		if (!regexec_buf(xpp->ignore_regex[i], rec->ptr, rec->size, 1,
-				 &regmatch, 0))
+		if (!regexec_buf(xpp->ignore_regex[i], rec->ptr,
+				 end - rec->ptr, 1, &regmatch, 0))
 			return 1;
 
 	return 0;


* Re: git diff -I<regex> does not work with empty +/- lines
  2021-11-28  9:15 git diff -I<regex> does not work with empty +/- lines Johannes Altmanninger
@ 2021-11-28 23:26 ` Junio C Hamano
  2021-11-29  4:54   ` Johannes Altmanninger
  0 siblings, 1 reply; 3+ messages in thread
From: Junio C Hamano @ 2021-11-28 23:26 UTC (permalink / raw)
  To: Johannes Altmanninger; +Cc: git, Michał Kępień

Johannes Altmanninger <aclopte@gmail.com> writes:

> diff -I<regex> suppresses hunks where all +/- lines match <regex>.
> It is useful to filter away boilerplate changes.
>
> Unfortunately, it doesn't help if a hunk has a blank line, because the one
> obvious way to filter out blank lines (^$) matches *every* line, surprisingly.
> This is because for a line "01\n"
> we have a zero-width match here ^$ (offset 3).
>
> IOW, while we successfully ignore deleted blank lines
>
> 	printf '\n' | git diff --no-index - /dev/null -I'^$'
> 	diff --git a/- b/-
> 	deleted file mode 100644
>
> we also ignore non-blank lines (very surprising)
>
> 	printf 'non-blank-line\n' | git diff --no-index - /dev/null -I'^$'
> 	diff --git a/- b/-
> 	deleted file mode 100644
>
> unless they don't end in a newline (special case)
>
> 	printf 'line without ending newline' | git diff --no-index - /dev/null -I'^$'
> 	diff --git a/- b/-
> 	deleted file mode 100644
> 	--- a/-
> 	+++ /dev/null
> 	@@ -1 +0,0 @@
> 	-line without ending newline
> 	\ No newline at end of file
>
> This patch fixes the second example. Is this the right direction?

I do not know where in the code the breakage in the first example
comes from.  It sounds like a bug if a pattern is not matched
honoring the anchor, whether the beginning-of-line "^" or the
end-of-line "$" one.

> Do we want to honor core.eol, so we preserve the \r when we have Unix endings?

Absolutely.  I think the code paths that work on blob data (i.e. Git
internal, before the smudge operation is applied to externalize it
to a working tree file) should only consider LF as the end of line
(and any existing code that doesn't is a bug).



* Re: git diff -I<regex> does not work with empty +/- lines
  2021-11-28 23:26 ` Junio C Hamano
@ 2021-11-29  4:54   ` Johannes Altmanninger
  0 siblings, 0 replies; 3+ messages in thread
From: Johannes Altmanninger @ 2021-11-29  4:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Michał Kępień

On Sun, Nov 28, 2021 at 03:26:10PM -0800, Junio C Hamano wrote:
> Johannes Altmanninger <aclopte@gmail.com> writes:
> 
> > This patch fixes the second example. Is this the right direction?
> 
> I do not know where in the code the breakage in the first example
> comes from.  It sounds like a bug if a pattern is not matched
> honoring the anchor, whether the beginning-of-line "^" or the
> end-of-line "$" one.

The first example (printf '\n' | ... -I'^$') works fine AFAICT.
The regex matches the empty line, so the hunk is ignored
(but the file header is still printed).
Only the second example (printf 'non-blank-line\n' | ... -I'^$')
shows that we ignore too much: ^$ (incorrectly) matches there,
although it should only match blank lines.
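
For reference, the behavior I'd expect can be sketched in a standalone
form like this. It is not the xdiff code: line_matches() is a hypothetical
helper, it uses plain regexec() on a NUL-terminated copy rather than
regexec_buf(), and it strips only a single trailing LF, on the assumption
that only LF should count as the end of line.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not git code): return 1 if the line stored in
 * buf[0..len) matches `pattern` once a single trailing LF is dropped,
 * 0 if it does not match, and -1 on allocation or compile failure.
 */
static int line_matches(const char *pattern, const char *buf, size_t len)
{
	regex_t re;
	char *copy;
	int ret;

	assert(pattern && buf);

	/* Drop the line terminator so that $ anchors at the real end of line. */
	if (len && buf[len - 1] == '\n')
		len--;

	/* regexec() wants a NUL-terminated string; embedded NULs would cut it short. */
	copy = malloc(len + 1);
	if (!copy)
		return -1;
	memcpy(copy, buf, len);
	copy[len] = '\0';

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NEWLINE | REG_NOSUB)) {
		free(copy);
		return -1;
	}

	ret = !regexec(&re, copy, 0, NULL, 0);

	regfree(&re);
	free(copy);
	return ret;
}
```

With the terminator removed, line_matches("^$", "\n", 1) should report a
match, while the same pattern against "non-blank-line\n" should not.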

Thanks

