git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* `git blame` Line Number Off-by-one
@ 2020-08-07  1:05 Nuthan Munaiah
  2020-08-07 21:21 ` Jeff King
  0 siblings, 1 reply; 16+ messages in thread
From: Nuthan Munaiah @ 2020-08-07  1:05 UTC (permalink / raw)
  To: git

What did you do before the bug happened? (Steps to reproduce your issue)

  * Clone https://github.com/apache/tomcat
  * Run `git blame --root -leftw -L 21,21 -L 23,23 
51844327d8613448bb0bf9667e1a61e462e2043c^ -- 
modules/jdbc-pool/java/org/apache/tomcat/jdbc/pool/PoolProperties.java`

What did you expect to happen? (Expected behavior)

`git blame` shows the last commit that modified lines 21 and 23 of 
`modules/jdbc-pool/java/org/apache/tomcat/jdbc/pool/PoolProperties.java` 
starting at the parent of `51844327d8613448bb0bf9667e1a61e462e2043c`.

```
$ git blame --root -leftw -L 21,21 -L 23,23  
51844327d8613448bb0bf9667e1a61e462e2043c^ -- 
modules/jdbc-pool/java/org/apache/tomcat/jdbc/pool/PoolProperties.java
c65a429f06f4e4a025a306e377211863d9ff2a0c 
modules/jdbc-pool/java/org/apache/tomcat/jdbc/pool/PoolProperties.java 
(<fhanik@apache.org> 1226896977 +0000 21) import java.util.ArrayList;
c65a429f06f4e4a025a306e377211863d9ff2a0c 
modules/jdbc-pool/java/org/apache/tomcat/jdbc/pool/PoolProperties.java 
(<fhanik@apache.org> 1226896977 +0000 23) import java.util.List;
```

What happened instead? (Actual behavior)

Line 23 is not shown in the `git blame` output. Instead, line 22 is 
shown.

```
$ git blame --root -leftw -L 21,21 -L 23,23  
51844327d8613448bb0bf9667e1a61e462e2043c^ -- 
modules/jdbc-pool/java/org/apache/tomcat/jdbc/pool/PoolProperties.java
c65a429f06f4e4a025a306e377211863d9ff2a0c 
modules/jdbc-pool/java/org/apache/tomcat/jdbc/pool/PoolProperties.java 
(<fhanik@apache.org> 1226896977 +0000 21) import java.util.ArrayList;
c65a429f06f4e4a025a306e377211863d9ff2a0c 
modules/jdbc-pool/java/org/apache/tomcat/jdbc/pool/PoolProperties.java 
(<fhanik@apache.org> 1226896977 +0000 22) import java.util.HashMap;
```

What's different between what you expected and what actually happened?

Line 23 is not shown in the `git blame` output. Instead line 22 is 
shown.

Anything else you want to add:

  * The issue is reproducible on git versions 2.28.0 (on Ubuntu 18.04.4 
LTS), 2.24.0 (on Ubuntu 18.04.3 LTS), 2.17.0 (on Ubuntu 18.04.4 LTS), 
and 2.27.0.windows.1 (on Microsoft Windows 10 Enterprise 10.0.19041 
Build 19041).
  * The addition of `--porcelain` does not resolve the issue.
  * The issue is not specific to a particular repository. For instance, 
the output from `git blame -L 1466,1466 -L 1468,1468  
35e69fc7cf9421ab04ffc9d52cb36d07fa12984a^ -- c/dec/decode.c` in a clone 
of https://github.com/google/brotli shows last commit that modified 
lines 1466 and 1467 (instead 1468).
  * The issue seems to present itself only when there is exactly one line 
in between two specifications of `-L`. For instance, `-L 20,20 -L 23,23` 
correctly blames lines 20 and 23, `-L 20,20 -L 21,21 -L 22,22 -L 23,23` 
correctly blames lines 20, 21, 22, and 23, `-L 23,23` correctly blames 
line 23, but `-L 21,21 -L 23,23` blames lines 21 and 22 (instead of 23).

[System Info]
git version:
git version 2.28.0
cpu: x86_64
no commit associated with this build
sizeof-long: 8
sizeof-size_t: 8
shell-path: /bin/sh
uname: Linux 4.15.0-88-generic #88-Ubuntu SMP Tue Feb 11 20:11:34 UTC 
2020 x86_64
compiler info: gnuc: 7.5
libc info: glibc: 2.27
$SHELL (typically, interactive shell): <unset>

[Enabled Hooks]


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: `git blame` Line Number Off-by-one
  2020-08-07  1:05 `git blame` Line Number Off-by-one Nuthan Munaiah
@ 2020-08-07 21:21 ` Jeff King
  2020-08-07 21:33   ` Jeff King
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff King @ 2020-08-07 21:21 UTC (permalink / raw)
  To: Nuthan Munaiah; +Cc: git, Junio C Hamano

On Fri, Aug 07, 2020 at 01:05:51AM +0000, Nuthan Munaiah wrote:

>  * Clone https://github.com/apache/tomcat
>  * Run `git blame --root -leftw -L 21,21 -L 23,23
> 51844327d8613448bb0bf9667e1a61e462e2043c^ --
> modules/jdbc-pool/java/org/apache/tomcat/jdbc/pool/PoolProperties.java`
>
> [...]
> 
> `git blame` shows the last commit that modified lines 21 and 23 of
> `modules/jdbc-pool/java/org/apache/tomcat/jdbc/pool/PoolProperties.java`
> starting at the parent of `51844327d8613448bb0bf9667e1a61e462e2043c`.
>
> [...]
>
> Line 23 is not shown in the `git blame` output. Instead, line 22 is shown.

Thanks for providing an easy reproduction case.

I think the issue is not in the -L input or in the blame algorithm
itself, but in the hunk-coalescing at the end.

As you note, this shows up even with --porcelain:

  $ commit=51844327d8613448bb0bf9667e1a61e462e2043c^
  $ fn=modules/jdbc-pool/java/org/apache/tomcat/jdbc/pool/PoolProperties.java
  $ git blame --porcelain -L 21,21 -L 23,23 $commit -- $fn |
    egrep '^[0-9a-f]{40}'
  c65a429f06f4e4a025a306e377211863d9ff2a0c 21 21 2
  c65a429f06f4e4a025a306e377211863d9ff2a0c 22 22

but if we try --incremental:

  $ git blame --incremental -L 21,21 -L 23,23 $commit -- $fn |
    egrep '^[0-9a-f]{40}'
  c65a429f06f4e4a025a306e377211863d9ff2a0c 21 21 1
  c65a429f06f4e4a025a306e377211863d9ff2a0c 22 23 1

So we do know at the moment we find the line that it was at line 23 in
the final result, but line 22 in the earlier version at c65a429f06.

And indeed, running a non-incremental blame in a debugger, right before
calling blame_coalesce() our entries look like this:

  cmd_blame (argc=3, argv=0x7fffffffe458, prefix=0x0) at builtin/blame.c:1146
  1146		blame_coalesce(&sb);
  (gdb) print *sb->ent 
  $44 = {next = 0x55555596eda0, lno = 20, num_lines = 1, suspect = 0x555555999a30, s_lno = 20, score = 0, ignored = 0, 
    unblamable = 0}
  (gdb) print *sb->ent->next
  $45 = {next = 0x0, lno = 22, num_lines = 1, suspect = 0x555555999a30, s_lno = 21, score = 0, ignored = 0, 
    unblamable = 0}

So we have two one-line entries at lines 21 and 23 ("lno"; note that
internally we zero-index the lines), and we know that the second one is
actually from 22 ("s_lno").

But then after blame_coalesce() returns, we have only one entry with
both lines:

  (gdb) n
  1148		if (!(output_option & (OUTPUT_COLOR_LINE | OUTPUT_SHOW_AGE_WITH_COLOR)))
  (gdb) print *sb->ent
  $46 = {next = 0x0, lno = 20, num_lines = 2, suspect = 0x555555999a30, s_lno = 20, score = 0, ignored = 0, 
    unblamable = 0}

Presumably it saw the adjacent lines in the _source_ file and coalesced
them, but that's not the right thing to do. They're distinct hunks in
the output we're going to show, so they have to remain such.

-Peff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: `git blame` Line Number Off-by-one
  2020-08-07 21:21 ` Jeff King
@ 2020-08-07 21:33   ` Jeff King
  2020-08-07 21:45     ` Jeff King
  2020-08-07 22:05     ` Junio C Hamano
  0 siblings, 2 replies; 16+ messages in thread
From: Jeff King @ 2020-08-07 21:33 UTC (permalink / raw)
  To: Nuthan Munaiah; +Cc: git, Junio C Hamano

On Fri, Aug 07, 2020 at 05:21:59PM -0400, Jeff King wrote:

> So we have two one-line entries at lines 21 and 23 ("lno"; note that
> internally we zero-index the lines), and we know that the second one is
> actually from 22 ("s_lno").
> 
> But then after blame_coalesce() returns, we have only one entry with
> both lines:
> 
>   (gdb) n
>   1148		if (!(output_option & (OUTPUT_COLOR_LINE | OUTPUT_SHOW_AGE_WITH_COLOR)))
>   (gdb) print *sb->ent
>   $46 = {next = 0x0, lno = 20, num_lines = 2, suspect = 0x555555999a30, s_lno = 20, score = 0, ignored = 0, 
>     unblamable = 0}
> 
> Presumably it saw the adjacent lines in the _source_ file and coalesced
> them, but that's not the right thing to do. They're distinct hunks in
> the output we're going to show, so they have to remain such.

I took a look at blame_coalesce(), wondering if there might be a bug in
it (e.g., using source lines instead of result lines). But it seems to
be intentionally joining lines based on the original source count. And
thinking on it, we wouldn't need to join groups based no result lines,
since those start as groups in the first place.

So I'm having trouble seeing how this coalescing could ever be helpful.
It dates back to the original cee7f245dc (git-pickaxe: blame rewritten.,
2006-10-19). Is it possible that this is just an ill-conceived idea that
nobody ever noticed before (because most of the time it simply does
nothing)?

Dropping it entirely, as below, doesn't break any tests. Junio, do you
know of a case this is meant to improve?

-Peff

diff --git a/blame.c b/blame.c
index 82fa16d658..e4fd9a80ef 100644
--- a/blame.c
+++ b/blame.c
@@ -1172,33 +1172,6 @@ static void sanity_check_refcnt(struct blame_scoreboard *sb)
 		sb->on_sanity_fail(sb, baa);
 }
 
-/*
- * If two blame entries that are next to each other came from
- * contiguous lines in the same origin (i.e. <commit, path> pair),
- * merge them together.
- */
-void blame_coalesce(struct blame_scoreboard *sb)
-{
-	struct blame_entry *ent, *next;
-
-	for (ent = sb->ent; ent && (next = ent->next); ent = next) {
-		if (ent->suspect == next->suspect &&
-		    ent->s_lno + ent->num_lines == next->s_lno &&
-		    ent->ignored == next->ignored &&
-		    ent->unblamable == next->unblamable) {
-			ent->num_lines += next->num_lines;
-			ent->next = next->next;
-			blame_origin_decref(next->suspect);
-			free(next);
-			ent->score = 0;
-			next = ent; /* again */
-		}
-	}
-
-	if (sb->debug) /* sanity */
-		sanity_check_refcnt(sb);
-}
-
 /*
  * Merge the given sorted list of blames into a preexisting origin.
  * If there were no previous blames to that commit, it is entered into
diff --git a/blame.h b/blame.h
index b6bbee4147..1807253ef8 100644
--- a/blame.h
+++ b/blame.h
@@ -173,7 +173,6 @@ static inline struct blame_origin *blame_origin_incref(struct blame_origin *o)
 }
 void blame_origin_decref(struct blame_origin *o);
 
-void blame_coalesce(struct blame_scoreboard *sb);
 void blame_sort_final(struct blame_scoreboard *sb);
 unsigned blame_entry_score(struct blame_scoreboard *sb, struct blame_entry *e);
 void assign_blame(struct blame_scoreboard *sb, int opt);
diff --git a/builtin/blame.c b/builtin/blame.c
index 94ef57c1cc..ce564a5916 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -1143,8 +1143,6 @@ int cmd_blame(int argc, const char **argv, const char *prefix)
 
 	blame_sort_final(&sb);
 
-	blame_coalesce(&sb);
-
 	if (!(output_option & (OUTPUT_COLOR_LINE | OUTPUT_SHOW_AGE_WITH_COLOR)))
 		output_option |= coloring_mode;
 

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: `git blame` Line Number Off-by-one
  2020-08-07 21:33   ` Jeff King
@ 2020-08-07 21:45     ` Jeff King
  2020-08-07 22:05     ` Junio C Hamano
  1 sibling, 0 replies; 16+ messages in thread
From: Jeff King @ 2020-08-07 21:45 UTC (permalink / raw)
  To: Nuthan Munaiah; +Cc: git, Junio C Hamano

On Fri, Aug 07, 2020 at 05:33:49PM -0400, Jeff King wrote:

> Dropping it entirely, as below, doesn't break any tests. Junio, do you
> know of a case this is meant to improve?
> [...]
> diff --git a/builtin/blame.c b/builtin/blame.c
> index 94ef57c1cc..ce564a5916 100644
> --- a/builtin/blame.c
> +++ b/builtin/blame.c
> @@ -1143,8 +1143,6 @@ int cmd_blame(int argc, const char **argv, const char *prefix)
>  
>  	blame_sort_final(&sb);
>  
> -	blame_coalesce(&sb);
> -
>  	if (!(output_option & (OUTPUT_COLOR_LINE | OUTPUT_SHOW_AGE_WITH_COLOR)))
>  		output_option |= coloring_mode;

I wondered also whether this sort was necessary (obviously it is for
coalescing, but is it otherwise?). But there are definitely tests that
break if it's removed (e.g. t8011-blame-split-file). And it's sorting on
"lno", which implies that yes, in some cases we may end up with a
contiguous block of final-result lines that was broken up and shuffled
around.

I'm not sure if that implies a case where source-coalescing could help.
Would you ever have such a case where they end up blaming to the same
commit?

I'm not sure. If so, then it would probably make sense for
blame_coalesce() to make sure that the ent->lno and next->lno blocks are
contiguous (as well as the s_lno ones).

But even if that's true, I'm not sure that this coalescing is really all
that beneficial. It doesn't impact the normal blame output at all (which
is line oriented anyway, and repeats information). It doesn't make
--porcelain more efficient, because we omit repeated commit details even
for non-contiguous blocks. It would let a consumer of --porcelain see
bigger blocks which could be useful for coloring, etc.

So I dunno. I'd still be curious to see a concrete case where this code
triggers and does something useful.

-Peff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: `git blame` Line Number Off-by-one
  2020-08-07 21:33   ` Jeff King
  2020-08-07 21:45     ` Jeff King
@ 2020-08-07 22:05     ` Junio C Hamano
  2020-08-07 22:26       ` Jeff King
  1 sibling, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2020-08-07 22:05 UTC (permalink / raw)
  To: Jeff King; +Cc: Nuthan Munaiah, git

Jeff King <peff@peff.net> writes:

> Dropping it entirely, as below, doesn't break any tests. Junio, do you
> know of a case this is meant to improve?

I think the only conceivable case is that in the middle of a single
block of text in an ancient version, another block of lines gets
inserted during the evolution of the file, but in the end these
intermediate edits all go away and the same original text remains.

In such a case, without coalescing, we would not treat the original
single block of text as a single unit.

IIRC, blame has some threshold that makes too small a block not
subject to move and copy detection, and it is most likely to avoid
fragmenting the blocks too small.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: `git blame` Line Number Off-by-one
  2020-08-07 22:05     ` Junio C Hamano
@ 2020-08-07 22:26       ` Jeff King
  2020-08-07 22:35         ` Jeff King
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff King @ 2020-08-07 22:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nuthan Munaiah, git

On Fri, Aug 07, 2020 at 03:05:27PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > Dropping it entirely, as below, doesn't break any tests. Junio, do you
> > know of a case this is meant to improve?
> 
> I think the only conceivable case is that in the middle of a single
> block of text in an ancient version, another block of lines gets
> inserted during the evolution of the file, but in the end these
> intermediate edits all go away and the same original text remains.
> 
> In such a case, without coalescing, we would not treat the original
> single block of text as a single unit.

Yeah, that makes sense, and it should be possible to construct a case
based on that.

> IIRC, blame has some threshold that makes too small a block not
> subject to move and copy detection, and it is most likely to avoid
> fragmenting the blocks too small.

I don't know if that would matter here, though. This coalescing is
taking place after all of the blaming has happened, and is just
preparing the final output.

-Peff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: `git blame` Line Number Off-by-one
  2020-08-07 22:26       ` Jeff King
@ 2020-08-07 22:35         ` Jeff King
  2020-08-13  5:20           ` [PATCH 0/3] blame: fix bug in coalescing non-adjacent "-L" chunks Jeff King
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff King @ 2020-08-07 22:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nuthan Munaiah, git

On Fri, Aug 07, 2020 at 06:26:30PM -0400, Jeff King wrote:

> On Fri, Aug 07, 2020 at 03:05:27PM -0700, Junio C Hamano wrote:
> 
> > Jeff King <peff@peff.net> writes:
> > 
> > > Dropping it entirely, as below, doesn't break any tests. Junio, do you
> > > know of a case this is meant to improve?
> > 
> > I think the only conceivable case is that in the middle of a single
> > block of text in an ancient version, another block of lines gets
> > inserted during the evolution of the file, but in the end these
> > intermediate edits all go away and the same original text remains.
> > 
> > In such a case, without coalescing, we would not treat the original
> > single block of text as a single unit.
> 
> Yeah, that makes sense, and it should be possible to construct a case
> based on that.

It was even easier than I expected. :)

Try this:

  git init repo
  cd repo
  
  seq 20 >file
  git add file
  git commit -m base
  
  { seq 10; echo foo; seq 11 20; } >file
  git commit -am cruft
  
  seq 20 >file
  git commit -am revert

  git blame --root --porcelain file

With the coalescing, we get a single 20-line block:

  25d93ad4c4b27570ccd4b6412ccaa549ae6f9ff7 1 1 20
  author Jeff King
  ...
          1
  ...
  25d93ad4c4b27570ccd4b6412ccaa549ae6f9ff7 11 11
          11

and without it, we get two 10-line blocks:

  25d93ad4c4b27570ccd4b6412ccaa549ae6f9ff7 1 1 10
  author Jeff King
  ...
          1
  ...
  25d93ad4c4b27570ccd4b6412ccaa549ae6f9ff7 11 11 10
          11

I doubt it really matters all that much in practice, but it's possible
some consumer of --porcelain relies on us giving minimal blocks.

So that argues for fixing blame_coalesce() to make sure that we coalesce
only when the blocks are adjacent in both the source and the result.
Otherwise we're losing information.

-Peff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 0/3] blame: fix bug in coalescing non-adjacent "-L" chunks
  2020-08-07 22:35         ` Jeff King
@ 2020-08-13  5:20           ` Jeff King
  2020-08-13  5:23             ` [PATCH 1/3] t8003: check output of coalesced blame Jeff King
                               ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Jeff King @ 2020-08-13  5:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Barret Rhoden, Nuthan Munaiah, git

On Fri, Aug 07, 2020 at 06:35:22PM -0400, Jeff King wrote:

> > > I think the only conceivable case is that in the middle of a single
> > > block of text in an ancient version, another block of lines gets
> > > inserted during the evolution of the file, but in the end these
> > > intermediate edits all go away and the same original text remains.
> > > 
> > > In such a case, without coalescing, we would not treat the original
> > > single block of text as a single unit.
> > 
> > Yeah, that makes sense, and it should be possible to construct a case
> > based on that.

I started to add a test for this, and it turns out we already had one!
It just wasn't checking the output as carefully as it could. :)

So here's a series which actually checks that blame_coalesce() is doing
something useful, and then fixes Nuthan's bug on top (with a test case,
but I also confirmed it makes the original tomcat issue go away).

  [1/3]: t8003: check output of coalesced blame
  [2/3]: t8003: factor setup out of coalesce test
  [3/3]: blame: only coalesce lines that are adjacent in result

 blame.c                       |  1 +
 t/t8003-blame-corner-cases.sh | 28 +++++++++++++++++++---------
 2 files changed, 20 insertions(+), 9 deletions(-)

-Peff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/3] t8003: check output of coalesced blame
  2020-08-13  5:20           ` [PATCH 0/3] blame: fix bug in coalescing non-adjacent "-L" chunks Jeff King
@ 2020-08-13  5:23             ` Jeff King
  2020-08-13 17:04               ` Junio C Hamano
  2020-08-13  5:23             ` [PATCH 2/3] t8003: factor setup out of coalesce test Jeff King
                               ` (2 subsequent siblings)
  3 siblings, 1 reply; 16+ messages in thread
From: Jeff King @ 2020-08-13  5:23 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Barret Rhoden, Nuthan Munaiah, git

Commit f0cbe742f4 (blame: add a test to cover blame_coalesce(),
2019-06-20) added a test case where blame can usefully coalesce two
groups of lines. But since it relies on the normal blame output, it only
exercises the code and can't tell whether the lines were actually
joined into a single group.

However, by using --porcelain output, we can see how git-blame considers
the groupings (and likewise how the coalescing might have a real
user-visible impact for a tool that uses the porcelain-output
groupings). This lets us confirm that we are indeed coalescing correctly
(and the fact that this test case requires coalescing can be verified by
dropping the call to blame_coalesce(), causing the test to fail).

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t8003-blame-corner-cases.sh | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/t/t8003-blame-corner-cases.sh b/t/t8003-blame-corner-cases.sh
index b871dd4f86..1e89494ef6 100755
--- a/t/t8003-blame-corner-cases.sh
+++ b/t/t8003-blame-corner-cases.sh
@@ -273,10 +273,6 @@ test_expect_success 'blame file with CRLF core.autocrlf=true' '
 	grep "A U Thor" actual
 '
 
-# Tests the splitting and merging of blame entries in blame_coalesce().
-# The output of blame is the same, regardless of whether blame_coalesce() runs
-# or not, so we'd likely only notice a problem if blame crashes or assigned
-# blame to the "splitting" commit ('SPLIT' below).
 test_expect_success 'blame coalesce' '
 	cat >giraffe <<-\EOF &&
 	ABC
@@ -302,10 +298,11 @@ test_expect_success 'blame coalesce' '
 	git commit -m "same contents as original" &&
 
 	cat >expect <<-EOF &&
-	$oid 1) ABC
-	$oid 2) DEF
+	$oid 1 1 2
+	$oid 2 2
 	EOF
-	git -c core.abbrev=$(test_oid hexsz) blame -s giraffe >actual &&
+	git blame --porcelain giraffe >actual.raw &&
+	grep "^$oid" actual.raw >actual &&
 	test_cmp expect actual
 '
 
-- 
2.28.0.570.ge2c3ee08e4


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/3] t8003: factor setup out of coalesce test
  2020-08-13  5:20           ` [PATCH 0/3] blame: fix bug in coalescing non-adjacent "-L" chunks Jeff King
  2020-08-13  5:23             ` [PATCH 1/3] t8003: check output of coalesced blame Jeff King
@ 2020-08-13  5:23             ` Jeff King
  2020-08-13  5:25             ` [PATCH 3/3] blame: only coalesce lines that are adjacent in result Jeff King
  2020-08-13 12:25             ` [PATCH 0/3] blame: fix bug in coalescing non-adjacent "-L" chunks Barret Rhoden
  3 siblings, 0 replies; 16+ messages in thread
From: Jeff King @ 2020-08-13  5:23 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Barret Rhoden, Nuthan Munaiah, git

In preparation for adding more tests of blame's coalesce code, let's
split the setup out from the first test, and give each of the commits
a more meaningful name:

  - $orig for the original source that added the lines

  - $split for the version where they are split apart

  - $final for the final version that re-joins them

That's not strictly necessary, but makes the follow-on tests less
brittle than relying on HEAD^, etc, to name the commits.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t8003-blame-corner-cases.sh | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/t/t8003-blame-corner-cases.sh b/t/t8003-blame-corner-cases.sh
index 1e89494ef6..7f39a84289 100755
--- a/t/t8003-blame-corner-cases.sh
+++ b/t/t8003-blame-corner-cases.sh
@@ -273,14 +273,14 @@ test_expect_success 'blame file with CRLF core.autocrlf=true' '
 	grep "A U Thor" actual
 '
 
-test_expect_success 'blame coalesce' '
+test_expect_success 'setup coalesce tests' '
 	cat >giraffe <<-\EOF &&
 	ABC
 	DEF
 	EOF
 	git add giraffe &&
 	git commit -m "original file" &&
-	oid=$(git rev-parse HEAD) &&
+	orig=$(git rev-parse HEAD) &&
 
 	cat >giraffe <<-\EOF &&
 	ABC
@@ -289,20 +289,24 @@ test_expect_success 'blame coalesce' '
 	EOF
 	git add giraffe &&
 	git commit -m "interior SPLIT line" &&
+	split=$(git rev-parse HEAD) &&
 
 	cat >giraffe <<-\EOF &&
 	ABC
 	DEF
 	EOF
 	git add giraffe &&
 	git commit -m "same contents as original" &&
+	final=$(git rev-parse HEAD)
+'
 
+test_expect_success 'blame coalesce' '
 	cat >expect <<-EOF &&
-	$oid 1 1 2
-	$oid 2 2
+	$orig 1 1 2
+	$orig 2 2
 	EOF
-	git blame --porcelain giraffe >actual.raw &&
-	grep "^$oid" actual.raw >actual &&
+	git blame --porcelain $final giraffe >actual.raw &&
+	grep "^$orig" actual.raw >actual &&
 	test_cmp expect actual
 '
 
-- 
2.28.0.570.ge2c3ee08e4


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/3] blame: only coalesce lines that are adjacent in result
  2020-08-13  5:20           ` [PATCH 0/3] blame: fix bug in coalescing non-adjacent "-L" chunks Jeff King
  2020-08-13  5:23             ` [PATCH 1/3] t8003: check output of coalesced blame Jeff King
  2020-08-13  5:23             ` [PATCH 2/3] t8003: factor setup out of coalesce test Jeff King
@ 2020-08-13  5:25             ` Jeff King
  2020-08-13 16:59               ` Junio C Hamano
  2020-08-13 18:41               ` Junio C Hamano
  2020-08-13 12:25             ` [PATCH 0/3] blame: fix bug in coalescing non-adjacent "-L" chunks Barret Rhoden
  3 siblings, 2 replies; 16+ messages in thread
From: Jeff King @ 2020-08-13  5:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Barret Rhoden, Nuthan Munaiah, git

After blame has finished but before we produce any output, we coalesce
groups of lines that were adjacent in the original suspect (which may
have been split apart by lines in intermediate commits which went away).
However, this can cause incorrect output if the lines are not also
adjacent in the result. For instance, the case in t8003 has:

  ABC
  DEF

which becomes

  ABC
  SPLIT
  DEF

Blaming only lines 1 and 3 in the result yields two blame groups (one
for each line) that were adjacent in the original. That's enough for us
to coalesce them into a single group, but that loses information: our
output routines assume they're adjacent in the result as well, and we
output:

  <oid> 1) ABC
  <oid> 2) SPLIT

This is nonsense for two reasons:

  - we were asked about line 3, not line 2; we should not output the
    SPLIT line at all

  - commit <oid> did not touch the SPLIT line at all! We found the
    correct blame for line 3, but the bug is actually in the output
    stage, which is showing the wrong line number and content from the
    final file.

We can fix this by only coalescing when both the suspect and result
lines are adjacent. That fixes this bug, but keeps coalescing in cases
where want it (e.g., the existing test in t8003 where SPLIT goes away,
and the lines really are adjacent in the result).

Reported-by: Nuthan Munaiah <nm6061@rit.edu>
Signed-off-by: Jeff King <peff@peff.net>
---
 blame.c                       | 1 +
 t/t8003-blame-corner-cases.sh | 9 +++++++++
 2 files changed, 10 insertions(+)

diff --git a/blame.c b/blame.c
index 82fa16d658..1be1cd82a2 100644
--- a/blame.c
+++ b/blame.c
@@ -1184,6 +1184,7 @@ void blame_coalesce(struct blame_scoreboard *sb)
 	for (ent = sb->ent; ent && (next = ent->next); ent = next) {
 		if (ent->suspect == next->suspect &&
 		    ent->s_lno + ent->num_lines == next->s_lno &&
+		    ent->lno + ent->num_lines == next->lno &&
 		    ent->ignored == next->ignored &&
 		    ent->unblamable == next->unblamable) {
 			ent->num_lines += next->num_lines;
diff --git a/t/t8003-blame-corner-cases.sh b/t/t8003-blame-corner-cases.sh
index 7f39a84289..ba8013b002 100755
--- a/t/t8003-blame-corner-cases.sh
+++ b/t/t8003-blame-corner-cases.sh
@@ -310,4 +310,13 @@ test_expect_success 'blame coalesce' '
 	test_cmp expect actual
 '
 
+test_expect_success 'blame does not coalesce non-adjacent result lines' '
+	cat >expect <<-EOF &&
+	$orig 1) ABC
+	$orig 3) DEF
+	EOF
+	git blame --no-abbrev -s -L1,1 -L3,3 $split giraffe >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.28.0.570.ge2c3ee08e4

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/3] blame: fix bug in coalescing non-adjacent "-L" chunks
  2020-08-13  5:20           ` [PATCH 0/3] blame: fix bug in coalescing non-adjacent "-L" chunks Jeff King
                               ` (2 preceding siblings ...)
  2020-08-13  5:25             ` [PATCH 3/3] blame: only coalesce lines that are adjacent in result Jeff King
@ 2020-08-13 12:25             ` Barret Rhoden
  3 siblings, 0 replies; 16+ messages in thread
From: Barret Rhoden @ 2020-08-13 12:25 UTC (permalink / raw)
  To: Jeff King, Junio C Hamano; +Cc: Nuthan Munaiah, git

On 8/13/20 1:20 AM, Jeff King wrote:
> On Fri, Aug 07, 2020 at 06:35:22PM -0400, Jeff King wrote:
> 
>>>> I think the only conceivable case is that in the middle of a single
>>>> block of text in an ancient version, another block of lines gets
>>>> inserted during the evolution of the file, but in the end these
>>>> intermediate edits all go away and the same original text remains.
>>>>
>>>> In such a case, without coalescing, we would not treat the original
>>>> single block of text as a single unit.
>>>
>>> Yeah, that makes sense, and it should be possible to construct a case
>>> based on that.
> 
> I started to add a test for this, and it turns out we already had one!
> It just wasn't checking the output as carefully as it could. :)
> 
> So here's a series which actually checks that blame_coalesce() is doing
> something useful, and then fixes Nuthan's bug on top (with a test case,
> but I also confirmed it makes the original tomcat issue go away).
> 
>    [1/3]: t8003: check output of coalesced blame
>    [2/3]: t8003: factor setup out of coalesce test
>    [3/3]: blame: only coalesce lines that are adjacent in result
> 

thanks for the fixes, looks good to me.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] blame: only coalesce lines that are adjacent in result
  2020-08-13  5:25             ` [PATCH 3/3] blame: only coalesce lines that are adjacent in result Jeff King
@ 2020-08-13 16:59               ` Junio C Hamano
  2020-08-13 18:41               ` Junio C Hamano
  1 sibling, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2020-08-13 16:59 UTC (permalink / raw)
  To: Jeff King; +Cc: Barret Rhoden, Nuthan Munaiah, git

Jeff King <peff@peff.net> writes:

> diff --git a/blame.c b/blame.c
> index 82fa16d658..1be1cd82a2 100644
> --- a/blame.c
> +++ b/blame.c
> @@ -1184,6 +1184,7 @@ void blame_coalesce(struct blame_scoreboard *sb)
>  	for (ent = sb->ent; ent && (next = ent->next); ent = next) {
>  		if (ent->suspect == next->suspect &&
>  		    ent->s_lno + ent->num_lines == next->s_lno &&
> +		    ent->lno + ent->num_lines == next->lno &&
>  		    ent->ignored == next->ignored &&
>  		    ent->unblamable == next->unblamable) {
>  			ent->num_lines += next->num_lines;

When laid out like this, the correctness of the fix is quite obvious
and it is surprising that this bug has survived for all these years,
as it dates back to cee7f245 (git-pickaxe: blame rewritten.,
2006-10-19), the inception of the current "git blame".

Thanks.

> diff --git a/t/t8003-blame-corner-cases.sh b/t/t8003-blame-corner-cases.sh
> index 7f39a84289..ba8013b002 100755
> --- a/t/t8003-blame-corner-cases.sh
> +++ b/t/t8003-blame-corner-cases.sh
> @@ -310,4 +310,13 @@ test_expect_success 'blame coalesce' '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success 'blame does not coalesce non-adjacent result lines' '
> +	cat >expect <<-EOF &&
> +	$orig 1) ABC
> +	$orig 3) DEF
> +	EOF
> +	git blame --no-abbrev -s -L1,1 -L3,3 $split giraffe >actual &&
> +	test_cmp expect actual
> +'
> +
>  test_done

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/3] t8003: check output of coalesced blame
  2020-08-13  5:23             ` [PATCH 1/3] t8003: check output of coalesced blame Jeff King
@ 2020-08-13 17:04               ` Junio C Hamano
  2020-08-13 17:22                 ` Jeff King
  0 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2020-08-13 17:04 UTC (permalink / raw)
  To: Jeff King; +Cc: Barret Rhoden, Nuthan Munaiah, git

Jeff King <peff@peff.net> writes:

> Commit f0cbe742f4 (blame: add a test to cover blame_coalesce(),
> 2019-06-20) added a test case where blame can usefully coalesce two
> groups of lines. But since it relies on the normal blame output, it only
> exercises the code and can't tell whether the lines were actually
> joined into a single group.
>
> However, by using --porcelain output, we can see how git-blame considers
> the groupings (and likewise how the coalescing might have a real
> user-visible impact for a tool that uses the porcelain-output
> groupings). This lets us confirm that we are indeed coalescing correctly
> (and the fact that this test case requires coalescing can be verified by
> dropping the call to blame_coalesce(), causing the test to fail).
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  t/t8003-blame-corner-cases.sh | 11 ++++-------
>  1 file changed, 4 insertions(+), 7 deletions(-)
>
> diff --git a/t/t8003-blame-corner-cases.sh b/t/t8003-blame-corner-cases.sh
> index b871dd4f86..1e89494ef6 100755
> --- a/t/t8003-blame-corner-cases.sh
> +++ b/t/t8003-blame-corner-cases.sh
> @@ -273,10 +273,6 @@ test_expect_success 'blame file with CRLF core.autocrlf=true' '
>  	grep "A U Thor" actual
>  '
>  
> -# Tests the splitting and merging of blame entries in blame_coalesce().
> -# The output of blame is the same, regardless of whether blame_coalesce() runs
> -# or not, so we'd likely only notice a problem if blame crashes or assigned
> -# blame to the "splitting" commit ('SPLIT' below).
>  test_expect_success 'blame coalesce' '
>  	cat >giraffe <<-\EOF &&
>  	ABC
> @@ -302,10 +298,11 @@ test_expect_success 'blame coalesce' '
>  	git commit -m "same contents as original" &&
>  
>  	cat >expect <<-EOF &&
> -	$oid 1) ABC
> -	$oid 2) DEF
> +	$oid 1 1 2
> +	$oid 2 2
>  	EOF

It has become a bit harder to grok, but for the purpose of the later
steps to see where things exactly came from (including their line
numbers), it is easier to see what is going on with the new format.

> -	git -c core.abbrev=$(test_oid hexsz) blame -s giraffe >actual &&
> +	git blame --porcelain giraffe >actual.raw &&

The original forced the abbrev length; by switching to the format
for porcelain-writers, we know we will get the full object name.  OK.

> +	grep "^$oid" actual.raw >actual &&
>  	test_cmp expect actual
>  '

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/3] t8003: check output of coalesced blame
  2020-08-13 17:04               ` Junio C Hamano
@ 2020-08-13 17:22                 ` Jeff King
  0 siblings, 0 replies; 16+ messages in thread
From: Jeff King @ 2020-08-13 17:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Barret Rhoden, Nuthan Munaiah, git

On Thu, Aug 13, 2020 at 10:04:51AM -0700, Junio C Hamano wrote:

> >  	cat >expect <<-EOF &&
> > -	$oid 1) ABC
> > -	$oid 2) DEF
> > +	$oid 1 1 2
> > +	$oid 2 2
> >  	EOF
> 
> It has become a bit harder to grok, but for the purpose of the later
> steps to see where things exactly came from (including their line
> numbers), it is easier to see what is going on with the new format.

Yeah, I agree the numbers are a bit more inscrutable. We could do
something like:

diff --git a/t/t8003-blame-corner-cases.sh b/t/t8003-blame-corner-cases.sh
index ba8013b002..9486888e5a 100755
--- a/t/t8003-blame-corner-cases.sh
+++ b/t/t8003-blame-corner-cases.sh
@@ -301,12 +301,14 @@ test_expect_success 'setup coalesce tests' '
 '
 
 test_expect_success 'blame coalesce' '
-	cat >expect <<-EOF &&
+	q_to_tab >expect <<-EOF &&
 	$orig 1 1 2
+	QABC
 	$orig 2 2
+	QDEF
 	EOF
 	git blame --porcelain $final giraffe >actual.raw &&
-	grep "^$orig" actual.raw >actual &&
+	egrep "^($orig|	)" actual.raw >actual &&
 	test_cmp expect actual
 '

but IMHO that isn't much more readable (largely due to the
tab-handling in the here-doc).

-Peff

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] blame: only coalesce lines that are adjacent in result
  2020-08-13  5:25             ` [PATCH 3/3] blame: only coalesce lines that are adjacent in result Jeff King
  2020-08-13 16:59               ` Junio C Hamano
@ 2020-08-13 18:41               ` Junio C Hamano
  1 sibling, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2020-08-13 18:41 UTC (permalink / raw)
  To: Jeff King; +Cc: Barret Rhoden, Nuthan Munaiah, git

Jeff King <peff@peff.net> writes:

> +test_expect_success 'blame does not coalesce non-adjacent result lines' '
> +	cat >expect <<-EOF &&
> +	$orig 1) ABC
> +	$orig 3) DEF
> +	EOF
> +	git blame --no-abbrev -s -L1,1 -L3,3 $split giraffe >actual &&

I like this much better than "git -c core.abbrev=40", too.  Good.

> +	test_cmp expect actual
> +'
> +
>  test_done

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2020-08-13 18:42 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-07  1:05 `git blame` Line Number Off-by-one Nuthan Munaiah
2020-08-07 21:21 ` Jeff King
2020-08-07 21:33   ` Jeff King
2020-08-07 21:45     ` Jeff King
2020-08-07 22:05     ` Junio C Hamano
2020-08-07 22:26       ` Jeff King
2020-08-07 22:35         ` Jeff King
2020-08-13  5:20           ` [PATCH 0/3] blame: fix bug in coalescing non-adjacent "-L" chunks Jeff King
2020-08-13  5:23             ` [PATCH 1/3] t8003: check output of coalesced blame Jeff King
2020-08-13 17:04               ` Junio C Hamano
2020-08-13 17:22                 ` Jeff King
2020-08-13  5:23             ` [PATCH 2/3] t8003: factor setup out of coalesce test Jeff King
2020-08-13  5:25             ` [PATCH 3/3] blame: only coalesce lines that are adjacent in result Jeff King
2020-08-13 16:59               ` Junio C Hamano
2020-08-13 18:41               ` Junio C Hamano
2020-08-13 12:25             ` [PATCH 0/3] blame: fix bug in coalescing non-adjacent "-L" chunks Barret Rhoden

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).