git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Lines missing from git diff-tree -p -c output?
@ 2013-05-15 14:35 Matthijs Kooijman
  2013-05-15 15:46 ` Matthijs Kooijman
  2013-05-15 17:17 ` Junio C Hamano
  0 siblings, 2 replies; 8+ messages in thread
From: Matthijs Kooijman @ 2013-05-15 14:35 UTC (permalink / raw)
  To: git

Hi folks,

while trying to parse git diff-tree output, I found out that in some
cases it appears to generate an incorrect diff (AFAICT). I orginally
found this in a 5-way merge commit in the Linux kernel, but managed to
reduce this to something a lot more managable (an ordinary 2-way merge
on a 6-line file).

To start with the wrong-ness, this is the diff generated:

$ git diff-tree -p -c HEAD
d945a51b6ca22e6e8e550c53980d026f11b05158
diff --combined file
index 3404f54,0eab113..e8c8c18
--- a/file
+++ b/file
@@@ -1,7 -1,5 +1,6 @@@
 +LEFT
  BASE2
  BASE3
  BASE4
- BASE5
+ BASE5MODIFIED
  BASE6

Here, the header claims that the first head has 7 lines, but there really are
only 6 (5 lines of context and one delete line). The numbers for the others
heads are incorrect. In the original diff, the difference was bigger
(first head was stated to have 28 lines, while the output was similar to
the above).

To find out what's going on, we can look at the -m output, which is
correct (or look at the original file contents at the end of this mail).

$ git diff-tree -m -p HEAD
d945a51b6ca22e6e8e550c53980d026f11b05158
diff --git a/file b/file
index 3404f54..e8c8c18 100644
--- a/file
+++ b/file
@@ -1,7 +1,6 @@
 LEFT
-BASE1
 BASE2
 BASE3
 BASE4
-BASE5
+BASE5MODIFIED
 BASE6
d945a51b6ca22e6e8e550c53980d026f11b05158
diff --git a/file b/file
index 0eab113..e8c8c18 100644
--- a/file
+++ b/file
@@ -1,3 +1,4 @@
+LEFT
 BASE2
 BASE3
 BASE4

As you can see here, first head added "LEFT", and the second head removed
"BASE1" and modified "BASE5". In the -c diff-tree output above, this removal of
"BASE1" is not shown, but it is counted in the number of lines, causing this
breakage.


Note that to trigger this behaviour, the number of context lines between the
BASE1 and BASE5 must be _exactly_ 3, more or less prevents this bug from
occuring. Also, the "LEFT" line introduced does not seem to be
essential, but there needed to be some change from both sides in order
to generate a diff at all.

I haven't looked into the code, though I might give that a go later.
Anyone got any clue why this is happening? Is this really a bug, or am I
misunderstanding here?

To recreate the above situation, you can use the following commands:

git init
cat > file <<EOF
BASE1
BASE2
BASE3
BASE4
BASE5
BASE6
EOF
git add file
git commit -m BASE
git checkout -b RIGHT
cat > file <<EOF
BASE2
BASE3
BASE4
BASE5MODIFIED
BASE6
EOF
git commit -m RIGHT file
git checkout -b LEFT master
cat > file <<EOF
LEFT
BASE1
BASE2
BASE3
BASE4
BASE5
BASE6
EOF
git commit -m LEFT file
git merge RIGHT
cat > file <<EOF
LEFT
BASE2
BASE3
BASE4
BASE5MODIFIED
BASE6
EOF
git add file
git commit --no-edit
git diff-tree -p -c HEAD


Gr.

Matthijs

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: Lines missing from git diff-tree -p -c output?
  2013-05-15 14:35 Lines missing from git diff-tree -p -c output? Matthijs Kooijman
@ 2013-05-15 15:46 ` Matthijs Kooijman
  2013-05-15 17:17 ` Junio C Hamano
  1 sibling, 0 replies; 8+ messages in thread
From: Matthijs Kooijman @ 2013-05-15 15:46 UTC (permalink / raw)
  To: git

Hi folks,

> $ git diff-tree -p -c HEAD
> d945a51b6ca22e6e8e550c53980d026f11b05158
> diff --combined file
> index 3404f54,0eab113..e8c8c18
> --- a/file
> +++ b/file
> @@@ -1,7 -1,5 +1,6 @@@
>  +LEFT
>   BASE2
>   BASE3
>   BASE4
> - BASE5
> + BASE5MODIFIED
>   BASE6

I found the spot in the code where this is going wrong, there is an
incorrectly set "no_pre_delete" flag for the context lines before each
hunk. Since a patch says more than a thousand words, here's what I think
will fix this problem:

diff --git a/combine-diff.c b/combine-diff.c
index 77d7872..d36bfcf 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -518,8 +518,11 @@ static int give_context(struct sline *sline, unsigned long cnt, int num_parent)
                unsigned long k;
 
                /* Paint a few lines before the first interesting line. */
-               while (j < i)
-                       sline[j++].flag |= mark | no_pre_delete;
+               while (j < i) {
+                       if (!(sline[j++].flag & mark))
+                               sline[j++].flag |= no_pre_delete;
+                       sline[j++].flag |= mark;
+               }
 
        again:
                /* we know up to i is to be included.  where does the

I'll see if I can write up a testcase and then submit this as a proper
patch, but I wanted to at least send this over now lest someone wastes
time coming to the same conclusion as I did.

Gr.

Matthijs

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: Lines missing from git diff-tree -p -c output?
  2013-05-15 14:35 Lines missing from git diff-tree -p -c output? Matthijs Kooijman
  2013-05-15 15:46 ` Matthijs Kooijman
@ 2013-05-15 17:17 ` Junio C Hamano
  2013-05-15 17:33   ` Matthijs Kooijman
  1 sibling, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2013-05-15 17:17 UTC (permalink / raw)
  To: Matthijs Kooijman; +Cc: git

Matthijs Kooijman <matthijs@stdin.nl> writes:

> $ git diff-tree -p -c HEAD
> d945a51b6ca22e6e8e550c53980d026f11b05158
> diff --combined file
> index 3404f54,0eab113..e8c8c18
> --- a/file
> +++ b/file
> @@@ -1,7 -1,5 +1,6 @@@
>  +LEFT
>   BASE2
>   BASE3
>   BASE4
> - BASE5
> + BASE5MODIFIED
>   BASE6
>
> Here, the header claims that the first head has 7 lines, but there really are
> only 6 (5 lines of context and one delete line). The numbers for the others
> heads are incorrect. In the original diff, the difference was bigger
> (first head was stated to have 28 lines, while the output was similar to
> the above).

The count and the output does look inconsistent.  The hunk header
claims that it is showing:

 - range 1,7 for the first parent but it should be 1,5 (2, 3, 4, 5 and 6) 
   to match the output.
 - range 1,5 for the second parent (left, 2, 3, 4, 5mod, and 6 -- correct)
 - range 1,6 for the result (left, 2, 3, 4, 5mod and 6 -- correct)

If we resurrect the loss of "BASE1" from the output, then the
output should have shown:

  +LEFT
 - BASE1
   BASE2
   BASE3
   BASE4
 - BASE5
 + BASE5MODIFIED
   BASE6

which means the numbers shown for the first parent (1, 2, 3, 4, 5
and 6) should be 1,6.

> Note that to trigger this behaviour, the number of context lines between the
> BASE1 and BASE5 must be _exactly_ 3, more or less prevents this bug from
> occuring.

I think the coalescing of two adjacent hunks into one is painting
leading lines "interesting to show context but not worth showing
deletion before it" incorrectly.

Does this patch fix the issue?

 combine-diff.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/combine-diff.c b/combine-diff.c
index 77d7872..7359b84 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -533,7 +533,7 @@ static int give_context(struct sline *sline, unsigned long cnt, int num_parent)
 		k = find_next(sline, mark, j, cnt, 0);
 		j = adjust_hunk_tail(sline, all_mask, i, j);
 
-		if (k < j + context) {
+		if (k <= j + context) {
 			/* k is interesting and [j,k) are not, but
 			 * paint them interesting because the gap is small.
 			 */

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: Lines missing from git diff-tree -p -c output?
  2013-05-15 17:17 ` Junio C Hamano
@ 2013-05-15 17:33   ` Matthijs Kooijman
  2013-05-15 17:42     ` [PATCH] combine-diff.c: Fix output when changes are exactly 3 lines apart Matthijs Kooijman
  2013-05-15 17:48     ` Lines missing from git diff-tree -p -c output? Junio C Hamano
  0 siblings, 2 replies; 8+ messages in thread
From: Matthijs Kooijman @ 2013-05-15 17:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Hi Junio,

> I think the coalescing of two adjacent hunks into one is painting
> leading lines "interesting to show context but not worth showing
> deletion before it" incorrectly.
Yup, that seems to be the case.

> Does this patch fix the issue?

Yes, it fixes the issue. However, I think that this patch actually hides
the real problem (in a way that will always work with the current code,
though).

I had come up with a different fix myself (similar to the one I sent to
the list as a followup, but that one still had a bug), which I think
might be better. In any case, it includes a testcase for this bug which
seems good to include.

I'll send my patch as a followup in a minute, feel free to use it
entirely or only partially.

Gr.

Matthijs

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] combine-diff.c: Fix output when changes are exactly 3 lines apart
  2013-05-15 17:33   ` Matthijs Kooijman
@ 2013-05-15 17:42     ` Matthijs Kooijman
  2013-05-15 17:48     ` Lines missing from git diff-tree -p -c output? Junio C Hamano
  1 sibling, 0 replies; 8+ messages in thread
From: Matthijs Kooijman @ 2013-05-15 17:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Matthijs Kooijman

When a deletion is followed by exactly 3 (or whatever the number of
context lines) unchanged lines, followed by another change, the combined
diff output would hide the first deletion, resulting in a malformed
diff.

This happened because the 3 lines before each change are painted
interesting, but also marked as no_pre_delete to prevent showing deletes
that were previously marked as uninteresting. This behaviour was
introduced in c86fbe53 (diff -c/--cc: do not include uninteresting
deletion before leading context). However, as a side effect, this could
also mark deletes that were already interesting as no_pre_delete. This
would happen only if the delete was exactly 3 lines away from the next
change, since lines farther away would not be touched by the "paint
three lines before the change" code and lines closer would be painted
by the "merge two adjacent hunks" code instead, which does not set the
no_pre_delete flag.

This commit fixes this problem by only setting the no_pre_delete flag
for changes that were previously uninteresting.

Signed-off-by: Matthijs Kooijman <matthijs@stdin.nl>
---
 combine-diff.c           |  7 +++++--
 t/t4038-diff-combined.sh | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/combine-diff.c b/combine-diff.c
index 77d7872..3e8bb17 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -518,8 +518,11 @@ static int give_context(struct sline *sline, unsigned long cnt, int num_parent)
 		unsigned long k;
 
 		/* Paint a few lines before the first interesting line. */
-		while (j < i)
-			sline[j++].flag |= mark | no_pre_delete;
+		while (j < i) {
+			if (!(sline[j].flag & mark))
+				sline[j].flag |= no_pre_delete;
+			sline[j++].flag |= mark;
+		}
 
 	again:
 		/* we know up to i is to be included.  where does the
diff --git a/t/t4038-diff-combined.sh b/t/t4038-diff-combined.sh
index 1261dbb..a23ca7e 100755
--- a/t/t4038-diff-combined.sh
+++ b/t/t4038-diff-combined.sh
@@ -353,4 +353,51 @@ test_expect_failure 'combine diff coalesce three parents' '
 	compare_diff_patch expected actual
 '
 
+# Test for a bug reported at
+# http://thread.gmane.org/gmane.comp.version-control.git/224410
+# where a delete lines were missing from combined diff output when they
+# occurred exactly before the context lines of a later change.
+test_expect_success 'combine diff missing delete bug' '
+	git commit -m initial --allow-empty &&
+	cat <<-\EOF >test &&
+	1
+	2
+	3
+	4
+	EOF
+	git add test
+	git commit -a -m side1 &&
+	git checkout -B side1 &&
+	git checkout HEAD^ &&
+	cat <<-\EOF >test &&
+	0
+	1
+	2
+	3
+	4modified
+	EOF
+	git commit -a -m side2 &&
+	git branch -f side2 &&
+	test_must_fail git merge --no-commit side1 &&
+	cat <<-\EOF >test &&
+	1
+	2
+	3
+	4modified
+	EOF
+	git add test &&
+	git commit -a -m merge &&
+	git diff-tree -c -p HEAD >actual.tmp &&
+	sed -e "1,/^@@@/d" < actual.tmp >actual &&
+	tr -d Q <<-\EOF >expected &&
+	- 0
+	  1
+	  2
+	  3
+	 -4
+	 +4modified
+	EOF
+	compare_diff_patch expected actual
+'
+
 test_done
-- 
1.8.3.rc1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: Lines missing from git diff-tree -p -c output?
  2013-05-15 17:33   ` Matthijs Kooijman
  2013-05-15 17:42     ` [PATCH] combine-diff.c: Fix output when changes are exactly 3 lines apart Matthijs Kooijman
@ 2013-05-15 17:48     ` Junio C Hamano
  2013-05-15 18:17       ` Matthijs Kooijman
  1 sibling, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2013-05-15 17:48 UTC (permalink / raw)
  To: Matthijs Kooijman; +Cc: git

Matthijs Kooijman <matthijs@stdin.nl> writes:

> Hi Junio,
>
>> I think the coalescing of two adjacent hunks into one is painting
>> leading lines "interesting to show context but not worth showing
>> deletion before it" incorrectly.
> Yup, that seems to be the case.
>
>> Does this patch fix the issue?
>
> Yes, it fixes the issue. However, I think that this patch actually hides
> the real problem (in a way that will always work with the current code,
> though).

Could you explain why you think it hides the real problem, and what
kind of future enhancement may break it?

This is *not* my usual rhetorical question "Please explain yourself,
because I think you are wrong", but is "I do not understand the
reasoning behind your statement, and I (and the reasoning behind my
patch) must be missing something important, so please enlighten me
by pointing out where I am wrong, so that I won't stick to my flawed
patch".

The painting with no_pre_delete is applied when we extend the common
context back to lines we _know_ otherwise not worth showing (because
there is no difference) only because we want to show them as the
context lines and we do not need to show deletions that come before
these common context.  By forcing (k == j + context) case, that is,
there are exactly "context" number of lines between the end of the
current hunk and the next hunk, which the old code would have showed
"context" lines at the beginning of the next hunk, to go back to the
"again" label, we are coalescing the two hunks that _should_ have
been shown together anyway, without painting the context lines
incorrectly with "before this line, do not show deletion" mark.

> I had come up with a different fix myself (similar to the one I sent to
> the list as a followup, but that one still had a bug), which I think
> might be better. In any case, it includes a testcase for this bug which
> seems good to include.
>
> I'll send my patch as a followup in a minute, feel free to use it
> entirely or only partially.
>
> Gr.
>
> Matthijs

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Lines missing from git diff-tree -p -c output?
  2013-05-15 17:48     ` Lines missing from git diff-tree -p -c output? Junio C Hamano
@ 2013-05-15 18:17       ` Matthijs Kooijman
  2013-05-15 19:13         ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: Matthijs Kooijman @ 2013-05-15 18:17 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Hi Junio,

> Could you explain why you think it hides the real problem, and what
> kind of future enhancement may break it?
I think the differences is mostly in the locality of the fix. In my
proposed patch, the no_pre_delete flag is never set on an interesting
line because it is checked in the line before it. In your patch, it
never happens because the control flow guarantees the "context" lines
before each change must be uninteresting.

The net effect is of course identical, but I'm arguing that depending on
the control flow and some code a doze lines down is easier to break than
depending on a previous line.

Having said that: I'm not sure if the difference is significant enough
to convince me in either direction.



However, thinking about this a bit more (and getting sidetracked on a
completely separate issue/question), I wonder why the coalescing-hunks
code is there in the first place? e.g., why not leave out these lines?

	if (k < j + context) {
		/* k is interesting and [j,k) are not, but
		 * paint them interesting because the gap is small.
		 */
		while (j < k)
			sline[j++].flag |= mark;
		i = k;
		goto again;
	}

If the "context" lines before and after each group of changes are
painted interesting, then these lines in between will also be painted
interesting. Of course, this could cause some lines to be painted as
interesting twice and it needs my fix for the no_pre_delete thing, but
it would work just as well?

However, I can imagine that this code is present to prevent painting
lines twice, which would of course be a bit of a performance loss. But
if this really was the motivation, why is the first if not something
like:

	if (k <= j + 2 * context) {

Since IIUC, the current code can still paint a few context lines twice
when they are exacly "context" lines apart, once by the "paint before"
and one by the "paint after" code (which is also what happens in my bug
example, I think). The above should "fix" that as well (the first part
of the test suite hasn't complained so far).

Gr.

Matthijs

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Lines missing from git diff-tree -p -c output?
  2013-05-15 18:17       ` Matthijs Kooijman
@ 2013-05-15 19:13         ` Junio C Hamano
  0 siblings, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2013-05-15 19:13 UTC (permalink / raw)
  To: Matthijs Kooijman; +Cc: git

Matthijs Kooijman <matthijs@stdin.nl> writes:

>> Could you explain why you think it hides the real problem, and what
>> kind of future enhancement may break it?
> I think the differences is mostly in the locality of the fix. In my
> proposed patch, the no_pre_delete flag is never set on an interesting
> line because it is checked in the line before it. In your patch, it
> never happens because the control flow guarantees the "context" lines
> before each change must be uninteresting.
>
> The net effect is of course identical, but I'm arguing that depending on
> the control flow and some code a doze lines down is easier to break than
> depending on a previous line.

Yeah, that sounds like a reasonable reasoning.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2013-05-15 19:13 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-05-15 14:35 Lines missing from git diff-tree -p -c output? Matthijs Kooijman
2013-05-15 15:46 ` Matthijs Kooijman
2013-05-15 17:17 ` Junio C Hamano
2013-05-15 17:33   ` Matthijs Kooijman
2013-05-15 17:42     ` [PATCH] combine-diff.c: Fix output when changes are exactly 3 lines apart Matthijs Kooijman
2013-05-15 17:48     ` Lines missing from git diff-tree -p -c output? Junio C Hamano
2013-05-15 18:17       ` Matthijs Kooijman
2013-05-15 19:13         ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).