* Lines missing from git diff-tree -p -c output? @ 2013-05-15 14:35 Matthijs Kooijman 2013-05-15 15:46 ` Matthijs Kooijman 2013-05-15 17:17 ` Junio C Hamano 0 siblings, 2 replies; 8+ messages in thread From: Matthijs Kooijman @ 2013-05-15 14:35 UTC (permalink / raw) To: git Hi folks, while trying to parse git diff-tree output, I found out that in some cases it appears to generate an incorrect diff (AFAICT). I orginally found this in a 5-way merge commit in the Linux kernel, but managed to reduce this to something a lot more managable (an ordinary 2-way merge on a 6-line file). To start with the wrong-ness, this is the diff generated: $ git diff-tree -p -c HEAD d945a51b6ca22e6e8e550c53980d026f11b05158 diff --combined file index 3404f54,0eab113..e8c8c18 --- a/file +++ b/file @@@ -1,7 -1,5 +1,6 @@@ +LEFT BASE2 BASE3 BASE4 - BASE5 + BASE5MODIFIED BASE6 Here, the header claims that the first head has 7 lines, but there really are only 6 (5 lines of context and one delete line). The numbers for the others heads are incorrect. In the original diff, the difference was bigger (first head was stated to have 28 lines, while the output was similar to the above). To find out what's going on, we can look at the -m output, which is correct (or look at the original file contents at the end of this mail). $ git diff-tree -m -p HEAD d945a51b6ca22e6e8e550c53980d026f11b05158 diff --git a/file b/file index 3404f54..e8c8c18 100644 --- a/file +++ b/file @@ -1,7 +1,6 @@ LEFT -BASE1 BASE2 BASE3 BASE4 -BASE5 +BASE5MODIFIED BASE6 d945a51b6ca22e6e8e550c53980d026f11b05158 diff --git a/file b/file index 0eab113..e8c8c18 100644 --- a/file +++ b/file @@ -1,3 +1,4 @@ +LEFT BASE2 BASE3 BASE4 As you can see here, first head added "LEFT", and the second head removed "BASE1" and modified "BASE5". In the -c diff-tree output above, this removal of "BASE1" is not shown, but it is counted in the number of lines, causing this breakage. Note that to trigger this behaviour, the number of context lines between the BASE1 and BASE5 must be _exactly_ 3, more or less prevents this bug from occuring. Also, the "LEFT" line introduced does not seem to be essential, but there needed to be some change from both sides in order to generate a diff at all. I haven't looked into the code, though I might give that a go later. Anyone got any clue why this is happening? Is this really a bug, or am I misunderstanding here? To recreate the above situation, you can use the following commands: git init cat > file <<EOF BASE1 BASE2 BASE3 BASE4 BASE5 BASE6 EOF git add file git commit -m BASE git checkout -b RIGHT cat > file <<EOF BASE2 BASE3 BASE4 BASE5MODIFIED BASE6 EOF git commit -m RIGHT file git checkout -b LEFT master cat > file <<EOF LEFT BASE1 BASE2 BASE3 BASE4 BASE5 BASE6 EOF git commit -m LEFT file git merge RIGHT cat > file <<EOF LEFT BASE2 BASE3 BASE4 BASE5MODIFIED BASE6 EOF git add file git commit --no-edit git diff-tree -p -c HEAD Gr. Matthijs ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: Lines missing from git diff-tree -p -c output? 2013-05-15 14:35 Lines missing from git diff-tree -p -c output? Matthijs Kooijman @ 2013-05-15 15:46 ` Matthijs Kooijman 2013-05-15 17:17 ` Junio C Hamano 1 sibling, 0 replies; 8+ messages in thread From: Matthijs Kooijman @ 2013-05-15 15:46 UTC (permalink / raw) To: git Hi folks, > $ git diff-tree -p -c HEAD > d945a51b6ca22e6e8e550c53980d026f11b05158 > diff --combined file > index 3404f54,0eab113..e8c8c18 > --- a/file > +++ b/file > @@@ -1,7 -1,5 +1,6 @@@ > +LEFT > BASE2 > BASE3 > BASE4 > - BASE5 > + BASE5MODIFIED > BASE6 I found the spot in the code where this is going wrong, there is an incorrectly set "no_pre_delete" flag for the context lines before each hunk. Since a patch says more than a thousand words, here's what I think will fix this problem: diff --git a/combine-diff.c b/combine-diff.c index 77d7872..d36bfcf 100644 --- a/combine-diff.c +++ b/combine-diff.c @@ -518,8 +518,11 @@ static int give_context(struct sline *sline, unsigned long cnt, int num_parent) unsigned long k; /* Paint a few lines before the first interesting line. */ - while (j < i) - sline[j++].flag |= mark | no_pre_delete; + while (j < i) { + if (!(sline[j++].flag & mark)) + sline[j++].flag |= no_pre_delete; + sline[j++].flag |= mark; + } again: /* we know up to i is to be included. where does the I'll see if I can write up a testcase and then submit this as a proper patch, but I wanted to at least send this over now lest someone wastes time coming to the same conclusion as I did. Gr. Matthijs ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: Lines missing from git diff-tree -p -c output? 2013-05-15 14:35 Lines missing from git diff-tree -p -c output? Matthijs Kooijman 2013-05-15 15:46 ` Matthijs Kooijman @ 2013-05-15 17:17 ` Junio C Hamano 2013-05-15 17:33 ` Matthijs Kooijman 1 sibling, 1 reply; 8+ messages in thread From: Junio C Hamano @ 2013-05-15 17:17 UTC (permalink / raw) To: Matthijs Kooijman; +Cc: git Matthijs Kooijman <matthijs@stdin.nl> writes: > $ git diff-tree -p -c HEAD > d945a51b6ca22e6e8e550c53980d026f11b05158 > diff --combined file > index 3404f54,0eab113..e8c8c18 > --- a/file > +++ b/file > @@@ -1,7 -1,5 +1,6 @@@ > +LEFT > BASE2 > BASE3 > BASE4 > - BASE5 > + BASE5MODIFIED > BASE6 > > Here, the header claims that the first head has 7 lines, but there really are > only 6 (5 lines of context and one delete line). The numbers for the others > heads are incorrect. In the original diff, the difference was bigger > (first head was stated to have 28 lines, while the output was similar to > the above). The count and the output does look inconsistent. The hunk header claims that it is showing: - range 1,7 for the first parent but it should be 1,5 (2, 3, 4, 5 and 6) to match the output. - range 1,5 for the second parent (left, 2, 3, 4, 5mod, and 6 -- correct) - range 1,6 for the result (left, 2, 3, 4, 5mod and 6 -- correct) If we resurrect the loss of "BASE1" from the output, then the output should have shown: +LEFT - BASE1 BASE2 BASE3 BASE4 - BASE5 + BASE5MODIFIED BASE6 which means the numbers shown for the first parent (1, 2, 3, 4, 5 and 6) should be 1,6. > Note that to trigger this behaviour, the number of context lines between the > BASE1 and BASE5 must be _exactly_ 3, more or less prevents this bug from > occuring. I think the coalescing of two adjacent hunks into one is painting leading lines "interesting to show context but not worth showing deletion before it" incorrectly. Does this patch fix the issue? combine-diff.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/combine-diff.c b/combine-diff.c index 77d7872..7359b84 100644 --- a/combine-diff.c +++ b/combine-diff.c @@ -533,7 +533,7 @@ static int give_context(struct sline *sline, unsigned long cnt, int num_parent) k = find_next(sline, mark, j, cnt, 0); j = adjust_hunk_tail(sline, all_mask, i, j); - if (k < j + context) { + if (k <= j + context) { /* k is interesting and [j,k) are not, but * paint them interesting because the gap is small. */ ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: Lines missing from git diff-tree -p -c output? 2013-05-15 17:17 ` Junio C Hamano @ 2013-05-15 17:33 ` Matthijs Kooijman 2013-05-15 17:42 ` [PATCH] combine-diff.c: Fix output when changes are exactly 3 lines apart Matthijs Kooijman 2013-05-15 17:48 ` Lines missing from git diff-tree -p -c output? Junio C Hamano 0 siblings, 2 replies; 8+ messages in thread From: Matthijs Kooijman @ 2013-05-15 17:33 UTC (permalink / raw) To: Junio C Hamano; +Cc: git Hi Junio, > I think the coalescing of two adjacent hunks into one is painting > leading lines "interesting to show context but not worth showing > deletion before it" incorrectly. Yup, that seems to be the case. > Does this patch fix the issue? Yes, it fixes the issue. However, I think that this patch actually hides the real problem (in a way that will always work with the current code, though). I had come up with a different fix myself (similar to the one I sent to the list as a followup, but that one still had a bug), which I think might be better. In any case, it includes a testcase for this bug which seems good to include. I'll send my patch as a followup in a minute, feel free to use it entirely or only partially. Gr. Matthijs ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH] combine-diff.c: Fix output when changes are exactly 3 lines apart 2013-05-15 17:33 ` Matthijs Kooijman @ 2013-05-15 17:42 ` Matthijs Kooijman 2013-05-15 17:48 ` Lines missing from git diff-tree -p -c output? Junio C Hamano 1 sibling, 0 replies; 8+ messages in thread From: Matthijs Kooijman @ 2013-05-15 17:42 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Matthijs Kooijman When a deletion is followed by exactly 3 (or whatever the number of context lines) unchanged lines, followed by another change, the combined diff output would hide the first deletion, resulting in a malformed diff. This happened because the 3 lines before each change are painted interesting, but also marked as no_pre_delete to prevent showing deletes that were previously marked as uninteresting. This behaviour was introduced in c86fbe53 (diff -c/--cc: do not include uninteresting deletion before leading context). However, as a side effect, this could also mark deletes that were already interesting as no_pre_delete. This would happen only if the delete was exactly 3 lines away from the next change, since lines farther away would not be touched by the "paint three lines before the change" code and lines closer would be painted by the "merge two adjacent hunks" code instead, which does not set the no_pre_delete flag. This commit fixes this problem by only setting the no_pre_delete flag for changes that were previously uninteresting. Signed-off-by: Matthijs Kooijman <matthijs@stdin.nl> --- combine-diff.c | 7 +++++-- t/t4038-diff-combined.sh | 47 +++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 52 insertions(+), 2 deletions(-) diff --git a/combine-diff.c b/combine-diff.c index 77d7872..3e8bb17 100644 --- a/combine-diff.c +++ b/combine-diff.c @@ -518,8 +518,11 @@ static int give_context(struct sline *sline, unsigned long cnt, int num_parent) unsigned long k; /* Paint a few lines before the first interesting line. */ - while (j < i) - sline[j++].flag |= mark | no_pre_delete; + while (j < i) { + if (!(sline[j].flag & mark)) + sline[j].flag |= no_pre_delete; + sline[j++].flag |= mark; + } again: /* we know up to i is to be included. where does the diff --git a/t/t4038-diff-combined.sh b/t/t4038-diff-combined.sh index 1261dbb..a23ca7e 100755 --- a/t/t4038-diff-combined.sh +++ b/t/t4038-diff-combined.sh @@ -353,4 +353,51 @@ test_expect_failure 'combine diff coalesce three parents' ' compare_diff_patch expected actual ' +# Test for a bug reported at +# http://thread.gmane.org/gmane.comp.version-control.git/224410 +# where a delete lines were missing from combined diff output when they +# occurred exactly before the context lines of a later change. +test_expect_success 'combine diff missing delete bug' ' + git commit -m initial --allow-empty && + cat <<-\EOF >test && + 1 + 2 + 3 + 4 + EOF + git add test + git commit -a -m side1 && + git checkout -B side1 && + git checkout HEAD^ && + cat <<-\EOF >test && + 0 + 1 + 2 + 3 + 4modified + EOF + git commit -a -m side2 && + git branch -f side2 && + test_must_fail git merge --no-commit side1 && + cat <<-\EOF >test && + 1 + 2 + 3 + 4modified + EOF + git add test && + git commit -a -m merge && + git diff-tree -c -p HEAD >actual.tmp && + sed -e "1,/^@@@/d" < actual.tmp >actual && + tr -d Q <<-\EOF >expected && + - 0 + 1 + 2 + 3 + -4 + +4modified + EOF + compare_diff_patch expected actual +' + test_done -- 1.8.3.rc1 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: Lines missing from git diff-tree -p -c output? 2013-05-15 17:33 ` Matthijs Kooijman 2013-05-15 17:42 ` [PATCH] combine-diff.c: Fix output when changes are exactly 3 lines apart Matthijs Kooijman @ 2013-05-15 17:48 ` Junio C Hamano 2013-05-15 18:17 ` Matthijs Kooijman 1 sibling, 1 reply; 8+ messages in thread From: Junio C Hamano @ 2013-05-15 17:48 UTC (permalink / raw) To: Matthijs Kooijman; +Cc: git Matthijs Kooijman <matthijs@stdin.nl> writes: > Hi Junio, > >> I think the coalescing of two adjacent hunks into one is painting >> leading lines "interesting to show context but not worth showing >> deletion before it" incorrectly. > Yup, that seems to be the case. > >> Does this patch fix the issue? > > Yes, it fixes the issue. However, I think that this patch actually hides > the real problem (in a way that will always work with the current code, > though). Could you explain why you think it hides the real problem, and what kind of future enhancement may break it? This is *not* my usual rhetorical question "Please explain yourself, because I think you are wrong", but is "I do not understand the reasoning behind your statement, and I (and the reasoning behind my patch) must be missing something important, so please enlighten me by pointing out where I am wrong, so that I won't stick to my flawed patch". The painting with no_pre_delete is applied when we extend the common context back to lines we _know_ otherwise not worth showing (because there is no difference) only because we want to show them as the context lines and we do not need to show deletions that come before these common context. By forcing (k == j + context) case, that is, there are exactly "context" number of lines between the end of the current hunk and the next hunk, which the old code would have showed "context" lines at the beginning of the next hunk, to go back to the "again" label, we are coalescing the two hunks that _should_ have been shown together anyway, without painting the context lines incorrectly with "before this line, do not show deletion" mark. > I had come up with a different fix myself (similar to the one I sent to > the list as a followup, but that one still had a bug), which I think > might be better. In any case, it includes a testcase for this bug which > seems good to include. > > I'll send my patch as a followup in a minute, feel free to use it > entirely or only partially. > > Gr. > > Matthijs ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Lines missing from git diff-tree -p -c output? 2013-05-15 17:48 ` Lines missing from git diff-tree -p -c output? Junio C Hamano @ 2013-05-15 18:17 ` Matthijs Kooijman 2013-05-15 19:13 ` Junio C Hamano 0 siblings, 1 reply; 8+ messages in thread From: Matthijs Kooijman @ 2013-05-15 18:17 UTC (permalink / raw) To: Junio C Hamano; +Cc: git Hi Junio, > Could you explain why you think it hides the real problem, and what > kind of future enhancement may break it? I think the differences is mostly in the locality of the fix. In my proposed patch, the no_pre_delete flag is never set on an interesting line because it is checked in the line before it. In your patch, it never happens because the control flow guarantees the "context" lines before each change must be uninteresting. The net effect is of course identical, but I'm arguing that depending on the control flow and some code a doze lines down is easier to break than depending on a previous line. Having said that: I'm not sure if the difference is significant enough to convince me in either direction. However, thinking about this a bit more (and getting sidetracked on a completely separate issue/question), I wonder why the coalescing-hunks code is there in the first place? e.g., why not leave out these lines? if (k < j + context) { /* k is interesting and [j,k) are not, but * paint them interesting because the gap is small. */ while (j < k) sline[j++].flag |= mark; i = k; goto again; } If the "context" lines before and after each group of changes are painted interesting, then these lines in between will also be painted interesting. Of course, this could cause some lines to be painted as interesting twice and it needs my fix for the no_pre_delete thing, but it would work just as well? However, I can imagine that this code is present to prevent painting lines twice, which would of course be a bit of a performance loss. But if this really was the motivation, why is the first if not something like: if (k <= j + 2 * context) { Since IIUC, the current code can still paint a few context lines twice when they are exacly "context" lines apart, once by the "paint before" and one by the "paint after" code (which is also what happens in my bug example, I think). The above should "fix" that as well (the first part of the test suite hasn't complained so far). Gr. Matthijs ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Lines missing from git diff-tree -p -c output? 2013-05-15 18:17 ` Matthijs Kooijman @ 2013-05-15 19:13 ` Junio C Hamano 0 siblings, 0 replies; 8+ messages in thread From: Junio C Hamano @ 2013-05-15 19:13 UTC (permalink / raw) To: Matthijs Kooijman; +Cc: git Matthijs Kooijman <matthijs@stdin.nl> writes: >> Could you explain why you think it hides the real problem, and what >> kind of future enhancement may break it? > I think the differences is mostly in the locality of the fix. In my > proposed patch, the no_pre_delete flag is never set on an interesting > line because it is checked in the line before it. In your patch, it > never happens because the control flow guarantees the "context" lines > before each change must be uninteresting. > > The net effect is of course identical, but I'm arguing that depending on > the control flow and some code a doze lines down is easier to break than > depending on a previous line. Yeah, that sounds like a reasonable reasoning. ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2013-05-15 19:13 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2013-05-15 14:35 Lines missing from git diff-tree -p -c output? Matthijs Kooijman 2013-05-15 15:46 ` Matthijs Kooijman 2013-05-15 17:17 ` Junio C Hamano 2013-05-15 17:33 ` Matthijs Kooijman 2013-05-15 17:42 ` [PATCH] combine-diff.c: Fix output when changes are exactly 3 lines apart Matthijs Kooijman 2013-05-15 17:48 ` Lines missing from git diff-tree -p -c output? Junio C Hamano 2013-05-15 18:17 ` Matthijs Kooijman 2013-05-15 19:13 ` Junio C Hamano
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).