* git-rebase produces incorrect output @ 2019-11-29 8:21 Pavel Roskin 2019-11-29 13:31 ` Philip Oakley 2019-11-30 4:22 ` Elijah Newren 0 siblings, 2 replies; 6+ messages in thread From: Pavel Roskin @ 2019-11-29 8:21 UTC (permalink / raw) To: git Hi! I've discovered an issue with "git rebase" producing a subtly incorrect file. In fact, that files even compiled but failed in unit tests! That's so scary that I'm going to stop using "git rebase" for now. Fortunately, "git rebase --merge" is working correctly, so I'll use it. Too bad there is no option to use "--merge" by default. The issue was observed in git 2.23 and reproduced in today's next branch (2.24.0.449.g4c06f74957) on up-to-date Fedora 31 x86_64. I've created a repository that demonstrates the issue: https://github.com/proski/git-rebase-demo The branch names should be self-explanatory. "master" is the base, "branch1" and "branch2" contain one change each. If "branch1" is rebased on top of "branch2", the result is incorrect, saved in the "rebase-bad" branch. If "git rebase -m" is used, the result is correct, saved in the "merge-good" branch. The files in "rebase-bad" and "merge-good" have exactly the same lines but in a different order. Yet the changes on branch1 and branch2 affect non-overlapping parts of the file. There should be no doubt how the merged code should look like. I believe the change on branch2 shifts the lines, so that the first change from branch1 is applies to a place below the intended location, and then git goes back to an earlier line to apply the next hunk. I can imagine that it would do the right thing in case of swapped blocks of code. Yet I have a real life example where it does a very wrong thing. Indeed, "git diff origin/branch2 origin/rebase-bad" and "git diff origin/branch2 origin/merge-good" both produce diffs of 9957 bytes long, different only in the order of the hunks. Another interesting data point - "git rebase --interactive" is working correctly. -- Regards, Pavel Roskin ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: git-rebase produces incorrect output 2019-11-29 8:21 git-rebase produces incorrect output Pavel Roskin @ 2019-11-29 13:31 ` Philip Oakley 2019-11-29 21:28 ` Pavel Roskin 2019-11-30 4:22 ` Elijah Newren 1 sibling, 1 reply; 6+ messages in thread From: Philip Oakley @ 2019-11-29 13:31 UTC (permalink / raw) To: Pavel Roskin, git On 29/11/2019 08:21, Pavel Roskin wrote: > Hi! > > I've discovered an issue with "git rebase" producing a subtly > incorrect file. In fact, that files even compiled but failed in unit > tests! That's so scary that I'm going to stop using "git rebase" for > now. Fortunately, "git rebase --merge" is working correctly, so I'll > use it. Too bad there is no option to use "--merge" by default. > > The issue was observed in git 2.23 and reproduced in today's next > branch (2.24.0.449.g4c06f74957) on up-to-date Fedora 31 x86_64. > > I've created a repository that demonstrates the issue: > https://github.com/proski/git-rebase-demo > > The branch names should be self-explanatory. "master" is the base, > "branch1" and "branch2" contain one change each. If "branch1" is > rebased on top of "branch2", the result is incorrect, saved in the > "rebase-bad" branch. If "git rebase -m" is used, the result is > correct, saved in the "merge-good" branch. > > The files in "rebase-bad" and "merge-good" have exactly the same lines > but in a different order. Yet the changes on branch1 and branch2 > affect non-overlapping parts of the file. There should be no doubt how > the merged code should look like. > > I believe the change on branch2 shifts the lines, so that the first > change from branch1 is applies to a place below the intended location, > and then git goes back to an earlier line to apply the next hunk. I > can imagine that it would do the right thing in case of swapped blocks > of code. Yet I have a real life example where it does a very wrong > thing. > > Indeed, "git diff origin/branch2 origin/rebase-bad" and "git diff > origin/branch2 origin/merge-good" both produce diffs of 9957 bytes > long, different only in the order of the hunks. > > Another interesting data point - "git rebase --interactive" is working > correctly. > Which specific lines is this on? Using the Github compare facility [1], I see multiple changes, some of which are probably just noise from the example. https://github.com/proski/git-rebase-demo/compare/merge-good...rebase-bad Philip [1] https://help.github.com/en/github/committing-changes-to-your-project/comparing-commits-across-time ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: git-rebase produces incorrect output 2019-11-29 13:31 ` Philip Oakley @ 2019-11-29 21:28 ` Pavel Roskin 0 siblings, 0 replies; 6+ messages in thread From: Pavel Roskin @ 2019-11-29 21:28 UTC (permalink / raw) To: Philip Oakley; +Cc: git Hello Philip, I just found out that the true diff on Github should use two dots, not three: https://github.com/proski/git-rebase-demo/compare/merge-good..rebase-bad The first difference is on line 187 in getPullRequests_throws_on_not_found(). The correct change should have "nullValue" (see the diff between "master" and "branch1"), but "rebase-bad" has "SocketException" there instead. [Sorry for the private email that I sent by mistake] Pavel ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: git-rebase produces incorrect output 2019-11-29 8:21 git-rebase produces incorrect output Pavel Roskin 2019-11-29 13:31 ` Philip Oakley @ 2019-11-30 4:22 ` Elijah Newren 2019-11-30 16:37 ` Elijah Newren 2019-11-30 17:58 ` Junio C Hamano 1 sibling, 2 replies; 6+ messages in thread From: Elijah Newren @ 2019-11-30 4:22 UTC (permalink / raw) To: Pavel Roskin; +Cc: Git Mailing List On Fri, Nov 29, 2019 at 12:24 AM Pavel Roskin <plroskin@gmail.com> wrote: > > Hi! > > I've discovered an issue with "git rebase" producing a subtly > incorrect file. In fact, that files even compiled but failed in unit > tests! That's so scary that I'm going to stop using "git rebase" for > now. Fortunately, "git rebase --merge" is working correctly, so I'll > use it. Too bad there is no option to use "--merge" by default. Indeed. We really should fix that, if not just make it the default for everyone. > The issue was observed in git 2.23 and reproduced in today's next > branch (2.24.0.449.g4c06f74957) on up-to-date Fedora 31 x86_64. > > I've created a repository that demonstrates the issue: > https://github.com/proski/git-rebase-demo > > The branch names should be self-explanatory. "master" is the base, > "branch1" and "branch2" contain one change each. If "branch1" is > rebased on top of "branch2", the result is incorrect, saved in the > "rebase-bad" branch. If "git rebase -m" is used, the result is > correct, saved in the "merge-good" branch. > > The files in "rebase-bad" and "merge-good" have exactly the same lines > but in a different order. Yet the changes on branch1 and branch2 > affect non-overlapping parts of the file. There should be no doubt how > the merged code should look like. > > I believe the change on branch2 shifts the lines, so that the first > change from branch1 is applies to a place below the intended location, > and then git goes back to an earlier line to apply the next hunk. I > can imagine that it would do the right thing in case of swapped blocks > of code. Yet I have a real life example where it does a very wrong > thing. > > Indeed, "git diff origin/branch2 origin/rebase-bad" and "git diff > origin/branch2 origin/merge-good" both produce diffs of 9957 bytes > long, different only in the order of the hunks. > > Another interesting data point - "git rebase --interactive" is working > correctly. Thanks for the detailed report and simple testcase. Turns out the --interactive isn't so interesting, because a few cycles back we re-implemented the --merge behavior on top of the interactive machinery so the two use the exact same engine. Anyway, I can duplicate the problem and noticed a few interesting things. Since the am-backend for rebase (the default) basically just uses diff and apply, I tried duplicating with just those after looking at things and noticing that it appeared to be applying patch hunks on the wrong lines: $ git switch branch2 $ git reset --hard origin/branch2 HEAD is now at 1331204 Change on branch 2 $ git diff -U3 origin/master origin/branch1 >diff.patch $ git apply diff.patch $ git diff --shortstat origin/merge-good 1 file changed, 43 insertions(+), 43 deletions(-) $ git diff --shortstat origin/rebase-bad So, this reproduces your bad results. Let's repeat with -U4: $ git reset --hard origin/branch2 HEAD is now at 1331204 Change on branch 2 $ git diff -U4 origin/master origin/branch1 >diff.patch $ git apply diff.patch $ git diff --shortstat origin/merge-good 1 file changed, 10 insertions(+), 10 deletions(-) $ git diff --shortstat origin/rebase-bad 1 file changed, 37 insertions(+), 37 deletions(-) That gives us a result that matches neither merge-good nor rebase-bad, but is closer to the good side. Let's try again with -U5: $ git reset --hard origin/branch2 HEAD is now at 1331204 Change on branch 2 $ git diff -U5 origin/master origin/branch1 >diff.patch $ git apply diff.patch $ git diff --shortstat origin/merge-good $ git diff --shortstat origin/rebase-bad 1 file changed, 43 insertions(+), 43 deletions(-) Ahah! With five lines of context, git diff & git apply can produce the correct result. Sadly, I tried to force this with git rebase, but -C5 only affected the apply side and there's no option to pass to rebase to pass through -U5 to the diff logic. Also, although there is a diff.context config option, git-am ignores it (Note that git_am_config() does not directly check that value and it calls git_default_config(), not git_diff_ui_config() or even git_diff_basic_config()). So, it's not possible to force the am-based rebase to get the right answer currently even if you figure out what the problem is. The merge-based rebase, by contrast, essentially benfits from having the entire files of each version accessible so it automatically gets it right. So, to summarize here: * you have a case where the default 3 lines of context mess stuff up; but rebase --merge works great * am doesn't have a -U option, and ignores the diff.context setting, making it impossible to force the am backend to work on your case and also: * rebase doesn't have an option to use the merge/interactive backend by default (nor an --am option to override it) Also, * The performance of the merge/interactive backend is slightly better than the am-backend (https://public-inbox.org/git/CABPp-BF=ev03WgODk6TMQmuNoatg2kiEe5DR__gJ0OTVqHSnfQ@mail.gmail.com/) and will be getting better * The merge/interactive backend supports many more options than the am-backend, though the am one still has a few the merge backend doesn't. Once the ra/rebase-i-more-options topic merges, --whitespace will be the only consequential option that the am-backend supports that the merge/interactive-backend doesn't. (There's also -C, but as noted above, the merge/interactive backend already have access to the full file). Maybe we should just switch the default, for everyone? (And provide an --am option to override it and a config setting to get the old default?) ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: git-rebase produces incorrect output 2019-11-30 4:22 ` Elijah Newren @ 2019-11-30 16:37 ` Elijah Newren 2019-11-30 17:58 ` Junio C Hamano 1 sibling, 0 replies; 6+ messages in thread From: Elijah Newren @ 2019-11-30 16:37 UTC (permalink / raw) To: Pavel Roskin Cc: Git Mailing List, Phillip Wood, Johannes Schindelin, Junio C Hamano On Fri, Nov 29, 2019 at 8:22 PM Elijah Newren <newren@gmail.com> wrote: > On Fri, Nov 29, 2019 at 12:24 AM Pavel Roskin <plroskin@gmail.com> wrote: > So, to summarize here: > * you have a case where the default 3 lines of context mess stuff > up; but rebase --merge works great > * am doesn't have a -U option, and ignores the diff.context setting, > making it impossible to force the am backend to work on your case > * rebase doesn't have an option to use the merge/interactive backend > by default (nor an --am option to override it) > > Also: > * The performance of the merge/interactive backend is slightly > better than the am-backend > (https://public-inbox.org/git/CABPp-BF=ev03WgODk6TMQmuNoatg2kiEe5DR__gJ0OTVqHSnfQ@mail.gmail.com/) > and will continue to get better > * The merge/interactive backend supports many more options than the > am-backend, though the am one still has a few the merge backend > doesn't. Once the ra/rebase-i-more-options topic merges, --whitespace > will be the only consequential option that the am-backend supports > that the merge/interactive-backend doesn't. (There's also -C, but as > noted above, the merge/interactive backend already have access to the > full file). In case it wasn't clear above, the merge/interactive backend could just accept the -C option and ignore it and do nothing, since it already has access to the full file (thus why I consider the -C option to not be consequential). Also, I remembered and dug out a few more items about the default rebase backend that might be worth including in this summary: * The am backend operates with incomplete tree information as well, limiting what the merge/resolve/whatever can do and what information can be provided to the user (see https://public-inbox.org/git/xmqqh8jeh1id.fsf@gitster-ct.c.googlers.com/) * The interactive backend, although slightly faster than the am-backend (on p3400 at least), is slightly slower with split-index which hasn't yet been investigated (see https://public-inbox.org/git/nycvar.QRO.7.76.6.1901312310280.41@tvgsbejvaqbjf.bet/) > Maybe we should just switch the default, for everyone? (And provide > an --am option to override it and a config setting to get the old > default?) CC'ing a few folks for opinions on switching the default backend of rebase from --am to --merge. Johannes already agreed it was the right path eventually[1], and Junio suggested the am backend should be deprecated[2] and eventually replaced, so I was going to push on this after some merge performance work but perhaps it's a good time to query if it's time to switch the default sooner. [1] See the end of https://public-inbox.org/git/nycvar.QRO.7.76.6.1808311158540.71@tvgsbejvaqbjf.bet/, also linked above. [2] https://public-inbox.org/git/xmqqh8jeh1id.fsf@gitster-ct.c.googlers.com/, also linked above. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: git-rebase produces incorrect output 2019-11-30 4:22 ` Elijah Newren 2019-11-30 16:37 ` Elijah Newren @ 2019-11-30 17:58 ` Junio C Hamano 1 sibling, 0 replies; 6+ messages in thread From: Junio C Hamano @ 2019-11-30 17:58 UTC (permalink / raw) To: Elijah Newren; +Cc: Pavel Roskin, Git Mailing List Elijah Newren <newren@gmail.com> writes: > Sadly, I tried to force this with git rebase, but -C5 only affected > the apply side and there's no option to pass to rebase to pass through > -U5 to the diff logic. Also, although there is a diff.context config > option, git-am ignores it (Note that git_am_config() does not directly > check that value and it calls git_default_config(), not > git_diff_ui_config() or even git_diff_basic_config()). Not essential but puzzled. The context applies to the generation side, not acceptance side, no? IOW, I suspect that you are talking about "git format-patch" that sits on the upstream side of the pipe that feeds "git am". > So, to summarize here: > * you have a case where the default 3 lines of context mess stuff > up; but rebase --merge works great > * am doesn't have a -U option, and ignores the diff.context setting, > making it impossible to force the am backend to work on your case > and also: I do not think it is super hard to teach "git rebase" to pass backend specific options so that "git rebase--am" can be told to work with wider context (which will reduce the risk of ambiguous patch like this example, trading the increased risk of unnecessary conflicts; it is a good trade-off most of the time for added safety, as nobody wants a system that produces a wrong result silently and quickly). Having said that, > * rebase doesn't have an option to use the merge/interactive backend > by default (nor an --am option to override it) I think addition of rebase.backend would be a good first step for eventually flipping the default, which by the way I have no trouble with. > Maybe we should just switch the default, for everyone? (And provide > an --am option to override it and a config setting to get the old > default?) Yes, that would be a sensible second step. I actually think a longer term goal is to deprecate the am backend. It was invented first and then kept to be the default backend for a long time because the merge based backend historically has been noticeably slow (it was expected to be---it was essentially a shell script that run cherry-pick repeatedly in a loop). In some future, it would outlive its usefulness, and that I think that that future is just around the corner. ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2019-11-30 17:58 UTC | newest] Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-11-29 8:21 git-rebase produces incorrect output Pavel Roskin 2019-11-29 13:31 ` Philip Oakley 2019-11-29 21:28 ` Pavel Roskin 2019-11-30 4:22 ` Elijah Newren 2019-11-30 16:37 ` Elijah Newren 2019-11-30 17:58 ` Junio C Hamano
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).