Git Mailing List Archive on lore.kernel.org
 help / color / Atom feed
From: Elijah Newren <newren@gmail.com>
To: Pavel Roskin <plroskin@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: git-rebase produces incorrect output
Date: Fri, 29 Nov 2019 20:22:37 -0800
Message-ID: <CABPp-BGiu2nVMQY_t-rnFR5GQUz_ipyEE8oDocKeO+h+t4Mn4A@mail.gmail.com> (raw)
In-Reply-To: <CAN_72e2h2avv-U9BVBYqXVKiC+5kHy-pjejyMSD3X22uRXE39g@mail.gmail.com>

On Fri, Nov 29, 2019 at 12:24 AM Pavel Roskin <plroskin@gmail.com> wrote:
>
> Hi!
>
> I've discovered an issue with "git rebase" producing a subtly
> incorrect file. In fact, that files even compiled but failed in unit
> tests! That's so scary that I'm going to stop using "git rebase" for
> now. Fortunately, "git rebase --merge" is working correctly, so I'll
> use it. Too bad there is no option to use "--merge" by default.

Indeed.  We really should fix that, if not just make it the default
for everyone.

> The issue was observed in git 2.23 and reproduced in today's next
> branch (2.24.0.449.g4c06f74957) on up-to-date Fedora 31 x86_64.
>
> I've created a repository that demonstrates the issue:
> https://github.com/proski/git-rebase-demo
>
> The branch names should be self-explanatory. "master" is the base,
> "branch1" and "branch2" contain one change each. If "branch1" is
> rebased on top of "branch2", the result is incorrect, saved in the
> "rebase-bad" branch. If "git rebase -m" is used, the result is
> correct, saved in the "merge-good" branch.
>
> The files in "rebase-bad" and "merge-good" have exactly the same lines
> but in a different order. Yet the changes on branch1 and branch2
> affect non-overlapping parts of the file. There should be no doubt how
> the merged code should look like.
>
> I believe the change on branch2 shifts the lines, so that the first
> change from branch1 is applies to a place below the intended location,
> and then git goes back to an earlier line to apply the next hunk. I
> can imagine that it would do the right thing in case of swapped blocks
> of code. Yet I have a real life example where it does a very wrong
> thing.
>
> Indeed, "git diff origin/branch2 origin/rebase-bad" and "git diff
> origin/branch2 origin/merge-good" both produce diffs of 9957 bytes
> long, different only in the order of the hunks.
>
> Another interesting data point - "git rebase --interactive" is working
> correctly.

Thanks for the detailed report and simple testcase.  Turns out the
--interactive isn't so interesting, because a few cycles back we
re-implemented the --merge behavior on top of the interactive
machinery so the two use the exact same engine.  Anyway, I can
duplicate the problem and noticed a few interesting things.  Since the
am-backend for rebase (the default) basically just uses diff and
apply, I tried duplicating with just those after looking at things and
noticing that it appeared to be applying patch hunks on the wrong
lines:

$ git switch branch2

$ git reset --hard origin/branch2
HEAD is now at 1331204 Change on branch 2
$ git diff -U3 origin/master origin/branch1 >diff.patch
$ git apply diff.patch
$ git diff --shortstat origin/merge-good
 1 file changed, 43 insertions(+), 43 deletions(-)
$ git diff --shortstat origin/rebase-bad

So, this reproduces your bad results.  Let's repeat with -U4:

$ git reset --hard origin/branch2
HEAD is now at 1331204 Change on branch 2
$ git diff -U4 origin/master origin/branch1 >diff.patch
$ git apply diff.patch
$ git diff --shortstat origin/merge-good
 1 file changed, 10 insertions(+), 10 deletions(-)
$ git diff --shortstat origin/rebase-bad
 1 file changed, 37 insertions(+), 37 deletions(-)

That gives us a result that matches neither merge-good nor rebase-bad,
but is closer to the good side.  Let's try again with -U5:

$ git reset --hard origin/branch2
HEAD is now at 1331204 Change on branch 2
$ git diff -U5 origin/master origin/branch1 >diff.patch
$ git apply diff.patch
$ git diff --shortstat origin/merge-good
$ git diff --shortstat origin/rebase-bad
 1 file changed, 43 insertions(+), 43 deletions(-)

Ahah!  With five lines of context, git diff & git apply can produce
the correct result.

Sadly, I tried to force this with git rebase, but -C5 only affected
the apply side and there's no option to pass to rebase to pass through
-U5 to the diff logic.  Also, although there is a diff.context config
option, git-am ignores it (Note that git_am_config() does not directly
check that value and it calls git_default_config(), not
git_diff_ui_config() or even git_diff_basic_config()).  So, it's not
possible to force the am-based rebase to get the right answer
currently even if you figure out what the problem is.

The merge-based rebase, by contrast, essentially benfits from having
the entire files of each version accessible so it automatically gets
it right.


So, to summarize here:
  * you have a case where the default 3 lines of context mess stuff
up; but rebase --merge works great
  * am doesn't have a -U option, and ignores the diff.context setting,
making it impossible to force the am backend to work on your case
and also:
  * rebase doesn't have an option to use the merge/interactive backend
by default (nor an --am option to override it)
Also,
  * The performance of the merge/interactive backend is slightly
better than the am-backend
(https://public-inbox.org/git/CABPp-BF=ev03WgODk6TMQmuNoatg2kiEe5DR__gJ0OTVqHSnfQ@mail.gmail.com/)
and will be getting better
  * The merge/interactive backend supports many more options than the
am-backend, though the am one still has a few the merge backend
doesn't.  Once the ra/rebase-i-more-options topic merges, --whitespace
will be the only consequential option that the am-backend supports
that the merge/interactive-backend doesn't.  (There's also -C, but as
noted above, the merge/interactive backend already have access to the
full file).

Maybe we should just switch the default, for everyone?  (And provide
an --am option to override it and a config setting to get the old
default?)

  parent reply index

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-29  8:21 Pavel Roskin
2019-11-29 13:31 ` Philip Oakley
2019-11-29 21:28   ` Pavel Roskin
2019-11-30  4:22 ` Elijah Newren [this message]
2019-11-30 16:37   ` Elijah Newren
2019-11-30 17:58   ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CABPp-BGiu2nVMQY_t-rnFR5GQUz_ipyEE8oDocKeO+h+t4Mn4A@mail.gmail.com \
    --to=newren@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=plroskin@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Git Mailing List Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/git/0 git/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 git git/ https://lore.kernel.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.git


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git