From: Michael Haggerty <mhagger@alum.mit.edu>
To: Derrick Stolee <stolee@gmail.com>,
	"git@vger.kernel.org" <git@vger.kernel.org>,
	Jeff King <peff@peff.net>, Junio C Hamano <gitster@pobox.com>,
	Jakub Narebski <jnareb@gmail.com>,
	Jeff Hostetler <git@jeffhostetler.com>
Subject: Re: commit-graph: change in "best" merge-base when ambiguous
Date: Tue, 22 May 2018 07:39:19 +0200
Message-ID: <3705af00-00b7-b620-cc77-eef8f0a73bc1@alum.mit.edu>
In-Reply-To: <e78a115a-a5ea-3c0a-5437-51ba0bcc56e1@gmail.com>

On 05/21/2018 08:10 PM, Derrick Stolee wrote:
> [...]
> In the Discussion section of the `git merge-base` docs [1], we have the
> following:
> 
>     When the history involves criss-cross merges, there can be more
>     than one best common ancestor for two commits. For example, with
>     this topology:
> 
>     ---1---o---A
>         \ /
>          X
>         / \
>     ---2---o---o---B
> 
>     both 1 and 2 are merge-bases of A and B. Neither one is better
>     than the other (both are best merge bases). When the --all option
>     is not given, it is unspecified which best one is output.
> 
> This means our official documentation mentions that we do not have a
> concrete way to differentiate between these choices. This makes me think
> that this change in behavior is not a bug, but it _is_ a change in
> behavior. It's worth mentioning, but I don't think there is any value in
> making sure `git merge-base` returns the same output.
> 
> Does anyone disagree? Is this something we should solidify so we always
> have a "definitive" merge-base?
> [...]

This may be beyond the scope of what you are working on, but there are
significant advantages to selecting a "best" merge base from among the
candidates. Long ago [1] I proposed that the "best" merge base is the
merge base candidate that minimizes the number of non-merge commits that
are in

    git rev-list $candidate..$branch

and that are also already in master:

    git rev-list $master

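(assuming we are merging $branch into $master). Because every candidate
is an ancestor of $master, the commits that exist only on the branch
($master..$branch) are in $candidate..$branch whichever candidate is
chosen, so only the already-merged part of that set varies. Minimizing
it is therefore equivalent to choosing the merge base that minimizes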

    git rev-list --count $candidate..$branch
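
A minimal shell sketch of this criterion, assuming only stock git
commands: rank_merge_bases is a hypothetical helper (not part of Git)
that lists every candidate reported by "git merge-base --all" together
with its "git rev-list --count $candidate..$branch", smallest count
first. It counts all commits, merges included; adding --no-merges to the
rev-list call would restrict the count to non-merge commits.

```sh
#!/bin/sh
# rank_merge_bases MASTER BRANCH
#
# Hypothetical helper, not part of Git. For each best common ancestor
# reported by `git merge-base --all`, print "<count> <sha1>", where
# <count> is `git rev-list --count <candidate>..BRANCH`. Output is sorted
# by count (ties by SHA-1 string), so the first line is the candidate
# preferred by the "fewest commits" criterion.
rank_merge_bases () {
    master=$1
    branch=$2
    git merge-base --all "$master" "$branch" |
    while read -r candidate; do
        printf '%s %s\n' \
            "$(git rev-list --count "$candidate..$branch")" "$candidate"
    done |
    LC_ALL=C sort -k1,1n -k2,2
}
```

For example, "rank_merge_bases master topic | head -n 1 | cut -d' ' -f2"
would print the preferred candidate.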

In fact, this criterion is symmetric if you exchange branch ↔ master,
which is a nice property, and indeed generalizes pretty simply to
computing the merge base of more than two commits.
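
One way to see the symmetry: since $candidate is an ancestor of both
tips, $master..$branch is a subset of $candidate..$branch, so subtracting
the two counts leaves the number of commits reachable from both tips but
not from $candidate, a quantity that does not care which tip is which.
The helper below is a hypothetical sketch (not part of Git) computing
it; calling it with the two tips in either order should give the same
number.

```sh
#!/bin/sh
# shared_not_below CANDIDATE A B
#
# Hypothetical helper, not part of Git. Prints the number of commits
# reachable from both A and B but not from CANDIDATE, computed as
#     count(CANDIDATE..B) - count(A..B)
# This assumes CANDIDATE is a common ancestor of A and B: then every
# commit in A..B is also in CANDIDATE..B, and the subtraction removes
# exactly the commits that are reachable from B only.
shared_not_below () {
    candidate=$1 a=$2 b=$3
    echo $(( $(git rev-list --count "$candidate..$b") \
             - $(git rev-list --count "$a..$b") ))
}
```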

In that email I also included some data showing that the "best" merge
base almost always results in either the same or a shorter diff than the
more or less arbitrary algorithm that we currently use. Sometimes the
difference in diff length is dramatic.

To me it feels like the best *deterministic* merge base would be based
on the above criterion, maybe with first-parent reachability, commit
times, and SHA-1s used (in that order) to break ties.
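
A sketch of such a deterministic choice, under stated assumptions:
best_merge_base is a hypothetical helper (not part of Git); on a tie in
the count it prefers the newer committer timestamp (the direction is not
specified here, so that is a guess), then the smaller SHA-1 compared as a
string. The first-parent-reachability tie-breaker is left out because its
exact definition is not spelled out.

```sh
#!/bin/sh
# best_merge_base MASTER BRANCH
#
# Hypothetical helper, not part of Git. Orders the candidates from
# `git merge-base --all` by:
#   1. git rev-list --count <candidate>..BRANCH  (smaller wins)
#   2. committer timestamp, %ct                  (newer wins; an assumption)
#   3. full SHA-1 as a string                    (smaller wins)
# and prints the winner. First-parent reachability is not considered.
best_merge_base () {
    master=$1
    branch=$2
    git merge-base --all "$master" "$branch" |
    while read -r candidate; do
        printf '%s %s %s\n' \
            "$(git rev-list --count "$candidate..$branch")" \
            "$(git log -1 --format=%ct "$candidate")" \
            "$candidate"
    done |
    LC_ALL=C sort -k1,1n -k2,2nr -k3,3 |
    head -n 1 |
    cut -d ' ' -f 3
}
```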

I don't plan to work on the implementation of this idea myself (though
we've long used a script-based implementation of this algorithm
internally at GitHub).

Michael

[1] https://public-inbox.org/git/539A25BF.4060501@alum.mit.edu/
    See the rest of the thread for more interesting discussion.
[2] https://public-inbox.org/git/8a9b3f20-eed2-c59b-f7ea-3c68b3c30bf5@alum.mit.edu/
    Higher in this thread, Junio proposes a different criterion.
