All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jakub Narebski <jnareb@gmail.com>
To: Derrick Stolee <stolee@gmail.com>
Cc: Michael Haggerty <mhagger@alum.mit.edu>,
	git@vger.kernel.org, Jeff King <peff@peff.net>,
	Junio C Hamano <gitster@pobox.com>,
	Jeff Hostetler <git@jeffhostetler.com>,
	Jacob Keller <jacob.keller@gmail.com>,
	Stefan Beller <sbeller@google.com>,
	Elijah Newren <newren@gmail.com>
Subject: Re: commit-graph: change in "best" merge-base when ambiguous
Date: Fri, 25 May 2018 00:08:28 +0200	[thread overview]
Message-ID: <86o9h41zc3.fsf@gmail.com> (raw)
In-Reply-To: <8b480e9e-1fd3-35ff-2974-653fadd49fa7@gmail.com> (Derrick Stolee's message of "Tue, 22 May 2018 08:48:04 -0400")

Derrick Stolee <stolee@gmail.com> writes:
> On 5/22/2018 1:39 AM, Michael Haggerty wrote:
>> On 05/21/2018 08:10 PM, Derrick Stolee wrote:
>>> [...]
>>> In the Discussion section of the `git merge-base` docs [1], we have the
>>> following:
>>>
>>>      When the history involves criss-cross merges, there can be more than
>>> one best common ancestor for two commits. For example, with this topology:
>>>
>>>      ---1---o---A
>>>          \ /
>>>           X
>>>          / \
>>>      ---2---o---o---B
>>>
>>>      both 1 and 2 are merge-bases of A and B. Neither one is better than
>>> the other (both are best merge bases). When the --all option is not
>>> given, it is unspecified which best one is output.
>>>
>>> This means our official documentation mentions that we do not have a
>>> concrete way to differentiate between these choices. This makes me think
>>> that this change in behavior is not a bug, but it _is_ a change in
>>> behavior. It's worth mentioning, but I don't think there is any value in
>>> making sure `git merge-base` returns the same output.
>>>
>>> Does anyone disagree? Is this something we should solidify so we always
>>> have a "definitive" merge-base?
>>> [...]
>> This may be beyond the scope of what you are working on, but there are
>> significant advantages to selecting a "best" merge base from among the
>> candidates. Long ago [1] I proposed that the "best" merge base is the
>> merge base candidate that minimizes the number of non-merge commits that
>> are in
>>
>>      git rev-list $candidate..$branch
>>
>> that are already in master:
>>
>>      git rev-list $master
>>
>> (assuming merging branch into master), which is equivalent to choosing
>> the merge base that minimizes
>>
>>      git rev-list --count $candidate..$branch

Is the above correct...

>> In fact, this criterion is symmetric if you exchange branch ↔ master,
>> which is a nice property, and indeed generalizes pretty simply to
>> computing the merge base of more than two commits.

...as it doesn't seem to have the described symmetry.

>>
>> In that email I also included some data showing that the "best" merge
>> base almost always results in either the same or a shorter diff than the
>> more or less arbitrary algorithm that we currently use. Sometimes the
>> difference in diff length is dramatic.
>>
>> To me it feels like the best *deterministic* merge base would be based
>> on the above criterion, maybe with first-parent reachability, commit
>> times, and SHA-1s used (in that order) to break ties.
>
> Thanks, everyone, for your perspective on this. I'm walking away with
> these conclusions:
>
> 1. While this is a change in behavior, it is not a regression. We do
> not need to act immediately to preserve old behavior in these
> ambiguous cases.
>
> 2. We should (eventually) define tie-breaking conditions. I like
> Michael's suggestion above.

One thing I'd like to point out is that when searching for some
algorithm to speed up merge-base calculation (which is called lowest
common ancestor in graph theory, and for which I have currently found
only an algorithm with O(|V|*{E|) preparation time, and U(1) query)
I have found instead attempts to rigorously define single representative
lowest common ancestor.  It might be worth a look how it is done.

Another possible source to compare against is the algorithm used by
Mercurial (which as far as I know doesn't use recursive merge strategy,
so it needs to chose one merge base).

HTH,
-- 
Jakub Narębski

  reply	other threads:[~2018-05-24 22:08 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-21 18:10 commit-graph: change in "best" merge-base when ambiguous Derrick Stolee
2018-05-21 18:33 ` Elijah Newren
2018-05-21 21:50   ` Jeff King
2018-05-21 22:28     ` Stefan Beller
2018-05-21 21:54 ` Jeff King
2018-05-21 22:25   ` Jacob Keller
2018-05-22  5:39 ` Michael Haggerty
2018-05-22 12:48   ` Derrick Stolee
2018-05-24 22:08     ` Jakub Narebski [this message]
2018-05-25  6:03       ` Michael Haggerty

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=86o9h41zc3.fsf@gmail.com \
    --to=jnareb@gmail.com \
    --cc=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jacob.keller@gmail.com \
    --cc=mhagger@alum.mit.edu \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    --cc=sbeller@google.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.