git.vger.kernel.org archive mirror
From: Jakub Narebski <jnareb@gmail.com>
To: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Derrick Stolee <stolee@microsoft.com>
Subject: Re: [PATCH 0/1] v2.19.0-rc1 Performance Regression in 'git merge-base'
Date: Sat, 06 Oct 2018 18:09:40 +0200
Message-ID: <865zyfys3f.fsf@gmail.com>
In-Reply-To: <pull.28.git.gitgitgadget@gmail.com> (Derrick Stolee via GitGitGadget's message of "Thu, 30 Aug 2018 05:58:07 -0700 (PDT)")

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> As I was testing the release candidate, I stumbled across a regression in
> 'git merge-base' as a result of the switch to generation numbers. The commit
> message in [PATCH 1/1] describes the topology involved, but you can test it
> yourself by comparing 'git merge-base v4.8 v4.9' in the Linux kernel. The
> regression does not show up when running merge-base for tags at least v4.9,
> which is why I didn't see it when I was testing earlier.

Strange that it happens; I'll take a look.

> The solution is simple, but also will conflict with ds/reachable in next. I
> can send a similar patch that applies the same diff into commit-reach.c.
>
> With the integration of generation numbers into most commit walks coming to
> a close [1], it will be time to re-investigate other options for
> reachability indexes [2]. As I was digging into the issue with this
> regression, I discovered a way we can modify our generation numbers and pair
> them with commit dates to give us a simple-to-compute, immutable
> two-dimensional reachability index that would be immune to this regression.
> I will investigate that more and report back, but it is more important to
> fix this regression now.

I am interested in what you have come up with, especially since commit
creation dates are imperfect indicators due to clock skew, etc.
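
For readers less familiar with the terms, here is a minimal sketch of why
a generation number is a safe cutoff for a commit walk while a commit date
is not. It is not the two-dimensional index mentioned above, whose details
are not given in this thread. It only assumes the usual definition of a
generation number (one more than the largest generation number among the
parents, and 1 for a root commit). The toy history, the dates, and the
helper names (PARENTS, DATES, generation_numbers, can_reach) are all
hypothetical.

```python
from typing import Dict, List

# Hypothetical toy history: commit -> list of parents.
PARENTS: Dict[str, List[str]] = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "D": ["A"],
    "M": ["C", "D"],
}

# Hypothetical commit dates; "C" was made on a machine with a skewed
# clock and claims to be older than its own parent "B".
DATES: Dict[str, int] = {"A": 100, "B": 200, "C": 150, "D": 300, "M": 400}


def generation_numbers(parents: Dict[str, List[str]]) -> Dict[str, int]:
    """gen(c) = 1 + max(gen(p) for p in parents), and 1 for root commits."""
    gen: Dict[str, int] = {}

    def visit(c: str) -> int:
        if c not in gen:
            gen[c] = 1 + max((visit(p) for p in parents[c]), default=0)
        return gen[c]

    for c in parents:
        visit(c)
    return gen


def can_reach(parents: Dict[str, List[str]], gen: Dict[str, int],
              start: str, target: str) -> bool:
    """Walk from start towards the roots, skipping every commit whose
    generation number is too low for target to be one of its ancestors."""
    if start == target:
        return True
    stack, seen = [start], {start}
    while stack:
        c = stack.pop()
        for p in parents[c]:
            if p == target:
                return True
            # A proper ancestor always has a strictly smaller generation
            # number, so p can reach target only if gen[p] > gen[target].
            if p not in seen and gen[p] > gen[target]:
                seen.add(p)
                stack.append(p)
    return False


GEN = generation_numbers(PARENTS)
print(GEN)
print(can_reach(PARENTS, GEN, "M", "A"))
print(can_reach(PARENTS, GEN, "D", "C"))
# A date-based cutoff would get C -> B wrong, because C looks older than B.
print(DATES["C"] < DATES["B"], can_reach(PARENTS, GEN, "C", "B"))
```

In this toy history the generation number never decreases from parent to
child, so pruning on it cannot hide a real ancestor, whereas the skewed
date on "C" shows how a date-only cutoff could.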

>
> Thanks, -Stolee
>
> [1] https://public-inbox.org/git/pull.25.git.gitgitgadget@gmail.com/
>     [PATCH 0/6] Use generation numbers for --topo-order
>
> [2] https://public-inbox.org/git/86muxcuyod.fsf@gmail.com/
>     [RFC] Other chunks for commit-graph, part 2 - reachability indexes

Since then I have found a few more possible approaches:
- working with the repository metagraph[1], where every chain of commits
  is replaced by a single edge, perhaps together with the DAG Reduction[2]
  boosting framework applied to this metagraph
- using a FELINE-like index (the Graph+Label approach, also known as
  online search), and for those cases where this index gives false
  results, using a Labels-only approach[3]
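
To make the second idea more concrete, here is a minimal sketch of a
FELINE-like index under stated assumptions. Each commit gets a pair of
positions from two different topological orders. If one commit can reach
another, it must come earlier in both orders, so a failed comparison is a
definite "no". A passed comparison is only a "maybe", and this sketch then
falls back to a plain pruned graph walk rather than to a Labels-only
index. The toy graph, the tie-breaking rules, and the helper names
(EDGES, topo_rank, build_index, reaches) are all hypothetical.

```python
import heapq
from typing import Callable, Dict, List, Tuple

# Hypothetical toy history: commit -> parents. Reachability follows
# these parent links, so "M" reaches "C", "D", "B" and "A".
EDGES: Dict[str, List[str]] = {
    "M": ["C", "D"],
    "E": ["D"],
    "C": ["B"],
    "D": ["A"],
    "B": ["A"],
    "A": [],
}


def topo_rank(edges: Dict[str, List[str]],
              priority: Callable[[str], object]) -> Dict[str, int]:
    """Kahn's algorithm. Among the nodes that are ready, the one with
    the smallest priority goes first. Returns node -> position."""
    indegree = {n: 0 for n in edges}
    for targets in edges.values():
        for t in targets:
            indegree[t] += 1
    ready = [(priority(n), n) for n, d in indegree.items() if d == 0]
    heapq.heapify(ready)
    rank: Dict[str, int] = {}
    while ready:
        _, n = heapq.heappop(ready)
        rank[n] = len(rank)
        for t in edges[n]:
            indegree[t] -= 1
            if indegree[t] == 0:
                heapq.heappush(ready, (priority(t), t))
    return rank


def build_index(edges: Dict[str, List[str]]) -> Dict[str, Tuple[int, int]]:
    # First order: break ties by name (an arbitrary choice).
    x = topo_rank(edges, priority=lambda n: n)
    # Second order: prefer the ready node that came last in the first
    # order, so that the two orders disagree as often as possible.
    y = topo_rank(edges, priority=lambda n: -x[n])
    return {n: (x[n], y[n]) for n in edges}


def reaches(edges: Dict[str, List[str]],
            index: Dict[str, Tuple[int, int]], u: str, v: str) -> bool:
    if u == v:
        return True
    vx, vy = index[v]
    ux, uy = index[u]
    if ux > vx or uy > vy:
        return False  # definite negative, answered from the index alone
    # The index says "maybe": confirm with a walk, pruned by the index.
    stack, seen = [u], {u}
    while stack:
        n = stack.pop()
        for t in edges[n]:
            if t == v:
                return True
            tx, ty = index[t]
            if t not in seen and tx <= vx and ty <= vy:
                seen.add(t)
                stack.append(t)
    return False


INDEX = build_index(EDGES)
print(reaches(EDGES, INDEX, "E", "C"))  # rejected by the index alone
print(reaches(EDGES, INDEX, "C", "D"))  # index says maybe, walk says no
print(reaches(EDGES, INDEX, "M", "A"))
```

The fallback walk is where a real implementation would spend its time, so
the quality of the second ordering (how few "maybe" answers turn out to
be wrong) is what matters in practice.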

[1] Marco Biazzini, Martin Monperrus, Benoit Baudry, "On Analyzing the
    Topology of Commit Histories in Decentralized Version Control
    Systems", ICSME 2014 (conference paper)
[2] Junfeng Zhou et al., "Accelerating reachability query processing
    based on DAG reduction", The VLDB Journal (2018) 27: 271
[3] Xian Tang et al., "An Optimized Labeling Scheme for Reachability
    Queries", CMC, vol.55, no.2, pp.267-283, 2018

--
Jakub Narębski

Thread overview: 5+ messages
2018-08-30 12:58 [PATCH 0/1] v2.19.0-rc1 Performance Regression in 'git merge-base' Derrick Stolee via GitGitGadget
2018-08-30 12:58 ` [PATCH 1/1] commit: don't use generation numbers if not needed Derrick Stolee via GitGitGadget
2018-10-06 16:54   ` Jakub Narebski
2018-08-30 15:26 ` [PATCH 0/1] v2.19.0-rc1 Performance Regression in 'git merge-base' Junio C Hamano
2018-10-06 16:09 ` Jakub Narebski [this message]
