git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: jnareb@gmail.com
To: Abhishek Kumar <abhishekkumar8222@gmail.com>
Cc: git@vger.kernel.org, "Derrick Stolee" <stolee@gmail.com>,
	"Jakub Narębski" <jnareb@gmail.com>
Subject: Re: [GSoC] Blog post on reachability queries
Date: Thu, 21 May 2020 16:16:42 +0200	[thread overview]
Message-ID: <85tv09weyd.fsf@LAPTOP-ACER-ASPIRE-F5.i-did-not-set--mail-host-address--so-tickle-me> (raw)
In-Reply-To: <20200520163653.GA76552@Abhishek-Arch> (Abhishek Kumar's message of "Wed, 20 May 2020 22:07:22 +0530")

Abhishek Kumar <abhishekkumar8222@gmail.com> writes:

> Greetings everyone!
>
> I am working on implementing Generation Number v2. I have written an
> article about reachability queries, which I feel is necessary background
> for understanding the project.

Thank you for your introductory work.  It is always nice to have the
problem to be solved well described.

That said, I am not sure if all this theoretical introduction is needed
to understand the reachability labeling that Git is using, and is the
centerpoint of "Generation Number v2", namely the generation number or
topological level.

> Here's the summary of article:
>
>> Reachability refers to the ability to get from one vertex to another
>> within a graph.
>>
>> Reachability queries are an interesting problem, improving performance
>> for many graph operations. Better and more sophisticated solutions are
>> being created as the size of working graphs keeps increasing.

One caveat: the solution that works for the scale-free small-world graph
like citation network might not work for the commit graph, and vice
versa.  Also, not all algorithms for static graph would work well for
constantly growing history graph (for commit graph).

But it is also what makes it interesting, in my opinion.

>>
>> Reachability for the undirected graph can be found in linear
>> preprocessing and constant query time with disjoint set unions. The
>> answer isn't as evident for a directed graph because of differing
>> performance on positive and negative queries, nature and size of graph
>> and other factors. Topological Levels, Post Order DFS Intervals and
>> Contraction Hierarchies are some of the building blocks for such
>> algorithms.

All right.

>
> In a later article, I will talk about the specifics of generation number
> for Git. In particular, how Git uses reachability queries, the need for
> Generation Number v2 i.e., _Corrected Commit Date With Strictly Monotonic
> Offset_ and other interesting tidbits I come across.

Actually, _Corrected Commit Date With Strictly Monotonic Offset_ variant
of generation number v2 is needed to have *backward compatibility*.  It
turned out that there was a bug in implementing versioning for
serialized commit-graph format, namely that unsupported (newer) versions
of file make Git crash, instead of just not using commit-graph file to
speed up operations.

But it seems that there is a workaround. It happens that if commit graph
file is missing the CDAT chunk, then Git wouldn't use it to speed up
operations, but would not hard fail.  So if we put generation number v2
in a chunk with different name, e.g. CDA2, we would not need backward
compatibility.  Then simpler (and benchmarked) variant of _Corrected
Date_ would be enough.

There are advantages and disadvantages of each approach (assuming that
the second one actually works as described).  The first is more complex,
but backward compatibile with respect to reading (using) commit-graph.
The second is simpler, but old Git using new format would get
performance regression.

> You can find the article here:
>
> https://abhishekkumar2718.github.io/programming/2020/05/20/reachability-queries.html

Regards,
--
Jakub Narębski

      reply	other threads:[~2020-05-21 14:23 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-20 16:37 [GSoC] Blog post on reachability queries Abhishek Kumar
2020-05-21 14:16 ` jnareb [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=85tv09weyd.fsf@LAPTOP-ACER-ASPIRE-F5.i-did-not-set--mail-host-address--so-tickle-me \
    --to=jnareb@gmail.com \
    --cc=abhishekkumar8222@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).