From: Jakub Narebski <jnareb@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Abhishek Kumar <abhishekkumar8222@gmail.com>,
	git@vger.kernel.org,
	Christian Couder <christian.couder@gmail.com>,
	Derrick Stolee <stolee@gmail.com>
Subject: Re: [RFC][GSoC] Implement Generation Number v2
Date: Tue, 24 Mar 2020 16:44:28 +0100
Message-ID: <86k139ahb7.fsf@gmail.com>
In-Reply-To: xmqqtv2f5a6x.fsf@gitster.c.googlers.com

Junio C Hamano <gitster@pobox.com> writes:
> Jakub Narebski <jnareb@gmail.com> writes:
>
>> About moving commit data with generation number v2 to "CDA2" chunk: if
>> "CDAT" chunk is missing then (I think) old Git would simply not use
>> commit-graph file at all; it may crash, but I don't think so.  If "CDAT"
>> chunk has zero length... I don't know what would happen then, possibly
>> also old Git would simply not use commit-graph data at all.
>
> Yeah, if it makes it crash, then we cannot use that "missing CDAT"
> approach.

I have not tested this, but from reading the code it looks like a
missing "CDAT" chunk makes Git fail softly -- it would return NULL for
the commit-graph, and thus not use commit-graph data at all... which
might be too high a price (too high a performance penalty for old Git).
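
To make the "fail softly" behaviour concrete, here is a toy model in
Python (not Git's C code). The chunk IDs are the ones from the
commit-graph format; the function name load_commit_graph and the
dict-based representation are made up for illustration, and the
behaviour shown is my reading of the code, not something I have tested.

```python
# Chunks that a reader is assumed to require before it trusts the file.
REQUIRED_CHUNKS = ("OIDF", "OIDL", "CDAT")


def load_commit_graph(chunks):
    """Toy loader: `chunks` maps a 4-character chunk ID to its raw bytes.

    Returns a dict describing the graph, or None when a required chunk
    is absent, mirroring the "return NULL, do not use the file" outcome.
    """
    for chunk_id in REQUIRED_CHUNKS:
        if chunk_id not in chunks:
            return None  # soft failure: caller falls back to plain object walks
    return {chunk_id: chunks[chunk_id] for chunk_id in REQUIRED_CHUNKS}


# A file written with "CDA2" instead of "CDAT" is simply ignored here.
new_style_file = {"OIDF": b"", "OIDL": b"", "CDA2": b""}
graph = load_commit_graph(new_style_file)
```

In this model old Git keeps working but gets no benefit from the file,
which is exactly the performance penalty mentioned above.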

>> Putting generation number v2 into separate chunk (which might be called
>> "GEN2" or "OFFS"/"DOFF") has the disadvantage of increasing the on disk
>> size of the commit graph, and possibly also increasing memory
>> consumption (the latter depends on how it would be handled), but has the
>> advantage of being fully backward compatible.  Old Git would simply
>> use generation numbers v1 in "CDAT", new Git would use generation
>> numbers v2 in "OFFS" (combining commit creation date from "CDAT" and
>> offset from "OFFS"),
>
> Do we have an option *not* to record meaningful generation numbers
> in CDAT and have the current Git binaries understand and still use
> the rest of the graph file, while not using the optimizations that
> rely on having generation numbers?  If not, then the new version of
> Git that tries to be compatible with old one needs to compute both
> generation numbers, and we would need to keep the topological number
> for quite some time.

We can, as Derrick Stolee wrote, put zero (GENERATION_NUMBER_ZERO) for
the generation number.  Without generation number data we lose some of
the performance improvements, though.
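
A small Python sketch of what is lost (a toy model, not Git's
implementation; can_reach and the four-commit history are made up for
illustration). A walk may stop descending at a commit whose generation
number is not larger than the target's, but only when both numbers are
meaningful, that is, not GENERATION_NUMBER_ZERO.

```python
GENERATION_NUMBER_ZERO = 0  # stands for "generation number not computed"


def can_reach(parents, gen, start, target):
    """Return (reachable, number_of_commits_visited) for a toy history."""
    target_gen = gen.get(target, GENERATION_NUMBER_ZERO)
    stack, seen = [start], set()
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        if commit == target:
            return True, len(seen)
        commit_gen = gen.get(commit, GENERATION_NUMBER_ZERO)
        # Prune only when both generation numbers carry information:
        # an ancestor always has a strictly smaller generation number.
        if (commit_gen != GENERATION_NUMBER_ZERO
                and target_gen != GENERATION_NUMBER_ZERO
                and commit_gen <= target_gen):
            continue
        stack.extend(parents[commit])
    return False, len(seen)


# A is the root, B and C are children of A, D is a child of B.
parents = {"A": [], "B": ["A"], "C": ["A"], "D": ["B"]}
levels = {"A": 1, "B": 2, "C": 2, "D": 3}  # topological levels (v1)

with_generations = can_reach(parents, levels, "D", "C")  # (False, 2)
without_generations = can_reach(parents, {}, "D", "C")   # (False, 3)
```

Both calls give the same answer; with zeroes everywhere the walk just
has to go all the way down to the root before it can say "no".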

On the other hand, computing both generation number v1 (topological
level) and generation number v2 ([monotonic] offset for corrected commit
date) should not be much more costly than calculating a single
generation number, assuming that most of the cost is walking the commit
graph.  But this would need benchmarking.
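
As a rough illustration of why the extra cost should be small, here is
a Python sketch (not Git's code) that fills in both numbers during one
walk. The function name compute_generations and the three-commit history
are made up for the example, and the offset is the plain difference
between corrected and original commit date; the monotonic-offset
variant is not modelled.

```python
def compute_generations(parents, commit_date):
    """Compute topological level (v1) and date offset (v2) in one walk.

    parents:     dict mapping each commit to a list of its parents
    commit_date: dict mapping each commit to an integer timestamp
    Returns (level, offset); corrected date = commit_date + offset.
    """
    level, corrected = {}, {}
    for tip in parents:
        stack = [tip]
        while stack:
            commit = stack[-1]
            if commit in level:
                stack.pop()
                continue
            pending = [p for p in parents[commit] if p not in level]
            if pending:
                stack.extend(pending)  # parents must be finished first
                continue
            # Both numbers need only the already-computed parent values.
            level[commit] = 1 + max((level[p] for p in parents[commit]), default=0)
            newest_parent = max((corrected[p] for p in parents[commit]), default=None)
            if newest_parent is None:
                corrected[commit] = commit_date[commit]
            else:
                corrected[commit] = max(commit_date[commit], newest_parent + 1)
            stack.pop()
    offset = {c: corrected[c] - commit_date[c] for c in parents}
    return level, offset


# B and C have clock skew: their dates are older than their ancestor A.
parents = {"C": ["A", "B"], "B": ["A"], "A": []}
dates = {"A": 100, "B": 90, "C": 95}
level, offset = compute_generations(parents, dates)
# level  == {"A": 1, "B": 2, "C": 3}
# offset == {"A": 0, "B": 11, "C": 7}
```

The graph traversal is shared; the second number adds one max() per
commit. Whether that holds for the real implementation is what the
benchmark would have to show.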

Also, as Stolee wrote, with generation number v2 in a separate chunk the
commit data is no longer kept together, but split into two areas.

>> and there should be no problems with updating
>> commit-graph file (either rewriting, or adding new commit-graph to the
>> chain).
>
> Would merging by the current Git also work well (meaning, would
> "GEN2" or whatever it does not understand be omitted)?

From the analysis of write_commit_graph_file(), it looks like unknown
chunks are simply skipped (omitted), but I have not checked it in
practice.
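
The expected effect, as a toy Python model (not the logic of
write_commit_graph_file() itself, which regenerates its chunks rather
than filtering a dict): the list OLD_WRITER_CHUNKS and the function
rewrite_with_old_writer are assumptions made for illustration.

```python
# Assumed set of chunk IDs that an older writer knows how to emit.
OLD_WRITER_CHUNKS = ("OIDF", "OIDL", "CDAT", "EDGE", "BASE")


def rewrite_with_old_writer(existing_chunks):
    """Model an old Git rewriting or merging a commit-graph file.

    Only chunks the writer knows about end up in the output; anything
    else (for example a hypothetical "GEN2" chunk) is silently dropped.
    """
    return {
        chunk_id: data
        for chunk_id, data in existing_chunks.items()
        if chunk_id in OLD_WRITER_CHUNKS
    }


written_by_new_git = {"OIDF": b"f", "OIDL": b"l", "CDAT": b"c", "GEN2": b"g"}
after_old_git_rewrite = rewrite_with_old_writer(written_by_new_git)
```

If this is what happens, the result is still a valid file for both old
and new Git; new Git would just have to recompute the dropped chunk.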

Best,
-- 
Jakub Narębski

Thread overview: 12+ messages
2020-03-22  9:35 [RFC][GSoC] Implement Generation Number v2 Abhishek Kumar
2020-03-22 20:05 ` Jakub Narebski
2020-03-23  4:25   ` Abhishek Kumar
2020-03-23  5:32     ` Junio C Hamano
2020-03-23 11:32       ` Abhishek Kumar
2020-03-23 13:43       ` Jakub Narebski
2020-03-23 15:54         ` Derrick Stolee
2020-03-24  9:24           ` Jakub Narebski
2020-03-23 16:04         ` Junio C Hamano
2020-03-24 15:44           ` Jakub Narebski [this message]
2020-03-24 21:13             ` Junio C Hamano
2020-03-26 10:15         ` [GSoC][Proposal v2] " Abhishek Kumar
