Git Mailing List Archive on lore.kernel.org
 help / color / Atom feed
From: Abhishek Kumar <abhishekkumar8222@gmail.com>
To: "Jakub Narębski" <jnareb@gmail.com>
Cc: abhishekkumar8222@gmail.com, git@vger.kernel.org,
	gitgitgadget@gmail.com, me@ttaylor.com, stolee@gmail.com
Subject: Re: [PATCH v3 00/11] [GSoC] Implement Corrected Commit Date
Date: Tue, 18 Aug 2020 11:42:20 +0530
Message-ID: <20200818061220.GA28571@Abhishek-Arch> (raw)
In-Reply-To: <85zh6uxh7l.fsf@gmail.com>

On Mon, Aug 17, 2020 at 02:13:18AM +0200, Jakub Narębski wrote:
> "Abhishek Kumar via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
> > This patch series implements the corrected commit date offsets as generation
> > number v2, along with other pre-requisites.
> 
> I'm not sure if this level of detail is required in the cover letter for
> the series, but generation number v2 is corrected commit date; corrected
> commit date offsets is how we store this value in the commit-graph file.
> 
> >
> > Git uses topological levels in the commit-graph file for commit-graph
> > traversal operations like git log --graph. Unfortunately, using topological
> > levels can result in a worse performance than without them when compared
> > with committer date as a heuristics. For example, git merge-base v4.8 v4.9
> > on the Linux repository walks 635,579 commits using topological levels and
> > walks 167,468 using committer date.
> 
> I would say "committer date heuristics" instead of just "committer
> date", to be more exact.
> 
> Is this data generated using https://github.com/derrickstolee/gen-test
> scripts?
> 

Yes, it is.

> >
> > Thus, the need for generation number v2 was born. New generation number
> > needed to provide good performance, increment updates, and backward
> > compatibility. Due to an unfortunate problem, we also needed a way to
> > distinguish between the old and new generation number without incrementing
> > graph version.
> 
> It would be nice to have reference email (or other place with details)
> for "unfortunate problem".
> 

Will add.

> >
> > Various candidates were examined (https://github.com/derrickstolee/gen-test,
> > https://github.com/abhishekkumar2718/git/pull/1). The proposed generation
> > number v2, Corrected Commit Date with Mononotically Increasing Offsets
> > performed much worse than committer date (506,577 vs. 167,468 commits walked
> > for git merge-base v4.8 v4.9) and was dropped.
> >
> > Using Generation Data chunk (GDAT) relieves the requirement of backward
> > compatibility as we would continue to store topological levels in Commit
> > Data (CDAT) chunk. Thus, Corrected Commit Date was chosen as generation
> > number v2.
> 
> This is a bit of simplification, but good enough for a cover letter.
> 
> To be more exact, from various candidates the Corrected Commit Date was
> chosen.  Then it turned out that old Git crashes on changed commit-graph
> format version value, so if the generation number v2 was to replace v1
> it needed to be backward-compatibile: hence the idea of Corrected Commit
> Date with Monotonically Increasing Offsets.  But with GDAT chunk to
> store generation number v2 (and for the time being leaving generation
> number v1, i.e. Topological Levels, in CDAT), we are no longer
> constrained by the requirement of backward-compatibility to make old Git
> work with commit-graph file created by new Git.  So we could go back to
> Corrected Commit Date, and as you wrote above the backward-compatibile
> variant performs worse.
> 
> > The Corrected Commit Date is defined as:
> >
> > For a commit C, let its corrected commit date be the maximum of the commit
> > date of C and the corrected commit dates of its parents.
> 
> Actually it needs to be "corrected commit dates of its parents plus 1"
> to fulfill the reachability condition for a generation number for a
> commit:
> 
>       A can reach B   =>  gen(A) < gen(B)
> 
> Of course it can be computed in simpler way, because
> 
>   max_P (gen(P) + 1)  ==  max_P (gen(P)) + 1
> 
> 
> >                                                           Then corrected
> > commit date offset is the difference between corrected commit date of C and
> > commit date of C.
> 
> All right.
> 
> >
> > We will introduce an additional commit-graph chunk, Generation Data chunk,
> > and store corrected commit date offsets in GDAT chunk while storing
> > topological levels in CDAT chunk. The old versions of Git would ignore GDAT
> > chunk, using topological levels from CDAT chunk. In contrast, new versions
> > of Git would use corrected commit dates, falling back to topological level
> > if the generation data chunk is absent in the commit-graph file.
> 
> All right.
> 
> However I think the cover letter should also describe what should happen
> in a mixed version environment (for example new Git on command line,
> copy of old Git used by GUI client), and in particular what should
> happen in a mixed-chain case - both for reading and for writing the
> commit-graph file.
> 

Yes, definitely. Will add

> For *writing*: because old Git would create commit-graph layers without
> the GDAT chunk, to simplify the behavior and make easy to reason about
> commit-graph data (the situation should be not that common, and
> transient -- it should get more rare as the time goes), we want the
> following behavior from new Git:
> 
> - If top layer contains the GDAT chunk, or we are rewriting commit-graph
>   file (--split=replace), or we are merging layers and there are no
>   layers without GDAT chunk below set of layers that are merged, then
> 
>      write commit-graph file or commit-graph layer with GDAT chunk,
> 
>   otherwise
> 
>      write commit-graph layer without GDAT chunk.
> 
>   This means that there are commit-graph layers without GDAT chunk if
>   and only if the top layer is also without GDAT chunk.
> 
> 
> For *reading* we want to use generation number v2 (corrected commit
> date) if possible, and fall back to generation number v1 (topological
> levels).
> 
> - If the top layer contains the GDAT chunk (or maybe even if the topmost
>   layer that involves all commits in question, not necessarily the top
>   layer in the full commit-graph chain), then use generation number v2
> 

The current implementation checks the entire chain for GDAT, rather than
just the topmost layer as we cannot assert that `g` would be the topmost
layer of the chain.

See the discussion here: https://lore.kernel.org/git/20200814045957.GA1380@Abhishek-Arch/

It's one of drawbacks of having a single member 64-bit `generation`
instead of two 32-bit members `level` and `odate`.

>
>
>   - commit_graph_data->generation stores corrected commit date,
>     computed as sum of committer date (from CDAT) and offset (from GDAT)
> 
>   - A can reach B   =>  gen(A) < gen(B)
> 
>   - there is no need for committer date heuristics, and no need for
>     limiting use of generation number to where there is a cutoff (to not
>     hamper performance).
> 
> - If there are layers without GDAT chunks, which thanks to the write
>   behavior means simply top layer without GDAT chunk, we need to turn
>   off use of generation numbers or fall back to using topological levels
> 
>   - commit_graph_data->generation stores topological levels,
>     taken from CDAT chunk (30-bits)
> 
>   - A can reach B   =>  gen(A) < gen(B)
> 
>   - we probably want to keep tie-breaking of sorting by generation
>     number via committer date, and limit use of generation number as
>     opposed to using committer date heuristics (with slop) to not make
>     performance worse.
> 
> >
> > Thanks to Dr. Stolee, Dr. Narębski, and Taylor for their reviews on the
> > first version.
> >
> > I look forward to everyone's reviews!
> >
> > Thanks
> >
> >  * Abhishek
> >
> >
> > ----------------------------------------------------------------------------
> >
> > Changes in version 3:
> >
> >  * Reordered patches as discussed in 1
> >    [https://lore.kernel.org/git/aee0ae56-3395-6848-d573-27a318d72755@gmail.com/]
> 
> If I remember it correctly this was done to always store in GDAT chunk
> corrected commit date offsets, isn't it?
> 

Yes.

> >  * Split "implement corrected commit date" into two patches - one
> >    introducing the topo level slab and other implementing corrected commit
> >    dates.
> 
> All right.
> 
> I think it might be good idea to split off the change to tar file tests
> (as a preparatory patch), to make reviews and bisecting easier.
> 
> >  * Extended split-commit-graph tests to verify at the end of test.
> 
> Do we also test for proper merging of split commit-graph layers, not
> only adding a new layer and a full rewrite (--split=replace)?
> 

We do not, will add a test at end. Thanks for pointing this out.

> >  * Use topological levels as generation number if any of split commit-graph
> >    files do not have generation data chunk.
> 
> That is good for performance.
> 
> >
> > Changes in version 2:
> >
> >  * Add tests for generation data chunk.
> 
> Good.
> 
> >  * Add an option GIT_TEST_COMMIT_GRAPH_NO_GDAT to control whether to write
> >    generation data chunk.
> 
> Good, that is needed for testing mixed-version behavior.
> 
> >  * Compare commits with corrected commit dates if present in
> >    paint_down_to_common().
> 
> All right, but see the caveat.
> 
> >  * Update technical documentation.
> 
> Always a good thing.
> 
> >  * Handle mixed graph version commit chains.
> 
> Where by "version" you mean generation number version - the commit-graph
> version number unfortunately needs to stay the same...
> 

Yes, clarified.

> >  * Improve commit messages for
>                                 ^^^^^^
> Something missing in this point, the sentence ends abruptly.

I didn't finish the sentence. Meant to say:

- Improve commit messages for "commit-graph: fix regression when computing bloom filter", "commit-graph: consolidate fill_commit_graph_info",
> 
> >  * Revert unnecessary whitespace changes.
> 
> Thanks.
> 
> >  * Split uint_32 -> timestamp_t change into a new commit.
> 
> It is usually better to keep the commits small.  Good.
> 
> 
> Good work!
> 
> ...
> --
> Jakub Narębski

  parent reply index

Thread overview: 129+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-28  9:13 [PATCH 0/6] " Abhishek Kumar via GitGitGadget
2020-07-28  9:13 ` [PATCH 1/6] commit-graph: fix regression when computing bloom filter Abhishek Kumar via GitGitGadget
2020-07-28 15:28   ` Taylor Blau
2020-07-30  5:24     ` Abhishek Kumar
2020-08-04  0:46   ` Jakub Narębski
2020-08-04  0:56     ` Taylor Blau
2020-08-04 10:10       ` Jakub Narębski
2020-08-04  7:55     ` Jakub Narębski
2020-07-28  9:13 ` [PATCH 2/6] revision: parse parent in indegree_walk_step() Abhishek Kumar via GitGitGadget
2020-07-28 13:00   ` Derrick Stolee
2020-07-28 15:30     ` Taylor Blau
2020-08-05 23:16   ` Jakub Narębski
2020-07-28  9:13 ` [PATCH 3/6] commit-graph: consolidate fill_commit_graph_info Abhishek Kumar via GitGitGadget
2020-07-28 13:14   ` Derrick Stolee
2020-07-28 15:19     ` René Scharfe
2020-07-28 15:58       ` Derrick Stolee
2020-07-28 16:01     ` Taylor Blau
2020-07-30  6:07     ` Abhishek Kumar
2020-07-28  9:13 ` [PATCH 4/6] commit-graph: consolidate compare_commits_by_gen Abhishek Kumar via GitGitGadget
2020-07-28 16:03   ` Taylor Blau
2020-07-28  9:13 ` [PATCH 5/6] commit-graph: implement generation data chunk Abhishek Kumar via GitGitGadget
2020-07-28 16:12   ` Taylor Blau
2020-07-30  6:52     ` Abhishek Kumar
2020-07-28  9:13 ` [PATCH 6/6] commit-graph: implement corrected commit date offset Abhishek Kumar via GitGitGadget
2020-07-28 15:55   ` Derrick Stolee
2020-07-28 16:23     ` Taylor Blau
2020-07-30  7:27     ` Abhishek Kumar
2020-07-28 14:54 ` [PATCH 0/6] [GSoC] Implement Corrected Commit Date Taylor Blau
2020-07-30  7:47   ` Abhishek Kumar
2020-07-28 16:35 ` Derrick Stolee
2020-08-09  2:53 ` [PATCH v2 00/10] " Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 01/10] commit-graph: fix regression when computing bloom filter Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 02/10] revision: parse parent in indegree_walk_step() Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 03/10] commit-graph: consolidate fill_commit_graph_info Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 04/10] commit-graph: consolidate compare_commits_by_gen Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 05/10] commit-graph: implement generation data chunk Abhishek Kumar via GitGitGadget
2020-08-10 16:28     ` Derrick Stolee
2020-08-11 11:03       ` Abhishek Kumar
2020-08-11 12:27         ` Derrick Stolee
2020-08-11 18:58           ` Taylor Blau
2020-08-09  2:53   ` [PATCH v2 06/10] commit-graph: return 64-bit generation number Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 07/10] commit-graph: implement corrected commit date Abhishek Kumar via GitGitGadget
2020-08-10 14:23     ` Derrick Stolee
2020-08-14  4:59       ` Abhishek Kumar
2020-08-14 12:24         ` Derrick Stolee
2020-08-09  2:53   ` [PATCH v2 08/10] commit-graph: handle mixed generation commit chains Abhishek Kumar via GitGitGadget
2020-08-10 16:42     ` Derrick Stolee
2020-08-11 11:36       ` Abhishek Kumar
2020-08-11 12:43         ` Derrick Stolee
2020-08-09  2:53   ` [PATCH v2 09/10] commit-reach: use corrected commit dates in paint_down_to_common() Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 10/10] doc: add corrected commit date info Abhishek Kumar via GitGitGadget
2020-08-10 16:47   ` [PATCH v2 00/10] [GSoC] Implement Corrected Commit Date Derrick Stolee
2020-08-15 16:39   ` [PATCH v3 00/11] " Abhishek Kumar via GitGitGadget
2020-08-15 16:39     ` [PATCH v3 01/11] commit-graph: fix regression when computing bloom filter Abhishek Kumar via GitGitGadget
2020-08-17 22:30       ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 02/11] revision: parse parent in indegree_walk_step() Abhishek Kumar via GitGitGadget
2020-08-18 14:18       ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 03/11] commit-graph: consolidate fill_commit_graph_info Abhishek Kumar via GitGitGadget
2020-08-19 17:54       ` Jakub Narębski
2020-08-21  4:11         ` Abhishek Kumar
2020-08-25 11:11           ` Jakub Narębski
2020-09-01 11:35             ` Abhishek Kumar
2020-08-15 16:39     ` [PATCH v3 04/11] commit-graph: consolidate compare_commits_by_gen Abhishek Kumar via GitGitGadget
2020-08-17 13:22       ` Derrick Stolee
2020-08-21 11:05       ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 05/11] commit-graph: return 64-bit generation number Abhishek Kumar via GitGitGadget
2020-08-21 13:14       ` Jakub Narębski
2020-08-25  5:04         ` Abhishek Kumar
2020-08-25 12:18           ` Jakub Narębski
2020-09-01 12:06             ` Abhishek Kumar
2020-09-03 13:42               ` Jakub Narębski
2020-09-05 17:21                 ` Abhishek Kumar
2020-09-13 15:39                   ` Jakub Narębski
2020-09-28 21:48                     ` Jakub Narębski
2020-10-05  5:25                       ` Abhishek Kumar
2020-08-15 16:39     ` [PATCH v3 06/11] commit-graph: add a slab to store topological levels Abhishek Kumar via GitGitGadget
2020-08-21 18:43       ` Jakub Narębski
2020-08-25  6:14         ` Abhishek Kumar
2020-08-25  7:33           ` Jakub Narębski
2020-08-25  7:56             ` Jakub Narębski
2020-09-01 10:26               ` Abhishek Kumar
2020-09-03  9:25                 ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 07/11] commit-graph: implement corrected commit date Abhishek Kumar via GitGitGadget
2020-08-22  0:05       ` Jakub Narębski
2020-08-25  6:49         ` Abhishek Kumar
2020-08-25 10:07           ` Jakub Narębski
2020-09-01 11:01             ` Abhishek Kumar
2020-08-15 16:39     ` [PATCH v3 08/11] commit-graph: implement generation data chunk Abhishek Kumar via GitGitGadget
2020-08-22 13:09       ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 09/11] commit-graph: use generation v2 only if entire chain does Abhishek Kumar via GitGitGadget
2020-08-22 17:14       ` Jakub Narębski
2020-08-26  7:15         ` Abhishek Kumar
2020-08-26 10:38           ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 10/11] commit-reach: use corrected commit dates in paint_down_to_common() Abhishek Kumar via GitGitGadget
2020-08-22 19:09       ` Jakub Narębski
2020-09-01 10:08         ` Abhishek Kumar
2020-09-03 19:11           ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 11/11] doc: add corrected commit date info Abhishek Kumar via GitGitGadget
2020-08-22 22:20       ` Jakub Narębski
2020-08-27  6:39         ` Abhishek Kumar
2020-08-27 12:43           ` Jakub Narębski
2020-08-27 13:15           ` Derrick Stolee
2020-09-01 13:01             ` Abhishek Kumar
2020-08-17  0:13     ` [PATCH v3 00/11] [GSoC] Implement Corrected Commit Date Jakub Narębski
     [not found]       ` <CANQwDwdKp7oKy9BeKdvKhwPUiq0R5MS8TCw-eWGCYCoMGv=G-g@mail.gmail.com>
2020-08-17  1:32         ` Fwd: " Taylor Blau
2020-08-17  7:56           ` Jakub Narębski
2020-08-18  6:12       ` Abhishek Kumar [this message]
2020-08-23 15:27       ` Jakub Narębski
2020-08-24  2:49         ` Abhishek Kumar
2020-10-07 14:09     ` [PATCH v4 00/10] " Abhishek Kumar via GitGitGadget
2020-10-07 14:09       ` [PATCH v4 01/10] commit-graph: fix regression when computing Bloom filters Abhishek Kumar via GitGitGadget
2020-10-24 23:16         ` Jakub Narębski
2020-10-25 20:58           ` Taylor Blau
2020-10-07 14:09       ` [PATCH v4 02/10] revision: parse parent in indegree_walk_step() Abhishek Kumar via GitGitGadget
2020-10-24 23:41         ` Jakub Narębski
2020-10-07 14:09       ` [PATCH v4 03/10] commit-graph: consolidate fill_commit_graph_info Abhishek Kumar via GitGitGadget
2020-10-25 10:52         ` Jakub Narębski
2020-10-27  6:33           ` Abhishek Kumar
2020-10-07 14:09       ` [PATCH v4 04/10] commit-graph: return 64-bit generation number Abhishek Kumar via GitGitGadget
2020-10-25 13:48         ` Jakub Narębski
2020-10-07 14:09       ` [PATCH v4 05/10] commit-graph: add a slab to store topological levels Abhishek Kumar via GitGitGadget
2020-10-25 22:17         ` Jakub Narębski
2020-10-07 14:09       ` [PATCH v4 06/10] commit-graph: implement corrected commit date Abhishek Kumar via GitGitGadget
2020-10-27 18:53         ` Jakub Narębski
2020-10-07 14:09       ` [PATCH v4 07/10] commit-graph: implement generation data chunk Abhishek Kumar via GitGitGadget
2020-10-30 12:45         ` Jakub Narębski
2020-10-07 14:09       ` [PATCH v4 08/10] commit-graph: use generation v2 only if entire chain does Abhishek Kumar via GitGitGadget
2020-10-07 14:09       ` [PATCH v4 09/10] commit-reach: use corrected commit dates in paint_down_to_common() Abhishek Kumar via GitGitGadget
2020-10-07 14:09       ` [PATCH v4 10/10] doc: add corrected commit date info Abhishek Kumar via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200818061220.GA28571@Abhishek-Arch \
    --to=abhishekkumar8222@gmail.com \
    --cc=85zh6uxh7l.fsf@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=jnareb@gmail.com \
    --cc=me@ttaylor.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Git Mailing List Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/git/0 git/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 git git/ https://lore.kernel.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.git


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git