Git Mailing List Archive on lore.kernel.org
 help / color / Atom feed
From: "Jakub Narębski" <jnareb@gmail.com>
To: Taylor Blau <me@ttaylorr.com>
Cc: git <git@vger.kernel.org>, Derrick Stolee <dstolee@microsoft.com>,
	Abhishek Kumar <abhishekkumar8222@gmail.com>
Subject: Re: Fwd: [PATCH v3 00/11] [GSoC] Implement Corrected Commit Date
Date: Mon, 17 Aug 2020 09:56:46 +0200
Message-ID: <CANQwDwebQXS8pghXYCBMHvQhCLr9PKFZYsWO4hLAqV=JctV8dA@mail.gmail.com> (raw)
In-Reply-To: <20200817013206.GA57201@syl.lan>

On Mon, 17 Aug 2020 at 03:32, Taylor Blau <me@ttaylorr.com> wrote:
> On Mon, Aug 17, 2020 at 02:16:08AM +0200, Jakub Narębski wrote:
> > "Abhishek Kumar via GitGitGadget" <gitgitgadget@gmail.com> writes:
> > >
> > > We will introduce an additional commit-graph chunk, Generation Data chunk,
> > > and store corrected commit date offsets in GDAT chunk while storing
> > > topological levels in CDAT chunk. The old versions of Git would ignore GDAT
> > > chunk, using topological levels from CDAT chunk. In contrast, new versions
> > > of Git would use corrected commit dates, falling back to topological level
> > > if the generation data chunk is absent in the commit-graph file.
> >
> > All right.
> >
> > However I think the cover letter should also describe what should happen
> > in a mixed version environment (for example new Git on command line,
> > copy of old Git used by GUI client), and in particular what should
> > happen in a mixed-chain case - both for reading and for writing the
> > commit-graph file.
> >
> > For *writing*: because old Git would create commit-graph layers without
> > the GDAT chunk, to simplify the behavior and make easy to reason about
> > commit-graph data (the situation should be not that common, and
> > transient -- it should get more rare as the time goes), we want the
> > following behavior from new Git:
> >
> > - If top layer contains the GDAT chunk, or we are rewriting commit-graph
> >   file (--split=replace), or we are merging layers and there are no
> >   layers without GDAT chunk below set of layers that are merged, then
> >
> >      write commit-graph file or commit-graph layer with GDAT chunk,
> >
> >   otherwise
> >
> >      write commit-graph layer without GDAT chunk.
> >
> >   This means that there are commit-graph layers without GDAT chunk if
> >   and only if the top layer is also without GDAT chunk.
>
> This seems very sane to me, and I'd be glad to see it spelled out in
> more specific detail. I was wondering this myself, and had to double
> check with Stolee off-list that my interpretation of Abhishek's code was
> correct.
>
> But yes, only writing GDAT chunks when all layers in the chain have GDAT
> chunks makes sense, since we can't interoperate between corrected dates
> and topological levels. Since we can't fill in the GDAT data of layers
> generated in pre-GDAT versions of Git without invalidating the GDAT
> layers on-disk, there's no point to speculatively computing both chunks.
>
> Merging rules are obviously correct, which is good. For what it's worth,
> the '--split=replace' case is what we'll really care about at GitHub,
> since it's unlikely we'd drop all existing commit-graph chains and
> rebuild them from scratch. More likely is that we'll let the new GDAT
> chunks trickle in over time when we run 'git commit-graph write' with
> '--split=replace', which happens "every so often".

To be more detailed, without '--split=replace' we would want the following
layer merging behavior:

   [layer with GDAT][with GDAT][without GDAT][without GDAT][without GDAT]

In the split commit-graph chain above, merging two topmost layers
should create a layer without GDAT; merging three topmost layers
(and any other layers, e.g. two middle ones) should create a layer
 with GDAT.

> > For *reading* we want to use generation number v2 (corrected commit
> > date) if possible, and fall back to generation number v1 (topological
> > levels).
> >
> > - If the top layer contains the GDAT chunk (or maybe even if the topmost
> >   layer that involves all commits in question, not necessarily the top
> >   layer in the full commit-graph chain), then use generation number v2
>
> I don't follow this. If we have a multi-layer chain, either all or none
> of the layers have a GDAT chunk. So, "if the top layer contains the GDAT
> chunk" makes sense, since it implies that all layers have the GDAT
> chunk. I don't see how "even if the topmost layer that involves all
> commits in question" would be possible, since (if I'm understanding your
> description correctly), we can't have *some* of the layers having a GDAT
> chunk with others only having a CDAT chunk.
>
> I'm a little confused here.

This is only speculative, and most probably totally unnecessary
complication (either that, or something that we would get for free).
Assume that the command in question operates only on
historical data; for example `git log --topo-order HEAD~1000`.
If all commits (or, what's equivalent, most recent commits
i.e. HEAD~1000) have their data in split commit-graph layers
with GDAT, we can theoretically use generation number v2,
even if there are some newer commits that have their data
in layers without GDAT (and some even newer ones outside
commit-graph files).

I hope that this explains my (possibly harebrained) idea.

> >   - commit_graph_data->generation stores corrected commit date,
> >     computed as sum of committer date (from CDAT) and offset (from GDAT)
> >
> >   - A can reach B   =>  gen(A) < gen(B)
> >
> >   - there is no need for committer date heuristics, and no need for
> >     limiting use of generation number to where there is a cutoff (to not
> >     hamper performance).
> >
> > - If there are layers without GDAT chunks, which thanks to the write
> >   behavior means simply top layer without GDAT chunk, we need to turn
> >   off use of generation numbers or fall back to using topological levels
>
> Good, I'm glad that this can be a quick check (that we can cache for
> future reads, but I'm not even sure the caching would be necessary
> without measuring).

There is a question where to store the information that we cannot
use generation number v2 (that 'generation' contains topological
levels and not corrected commit date):
- create new global variable
- store it in `struct split_commit_graph_opts`
- set `chunk_generation_data` to NULL for all graphs
  in chain (it is in `struct commit_graph`)?

> >
> >   - commit_graph_data->generation stores topological levels,
> >     taken from CDAT chunk (30-bits)
> >
> >   - A can reach B   =>  gen(A) < gen(B)
> >
> >   - we probably want to keep tie-breaking of sorting by generation
> >     number via committer date, and limit use of generation number as
> >     opposed to using committer date heuristics (with slop) to not make
> >     performance worse.
>
> All makes very good sense, except for the one point I raised above.
>
> > >
> > > Thanks to Dr. Stolee, Dr. Narębski, and Taylor for their reviews on the
> > > first version.
>
> Thanks, Abhishek for your great work on this. I was feeling bad that I
> wasn't more involved in the early discussions about the transition plan,
> but what you, Stolee, and Jakub came up with all seems like what I would
> have suggested, anyway ;-).

Thank you for your work on improving this feature.

Best,
-- 
Jakub Narebski

  reply index

Thread overview: 129+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-28  9:13 [PATCH 0/6] " Abhishek Kumar via GitGitGadget
2020-07-28  9:13 ` [PATCH 1/6] commit-graph: fix regression when computing bloom filter Abhishek Kumar via GitGitGadget
2020-07-28 15:28   ` Taylor Blau
2020-07-30  5:24     ` Abhishek Kumar
2020-08-04  0:46   ` Jakub Narębski
2020-08-04  0:56     ` Taylor Blau
2020-08-04 10:10       ` Jakub Narębski
2020-08-04  7:55     ` Jakub Narębski
2020-07-28  9:13 ` [PATCH 2/6] revision: parse parent in indegree_walk_step() Abhishek Kumar via GitGitGadget
2020-07-28 13:00   ` Derrick Stolee
2020-07-28 15:30     ` Taylor Blau
2020-08-05 23:16   ` Jakub Narębski
2020-07-28  9:13 ` [PATCH 3/6] commit-graph: consolidate fill_commit_graph_info Abhishek Kumar via GitGitGadget
2020-07-28 13:14   ` Derrick Stolee
2020-07-28 15:19     ` René Scharfe
2020-07-28 15:58       ` Derrick Stolee
2020-07-28 16:01     ` Taylor Blau
2020-07-30  6:07     ` Abhishek Kumar
2020-07-28  9:13 ` [PATCH 4/6] commit-graph: consolidate compare_commits_by_gen Abhishek Kumar via GitGitGadget
2020-07-28 16:03   ` Taylor Blau
2020-07-28  9:13 ` [PATCH 5/6] commit-graph: implement generation data chunk Abhishek Kumar via GitGitGadget
2020-07-28 16:12   ` Taylor Blau
2020-07-30  6:52     ` Abhishek Kumar
2020-07-28  9:13 ` [PATCH 6/6] commit-graph: implement corrected commit date offset Abhishek Kumar via GitGitGadget
2020-07-28 15:55   ` Derrick Stolee
2020-07-28 16:23     ` Taylor Blau
2020-07-30  7:27     ` Abhishek Kumar
2020-07-28 14:54 ` [PATCH 0/6] [GSoC] Implement Corrected Commit Date Taylor Blau
2020-07-30  7:47   ` Abhishek Kumar
2020-07-28 16:35 ` Derrick Stolee
2020-08-09  2:53 ` [PATCH v2 00/10] " Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 01/10] commit-graph: fix regression when computing bloom filter Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 02/10] revision: parse parent in indegree_walk_step() Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 03/10] commit-graph: consolidate fill_commit_graph_info Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 04/10] commit-graph: consolidate compare_commits_by_gen Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 05/10] commit-graph: implement generation data chunk Abhishek Kumar via GitGitGadget
2020-08-10 16:28     ` Derrick Stolee
2020-08-11 11:03       ` Abhishek Kumar
2020-08-11 12:27         ` Derrick Stolee
2020-08-11 18:58           ` Taylor Blau
2020-08-09  2:53   ` [PATCH v2 06/10] commit-graph: return 64-bit generation number Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 07/10] commit-graph: implement corrected commit date Abhishek Kumar via GitGitGadget
2020-08-10 14:23     ` Derrick Stolee
2020-08-14  4:59       ` Abhishek Kumar
2020-08-14 12:24         ` Derrick Stolee
2020-08-09  2:53   ` [PATCH v2 08/10] commit-graph: handle mixed generation commit chains Abhishek Kumar via GitGitGadget
2020-08-10 16:42     ` Derrick Stolee
2020-08-11 11:36       ` Abhishek Kumar
2020-08-11 12:43         ` Derrick Stolee
2020-08-09  2:53   ` [PATCH v2 09/10] commit-reach: use corrected commit dates in paint_down_to_common() Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 10/10] doc: add corrected commit date info Abhishek Kumar via GitGitGadget
2020-08-10 16:47   ` [PATCH v2 00/10] [GSoC] Implement Corrected Commit Date Derrick Stolee
2020-08-15 16:39   ` [PATCH v3 00/11] " Abhishek Kumar via GitGitGadget
2020-08-15 16:39     ` [PATCH v3 01/11] commit-graph: fix regression when computing bloom filter Abhishek Kumar via GitGitGadget
2020-08-17 22:30       ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 02/11] revision: parse parent in indegree_walk_step() Abhishek Kumar via GitGitGadget
2020-08-18 14:18       ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 03/11] commit-graph: consolidate fill_commit_graph_info Abhishek Kumar via GitGitGadget
2020-08-19 17:54       ` Jakub Narębski
2020-08-21  4:11         ` Abhishek Kumar
2020-08-25 11:11           ` Jakub Narębski
2020-09-01 11:35             ` Abhishek Kumar
2020-08-15 16:39     ` [PATCH v3 04/11] commit-graph: consolidate compare_commits_by_gen Abhishek Kumar via GitGitGadget
2020-08-17 13:22       ` Derrick Stolee
2020-08-21 11:05       ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 05/11] commit-graph: return 64-bit generation number Abhishek Kumar via GitGitGadget
2020-08-21 13:14       ` Jakub Narębski
2020-08-25  5:04         ` Abhishek Kumar
2020-08-25 12:18           ` Jakub Narębski
2020-09-01 12:06             ` Abhishek Kumar
2020-09-03 13:42               ` Jakub Narębski
2020-09-05 17:21                 ` Abhishek Kumar
2020-09-13 15:39                   ` Jakub Narębski
2020-09-28 21:48                     ` Jakub Narębski
2020-10-05  5:25                       ` Abhishek Kumar
2020-08-15 16:39     ` [PATCH v3 06/11] commit-graph: add a slab to store topological levels Abhishek Kumar via GitGitGadget
2020-08-21 18:43       ` Jakub Narębski
2020-08-25  6:14         ` Abhishek Kumar
2020-08-25  7:33           ` Jakub Narębski
2020-08-25  7:56             ` Jakub Narębski
2020-09-01 10:26               ` Abhishek Kumar
2020-09-03  9:25                 ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 07/11] commit-graph: implement corrected commit date Abhishek Kumar via GitGitGadget
2020-08-22  0:05       ` Jakub Narębski
2020-08-25  6:49         ` Abhishek Kumar
2020-08-25 10:07           ` Jakub Narębski
2020-09-01 11:01             ` Abhishek Kumar
2020-08-15 16:39     ` [PATCH v3 08/11] commit-graph: implement generation data chunk Abhishek Kumar via GitGitGadget
2020-08-22 13:09       ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 09/11] commit-graph: use generation v2 only if entire chain does Abhishek Kumar via GitGitGadget
2020-08-22 17:14       ` Jakub Narębski
2020-08-26  7:15         ` Abhishek Kumar
2020-08-26 10:38           ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 10/11] commit-reach: use corrected commit dates in paint_down_to_common() Abhishek Kumar via GitGitGadget
2020-08-22 19:09       ` Jakub Narębski
2020-09-01 10:08         ` Abhishek Kumar
2020-09-03 19:11           ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 11/11] doc: add corrected commit date info Abhishek Kumar via GitGitGadget
2020-08-22 22:20       ` Jakub Narębski
2020-08-27  6:39         ` Abhishek Kumar
2020-08-27 12:43           ` Jakub Narębski
2020-08-27 13:15           ` Derrick Stolee
2020-09-01 13:01             ` Abhishek Kumar
2020-08-17  0:13     ` [PATCH v3 00/11] [GSoC] Implement Corrected Commit Date Jakub Narębski
     [not found]       ` <CANQwDwdKp7oKy9BeKdvKhwPUiq0R5MS8TCw-eWGCYCoMGv=G-g@mail.gmail.com>
2020-08-17  1:32         ` Fwd: " Taylor Blau
2020-08-17  7:56           ` Jakub Narębski [this message]
2020-08-18  6:12       ` Abhishek Kumar
2020-08-23 15:27       ` Jakub Narębski
2020-08-24  2:49         ` Abhishek Kumar
2020-10-07 14:09     ` [PATCH v4 00/10] " Abhishek Kumar via GitGitGadget
2020-10-07 14:09       ` [PATCH v4 01/10] commit-graph: fix regression when computing Bloom filters Abhishek Kumar via GitGitGadget
2020-10-24 23:16         ` Jakub Narębski
2020-10-25 20:58           ` Taylor Blau
2020-10-07 14:09       ` [PATCH v4 02/10] revision: parse parent in indegree_walk_step() Abhishek Kumar via GitGitGadget
2020-10-24 23:41         ` Jakub Narębski
2020-10-07 14:09       ` [PATCH v4 03/10] commit-graph: consolidate fill_commit_graph_info Abhishek Kumar via GitGitGadget
2020-10-25 10:52         ` Jakub Narębski
2020-10-27  6:33           ` Abhishek Kumar
2020-10-07 14:09       ` [PATCH v4 04/10] commit-graph: return 64-bit generation number Abhishek Kumar via GitGitGadget
2020-10-25 13:48         ` Jakub Narębski
2020-10-07 14:09       ` [PATCH v4 05/10] commit-graph: add a slab to store topological levels Abhishek Kumar via GitGitGadget
2020-10-25 22:17         ` Jakub Narębski
2020-10-07 14:09       ` [PATCH v4 06/10] commit-graph: implement corrected commit date Abhishek Kumar via GitGitGadget
2020-10-27 18:53         ` Jakub Narębski
2020-10-07 14:09       ` [PATCH v4 07/10] commit-graph: implement generation data chunk Abhishek Kumar via GitGitGadget
2020-10-30 12:45         ` Jakub Narębski
2020-10-07 14:09       ` [PATCH v4 08/10] commit-graph: use generation v2 only if entire chain does Abhishek Kumar via GitGitGadget
2020-10-07 14:09       ` [PATCH v4 09/10] commit-reach: use corrected commit dates in paint_down_to_common() Abhishek Kumar via GitGitGadget
2020-10-07 14:09       ` [PATCH v4 10/10] doc: add corrected commit date info Abhishek Kumar via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CANQwDwebQXS8pghXYCBMHvQhCLr9PKFZYsWO4hLAqV=JctV8dA@mail.gmail.com' \
    --to=jnareb@gmail.com \
    --cc=abhishekkumar8222@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=me@ttaylorr.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Git Mailing List Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/git/0 git/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 git git/ https://lore.kernel.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.git


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git