Git Mailing List Archive on lore.kernel.org
 help / color / Atom feed
From: Abhishek Kumar <abhishekkumar8222@gmail.com>
To: "Jakub Narębski" <jnareb@gmail.com>
Cc: abhishekkumar8222@gmail.com, git@vger.kernel.org,
	gitgitgadget@gmail.com, me@ttaylorr.com, stolee@gmail.com
Subject: Re: [PATCH v3 07/11] commit-graph: implement corrected commit date
Date: Tue, 25 Aug 2020 12:19:54 +0530
Message-ID: <20200825064954.GA645690@Abhishek-Arch> (raw)
In-Reply-To: <85wo1rk0iy.fsf@gmail.com>

On Sat, Aug 22, 2020 at 02:05:41AM +0200, Jakub Narębski wrote:
> "Abhishek Kumar via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
> > From: Abhishek Kumar <abhishekkumar8222@gmail.com>
> >
> > With most of preparations done, let's implement corrected commit date.
> >
> > The corrected commit date for a commit is defined as:
> >
> > * A commit with no parents (a root commit) has corrected commit date
> >   equal to its committer date.
> > * A commit with at least one parent has corrected commit date equal to
> >   the maximum of its commit date and one more than the largest corrected
> >   commit date among its parents.
> 
> Good.
> 
> >
> > To minimize the space required to store corrected commit date, Git
> > stores corrected commit date offsets into the commit-graph file. The
> > corrected commit date offset for a commit is defined as the difference
> > between its corrected commit date and actual commit date.
> 
> Perhaps we should add more details about data type sizes in question.

Will add.

> 
> Storing corrected commit date requires sizeof(timestamp_t) bytes, which
> in most cases is 64 bits (uintmax_t).  However corrected commit date
> offsets can be safely stored^* using only 32 bits.  This halves the size
> of GDAT chunk, reducing per-commit storage from 2*H + 16 + 8 bytes to
> 2*H + 16 + 4 bytes, which is reduction of around 6%, not including
> header, fanout table (OIDF) and extra edges list (EDGE).
> 
> Which might mean that the extra complication is not worth it, and we
> should store corrected commit date directly instead.
> 
> *) unless for example one of commits is malformed but valid,
>    and has committerdate of 0 Unix time, 1 January 1970.
> 
> >
> > While Git does not write out offsets at this stage, Git stores the
> > corrected commit dates in member generation of struct commit_graph_data.
> > It will begin writing commit date offsets with the introduction of
> > generation data chunk.
> 
> OK, so the agenda for introducing geeration number v2 is as follows:
> - compute generation numbers v2, i.e. corrected commit date
> - store corrected commit date [offsets] in new GDAT chunk,
>   unless backward-compatibility concerns require us to not to
> - load [and compute] corrected commit date from commit-graph
>   storing it as 'generation' field of `struct commit_graph_data`,
>   unless backward-compatibility concerns require us to store
>   topological levels (generation number v1) in there instead
> 

The last point is not correct. We always store topological levels into
the topo_levels slab introduced and always store corrected commit date
into data->generation, regardless of backward compatibility concerns.

We could avoid initializing topo_slab if we are not writing generation
data chunk (and thus don't need corrected commit dates) but that
wouldn't have an impact on run time while writing commit-graph because
computing corrected commit dates is cheap as the main cost is in walking
the graph and writing the file.

> Because the reachability condition for corrected commit date and for
> topological level is exactly the same, we don't need to do anything to
> take advantage of generation number v2.
> 
> Though we can use generation number v2 in more cases, where we turned
> off use of generation numbers because v1 gave worse performance than
> date heuristics.
> 
> Did I got this right?
> 
> >
> > Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
> > ---
> >  commit-graph.c | 58 +++++++++++++++++++++++++++-----------------------
> >  1 file changed, 31 insertions(+), 27 deletions(-)
> >
> > diff --git a/commit-graph.c b/commit-graph.c
> > index a2f15b2825..fd69534dd5 100644
> > --- a/commit-graph.c
> > +++ b/commit-graph.c
> > @@ -169,11 +169,6 @@ static int commit_gen_cmp(const void *va, const void *vb)
> >  	else if (generation_a > generation_b)
> >  		return 1;
> >
> > -	/* use date as a heuristic when generations are equal */
> > -	if (a->date < b->date)
> > -		return -1;
> > -	else if (a->date > b->date)
> > -		return 1;
> 
> At first I was wondering why this tie-breaking is beig removed; wouldn't
> be needed for backward-compatibility?  But then I remembered that this
> comparison function is used _only_ for sorting commits when writing
> Bloom filters, for `git commit-graph write --reachable --changed-paths ...`
> 
> Assuming that when writing the commit graph we always compute geeration
> number v2 and 'generation' field stores corrected commit date, we don't
> need to use date as a heuristic when generations are equal, and it would
> not help in tie-breaking anyway.
> 
> All right.
> 
> >  	return 0;
> >  }
> >
> > @@ -1342,10 +1337,14 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
> >  					ctx->commits.nr);
> >  	for (i = 0; i < ctx->commits.nr; i++) {
> >  		uint32_t level = *topo_level_slab_at(ctx->topo_levels, ctx->commits.list[i]);
> > +		timestamp_t corrected_commit_date = commit_graph_data_at(ctx->commits.list[i])->generation;
> 
> All right, so the pattern is to add 'corrected_commit_date' stuff after
> 'topological_level' stuff.
> 
> >
> >  		display_progress(ctx->progress, i + 1);
> >  		if (level != GENERATION_NUMBER_V1_INFINITY &&
> > -		    level != GENERATION_NUMBER_ZERO)
> > +		    level != GENERATION_NUMBER_ZERO &&
> > +		    corrected_commit_date != GENERATION_NUMBER_INFINITY &&
> > +		    corrected_commit_date != GENERATION_NUMBER_ZERO
> > +		    )
> >  			continue;
> >
> >  		commit_list_insert(ctx->commits.list[i], &list);
> > @@ -1354,17 +1353,26 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
> >  			struct commit_list *parent;
> >  			int all_parents_computed = 1;
> >  			uint32_t max_level = 0;
> > +			timestamp_t max_corrected_commit_date = 0;
> >
> >  			for (parent = current->parents; parent; parent = parent->next) {
> >  				level = *topo_level_slab_at(ctx->topo_levels, parent->item);
> > +				corrected_commit_date = commit_graph_data_at(parent->item)->generation;
> >
> >  				if (level == GENERATION_NUMBER_V1_INFINITY ||
> > -				    level == GENERATION_NUMBER_ZERO) {
> > +				    level == GENERATION_NUMBER_ZERO ||
> > +				    corrected_commit_date == GENERATION_NUMBER_INFINITY ||
> > +				    corrected_commit_date == GENERATION_NUMBER_ZERO
> > +				    ) {
> >  					all_parents_computed = 0;
> >  					commit_list_insert(parent->item, &list);
> >  					break;
> > -				} else if (level > max_level) {
> > -					max_level = level;
> > +				} else {
> > +					if (level > max_level)
> > +						max_level = level;
> > +
> > +					if (corrected_commit_date > max_corrected_commit_date)
> > +						max_corrected_commit_date = corrected_commit_date;
> >  				}
> >  			}
> >
> > @@ -1374,6 +1382,10 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
> >  				if (max_level > GENERATION_NUMBER_MAX - 1)
> >  					max_level = GENERATION_NUMBER_MAX - 1;
> >  				*topo_level_slab_at(ctx->topo_levels, current) = max_level + 1;
> > +
> > +				if (current->date > max_corrected_commit_date)
> > +					max_corrected_commit_date = current->date - 1;
> > +				commit_graph_data_at(current)->generation = max_corrected_commit_date + 1;
> >  			}
> >  		}
> >  	}
> 
> All right.  Looks good to me.
> 
> > @@ -2372,8 +2384,8 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
> >  	for (i = 0; i < g->num_commits; i++) {
> >  		struct commit *graph_commit, *odb_commit;
> >  		struct commit_list *graph_parents, *odb_parents;
> > -		timestamp_t max_generation = 0;
> > -		timestamp_t generation;
> > +		timestamp_t max_corrected_commit_date = 0;
> > +		timestamp_t corrected_commit_date;
> 
> This is simple, and perhaps unnecessary, rename of variables.
> Shouldn't we however verify *both* topological level, and
> (if exists) corrected commit date?

The problem with verifying both topological level and corrected commit
dates is that we would have to re-fill commit_graph_data slab with commit
data chunk as we cannot modify data->generation otherwise, essentially
repeating the whole verification process.

While it's okay for now, I might take this up in a future series [1].

[1]: https://lore.kernel.org/git/4043ffbc-84df-0cd6-5c75-af80383a56cf@gmail.com/

> 
> >
> >  		display_progress(progress, i + 1);
> >  		hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
> > @@ -2412,9 +2424,9 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
> >  					     oid_to_hex(&graph_parents->item->object.oid),
> >  					     oid_to_hex(&odb_parents->item->object.oid));
> >
> > -			generation = commit_graph_generation(graph_parents->item);
> > -			if (generation > max_generation)
> > -				max_generation = generation;
> > +			corrected_commit_date = commit_graph_generation(graph_parents->item);
> > +			if (corrected_commit_date > max_corrected_commit_date)
> > +				max_corrected_commit_date = corrected_commit_date;
> 
> Actually, commit_graph_generation(<commit>) can return either corrected
> commit date, or topological level, the latter in backward-compatibility
> case (if at least one commit-graph file is lacking GDAT chunk, because
> [some of] it was created by the "Old" Git).
> 
> >
> >  			graph_parents = graph_parents->next;
> >  			odb_parents = odb_parents->next;
> > @@ -2436,20 +2448,12 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
> >  		if (generation_zero == GENERATION_ZERO_EXISTS)
> >  			continue;
> >
> > -		/*
> > -		 * If one of our parents has generation GENERATION_NUMBER_MAX, then
> > -		 * our generation is also GENERATION_NUMBER_MAX. Decrement to avoid
> > -		 * extra logic in the following condition.
> > -		 */
> > -		if (max_generation == GENERATION_NUMBER_MAX)
> > -			max_generation--;
> 
> All right, this was needed for checking the correctness of topological
> levels (generation number v1) because we were checking not that it
> fullfills the reachability condition, but more strict one: namely that
> topological level of commit is equal to maximum of topological levels of
> its parents plus one.
> 
> The comment about checking both generation number v1 and v2 still
> applies.
> 
> > -
> > -		generation = commit_graph_generation(graph_commit);
> > -		if (generation != max_generation + 1)
> > -			graph_report(_("commit-graph generation for commit %s is %u != %u"),
> > +		corrected_commit_date = commit_graph_generation(graph_commit);
> > +		if (corrected_commit_date < max_corrected_commit_date + 1)
> > +			graph_report(_("commit-graph generation for commit %s is %"PRItime" < %"PRItime),
> >  				     oid_to_hex(&cur_oid),
> > -				     generation,
> > -				     max_generation + 1);
> > +				     corrected_commit_date,
> > +				     max_corrected_commit_date + 1);
> 
> All right, we check less strict condition for corrected commit date.
> 
> >
> >  		if (graph_commit->date != odb_commit->date)
> >  			graph_report(_("commit date for commit %s in commit-graph is %"PRItime" != %"PRItime),
> 
> Best,
> -- 
> Jakub Narębski

  reply index

Thread overview: 129+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-28  9:13 [PATCH 0/6] [GSoC] Implement Corrected Commit Date Abhishek Kumar via GitGitGadget
2020-07-28  9:13 ` [PATCH 1/6] commit-graph: fix regression when computing bloom filter Abhishek Kumar via GitGitGadget
2020-07-28 15:28   ` Taylor Blau
2020-07-30  5:24     ` Abhishek Kumar
2020-08-04  0:46   ` Jakub Narębski
2020-08-04  0:56     ` Taylor Blau
2020-08-04 10:10       ` Jakub Narębski
2020-08-04  7:55     ` Jakub Narębski
2020-07-28  9:13 ` [PATCH 2/6] revision: parse parent in indegree_walk_step() Abhishek Kumar via GitGitGadget
2020-07-28 13:00   ` Derrick Stolee
2020-07-28 15:30     ` Taylor Blau
2020-08-05 23:16   ` Jakub Narębski
2020-07-28  9:13 ` [PATCH 3/6] commit-graph: consolidate fill_commit_graph_info Abhishek Kumar via GitGitGadget
2020-07-28 13:14   ` Derrick Stolee
2020-07-28 15:19     ` René Scharfe
2020-07-28 15:58       ` Derrick Stolee
2020-07-28 16:01     ` Taylor Blau
2020-07-30  6:07     ` Abhishek Kumar
2020-07-28  9:13 ` [PATCH 4/6] commit-graph: consolidate compare_commits_by_gen Abhishek Kumar via GitGitGadget
2020-07-28 16:03   ` Taylor Blau
2020-07-28  9:13 ` [PATCH 5/6] commit-graph: implement generation data chunk Abhishek Kumar via GitGitGadget
2020-07-28 16:12   ` Taylor Blau
2020-07-30  6:52     ` Abhishek Kumar
2020-07-28  9:13 ` [PATCH 6/6] commit-graph: implement corrected commit date offset Abhishek Kumar via GitGitGadget
2020-07-28 15:55   ` Derrick Stolee
2020-07-28 16:23     ` Taylor Blau
2020-07-30  7:27     ` Abhishek Kumar
2020-07-28 14:54 ` [PATCH 0/6] [GSoC] Implement Corrected Commit Date Taylor Blau
2020-07-30  7:47   ` Abhishek Kumar
2020-07-28 16:35 ` Derrick Stolee
2020-08-09  2:53 ` [PATCH v2 00/10] " Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 01/10] commit-graph: fix regression when computing bloom filter Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 02/10] revision: parse parent in indegree_walk_step() Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 03/10] commit-graph: consolidate fill_commit_graph_info Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 04/10] commit-graph: consolidate compare_commits_by_gen Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 05/10] commit-graph: implement generation data chunk Abhishek Kumar via GitGitGadget
2020-08-10 16:28     ` Derrick Stolee
2020-08-11 11:03       ` Abhishek Kumar
2020-08-11 12:27         ` Derrick Stolee
2020-08-11 18:58           ` Taylor Blau
2020-08-09  2:53   ` [PATCH v2 06/10] commit-graph: return 64-bit generation number Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 07/10] commit-graph: implement corrected commit date Abhishek Kumar via GitGitGadget
2020-08-10 14:23     ` Derrick Stolee
2020-08-14  4:59       ` Abhishek Kumar
2020-08-14 12:24         ` Derrick Stolee
2020-08-09  2:53   ` [PATCH v2 08/10] commit-graph: handle mixed generation commit chains Abhishek Kumar via GitGitGadget
2020-08-10 16:42     ` Derrick Stolee
2020-08-11 11:36       ` Abhishek Kumar
2020-08-11 12:43         ` Derrick Stolee
2020-08-09  2:53   ` [PATCH v2 09/10] commit-reach: use corrected commit dates in paint_down_to_common() Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 10/10] doc: add corrected commit date info Abhishek Kumar via GitGitGadget
2020-08-10 16:47   ` [PATCH v2 00/10] [GSoC] Implement Corrected Commit Date Derrick Stolee
2020-08-15 16:39   ` [PATCH v3 00/11] " Abhishek Kumar via GitGitGadget
2020-08-15 16:39     ` [PATCH v3 01/11] commit-graph: fix regression when computing bloom filter Abhishek Kumar via GitGitGadget
2020-08-17 22:30       ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 02/11] revision: parse parent in indegree_walk_step() Abhishek Kumar via GitGitGadget
2020-08-18 14:18       ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 03/11] commit-graph: consolidate fill_commit_graph_info Abhishek Kumar via GitGitGadget
2020-08-19 17:54       ` Jakub Narębski
2020-08-21  4:11         ` Abhishek Kumar
2020-08-25 11:11           ` Jakub Narębski
2020-09-01 11:35             ` Abhishek Kumar
2020-08-15 16:39     ` [PATCH v3 04/11] commit-graph: consolidate compare_commits_by_gen Abhishek Kumar via GitGitGadget
2020-08-17 13:22       ` Derrick Stolee
2020-08-21 11:05       ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 05/11] commit-graph: return 64-bit generation number Abhishek Kumar via GitGitGadget
2020-08-21 13:14       ` Jakub Narębski
2020-08-25  5:04         ` Abhishek Kumar
2020-08-25 12:18           ` Jakub Narębski
2020-09-01 12:06             ` Abhishek Kumar
2020-09-03 13:42               ` Jakub Narębski
2020-09-05 17:21                 ` Abhishek Kumar
2020-09-13 15:39                   ` Jakub Narębski
2020-09-28 21:48                     ` Jakub Narębski
2020-10-05  5:25                       ` Abhishek Kumar
2020-08-15 16:39     ` [PATCH v3 06/11] commit-graph: add a slab to store topological levels Abhishek Kumar via GitGitGadget
2020-08-21 18:43       ` Jakub Narębski
2020-08-25  6:14         ` Abhishek Kumar
2020-08-25  7:33           ` Jakub Narębski
2020-08-25  7:56             ` Jakub Narębski
2020-09-01 10:26               ` Abhishek Kumar
2020-09-03  9:25                 ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 07/11] commit-graph: implement corrected commit date Abhishek Kumar via GitGitGadget
2020-08-22  0:05       ` Jakub Narębski
2020-08-25  6:49         ` Abhishek Kumar [this message]
2020-08-25 10:07           ` Jakub Narębski
2020-09-01 11:01             ` Abhishek Kumar
2020-08-15 16:39     ` [PATCH v3 08/11] commit-graph: implement generation data chunk Abhishek Kumar via GitGitGadget
2020-08-22 13:09       ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 09/11] commit-graph: use generation v2 only if entire chain does Abhishek Kumar via GitGitGadget
2020-08-22 17:14       ` Jakub Narębski
2020-08-26  7:15         ` Abhishek Kumar
2020-08-26 10:38           ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 10/11] commit-reach: use corrected commit dates in paint_down_to_common() Abhishek Kumar via GitGitGadget
2020-08-22 19:09       ` Jakub Narębski
2020-09-01 10:08         ` Abhishek Kumar
2020-09-03 19:11           ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 11/11] doc: add corrected commit date info Abhishek Kumar via GitGitGadget
2020-08-22 22:20       ` Jakub Narębski
2020-08-27  6:39         ` Abhishek Kumar
2020-08-27 12:43           ` Jakub Narębski
2020-08-27 13:15           ` Derrick Stolee
2020-09-01 13:01             ` Abhishek Kumar
2020-08-17  0:13     ` [PATCH v3 00/11] [GSoC] Implement Corrected Commit Date Jakub Narębski
     [not found]       ` <CANQwDwdKp7oKy9BeKdvKhwPUiq0R5MS8TCw-eWGCYCoMGv=G-g@mail.gmail.com>
2020-08-17  1:32         ` Fwd: " Taylor Blau
2020-08-17  7:56           ` Jakub Narębski
2020-08-18  6:12       ` Abhishek Kumar
2020-08-23 15:27       ` Jakub Narębski
2020-08-24  2:49         ` Abhishek Kumar
2020-10-07 14:09     ` [PATCH v4 00/10] " Abhishek Kumar via GitGitGadget
2020-10-07 14:09       ` [PATCH v4 01/10] commit-graph: fix regression when computing Bloom filters Abhishek Kumar via GitGitGadget
2020-10-24 23:16         ` Jakub Narębski
2020-10-25 20:58           ` Taylor Blau
2020-10-07 14:09       ` [PATCH v4 02/10] revision: parse parent in indegree_walk_step() Abhishek Kumar via GitGitGadget
2020-10-24 23:41         ` Jakub Narębski
2020-10-07 14:09       ` [PATCH v4 03/10] commit-graph: consolidate fill_commit_graph_info Abhishek Kumar via GitGitGadget
2020-10-25 10:52         ` Jakub Narębski
2020-10-27  6:33           ` Abhishek Kumar
2020-10-07 14:09       ` [PATCH v4 04/10] commit-graph: return 64-bit generation number Abhishek Kumar via GitGitGadget
2020-10-25 13:48         ` Jakub Narębski
2020-10-07 14:09       ` [PATCH v4 05/10] commit-graph: add a slab to store topological levels Abhishek Kumar via GitGitGadget
2020-10-25 22:17         ` Jakub Narębski
2020-10-07 14:09       ` [PATCH v4 06/10] commit-graph: implement corrected commit date Abhishek Kumar via GitGitGadget
2020-10-27 18:53         ` Jakub Narębski
2020-10-07 14:09       ` [PATCH v4 07/10] commit-graph: implement generation data chunk Abhishek Kumar via GitGitGadget
2020-10-30 12:45         ` Jakub Narębski
2020-10-07 14:09       ` [PATCH v4 08/10] commit-graph: use generation v2 only if entire chain does Abhishek Kumar via GitGitGadget
2020-10-07 14:09       ` [PATCH v4 09/10] commit-reach: use corrected commit dates in paint_down_to_common() Abhishek Kumar via GitGitGadget
2020-10-07 14:09       ` [PATCH v4 10/10] doc: add corrected commit date info Abhishek Kumar via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200825064954.GA645690@Abhishek-Arch \
    --to=abhishekkumar8222@gmail.com \
    --cc=85wo1rk0iy.fsf@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=jnareb@gmail.com \
    --cc=me@ttaylorr.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Git Mailing List Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/git/0 git/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 git git/ https://lore.kernel.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.git


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git