Git Mailing List Archive on lore.kernel.org
 help / color / Atom feed
From: jnareb@gmail.com (Jakub Narębski)
To: "Abhishek Kumar via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Derrick Stolee <stolee@gmail.com>,
	Taylor Blau <me@ttaylorr.com>,
	Abhishek Kumar <abhishekkumar8222@gmail.com>
Subject: Re: [PATCH v4 06/10] commit-graph: implement corrected commit date
Date: Tue, 27 Oct 2020 19:53:23 +0100
Message-ID: <85r1pjzejg.fsf@gmail.com> (raw)
In-Reply-To: <694ef1ec08d9dc96a74a2631b2710ad206397dbc.1602079786.git.gitgitgadget@gmail.com> (Abhishek Kumar via GitGitGadget's message of "Wed, 07 Oct 2020 14:09:41 +0000")

"Abhishek Kumar via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Abhishek Kumar <abhishekkumar8222@gmail.com>
>
> With most of preparations done, let's implement corrected commit date.
>
> The corrected commit date for a commit is defined as:
>
> * A commit with no parents (a root commit) has corrected commit date
>   equal to its committer date.
> * A commit with at least one parent has corrected commit date equal to
>   the maximum of its commit date and one more than the largest corrected
>   commit date among its parents.

All right.  We might want to say that it fulfills the same reachability
criteria as topological level, but perhaps this level of detail is not
necessary here.

> As a special case, a root commit with timestamp of zero (01.01.1970
> 00:00:00Z) has corrected commit date of one, to be able to distinguish
> from GENERATION_NUMBER_ZERO (that is, an uncomputed corrected commit
> date).

I'm not sure if this special case is really necessary, but it makes for
cleaner reasoning.

> To minimize the space required to store corrected commit date, Git
> stores corrected commit date offsets into the commit-graph file. The
> corrected commit date offset for a commit is defined as the difference
> between its corrected commit date and actual commit date.
>
> Storing corrected commit date requires sizeof(timestamp_t) bytes, which
> in most cases is 64 bits (uintmax_t). However, corrected commit date
> offsets can be safely stored using only 32-bits. This halves the size
> of GDAT chunk, which is a reduction of around 6% in the size of
> commit-graph file.
>
> However, using offsets be problematic if one of commits is malformed but
> valid and has committerdate of 0 Unix time, as the offset would be the
> same as corrected commit date and thus require 64-bits to be stored
> properly.
>
> While Git does not write out offsets at this stage, Git stores the
> corrected commit dates in member generation of struct commit_graph_data.
> It will begin writing commit date offsets with the introduction of
> generation data chunk.

All right.

>
> Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>

Somewhere in the commit message we should also describe that this commit
changes how commit-graph is verified: from checking that the generation
number agrees with _topological level definition_, that is that for a
given commit it is 1 more than maximum of its parents (with the caveat
that we need to handle GENERATION_NUMBER_V1_MAX values correctly), to
checking that slightly weaker condition fulfilled by both topological
levels (generation number v1) and by corrected commit date (generation
number v2) that for a given commit its generation number is 1 more than
maximum of its parents or larger.

But, as far as I understand it, current code does not handle correctly
GENERATION_NUMBER_V1_MAX case (if we use generation number v1).

On the other hand we could have simpy use functional check, that
generation number used (which can be v1 or v2, or any similar other)
fulfills the reachability condition for each edge, which can be
simplified to checking that generation(parents) <= generation(commit).
If the reachability condition is true for each edge, then it is true for
each path, and for each commit.

> ---
>  commit-graph.c | 43 +++++++++++++++++++++++--------------------
>  1 file changed, 23 insertions(+), 20 deletions(-)
>
> diff --git a/commit-graph.c b/commit-graph.c
> index cedd311024..03948adfce 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -154,11 +154,6 @@ static int commit_gen_cmp(const void *va, const void *vb)
>  	else if (generation_a > generation_b)
>  		return 1;
>  
> -	/* use date as a heuristic when generations are equal */
> -	if (a->date < b->date)
> -		return -1;
> -	else if (a->date > b->date)
> -		return 1;

Why this change?  It is not described in the commit message.

Note that while this tie-breaking fallback doesn't make much sense for
corrected committer date generation number v2, this tie-breaking helps
if we have to use topological levels (generation number v2).

>  	return 0;
>  }
>  
> @@ -1357,10 +1352,14 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
>  					ctx->commits.nr);
>  	for (i = 0; i < ctx->commits.nr; i++) {
>  		timestamp_t level = *topo_level_slab_at(ctx->topo_levels, ctx->commits.list[i]);

Sidenote: I haven't noticed it earlier, but here 'uint32_t' might be
enough; no need for 'timestamp_t' for 'level' variable.

> +		timestamp_t corrected_commit_date = commit_graph_data_at(ctx->commits.list[i])->generation;
>

All right, we compute both generation numbers: topological levels and
corrected commit date.

I guess we use 'corrected_commit_date' instead of simply 'generation' to
make it asier to remember which is which.

>  		display_progress(ctx->progress, i + 1);
>  		if (level != GENERATION_NUMBER_INFINITY &&
> -		    level != GENERATION_NUMBER_ZERO)
> +		    level != GENERATION_NUMBER_ZERO &&
> +		    corrected_commit_date != GENERATION_NUMBER_INFINITY &&
> +		    corrected_commit_date != GENERATION_NUMBER_ZERO

Straightforward addition.

> +		    )

Why this closing parenthesis is now in separated line?

>  			continue;
>  
>  		commit_list_insert(ctx->commits.list[i], &list);
> @@ -1369,17 +1368,25 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
>  			struct commit_list *parent;
>  			int all_parents_computed = 1;
>  			uint32_t max_level = 0;
> +			timestamp_t max_corrected_commit_date = 0;

All right, straightforward addition.

>  
>  			for (parent = current->parents; parent; parent = parent->next) {
>  				level = *topo_level_slab_at(ctx->topo_levels, parent->item);
> -

Why we have removed this empty line?

> +				corrected_commit_date = commit_graph_data_at(parent->item)->generation;

All right.

>  				if (level == GENERATION_NUMBER_INFINITY ||
> -				    level == GENERATION_NUMBER_ZERO) {
> +				    level == GENERATION_NUMBER_ZERO ||
> +				    corrected_commit_date == GENERATION_NUMBER_INFINITY ||
> +				    corrected_commit_date == GENERATION_NUMBER_ZERO
> +				    ) {

All right, same as above.

>  					all_parents_computed = 0;
>  					commit_list_insert(parent->item, &list);
>  					break;
> -				} else if (level > max_level) {
> -					max_level = level;
> +				} else {
> +					if (level > max_level)
> +						max_level = level;
> +
> +					if (corrected_commit_date > max_corrected_commit_date)
> +						max_corrected_commit_date = corrected_commit_date;
>  				}

All right, reasonable and straightforward.

>  			}
>  
> @@ -1389,6 +1396,10 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
>  				if (max_level > GENERATION_NUMBER_V1_MAX - 1)
>  					max_level = GENERATION_NUMBER_V1_MAX - 1;
>  				*topo_level_slab_at(ctx->topo_levels, current) = max_level + 1;
> +
> +				if (current->date && current->date > max_corrected_commit_date)
> +					max_corrected_commit_date = current->date - 1;
> +				commit_graph_data_at(current)->generation = max_corrected_commit_date + 1;

All right.

Here we use the same trick as in previous commit (and as above) to avoid
any possible overflow, to minimize number of conditionals.  The fact
that max_corrected_commit_date might store incorrect value doesn't
matter, as it is reset at beginning of this loop.

>  			}
>  		}
>  	}
> @@ -2485,17 +2496,9 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
>  		if (generation_zero == GENERATION_ZERO_EXISTS)
>  			continue;
>  
> -		/*
> -		 * If one of our parents has generation GENERATION_NUMBER_V1_MAX, then
> -		 * our generation is also GENERATION_NUMBER_V1_MAX. Decrement to avoid
> -		 * extra logic in the following condition.
> -		 */
> -		if (max_generation == GENERATION_NUMBER_V1_MAX)
> -			max_generation--;
> -

Perhaps in the future we should check that both topological levels, and
also corrected committer date (if it exists) for correctness according
to their definition.  Then the above removed part would be restored (but
with s/max_generation/max_level/).

>  		generation = commit_graph_generation(graph_commit);
> -		if (generation != max_generation + 1)
> -			graph_report(_("commit-graph generation for commit %s is %u != %u"),
> +		if (generation < max_generation + 1)
> +			graph_report(_("commit-graph generation for commit %s is %"PRItime" < %"PRItime),

All right, so we relaxed the check so that it will be fulfilled by
generation number v2 (and also by generation number v1, as it implies
the more strict check for v1).

What would happen however if generation holds topological levels, and it
is GENERATION_NUMBER_V1_MAX for at least one parent, which means it is
GENERATION_NUMBER_V1_MAX for a commit?  As you can check, the condition
would be true: GENERATION_NUMBER_V1_MAX < GENERATION_NUMBER_V1_MAX + 1,
so the `git commit-graph verify` would incorrectly say that there is
a problem with generation number, while there isn't one (false positive
detection of error).

Sidenote: I think we don't have to worry about having to introduce
GENERATION_NUMBER_V2_MAX, as the in-memory size (of reconstructed from
disck representation) corrected commiter date is the same as of commiter
date itself, plus some, and I don't see us coming close to 64-bit limit
of timestamp_t for commit dates.

>  				     oid_to_hex(&cur_oid),
>  				     generation,
>  				     max_generation + 1);

Best,
-- 
Jakub Narębski

  reply index

Thread overview: 153+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-28  9:13 [PATCH 0/6] [GSoC] Implement Corrected Commit Date Abhishek Kumar via GitGitGadget
2020-07-28  9:13 ` [PATCH 1/6] commit-graph: fix regression when computing bloom filter Abhishek Kumar via GitGitGadget
2020-07-28 15:28   ` Taylor Blau
2020-07-30  5:24     ` Abhishek Kumar
2020-08-04  0:46   ` Jakub Narębski
2020-08-04  0:56     ` Taylor Blau
2020-08-04 10:10       ` Jakub Narębski
2020-08-04  7:55     ` Jakub Narębski
2020-07-28  9:13 ` [PATCH 2/6] revision: parse parent in indegree_walk_step() Abhishek Kumar via GitGitGadget
2020-07-28 13:00   ` Derrick Stolee
2020-07-28 15:30     ` Taylor Blau
2020-08-05 23:16   ` Jakub Narębski
2020-07-28  9:13 ` [PATCH 3/6] commit-graph: consolidate fill_commit_graph_info Abhishek Kumar via GitGitGadget
2020-07-28 13:14   ` Derrick Stolee
2020-07-28 15:19     ` René Scharfe
2020-07-28 15:58       ` Derrick Stolee
2020-07-28 16:01     ` Taylor Blau
2020-07-30  6:07     ` Abhishek Kumar
2020-07-28  9:13 ` [PATCH 4/6] commit-graph: consolidate compare_commits_by_gen Abhishek Kumar via GitGitGadget
2020-07-28 16:03   ` Taylor Blau
2020-07-28  9:13 ` [PATCH 5/6] commit-graph: implement generation data chunk Abhishek Kumar via GitGitGadget
2020-07-28 16:12   ` Taylor Blau
2020-07-30  6:52     ` Abhishek Kumar
2020-07-28  9:13 ` [PATCH 6/6] commit-graph: implement corrected commit date offset Abhishek Kumar via GitGitGadget
2020-07-28 15:55   ` Derrick Stolee
2020-07-28 16:23     ` Taylor Blau
2020-07-30  7:27     ` Abhishek Kumar
2020-07-28 14:54 ` [PATCH 0/6] [GSoC] Implement Corrected Commit Date Taylor Blau
2020-07-30  7:47   ` Abhishek Kumar
2020-07-28 16:35 ` Derrick Stolee
2020-08-09  2:53 ` [PATCH v2 00/10] " Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 01/10] commit-graph: fix regression when computing bloom filter Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 02/10] revision: parse parent in indegree_walk_step() Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 03/10] commit-graph: consolidate fill_commit_graph_info Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 04/10] commit-graph: consolidate compare_commits_by_gen Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 05/10] commit-graph: implement generation data chunk Abhishek Kumar via GitGitGadget
2020-08-10 16:28     ` Derrick Stolee
2020-08-11 11:03       ` Abhishek Kumar
2020-08-11 12:27         ` Derrick Stolee
2020-08-11 18:58           ` Taylor Blau
2020-08-09  2:53   ` [PATCH v2 06/10] commit-graph: return 64-bit generation number Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 07/10] commit-graph: implement corrected commit date Abhishek Kumar via GitGitGadget
2020-08-10 14:23     ` Derrick Stolee
2020-08-14  4:59       ` Abhishek Kumar
2020-08-14 12:24         ` Derrick Stolee
2020-08-09  2:53   ` [PATCH v2 08/10] commit-graph: handle mixed generation commit chains Abhishek Kumar via GitGitGadget
2020-08-10 16:42     ` Derrick Stolee
2020-08-11 11:36       ` Abhishek Kumar
2020-08-11 12:43         ` Derrick Stolee
2020-08-09  2:53   ` [PATCH v2 09/10] commit-reach: use corrected commit dates in paint_down_to_common() Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 10/10] doc: add corrected commit date info Abhishek Kumar via GitGitGadget
2020-08-10 16:47   ` [PATCH v2 00/10] [GSoC] Implement Corrected Commit Date Derrick Stolee
2020-08-15 16:39   ` [PATCH v3 00/11] " Abhishek Kumar via GitGitGadget
2020-08-15 16:39     ` [PATCH v3 01/11] commit-graph: fix regression when computing bloom filter Abhishek Kumar via GitGitGadget
2020-08-17 22:30       ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 02/11] revision: parse parent in indegree_walk_step() Abhishek Kumar via GitGitGadget
2020-08-18 14:18       ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 03/11] commit-graph: consolidate fill_commit_graph_info Abhishek Kumar via GitGitGadget
2020-08-19 17:54       ` Jakub Narębski
2020-08-21  4:11         ` Abhishek Kumar
2020-08-25 11:11           ` Jakub Narębski
2020-09-01 11:35             ` Abhishek Kumar
2020-08-15 16:39     ` [PATCH v3 04/11] commit-graph: consolidate compare_commits_by_gen Abhishek Kumar via GitGitGadget
2020-08-17 13:22       ` Derrick Stolee
2020-08-21 11:05       ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 05/11] commit-graph: return 64-bit generation number Abhishek Kumar via GitGitGadget
2020-08-21 13:14       ` Jakub Narębski
2020-08-25  5:04         ` Abhishek Kumar
2020-08-25 12:18           ` Jakub Narębski
2020-09-01 12:06             ` Abhishek Kumar
2020-09-03 13:42               ` Jakub Narębski
2020-09-05 17:21                 ` Abhishek Kumar
2020-09-13 15:39                   ` Jakub Narębski
2020-09-28 21:48                     ` Jakub Narębski
2020-10-05  5:25                       ` Abhishek Kumar
2020-08-15 16:39     ` [PATCH v3 06/11] commit-graph: add a slab to store topological levels Abhishek Kumar via GitGitGadget
2020-08-21 18:43       ` Jakub Narębski
2020-08-25  6:14         ` Abhishek Kumar
2020-08-25  7:33           ` Jakub Narębski
2020-08-25  7:56             ` Jakub Narębski
2020-09-01 10:26               ` Abhishek Kumar
2020-09-03  9:25                 ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 07/11] commit-graph: implement corrected commit date Abhishek Kumar via GitGitGadget
2020-08-22  0:05       ` Jakub Narębski
2020-08-25  6:49         ` Abhishek Kumar
2020-08-25 10:07           ` Jakub Narębski
2020-09-01 11:01             ` Abhishek Kumar
2020-08-15 16:39     ` [PATCH v3 08/11] commit-graph: implement generation data chunk Abhishek Kumar via GitGitGadget
2020-08-22 13:09       ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 09/11] commit-graph: use generation v2 only if entire chain does Abhishek Kumar via GitGitGadget
2020-08-22 17:14       ` Jakub Narębski
2020-08-26  7:15         ` Abhishek Kumar
2020-08-26 10:38           ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 10/11] commit-reach: use corrected commit dates in paint_down_to_common() Abhishek Kumar via GitGitGadget
2020-08-22 19:09       ` Jakub Narębski
2020-09-01 10:08         ` Abhishek Kumar
2020-09-03 19:11           ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 11/11] doc: add corrected commit date info Abhishek Kumar via GitGitGadget
2020-08-22 22:20       ` Jakub Narębski
2020-08-27  6:39         ` Abhishek Kumar
2020-08-27 12:43           ` Jakub Narębski
2020-08-27 13:15           ` Derrick Stolee
2020-09-01 13:01             ` Abhishek Kumar
2020-08-17  0:13     ` [PATCH v3 00/11] [GSoC] Implement Corrected Commit Date Jakub Narębski
     [not found]       ` <CANQwDwdKp7oKy9BeKdvKhwPUiq0R5MS8TCw-eWGCYCoMGv=G-g@mail.gmail.com>
2020-08-17  1:32         ` Fwd: " Taylor Blau
2020-08-17  7:56           ` Jakub Narębski
2020-08-18  6:12       ` Abhishek Kumar
2020-08-23 15:27       ` Jakub Narębski
2020-08-24  2:49         ` Abhishek Kumar
2020-10-07 14:09     ` [PATCH v4 00/10] " Abhishek Kumar via GitGitGadget
2020-10-07 14:09       ` [PATCH v4 01/10] commit-graph: fix regression when computing Bloom filters Abhishek Kumar via GitGitGadget
2020-10-24 23:16         ` Jakub Narębski
2020-10-25 20:58           ` Taylor Blau
2020-11-03  5:36             ` Abhishek Kumar
2020-10-07 14:09       ` [PATCH v4 02/10] revision: parse parent in indegree_walk_step() Abhishek Kumar via GitGitGadget
2020-10-24 23:41         ` Jakub Narębski
2020-10-07 14:09       ` [PATCH v4 03/10] commit-graph: consolidate fill_commit_graph_info Abhishek Kumar via GitGitGadget
2020-10-25 10:52         ` Jakub Narębski
2020-10-27  6:33           ` Abhishek Kumar
2020-10-07 14:09       ` [PATCH v4 04/10] commit-graph: return 64-bit generation number Abhishek Kumar via GitGitGadget
2020-10-25 13:48         ` Jakub Narębski
2020-11-03  6:40           ` Abhishek Kumar
2020-10-07 14:09       ` [PATCH v4 05/10] commit-graph: add a slab to store topological levels Abhishek Kumar via GitGitGadget
2020-10-25 22:17         ` Jakub Narębski
2020-10-07 14:09       ` [PATCH v4 06/10] commit-graph: implement corrected commit date Abhishek Kumar via GitGitGadget
2020-10-27 18:53         ` Jakub Narębski [this message]
2020-11-03 11:44           ` Abhishek Kumar
2020-11-04 16:45             ` Jakub Narębski
2020-11-05 14:05               ` Philip Oakley
2020-11-05 18:22                 ` Junio C Hamano
2020-11-06 18:26                   ` Extending and updating gitglossary (was: Re: [PATCH v4 06/10] commit-graph: implement corrected commit date) Jakub Narębski
2020-11-06 19:33                     ` Extending and updating gitglossary Junio C Hamano
2020-11-08 17:23                     ` Extending and updating gitglossary (was: Re: [PATCH v4 06/10] commit-graph: implement corrected commit date) Philip Oakley
2020-11-10  1:35                       ` Extending and updating gitglossary Jakub Narębski
2020-11-10 14:04                         ` Philip Oakley
2020-11-10 23:52                           ` Jakub Narębski
2020-10-07 14:09       ` [PATCH v4 07/10] commit-graph: implement generation data chunk Abhishek Kumar via GitGitGadget
2020-10-30 12:45         ` Jakub Narębski
2020-11-06 11:25           ` Abhishek Kumar
2020-11-06 17:56             ` Jakub Narębski
2020-10-07 14:09       ` [PATCH v4 08/10] commit-graph: use generation v2 only if entire chain does Abhishek Kumar via GitGitGadget
2020-11-01  0:55         ` Jakub Narębski
2020-11-12 10:01           ` Abhishek Kumar
2020-11-13  9:59             ` Jakub Narębski
2020-10-07 14:09       ` [PATCH v4 09/10] commit-reach: use corrected commit dates in paint_down_to_common() Abhishek Kumar via GitGitGadget
2020-11-03 17:59         ` Jakub Narębski
2020-11-03 18:19           ` Junio C Hamano
2020-11-20 10:33           ` Abhishek Kumar
2020-10-07 14:09       ` [PATCH v4 10/10] doc: add corrected commit date info Abhishek Kumar via GitGitGadget
2020-11-04  1:37         ` Jakub Narębski
2020-11-21  6:30           ` Abhishek Kumar
2020-11-04 23:37       ` [PATCH v4 00/10] [GSoC] Implement Corrected Commit Date Jakub Narębski
2020-11-22  5:31         ` Abhishek Kumar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=85r1pjzejg.fsf@gmail.com \
    --to=jnareb@gmail.com \
    --cc=abhishekkumar8222@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=me@ttaylorr.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Git Mailing List Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/git/0 git/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 git git/ https://lore.kernel.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.git


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git