Git Mailing List Archive on lore.kernel.org
 help / color / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Abhishek Kumar via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org
Cc: "Jakub Narębski" <jnareb@gmail.com>,
	"Abhishek Kumar" <abhishekkumar8222@gmail.com>
Subject: Re: [PATCH 6/6] commit-graph: implement corrected commit date offset
Date: Tue, 28 Jul 2020 11:55:12 -0400
Message-ID: <e8646aaa-667f-b7d8-f8f2-efbaaeb8877d@gmail.com> (raw)
In-Reply-To: <647290d0368e385227614dd1822aa9083b0dba5e.1595927632.git.gitgitgadget@gmail.com>

On 7/28/2020 5:13 AM, Abhishek Kumar via GitGitGadget wrote:
> From: Abhishek Kumar <abhishekkumar8222@gmail.com>
> 
> With preparations done,...

I feel like this commit could have been made smaller by doing the
uint32_t -> timestamp_t conversion in a separate patch. That would
make it easier to focus on the changes to the generation number v2
logic.

> let's implement corrected commit date offset.
> We add a new commit-slab to store topological levels while writing

It's important to add: we store topological levels to ensure that older
versions of Git will still have the performance benefits from generation
number v1.

> commit graph and upgrade number of struct commit_graph_data to 64-bits.

Do you mean "update the generation member in struct commit_graph_data
to a 64-bit timestamp"? The struct itself also has the 32-bit graph_pos
member.

> We have to touch many files, upgrading generation number from uint32_t
> to timestamp_t.

Yes, that's why I recommend doing that in a different step.

> We drop 'detect incorrect generation number' from t5318-commit-graph.sh,
> which tests if verify can detect if a commit graph have
> GENERATION_NUMBER_ZERO for a commit, followed by a non-zero generation.
> With corrected commit dates, GENERATION_NUMBER_ZERO is possible only if
> one of dates is Unix epoch zero.

What about the topological levels? Are we caring about verifying the data
that we start to ignore in this new version? I'm hesitant to drop this
right now, but I'm open to it if we really don't see it as a valuable test.

> Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
> ---
>  blame.c                 |   2 +-
>  commit-graph.c          | 109 ++++++++++++++++++++++------------------
>  commit-graph.h          |   4 +-
>  commit-reach.c          |  32 ++++++------
>  commit-reach.h          |   2 +-
>  commit.h                |   3 ++
>  revision.c              |  14 +++---
>  t/t5318-commit-graph.sh |   2 +-
>  upload-pack.c           |   2 +-
>  9 files changed, 93 insertions(+), 77 deletions(-)
> 
> diff --git a/blame.c b/blame.c
> index 82fa16d658..48aa632461 100644
> --- a/blame.c
> +++ b/blame.c
> @@ -1272,7 +1272,7 @@ static int maybe_changed_path(struct repository *r,
>  	if (!bd)
>  		return 1;
>  
> -	if (commit_graph_generation(origin->commit) == GENERATION_NUMBER_INFINITY)
> +	if (commit_graph_generation(origin->commit) == GENERATION_NUMBER_V2_INFINITY)
>  		return 1;

I don't see value in changing the name of this macro. It
is only used as the default value for a commit not in the
commit-graph. Changing its value to 0xFFFFFFFF works for
both versions when the type is updated to timestamp_t.

The actually-important change in this patch (not just the
type change) is here:

> -static void compute_generation_numbers(struct write_commit_graph_context *ctx)
> +static void compute_corrected_commit_date_offsets(struct write_commit_graph_context *ctx)
>  {
>  	int i;
>  	struct commit_list *list = NULL;
> @@ -1326,11 +1334,11 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
>  					_("Computing commit graph generation numbers"),
>  					ctx->commits.nr);
>  	for (i = 0; i < ctx->commits.nr; i++) {
> -		uint32_t generation = commit_graph_data_at(ctx->commits.list[i])->generation;
> +		uint32_t topo_level = *topo_level_slab_at(ctx->topo_levels, ctx->commits.list[i]);
>  
>  		display_progress(ctx->progress, i + 1);
> -		if (generation != GENERATION_NUMBER_INFINITY &&
> -		    generation != GENERATION_NUMBER_ZERO)
> +		if (topo_level != GENERATION_NUMBER_INFINITY &&
> +		    topo_level != GENERATION_NUMBER_ZERO)
>  			continue;

Here, our "skip" condition is that the topo_level has been computed.
This should be fine, as we are never reading that out of the commit-graph.
We will never be in a mode where topo_level is computed but corrected
commit-date is not.

>  		commit_list_insert(ctx->commits.list[i], &list);
> @@ -1338,29 +1346,38 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
>  			struct commit *current = list->item;
>  			struct commit_list *parent;
>  			int all_parents_computed = 1;
> -			uint32_t max_generation = 0;
> +			uint32_t max_level = 0;
> +			timestamp_t max_corrected_commit_date = current->date;

Later you assign data->generation to be "max_corrected_commit_date + 1",
which made me think this should be "current->date - 1". Is that so? Or,
do we want most offsets to be one instead of zero? Is there value there?

>  
>  			for (parent = current->parents; parent; parent = parent->next) {
> -				generation = commit_graph_data_at(parent->item)->generation;
> +				topo_level = *topo_level_slab_at(ctx->topo_levels, parent->item);
>  
> -				if (generation == GENERATION_NUMBER_INFINITY ||
> -				    generation == GENERATION_NUMBER_ZERO) {
> +				if (topo_level == GENERATION_NUMBER_INFINITY ||
> +				    topo_level == GENERATION_NUMBER_ZERO) {
>  					all_parents_computed = 0;
>  					commit_list_insert(parent->item, &list);
>  					break;
> -				} else if (generation > max_generation) {
> -					max_generation = generation;
> +				} else {
> +					struct commit_graph_data *data = commit_graph_data_at(parent->item);
> +
> +					if (topo_level > max_level)
> +						max_level = topo_level;
> +
> +					if (data->generation > max_corrected_commit_date)
> +						max_corrected_commit_date = data->generation;
>  				}
>  			}
>  
>  			if (all_parents_computed) {
>  				struct commit_graph_data *data = commit_graph_data_at(current);
>  
> -				data->generation = max_generation + 1;
> -				pop_commit(&list);
> +				if (max_level > GENERATION_NUMBER_MAX - 1)
> +					max_level = GENERATION_NUMBER_MAX - 1;
> +
> +				*topo_level_slab_at(ctx->topo_levels, current) = max_level + 1;
> +				data->generation = max_corrected_commit_date + 1;
>  
> -				if (data->generation > GENERATION_NUMBER_MAX)
> -					data->generation = GENERATION_NUMBER_MAX;
> +				pop_commit(&list);
>  			}
>  		}
>  	}

This looks correct, and I've done a tiny bit of perf tests locally.

> @@ -2085,6 +2102,7 @@ int write_commit_graph(struct object_directory *odb,
>  	uint32_t i, count_distinct = 0;
>  	int res = 0;
>  	int replace = 0;
> +	struct topo_level_slab topo_levels;
>  
>  	if (!commit_graph_compatible(the_repository))
>  		return 0;
> @@ -2099,6 +2117,9 @@ int write_commit_graph(struct object_directory *odb,
>  	ctx->changed_paths = flags & COMMIT_GRAPH_WRITE_BLOOM_FILTERS ? 1 : 0;
>  	ctx->total_bloom_filter_data_size = 0;
>  
> +	init_topo_level_slab(&topo_levels);
> +	ctx->topo_levels = &topo_levels;
> +
>  	if (ctx->split) {
>  		struct commit_graph *g;
>  		prepare_commit_graph(ctx->r);
> @@ -2197,7 +2218,7 @@ int write_commit_graph(struct object_directory *odb,
>  	} else
>  		ctx->num_commit_graphs_after = 1;
>  
> -	compute_generation_numbers(ctx);
> +	compute_corrected_commit_date_offsets(ctx);

This rename might not be necessary. You are computing both
versions (v1 and v2) so the name change is actually less
accurate than the old name.

Thanks,
-Stolee


  reply index

Thread overview: 107+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-28  9:13 [PATCH 0/6] [GSoC] Implement Corrected Commit Date Abhishek Kumar via GitGitGadget
2020-07-28  9:13 ` [PATCH 1/6] commit-graph: fix regression when computing bloom filter Abhishek Kumar via GitGitGadget
2020-07-28 15:28   ` Taylor Blau
2020-07-30  5:24     ` Abhishek Kumar
2020-08-04  0:46   ` Jakub Narębski
2020-08-04  0:56     ` Taylor Blau
2020-08-04 10:10       ` Jakub Narębski
2020-08-04  7:55     ` Jakub Narębski
2020-07-28  9:13 ` [PATCH 2/6] revision: parse parent in indegree_walk_step() Abhishek Kumar via GitGitGadget
2020-07-28 13:00   ` Derrick Stolee
2020-07-28 15:30     ` Taylor Blau
2020-08-05 23:16   ` Jakub Narębski
2020-07-28  9:13 ` [PATCH 3/6] commit-graph: consolidate fill_commit_graph_info Abhishek Kumar via GitGitGadget
2020-07-28 13:14   ` Derrick Stolee
2020-07-28 15:19     ` René Scharfe
2020-07-28 15:58       ` Derrick Stolee
2020-07-28 16:01     ` Taylor Blau
2020-07-30  6:07     ` Abhishek Kumar
2020-07-28  9:13 ` [PATCH 4/6] commit-graph: consolidate compare_commits_by_gen Abhishek Kumar via GitGitGadget
2020-07-28 16:03   ` Taylor Blau
2020-07-28  9:13 ` [PATCH 5/6] commit-graph: implement generation data chunk Abhishek Kumar via GitGitGadget
2020-07-28 16:12   ` Taylor Blau
2020-07-30  6:52     ` Abhishek Kumar
2020-07-28  9:13 ` [PATCH 6/6] commit-graph: implement corrected commit date offset Abhishek Kumar via GitGitGadget
2020-07-28 15:55   ` Derrick Stolee [this message]
2020-07-28 16:23     ` Taylor Blau
2020-07-30  7:27     ` Abhishek Kumar
2020-07-28 14:54 ` [PATCH 0/6] [GSoC] Implement Corrected Commit Date Taylor Blau
2020-07-30  7:47   ` Abhishek Kumar
2020-07-28 16:35 ` Derrick Stolee
2020-08-09  2:53 ` [PATCH v2 00/10] " Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 01/10] commit-graph: fix regression when computing bloom filter Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 02/10] revision: parse parent in indegree_walk_step() Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 03/10] commit-graph: consolidate fill_commit_graph_info Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 04/10] commit-graph: consolidate compare_commits_by_gen Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 05/10] commit-graph: implement generation data chunk Abhishek Kumar via GitGitGadget
2020-08-10 16:28     ` Derrick Stolee
2020-08-11 11:03       ` Abhishek Kumar
2020-08-11 12:27         ` Derrick Stolee
2020-08-11 18:58           ` Taylor Blau
2020-08-09  2:53   ` [PATCH v2 06/10] commit-graph: return 64-bit generation number Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 07/10] commit-graph: implement corrected commit date Abhishek Kumar via GitGitGadget
2020-08-10 14:23     ` Derrick Stolee
2020-08-14  4:59       ` Abhishek Kumar
2020-08-14 12:24         ` Derrick Stolee
2020-08-09  2:53   ` [PATCH v2 08/10] commit-graph: handle mixed generation commit chains Abhishek Kumar via GitGitGadget
2020-08-10 16:42     ` Derrick Stolee
2020-08-11 11:36       ` Abhishek Kumar
2020-08-11 12:43         ` Derrick Stolee
2020-08-09  2:53   ` [PATCH v2 09/10] commit-reach: use corrected commit dates in paint_down_to_common() Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 10/10] doc: add corrected commit date info Abhishek Kumar via GitGitGadget
2020-08-10 16:47   ` [PATCH v2 00/10] [GSoC] Implement Corrected Commit Date Derrick Stolee
2020-08-15 16:39   ` [PATCH v3 00/11] " Abhishek Kumar via GitGitGadget
2020-08-15 16:39     ` [PATCH v3 01/11] commit-graph: fix regression when computing bloom filter Abhishek Kumar via GitGitGadget
2020-08-17 22:30       ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 02/11] revision: parse parent in indegree_walk_step() Abhishek Kumar via GitGitGadget
2020-08-18 14:18       ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 03/11] commit-graph: consolidate fill_commit_graph_info Abhishek Kumar via GitGitGadget
2020-08-19 17:54       ` Jakub Narębski
2020-08-21  4:11         ` Abhishek Kumar
2020-08-25 11:11           ` Jakub Narębski
2020-09-01 11:35             ` Abhishek Kumar
2020-08-15 16:39     ` [PATCH v3 04/11] commit-graph: consolidate compare_commits_by_gen Abhishek Kumar via GitGitGadget
2020-08-17 13:22       ` Derrick Stolee
2020-08-21 11:05       ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 05/11] commit-graph: return 64-bit generation number Abhishek Kumar via GitGitGadget
2020-08-21 13:14       ` Jakub Narębski
2020-08-25  5:04         ` Abhishek Kumar
2020-08-25 12:18           ` Jakub Narębski
2020-09-01 12:06             ` Abhishek Kumar
2020-09-03 13:42               ` Jakub Narębski
2020-09-05 17:21                 ` Abhishek Kumar
2020-09-13 15:39                   ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 06/11] commit-graph: add a slab to store topological levels Abhishek Kumar via GitGitGadget
2020-08-21 18:43       ` Jakub Narębski
2020-08-25  6:14         ` Abhishek Kumar
2020-08-25  7:33           ` Jakub Narębski
2020-08-25  7:56             ` Jakub Narębski
2020-09-01 10:26               ` Abhishek Kumar
2020-09-03  9:25                 ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 07/11] commit-graph: implement corrected commit date Abhishek Kumar via GitGitGadget
2020-08-22  0:05       ` Jakub Narębski
2020-08-25  6:49         ` Abhishek Kumar
2020-08-25 10:07           ` Jakub Narębski
2020-09-01 11:01             ` Abhishek Kumar
2020-08-15 16:39     ` [PATCH v3 08/11] commit-graph: implement generation data chunk Abhishek Kumar via GitGitGadget
2020-08-22 13:09       ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 09/11] commit-graph: use generation v2 only if entire chain does Abhishek Kumar via GitGitGadget
2020-08-22 17:14       ` Jakub Narębski
2020-08-26  7:15         ` Abhishek Kumar
2020-08-26 10:38           ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 10/11] commit-reach: use corrected commit dates in paint_down_to_common() Abhishek Kumar via GitGitGadget
2020-08-22 19:09       ` Jakub Narębski
2020-09-01 10:08         ` Abhishek Kumar
2020-09-03 19:11           ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 11/11] doc: add corrected commit date info Abhishek Kumar via GitGitGadget
2020-08-22 22:20       ` Jakub Narębski
2020-08-27  6:39         ` Abhishek Kumar
2020-08-27 12:43           ` Jakub Narębski
2020-08-27 13:15           ` Derrick Stolee
2020-09-01 13:01             ` Abhishek Kumar
2020-08-17  0:13     ` [PATCH v3 00/11] [GSoC] Implement Corrected Commit Date Jakub Narębski
     [not found]       ` <CANQwDwdKp7oKy9BeKdvKhwPUiq0R5MS8TCw-eWGCYCoMGv=G-g@mail.gmail.com>
2020-08-17  1:32         ` Fwd: " Taylor Blau
2020-08-17  7:56           ` Jakub Narębski
2020-08-18  6:12       ` Abhishek Kumar
2020-08-23 15:27       ` Jakub Narębski
2020-08-24  2:49         ` Abhishek Kumar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e8646aaa-667f-b7d8-f8f2-efbaaeb8877d@gmail.com \
    --to=stolee@gmail.com \
    --cc=abhishekkumar8222@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=jnareb@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Git Mailing List Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/git/0 git/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 git git/ https://lore.kernel.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.git


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git