git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: me@ttaylorr.com, peff@peff.net, gitster@pobox.com,
	abhishekkumar8222@gmail.com, Derrick Stolee <stolee@gmail.com>,
	Taylor Blau <ttaylorr@github.com>,
	Derrick Stolee <derrickstolee@github.com>,
	Derrick Stolee <dstolee@microsoft.com>
Subject: [PATCH v2 2/6] commit-graph: always parse before commit_graph_data_at()
Date: Tue, 02 Feb 2021 03:01:19 +0000	[thread overview]
Message-ID: <454b183b9ba502da7f40dc36aaa95cc3d12b5c2f.1612234883.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.850.v2.git.1612234883.gitgitgadget@gmail.com>

From: Derrick Stolee <dstolee@microsoft.com>

There is a subtle failure happening when computing corrected commit
dates with --split enabled. It requires a base layer needing the
generation_data_overflow chunk. Then, the next layer on top
erroneously thinks it needs an overflow chunk due to a bug leading
to recalculating all reachable generation numbers. The output of
the failure is

  BUG: commit-graph.c:1912: expected to write 8 bytes to
  chunk 47444f56, but wrote 0 instead

These "expected" 8 bytes are due to re-computing the corrected
commit date for the lower layer but the new layer does not need
any overflow.

Add a test to t5318-commit-graph.sh that demonstrates this bug. However,
it does not trigger consistently with the existing code.

The generation number data is stored in a slab and accessed by
commit_graph_data_at(). This data is initialized when parsing a commit,
but is otherwise used assuming it has been populated. The loop in
compute_generation_numbers() did not enforce that all reachable
commits were parsed and had correct values. This could lead to some
problems when writing a commit-graph with corrected commit dates based
on a commit-graph without them.

It has been difficult to identify the issue here because it was so hard
to reproduce. It relies on this uninitialized data having a non-zero
value, but also on specifically in a way that overwrites the existing
data.

This patch adds the extra parse to ensure the data is filled before we
compute the generation number of a commit. This triggers the new test
to fail because the generation number overflow count does not match
between this computation and the write for that chunk.

The actual fix will follow as the next few changes.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c          | 16 ++++++++++++----
 t/t5318-commit-graph.sh | 21 +++++++++++++++++++++
 2 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 03e5a987968..edbb3a0f2cc 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1193,7 +1193,9 @@ static int write_graph_chunk_generation_data(struct hashfile *f,
 
 	for (i = 0; i < ctx->commits.nr; i++) {
 		struct commit *c = ctx->commits.list[i];
-		timestamp_t offset = commit_graph_data_at(c)->generation - c->date;
+		timestamp_t offset;
+		repo_parse_commit(ctx->r, c);
+		offset = commit_graph_data_at(c)->generation - c->date;
 		display_progress(ctx->progress, ++ctx->progress_cnt);
 
 		if (offset > GENERATION_NUMBER_V2_OFFSET_MAX) {
@@ -1444,15 +1446,20 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
 					_("Computing commit graph generation numbers"),
 					ctx->commits.nr);
 	for (i = 0; i < ctx->commits.nr; i++) {
-		uint32_t level = *topo_level_slab_at(ctx->topo_levels, ctx->commits.list[i]);
-		timestamp_t corrected_commit_date = commit_graph_data_at(ctx->commits.list[i])->generation;
+		struct commit *c = ctx->commits.list[i];
+		uint32_t level;
+		timestamp_t corrected_commit_date;
+
+		repo_parse_commit(ctx->r, c);
+		level = *topo_level_slab_at(ctx->topo_levels, c);
+		corrected_commit_date = commit_graph_data_at(c)->generation;
 
 		display_progress(ctx->progress, i + 1);
 		if (level != GENERATION_NUMBER_ZERO &&
 		    corrected_commit_date != GENERATION_NUMBER_ZERO)
 			continue;
 
-		commit_list_insert(ctx->commits.list[i], &list);
+		commit_list_insert(c, &list);
 		while (list) {
 			struct commit *current = list->item;
 			struct commit_list *parent;
@@ -1461,6 +1468,7 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
 			timestamp_t max_corrected_commit_date = 0;
 
 			for (parent = current->parents; parent; parent = parent->next) {
+				repo_parse_commit(ctx->r, parent->item);
 				level = *topo_level_slab_at(ctx->topo_levels, parent->item);
 				corrected_commit_date = commit_graph_data_at(parent->item)->generation;
 
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index fa27df579a5..2cf29f425a0 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -446,6 +446,27 @@ test_expect_success 'warn on improper hash version' '
 	)
 '
 
+test_expect_failure 'lower layers have overflow chunk' '
+	cd "$TRASH_DIRECTORY/full" &&
+	UNIX_EPOCH_ZERO="@0 +0000" &&
+	FUTURE_DATE="@2147483646 +0000" &&
+	rm -f .git/objects/info/commit-graph &&
+	test_commit --date "$FUTURE_DATE" future-1 &&
+	test_commit --date "$UNIX_EPOCH_ZERO" old-1 &&
+	git commit-graph write --reachable &&
+	test_commit --date "$FUTURE_DATE" future-2 &&
+	test_commit --date "$UNIX_EPOCH_ZERO" old-2 &&
+	git commit-graph write --reachable --split=no-merge &&
+	test_commit extra &&
+	git commit-graph write --reachable --split=no-merge &&
+	git commit-graph write --reachable &&
+	graph_read_expect 16 "generation_data generation_data_overflow extra_edges" &&
+	mv .git/objects/info/commit-graph commit-graph-upgraded &&
+	git commit-graph write --reachable &&
+	graph_read_expect 16 "generation_data generation_data_overflow extra_edges" &&
+	test_cmp .git/objects/info/commit-graph commit-graph-upgraded
+'
+
 # the verify tests below expect the commit-graph to contain
 # exactly the commits reachable from the commits/8 branch.
 # If the file changes the set of commits in the list, then the
-- 
gitgitgadget


  parent reply	other threads:[~2021-02-02  3:02 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-01 17:15 [PATCH 0/5] Generation Number v2: Fix a tricky split graph bug Derrick Stolee via GitGitGadget
2021-02-01 17:15 ` [PATCH 1/5] commit-graph: use repo_parse_commit Derrick Stolee via GitGitGadget
2021-02-01 17:32   ` Taylor Blau
2021-02-01 17:15 ` [PATCH 2/5] commit-graph: always parse before commit_graph_data_at() Derrick Stolee via GitGitGadget
2021-02-01 18:44   ` Junio C Hamano
2021-02-01 17:15 ` [PATCH 3/5] commit-graph: validate layers for generation data Derrick Stolee via GitGitGadget
2021-02-01 17:39   ` Taylor Blau
2021-02-01 18:10     ` Derrick Stolee
2021-02-01 17:15 ` [PATCH 4/5] commit-graph: be extra careful about mixed generations Derrick Stolee via GitGitGadget
2021-02-01 18:04   ` Taylor Blau
2021-02-01 18:13     ` Derrick Stolee
2021-02-01 18:55   ` Junio C Hamano
2021-02-01 17:15 ` [PATCH 5/5] commit-graph: prepare commit graph Derrick Stolee via GitGitGadget
2021-02-01 18:25   ` Taylor Blau
2021-02-02  3:01 ` [PATCH v2 0/6] Generation Number v2: Fix a tricky split graph bug Derrick Stolee via GitGitGadget
2021-02-02  3:01   ` [PATCH v2 1/6] commit-graph: use repo_parse_commit Derrick Stolee via GitGitGadget
2021-02-02  3:01   ` Derrick Stolee via GitGitGadget [this message]
2021-02-03  1:08     ` [PATCH v2 2/6] commit-graph: always parse before commit_graph_data_at() Jonathan Nieder
2021-02-03  1:35       ` Derrick Stolee
2021-02-03  1:48         ` Jonathan Nieder
2021-02-03  3:07           ` Derrick Stolee
2021-02-03 15:34             ` Taylor Blau
2021-02-03 17:37               ` Eric Sunshine
2021-02-03 18:41               ` Junio C Hamano
2021-02-03 21:08                 ` Taylor Blau
2021-02-03  2:06         ` Junio C Hamano
2021-02-03  3:09           ` Derrick Stolee
2021-02-07 19:04           ` SZEDER Gábor
2021-02-07 20:12             ` Junio C Hamano
2021-02-08  2:01               ` Derrick Stolee
2021-02-08  5:55                 ` Junio C Hamano
2021-02-02  3:01   ` [PATCH v2 3/6] commit-graph: validate layers for generation data Derrick Stolee via GitGitGadget
2021-02-02  3:01   ` [PATCH v2 4/6] commit-graph: compute generations separately Derrick Stolee via GitGitGadget
2021-02-02  3:01   ` [PATCH v2 5/6] commit-graph: be extra careful about mixed generations Derrick Stolee via GitGitGadget
2021-02-02  3:01   ` [PATCH v2 6/6] commit-graph: prepare commit graph Derrick Stolee via GitGitGadget
2021-02-02  3:08   ` [PATCH v2 0/6] Generation Number v2: Fix a tricky split graph bug Taylor Blau
2021-02-11  4:44   ` Abhishek Kumar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=454b183b9ba502da7f40dc36aaa95cc3d12b5c2f.1612234883.git.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=abhishekkumar8222@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    --cc=stolee@gmail.com \
    --cc=ttaylorr@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).