All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jonathan Tan <jonathantanmy@google.com>
To: git@vger.kernel.org
Cc: Jonathan Tan <jonathantanmy@google.com>, stolee@gmail.com
Subject: [PATCH] commit-reach: do not parse and iterate minima
Date: Wed, 23 Mar 2022 14:08:02 -0700	[thread overview]
Message-ID: <20220323210803.1130790-1-jonathantanmy@google.com> (raw)

When a commit is parsed, it pretends to have a different (possibly
empty) list of parents if there is graft information for that commit.
But there is a bug that could occur when a commit is parsed, the graft
information is updated (for example, when a shallow file is rewritten),
and the same commit is subsequently used: the parents of the commit do
not conform to the updated graft information, but the information at the
time of parsing.

This is usually not an issue, as a commit is usually introduced into the
repository at the same time as its graft information. That means that
when we try to parse that commit, we already have its graft information.

However, this is not the case when fetching with --update-shallow. In
post_assign_shallow() in shallow.c, a revision walk is done that also
parses commits at the shallow boundary before updating the shallow
information (and hence, the graft information). (This revision walk
needs to be done before the update because the nature of the update
depends on the outcome of the revision walk.) If we were to
revision-walk such a commit (at the shallow boundary), we would end up
trying and failing to parse its parents because its list of parents is
not empty (since it was parsed before there was any graft information
telling us to conceal its parents). This revision walk will happen if
the client has submodules, as it will revision-walk the fetched commits
to check for new submodules, triggering this bug.

This revision walk in post_assign_shallow() actually does not need to go
beyond the shallow boundaries, so the solution is twofold: (1) do not
iterate beyond such commits, and (2) in doing so, we no longer need to
parse them, so do not parse them.

To do (1), do something similar to d7c1ec3efd ("commit: add
short-circuit to paint_down_to_common()", 2018-05-22), which taught
paint_down_to_common() the concept of "min_generation": commits older
than that generation do not need to be iterated over. Introduce yet
another parameter that indicates whether "one" is of that minimum
generation. If yes, we never need to iterate over its ancestors, so we
do not need to add it to the priority queue in the first place. And if
we reach it (because another input commit has it as an ancestor), we do
not need to add its parents to the priority queue. There are only a few
callers, so update all of them (including the one that matters for this
bug - repo_in_merge_bases_many()) to indicate whether we know that "one"
is of the minimum generation.

To do (2), just delete the code that parses it.

This solves the problem, because the commit is now only parsed during
the revision walk that checks for new submodules, which happens after
the shallow information is written.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
This is on jt/reset-grafts-when-resetting-shallow.

I tried figuring out a better reproduction recipe, but couldn't (I
couldn't figure out what the local repo should contain with respect to
the remote repo in order to reproduce this bug without just modifying an
existing test). If anyone has any suggestions, please let me know.
---
 commit-reach.c           | 25 +++++++++++++++----------
 t/t5537-fetch-shallow.sh |  8 ++++++--
 2 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index c226ee3da4..7f4f3f7424 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -47,11 +47,15 @@ static int queue_has_nonstale(struct prio_queue *queue)
 	return 0;
 }
 
-/* all input commits in one and twos[] must have been parsed! */
+/*
+ * All input commits in twos[] must have been parsed. If
+ * one_is_at_min_generation is false, one must also have be parsed.
+ */
 static struct commit_list *paint_down_to_common(struct repository *r,
 						struct commit *one, int n,
 						struct commit **twos,
-						timestamp_t min_generation)
+						timestamp_t min_generation,
+						int one_is_at_min_generation)
 {
 	struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
 	struct commit_list *result = NULL;
@@ -66,7 +70,8 @@ static struct commit_list *paint_down_to_common(struct repository *r,
 		commit_list_append(one, &result);
 		return result;
 	}
-	prio_queue_put(&queue, one);
+	if (!one_is_at_min_generation)
+		prio_queue_put(&queue, one);
 
 	for (i = 0; i < n; i++) {
 		twos[i]->object.flags |= PARENT2;
@@ -97,6 +102,10 @@ static struct commit_list *paint_down_to_common(struct repository *r,
 			/* Mark parents of a found merge stale */
 			flags |= STALE;
 		}
+
+		if (commit == one && one_is_at_min_generation)
+			continue;
+
 		parents = commit->parents;
 		while (parents) {
 			struct commit *p = parents->item;
@@ -138,7 +147,7 @@ static struct commit_list *merge_bases_many(struct repository *r,
 			return NULL;
 	}
 
-	list = paint_down_to_common(r, one, n, twos, 0);
+	list = paint_down_to_common(r, one, n, twos, 0, 0);
 
 	while (list) {
 		struct commit *commit = pop_commit(&list);
@@ -187,8 +196,6 @@ static int remove_redundant_no_gen(struct repository *r,
 	redundant = xcalloc(cnt, 1);
 	ALLOC_ARRAY(filled_index, cnt - 1);
 
-	for (i = 0; i < cnt; i++)
-		repo_parse_commit(r, array[i]);
 	for (i = 0; i < cnt; i++) {
 		struct commit_list *common;
 		timestamp_t min_generation = commit_graph_generation(array[i]);
@@ -207,7 +214,7 @@ static int remove_redundant_no_gen(struct repository *r,
 				min_generation = curr_generation;
 		}
 		common = paint_down_to_common(r, array[i], filled,
-					      work, min_generation);
+					      work, min_generation, 1);
 		if (array[i]->object.flags & PARENT2)
 			redundant[i] = 1;
 		for (j = 0; j < filled; j++)
@@ -478,8 +485,6 @@ int repo_in_merge_bases_many(struct repository *r, struct commit *commit,
 	int ret = 0, i;
 	timestamp_t generation, max_generation = GENERATION_NUMBER_ZERO;
 
-	if (repo_parse_commit(r, commit))
-		return ret;
 	for (i = 0; i < nr_reference; i++) {
 		if (repo_parse_commit(r, reference[i]))
 			return ret;
@@ -495,7 +500,7 @@ int repo_in_merge_bases_many(struct repository *r, struct commit *commit,
 
 	bases = paint_down_to_common(r, commit,
 				     nr_reference, reference,
-				     generation);
+				     generation, 1);
 	if (commit->object.flags & PARENT2)
 		ret = 1;
 	clear_commit_marks(commit, all_flags);
diff --git a/t/t5537-fetch-shallow.sh b/t/t5537-fetch-shallow.sh
index 92948de7a0..f736ccf9f5 100755
--- a/t/t5537-fetch-shallow.sh
+++ b/t/t5537-fetch-shallow.sh
@@ -136,6 +136,8 @@ test_expect_success 'fetch that requires changes in .git/shallow is filtered' '
 '
 
 test_expect_success 'fetch --update-shallow' '
+	git init a-submodule &&
+	test_commit -C a-submodule foo &&
 	(
 	cd shallow &&
 	git checkout main &&
@@ -145,10 +147,13 @@ test_expect_success 'fetch --update-shallow' '
 	) &&
 	(
 	cd notshallow &&
+	git submodule add ../a-submodule a-submodule &&
+	git commit -m "added submodule" &&
 	git fetch --update-shallow ../shallow/.git refs/heads/*:refs/remotes/shallow/* &&
 	git fsck &&
 	git for-each-ref --sort=refname --format="%(refname)" >actual.refs &&
 	cat <<-\EOF >expect.refs &&
+	refs/heads/main
 	refs/remotes/shallow/main
 	refs/remotes/shallow/no-shallow
 	refs/tags/heavy-tag
@@ -162,8 +167,6 @@ test_expect_success 'fetch --update-shallow' '
 '
 
 test_expect_success 'fetch --update-shallow into a repo with submodules' '
-	git init a-submodule &&
-	test_commit -C a-submodule foo &&
 	git init repo-with-sub &&
 	git -C repo-with-sub submodule add ../a-submodule a-submodule &&
 	git -C repo-with-sub commit -m "added submodule" &&
@@ -185,6 +188,7 @@ test_expect_success 'fetch --update-shallow (with fetch.writeCommitGraph)' '
 	git fsck &&
 	git for-each-ref --sort=refname --format="%(refname)" >actual.refs &&
 	cat <<-EOF >expect.refs &&
+	refs/heads/main
 	refs/remotes/shallow/main
 	refs/remotes/shallow/no-shallow
 	refs/tags/heavy-tag
-- 
2.35.1.894.gb6a874cedc-goog


             reply	other threads:[~2022-03-23 21:08 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-23 21:08 Jonathan Tan [this message]
2022-03-23 23:30 ` [PATCH] commit-reach: do not parse and iterate minima Junio C Hamano
2022-03-24 15:27   ` Derrick Stolee
2022-03-24 22:06     ` Taylor Blau
2022-03-25 14:32       ` Derrick Stolee
2022-03-24 22:15     ` Jonathan Tan
2022-03-24 12:05 ` Bagas Sanjaya
2022-03-24 22:19   ` Jonathan Tan
2022-03-24 15:29 ` Derrick Stolee
2022-03-24 22:21   ` Jonathan Tan
2022-06-02 23:11 ` [PATCH v2] commit,shallow: unparse commits if grafts changed Jonathan Tan
2022-06-03  9:30   ` Ævar Arnfjörð Bjarmason
2022-06-03 13:29     ` Derrick Stolee
2022-06-03 15:27       ` Jonathan Tan
2022-06-03 15:26     ` Jonathan Tan
2022-06-06 17:54 ` [PATCH v3] " Jonathan Tan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220323210803.1130790-1-jonathantanmy@google.com \
    --to=jonathantanmy@google.com \
    --cc=git@vger.kernel.org \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.