git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/1] [v2.24.0-rc0 BUG] fetch.writeCommitGraph fails on first fetch
@ 2019-10-22 17:28 Derrick Stolee via GitGitGadget
  2019-10-22 17:28 ` [PATCH 1/1] commit-graph: fix writing first commit-graph during fetch Derrick Stolee via GitGitGadget
  2019-10-23 13:01 ` [PATCH v2 0/2] [v2.24.0-rc0 BUG] fetch.writeCommitGraph fails on first fetch Derrick Stolee via GitGitGadget
  0 siblings, 2 replies; 23+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-10-22 17:28 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Junio C Hamano

While dogfooding, Johannes found a bug in the fetch.writeCommitGraph config
behavior. While his example initially happened during a clone with
--recurse-submodules, we found that it is actually a problem with the first
fetch after a new clone and no existing commit-graph file:

$ git clone <url> test
$ cd test
$ git -c fetch.writeCommitGraph=true fetch origin
Computing commit graph generation numbers: 100% (12/12), done.
BUG: commit-graph.c:886: missing parent <hash1> for commit <hash2>
Aborted (core dumped)

In the repo I had cloned, there were really 60 commits to scan, but only 12
were in the list to write when calling compute_generation_numbers(). A
commit in the list expects to see a parent, but that parent is not in the
list.

The details above are the start of the commit message for [PATCH 1/1]. I
tried to include as much detail as I could for how I investigated the
problem and why I think this is the right solution.

I would like help creating a test that demonstrates this problem. It does
not appear to trigger on the simplest examples. The simple example I used
for my testing was https://github.com/derrickstolee/numbers

Thanks, -Stolee

/cc johannes.schindelin@gmx.de, peff@peff.net

Derrick Stolee (1):
  commit-graph: fix writing first commit-graph during fetch

 commit-graph.c | 11 +++++++----
 commit-reach.c |  1 -
 object.h       |  3 ++-
 3 files changed, 9 insertions(+), 6 deletions(-)


base-commit: d966095db01190a2196e31195ea6fa0c722aa732
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-415%2Fderrickstolee%2Ffetch-first-write-fail-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-415/derrickstolee/fetch-first-write-fail-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/415
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 1/1] commit-graph: fix writing first commit-graph during fetch
  2019-10-22 17:28 [PATCH 0/1] [v2.24.0-rc0 BUG] fetch.writeCommitGraph fails on first fetch Derrick Stolee via GitGitGadget
@ 2019-10-22 17:28 ` Derrick Stolee via GitGitGadget
  2019-10-22 20:33   ` Jeff King
  2019-10-23 13:01 ` [PATCH v2 0/2] [v2.24.0-rc0 BUG] fetch.writeCommitGraph fails on first fetch Derrick Stolee via GitGitGadget
  1 sibling, 1 reply; 23+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-10-22 17:28 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

While dogfooding, Johannes found a bug in the fetch.writeCommitGraph
config behavior. While his example initially happened during a clone
with --recurse-submodules, we found that it is actually a problem with
the first fetch after a new clone and no existing commit-graph file:

	$ git clone <url> test
	$ cd test
	$ git -c fetch.writeCommitGraph=true fetch origin
	Computing commit graph generation numbers: 100% (12/12), done.
	BUG: commit-graph.c:886: missing parent <hash1> for commit <hash2>
	Aborted (core dumped)

In the repo I had cloned, there were really 60 commits to scan, but
only 12 were in the list to write when calling
compute_generation_numbers(). A commit in the list expects to see a
parent, but that parent is not in the list.

As an initial test, I converted the code in builtin/fetch.c that calls
write_commit_graph_reachable() to instead launch a "git commit-graph
write --reachable --split" process. That code worked, but is not how we
want the feature to work long-term.

That test did demonstrate that the issue must be something to do with
internal state of the 'git fetch' process.

The write_commit_graph() method in commit-graph.c ensures the commits we
plan to write are "closed under reachability" using close_reachable().
This method walks from the input commits, and uses the UNINTERESTING
flag to mark which commits have already been visited. This allows the
walk to take O(N) time, where N is the number of commits, instead of
O(P) time, where P is the number of paths. (The number of paths can be
exponential in the number of commits.)

However, the UNINTERESTING flag is used in lots of places in the
codebase. This flag usually means some barrier to stop a commit walk,
such as in revision-walking to compare histories. It is not often
cleared after the walk completes because the starting points of those
walks do not have the UNINTERESTING flag, and clear_commit_marks() would
stop immediately.

This is happening during a 'git fetch' call with a remote. The fetch
negotiation is comparing the remote refs with the local refs and marking
some commits as UNINTERESTING.

You may ask: did this feature ever work at all? Yes, it did, as long as
you had a commit-graph covering all of your local refs. My testing was
unfortunately limited to this scenario. The UNINTERESTING commits are
always part of the "old" commit-graph, and when we add new commits to a
top layer of the commit-graph chain those are not needed. If we happen
to merge layers of the chain, then the commits are added as a list, not
using a commit walk. Further, the test added for this config option in
t5510-fetch.sh uses local filesystem clones, which somehow avoids this
logic.

I tested running clear_commit_marks_many() to clear the UNINTERESTING
flag inside close_reachable(), but the tips did not have the flag, so
that did nothing.

Instead, I finally arrived on the conclusion that I should use a flag
that is not used in any other part of the code. In commit-reach.c, a
number of flags were defined for commit walk algorithms. The REACHABLE
flag seemed like it made the most sense, and it seems it was not
actually used in the file.

Add the REACHABLE flag to commit-graph.c and use it instead of
UNINTERESTING in close_reachable(). This fixes the bug in manual
testing.

I have failed to produce a test using the file:// protocol that
demonstrates this bug.

Reported-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c | 11 +++++++----
 commit-reach.c |  1 -
 object.h       |  3 ++-
 3 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index fc4a43b8d6..0aea7b2ae5 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -41,6 +41,9 @@
 #define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \
 			+ GRAPH_FANOUT_SIZE + the_hash_algo->rawsz)
 
+/* Remember to update object flag allocation in object.h */
+#define REACHABLE       (1u<<15)
+
 char *get_commit_graph_filename(const char *obj_dir)
 {
 	char *filename = xstrfmt("%s/info/commit-graph", obj_dir);
@@ -1030,11 +1033,11 @@ static void add_missing_parents(struct write_commit_graph_context *ctx, struct c
 {
 	struct commit_list *parent;
 	for (parent = commit->parents; parent; parent = parent->next) {
-		if (!(parent->item->object.flags & UNINTERESTING)) {
+		if (!(parent->item->object.flags & REACHABLE)) {
 			ALLOC_GROW(ctx->oids.list, ctx->oids.nr + 1, ctx->oids.alloc);
 			oidcpy(&ctx->oids.list[ctx->oids.nr], &(parent->item->object.oid));
 			ctx->oids.nr++;
-			parent->item->object.flags |= UNINTERESTING;
+			parent->item->object.flags |= REACHABLE;
 		}
 	}
 }
@@ -1052,7 +1055,7 @@ static void close_reachable(struct write_commit_graph_context *ctx)
 		display_progress(ctx->progress, i + 1);
 		commit = lookup_commit(ctx->r, &ctx->oids.list[i]);
 		if (commit)
-			commit->object.flags |= UNINTERESTING;
+			commit->object.flags |= REACHABLE;
 	}
 	stop_progress(&ctx->progress);
 
@@ -1089,7 +1092,7 @@ static void close_reachable(struct write_commit_graph_context *ctx)
 		commit = lookup_commit(ctx->r, &ctx->oids.list[i]);
 
 		if (commit)
-			commit->object.flags &= ~UNINTERESTING;
+			commit->object.flags &= ~REACHABLE;
 	}
 	stop_progress(&ctx->progress);
 }
diff --git a/commit-reach.c b/commit-reach.c
index 3ea174788a..4ca7e706a1 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -10,7 +10,6 @@
 #include "commit-reach.h"
 
 /* Remember to update object flag allocation in object.h */
-#define REACHABLE       (1u<<15)
 #define PARENT1		(1u<<16)
 #define PARENT2		(1u<<17)
 #define STALE		(1u<<18)
diff --git a/object.h b/object.h
index 0120892bbd..25f5ab3d54 100644
--- a/object.h
+++ b/object.h
@@ -68,7 +68,8 @@ struct object_array {
  * bisect.c:                                        16
  * bundle.c:                                        16
  * http-push.c:                                     16-----19
- * commit-reach.c:                                15-------19
+ * commit-graph.c:                                15
+ * commit-reach.c:                                  16-----19
  * sha1-name.c:                                              20
  * list-objects-filter.c:                                      21
  * builtin/fsck.c:           0--3
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/1] commit-graph: fix writing first commit-graph during fetch
  2019-10-22 17:28 ` [PATCH 1/1] commit-graph: fix writing first commit-graph during fetch Derrick Stolee via GitGitGadget
@ 2019-10-22 20:33   ` Jeff King
  2019-10-22 21:45     ` Jeff King
  0 siblings, 1 reply; 23+ messages in thread
From: Jeff King @ 2019-10-22 20:33 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, Derrick Stolee, Junio C Hamano

On Tue, Oct 22, 2019 at 05:28:55PM +0000, Derrick Stolee via GitGitGadget wrote:

> However, the UNINTERESTING flag is used in lots of places in the
> codebase. This flag usually means some barrier to stop a commit walk,
> such as in revision-walking to compare histories. It is not often
> cleared after the walk completes because the starting points of those
> walks do not have the UNINTERESTING flag, and clear_commit_marks() would
> stop immediately.

Oof. Nicely explained, and your fix makes sense.

The global-ness of revision flags always makes me nervous about doing
more things in-process (this isn't the first such bug we've had).

I have a dream of converting most uses of flags into using a
commit-slab. That provides cheap access to an auxiliary structure, so
each traversal, etc, could keep its own flag structure. I'm not sure if
it would have a performance impact, though. Even though it's O(1), it is
an indirect lookup, which could have some memory-access impact (though
my guess is it would be lost in the noise).

One of the sticking points is that all object types, not just commits,
use flags. But we only assign slab ids to commits. I noticed recently
that "struct object" has quite a few spare bits in it these days,
because the switch to a 32-byte oid means 64-bit machines now have an
extra 4 bytes of padding. I wonder if we could use that to store an
index field.

Anyway, that's getting far off the topic; clearly we need a fix in the
meantime, and what you have here looks good to me.

> I tested running clear_commit_marks_many() to clear the UNINTERESTING
> flag inside close_reachable(), but the tips did not have the flag, so
> that did nothing.

Another option would be clear_object_flags(), which just walks all of
the in-memory structs. Your REACHABLE solution is cheaper, though.

> Instead, I finally arrived on the conclusion that I should use a flag
> that is not used in any other part of the code. In commit-reach.c, a
> number of flags were defined for commit walk algorithms. The REACHABLE
> flag seemed like it made the most sense, and it seems it was not
> actually used in the file.

Yeah, being able to remove it from commit-reach.c surprised me for a
moment. To further add to the confusion, builtin/fsck.c has its own
REACHABLE flag (with a different bit and a totally different purpose). I
don't think there's any practical problem there, though.

> I have failed to produce a test using the file:// protocol that
> demonstrates this bug.

Hmm, from the description, it sounds like it should be easy. I might
poke at it a bit.

-Peff

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/1] commit-graph: fix writing first commit-graph during fetch
  2019-10-22 20:33   ` Jeff King
@ 2019-10-22 21:45     ` Jeff King
  2019-10-22 23:35       ` SZEDER Gábor
  0 siblings, 1 reply; 23+ messages in thread
From: Jeff King @ 2019-10-22 21:45 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, Derrick Stolee, Junio C Hamano

On Tue, Oct 22, 2019 at 04:33:16PM -0400, Jeff King wrote:

> > I have failed to produce a test using the file:// protocol that
> > demonstrates this bug.
> 
> Hmm, from the description, it sounds like it should be easy. I might
> poke at it a bit.

Hmph. I can reproduce it here, but it seems to depend on the repository.
If I do this:

diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
index ecabbe1616..8d473a456f 100755
--- a/t/t5510-fetch.sh
+++ b/t/t5510-fetch.sh
@@ -583,6 +583,14 @@ test_expect_success 'fetch.writeCommitGraph' '
 	)
 '
 
+test_expect_success 'fetch.writeCommitGraph with a bigger repo' '
+	git clone "$TEST_DIRECTORY/.." repo &&
+	(
+		cd repo &&
+		git -c fetch.writeCommitGraph fetch origin
+	)
+'
+
 # configured prune tests
 
 set_config_tristate () {

it reliably triggers the bug. But if I make a synthetic repo, even it
has a lot of commits (thousands or more), it doesn't trigger. I thought
maybe it had to do with having commits that were not at tips (since the
tip ones presumably _are_ fed into the graph generation process). But
that doesn't seem to help.

Puzzling...

-Peff

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/1] commit-graph: fix writing first commit-graph during fetch
  2019-10-22 21:45     ` Jeff King
@ 2019-10-22 23:35       ` SZEDER Gábor
  2019-10-23  0:35         ` Derrick Stolee
  0 siblings, 1 reply; 23+ messages in thread
From: SZEDER Gábor @ 2019-10-22 23:35 UTC (permalink / raw)
  To: Jeff King
  Cc: Derrick Stolee via GitGitGadget, git, Derrick Stolee, Junio C Hamano

On Tue, Oct 22, 2019 at 05:45:54PM -0400, Jeff King wrote:
> On Tue, Oct 22, 2019 at 04:33:16PM -0400, Jeff King wrote:
> 
> > > I have failed to produce a test using the file:// protocol that
> > > demonstrates this bug.
> > 
> > Hmm, from the description, it sounds like it should be easy. I might
> > poke at it a bit.
> 
> Hmph. I can reproduce it here, but it seems to depend on the repository.
> If I do this:
> 
> diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
> index ecabbe1616..8d473a456f 100755
> --- a/t/t5510-fetch.sh
> +++ b/t/t5510-fetch.sh
> @@ -583,6 +583,14 @@ test_expect_success 'fetch.writeCommitGraph' '
>  	)
>  '
>  
> +test_expect_success 'fetch.writeCommitGraph with a bigger repo' '
> +	git clone "$TEST_DIRECTORY/.." repo &&
> +	(
> +		cd repo &&
> +		git -c fetch.writeCommitGraph fetch origin
> +	)
> +'
> +
>  # configured prune tests
>  
>  set_config_tristate () {
> 
> it reliably triggers the bug. But if I make a synthetic repo, even it
> has a lot of commits (thousands or more), it doesn't trigger. I thought
> maybe it had to do with having commits that were not at tips (since the
> tip ones presumably _are_ fed into the graph generation process). But
> that doesn't seem to help.
> 
> Puzzling...

Submodules?

  $ cd ~/src/git/
  $ git quotelog 86cfd61e6b
  86cfd61e6b (sha1dc: optionally use sha1collisiondetection as a submodule, 2017-07-01)
  $ git init --bare good.git
  Initialized empty Git repository in /home/szeder/src/git/good.git/
  $ git push -q good.git 86cfd61e6b^:refs/heads/master
  $ git clone good.git good-clone
  Cloning into 'good-clone'...
  done.
  $ git -c fetch.writeCommitGraph -C good-clone fetch origin
  Computing commit graph generation numbers: 100% (46958/46958), done.
  $ git init --bare bad.git
  Initialized empty Git repository in /home/szeder/src/git/bad.git/
  $ git push -q bad.git 86cfd61e6b:refs/heads/master
  $ git clone bad.git bad-clone
  Cloning into 'bad-clone'...
  done.
  $ git -c fetch.writeCommitGraph -C bad-clone fetch origin
  Computing commit graph generation numbers: 100% (1/1), done.
  BUG: commit-graph.c:886: missing parent 9936c1b52a39fa14fca04f937df3e75f7498ac66 for commit 86cfd61e6bc12745751c43b4f69886b290cd85cb
  Aborted

In the cover letter Derrick mentioned that he used
https://github.com/derrickstolee/numbers for testing, and that repo
has a submodule as well.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/1] commit-graph: fix writing first commit-graph during fetch
  2019-10-22 23:35       ` SZEDER Gábor
@ 2019-10-23  0:35         ` Derrick Stolee
  2019-10-23  0:48           ` Jeff King
  0 siblings, 1 reply; 23+ messages in thread
From: Derrick Stolee @ 2019-10-23  0:35 UTC (permalink / raw)
  To: SZEDER Gábor, Jeff King
  Cc: Derrick Stolee via GitGitGadget, git, Derrick Stolee, Junio C Hamano



On 10/22/2019 7:35 PM, SZEDER Gábor wrote:
> On Tue, Oct 22, 2019 at 05:45:54PM -0400, Jeff King wrote:
>> Puzzling...
> 
> Submodules?
> 
>   $ cd ~/src/git/
>   $ git quotelog 86cfd61e6b
>   86cfd61e6b (sha1dc: optionally use sha1collisiondetection as a submodule, 2017-07-01)
>   $ git init --bare good.git
>   Initialized empty Git repository in /home/szeder/src/git/good.git/
>   $ git push -q good.git 86cfd61e6b^:refs/heads/master
>   $ git clone good.git good-clone
>   Cloning into 'good-clone'...
>   done.
>   $ git -c fetch.writeCommitGraph -C good-clone fetch origin
>   Computing commit graph generation numbers: 100% (46958/46958), done.
>   $ git init --bare bad.git
>   Initialized empty Git repository in /home/szeder/src/git/bad.git/
>   $ git push -q bad.git 86cfd61e6b:refs/heads/master
>   $ git clone bad.git bad-clone
>   Cloning into 'bad-clone'...
>   done.
>   $ git -c fetch.writeCommitGraph -C bad-clone fetch origin
>   Computing commit graph generation numbers: 100% (1/1), done.
>   BUG: commit-graph.c:886: missing parent 9936c1b52a39fa14fca04f937df3e75f7498ac66 for commit 86cfd61e6bc12745751c43b4f69886b290cd85cb
>   Aborted
> 
> In the cover letter Derrick mentioned that he used
> https://github.com/derrickstolee/numbers for testing, and that repo
> has a submodule as well.

I completely forgot that I put a submodule in that repo. It makes sense
that the Git repo also has one. Thanks for the test! I'll get it into
the test suite tomorrow.

-Stolee

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/1] commit-graph: fix writing first commit-graph during fetch
  2019-10-23  0:35         ` Derrick Stolee
@ 2019-10-23  0:48           ` Jeff King
  2019-10-23  1:22             ` Jeff King
  0 siblings, 1 reply; 23+ messages in thread
From: Jeff King @ 2019-10-23  0:48 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: SZEDER Gábor, Derrick Stolee via GitGitGadget, git,
	Derrick Stolee, Junio C Hamano

On Tue, Oct 22, 2019 at 08:35:57PM -0400, Derrick Stolee wrote:

> > In the cover letter Derrick mentioned that he used
> > https://github.com/derrickstolee/numbers for testing, and that repo
> > has a submodule as well.
> 
> I completely forgot that I put a submodule in that repo. It makes sense
> that the Git repo also has one. Thanks for the test! I'll get it into
> the test suite tomorrow.

Ah, right. Good catch Gábor. The test below fails for me without your
patch, and succeeds with it (the extra empty commit is necessary so that
we have a parent).

I admit I am puzzled, though, _why_ the presence of the submodule
matters. That is, from your explanation, I thought the issue was simply
that `fetch` walked (and marked) some commits, and the flags overlapped
with what the commit-graph code expected.

I could guess that the presence of the submodule triggers some analysis
for --recurse-submodules. But then we don't actually recurse (maybe
because they're not activated? In which case maybe we shouldn't be doing
that extra walk to look for submodules if there aren't any activated
ones in our local repo).

I'd also wonder if it would be possible to trigger in another way (say,
due to want/have negotiation, though I guess most of the walking there
is done on the server side). But it may not be worth digging too far
into it.

-Peff

---
diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
index ecabbe1616..1b092fec0b 100755
--- a/t/t5510-fetch.sh
+++ b/t/t5510-fetch.sh
@@ -583,6 +583,21 @@ test_expect_success 'fetch.writeCommitGraph' '
 	)
 '
 
+test_expect_success 'fetch.writeCommitGraph with a submodule' '
+	git init has-submodule &&
+	(
+		cd has-submodule &&
+		git commit --allow-empty -m parent &&
+		git submodule add ../three &&
+		git commit -m "add submodule"
+	) &&
+	git clone has-submodule submod-clone &&
+	(
+		cd submod-clone &&
+		git -c fetch.writeCommitGraph fetch origin
+	)
+'
+
 # configured prune tests
 
 set_config_tristate () {

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/1] commit-graph: fix writing first commit-graph during fetch
  2019-10-23  0:48           ` Jeff King
@ 2019-10-23  1:22             ` Jeff King
  0 siblings, 0 replies; 23+ messages in thread
From: Jeff King @ 2019-10-23  1:22 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: SZEDER Gábor, Derrick Stolee via GitGitGadget, git,
	Derrick Stolee, Junio C Hamano

On Tue, Oct 22, 2019 at 08:48:20PM -0400, Jeff King wrote:

> I admit I am puzzled, though, _why_ the presence of the submodule
> matters. That is, from your explanation, I thought the issue was simply
> that `fetch` walked (and marked) some commits, and the flags overlapped
> with what the commit-graph code expected.
> 
> I could guess that the presence of the submodule triggers some analysis
> for --recurse-submodules. But then we don't actually recurse (maybe
> because they're not activated? In which case maybe we shouldn't be doing
> that extra walk to look for submodules if there aren't any activated
> ones in our local repo).

Indeed, that seems to be it. If I do this:

  git init repo
  cd repo
  cat >.gitmodules <<\EOF
  [submodule "foo"]
  path = foo
  url = https://example.com
  EOF
  time git fetch /path/to/git.git

then we end up traversing the whole git.git history a second time, even
though we should know off the bat that there are no active submodules
that we would recurse to.

Doing this makes the problem go away:

diff --git a/submodule.c b/submodule.c
index 0f199c5137..0db2f18b93 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1193,7 +1193,7 @@ static void calculate_changed_submodule_paths(struct repository *r,
 	struct string_list_item *name;
 
 	/* No need to check if there are no submodules configured */
-	if (!submodule_from_path(r, NULL, NULL))
+	if (!is_submodule_active(r, NULL))
 		return;
 
 	argv_array_push(&argv, "--"); /* argv[0] program name */

but causes some tests to fail (I think that in some cases we're supposed
to auto-initialize, and we'd probably need to cover that case, too).

All of this is outside of your fix, of course, but:

  1. I'm satisfied now that I understand why the test triggers the
     problem.

  2. You may want have a real activated submodule in your test. Right
     now we'll trigger the submodule-recursion check even without that,
     but in the future we might do something like the hunk above. In
     which case your test wouldn't be checking anything interesting
     anymore.

-Peff

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v2 0/2] [v2.24.0-rc0 BUG] fetch.writeCommitGraph fails on first fetch
  2019-10-22 17:28 [PATCH 0/1] [v2.24.0-rc0 BUG] fetch.writeCommitGraph fails on first fetch Derrick Stolee via GitGitGadget
  2019-10-22 17:28 ` [PATCH 1/1] commit-graph: fix writing first commit-graph during fetch Derrick Stolee via GitGitGadget
@ 2019-10-23 13:01 ` Derrick Stolee via GitGitGadget
  2019-10-23 13:01   ` [PATCH v2 1/2] t5510-fetch.sh: demonstrate fetch.writeCommitGraph bug Derrick Stolee via GitGitGadget
                     ` (2 more replies)
  1 sibling, 3 replies; 23+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-10-23 13:01 UTC (permalink / raw)
  To: git; +Cc: johannes.schindelin, peff, szeder.dev, Derrick Stolee, Junio C Hamano

UPDATE for V2: We now know the full repro, and a test is added. Thanks
Szeder and Peff for your insights!

While dogfooding, Johannes found a bug in the fetch.writeCommitGraph config
behavior. While his example initially happened during a clone with
--recurse-submodules, (UPDATE) and the submodule is important, but
--recurse-submodules is not:

$ git clone <url> test
$ cd test
$ git -c fetch.writeCommitGraph=true fetch origin
Computing commit graph generation numbers: 100% (12/12), done.
BUG: commit-graph.c:886: missing parent <hash1> for commit <hash2>
Aborted (core dumped)

In the repo I had cloned, there were really 60 commits to scan, but only 12
were in the list to write when calling compute_generation_numbers(). A
commit in the list expects to see a parent, but that parent is not in the
list. The simple example I used for my testing was 
https://github.com/derrickstolee/numbers. Thie repo HAS A SUBMODULE, I just
forgot. Sorry for derailing the investigation somewhat.

The details above are the start of the commit message for [PATCH 1/2],
including a test that fails when fetching after cloning a repo with a
submodule.

In [PATCH 2/2], I actually have the fix. I tried to include as much detail
as I could for how I investigated the problem and why I think this is the
right solution. I added details that have come from the on-list discussion,
including what the submodule code is doing and why REACHABLE is no longer
used in commit-reach.c.

Thanks, -Stolee

Derrick Stolee (2):
  t5510-fetch.sh: demonstrate fetch.writeCommitGraph bug
  commit-graph: fix writing first commit-graph during fetch

 commit-graph.c   | 11 +++++++----
 commit-reach.c   |  1 -
 object.h         |  3 ++-
 t/t5510-fetch.sh | 17 +++++++++++++++++
 4 files changed, 26 insertions(+), 6 deletions(-)


base-commit: d966095db01190a2196e31195ea6fa0c722aa732
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-415%2Fderrickstolee%2Ffetch-first-write-fail-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-415/derrickstolee/fetch-first-write-fail-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/415

Range-diff vs v1:

 -:  ---------- > 1:  6ac0a05746 t5510-fetch.sh: demonstrate fetch.writeCommitGraph bug
 1:  a1e5280d4b ! 2:  ca59b118f1 commit-graph: fix writing first commit-graph during fetch
     @@ -2,10 +2,13 @@
      
          commit-graph: fix writing first commit-graph during fetch
      
     -    While dogfooding, Johannes found a bug in the fetch.writeCommitGraph
     -    config behavior. While his example initially happened during a clone
     -    with --recurse-submodules, we found that it is actually a problem with
     -    the first fetch after a new clone and no existing commit-graph file:
     +    The previous commit includes a failing test for an issue around
     +    fetch.writeCommitGraph and fetching in a repo with a submodule. Here, we
     +    fix that bug and set the test to "test_expect_success".
     +
     +    The prolem arises with this set of commands when the remote repo at
     +    <url> has a submodule. Note that --recurse-submodules is not needed to
     +    demonstrate the bug.
      
                  $ git clone <url> test
                  $ cd test
     @@ -14,12 +17,7 @@
                  BUG: commit-graph.c:886: missing parent <hash1> for commit <hash2>
                  Aborted (core dumped)
      
     -    In the repo I had cloned, there were really 60 commits to scan, but
     -    only 12 were in the list to write when calling
     -    compute_generation_numbers(). A commit in the list expects to see a
     -    parent, but that parent is not in the list.
     -
     -    As an initial test, I converted the code in builtin/fetch.c that calls
     +    As an initial fix, I converted the code in builtin/fetch.c that calls
          write_commit_graph_reachable() to instead launch a "git commit-graph
          write --reachable --split" process. That code worked, but is not how we
          want the feature to work long-term.
     @@ -60,20 +58,28 @@
          flag inside close_reachable(), but the tips did not have the flag, so
          that did nothing.
      
     +    It turns out that the calculate_changed_submodule_paths() method is at
     +    fault. Thanks, Peff, for pointing out this detail! More specifically,
     +    for each submodule, the collect_changed_submodules() runs a revision
     +    walk to essentially do file-history on the list of submodules. That
     +    revision walk marks commits UNININTERESTING if they are simiplified away
     +    by not changing the submodule.
     +
          Instead, I finally arrived on the conclusion that I should use a flag
          that is not used in any other part of the code. In commit-reach.c, a
          number of flags were defined for commit walk algorithms. The REACHABLE
          flag seemed like it made the most sense, and it seems it was not
     -    actually used in the file.
     +    actually used in the file. The REACHABLE flag was used in early versions
     +    of commit-reach.c, but was removed by 4fbcca4 (commit-reach: make
     +    can_all_from_reach... linear, 2018-07-20).
      
          Add the REACHABLE flag to commit-graph.c and use it instead of
          UNINTERESTING in close_reachable(). This fixes the bug in manual
          testing.
      
     -    I have failed to produce a test using the file:// protocol that
     -    demonstrates this bug.
     -
          Reported-by: Johannes Schindelin <johannes.schindelin@gmx.de>
     +    Helped-by: Jeff King <peff@peff.net>
     +    Helped-by: Szeder Gábor <szeder.dev@gmail.com>
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       diff --git a/commit-graph.c b/commit-graph.c
     @@ -147,3 +153,16 @@
        * sha1-name.c:                                              20
        * list-objects-filter.c:                                      21
        * builtin/fsck.c:           0--3
     +
     + diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
     + --- a/t/t5510-fetch.sh
     + +++ b/t/t5510-fetch.sh
     +@@
     + 	)
     + '
     + 
     +-test_expect_failure 'fetch.writeCommitGraph with submodules' '
     ++test_expect_success 'fetch.writeCommitGraph with submodules' '
     + 	pwd="$(pwd)" &&
     + 	git clone dups super &&
     + 	(

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v2 1/2] t5510-fetch.sh: demonstrate fetch.writeCommitGraph bug
  2019-10-23 13:01 ` [PATCH v2 0/2] [v2.24.0-rc0 BUG] fetch.writeCommitGraph fails on first fetch Derrick Stolee via GitGitGadget
@ 2019-10-23 13:01   ` Derrick Stolee via GitGitGadget
  2019-10-23 14:18     ` SZEDER Gábor
  2019-10-24 12:18     ` SZEDER Gábor
  2019-10-23 13:01   ` [PATCH v2 2/2] commit-graph: fix writing first commit-graph during fetch Derrick Stolee via GitGitGadget
  2019-10-24 12:18   ` [PATCH v3 0/2] [v2.24.0-rc0 BUG] fetch.writeCommitGraph fails on first fetch Derrick Stolee via GitGitGadget
  2 siblings, 2 replies; 23+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-10-23 13:01 UTC (permalink / raw)
  To: git
  Cc: johannes.schindelin, peff, szeder.dev, Derrick Stolee,
	Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

While dogfooding, Johannes found a bug in the fetch.writeCommitGraph
config behavior. His example initially happened during a clone with
--recurse-submodules, we found that this happens with the first fetch
after cloning a repository that contains a submodule:

	$ git clone <url> test
	$ cd test
	$ git -c fetch.writeCommitGraph=true fetch origin
	Computing commit graph generation numbers: 100% (12/12), done.
	BUG: commit-graph.c:886: missing parent <hash1> for commit <hash2>
	Aborted (core dumped)

In the repo I had cloned, there were really 60 commits to scan, but
only 12 were in the list to write when calling
compute_generation_numbers(). A commit in the list expects to see a
parent, but that parent is not in the list.

A follow-up will fix the bug, but first we create a test that
demonstrates the problem.

I used "test_expect_failure" for the entire test instead of
"test_must_fail" only on the command that I expect to fail. This is
because the BUG() returns an exit code so test_must_fail complains.

Helped-by: Jeff King <peff@peff.net>
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Helped-by: Szeder Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t5510-fetch.sh | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
index ecabbe1616..e8ae3af0b6 100755
--- a/t/t5510-fetch.sh
+++ b/t/t5510-fetch.sh
@@ -583,6 +583,23 @@ test_expect_success 'fetch.writeCommitGraph' '
 	)
 '
 
+test_expect_failure 'fetch.writeCommitGraph with submodules' '
+	pwd="$(pwd)" &&
+	git clone dups super &&
+	(
+		cd super &&
+		git submodule add "file://$pwd/three" &&
+		git commit -m "add submodule"
+	) &&
+	git clone "super" writeError &&
+	(
+		cd writeError &&
+		test_path_is_missing .git/objects/info/commit-graphs/commit-graph-chain &&
+		git -c fetch.writeCommitGraph=true fetch origin &&
+		test_path_is_file .git/objects/info/commit-graphs/commit-graph-chain
+	)
+'
+
 # configured prune tests
 
 set_config_tristate () {
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v2 2/2] commit-graph: fix writing first commit-graph during fetch
  2019-10-23 13:01 ` [PATCH v2 0/2] [v2.24.0-rc0 BUG] fetch.writeCommitGraph fails on first fetch Derrick Stolee via GitGitGadget
  2019-10-23 13:01   ` [PATCH v2 1/2] t5510-fetch.sh: demonstrate fetch.writeCommitGraph bug Derrick Stolee via GitGitGadget
@ 2019-10-23 13:01   ` Derrick Stolee via GitGitGadget
  2019-10-23 15:04     ` SZEDER Gábor
  2019-10-24 12:18   ` [PATCH v3 0/2] [v2.24.0-rc0 BUG] fetch.writeCommitGraph fails on first fetch Derrick Stolee via GitGitGadget
  2 siblings, 1 reply; 23+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-10-23 13:01 UTC (permalink / raw)
  To: git
  Cc: johannes.schindelin, peff, szeder.dev, Derrick Stolee,
	Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The previous commit includes a failing test for an issue around
fetch.writeCommitGraph and fetching in a repo with a submodule. Here, we
fix that bug and set the test to "test_expect_success".

The prolem arises with this set of commands when the remote repo at
<url> has a submodule. Note that --recurse-submodules is not needed to
demonstrate the bug.

	$ git clone <url> test
	$ cd test
	$ git -c fetch.writeCommitGraph=true fetch origin
	Computing commit graph generation numbers: 100% (12/12), done.
	BUG: commit-graph.c:886: missing parent <hash1> for commit <hash2>
	Aborted (core dumped)

As an initial fix, I converted the code in builtin/fetch.c that calls
write_commit_graph_reachable() to instead launch a "git commit-graph
write --reachable --split" process. That code worked, but is not how we
want the feature to work long-term.

That test did demonstrate that the issue must be something to do with
internal state of the 'git fetch' process.

The write_commit_graph() method in commit-graph.c ensures the commits we
plan to write are "closed under reachability" using close_reachable().
This method walks from the input commits, and uses the UNINTERESTING
flag to mark which commits have already been visited. This allows the
walk to take O(N) time, where N is the number of commits, instead of
O(P) time, where P is the number of paths. (The number of paths can be
exponential in the number of commits.)

However, the UNINTERESTING flag is used in lots of places in the
codebase. This flag usually means some barrier to stop a commit walk,
such as in revision-walking to compare histories. It is not often
cleared after the walk completes because the starting points of those
walks do not have the UNINTERESTING flag, and clear_commit_marks() would
stop immediately.

This is happening during a 'git fetch' call with a remote. The fetch
negotiation is comparing the remote refs with the local refs and marking
some commits as UNINTERESTING.

You may ask: did this feature ever work at all? Yes, it did, as long as
you had a commit-graph covering all of your local refs. My testing was
unfortunately limited to this scenario. The UNINTERESTING commits are
always part of the "old" commit-graph, and when we add new commits to a
top layer of the commit-graph chain those are not needed. If we happen
to merge layers of the chain, then the commits are added as a list, not
using a commit walk. Further, the test added for this config option in
t5510-fetch.sh uses local filesystem clones, which somehow avoids this
logic.

I tested running clear_commit_marks_many() to clear the UNINTERESTING
flag inside close_reachable(), but the tips did not have the flag, so
that did nothing.

It turns out that the calculate_changed_submodule_paths() method is at
fault. Thanks, Peff, for pointing out this detail! More specifically,
for each submodule, the collect_changed_submodules() runs a revision
walk to essentially do file-history on the list of submodules. That
revision walk marks commits UNININTERESTING if they are simiplified away
by not changing the submodule.

Instead, I finally arrived on the conclusion that I should use a flag
that is not used in any other part of the code. In commit-reach.c, a
number of flags were defined for commit walk algorithms. The REACHABLE
flag seemed like it made the most sense, and it seems it was not
actually used in the file. The REACHABLE flag was used in early versions
of commit-reach.c, but was removed by 4fbcca4 (commit-reach: make
can_all_from_reach... linear, 2018-07-20).

Add the REACHABLE flag to commit-graph.c and use it instead of
UNINTERESTING in close_reachable(). This fixes the bug in manual
testing.

Reported-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Helped-by: Jeff King <peff@peff.net>
Helped-by: Szeder Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c   | 11 +++++++----
 commit-reach.c   |  1 -
 object.h         |  3 ++-
 t/t5510-fetch.sh |  2 +-
 4 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index fc4a43b8d6..0aea7b2ae5 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -41,6 +41,9 @@
 #define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \
 			+ GRAPH_FANOUT_SIZE + the_hash_algo->rawsz)
 
+/* Remember to update object flag allocation in object.h */
+#define REACHABLE       (1u<<15)
+
 char *get_commit_graph_filename(const char *obj_dir)
 {
 	char *filename = xstrfmt("%s/info/commit-graph", obj_dir);
@@ -1030,11 +1033,11 @@ static void add_missing_parents(struct write_commit_graph_context *ctx, struct c
 {
 	struct commit_list *parent;
 	for (parent = commit->parents; parent; parent = parent->next) {
-		if (!(parent->item->object.flags & UNINTERESTING)) {
+		if (!(parent->item->object.flags & REACHABLE)) {
 			ALLOC_GROW(ctx->oids.list, ctx->oids.nr + 1, ctx->oids.alloc);
 			oidcpy(&ctx->oids.list[ctx->oids.nr], &(parent->item->object.oid));
 			ctx->oids.nr++;
-			parent->item->object.flags |= UNINTERESTING;
+			parent->item->object.flags |= REACHABLE;
 		}
 	}
 }
@@ -1052,7 +1055,7 @@ static void close_reachable(struct write_commit_graph_context *ctx)
 		display_progress(ctx->progress, i + 1);
 		commit = lookup_commit(ctx->r, &ctx->oids.list[i]);
 		if (commit)
-			commit->object.flags |= UNINTERESTING;
+			commit->object.flags |= REACHABLE;
 	}
 	stop_progress(&ctx->progress);
 
@@ -1089,7 +1092,7 @@ static void close_reachable(struct write_commit_graph_context *ctx)
 		commit = lookup_commit(ctx->r, &ctx->oids.list[i]);
 
 		if (commit)
-			commit->object.flags &= ~UNINTERESTING;
+			commit->object.flags &= ~REACHABLE;
 	}
 	stop_progress(&ctx->progress);
 }
diff --git a/commit-reach.c b/commit-reach.c
index 3ea174788a..4ca7e706a1 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -10,7 +10,6 @@
 #include "commit-reach.h"
 
 /* Remember to update object flag allocation in object.h */
-#define REACHABLE       (1u<<15)
 #define PARENT1		(1u<<16)
 #define PARENT2		(1u<<17)
 #define STALE		(1u<<18)
diff --git a/object.h b/object.h
index 0120892bbd..25f5ab3d54 100644
--- a/object.h
+++ b/object.h
@@ -68,7 +68,8 @@ struct object_array {
  * bisect.c:                                        16
  * bundle.c:                                        16
  * http-push.c:                                     16-----19
- * commit-reach.c:                                15-------19
+ * commit-graph.c:                                15
+ * commit-reach.c:                                  16-----19
  * sha1-name.c:                                              20
  * list-objects-filter.c:                                      21
  * builtin/fsck.c:           0--3
diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
index e8ae3af0b6..7bfbcc2036 100755
--- a/t/t5510-fetch.sh
+++ b/t/t5510-fetch.sh
@@ -583,7 +583,7 @@ test_expect_success 'fetch.writeCommitGraph' '
 	)
 '
 
-test_expect_failure 'fetch.writeCommitGraph with submodules' '
+test_expect_success 'fetch.writeCommitGraph with submodules' '
 	pwd="$(pwd)" &&
 	git clone dups super &&
 	(
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 1/2] t5510-fetch.sh: demonstrate fetch.writeCommitGraph bug
  2019-10-23 13:01   ` [PATCH v2 1/2] t5510-fetch.sh: demonstrate fetch.writeCommitGraph bug Derrick Stolee via GitGitGadget
@ 2019-10-23 14:18     ` SZEDER Gábor
  2019-10-23 20:46       ` Derrick Stolee
  2019-10-24 12:18     ` SZEDER Gábor
  1 sibling, 1 reply; 23+ messages in thread
From: SZEDER Gábor @ 2019-10-23 14:18 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, johannes.schindelin, peff, Derrick Stolee, Junio C Hamano

On Wed, Oct 23, 2019 at 01:01:34PM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
> 
> While dogfooding, Johannes found a bug in the fetch.writeCommitGraph
> config behavior. His example initially happened during a clone with
> --recurse-submodules, we found that this happens with the first fetch
> after cloning a repository that contains a submodule:
> 
> 	$ git clone <url> test
> 	$ cd test
> 	$ git -c fetch.writeCommitGraph=true fetch origin
> 	Computing commit graph generation numbers: 100% (12/12), done.
> 	BUG: commit-graph.c:886: missing parent <hash1> for commit <hash2>
> 	Aborted (core dumped)
> 
> In the repo I had cloned, there were really 60 commits to scan, but
> only 12 were in the list to write when calling
> compute_generation_numbers(). A commit in the list expects to see a
> parent, but that parent is not in the list.
> 
> A follow-up will fix the bug, but first we create a test that
> demonstrates the problem.
> 
> I used "test_expect_failure" for the entire test instead of
> "test_must_fail" only on the command that I expect to fail. This is
> because the BUG() returns an exit code so test_must_fail complains.

I don't think this paragraph is necessary; using 'test_expect_failure'
is the way to demonstrate a known breakage.

'test_must_fail' should only be used when the failure is the desired
behavior of a git command.  (I used the word "desired" here, because
you just used the word "expect" above in the sense that "I expect it
to fail, because I know it's buggy, and I want to demonstrate that
bug")

> Helped-by: Jeff King <peff@peff.net>
> Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> Helped-by: Szeder Gábor <szeder.dev@gmail.com>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/t5510-fetch.sh | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
> index ecabbe1616..e8ae3af0b6 100755
> --- a/t/t5510-fetch.sh
> +++ b/t/t5510-fetch.sh
> @@ -583,6 +583,23 @@ test_expect_success 'fetch.writeCommitGraph' '
>  	)
>  '
>  
> +test_expect_failure 'fetch.writeCommitGraph with submodules' '
> +	pwd="$(pwd)" &&
> +	git clone dups super &&
> +	(
> +		cd super &&
> +		git submodule add "file://$pwd/three" &&

Nits: couldn't the URL simply be file://$TRASH_DIRECTORY/three ?

> +		git commit -m "add submodule"
> +	) &&
> +	git clone "super" writeError &&

There is a write error now, because we have a bug, but after the next
patch the bug will be fixed and we won't have a write error anymore.

> +	(
> +		cd writeError &&
> +		test_path_is_missing .git/objects/info/commit-graphs/commit-graph-chain &&
> +		git -c fetch.writeCommitGraph=true fetch origin &&
> +		test_path_is_file .git/objects/info/commit-graphs/commit-graph-chain
> +	)
> +'
> +
>  # configured prune tests
>  
>  set_config_tristate () {
> -- 
> gitgitgadget
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 2/2] commit-graph: fix writing first commit-graph during fetch
  2019-10-23 13:01   ` [PATCH v2 2/2] commit-graph: fix writing first commit-graph during fetch Derrick Stolee via GitGitGadget
@ 2019-10-23 15:04     ` SZEDER Gábor
  2019-10-24 10:39       ` Derrick Stolee
  0 siblings, 1 reply; 23+ messages in thread
From: SZEDER Gábor @ 2019-10-23 15:04 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, johannes.schindelin, peff, Derrick Stolee, Junio C Hamano

On Wed, Oct 23, 2019 at 01:01:35PM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
> 
> The previous commit includes a failing test for an issue around
> fetch.writeCommitGraph and fetching in a repo with a submodule. Here, we
> fix that bug and set the test to "test_expect_success".
> 
> The prolem arises with this set of commands when the remote repo at

s/prolem/problem/

> <url> has a submodule. Note that --recurse-submodules is not needed to
> demonstrate the bug.
> 
> 	$ git clone <url> test
> 	$ cd test
> 	$ git -c fetch.writeCommitGraph=true fetch origin
> 	Computing commit graph generation numbers: 100% (12/12), done.
> 	BUG: commit-graph.c:886: missing parent <hash1> for commit <hash2>
> 	Aborted (core dumped)
> 
> As an initial fix, I converted the code in builtin/fetch.c that calls
> write_commit_graph_reachable() to instead launch a "git commit-graph
> write --reachable --split" process. That code worked, but is not how we
> want the feature to work long-term.
> 
> That test did demonstrate that the issue must be something to do with
> internal state of the 'git fetch' process.
> 
> The write_commit_graph() method in commit-graph.c ensures the commits we
> plan to write are "closed under reachability" using close_reachable().
> This method walks from the input commits, and uses the UNINTERESTING
> flag to mark which commits have already been visited. This allows the
> walk to take O(N) time, where N is the number of commits, instead of
> O(P) time, where P is the number of paths. (The number of paths can be
> exponential in the number of commits.)
> 
> However, the UNINTERESTING flag is used in lots of places in the
> codebase. This flag usually means some barrier to stop a commit walk,
> such as in revision-walking to compare histories. It is not often
> cleared after the walk completes because the starting points of those
> walks do not have the UNINTERESTING flag, and clear_commit_marks() would
> stop immediately.
> 
> This is happening during a 'git fetch' call with a remote. The fetch
> negotiation is comparing the remote refs with the local refs and marking
> some commits as UNINTERESTING.
> 
> You may ask: did this feature ever work at all? Yes, it did, as long as
> you had a commit-graph covering all of your local refs. My testing was
> unfortunately limited to this scenario. The UNINTERESTING commits are
> always part of the "old" commit-graph, and when we add new commits to a
> top layer of the commit-graph chain those are not needed. If we happen
> to merge layers of the chain, then the commits are added as a list, not
> using a commit walk. Further, the test added for this config option in
> t5510-fetch.sh uses local filesystem clones, which somehow avoids this
> logic.

Does this last sentence still holds, given that a submodule plays a
crucial role in triggering this bug?  I think it either doesn't, or
I still don't completely understand the situation.

> I tested running clear_commit_marks_many() to clear the UNINTERESTING
> flag inside close_reachable(), but the tips did not have the flag, so
> that did nothing.
> 
> It turns out that the calculate_changed_submodule_paths() method is at
> fault. Thanks, Peff, for pointing out this detail! More specifically,
> for each submodule, the collect_changed_submodules() runs a revision
> walk to essentially do file-history on the list of submodules. That
> revision walk marks commits UNININTERESTING if they are simiplified away

s/simiplified/simplified/

> by not changing the submodule.
> 
> Instead, I finally arrived on the conclusion that I should use a flag
> that is not used in any other part of the code. In commit-reach.c, a
> number of flags were defined for commit walk algorithms. The REACHABLE
> flag seemed like it made the most sense, and it seems it was not
> actually used in the file. The REACHABLE flag was used in early versions
> of commit-reach.c, but was removed by 4fbcca4 (commit-reach: make
> can_all_from_reach... linear, 2018-07-20).
> 
> Add the REACHABLE flag to commit-graph.c and use it instead of
> UNINTERESTING in close_reachable(). This fixes the bug in manual
> testing.

I'm inclined to agree that using a flag that is not used anywhere else
is the safest thing to do, and at -rcX time safest is good.  I'm not
sure whether it's the right thing to do in the long term, though.

Furthermore, calling this flag REACHABLE is misleading, because the
code actually means SEEN.
Consider the following sequence of commands:

  # Create a pack with two commits
  $ git commit --allow-empty -m one &&
  $ git commit --allow-empty -m two &&
  $ git repack -ad &&
  # Make one of those commits unreachable
  $ git reset --hard HEAD^ &&
  # Not even from reflogs!
  $ git reflog expire --expire-unreachable=now --all
  # Now write a commit-graph from that pack file
  $ git commit-graph write
  Computing commit graph generation numbers: 100% (2/2), done.

It added two commits to the commit-graph, although one of them is
clearly not reachable anymore, so marking it as REACHABLE while
enumerating all commits feels wrong.

> Reported-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> Helped-by: Jeff King <peff@peff.net>
> Helped-by: Szeder Gábor <szeder.dev@gmail.com>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  commit-graph.c   | 11 +++++++----
>  commit-reach.c   |  1 -
>  object.h         |  3 ++-
>  t/t5510-fetch.sh |  2 +-
>  4 files changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/commit-graph.c b/commit-graph.c
> index fc4a43b8d6..0aea7b2ae5 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -41,6 +41,9 @@
>  #define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \
>  			+ GRAPH_FANOUT_SIZE + the_hash_algo->rawsz)
>  
> +/* Remember to update object flag allocation in object.h */
> +#define REACHABLE       (1u<<15)
> +
>  char *get_commit_graph_filename(const char *obj_dir)
>  {
>  	char *filename = xstrfmt("%s/info/commit-graph", obj_dir);
> @@ -1030,11 +1033,11 @@ static void add_missing_parents(struct write_commit_graph_context *ctx, struct c
>  {
>  	struct commit_list *parent;
>  	for (parent = commit->parents; parent; parent = parent->next) {
> -		if (!(parent->item->object.flags & UNINTERESTING)) {
> +		if (!(parent->item->object.flags & REACHABLE)) {
>  			ALLOC_GROW(ctx->oids.list, ctx->oids.nr + 1, ctx->oids.alloc);
>  			oidcpy(&ctx->oids.list[ctx->oids.nr], &(parent->item->object.oid));
>  			ctx->oids.nr++;
> -			parent->item->object.flags |= UNINTERESTING;
> +			parent->item->object.flags |= REACHABLE;
>  		}
>  	}
>  }
> @@ -1052,7 +1055,7 @@ static void close_reachable(struct write_commit_graph_context *ctx)
>  		display_progress(ctx->progress, i + 1);
>  		commit = lookup_commit(ctx->r, &ctx->oids.list[i]);
>  		if (commit)
> -			commit->object.flags |= UNINTERESTING;
> +			commit->object.flags |= REACHABLE;
>  	}
>  	stop_progress(&ctx->progress);
>  
> @@ -1089,7 +1092,7 @@ static void close_reachable(struct write_commit_graph_context *ctx)
>  		commit = lookup_commit(ctx->r, &ctx->oids.list[i]);
>  
>  		if (commit)
> -			commit->object.flags &= ~UNINTERESTING;
> +			commit->object.flags &= ~REACHABLE;
>  	}
>  	stop_progress(&ctx->progress);
>  }
> diff --git a/commit-reach.c b/commit-reach.c
> index 3ea174788a..4ca7e706a1 100644
> --- a/commit-reach.c
> +++ b/commit-reach.c
> @@ -10,7 +10,6 @@
>  #include "commit-reach.h"
>  
>  /* Remember to update object flag allocation in object.h */
> -#define REACHABLE       (1u<<15)
>  #define PARENT1		(1u<<16)
>  #define PARENT2		(1u<<17)
>  #define STALE		(1u<<18)
> diff --git a/object.h b/object.h
> index 0120892bbd..25f5ab3d54 100644
> --- a/object.h
> +++ b/object.h
> @@ -68,7 +68,8 @@ struct object_array {
>   * bisect.c:                                        16
>   * bundle.c:                                        16
>   * http-push.c:                                     16-----19
> - * commit-reach.c:                                15-------19
> + * commit-graph.c:                                15
> + * commit-reach.c:                                  16-----19
>   * sha1-name.c:                                              20
>   * list-objects-filter.c:                                      21
>   * builtin/fsck.c:           0--3
> diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
> index e8ae3af0b6..7bfbcc2036 100755
> --- a/t/t5510-fetch.sh
> +++ b/t/t5510-fetch.sh
> @@ -583,7 +583,7 @@ test_expect_success 'fetch.writeCommitGraph' '
>  	)
>  '
>  
> -test_expect_failure 'fetch.writeCommitGraph with submodules' '
> +test_expect_success 'fetch.writeCommitGraph with submodules' '
>  	pwd="$(pwd)" &&
>  	git clone dups super &&
>  	(
> -- 
> gitgitgadget

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 1/2] t5510-fetch.sh: demonstrate fetch.writeCommitGraph bug
  2019-10-23 14:18     ` SZEDER Gábor
@ 2019-10-23 20:46       ` Derrick Stolee
  0 siblings, 0 replies; 23+ messages in thread
From: Derrick Stolee @ 2019-10-23 20:46 UTC (permalink / raw)
  To: SZEDER Gábor, Derrick Stolee via GitGitGadget
  Cc: git, johannes.schindelin, peff, Derrick Stolee, Junio C Hamano

On 10/23/2019 10:18 AM, SZEDER Gábor wrote:
> On Wed, Oct 23, 2019 at 01:01:34PM +0000, Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> While dogfooding, Johannes found a bug in the fetch.writeCommitGraph
>> config behavior. His example initially happened during a clone with
>> --recurse-submodules, we found that this happens with the first fetch
>> after cloning a repository that contains a submodule:
>>
>> 	$ git clone <url> test
>> 	$ cd test
>> 	$ git -c fetch.writeCommitGraph=true fetch origin
>> 	Computing commit graph generation numbers: 100% (12/12), done.
>> 	BUG: commit-graph.c:886: missing parent <hash1> for commit <hash2>
>> 	Aborted (core dumped)
>>
>> In the repo I had cloned, there were really 60 commits to scan, but
>> only 12 were in the list to write when calling
>> compute_generation_numbers(). A commit in the list expects to see a
>> parent, but that parent is not in the list.
>>
>> A follow-up will fix the bug, but first we create a test that
>> demonstrates the problem.
>>
>> I used "test_expect_failure" for the entire test instead of
>> "test_must_fail" only on the command that I expect to fail. This is
>> because the BUG() returns an exit code so test_must_fail complains.
> 
> I don't think this paragraph is necessary; using 'test_expect_failure'
> is the way to demonstrate a known breakage.
> 
> 'test_must_fail' should only be used when the failure is the desired
> behavior of a git command.  (I used the word "desired" here, because
> you just used the word "expect" above in the sense that "I expect it
> to fail, because I know it's buggy, and I want to demonstrate that
> bug")

I guess that I prefer pointing out which line of the test fails, and
making that part of the test (that must otherwise pass). However, you
are right that test_expect_failure does a good job of communicating that
the test script shows an existing bug. Those are not always cleaned up
immediately, but at least we can find them easily.

>> Helped-by: Jeff King <peff@peff.net>
>> Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
>> Helped-by: Szeder Gábor <szeder.dev@gmail.com>
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>>  t/t5510-fetch.sh | 17 +++++++++++++++++
>>  1 file changed, 17 insertions(+)
>>
>> diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
>> index ecabbe1616..e8ae3af0b6 100755
>> --- a/t/t5510-fetch.sh
>> +++ b/t/t5510-fetch.sh
>> @@ -583,6 +583,23 @@ test_expect_success 'fetch.writeCommitGraph' '
>>  	)
>>  '
>>  
>> +test_expect_failure 'fetch.writeCommitGraph with submodules' '
>> +	pwd="$(pwd)" &&
>> +	git clone dups super &&
>> +	(
>> +		cd super &&
>> +		git submodule add "file://$pwd/three" &&
> 
> Nits: couldn't the URL simply be file://$TRASH_DIRECTORY/three ?

True, that would be better. Thanks!

 
>> +		git commit -m "add submodule"
>> +	) &&
>> +	git clone "super" writeError &&
> 
> There is a write error now, because we have a bug, but after the next
> patch the bug will be fixed and we won't have a write error anymore.

Good point.

Thanks!
-Stolee

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 2/2] commit-graph: fix writing first commit-graph during fetch
  2019-10-23 15:04     ` SZEDER Gábor
@ 2019-10-24 10:39       ` Derrick Stolee
  2019-10-30 14:31         ` SZEDER Gábor
  0 siblings, 1 reply; 23+ messages in thread
From: Derrick Stolee @ 2019-10-24 10:39 UTC (permalink / raw)
  To: SZEDER Gábor, Derrick Stolee via GitGitGadget
  Cc: git, johannes.schindelin, peff, Derrick Stolee, Junio C Hamano

On 10/23/2019 11:04 AM, SZEDER Gábor wrote:
> On Wed, Oct 23, 2019 at 01:01:35PM +0000, Derrick Stolee via GitGitGadget wrote:
>>
>> You may ask: did this feature ever work at all? Yes, it did, as long as
>> you had a commit-graph covering all of your local refs. My testing was
>> unfortunately limited to this scenario. The UNINTERESTING commits are
>> always part of the "old" commit-graph, and when we add new commits to a
>> top layer of the commit-graph chain those are not needed. If we happen
>> to merge layers of the chain, then the commits are added as a list, not
>> using a commit walk. Further, the test added for this config option in
>> t5510-fetch.sh uses local filesystem clones, which somehow avoids this
>> logic.
> 
> Does this last sentence still holds, given that a submodule plays a
> crucial role in triggering this bug?  I think it either doesn't, or
> I still don't completely understand the situation.

This does not apply anymore. I forgot to delete it.
 
>> I tested running clear_commit_marks_many() to clear the UNINTERESTING
>> flag inside close_reachable(), but the tips did not have the flag, so
>> that did nothing.
>>
>> It turns out that the calculate_changed_submodule_paths() method is at
>> fault. Thanks, Peff, for pointing out this detail! More specifically,
>> for each submodule, the collect_changed_submodules() runs a revision
>> walk to essentially do file-history on the list of submodules. That
>> revision walk marks commits UNININTERESTING if they are simiplified away
> 
> s/simiplified/simplified/
> 
>> by not changing the submodule.
>>
>> Instead, I finally arrived on the conclusion that I should use a flag
>> that is not used in any other part of the code. In commit-reach.c, a
>> number of flags were defined for commit walk algorithms. The REACHABLE
>> flag seemed like it made the most sense, and it seems it was not
>> actually used in the file. The REACHABLE flag was used in early versions
>> of commit-reach.c, but was removed by 4fbcca4 (commit-reach: make
>> can_all_from_reach... linear, 2018-07-20).
>>
>> Add the REACHABLE flag to commit-graph.c and use it instead of
>> UNINTERESTING in close_reachable(). This fixes the bug in manual
>> testing.
> 
> I'm inclined to agree that using a flag that is not used anywhere else
> is the safest thing to do, and at -rcX time safest is good.  I'm not
> sure whether it's the right thing to do in the long term, though.
> 
> Furthermore, calling this flag REACHABLE is misleading, because the
> code actually means SEEN.
> Consider the following sequence of commands:
> 
>   # Create a pack with two commits
>   $ git commit --allow-empty -m one &&
>   $ git commit --allow-empty -m two &&
>   $ git repack -ad &&
>   # Make one of those commits unreachable
>   $ git reset --hard HEAD^ &&
>   # Not even from reflogs!
>   $ git reflog expire --expire-unreachable=now --all
>   # Now write a commit-graph from that pack file
>   $ git commit-graph write
>   Computing commit graph generation numbers: 100% (2/2), done.
> 
> It added two commits to the commit-graph, although one of them is
> clearly not reachable anymore, so marking it as REACHABLE while
> enumerating all commits feels wrong.

Since you are using "git commit-graph write", the command is scanning
all pack-files for commits to include. Even in this case, the
close_reachable() method needs to walk to see if any commits are missing.
(It could be that the root commit is loose for some strange reason.)

In this case, we are marking REACHABLE the commits that can be reached
from our "starting" commits. In your example we start with every commit.

If you had used `git commit-graph write --stdin-packs` and provided a
small pack name over stdin, the concept would be similar and even more
pronounced: the pack (perhaps downloaded via 'fetch') is not likely to
contain every commit, so we need to walk all reachable commits from
those included.

I'll have a v3 today with the requested fixes.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v3 0/2] [v2.24.0-rc0 BUG] fetch.writeCommitGraph fails on first fetch
  2019-10-23 13:01 ` [PATCH v2 0/2] [v2.24.0-rc0 BUG] fetch.writeCommitGraph fails on first fetch Derrick Stolee via GitGitGadget
  2019-10-23 13:01   ` [PATCH v2 1/2] t5510-fetch.sh: demonstrate fetch.writeCommitGraph bug Derrick Stolee via GitGitGadget
  2019-10-23 13:01   ` [PATCH v2 2/2] commit-graph: fix writing first commit-graph during fetch Derrick Stolee via GitGitGadget
@ 2019-10-24 12:18   ` Derrick Stolee via GitGitGadget
  2019-10-24 12:18     ` [PATCH v3 1/2] t5510-fetch.sh: demonstrate fetch.writeCommitGraph bug Derrick Stolee via GitGitGadget
                       ` (2 more replies)
  2 siblings, 3 replies; 23+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-10-24 12:18 UTC (permalink / raw)
  To: git; +Cc: johannes.schindelin, peff, szeder.dev, Derrick Stolee, Junio C Hamano

UPDATE for V2: We now know the full repro, and a test is added. Thanks
Szeder and Peff for your insights!

UPDATE in V3: Cleaned up the commit messages and some test details.

While dogfooding, Johannes found a bug in the fetch.writeCommitGraph config
behavior. While his example initially happened during a clone with
--recurse-submodules, (UPDATE) and the submodule is important, but
--recurse-submodules is not:

$ git clone <url> test
$ cd test
$ git -c fetch.writeCommitGraph=true fetch origin
Computing commit graph generation numbers: 100% (12/12), done.
BUG: commit-graph.c:886: missing parent <hash1> for commit <hash2>
Aborted (core dumped)

In the repo I had cloned, there were really 60 commits to scan, but only 12
were in the list to write when calling compute_generation_numbers(). A
commit in the list expects to see a parent, but that parent is not in the
list. The simple example I used for my testing was 
https://github.com/derrickstolee/numbers. Thie repo HAS A SUBMODULE, I just
forgot. Sorry for derailing the investigation somewhat.

The details above are the start of the commit message for [PATCH 1/2],
including a test that fails when fetching after cloning a repo with a
submodule.

In [PATCH 2/2], I actually have the fix. I tried to include as much detail
as I could for how I investigated the problem and why I think this is the
right solution. I added details that have come from the on-list discussion,
including what the submodule code is doing and why REACHABLE is no longer
used in commit-reach.c.

Thanks, -Stolee

Derrick Stolee (2):
  t5510-fetch.sh: demonstrate fetch.writeCommitGraph bug
  commit-graph: fix writing first commit-graph during fetch

 commit-graph.c   | 11 +++++++----
 commit-reach.c   |  1 -
 object.h         |  3 ++-
 t/t5510-fetch.sh | 16 ++++++++++++++++
 4 files changed, 25 insertions(+), 6 deletions(-)


base-commit: d966095db01190a2196e31195ea6fa0c722aa732
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-415%2Fderrickstolee%2Ffetch-first-write-fail-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-415/derrickstolee/fetch-first-write-fail-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/415

Range-diff vs v2:

 1:  6ac0a05746 ! 1:  ce53b5a7bf t5510-fetch.sh: demonstrate fetch.writeCommitGraph bug
     @@ -22,10 +22,6 @@
          A follow-up will fix the bug, but first we create a test that
          demonstrates the problem.
      
     -    I used "test_expect_failure" for the entire test instead of
     -    "test_must_fail" only on the command that I expect to fail. This is
     -    because the BUG() returns an exit code so test_must_fail complains.
     -
          Helped-by: Jeff King <peff@peff.net>
          Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
          Helped-by: Szeder Gábor <szeder.dev@gmail.com>
     @@ -39,16 +35,15 @@
       '
       
      +test_expect_failure 'fetch.writeCommitGraph with submodules' '
     -+	pwd="$(pwd)" &&
      +	git clone dups super &&
      +	(
      +		cd super &&
     -+		git submodule add "file://$pwd/three" &&
     ++		git submodule add "file://$TRASH_DIRECTORY/three" &&
      +		git commit -m "add submodule"
      +	) &&
     -+	git clone "super" writeError &&
     ++	git clone "super" super-clone &&
      +	(
     -+		cd writeError &&
     ++		cd super-clone &&
      +		test_path_is_missing .git/objects/info/commit-graphs/commit-graph-chain &&
      +		git -c fetch.writeCommitGraph=true fetch origin &&
      +		test_path_is_file .git/objects/info/commit-graphs/commit-graph-chain
 2:  ca59b118f1 ! 2:  edacfff490 commit-graph: fix writing first commit-graph during fetch
     @@ -6,7 +6,7 @@
          fetch.writeCommitGraph and fetching in a repo with a submodule. Here, we
          fix that bug and set the test to "test_expect_success".
      
     -    The prolem arises with this set of commands when the remote repo at
     +    The problem arises with this set of commands when the remote repo at
          <url> has a submodule. Note that --recurse-submodules is not needed to
          demonstrate the bug.
      
     @@ -44,16 +44,6 @@
          negotiation is comparing the remote refs with the local refs and marking
          some commits as UNINTERESTING.
      
     -    You may ask: did this feature ever work at all? Yes, it did, as long as
     -    you had a commit-graph covering all of your local refs. My testing was
     -    unfortunately limited to this scenario. The UNINTERESTING commits are
     -    always part of the "old" commit-graph, and when we add new commits to a
     -    top layer of the commit-graph chain those are not needed. If we happen
     -    to merge layers of the chain, then the commits are added as a list, not
     -    using a commit walk. Further, the test added for this config option in
     -    t5510-fetch.sh uses local filesystem clones, which somehow avoids this
     -    logic.
     -
          I tested running clear_commit_marks_many() to clear the UNINTERESTING
          flag inside close_reachable(), but the tips did not have the flag, so
          that did nothing.
     @@ -62,7 +52,7 @@
          fault. Thanks, Peff, for pointing out this detail! More specifically,
          for each submodule, the collect_changed_submodules() runs a revision
          walk to essentially do file-history on the list of submodules. That
     -    revision walk marks commits UNININTERESTING if they are simiplified away
     +    revision walk marks commits UNININTERESTING if they are simplified away
          by not changing the submodule.
      
          Instead, I finally arrived on the conclusion that I should use a flag
     @@ -163,6 +153,6 @@
       
      -test_expect_failure 'fetch.writeCommitGraph with submodules' '
      +test_expect_success 'fetch.writeCommitGraph with submodules' '
     - 	pwd="$(pwd)" &&
       	git clone dups super &&
       	(
     + 		cd super &&

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v3 1/2] t5510-fetch.sh: demonstrate fetch.writeCommitGraph bug
  2019-10-24 12:18   ` [PATCH v3 0/2] [v2.24.0-rc0 BUG] fetch.writeCommitGraph fails on first fetch Derrick Stolee via GitGitGadget
@ 2019-10-24 12:18     ` Derrick Stolee via GitGitGadget
  2019-10-24 12:18     ` [PATCH v3 2/2] commit-graph: fix writing first commit-graph during fetch Derrick Stolee via GitGitGadget
  2019-10-24 13:40     ` [PATCH v4 0/2] [v2.24.0-rc0 BUG] fetch.writeCommitGraph fails on first fetch Derrick Stolee via GitGitGadget
  2 siblings, 0 replies; 23+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-10-24 12:18 UTC (permalink / raw)
  To: git
  Cc: johannes.schindelin, peff, szeder.dev, Derrick Stolee,
	Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

While dogfooding, Johannes found a bug in the fetch.writeCommitGraph
config behavior. His example initially happened during a clone with
--recurse-submodules, we found that this happens with the first fetch
after cloning a repository that contains a submodule:

	$ git clone <url> test
	$ cd test
	$ git -c fetch.writeCommitGraph=true fetch origin
	Computing commit graph generation numbers: 100% (12/12), done.
	BUG: commit-graph.c:886: missing parent <hash1> for commit <hash2>
	Aborted (core dumped)

In the repo I had cloned, there were really 60 commits to scan, but
only 12 were in the list to write when calling
compute_generation_numbers(). A commit in the list expects to see a
parent, but that parent is not in the list.

A follow-up will fix the bug, but first we create a test that
demonstrates the problem.

Helped-by: Jeff King <peff@peff.net>
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Helped-by: Szeder Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t5510-fetch.sh | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
index ecabbe1616..cc76e1ce50 100755
--- a/t/t5510-fetch.sh
+++ b/t/t5510-fetch.sh
@@ -583,6 +583,22 @@ test_expect_success 'fetch.writeCommitGraph' '
 	)
 '
 
+test_expect_failure 'fetch.writeCommitGraph with submodules' '
+	git clone dups super &&
+	(
+		cd super &&
+		git submodule add "file://$TRASH_DIRECTORY/three" &&
+		git commit -m "add submodule"
+	) &&
+	git clone "super" super-clone &&
+	(
+		cd super-clone &&
+		test_path_is_missing .git/objects/info/commit-graphs/commit-graph-chain &&
+		git -c fetch.writeCommitGraph=true fetch origin &&
+		test_path_is_file .git/objects/info/commit-graphs/commit-graph-chain
+	)
+'
+
 # configured prune tests
 
 set_config_tristate () {
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v3 2/2] commit-graph: fix writing first commit-graph during fetch
  2019-10-24 12:18   ` [PATCH v3 0/2] [v2.24.0-rc0 BUG] fetch.writeCommitGraph fails on first fetch Derrick Stolee via GitGitGadget
  2019-10-24 12:18     ` [PATCH v3 1/2] t5510-fetch.sh: demonstrate fetch.writeCommitGraph bug Derrick Stolee via GitGitGadget
@ 2019-10-24 12:18     ` Derrick Stolee via GitGitGadget
  2019-10-24 13:40     ` [PATCH v4 0/2] [v2.24.0-rc0 BUG] fetch.writeCommitGraph fails on first fetch Derrick Stolee via GitGitGadget
  2 siblings, 0 replies; 23+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-10-24 12:18 UTC (permalink / raw)
  To: git
  Cc: johannes.schindelin, peff, szeder.dev, Derrick Stolee,
	Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The previous commit includes a failing test for an issue around
fetch.writeCommitGraph and fetching in a repo with a submodule. Here, we
fix that bug and set the test to "test_expect_success".

The problem arises with this set of commands when the remote repo at
<url> has a submodule. Note that --recurse-submodules is not needed to
demonstrate the bug.

	$ git clone <url> test
	$ cd test
	$ git -c fetch.writeCommitGraph=true fetch origin
	Computing commit graph generation numbers: 100% (12/12), done.
	BUG: commit-graph.c:886: missing parent <hash1> for commit <hash2>
	Aborted (core dumped)

As an initial fix, I converted the code in builtin/fetch.c that calls
write_commit_graph_reachable() to instead launch a "git commit-graph
write --reachable --split" process. That code worked, but is not how we
want the feature to work long-term.

That test did demonstrate that the issue must be something to do with
internal state of the 'git fetch' process.

The write_commit_graph() method in commit-graph.c ensures the commits we
plan to write are "closed under reachability" using close_reachable().
This method walks from the input commits, and uses the UNINTERESTING
flag to mark which commits have already been visited. This allows the
walk to take O(N) time, where N is the number of commits, instead of
O(P) time, where P is the number of paths. (The number of paths can be
exponential in the number of commits.)

However, the UNINTERESTING flag is used in lots of places in the
codebase. This flag usually means some barrier to stop a commit walk,
such as in revision-walking to compare histories. It is not often
cleared after the walk completes because the starting points of those
walks do not have the UNINTERESTING flag, and clear_commit_marks() would
stop immediately.

This is happening during a 'git fetch' call with a remote. The fetch
negotiation is comparing the remote refs with the local refs and marking
some commits as UNINTERESTING.

I tested running clear_commit_marks_many() to clear the UNINTERESTING
flag inside close_reachable(), but the tips did not have the flag, so
that did nothing.

It turns out that the calculate_changed_submodule_paths() method is at
fault. Thanks, Peff, for pointing out this detail! More specifically,
for each submodule, the collect_changed_submodules() runs a revision
walk to essentially do file-history on the list of submodules. That
revision walk marks commits UNININTERESTING if they are simplified away
by not changing the submodule.

Instead, I finally arrived on the conclusion that I should use a flag
that is not used in any other part of the code. In commit-reach.c, a
number of flags were defined for commit walk algorithms. The REACHABLE
flag seemed like it made the most sense, and it seems it was not
actually used in the file. The REACHABLE flag was used in early versions
of commit-reach.c, but was removed by 4fbcca4 (commit-reach: make
can_all_from_reach... linear, 2018-07-20).

Add the REACHABLE flag to commit-graph.c and use it instead of
UNINTERESTING in close_reachable(). This fixes the bug in manual
testing.

Reported-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Helped-by: Jeff King <peff@peff.net>
Helped-by: Szeder Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c   | 11 +++++++----
 commit-reach.c   |  1 -
 object.h         |  3 ++-
 t/t5510-fetch.sh |  2 +-
 4 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index fc4a43b8d6..0aea7b2ae5 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -41,6 +41,9 @@
 #define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \
 			+ GRAPH_FANOUT_SIZE + the_hash_algo->rawsz)
 
+/* Remember to update object flag allocation in object.h */
+#define REACHABLE       (1u<<15)
+
 char *get_commit_graph_filename(const char *obj_dir)
 {
 	char *filename = xstrfmt("%s/info/commit-graph", obj_dir);
@@ -1030,11 +1033,11 @@ static void add_missing_parents(struct write_commit_graph_context *ctx, struct c
 {
 	struct commit_list *parent;
 	for (parent = commit->parents; parent; parent = parent->next) {
-		if (!(parent->item->object.flags & UNINTERESTING)) {
+		if (!(parent->item->object.flags & REACHABLE)) {
 			ALLOC_GROW(ctx->oids.list, ctx->oids.nr + 1, ctx->oids.alloc);
 			oidcpy(&ctx->oids.list[ctx->oids.nr], &(parent->item->object.oid));
 			ctx->oids.nr++;
-			parent->item->object.flags |= UNINTERESTING;
+			parent->item->object.flags |= REACHABLE;
 		}
 	}
 }
@@ -1052,7 +1055,7 @@ static void close_reachable(struct write_commit_graph_context *ctx)
 		display_progress(ctx->progress, i + 1);
 		commit = lookup_commit(ctx->r, &ctx->oids.list[i]);
 		if (commit)
-			commit->object.flags |= UNINTERESTING;
+			commit->object.flags |= REACHABLE;
 	}
 	stop_progress(&ctx->progress);
 
@@ -1089,7 +1092,7 @@ static void close_reachable(struct write_commit_graph_context *ctx)
 		commit = lookup_commit(ctx->r, &ctx->oids.list[i]);
 
 		if (commit)
-			commit->object.flags &= ~UNINTERESTING;
+			commit->object.flags &= ~REACHABLE;
 	}
 	stop_progress(&ctx->progress);
 }
diff --git a/commit-reach.c b/commit-reach.c
index 3ea174788a..4ca7e706a1 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -10,7 +10,6 @@
 #include "commit-reach.h"
 
 /* Remember to update object flag allocation in object.h */
-#define REACHABLE       (1u<<15)
 #define PARENT1		(1u<<16)
 #define PARENT2		(1u<<17)
 #define STALE		(1u<<18)
diff --git a/object.h b/object.h
index 0120892bbd..25f5ab3d54 100644
--- a/object.h
+++ b/object.h
@@ -68,7 +68,8 @@ struct object_array {
  * bisect.c:                                        16
  * bundle.c:                                        16
  * http-push.c:                                     16-----19
- * commit-reach.c:                                15-------19
+ * commit-graph.c:                                15
+ * commit-reach.c:                                  16-----19
  * sha1-name.c:                                              20
  * list-objects-filter.c:                                      21
  * builtin/fsck.c:           0--3
diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
index cc76e1ce50..e16b7b8db2 100755
--- a/t/t5510-fetch.sh
+++ b/t/t5510-fetch.sh
@@ -583,7 +583,7 @@ test_expect_success 'fetch.writeCommitGraph' '
 	)
 '
 
-test_expect_failure 'fetch.writeCommitGraph with submodules' '
+test_expect_success 'fetch.writeCommitGraph with submodules' '
 	git clone dups super &&
 	(
 		cd super &&
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 1/2] t5510-fetch.sh: demonstrate fetch.writeCommitGraph bug
  2019-10-23 13:01   ` [PATCH v2 1/2] t5510-fetch.sh: demonstrate fetch.writeCommitGraph bug Derrick Stolee via GitGitGadget
  2019-10-23 14:18     ` SZEDER Gábor
@ 2019-10-24 12:18     ` SZEDER Gábor
  1 sibling, 0 replies; 23+ messages in thread
From: SZEDER Gábor @ 2019-10-24 12:18 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, johannes.schindelin, peff, Derrick Stolee, Junio C Hamano

On Wed, Oct 23, 2019 at 01:01:34PM +0000, Derrick Stolee via GitGitGadget wrote:
> diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
> index ecabbe1616..e8ae3af0b6 100755
> --- a/t/t5510-fetch.sh
> +++ b/t/t5510-fetch.sh
> @@ -583,6 +583,23 @@ test_expect_success 'fetch.writeCommitGraph' '
>  	)
>  '
>  
> +test_expect_failure 'fetch.writeCommitGraph with submodules' '
> +	pwd="$(pwd)" &&
> +	git clone dups super &&
> +	(
> +		cd super &&
> +		git submodule add "file://$pwd/three" &&
> +		git commit -m "add submodule"
> +	) &&
> +	git clone "super" writeError &&
> +	(
> +		cd writeError &&
> +		test_path_is_missing .git/objects/info/commit-graphs/commit-graph-chain &&
> +		git -c fetch.writeCommitGraph=true fetch origin &&
> +		test_path_is_file .git/objects/info/commit-graphs/commit-graph-chain

With the fix applied this test fails when run with
GIT_TEST_COMMIT_GRAPH=1:

  + cd writeError
  + test_path_is_missing .git/objects/info/commit-graphs/commit-graph-chain
  + test -e .git/objects/info/commit-graphs/commit-graph-chain
  + git -c fetch.writeCommitGraph=true fetch origin
  + test_path_is_file .git/objects/info/commit-graphs/commit-graph-chain
  + test -f .git/objects/info/commit-graphs/commit-graph-chain
  + echo File .git/objects/info/commit-graphs/commit-graph-chain doesn't exist.
  File .git/objects/info/commit-graphs/commit-graph-chain doesn't exist.
  + false
  error: last command exited with $?=1

I think GIT_TEST_COMMIT_GRAPH should be unset for both
fetch.writeCommitGraph tests.

> +	)
> +'
> +
>  # configured prune tests
>  
>  set_config_tristate () {
> -- 
> gitgitgadget
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v4 0/2] [v2.24.0-rc0 BUG] fetch.writeCommitGraph fails on first fetch
  2019-10-24 12:18   ` [PATCH v3 0/2] [v2.24.0-rc0 BUG] fetch.writeCommitGraph fails on first fetch Derrick Stolee via GitGitGadget
  2019-10-24 12:18     ` [PATCH v3 1/2] t5510-fetch.sh: demonstrate fetch.writeCommitGraph bug Derrick Stolee via GitGitGadget
  2019-10-24 12:18     ` [PATCH v3 2/2] commit-graph: fix writing first commit-graph during fetch Derrick Stolee via GitGitGadget
@ 2019-10-24 13:40     ` Derrick Stolee via GitGitGadget
  2019-10-24 13:40       ` [PATCH v4 1/2] t5510-fetch.sh: demonstrate fetch.writeCommitGraph bug Derrick Stolee via GitGitGadget
  2019-10-24 13:40       ` [PATCH v4 2/2] commit-graph: fix writing first commit-graph during fetch Derrick Stolee via GitGitGadget
  2 siblings, 2 replies; 23+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-10-24 13:40 UTC (permalink / raw)
  To: git; +Cc: johannes.schindelin, peff, szeder.dev, Derrick Stolee, Junio C Hamano

UPDATE for V2: We now know the full repro, and a test is added. Thanks
Szeder and Peff for your insights!

UPDATE in V3: Cleaned up the commit messages and some test details.

UPDATE in V4: There is an unfortunate interaction with GIT_TEST_COMMIT_GRAPH
that requires an update to the test. Sorry for not noticing this earlier.
Thanks, Johannes for pointing it out!

While dogfooding, Johannes found a bug in the fetch.writeCommitGraph config
behavior. While his example initially happened during a clone with
--recurse-submodules, (UPDATE) and the submodule is important, but
--recurse-submodules is not:

$ git clone <url> test
$ cd test
$ git -c fetch.writeCommitGraph=true fetch origin
Computing commit graph generation numbers: 100% (12/12), done.
BUG: commit-graph.c:886: missing parent <hash1> for commit <hash2>
Aborted (core dumped)

In the repo I had cloned, there were really 60 commits to scan, but only 12
were in the list to write when calling compute_generation_numbers(). A
commit in the list expects to see a parent, but that parent is not in the
list. The simple example I used for my testing was 
https://github.com/derrickstolee/numbers. Thie repo HAS A SUBMODULE, I just
forgot. Sorry for derailing the investigation somewhat.

The details above are the start of the commit message for [PATCH 1/2],
including a test that fails when fetching after cloning a repo with a
submodule.

In [PATCH 2/2], I actually have the fix. I tried to include as much detail
as I could for how I investigated the problem and why I think this is the
right solution. I added details that have come from the on-list discussion,
including what the submodule code is doing and why REACHABLE is no longer
used in commit-reach.c.

Thanks, -Stolee

Derrick Stolee (2):
  t5510-fetch.sh: demonstrate fetch.writeCommitGraph bug
  commit-graph: fix writing first commit-graph during fetch

 commit-graph.c   | 11 +++++++----
 commit-reach.c   |  1 -
 object.h         |  3 ++-
 t/t5510-fetch.sh | 16 ++++++++++++++++
 4 files changed, 25 insertions(+), 6 deletions(-)


base-commit: d966095db01190a2196e31195ea6fa0c722aa732
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-415%2Fderrickstolee%2Ffetch-first-write-fail-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-415/derrickstolee/fetch-first-write-fail-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/415

Range-diff vs v3:

 1:  ce53b5a7bf ! 1:  8150108471 t5510-fetch.sh: demonstrate fetch.writeCommitGraph bug
     @@ -20,7 +20,10 @@
          parent, but that parent is not in the list.
      
          A follow-up will fix the bug, but first we create a test that
     -    demonstrates the problem.
     +    demonstrates the problem. This test must be careful about an existing
     +    commit-graph file, since GIT_TEST_COMMIT_GRAPH=1 will cause the repo we
     +    are cloning to already have one. This then prevents the incremtnal
     +    commit-graph write during the first 'git fetch'.
      
          Helped-by: Jeff King <peff@peff.net>
          Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
     @@ -44,7 +47,7 @@
      +	git clone "super" super-clone &&
      +	(
      +		cd super-clone &&
     -+		test_path_is_missing .git/objects/info/commit-graphs/commit-graph-chain &&
     ++		rm -rf .git/objects/info &&
      +		git -c fetch.writeCommitGraph=true fetch origin &&
      +		test_path_is_file .git/objects/info/commit-graphs/commit-graph-chain
      +	)
 2:  edacfff490 = 2:  6d01e90591 commit-graph: fix writing first commit-graph during fetch

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v4 1/2] t5510-fetch.sh: demonstrate fetch.writeCommitGraph bug
  2019-10-24 13:40     ` [PATCH v4 0/2] [v2.24.0-rc0 BUG] fetch.writeCommitGraph fails on first fetch Derrick Stolee via GitGitGadget
@ 2019-10-24 13:40       ` Derrick Stolee via GitGitGadget
  2019-10-24 13:40       ` [PATCH v4 2/2] commit-graph: fix writing first commit-graph during fetch Derrick Stolee via GitGitGadget
  1 sibling, 0 replies; 23+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-10-24 13:40 UTC (permalink / raw)
  To: git
  Cc: johannes.schindelin, peff, szeder.dev, Derrick Stolee,
	Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

While dogfooding, Johannes found a bug in the fetch.writeCommitGraph
config behavior. His example initially happened during a clone with
--recurse-submodules, we found that this happens with the first fetch
after cloning a repository that contains a submodule:

	$ git clone <url> test
	$ cd test
	$ git -c fetch.writeCommitGraph=true fetch origin
	Computing commit graph generation numbers: 100% (12/12), done.
	BUG: commit-graph.c:886: missing parent <hash1> for commit <hash2>
	Aborted (core dumped)

In the repo I had cloned, there were really 60 commits to scan, but
only 12 were in the list to write when calling
compute_generation_numbers(). A commit in the list expects to see a
parent, but that parent is not in the list.

A follow-up will fix the bug, but first we create a test that
demonstrates the problem. This test must be careful about an existing
commit-graph file, since GIT_TEST_COMMIT_GRAPH=1 will cause the repo we
are cloning to already have one. This then prevents the incremtnal
commit-graph write during the first 'git fetch'.

Helped-by: Jeff King <peff@peff.net>
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Helped-by: Szeder Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t5510-fetch.sh | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
index ecabbe1616..195781ce8d 100755
--- a/t/t5510-fetch.sh
+++ b/t/t5510-fetch.sh
@@ -583,6 +583,22 @@ test_expect_success 'fetch.writeCommitGraph' '
 	)
 '
 
+test_expect_failure 'fetch.writeCommitGraph with submodules' '
+	git clone dups super &&
+	(
+		cd super &&
+		git submodule add "file://$TRASH_DIRECTORY/three" &&
+		git commit -m "add submodule"
+	) &&
+	git clone "super" super-clone &&
+	(
+		cd super-clone &&
+		rm -rf .git/objects/info &&
+		git -c fetch.writeCommitGraph=true fetch origin &&
+		test_path_is_file .git/objects/info/commit-graphs/commit-graph-chain
+	)
+'
+
 # configured prune tests
 
 set_config_tristate () {
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v4 2/2] commit-graph: fix writing first commit-graph during fetch
  2019-10-24 13:40     ` [PATCH v4 0/2] [v2.24.0-rc0 BUG] fetch.writeCommitGraph fails on first fetch Derrick Stolee via GitGitGadget
  2019-10-24 13:40       ` [PATCH v4 1/2] t5510-fetch.sh: demonstrate fetch.writeCommitGraph bug Derrick Stolee via GitGitGadget
@ 2019-10-24 13:40       ` Derrick Stolee via GitGitGadget
  1 sibling, 0 replies; 23+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-10-24 13:40 UTC (permalink / raw)
  To: git
  Cc: johannes.schindelin, peff, szeder.dev, Derrick Stolee,
	Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The previous commit includes a failing test for an issue around
fetch.writeCommitGraph and fetching in a repo with a submodule. Here, we
fix that bug and set the test to "test_expect_success".

The problem arises with this set of commands when the remote repo at
<url> has a submodule. Note that --recurse-submodules is not needed to
demonstrate the bug.

	$ git clone <url> test
	$ cd test
	$ git -c fetch.writeCommitGraph=true fetch origin
	Computing commit graph generation numbers: 100% (12/12), done.
	BUG: commit-graph.c:886: missing parent <hash1> for commit <hash2>
	Aborted (core dumped)

As an initial fix, I converted the code in builtin/fetch.c that calls
write_commit_graph_reachable() to instead launch a "git commit-graph
write --reachable --split" process. That code worked, but is not how we
want the feature to work long-term.

That test did demonstrate that the issue must be something to do with
internal state of the 'git fetch' process.

The write_commit_graph() method in commit-graph.c ensures the commits we
plan to write are "closed under reachability" using close_reachable().
This method walks from the input commits, and uses the UNINTERESTING
flag to mark which commits have already been visited. This allows the
walk to take O(N) time, where N is the number of commits, instead of
O(P) time, where P is the number of paths. (The number of paths can be
exponential in the number of commits.)

However, the UNINTERESTING flag is used in lots of places in the
codebase. This flag usually means some barrier to stop a commit walk,
such as in revision-walking to compare histories. It is not often
cleared after the walk completes because the starting points of those
walks do not have the UNINTERESTING flag, and clear_commit_marks() would
stop immediately.

This is happening during a 'git fetch' call with a remote. The fetch
negotiation is comparing the remote refs with the local refs and marking
some commits as UNINTERESTING.

I tested running clear_commit_marks_many() to clear the UNINTERESTING
flag inside close_reachable(), but the tips did not have the flag, so
that did nothing.

It turns out that the calculate_changed_submodule_paths() method is at
fault. Thanks, Peff, for pointing out this detail! More specifically,
for each submodule, the collect_changed_submodules() runs a revision
walk to essentially do file-history on the list of submodules. That
revision walk marks commits UNININTERESTING if they are simplified away
by not changing the submodule.

Instead, I finally arrived on the conclusion that I should use a flag
that is not used in any other part of the code. In commit-reach.c, a
number of flags were defined for commit walk algorithms. The REACHABLE
flag seemed like it made the most sense, and it seems it was not
actually used in the file. The REACHABLE flag was used in early versions
of commit-reach.c, but was removed by 4fbcca4 (commit-reach: make
can_all_from_reach... linear, 2018-07-20).

Add the REACHABLE flag to commit-graph.c and use it instead of
UNINTERESTING in close_reachable(). This fixes the bug in manual
testing.

Reported-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Helped-by: Jeff King <peff@peff.net>
Helped-by: Szeder Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c   | 11 +++++++----
 commit-reach.c   |  1 -
 object.h         |  3 ++-
 t/t5510-fetch.sh |  2 +-
 4 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index fc4a43b8d6..0aea7b2ae5 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -41,6 +41,9 @@
 #define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \
 			+ GRAPH_FANOUT_SIZE + the_hash_algo->rawsz)
 
+/* Remember to update object flag allocation in object.h */
+#define REACHABLE       (1u<<15)
+
 char *get_commit_graph_filename(const char *obj_dir)
 {
 	char *filename = xstrfmt("%s/info/commit-graph", obj_dir);
@@ -1030,11 +1033,11 @@ static void add_missing_parents(struct write_commit_graph_context *ctx, struct c
 {
 	struct commit_list *parent;
 	for (parent = commit->parents; parent; parent = parent->next) {
-		if (!(parent->item->object.flags & UNINTERESTING)) {
+		if (!(parent->item->object.flags & REACHABLE)) {
 			ALLOC_GROW(ctx->oids.list, ctx->oids.nr + 1, ctx->oids.alloc);
 			oidcpy(&ctx->oids.list[ctx->oids.nr], &(parent->item->object.oid));
 			ctx->oids.nr++;
-			parent->item->object.flags |= UNINTERESTING;
+			parent->item->object.flags |= REACHABLE;
 		}
 	}
 }
@@ -1052,7 +1055,7 @@ static void close_reachable(struct write_commit_graph_context *ctx)
 		display_progress(ctx->progress, i + 1);
 		commit = lookup_commit(ctx->r, &ctx->oids.list[i]);
 		if (commit)
-			commit->object.flags |= UNINTERESTING;
+			commit->object.flags |= REACHABLE;
 	}
 	stop_progress(&ctx->progress);
 
@@ -1089,7 +1092,7 @@ static void close_reachable(struct write_commit_graph_context *ctx)
 		commit = lookup_commit(ctx->r, &ctx->oids.list[i]);
 
 		if (commit)
-			commit->object.flags &= ~UNINTERESTING;
+			commit->object.flags &= ~REACHABLE;
 	}
 	stop_progress(&ctx->progress);
 }
diff --git a/commit-reach.c b/commit-reach.c
index 3ea174788a..4ca7e706a1 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -10,7 +10,6 @@
 #include "commit-reach.h"
 
 /* Remember to update object flag allocation in object.h */
-#define REACHABLE       (1u<<15)
 #define PARENT1		(1u<<16)
 #define PARENT2		(1u<<17)
 #define STALE		(1u<<18)
diff --git a/object.h b/object.h
index 0120892bbd..25f5ab3d54 100644
--- a/object.h
+++ b/object.h
@@ -68,7 +68,8 @@ struct object_array {
  * bisect.c:                                        16
  * bundle.c:                                        16
  * http-push.c:                                     16-----19
- * commit-reach.c:                                15-------19
+ * commit-graph.c:                                15
+ * commit-reach.c:                                  16-----19
  * sha1-name.c:                                              20
  * list-objects-filter.c:                                      21
  * builtin/fsck.c:           0--3
diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
index 195781ce8d..4b60282689 100755
--- a/t/t5510-fetch.sh
+++ b/t/t5510-fetch.sh
@@ -583,7 +583,7 @@ test_expect_success 'fetch.writeCommitGraph' '
 	)
 '
 
-test_expect_failure 'fetch.writeCommitGraph with submodules' '
+test_expect_success 'fetch.writeCommitGraph with submodules' '
 	git clone dups super &&
 	(
 		cd super &&
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 2/2] commit-graph: fix writing first commit-graph during fetch
  2019-10-24 10:39       ` Derrick Stolee
@ 2019-10-30 14:31         ` SZEDER Gábor
  0 siblings, 0 replies; 23+ messages in thread
From: SZEDER Gábor @ 2019-10-30 14:31 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, johannes.schindelin, peff,
	Derrick Stolee, Junio C Hamano

On Thu, Oct 24, 2019 at 06:39:51AM -0400, Derrick Stolee wrote:
> >> Instead, I finally arrived on the conclusion that I should use a flag
> >> that is not used in any other part of the code. In commit-reach.c, a
> >> number of flags were defined for commit walk algorithms. The REACHABLE
> >> flag seemed like it made the most sense, and it seems it was not
> >> actually used in the file. The REACHABLE flag was used in early versions
> >> of commit-reach.c, but was removed by 4fbcca4 (commit-reach: make
> >> can_all_from_reach... linear, 2018-07-20).
> >>
> >> Add the REACHABLE flag to commit-graph.c and use it instead of
> >> UNINTERESTING in close_reachable(). This fixes the bug in manual
> >> testing.
> > 
> > I'm inclined to agree that using a flag that is not used anywhere else
> > is the safest thing to do, and at -rcX time safest is good.  I'm not
> > sure whether it's the right thing to do in the long term, though.
> > 
> > Furthermore, calling this flag REACHABLE is misleading, because the
> > code actually means SEEN.
> > Consider the following sequence of commands:
> > 
> >   # Create a pack with two commits
> >   $ git commit --allow-empty -m one &&
> >   $ git commit --allow-empty -m two &&
> >   $ git repack -ad &&
> >   # Make one of those commits unreachable
> >   $ git reset --hard HEAD^ &&
> >   # Not even from reflogs!
> >   $ git reflog expire --expire-unreachable=now --all
> >   # Now write a commit-graph from that pack file
> >   $ git commit-graph write
> >   Computing commit graph generation numbers: 100% (2/2), done.
> > 
> > It added two commits to the commit-graph, although one of them is
> > clearly not reachable anymore, so marking it as REACHABLE while
> > enumerating all commits feels wrong.
> 
> Since you are using "git commit-graph write", the command is scanning
> all pack-files for commits to include. Even in this case, the
> close_reachable() method needs to walk to see if any commits are missing.
> (It could be that the root commit is loose for some strange reason.)
> 
> In this case, we are marking REACHABLE the commits that can be reached
> from our "starting" commits. In your example we start with every commit.

That's exactly my point.  fsck already uses the REACHABLE flag to mark
objects that are reachable not only from all refs (including the HEADs
of all worktrees), but their reflogs and the index as well.

However, in close_unreachable() we start with a bunch of commits and
then go into a loop adding their parents to an array, while marking
each parent to prevent them from being added multiple times in a mergy
history.  We have a good couple of similar traversals in our code
base, and in revision.c, builtin/describe.c, walker.c,
negotiator/default.c (and at this point I stopped looking) we mark
those parents as SEEN.

On a somewhat related note: 'git commit-graph write --reachable'
doesn't include HEAD; should it?

> If you had used `git commit-graph write --stdin-packs` and provided a
> small pack name over stdin, the concept would be similar and even more
> pronounced: the pack (perhaps downloaded via 'fetch') is not likely to
> contain every commit, so we need to walk all reachable commits from
> those included.
> 
> I'll have a v3 today with the requested fixes.
> 
> Thanks,
> -Stolee

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2019-10-30 14:31 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-22 17:28 [PATCH 0/1] [v2.24.0-rc0 BUG] fetch.writeCommitGraph fails on first fetch Derrick Stolee via GitGitGadget
2019-10-22 17:28 ` [PATCH 1/1] commit-graph: fix writing first commit-graph during fetch Derrick Stolee via GitGitGadget
2019-10-22 20:33   ` Jeff King
2019-10-22 21:45     ` Jeff King
2019-10-22 23:35       ` SZEDER Gábor
2019-10-23  0:35         ` Derrick Stolee
2019-10-23  0:48           ` Jeff King
2019-10-23  1:22             ` Jeff King
2019-10-23 13:01 ` [PATCH v2 0/2] [v2.24.0-rc0 BUG] fetch.writeCommitGraph fails on first fetch Derrick Stolee via GitGitGadget
2019-10-23 13:01   ` [PATCH v2 1/2] t5510-fetch.sh: demonstrate fetch.writeCommitGraph bug Derrick Stolee via GitGitGadget
2019-10-23 14:18     ` SZEDER Gábor
2019-10-23 20:46       ` Derrick Stolee
2019-10-24 12:18     ` SZEDER Gábor
2019-10-23 13:01   ` [PATCH v2 2/2] commit-graph: fix writing first commit-graph during fetch Derrick Stolee via GitGitGadget
2019-10-23 15:04     ` SZEDER Gábor
2019-10-24 10:39       ` Derrick Stolee
2019-10-30 14:31         ` SZEDER Gábor
2019-10-24 12:18   ` [PATCH v3 0/2] [v2.24.0-rc0 BUG] fetch.writeCommitGraph fails on first fetch Derrick Stolee via GitGitGadget
2019-10-24 12:18     ` [PATCH v3 1/2] t5510-fetch.sh: demonstrate fetch.writeCommitGraph bug Derrick Stolee via GitGitGadget
2019-10-24 12:18     ` [PATCH v3 2/2] commit-graph: fix writing first commit-graph during fetch Derrick Stolee via GitGitGadget
2019-10-24 13:40     ` [PATCH v4 0/2] [v2.24.0-rc0 BUG] fetch.writeCommitGraph fails on first fetch Derrick Stolee via GitGitGadget
2019-10-24 13:40       ` [PATCH v4 1/2] t5510-fetch.sh: demonstrate fetch.writeCommitGraph bug Derrick Stolee via GitGitGadget
2019-10-24 13:40       ` [PATCH v4 2/2] commit-graph: fix writing first commit-graph during fetch Derrick Stolee via GitGitGadget

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).