git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Derrick Stolee <derrickstolee@github.com>
To: Jeff King <peff@peff.net>, git@vger.kernel.org
Subject: Re: commit-graph overflow generation chicken and egg
Date: Wed, 8 Jun 2022 16:08:03 -0400	[thread overview]
Message-ID: <581c7ef2-3de4-eb8a-bfbb-d4bca3522a2d@github.com> (raw)
In-Reply-To: <YqD5dgalb9EPnz85@coredump.intra.peff.net>

On 6/8/2022 3:33 PM, Jeff King wrote:
> I hadn't touched my git.git repository for a while, so I upgraded to the
> most recent version of Git (v2.36.1) and was met with this:
> 
>   $ git rev-list --all
>   fatal: commit-graph requires overflow generation data but has none
> 
> Not very friendly, but OK, maybe my commit graph is out of date. Let's
> regenerate it:
> 
>   $ git gc
>   fatal: commit-graph requires overflow generation data but has none
>   fatal: failed to run repack
> 
> OK, we can't get far enough in the gc to rebuild the commit graph. Let's
> try doing it manually:
> 
>   $ git commit-graph write --reachable
>   fatal: commit-graph requires overflow generation data but has none
>   $ git commit-graph write
>   fatal: commit-graph requires overflow generation data but has none
> 
> Yikes. Here's where it happens within the write process:
> 
>   $ GIT_PROGRESS_DELAY=0 git commit-graph write
>   Finding commits for commit graph among packed objects: 100% (360229/360229), done.
>   Loading known commits in commit graph: 100% (78366/78366), done.
>   Expanding reachable commits in commit graph: 78366, done.
>   Clearing commit marks in commit graph: 100% (78366/78366), done.
>   Finding extra edges in commit graph: 100% (78366/78366), done.
>   Computing commit graph topological levels: 100% (78366/78366), done.
>   Computing commit graph generation numbers: 100% (78366/78366), done.
>   fatal: commit-graph requires overflow generation data but has none
> 
> Now being the enterprising fellow that I am, I was able to get out of it
> like this:
> 
>   $ rm -f objects/info/commit-graph
>   $ git gc
> 
> But I wonder if this is a foot-gun waiting for some other user. I'm not
> sure how I got into the broken state exactly. The repo was last touched
> in December using a version of Git running 'next'. It worked fine with
> versions of Git prior to 6dbf4b8172 (commit-graph: declare bankruptcy on
> GDAT chunks, 2022-03-02). It's entirely possible that the bad state was
> generated by a version of Git that wasn't ever released, and this isn't
> a problem that normal humans would ever run into. It does feel a bit
> unfriendly that neither gc nor commit-graph could unstick things,
> though. Especially because 6dbf4b8172 says:
> 
>   [...]a previous version of Git wrote possibly erroneous data in these
>   chunks with the IDs "GDAT" and "GDOV". By changing the IDs, newer
>   versions of Git will silently ignore those older chunks[...]

You've done enough homework to discover exactly what's going on here,
including talking about the commit that I was going to point out.

> Presumably we _are_ ignoring those chunks, but some other part of the
> commit-graph file has a dependency on them (and of course we don't have
> the new GDA2/GDO2 chunks to read in their place). If that's true, then
> the solution may be a more graceful "we can't use this commit graph"
> error return rather than the "fatal:" message seen above.
> 
> I have a copy of the broken repo state if anybody would care to look at
> it.

I'd love to see the full binary, but for the sake of sharing on the
list, could you give the following output?

	xxd .git/objects/info/commit-graph | head

or any other command that shows the first few hex bytes along with
their ASCII equivalents. Here is one that used Git 2.34.0:

00000000: 4347 5048 0101 0500 4f49 4446 0000 0000  CGPH....OIDF....
00000010: 0000 0050 4f49 444c 0000 0000 0000 0450  ...POIDL.......P
00000020: 4344 4154 0000 0000 002d cf0c 4744 4154  CDAT.....-..GDAT
00000030: 0000 0000 0080 3bf8 4544 4745 0000 0000  ......;.EDGE....
00000040: 0089 6484 0000 0000 0000 0000 0089 6660  ..d...........f`
00000050: 0000 0246 0000 0498 0000 06c8 0000 0908  ...F............
00000060: 0000 0b4b 0000 0d58 0000 0f9e 0000 1207  ...K...X........
00000070: 0000 1423 0000 168e 0000 18d3 0000 1b36  ...#...........6
00000080: 0000 1d67 0000 1f9b 0000 2219 0000 2456  ...g......"...$V
00000090: 0000 26a2 0000 2908 0000 2b5c 0000 2d96  ..&...)...+\..-.

and here is one with latest master:

00000000: 4347 5048 0101 0500 4f49 4446 0000 0000  CGPH....OIDF....
00000010: 0000 0050 4f49 444c 0000 0000 0000 0450  ...POIDL.......P
00000020: 4344 4154 0000 0000 002e 20b0 4744 4132  CDAT...... .GDA2
00000030: 0000 0000 0081 2090 4544 4745 0000 0000  ...... .EDGE....
00000040: 008a 5970 0000 0000 0000 0000 008a 5b4c  ..Yp..........[L
00000050: 0000 024c 0000 04a1 0000 06d6 0000 091a  ...L............
00000060: 0000 0b60 0000 0d70 0000 0fbb 0000 1229  ...`...p.......)
00000070: 0000 1448 0000 16b6 0000 1905 0000 1b6c  ...H...........l
00000080: 0000 1da3 0000 1fd9 0000 225b 0000 249d  .........."[..$.
00000090: 0000 26f0 0000 2956 0000 2bac 0000 2dea  ..&...)V..+...-.

If you have a GDA2 chunk, then we are somehow incorrectly writing the
offset there. However, I don't see this happening in the latest code.

We have this macro in commit.h:

#define GENERATION_NUMBER_V2_OFFSET_MAX ((1ULL << 31) - 1)

and this in commit-graph.c:

#define CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW (1ULL << 31)

being used here:

static int write_graph_chunk_generation_data(struct hashfile *f,
					     void *data)
{
	struct write_commit_graph_context *ctx = data;
	int i, num_generation_data_overflows = 0;

	for (i = 0; i < ctx->commits.nr; i++) {
		struct commit *c = ctx->commits.list[i];
		timestamp_t offset;
		repo_parse_commit(ctx->r, c);
		offset = commit_graph_data_at(c)->generation - c->date;
		display_progress(ctx->progress, ++ctx->progress_cnt);

		if (offset > GENERATION_NUMBER_V2_OFFSET_MAX) {
			offset = CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW | num_generation_data_overflows;
			num_generation_data_overflows++;
		}

		hashwrite_be32(f, offset);
	}

	return 0;
}

But this hasn't been changed since eb9071912f (commit-graph: anonymize
data in chunk_write_fn, 2021-02-05) and 90cb1c47c7 (commit-graph: always
parse before commit_graph_data_at(), 2021-02-01) which were not meaningfully
different from the first implementation in e8b63005c4 (commit-graph:
implement generation data chunk, 2021-01-16).

However, the lack of the large offset chunk could be due to the bug fixed by
75979d9460 (commit-graph: fix ordering bug in generation numbers,
2022-03-01). Perhaps that was the thing that was missing from your version?

But otherwise, I'm stumped. I'd be very interested to see a repro from a
fresh repository. That is: what situation do we need to be in to write such
an offset without including the large offset chunk?

Thanks,
-Stolee

  reply	other threads:[~2022-06-08 20:08 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-08 19:33 commit-graph overflow generation chicken and egg Jeff King
2022-06-08 20:08 ` Derrick Stolee [this message]
2022-06-08 23:17   ` Jeff King
2022-07-01 12:06     ` Patrick Steinhardt
2022-07-04 10:46       ` Patrick Steinhardt
2022-07-04 20:50         ` Derrick Stolee
2022-07-05 21:03           ` Will Chandler
2022-07-05 22:28             ` Taylor Blau
2022-07-06  8:52               ` Jeff King
2022-07-06  9:11           ` Jeff King
2022-06-09  7:49   ` Ævar Arnfjörð Bjarmason
2022-06-09 15:26     ` Jeff King
2022-06-09 15:39       ` Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=581c7ef2-3de4-eb8a-bfbb-d4bca3522a2d@github.com \
    --to=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).