From: Derrick Stolee <derrickstolee@github.com>
To: Jeff King <peff@peff.net>, git@vger.kernel.org
Subject: Re: commit-graph overflow generation chicken and egg
Date: Wed, 8 Jun 2022 16:08:03 -0400 [thread overview]
Message-ID: <581c7ef2-3de4-eb8a-bfbb-d4bca3522a2d@github.com> (raw)
In-Reply-To: <YqD5dgalb9EPnz85@coredump.intra.peff.net>
On 6/8/2022 3:33 PM, Jeff King wrote:
> I hadn't touched my git.git repository for a while, so I upgraded to the
> most recent version of Git (v2.36.1) and was met with this:
>
> $ git rev-list --all
> fatal: commit-graph requires overflow generation data but has none
>
> Not very friendly, but OK, maybe my commit graph is out of date. Let's
> regenerate it:
>
> $ git gc
> fatal: commit-graph requires overflow generation data but has none
> fatal: failed to run repack
>
> OK, we can't get far enough in the gc to rebuild the commit graph. Let's
> try doing it manually:
>
> $ git commit-graph write --reachable
> fatal: commit-graph requires overflow generation data but has none
> $ git commit-graph write
> fatal: commit-graph requires overflow generation data but has none
>
> Yikes. Here's where it happens within the write process:
>
> $ GIT_PROGRESS_DELAY=0 git commit-graph write
> Finding commits for commit graph among packed objects: 100% (360229/360229), done.
> Loading known commits in commit graph: 100% (78366/78366), done.
> Expanding reachable commits in commit graph: 78366, done.
> Clearing commit marks in commit graph: 100% (78366/78366), done.
> Finding extra edges in commit graph: 100% (78366/78366), done.
> Computing commit graph topological levels: 100% (78366/78366), done.
> Computing commit graph generation numbers: 100% (78366/78366), done.
> fatal: commit-graph requires overflow generation data but has none
>
> Now being the enterprising fellow that I am, I was able to get out of it
> like this:
>
> $ rm -f objects/info/commit-graph
> $ git gc
>
> But I wonder if this is a foot-gun waiting for some other user. I'm not
> sure how I got into the broken state exactly. The repo was last touched
> in December using a version of Git running 'next'. It worked fine with
> versions of Git prior to 6dbf4b8172 (commit-graph: declare bankruptcy on
> GDAT chunks, 2022-03-02). It's entirely possible that the bad state was
> generated by a version of Git that wasn't ever released, and this isn't
> a problem that normal humans would ever run into. It does feel a bit
> unfriendly that neither gc nor commit-graph could unstick things,
> though. Especially because 6dbf4b8172 says:
>
> [...]a previous version of Git wrote possibly erroneous data in these
> chunks with the IDs "GDAT" and "GDOV". By changing the IDs, newer
> versions of Git will silently ignore those older chunks[...]
You've done enough homework to discover exactly what's going on here,
including talking about the commit that I was going to point out.
> Presumably we _are_ ignoring those chunks, but some other part of the
> commit-graph file has a dependency on them (and of course we don't have
> the new GDA2/GDO2 chunks to read in their place). If that's true, then
> the solution may be a more graceful "we can't use this commit graph"
> error return rather than the "fatal:" message seen above.
>
> I have a copy of the broken repo state if anybody would care to look at
> it.
I'd love to see the full binary, but for the sake of sharing on the
list, could you give the following output?
xxd .git/objects/info/commit-graph | head
or any other command that shows the first few hex bytes along with
their ASCII equivalents. Here is one that used Git 2.34.0:
00000000: 4347 5048 0101 0500 4f49 4446 0000 0000 CGPH....OIDF....
00000010: 0000 0050 4f49 444c 0000 0000 0000 0450 ...POIDL.......P
00000020: 4344 4154 0000 0000 002d cf0c 4744 4154 CDAT.....-..GDAT
00000030: 0000 0000 0080 3bf8 4544 4745 0000 0000 ......;.EDGE....
00000040: 0089 6484 0000 0000 0000 0000 0089 6660 ..d...........f`
00000050: 0000 0246 0000 0498 0000 06c8 0000 0908 ...F............
00000060: 0000 0b4b 0000 0d58 0000 0f9e 0000 1207 ...K...X........
00000070: 0000 1423 0000 168e 0000 18d3 0000 1b36 ...#...........6
00000080: 0000 1d67 0000 1f9b 0000 2219 0000 2456 ...g......"...$V
00000090: 0000 26a2 0000 2908 0000 2b5c 0000 2d96 ..&...)...+\..-.
and here is one with latest master:
00000000: 4347 5048 0101 0500 4f49 4446 0000 0000 CGPH....OIDF....
00000010: 0000 0050 4f49 444c 0000 0000 0000 0450 ...POIDL.......P
00000020: 4344 4154 0000 0000 002e 20b0 4744 4132 CDAT...... .GDA2
00000030: 0000 0000 0081 2090 4544 4745 0000 0000 ...... .EDGE....
00000040: 008a 5970 0000 0000 0000 0000 008a 5b4c ..Yp..........[L
00000050: 0000 024c 0000 04a1 0000 06d6 0000 091a ...L............
00000060: 0000 0b60 0000 0d70 0000 0fbb 0000 1229 ...`...p.......)
00000070: 0000 1448 0000 16b6 0000 1905 0000 1b6c ...H...........l
00000080: 0000 1da3 0000 1fd9 0000 225b 0000 249d .........."[..$.
00000090: 0000 26f0 0000 2956 0000 2bac 0000 2dea ..&...)V..+...-.
If you have a GDA2 chunk, then we are somehow incorrectly writing the
offset there. However, I don't see this happening in the latest code.
We have this macro in commit.h:
#define GENERATION_NUMBER_V2_OFFSET_MAX ((1ULL << 31) - 1)
and this in commit-graph.c:
#define CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW (1ULL << 31)
being used here:
static int write_graph_chunk_generation_data(struct hashfile *f,
void *data)
{
struct write_commit_graph_context *ctx = data;
int i, num_generation_data_overflows = 0;
for (i = 0; i < ctx->commits.nr; i++) {
struct commit *c = ctx->commits.list[i];
timestamp_t offset;
repo_parse_commit(ctx->r, c);
offset = commit_graph_data_at(c)->generation - c->date;
display_progress(ctx->progress, ++ctx->progress_cnt);
if (offset > GENERATION_NUMBER_V2_OFFSET_MAX) {
offset = CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW | num_generation_data_overflows;
num_generation_data_overflows++;
}
hashwrite_be32(f, offset);
}
return 0;
}
But this hasn't been changed since eb9071912f (commit-graph: anonymize
data in chunk_write_fn, 2021-02-05) and 90cb1c47c7 (commit-graph: always
parse before commit_graph_data_at(), 2021-02-01) which were not meaningfully
different from the first implementation in e8b63005c4 (commit-graph:
implement generation data chunk, 2021-01-16).
However, the lack of the large offset chunk could be due to the bug fixed by
75979d9460 (commit-graph: fix ordering bug in generation numbers,
2022-03-01). Perhaps that was the thing that was missing from your version?
But otherwise, I'm stumped. I'd be very interested to see a repro from a
fresh repository. That is: what situation do we need to be in to write such
an offset without including the large offset chunk?
Thanks,
-Stolee
next prev parent reply other threads:[~2022-06-08 20:08 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-06-08 19:33 commit-graph overflow generation chicken and egg Jeff King
2022-06-08 20:08 ` Derrick Stolee [this message]
2022-06-08 23:17 ` Jeff King
2022-07-01 12:06 ` Patrick Steinhardt
2022-07-04 10:46 ` Patrick Steinhardt
2022-07-04 20:50 ` Derrick Stolee
2022-07-05 21:03 ` Will Chandler
2022-07-05 22:28 ` Taylor Blau
2022-07-06 8:52 ` Jeff King
2022-07-06 9:11 ` Jeff King
2022-06-09 7:49 ` Ævar Arnfjörð Bjarmason
2022-06-09 15:26 ` Jeff King
2022-06-09 15:39 ` Derrick Stolee
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=581c7ef2-3de4-eb8a-bfbb-d4bca3522a2d@github.com \
--to=derrickstolee@github.com \
--cc=git@vger.kernel.org \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).