All of lore.kernel.org
 help / color / mirror / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: Derrick Stolee <derrickstolee@github.com>
Cc: Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org, me@ttaylorr.com, gitster@pobox.com,
	abhishekkumar8222@gmail.com
Subject: Re: [PATCH 3/7] commit-graph: start parsing generation v2 (again)
Date: Thu, 3 Mar 2022 12:19:32 +0100	[thread overview]
Message-ID: <YiCkRLFxn8Pok7Kc@ncase> (raw)
In-Reply-To: <db5eb248-1b54-9f24-8f8a-28c19a1eee6a@github.com>

[-- Attachment #1: Type: text/plain, Size: 11805 bytes --]

On Wed, Mar 02, 2022 at 09:57:17AM -0500, Derrick Stolee wrote:
> On 3/2/2022 8:57 AM, Patrick Steinhardt wrote:
> > On Tue, Mar 01, 2022 at 10:25:46AM -0500, Derrick Stolee wrote:
> >> On 3/1/2022 9:53 AM, Patrick Steinhardt wrote:
> 
> >>> Hum. I have re-verified, and this indeed seems to play out. So I must've
> >>> accidentally ran all my testing with the state generated without the
> >>> final patch which fixes the corruption. I do see lots of the following
> >>> warnings, but overall I can verify and write the commit-graph just fine:
> >>>
> >>>     commit-graph generation for commit c80a42de8803e2d77818d0c82f88e748d7f9425f is 1623362063 < 1623362139
> >>
> >> But I'm not able to generate these warnings from either version. I
> >> tried generating different levels of a split commit-graph, but
> >> could not reproduce it. If you have reproduction steps using current
> >> 'master' (or any released Git version) and the four patches here,
> >> then I would love to get a full understanding of your errors.
> >>
> >> Thanks,
> >> -Stolee
> > 
> > I haven't yet been able to reproduce it with publicly available data,
> > but with the internal references I'm able to evoke the warnings
> > reliably. It only works when I have two repositories connected via
> > alternates, when generating the commit-graph in the linked-to repo
> > first, and then generating the commit-graph in the linking repo.
> > 
> > The following recipe allows me to reproduce, but rely on private data:
> > 
> >     $ git --version
> >     git version 2.35.1
> > 
> >     # The pool repository is the one we're linked to from the fork.
> >     $ cd "$pool"
> >     $ rm -rf objects/info/commit-graph objects/info/commit-graph
> >     $ git commit-graph write --split
> > 
> >     $ cd "$fork"
> >     $ rm -rf objects/info/commit-graph objects/info/commit-graph
> >     $ git commit-graph write --split
> > 
> >     $ git commit-graph verify --no-progress
> >     $ echo $?
> >     0
> > 
> >     # This is 715d08a9e51251ad8290b181b6ac3b9e1f9719d7 with your full v2
> >     # applied on top.
> >     $ ~/Development/git/bin-wrappers/git --version
> >     git version 2.35.1.358.g7ede1bea24
> > 
> >     $ ~/Development/git/bin-wrappers/git commit-graph verify --no-progress
> >     commit-graph generation for commit 06a91bac00ed11128becd48d5ae77eacd8f24c97 is 1623273624 < 1623273710
> >     commit-graph generation for commit 0ae91029f27238e8f8e109c6bb3907f864dda14f is 1622151146 < 1622151220
> >     commit-graph generation for commit 0d4582a33d8c8e3eb01adbf564f5e1deeb3b56a2 is 1631045222 < 1631045225
> >     commit-graph generation for commit 0daf8976439d7e0bb9710c5ee63b570580e0dc03 is 1620347739 < 1620347789
> >     commit-graph generation for commit 0e0ee8ffb3fa22cee7d28e21cbd6df26454932cf is 1623783297 < 1623783380
> >     commit-graph generation for commit 0f08ab3de6ec115ea8a956a1996cb9759e640e74 is 1621543278 < 1621543339
> >     commit-graph generation for commit 133ed0319b5a66ae0c2be76e5a887b880452b111 is 1620949864 < 1620949915
> >     commit-graph generation for commit 1341b3e6c63343ae94a8a473fa057126ddd4669a is 1637344364 < 1637344384
> >     commit-graph generation for commit 15bdfc501c2c9f23e9353bf6e6a5facd9c32a07a is 1623348103 < 1623348133
> >     ...
> >     $ echo $?
> >     1
> > 
> > When generating commit-graphs with your patches applied the `verify`
> > step works alright.
> > 
> > I've also by accident stumbled over the original error again:
> > 
> >     fatal: commit-graph requires overflow generation data but has none
> > 
> > This time it's definitely not caused by generating commit-graphs with an
> > in-between state of your patch series because the data comes straight
> > from production with no changes to the commit-graphs performed by
> > myself. There we're running Git v2.33.1 with a couple of backported
> > patches (see [1]). While those patches cause us to make more use of the
> > commit-graph, none modify the way we generate them.
> > 
> > Of note is that the commit-graph contains references to commits which
> > don't exist in the ODB anymore.
> > 
> > Patrick
> > 
> > [1]: https://gitlab.com/gitlab-org/gitlab-git/-/commits/pks-v2.33.1.gl3
> 
> Thank you for your diligence here, Patrick. I really appreciate the
> work you're putting in to verify the situation.
> 
> Since our repro relies on private information, but is consistent, I
> wonder if we should take the patch below, which starts to ignore the
> older generation number v2 data and only writes freshly-computed
> numbers.
> 
> Thanks,
> -Stolee

Thanks. With your patch below the `fatal:` error is gone, but I'm still
seeing the same errors with regards to the commit-graph generations.

So to summarize my findings:

    - This bug occurs when writing commit-graphs with v2.35.1, but
      reading them with your patches.

    - This bug occurs when I have two repositories connected via an
      alternates file. I haven't yet been able to reproduce it in a
      single repository that is not connected to a separate ODB.

    - This bug only occurs when I first generate the commit-graph in the
      repository I'm borrowing objects from.

    - This bug only occurs when I write commit-graphs with `--split` in
      both repositories. "Normal" commit-graphs don't have this issue,
      and neither can I see it with `--split=replace` or mixed-type
      commit-graphs.

Beware, the following explanation is based on my very basic
understanding of the commit-graph code and thus more likely to be wrong
than right:

With the old Git version, we've been mis-parsing the generation because
`read_generation_data` wasn't ever set. As a result it can happen that
the second split commit-graph we're generating computes its own
generation numbers from the wrong starting point because it uses the
mis-parsed generation numbers from the parent commit-graph.

With your patches, we start to correctly account for overflows and would
thus end up with a different value for the generation depending on where
we parse the commit from: if we parse it from the first commit-graph it
would be correct because it's contains the "root" of the generation
numbers. But if we parse a commit from the second commit-graph we may
have a mismatch because the generation numbers in there may have been
derived from generation numbers mis-parsed from the first commit-graph.
And because these would be wrong in case there was an overflow it is
clear that the new corrected generation number may be wrong, as well.

Patrick

> --- 8< ---
> 
> From c53d8bd52bbcab3862e8a826ee75692edc7e4173 Mon Sep 17 00:00:00 2001
> From: Derrick Stolee <derrickstolee@github.com>
> Date: Wed, 2 Mar 2022 09:45:13 -0500
> Subject: [PATCH v3 5/4] commit-graph: declare bankruptcy on GDAT chunks
> 
> The Generation Data (GDAT) and Generation Data Overflow (GDOV) chunks
> store corrected commit date offsets, used for generation number v2.
> Recent changes have demonstrated that previous versions of Git were
> incorrectly parsing data from these chunks, but might have also been
> writing them incorrectly.
> 
> I asserted [1] that the previous fixes were sufficient because the known
> reasons for incorrectly writing generation number v2 data relied on
> parsing the information incorrectly out of a commit-graph file, but the
> previous versions of Git were not reading the generation number v2 data.
> 
> However, Patrick demonstrated [2] a case where in split commit-graphs
> across an alternate boundary (and possibly some other special
> conditions) it was possible to have a commit-graph that was generated by
> a previous version of Git have incorrect generation number v2 data which
> results in errors like the following:
> 
>   commit-graph generation for commit <oid> is 1623273624 < 1623273710
> 
> [1] https://lore.kernel.org/git/f50e74f0-9ffa-f4f2-4663-269801495ed3@github.com/
> [2] https://lore.kernel.org/git/Yh93vOkt2DkrGPh2@ncase/
> 
> Clearly, there is something else going on. The situation is not
> completely understood, but the errors do not reproduce if the
> commit-graphs are all generated by a Git version including these recent
> fixes.
> 
> If we cannot trust the existing data in the GDAT and GDOV chunks, then
> we can alter the format to change the chunk IDs for these chunks. This
> causes the new version of Git to silently ignore the older chunks (and
> disabling generation number v2 in the process) while writing new
> commit-graph files with correct data in the GDA2 and GDO2 chunks.
> 
> Update commit-graph-format.txt including a historical note about these
> deprecated chunks.
> 
> Reported-by: Patrick Steinhardt <ps@pks.im>
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>  Documentation/technical/commit-graph-format.txt | 12 ++++++++++--
>  commit-graph.c                                  |  4 ++--
>  2 files changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt
> index 87971c27dd7..484b185ba98 100644
> --- a/Documentation/technical/commit-graph-format.txt
> +++ b/Documentation/technical/commit-graph-format.txt
> @@ -93,7 +93,7 @@ CHUNK DATA:
>        2 bits of the lowest byte, storing the 33rd and 34th bit of the
>        commit time.
>  
> -  Generation Data (ID: {'G', 'D', 'A', 'T' }) (N * 4 bytes) [Optional]
> +  Generation Data (ID: {'G', 'D', 'A', '2' }) (N * 4 bytes) [Optional]
>      * This list of 4-byte values store corrected commit date offsets for the
>        commits, arranged in the same order as commit data chunk.
>      * If the corrected commit date offset cannot be stored within 31 bits,
> @@ -104,7 +104,7 @@ CHUNK DATA:
>        by compatible versions of Git and in case of split commit-graph chains,
>        the topmost layer also has Generation Data chunk.
>  
> -  Generation Data Overflow (ID: {'G', 'D', 'O', 'V' }) [Optional]
> +  Generation Data Overflow (ID: {'G', 'D', 'O', '2' }) [Optional]
>      * This list of 8-byte values stores the corrected commit date offsets
>        for commits with corrected commit date offsets that cannot be
>        stored within 31 bits.
> @@ -156,3 +156,11 @@ CHUNK DATA:
>  TRAILER:
>  
>  	H-byte HASH-checksum of all of the above.
> +
> +== Historical Notes:
> +
> +The Generation Data (GDA2) and Generation Data Overflow (GDO2) chunks have
> +the number '2' in their chunk IDs because a previous version of Git wrote
> +possibly erroneous data in these chunks with the IDs "GDAT" and "GDOV". By
> +changing the IDs, newer versions of Git will silently ignore those older
> +chunks and write the new information without trusting the incorrect data.
> diff --git a/commit-graph.c b/commit-graph.c
> index b86a6a634fe..fb2ced0bd6d 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -39,8 +39,8 @@ void git_test_write_commit_graph_or_die(void)
>  #define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
>  #define GRAPH_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
>  #define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */
> -#define GRAPH_CHUNKID_GENERATION_DATA 0x47444154 /* "GDAT" */
> -#define GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW 0x47444f56 /* "GDOV" */
> +#define GRAPH_CHUNKID_GENERATION_DATA 0x47444132 /* "GDA2" */
> +#define GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW 0x47444f32 /* "GDO2" */
>  #define GRAPH_CHUNKID_EXTRAEDGES 0x45444745 /* "EDGE" */
>  #define GRAPH_CHUNKID_BLOOMINDEXES 0x42494458 /* "BIDX" */
>  #define GRAPH_CHUNKID_BLOOMDATA 0x42444154 /* "BDAT" */
> -- 
> 2.35.1.138.gfc5de29e9e6
> 
> 
> 
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  parent reply	other threads:[~2022-03-03 11:19 UTC|newest]

Thread overview: 70+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-24 20:38 [PATCH 0/7] Commit-graph: Generation Number v2 Fixes, v3 implementation Derrick Stolee via GitGitGadget
2022-02-24 20:38 ` [PATCH 1/7] test-read-graph: include extra post-parse info Derrick Stolee via GitGitGadget
2022-02-24 20:38 ` [PATCH 2/7] commit-graph: fix ordering bug in generation numbers Derrick Stolee via GitGitGadget
2022-02-24 22:15   ` Junio C Hamano
2022-02-25 13:51     ` Derrick Stolee
2022-02-25 17:35       ` Junio C Hamano
2022-02-24 20:38 ` [PATCH 3/7] commit-graph: start parsing generation v2 (again) Derrick Stolee via GitGitGadget
2022-02-28 15:18   ` Patrick Steinhardt
2022-02-28 16:23     ` Derrick Stolee
2022-02-28 16:59       ` Patrick Steinhardt
2022-02-28 18:44         ` Derrick Stolee
2022-03-01  9:46           ` Patrick Steinhardt
2022-03-01 10:35             ` Patrick Steinhardt
2022-03-01 14:06               ` Derrick Stolee
2022-03-01 14:53                 ` Patrick Steinhardt
2022-03-01 15:25                   ` Derrick Stolee
2022-03-02 13:57                     ` Patrick Steinhardt
2022-03-02 14:57                       ` Derrick Stolee
2022-03-02 18:15                         ` Junio C Hamano
2022-03-02 18:46                           ` Derrick Stolee
2022-03-02 22:42                             ` Junio C Hamano
2022-03-03 11:19                         ` Patrick Steinhardt [this message]
2022-03-03 16:00                           ` Derrick Stolee
2022-03-04 14:03                             ` Derrick Stolee
2022-03-07 10:34                               ` Patrick Steinhardt
2022-03-07 13:45                                 ` Derrick Stolee
2022-03-07 17:22                                   ` Junio C Hamano
2022-03-10 13:58                                   ` Patrick Steinhardt
2022-03-10 17:18                                     ` Derrick Stolee
2022-02-24 20:38 ` [PATCH 4/7] commit-graph: fix generation number v2 overflow values Derrick Stolee via GitGitGadget
2022-02-24 22:35   ` Junio C Hamano
2022-02-25 13:53     ` Derrick Stolee
2022-02-25 17:38       ` Junio C Hamano
2022-02-24 20:38 ` [PATCH 5/7] commit-graph: document file format v2 Derrick Stolee via GitGitGadget
2022-02-24 22:55   ` Junio C Hamano
2022-02-25 22:31   ` Ævar Arnfjörð Bjarmason
2022-02-28 13:44     ` Derrick Stolee
2022-02-28 14:27       ` Ævar Arnfjörð Bjarmason
2022-02-28 16:39         ` Derrick Stolee
2022-02-28 21:14           ` Ævar Arnfjörð Bjarmason
2022-03-01 14:19             ` Derrick Stolee
2022-03-01 14:29               ` Ævar Arnfjörð Bjarmason
2022-03-01 15:59                 ` Derrick Stolee
2022-02-24 20:38 ` [PATCH 6/7] commit-graph: parse " Derrick Stolee via GitGitGadget
2022-02-24 23:01   ` Junio C Hamano
2022-02-25 13:54     ` Derrick Stolee
2022-02-24 20:38 ` [PATCH 7/7] commit-graph: write " Derrick Stolee via GitGitGadget
2022-02-24 21:42 ` [PATCH 0/7] Commit-graph: Generation Number v2 Fixes, v3 implementation Junio C Hamano
2022-02-24 23:06   ` Junio C Hamano
2022-02-25 13:55     ` Derrick Stolee
2022-02-28 13:53 ` [PATCH v2 0/4] Commit-graph: Generation Number v2 Fixes Derrick Stolee via GitGitGadget
2022-02-28 13:53   ` [PATCH v2 1/4] test-read-graph: include extra post-parse info Derrick Stolee via GitGitGadget
2022-02-28 15:22     ` Ævar Arnfjörð Bjarmason
2022-02-28 13:53   ` [PATCH v2 2/4] commit-graph: fix ordering bug in generation numbers Derrick Stolee via GitGitGadget
2022-02-28 15:25     ` Ævar Arnfjörð Bjarmason
2022-02-28 13:53   ` [PATCH v2 3/4] commit-graph: start parsing generation v2 (again) Derrick Stolee via GitGitGadget
2022-02-28 15:30     ` Ævar Arnfjörð Bjarmason
2022-02-28 16:43       ` Derrick Stolee
2022-02-28 13:53   ` [PATCH v2 4/4] commit-graph: fix generation number v2 overflow values Derrick Stolee via GitGitGadget
2022-02-28 15:40     ` Ævar Arnfjörð Bjarmason
2022-03-01 17:23   ` [PATCH v2 0/4] Commit-graph: Generation Number v2 Fixes Ævar Arnfjörð Bjarmason
2022-03-01 19:48   ` [PATCH v3 0/5] " Derrick Stolee via GitGitGadget
2022-03-01 19:48     ` [PATCH v3 1/5] test-read-graph: include extra post-parse info Derrick Stolee via GitGitGadget
2022-03-01 19:48     ` [PATCH v3 2/5] t5318: extract helpers to lib-commit-graph.sh Derrick Stolee via GitGitGadget
2022-03-01 19:48     ` [PATCH v3 3/5] commit-graph: fix ordering bug in generation numbers Derrick Stolee via GitGitGadget
2022-03-01 20:13       ` Junio C Hamano
2022-03-01 20:30         ` Junio C Hamano
2022-03-02 14:13           ` Derrick Stolee
2022-03-01 19:48     ` [PATCH v3 4/5] commit-graph: start parsing generation v2 (again) Derrick Stolee via GitGitGadget
2022-03-01 19:48     ` [PATCH v3 5/5] commit-graph: fix generation number v2 overflow values Derrick Stolee via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YiCkRLFxn8Pok7Kc@ncase \
    --to=ps@pks.im \
    --cc=abhishekkumar8222@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.