git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Derrick Stolee <derrickstolee@github.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org, me@ttaylorr.com, gitster@pobox.com,
	abhishekkumar8222@gmail.com
Subject: Re: [PATCH 3/7] commit-graph: start parsing generation v2 (again)
Date: Fri, 4 Mar 2022 09:03:15 -0500	[thread overview]
Message-ID: <06ea3190-32d0-c792-0ae9-c5600305f158@github.com> (raw)
In-Reply-To: <33deae83-1afd-1645-82f3-5af14f14094d@github.com>

On 3/3/2022 11:00 AM, Derrick Stolee wrote:
> On 3/3/2022 6:19 AM, Patrick Steinhardt wrote:
>> On Wed, Mar 02, 2022 at 09:57:17AM -0500, Derrick Stolee wrote:
>>> On 3/2/2022 8:57 AM, Patrick Steinhardt wrote:
>>>> On Tue, Mar 01, 2022 at 10:25:46AM -0500, Derrick Stolee wrote:
>>>>> On 3/1/2022 9:53 AM, Patrick Steinhardt wrote:
>>>
>>>>>> Hum. I have re-verified, and this indeed seems to play out. So I must've
>>>>>> accidentally ran all my testing with the state generated without the
>>>>>> final patch which fixes the corruption. I do see lots of the following
>>>>>> warnings, but overall I can verify and write the commit-graph just fine:
>>>>>>
>>>>>>     commit-graph generation for commit c80a42de8803e2d77818d0c82f88e748d7f9425f is 1623362063 < 1623362139
>>>>>
>>>>> But I'm not able to generate these warnings from either version. I
>>>>> tried generating different levels of a split commit-graph, but
>>>>> could not reproduce it. If you have reproduction steps using current
>>>>> 'master' (or any released Git version) and the four patches here,
>>>>> then I would love to get a full understanding of your errors.
>>>>>
>>>>> Thanks,
>>>>> -Stolee
>>>>
>>>> I haven't yet been able to reproduce it with publicly available data,
>>>> but with the internal references I'm able to evoke the warnings
>>>> reliably. It only works when I have two repositories connected via
>>>> alternates, when generating the commit-graph in the linked-to repo
>>>> first, and then generating the commit-graph in the linking repo.
>>>>
>>>> The following recipe allows me to reproduce, but rely on private data:
>>>>
>>>>     $ git --version
>>>>     git version 2.35.1
>>>>
>>>>     # The pool repository is the one we're linked to from the fork.
>>>>     $ cd "$pool"
>>>>     $ rm -rf objects/info/commit-graph objects/info/commit-graph
>>>>     $ git commit-graph write --split
>>>>
>>>>     $ cd "$fork"
>>>>     $ rm -rf objects/info/commit-graph objects/info/commit-graph
>>>>     $ git commit-graph write --split
>>>>
>>>>     $ git commit-graph verify --no-progress
>>>>     $ echo $?
>>>>     0
>>>>
>>>>     # This is 715d08a9e51251ad8290b181b6ac3b9e1f9719d7 with your full v2
>>>>     # applied on top.
>>>>     $ ~/Development/git/bin-wrappers/git --version
>>>>     git version 2.35.1.358.g7ede1bea24
>>>>
>>>>     $ ~/Development/git/bin-wrappers/git commit-graph verify --no-progress
>>>>     commit-graph generation for commit 06a91bac00ed11128becd48d5ae77eacd8f24c97 is 1623273624 < 1623273710
>>>>     commit-graph generation for commit 0ae91029f27238e8f8e109c6bb3907f864dda14f is 1622151146 < 1622151220
>>>>     commit-graph generation for commit 0d4582a33d8c8e3eb01adbf564f5e1deeb3b56a2 is 1631045222 < 1631045225
>>>>     commit-graph generation for commit 0daf8976439d7e0bb9710c5ee63b570580e0dc03 is 1620347739 < 1620347789
>>>>     commit-graph generation for commit 0e0ee8ffb3fa22cee7d28e21cbd6df26454932cf is 1623783297 < 1623783380
>>>>     commit-graph generation for commit 0f08ab3de6ec115ea8a956a1996cb9759e640e74 is 1621543278 < 1621543339
>>>>     commit-graph generation for commit 133ed0319b5a66ae0c2be76e5a887b880452b111 is 1620949864 < 1620949915
>>>>     commit-graph generation for commit 1341b3e6c63343ae94a8a473fa057126ddd4669a is 1637344364 < 1637344384
>>>>     commit-graph generation for commit 15bdfc501c2c9f23e9353bf6e6a5facd9c32a07a is 1623348103 < 1623348133
>>>>     ...
>>>>     $ echo $?
>>>>     1
>>>>
>>>> When generating commit-graphs with your patches applied the `verify`
>>>> step works alright.
>>>>
>>>> I've also by accident stumbled over the original error again:
>>>>
>>>>     fatal: commit-graph requires overflow generation data but has none
>>>>
>>>> This time it's definitely not caused by generating commit-graphs with an
>>>> in-between state of your patch series because the data comes straight
>>>> from production with no changes to the commit-graphs performed by
>>>> myself. There we're running Git v2.33.1 with a couple of backported
>>>> patches (see [1]). While those patches cause us to make more use of the
>>>> commit-graph, none modify the way we generate them.
>>>>
>>>> Of note is that the commit-graph contains references to commits which
>>>> don't exist in the ODB anymore.
>>>>
>>>> Patrick
>>>>
>>>> [1]: https://gitlab.com/gitlab-org/gitlab-git/-/commits/pks-v2.33.1.gl3
>>>
>>> Thank you for your diligence here, Patrick. I really appreciate the
>>> work you're putting in to verify the situation.
>>>
>>> Since our repro relies on private information, but is consistent, I
>>> wonder if we should take the patch below, which starts to ignore the
>>> older generation number v2 data and only writes freshly-computed
>>> numbers.
>>>
>>> Thanks,
>>> -Stolee
>>
>> Thanks. With your patch below the `fatal:` error is gone, but I'm still
>> seeing the same errors with regards to the commit-graph generations.
> 
> This is disappointing and unexpected. Thanks for verifying.
> 
>> So to summarize my findings:
>>
>>     - This bug occurs when writing commit-graphs with v2.35.1, but
>>       reading them with your patches.
>>
>>     - This bug occurs when I have two repositories connected via an
>>       alternates file. I haven't yet been able to reproduce it in a
>>       single repository that is not connected to a separate ODB.
> 
> This is an interesting distinction. One that I didn't think would
> matter, but I'll look into the code to see how that could affect
> things.
> 
>>     - This bug only occurs when I first generate the commit-graph in the
>>       repository I'm borrowing objects from.
>>
>>     - This bug only occurs when I write commit-graphs with `--split` in
>>       both repositories. "Normal" commit-graphs don't have this issue,
>>       and neither can I see it with `--split=replace` or mixed-type
>>       commit-graphs.
>>
>> Beware, the following explanation is based on my very basic
>> understanding of the commit-graph code and thus more likely to be wrong
>> than right:
>>
>> With the old Git version, we've been mis-parsing the generation because
>> `read_generation_data` wasn't ever set. As a result it can happen that
>> the second split commit-graph we're generating computes its own
>> generation numbers from the wrong starting point because it uses the
>> mis-parsed generation numbers from the parent commit-graph.
>>
>> With your patches, we start to correctly account for overflows and would
>> thus end up with a different value for the generation depending on where
>> we parse the commit from: if we parse it from the first commit-graph it
>> would be correct because it's contains the "root" of the generation
>> numbers. But if we parse a commit from the second commit-graph we may
>> have a mismatch because the generation numbers in there may have been
>> derived from generation numbers mis-parsed from the first commit-graph.
>> And because these would be wrong in case there was an overflow it is
>> clear that the new corrected generation number may be wrong, as well.
> 
> Hm. My expectation was that the older layers of the split commit-graph
> would have read_generation_data disabled (because the new Git version
> cannot read the GDAT chunk) and then the validate_mixed_generation_chain()
> method would remove read_generation_data from all of the graphs in the
> list.
> 
> Combining this with your thoughts on cross-alternate split commit-graphs,
> this makes me think we should try this:
> 
> --- >8 ---
> 
> diff --git a/commit-graph.c b/commit-graph.c
> index fb2ced0bd6..74c6534f56 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -609,8 +609,6 @@ struct commit_graph *read_commit_graph_one(struct repository *r,
>  	if (!g)
>  		g = load_commit_graph_chain(r, odb);
>  
> -	validate_mixed_generation_chain(g);
> -
>  	return g;
>  }
>  
> @@ -668,7 +666,13 @@ static int prepare_commit_graph(struct repository *r)
>  	     !r->objects->commit_graph && odb;
>  	     odb = odb->next)
>  		prepare_commit_graph_one(r, odb);
> -	return !!r->objects->commit_graph;
> +
> +	if (r->objects->commit_graph) {
> +		validate_mixed_generation_chain(r->objects->commit_graph);
> +		return 1;
> +	}
> +
> +	return 0;
>  }
>  
>  int generation_numbers_enabled(struct repository *r)
> 
> 
> --- >8 ---
> 
> Notice that I'm moving the validate_mixed_generation_chain() call
> out of read_commit_graph_one() and into prepare_commit_graph(). To
> my understanding, this _should_ have an equivalent end state as the
> old code, but might be worth trying just as a quick check.
> 
> I will continue investigating and try to reproduce with this
> additional constraint of working across an alternate.

My attempts to reproduce this across an alternate have failed. I
tried running the following test against Git without these patches,
then verify with the newer version of Git. (I also have generated
a few new layers on top with these patches, and they correctly drop
the GDA2 and GDO2 chunks when the lower layers "don't have gen v2".)


test_description='commit-graph with offsets across alternates'
. ./test-lib.sh

if ! test_have_prereq TIME_IS_64BIT || ! test_have_prereq TIME_T_IS_64BIT
then
	skip_all='skipping 64-bit timestamp tests'
	test_done
fi


UNIX_EPOCH_ZERO="@0 +0000"
FUTURE_DATE="@4147483646 +0000"

GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=0

test_expect_success 'generate alternate split commit-graph' '
	git init alternate &&
	(
		cd alternate &&
		test_commit --date "$UNIX_EPOCH_ZERO" 1 &&
		test_commit --date "$FUTURE_DATE" 2 &&
		git commit-graph write --reachable &&
		test_commit --date "$UNIX_EPOCH_ZERO" 3 &&
		test_commit --date "$FUTURE_DATE" 4 &&
		git commit-graph write --reachable --split=no-merge
	) &&
	git clone --shared alternate fork &&
	(
		cd fork &&
		test_commit --date "$UNIX_EPOCH_ZERO" 5 &&
		test_commit --date "$FUTURE_DATE" 6 &&
		git commit-graph write --reachable --split=no-merge &&
		test_commit --date "$UNIX_EPOCH_ZERO" 7 &&
		test_commit --date "$FUTURE_DATE" 8 &&
		git commit-graph write --reachable --split=no-merge
	)
'

test_done


My testing after running this with -d allows me to reliably see these
layers being created with GDAT and GDOV chunks. Running the 'git
commit-graph verify' command with the new code does not show those
errors, even after adding commits and another layer to the split
commit-graph.

I look forward to any additional insights you might have here.

Thanks,
-Stolee

  reply	other threads:[~2022-03-04 14:03 UTC|newest]

Thread overview: 70+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-24 20:38 [PATCH 0/7] Commit-graph: Generation Number v2 Fixes, v3 implementation Derrick Stolee via GitGitGadget
2022-02-24 20:38 ` [PATCH 1/7] test-read-graph: include extra post-parse info Derrick Stolee via GitGitGadget
2022-02-24 20:38 ` [PATCH 2/7] commit-graph: fix ordering bug in generation numbers Derrick Stolee via GitGitGadget
2022-02-24 22:15   ` Junio C Hamano
2022-02-25 13:51     ` Derrick Stolee
2022-02-25 17:35       ` Junio C Hamano
2022-02-24 20:38 ` [PATCH 3/7] commit-graph: start parsing generation v2 (again) Derrick Stolee via GitGitGadget
2022-02-28 15:18   ` Patrick Steinhardt
2022-02-28 16:23     ` Derrick Stolee
2022-02-28 16:59       ` Patrick Steinhardt
2022-02-28 18:44         ` Derrick Stolee
2022-03-01  9:46           ` Patrick Steinhardt
2022-03-01 10:35             ` Patrick Steinhardt
2022-03-01 14:06               ` Derrick Stolee
2022-03-01 14:53                 ` Patrick Steinhardt
2022-03-01 15:25                   ` Derrick Stolee
2022-03-02 13:57                     ` Patrick Steinhardt
2022-03-02 14:57                       ` Derrick Stolee
2022-03-02 18:15                         ` Junio C Hamano
2022-03-02 18:46                           ` Derrick Stolee
2022-03-02 22:42                             ` Junio C Hamano
2022-03-03 11:19                         ` Patrick Steinhardt
2022-03-03 16:00                           ` Derrick Stolee
2022-03-04 14:03                             ` Derrick Stolee [this message]
2022-03-07 10:34                               ` Patrick Steinhardt
2022-03-07 13:45                                 ` Derrick Stolee
2022-03-07 17:22                                   ` Junio C Hamano
2022-03-10 13:58                                   ` Patrick Steinhardt
2022-03-10 17:18                                     ` Derrick Stolee
2022-02-24 20:38 ` [PATCH 4/7] commit-graph: fix generation number v2 overflow values Derrick Stolee via GitGitGadget
2022-02-24 22:35   ` Junio C Hamano
2022-02-25 13:53     ` Derrick Stolee
2022-02-25 17:38       ` Junio C Hamano
2022-02-24 20:38 ` [PATCH 5/7] commit-graph: document file format v2 Derrick Stolee via GitGitGadget
2022-02-24 22:55   ` Junio C Hamano
2022-02-25 22:31   ` Ævar Arnfjörð Bjarmason
2022-02-28 13:44     ` Derrick Stolee
2022-02-28 14:27       ` Ævar Arnfjörð Bjarmason
2022-02-28 16:39         ` Derrick Stolee
2022-02-28 21:14           ` Ævar Arnfjörð Bjarmason
2022-03-01 14:19             ` Derrick Stolee
2022-03-01 14:29               ` Ævar Arnfjörð Bjarmason
2022-03-01 15:59                 ` Derrick Stolee
2022-02-24 20:38 ` [PATCH 6/7] commit-graph: parse " Derrick Stolee via GitGitGadget
2022-02-24 23:01   ` Junio C Hamano
2022-02-25 13:54     ` Derrick Stolee
2022-02-24 20:38 ` [PATCH 7/7] commit-graph: write " Derrick Stolee via GitGitGadget
2022-02-24 21:42 ` [PATCH 0/7] Commit-graph: Generation Number v2 Fixes, v3 implementation Junio C Hamano
2022-02-24 23:06   ` Junio C Hamano
2022-02-25 13:55     ` Derrick Stolee
2022-02-28 13:53 ` [PATCH v2 0/4] Commit-graph: Generation Number v2 Fixes Derrick Stolee via GitGitGadget
2022-02-28 13:53   ` [PATCH v2 1/4] test-read-graph: include extra post-parse info Derrick Stolee via GitGitGadget
2022-02-28 15:22     ` Ævar Arnfjörð Bjarmason
2022-02-28 13:53   ` [PATCH v2 2/4] commit-graph: fix ordering bug in generation numbers Derrick Stolee via GitGitGadget
2022-02-28 15:25     ` Ævar Arnfjörð Bjarmason
2022-02-28 13:53   ` [PATCH v2 3/4] commit-graph: start parsing generation v2 (again) Derrick Stolee via GitGitGadget
2022-02-28 15:30     ` Ævar Arnfjörð Bjarmason
2022-02-28 16:43       ` Derrick Stolee
2022-02-28 13:53   ` [PATCH v2 4/4] commit-graph: fix generation number v2 overflow values Derrick Stolee via GitGitGadget
2022-02-28 15:40     ` Ævar Arnfjörð Bjarmason
2022-03-01 17:23   ` [PATCH v2 0/4] Commit-graph: Generation Number v2 Fixes Ævar Arnfjörð Bjarmason
2022-03-01 19:48   ` [PATCH v3 0/5] " Derrick Stolee via GitGitGadget
2022-03-01 19:48     ` [PATCH v3 1/5] test-read-graph: include extra post-parse info Derrick Stolee via GitGitGadget
2022-03-01 19:48     ` [PATCH v3 2/5] t5318: extract helpers to lib-commit-graph.sh Derrick Stolee via GitGitGadget
2022-03-01 19:48     ` [PATCH v3 3/5] commit-graph: fix ordering bug in generation numbers Derrick Stolee via GitGitGadget
2022-03-01 20:13       ` Junio C Hamano
2022-03-01 20:30         ` Junio C Hamano
2022-03-02 14:13           ` Derrick Stolee
2022-03-01 19:48     ` [PATCH v3 4/5] commit-graph: start parsing generation v2 (again) Derrick Stolee via GitGitGadget
2022-03-01 19:48     ` [PATCH v3 5/5] commit-graph: fix generation number v2 overflow values Derrick Stolee via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=06ea3190-32d0-c792-0ae9-c5600305f158@github.com \
    --to=derrickstolee@github.com \
    --cc=abhishekkumar8222@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).