All of lore.kernel.org
 help / color / mirror / Atom feed
From: Derrick Stolee <derrickstolee@github.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org, me@ttaylorr.com, gitster@pobox.com,
	abhishekkumar8222@gmail.com
Subject: Re: [PATCH 3/7] commit-graph: start parsing generation v2 (again)
Date: Fri, 4 Mar 2022 09:03:15 -0500	[thread overview]
Message-ID: <06ea3190-32d0-c792-0ae9-c5600305f158@github.com> (raw)
In-Reply-To: <33deae83-1afd-1645-82f3-5af14f14094d@github.com>

On 3/3/2022 11:00 AM, Derrick Stolee wrote:
> On 3/3/2022 6:19 AM, Patrick Steinhardt wrote:
>> On Wed, Mar 02, 2022 at 09:57:17AM -0500, Derrick Stolee wrote:
>>> On 3/2/2022 8:57 AM, Patrick Steinhardt wrote:
>>>> On Tue, Mar 01, 2022 at 10:25:46AM -0500, Derrick Stolee wrote:
>>>>> On 3/1/2022 9:53 AM, Patrick Steinhardt wrote:
>>>
>>>>>> Hum. I have re-verified, and this indeed seems to play out. So I must've
>>>>>> accidentally ran all my testing with the state generated without the
>>>>>> final patch which fixes the corruption. I do see lots of the following
>>>>>> warnings, but overall I can verify and write the commit-graph just fine:
>>>>>>
>>>>>>     commit-graph generation for commit c80a42de8803e2d77818d0c82f88e748d7f9425f is 1623362063 < 1623362139
>>>>>
>>>>> But I'm not able to generate these warnings from either version. I
>>>>> tried generating different levels of a split commit-graph, but
>>>>> could not reproduce it. If you have reproduction steps using current
>>>>> 'master' (or any released Git version) and the four patches here,
>>>>> then I would love to get a full understanding of your errors.
>>>>>
>>>>> Thanks,
>>>>> -Stolee
>>>>
>>>> I haven't yet been able to reproduce it with publicly available data,
>>>> but with the internal references I'm able to evoke the warnings
>>>> reliably. It only works when I have two repositories connected via
>>>> alternates, when generating the commit-graph in the linked-to repo
>>>> first, and then generating the commit-graph in the linking repo.
>>>>
>>>> The following recipe allows me to reproduce, but rely on private data:
>>>>
>>>>     $ git --version
>>>>     git version 2.35.1
>>>>
>>>>     # The pool repository is the one we're linked to from the fork.
>>>>     $ cd "$pool"
>>>>     $ rm -rf objects/info/commit-graph objects/info/commit-graph
>>>>     $ git commit-graph write --split
>>>>
>>>>     $ cd "$fork"
>>>>     $ rm -rf objects/info/commit-graph objects/info/commit-graph
>>>>     $ git commit-graph write --split
>>>>
>>>>     $ git commit-graph verify --no-progress
>>>>     $ echo $?
>>>>     0
>>>>
>>>>     # This is 715d08a9e51251ad8290b181b6ac3b9e1f9719d7 with your full v2
>>>>     # applied on top.
>>>>     $ ~/Development/git/bin-wrappers/git --version
>>>>     git version 2.35.1.358.g7ede1bea24
>>>>
>>>>     $ ~/Development/git/bin-wrappers/git commit-graph verify --no-progress
>>>>     commit-graph generation for commit 06a91bac00ed11128becd48d5ae77eacd8f24c97 is 1623273624 < 1623273710
>>>>     commit-graph generation for commit 0ae91029f27238e8f8e109c6bb3907f864dda14f is 1622151146 < 1622151220
>>>>     commit-graph generation for commit 0d4582a33d8c8e3eb01adbf564f5e1deeb3b56a2 is 1631045222 < 1631045225
>>>>     commit-graph generation for commit 0daf8976439d7e0bb9710c5ee63b570580e0dc03 is 1620347739 < 1620347789
>>>>     commit-graph generation for commit 0e0ee8ffb3fa22cee7d28e21cbd6df26454932cf is 1623783297 < 1623783380
>>>>     commit-graph generation for commit 0f08ab3de6ec115ea8a956a1996cb9759e640e74 is 1621543278 < 1621543339
>>>>     commit-graph generation for commit 133ed0319b5a66ae0c2be76e5a887b880452b111 is 1620949864 < 1620949915
>>>>     commit-graph generation for commit 1341b3e6c63343ae94a8a473fa057126ddd4669a is 1637344364 < 1637344384
>>>>     commit-graph generation for commit 15bdfc501c2c9f23e9353bf6e6a5facd9c32a07a is 1623348103 < 1623348133
>>>>     ...
>>>>     $ echo $?
>>>>     1
>>>>
>>>> When generating commit-graphs with your patches applied the `verify`
>>>> step works alright.
>>>>
>>>> I've also by accident stumbled over the original error again:
>>>>
>>>>     fatal: commit-graph requires overflow generation data but has none
>>>>
>>>> This time it's definitely not caused by generating commit-graphs with an
>>>> in-between state of your patch series because the data comes straight
>>>> from production with no changes to the commit-graphs performed by
>>>> myself. There we're running Git v2.33.1 with a couple of backported
>>>> patches (see [1]). While those patches cause us to make more use of the
>>>> commit-graph, none modify the way we generate them.
>>>>
>>>> Of note is that the commit-graph contains references to commits which
>>>> don't exist in the ODB anymore.
>>>>
>>>> Patrick
>>>>
>>>> [1]: https://gitlab.com/gitlab-org/gitlab-git/-/commits/pks-v2.33.1.gl3
>>>
>>> Thank you for your diligence here, Patrick. I really appreciate the
>>> work you're putting in to verify the situation.
>>>
>>> Since our repro relies on private information, but is consistent, I
>>> wonder if we should take the patch below, which starts to ignore the
>>> older generation number v2 data and only writes freshly-computed
>>> numbers.
>>>
>>> Thanks,
>>> -Stolee
>>
>> Thanks. With your patch below the `fatal:` error is gone, but I'm still
>> seeing the same errors with regards to the commit-graph generations.
> 
> This is disappointing and unexpected. Thanks for verifying.
> 
>> So to summarize my findings:
>>
>>     - This bug occurs when writing commit-graphs with v2.35.1, but
>>       reading them with your patches.
>>
>>     - This bug occurs when I have two repositories connected via an
>>       alternates file. I haven't yet been able to reproduce it in a
>>       single repository that is not connected to a separate ODB.
> 
> This is an interesting distinction. One that I didn't think would
> matter, but I'll look into the code to see how that could affect
> things.
> 
>>     - This bug only occurs when I first generate the commit-graph in the
>>       repository I'm borrowing objects from.
>>
>>     - This bug only occurs when I write commit-graphs with `--split` in
>>       both repositories. "Normal" commit-graphs don't have this issue,
>>       and neither can I see it with `--split=replace` or mixed-type
>>       commit-graphs.
>>
>> Beware, the following explanation is based on my very basic
>> understanding of the commit-graph code and thus more likely to be wrong
>> than right:
>>
>> With the old Git version, we've been mis-parsing the generation because
>> `read_generation_data` wasn't ever set. As a result it can happen that
>> the second split commit-graph we're generating computes its own
>> generation numbers from the wrong starting point because it uses the
>> mis-parsed generation numbers from the parent commit-graph.
>>
>> With your patches, we start to correctly account for overflows and would
>> thus end up with a different value for the generation depending on where
>> we parse the commit from: if we parse it from the first commit-graph it
>> would be correct because it's contains the "root" of the generation
>> numbers. But if we parse a commit from the second commit-graph we may
>> have a mismatch because the generation numbers in there may have been
>> derived from generation numbers mis-parsed from the first commit-graph.
>> And because these would be wrong in case there was an overflow it is
>> clear that the new corrected generation number may be wrong, as well.
> 
> Hm. My expectation was that the older layers of the split commit-graph
> would have read_generation_data disabled (because the new Git version
> cannot read the GDAT chunk) and then the validate_mixed_generation_chain()
> method would remove read_generation_data from all of the graphs in the
> list.
> 
> Combining this with your thoughts on cross-alternate split commit-graphs,
> this makes me think we should try this:
> 
> --- >8 ---
> 
> diff --git a/commit-graph.c b/commit-graph.c
> index fb2ced0bd6..74c6534f56 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -609,8 +609,6 @@ struct commit_graph *read_commit_graph_one(struct repository *r,
>  	if (!g)
>  		g = load_commit_graph_chain(r, odb);
>  
> -	validate_mixed_generation_chain(g);
> -
>  	return g;
>  }
>  
> @@ -668,7 +666,13 @@ static int prepare_commit_graph(struct repository *r)
>  	     !r->objects->commit_graph && odb;
>  	     odb = odb->next)
>  		prepare_commit_graph_one(r, odb);
> -	return !!r->objects->commit_graph;
> +
> +	if (r->objects->commit_graph) {
> +		validate_mixed_generation_chain(r->objects->commit_graph);
> +		return 1;
> +	}
> +
> +	return 0;
>  }
>  
>  int generation_numbers_enabled(struct repository *r)
> 
> 
> --- >8 ---
> 
> Notice that I'm moving the validate_mixed_generation_chain() call
> out of read_commit_graph_one() and into prepare_commit_graph(). To
> my understanding, this _should_ have an equivalent end state as the
> old code, but might be worth trying just as a quick check.
> 
> I will continue investigating and try to reproduce with this
> additional constraint of working across an alternate.

My attempts to reproduce this across an alternate have failed. I
tried running the following test against Git without these patches,
then verify with the newer version of Git. (I also have generated
a few new layers on top with these patches, and they correctly drop
the GDA2 and GDO2 chunks when the lower layers "don't have gen v2".)


test_description='commit-graph with offsets across alternates'
. ./test-lib.sh

if ! test_have_prereq TIME_IS_64BIT || ! test_have_prereq TIME_T_IS_64BIT
then
	skip_all='skipping 64-bit timestamp tests'
	test_done
fi


UNIX_EPOCH_ZERO="@0 +0000"
FUTURE_DATE="@4147483646 +0000"

GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=0

test_expect_success 'generate alternate split commit-graph' '
	git init alternate &&
	(
		cd alternate &&
		test_commit --date "$UNIX_EPOCH_ZERO" 1 &&
		test_commit --date "$FUTURE_DATE" 2 &&
		git commit-graph write --reachable &&
		test_commit --date "$UNIX_EPOCH_ZERO" 3 &&
		test_commit --date "$FUTURE_DATE" 4 &&
		git commit-graph write --reachable --split=no-merge
	) &&
	git clone --shared alternate fork &&
	(
		cd fork &&
		test_commit --date "$UNIX_EPOCH_ZERO" 5 &&
		test_commit --date "$FUTURE_DATE" 6 &&
		git commit-graph write --reachable --split=no-merge &&
		test_commit --date "$UNIX_EPOCH_ZERO" 7 &&
		test_commit --date "$FUTURE_DATE" 8 &&
		git commit-graph write --reachable --split=no-merge
	)
'

test_done


My testing after running this with -d allows me to reliably see these
layers being created with GDAT and GDOV chunks. Running the 'git
commit-graph verify' command with the new code does not show those
errors, even after adding commits and another layer to the split
commit-graph.

I look forward to any additional insights you might have here.

Thanks,
-Stolee

  reply	other threads:[~2022-03-04 14:03 UTC|newest]

Thread overview: 70+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-24 20:38 [PATCH 0/7] Commit-graph: Generation Number v2 Fixes, v3 implementation Derrick Stolee via GitGitGadget
2022-02-24 20:38 ` [PATCH 1/7] test-read-graph: include extra post-parse info Derrick Stolee via GitGitGadget
2022-02-24 20:38 ` [PATCH 2/7] commit-graph: fix ordering bug in generation numbers Derrick Stolee via GitGitGadget
2022-02-24 22:15   ` Junio C Hamano
2022-02-25 13:51     ` Derrick Stolee
2022-02-25 17:35       ` Junio C Hamano
2022-02-24 20:38 ` [PATCH 3/7] commit-graph: start parsing generation v2 (again) Derrick Stolee via GitGitGadget
2022-02-28 15:18   ` Patrick Steinhardt
2022-02-28 16:23     ` Derrick Stolee
2022-02-28 16:59       ` Patrick Steinhardt
2022-02-28 18:44         ` Derrick Stolee
2022-03-01  9:46           ` Patrick Steinhardt
2022-03-01 10:35             ` Patrick Steinhardt
2022-03-01 14:06               ` Derrick Stolee
2022-03-01 14:53                 ` Patrick Steinhardt
2022-03-01 15:25                   ` Derrick Stolee
2022-03-02 13:57                     ` Patrick Steinhardt
2022-03-02 14:57                       ` Derrick Stolee
2022-03-02 18:15                         ` Junio C Hamano
2022-03-02 18:46                           ` Derrick Stolee
2022-03-02 22:42                             ` Junio C Hamano
2022-03-03 11:19                         ` Patrick Steinhardt
2022-03-03 16:00                           ` Derrick Stolee
2022-03-04 14:03                             ` Derrick Stolee [this message]
2022-03-07 10:34                               ` Patrick Steinhardt
2022-03-07 13:45                                 ` Derrick Stolee
2022-03-07 17:22                                   ` Junio C Hamano
2022-03-10 13:58                                   ` Patrick Steinhardt
2022-03-10 17:18                                     ` Derrick Stolee
2022-02-24 20:38 ` [PATCH 4/7] commit-graph: fix generation number v2 overflow values Derrick Stolee via GitGitGadget
2022-02-24 22:35   ` Junio C Hamano
2022-02-25 13:53     ` Derrick Stolee
2022-02-25 17:38       ` Junio C Hamano
2022-02-24 20:38 ` [PATCH 5/7] commit-graph: document file format v2 Derrick Stolee via GitGitGadget
2022-02-24 22:55   ` Junio C Hamano
2022-02-25 22:31   ` Ævar Arnfjörð Bjarmason
2022-02-28 13:44     ` Derrick Stolee
2022-02-28 14:27       ` Ævar Arnfjörð Bjarmason
2022-02-28 16:39         ` Derrick Stolee
2022-02-28 21:14           ` Ævar Arnfjörð Bjarmason
2022-03-01 14:19             ` Derrick Stolee
2022-03-01 14:29               ` Ævar Arnfjörð Bjarmason
2022-03-01 15:59                 ` Derrick Stolee
2022-02-24 20:38 ` [PATCH 6/7] commit-graph: parse " Derrick Stolee via GitGitGadget
2022-02-24 23:01   ` Junio C Hamano
2022-02-25 13:54     ` Derrick Stolee
2022-02-24 20:38 ` [PATCH 7/7] commit-graph: write " Derrick Stolee via GitGitGadget
2022-02-24 21:42 ` [PATCH 0/7] Commit-graph: Generation Number v2 Fixes, v3 implementation Junio C Hamano
2022-02-24 23:06   ` Junio C Hamano
2022-02-25 13:55     ` Derrick Stolee
2022-02-28 13:53 ` [PATCH v2 0/4] Commit-graph: Generation Number v2 Fixes Derrick Stolee via GitGitGadget
2022-02-28 13:53   ` [PATCH v2 1/4] test-read-graph: include extra post-parse info Derrick Stolee via GitGitGadget
2022-02-28 15:22     ` Ævar Arnfjörð Bjarmason
2022-02-28 13:53   ` [PATCH v2 2/4] commit-graph: fix ordering bug in generation numbers Derrick Stolee via GitGitGadget
2022-02-28 15:25     ` Ævar Arnfjörð Bjarmason
2022-02-28 13:53   ` [PATCH v2 3/4] commit-graph: start parsing generation v2 (again) Derrick Stolee via GitGitGadget
2022-02-28 15:30     ` Ævar Arnfjörð Bjarmason
2022-02-28 16:43       ` Derrick Stolee
2022-02-28 13:53   ` [PATCH v2 4/4] commit-graph: fix generation number v2 overflow values Derrick Stolee via GitGitGadget
2022-02-28 15:40     ` Ævar Arnfjörð Bjarmason
2022-03-01 17:23   ` [PATCH v2 0/4] Commit-graph: Generation Number v2 Fixes Ævar Arnfjörð Bjarmason
2022-03-01 19:48   ` [PATCH v3 0/5] " Derrick Stolee via GitGitGadget
2022-03-01 19:48     ` [PATCH v3 1/5] test-read-graph: include extra post-parse info Derrick Stolee via GitGitGadget
2022-03-01 19:48     ` [PATCH v3 2/5] t5318: extract helpers to lib-commit-graph.sh Derrick Stolee via GitGitGadget
2022-03-01 19:48     ` [PATCH v3 3/5] commit-graph: fix ordering bug in generation numbers Derrick Stolee via GitGitGadget
2022-03-01 20:13       ` Junio C Hamano
2022-03-01 20:30         ` Junio C Hamano
2022-03-02 14:13           ` Derrick Stolee
2022-03-01 19:48     ` [PATCH v3 4/5] commit-graph: start parsing generation v2 (again) Derrick Stolee via GitGitGadget
2022-03-01 19:48     ` [PATCH v3 5/5] commit-graph: fix generation number v2 overflow values Derrick Stolee via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=06ea3190-32d0-c792-0ae9-c5600305f158@github.com \
    --to=derrickstolee@github.com \
    --cc=abhishekkumar8222@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.