On Tue, Mar 01, 2022 at 09:06:44AM -0500, Derrick Stolee wrote: > On 3/1/2022 5:35 AM, Patrick Steinhardt wrote: > > On Tue, Mar 01, 2022 at 10:46:14AM +0100, Patrick Steinhardt wrote: > >> On Mon, Feb 28, 2022 at 01:44:01PM -0500, Derrick Stolee wrote: > >>> On 2/28/2022 11:59 AM, Patrick Steinhardt wrote: > >>>> On Mon, Feb 28, 2022 at 11:23:38AM -0500, Derrick Stolee wrote: > >>>>> On 2/28/2022 10:18 AM, Patrick Steinhardt wrote: > >>>>>> I haven't yet found the time to dig deeper into why this is happening. > >>>>>> While the repository is publicly accessible at [1], unfortunately the > >>>>>> bug seems to be triggered by a commit that's only kept alive by an > >>>>>> internal reference. > >>>>>> > >>>>>> Patrick > >>>>>> > >>>>>> [1]: https://gitlab.com/gitlab-com/www-gitlab-com.git > >>>>> > >>>>> Thanks for including this information. Just to be clear: did you > >>>>> include patch 4 in your tests, or not? Patch 4 includes a fix > >>>>> related to overflow values, so it would be helpful to know if you > >>>>> found a _different_ bug or if it is the same one. > >>>>> > >>>>> Thanks, > >>>>> -Stolee > >>>> > >>>> I initially only applied the first three patches, but after having hit > >>>> the fatal error I also applied the rest of this series to have a look at > >>>> whether it is indeed fixed already by one of your later patches. The > >>>> error remains the same though. > >>> > >>> Thanks for this extra context. Is this a commit-graph that you wrote > >>> with the first three patches and then you get an error when reading it? > >>> > >>> Do you get the same error when deleting that file and rewriting it with > >>> all patches included? > >>> > >>> Thanks, > >>> -Stolee > >> > >> Yes, I do. I've applied all four patches from v2 on top of 715d08a9e5 > >> (The eighth batch, 2022-02-25) and still get the same results: > >> > >> $ find objects/info/commit-graphs/ > >> objects/info/commit-graphs/ > >> objects/info/commit-graphs/graph-607e641165f3e83a82d5b14af4e611bf2a688f35.graph > >> objects/info/commit-graphs/commit-graph-chain > >> objects/info/commit-graphs/graph-5f357c7573c0075d42d82b28e660bc3eac01bfe8.graph > >> objects/info/commit-graphs/graph-e0c12ead1b61c7c30720ae372e8a9f98d95dfb2d.graph > >> objects/info/commit-graphs/graph-c96723b133c2d81106a01ecd7a8773bb2ef6c2e1.graph > >> > >> $ git commit-graph verify > >> fatal: commit-graph requires overflow generation data but has none > >> > >> $ git commit-graph write > >> Finding commits for commit graph among packed objects: 100% (10235119/10235119), done. > >> Expanding reachable commits in commit graph: 2197197, done. > >> Finding extra edges in commit graph: 100% (2197197/2197197), done. > >> fatal: commit-graph requires overflow generation data but has none > >> > >> $ rm -rf objects/info/commit-graphs/ > >> > >> $ git commit-graph write > >> Finding commits for commit graph among packed objects: 100% (10235119/10235119), done. > >> Expanding reachable commits in commit graph: 2197197, done. > >> Finding extra edges in commit graph: 100% (2197197/2197197), done. > >> fatal: commit-graph requires overflow generation data but has none) > >> > >> So even generating them completely anew doesn't seem to generate the > >> overflow generation data. > >> > >> Patrick > > > > I stand corrected. I forgot that the repository at hand was connected to > > another one via `objects/info/alternates`. If I prune commit-graphs from > > that alternate, too, then it works alright with your patches. > > OK, thanks. That clarifies the situation. > > I ordered the patches such that the fix in patch 4 could be immediately > testable, which is not the case without patch 3. However, it does leave > this temporary state where information can be incorrect if only a subset > of the series is applied. > > > This makes me wonder how such a bugfix should be handled though. As this > > series is right now, users will be faced with repository corruption as > > soon as they upgrade their Git version to one that contains this patch > > series. This corruption needs manual action: they have to go into the > > repository, delete the commit-graphs and then optionally create new > > ones. > > > > This is not a good user experience, and it's worse on the server-side > > where we now have a timeframe where all commit-graphs are potentially > > corrupt. This effectively leads to us being unable to serve those repos > > at all until we have rewritten the commit-graphs because all commands > > which make use of the commit-graph will now die: > > > > $ git log > > fatal: commit-graph requires overflow generation data but has none > > > > So the question is whether this is a change that needs to be rolled out > > over multiple releases. First we'd get in the bug fix such that we write > > correct commit-graphs, and after this fix has been released we can also > > release the fix that starts to actually parse the generation. This > > ensures there's a grace period during which we can hopefully correct the > > data on-disk such that users are not faced with failures. > > You are right that we need to be careful here, but I also think that > previous versions of Git always wrote the correct data. Here is my > thought process: > > 1. To get this bug, we need to have parsed the corrected commit date > from an existing commit-graph in order to under-count the number > of overflow values. > > 2. Before this series, Git versions were not parsing the corrected > commit date, so they recompute the corrected commit date every > time the commit-graph is written, getting the proper count of > overflow values. > > For these reasons, data written by previous versions of Git are > correct and can be trusted without a staged release. > > Does this make sense? Or, do you experience a different result when > you build commit-graphs with a released Git version and then when > writing on top with all patches applied? Just to verify my understanding: you claim that the bug I was hitting shouldn't be encountered in the wild when the release , but only if one were to write a commit-graph with the intermediate stafe until patch 3/4 of your patch series? Hum. I have re-verified, and this indeed seems to play out. So I must've accidentally ran all my testing with the state generated without the final patch which fixes the corruption. I do see lots of the following warnings, but overall I can verify and write the commit-graph just fine: commit-graph generation for commit c80a42de8803e2d77818d0c82f88e748d7f9425f is 1623362063 < 1623362139 Thanks for your patience, and sorry for the noise :) > > The better alternative would probably be to just gracefully handle > > commit-graphs which are corrupted in such a way. Can we maybe just > > continue to not parse generations in case we find that the commit-graph > > doesn't have overflow generation data? > > > > This is more of a general issue though: commit-graphs are an auxiliary > > cache that is not required for proper operation at all. If we fail to > > parse it, then Git shouldn't die but instead fail gracefully just ignore > > it. Furthermore, if we notice that graphs are corrupt when we try to > > write new ones, we may just delete the corrupt versions automatically > > and generate completely new ones. > > You are right that we can be better about failures here and report > and error instead of a die(). Especially in this case, we could just > revert to topological levels instead of throwing out the commit-graph > entirely. > > This seems like something for another series, so we can be sure to > audit all cases of fatal errors when parsing the commit-graph so we > catch all of them and do the "best" thing in each case. I agree. Patrick