* Commit graph chains with no corresponding files? @ 2020-06-29 22:07 Jonathan Tan 2020-06-30 1:51 ` Derrick Stolee 0 siblings, 1 reply; 6+ messages in thread From: Jonathan Tan @ 2020-06-29 22:07 UTC (permalink / raw) Cc: git, stolee, Jonathan Tan At $DAYJOB, a few people have reported "warning: unable to find all commit-graph files" warnings. Their commit-graph-chain files have a few lines, but they only have one commit graph file with very few commits. I suspected something happening during fetch, because (as far as I know) a fetch may cause an incremental commit graph to be written, but I ran a fetch on a large repository myself and didn't run into this problem. Has anyone ran into this problem before, and know how to reproduce? ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Commit graph chains with no corresponding files? 2020-06-29 22:07 Commit graph chains with no corresponding files? Jonathan Tan @ 2020-06-30 1:51 ` Derrick Stolee 2020-07-16 22:57 ` [FYI] commit-graph: trace expiry of commit graph links Jonathan Tan 2021-02-25 4:54 ` Commit graph chains with no corresponding files? Bryan Turner 0 siblings, 2 replies; 6+ messages in thread From: Derrick Stolee @ 2020-06-30 1:51 UTC (permalink / raw) To: Jonathan Tan; +Cc: git On 6/29/2020 6:07 PM, Jonathan Tan wrote: > At $DAYJOB, a few people have reported "warning: unable to find all > commit-graph files" warnings. Their commit-graph-chain files have a few > lines, but they only have one commit graph file with very few commits. I > suspected something happening during fetch, because (as far as I know) a > fetch may cause an incremental commit graph to be written, but I ran a > fetch on a large repository myself and didn't run into this problem. > > Has anyone ran into this problem before, and know how to reproduce? The incremental commit-graph code deletes any commit-graph files that do not appear in the chain. I believe this is done by comparing the contents of the ".git/objects/info/commit-graphs/" directory to the contents of the chain file. These appear to be case-sensitive, full-path comparisons. It is _possible_ that something like a case switch or a symlink could be causing a problem here. That's where I would look on the affected systems. Likely the full-path comparison in expire_commit_graphs() should be dropped in favor of local filename comparisons. A case- sensitive match is less likely to be important here since Git is writing the paths itself and should get the proper case back from the directory listing. Thanks, -Stolee ^ permalink raw reply [flat|nested] 6+ messages in thread
* [FYI] commit-graph: trace expiry of commit graph links 2020-06-30 1:51 ` Derrick Stolee @ 2020-07-16 22:57 ` Jonathan Tan 2021-02-25 4:54 ` Commit graph chains with no corresponding files? Bryan Turner 1 sibling, 0 replies; 6+ messages in thread From: Jonathan Tan @ 2020-07-16 22:57 UTC (permalink / raw) To: git; +Cc: Jonathan Tan, stolee When an obsolete link in the commit graph chain is deleted, and trace2 is enabled, trace a message that it is deleted, along with the list of links before and after the current chain refresh. The messages are emitted using trace2_data_string() and not trace2_printf() because the latter is not implemented for some trace targets. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> --- This patch is just for informational purposes for people who have the same problem and/or want to help diagnose. I've tested that this produces the expected trace2 messages using: git commit-graph write --reachable --split=replace > It is _possible_ that something like a case switch or a symlink > could be causing a problem here. That's where I would look on > the affected systems. Indeed a symlink is present - the affected repositories are using the git-repo [1] local mirror feature, which causes .git/objects (among other things) to be a symlink - but I couldn't figure out how this symlink would cause problems. In particular, looking at the code, all relevant filenames seem to be constructed from ctx->odb->path, so no matter which names were used to get to the object directory, all names are built from those names, and this seems to be true in practice as well. So I've added some trace2 messages (in this patch), and let's see if I can figure out what's going on. I'll write back if I find something. [1] https://gerrit.googlesource.com/git-repo --- commit-graph.c | 33 ++++++++++++++++++++++++++++++++- 1 file changed, 32 insertions(+), 1 deletion(-) diff --git a/commit-graph.c b/commit-graph.c index 328ab06fd4..b5bd2ac6de 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -2009,6 +2009,7 @@ static void expire_commit_graphs(struct write_commit_graph_context *ctx) struct dirent *de; size_t dirnamelen; timestamp_t expire_time = time(NULL); + int commit_graph_deleted = 0; if (ctx->split_opts && ctx->split_opts->expire_time) expire_time = ctx->split_opts->expire_time; @@ -2050,8 +2051,38 @@ static void expire_commit_graphs(struct write_commit_graph_context *ctx) } } - if (!found) + if (!found) { + if (trace2_is_enabled()) { + struct strbuf message = STRBUF_INIT; + + strbuf_addf(&message, "Deleting '%s' because it is not in [", path.buf); + for (i = 0; i < ctx->num_commit_graphs_after; i++) { + if (i != 0) + strbuf_addstr(&message, ", "); + strbuf_addf(&message, "'%s'", ctx->commit_graph_filenames_after[i]); + } + strbuf_addstr(&message, "]"); + trace2_data_string("commit-graph", the_repository, "graph-deletion", message.buf); + strbuf_release(&message); + commit_graph_deleted = 1; + } unlink(path.buf); + } + } + + if (commit_graph_deleted) { + struct strbuf message = STRBUF_INIT; + uint32_t i; + + strbuf_addstr(&message, "The commit graphs before were ["); + for (i = 0; i < ctx->num_commit_graphs_before; i++) { + if (i != 0) + strbuf_addstr(&message, ", "); + strbuf_addf(&message, "'%s'", ctx->commit_graph_filenames_before[i]); + } + strbuf_addstr(&message, "]"); + trace2_data_string("commit-graph", the_repository, "graph-before", message.buf); + strbuf_release(&message); } out: -- 2.28.0.rc0.105.gf9edc3c819-goog ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: Commit graph chains with no corresponding files? 2020-06-30 1:51 ` Derrick Stolee 2020-07-16 22:57 ` [FYI] commit-graph: trace expiry of commit graph links Jonathan Tan @ 2021-02-25 4:54 ` Bryan Turner 2021-02-25 14:20 ` Derrick Stolee 1 sibling, 1 reply; 6+ messages in thread From: Bryan Turner @ 2021-02-25 4:54 UTC (permalink / raw) To: Derrick Stolee; +Cc: Jonathan Tan, Git Users On Mon, Jun 29, 2020 at 6:51 PM Derrick Stolee <stolee@gmail.com> wrote: > > On 6/29/2020 6:07 PM, Jonathan Tan wrote: > > At $DAYJOB, a few people have reported "warning: unable to find all > > commit-graph files" warnings. Their commit-graph-chain files have a few > > lines, but they only have one commit graph file with very few commits. I > > suspected something happening during fetch, because (as far as I know) a > > fetch may cause an incremental commit graph to be written, but I ran a > > fetch on a large repository myself and didn't run into this problem. > > > > Has anyone ran into this problem before, and know how to reproduce? I don't have any specific reproduction steps, but we've just run into our first case of this on Git 2.29. I ended up kicking off a full `git commit-graph write` to fix it. That displayed the same warning, but commands run after it no longer do. Prior to writing the new graph, I had this: $ ls commit-graph-chain graph-88f5fe6e0c659e3742e556982263813d528ead81.graph Afterward, the `objects/info/commits-graphs` directory still exists but is empty, and there is now an `objects/commit-graph` that didn't exist before. `git commit-graph verify` seems happy with the state of things. > > The incremental commit-graph code deletes any commit-graph files > that do not appear in the chain. I believe this is done by comparing > the contents of the ".git/objects/info/commit-graphs/" directory to > the contents of the chain file. > > These appear to be case-sensitive, full-path comparisons. > > It is _possible_ that something like a case switch or a symlink > could be causing a problem here. That's where I would look on > the affected systems. Are commit graphs potentially problematic in repositories that are borrowing objects from other repositories via alternates? Have there been important changes to commit graphs since 2.29? > > Likely the full-path comparison in expire_commit_graphs() should > be dropped in favor of local filename comparisons. A case- > sensitive match is less likely to be important here since Git > is writing the paths itself and should get the proper case back > from the directory listing. > > Thanks, > -Stolee > -b ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Commit graph chains with no corresponding files? 2021-02-25 4:54 ` Commit graph chains with no corresponding files? Bryan Turner @ 2021-02-25 14:20 ` Derrick Stolee 2021-02-27 2:49 ` Bryan Turner 0 siblings, 1 reply; 6+ messages in thread From: Derrick Stolee @ 2021-02-25 14:20 UTC (permalink / raw) To: Bryan Turner; +Cc: Jonathan Tan, Git Users On 2/24/2021 11:54 PM, Bryan Turner wrote: > On Mon, Jun 29, 2020 at 6:51 PM Derrick Stolee <stolee@gmail.com> wrote: >> >> On 6/29/2020 6:07 PM, Jonathan Tan wrote: >>> At $DAYJOB, a few people have reported "warning: unable to find all >>> commit-graph files" warnings. Their commit-graph-chain files have a few >>> lines, but they only have one commit graph file with very few commits. I >>> suspected something happening during fetch, because (as far as I know) a >>> fetch may cause an incremental commit graph to be written, but I ran a >>> fetch on a large repository myself and didn't run into this problem. >>> >>> Has anyone ran into this problem before, and know how to reproduce? > > I don't have any specific reproduction steps, but we've just run into > our first case of this on Git 2.29. I ended up kicking off a full `git > commit-graph write` to fix it. That displayed the same warning, but > commands run after it no longer do. Prior to writing the new graph, I > had this: > $ ls > commit-graph-chain graph-88f5fe6e0c659e3742e556982263813d528ead81.graph The contents of the 'commit-graph-chain' file are critical to diagnosing the problems here. Likely it had multiple lines. > Afterward, the `objects/info/commits-graphs` directory still exists > but is empty, and there is now an `objects/commit-graph` that didn't > exist before. `git commit-graph verify` seems happy with the state of > things. Yes, a full rewrite without "--split" will get you to this state. >> The incremental commit-graph code deletes any commit-graph files >> that do not appear in the chain. I believe this is done by comparing >> the contents of the ".git/objects/info/commit-graphs/" directory to >> the contents of the chain file. >> >> These appear to be case-sensitive, full-path comparisons. >> >> It is _possible_ that something like a case switch or a symlink >> could be causing a problem here. That's where I would look on >> the affected systems. > > Are commit graphs potentially problematic in repositories that are > borrowing objects from other repositories via alternates? This was definitely part of the design, with the intention of working with a common base in the alternate. However, if the alternate collapses layers, then the repo that is borrowing from that alternate may have a broken chain. It is likely a better setup to have the alternate keep a commit-graph file and leave the dependent repos clear of a commit-graph. _Or_ the dependent repos should use a full commit-graph instead of a chain. If you have a better idea for how to make this work, then there is room for improvement. For example, if we ensure during the commit-graph write that all layers of the chain are within our local repo, then these dependency issues go away without breaking any old Git versions that are reading the data. > Have there > been important changes to commit graphs since 2.29? Not in the area of commit-graph chains. Thanks, -Stolee ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Commit graph chains with no corresponding files? 2021-02-25 14:20 ` Derrick Stolee @ 2021-02-27 2:49 ` Bryan Turner 0 siblings, 0 replies; 6+ messages in thread From: Bryan Turner @ 2021-02-27 2:49 UTC (permalink / raw) To: Derrick Stolee; +Cc: Jonathan Tan, Git Users On Thu, Feb 25, 2021 at 6:20 AM Derrick Stolee <stolee@gmail.com> wrote: > > On 2/24/2021 11:54 PM, Bryan Turner wrote: > > On Mon, Jun 29, 2020 at 6:51 PM Derrick Stolee <stolee@gmail.com> wrote: > >> > >> On 6/29/2020 6:07 PM, Jonathan Tan wrote: > >>> At $DAYJOB, a few people have reported "warning: unable to find all > >>> commit-graph files" warnings. Their commit-graph-chain files have a few > >>> lines, but they only have one commit graph file with very few commits. I > >>> suspected something happening during fetch, because (as far as I know) a > >>> fetch may cause an incremental commit graph to be written, but I ran a > >>> fetch on a large repository myself and didn't run into this problem. > >>> > >>> Has anyone ran into this problem before, and know how to reproduce? > > > > I don't have any specific reproduction steps, but we've just run into > > our first case of this on Git 2.29. I ended up kicking off a full `git > > commit-graph write` to fix it. That displayed the same warning, but > > commands run after it no longer do. Prior to writing the new graph, I > > had this: > > $ ls > > commit-graph-chain graph-88f5fe6e0c659e3742e556982263813d528ead81.graph > > The contents of the 'commit-graph-chain' file are critical to diagnosing > the problems here. Likely it had multiple lines. > > > Afterward, the `objects/info/commits-graphs` directory still exists > > but is empty, and there is now an `objects/commit-graph` that didn't > > exist before. `git commit-graph verify` seems happy with the state of > > things. > > Yes, a full rewrite without "--split" will get you to this state. > > >> The incremental commit-graph code deletes any commit-graph files > >> that do not appear in the chain. I believe this is done by comparing > >> the contents of the ".git/objects/info/commit-graphs/" directory to > >> the contents of the chain file. > >> > >> These appear to be case-sensitive, full-path comparisons. > >> > >> It is _possible_ that something like a case switch or a symlink > >> could be causing a problem here. That's where I would look on > >> the affected systems. > > > > Are commit graphs potentially problematic in repositories that are > > borrowing objects from other repositories via alternates? > > This was definitely part of the design, with the intention of > working with a common base in the alternate. However, if the > alternate collapses layers, then the repo that is borrowing > from that alternate may have a broken chain. Thanks for the analysis, Derrick. This seems like a likely culprit for how the repository got into this state, because it is a fork (of a fork) and does use a series of alternates. > > It is likely a better setup to have the alternate keep a > commit-graph file and leave the dependent repos clear of a > commit-graph. _Or_ the dependent repos should use a full > commit-graph instead of a chain. Skipping writing the commit graph in forks seems like a reasonable place for us to start, given the way it currently works, but always writing full graphs may be another option. If the fork is able to borrow the commit graph from its origin across the alternate, though, then that implies there may not be a lot of value in writing commit graphs in the forks (since they're likely to share the majority of their refs with their origin). > > If you have a better idea for how to make this work, then there > is room for improvement. > > For example, if we ensure during the commit-graph write that > all layers of the chain are within our local repo, then these > dependency issues go away without breaking any old Git versions > that are reading the data. Naively, this was the way I assumed it already worked--which is why I was writing commit graphs in forks in the first place. > > > Have there > > been important changes to commit graphs since 2.29? > > Not in the area of commit-graph chains. Thanks again! -b > > Thanks, > -Stolee ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2021-02-27 2:50 UTC | newest] Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-06-29 22:07 Commit graph chains with no corresponding files? Jonathan Tan 2020-06-30 1:51 ` Derrick Stolee 2020-07-16 22:57 ` [FYI] commit-graph: trace expiry of commit graph links Jonathan Tan 2021-02-25 4:54 ` Commit graph chains with no corresponding files? Bryan Turner 2021-02-25 14:20 ` Derrick Stolee 2021-02-27 2:49 ` Bryan Turner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).