All of lore.kernel.org
 help / color / mirror / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Jeff King <peff@peff.net>,
	Thomas Braun <thomas.braun@virtuell-zuhause.de>
Cc: Taylor Blau <me@ttaylorr.com>,
	GIT Mailing-list <git@vger.kernel.org>,
	Derrick Stolee <dstolee@microsoft.com>
Subject: Re: 2.29.0.rc0.windows.1: Duplicate commit id error message when fetching
Date: Thu, 8 Oct 2020 08:50:22 -0400	[thread overview]
Message-ID: <52782500-274e-2c72-39e2-be4252959d47@gmail.com> (raw)
In-Reply-To: <20201008120658.GA2689590@coredump.intra.peff.net>

On 10/8/2020 8:06 AM, Jeff King wrote:
> On Thu, Oct 08, 2020 at 11:52:03AM +0200, Thomas Braun wrote:
> 
>>> Is it possible to share the contents of your .git directory? If not, can
>>> you look in .git/objects/info/ and see if there are multiple
>>> commit-graph files (and if so, possibly share those; they don't contain
>>> any identifying info).
>>
>> Yes sure, I can share that [1]. Thanks for looking into that.

Thank you for the report! And thanks for sharing your data.

> To solve your immediate problem, you can just remove the whole
> .git/objects/info/commit-graphs directory. It doesn't have any data that
> can't be regenerated from the actual objects.

Yes, this is the best way forward for issues like this. Sorry
for the inconvenience!
 
> The rest of this email is my look at what the actual bug is.
...
> So I _think_ everything being done by that patch is correct, and we
> didn't see the problem before simply because we were erroneously not
> rolling up the graph files. And once we do, we can see that indeed we
> have the same commit in two files.

I think the die() that is reporting this answer is doing so because
this state is one that _shouldn't_ happen. The intention is that
there is always only one "graph position" for a commit object, and
any case where we have multiple is an unknown and unsupported situation.

However, how wrong would that be (as long as the two "rows" have the
same data)? It depends on how we find the commit:

1. If we are trying to look up a commit from a ref, then the binary
   search into the commit-graph(s) will find one of the rows first,
   then set "graph_pos" in the commit_graph_data_slab for that commit.

2. If we are looking up a commit via a graph position from a child
   commit in the commit-graph, then we will immediately navigate to
   that specific row to find the OID. If we previously parsed that
   commit, then we will not try to parse it a second time based on
   the populated "struct commit" we get from lookup_commit().
   Otherwise, we'll read the row to fill the parents and other data
   at that commit.

So, it seems that the existing commit-graph reading strategy can
handle this "duplicate commit" case (at least, across layers).

 If I instrument the commit-graph code
> like this (I couldn't find a command to dump incremental graph file
> data; is there one?):

Not really. Could be useful for cases like this, though.

> I'm not sure how that happened, and whether it's a bug that we got into
> this state at all.

It is likely a bug that we got into this state. We should still be
able to handle it gracefully.

> But regardless, it seems unfriendly that we can't
> get out of it while merging the graphs. Doing this obviously makes the
> problem go away:
> 
> diff --git a/commit-graph.c b/commit-graph.c
> index cb042bdba8..ae1f94ccc4 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -2023,8 +2023,11 @@ static void sort_and_scan_merged_commits(struct write_commit_graph_context *ctx)
>  
>  		if (i && oideq(&ctx->commits.list[i - 1]->object.oid,
>  			  &ctx->commits.list[i]->object.oid)) {
> -			die(_("unexpected duplicate commit id %s"),
> -			    oid_to_hex(&ctx->commits.list[i]->object.oid));
> +			/*
> +			 * quietly ignore duplicates; these could come from
> +			 * incremental graph files mentioning the same commit.
> +			 */
> +			continue;
>  		} else {
>  			unsigned int num_parents;
>  
> 
> but it's not clear to me if that's papering over another bug, or
> gracefully handling a situation that we ought to be.

I think this is a good thing to do, at minimum. As I discussed above,
the "input data" of the incremental commit-graph chain with duplicate
commits across layers isn't actually _invalid_. It's unexpected based
on what Git "should" be doing.

This kind of change is something we could possibly handle within the
RC window, since it only unblocks people who are already in a bad state.

It would also be good to see if we can discover how this happened in
the first place, but that might be a more lengthy investigation and
patch.

Thanks,
-Stolee

  reply	other threads:[~2020-10-08 12:50 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-07 20:28 2.29.0.rc0.windows.1: Duplicate commit id error message when fetching Thomas Braun
2020-10-07 21:06 ` Jeff King
2020-10-08  9:52   ` Thomas Braun
2020-10-08 12:06     ` Jeff King
2020-10-08 12:50       ` Derrick Stolee [this message]
2020-10-08 13:22         ` Derrick Stolee
2020-10-09 15:29           ` Thomas Braun
2020-10-09 16:49             ` Derrick Stolee
2020-10-09 17:12               ` Thomas Braun
2020-10-09 17:46                 ` Derrick Stolee
2020-10-09 17:55                   ` Jeff King
2020-10-09 18:28                     ` Taylor Blau
2020-10-09 18:33                       ` Derrick Stolee
2020-10-09 18:37                         ` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=52782500-274e-2c72-39e2-be4252959d47@gmail.com \
    --to=stolee@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    --cc=thomas.braun@virtuell-zuhause.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.