git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Taylor Blau <me@ttaylorr.com>
Cc: Derrick Stolee <stolee@gmail.com>,
	Junio C Hamano <gitster@pobox.com>,
	git@vger.kernel.org, dstolee@microsoft.com
Subject: Re: [PATCH 1/1] commit-graph.c: avoid unnecessary tag dereference when merging
Date: Fri, 27 Mar 2020 04:42:48 -0400	[thread overview]
Message-ID: <20200327084248.GA607390@coredump.intra.peff.net> (raw)
In-Reply-To: <20200324230826.GA42939@syl.local>

On Tue, Mar 24, 2020 at 05:08:26PM -0600, Taylor Blau wrote:

> > The trouble is that I'm not sure what _should_ happen. Aborting the
> > whole commit-graph generation seems overboard (and would be annoying for
> > cases where whole swaths of history became unreachable and went away;
> > presumably we'd be dropping _all_ of those objects, and the write phase
> > would be just fine). I have a feeling the correct solution is to do this
> > merging pass earlier, before we check close_reachable(). Or if not a
> > true merging pass, at least a pass to check which existing entries are
> > still valid. But I don't know how painful that reordering would be.
> 
> Maybe, but I'm not sure that moving 'close_reachable' up would
> necessarily solve all of our problems. Or, at least, that it wouldn't
> create a set of new problems :).
> 
> Let's say that you have (1) some connected component of your history
> that is written to a commit-graph, where (2) part of that cluster has
> been dropped (e.g., due to corruption, a rogue 'git gc', etc), and (3)
> you're writing a new graph with '--input=graphed'.
> 
> What is 'close_reachable' supposed to do? If some of the now-missing
> commits are in the reachability closure of the commits that still exist
> in the repository, then we're back to where we started. We *could* have
> 'close_reachable' take all missing commits and drop their ancestors and
> descendants, but this creates quite the headache for 'close_reachable',
> which now needs to keep track of such things.

Yeah, I think my suggestion was dumb. I was thinking that
close_reachable() might somehow fix things up, but all it ever does with
the current code is _add_ commits to the graph. And we would want the
opposite: removing ones which can no longer be present, because their
ancestors are gone. That would be possible, but would be quite a pain
(you'd have to walk an inversion of the parent DAG).

> I'm hopeful that this isn't so common in practice, but I'm also not
> entirely sure one way or another. I can plan to deploy this patch to
> GitHub's servers for a ~month and see if we experience it. If it turns
> out to be common, I'll assume that others have this problem, too, in
> which case we can go back and think more about it.

It _shouldn't_ be too common, because it implies that we have commit X
but not its ancestor X^. And the pruning and repacking code tries to
avoid throwing away X^ in such a case. It could still happen due to a
race, though.

So yeah, I think we should mostly ignore it. It's not a new problem with
your series (and the only reason it is more common with your series is
that the old code was actually incorrectly handling some cases). It
might be worth turning the "missing parent" BUG() into a die(), since we
know it's reachable. But that's largely academic, since in either case
commit-graph is going to barf and fail to write.

As a follow-up to this part for anyone else following the discussion:

> I can plan to deploy this patch to GitHub's servers for a ~month and
> see if we experience it.

...I don't think we'll actually generate good data here. We're probably
going to end up doing our "big maintenance" commit-graph roll-ups by
just feeding --reachable as input, and dropping all of the existing
graphs.

-Peff

  reply	other threads:[~2020-03-27  8:42 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-21  3:44 [PATCH 0/1] commit-graph: avoid unnecessary tag deference when merging Taylor Blau
2020-03-21  3:44 ` [PATCH 1/1] commit-graph.c: avoid unnecessary tag dereference " Taylor Blau
2020-03-21  5:00   ` Jeff King
2020-03-21  6:11     ` Taylor Blau
2020-03-21  6:24       ` Taylor Blau
2020-03-21  7:03       ` Jeff King
2020-03-21 17:27         ` Taylor Blau
2020-03-22  5:36           ` Jeff King
2020-03-22 11:04             ` SZEDER Gábor
2020-03-22 18:45               ` looking up object types quickly, was " Jeff King
2020-03-22 19:18                 ` Jeff King
2020-03-23 20:15               ` Taylor Blau
2020-03-22 16:45             ` Taylor Blau
2020-03-24  6:06               ` Jeff King
2020-03-21 18:50         ` Junio C Hamano
2020-03-22  0:03           ` Derrick Stolee
2020-03-22  0:20             ` Taylor Blau
2020-03-22  0:23               ` Derrick Stolee
2020-03-22  5:49                 ` Jeff King
2020-03-22  6:04                   ` Jeff King
2020-03-22 15:47                     ` Taylor Blau
2020-03-24  6:11                       ` Jeff King
2020-03-24 23:08                         ` Taylor Blau
2020-03-27  8:42                           ` Jeff King [this message]
2020-03-27 15:03                             ` Taylor Blau
2020-03-22 15:44                   ` Taylor Blau
2020-03-24  6:14                     ` Jeff King
2020-03-21  5:01   ` Junio C Hamano
2020-03-21  4:56 ` [PATCH 0/1] commit-graph: avoid unnecessary tag deference " Junio C Hamano
2020-03-21  5:04   ` Jeff King
2020-03-21  6:12     ` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200327084248.GA607390@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).