From: Taylor Blau <me@ttaylorr.com>
To: Derrick Stolee <stolee@gmail.com>
Cc: Taylor Blau <me@ttaylorr.com>,
	Jonathan Nieder <jrnieder@gmail.com>,
	git@vger.kernel.org, jonathantanmy@google.com, gitster@pobox.com,
	newren@gmail.com, Jay Conrod <jayconrod@google.com>
Subject: Re: [PATCH v2 2/2] shallow.c: use '{commit,rollback}_shallow_file'
Date: Wed, 3 Jun 2020 13:26:08 -0600
Message-ID: <20200603192608.GB24049@syl.local>
In-Reply-To: <1253efb6-f1bc-0a16-68e3-c1bc07e1bc18@gmail.com>

Hi Stolee,

On Wed, Jun 03, 2020 at 09:08:26AM -0400, Derrick Stolee wrote:
> On 6/3/2020 1:16 AM, Taylor Blau wrote:
> > On Tue, Jun 02, 2020 at 10:52:48PM -0600, Taylor Blau wrote:
> >> Hi Jonathan,
> >>
> >> On Tue, Jun 02, 2020 at 08:42:13PM -0700, Jonathan Nieder wrote:
> >>> Hi,
> >>>
> >>> Taylor Blau wrote:
> >>>
> >>>> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> >>>> ---
> >>>>  builtin/receive-pack.c   |  4 ++--
> >>>>  commit.h                 |  2 ++
> >>>>  fetch-pack.c             | 10 +++++-----
> >>>>  shallow.c                | 30 +++++++++++++++++++++---------
> >>>>  t/t5537-fetch-shallow.sh | 29 +++++++++++++++++++++++++++++
> >>>>  5 files changed, 59 insertions(+), 16 deletions(-)
> >>>
> >>> I haven't investigated the cause yet, but I've run into an interesting
> >>> bug that bisects to this commit.  Jay Conrod (cc-ed) reports:
> >>>
> >>> | I believe this is also the cause of Go toolchain test failures we've
> >>> | been seeing. Go uses git to fetch dependencies.
> >>> |
> >>> | The problem we're seeing can be reproduced with the script below. It
> >>> | should print "success". Instead, the git merge-base command fails
> >>> | because the commit 7303f77963648d5f1ec5e55eccfad8e14035866c
> >>> | (origin/master) has no history.
> >>>
> >>> -- 8< --
> >>> #!/bin/bash
> >>>
> >>> set -euxo pipefail
> >>> if [ -d legacytest ]; then
> >>>   echo "legacytest directory already exists" >&2
> >>>   exit 1
> >>> fi
> >>> mkdir legacytest
> >>> cd legacytest
> >>> git init --bare
> >>> git config protocol.version 2
> >>> git config fetch.writeCommitGraph true
> >>> git remote add origin -- https://github.com/rsc/legacytest
> >>> git fetch -f --depth=1 origin refs/heads/master:refs/heads/master
> >>> git fetch -f origin 'refs/heads/*:refs/heads/*' 'refs/tags/*:refs/tags/*'
> >>> git fetch --unshallow -f origin
> >>> git merge-base --is-ancestor -- v2.0.0 7303f77963648d5f1ec5e55eccfad8e14035866c
> >>> echo success
> >>> -- >8 --
> >>
> >> Thanks to you and Jay for the report and reproduction script. Indeed, I
> >> can reproduce this on the tip of master (which is equivalent to v2.27.0
> >> at the time of writing).
> >>
> >>> The fetch.writeCommitGraph part is interesting.  When does a commit
> >>> graph file get written in this sequence of operations?  In an
> >>> unshallow operation, does the usual guard against writing a commit
> >>> graph in a shallow repo get missed?
> >>
> >> The last 'git fetch' is the one that writes the commit-graph. You can
> >> verify this by sticking a 'ls objects/info' after each 'git' invocation
> >> in your script.
> >>
> >> Here's where things get weird, though. Prior to this patch, Git would
> >> pick up that the repository is shallow before unshallowing, but never
> >> invalidate this fact after unshallowing. That means that once we got to
> >> 'write_commit_graph', we'd exit immediately since it appears as if the
> >> repository is shallow.
> >>
> >> In this patch, we don't do that anymore, since we rightly unset the fact
> >> that we are (were) shallow.
> >>
> >> In a debugger, I ran your script and a 'git commit-graph write --split
> >> --reachable' side-by-side, and found an interesting discrepancy: some
> >> commits (loaded from 'copy_oids_to_commits') *don't* have their parents
> >> set when invoked from 'git fetch', but *do* when invoked as 'git
> >> commit-graph write ...'.
> >>
> >> I'm not an expert in the object cache, but my hunch is that when we
> >> fetch these objects they're marked as parsed without having loaded their
> >> parents. When we load them again via 'lookup_object', we get objects
> >> that look parsed, but don't have parents where they otherwise should.
> >
> > Ah, this only sort of has to do with the object cache. In
> > 'parse_commit_buffer()' we stop parsing parents in the case that the
> > repository is shallow (this goes back to 7f3140cd23 (git repack: keep
> > commits hidden by a graft, 2009-07-23)).
> >
> > That makes me somewhat nervous. We're going to keep any objects opened
> > prior to unshallowing in the cache, along with their hidden parents. I
> > suspect that this is why Git has kept the shallow bit as sticky for so
> > long.
> >
> > I'm not quite sure what to do here. I think that any of the following
> > would work:
> >
> >   * Keep the shallow bit sticky, at least for fetch.writeCommitGraph
> >     (i.e., pretend as if fetch.writecommitgraph=0 in the case of
> >     '--unshallow').
>
> I'm in favor of this option, if possible. Anything that alters the
> commit history in-memory at any point in the Git process is unsafe to
> combine with a commit-graph read _or_ write. I'm sorry that the guards
> in commit_graph_compatible() are not enough here.
>
> >   * Dump the object cache upon un-shallowing, forcing us to re-discover
> >     the parents when they are no longer hidden behind a graft.
> >
> > The latter seems like the most complete feasible fix. The former should
> > work fine to address this case, but I wonder if there are other
> > call-sites that are affected by this behavior. My hunch is that this is
> > a unique case, since it requires going from shallow to unshallow in the
> > same process.
>
> The latter would solve issues that could arise outside of the commit-graph
> space. But it also presents an opportunity for another gap if someone edits
> the shallow logic without putting in the proper guards.
>
> To be extra safe, I'd be in favor of adding an "if (grafts_ever_existed)"
> condition in commit_graph_compatible() based on a global that is assigned
> a non-zero value whenever grafts are loaded at any point in the process,
> mostly because it would be easy to guarantee that it is safe. It could
> even be localized to the repository struct.
>
> > I have yet to create a smaller test case, but the following should be
> > sufficient to dump the cache of parsed objects upon shallowing or
> > un-shallowing:
> >
> > diff --git a/shallow.c b/shallow.c
> > index b826de9b67..06db857f53 100644
> > --- a/shallow.c
> > +++ b/shallow.c
> > @@ -90,6 +90,9 @@ static void reset_repository_shallow(struct repository *r)
> >  {
> >  	r->parsed_objects->is_shallow = -1;
> >  	stat_validity_clear(r->parsed_objects->shallow_stat);
> > +
> > +	parsed_object_pool_clear(r->parsed_objects);
> > +	r->parsed_objects = parsed_object_pool_new();
> >  }
> >
> >  int commit_shallow_file(struct repository *r, struct shallow_lock *lk)
> >
> > Is this something we want to go forward with? Are there some
> > far-reaching implications that I'm missing?
>
> I'd like to see the extra-careful check, in addition to this one. This
> is such a rarely-used and narrowly-tested case that we need to be really
> really careful to avoid regressions.

I'm a little confused about which suggestion you're in favor of ;-). For
clarity, are you suggesting that we add a new 'r->grafts_ever_existed'
bit in addition to doing a hard reset of the object pool?
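
To check that I'm reading the suggestion correctly, here is a toy model
of the sticky bit. It is only a sketch and not a patch against Git:
'struct toy_repository' and every 'toy_*' function are hypothetical
stand-ins, and the field name 'grafts_ever_existed' is just the name
from your mail placed on a made-up struct. The point is that the bit is
set whenever grafts are loaded and is never cleared, so the
compatibility check keeps refusing even after the shallow state is
reset in the same process.

```c
/*
 * Toy stand-in for a repository; hypothetical, not Git's
 * 'struct repository'. Only the two bits discussed here are modeled.
 */
struct toy_repository {
	int is_shallow;          /* current state; reset after unshallowing */
	int grafts_ever_existed; /* sticky: set once, never cleared */
};

/* Hypothetical hook: called whenever grafts (or a shallow file) are loaded. */
static void toy_note_grafts_loaded(struct toy_repository *r)
{
	r->is_shallow = 1;
	r->grafts_ever_existed = 1;
}

/* Hypothetical hook: called once the repository has been unshallowed. */
static void toy_reset_shallow(struct toy_repository *r)
{
	r->is_shallow = 0;
	/* grafts_ever_existed is deliberately left alone */
}

/*
 * Hypothetical analogue of a commit-graph compatibility check.
 * Returns 1 if using the commit-graph is considered safe, 0 otherwise.
 */
static int toy_commit_graph_compatible(const struct toy_repository *r)
{
	if (r->is_shallow)
		return 0;
	if (r->grafts_ever_existed)
		return 0; /* history may have been altered in-memory earlier */
	return 1;
}
```

In this model a fresh repository passes the check, while one that
loaded grafts and was then unshallowed in the same process still fails
it, which is the "extra-careful" behavior as I understand it.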

> Thanks,
> -Stolee

Thanks,
Taylor
