Re: How to move forward with Reftable

From: Jonathan Nieder <jrnieder@gmail.com>
To: Han-Wen Nienhuys <hanwen@google.com>
Cc: git <git@vger.kernel.org>, Jeff King <peff@peff.net>,
	"brian m. carlson" <sandals@crustytoothpaste.net>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	Michael Haggerty <mhagger@alum.mit.edu>,
	Patrick Steinhardt <ps@pks.im>
Subject: Re: How to move forward with Reftable
Date: Mon, 11 May 2020 11:36:02 -0700	[thread overview]
Message-ID: <20200511183602.GA141481@google.com> (raw)
In-Reply-To: <CAFQ2z_Ne-1TWMRS88ADhzQmx4OfzNEkVUJ1anjf57mQD4Gdywg@mail.gmail.com>

(+cc: a few people I'm particularly interested in feedback from)

Han-Wen Nienhuys wrote[3]:

> I have noticed that not everyone is happy with the reftable series as
> it currently stands. I wanted to give some background on why it looks
> the way it does today, I want to solicit feedback on where to go from
> here, and share my worries about the negative feedback I've gotten so
> far.
[...]
> As it currently stands, the library implements the specification
> completely, including the parts that are currently probably not useful
> for Git, like support for the object-ID/ref mapping. The API between
> reftable and git is extremely narrow: it's about 20 functions, and 5
> structure definitions. It's also well tested (see coverage [2]. As of
> today, the tests are leak-free according to valgrind). This is also
> why I don't believe Johannes' argument that, because he found one bug,
> it must be full of bugs.
>
> This approach has been successful, in the sense that the libgit2
> project has experimented with integrating it, see
> https://github.com/libgit2/libgit2/pull/5462. They seem happy at the
> prospect of integrating existing code rather than reimplementing it.

Separate from the integration aspect is that this is not yet
battle-tested code.  One benefit of sharing code is to be able to share
the benefit of users testing it.

Since the ref system is fairly modular and this is about a non-default
backend, it's likely okay to integrate it initially as "experimental"
and then update docs as we gain confidence.

> Johannes had suggested that this should be developed and maintained in
> git-core first, and the result could then be reused by libgit2
> project. According to the libgit2 folks, this what that would look
> like:
>
> """
>     - It needs to be easy to split out from git-core. If it is
>       self-contained in a single directory, then I'd be sufficiently
>       happy already.
>
>     - It should continue providing a clean interface to external
>       callers. This also means that its interface should stay stable so
>       we don't have to adapt on every update. git-core historically
>       never had such promises, but it kind of worked out for the xdiff
>       code.
>
>     - My most important fear would be that the reftable interface
>       becomes heavily dependent on git-core's own data types. We had
>       this discussion at the Contributor's Summit already, but if it
>       starts adopting things like git-core's own string buffer then it
>       would become a lot harder for us to use it.
>
>     - Probably obvious, but contained in the above is that it stays
>       compilable on its own. So even if you split out its directory and
>       wire up some build instructions, it should not have any
>       dependencies on git-core.
> """
>
> (for the discussion at the summit:
> https://lore.kernel.org/git/1B71B54C-E000-4CEB-8AC6-3DB86E96E31A@jramsay.com.au/)
>
> I can make that work, but it would be good to know if this is
> something the project is OK with in principle, or whether the code
> needs to be completely Git-ified. If the latter happens, that would
> effectively fork the code, which I think is a missed opportunity.

There's been some discussion about use of strbuf versus the module's
own growing-buffer "struct slice".  Is that the only instance of this
kind of infrastructure duplication or are there others?

> I have received a lot of negative comments, and I can't gauge well
> what to make of them, but they kindle several worries:
>
> * The git community decides they don't need the object reverse index,
>   and do not write 'o' section in reftables, because that culls 1000
>   lines of code. This generally works, but will cause hard to debug
>   performance regressions if command-line git is used on a gerrit
>   server.
>
> * The git community looks at this, and decides the standard is too
>   complex, and goes off to create a reftable v3.
>
> * The git community asks me to do a ton of superficial work (eg. slice
>   -> strbuf), and then decides the overall design needs to be different,
>   and should be completely rewritten.
>
> Jonathan Nieder said I shouldn't worry about standards compliance,
> because the Git project has already ratified the reftable standard,
> and wouldn't want to break JGit compatibility, but it would be good to
> have the community leaders reaffirm this stance.

Recording from today's IRC standup for easy reference later:

  <gitster> jrnieder: that might have been solved already by me declaring
  that whether they like it or not, reftable must be in git-core to match
  what JGit has.

> For my last worry, it would be good if someone would make an
> assessment of the overall design to see if it is acceptable.

Thanks.  I think Peff did a little in this area; help from others
would be welcome as well.

> Once we have a path forward we can think of a way of integrating the
> code. I think it may take a while to shake out bugs. Not bugs in the
> reftable library itself, but Git is not very strict in enforcing
> proper use of the ref backend abstraction (eg. pseudorefs [1]). Many
> integration tests also don't go through proper channels for accessing
> refs.

Yes, I'm *very* excited that the series includes a knob for running
the testsuite using the reftable ref backend, providing a way for
anyone interested to pitch in to help with the issues they reveal
(both in Git and in its testsuite).

Summary: thanks for writing this.  I personally think your proposal is
a reasonable one, but I'm looking forward to hearing from others as
well.  I'm also looking forward to discussing "struct slice" and the
overall architecture in the context of the patches[4] and making some
more progress.

Sincerely,
Jonathan

> cheers,
>
> [1] https://lore.kernel.org/git/CAFQ2z_NZgkPE+3oazfb_m0_7TWxHjje1yYCc0bMZG05_KUKEow@mail.gmail.com/
>
> [2] https://hanwen.home.xs4all.nl/public/software/reftable-coverage/

[3] https://lore.kernel.org/git/CAFQ2z_Ne-1TWMRS88ADhzQmx4OfzNEkVUJ1anjf57mQD4Gdywg@mail.gmail.com/

[4] https://lore.kernel.org/git/pull.539.v12.git.1588845585.gitgitgadget@gmail.com/