All of lore.kernel.org
 help / color / mirror / Atom feed
From: Michael Haggerty <mhagger@alum.mit.edu>
To: Christian Couder <christian.couder@gmail.com>
Cc: git <git@vger.kernel.org>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Stefan Beller" <sbeller@google.com>,
	"Jonathan Nieder" <jrnieder@gmail.com>,
	"Jonathan Tan" <jonathantanmy@google.com>,
	"David Turner" <novalis@novalis.org>
Subject: Re: Implementing reftable in Git
Date: Fri, 11 May 2018 11:31:57 +0200	[thread overview]
Message-ID: <CAMy9T_H=_+9Z=CpX85Ma4gCyUuvNAPR7fSBHi2J=4nC1XzF2sg@mail.gmail.com> (raw)
In-Reply-To: <CAP8UFD0PPZSjBnxCA7ez91vBuatcHKQ+JUWvTD1iHcXzPBjPBg@mail.gmail.com>

On Wed, May 9, 2018 at 4:33 PM, Christian Couder
<christian.couder@gmail.com> wrote:
> I might start working on implementing reftable in Git soon.
> [...]

Nice. It'll be great to have a reftable implementation in git core
(and ideally libgit2, as well). It seems to me that it could someday
become the new default reference storage method. The file format is
considerably more complicated than the current loose/packed scheme,
which is definitely a disadvantage (for example, for other Git
implementations). But implementing it *with good performance and
without races* might be no more complicated than the current scheme.

Testing will be important. There are already many tests specifically
about testing loose/packed reference storage. These will always have
to run against repositories that are forced to use that reference
scheme. And there will need to be new tests specifically about the
reftable scheme. Both classes of tests should be run every time. That
much is pretty obvious.

But currently, there are a lot of tests that assume the loose/packed
reference format on disk even though the tests are not really related
to references at all. ISTM that these should be converted to work at a
higher level, for example using `for-each-ref`, `rev-parse`, etc. to
examine references rather than reading reference files directly. That
way the tests should run correctly regardless of which scheme is in
use.

And since it's too expensive to run the whole test suite with both
reference storage schemes, it seems to me that the reference storage
scheme that is used while running the scheme-neutral tests should be
easy to choose at runtime.

David Turner did some analogous work for wiring up and testing his
proposed LMDB ref storage backend that might be useful [1]. I'm CCing
him, since he might have thoughts on this topic.

Regarding the reftable spec itself:

I recently gave a little internal talk about it, and while preparing
the talk I noticed a couple of things that should maybe be tweaked:

* The spec proposes to change `$GIT_DIR/refs`, which is currently a
directory that holds the loose refs, into a file that holds the table
of contents of reftable files comprising the full set of references.
This was my suggestion. I was thinking that this would prevent old
refs code from being used accidentally on a reftable-enabled
repository, while still enabling old versions of Git recognize this as
a git directory [2]. I think that the latter is important to make
things like `git rev-parse --git-dir` work correctly, even if the
installed version of git can't actually *read* the repository.

  The problem is that `is_git_directory()` checks not only whether
`$GIT_DIR/refs` exists, but also whether it is executable (i.e., since
it is normally a directory, that it is searchable). It would be silly
to make the reftable table of contents executable, so this doesn't
seem like a good approach after all.

  So probably `$GIT_DIR/refs` should continue to be a directory. If
it's there, it would probably make sense to place the reftable files
and maybe the ToC inside of it. We would have to rely on older Git
versions refusing to work in the directory because its `config` file
has an unrecognized `core.repositoryFormatVersion`, but that should be
OK I think.

* The scheme for naming reftable files [3] is, I believe, just a
suggestion as far as the spec is concerned (except for the use of
`.ref`/`.log` file extensions). It might be more less unwieldy to use
`%d` rather than `%08d`, and more convenient to name compacted files
to `${min_update_index}-${max_update_index}_${n}.{ref,log}` to make it
clearer to see by inspection what each file contains. That would also
make it unnecessary, in most cases, to insert a `_${n}` to make the
filename unique.

Michael

[1] https://github.com/dturner-tw/git/tree/dturner/pluggable-backends
[2] https://github.com/git/git/blob/ccdcbd54c4475c2238b310f7113ab3075b5abc9c/setup.c#L309-L347
[3] https://github.com/eclipse/jgit/blob/master/Documentation/technical/reftable.md#layout
    https://github.com/eclipse/jgit/blob/master/Documentation/technical/reftable.md#compaction
[4] https://github.com/eclipse/jgit/blob/master/Documentation/technical/reftable.md#footer

  parent reply	other threads:[~2018-05-11  9:32 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-09 14:33 Implementing reftable in Git Christian Couder
2018-05-09 14:52 ` Derrick Stolee
2018-05-09 16:07 ` Duy Nguyen
2018-05-09 16:48 ` Jonathan Nieder
2018-05-09 17:51   ` Carlos Martín Nieto
2018-05-09 17:54     ` Jonathan Nieder
2018-05-09 18:05       ` Carlos Martín Nieto
2018-05-09 17:42 ` Stefan Beller
2018-05-09 17:48   ` Jonathan Nieder
2018-05-09 17:55     ` Stefan Beller
2018-05-09 18:52   ` Ævar Arnfjörð Bjarmason
2018-05-11  9:31 ` Michael Haggerty [this message]
2018-05-11 22:21   ` David Turner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAMy9T_H=_+9Z=CpX85Ma4gCyUuvNAPR7fSBHi2J=4nC1XzF2sg@mail.gmail.com' \
    --to=mhagger@alum.mit.edu \
    --cc=avarab@gmail.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jonathantanmy@google.com \
    --cc=jrnieder@gmail.com \
    --cc=novalis@novalis.org \
    --cc=sbeller@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.