* RFC/Pull Request: Refs db backend
@ 2015-06-23  0:50 David Turner
  2015-06-23  5:36 ` Junio C Hamano
                   ` (3 more replies)
  0 siblings, 4 replies; 26+ messages in thread
From: David Turner @ 2015-06-23  0:50 UTC (permalink / raw)
  To: git mailing list

I've revived and modified Ronnie Sahlberg's work on the refs db
backend.  

The work is on top of be3c13e5564, Junio's "First batch for 2.5 cycle".
I recognize that there have been changes to the refs code since then,
and that there are some further changes in-flight from e.g. Michael
Haggerty.  If there is interest in this, I can rebase once Michael's
changes land.

The changes can be found here:
https://github.com/dturner-tw/git.git on the dturner/pluggable-backends
branch

The db backend code was added in the penultimate commit; the rest is
just code rearrangement and minor changes to make alternate backends
possible.  There ended up being a fair amount of this rearrangement, but
the end result is that almost the entire git test suite runs under the
db backend without error (see below for details).

The db backend runs git for-each-ref about 30% faster than the files
backend with fully-packed refs on a repo with ~120k refs.  It's also
about 4x faster than using fully-unpacked refs.  In addition, and
perhaps more importantly, it avoids case-conflict issues on OS X.

I chose to use LMDB for the database.  LMDB has a few features that make
it suitable for usage in git:

1. It is relatively lightweight; it requires only one header file, and
the library itself is under 300k (as opposed to 700k for
e.g. sqlite).

2. It is well-tested: it's been used in OpenLDAP for years.

3. It's very fast.  LMDB's benchmarks show that it is among
the fastest key-value stores.

4. It has a relatively simple concurrency story; readers don't
block writers and writers don't block readers.
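
For concreteness, here's roughly what reading a single ref through
LMDB's C API looks like.  The database path and key encoding below are
illustrative, not necessarily what the patches use; error handling is
omitted:

  #include <stdio.h>
  #include <string.h>
  #include <lmdb.h>

  static int read_one_ref(const char *refname)
  {
      MDB_env *env;
      MDB_txn *txn;
      MDB_dbi dbi;
      MDB_val key, val;
      int ret;

      mdb_env_create(&env);
      mdb_env_open(env, ".git/refs.lmdb", 0, 0644);

      /* Read-only snapshot: doesn't block writers, isn't blocked by them. */
      mdb_txn_begin(env, NULL, MDB_RDONLY, &txn);
      mdb_dbi_open(txn, NULL, 0, &dbi);

      key.mv_data = (void *)refname;
      key.mv_size = strlen(refname);
      ret = mdb_get(txn, dbi, &key, &val);
      if (ret == MDB_SUCCESS)
          printf("%.*s\n", (int)val.mv_size, (char *)val.mv_data);

      mdb_txn_abort(txn);   /* read-only txns are simply discarded */
      mdb_env_close(env);
      return ret;
  }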

Ronnie Sahlberg's original version of this patchset used tdb.  The
advantage of tdb is that it's smaller (~125k).  The disadvantages are
that tdb is hard to build on OS X and isn't in Homebrew.  So LMDB
seemed simpler.

To test this backend's correctness, I hacked test-lib.sh and
test-lib-functions.sh to run all tests under the refs backend. Dozens
of tests use manual ref/reflog reading/writing, or create submodules
without passing --refs-backend-type to git init.  If those tests are
changed to use the update-ref machinery or test-refs-be-db (or, in the
case of packed-refs, corrupt refs, and dumb fetch tests, are skipped),
the only remaining failing tests are the git-new-workdir tests and the
gitweb tests.  

Please let me know how it would be best to proceed. 


* Re: RFC/Pull Request: Refs db backend
  2015-06-23  0:50 RFC/Pull Request: Refs db backend David Turner
@ 2015-06-23  5:36 ` Junio C Hamano
  2015-06-23 10:23   ` Duy Nguyen
  2015-06-23 17:29   ` David Turner
  2015-06-23 11:47 ` Jeff King
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 26+ messages in thread
From: Junio C Hamano @ 2015-06-23  5:36 UTC (permalink / raw)
  To: David Turner; +Cc: git mailing list

David Turner <dturner@twopensource.com> writes:

> I've revived and modified Ronnie Sahlberg's work on the refs db
> backend.  
>
> The work is on top of be3c13e5564, Junio's "First batch for 2.5 cycle".
> I recognize that there have been changes to the refs code since then,
> and that there are some further changes in-flight from e.g. Michael
> Haggerty.  If there is interest in this, I can rebase once Michael's
> changes land.
> ...
> The db backend runs git for-each-ref about 30% faster than the files
> backend with fully-packed refs on a repo with ~120k refs.  It's also
> about 4x faster than using fully-unpacked refs.  In addition, and
> perhaps more importantly, it avoids case-conflict issues on OS X.
>
> I chose to use LMDB for the database...
> ...
> Ronnie Sahlberg's original version of this patchset used tdb.  The
> advantage of tdb is that it's smaller (~125k).  The disadvantages are
> that tdb is hard to build on OS X and isn't in Homebrew.  So LMDB
> seemed simpler.

"If there is interest"?  Shut up and take my money ;-)

More seriously, that's great that you stepped up to resurrect this
topic.  In a sense, the choice of sample database backend does not
matter.  I do not care if it is tdb, lmdb, or even Berkeley DB as
long as it functions. ;-)

As long as the interface between ref-transaction system on the Git
side and the database backend is designed right, your lmdb thing can
serve as a reference implementation for other people to plug other
database backends to the same interface, right?  As one step to
validate the interface to the database backends, it would be nice to
eventually have at least two backends that talk to meaningfully
different systems, but we have to start somewhere, and "for now we
have lmdb" is as good a place to start as any other db backend.

I wonder if we can do a "filesystem" backend on top of the same
backend interface---is that too much impedance mismatch to make it
impractical?

Thanks.


* Re: RFC/Pull Request: Refs db backend
  2015-06-23  5:36 ` Junio C Hamano
@ 2015-06-23 10:23   ` Duy Nguyen
  2015-06-23 18:47     ` David Turner
  2015-06-23 17:29   ` David Turner
  1 sibling, 1 reply; 26+ messages in thread
From: Duy Nguyen @ 2015-06-23 10:23 UTC (permalink / raw)
  To: Junio C Hamano, David Turner; +Cc: git mailing list

On Tue, Jun 23, 2015 at 12:36 PM, Junio C Hamano <gitster@pobox.com> wrote:
> "If there is interest"?  Shut up and take my money ;-)

Yeah. This may be the next big thing since pack bitmap. It's even
better if it enters 'master' hand in hand with pack protocol v2, but I
think v2 needs more time.

On Tue, Jun 23, 2015 at 7:50 AM, David Turner <dturner@twopensource.com> wrote:
> To test this backend's correctness, I hacked test-lib.sh and
> test-lib-functions.sh to run all tests under the refs backend.

Now we have two. split-index also benefits from running through the
full test suite like this. I propose we make "make test" run the test
suite twice. The first run is with the default configuration: no split
index, no fancy ref backend. The second run enables split-index and
switches to the new backend, running through all test cases. In the
future we can also enable packv4 in this second run. There won't be a
third run.

When the second ref backend comes, we can switch between the two
backends using a random number generator where we control both
algorithm and seed, so that when a test fails, the user can give us
their seed and we can re-run with the same configuration.

> Dozens of tests use manual ref/reflog reading/writing, or create submodules
> without passing --refs-backend-type to git init.  If those tests are
> changed to use the update-ref machinery or test-refs-be-db (or, in the
> case of packed-refs, corrupt refs, and dumb fetch tests, are skipped),
> the only remaining failing tests are the git-new-workdir tests and the
> gitweb tests.

I haven't read the series, but I guess you should also add a few tests
to run on the first run, so new code is exercised a bit even if people
skip the second run.
-- 
Duy


* Re: RFC/Pull Request: Refs db backend
  2015-06-23  0:50 RFC/Pull Request: Refs db backend David Turner
  2015-06-23  5:36 ` Junio C Hamano
@ 2015-06-23 11:47 ` Jeff King
  2015-06-23 13:10   ` Duy Nguyen
                     ` (2 more replies)
  2015-06-23 15:51 ` Michael Haggerty
  2015-06-23 17:16 ` Stefan Beller
  3 siblings, 3 replies; 26+ messages in thread
From: Jeff King @ 2015-06-23 11:47 UTC (permalink / raw)
  To: David Turner; +Cc: git mailing list

On Mon, Jun 22, 2015 at 08:50:56PM -0400, David Turner wrote:

> The db backend runs git for-each-ref about 30% faster than the files
> backend with fully-packed refs on a repo with ~120k refs.  It's also
> about 4x faster than using fully-unpacked refs.  In addition, and
> perhaps more importantly, it avoids case-conflict issues on OS X.

Neat.

Can you describe a bit more about the reflog handling?

One of the problems we've had with large-ref repos is that the reflog
storage is quite inefficient. You can pack all the refs, but you may
still be stuck with a bunch of reflog files with one entry, wasting a
whole inode. Doing a "git repack" when you have a million of those has
horrible cold-cache performance. Basically anything that isn't
one-file-per-reflog would be a welcome change. :)

It has also been a dream of mine to stop tying the reflogs specifically
to the refs. I.e., have a spot for reflogs of branches that no longer
exist, which allows us to retain them for deleted branches. Then you can
possibly recover from a branch deletion, whereas now you have to dig
through "git fsck"'s dangling output. And the reflog, if you don't
expire it, becomes a suitable audit log to find out what happened to
each branch when (whereas now it is full of holes when things get
deleted).

Does your solution handle something like that?

I was thinking of actually moving to a log-structured ref storage.
Something like:

  - any ref write puts a line at the end of a single logfile that
    contains the ref name, along with the normal reflog data

  - the logfile is the source of truth for the ref state. If you want to
    know the value of any ref, you can read it backwards to find the
    last entry for the ref. Everything else is an optimization.

    Let's call the number of refs N, and the number of ref updates in
    the log U.

  - we keep a key/value index mapping the name of any branch that exists
    to the byte offset of its entry in the logfile. This would probably
    be in some binary key/value store (like LMDB). Without this,
    resolving a ref is O(U), which is horrible. With it, it should be
    O(1) or O(lg N), depending on the index data structure.

  - the index can also contain other optimizations. E.g., rather than
    point to the entry in the logfile, it can include the sha1 directly
    (to avoid an extra level of indirection). It may want to include the
    "peeled" value, as the current packed-refs file does.

  - Reading all of the reflogs (e.g., for repacking) is O(U), just like
    it is today. Except the storage for the logfile is a lot more
    compact than what we store today, with one reflog per file.

  - Reading a single reflog is _also_ O(U), which is not as good as
    today. But if each log entry contains a byte offset of the previous
    entry, you can follow the chain (it is still slightly worse, because
    you are jumping all over the file, rather than reading a compact set
    of lines).

  - Pruning the reflog entries from the logfile requires rewriting the
    whole thing. That's similar to today, where we rewrite each of the
    reflog files.

One of the nice properties of this system is that it should be very
resilient to corruption and races. Most of the operations are either
appending to a file, or writing to a tempfile and renaming in place.
The exception is the key/value index, but if we run into any problems
there, it can be rebuilt by walking over the logfile (for a cost of
O(U)).
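
To make that concrete, here is a sketch of the two record types
(struct names and field layout are mine, purely illustrative):

  #include <stdint.h>

  /* One record appended to the logfile per ref update. */
  struct log_record {
      uint64_t prev_offset;       /* previous entry for this ref, 0 if none */
      unsigned char old_sha1[20];
      unsigned char new_sha1[20];
      /* committer, timestamp, tz, and message follow, as in today's reflog */
  };

  /* Index entry, keyed by refname in the key/value store. */
  struct index_entry {
      uint64_t log_offset;        /* O(1) or O(lg N) jump to latest record */
      unsigned char sha1[20];     /* cached current value; skips the log */
      unsigned char peeled[20];   /* like the "^" lines in packed-refs */
  };

Resolving a ref is one index lookup; walking a single ref's reflog is
a chase down prev_offset until it hits zero.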

I dunno. Maybe I am overthinking it. But it really feels like the _refs_
are a key/value thing, but the _reflogs_ are not. You can cram them into
a key/value store, but you're probably operating on them as a big blob,
then.

> I chose to use LMDB for the database.  LMDB has a few features that make
> it suitable for usage in git:

One of the complaints that Shawn had about sqlite is that there is no
native Java implementation, which makes it hard for JGit to ship a
compatible backend. I suspect the same is true for LMDB, but it is
probably a lot simpler than sqlite (so reimplementation might be
possible).

But it may also be worth going with a slightly slower database if we can
get wider compatibility for free.

> To test this backend's correctness, I hacked test-lib.sh and
> test-lib-functions.sh to run all tests under the refs backend. Dozens
> of tests use manual ref/reflog reading/writing, or create submodules
> without passing --refs-backend-type to git init.  If those tests are
> changed to use the update-ref machinery or test-refs-be-db (or, in the
> case of packed-refs, corrupt refs, and dumb fetch tests, are skipped),
> the only remaining failing tests are the git-new-workdir tests and the
> gitweb tests.

I think we'll need to bump core.repositoryformatversion, too. See the
patches I just posted here:

  http://thread.gmane.org/gmane.comp.version-control.git/272447

-Peff


* Re: RFC/Pull Request: Refs db backend
  2015-06-23 11:47 ` Jeff King
@ 2015-06-23 13:10   ` Duy Nguyen
  2015-06-24  8:51     ` Jeff King
  2015-06-23 18:18   ` David Turner
  2015-06-24  6:09   ` Shawn Pearce
  2 siblings, 1 reply; 26+ messages in thread
From: Duy Nguyen @ 2015-06-23 13:10 UTC (permalink / raw)
  To: Jeff King; +Cc: David Turner, git mailing list

On Tue, Jun 23, 2015 at 6:47 PM, Jeff King <peff@peff.net> wrote:
> I was thinking of actually moving to a log-structured ref storage.
> Something like:
>
>   - any ref write puts a line at the end of a single logfile that
>     contains the ref name, along with the normal reflog data
>
>   - the logfile is the source of truth for the ref state. If you want to
>     know the value of any ref, you can read it backwards to find the
>     last entry for the ref. Everything else is an optimization.
>
>     Let's call the number of refs N, and the number of ref updates in
>     the log U.
>
>   - we keep a key/value index mapping the name of any branch that exists
>     to the byte offset of its entry in the logfile. This would probably

One key/value mapping per branch, pointing to the latest reflog entry,
or one key/value mapping for each reflog entry?

>     be in some binary key/value store (like LMDB). Without this,
>     resolving a ref is O(U), which is horrible. With it, it should be
>     O(1) or O(lg N), depending on the index data structure.

I'm thinking of the user with small or medium repos, in terms of refs,
who does not want an extra dependency. If we store one mapping per
branch, then the size of this mapping is small enough that the index
in a text file is ok. If we also store the offset to the previous
reflog entry of the same branch in the current reflog entry, like a
back pointer, then we could jump back faster.

Or do you have something else in mind? The current reflog structure
won't work, because I think you bring back the reflog graveyard with
this, and I don't want to lose that.

>   - the index can also contain other optimizations. E.g., rather than
>     point to the entry in the logfile, it can include the sha1 directly
>     (to avoid an extra level of indirection). It may want to include the
>     "peeled" value, as the current packed-refs file does.
>
>   - Reading all of the reflogs (e.g., for repacking) is O(U), just like
>     it is today. Except the storage for the logfile is a lot more
>     compact than what we store today, with one reflog per file.
>
>   - Reading a single reflog is _also_ O(U), which is not as good as
>     today. But if each log entry contains a byte offset of the previous
>     entry, you can follow the chain (it is still slightly worse, because
>     you are jumping all over the file, rather than reading a compact set
>     of lines).
>
>   - Pruning the reflog entries from the logfile requires rewriting the
>     whole thing. That's similar to today, where we rewrite each of the
>     reflog files.
-- 
Duy


* Re: RFC/Pull Request: Refs db backend
  2015-06-23  0:50 RFC/Pull Request: Refs db backend David Turner
  2015-06-23  5:36 ` Junio C Hamano
  2015-06-23 11:47 ` Jeff King
@ 2015-06-23 15:51 ` Michael Haggerty
  2015-06-23 19:53   ` David Turner
  2015-06-23 17:16 ` Stefan Beller
  3 siblings, 1 reply; 26+ messages in thread
From: Michael Haggerty @ 2015-06-23 15:51 UTC (permalink / raw)
  To: David Turner, git mailing list

On 06/23/2015 02:50 AM, David Turner wrote:
> I've revived and modified Ronnie Sahlberg's work on the refs db
> backend.  
> 
> The work is on top of be3c13e5564, Junio's "First batch for 2.5 cycle".
> I recognize that there have been changes to the refs code since then,
> and that there are some further changes in-flight from e.g. Michael
> Haggerty.  If there is interest in this, I can rebase once Michael's
> changes land.

It's awesome that you are working on this!

I'm reading through your commits and will add comments as they pop into
my head...

* I initially read "refs-be-files" to be a short version of "references,
they be files". I might never be able to get that pronunciation out of
my head :-)

* It would be more modest to call the files implementing the LMDB
backend "refs-be-lmdb.[c,h]" rather than "refs-be-db.[c,h]".

* I wonder whether `refname_is_safe()` might eventually have to become
backend-specific. For example, maybe one backend will have to impose a
limit of 128 characters or something? No matter, though...it can be
moved later.

* You have put `format_reflog_msg()` in the public interface. It
probably makes sense, because more than one backend might want to use
it. But another backend might want to store (refname, old_sha1,
new_sha1, ...) as separate columns in a database. As long as
`format_reflog_msg()` is seen as a helper function and is not called by
any of the shared code, it shouldn't be a problem.

* I wonder whether `init_backend()` will be general enough. I'm thinking
by analogy with object constructors, which usually need class-specific
arguments during their initialization even though afterwards objects of
different classes can be used interchangeably. So I guess the idea is
that a typical `init_backend()` function will have to dig around itself
to find whatever additional information that it needs (e.g., from the
git configuration or the filesystem or whatever). So I think this is OK.

* Your "methods for bulk updates" are I think analogous to the
`initial_ref_transaction_commit()` function that I recently submitted
[1]. Either way, the goal is to abstract away the fact that the
file-based backend uses packed and loose references with tradeoffs that
callers currently have to know about.

* I don't like the fact that you have replaced `struct ref_transaction
*` with `void *` in the public interface. On a practical level, I like
the bit of type-safety that comes with the more specific declaration.
But on a more abstract level, I think that the concept of a transaction
could be useful across backends, for example in utility functions that
verify that a proposed set of updates are internally consistent. I would
rather see either

  * backends "extend" a basic `struct ref_transaction` to suit their
needs, and upcast/downcast pointers at the module boundary, or

  * `struct ref_transaction` itself gets a `void *` member that backends
can use for whatever purposes they want.
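
In code, the first option is the usual C embedding idiom.  A sketch,
with made-up fields:

  #include <lmdb.h>

  struct ref_transaction {
      unsigned int state;              /* whatever turns out to be shared */
  };

  struct lmdb_ref_transaction {
      struct ref_transaction base;     /* must be the first member */
      MDB_txn *txn;
  };

  /* The casts live only at the module boundary: */
  static struct lmdb_ref_transaction *lmdb_txn(struct ref_transaction *t)
  {
      return (struct lmdb_ref_transaction *)t;
  }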

* Regarding MERGE_HEAD: you take the point of view that it must continue
to be stored as a file. And yet it must also behave somewhat like a
reference; for example, `git rev-parse MERGE_HEAD` works today.
MERGE_HEAD is also used for reachability, right?

Another point of view is that MERGE_HEAD is a plain old boring
reference, but there is some other metadata related to it that the refs
backend has to store. The file-based backend would have special-case
code to read the additional data from the tail of the loose refs file
(and be sure to write the metadata when writing the reference), but
other backends could store the reference with the rest but do their own
thing with the metadata. So I guess I'm wondering whether the refs API
needs a MERGE_HEAD-specific way to read and write MERGE_HEAD along with
its metadata.

* Don't the same considerations that apply to MERGE_HEAD also apply to
FETCH_HEAD?

* I'm showing my ignorance of LMDB, but I don't see where the LMDB
backend initializes its database during `git init-db`. Is that what
`init_env()` does? But it looks like `init_env()` is called on every git
invocation (via `git_config_early()`). Puzzled.

* Meanwhile, `create_default_files()` (in `builtin/init-db`) still
creates directories `refs`, `refs/heads`, and `refs/tags`.

* Rehash of the last two points: I expected one backend function that is
used to initialize the refs backend when a new repository is created
(e.g., in `git init`). The file-based backend would use this function to
create the `refs`, `refs/heads`, and `refs/tags` directories. I expected
a second function that is called once every time git runs in an existing
repository (this one might, for example, open a database connection).
And maybe even a third one that closes down the database connection
before git exits. Would you please explain how this actually works?

* `lmdb_init_backend()` leaks `path` if `env` is already set (in which
case it needn't compute `path` in the first place).

* You have the constraint that submodules need to use the same reference
backend as the main repository. But it looks like each submodule has its
own independent database. So why the restriction?

It might be a sign that the design is not optimal if it is only able to
handle one *type* of reference backend in a single run.

In object-oriented language, I would expect a `Refs` object to represent
the reference storage for a single repository or submodule. The VTABLE
for the object would be a `struct ref_be`. But the object should also
have an object pointer that can store per-instance data. I think the
per-instance data is missing in your design.
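
Roughly, with illustrative names:

  struct ref_be;    /* the method table: resolve, iterate, commit, ... */

  struct refs {
      const struct ref_be *be;    /* vtable, shared per backend *type* */
      void *instance_data;        /* per-repository state, e.g. a
                                   * ref_cache or an open LMDB env */
  };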

When I start up, I would instantiate a `Refs` instance for the main
repository, which creates either a `FileRefs` or a `LMDBRefs` instance.
This instance would be used to access the refs for the main repository.
It could be stored in a global variable, similarly to how `ref_cache` is
currently stored. (Indeed, `struct ref_cache` would be subsumed into
`FileRefs`.)

If I want to access references in a submodule, there would again be an
initialization process for that sub-repository, which checks the type of
references backend that the repository uses and instantiates either a
`FileRefs` or a `LMDBRefs` instance to represent *that* repository. The
submodule instances could be stored by submodule name in a lookup table,
like submodule_ref_caches is currently stored. Since it has its own
instance data, the `Refs` instance for a submodule can coexist with the
`Refs` instance for the main repository even if they are of different types.

But even aside from supporting heterogeneous submodules, I think adding
instance data to the design would make it quite a bit more flexible,
because it would also allow reference backends to be composed. For
example, one could implement a reference backend that maps
`refs/remotes/origin` to a namespace in another repository, for a
transparent view of an upstream repo (using alternates to share objects)
that doesn't have to be updated when references are updated in the
origin. Or we could implement loose and packed references as two
separate backends that are layered on each other. Or we could implement
a lightweight "mirror" clone with copy-on-write semantics for
references. We could arrange to store all of the references for a
top-level repo and its submodules in a single database, potentially
allowing atomic upgrades across repositories [2].

* You explain in the docstring to `lmdb_transaction_begin_flags()` that
it is dangerous to call a callback function while a write transaction is
open if the callback function might want to initiate another write
transaction. This would obviously also apply to running hooks. This is a
limitation of LMDB because writers block each other. I can't think of
anyplace that this would be a problem in our codebase. But if it were,
it seems to me that you could take an approach like the file-based
backend, which collects the transaction in a `ref_transaction` data
structure, and executes the entire transaction within a single call to
`ref_transaction_commit()`. This approach would prevent callers outside
of the refs module from ever bumping against this limitation.

So, that was my stream-of-consciousness about your patch series. Overall
I like it very much. I have only skimmed it so far, and hardly read the
last two patches at all, but what I saw all looked good and well-organized.

Please CC me on future versions of this patch series, because it is very
close to my interests. I've put a lot of effort into encapsulating and
abstracting the refs module with the goal of getting to pluggable
reference backends (plus some other stuff), so it's great to see what
you have accomplished!

Let me know if you need any help rebasing your work onto my recent
changes. It would probably work best if you break your patch series into
smaller pieces to make them easier for the mailing list to digest. For
example, the first installment could be the patches that make sense even
independent of the plan to add support for multiple backends: the first
two patches, plus the ones related to CHERRY_PICK_HEAD and its cousins,
the abstraction of the reflog functions, and the `git reflog create` and
`git reflog exists` subcommands.

Michael

[1] http://article.gmane.org/gmane.comp.version-control.git/272362
[2] To implement this feature, it might be necessary to make the `Refs`
instance for the main repository responsible for instantiating the
`Refs` instance for submodules.

-- 
Michael Haggerty
mhagger@alum.mit.edu


* Re: RFC/Pull Request: Refs db backend
  2015-06-23  0:50 RFC/Pull Request: Refs db backend David Turner
                   ` (2 preceding siblings ...)
  2015-06-23 15:51 ` Michael Haggerty
@ 2015-06-23 17:16 ` Stefan Beller
  2015-06-23 20:04   ` David Turner
  3 siblings, 1 reply; 26+ messages in thread
From: Stefan Beller @ 2015-06-23 17:16 UTC (permalink / raw)
  To: David Turner; +Cc: git mailing list, ronnie sahlberg

[+<ronniesahlberg@gmail.com>, FYI]

On Mon, Jun 22, 2015 at 5:50 PM, David Turner <dturner@twopensource.com> wrote:
> I've revived and modified Ronnie Sahlberg's work on the refs db
> backend.

Awesome!

>
> The work is on top of be3c13e5564, Junio's "First batch for 2.5 cycle".
> I recognize that there have been changes to the refs code since then,
> and that there are some further changes in-flight from e.g. Michael
> Haggerty.  If there is interest in this, I can rebase once Michael's
> changes land.

Originally I wanted to continue Ronnie's work, but because of the churn
in refs I stopped for a while and took care of other projects (and wanted
to come back eventually). Thanks for reviving this topic!

> The changes can be found here:
> https://github.com/dturner-tw/git.git on the dturner/pluggable-backends
> branch
>
> The db backend code was added in the penultimate commit; the rest is
> just code rearrangement and minor changes to make alternate backends
> possible.  There ended up being a fair amount of this rearrangement, but
> the end result is that almost the entire git test suite runs under the
> db backend without error (see below for details).

Looking at the end result in refs-be-db.c, it feels like there are more
functions in the refs_be_db struct; did this originate from other design
choices? IIRC Ronnie wanted to have as few functions in there as
possible, and to share as much of the code between the databases as
possible, so that the glue between the db and the refs code is minimal.

Some random comments from looking over the branch briefly:

In the latest commit (refs: tests for db backend), I am unsure about the
copyright annotations. At least a sole "Copyright (c) 2007 Junio C Hamano"
doesn't make sense to me. ;)

Typo in commit message "bisect: use refs insfrastructure for BISECT_START"

Some commits contain a ChangeId, which is a Gerrit leftover. :(

Thanks,
Stefan

>
> The db backend runs git for-each-ref about 30% faster than the files
> backend with fully-packed refs on a repo with ~120k refs.  It's also
> about 4x faster than using fully-unpacked refs.  In addition, and
> perhaps more importantly, it avoids case-conflict issues on OS X.
>
> I chose to use LMDB for the database.  LMDB has a few features that make
> it suitable for usage in git:
>
> 1. It is relatively lightweight; it requires only one header file, and
> the library itself is under 300k (as opposed to 700k for
> e.g. sqlite).
>
> 2. It is well-tested: it's been used in OpenLDAP for years.
>
> 3. It's very fast.  LMDB's benchmarks show that it is among
> the fastest key-value stores.
>
> 4. It has a relatively simple concurrency story; readers don't
> block writers and writers don't block readers.
>
> Ronnie Sahlberg's original version of this patchset used tdb.  The
> advantage of tdb is that it's smaller (~125k).  The disadvantages are
> that tdb is hard to build on OS X and isn't in Homebrew.  So LMDB
> seemed simpler.
>
> To test this backend's correctness, I hacked test-lib.sh and
> test-lib-functions.sh to run all tests under the refs backend. Dozens
> of tests use manual ref/reflog reading/writing, or create submodules
> without passing --refs-backend-type to git init.  If those tests are
> changed to use the update-ref machinery or test-refs-be-db (or, in the
> case of packed-refs, corrupt refs, and dumb fetch tests, are skipped),
> the only remaining failing tests are the git-new-workdir tests and the
> gitweb tests.
>
> Please let me know how it would be best to proceed.


* Re: RFC/Pull Request: Refs db backend
  2015-06-23  5:36 ` Junio C Hamano
  2015-06-23 10:23   ` Duy Nguyen
@ 2015-06-23 17:29   ` David Turner
  1 sibling, 0 replies; 26+ messages in thread
From: David Turner @ 2015-06-23 17:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git mailing list

On Mon, 2015-06-22 at 22:36 -0700, Junio C Hamano wrote:
> David Turner <dturner@twopensource.com> writes:
> 
> > I've revived and modified Ronnie Sahlberg's work on the refs db
> > backend.  
> >
> > The work is on top of be3c13e5564, Junio's "First batch for 2.5 cycle".
> > I recognize that there have been changes to the refs code since then,
> > and that there are some further changes in-flight from e.g. Michael
> > Haggerty.  If there is interest in this, I can rebase once Michael's
> > changes land.
> > ...
> > The db backend runs git for-each-ref about 30% faster than the files
> > backend with fully-packed refs on a repo with ~120k refs.  It's also
> > about 4x faster than using fully-unpacked refs.  In addition, and
> > perhaps more importantly, it avoids case-conflict issues on OS X.
> >
> > I chose to use LMDB for the database...
> > ...
> > Ronnie Sahlberg's original version of this patchset used tdb.  The
> > advantage of tdb is that it's smaller (~125k).  The disadvantages are
> > that tdb is hard to build on OS X and isn't in Homebrew.  So LMDB
> > seemed simpler.
> 
> "If there is interest"?  Shut up and take my money ;-)
> 
> More seriously, that's great that you stepped up to resurrect this
> topic.  In a sense, the choice of sample database backend does not
> matter.  I do not care if it is tdb, lmdb, or even Berkeley DB as
> long as it functions. ;-)
> 
> As long as the interface between ref-transaction system on the Git
> side and the database backend is designed right, your lmdb thing can
> serve as a reference implementation for other people to plug other
> database backends to the same interface, right? 

Yes.

>  As one step to
> validate the interface to the database backends, it would be nice to
> eventually have at least two backends that talk to meaningfully
> different systems, but we have to start somewhere, and "for now we
> have lmdb" is as good a place to start as any other db backend.
> 
> I wonder if we can do a "filesystem" backend on top of the same
> backend interface---is that too much impedance mismatch to make it
> impractical?

The patch series does include a filesystem backend, which is simply the
current ref infrastructure with extremely minor changes.  


* Re: RFC/Pull Request: Refs db backend
  2015-06-23 11:47 ` Jeff King
  2015-06-23 13:10   ` Duy Nguyen
@ 2015-06-23 18:18   ` David Turner
  2015-06-24  9:14     ` Jeff King
  2015-06-24  6:09   ` Shawn Pearce
  2 siblings, 1 reply; 26+ messages in thread
From: David Turner @ 2015-06-23 18:18 UTC (permalink / raw)
  To: Jeff King; +Cc: git mailing list

On Tue, 2015-06-23 at 07:47 -0400, Jeff King wrote:
> On Mon, Jun 22, 2015 at 08:50:56PM -0400, David Turner wrote:
> 
> > The db backend runs git for-each-ref about 30% faster than the files
> > backend with fully-packed refs on a repo with ~120k refs.  It's also
> > about 4x faster than using fully-unpacked refs.  In addition, and
> > perhaps more importantly, it avoids case-conflict issues on OS X.
> 
> Neat.
> 
> Can you describe a bit more about the reflog handling?
> 
> One of the problems we've had with large-ref repos is that the reflog
> storage is quite inefficient. You can pack all the refs, but you may
> still be stuck with a bunch of reflog files with one entry, wasting a
> whole inode. Doing a "git repack" when you have a million of those has
> horrible cold-cache performance. Basically anything that isn't
> one-file-per-reflog would be a welcome change. :)

Reflogs are stored in the database as well.  There is one header entry
per ref to indicate that a reflog is present, and then one database
entry per reflog entry; the entries are stored consecutively and
immediately following the header so that it's fast to iterate over them.
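
Schematically, the key space looks something like this (purely
illustrative, not the exact byte encoding):

  refs/heads/master              -> <sha1>
  logs/refs/heads/master         -> <reflog header>
  logs/refs/heads/master + <1>   -> <oldest reflog entry>
  logs/refs/heads/master + <2>   -> <next reflog entry>
  ...

Since LMDB keeps keys sorted in a B-tree, a ref's log entries end up
adjacent, and iterating over one reflog is a sequential cursor walk.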

> It has also been a dream of mine to stop tying the reflogs specifically
> to the refs. I.e., have a spot for reflogs of branches that no longer
> exist, which allows us to retain them for deleted branches. Then you can
> possibly recover from a branch deletion, whereas now you have to dig
> through "git fsck"'s dangling output. And the reflog, if you don't
> expire it, becomes a suitable audit log to find out what happened to
> each branch when (whereas now it is full of holes when things get
> deleted).

That would be cool, and I don't think it would be hard to add to my
current code; we could simply replace the header with a "tombstone".
But I would prefer to wait until the series is merged; then we can build
on top of it.

> I dunno. Maybe I am overthinking it. But it really feels like the _refs_
> are a key/value thing, but the _reflogs_ are not. You can cram them into
> a key/value store, but you're probably operating on them as a big blob,
> then.

Reflogs are, conceptually, queues. I agree that a raw key-value store is
not a good way to store queues, but a B-Tree is not so terrible, since
it offers relatively fast iteration (amortized constant time IIRC).

> > I chose to use LMDB for the database.  LMDB has a few features that make
> > it suitable for usage in git:
> 
> One of the complaints that Shawn had about sqlite is that there is no
> native Java implementation, which makes it hard for JGit to ship a
> compatible backend. I suspect the same is true for LMDB, but it is
> probably a lot simpler than sqlite (so reimplementation might be
> possible).
> 
> But it may also be worth going with a slightly slower database if we can
> get wider compatibility for free.

There's a JNI interface to LMDB, which is, of course, not native.  I
don't think it would be too hard to entirely rewrite LMDB in Java, but
I'm not going to have time to do it for the foreseeable future.  I've
asked Howard Chu if he knows of any efforts in progress.

> > To test this backend's correctness, I hacked test-lib.sh and
> > test-lib-functions.sh to run all tests under the refs backend. Dozens
> > of tests use manual ref/reflog reading/writing, or create submodules
> > without passing --refs-backend-type to git init.  If those tests are
> > changed to use the update-ref machinery or test-refs-be-db (or, in the
> > case of packed-refs, corrupt refs, and dumb fetch tests, are skipped),
> > the only remaining failing tests are the git-new-workdir tests and the
> > gitweb tests.
> 
> I think we'll need to bump core.repositoryformatversion, too. See the
> patches I just posted here:
> 
>   http://thread.gmane.org/gmane.comp.version-control.git/272447

Thanks, that's valuable.  For the refs backend, opening the LMDB
database for writing is sufficient to block other writers.  Do you think
it would be valuable to provide a git hold-ref-lock command that simply
reads refs from stdin and keeps them locked until it reads EOF from
stdin?  That would allow cross-backend ref locking. 


* Re: RFC/Pull Request: Refs db backend
  2015-06-23 10:23   ` Duy Nguyen
@ 2015-06-23 18:47     ` David Turner
  0 siblings, 0 replies; 26+ messages in thread
From: David Turner @ 2015-06-23 18:47 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Junio C Hamano, git mailing list

On Tue, 2015-06-23 at 17:23 +0700, Duy Nguyen wrote:
> On Tue, Jun 23, 2015 at 7:50 AM, David Turner
> <dturner@twopensource.com> wrote:
> > To test this backend's correctness, I hacked test-lib.sh and
> > test-lib-functions.sh to run all tests under the refs backend.
> 
> Now we have two. split-index also benefits from running through the
> full test suite like this. I propose we make "make test" run the test
> suite twice. The first run is with the default configuration: no split
> index, no fancy ref backend. The second run enables split-index and
> switches to the new backend, running through all test cases. In the
> future we can also enable packv4 in this second run. There won't be a
> third run.
> 
> When the second ref backend comes, we can switch between the two
> backends using a random number generator where we control both
> algorithm and seed, so that when a test fails, the user can give us
> their seed and we can re-run with the same configuration.

I'm not in love with this idea, because it makes it hard to do
exhaustive testing efficiently.  I would rather have make test run
through all tests under all combinations -- or at least all relevant
tests.  We could perhaps mark tests with a list of features that they
exercise, so that we don't have to run e.g. t8xxx with alternate refs
backends.  

> > Dozens of tests use manual ref/reflog reading/writing, or create
> > submodules without passing --refs-backend-type to git init.  If those
> > tests are changed to use the update-ref machinery or test-refs-be-db
> > (or, in the case of packed-refs, corrupt refs, and dumb fetch tests,
> > are skipped), the only remaining failing tests are the
> > git-new-workdir tests and the gitweb tests.
> 
> I haven't read the series, but I guess you should also add a few tests
> to run on the first run, so new code is exercised a bit even if people
> skip the second run.

I did this already, yes.


* Re: RFC/Pull Request: Refs db backend
  2015-06-23 15:51 ` Michael Haggerty
@ 2015-06-23 19:53   ` David Turner
  2015-06-23 21:27     ` Michael Haggerty
  2015-06-23 21:35     ` David Turner
  0 siblings, 2 replies; 26+ messages in thread
From: David Turner @ 2015-06-23 19:53 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: git mailing list

On Tue, 2015-06-23 at 17:51 +0200, Michael Haggerty wrote:
> On 06/23/2015 02:50 AM, David Turner wrote:
> > I've revived and modified Ronnie Sahlberg's work on the refs db
> > backend.  
> > 
> > The work is on top of be3c13e5564, Junio's "First batch for 2.5 cycle".
> > I recognize that there have been changes to the refs code since then,
> > and that there are some further changes in-flight from e.g. Michael
> > Haggerty.  If there is interest in this, I can rebase once Michael's
> > changes land.
> 
> It's awesome that you are working on this!
> 
> I'm reading through your commits and will add comments as they pop into
> my head...
> 
> * I initially read "refs-be-files" to be a short version of "references,
> they be files". I might never be able to get that pronunciation out of
> my head :-)

That's OK so long as I can keep pronouncing "reflog" as "re-flog". ;)

> * It would be more modest to call the files implementing the LMDB
> backend "refs-be-lmdb.[c,h]" rather than "refs-be-db.[c,h]".

Agreed.  Will fix.

> * I wonder whether `refname_is_safe()` might eventually have to become
> backend-specific. For example, maybe one backend will have to impose a
> limit of 128 characters or something? No matter, though...it can be
> moved later.

I think it would be an error to allow backends to impose additional
limits on ref names.  The limits imposed by the current backend have
been the cause of much sadness here at Twitter (primarily,
case-conflicts combined with d/f conflicts).

> * You have put `format_reflog_msg()` in the public interface. It
> probably makes sense, because more than one backend might want to use
> it. But another backend might want to store (refname, old_sha1,
> new_sha1, ...) as separate columns in a database. As long as
> `format_reflog_msg()` is seen as a helper function and is not called by
> any of the shared code, it shouldn't be a problem.

Agreed.

> * I wonder whether `init_backend()` will be general enough. 

We can always upgrade it later.

> * Your "methods for bulk updates" are I think analogous to the
> `initial_ref_transaction_commit()` function that I recently submitted
> [1]. Either way, the goal is to abstract away the fact that the
> file-based backend uses packed and loose references with tradeoffs that
> callers currently have to know about.

Yes, I saw your work after I had already started mine.

> * I don't like the fact that you have replaced `struct ref_transaction
> *` with `void *` in the public interface. On a practical level, I like
> the bit of type-safety that comes with the more specific declaration.
> But on a more abstract level, I think that the concept of a transaction
> could be useful across backends, for example in utility functions that
> verify that a proposed set of updates are internally consistent. I would
> rather see either
> 
>   * backends "extend" a basic `struct ref_transaction` to suit their
> needs, and upcast/downcast pointers at the module boundary, or
> 
>   * `struct ref_transaction` itself gets a `void *` member that backends
> can use for whatever purposes they want.

There are no common fields between refs-be-file transactions and
refs-be-lmdb transactions.  I don't see much gain from adding an empty
ref_transaction that backends could extend, since we would have to
explicitly upcast/downcast all over the place.

> * Regarding MERGE_HEAD: you take the point of view that it must continue
> to be stored as a file. And yet it must also behave somewhat like a
> reference; for example, `git rev-parse MERGE_HEAD` works today.
> MERGE_HEAD is also used for reachability, right?
> 
> Another point of view is that MERGE_HEAD is a plain old boring
> reference, but there is some other metadata related to it that the refs
> backend has to store. The file-based backend would have special-case
> code to read the additional data from the tail of the loose refs file
> (and be sure to write the metadata when writing the reference), but
> other backends could store the reference with the rest but do their own
> thing with the metadata. So I guess I'm wondering whether the refs API
> needs a MERGE_HEAD-specific way to read and write MERGE_HEAD along with
> its metadata.

You are probably right that this is a good idea.

> * Don't the same considerations that apply to MERGE_HEAD also apply to
> FETCH_HEAD?

All of the tests pass without any special handling of FETCH_HEAD.

> * I'm showing my ignorance of LMDB, but I don't see where the LMDB
> backend initializes its database during `git init-db`. Is that what
> `init_env()` does? But it looks like `init_env()` is called on every git
> invocation (via `git_config_early()`). Puzzled.

There is no need to explicitly create the database (other than with
mkdir); init_env does everything for you.

> * Meanwhile, `create_default_files()` (in `builtin/init-db`) still
> creates directories `refs`, `refs/heads`, and `refs/tags`.

Yeah, that's legit.  I'll create a backend initdb function, and rename
init to setup.

> * Rehash of the last two points: I expected one backend function that is
> used to initialize the refs backend when a new repository is created
> (e.g., in `git init`). The file-based backend would use this function to
> create the `refs`, `refs/heads`, and `refs/tags` directories. I expected
> a second function that is called once every time git runs in an existing
> repository (this one might, for example, open a database connection).
> And maybe even a third one that closes down the database connection
> before git exits. Would you please explain how this actually works?

LMDB doesn't really have the concept of a "connection".  It's basically
just a couple of files that communicate using shared memory (and maybe
some other locking that I haven't paid attention to).  There is the
concept of a "transaction", which is the unit of concurrency (each
thread may only have one open transaction).  Transactions are either
read-only or read-write, and there can only be one read-write
transaction open at a time (across the entire system).  Read-only
transactions take a snapshot of the DB state at transaction start time.
This combination of features means that we need to be a bit clever about
read-only transactions; if a read-write transaction occurs in a separate
process, we need to restart any read-only transactions to pick up its
changes.
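
LMDB makes that restart cheap: read-only transactions can be recycled
with mdb_txn_reset()/mdb_txn_renew().  Roughly (error handling
omitted):

  #include <lmdb.h>

  /* Refresh a read-only snapshot after another process commits. */
  static void refresh_snapshot(MDB_env *env, MDB_txn **read_txn)
  {
      if (*read_txn) {
          mdb_txn_reset(*read_txn);    /* release the old snapshot */
          mdb_txn_renew(*read_txn);    /* reacquire at the current state */
      } else {
          mdb_txn_begin(env, NULL, MDB_RDONLY, read_txn);
      }
  }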

Requiring an explicit disconnect from the database would be problematic
because of the number of situations in which git just calls die(). If a
backend desires a disconnect, it would be best to just call atexit().

> * `lmdb_init_backend()` leaks `path` if `env` is already set (in which
> case it needn't compute `path` in the first place).

Will fix, thanks.

> * You have the constraint that submodules need to use the same reference
> backend as the main repository. But it looks like each submodule has its
> own independent database. So why the restriction?
>
> It might be a sign that the design is not optimal if it is only able to
> handle one *type* of reference backend in a single run.

Yes, that is the reason.  I think it would be rather difficult to fix
this, but I guess it's possible.

> In object-oriented language, I would expect a `Refs` object to represent
> the reference storage for a single repository or submodule. The VTABLE
> for the object would be a `struct ref_be`. But the object should also
> have an object pointer that can store per-instance data. I think the
> per-instance data is missing in your design.

For some of the code, that's the transaction.  But since we only ever
have one transaction, we could just move all that to the `Refs` object.

<snip arguments for this> 

I'll try to write some code and see what this looks like.

> * You explain in the docstring to `lmdb_transaction_begin_flags()` that
> it is dangerous to call a callback function while a write transaction is
> open if the callback function might want to initiate another write
> transaction. This would obviously also apply to running hooks.

I carefully limit the scope of write transactions to avoid problems like
this.

>  This is a
> limitation of LMDB because writers block each other. I can't think of
> anyplace that this would be a problem in our codebase. But if it were,
> it seems to me that you could take an approach like the file-based
> backend, which collects the transaction in a `ref_transaction` data
> structure, and executes the entire transaction within a single call to
> `ref_transaction_commit()`. This approach would prevent callers outside
> of the refs module from ever bumping against this limitation.

The file-based backend does per-ref locking early, and then applies the
transactions. Here, taking the write transaction is how the lmdb backend
does locking.  So the situations are not quite the same.  But I think
keeping the scope of transactions small is the best plan.

> So, that was my stream-of-consciousness about your patch series. Overall
> I like it very much. I have only skimmed it so far, and hardly read the
> last two patches at all, but what I saw all looked good and well-organized.

Thanks.  A fair amount of it is Ronnie's work, and I tried to copy his
approach as much as possible.

> Please CC me on future versions of this patch series, because it is very
> close to my interests. I've put a lot of effort into encapsulating and
> abstracting the refs module with the goal of getting to pluggable
> reference backends (plus some other stuff), so it's great to see what
> you have accomplished!

Will do!

> Let me know if you need any help rebasing your work onto my recent
> changes. It would probably work best if you break your patch series into
> smaller pieces to make them easier for the mailing list to digest. For
> example, the first installment could be the patches that make sense even
> independent of the plan to add support for multiple backends: the first
> two patches, plus the ones related to CHERRY_PICK_HEAD and its cousins,
> the abstraction of the reflog functions, and the `git reflog create` and
> `git reflog exists` subcommands.

I would love some help rebasing. I'll break out the patches you suggest
and send them to the list, then create a new branch with the rest of the
changes.  Would that be a good place for you to start?


* Re: RFC/Pull Request: Refs db backend
  2015-06-23 17:16 ` Stefan Beller
@ 2015-06-23 20:04   ` David Turner
  2015-06-23 20:10     ` Randall S. Becker
  0 siblings, 1 reply; 26+ messages in thread
From: David Turner @ 2015-06-23 20:04 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git mailing list, ronnie sahlberg

On Tue, 2015-06-23 at 10:16 -0700, Stefan Beller wrote:
> > The db backend code was added in the penultimate commit; the rest is
> > just code rearrangement and minor changes to make alternate backends
> > possible.  There ended up being a fair amount of this rearrangement,
> > but the end result is that almost the entire git test suite runs
> > under the db backend without error (see below for details).
> 
> Looking at the end result in refs-be-db.c, it feels like there are more
> functions in the refs_be_db struct; did this originate from other design
> choices? IIRC Ronnie wanted to have as few functions in there as
> possible, and to share as much of the code between the databases as
> possible, so that the glue between the db and the refs code is minimal.

I didn't actually spend that much time reading Ronnie's backend code.
My code aims to be extremely thoroughly compatible.  I spent a ton of
time making sure that the git test suite passed.  I don't know if an
alternate approach would have been as compatible.

The requirement for reflog storage did complicate things a bit.

I also didn't see a strong need to abstract the database, since LMDB is
common, widely compatible, and tiny.  

> Some random comments from looking over the branch briefly:
> 
> In the latest commit (refs: tests for db backend), I am unsure about
> the copyright annotations. At least a sole "Copyright (c) 2007 Junio C
> Hamano" doesn't make sense to me. ;)

Will fix, thanks.

> Typo in commit message "bisect: use refs insfrastructure for 
> BISECT_START"

Will fix, thanks.

> Some commits contain a ChangeId, which is a Gerrit leftover. :(

Those were leftover from Ronnie's patches; since you are a Googler and
you think we don't need them, I'll remove them. 


* RE: RFC/Pull Request: Refs db backend
  2015-06-23 20:04   ` David Turner
@ 2015-06-23 20:10     ` Randall S. Becker
  2015-06-23 20:22       ` David Turner
  0 siblings, 1 reply; 26+ messages in thread
From: Randall S. Becker @ 2015-06-23 20:10 UTC (permalink / raw)
  To: 'David Turner', 'Stefan Beller'
  Cc: 'git mailing list', 'ronnie sahlberg'

> -----Original Message-----
> From: git-owner@vger.kernel.org [mailto:git-owner@vger.kernel.org] On
> Behalf Of David Turner
> Sent: June 23, 2015 4:05 PM
> To: Stefan Beller
> Cc: git mailing list; ronnie sahlberg
> Subject: Re: RFC/Pull Request: Refs db backend
> 
> On Tue, 2015-06-23 at 10:16 -0700, Stefan Beller wrote:
> > > The db backend code was added in the penultimate commit; the rest is
> > > just code rearrangement and minor changes to make alternate backends
> > > possible.  There ended up being a fair amount of this rearrangement,
> > > but the end result is that almost the entire git test suite runs
> > > under the db backend without error (see below for details).
> >
> > Looking at the end result in refs-be-db.c, it feels like there are more
> > functions in the refs_be_db struct; did this originate from other design
> > choices? IIRC Ronnie wanted to have as few functions in there as
> > possible, and to share as much of the code between the databases as
> > possible, so that the glue between the db and the refs code is minimal.
> 
> I didn't actually spend that much time reading Ronnie's backend code.
> My code aims to be extremely thoroughly compatible.  I spent a ton of time
> making sure that the git test suite passed.  I don't know if an alternate
> approach would have been as compatible.
> 
> The requirement for reflog storage did complicate things a bit.
> 
> I also didn't see a strong need to abstract the database, since LMDB is common,
> widely compatible, and tiny.

Just to beg a request: LMDB is not available on some MPP architectures
to which git has been ported. If it comes up, I beg you not to add this
as a dependency to base git components.

Thanks,
Randall

-- Brief whoami: NonStop&UNIX developer since approximately UNIX(421664400)/NonStop(211288444200000000)
-- In my real life, I talk too much.


* Re: RFC/Pull Request: Refs db backend
  2015-06-23 20:10     ` Randall S. Becker
@ 2015-06-23 20:22       ` David Turner
  2015-06-23 20:27         ` Randall S. Becker
  0 siblings, 1 reply; 26+ messages in thread
From: David Turner @ 2015-06-23 20:22 UTC (permalink / raw)
  To: Randall S. Becker
  Cc: 'Stefan Beller', 'git mailing list',
	'ronnie sahlberg'

> Just to beg a request: LMDB is not available on some MPP architectures
> to which git has been ported. If it comes up, I beg you not to add this
> as a dependency to base git components.

My changes make `configure` check for the presence of liblmdb. The LMDB
code is only built if liblmdb is present.  So, I think we're good.


* RE: RFC/Pull Request: Refs db backend
  2015-06-23 20:22       ` David Turner
@ 2015-06-23 20:27         ` Randall S. Becker
  0 siblings, 0 replies; 26+ messages in thread
From: Randall S. Becker @ 2015-06-23 20:27 UTC (permalink / raw)
  To: 'David Turner'
  Cc: 'Stefan Beller', 'git mailing list',
	'ronnie sahlberg'

> -----Original Message-----
> From: git-owner@vger.kernel.org [mailto:git-owner@vger.kernel.org] On
> Behalf Of David Turner
> Sent: June 23, 2015 4:22 PM
> To: Randall S. Becker
> Cc: 'Stefan Beller'; 'git mailing list'; 'ronnie sahlberg'
> Subject: Re: RFC/Pull Request: Refs db backend
> 
> > Just to beg a request: LMDB is not available on some MPP architectures
> > to which git has been ported. If it comes up, I beg you not to add this
> > as a dependency to base git components.
> 
> My changes make `configure` check for the presence of liblmdb. The LMDB
> code is only built if liblmdb is present.  So, I think we're good.

Thanks :) You have no idea how much, from a "burnt by that in other projects" POV.

Cheers,
Randall


* Re: RFC/Pull Request: Refs db backend
  2015-06-23 19:53   ` David Turner
@ 2015-06-23 21:27     ` Michael Haggerty
  2015-06-24 17:31       ` David Turner
  2015-06-23 21:35     ` David Turner
  1 sibling, 1 reply; 26+ messages in thread
From: Michael Haggerty @ 2015-06-23 21:27 UTC (permalink / raw)
  To: David Turner; +Cc: git mailing list

On 06/23/2015 09:53 PM, David Turner wrote:
> On Tue, 2015-06-23 at 17:51 +0200, Michael Haggerty wrote:
> [...]
>> * I don't like the fact that you have replaced `struct ref_transaction
>> *` with `void *` in the public interface. On a practical level, I like
>> the bit of type-safety that comes with the more specific declaration.
>> But on a more abstract level, I think that the concept of a transaction
>> could be useful across backends, for example in utility functions that
>> verify that a proposed set of updates are internally consistent. I would
>> rather see either
>>
>>   * backends "extend" a basic `struct ref_transaction` to suit their
>> needs, and upcast/downcast pointers at the module boundary, or
>>
>>   * `struct ref_transaction` itself gets a `void *` member that backends
>> can use for whatever purposes they want.
> 
> There are no common fields between refs-be-file transactions and
> refs-be-lmdb transactions.  I don't see much gain from adding an empty
> ref_transaction that backends could extend, since we would have to
> explicitly upcast/downcast all over the place.

If you ask me, it would be better to do a bunch of up/downcasts within
the single module (via two helper functions that could even do
consistency checks) than have no help from the compiler in preventing
people from passing unrelated pointer types into the `void *transaction`
argument. Plus the `struct ref_transaction *` variables scattered
throughout the code are a lot more self-explanatory than `void *`.
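
To make this concrete, here is a minimal sketch of the pattern I have
in mind (the member names are only illustrative):

    struct ref_transaction {
        /* common, backend-agnostic state could live here */
        unsigned int nr_updates;
    };

    struct lmdb_transaction {
        struct ref_transaction base; /* must be the first member */
        MDB_txn *txn;
    };

    static struct lmdb_transaction *to_lmdb_transaction(
        struct ref_transaction *trans)
    {
        /* a consistency check on a backend tag could go here */
        return (struct lmdb_transaction *)trans;
    }

The upcast back is just &t->base for an lmdb_transaction *t, so the
casts stay confined to the backend module.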

>> * Regarding MERGE_HEAD: you take the point of view that it must continue
>> to be stored as a file. And yet it must also behave somewhat like a
>> reference; for example, `git rev-parse MERGE_HEAD` works today.
>> MERGE_HEAD is also used for reachability, right?
>>
>> Another point of view is that MERGE_HEAD is a plain old boring
>> reference, but there is some other metadata related to it that the refs
>> backend has to store. The file-based backend would have special-case
>> code to read the additional data from the tail of the loose refs file
>> (and be sure to write the metadata when writing the reference), but
>> other backends could store the reference with the rest but do their own
>> thing with the metadata. So I guess I'm wondering whether the refs API
>> needs a MERGE_HEAD-specific way to read and write MERGE_HEAD along with
>> its metadata.
> 
> You are probably right that this is a good idea.
> 
>> * Don't the same considerations that apply to MERGE_HEAD also apply to
>> FETCH_HEAD?
> 
> All of the tests pass without any special handling of FETCH_HEAD.

That's odd. From git-fetch.txt:

    The names of refs that are fetched, together with the object names
    they point at, are written to `.git/FETCH_HEAD`.  This information
    may be used by scripts or other git commands, such as
    linkgit:git-pull[1].

It seems like the test suite is reading FETCH_HEAD via the refs API in a
couple of places. I don't understand why these don't fail when LMDB is
being used...

>> * I'm showing my ignorance of LMDB, but I don't see where the LMDB
>> backend initializes its database during `git init-db`. Is that what
>> `init_env()` does? But it looks like `init_env()` is called on every git
>> invocation (via `git_config_early()`). Puzzled.
> 
> There is no need to explicitly create the database (other than with
> mkdir); init_env does everything for you.

OK.

>> * Rehash of the last two points: I expected one backend function that is
>> used to initialize the refs backend when a new repository is created
>> (e.g., in `git init`). The file-based backend would use this function to
>> create the `refs`, `refs/heads`, and `refs/tags` directories. I expected
>> a second function that is called once every time git runs in an existing
>> repository (this one might, for example, open a database connection).
>> And maybe even a third one that closes down the database connection
>> before git exits. Would you please explain how this actually works?
> 
> LMDB doesn't really have the concept of a "connection".  It's basically
> just a couple of files that communicate using shared memory (and maybe
> some other locking that I haven't paid attention to).  There is the
> concept of a "transaction", which is the unit of concurrency (each
> thread may only have one open transaction).  Transactions are either
> read-only or read-write, and there can only be one read-write
> transaction open at a time (across the entire system).  Read-only
> transactions take a snapshot of the DB state at transaction start time.
> This combination of features means that we need to be a bit clever about
> read-only transactions; if a read-write transaction occurs in a separate
> process, we need to restart any read-only transactions to pick up its
> changes.
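
(If I understand you correctly, in LMDB API terms that dance is
roughly the following, with error handling omitted and env being the
already-opened MDB_env:

    MDB_txn *txn;

    /* take a read-only snapshot of the refs database */
    mdb_txn_begin(env, NULL, MDB_RDONLY, &txn);
    /* ... reads here see a frozen view of the refs ... */

    /*
     * after another process may have committed a write transaction,
     * drop the snapshot and take a fresh one to see its changes
     */
    mdb_txn_reset(txn);
    mdb_txn_renew(txn);

Correct me if that is not what the backend does.)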

If you are thinking about an *unrelated* separate process, then Git's
philosophy is that if our process is reading *some* valid state of the
references, it's all good even if that state is not quite the newest.
After all, who's to say whether our process ran before or after the
other process? As long as each process sees self-consistent views of the
world as it existed at some recent time, we're satisfied.

To be sure, we even fail at that unambitious goal, because loose
reference reads are not atomic. It is possible that we read some
references in the state they had before the other process ran, and
others in the state they had after the other process was finished. This
can get ugly if, for example, the other process renamed a reference,
because we might not see the reference under either its old *or* its new
name. We might therefore conclude that the objects reachable from that
reference are dangling and garbage-collect them.

If the other process is one that we started ourselves, then that is a
different situation and we would want, for example, to invalidate our
reference cache after the other process is done.

One of the long-standing hopes of a DB-backed reference backend would be
to improve this situation--allowing atomic writes *and* reads.

> [...]
> 
>> * You explain in the docstring to `lmdb_transaction_begin_flags()` that
>> it is dangerous to call a callback function while a write transaction is
>> open if the callback function might want to initiate another write
>> transaction. This would obviously also apply to running hooks.
> 
> I carefully limit the scope of write transactions to avoid problems like
> this.
> 
>>  This is a
>> limitation of LMDB because writers block each other. I can't think of
>> anyplace that this would be a problem in our codebase. But if it were,
>> it seems to me that you could take an approach like the file-based
>> backend, which collects the transaction in a `ref_transaction` data
>> structure, and executes the entire transaction within a single call to
>> `ref_transaction_commit()`. This approach would prevent callers outside
>> of the refs module from ever bumping against this limitation.
> 
> The file-based backend does per-ref locking early, and then applies the
> transactions. Here, taking the write transaction is how the lmdb backend
> does locking.  So the situations are not quite the same.  But I think
> keeping the scope of transactions small is the best plan.

The file-based backend locks the references early *within
ref_transaction_commit()*, not as the transaction is being built up
using ref_transaction_update() etc. This is a big difference, because
code anywhere can call

    ref_transaction_begin(...);
    ANYTHING
    ref_transaction_update(...);
    ANYTHING
    ref_transaction_commit(...);

The only way to be sure that ANYTHING can't create a deadlock with the
open transaction (for example by calling a hook script that runs a git
command) is to audit all of that code now and in the future. Whereas the
file-based backend doesn't do anything that is externally observable or
deadlocky except within the single ref_transaction_commit() function
call, so only that one function has to be audited for actions that could
cause a deadlock.

> [...]
>> Let me know if you need any help rebasing your work onto my recent
>> changes. It would probably work best if you break your patch series into
>> smaller pieces to make them easier for the mailing list to digest. For
>> example, the first installment could be the patches that make sense even
>> independent of the plan to add support for multiple backends: the first
>> two patches, plus the ones related to CHERRY_PICK_HEAD and its cousins,
>> the abstraction of the reflog functions, and the `git reflog create` and
>> `git reflog exists` subcommands.
> 
> I would love some help rebasing. I'll break out the patches you suggest
> and send them to the list, then create a new branch with the rest of the
> changes.  Would that be a good place for you to start?

That sounds like a good next step, maybe after waiting a day or so to
see if there are any fundamental objections to what you have done so far.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: RFC/Pull Request: Refs db backend
  2015-06-23 19:53   ` David Turner
  2015-06-23 21:27     ` Michael Haggerty
@ 2015-06-23 21:35     ` David Turner
  2015-06-23 21:41       ` Junio C Hamano
  1 sibling, 1 reply; 26+ messages in thread
From: David Turner @ 2015-06-23 21:35 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: git mailing list

On Tue, 2015-06-23 at 15:53 -0400, David Turner wrote:
> > * Regarding MERGE_HEAD: you take the point of view that it must continue
> > to be stored as a file. And yet it must also behave somewhat like a
> > reference; for example, `git rev-parse MERGE_HEAD` works today.
> > MERGE_HEAD is also used for reachability, right?
> > 
> > Another point of view is that MERGE_HEAD is a plain old boring
> > reference, but there is some other metadata related to it that the refs
> > backend has to store. The file-based backend would have special-case
> > code to read the additional data from the tail of the loose refs file
> > (and be sure to write the metadata when writing the reference), but
> > other backends could store the reference with the rest but do their own
> > thing with the metadata. So I guess I'm wondering whether the refs API
> > needs a MERGE_HEAD-specific way to read and write MERGE_HEAD along with
> > its metadata.
> 
> You are probably right that this is a good idea.

On reflection, I think it might make sense to keep MERGE_HEAD as a file.
The problem is that not only would refs backends have to add new
MERGE_HEAD-handling functions, but we would also need new plumbing
commands to allow scripts to access the complete contents of MERGE_HEAD.
That seems more complicated to me.  

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: RFC/Pull Request: Refs db backend
  2015-06-23 21:35     ` David Turner
@ 2015-06-23 21:41       ` Junio C Hamano
  0 siblings, 0 replies; 26+ messages in thread
From: Junio C Hamano @ 2015-06-23 21:41 UTC (permalink / raw)
  To: David Turner; +Cc: Michael Haggerty, git mailing list

David Turner <dturner@twopensource.com> writes:

> On Tue, 2015-06-23 at 15:53 -0400, David Turner wrote:
>> > * Regarding MERGE_HEAD: you take the point of view that it must continue
>> > to be stored as a file. And yet it must also behave somewhat like a
>> > reference; for example, `git rev-parse MERGE_HEAD` works today.
>> > MERGE_HEAD is also used for reachability, right?
>> > 
>> > Another point of view is that MERGE_HEAD is a plain old boring
>> > reference, but there is some other metadata related to it that the refs
>> > backend has to store. The file-based backend would have special-case
>> > code to read the additional data from the tail of the loose refs file
>> > (and be sure to write the metadata when writing the reference), but
>> > other backends could store the reference with the rest but do their own
>> > thing with the metadata. So I guess I'm wondering whether the refs API
>> > needs a MERGE_HEAD-specific way to read and write MERGE_HEAD along with
>> > its metadata.
>> 
>> You are probably right that this is a good idea.
>
> On reflection, I think it might make sense to keep MERGE_HEAD as a file.
> The problem is that not only would refs backends have to add new
> MERGE_HEAD-handling functions, but we would also need new plumbing
> commands to allow scripts to access the complete contents of MERGE_HEAD.
> That seems more complicated to me.  

I think you are talking about FETCH_HEAD, but I tend to agree.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: RFC/Pull Request: Refs db backend
  2015-06-23 11:47 ` Jeff King
  2015-06-23 13:10   ` Duy Nguyen
  2015-06-23 18:18   ` David Turner
@ 2015-06-24  6:09   ` Shawn Pearce
  2015-06-24  9:49     ` Jeff King
  2015-06-24 10:18     ` Duy Nguyen
  2 siblings, 2 replies; 26+ messages in thread
From: Shawn Pearce @ 2015-06-24  6:09 UTC (permalink / raw)
  To: Jeff King; +Cc: David Turner, git mailing list

On Tue, Jun 23, 2015 at 4:47 AM, Jeff King <peff@peff.net> wrote:
>
> One of the problems we've had with large-ref repos is that the reflog
> storage is quite inefficient.

Yup. We ran into this with Gerrit Code Review years ago. The
refs/changes/... namespace created by Gerrit Code Review is 1 ref per
snapshot per code review, and never modified. Reflogs for these are
always exactly one record. We broke down and modified JGit to add an
API that allowed Gerrit Code Review to disable recording reflogs for
specific updates just to avoid creating reflogs under refs/changes/.

In our JGit DFS implementation we store reflogs in databases to
eliminate these overheads. It works well for us. Hopefully the feature
can come to git-core through this series.

> It has also been a dream of mine to stop tying the reflogs specifically
> to the refs. I.e., have a spot for reflogs of branches that no longer
> exist, which allows us to retain them for deleted branches. Then you can
> possibly recover from a branch deletion, whereas now you have to dig
> through "git fsck"'s dangling output. And the reflog, if you don't
> expire it, becomes a suitable audit log to find out what happened to
> each branch when (whereas now it is full of holes when things get
> deleted).

Yes. $DAY_JOB's DFS implementation never expires reflogs, allowing it
to be used as a history to inspect what happened. It's been useful a
couple of times to investigate and recover from a few accidental
deletions.

Once you never expire reflog records, you have to consider at what
point you stop paying attention to the reflog entries for graph
reachability during repack and fsck. Users still expect to be able to
force push or delete a branch and have a set of objects disappear from
the repository.

I am looking forward to something like this in git-core. I delete
branches in my local repos and then regret it. Then I remember HEAD
has a reflog and hope I can find the lost tip somewhere in there.
Usually I fail, and am sad. :(

> I was thinking of actually moving to a log-structured ref storage.
> Something like:
>
>   - any ref write puts a line at the end of a single logfile that
>     contains the ref name, along with the normal reflog data
>
>   - the logfile is the source of truth for the ref state. If you want to
>     know the value of any ref, you can read it backwards to find the
>     last entry for the ref. Everything else is an optimization.
>
>     Let's call the number of refs N, and the number of ref updates in
>     the log U.
>
>   - we keep a key/value index mapping the name of any branch that exists
>     to the byte offset of its entry in the logfile. This would probably
>     be in some binary key/value store (like LMDB). Without this,
>     resolving a ref is O(U), which is horrible. With it, it should be
>     O(1) or O(lg N), depending on the index data structure.

This ... would be fantastic.

There are some issues with append. Before appending we would need to
verify that the last record actually ends with an LF. If there was a
power failure and only part of the last record was written, you can't
append without that record separator in place.

If that last record was truncated, and an LF was wedged in to do a new
append, we can't trust that intermediate record. A CRC at the end of
each record would make it possible to tell whether the record is intact
or bogus due to an earlier failed write.
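
The LF check is cheap, something like this on the append path
(hypothetical helper; error handling mostly omitted):

    #include <unistd.h>

    /*
     * Return 1 if the log ends cleanly (empty, or last byte is LF),
     * so a new record can be appended without a fixup.
     */
    static int log_ends_cleanly(int fd)
    {
        char last;

        if (lseek(fd, -1, SEEK_END) < 0)
            return 1; /* empty log (or seek error); nothing to fix up */
        if (read(fd, &last, 1) != 1)
            return 0;
        return last == '\n';
    }

The CRC would then be a trailing field in each record, so a wedged-in
LF can't make a torn write look like a valid entry.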

What about the case of never expiring the reflog? This log would grow
forever. You may eventually need to archive old sections of it (e.g. 1
year ago?) to maintain an audit log, while keeping the "latest" entry
for each ref to rebuild the index.

>   - the index can also contain other optimizations. E.g., rather than
>     point to the entry in the logfile, it can include the sha1 directly
>     (to avoid an extra level of indirection). It may want to include the
>     "peeled" value, as the current packed-refs file does.

+1 to always storing the peeled value. This was a major improvement
for $DAY_JOB's Git servers as peeling tags on the fly can be costly
when your storage is something remote, such as NFS. Unfortunately the
current wire protocol demands peeled information to serve a ref
advertisement.

One thing we do is always peel all refs. We record a bit to state it's
been peeled, but there is no peeled value because the ref is pointing
to a non-tag object (e.g. refs/heads/master points to a commit).

I guess this puts an index structure at something like:

  refname \0 log_idx_4 sha1_20 ('n' | 'p' sha1_20)

Or refname + 26 bytes for heads and refname + 46 bytes for tags.
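
In C terms, the fixed-width tail after the NUL-terminated refname
would be something like (field names invented; on disk the peeled
field is simply absent for 'n' entries):

    #include <stdint.h>

    struct ref_index_tail {
        uint32_t log_idx;         /* offset of latest log record */
        unsigned char sha1[20];   /* current value of the ref */
        unsigned char peel_state; /* 'n' or 'p' */
        unsigned char peeled[20]; /* only when peel_state == 'p' */
    };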


Updating the index on updates to a ref would be costly, as it's O(N).
You could skip some index updates. Record in the header of the index
the length of the reflog file used to build it. When reading the
index, scan the reflog from that position to the end and patch those
updates in memory. Rewrites of the index could then be deferred until
the scan delta on the log is high, or the next gc.

>   - Reading all of the reflogs (e.g., for repacking) is O(U), just like
>     it is today. Except the storage for the logfile is a lot more
>     compact than what we store today, with one reflog per file.
>
>   - Reading a single reflog is _also_ O(U), which is not as good as
>     today. But if each log entry contains a byte offset of the previous
>     entry, you can follow the chain (it is still slightly worse, because
>     you are jumping all over the file, rather than reading a compact set
>     of lines).

But this is like saying reading `git log` is bad because we jump all
over the pack file to parse ancestors and insert them into the
revqueue at the correct position. Feh.

I think given the typical size of reflogs, this is irrelevant.

>   - Pruning the reflog entries from the logfile requires rewriting the
>     whole thing. That's similar to today, where we rewrite each of the
>     reflog files.
>
> One of the nice properties of this system is that it should be very
> resilient to corruption and races. Most of the operations are either
> appending to a file, or writing to a tempfile and renaming in place.
> The exception is the key/value index, but if we run into any problems
> there, it can be rebuilt by walking over the logfile (for a cost of
> O(U)).
>
> I dunno. Maybe I am overthinking it.

Not really. Your idea is quite simple. I like it.

> But it really feels like the _refs_
> are a key/value thing, but the _reflogs_ are not. You can cram them into
> a key/value store, but you're probably operating on them as a big blob,
> then.

+1. Refs are key/value but you need all of the key/value pairs fast
for the current wire protocol.

Reflogs are a long queue that is more or less just table scanned when
it's accessed.

>> I chose to use LMDB for the database.  LMDB has a few features that make
>> it suitable for usage in git:
>
> One of the complaints that Shawn had about sqlite is that there is no
> native Java implementation, which makes it hard for JGit to ship a
> compatible backend. I suspect the same is true for LMDB, but it is
> probably a lot simpler than sqlite (so reimplementation might be
> possible).

Yes. Whatever the default standard format is for git-core, we need
that format to be easily supportable from JGit. Loading code via JNI
is not "easily supportable".

Non-default formats that the user can opt-into (and opt-out of) don't
need JGit compatibility. Users can choose to use $FANCY_DB or JGit and
make the tradeoff that is best for them. If JGit is also able to do
$FANCY_DB, great. If not, that's fine too. Not everyone needs JGit.
Not everyone needs $FANCY_DB.

IIRC some part of Ronnie's series was about setting up a socket
protocol between Git and the ref backend. If non-default backends are
like this, JGit could spawn the backend binary and then speak the
socket protocol just like git-core. This would be preferable to
linking JNI into the JVM.

Think remote helper protocol between transport.c and the helpers. JGit
doesn't yet speak that, but it should, and there is no technical
reason why it can't. Same for a ref helper protocol.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: RFC/Pull Request: Refs db backend
  2015-06-23 13:10   ` Duy Nguyen
@ 2015-06-24  8:51     ` Jeff King
  0 siblings, 0 replies; 26+ messages in thread
From: Jeff King @ 2015-06-24  8:51 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: David Turner, git mailing list

On Tue, Jun 23, 2015 at 08:10:03PM +0700, Duy Nguyen wrote:

> >   - we keep a key/value index mapping the name of any branch that exists
> >     to the byte offset of its entry in the logfile. This would probably
> 
> One key/value mapping per branch, pointing to the latest reflog entry,
> or one key/value mapping for each reflog entry?

Yeah, sorry, I meant to point only to the latest entry (and then from
there if you want to actually walk the reflog, you can do so by
following the backreference to the previous entry).

> >     be in some binary key/value store (like LMDB). Without this,
> >     resolving a ref is O(U), which is horrible. With it, it should be
> >     O(1) or O(lg N), depending on the index data structure.
> 
> I'm thinking of the user with small or medium repos, in terms of refs,
> who does not want an extra dependency. If we store one mapping per
> branch, then the size of this mapping is small enough that the index
> in a text file is ok. If we also store the offset to the previous
> reflog entry of the same branch in the current reflog entry, like a
> back pointer, then we could jump back faster.
> 
> Or do you have something else in mind? Current reflog structure won't
> work because I think you bring back the reflog graveyard with this,
> and I don't want to lose that.

I hadn't really thought about having multiple formats for the index. But
in theory, yes, you could, and the lowest common denominator could just
use the filesystem. Or even something similar to the packed-refs file,
where we have to write the whole thing to make a single update. That
doesn't perform well, but it's dirt simple and might be OK if you have
only a handful of refs.

-Peff

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: RFC/Pull Request: Refs db backend
  2015-06-23 18:18   ` David Turner
@ 2015-06-24  9:14     ` Jeff King
  2015-06-24 17:29       ` David Turner
  0 siblings, 1 reply; 26+ messages in thread
From: Jeff King @ 2015-06-24  9:14 UTC (permalink / raw)
  To: David Turner; +Cc: git mailing list

On Tue, Jun 23, 2015 at 02:18:36PM -0400, David Turner wrote:

> > Can you describe a bit more about the reflog handling?
> > 
> > One of the problems we've had with large-ref repos is that the reflog
> > storage is quite inefficient. You can pack all the refs, but you may
> > still be stuck with a bunch of reflog files with one entry, wasting a
> > whole inode. Doing a "git repack" when you have a million of those has
> > horrible cold-cache performance. Basically anything that isn't
> > one-file-per-reflog would be a welcome change. :)
> 
> Reflogs are stored in the database as well.  There is one header entry
> per ref to indicate that a reflog is present, and then one database
> entry per reflog entry; the entries are stored consecutively and
> immediately following the header so that it's fast to iterate over them.

OK, that makes sense. I did notice that the storage for the refdb grows
rapidly. If I add a million refs (like refs/tags/$i) with a simple
reflog message "foo", I end up with a 500MB database file.

That's _probably_ OK, because a million is getting into crazy
territory[1].  But it's 500 bytes per ref, each with one reflog entry.
Our ideal lower bound is probably something like 100 bytes per reflog
entry:

  - 20 bytes for old sha1
  - 20 bytes for new sha1
  - ~50 bytes for name, email, timestamp
  - ~6 bytes for refname (1000000 is the longest unique part)

That assumes we store binary[2] (and not just the raw reflog lines), and
reconstruct the reflog lines on the fly. It also assumes we use some
kind of trie-like storage (where we can amortize the cost of storing
"refs/tags/" across all of the entries).

Of course that neglects lmdb's overhead, and the storage of the ref tip
itself. But it would hopefully give us a ballpark for an optimal
solution. We don't have to hit that, of course, but it's food for
thought.

[1] The homebrew/homebrew repository on GitHub has almost half a million
    ref updates. Since this is storing not just refs but all ref
    updates, that's actually the interesting number (and optimizing the
    per-reflog-entry size is more interesting than the per-ref size).

[2] I'm hesitant to suggest binary formats in general, but given that
    this is a blob embedded inside lmdb, I think it's OK. If we were to
    pursue the log-structured idea I suggested earlier, I'm torn on
    whether it should be binary or not.

> > It has also been a dream of mine to stop tying the reflogs specifically
> > to the refs. I.e., have a spot for reflogs of branches that no longer
> > exist, which allows us to retain them for deleted branches.
> [...]
> That would be cool, and I don't think it would be hard to add to my
> current code; we could simply replace the header with a "tombstone".
> But I would prefer to wait until the series is merged; then we can build
> on top of it.

Yeah, I think you can add it easily to basically any system that does
not have the filesystem D/F conflicts in its storage (i.e., having
"refs/foo" does not block data under "refs/foo/bar").

> > But it may also be worth going with a slightly slower database if we can
> > get wider compatibility for free.
> 
> There's a JNI interface to LMDB, which is, of course, not native.  I
> don't think it would be too hard to entirely rewrite LMDB in Java, but
I'm not going to have time to do it for the foreseeable future.  I've
> asked Howard Chu if he knows of any efforts in progress.

Yeah, I think JNI is not enough for Eclipse folks. I don't think this is
a task that you would necessarily need to take on. More just something
to think about for the future when picking a format.

> Thanks, that's valuable.  For the refs backend, opening the LMDB
> database for writing is sufficient to block other writers.  Do you think
> it would be valuable to provide a git hold-ref-lock command that simply
> reads refs from stdin and keeps them locked until it reads EOF from
> stdin?  That would allow cross-backend ref locking.

I'm not sure what you would use it for. If you want to update the refs,
then you can specify a whole transaction with "git update-ref --stdin",
and that should work whatever backend you choose. Is there some other
operation you want where you hold the lock for a longer period of time?

-Peff

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: RFC/Pull Request: Refs db backend
  2015-06-24  6:09   ` Shawn Pearce
@ 2015-06-24  9:49     ` Jeff King
  2015-06-25  1:08       ` brian m. carlson
  2015-06-24 10:18     ` Duy Nguyen
  1 sibling, 1 reply; 26+ messages in thread
From: Jeff King @ 2015-06-24  9:49 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: David Turner, git mailing list

On Tue, Jun 23, 2015 at 11:09:40PM -0700, Shawn Pearce wrote:

> Yes. $DAY_JOB's DFS implementation never expires reflogs, allowing it
> to be used as a history to inspect what happened. It's been useful a
> couple of times to investigate and recover from a few accidental
> deletions.
> 
> Once you never expire reflog records, you have to consider at what
> point you stop paying attention to the reflog entries for graph
> reachability during repack and fsck. Users still expect to be able to
> force push or delete a branch and have a set of objects disappear from
> the repository.

Yeah, we face this problem at GitHub. We actually write every single ref
write to $GIT_DIR/audit_log, which is essentially a reflog with the
refname prepended. The key, though, is that it isn't ever _read_ by git
for reachability. So it becomes an immutable log of what happened, and
we can happily prune the reflog to drop objects.

In a log-structured ref storage world, I think I'd include a single bit
per entry for "use this for reachability". Then you could "soft-expire"
reflog entries by dropping their reachability bit, but still retain them
in your audit_log. The alternative is to just copy the entries to an
archival log.

> There are some issues with append. Before appending we would need to
> verify that the last record actually ends with an LF. If there was a
> power failure and only part of the last record was written, you can't
> append without that record separator in place.

Yeah, I think that is straightforward. You have to take a lock on the
whole log anyway, so it's OK to "fixup" the previous entry.

> If that last record was truncated, and an LF was wedged in to do a new
> append, we can't trust that intermediate record. A CRC at the end of
> each record would make it possible to tell whether the record is intact
> or bogus due to an earlier failed write.

I suspect you could get by with just realizing that the entry doesn't
parse (that's what we do now for reflogs). But the idea of per-entry
consistency checks is appealing. You could also include the CRC for the
"previous" entry (remember that we would probably have a back-pointer to
some byte offset to say "this is the current ref state that I am
building on"). Then you can walk back the whole chain to know that it
hasn't been damaged.

If you want to get very fancy, replace your CRC with a cryptographically
strong hash, and you've just reinvented a blockchain. :)
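
Verification would then just walk the chain backwards, something like
(all of these names are hypothetical):

    static int verify_chain(uint64_t off)
    {
        struct log_record rec, prev;

        if (read_record(off, &rec) < 0)
            return -1;
        while (rec.prev_offset) {
            if (read_record(rec.prev_offset, &prev) < 0)
                return -1;
            if (crc32_of_record(&prev) != rec.prev_crc)
                return -1; /* the chain is damaged here */
            rec = prev;
        }
        return 0;
    }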

> What about the case of never expiring the reflog? This log would grow
> forever. You may eventually need to archive old sections of it (e.g. 1
> year ago?) to maintain an audit log, while keeping the "latest" entry
> for each ref to rebuild the index.

Yeah, that's certainly an option. I'd say that's somewhat outside the
scope of git. If git provides the ability to prune entries completely
(i.e., what "reflog expire" does now) and to soft-expire them, then that
is enough for anyone to build whatever sort of archival system they want
(e.g., soft-expire for reachability as desired, and then occasionally
"git reflog show >your-archive && git reflog expire").

> +1 to always storing the peeled value. This was a major improvement
> for $DAY_JOB's Git servers as peeling tags on the fly can be costly
> when your storage is something remote, such as NFS. Unfortunately the
> current wire protocol demands peeled information to serve a ref
> advertisement.

Even on good disks, it makes the initial ref advertisement from
git-upload-pack _way_ cheaper, because we don't have to actually touch
the object database at all. It's basically just blitting out the
packed-refs file.

> One thing we do is always peel all refs. We record a bit to state it's
> been peeled, but there is no peeled value because the ref is pointing
> to a non-tag object (e.g. refs/heads/master points to a commit).

Yeah, since c29c46f (pack-refs: add fully-peeled trait, 2013-03-18) we
implicitly do this in packed-refs; if there's no peel line after the
entry, it cannot be peeled. We could do the same here, but I think I
favor being more explicit (I'd probably add a few bits of "flags" to
each entry, and this could be one such flag).

> Updating the index on updates to a ref would be costly, as it's O(N).

It depends how you implement the index. A straight text index would be
O(N). Replacing the index with a real key/value store should be very
fast. But unless we are going to write our own, that's going to
introduce a dependency (possibly one we can ship as we do with xdiff,
but the whole JGit thing is an open question).

> You could skip some index updates. Record in the header of the index
> the length of the reflog file used to build it. When reading the
> index, scan the reflog from that position to the end and patch those
> updates in memory. Rewrites of the index could then be deferred until
> the scan delta on the log is high, or the next gc.

Yeah, basically use the log as a journal. You save (or at least
amortize) O(# of refs) work for the writers, at the cost of O(# of
recent updates) work for the readers. That might be worth doing. It's
also complicated, and I was hoping to avoid complicated things. :)

> >   - Reading a single reflog is _also_ O(U), which is not as good as
> >     today. But if each log entry contains a byte offset of the previous
> >     entry, you can follow the chain (it is still slightly worse, because
> >     you are jumping all over the file, rather than reading a compact set
> >     of lines).
> 
> But this is like saying reading `git log` is bad because we jump all
> over the pack file to parse ancestors and insert them into the
> revqueue at the correct position. Feh.

Yeah, I agree it's probably not worth caring too much about. Reading the
reflogs at all is not that common an operation, and it's a tradeoff I'd
be happy to make. I was just trying to be upfront about the tradeoffs
versus the current storage format.

> IIRC some part of Ronnie's series was about setting up a socket
> protocol between Git and the ref backend. If non-default backends are
> like this, JGit could spawn the backend binary and then speak the
> socket protocol just like git-core. This would be preferable to
> linking JNI into the JVM.

I am not excited about contacting an already-running program, which is
what I thought Ronnie's patches did. That's one more thing to go wrong
or become confusing when doing basic operations.  If we have to use an
external program, I'd much rather it be something we spawn once (more
like the remote-helper, which I think is what you are proposing).

I don't know how much that helps for the JGit situation. It punts the
native code out of JGit, but people using JGit still have to have the
native helper from git on their system.  I have no problems at all with
pluggable $FANCY_DB that not everybody supports.  But I think we would
want _some_ baseline that is reasonably performant, and that everybody
will support. I'm not sure putting the index into a flat file is
performant enough. Is there any basic key/value store that has both a
C and a pure-Java version (e.g., berkeley db)?

-Peff

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: RFC/Pull Request: Refs db backend
  2015-06-24  6:09   ` Shawn Pearce
  2015-06-24  9:49     ` Jeff King
@ 2015-06-24 10:18     ` Duy Nguyen
  1 sibling, 0 replies; 26+ messages in thread
From: Duy Nguyen @ 2015-06-24 10:18 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Jeff King, David Turner, git mailing list

On Wed, Jun 24, 2015 at 1:09 PM, Shawn Pearce <spearce@spearce.org> wrote:
>>> I chose to use LMDB for the database.  LMDB has a few features that make
>>> it suitable for usage in git:
>>
>> One of the complaints that Shawn had about sqlite is that there is no
>> native Java implementation, which makes it hard for JGit to ship a
>> compatible backend. I suspect the same is true for LMDB, but it is
>> probably a lot simpler than sqlite (so reimplementation might be
>> possible).
>
> Yes. Whatever the default standard format is for git-core, we need
> that format to be easily supportable from JGit. Loading code via JNI
> is not "easily supportable".

I'm under the impression that this will be opt-in, not completely
replacing the fs-based ref backend. Anyway, any recommendations for a
database format or engine that is more friendly to Java and JGit (and
preferably has good C support too)?
-- 
Duy

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: RFC/Pull Request: Refs db backend
  2015-06-24  9:14     ` Jeff King
@ 2015-06-24 17:29       ` David Turner
  0 siblings, 0 replies; 26+ messages in thread
From: David Turner @ 2015-06-24 17:29 UTC (permalink / raw)
  To: Jeff King; +Cc: git mailing list

On Wed, 2015-06-24 at 05:14 -0400, Jeff King wrote:
> On Tue, Jun 23, 2015 at 02:18:36PM -0400, David Turner wrote:
> 
> > > Can you describe a bit more about the reflog handling?
> > > 
> > > One of the problems we've had with large-ref repos is that the reflog
> > > storage is quite inefficient. You can pack all the refs, but you may
> > > still be stuck with a bunch of reflog files with one entry, wasting a
> > > whole inode. Doing a "git repack" when you have a million of those has
> > > horrible cold-cache performance. Basically anything that isn't
> > > one-file-per-reflog would be a welcome change. :)
> > 
> > Reflogs are stored in the database as well.  There is one header entry
> > per ref to indicate that a reflog is present, and then one database
> > entry per reflog entry; the entries are stored consecutively and
> > immediately following the header so that it's fast to iterate over them.
> 
> OK, that makes sense. I did notice that the storage for the refdb grows
> rapidly. If I add a million refs (like refs/tags/$i) with a simple
> reflog message "foo", I end up with a 500MB database file.
> 
> That's _probably_ OK, because a million is getting into crazy
> territory[1].  But it's 500 bytes per ref, each with one reflog entry.
> Our ideal lower bound is probably something like 100 bytes per reflog
> entry:
> 
>   - 20 bytes for old sha1
>   - 20 bytes for new sha1
>   - ~50 bytes for name, email, timestamp
>   - ~6 bytes for refname (1000000 is the longest unique part)
> 
> That assumes we store binary[2] (and not just the raw reflog lines), and
> reconstruct the reflog lines on the fly. It also assumes we use some
> kind of trie-like storage (where we can amortize the cost of storing
> "refs/tags/" across all of the entries).
> 
> Of course that neglects lmdb's overhead, and the storage of the ref tip
> itself. But it would hopefully give us a ballpark for an optimal
> solution. We don't have to hit that, of course, but it's food for
> thought.
> 
> [1] The homebrew/homebrew repository on GitHub has almost half a million
>     ref updates. Since this is storing not just refs but all ref
>     updates, that's actually the interesting number (and optimizing the
>     per-reflog-entry size is more interesting than the per-ref size).
> 
> [2] I'm hesitant to suggest binary formats in general, but given that
>     this is a blob embedded inside lmdb, I think it's OK. If we were to
>     pursue the log-structured idea I suggested earlier, I'm torn on
>     whether it should be binary or not.

I could try a binary format.  I was optimizing for simplicity,
debuggability, recoverability, and compatibility with the existing text
format, but I wouldn't have to.  I don't know how much this will save.
Unfortunately, given the way LMDB works, trie-like storage to amortize
the cost of common prefixes like refs/tags does not seem possible (of
course, we could hard-code some hacks like \001=refs/tags,
\002=refs/heads, etc., but that is a micro-optimization that might not
be worth it).
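
E.g., the hack would amount to something like (hypothetical, and again
probably not worth it):

    static const char *const prefix_table[] = {
        NULL,            /* \000: unused */
        "refs/tags/",    /* \001 */
        "refs/heads/",   /* \002 */
        "refs/remotes/", /* \003 */
    };

where a key starting with one of those control bytes gets expanded
through the table before use.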

Also, the reflog header has some overhead (it's an entire extra record
per ref). The header exists to implement reflog creation/existence
checking.  I didn't really try to understand why we have the distinction
between empty and nonexistent reflogs; I just copied it.  If we didn't
have that distinction, we could eliminate that overhead.

> > Thanks, that's valuable.  For the refs backend, opening the LMDB
> > database for writing is sufficient to block other writers.  Do you think
> > it would be valuable to provide a git hold-ref-lock command that simply
> > reads refs from stdin and keeps them locked until it reads EOF from
> > stdin?  That would allow cross-backend ref locking.
> 
> I'm not sure what you would use it for. If you want to update the refs,
> then you can specify a whole transaction with "git update-ref --stdin",
> and that should work whatever backend you choose. Is there some other
> operation you want where you hold the lock for a longer period of time?

I'm sure I had a reason for this at the time I wrote it, but now I can't
think of what it was.  Never mind!

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: RFC/Pull Request: Refs db backend
  2015-06-23 21:27     ` Michael Haggerty
@ 2015-06-24 17:31       ` David Turner
  0 siblings, 0 replies; 26+ messages in thread
From: David Turner @ 2015-06-24 17:31 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: git mailing list

On Tue, 2015-06-23 at 23:27 +0200, Michael Haggerty wrote:
> On 06/23/2015 09:53 PM, David Turner wrote:
> > On Tue, 2015-06-23 at 17:51 +0200, Michael Haggerty wrote:
> > [...]
> >> * I don't like the fact that you have replaced `struct ref_transaction
> >> *` with `void *` in the public interface. On a practical level, I like
> >> the bit of type-safety that comes with the more specific declaration.
> >> But on a more abstract level, I think that the concept of a transaction
> >> could be useful across backends, for example in utility functions that
> >> verify that a proposed set of updates are internally consistent. I would
> >> rather see either
> >>
> >>   * backends "extend" a basic `struct ref_transaction` to suit their
> >> needs, and upcast/downcast pointers at the module boundary, or
> >>
> >>   * `struct ref_transaction` itself gets a `void *` member that backends
> >> can use for whatever purposes they want.
> > 
> > There are no common fields between refs-be-file transactions and
> > refs-be-lmdb transactions.  I don't see much gain from adding an empty
> > ref_transaction that backends could extend, since we would have to
> > explicitly upcast/downcast all over the place.
> 
> If you ask me, it would be better to do a bunch of up/downcasts within
> the single module (via two helper functions that could even do
> consistency checks) than have no help from the compiler in preventing
> people from passing unrelated pointer types into the `void *transaction`
> argument. Plus the `struct ref_transaction *` variables scattered
> throughout the code are a lot more self-explanatory than `void *`.

I'll take a look at what that would look like.

> >> * Regarding MERGE_HEAD: you take the point of view that it must continue
> >> to be stored as a file. And yet it must also behave somewhat like a
> >> reference; for example, `git rev-parse MERGE_HEAD` works today.
> >> MERGE_HEAD is also used for reachability, right?
> >>
> >> Another point of view is that MERGE_HEAD is a plain old boring
> >> reference, but there is some other metadata related to it that the refs
> >> backend has to store. The file-based backend would have special-case
> >> code to read the additional data from the tail of the loose refs file
> >> (and be sure to write the metadata when writing the reference), but
> >> other backends could store the reference with the rest but do their own
> >> thing with the metadata. So I guess I'm wondering whether the refs API
> >> needs a MERGE_HEAD-specific way to read and write MERGE_HEAD along with
> >> its metadata.
> > 
> > You are probably right that this is a good idea.
> > 
> >> * Don't the same considerations that apply to MERGE_HEAD also apply to
> >> FETCH_HEAD?
> > 
> > All of the tests pass without any special handling of FETCH_HEAD.
> 
> That's odd. From git-fetch.txt:
> 
>     The names of refs that are fetched, together with the object names
>     they point at, are written to `.git/FETCH_HEAD`.  This information
>     may be used by scripts or other git commands, such as
>     linkgit:git-pull[1].
> 
> It seems like the test suite is reading FETCH_HEAD via the refs API in a
> couple of places. I don't understand why these don't fail when LMDB is
> being used...

You are right; I did add some special-case code for FETCH_HEAD.

> >> * Rehash of the last two points: I expected one backend function that is
> >> used to initialize the refs backend when a new repository is created
> >> (e.g., in `git init`). The file-based backend would use this function to
> >> create the `refs`, `refs/heads`, and `refs/tags` directories. I expected
> >> a second function that is called once every time git runs in an existing
> >> repository (this one might, for example, open a database connection).
> >> And maybe even a third one that closes down the database connection
> >> before git exits. Would you please explain how this actually works?
> > 
> > LMDB doesn't really have the concept of a "connection".  It's basically
> > just a couple of files that communicate using shared memory (and maybe
> > some other locking that I haven't paid attention to).  There is the
> > concept of a "transaction", which is the unit of concurrency (each
> > thread may only have one open transaction).  Transactions are either
> > read-only or read-write, and there can only be one read-write
> > transaction open at a time (across the entire system).  Read-only
> > transactions take a snapshot of the DB state at transaction start time.
> > This combination of features means that we need to be a bit clever about
> > read-only transactions; if a read-write transaction occurs in a separate
> > process, we need to restart any read-only transactions to pick up its
> > changes.
> 
> If you are thinking about an *unrelated* separate process, then Git's
> philosophy is that if our process is reading *some* valid state of the
> references, it's all good even if that state is not quite the newest.
> After all, who's to say whether our process ran before or after the
> other process? As long as each process sees self-consistent views of the
> world as it existed at some recent time, we're satisfied.

No, I'm thinking about a subprocess that we started ourselves.
Unrelated separate processes are fine, I think.

> To be sure, we even fail at that unambitious goal, because loose
> reference reads are not atomic. It is possible that we read some
> references in the state they had before the other process ran, and
> others in the state they had after the other process was finished. This
> can get ugly if, for example, the other process renamed a reference,
> because we might not see the reference under either its old *or* its new
> name. We might therefore conclude that the objects reachable from that
> reference are dangling and garbage-collect them.
> 
> If the other process is one that we started ourselves, then that is a
> different situation and we would want, for example, to invalidate our
> reference cache after the other process is done.

Yep, my code does this.

> One of the long-standing hopes of a DB-backed reference backend would be
> to improve this situation--allowing atomic writes *and* reads.

Reads are atomic across renames, since we do renames within a single
write transaction. 
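
Roughly (key setup and error handling omitted; a real version would
copy the value out before reusing it):

    MDB_txn *txn;
    MDB_val old_key, new_key, val;

    /* one read-write transaction covers the whole rename */
    mdb_txn_begin(env, NULL, 0, &txn);
    mdb_get(txn, dbi, &old_key, &val);
    mdb_put(txn, dbi, &new_key, &val, 0);
    mdb_del(txn, dbi, &old_key, NULL);
    mdb_txn_commit(txn);

Readers see either the state before the commit or the state after it,
never the in-between.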

> > [...]
> > 
> >> * You explain in the docstring to `lmdb_transaction_begin_flags()` that
> >> it is dangerous to call a callback function while a write transaction is
> >> open if the callback function might want to initiate another write
> >> transaction. This would obviously also apply to running hooks.
> > 
> > I carefully limit the scope of write transactions to avoid problems like
> > this.
> > 
> >>  This is a
> >> limitation of LMDB because writers block each other. I can't think of
> >> anyplace that this would be a problem in our codebase. But if it were,
> >> it seems to me that you could take an approach like the file-based
> >> backend, which collects the transaction in a `ref_transaction` data
> >> structure, and executes the entire transaction within a single call to
> >> `ref_transaction_commit()`. This approach would prevent callers outside
> >> of the refs module from ever bumping against this limitation.
> > 
> > The file-based backend does per-ref locking early, and then applies the
> > transactions. Here, taking the write transaction is how the lmdb backend
> > does locking.  So the situations are not quite the same.  But I think
> > keeping the scope of transactions small is the best plan.
> 
> The file-based backend locks the references early *within
> ref_transaction_commit()*, not as the transaction is being built up
> using ref_transaction_update() etc. This is a big difference, because
> code anywhere can call
> 
>     ref_transaction_begin(...);
>     ANYTHING
>     ref_transaction_update(...);
>     ANYTHING
>     ref_transaction_commit(...);
> 
> The only way to be sure that ANYTHING can't create a deadlock with the
> open transaction (for example by calling a hook script that runs a git
> command) is to audit all of that code now and in the future. Whereas the
> file-based backend doesn't do anything that is externally observable or
> deadlocky except within the single ref_transaction_commit() function
> call, so only that one function has to be audited for actions that could
> cause a deadlock.

A deadlock is impossible; a second writer will simply be unable to
acquire the lock and will die (same as in the file-based backend if two
writers try to update the same ref at the same time).

It's true that the scope for this is potentially larger.  On the other
hand, the file-based backend can fail in similar cases -- when trying
to verify refs that have changed out from under it. That failure is
just less likely, since it only happens on conflicting refs.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: RFC/Pull Request: Refs db backend
  2015-06-24  9:49     ` Jeff King
@ 2015-06-25  1:08       ` brian m. carlson
  0 siblings, 0 replies; 26+ messages in thread
From: brian m. carlson @ 2015-06-25  1:08 UTC (permalink / raw)
  To: Jeff King; +Cc: Shawn Pearce, David Turner, git mailing list

On Wed, Jun 24, 2015 at 05:49:20AM -0400, Jeff King wrote:
> I don't know how much that helps for the JGit situation. It punts the
> native code out of JGit, but people using JGit still have to have the
> native helper from git on their system.  I have no problems at all with
> pluggable $FANCY_DB that not everybody supports.  But I think we would
> want _some_ baseline that is reasonably performant, and that everybody
> will support. I'm not sure putting the index into a flat file is
> performant enough. Is there any basic key/value store that has both a
> C and a pure-Java version (e.g., berkeley db)?

Berkeley DB has switched to the AGPLv3 for new versions.  Besides being
unpalatable for many people, it's also incompatible with the GPLv2.  I
do otherwise like Berkeley DB: it performs reasonably well and is
available on most systems.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread

Thread overview: 26+ messages
2015-06-23  0:50 RFC/Pull Request: Refs db backend David Turner
2015-06-23  5:36 ` Junio C Hamano
2015-06-23 10:23   ` Duy Nguyen
2015-06-23 18:47     ` David Turner
2015-06-23 17:29   ` David Turner
2015-06-23 11:47 ` Jeff King
2015-06-23 13:10   ` Duy Nguyen
2015-06-24  8:51     ` Jeff King
2015-06-23 18:18   ` David Turner
2015-06-24  9:14     ` Jeff King
2015-06-24 17:29       ` David Turner
2015-06-24  6:09   ` Shawn Pearce
2015-06-24  9:49     ` Jeff King
2015-06-25  1:08       ` brian m. carlson
2015-06-24 10:18     ` Duy Nguyen
2015-06-23 15:51 ` Michael Haggerty
2015-06-23 19:53   ` David Turner
2015-06-23 21:27     ` Michael Haggerty
2015-06-24 17:31       ` David Turner
2015-06-23 21:35     ` David Turner
2015-06-23 21:41       ` Junio C Hamano
2015-06-23 17:16 ` Stefan Beller
2015-06-23 20:04   ` David Turner
2015-06-23 20:10     ` Randall S. Becker
2015-06-23 20:22       ` David Turner
2015-06-23 20:27         ` Randall S. Becker
