* Pluggable backends for refs,wip
@ 2014-08-05 21:40 Ronnie Sahlberg
  2014-08-05 21:56 ` Nico Williams
  2014-08-07 12:57 ` Michael Haggerty
  0 siblings, 2 replies; 6+ messages in thread
From: Ronnie Sahlberg @ 2014-08-05 21:40 UTC (permalink / raw)
  To: git, Michael Haggerty

List, Michael,

Please see
https://github.com/rsahlberg/git/tree/backend-struct-db-2
for an example of a pluggable backend for refs storage.

This series contains changes that make it possible to add new backends
for handling/storage of refs, and implements one new backend:
refs-be-db.c.

This new backend offloads the actual refs handling to a small database
daemon with which it talks via a very simple RPC protocol. That
daemon in turn connects to the datastore and reads/writes the
values to it.

An always-running database daemon allows faster
startup of the git commands, since they now only need to connect
to a domain socket instead of having to traverse a potentially very
large number of files during the "build ref cache" phase.
Another nice feature is that a single database
daemon can host the refs for multiple independent git
repositories (using the new repository name config to distinguish
between them).

It does not yet apply to origin/* since it is based on some small
series that have not yet arrived there,
and it is still a WIP. But if you want to test/look at what we could be
doing one day, please feel free to clone this repo.


FAQ:
Q:
This sounds cool. How do I test this?

A:
1, Clone https://github.com/rsahlberg/git/tree/backend-struct-db-2 and
build git.

2, gcc refsd-tdb.c -o refsd-tdb -l tdb
3, ./refsd-tdb /tmp/refsd.socket /tmp /tmp/refsd.log

4, git clone --db-repo-name=ROCKet --db-socket=/tmp/refsd.socket <some-repo> foo

./foo should now contain a git repository that stores its refs in a
separate database.
(The databases are stored under /tmp, so don't use this for anything
important, because bad things happen to things stored under /tmp.)


regards
ronnie sahlberg


* Re: Pluggable backends for refs,wip
  2014-08-05 21:40 Pluggable backends for refs,wip Ronnie Sahlberg
@ 2014-08-05 21:56 ` Nico Williams
  2014-08-05 22:23   ` Ronnie Sahlberg
  2014-08-07 12:57 ` Michael Haggerty
  1 sibling, 1 reply; 6+ messages in thread
From: Nico Williams @ 2014-08-05 21:56 UTC (permalink / raw)
  To: Ronnie Sahlberg; +Cc: git, Michael Haggerty

Personally (a user of, not a maintainer of, git) I really want some
alternative backends.  In particular I'm after something like Fossil's
use of SQLite3; I want a SQLite3 backend for several reasons, not the
least of which is the power of SQL for looking at history.

I'm not sure that I necessarily want a daemon/background process.  I
get the appeal (add inotify and bingo, very fast git status, always),
but it seems likely to add obnoxious failure modes.

As to a SQLite3-type backend, I am of two minds: either add it as a
bolt-on to the builtin backend, or add it as a first-class backend
that replaces the builtin one.  The former is nice because the SQLite3
DB becomes more of a cache/index and query engine than a store, and
can be used without migrating any repos, but the latter is also nice
because SQLite3 provides strong ACID transactional semantics on local
filesystems.

Nico
--


* Re: Pluggable backends for refs,wip
  2014-08-05 21:56 ` Nico Williams
@ 2014-08-05 22:23   ` Ronnie Sahlberg
  2014-08-05 22:28     ` Nico Williams
  0 siblings, 1 reply; 6+ messages in thread
From: Ronnie Sahlberg @ 2014-08-05 22:23 UTC (permalink / raw)
  To: Nico Williams; +Cc: git, Michael Haggerty

On Tue, Aug 5, 2014 at 2:56 PM, Nico Williams <nico@cryptonector.com> wrote:
> Personally (a user of, not a maintainer of, git) I really want some
> alternative backends.  In particular I'm after something like Fossil's
> use of SQLite3; I want a SQLite3 backend for several reasons, not the
> least of which is the power of SQL for looking at history.
>
> I'm not sure that I necessarily want a daemon/background process.  I
> get the appeal (add inotify and bingo, very fast git status, always),
> but it seems likely to add obnoxious failure modes.
>
> As to a SQLite3-type backend, I am of two minds: either add it as a
> bolt-on to the builtin backend, or add it as a first-class backend
> that replaces the builtin one.  The former is nice because the SQLite3
> DB becomes more of a cache/index and query engine than a store, and
> can be used without migrating any repos, but the latter is also nice
> because SQLite3 provides strong ACID transactional semantics on local
> filesystems.


This will allow you to do either or both, depending on what you want.

I am adding one new first-class backend, refs-be-db.c, which talks to a
separate daemon, refsd-tdb.c.

refsd-tdb.c is 7 RPCs and ~500 lines of code for a naive
implementation of a standalone daemon.


If you would rather have a new first-class backend built into git
itself instead of a separate daemon, that will be possible
too.
It just means that you will have to base the work on refs-be-db.c,
which is a much larger and more complex code base than refsd-tdb.c.

But yeah, once this work is finished, you will be able to build new
first-class ref backends if you so wish.
Please see refs-be-db.c; that file contains the methods you will
need to implement in order to have a first-class SQL* backend.


regards
ronnie sahlberg


* Re: Pluggable backends for refs,wip
  2014-08-05 22:23   ` Ronnie Sahlberg
@ 2014-08-05 22:28     ` Nico Williams
  0 siblings, 0 replies; 6+ messages in thread
From: Nico Williams @ 2014-08-05 22:28 UTC (permalink / raw)
  To: Ronnie Sahlberg; +Cc: git, Michael Haggerty

Excellent.  Thanks!


* Re: Pluggable backends for refs,wip
  2014-08-05 21:40 Pluggable backends for refs,wip Ronnie Sahlberg
  2014-08-05 21:56 ` Nico Williams
@ 2014-08-07 12:57 ` Michael Haggerty
  2014-08-08 15:53   ` Ronnie Sahlberg
  1 sibling, 1 reply; 6+ messages in thread
From: Michael Haggerty @ 2014-08-07 12:57 UTC (permalink / raw)
  To: Ronnie Sahlberg, git

On 08/05/2014 02:40 PM, Ronnie Sahlberg wrote:
> Please see
> https://github.com/rsahlberg/git/tree/backend-struct-db-2
> for an example of a pluggable backend for refs storage.
> 
> This series contains changes that make it possible to add new backends
> for handling/storage of refs, and implements one new backend:
> refs-be-db.c.
> 
> This new backend offloads the actual refs handling to a small database
> daemon with which it talks via a very simple RPC protocol. That
> daemon in turn connects to the datastore and reads/writes the
> values to it.
> [...]

Ronnie,

This is awesome!  Congratulations on your progress.

I'm still on vacation and haven't yet looked at the code.  I will be
back next week and hope to find time to check it out, and also to do
some more review of the code that you have already submitted to git core.


Have you thought about how to test alternate reference backends?  This
will be very important to getting one or more of them accepted into git
core (not to mention giving people confidence to actually *use* them!)

It seems to me that a few steps are needed:

* Each backend would need a suite of backend-aware tests that verify
proper operation *within* the backend.  These tests would mostly use
low-level plumbing commands like update-ref to create/modify/delete
references, and would be allowed to grub around in the filesystem, talk
directly with the database, etc. to make sure that the commands have the
correct effects.  For example, for the traditional filesystem backend,
these tests would be the ones to check that creating a reference causes
a file to spring into existence under $GIT_DIR/refs.

The tests for pack-refs, and all tests that care about the distinction
between packed and loose refs, would become part of the backend-aware
tests for the filesystem backend.

All of the backend-aware tests should be run every time the test suite
is run (provided, of course, that the correct prerequisites are
available, and subject to being turned off manually).

* The rest of the test suite has to be made backend-agnostic.  For
example, such tests should *not* be allowed to look under $GIT_DIR for
the existence/absence of loose reference files [1] but would rather have
to inquire about references via git commands.

* It should be possible for the developer to choose easily which
reference backend to use when running the agnostic part of the test
suite.  The chosen backend should be used to run *all* backend-agnostic
tests.

A database-backed backend might even want to be testable in two modes:
one with the DB daemon running constantly, and one where the daemon is
stopped and started between each pair of Git commands.

So after the changes, a single run of the test suite should run the
backend-aware tests for *all* known backends followed by the
backend-agnostic tests for a single selected backend.
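
As an illustration of the backend-agnostic style described above, here is a minimal sketch of two helpers that ask git about a reference instead of looking for a file under $GIT_DIR/refs. The helper names ref_exists and ref_value are hypothetical and are not part of git's test library; the git commands they call (show-ref, rev-parse) are standard plumbing.

```shell
#!/bin/sh
# Hypothetical helpers for backend-agnostic tests: they query git
# rather than inspecting loose ref files or packed-refs directly.

# Succeeds if the fully qualified ref (e.g. refs/heads/topic) exists,
# regardless of which backend stores it.
ref_exists () {
    git show-ref --verify --quiet "$1"
}

# Prints the object name the ref resolves to; fails quietly if the
# ref cannot be resolved.
ref_value () {
    git rev-parse --verify --quiet "$1"
}
```

A test that previously ran something like "test -f .git/refs/heads/topic" could call "ref_exists refs/heads/topic" instead, and compare "$(ref_value refs/heads/topic)" against the expected object name.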

Michael

[1] When I was working on my quagga-reference spike [2] I found that a
lot of the test suite uses knowledge about how references and reflogs
are stored by the filesystem backend and just grabs at the files rather
than accessing the references using git commands.  It will take some
work to clean this up.

[2] http://thread.gmane.org/gmane.comp.version-control.git/243726

-- 
Michael Haggerty
mhagger@alum.mit.edu


* Re: Pluggable backends for refs,wip
  2014-08-07 12:57 ` Michael Haggerty
@ 2014-08-08 15:53   ` Ronnie Sahlberg
  0 siblings, 0 replies; 6+ messages in thread
From: Ronnie Sahlberg @ 2014-08-08 15:53 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: git

On Thu, Aug 7, 2014 at 5:57 AM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
> On 08/05/2014 02:40 PM, Ronnie Sahlberg wrote:
>> Please see
>> https://github.com/rsahlberg/git/tree/backend-struct-db-2
>> for an example of a pluggable backend for refs storage.
>>
>> This series contains changes that make it possible to add new backends
>> for handling/storage of refs, and implements one new backend:
>> refs-be-db.c.
>>
>> This new backend offloads the actual refs handling to a small database
>> daemon with which it talks via a very simple RPC protocol. That
>> daemon in turn connects to the datastore and reads/writes the
>> values to it.
>> [...]
>
> Ronnie,
>
> This is awesome!  Congratulations on your progress.
>
> I'm still on vacation and haven't yet looked at the code.  I will be
> back next week and hope to find time to check it out, and also to do
> some more review of the code that you have already submitted to git core.

Thanks!

>
>
> Have you thought about how to test alternate reference backends?  This
> will be very important to getting one or more of them accepted into git
> core (not to mention giving people confidence to actually *use* them!)

I have thought about it and also done some experiments.
For the initial git support, I think we should first try to get the
pluggable backend support into git, along with the work to change the
current files backend into a built-in pluggable backend.

I.e. get everything in the
https://github.com/rsahlberg/git/tree/backend-struct-db-2
branch except the last three patches.
That brings us to a stage where we have pluggable backend support and
we have one backend, the files backend, that works just like today.

The last three patches in that series are then just confirmation that
the pluggable backend approach works and we can add that a little
later once we finish tests and other things.



For tests there is the issue that "git-clone" and "git-init"
require two additional arguments in order to set up and initialize a
repository that uses the "database daemon backend".
I would imagine other future backends would have similar needs.
The way I handled this in my experiments was to use two new
environment variables, GIT_INIT and GIT_CLONE, that default to
"git-init" and "git-clone" respectively,
and then override them with GIT_INIT="git-init
--db-repo-name=ROCKy --db-socket=/tmp/refsd.socket" when I wanted the
tests to initialize a "database backend" repository.
This required some updates to test-lib.sh and test-lib-functions.sh, as
well as to the tests themselves, to use ${GIT_INIT} instead of git-init
directly.
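
The override mechanism described above could look roughly like this. It is only a sketch: GIT_INIT and GIT_CLONE are the variable names from this message, while the wrapper functions init_repo and clone_repo are hypothetical, and the defaults are written as "git init" / "git clone" on the assumption that the dashed forms may not be on PATH.

```shell
#!/bin/sh
# Sketch only: init_repo and clone_repo are hypothetical wrappers.

# Fall back to the stock commands unless the caller has overridden them.
: "${GIT_INIT:=git init}"
: "${GIT_CLONE:=git clone}"

# The variables are expanded unquoted on purpose, so that any extra
# options embedded in them are split into separate arguments.
init_repo () {
    $GIT_INIT "$@"
}

clone_repo () {
    $GIT_CLONE "$@"
}
```

A test run against the database backend would then export something like GIT_INIT="git init --db-repo-name=ROCKy --db-socket=/tmp/refsd.socket" before the tests start. Those two options are the ones from the experimental branch mentioned in this thread; as far as I know they are not available in stock git.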

I am not sure what the best approach here is and would love it if you
could help out with this once we get the basic pluggable backend stuff
in.



>
> It seems to me that a few steps are needed:
>
> * Each backend would need a suite of backend-aware tests that verify
> proper operation *within* the backend.  These tests would mostly use
> low-level plumbing commands like update-ref to create/modify/delete
> references, and would be allowed to grub around in the filesystem, talk
> directly with the database, etc. to make sure that the commands have the
> correct effects.  For example, for the traditional filesystem backend,
> these tests would be the ones to check that creating a reference causes
> a file to spring into existence under $GIT_DIR/refs.

Yes.
Quite a few tests do muck around with the files directly. Some for
good reasons but I think there are a lot of cases where the tests do
it just out of convenience.

For this we will need to convert the tests that don't strictly need to
muck around with the files to use a backend-agnostic method to do the
same checks.
For the tests that are truly testing the backend itself, such as a
hypothetical test to check that a symbolic link to a ref behaves as it
should, we will need a mechanism to make the tests conditional
on the current backend.
So lots of "if backend == database then skip this test".
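
A rough, standalone sketch of such a conditional mechanism follows. Everything in it is an assumption made for illustration: the GIT_TEST_REFS_BACKEND variable and the backend_is and run_for_backends functions do not exist in git. A real implementation would more likely build on the prerequisite support that git's test library already has (test_set_prereq / test_have_prereq).

```shell
#!/bin/sh
# Hypothetical: the backend under test is named by an environment
# variable that defaults to the traditional files backend.
: "${GIT_TEST_REFS_BACKEND:=files}"

# Succeeds when the current backend is one of the names given.
backend_is () {
    for name in "$@"
    do
        test "$name" = "$GIT_TEST_REFS_BACKEND" && return 0
    done
    return 1
}

# Runs a command only for the listed backends, otherwise reports a skip.
# usage: run_for_backends "<space-separated backends>" <title> <command...>
run_for_backends () {
    backends=$1 title=$2
    shift 2
    # $backends is unquoted on purpose so the list splits into words.
    if backend_is $backends
    then
        "$@" && echo "ok - $title" || echo "not ok - $title"
    else
        echo "skip - $title (backend: $GIT_TEST_REFS_BACKEND)"
    fi
}
```

With this, a files-only check such as the symbolic link test would be registered as run_for_backends files "symlinked ref" some_test_function, and would be reported as skipped when the variable is set to "database".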


>
> The tests for pack-refs, and all tests that care about the distinction
> between packed and loose refs, would become part of the backend-aware
> tests for the filesystem backend.
>
> All of the backend-aware tests should be run every time the test suite
> is run (provided, of course, that the correct prerequisites are
> available, and subject to being turned off manually).
>
> * The rest of the test suite has to be made backend-agnostic.  For
> example, such tests should *not* be allowed to look under $GIT_DIR for
> the existence/absence of loose reference files [1] but would rather have
> to inquire about references via git commands.
>
> * It should be possible for the developer to choose easily which
> reference backend to use when running the agnostic part of the test
> suite.  The chosen backend should be used to run *all* backend-agnostic
> tests.
>

Agreed.
It would be great if we could work on this together.


> A database-backed backend might even want to be testable in two modes:
> one with the DB daemon running constantly, and one where the daemon is
> stopped and started between each pair of Git commands.
>
> So after the changes, a single run of the test suite should run the
> backend-aware tests for *all* known backends followed by the
> backend-agnostic tests for a single selected backend.

ACK.

>
> Michael
>
> [1] When I was working on my quagga-reference spike [2] I found that a
> lot of the test suite uses knowledge about how references and reflogs
> are stored by the filesystem backend and just grabs at the files rather
> than accessing the references using git commands.  It will take some
> work to clean this up.
>
> [2] http://thread.gmane.org/gmane.comp.version-control.git/243726
>
> --
> Michael Haggerty
> mhagger@alum.mit.edu
>

