* [PATCH v5 00/40] Add initial experimental external ODB support
@ 2017-08-03  9:18 Christian Couder
  2017-08-03  9:18 ` [PATCH v5 01/40] builtin/clone: get rid of 'value' strbuf Christian Couder
                   ` (40 more replies)
  0 siblings, 41 replies; 73+ messages in thread
From: Christian Couder @ 2017-08-03  9:18 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

Note: a lot of information about the goals, the design and how things
work is now in patch 35/40 "Add
Documentation/technical/external-odb.txt" of this v5 series.

Goal
~~~~

Git can store its objects only in the form of loose objects in
separate files or packed objects in a pack file.

To be able to better handle some kinds of objects, for example big
blobs, it would be nice if Git could store its objects in other object
databases (ODBs).

To do that, this patch series makes it possible to register commands,
also called "helpers", using "odb.<odbname>.scriptCommand" or
"odb.<odbname>.subprocessCommand" config variables, to access external
ODBs where objects can be stored and retrieved.

Design
~~~~~~

* The "helpers" (registered commands)

Each helper manages access to one external ODB.

There are two different modes for helpers:

  - Helpers configured using "odb.<odbname>.scriptCommand" are
    launched each time Git wants to communicate with the <odbname>
    external ODB. This is called "script mode".

  - Helpers configured using "odb.<odbname>.subprocessCommand" are
    launched once as a sub-process (using sub-process.h), and
    Git communicates with them using packet lines. This is called
    "process mode".
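
As an illustration of the two modes above, here is a minimal sketch
that prints the config fragment registering a helper in either mode.
The config key names come from this series; the ODB names, the helper
paths and the odb_config_fragment function itself are hypothetical.

```shell
# Print a git-config fragment registering one external ODB helper.
# $1: ODB name, $2: "script" or "subprocess", $3: helper command.
# The helper paths used below are made up for the example.
odb_config_fragment() {
	case $2 in
	script)     key=scriptCommand ;;      # helper launched for each request
	subprocess) key=subprocessCommand ;;  # helper launched once, packet lines
	*) echo "unknown mode: $2" >&2; return 1 ;;
	esac
	printf '[odb "%s"]\n\t%s = %s\n' "$1" "$key" "$3"
}

odb_config_fragment magic script /usr/local/bin/odb-script-helper
odb_config_fragment store subprocess /usr/local/bin/odb-process-helper
```

The printed fragments could be appended to a repository's config
file, once the series is applied, to register one helper per ODB.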

A helper can be given different instructions by Git. The instructions
that are supported are negotiated at the beginning of the
communication using a capability mechanism.

See patch 35/40 (the documentation patch) for more information about
the different instructions and their arguments.

* Performance

The process mode has been implemented using the refactoring that Ben
Peart did on top of Lars Schneider's work on using sub-processes and
packet lines in the smudge/clean filters for git-lfs.

This also uses further work from Ben Peart called "read object
process".

See:

https://public-inbox.org/git/20170113155253.1644-1-benpeart@microsoft.com/
https://public-inbox.org/git/20170322165220.5660-1-benpeart@microsoft.com/

Ben recently sent an update of this work but this update has not been
integrated into the current patch series. See:

https://public-inbox.org/git/20170714132651.170708-1-benpeart@microsoft.com/

Anyway, thanks to this, the external ODB mechanism should in the end
perform as well as the git-lfs mechanism when many objects have to be
transferred.

Implementation
~~~~~~~~~~~~~~

* Mechanism to call the registered commands

A set of functions in external-odb.{c,h} is called by the rest of Git
to manage all the external ODBs.

These functions use 'struct odb_helper' and its associated functions
defined in odb-helper.{c,h} to talk to the different external ODBs by
launching the configured "odb.<odbname>.*command" commands and writing
to or reading from them.

* Transferring information

To transfer information about the blobs stored in an external ODB,
some special refs, called "odb refs", similar to replace refs, are
used in the tests of this series, but in general nothing forces the
helper to use that mechanism.

The external odb helper is responsible for using and creating the refs
in refs/odbs/<odbname>/, if it wants to do that. It is free for example
to just create one ref, as it is also free to create many refs. Git
would just transmit the refs that have been created by this helper, if
Git is asked to do so.

For now in the tests there is one odb ref per blob, as it is simple
and as it is similar to what git-lfs does. Each ref name is
refs/odbs/<odbname>/<sha1> where <sha1> is the sha1 of the blob stored
in the external odb named <odbname>.

Each of these odb refs points to a blob that is stored in the Git
repository and contains information about the blob stored in the
external odb. This information can be specific to the external odb.
The repos can then share this information using commands like:

`git fetch origin "refs/odbs/<odbname>/*:refs/odbs/<odbname>/*"`
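
The naming scheme and the refspec above can be sketched with two
small shell functions. The function names are hypothetical, and the
sha1 used in the example is the well-known id of the empty blob,
chosen only as a placeholder.

```shell
# Name of the odb ref for one blob: refs/odbs/<odbname>/<sha1>
odb_ref_name() {
	printf 'refs/odbs/%s/%s\n' "$1" "$2"
}

# Refspec transferring all the odb refs of one external ODB.
odb_refspec() {
	printf 'refs/odbs/%s/*:refs/odbs/%s/*\n' "$1" "$1"
}

odb_ref_name magic e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
odb_refspec magic
```

The output of odb_refspec is what would be passed to "git fetch" in
the command shown above.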

At the end of the current patch series, "git clone" is taught a
"--initial-refspec" option that asks it to first fetch some specified
refs. This is used in the tests to fetch the odb refs first.

This way a single "git clone" command can set up a repo using the
external ODB mechanism, as long as the right helper is installed on
the machine and as long as the following options are used:

  - "--initial-refspec <odbrefspec>" to fetch the odb refs
  - "-c odb.<odbname>.scriptCommand=<helper>" (or
    "-c odb.<odbname>.subprocessCommand=<helper>") to configure the
    helper
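
As a sketch, the full command line could be assembled as follows. It
is only printed, not run, because "--initial-refspec" exists only
with this series applied. It uses the "scriptCommand" key described
in the Design section; the function name, the helper path and the URL
are hypothetical.

```shell
# Print (do not run) a clone command that fetches the odb refs first
# and configures a script mode helper.
# $1: ODB name, $2: helper command, $3: repository URL
odb_clone_command() {
	refspec="refs/odbs/$1/*:refs/odbs/$1/*"
	printf "git clone --initial-refspec '%s' -c odb.%s.scriptCommand=%s %s\n" \
		"$refspec" "$1" "$2" "$3"
}

odb_clone_command magic /usr/local/bin/odb-helper https://example.com/repo.git
```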

There is also a test script that shows that the "--initial-refspec"
option along with the external ODB mechanism can be used to implement
cloning using bundles.

* ODB refs

For now odb ref management is only implemented in a helper in t0410.

When a new blob is added to an external odb, its sha1, size and type
are written in another new blob and the odb ref is created.

When the list of existing blobs is requested from the external odb,
the content of the blobs pointed to by the odb refs can also be used
by the odb to claim that it can get the objects.

When a blob is actually requested from the external odb, it can use
the content stored in the blobs pointed to by the odb refs to get the
actual blobs and then pass them to Git.
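
The bookkeeping described above can be sketched with ordinary Git
plumbing. This is not the t0410 helper: the add_odb_ref function and
the "<sha1> <size> <type>" payload layout are assumptions made for
the example, and the commands run in a throwaway repository.

```shell
# Record one blob's sha1, size and type in a new blob, and create the
# odb ref refs/odbs/<odbname>/<sha1> pointing to that new blob.
# $1: ODB name, $2: file holding the blob content
add_odb_ref() {
	odbname=$1
	file=$2
	# Compute the blob id without storing the blob itself in Git.
	sha1=$(git hash-object -t blob "$file") || return 1
	size=$(wc -c <"$file" | tr -d ' ')
	# Assumed payload layout: "<sha1> <size> <type>"
	meta=$(printf '%s %s blob\n' "$sha1" "$size" |
		git hash-object -w --stdin) || return 1
	git update-ref "refs/odbs/$odbname/$sha1" "$meta"
}

workdir=$(mktemp -d)
git init -q "$workdir/repo"
cd "$workdir/repo"
printf 'pretend this is a big blob\n' >big.bin
add_odb_ref magic big.bin

# List the odb refs, then show what the new one records.
git for-each-ref --format='%(refname)' refs/odbs/magic/
git cat-file -p "refs/odbs/magic/$(git hash-object big.bin)"
```

A helper asked for the list of blobs it has could walk the refs under
refs/odbs/<odbname>/ in the same way and read each payload.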

High-level view of the patches in the series
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    - Patch 1/40 is a small code cleanup that I already sent to the
      mailing list but will probably be removed in the end due to
      ongoing work on "git clone".

    - Patches 02/40 to 08/40 create a Git/Packet.pm module by
      refactoring "t0021/rot13-filter.pl". Functions from this new
      module will be used later in test scripts.

    - Patches 09/40 to 17/40 create the external ODB infrastructure
      in external-odb.{c,h} and odb-helper.{c,h} for the script mode.

    - Patches 18/40 to 24/40 improve lib-http to make it possible to
      use it as an external ODB to test storing blobs in an HTTP
      server.

    - Patches 25/40 to 33/40 improve the external ODB infrastructure
      to support sub-processes and make everything work using them.

    - Patch 34/40 uses attributes to mark blobs that should be handled
      by an external odb.

    - Patch 35/40 adds documentation about the external odb mechanism.

    - Patches 36/40 to 40/40 add the --initial-refspec to git clone
      along with tests.

Big changes since the previous patch series
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  - The "odb.<odbname>.scriptMode" and "odb.<odbname>.command" options
    have been replaced with "odb.<odbname>.scriptCommand" and
    "odb.<odbname>.subprocessCommand".

  - Capabilities are used to decide which kind of "get" will be
    used. So there are now 'get_raw_obj', 'get_git_obj' and
    'get_direct' instead of just 'get' and "odb.<odbname>.fetchKind".

  - An "init" instruction has been added and is the only required
    instruction for any helper to implement. It replaces the "get_cap"
    instruction that was only available in script mode, and it helps
    the process mode too, as it makes it possible for Git to know the
    capabilities before trying to send any instruction (that might not
    be supported by the helper).

  - An attributes-based mechanism has been added to mark files that
    should be handled by an external odb. See patch 34/40
    "external-odb: use 'odb=magic' attribute to mark odb blobs".

  - A lot of functions, structures and variables have been
    renamed. The "read-object-process" mechanism and related names
    that came from Ben's work have been renamed or prefixed using just
    "object-process" or just "process" as this is not about reading
    only and the instructions are called 'get_*' not 'read_*'. Names
    related to script mode have been renamed or prefixed using just
    "object-script" or just "script".

  - Documentation and commit messages are much improved.

Future work
~~~~~~~~~~~

There are still many things that could be cleaned or improved, but I
think that now the series is in a good enough state to not be RFC
anymore.

Things I think I may work on:

  - Integrate recent "read-object-process" work by Ben Peart.

  - Look at possible short reads and better check the objects Git
    receives.

  - Better test all the combinations of the different modes with and
    without "have" and "put_*" instructions.

  - Maybe implement the missing kinds of 'put' ('put_git_obj' and
    'put_direct'), so that Git could pass either a git object or a
    plain object, or ask the helper to retrieve it directly from Git's
    object database.

  - Add more long running tests and improve tests in general.

Previous work and discussions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

(Sorry for the old Gmane links, I hope to replace them with
public-inbox.org links at some point.)

Peff started to work on this and discuss this some years ago:

http://thread.gmane.org/gmane.comp.version-control.git/206886/focus=207040
http://thread.gmane.org/gmane.comp.version-control.git/247171
http://thread.gmane.org/gmane.comp.version-control.git/202902/focus=203020

His work, which is not compile-tested any more, is still there:

https://github.com/peff/git/commits/jk/external-odb-wip

Initial discussions about this new series are there:

http://thread.gmane.org/gmane.comp.version-control.git/288151/focus=295160

Versions 1, 2, 3 and 4 of this series are here:

https://public-inbox.org/git/20160613085546.11784-1-chriscool@tuxfamily.org/
https://public-inbox.org/git/20160628181933.24620-1-chriscool@tuxfamily.org/
https://public-inbox.org/git/20161130210420.15982-1-chriscool@tuxfamily.org/
https://public-inbox.org/git/20170620075523.26961-1-chriscool@tuxfamily.org/

Some of the discussions related to Ben Peart's work that is used by
this series are here:

https://public-inbox.org/git/20170113155253.1644-1-benpeart@microsoft.com/
https://public-inbox.org/git/20170322165220.5660-1-benpeart@microsoft.com/
https://public-inbox.org/git/20170714132651.170708-1-benpeart@microsoft.com/

Links
~~~~~

This patch series is available here:

https://github.com/chriscool/git/commits/external-odb

Versions 1, 2, 3 and 4 are here:

https://github.com/chriscool/git/commits/gl-external-odb12
https://github.com/chriscool/git/commits/gl-external-odb22
https://github.com/chriscool/git/commits/gl-external-odb61
https://github.com/chriscool/git/commits/gl-external-odb239

Ben Peart (2):
  odb-helper: add init_object_process()
  Add t0450 to test 'get_direct' mechanism

Christian Couder (38):
  builtin/clone: get rid of 'value' strbuf
  t0021/rot13-filter: refactor packet reading functions
  t0021/rot13-filter: improve 'if .. elsif .. else' style
  Add Git/Packet.pm from parts of t0021/rot13-filter.pl
  t0021/rot13-filter: use Git/Packet.pm
  Git/Packet.pm: improve error message
  Git/Packet.pm: add packet_initialize()
  Git/Packet.pm: add capability functions
  sha1_file: prepare for external odbs
  Add initial external odb support
  odb-helper: add odb_helper_init() to send 'init' instruction
  t0400: add 'put_raw_obj' instruction to odb-helper script
  external odb: add 'put_raw_obj' support
  external-odb: accept only blobs for now
  t0400: add test for external odb write support
  Add GIT_NO_EXTERNAL_ODB env variable
  Add t0410 to test external ODB transfer
  lib-httpd: pass config file to start_httpd()
  lib-httpd: add upload.sh
  lib-httpd: add list.sh
  lib-httpd: add apache-e-odb.conf
  odb-helper: add odb_helper_get_raw_object()
  pack-objects: don't pack objects in external odbs
  Add t0420 to test transfer to HTTP external odb
  external-odb: add 'get_direct' support
  odb-helper: add 'script_mode' to 'struct odb_helper'
  Add t0460 to test passing git objects
  odb-helper: add put_object_process()
  Add t0470 to test passing raw objects
  odb-helper: add have_object_process()
  Add t0480 to test "have" capability and raw objects
  external-odb: use 'odb=magic' attribute to mark odb blobs
  Add Documentation/technical/external-odb.txt
  clone: add 'initial' param to write_remote_refs()
  clone: add --initial-refspec option
  clone: disable external odb before initial clone
  Add tests for 'clone --initial-refspec'
  Add t0430 to test cloning using bundles

 Documentation/technical/external-odb.txt |  295 +++++++++
 Makefile                                 |    2 +
 builtin/clone.c                          |   91 ++-
 builtin/pack-objects.c                   |    4 +
 cache.h                                  |   18 +
 environment.c                            |    4 +
 external-odb.c                           |  196 ++++++
 external-odb.h                           |   12 +
 odb-helper.c                             | 1067 ++++++++++++++++++++++++++++++
 odb-helper.h                             |   42 ++
 perl/Git/Packet.pm                       |  118 ++++
 sha1_file.c                              |  161 +++--
 t/lib-httpd.sh                           |    8 +-
 t/lib-httpd/apache-e-odb.conf            |  214 ++++++
 t/lib-httpd/list.sh                      |   41 ++
 t/lib-httpd/upload.sh                    |   45 ++
 t/t0021/rot13-filter.pl                  |   97 +--
 t/t0400-external-odb.sh                  |   85 +++
 t/t0410-transfer-e-odb.sh                |  147 ++++
 t/t0420-transfer-http-e-odb.sh           |  152 +++++
 t/t0430-clone-bundle-e-odb.sh            |   85 +++
 t/t0450-read-object.sh                   |   28 +
 t/t0450/read-object                      |   68 ++
 t/t0460-read-object-git.sh               |   28 +
 t/t0460/read-object-git                  |   78 +++
 t/t0470-read-object-http-e-odb.sh        |  119 ++++
 t/t0470/read-object-plain                |   83 +++
 t/t0480-read-object-have-http-e-odb.sh   |  119 ++++
 t/t0480/read-object-plain-have           |  103 +++
 t/t5616-clone-initial-refspec.sh         |   48 ++
 30 files changed, 3423 insertions(+), 135 deletions(-)
 create mode 100644 Documentation/technical/external-odb.txt
 create mode 100644 external-odb.c
 create mode 100644 external-odb.h
 create mode 100644 odb-helper.c
 create mode 100644 odb-helper.h
 create mode 100644 perl/Git/Packet.pm
 create mode 100644 t/lib-httpd/apache-e-odb.conf
 create mode 100644 t/lib-httpd/list.sh
 create mode 100644 t/lib-httpd/upload.sh
 create mode 100755 t/t0400-external-odb.sh
 create mode 100755 t/t0410-transfer-e-odb.sh
 create mode 100755 t/t0420-transfer-http-e-odb.sh
 create mode 100755 t/t0430-clone-bundle-e-odb.sh
 create mode 100755 t/t0450-read-object.sh
 create mode 100755 t/t0450/read-object
 create mode 100755 t/t0460-read-object-git.sh
 create mode 100755 t/t0460/read-object-git
 create mode 100755 t/t0470-read-object-http-e-odb.sh
 create mode 100755 t/t0470/read-object-plain
 create mode 100755 t/t0480-read-object-have-http-e-odb.sh
 create mode 100755 t/t0480/read-object-plain-have
 create mode 100755 t/t5616-clone-initial-refspec.sh

-- 
2.14.0.rc1.52.gf02fb0ddac.dirty



* [PATCH v5 01/40] builtin/clone: get rid of 'value' strbuf
  2017-08-03  9:18 [PATCH v5 00/40] Add initial experimental external ODB support Christian Couder
@ 2017-08-03  9:18 ` Christian Couder
  2017-08-03  9:18 ` [PATCH v5 02/40] t0021/rot13-filter: refactor packet reading functions Christian Couder
                   ` (39 subsequent siblings)
  40 siblings, 0 replies; 73+ messages in thread
From: Christian Couder @ 2017-08-03  9:18 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

This makes the code simpler by removing a few lines, and getting
rid of one variable.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
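
For reference, the refspec that the new xstrfmt() call builds can be
reproduced with the same format string. This sketch assumes the
default values: "refs/heads/" for src_ref_prefix and "origin" for
option_origin.

```shell
# Same "+%s*:%s*" format string as in the patch, filled with the
# assumed default source prefix and branch_top for the "origin" remote.
src_ref_prefix=refs/heads/
branch_top=refs/remotes/origin/
printf '+%s*:%s*\n' "$src_ref_prefix" "$branch_top"
```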
 builtin/clone.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 08b5cc433c..4b5340c55f 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -871,7 +871,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	const struct ref *our_head_points_at;
 	struct ref *mapped_refs;
 	const struct ref *ref;
-	struct strbuf key = STRBUF_INIT, value = STRBUF_INIT;
+	struct strbuf key = STRBUF_INIT;
 	struct strbuf branch_top = STRBUF_INIT, reflog_msg = STRBUF_INIT;
 	struct transport *transport = NULL;
 	const char *src_ref_prefix = "refs/heads/";
@@ -1036,7 +1036,6 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		strbuf_addf(&branch_top, "refs/remotes/%s/", option_origin);
 	}
 
-	strbuf_addf(&value, "+%s*:%s*", src_ref_prefix, branch_top.buf);
 	strbuf_addf(&key, "remote.%s.url", option_origin);
 	git_config_set(key.buf, repo);
 	strbuf_reset(&key);
@@ -1050,10 +1049,9 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	if (option_required_reference.nr || option_optional_reference.nr)
 		setup_reference();
 
-	fetch_pattern = value.buf;
+	fetch_pattern = xstrfmt("+%s*:%s*", src_ref_prefix, branch_top.buf);
 	refspec = parse_fetch_refspec(1, &fetch_pattern);
-
-	strbuf_reset(&value);
+	free((char *)fetch_pattern);
 
 	remote = remote_get(option_origin);
 	transport = transport_get(remote, remote->url[0]);
@@ -1192,7 +1190,6 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	strbuf_release(&reflog_msg);
 	strbuf_release(&branch_top);
 	strbuf_release(&key);
-	strbuf_release(&value);
 	junk_mode = JUNK_LEAVE_ALL;
 
 	free(refspec);
-- 
2.14.0.rc1.52.gf02fb0ddac.dirty



* [PATCH v5 02/40] t0021/rot13-filter: refactor packet reading functions
  2017-08-03  9:18 [PATCH v5 00/40] Add initial experimental external ODB support Christian Couder
  2017-08-03  9:18 ` [PATCH v5 01/40] builtin/clone: get rid of 'value' strbuf Christian Couder
@ 2017-08-03  9:18 ` Christian Couder
  2017-08-03  9:18 ` [PATCH v5 03/40] t0021/rot13-filter: improve 'if .. elsif .. else' style Christian Couder
                   ` (38 subsequent siblings)
  40 siblings, 0 replies; 73+ messages in thread
From: Christian Couder @ 2017-08-03  9:18 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

To make it possible in a following commit to move packet
reading and writing functions into a Packet.pm module,
let's refactor these functions, so they don't handle
printing debug output and exiting.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
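
The return convention after this refactoring (-1 on EOF, 1 for a
flush packet, 0 for a data packet) can be sketched in shell, with
exit statuses standing in for the first element of the Perl return
list. This is an illustration only, not part of the patch; it handles
text payloads only and does not check for short reads.

```shell
# Read one pkt-line from stdin and print its payload.
# Exit status: 0 = data packet, 1 = flush packet ("0000"),
#              2 = EOF, 3 = malformed packet.
pkt_txt_read() {
	hdr=$(dd bs=1 count=4 2>/dev/null)
	[ -n "$hdr" ] || return 2
	case $hdr in
	[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]) ;;
	*) echo "invalid packet: '$hdr'" >&2; return 3 ;;
	esac
	size=$((0x$hdr))
	[ "$size" -ne 0 ] || return 1
	[ "$size" -gt 4 ] || { echo "invalid packet size: $size" >&2; return 3; }
	# Command substitution strips the trailing LF of a text packet.
	payload=$(dd bs=1 count=$((size - 4)) 2>/dev/null)
	printf '%s\n' "$payload"
}

# The caller, not the reader, now decides what to do at EOF.
printf '0009test\n0000' | while :; do
	rc=0
	line=$(pkt_txt_read) || rc=$?
	case $rc in
	0) echo "data: $line" ;;
	1) echo "flush" ;;
	2) echo "eof"; break ;;
	*) echo "error"; break ;;
	esac
done
```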
 t/t0021/rot13-filter.pl | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/t/t0021/rot13-filter.pl b/t/t0021/rot13-filter.pl
index 617f581e56..d6411ca523 100644
--- a/t/t0021/rot13-filter.pl
+++ b/t/t0021/rot13-filter.pl
@@ -39,8 +39,7 @@ sub packet_bin_read {
 	my $bytes_read = read STDIN, $buffer, 4;
 	if ( $bytes_read == 0 ) {
 		# EOF - Git stopped talking to us!
-		print $debug "STOP\n";
-		exit();
+		return ( -1, "" );
 	}
 	elsif ( $bytes_read != 4 ) {
 		die "invalid packet: '$buffer'";
@@ -64,7 +63,7 @@ sub packet_bin_read {
 
 sub packet_txt_read {
 	my ( $res, $buf ) = packet_bin_read();
-	unless ( $buf =~ s/\n$// ) {
+	unless ( $res == -1 || $buf =~ s/\n$// ) {
 		die "A non-binary line MUST be terminated by an LF.";
 	}
 	return ( $res, $buf );
@@ -109,7 +108,12 @@ print $debug "init handshake complete\n";
 $debug->flush();
 
 while (1) {
-	my ($command) = packet_txt_read() =~ /^command=(.+)$/;
+	my ($res, $command) = packet_txt_read();
+	if ( $res == -1 ) {
+		print $debug "STOP\n";
+		exit();
+	}
+	$command =~ s/^command=//;
 	print $debug "IN: $command";
 	$debug->flush();
 
-- 
2.14.0.rc1.52.gf02fb0ddac.dirty



* [PATCH v5 03/40] t0021/rot13-filter: improve 'if .. elsif .. else' style
  2017-08-03  9:18 [PATCH v5 00/40] Add initial experimental external ODB support Christian Couder
  2017-08-03  9:18 ` [PATCH v5 01/40] builtin/clone: get rid of 'value' strbuf Christian Couder
  2017-08-03  9:18 ` [PATCH v5 02/40] t0021/rot13-filter: refactor packet reading functions Christian Couder
@ 2017-08-03  9:18 ` Christian Couder
  2017-08-03  9:18 ` [PATCH v5 04/40] Add Git/Packet.pm from parts of t0021/rot13-filter.pl Christian Couder
                   ` (37 subsequent siblings)
  40 siblings, 0 replies; 73+ messages in thread
From: Christian Couder @ 2017-08-03  9:18 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

Before further refactoring the "t0021/rot13-filter.pl" script,
let's modernize the style of its 'if .. elsif .. else' clauses
to improve its readability by making it more similar to our
other perl scripts.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/t0021/rot13-filter.pl | 27 +++++++++------------------
 1 file changed, 9 insertions(+), 18 deletions(-)

diff --git a/t/t0021/rot13-filter.pl b/t/t0021/rot13-filter.pl
index d6411ca523..1fc581c814 100644
--- a/t/t0021/rot13-filter.pl
+++ b/t/t0021/rot13-filter.pl
@@ -40,23 +40,20 @@ sub packet_bin_read {
 	if ( $bytes_read == 0 ) {
 		# EOF - Git stopped talking to us!
 		return ( -1, "" );
-	}
-	elsif ( $bytes_read != 4 ) {
+	} elsif ( $bytes_read != 4 ) {
 		die "invalid packet: '$buffer'";
 	}
 	my $pkt_size = hex($buffer);
 	if ( $pkt_size == 0 ) {
 		return ( 1, "" );
-	}
-	elsif ( $pkt_size > 4 ) {
+	} elsif ( $pkt_size > 4 ) {
 		my $content_size = $pkt_size - 4;
 		$bytes_read = read STDIN, $buffer, $content_size;
 		if ( $bytes_read != $content_size ) {
 			die "invalid packet ($content_size bytes expected; $bytes_read bytes read)";
 		}
 		return ( 0, $buffer );
-	}
-	else {
+	} else {
 		die "invalid packet size: $pkt_size";
 	}
 }
@@ -144,14 +141,11 @@ while (1) {
 	my $output;
 	if ( $pathname eq "error.r" or $pathname eq "abort.r" ) {
 		$output = "";
-	}
-	elsif ( $command eq "clean" and grep( /^clean$/, @capabilities ) ) {
+	} elsif ( $command eq "clean" and grep( /^clean$/, @capabilities ) ) {
 		$output = rot13($input);
-	}
-	elsif ( $command eq "smudge" and grep( /^smudge$/, @capabilities ) ) {
+	} elsif ( $command eq "smudge" and grep( /^smudge$/, @capabilities ) ) {
 		$output = rot13($input);
-	}
-	else {
+	} else {
 		die "bad command '$command'";
 	}
 
@@ -163,14 +157,12 @@ while (1) {
 		$debug->flush();
 		packet_txt_write("status=error");
 		packet_flush();
-	}
-	elsif ( $pathname eq "abort.r" ) {
+	} elsif ( $pathname eq "abort.r" ) {
 		print $debug "[ABORT]\n";
 		$debug->flush();
 		packet_txt_write("status=abort");
 		packet_flush();
-	}
-	else {
+	} else {
 		packet_txt_write("status=success");
 		packet_flush();
 
@@ -187,8 +179,7 @@ while (1) {
 			print $debug ".";
 			if ( length($output) > $MAX_PACKET_CONTENT_SIZE ) {
 				$output = substr( $output, $MAX_PACKET_CONTENT_SIZE );
-			}
-			else {
+			} else {
 				$output = "";
 			}
 		}
-- 
2.14.0.rc1.52.gf02fb0ddac.dirty



* [PATCH v5 04/40] Add Git/Packet.pm from parts of t0021/rot13-filter.pl
  2017-08-03  9:18 [PATCH v5 00/40] Add initial experimental external ODB support Christian Couder
                   ` (2 preceding siblings ...)
  2017-08-03  9:18 ` [PATCH v5 03/40] t0021/rot13-filter: improve 'if .. elsif .. else' style Christian Couder
@ 2017-08-03  9:18 ` Christian Couder
  2017-08-03 19:11   ` Junio C Hamano
  2017-08-03  9:18 ` [PATCH v5 05/40] t0021/rot13-filter: use Git/Packet.pm Christian Couder
                   ` (36 subsequent siblings)
  40 siblings, 1 reply; 73+ messages in thread
From: Christian Couder @ 2017-08-03  9:18 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

This will make it possible to reuse packet reading and writing
functions in other test scripts.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
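
For readers unfamiliar with the packet line framing used by these
functions, here is a shell sketch of the writing side. It is an
illustration only, not part of the patch. Each packet starts with
four hex digits giving the total length, including those four bytes,
and "0000" is the flush packet. It assumes ASCII payloads, so that
the character count equals the byte count.

```shell
# Write a payload as one pkt-line: 4 hex digits of total length
# (payload length + 4), then the payload itself.
pkt_bin_write() {
	printf '%04x%s' $(( ${#1} + 4 )) "$1"
}

# Text packets carry a trailing LF, which counts in the length.
pkt_txt_write() {
	printf '%04x%s\n' $(( ${#1} + 5 )) "$1"
}

# A flush packet is a bare zero length.
pkt_flush() {
	printf '0000'
}

pkt_txt_write "version=2"
pkt_flush
```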
 perl/Git/Packet.pm | 71 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 71 insertions(+)
 create mode 100644 perl/Git/Packet.pm

diff --git a/perl/Git/Packet.pm b/perl/Git/Packet.pm
new file mode 100644
index 0000000000..aaffecbe2a
--- /dev/null
+++ b/perl/Git/Packet.pm
@@ -0,0 +1,71 @@
+package Git::Packet;
+use 5.008;
+use strict;
+use warnings;
+BEGIN {
+	require Exporter;
+	if ($] < 5.008003) {
+		*import = \&Exporter::import;
+	} else {
+		# Exporter 5.57 which supports this invocation was
+		# released with perl 5.8.3
+		Exporter->import('import');
+	}
+}
+
+our @EXPORT = qw(
+			packet_bin_read
+			packet_txt_read
+			packet_bin_write
+			packet_txt_write
+			packet_flush
+		);
+our @EXPORT_OK = @EXPORT;
+
+sub packet_bin_read {
+	my $buffer;
+	my $bytes_read = read STDIN, $buffer, 4;
+	if ( $bytes_read == 0 ) {
+		# EOF - Git stopped talking to us!
+		return ( -1, "" );
+	} elsif ( $bytes_read != 4 ) {
+		die "invalid packet: '$buffer'";
+	}
+	my $pkt_size = hex($buffer);
+	if ( $pkt_size == 0 ) {
+		return ( 1, "" );
+	} elsif ( $pkt_size > 4 ) {
+		my $content_size = $pkt_size - 4;
+		$bytes_read = read STDIN, $buffer, $content_size;
+		if ( $bytes_read != $content_size ) {
+			die "invalid packet ($content_size bytes expected; $bytes_read bytes read)";
+		}
+		return ( 0, $buffer );
+	} else {
+		die "invalid packet size: $pkt_size";
+	}
+}
+
+sub packet_txt_read {
+	my ( $res, $buf ) = packet_bin_read();
+	unless ( $res == -1 || $buf =~ s/\n$// ) {
+		die "A non-binary line MUST be terminated by an LF.";
+	}
+	return ( $res, $buf );
+}
+
+sub packet_bin_write {
+	my $buf = shift;
+	print STDOUT sprintf( "%04x", length($buf) + 4 );
+	print STDOUT $buf;
+	STDOUT->flush();
+}
+
+sub packet_txt_write {
+	packet_bin_write( $_[0] . "\n" );
+}
+
+sub packet_flush {
+	print STDOUT sprintf( "%04x", 0 );
+	STDOUT->flush();
+}
-- 
2.14.0.rc1.52.gf02fb0ddac.dirty



* [PATCH v5 05/40] t0021/rot13-filter: use Git/Packet.pm
  2017-08-03  9:18 [PATCH v5 00/40] Add initial experimental external ODB support Christian Couder
                   ` (3 preceding siblings ...)
  2017-08-03  9:18 ` [PATCH v5 04/40] Add Git/Packet.pm from parts of t0021/rot13-filter.pl Christian Couder
@ 2017-08-03  9:18 ` Christian Couder
  2017-08-03  9:18 ` [PATCH v5 06/40] Git/Packet.pm: improve error message Christian Couder
                   ` (35 subsequent siblings)
  40 siblings, 0 replies; 73+ messages in thread
From: Christian Couder @ 2017-08-03  9:18 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

After creating Git/Packet.pm from part of t0021/rot13-filter.pl,
we can now simplify this script by using Git/Packet.pm.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/t0021/rot13-filter.pl | 51 +++----------------------------------------------
 1 file changed, 3 insertions(+), 48 deletions(-)

diff --git a/t/t0021/rot13-filter.pl b/t/t0021/rot13-filter.pl
index 1fc581c814..36a9eb3608 100644
--- a/t/t0021/rot13-filter.pl
+++ b/t/t0021/rot13-filter.pl
@@ -19,9 +19,12 @@
 #     same command.
 #
 
+use 5.008;
+use lib (split(/:/, $ENV{GITPERLLIB}));
 use strict;
 use warnings;
 use IO::File;
+use Git::Packet;
 
 my $MAX_PACKET_CONTENT_SIZE = 65516;
 my @capabilities            = @ARGV;
@@ -34,54 +37,6 @@ sub rot13 {
 	return $str;
 }
 
-sub packet_bin_read {
-	my $buffer;
-	my $bytes_read = read STDIN, $buffer, 4;
-	if ( $bytes_read == 0 ) {
-		# EOF - Git stopped talking to us!
-		return ( -1, "" );
-	} elsif ( $bytes_read != 4 ) {
-		die "invalid packet: '$buffer'";
-	}
-	my $pkt_size = hex($buffer);
-	if ( $pkt_size == 0 ) {
-		return ( 1, "" );
-	} elsif ( $pkt_size > 4 ) {
-		my $content_size = $pkt_size - 4;
-		$bytes_read = read STDIN, $buffer, $content_size;
-		if ( $bytes_read != $content_size ) {
-			die "invalid packet ($content_size bytes expected; $bytes_read bytes read)";
-		}
-		return ( 0, $buffer );
-	} else {
-		die "invalid packet size: $pkt_size";
-	}
-}
-
-sub packet_txt_read {
-	my ( $res, $buf ) = packet_bin_read();
-	unless ( $res == -1 || $buf =~ s/\n$// ) {
-		die "A non-binary line MUST be terminated by an LF.";
-	}
-	return ( $res, $buf );
-}
-
-sub packet_bin_write {
-	my $buf = shift;
-	print STDOUT sprintf( "%04x", length($buf) + 4 );
-	print STDOUT $buf;
-	STDOUT->flush();
-}
-
-sub packet_txt_write {
-	packet_bin_write( $_[0] . "\n" );
-}
-
-sub packet_flush {
-	print STDOUT sprintf( "%04x", 0 );
-	STDOUT->flush();
-}
-
 print $debug "START\n";
 $debug->flush();
 
-- 
2.14.0.rc1.52.gf02fb0ddac.dirty



* [PATCH v5 06/40] Git/Packet.pm: improve error message
  2017-08-03  9:18 [PATCH v5 00/40] Add initial experimental external ODB support Christian Couder
                   ` (4 preceding siblings ...)
  2017-08-03  9:18 ` [PATCH v5 05/40] t0021/rot13-filter: use Git/Packet.pm Christian Couder
@ 2017-08-03  9:18 ` Christian Couder
  2017-08-03  9:18 ` [PATCH v5 07/40] Git/Packet.pm: add packet_initialize() Christian Couder
                   ` (34 subsequent siblings)
  40 siblings, 0 replies; 73+ messages in thread
From: Christian Couder @ 2017-08-03  9:18 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

Try to give a bit more information when we die()
because there is no new line at the end of something
we receive.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 perl/Git/Packet.pm | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/perl/Git/Packet.pm b/perl/Git/Packet.pm
index aaffecbe2a..2ad6b00d6c 100644
--- a/perl/Git/Packet.pm
+++ b/perl/Git/Packet.pm
@@ -49,7 +49,8 @@ sub packet_bin_read {
 sub packet_txt_read {
 	my ( $res, $buf ) = packet_bin_read();
 	unless ( $res == -1 || $buf =~ s/\n$// ) {
-		die "A non-binary line MUST be terminated by an LF.";
+		die "A non-binary line MUST be terminated by an LF.\n"
+		    . "Received: '$buf'";
 	}
 	return ( $res, $buf );
 }
-- 
2.14.0.rc1.52.gf02fb0ddac.dirty



* [PATCH v5 07/40] Git/Packet.pm: add packet_initialize()
  2017-08-03  9:18 [PATCH v5 00/40] Add initial experimental external ODB support Christian Couder
                   ` (5 preceding siblings ...)
  2017-08-03  9:18 ` [PATCH v5 06/40] Git/Packet.pm: improve error message Christian Couder
@ 2017-08-03  9:18 ` Christian Couder
  2017-08-03  9:18 ` [PATCH v5 08/40] Git/Packet.pm: add capability functions Christian Couder
                   ` (33 subsequent siblings)
  40 siblings, 0 replies; 73+ messages in thread
From: Christian Couder @ 2017-08-03  9:18 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

Add a function to initialize the communication. And use this
function in 't/t0021/rot13-filter.pl'.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
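
The server side of the handshake that packet_initialize() performs
can be sketched in shell as the bytes written back after the client
greeting. This is an illustration only, not part of the patch, and
the handshake_reply name is hypothetical. It assumes ASCII names.

```shell
# Text pkt-line: 4 hex digits of total length, payload, trailing LF.
pkt_txt_write() {
	printf '%04x%s\n' $(( ${#1} + 5 )) "$1"
}

# Reply sent once "<name>-client", "version=<n>" and a flush packet
# have been received: "<name>-server", "version=<n>", then a flush.
# $1: protocol name, $2: version
handshake_reply() {
	pkt_txt_write "$1-server"
	pkt_txt_write "version=$2"
	printf '0000'
}

handshake_reply git-filter 2
```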
 perl/Git/Packet.pm      | 13 +++++++++++++
 t/t0021/rot13-filter.pl |  8 +-------
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/perl/Git/Packet.pm b/perl/Git/Packet.pm
index 2ad6b00d6c..b0233caf37 100644
--- a/perl/Git/Packet.pm
+++ b/perl/Git/Packet.pm
@@ -19,6 +19,7 @@ our @EXPORT = qw(
 			packet_bin_write
 			packet_txt_write
 			packet_flush
+			packet_initialize
 		);
 our @EXPORT_OK = @EXPORT;
 
@@ -70,3 +71,15 @@ sub packet_flush {
 	print STDOUT sprintf( "%04x", 0 );
 	STDOUT->flush();
 }
+
+sub packet_initialize {
+	my ($name, $version) = @_;
+
+	( packet_txt_read() eq ( 0, $name . "-client" ) )	|| die "bad initialize";
+	( packet_txt_read() eq ( 0, "version=" . $version ) )	|| die "bad version";
+	( packet_bin_read() eq ( 1, "" ) )			|| die "bad version end";
+
+	packet_txt_write( $name . "-server" );
+	packet_txt_write( "version=" . $version );
+	packet_flush();
+}
diff --git a/t/t0021/rot13-filter.pl b/t/t0021/rot13-filter.pl
index 36a9eb3608..5b05518640 100644
--- a/t/t0021/rot13-filter.pl
+++ b/t/t0021/rot13-filter.pl
@@ -40,13 +40,7 @@ sub rot13 {
 print $debug "START\n";
 $debug->flush();
 
-( packet_txt_read() eq ( 0, "git-filter-client" ) ) || die "bad initialize";
-( packet_txt_read() eq ( 0, "version=2" ) )         || die "bad version";
-( packet_bin_read() eq ( 1, "" ) )                  || die "bad version end";
-
-packet_txt_write("git-filter-server");
-packet_txt_write("version=2");
-packet_flush();
+packet_initialize("git-filter", 2);
 
 ( packet_txt_read() eq ( 0, "capability=clean" ) )  || die "bad capability";
 ( packet_txt_read() eq ( 0, "capability=smudge" ) ) || die "bad capability";
-- 
2.14.0.rc1.52.gf02fb0ddac.dirty



* [PATCH v5 08/40] Git/Packet.pm: add capability functions
  2017-08-03  9:18 [PATCH v5 00/40] Add initial experimental external ODB support Christian Couder
                   ` (6 preceding siblings ...)
  2017-08-03  9:18 ` [PATCH v5 07/40] Git/Packet.pm: add packet_initialize() Christian Couder
@ 2017-08-03  9:18 ` Christian Couder
  2017-08-03 19:14   ` Junio C Hamano
  2017-08-03  9:18 ` [PATCH v5 09/40] sha1_file: prepare for external odbs Christian Couder
                   ` (32 subsequent siblings)
  40 siblings, 1 reply; 73+ messages in thread
From: Christian Couder @ 2017-08-03  9:18 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

Add functions to help read and write capabilities.
Use these functions in 't/t0021/rot13-filter.pl'.
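
The check done by packet_read_and_check_capabilities() could be
sketched in C roughly as follows. The names capability_name()
and remote_has_all() are hypothetical, and unlike the Perl code,
which strips "capability=" wherever it appears in the line, this
sketch requires it as a prefix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the capability name from a "capability=<name>" line, or
 * NULL if the line does not start with that prefix.
 */
static const char *capability_name(const char *line)
{
	static const char prefix[] = "capability=";
	size_t plen = sizeof(prefix) - 1;

	assert(line);
	return strncmp(line, prefix, plen) ? NULL : line + plen;
}

/*
 * Return 1 if every name in wanted[] appears among the capability
 * lines in remote[], 0 otherwise.
 */
static int remote_has_all(const char **remote, size_t nr_remote,
			  const char **wanted, size_t nr_wanted)
{
	size_t i, j;

	for (i = 0; i < nr_wanted; i++) {
		int found = 0;

		for (j = 0; j < nr_remote && !found; j++) {
			const char *name = capability_name(remote[j]);
			found = name && !strcmp(name, wanted[i]);
		}
		if (!found)
			return 0;
	}
	return 1;
}
```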

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 perl/Git/Packet.pm      | 33 +++++++++++++++++++++++++++++++++
 t/t0021/rot13-filter.pl |  9 ++-------
 2 files changed, 35 insertions(+), 7 deletions(-)

diff --git a/perl/Git/Packet.pm b/perl/Git/Packet.pm
index b0233caf37..4443b67724 100644
--- a/perl/Git/Packet.pm
+++ b/perl/Git/Packet.pm
@@ -20,6 +20,9 @@ our @EXPORT = qw(
 			packet_txt_write
 			packet_flush
 			packet_initialize
+			packet_read_capabilities
+			packet_write_capabilities
+			packet_read_and_check_capabilities
 		);
 our @EXPORT_OK = @EXPORT;
 
@@ -83,3 +86,33 @@ sub packet_initialize {
 	packet_txt_write( "version=" . $version );
 	packet_flush();
 }
+
+sub packet_read_capabilities {
+	my @cap;
+	while (1) {
+		my ( $res, $buf ) = packet_bin_read();
+		return ( $res, @cap ) if ( $res != 0 );
+		unless ( $buf =~ s/\n$// ) {
+			die "A non-binary line MUST be terminated by an LF.\n"
+			    . "Received: '$buf'";
+		}
+		die "bad capability buf: '$buf'" unless ( $buf =~ s/capability=// );
+		push @cap, $buf;
+	}
+}
+
+sub packet_read_and_check_capabilities {
+	my @local_caps = @_;
+	my @remote_res_caps = packet_read_capabilities();
+	my $res = shift @remote_res_caps;
+	my %remote_caps = map { $_ => 1 } @remote_res_caps;
+	foreach (@local_caps) {
+	    die "'$_' capability not available" unless (exists($remote_caps{$_}));
+	}
+	return $res;
+}
+
+sub packet_write_capabilities {
+	packet_txt_write( "capability=" . $_ ) foreach (@_);
+	packet_flush();
+}
diff --git a/t/t0021/rot13-filter.pl b/t/t0021/rot13-filter.pl
index 5b05518640..bbfd52619d 100644
--- a/t/t0021/rot13-filter.pl
+++ b/t/t0021/rot13-filter.pl
@@ -42,14 +42,9 @@ $debug->flush();
 
 packet_initialize("git-filter", 2);
 
-( packet_txt_read() eq ( 0, "capability=clean" ) )  || die "bad capability";
-( packet_txt_read() eq ( 0, "capability=smudge" ) ) || die "bad capability";
-( packet_bin_read() eq ( 1, "" ) )                  || die "bad capability end";
+packet_read_and_check_capabilities("clean", "smudge");
+packet_write_capabilities(@capabilities);
 
-foreach (@capabilities) {
-	packet_txt_write( "capability=" . $_ );
-}
-packet_flush();
 print $debug "init handshake complete\n";
 $debug->flush();
 
-- 
2.14.0.rc1.52.gf02fb0ddac.dirty



* [PATCH v5 09/40] sha1_file: prepare for external odbs
  2017-08-03  9:18 [PATCH v5 00/40] Add initial experimental external ODB support Christian Couder
                   ` (7 preceding siblings ...)
  2017-08-03  9:18 ` [PATCH v5 08/40] Git/Packet.pm: add capability functions Christian Couder
@ 2017-08-03  9:18 ` Christian Couder
  2017-08-03  9:18 ` [PATCH v5 10/40] Add initial external odb support Christian Couder
                   ` (31 subsequent siblings)
  40 siblings, 0 replies; 73+ messages in thread
From: Christian Couder @ 2017-08-03  9:18 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

In the following commits we will need some functions that were
internal to sha1_file.c, so let's first make them non-static
and declare them in "cache.h". While at it, let's rename
'create_tmpfile()' to 'create_object_tmpfile()' to make its
name less generic.

Let's also split out 'sha1_file_name_alt()' from
'sha1_file_name()' and 'open_sha1_file_alt()' from
'open_sha1_file()', as we will need both of these new
functions too.
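
To make the path layout concrete, here is a standalone sketch of
how a loose object path is derived from an object directory and
a hex object name. It works on hex strings rather than on the
binary sha1 that sha1_file_name_alt() takes, and the name
loose_object_path() is invented for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "<objdir>/<first two hex digits>/<remaining 38 hex digits>"
 * from a 40-character hex object name. Returns 0 on success, -1 if
 * the name has the wrong length or out is too small.
 */
static int loose_object_path(const char *objdir, const char *hex,
			     char *out, size_t outlen)
{
	int n;

	assert(objdir && hex && out);
	if (strlen(hex) != 40)
		return -1;
	n = snprintf(out, outlen, "%s/%.2s/%s", objdir, hex, hex + 2);
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Passing a different objdir gives the same object name under
another directory, which is the point of splitting out the
"_alt" variant.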

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 cache.h     |  8 ++++++++
 sha1_file.c | 57 ++++++++++++++++++++++++++++++++++++---------------------
 2 files changed, 44 insertions(+), 21 deletions(-)

diff --git a/cache.h b/cache.h
index 71fe092644..06da3d8a3f 100644
--- a/cache.h
+++ b/cache.h
@@ -902,6 +902,12 @@ extern void check_repository_format(void);
  */
 extern const char *sha1_file_name(const unsigned char *sha1);
 
+/*
+ * Like sha1_file_name, but return the filename within a specific alternate
+ * object directory. Shares the same static buffer with sha1_file_name.
+ */
+extern const char *sha1_file_name_alt(const char *objdir, const unsigned char *sha1);
+
 /*
  * Return the name of the (local) packfile with the specified sha1 in
  * its name.  The return value is a pointer to memory that is
@@ -1205,6 +1211,8 @@ extern int do_check_packed_object_crc;
 
 extern int check_sha1_signature(const unsigned char *sha1, void *buf, unsigned long size, const char *type);
 
+extern int create_object_tmpfile(struct strbuf *tmp, const char *filename);
+extern void close_sha1_file(int fd);
 extern int finalize_object_file(const char *tmpfile, const char *filename);
 
 extern int has_sha1_pack(const unsigned char *sha1);
diff --git a/sha1_file.c b/sha1_file.c
index b60ae15f70..d330996bc4 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -253,12 +253,12 @@ static void fill_sha1_path(struct strbuf *buf, const unsigned char *sha1)
 	}
 }
 
-const char *sha1_file_name(const unsigned char *sha1)
+const char *sha1_file_name_alt(const char *objdir, const unsigned char *sha1)
 {
 	static struct strbuf buf = STRBUF_INIT;
 
 	strbuf_reset(&buf);
-	strbuf_addf(&buf, "%s/", get_object_directory());
+	strbuf_addf(&buf, "%s/", objdir);
 
 	fill_sha1_path(&buf, sha1);
 	return buf.buf;
@@ -278,9 +278,14 @@ static const char *alt_sha1_path(struct alternate_object_database *alt,
 	return buf->buf;
 }
 
- char *odb_pack_name(struct strbuf *buf,
-		     const unsigned char *sha1,
-		     const char *ext)
+const char *sha1_file_name(const unsigned char *sha1)
+{
+	return sha1_file_name_alt(get_object_directory(), sha1);
+}
+
+char *odb_pack_name(struct strbuf *buf,
+		    const unsigned char *sha1,
+		    const char *ext)
 {
 	strbuf_reset(buf);
 	strbuf_addf(buf, "%s/pack/pack-%s.%s", get_object_directory(),
@@ -1727,24 +1732,14 @@ static int stat_sha1_file(const unsigned char *sha1, struct stat *st,
 	return -1;
 }
 
-/*
- * Like stat_sha1_file(), but actually open the object and return the
- * descriptor. See the caveats on the "path" parameter above.
- */
-static int open_sha1_file(const unsigned char *sha1, const char **path)
+static int open_sha1_file_alt(const unsigned char *sha1, const char **path)
 {
-	int fd;
 	struct alternate_object_database *alt;
-	int most_interesting_errno;
-
-	*path = sha1_file_name(sha1);
-	fd = git_open(*path);
-	if (fd >= 0)
-		return fd;
-	most_interesting_errno = errno;
+	int most_interesting_errno = errno;
 
 	prepare_alt_odb();
 	for (alt = alt_odb_list; alt; alt = alt->next) {
+		int fd;
 		*path = alt_sha1_path(alt, sha1);
 		fd = git_open(*path);
 		if (fd >= 0)
@@ -1756,6 +1751,26 @@ static int open_sha1_file(const unsigned char *sha1, const char **path)
 	return -1;
 }
 
+/*
+ * Like stat_sha1_file(), but actually open the object and return the
+ * descriptor. See the caveats on the "path" parameter above.
+ */
+static int open_sha1_file(const unsigned char *sha1, const char **path)
+{
+	int fd;
+
+	*path = sha1_file_name(sha1);
+	fd = git_open(*path);
+	if (fd >= 0)
+		return fd;
+
+	fd = open_sha1_file_alt(sha1, path);
+	if (fd >= 0)
+		return fd;
+
+	return fd;
+}
+
 /*
  * Map the loose object at "path" if it is not NULL, or the path found by
  * searching for a loose object named "sha1".
@@ -3284,7 +3299,7 @@ int hash_sha1_file(const void *buf, unsigned long len, const char *type,
 }
 
 /* Finalize a file on disk, and close it. */
-static void close_sha1_file(int fd)
+void close_sha1_file(int fd)
 {
 	if (fsync_object_files)
 		fsync_or_die(fd, "sha1 file");
@@ -3308,7 +3323,7 @@ static inline int directory_size(const char *filename)
  * We want to avoid cross-directory filename renames, because those
  * can have problems on various filesystems (FAT, NFS, Coda).
  */
-static int create_tmpfile(struct strbuf *tmp, const char *filename)
+int create_object_tmpfile(struct strbuf *tmp, const char *filename)
 {
 	int fd, dirlen = directory_size(filename);
 
@@ -3348,7 +3363,7 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
 	static struct strbuf tmp_file = STRBUF_INIT;
 	const char *filename = sha1_file_name(sha1);
 
-	fd = create_tmpfile(&tmp_file, filename);
+	fd = create_object_tmpfile(&tmp_file, filename);
 	if (fd < 0) {
 		if (errno == EACCES)
 			return error("insufficient permission for adding an object to repository database %s", get_object_directory());
-- 
2.14.0.rc1.52.gf02fb0ddac.dirty



* [PATCH v5 10/40] Add initial external odb support
  2017-08-03  9:18 [PATCH v5 00/40] Add initial experimental external ODB support Christian Couder
                   ` (8 preceding siblings ...)
  2017-08-03  9:18 ` [PATCH v5 09/40] sha1_file: prepare for external odbs Christian Couder
@ 2017-08-03  9:18 ` Christian Couder
  2017-08-03 19:34   ` Junio C Hamano
  2017-08-03  9:18 ` [PATCH v5 11/40] odb-helper: add odb_helper_init() to send 'init' instruction Christian Couder
                   ` (30 subsequent siblings)
  40 siblings, 1 reply; 73+ messages in thread
From: Christian Couder @ 2017-08-03  9:18 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

The external-odb.{c,h} files contain the functions that are
called by the rest of Git from "sha1_file.c".

The odb-helper.{c,h} files contain the functions that
actually implement communication with the external scripts or
processes that will manage external git objects.

For now only script mode is supported, and only the 'have' and
'get_git_obj' instructions are supported.
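
In this patch a helper answers 'have' with one line per object
in the form "<sha1> <size> <type>" (the test script reorders the
`git cat-file --batch-check` output with awk to get this). A
standalone sketch of parsing such a line, assuming a
hypothetical struct have_entry that keeps the hex name and type
as strings instead of Git's internal types:

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct odb_helper_object. */
struct have_entry {
	char hex[41];
	unsigned long size;
	char type[16];
};

/* Parse "<40 hex digits> <size> <type>"; return 0 on success, -1 otherwise. */
static int parse_have_line(const char *line, struct have_entry *e)
{
	char *end;
	size_t i;

	assert(line && e);
	/* A short line stops at its NUL, which is not a hex digit. */
	for (i = 0; i < 40; i++)
		if (!isxdigit((unsigned char)line[i]))
			return -1;
	memcpy(e->hex, line, 40);
	e->hex[40] = '\0';

	line += 40;
	if (*line++ != ' ')
		return -1;

	e->size = strtoul(line, &end, 10);
	if (line == end || *end++ != ' ')
		return -1;

	if (!*end || strlen(end) >= sizeof(e->type))
		return -1;
	strcpy(e->type, end);
	return 0;
}
```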

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 Makefile                |   2 +
 cache.h                 |   1 +
 external-odb.c          | 113 ++++++++++++++++++++++
 external-odb.h          |   8 ++
 odb-helper.c            | 249 ++++++++++++++++++++++++++++++++++++++++++++++++
 odb-helper.h            |  25 +++++
 sha1_file.c             |  25 ++++-
 t/t0400-external-odb.sh |  46 +++++++++
 8 files changed, 468 insertions(+), 1 deletion(-)
 create mode 100644 external-odb.c
 create mode 100644 external-odb.h
 create mode 100644 odb-helper.c
 create mode 100644 odb-helper.h
 create mode 100755 t/t0400-external-odb.sh

diff --git a/Makefile b/Makefile
index 461c845d33..dde4b0e868 100644
--- a/Makefile
+++ b/Makefile
@@ -783,6 +783,7 @@ LIB_OBJS += ewah/ewah_bitmap.o
 LIB_OBJS += ewah/ewah_io.o
 LIB_OBJS += ewah/ewah_rlw.o
 LIB_OBJS += exec_cmd.o
+LIB_OBJS += external-odb.o
 LIB_OBJS += fetch-pack.o
 LIB_OBJS += fsck.o
 LIB_OBJS += gettext.o
@@ -815,6 +816,7 @@ LIB_OBJS += notes-cache.o
 LIB_OBJS += notes-merge.o
 LIB_OBJS += notes-utils.o
 LIB_OBJS += object.o
+LIB_OBJS += odb-helper.o
 LIB_OBJS += oidset.o
 LIB_OBJS += pack-bitmap.o
 LIB_OBJS += pack-bitmap-write.o
diff --git a/cache.h b/cache.h
index 06da3d8a3f..13694069b1 100644
--- a/cache.h
+++ b/cache.h
@@ -1553,6 +1553,7 @@ extern void read_info_alternates(const char * relative_base, int depth);
 extern char *compute_alternate_path(const char *path, struct strbuf *err);
 typedef int alt_odb_fn(struct alternate_object_database *, void *);
 extern int foreach_alt_odb(alt_odb_fn, void*);
+extern void prepare_external_alt_odb(void);
 
 /*
  * Allocate a "struct alternate_object_database" but do _not_ actually
diff --git a/external-odb.c b/external-odb.c
new file mode 100644
index 0000000000..e9c3f11666
--- /dev/null
+++ b/external-odb.c
@@ -0,0 +1,113 @@
+#include "cache.h"
+#include "external-odb.h"
+#include "odb-helper.h"
+
+static struct odb_helper *helpers;
+static struct odb_helper **helpers_tail = &helpers;
+
+static struct odb_helper *find_or_create_helper(const char *name, int len)
+{
+	struct odb_helper *o;
+
+	for (o = helpers; o; o = o->next)
+		if (!strncmp(o->name, name, len) && !o->name[len])
+			return o;
+
+	o = odb_helper_new(name, len);
+	*helpers_tail = o;
+	helpers_tail = &o->next;
+
+	return o;
+}
+
+static int external_odb_config(const char *var, const char *value, void *data)
+{
+	struct odb_helper *o;
+	const char *name;
+	int namelen;
+	const char *subkey;
+
+	if (parse_config_key(var, "odb", &name, &namelen, &subkey) < 0)
+		return 0;
+
+	o = find_or_create_helper(name, namelen);
+
+	if (!strcmp(subkey, "scriptcommand"))
+		return git_config_string(&o->cmd, var, value);
+
+	return 0;
+}
+
+static void external_odb_init(void)
+{
+	static int initialized;
+
+	if (initialized)
+		return;
+	initialized = 1;
+
+	git_config(external_odb_config, NULL);
+}
+
+const char *external_odb_root(void)
+{
+	static const char *root;
+	if (!root)
+		root = git_pathdup("objects/external");
+	return root;
+}
+
+int external_odb_has_object(const unsigned char *sha1)
+{
+	struct odb_helper *o;
+
+	external_odb_init();
+
+	for (o = helpers; o; o = o->next)
+		if (odb_helper_has_object(o, sha1))
+			return 1;
+	return 0;
+}
+
+int external_odb_get_object(const unsigned char *sha1)
+{
+	struct odb_helper *o;
+	const char *path;
+
+	if (!external_odb_has_object(sha1))
+		return -1;
+
+	path = sha1_file_name_alt(external_odb_root(), sha1);
+	safe_create_leading_directories_const(path);
+	prepare_external_alt_odb();
+
+	for (o = helpers; o; o = o->next) {
+		struct strbuf tmpfile = STRBUF_INIT;
+		int ret;
+		int fd;
+
+		if (!odb_helper_has_object(o, sha1))
+			continue;
+
+		fd = create_object_tmpfile(&tmpfile, path);
+		if (fd < 0) {
+			strbuf_release(&tmpfile);
+			return -1;
+		}
+
+		if (odb_helper_get_object(o, sha1, fd) < 0) {
+			close(fd);
+			unlink(tmpfile.buf);
+			strbuf_release(&tmpfile);
+			continue;
+		}
+
+		close_sha1_file(fd);
+		ret = finalize_object_file(tmpfile.buf, path);
+		strbuf_release(&tmpfile);
+		if (!ret)
+			return 0;
+	}
+
+	return -1;
+}
diff --git a/external-odb.h b/external-odb.h
new file mode 100644
index 0000000000..9989490c9e
--- /dev/null
+++ b/external-odb.h
@@ -0,0 +1,8 @@
+#ifndef EXTERNAL_ODB_H
+#define EXTERNAL_ODB_H
+
+const char *external_odb_root(void);
+int external_odb_has_object(const unsigned char *sha1);
+int external_odb_get_object(const unsigned char *sha1);
+
+#endif /* EXTERNAL_ODB_H */
diff --git a/odb-helper.c b/odb-helper.c
new file mode 100644
index 0000000000..0e6f824e4a
--- /dev/null
+++ b/odb-helper.c
@@ -0,0 +1,249 @@
+#include "cache.h"
+#include "object.h"
+#include "argv-array.h"
+#include "odb-helper.h"
+#include "run-command.h"
+#include "sha1-lookup.h"
+
+struct odb_helper *odb_helper_new(const char *name, int namelen)
+{
+	struct odb_helper *o;
+
+	o = xcalloc(1, sizeof(*o));
+	o->name = xmemdupz(name, namelen);
+
+	return o;
+}
+
+struct odb_helper_cmd {
+	struct argv_array argv;
+	struct child_process child;
+};
+
+/*
+ * Callers are responsible to ensure that the result of vaddf(fmt, ap)
+ * is properly shell-quoted.
+ */
+static void prepare_helper_command(struct argv_array *argv, const char *cmd,
+				   const char *fmt, va_list ap)
+{
+	struct strbuf buf = STRBUF_INIT;
+
+	strbuf_addstr(&buf, cmd);
+	strbuf_addch(&buf, ' ');
+	strbuf_vaddf(&buf, fmt, ap);
+
+	argv_array_push(argv, buf.buf);
+	strbuf_release(&buf);
+}
+
+__attribute__((format (printf,3,4)))
+static int odb_helper_start(struct odb_helper *o,
+			    struct odb_helper_cmd *cmd,
+			    const char *fmt, ...)
+{
+	va_list ap;
+
+	memset(cmd, 0, sizeof(*cmd));
+	argv_array_init(&cmd->argv);
+
+	if (!o->cmd)
+		return -1;
+
+	va_start(ap, fmt);
+	prepare_helper_command(&cmd->argv, o->cmd, fmt, ap);
+	va_end(ap);
+
+	cmd->child.argv = cmd->argv.argv;
+	cmd->child.use_shell = 1;
+	cmd->child.no_stdin = 1;
+	cmd->child.out = -1;
+
+	if (start_command(&cmd->child) < 0) {
+		argv_array_clear(&cmd->argv);
+		return -1;
+	}
+
+	return 0;
+}
+
+static int odb_helper_finish(struct odb_helper *o,
+			     struct odb_helper_cmd *cmd)
+{
+	int ret = finish_command(&cmd->child);
+	argv_array_clear(&cmd->argv);
+	if (ret) {
+		warning("odb helper '%s' reported failure", o->name);
+		return -1;
+	}
+	return 0;
+}
+
+static int parse_object_line(struct odb_helper_object *o, const char *line)
+{
+	char *end;
+	if (get_sha1_hex(line, o->sha1) < 0)
+		return -1;
+
+	line += 40;
+	if (*line++ != ' ')
+		return -1;
+
+	o->size = strtoul(line, &end, 10);
+	if (line == end || *end++ != ' ')
+		return -1;
+
+	o->type = type_from_string(end);
+	return 0;
+}
+
+static int add_have_entry(struct odb_helper *o, const char *line)
+{
+	ALLOC_GROW(o->have, o->have_nr+1, o->have_alloc);
+	if (parse_object_line(&o->have[o->have_nr], line) < 0) {
+		warning("bad 'have' input from odb helper '%s': %s",
+			o->name, line);
+		return 1;
+	}
+	o->have_nr++;
+	return 0;
+}
+
+static int odb_helper_object_cmp(const void *va, const void *vb)
+{
+	const struct odb_helper_object *a = va, *b = vb;
+	return hashcmp(a->sha1, b->sha1);
+}
+
+static void odb_helper_load_have(struct odb_helper *o)
+{
+	struct odb_helper_cmd cmd;
+	FILE *fh;
+	struct strbuf line = STRBUF_INIT;
+
+	if (o->have_valid)
+		return;
+	o->have_valid = 1;
+
+	if (odb_helper_start(o, &cmd, "have") < 0)
+		return;
+
+	fh = xfdopen(cmd.child.out, "r");
+	while (strbuf_getline(&line, fh) != EOF)
+		if (add_have_entry(o, line.buf))
+			break;
+
+	strbuf_release(&line);
+	fclose(fh);
+	odb_helper_finish(o, &cmd);
+
+	qsort(o->have, o->have_nr, sizeof(*o->have), odb_helper_object_cmp);
+}
+
+static struct odb_helper_object *odb_helper_lookup(struct odb_helper *o,
+						   const unsigned char *sha1)
+{
+	int idx;
+
+	odb_helper_load_have(o);
+	idx = sha1_entry_pos(o->have, sizeof(*o->have), 0,
+			     0, o->have_nr, o->have_nr,
+			     sha1);
+	if (idx < 0)
+		return NULL;
+	return &o->have[idx];
+}
+
+int odb_helper_has_object(struct odb_helper *o, const unsigned char *sha1)
+{
+	return !!odb_helper_lookup(o, sha1);
+}
+
+int odb_helper_get_object(struct odb_helper *o, const unsigned char *sha1,
+			    int fd)
+{
+	struct odb_helper_object *obj;
+	struct odb_helper_cmd cmd;
+	unsigned long total_got;
+	git_zstream stream;
+	int zret = Z_STREAM_END;
+	git_SHA_CTX hash;
+	unsigned char real_sha1[20];
+
+	obj = odb_helper_lookup(o, sha1);
+	if (!obj)
+		return -1;
+
+	if (odb_helper_start(o, &cmd, "get_git_obj %s", sha1_to_hex(sha1)) < 0)
+		return -1;
+
+	memset(&stream, 0, sizeof(stream));
+	git_inflate_init(&stream);
+	git_SHA1_Init(&hash);
+	total_got = 0;
+
+	for (;;) {
+		unsigned char buf[4096];
+		int r;
+
+		r = xread(cmd.child.out, buf, sizeof(buf));
+		if (r < 0) {
+			error("unable to read from odb helper '%s': %s",
+			      o->name, strerror(errno));
+			close(cmd.child.out);
+			odb_helper_finish(o, &cmd);
+			git_inflate_end(&stream);
+			return -1;
+		}
+		if (r == 0)
+			break;
+
+		write_or_die(fd, buf, r);
+
+		stream.next_in = buf;
+		stream.avail_in = r;
+		do {
+			unsigned char inflated[4096];
+			unsigned long got;
+
+			stream.next_out = inflated;
+			stream.avail_out = sizeof(inflated);
+			zret = git_inflate(&stream, Z_SYNC_FLUSH);
+			got = sizeof(inflated) - stream.avail_out;
+
+			git_SHA1_Update(&hash, inflated, got);
+			/* skip header when counting size */
+			if (!total_got) {
+				const unsigned char *p = memchr(inflated, '\0', got);
+				if (p)
+					got -= p - inflated + 1;
+				else
+					got = 0;
+			}
+			total_got += got;
+		} while (stream.avail_in && zret == Z_OK);
+	}
+
+	close(cmd.child.out);
+	git_inflate_end(&stream);
+	git_SHA1_Final(real_sha1, &hash);
+	if (odb_helper_finish(o, &cmd))
+		return -1;
+	if (zret != Z_STREAM_END) {
+		warning("bad zlib data from odb helper '%s' for %s",
+			o->name, sha1_to_hex(sha1));
+		return -1;
+	}
+	if (total_got != obj->size) {
+		warning("size mismatch from odb helper '%s' for %s (%lu != %lu)",
+			o->name, sha1_to_hex(sha1), total_got, obj->size);
+		return -1;
+	}
+	if (hashcmp(real_sha1, sha1)) {
+		warning("sha1 mismatch from odb helper '%s' for %s (got %s)",
+			o->name, sha1_to_hex(sha1), sha1_to_hex(real_sha1));
+		return -1;
+	}
+
+	return 0;
+}
diff --git a/odb-helper.h b/odb-helper.h
new file mode 100644
index 0000000000..5800661704
--- /dev/null
+++ b/odb-helper.h
@@ -0,0 +1,25 @@
+#ifndef ODB_HELPER_H
+#define ODB_HELPER_H
+
+struct odb_helper {
+	const char *name;
+	const char *cmd;
+
+	struct odb_helper_object {
+		unsigned char sha1[20];
+		unsigned long size;
+		enum object_type type;
+	} *have;
+	int have_nr;
+	int have_alloc;
+	int have_valid;
+
+	struct odb_helper *next;
+};
+
+struct odb_helper *odb_helper_new(const char *name, int namelen);
+int odb_helper_has_object(struct odb_helper *o, const unsigned char *sha1);
+int odb_helper_get_object(struct odb_helper *o, const unsigned char *sha1,
+			  int fd);
+
+#endif /* ODB_HELPER_H */
diff --git a/sha1_file.c b/sha1_file.c
index d330996bc4..4bd790f6f8 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -28,6 +28,7 @@
 #include "list.h"
 #include "mergesort.h"
 #include "quote.h"
+#include "external-odb.h"
 
 #define SZ_FMT PRIuMAX
 static inline uintmax_t sz_fmt(size_t s) { return s; }
@@ -636,6 +637,21 @@ int foreach_alt_odb(alt_odb_fn fn, void *cb)
 	return r;
 }
 
+void prepare_external_alt_odb(void)
+{
+	static int linked_external;
+	const char *path;
+
+	if (linked_external)
+		return;
+
+	path = external_odb_root();
+	if (!access(path, F_OK)) {
+		link_alt_odb_entry(path, NULL, 0, "");
+		linked_external = 1;
+	}
+}
+
 void prepare_alt_odb(void)
 {
 	const char *alt;
@@ -650,6 +666,7 @@ void prepare_alt_odb(void)
 	link_alt_odb_entries(alt, strlen(alt), PATH_SEP, NULL, 0);
 
 	read_info_alternates(get_object_directory(), 0);
+	prepare_external_alt_odb();
 }
 
 /* Returns 1 if we have successfully freshened the file, 0 otherwise. */
@@ -690,7 +707,7 @@ static int check_and_freshen_nonlocal(const unsigned char *sha1, int freshen)
 		if (check_and_freshen_file(path, freshen))
 			return 1;
 	}
-	return 0;
+	return external_odb_has_object(sha1);
 }
 
 static int check_and_freshen(const unsigned char *sha1, int freshen)
@@ -1729,6 +1746,9 @@ static int stat_sha1_file(const unsigned char *sha1, struct stat *st,
 			return 0;
 	}
 
+	if (!external_odb_get_object(sha1) && !lstat(*path, st))
+		return 0;
+
 	return -1;
 }
 
@@ -1768,6 +1788,9 @@ static int open_sha1_file(const unsigned char *sha1, const char **path)
 	if (fd >= 0)
 		return fd;
 
+	if (!external_odb_get_object(sha1))
+		fd = open_sha1_file_alt(sha1, path);
+
 	return fd;
 }
 
diff --git a/t/t0400-external-odb.sh b/t/t0400-external-odb.sh
new file mode 100755
index 0000000000..2f4749fab1
--- /dev/null
+++ b/t/t0400-external-odb.sh
@@ -0,0 +1,46 @@
+#!/bin/sh
+
+test_description='basic tests for external object databases'
+
+. ./test-lib.sh
+
+ALT_SOURCE="$PWD/alt-repo/.git"
+export ALT_SOURCE
+write_script odb-helper <<\EOF
+GIT_DIR=$ALT_SOURCE; export GIT_DIR
+case "$1" in
+have)
+	git cat-file --batch-check --batch-all-objects |
+	awk '{print $1 " " $3 " " $2}'
+	;;
+get_git_obj)
+	cat "$GIT_DIR"/objects/$(echo $2 | sed 's#..#&/#')
+	;;
+esac
+EOF
+HELPER="\"$PWD\"/odb-helper"
+
+test_expect_success 'setup alternate repo' '
+	git init alt-repo &&
+	(cd alt-repo &&
+	 test_commit one &&
+	 test_commit two
+	) &&
+	alt_head=`cd alt-repo && git rev-parse HEAD`
+'
+
+test_expect_success 'alt objects are missing' '
+	test_must_fail git log --format=%s $alt_head
+'
+
+test_expect_success 'helper can retrieve alt objects' '
+	test_config odb.magic.scriptCommand "$HELPER" &&
+	cat >expect <<-\EOF &&
+	two
+	one
+	EOF
+	git log --format=%s $alt_head >actual &&
+	test_cmp expect actual
+'
+
+test_done
-- 
2.14.0.rc1.52.gf02fb0ddac.dirty



* [PATCH v5 11/40] odb-helper: add odb_helper_init() to send 'init' instruction
  2017-08-03  9:18 [PATCH v5 00/40] Add initial experimental external ODB support Christian Couder
                   ` (9 preceding siblings ...)
  2017-08-03  9:18 ` [PATCH v5 10/40] Add initial external odb support Christian Couder
@ 2017-08-03  9:18 ` Christian Couder
  2017-09-10 12:12   ` Lars Schneider
  2017-08-03  9:18 ` [PATCH v5 12/40] t0400: add 'put_raw_obj' instruction to odb-helper script Christian Couder
                   ` (29 subsequent siblings)
  40 siblings, 1 reply; 73+ messages in thread
From: Christian Couder @ 2017-08-03  9:18 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

Let's add an odb_helper_init() function to send an 'init'
instruction to the helpers. This 'init' instruction is
especially useful to get the capabilities that are supported
by the helpers.

So while at it, let's also add a parse_capabilities()
function to parse them and a supported_capabilities
variable in struct odb_helper to store them.
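
As an illustration of the capability bitmask, the sketch below
maps "capability=<name>" lines to the ODB_HELPER_CAP_* bits from
this patch. It uses a lookup table instead of the if/else chain
in parse_capabilities(); that choice and the add_capability()
name are assumptions made for this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ODB_HELPER_CAP_GET_GIT_OBJ    (1u<<0)
#define ODB_HELPER_CAP_GET_RAW_OBJ    (1u<<1)
#define ODB_HELPER_CAP_GET_DIRECT     (1u<<2)
#define ODB_HELPER_CAP_PUT_GIT_OBJ    (1u<<3)
#define ODB_HELPER_CAP_PUT_RAW_OBJ    (1u<<4)
#define ODB_HELPER_CAP_PUT_DIRECT     (1u<<5)
#define ODB_HELPER_CAP_HAVE           (1u<<6)

static const struct {
	const char *name;
	unsigned int bit;
} cap_table[] = {
	{ "get_git_obj", ODB_HELPER_CAP_GET_GIT_OBJ },
	{ "get_raw_obj", ODB_HELPER_CAP_GET_RAW_OBJ },
	{ "get_direct",  ODB_HELPER_CAP_GET_DIRECT },
	{ "put_git_obj", ODB_HELPER_CAP_PUT_GIT_OBJ },
	{ "put_raw_obj", ODB_HELPER_CAP_PUT_RAW_OBJ },
	{ "put_direct",  ODB_HELPER_CAP_PUT_DIRECT },
	{ "have",        ODB_HELPER_CAP_HAVE },
};

/*
 * Set the bit for a "capability=<name>" line in *caps.
 * Returns 0 if the name is known, -1 otherwise.
 */
static int add_capability(const char *line, unsigned int *caps)
{
	static const char prefix[] = "capability=";
	size_t i, plen = sizeof(prefix) - 1;

	assert(line && caps);
	if (strncmp(line, prefix, plen))
		return -1;
	for (i = 0; i < sizeof(cap_table) / sizeof(cap_table[0]); i++) {
		if (!strcmp(line + plen, cap_table[i].name)) {
			*caps |= cap_table[i].bit;
			return 0;
		}
	}
	return -1;
}
```

A caller can then test a single capability with a bitwise AND,
as external_odb_has_object() does with ODB_HELPER_CAP_HAVE.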

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 external-odb.c          |  9 ++++++++-
 odb-helper.c            | 54 +++++++++++++++++++++++++++++++++++++++++++++++++
 odb-helper.h            | 12 +++++++++++
 t/t0400-external-odb.sh |  4 ++++
 4 files changed, 78 insertions(+), 1 deletion(-)

diff --git a/external-odb.c b/external-odb.c
index e9c3f11666..0f0de170b8 100644
--- a/external-odb.c
+++ b/external-odb.c
@@ -41,12 +41,16 @@ static int external_odb_config(const char *var, const char *value, void *data)
 static void external_odb_init(void)
 {
 	static int initialized;
+	struct odb_helper *o;
 
 	if (initialized)
 		return;
 	initialized = 1;
 
 	git_config(external_odb_config, NULL);
+
+	for (o = helpers; o; o = o->next)
+		odb_helper_init(o);
 }
 
 const char *external_odb_root(void)
@@ -63,9 +67,12 @@ int external_odb_has_object(const unsigned char *sha1)
 
 	external_odb_init();
 
-	for (o = helpers; o; o = o->next)
+	for (o = helpers; o; o = o->next) {
+		if (!(o->supported_capabilities & ODB_HELPER_CAP_HAVE))
+			return 1;
 		if (odb_helper_has_object(o, sha1))
 			return 1;
+	}
 	return 0;
 }
 
diff --git a/odb-helper.c b/odb-helper.c
index 0e6f824e4a..c6e16b938c 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -5,6 +5,40 @@
 #include "run-command.h"
 #include "sha1-lookup.h"
 
+static void parse_capabilities(char *cap_buf,
+			       unsigned int *supported_capabilities,
+			       const char *process_name)
+{
+	struct string_list cap_list = STRING_LIST_INIT_NODUP;
+
+	string_list_split_in_place(&cap_list, cap_buf, '=', 1);
+
+	if (cap_list.nr == 2 && !strcmp(cap_list.items[0].string, "capability")) {
+		const char *cap_name = cap_list.items[1].string;
+
+		if (!strcmp(cap_name, "get_git_obj")) {
+			*supported_capabilities |= ODB_HELPER_CAP_GET_GIT_OBJ;
+		} else if (!strcmp(cap_name, "get_raw_obj")) {
+			*supported_capabilities |= ODB_HELPER_CAP_GET_RAW_OBJ;
+		} else if (!strcmp(cap_name, "get_direct")) {
+			*supported_capabilities |= ODB_HELPER_CAP_GET_DIRECT;
+		} else if (!strcmp(cap_name, "put_git_obj")) {
+			*supported_capabilities |= ODB_HELPER_CAP_PUT_GIT_OBJ;
+		} else if (!strcmp(cap_name, "put_raw_obj")) {
+			*supported_capabilities |= ODB_HELPER_CAP_PUT_RAW_OBJ;
+		} else if (!strcmp(cap_name, "put_direct")) {
+			*supported_capabilities |= ODB_HELPER_CAP_PUT_DIRECT;
+		} else if (!strcmp(cap_name, "have")) {
+			*supported_capabilities |= ODB_HELPER_CAP_HAVE;
+		} else {
+			warning("external process '%s' requested unsupported read-object capability '%s'",
+				process_name, cap_name);
+		}
+	}
+
+	string_list_clear(&cap_list, 0);
+}
+
 struct odb_helper *odb_helper_new(const char *name, int namelen)
 {
 	struct odb_helper *o;
@@ -79,6 +113,26 @@ static int odb_helper_finish(struct odb_helper *o,
 	return 0;
 }
 
+int odb_helper_init(struct odb_helper *o)
+{
+	struct odb_helper_cmd cmd;
+	FILE *fh;
+	struct strbuf line = STRBUF_INIT;
+
+	if (odb_helper_start(o, &cmd, "init") < 0)
+		return -1;
+
+	fh = xfdopen(cmd.child.out, "r");
+	while (strbuf_getline(&line, fh) != EOF)
+		parse_capabilities(line.buf, &o->supported_capabilities, o->name);
+
+	strbuf_release(&line);
+	fclose(fh);
+	odb_helper_finish(o, &cmd);
+
+	return 0;
+}
+
 static int parse_object_line(struct odb_helper_object *o, const char *line)
 {
 	char *end;
diff --git a/odb-helper.h b/odb-helper.h
index 5800661704..8e0b9dd9cb 100644
--- a/odb-helper.h
+++ b/odb-helper.h
@@ -1,9 +1,20 @@
 #ifndef ODB_HELPER_H
 #define ODB_HELPER_H
 
+#include "external-odb.h"
+
+#define ODB_HELPER_CAP_GET_GIT_OBJ    (1u<<0)
+#define ODB_HELPER_CAP_GET_RAW_OBJ    (1u<<1)
+#define ODB_HELPER_CAP_GET_DIRECT     (1u<<2)
+#define ODB_HELPER_CAP_PUT_GIT_OBJ    (1u<<3)
+#define ODB_HELPER_CAP_PUT_RAW_OBJ    (1u<<4)
+#define ODB_HELPER_CAP_PUT_DIRECT     (1u<<5)
+#define ODB_HELPER_CAP_HAVE           (1u<<6)
+
 struct odb_helper {
 	const char *name;
 	const char *cmd;
+	unsigned int supported_capabilities;
 
 	struct odb_helper_object {
 		unsigned char sha1[20];
@@ -18,6 +29,7 @@ struct odb_helper {
 };
 
 struct odb_helper *odb_helper_new(const char *name, int namelen);
+int odb_helper_init(struct odb_helper *o);
 int odb_helper_has_object(struct odb_helper *o, const unsigned char *sha1);
 int odb_helper_get_object(struct odb_helper *o, const unsigned char *sha1,
 			  int fd);
diff --git a/t/t0400-external-odb.sh b/t/t0400-external-odb.sh
index 2f4749fab1..ed89f3ab40 100755
--- a/t/t0400-external-odb.sh
+++ b/t/t0400-external-odb.sh
@@ -9,6 +9,10 @@ export ALT_SOURCE
 write_script odb-helper <<\EOF
 GIT_DIR=$ALT_SOURCE; export GIT_DIR
 case "$1" in
+init)
+	echo "capability=get_git_obj"
+	echo "capability=have"
+	;;
 have)
 	git cat-file --batch-check --batch-all-objects |
 	awk '{print $1 " " $3 " " $2}'
-- 
2.14.0.rc1.52.gf02fb0ddac.dirty



* [PATCH v5 12/40] t0400: add 'put_raw_obj' instruction to odb-helper script
  2017-08-03  9:18 [PATCH v5 00/40] Add initial experimental external ODB support Christian Couder
                   ` (10 preceding siblings ...)
  2017-08-03  9:18 ` [PATCH v5 11/40] odb-helper: add odb_helper_init() to send 'init' instruction Christian Couder
@ 2017-08-03  9:18 ` Christian Couder
  2017-09-10 12:12   ` Lars Schneider
  2017-08-03  9:18 ` [PATCH v5 13/40] external odb: add 'put_raw_obj' support Christian Couder
                   ` (28 subsequent siblings)
  40 siblings, 1 reply; 73+ messages in thread
From: Christian Couder @ 2017-08-03  9:18 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

To properly test passing objects from Git to an external odb
we need an odb-helper script that supports a 'put'
capability/instruction.

For now we will support only sending raw blobs, so the
supported capability/instruction will be 'put_raw_obj'.

While at it, let's add a test to check that our odb-helper
script handles this instruction correctly.
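
A toy helper can show the calling convention without needing a second
repository. The sketch below is not the t0400 helper: the ODB_STORE
directory, the toy_put_raw_obj name and the "<sha1>-<size>-<kind>"
file naming are assumptions made for illustration. It only shows that
a 'put_raw_obj' instruction gets the sha1, size and kind as arguments
and the raw content on stdin.

```shell
#!/bin/sh
# Hypothetical storage directory; not part of the patch.
ODB_STORE=${ODB_STORE:-./toy-odb}

# Usage: toy_put_raw_obj <sha1> <size> <kind>, raw content on stdin.
toy_put_raw_obj() {
	sha1=$1 size=$2 kind=$3
	mkdir -p "$ODB_STORE" || return 1
	file="$ODB_STORE/$sha1-$size-$kind"
	cat >"$file" || return 1
	# Arithmetic expansion strips the padding some wc versions print.
	actual=$(( $(wc -c <"$file") ))
	if test "$actual" -ne "$size"
	then
		echo >&2 "size mismatch for $sha1: expected $size, got $actual"
		rm -f "$file"
		return 1
	fi
}
```

Unlike the real test helper, this sketch checks only the size, not the
sha1, so it cannot detect corrupted content of the right length.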

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/t0400-external-odb.sh | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/t/t0400-external-odb.sh b/t/t0400-external-odb.sh
index ed89f3ab40..3fa0449883 100755
--- a/t/t0400-external-odb.sh
+++ b/t/t0400-external-odb.sh
@@ -7,10 +7,15 @@ test_description='basic tests for external object databases'
 ALT_SOURCE="$PWD/alt-repo/.git"
 export ALT_SOURCE
 write_script odb-helper <<\EOF
+die() {
+	printf >&2 "%s\n" "$@"
+	exit 1
+}
 GIT_DIR=$ALT_SOURCE; export GIT_DIR
 case "$1" in
 init)
 	echo "capability=get_git_obj"
+	echo "capability=put_raw_obj"
 	echo "capability=have"
 	;;
 have)
@@ -20,6 +25,16 @@ have)
 get_git_obj)
 	cat "$GIT_DIR"/objects/$(echo $2 | sed 's#..#&/#')
 	;;
+put_raw_obj)
+	sha1="$2"
+	size="$3"
+	kind="$4"
+	writen=$(git hash-object -w -t "$kind" --stdin)
+	test "$writen" = "$sha1" || die "bad sha1 passed '$sha1' vs writen '$writen'"
+	;;
+*)
+	die "unknown command '$1'"
+	;;
 esac
 EOF
 HELPER="\"$PWD\"/odb-helper"
@@ -47,4 +62,13 @@ test_expect_success 'helper can retrieve alt objects' '
 	test_cmp expect actual
 '
 
+test_expect_success 'helper can add objects to alt repo' '
+	hash=$(echo "Hello odb!" | git hash-object -w -t blob --stdin) &&
+	test -f .git/objects/$(echo $hash | sed "s#..#&/#") &&
+	size=$(git cat-file -s "$hash") &&
+	git cat-file blob "$hash" | ./odb-helper put_raw_obj "$hash" "$size" blob &&
+	alt_size=$(cd alt-repo && git cat-file -s "$hash") &&
+	test "$size" -eq "$alt_size"
+'
+
 test_done
-- 
2.14.0.rc1.52.gf02fb0ddac.dirty



* [PATCH v5 13/40] external odb: add 'put_raw_obj' support
  2017-08-03  9:18 [PATCH v5 00/40] Add initial experimental external ODB support Christian Couder
                   ` (11 preceding siblings ...)
  2017-08-03  9:18 ` [PATCH v5 12/40] t0400: add 'put_raw_obj' instruction to odb-helper script Christian Couder
@ 2017-08-03  9:18 ` Christian Couder
  2017-08-03 19:50   ` Junio C Hamano
  2017-08-03  9:19 ` [PATCH v5 14/40] external-odb: accept only blobs for now Christian Couder
                   ` (27 subsequent siblings)
  40 siblings, 1 reply; 73+ messages in thread
From: Christian Couder @ 2017-08-03  9:18 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

Add support for a 'put_raw_obj' capability/instruction to send new
objects to an external odb. Objects will be sent as they are (in
their 'raw' format). They will not be converted to Git objects.

For now any new Git object (blob, tree, commit, ...) would be sent
if 'put_raw_obj' is supported by an odb helper. This is not a great
default, but let's leave it to following commits to tweak that.
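
To make the 'raw' format concrete: the helper is started with the
object's sha1, size and type as arguments, and only the object content
goes to its stdin, without the "<type> <size>\0" header that Git
hashes. The sketch below rebuilds that argument line for a blob
without using Git. The function names are hypothetical, and sha1sum
(or shasum) is assumed to be installed.

```shell
#!/bin/sh
# Print the SHA-1 of stdin as hex; sha1sum or shasum is assumed to exist.
sha1_of_stdin() {
	if command -v sha1sum >/dev/null 2>&1
	then
		sha1sum
	else
		shasum -a 1
	fi | cut -d' ' -f1
}

# Git names a blob by hashing "blob <size>\0" followed by the content.
blob_id() {
	size=$(( $(wc -c <"$1") ))
	{ printf 'blob %s\0' "$size"; cat "$1"; } | sha1_of_stdin
}

# Print the instruction a script-mode helper would be started with;
# the file content itself would be written to the helper's stdin.
put_raw_obj_line() {
	size=$(( $(wc -c <"$1") ))
	printf 'put_raw_obj %s %s blob\n' "$(blob_id "$1")" "$size"
}
```

For a file containing "hello world" followed by a newline, this prints
"put_raw_obj 3b18e512dba79e4c8300dd08aeb37f8e728b8dad 12 blob".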

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 external-odb.c | 15 +++++++++++++++
 external-odb.h |  2 ++
 odb-helper.c   | 43 ++++++++++++++++++++++++++++++++++++++-----
 odb-helper.h   |  3 +++
 sha1_file.c    |  2 ++
 5 files changed, 60 insertions(+), 5 deletions(-)

diff --git a/external-odb.c b/external-odb.c
index 0f0de170b8..82fac702e8 100644
--- a/external-odb.c
+++ b/external-odb.c
@@ -118,3 +118,18 @@ int external_odb_get_object(const unsigned char *sha1)
 
 	return -1;
 }
+
+int external_odb_put_object(const void *buf, size_t len,
+			    const char *type, unsigned char *sha1)
+{
+	struct odb_helper *o;
+
+	external_odb_init();
+
+	for (o = helpers; o; o = o->next) {
+		int r = odb_helper_put_object(o, buf, len, type, sha1);
+		if (r <= 0)
+			return r;
+	}
+	return 1;
+}
diff --git a/external-odb.h b/external-odb.h
index 9989490c9e..3e0e6d0165 100644
--- a/external-odb.h
+++ b/external-odb.h
@@ -4,5 +4,7 @@
 const char *external_odb_root(void);
 int external_odb_has_object(const unsigned char *sha1);
 int external_odb_get_object(const unsigned char *sha1);
+int external_odb_put_object(const void *buf, size_t len,
+			    const char *type, unsigned char *sha1);
 
 #endif /* EXTERNAL_ODB_H */
diff --git a/odb-helper.c b/odb-helper.c
index c6e16b938c..1be4461158 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -71,9 +71,10 @@ static void prepare_helper_command(struct argv_array *argv, const char *cmd,
 	strbuf_release(&buf);
 }
 
-__attribute__((format (printf,3,4)))
+__attribute__((format (printf,4,5)))
 static int odb_helper_start(struct odb_helper *o,
 			    struct odb_helper_cmd *cmd,
+			    int use_stdin,
 			    const char *fmt, ...)
 {
 	va_list ap;
@@ -90,7 +91,10 @@ static int odb_helper_start(struct odb_helper *o,
 
 	cmd->child.argv = cmd->argv.argv;
 	cmd->child.use_shell = 1;
-	cmd->child.no_stdin = 1;
+	if (use_stdin)
+		cmd->child.in = -1;
+	else
+		cmd->child.no_stdin = 1;
 	cmd->child.out = -1;
 
 	if (start_command(&cmd->child) < 0) {
@@ -119,7 +123,7 @@ int odb_helper_init(struct odb_helper *o)
 	FILE *fh;
 	struct strbuf line = STRBUF_INIT;
 
-	if (odb_helper_start(o, &cmd, "init") < 0)
+	if (odb_helper_start(o, &cmd, 0, "init") < 0)
 		return -1;
 
 	fh = xfdopen(cmd.child.out, "r");
@@ -179,7 +183,7 @@ static void odb_helper_load_have(struct odb_helper *o)
 		return;
 	o->have_valid = 1;
 
-	if (odb_helper_start(o, &cmd, "have") < 0)
+	if (odb_helper_start(o, &cmd, 0, "have") < 0)
 		return;
 
 	fh = xfdopen(cmd.child.out, "r");
@@ -228,7 +232,7 @@ int odb_helper_get_object(struct odb_helper *o, const unsigned char *sha1,
 	if (!obj)
 		return -1;
 
-	if (odb_helper_start(o, &cmd, "get_git_obj %s", sha1_to_hex(sha1)) < 0)
+	if (odb_helper_start(o, &cmd, 0, "get_git_obj %s", sha1_to_hex(sha1)) < 0)
 		return -1;
 
 	memset(&stream, 0, sizeof(stream));
@@ -301,3 +305,32 @@ int odb_helper_get_object(struct odb_helper *o, const unsigned char *sha1,
 
 	return 0;
 }
+
+int odb_helper_put_object(struct odb_helper *o,
+			  const void *buf, size_t len,
+			  const char *type, unsigned char *sha1)
+{
+	struct odb_helper_cmd cmd;
+
+	if (odb_helper_start(o, &cmd, 1, "put_raw_obj %s %"PRIuMAX" %s",
+			     sha1_to_hex(sha1), (uintmax_t)len, type) < 0)
+		return -1;
+
+	do {
+		ssize_t w = xwrite(cmd.child.in, buf, len);
+		if (w < 0) {
+			error_errno("unable to write to odb helper '%s'", o->name);
+			close(cmd.child.in);
+			close(cmd.child.out);
+			odb_helper_finish(o, &cmd);
+			return -1;
+		}
+		buf = (const char *)buf + w; /* skip bytes already written */
+		len -= w;
+	} while (len > 0);
+
+	close(cmd.child.in);
+	close(cmd.child.out);
+	odb_helper_finish(o, &cmd);
+	return 0;
+}
diff --git a/odb-helper.h b/odb-helper.h
index 8e0b9dd9cb..318e0d48dc 100644
--- a/odb-helper.h
+++ b/odb-helper.h
@@ -33,5 +33,8 @@ int odb_helper_init(struct odb_helper *o);
 int odb_helper_has_object(struct odb_helper *o, const unsigned char *sha1);
 int odb_helper_get_object(struct odb_helper *o, const unsigned char *sha1,
 			  int fd);
+int odb_helper_put_object(struct odb_helper *o,
+			  const void *buf, size_t len,
+			  const char *type, unsigned char *sha1);
 
 #endif /* ODB_HELPER_H */
diff --git a/sha1_file.c b/sha1_file.c
index 4bd790f6f8..6f6406fb36 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3469,6 +3469,8 @@ int write_sha1_file(const void *buf, unsigned long len, const char *type, unsign
 	 * it out into .git/objects/??/?{38} file.
 	 */
 	write_sha1_file_prepare(buf, len, type, sha1, hdr, &hdrlen);
+	if (!external_odb_put_object(buf, len, type, sha1))
+		return 0;
 	if (freshen_packed_object(sha1) || freshen_loose_object(sha1))
 		return 0;
 	return write_loose_object(sha1, hdr, hdrlen, buf, len, 0);
-- 
2.14.0.rc1.52.gf02fb0ddac.dirty



* [PATCH v5 14/40] external-odb: accept only blobs for now
  2017-08-03  9:18 [PATCH v5 00/40] Add initial experimental external ODB support Christian Couder
                   ` (12 preceding siblings ...)
  2017-08-03  9:18 ` [PATCH v5 13/40] external odb: add 'put_raw_obj' support Christian Couder
@ 2017-08-03  9:19 ` Christian Couder
  2017-08-03 19:52   ` Junio C Hamano
  2017-08-03  9:19 ` [PATCH v5 15/40] t0400: add test for external odb write support Christian Couder
                   ` (26 subsequent siblings)
  40 siblings, 1 reply; 73+ messages in thread
From: Christian Couder @ 2017-08-03  9:19 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

The mechanism to decide which blobs should be sent to which
external object database will be very simple for now.
If the external odb helper supports any "put_*" instruction,
all the new blobs will be sent to it.
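
The rule amounts to a single test on the object type. As a minimal
sketch, with a hypothetical function name, the same decision written
as a shell predicate looks like this:

```shell
#!/bin/sh
# Succeeds only for the object type that is offered to external odbs.
offered_to_external_odb() {
	case "$1" in
	blob)
		return 0
		;;
	*)
		# trees, commits and tags stay in the regular object store
		return 1
		;;
	esac
}
```

Everything that is not a blob is left for the caller to store as a
regular loose object.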

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 external-odb.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/external-odb.c b/external-odb.c
index 82fac702e8..a4f8c72e1c 100644
--- a/external-odb.c
+++ b/external-odb.c
@@ -124,6 +124,10 @@ int external_odb_put_object(const void *buf, size_t len,
 {
 	struct odb_helper *o;
 
+	/* For now accept only blobs */
+	if (strcmp(type, "blob"))
+		return 1;
+
 	external_odb_init();
 
 	for (o = helpers; o; o = o->next) {
-- 
2.14.0.rc1.52.gf02fb0ddac.dirty



* [PATCH v5 15/40] t0400: add test for external odb write support
  2017-08-03  9:18 [PATCH v5 00/40] Add initial experimental external ODB support Christian Couder
                   ` (13 preceding siblings ...)
  2017-08-03  9:19 ` [PATCH v5 14/40] external-odb: accept only blobs for now Christian Couder
@ 2017-08-03  9:19 ` Christian Couder
  2017-08-03  9:19 ` [PATCH v5 16/40] Add GIT_NO_EXTERNAL_ODB env variable Christian Couder
                   ` (25 subsequent siblings)
  40 siblings, 0 replies; 73+ messages in thread
From: Christian Couder @ 2017-08-03  9:19 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/t0400-external-odb.sh | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/t/t0400-external-odb.sh b/t/t0400-external-odb.sh
index 3fa0449883..fa355bd7bb 100755
--- a/t/t0400-external-odb.sh
+++ b/t/t0400-external-odb.sh
@@ -71,4 +71,12 @@ test_expect_success 'helper can add objects to alt repo' '
 	test "$size" -eq "$alt_size"
 '
 
+test_expect_success 'commit adds objects to alt repo' '
+	test_config odb.magic.scriptCommand "$HELPER" &&
+	test_commit three &&
+	hash3=$(git ls-tree HEAD | grep three.t | cut -f1 | cut -d\  -f3) &&
+	content=$(cd alt-repo && git show "$hash3") &&
+	test "$content" = "three"
+'
+
 test_done
-- 
2.14.0.rc1.52.gf02fb0ddac.dirty



* [PATCH v5 16/40] Add GIT_NO_EXTERNAL_ODB env variable
  2017-08-03  9:18 [PATCH v5 00/40] Add initial experimental external ODB support Christian Couder
                   ` (14 preceding siblings ...)
  2017-08-03  9:19 ` [PATCH v5 15/40] t0400: add test for external odb write support Christian Couder
@ 2017-08-03  9:19 ` Christian Couder
  2017-08-03  9:19 ` [PATCH v5 17/40] Add t0410 to test external ODB transfer Christian Couder
                   ` (24 subsequent siblings)
  40 siblings, 0 replies; 73+ messages in thread
From: Christian Couder @ 2017-08-03  9:19 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

This new environment variable will be used to perform git
commands without involving any external odb mechanism.

This makes it possible for example to create new blobs that
will not be sent to an external odb even if the external odb
supports "put_*" instructions.
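
As the variable is checked with getenv() in setup_git_env(), it is its
presence that matters, not its value: even an empty value or "0"
switches external odbs off. A minimal shell sketch of the same check,
using a hypothetical function name:

```shell
#!/bin/sh
# Succeeds when external odbs would be used, that is when
# GIT_NO_EXTERNAL_ODB is not set at all.
external_odb_enabled() {
	# ${VAR+x} expands to "x" when VAR is set, even to an empty string.
	test -z "${GIT_NO_EXTERNAL_ODB+x}"
}
```

So "GIT_NO_EXTERNAL_ODB=0" does not re-enable external odbs; the
variable has to be unset for that.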

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 cache.h        | 9 +++++++++
 environment.c  | 4 ++++
 external-odb.c | 6 ++++++
 sha1_file.c    | 3 +++
 4 files changed, 22 insertions(+)

diff --git a/cache.h b/cache.h
index 13694069b1..73ebd99830 100644
--- a/cache.h
+++ b/cache.h
@@ -430,6 +430,7 @@ static inline enum object_type object_type(unsigned int mode)
 #define CEILING_DIRECTORIES_ENVIRONMENT "GIT_CEILING_DIRECTORIES"
 #define NO_REPLACE_OBJECTS_ENVIRONMENT "GIT_NO_REPLACE_OBJECTS"
 #define GIT_REPLACE_REF_BASE_ENVIRONMENT "GIT_REPLACE_REF_BASE"
+#define NO_EXTERNAL_ODB_ENVIRONMENT "GIT_NO_EXTERNAL_ODB"
 #define GITATTRIBUTES_FILE ".gitattributes"
 #define INFOATTRIBUTES_FILE "info/attributes"
 #define ATTRIBUTE_MACRO_PREFIX "[attr]"
@@ -767,6 +768,14 @@ void reset_shared_repository(void);
 extern int check_replace_refs;
 extern char *git_replace_ref_base;
 
+/*
+ * Do external odbs need to be used this run?  This variable is
+ * initialized to true unless $GIT_NO_EXTERNAL_ODB is set, but it
+ * may be set to false by some commands that do not want external
+ * odbs to be active.
+ */
+extern int use_external_odb;
+
 extern int fsync_object_files;
 extern int core_preload_index;
 extern int core_apply_sparse_checkout;
diff --git a/environment.c b/environment.c
index 3fd4b10845..bbccabef6b 100644
--- a/environment.c
+++ b/environment.c
@@ -48,6 +48,7 @@ const char *excludes_file;
 enum auto_crlf auto_crlf = AUTO_CRLF_FALSE;
 int check_replace_refs = 1;
 char *git_replace_ref_base;
+int use_external_odb = 1;
 enum eol core_eol = EOL_UNSET;
 enum safe_crlf safe_crlf = SAFE_CRLF_WARN;
 unsigned whitespace_rule_cfg = WS_DEFAULT_RULE;
@@ -116,6 +117,7 @@ const char * const local_repo_env[] = {
 	INDEX_ENVIRONMENT,
 	NO_REPLACE_OBJECTS_ENVIRONMENT,
 	GIT_REPLACE_REF_BASE_ENVIRONMENT,
+	NO_EXTERNAL_ODB_ENVIRONMENT,
 	GIT_PREFIX_ENVIRONMENT,
 	GIT_SUPER_PREFIX_ENVIRONMENT,
 	GIT_SHALLOW_FILE_ENVIRONMENT,
@@ -154,6 +156,8 @@ void setup_git_env(void)
 	replace_ref_base = getenv(GIT_REPLACE_REF_BASE_ENVIRONMENT);
 	git_replace_ref_base = xstrdup(replace_ref_base ? replace_ref_base
 							  : "refs/replace/");
+	if (getenv(NO_EXTERNAL_ODB_ENVIRONMENT))
+		use_external_odb = 0;
 	namespace = expand_namespace(getenv(GIT_NAMESPACE_ENVIRONMENT));
 	shallow_file = getenv(GIT_SHALLOW_FILE_ENVIRONMENT);
 	if (shallow_file)
diff --git a/external-odb.c b/external-odb.c
index a4f8c72e1c..52cb448d01 100644
--- a/external-odb.c
+++ b/external-odb.c
@@ -65,6 +65,9 @@ int external_odb_has_object(const unsigned char *sha1)
 {
 	struct odb_helper *o;
 
+	if (!use_external_odb)
+		return 0;
+
 	external_odb_init();
 
 	for (o = helpers; o; o = o->next) {
@@ -124,6 +127,9 @@ int external_odb_put_object(const void *buf, size_t len,
 {
 	struct odb_helper *o;
 
+	if (!use_external_odb)
+		return 1;
+
 	/* For now accept only blobs */
 	if (strcmp(type, "blob"))
 		return 1;
diff --git a/sha1_file.c b/sha1_file.c
index 6f6406fb36..3735720bfc 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -642,6 +642,9 @@ void prepare_external_alt_odb(void)
 	static int linked_external;
 	const char *path;
 
+	if (!use_external_odb)
+		return;
+
 	if (linked_external)
 		return;
 
-- 
2.14.0.rc1.52.gf02fb0ddac.dirty



* [PATCH v5 17/40] Add t0410 to test external ODB transfer
  2017-08-03  9:18 [PATCH v5 00/40] Add initial experimental external ODB support Christian Couder
                   ` (15 preceding siblings ...)
  2017-08-03  9:19 ` [PATCH v5 16/40] Add GIT_NO_EXTERNAL_ODB env variable Christian Couder
@ 2017-08-03  9:19 ` Christian Couder
  2017-08-03  9:19 ` [PATCH v5 18/40] lib-httpd: pass config file to start_httpd() Christian Couder
                   ` (23 subsequent siblings)
  40 siblings, 0 replies; 73+ messages in thread
From: Christian Couder @ 2017-08-03  9:19 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/t0410-transfer-e-odb.sh | 144 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 144 insertions(+)
 create mode 100755 t/t0410-transfer-e-odb.sh

diff --git a/t/t0410-transfer-e-odb.sh b/t/t0410-transfer-e-odb.sh
new file mode 100755
index 0000000000..065ec7d759
--- /dev/null
+++ b/t/t0410-transfer-e-odb.sh
@@ -0,0 +1,144 @@
+#!/bin/sh
+
+test_description='basic tests for transferring external ODBs'
+
+. ./test-lib.sh
+
+ORIG_SOURCE="$PWD/.git"
+export ORIG_SOURCE
+
+ALT_SOURCE1="$PWD/alt-repo1/.git"
+export ALT_SOURCE1
+write_script odb-helper1 <<\EOF
+die() {
+	printf >&2 "%s\n" "$@"
+	exit 1
+}
+GIT_DIR=$ALT_SOURCE1; export GIT_DIR
+case "$1" in
+init)
+	echo "capability=get_git_obj"
+	echo "capability=have"
+	;;
+have)
+	git cat-file --batch-check --batch-all-objects |
+	awk '{print $1 " " $3 " " $2}'
+	;;
+get_git_obj)
+	cat "$GIT_DIR"/objects/$(echo $2 | sed 's#..#&/#')
+	;;
+put_raw_obj)
+	sha1="$2"
+	size="$3"
+	kind="$4"
+	writen=$(git hash-object -w -t "$kind" --stdin)
+	test "$writen" = "$sha1" || die "bad sha1 passed '$sha1' vs writen '$writen'"
+	ref_hash=$(echo "$sha1 $size $kind" | GIT_DIR=$ORIG_SOURCE GIT_NO_EXTERNAL_ODB=1 git hash-object -w -t blob --stdin) || exit
+	GIT_DIR=$ORIG_SOURCE git update-ref refs/odbs/magic/"$sha1" "$ref_hash"
+	;;
+*)
+	die "unknown command '$1'"
+	;;
+esac
+EOF
+HELPER1="\"$PWD\"/odb-helper1"
+
+OTHER_SOURCE="$PWD/.git"
+export OTHER_SOURCE
+
+ALT_SOURCE2="$PWD/alt-repo2/.git"
+export ALT_SOURCE2
+write_script odb-helper2 <<\EOF
+die() {
+	printf >&2 "%s\n" "$@"
+	exit 1
+}
+GIT_DIR=$ALT_SOURCE2; export GIT_DIR
+case "$1" in
+init)
+	echo "capability=get_git_obj"
+	echo "capability=have"
+	;;
+have)
+	GIT_DIR=$OTHER_SOURCE git for-each-ref --format='%(objectname)' refs/odbs/magic/ | GIT_DIR=$OTHER_SOURCE xargs git show
+	;;
+get_git_obj)
+	OBJ_FILE="$GIT_DIR"/objects/$(echo $2 | sed 's#..#&/#')
+	if ! test -f "$OBJ_FILE"
+	then
+		# "Download" the missing object by copying it from alt-repo1
+		OBJ_DIR=$(echo $2 | sed 's/\(..\).*/\1/')
+		OBJ_BASE=$(basename "$OBJ_FILE")
+		ALT_OBJ_DIR1="$ALT_SOURCE1/objects/$OBJ_DIR"
+		ALT_OBJ_DIR2="$ALT_SOURCE2/objects/$OBJ_DIR"
+		mkdir -p "$ALT_OBJ_DIR2" || die "Could not mkdir '$ALT_OBJ_DIR2'"
+		OBJ_SRC="$ALT_OBJ_DIR1/$OBJ_BASE"
+		cp "$OBJ_SRC" "$ALT_OBJ_DIR2" ||
+		die "Could not cp '$OBJ_SRC' into '$ALT_OBJ_DIR2'"
+	fi
+	cat "$OBJ_FILE" || die "Could not cat '$OBJ_FILE'"
+	;;
+put_raw_obj)
+	sha1="$2"
+	size="$3"
+	kind="$4"
+	writen=$(git hash-object -w -t "$kind" --stdin)
+	test "$writen" = "$sha1" || die "bad sha1 passed '$sha1' vs writen '$writen'"
+	ref_hash=$(echo "$sha1 $size $kind" | GIT_DIR=$OTHER_SOURCE GIT_NO_EXTERNAL_ODB=1 git hash-object -w -t blob --stdin) || exit
+	GIT_DIR=$OTHER_SOURCE git update-ref refs/odbs/magic/"$sha1" "$ref_hash"
+	;;
+*)
+	die "unknown command '$1'"
+	;;
+esac
+EOF
+HELPER2="\"$PWD\"/odb-helper2"
+
+test_expect_success 'setup first alternate repo' '
+	git init alt-repo1 &&
+	test_commit zero &&
+	git config odb.magic.scriptCommand "$HELPER1"
+'
+
+test_expect_success 'setup other repo and its alternate repo' '
+	git init other-repo &&
+	git init alt-repo2 &&
+	(cd other-repo &&
+	 git remote add origin .. &&
+	 git pull origin master &&
+	 git checkout master &&
+	 git log)
+'
+
+test_expect_success 'new blobs are put in first object store' '
+	test_commit one &&
+	hash1=$(git ls-tree HEAD | grep one.t | cut -f1 | cut -d\  -f3) &&
+	content=$(cd alt-repo1 && git show "$hash1") &&
+	test "$content" = "one" &&
+	test_commit two &&
+	hash2=$(git ls-tree HEAD | grep two.t | cut -f1 | cut -d\  -f3) &&
+	content=$(cd alt-repo1 && git show "$hash2") &&
+	test "$content" = "two"
+'
+
+test_expect_success 'other repo gets the blobs from object store' '
+	(cd other-repo &&
+	 git fetch origin "refs/odbs/magic/*:refs/odbs/magic/*" &&
+	 test_must_fail git cat-file blob "$hash1" &&
+	 test_must_fail git cat-file blob "$hash2" &&
+	 git config odb.magic.scriptCommand "$HELPER2" &&
+	 git cat-file blob "$hash1" &&
+	 git cat-file blob "$hash2"
+	)
+'
+
+test_expect_success 'other repo gets everything else' '
+	(cd other-repo &&
+	 git fetch origin &&
+	 content=$(git show "$hash1") &&
+	 test "$content" = "one" &&
+	 content=$(git show "$hash2") &&
+	 test "$content" = "two")
+'
+
+test_done
-- 
2.14.0.rc1.52.gf02fb0ddac.dirty



* [PATCH v5 18/40] lib-httpd: pass config file to start_httpd()
  2017-08-03  9:18 [PATCH v5 00/40] Add initial experimental external ODB support Christian Couder
                   ` (16 preceding siblings ...)
  2017-08-03  9:19 ` [PATCH v5 17/40] Add t0410 to test external ODB transfer Christian Couder
@ 2017-08-03  9:19 ` Christian Couder
  2017-08-03  9:19 ` [PATCH v5 19/40] lib-httpd: add upload.sh Christian Couder
                   ` (22 subsequent siblings)
  40 siblings, 0 replies; 73+ messages in thread
From: Christian Couder @ 2017-08-03  9:19 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

This makes it possible to start an apache web server with different
config files.

This will be used in a later patch to pass a config file that makes
apache store external objects.
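
The default is picked with the "${1-apache.conf}" parameter expansion,
which falls back to apache.conf only when no argument is given at all.
A small sketch of that behaviour, with a hypothetical function name:

```shell
#!/bin/sh
# Print the config file name that would be used for the given arguments.
pick_conf_file() {
	# "-" (not ":-") means: use the default only if $1 is unset.
	printf '%s\n' "${1-apache.conf}"
}
```

Calling it without arguments prints "apache.conf", while an explicit
argument, even an empty one, is used as given.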

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/lib-httpd.sh | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/t/lib-httpd.sh b/t/lib-httpd.sh
index 435a37465a..2e659a8ee2 100644
--- a/t/lib-httpd.sh
+++ b/t/lib-httpd.sh
@@ -171,12 +171,14 @@ prepare_httpd() {
 }
 
 start_httpd() {
+	APACHE_CONF_FILE=${1-apache.conf}
+
 	prepare_httpd >&3 2>&4
 
 	trap 'code=$?; stop_httpd; (exit $code); die' EXIT
 
 	"$LIB_HTTPD_PATH" -d "$HTTPD_ROOT_PATH" \
-		-f "$TEST_PATH/apache.conf" $HTTPD_PARA \
+		-f "$TEST_PATH/$APACHE_CONF_FILE" $HTTPD_PARA \
 		-c "Listen 127.0.0.1:$LIB_HTTPD_PORT" -k start \
 		>&3 2>&4
 	if test $? -ne 0
@@ -191,7 +193,7 @@ stop_httpd() {
 	trap 'die' EXIT
 
 	"$LIB_HTTPD_PATH" -d "$HTTPD_ROOT_PATH" \
-		-f "$TEST_PATH/apache.conf" $HTTPD_PARA -k stop
+		-f "$TEST_PATH/$APACHE_CONF_FILE" $HTTPD_PARA -k stop
 }
 
 test_http_push_nonff () {
-- 
2.14.0.rc1.52.gf02fb0ddac.dirty



* [PATCH v5 19/40] lib-httpd: add upload.sh
  2017-08-03  9:18 [PATCH v5 00/40] Add initial experimental external ODB support Christian Couder
                   ` (17 preceding siblings ...)
  2017-08-03  9:19 ` [PATCH v5 18/40] lib-httpd: pass config file to start_httpd() Christian Couder
@ 2017-08-03  9:19 ` Christian Couder
  2017-08-03 20:07   ` Junio C Hamano
  2017-08-03  9:19 ` [PATCH v5 20/40] lib-httpd: add list.sh Christian Couder
                   ` (21 subsequent siblings)
  40 siblings, 1 reply; 73+ messages in thread
From: Christian Couder @ 2017-08-03  9:19 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

This CGI script will be used to upload objects to, or to delete
objects from, an apache web server.

This way the apache server can work as an external object
database.
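
The script reads the object's sha1, size and type from the query
string and stores the request body in a file named after them. The
sketch below isolates that parsing step so it can be tried outside
apache. The stored_name function is hypothetical; the "%=*" and "#*="
expansions are the ones upload.sh uses.

```shell
#!/bin/sh
# Print the file name derived from a query string such as
# "sha1=<sha1>&size=<size>&type=<type>".
stored_name() {
	query=$1
	sha1= size= type=
	old_ifs=$IFS
	IFS='&'
	set -f			# no globbing while splitting on "&"
	set -- $query
	set +f
	IFS=$old_ifs
	for pair
	do
		key=${pair%=*}	# text before the last "="
		val=${pair#*=}	# text after the first "="
		case "$key" in
		sha1) sha1=$val ;;
		size) size=$val ;;
		type) type=$val ;;
		esac
	done
	printf '%s\n' "$sha1-$size-$type"
}
```

The order of the keys in the query string does not matter, as each
one is matched by name.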

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/lib-httpd.sh        |  1 +
 t/lib-httpd/upload.sh | 45 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+)
 create mode 100644 t/lib-httpd/upload.sh

diff --git a/t/lib-httpd.sh b/t/lib-httpd.sh
index 2e659a8ee2..d80b004549 100644
--- a/t/lib-httpd.sh
+++ b/t/lib-httpd.sh
@@ -132,6 +132,7 @@ prepare_httpd() {
 	cp "$TEST_PATH"/passwd "$HTTPD_ROOT_PATH"
 	install_script broken-smart-http.sh
 	install_script error.sh
+	install_script upload.sh
 
 	ln -s "$LIB_HTTPD_MODULE_PATH" "$HTTPD_ROOT_PATH/modules"
 
diff --git a/t/lib-httpd/upload.sh b/t/lib-httpd/upload.sh
new file mode 100644
index 0000000000..172be0f73f
--- /dev/null
+++ b/t/lib-httpd/upload.sh
@@ -0,0 +1,45 @@
+#!/bin/sh
+
+# In part from http://codereview.stackexchange.com/questions/79549/bash-cgi-upload-file
+
+FILES_DIR="www/files"
+
+OLDIFS="$IFS"
+IFS='&'
+set -- $QUERY_STRING
+IFS="$OLDIFS"
+
+while test $# -gt 0
+do
+    key=${1%=*}
+    val=${1#*=}
+
+    case "$key" in
+	"sha1") sha1="$val" ;;
+	"type") type="$val" ;;
+	"size") size="$val" ;;
+	"delete") delete=1 ;;
+	*) echo >&2 "unknown key '$key'" ;;
+    esac
+
+    shift
+done
+
+case "$REQUEST_METHOD" in
+  POST)
+    if test "$delete" = "1"
+    then
+	rm -f "$FILES_DIR/$sha1-$size-$type"
+    else
+	mkdir -p "$FILES_DIR"
+	cat >"$FILES_DIR/$sha1-$size-$type"
+    fi
+
+    echo 'Status: 204 No Content'
+    echo
+    ;;
+
+  *)
+    echo 'Status: 405 Method Not Allowed'
+    echo
+esac
-- 
2.14.0.rc1.52.gf02fb0ddac.dirty



* [PATCH v5 20/40] lib-httpd: add list.sh
  2017-08-03  9:18 [PATCH v5 00/40] Add initial experimental external ODB support Christian Couder
                   ` (18 preceding siblings ...)
  2017-08-03  9:19 ` [PATCH v5 19/40] lib-httpd: add upload.sh Christian Couder
@ 2017-08-03  9:19 ` Christian Couder
  2017-08-03  9:19 ` [PATCH v5 21/40] lib-httpd: add apache-e-odb.conf Christian Couder
                   ` (20 subsequent siblings)
  40 siblings, 0 replies; 73+ messages in thread
From: Christian Couder @ 2017-08-03  9:19 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

This cgi script can list Git objects that have been uploaded as
files to an apache web server. This script can also retrieve
the content of each of these files.

This will help make apache work as an external object database.
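
Because upload.sh names each file "<sha1>-<size>-<type>", turning the
dashes into spaces gives one "<sha1> <size> <type>" line per object,
the same field order the 'have' instruction of the t0400 helper
prints. A sketch of that listing step, with a hypothetical function
name and the directory passed as an argument:

```shell
#!/bin/sh
# List the objects stored in directory $1, one "<sha1> <size> <type>"
# line per file.
list_stored_objects() {
	ls "$1" | tr '-' ' '
}
```

This works because neither a hex sha1 nor the Git type names contain a
dash, so every dash in a file name is a field separator.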

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/lib-httpd.sh      |  1 +
 t/lib-httpd/list.sh | 41 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 42 insertions(+)
 create mode 100644 t/lib-httpd/list.sh

diff --git a/t/lib-httpd.sh b/t/lib-httpd.sh
index d80b004549..f31ea261f5 100644
--- a/t/lib-httpd.sh
+++ b/t/lib-httpd.sh
@@ -133,6 +133,7 @@ prepare_httpd() {
 	install_script broken-smart-http.sh
 	install_script error.sh
 	install_script upload.sh
+	install_script list.sh
 
 	ln -s "$LIB_HTTPD_MODULE_PATH" "$HTTPD_ROOT_PATH/modules"
 
diff --git a/t/lib-httpd/list.sh b/t/lib-httpd/list.sh
new file mode 100644
index 0000000000..7e520e507a
--- /dev/null
+++ b/t/lib-httpd/list.sh
@@ -0,0 +1,41 @@
+#!/bin/sh
+
+FILES_DIR="www/files"
+
+OLDIFS="$IFS"
+IFS='&'
+set -- $QUERY_STRING
+IFS="$OLDIFS"
+
+while test $# -gt 0
+do
+    key=${1%=*}
+    val=${1#*=}
+
+    case "$key" in
+	"sha1") sha1="$val" ;;
+	*) echo >&2 "unknown key '$key'" ;;
+    esac
+
+    shift
+done
+
+if test -d "$FILES_DIR"
+then
+    if test -z "$sha1"
+    then
+	echo 'Status: 200 OK'
+	echo
+	ls "$FILES_DIR" | tr '-' ' '
+    else
+	if test -f "$FILES_DIR/$sha1"-*
+	then
+	    echo 'Status: 200 OK'
+	    echo
+	    cat "$FILES_DIR/$sha1"-*
+	else
+	    echo 'Status: 404 Not Found'
+	    echo
+	fi
+    fi
+fi
-- 
2.14.0.rc1.52.gf02fb0ddac.dirty



* [PATCH v5 21/40] lib-httpd: add apache-e-odb.conf
  2017-08-03  9:18 [PATCH v5 00/40] Add initial experimental external ODB support Christian Couder
                   ` (19 preceding siblings ...)
  2017-08-03  9:19 ` [PATCH v5 20/40] lib-httpd: add list.sh Christian Couder
@ 2017-08-03  9:19 ` Christian Couder
  2017-08-03  9:19 ` [PATCH v5 22/40] odb-helper: add odb_helper_get_raw_object() Christian Couder
                   ` (19 subsequent siblings)
  40 siblings, 0 replies; 73+ messages in thread
From: Christian Couder @ 2017-08-03  9:19 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

This is an apache config file to test external object databases.
It uses the upload.sh and list.sh CGI scripts that were added
previously to make apache store external objects.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/lib-httpd/apache-e-odb.conf | 214 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 214 insertions(+)
 create mode 100644 t/lib-httpd/apache-e-odb.conf

diff --git a/t/lib-httpd/apache-e-odb.conf b/t/lib-httpd/apache-e-odb.conf
new file mode 100644
index 0000000000..19a1540c82
--- /dev/null
+++ b/t/lib-httpd/apache-e-odb.conf
@@ -0,0 +1,214 @@
+ServerName dummy
+PidFile httpd.pid
+DocumentRoot www
+LogFormat "%h %l %u %t \"%r\" %>s %b" common
+CustomLog access.log common
+ErrorLog error.log
+<IfModule !mod_log_config.c>
+	LoadModule log_config_module modules/mod_log_config.so
+</IfModule>
+<IfModule !mod_alias.c>
+	LoadModule alias_module modules/mod_alias.so
+</IfModule>
+<IfModule !mod_cgi.c>
+	LoadModule cgi_module modules/mod_cgi.so
+</IfModule>
+<IfModule !mod_env.c>
+	LoadModule env_module modules/mod_env.so
+</IfModule>
+<IfModule !mod_rewrite.c>
+	LoadModule rewrite_module modules/mod_rewrite.so
+</IFModule>
+<IfModule !mod_version.c>
+	LoadModule version_module modules/mod_version.so
+</IfModule>
+<IfModule !mod_headers.c>
+	LoadModule headers_module modules/mod_headers.so
+</IfModule>
+
+<IfVersion < 2.4>
+LockFile accept.lock
+</IfVersion>
+
+<IfVersion < 2.1>
+<IfModule !mod_auth.c>
+	LoadModule auth_module modules/mod_auth.so
+</IfModule>
+</IfVersion>
+
+<IfVersion >= 2.1>
+<IfModule !mod_auth_basic.c>
+	LoadModule auth_basic_module modules/mod_auth_basic.so
+</IfModule>
+<IfModule !mod_authn_file.c>
+	LoadModule authn_file_module modules/mod_authn_file.so
+</IfModule>
+<IfModule !mod_authz_user.c>
+	LoadModule authz_user_module modules/mod_authz_user.so
+</IfModule>
+<IfModule !mod_authz_host.c>
+	LoadModule authz_host_module modules/mod_authz_host.so
+</IfModule>
+</IfVersion>
+
+<IfVersion >= 2.4>
+<IfModule !mod_authn_core.c>
+	LoadModule authn_core_module modules/mod_authn_core.so
+</IfModule>
+<IfModule !mod_authz_core.c>
+	LoadModule authz_core_module modules/mod_authz_core.so
+</IfModule>
+<IfModule !mod_access_compat.c>
+	LoadModule access_compat_module modules/mod_access_compat.so
+</IfModule>
+<IfModule !mod_mpm_prefork.c>
+	LoadModule mpm_prefork_module modules/mod_mpm_prefork.so
+</IfModule>
+<IfModule !mod_unixd.c>
+	LoadModule unixd_module modules/mod_unixd.so
+</IfModule>
+</IfVersion>
+
+PassEnv GIT_VALGRIND
+PassEnv GIT_VALGRIND_OPTIONS
+PassEnv GNUPGHOME
+PassEnv ASAN_OPTIONS
+PassEnv GIT_TRACE
+PassEnv GIT_CONFIG_NOSYSTEM
+
+Alias /dumb/ www/
+Alias /auth/dumb/ www/auth/dumb/
+
+<LocationMatch /smart/>
+	SetEnv GIT_EXEC_PATH ${GIT_EXEC_PATH}
+	SetEnv GIT_HTTP_EXPORT_ALL
+</LocationMatch>
+<LocationMatch /smart_noexport/>
+	SetEnv GIT_EXEC_PATH ${GIT_EXEC_PATH}
+</LocationMatch>
+<LocationMatch /smart_custom_env/>
+	SetEnv GIT_EXEC_PATH ${GIT_EXEC_PATH}
+	SetEnv GIT_HTTP_EXPORT_ALL
+	SetEnv GIT_COMMITTER_NAME "Custom User"
+	SetEnv GIT_COMMITTER_EMAIL custom@example.com
+</LocationMatch>
+<LocationMatch /smart_namespace/>
+	SetEnv GIT_EXEC_PATH ${GIT_EXEC_PATH}
+	SetEnv GIT_HTTP_EXPORT_ALL
+	SetEnv GIT_NAMESPACE ns
+</LocationMatch>
+<LocationMatch /smart_cookies/>
+	SetEnv GIT_EXEC_PATH ${GIT_EXEC_PATH}
+	SetEnv GIT_HTTP_EXPORT_ALL
+	Header set Set-Cookie name=value
+</LocationMatch>
+<LocationMatch /smart_headers/>
+	SetEnv GIT_EXEC_PATH ${GIT_EXEC_PATH}
+	SetEnv GIT_HTTP_EXPORT_ALL
+</LocationMatch>
+ScriptAlias /upload/ upload.sh/
+ScriptAlias /list/ list.sh/
+<Directory ${GIT_EXEC_PATH}>
+	Options FollowSymlinks
+</Directory>
+<Files upload.sh>
+  Options ExecCGI
+</Files>
+<Files list.sh>
+  Options ExecCGI
+</Files>
+<Files ${GIT_EXEC_PATH}/git-http-backend>
+	Options ExecCGI
+</Files>
+
+RewriteEngine on
+RewriteRule ^/smart-redir-perm/(.*)$ /smart/$1 [R=301]
+RewriteRule ^/smart-redir-temp/(.*)$ /smart/$1 [R=302]
+RewriteRule ^/smart-redir-auth/(.*)$ /auth/smart/$1 [R=301]
+RewriteRule ^/smart-redir-limited/(.*)/info/refs$ /smart/$1/info/refs [R=301]
+RewriteRule ^/ftp-redir/(.*)$ ftp://localhost:1000/$1 [R=302]
+
+RewriteRule ^/loop-redir/x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-(.*) /$1 [R=302]
+RewriteRule ^/loop-redir/(.*)$ /loop-redir/x-$1 [R=302]
+
+# Apache 2.2 does not understand <RequireAll>, so we use RewriteCond.
+# And as RewriteCond does not allow testing for non-matches, we match
+# the desired case first (one has abra, two has cadabra), and let it
+# pass by marking the RewriteRule as [L], "last rule, do not process
+# any other matching RewriteRules after this"), and then have another
+# RewriteRule that matches all other cases and lets them fail via '[F]',
+# "fail the request".
+RewriteCond %{HTTP:x-magic-one} =abra
+RewriteCond %{HTTP:x-magic-two} =cadabra
+RewriteRule ^/smart_headers/.* - [L]
+RewriteRule ^/smart_headers/.* - [F]
+
+<IfDefine SSL>
+LoadModule ssl_module modules/mod_ssl.so
+
+SSLCertificateFile httpd.pem
+SSLCertificateKeyFile httpd.pem
+SSLRandomSeed startup file:/dev/urandom 512
+SSLRandomSeed connect file:/dev/urandom 512
+SSLSessionCache none
+SSLMutex file:ssl_mutex
+SSLEngine On
+</IfDefine>
+
+<Location /auth/>
+	AuthType Basic
+	AuthName "git-auth"
+	AuthUserFile passwd
+	Require valid-user
+</Location>
+
+<LocationMatch "^/auth-push/.*/git-receive-pack$">
+	AuthType Basic
+	AuthName "git-auth"
+	AuthUserFile passwd
+	Require valid-user
+</LocationMatch>
+
+<LocationMatch "^/auth-fetch/.*/git-upload-pack$">
+	AuthType Basic
+	AuthName "git-auth"
+	AuthUserFile passwd
+	Require valid-user
+</LocationMatch>
+
+RewriteCond %{QUERY_STRING} service=git-receive-pack [OR]
+RewriteCond %{REQUEST_URI} /git-receive-pack$
+RewriteRule ^/half-auth-complete/ - [E=AUTHREQUIRED:yes]
+
+<Location /half-auth-complete/>
+  Order Deny,Allow
+  Deny from env=AUTHREQUIRED
+
+  AuthType Basic
+  AuthName "Git Access"
+  AuthUserFile passwd
+  Require valid-user
+  Satisfy Any
+</Location>
+
+<IfDefine DAV>
+	LoadModule dav_module modules/mod_dav.so
+	LoadModule dav_fs_module modules/mod_dav_fs.so
+
+	DAVLockDB DAVLock
+	<Location /dumb/>
+		Dav on
+	</Location>
+	<Location /auth/dumb>
+		Dav on
+	</Location>
+</IfDefine>
+
+<IfDefine SVN>
+	LoadModule dav_svn_module modules/mod_dav_svn.so
+
+	<Location /${LIB_HTTPD_SVN}>
+		DAV svn
+		SVNPath "${LIB_HTTPD_SVNPATH}"
+	</Location>
+</IfDefine>
-- 
2.14.0.rc1.52.gf02fb0ddac.dirty



* [PATCH v5 22/40] odb-helper: add odb_helper_get_raw_object()
  2017-08-03  9:18 [PATCH v5 00/40] Add initial experimental external ODB support Christian Couder
                   ` (20 preceding siblings ...)
  2017-08-03  9:19 ` [PATCH v5 21/40] lib-httpd: add apache-e-odb.conf Christian Couder
@ 2017-08-03  9:19 ` Christian Couder
  2017-08-03  9:19 ` [PATCH v5 23/40] pack-objects: don't pack objects in external odbs Christian Couder
                   ` (18 subsequent siblings)
  40 siblings, 0 replies; 73+ messages in thread
From: Christian Couder @ 2017-08-03  9:19 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

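The existing odb_helper_get_object() is renamed to
odb_helper_get_git_object(), and a new odb_helper_get_raw_object()
is introduced to deal with external objects that are not in Git format.

A new odb_helper_get_object() dispatches to one or the other,
depending on whether the helper advertised the 'get_raw_obj'
capability.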

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
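A note for readers, not part of the commit: odb_helper_get_raw_object()
prepends the usual "<type> <size>" header plus a NUL byte to the raw
content before hashing and deflating it, then compares the resulting
hash with the requested sha1. The shell sketch below reproduces only the
hashing half. The names raw_object_id and sha1_hex are hypothetical, and
the sketch assumes that either sha1sum or shasum is installed.

```shell
# Hypothetical helper: hash stdin with whichever SHA-1 tool is present.
sha1_hex() {
	if command -v sha1sum >/dev/null 2>&1
	then
		sha1sum | cut -d' ' -f1
	else
		shasum -a 1 | cut -d' ' -f1
	fi
}

# Hypothetical helper: print the object ID Git would expect for the raw
# content of file $2 stored as an object of type $1 (for example "blob").
raw_object_id() {
	kind=$1
	file=$2
	# The arithmetic expansion strips any padding that wc may print.
	size=$(( $(wc -c <"$file") ))
	# Header is "<type> <size>" followed by a NUL byte, then the content.
	{ printf '%s %s\0' "$kind" "$size"; cat "$file"; } | sha1_hex
}
```

For a blob, "raw_object_id blob <file>" should print the same ID as
"git hash-object --no-filters <file>".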
 odb-helper.c | 113 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 111 insertions(+), 2 deletions(-)

diff --git a/odb-helper.c b/odb-helper.c
index 1be4461158..0603993057 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -217,8 +217,107 @@ int odb_helper_has_object(struct odb_helper *o, const unsigned char *sha1)
 	return !!odb_helper_lookup(o, sha1);
 }
 
-int odb_helper_get_object(struct odb_helper *o, const unsigned char *sha1,
-			    int fd)
+static int odb_helper_get_raw_object(struct odb_helper *o,
+				     const unsigned char *sha1,
+				     int fd)
+{
+	struct odb_helper_object *obj;
+	struct odb_helper_cmd cmd;
+	unsigned long total_got = 0;
+
+	char hdr[32];
+	int hdrlen;
+
+	int ret = Z_STREAM_END;
+	unsigned char compressed[4096];
+	git_zstream stream;
+	git_SHA_CTX hash;
+	unsigned char real_sha1[20];
+
+	obj = odb_helper_lookup(o, sha1);
+	if (!obj)
+		return -1;
+
+	if (odb_helper_start(o, &cmd, 0, "get_raw_obj %s", sha1_to_hex(sha1)) < 0)
+		return -1;
+
+	/* Set it up */
+	git_deflate_init(&stream, zlib_compression_level);
+	stream.next_out = compressed;
+	stream.avail_out = sizeof(compressed);
+	git_SHA1_Init(&hash);
+
+	/* First header.. */
+	hdrlen = xsnprintf(hdr, sizeof(hdr), "%s %lu", typename(obj->type), obj->size) + 1;
+	stream.next_in = (unsigned char *)hdr;
+	stream.avail_in = hdrlen;
+	while (git_deflate(&stream, 0) == Z_OK)
+		; /* nothing */
+	git_SHA1_Update(&hash, hdr, hdrlen);
+
+	for (;;) {
+		unsigned char buf[4096];
+		int r;
+
+		r = xread(cmd.child.out, buf, sizeof(buf));
+		if (r < 0) {
+			error("unable to read from odb helper '%s': %s",
+			      o->name, strerror(errno));
+			close(cmd.child.out);
+			odb_helper_finish(o, &cmd);
+			git_deflate_end(&stream);
+			return -1;
+		}
+		if (r == 0)
+			break;
+
+		total_got += r;
+
+		/* Then the data itself.. */
+		stream.next_in = (void *)buf;
+		stream.avail_in = r;
+		do {
+			unsigned char *in0 = stream.next_in;
+			ret = git_deflate(&stream, Z_FINISH);
+			git_SHA1_Update(&hash, in0, stream.next_in - in0);
+			write_or_die(fd, compressed, stream.next_out - compressed);
+			stream.next_out = compressed;
+			stream.avail_out = sizeof(compressed);
+		} while (ret == Z_OK);
+	}
+
+	close(cmd.child.out);
+	if (ret != Z_STREAM_END) {
+		warning("bad zlib data from odb helper '%s' for %s",
+			o->name, sha1_to_hex(sha1));
+		return -1;
+	}
+	ret = git_deflate_end_gently(&stream);
+	if (ret != Z_OK) {
+		warning("deflateEnd on object %s from odb helper '%s' failed (%d)",
+			sha1_to_hex(sha1), o->name, ret);
+		return -1;
+	}
+	git_SHA1_Final(real_sha1, &hash);
+	if (hashcmp(sha1, real_sha1)) {
+		warning("sha1 mismatch from odb helper '%s' for %s (got %s)",
+			o->name, sha1_to_hex(sha1), sha1_to_hex(real_sha1));
+		return -1;
+	}
+	if (odb_helper_finish(o, &cmd))
+		return -1;
+	if (total_got != obj->size) {
+		warning("size mismatch from odb helper '%s' for %s (%lu != %lu)",
+			o->name, sha1_to_hex(sha1), total_got, obj->size);
+		return -1;
+	}
+
+	return 0;
+}
+
+static int odb_helper_get_git_object(struct odb_helper *o,
+				     const unsigned char *sha1,
+				     int fd)
 {
 	struct odb_helper_object *obj;
 	struct odb_helper_cmd cmd;
@@ -306,6 +405,16 @@ int odb_helper_get_object(struct odb_helper *o, const unsigned char *sha1,
 	return 0;
 }
 
+int odb_helper_get_object(struct odb_helper *o,
+			  const unsigned char *sha1,
+			  int fd)
+{
+	if (o->supported_capabilities & ODB_HELPER_CAP_GET_RAW_OBJ)
+		return odb_helper_get_raw_object(o, sha1, fd);
+	else
+		return odb_helper_get_git_object(o, sha1, fd);
+}
+
 int odb_helper_put_object(struct odb_helper *o,
 			  const void *buf, size_t len,
 			  const char *type, unsigned char *sha1)
-- 
2.14.0.rc1.52.gf02fb0ddac.dirty



* [PATCH v5 23/40] pack-objects: don't pack objects in external odbs
  2017-08-03  9:18 [PATCH v5 00/40] Add initial experimental external ODB support Christian Couder
                   ` (21 preceding siblings ...)
  2017-08-03  9:19 ` [PATCH v5 22/40] odb-helper: add odb_helper_get_raw_object() Christian Couder
@ 2017-08-03  9:19 ` Christian Couder
  2017-08-03  9:19 ` [PATCH v5 24/40] Add t0420 to test transfer to HTTP external odb Christian Couder
                   ` (17 subsequent siblings)
  40 siblings, 0 replies; 73+ messages in thread
From: Christian Couder @ 2017-08-03  9:19 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

Objects managed by an external ODB should not be put into
pack files. They should be transferred using another mechanism
that can be specific to the external ODB.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 builtin/pack-objects.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index f4a8441fe9..8283d15408 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -25,6 +25,7 @@
 #include "sha1-array.h"
 #include "argv-array.h"
 #include "mru.h"
+#include "external-odb.h"
 
 static const char *pack_usage[] = {
 	N_("git pack-objects --stdout [<options>...] [< <ref-list> | < <object-list>]"),
@@ -1011,6 +1012,9 @@ static int want_object_in_pack(const unsigned char *sha1,
 			return want;
 	}
 
+	if (external_odb_has_object(sha1))
+		return 0;
+
 	for (entry = packed_git_mru->head; entry; entry = entry->next) {
 		struct packed_git *p = entry->item;
 		off_t offset;
-- 
2.14.0.rc1.52.gf02fb0ddac.dirty



* [PATCH v5 24/40] Add t0420 to test transfer to HTTP external odb
  2017-08-03  9:18 [PATCH v5 00/40] Add initial experimental external ODB support Christian Couder
                   ` (22 preceding siblings ...)
  2017-08-03  9:19 ` [PATCH v5 23/40] pack-objects: don't pack objects in external odbs Christian Couder
@ 2017-08-03  9:19 ` Christian Couder
  2017-08-03  9:19 ` [PATCH v5 25/40] external-odb: add 'get_direct' support Christian Couder
                   ` (16 subsequent siblings)
  40 siblings, 0 replies; 73+ messages in thread
From: Christian Couder @ 2017-08-03  9:19 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

This tests that an Apache web server can be used as an
external object database and can store files in their native
format instead of converting them to Git objects.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
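A note for readers, not part of the commit: the script-mode helper in
this test dispatches on its first argument ('init', 'have',
'get_raw_obj', 'put_raw_obj'). The sketch below shows the same dispatch
against a local directory instead of an HTTP server, which may be easier
to experiment with. odb_dir_helper is a hypothetical name; the
"<sha1>-<size>-<kind>" file naming mirrors what the test expects in
httpd/www/files, and the "<sha1> <size> <kind>" output of 'have' is an
assumption about the format Git parses.

```shell
# Hypothetical stand-in for odb-http-helper that reads from a local
# directory ($1) instead of calling curl. The remaining arguments are
# the instruction and its parameters, as Git would pass them.
odb_dir_helper() {
	store=$1
	shift
	case "$1" in
	init)
		echo "capability=get_raw_obj"
		echo "capability=have"
		;;
	have)
		# Assumed format: one "<sha1> <size> <kind>" line per object.
		ls "$store" | sed -e 's/-/ /g'
		;;
	get_raw_obj)
		# Stream the raw content; Git adds the header and deflates it.
		cat "$store/$2"-*
		;;
	*)
		echo >&2 "unknown command '$1'"
		return 1
		;;
	esac
}
```

With a file named "abc123-4-blob" in the directory, "odb_dir_helper
<dir> have" prints "abc123 4 blob".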
 t/t0420-transfer-http-e-odb.sh | 142 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 142 insertions(+)
 create mode 100755 t/t0420-transfer-http-e-odb.sh

diff --git a/t/t0420-transfer-http-e-odb.sh b/t/t0420-transfer-http-e-odb.sh
new file mode 100755
index 0000000000..f84fe950ec
--- /dev/null
+++ b/t/t0420-transfer-http-e-odb.sh
@@ -0,0 +1,142 @@
+#!/bin/sh
+
+test_description='tests for transferring external objects to an HTTPD server'
+
+. ./test-lib.sh
+
+# If we don't specify a port, the current test number will be used
+# which will not work as it is less than 1024, so it can only be used by root.
+LIB_HTTPD_PORT=$(expr ${this_test#t} + 12000)
+
+. "$TEST_DIRECTORY"/lib-httpd.sh
+
+start_httpd apache-e-odb.conf
+
+# odb helper script must see this
+export HTTPD_URL
+
+write_script odb-http-helper <<\EOF
+die() {
+	printf >&2 "%s\n" "$@"
+	exit 1
+}
+echo >&2 "odb-http-helper args:" "$@"
+case "$1" in
+init)
+	echo "capability=get_raw_obj"
+	echo "capability=put_raw_obj"
+	echo "capability=have"
+	;;
+have)
+	list_url="$HTTPD_URL/list/"
+	curl "$list_url" ||
+	die "curl '$list_url' failed"
+	;;
+get_raw_obj)
+	get_url="$HTTPD_URL/list/?sha1=$2"
+	curl "$get_url" ||
+	die "curl '$get_url' failed"
+	;;
+put_raw_obj)
+	sha1="$2"
+	size="$3"
+	kind="$4"
+	upload_url="$HTTPD_URL/upload/?sha1=$sha1&size=$size&type=$kind"
+	curl --data-binary @- --include "$upload_url" >out ||
+	die "curl '$upload_url' failed"
+	ref_hash=$(echo "$sha1 $size $kind" | GIT_NO_EXTERNAL_ODB=1 git hash-object -w -t blob --stdin) || exit
+	git update-ref refs/odbs/magic/"$sha1" "$ref_hash"
+	;;
+*)
+	die "unknown command '$1'"
+	;;
+esac
+EOF
+HELPER="\"$PWD\"/odb-http-helper"
+
+test_expect_success 'setup repo with a root commit and the helper' '
+	test_commit zero &&
+	git config odb.magic.scriptCommand "$HELPER"
+'
+
+test_expect_success 'setup another repo from the first one' '
+	git init other-repo &&
+	(cd other-repo &&
+	 git remote add origin .. &&
+	 git pull origin master &&
+	 git checkout master &&
+	 git log)
+'
+
+UPLOADFILENAME="hello_apache_upload.txt"
+
+UPLOAD_URL="$HTTPD_URL/upload/?sha1=$UPLOADFILENAME&size=123&type=blob"
+
+test_expect_success 'can upload a file' '
+	echo "Hello Apache World!" >hello_to_send.txt &&
+	echo "How are you?" >>hello_to_send.txt &&
+	curl --data-binary @hello_to_send.txt --include "$UPLOAD_URL" >out_upload
+'
+
+LIST_URL="$HTTPD_URL/list/"
+
+test_expect_success 'can list uploaded files' '
+	curl --include "$LIST_URL" >out_list &&
+	grep "$UPLOADFILENAME" out_list
+'
+
+test_expect_success 'can delete uploaded files' '
+	curl --data "delete" --include "$UPLOAD_URL&delete=1" >out_delete &&
+	curl --include "$LIST_URL" >out_list2 &&
+	! grep "$UPLOADFILENAME" out_list2
+'
+
+FILES_DIR="httpd/www/files"
+
+test_expect_success 'new blobs are transferred to the http server' '
+	test_commit one &&
+	hash1=$(git ls-tree HEAD | grep one.t | cut -f1 | cut -d\  -f3) &&
+	echo "$hash1-4-blob" >expected &&
+	ls "$FILES_DIR" >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'blobs can be retrieved from the http server' '
+	git cat-file blob "$hash1" &&
+	git log -p >expected
+'
+
+test_expect_success 'update other repo from the first one' '
+	(cd other-repo &&
+	 git fetch origin "refs/odbs/magic/*:refs/odbs/magic/*" &&
+	 test_must_fail git cat-file blob "$hash1" &&
+	 git config odb.magic.scriptCommand "$HELPER" &&
+	 git cat-file blob "$hash1" &&
+	 git pull origin master)
+'
+
+test_expect_success 'local clone from the first repo' '
+	mkdir my-clone &&
+	(cd my-clone &&
+	 git clone .. . &&
+	 git cat-file blob "$hash1")
+'
+
+test_expect_success 'no-local clone from the first repo fails' '
+	mkdir my-other-clone &&
+	(cd my-other-clone &&
+	 test_must_fail git clone --no-local .. .) &&
+	rm -rf my-other-clone
+'
+
+test_expect_success 'no-local clone from the first repo with helper succeeds' '
+	mkdir my-other-clone &&
+	(cd my-other-clone &&
+	 git clone -c odb.magic.scriptCommand="$HELPER" \
+		--no-local .. .) &&
+	rm -rf my-other-clone
+'
+
+stop_httpd
+
+test_done
-- 
2.14.0.rc1.52.gf02fb0ddac.dirty



* [PATCH v5 25/40] external-odb: add 'get_direct' support
  2017-08-03  9:18 [PATCH v5 00/40] Add initial experimental external ODB support Christian Couder
                   ` (23 preceding siblings ...)
  2017-08-03  9:19 ` [PATCH v5 24/40] Add t0420 to test transfer to HTTP external odb Christian Couder
@ 2017-08-03  9:19 ` Christian Couder
  2017-08-03 21:40   ` Junio C Hamano
  2017-08-03  9:19 ` [PATCH v5 26/40] odb-helper: add 'script_mode' to 'struct odb_helper' Christian Couder
                   ` (15 subsequent siblings)
  40 siblings, 1 reply; 73+ messages in thread
From: Christian Couder @ 2017-08-03  9:19 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

This implements the 'get_direct' capability/instruction that makes
it possible for external odb helper scripts to pass blobs to Git
by directly writing them as loose object files.

It is better to call this a "direct" mode rather than a "fault-in"
mode as we could have the same kind of mechanism to "put" objects
into an external odb, where the odb helper would access blobs it
wants to send to an external odb directly from files, but it
would be strange to call that a fault-in mode too.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
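A note for readers, not part of the commit: after this patch,
odb_helper_get_object() checks the helper's capabilities in a fixed
order: 'get_git_obj' first, then 'get_raw_obj', then 'get_direct' (for
which it returns 0 without writing anything to fd), and it calls BUG()
if none is set. The sketch below models only that precedence.
pick_get_instruction is a hypothetical name, and the space-separated
capability list stands in for the supported_capabilities bitmask.

```shell
# Hypothetical model of the precedence used by odb_helper_get_object():
# print the instruction chosen for the capabilities listed in $1.
pick_get_instruction() {
	case " $1 " in
	*" get_git_obj "*) echo get_git_obj ;;
	*" get_raw_obj "*) echo get_raw_obj ;;
	*" get_direct "*)  echo get_direct ;;
	*)
		# The C code calls BUG() here; the sketch just fails.
		echo >&2 "invalid get capability: '$1'"
		return 1
		;;
	esac
}
```

A helper advertising "have get_direct get_raw_obj" is therefore asked
for 'get_raw_obj', not 'get_direct'.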
 external-odb.c | 21 ++++++++++++++++++++-
 external-odb.h |  1 +
 odb-helper.c   | 27 +++++++++++++++++++++++++--
 odb-helper.h   |  1 +
 4 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/external-odb.c b/external-odb.c
index 52cb448d01..31d21bfe04 100644
--- a/external-odb.c
+++ b/external-odb.c
@@ -96,7 +96,8 @@ int external_odb_get_object(const unsigned char *sha1)
 		int ret;
 		int fd;
 
-		if (!odb_helper_has_object(o, sha1))
+		if (!(o->supported_capabilities & ODB_HELPER_CAP_GET_RAW_OBJ) &&
+		    !(o->supported_capabilities & ODB_HELPER_CAP_GET_GIT_OBJ))
 			continue;
 
 		fd = create_object_tmpfile(&tmpfile, path);
@@ -122,6 +123,24 @@ int external_odb_get_object(const unsigned char *sha1)
 	return -1;
 }
 
+int external_odb_get_direct(const unsigned char *sha1)
+{
+	struct odb_helper *o;
+
+	if (!external_odb_has_object(sha1))
+		return -1;
+
+	for (o = helpers; o; o = o->next) {
+		if (!(o->supported_capabilities & ODB_HELPER_CAP_GET_DIRECT))
+			continue;
+		if (odb_helper_get_direct(o, sha1) < 0)
+			continue;
+		return 0;
+	}
+
+	return -1;
+}
+
 int external_odb_put_object(const void *buf, size_t len,
 			    const char *type, unsigned char *sha1)
 {
diff --git a/external-odb.h b/external-odb.h
index 3e0e6d0165..247b131fd5 100644
--- a/external-odb.h
+++ b/external-odb.h
@@ -4,6 +4,7 @@
 const char *external_odb_root(void);
 int external_odb_has_object(const unsigned char *sha1);
 int external_odb_get_object(const unsigned char *sha1);
+int external_odb_get_direct(const unsigned char *sha1);
 int external_odb_put_object(const void *buf, size_t len,
 			    const char *type, unsigned char *sha1);
 
diff --git a/odb-helper.c b/odb-helper.c
index 0603993057..b1f5464214 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -405,14 +405,37 @@ static int odb_helper_get_git_object(struct odb_helper *o,
 	return 0;
 }
 
+int odb_helper_get_direct(struct odb_helper *o,
+			  const unsigned char *sha1)
+{
+	struct odb_helper_object *obj;
+	struct odb_helper_cmd cmd;
+
+	obj = odb_helper_lookup(o, sha1);
+	if (!obj)
+		return -1;
+
+	if (odb_helper_start(o, &cmd, 0, "get_direct %s", sha1_to_hex(sha1)) < 0)
+		return -1;
+
+	if (odb_helper_finish(o, &cmd))
+		return -1;
+
+	return 0;
+}
+
 int odb_helper_get_object(struct odb_helper *o,
 			  const unsigned char *sha1,
 			  int fd)
 {
+	if (o->supported_capabilities & ODB_HELPER_CAP_GET_GIT_OBJ)
+		return odb_helper_get_git_object(o, sha1, fd);
 	if (o->supported_capabilities & ODB_HELPER_CAP_GET_RAW_OBJ)
 		return odb_helper_get_raw_object(o, sha1, fd);
-	else
-		return odb_helper_get_git_object(o, sha1, fd);
+	if (o->supported_capabilities & ODB_HELPER_CAP_GET_DIRECT)
+		return 0;
+
+	BUG("invalid get capability (capabilities: '%d')", o->supported_capabilities);
 }
 
 int odb_helper_put_object(struct odb_helper *o,
diff --git a/odb-helper.h b/odb-helper.h
index 318e0d48dc..f2fd2b7c9c 100644
--- a/odb-helper.h
+++ b/odb-helper.h
@@ -33,6 +33,7 @@ int odb_helper_init(struct odb_helper *o);
 int odb_helper_has_object(struct odb_helper *o, const unsigned char *sha1);
 int odb_helper_get_object(struct odb_helper *o, const unsigned char *sha1,
 			  int fd);
+int odb_helper_get_direct(struct odb_helper *o, const unsigned char *sha1);
 int odb_helper_put_object(struct odb_helper *o,
 			  const void *buf, size_t len,
 			  const char *type, unsigned char *sha1);
-- 
2.14.0.rc1.52.gf02fb0ddac.dirty



* [PATCH v5 26/40] odb-helper: add 'script_mode' to 'struct odb_helper'
  2017-08-03  9:18 [PATCH v5 00/40] Add initial experimental external ODB support Christian Couder
                   ` (24 preceding siblings ...)
  2017-08-03  9:19 ` [PATCH v5 25/40] external-odb: add 'get_direct' support Christian Couder
@ 2017-08-03  9:19 ` Christian Couder
  2017-08-03  9:19 ` [PATCH v5 27/40] odb-helper: add init_object_process() Christian Couder
                   ` (14 subsequent siblings)
  40 siblings, 0 replies; 73+ messages in thread
From: Christian Couder @ 2017-08-03  9:19 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

This prepares for having a long-running odb helper sub-process
handle the communication between Git and an external odb.

We introduce "odb.<name>.subprocesscommand" to make it
possible to define such a sub-process, and we mark such odb
helpers with the new 'script_mode' field set to 0.

Helpers defined using the existing "odb.<name>.scriptcommand"
are marked with the 'script_mode' field set to 1.

Implementation of the different capabilities/instructions in
the new (sub-)process mode is left for following commits.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 external-odb.c |  8 +++++++-
 odb-helper.c   | 19 ++++++++++++++-----
 odb-helper.h   |  1 +
 3 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/external-odb.c b/external-odb.c
index 31d21bfe04..ccca67eff5 100644
--- a/external-odb.c
+++ b/external-odb.c
@@ -32,8 +32,14 @@ static int external_odb_config(const char *var, const char *value, void *data)
 
 	o = find_or_create_helper(name, namelen);
 
-	if (!strcmp(subkey, "scriptcommand"))
+	if (!strcmp(subkey, "scriptcommand")) {
+		o->script_mode = 1;
 		return git_config_string(&o->cmd, var, value);
+	}
+	if (!strcmp(subkey, "subprocesscommand")) {
+		o->script_mode = 0;
+		return git_config_string(&o->cmd, var, value);
+	}
 
 	return 0;
 }
diff --git a/odb-helper.c b/odb-helper.c
index b1f5464214..4c16dd297a 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -123,6 +123,9 @@ int odb_helper_init(struct odb_helper *o)
 	FILE *fh;
 	struct strbuf line = STRBUF_INIT;
 
+	if (!o->script_mode)
+		return 0;
+
 	if (odb_helper_start(o, &cmd, 0, "init") < 0)
 		return -1;
 
@@ -173,16 +176,12 @@ static int odb_helper_object_cmp(const void *va, const void *vb)
 	return hashcmp(a->sha1, b->sha1);
 }
 
-static void odb_helper_load_have(struct odb_helper *o)
+static void have_object_script(struct odb_helper *o)
 {
 	struct odb_helper_cmd cmd;
 	FILE *fh;
 	struct strbuf line = STRBUF_INIT;
 
-	if (o->have_valid)
-		return;
-	o->have_valid = 1;
-
 	if (odb_helper_start(o, &cmd, 0, "have") < 0)
 		return;
 
@@ -194,6 +193,16 @@ static void odb_helper_load_have(struct odb_helper *o)
 	strbuf_release(&line);
 	fclose(fh);
 	odb_helper_finish(o, &cmd);
+}
+
+static void odb_helper_load_have(struct odb_helper *o)
+{
+	if (o->have_valid)
+		return;
+	o->have_valid = 1;
+
+	if (o->script_mode)
+		have_object_script(o);
 
 	qsort(o->have, o->have_nr, sizeof(*o->have), odb_helper_object_cmp);
 }
diff --git a/odb-helper.h b/odb-helper.h
index f2fd2b7c9c..04b85f1d02 100644
--- a/odb-helper.h
+++ b/odb-helper.h
@@ -15,6 +15,7 @@ struct odb_helper {
 	const char *name;
 	const char *cmd;
 	unsigned int supported_capabilities;
+	int script_mode;
 
 	struct odb_helper_object {
 		unsigned char sha1[20];
-- 
2.14.0.rc1.52.gf02fb0ddac.dirty



* [PATCH v5 27/40] odb-helper: add init_object_process()
  2017-08-03  9:18 [PATCH v5 00/40] Add initial experimental external ODB support Christian Couder
                   ` (25 preceding siblings ...)
  2017-08-03  9:19 ` [PATCH v5 26/40] odb-helper: add 'script_mode' to 'struct odb_helper' Christian Couder
@ 2017-08-03  9:19 ` Christian Couder
  2017-08-03  9:19 ` [PATCH v5 28/40] Add t0450 to test 'get_direct' mechanism Christian Couder
                   ` (13 subsequent siblings)
  40 siblings, 0 replies; 73+ messages in thread
From: Christian Couder @ 2017-08-03  9:19 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Ben Peart

From: Ben Peart <benpeart@microsoft.com>

This adds the infrastructure to launch and use long-running
sub-processes as external odb helpers.

For now only the 'init' and 'get_direct' capabilities are
supported with sub-processes.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
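A note for readers, not part of the commit: in process mode Git opens
the conversation in send_start_packets() by writing
"git-read-object-client" and "version=1" as packet lines, followed by a
flush packet. The sketch below shows what those bytes look like on the
wire, assuming the usual pkt-line framing (four hex digits of length
that include the length field itself) and assuming each line is sent
with a trailing newline. pkt_line, pkt_flush and client_hello are
hypothetical names.

```shell
# Emit one text packet: 4 hex digits of length (counting the 4 length
# bytes and the trailing newline), then the payload and a newline.
# ${#1} counts characters, so this is only right for ASCII payloads.
pkt_line() {
	printf '%04x%s\n' $(( ${#1} + 5 )) "$1"
}

# A flush packet ("0000") marks the end of a list of packets.
pkt_flush() {
	printf '0000'
}

# The greeting that Git sends first in process mode.
client_hello() {
	pkt_line "git-read-object-client"
	pkt_line "version=1"
	pkt_flush
}
```

A helper is then expected to answer with "git-read-object-server" and
"version=1" framed the same way, as the checks in send_start_packets()
show.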
 external-odb.c |  52 ++++---
 odb-helper.c   | 481 +++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 sha1_file.c    |  56 +++++--
 3 files changed, 535 insertions(+), 54 deletions(-)

diff --git a/external-odb.c b/external-odb.c
index ccca67eff5..084cd32e0b 100644
--- a/external-odb.c
+++ b/external-odb.c
@@ -67,32 +67,11 @@ const char *external_odb_root(void)
 	return root;
 }
 
-int external_odb_has_object(const unsigned char *sha1)
-{
-	struct odb_helper *o;
-
-	if (!use_external_odb)
-		return 0;
-
-	external_odb_init();
-
-	for (o = helpers; o; o = o->next) {
-		if (!(o->supported_capabilities & ODB_HELPER_CAP_HAVE))
-			return 1;
-		if (odb_helper_has_object(o, sha1))
-			return 1;
-	}
-	return 0;
-}
-
-int external_odb_get_object(const unsigned char *sha1)
+static int external_odb_do_get_object(const unsigned char *sha1)
 {
 	struct odb_helper *o;
 	const char *path;
 
-	if (!external_odb_has_object(sha1))
-		return -1;
-
 	path = sha1_file_name_alt(external_odb_root(), sha1);
 	safe_create_leading_directories_const(path);
 	prepare_external_alt_odb();
@@ -147,6 +126,35 @@ int external_odb_get_direct(const unsigned char *sha1)
 	return -1;
 }
 
+int external_odb_has_object(const unsigned char *sha1)
+{
+	struct odb_helper *o;
+
+	if (!use_external_odb)
+		return 0;
+
+	external_odb_init();
+
+	for (o = helpers; o; o = o->next) {
+		if (!(o->supported_capabilities & ODB_HELPER_CAP_HAVE)) {
+			if (o->supported_capabilities & ODB_HELPER_CAP_GET_DIRECT)
+				return 1;
+			return !external_odb_do_get_object(sha1);
+		}
+		if (odb_helper_has_object(o, sha1))
+			return 1;
+	}
+	return 0;
+}
+
+int external_odb_get_object(const unsigned char *sha1)
+{
+	if (!external_odb_has_object(sha1))
+		return -1;
+
+	return external_odb_do_get_object(sha1);
+}
+
 int external_odb_put_object(const void *buf, size_t len,
 			    const char *type, unsigned char *sha1)
 {
diff --git a/odb-helper.c b/odb-helper.c
index 4c16dd297a..fce1dff501 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -4,6 +4,22 @@
 #include "odb-helper.h"
 #include "run-command.h"
 #include "sha1-lookup.h"
+#include "sub-process.h"
+#include "pkt-line.h"
+#include "sigchain.h"
+
+struct object_process {
+	struct subprocess_entry subprocess;
+	unsigned int supported_capabilities;
+};
+
+static struct hashmap subprocess_map;
+
+static int check_object_process_status(int fd, struct strbuf *status)
+{
+	subprocess_read_status(fd, status);
+	return strcmp(status->buf, "success");
+}
 
 static void parse_capabilities(char *cap_buf,
 			       unsigned int *supported_capabilities,
@@ -39,6 +55,396 @@ static void parse_capabilities(char *cap_buf,
 	string_list_clear(&cap_list, 0);
 }
 
+static int send_start_packets(struct child_process *process, const char *cmd)
+{
+	int err = packet_writel(process->in, "git-read-object-client", "version=1", NULL);
+	if (err)
+		return err;
+
+	err = strcmp(packet_read_line(process->out, NULL), "git-read-object-server");
+	if (err) {
+		error("external process '%s' does not support read-object protocol version 1", cmd);
+		return err;
+	}
+	err = strcmp(packet_read_line(process->out, NULL), "version=1");
+	if (err)
+		return err;
+	err = packet_read_line(process->out, NULL) != NULL;
+	if (err)
+		return err;
+
+	return packet_writel(process->in,
+			     "capability=get_git_obj",
+			     "capability=get_raw_obj",
+			     "capability=get_direct",
+			     "capability=put_raw_obj",
+			     "capability=have",
+			     NULL);
+}
+
+static int start_object_process_fn(struct subprocess_entry *subprocess)
+{
+	int err;
+	struct object_process *entry = (struct object_process *)subprocess;
+	struct child_process *process = &subprocess->process;
+	char *cap_buf;
+
+	sigchain_push(SIGPIPE, SIG_IGN);
+
+	err = send_start_packets(process, subprocess->cmd);
+
+	if (!err)
+		while ((cap_buf = packet_read_line(process->out, NULL)))
+			parse_capabilities(cap_buf, &entry->supported_capabilities, subprocess->cmd);
+
+	sigchain_pop(SIGPIPE);
+
+	return err;
+}
+
+static struct object_process *launch_object_process(struct odb_helper *o,
+						    unsigned int capability)
+{
+	struct object_process *entry = NULL;
+
+	if (!subprocess_map.tablesize)
+		hashmap_init(&subprocess_map, (hashmap_cmp_fn) cmd2process_cmp, NULL, 0);
+	else
+		entry = (struct object_process *)subprocess_find_entry(&subprocess_map, o->cmd);
+
+	fflush(NULL);
+
+	if (!entry) {
+		entry = xmalloc(sizeof(*entry));
+		entry->supported_capabilities = 0;
+
+		if (subprocess_start(&subprocess_map, &entry->subprocess, o->cmd, start_object_process_fn)) {
+			error("Could not launch process for cmd '%s'", o->cmd);
+			free(entry);
+			return NULL;
+		}
+	}
+
+	o->supported_capabilities = entry->supported_capabilities;
+
+	if (capability && !(capability & entry->supported_capabilities)) {
+		error("The cmd '%s' does not support capability '%d'", o->cmd, capability);
+		return NULL;
+	}
+
+	sigchain_push(SIGPIPE, SIG_IGN);
+
+	return entry;
+}
+
+static int check_object_process_error(int err,
+				      const char *status,
+				      struct object_process *entry,
+				      const char *cmd,
+				      unsigned int capability)
+{
+	sigchain_pop(SIGPIPE);
+
+	if (!err)
+		return 0;
+
+	if (!strcmp(status, "error")) {
+		/* The process signaled a problem with the file. */
+	} else if (!strcmp(status, "notfound")) {
+		/* Object was not found */
+		err = -1;
+	} else if (!strcmp(status, "abort")) {
+		/*
+		 * The process signaled a permanent problem. Don't try to read
+		 * objects with the same command for the lifetime of the current
+		 * Git process.
+		 */
+		if (capability)
+			entry->supported_capabilities &= ~capability;
+	} else {
+		/*
+		 * Something went wrong with the read-object process.
+		 * Force shutdown and restart if needed.
+		 */
+		error("external object process '%s' failed", cmd);
+		subprocess_stop(&subprocess_map, &entry->subprocess);
+		free(entry);
+	}
+
+	return err;
+}
+
+static int send_init_packets(struct object_process *entry,
+			     struct strbuf *status)
+{
+	struct child_process *process = &entry->subprocess.process;
+
+	return packet_write_fmt_gently(process->in, "command=init\n") ||
+		packet_flush_gently(process->in) ||
+		check_object_process_status(process->out, status);
+}
+
+static int init_object_process(struct odb_helper *o)
+{
+	int err;
+	struct strbuf status = STRBUF_INIT;
+	struct object_process *entry = launch_object_process(o, 0);
+	if (!entry)
+		return -1;
+
+	err = send_init_packets(entry, &status);
+
+	return check_object_process_error(err, status.buf, entry,
+					  o->cmd, 0);
+}
+
+static ssize_t read_packetized_raw_object_to_fd(struct odb_helper *o,
+						const unsigned char *sha1,
+						int fd_in, int fd_out)
+{
+	ssize_t total_read = 0;
+	unsigned long total_got = 0;
+	int packet_len;
+
+	char hdr[32];
+	int hdrlen;
+
+	int ret = Z_STREAM_END;
+	unsigned char compressed[4096];
+	git_zstream stream;
+	git_SHA_CTX hash;
+	unsigned char real_sha1[20];
+
+	off_t size;
+	enum object_type type;
+	const char *s;
+	int pkt_size;
+	char *size_buf;
+
+	size_buf = packet_read_line(fd_in, &pkt_size);
+	if (!skip_prefix(size_buf, "size=", &s))
+		return error("odb helper '%s' did not send size of plain object", o->name);
+	size = strtoumax(s, NULL, 10);
+	if (!skip_prefix(packet_read_line(fd_in, NULL), "kind=", &s))
+		return error("odb helper '%s' did not send kind of plain object", o->name);
+	/* Check if the object is not available */
+	if (!strcmp(s, "none"))
+		return -1;
+	type = type_from_string_gently(s, strlen(s), 1);
+	if (type < 0)
+		return error("odb helper '%s' sent bad type '%s'", o->name, s);
+
+	/* Set it up */
+	git_deflate_init(&stream, zlib_compression_level);
+	stream.next_out = compressed;
+	stream.avail_out = sizeof(compressed);
+	git_SHA1_Init(&hash);
+
+	/* First header.. */
+	hdrlen = xsnprintf(hdr, sizeof(hdr), "%s %"PRIuMAX, typename(type), (uintmax_t)size) + 1;
+	stream.next_in = (unsigned char *)hdr;
+	stream.avail_in = hdrlen;
+	while (git_deflate(&stream, 0) == Z_OK)
+		; /* nothing */
+	git_SHA1_Update(&hash, hdr, hdrlen);
+
+	for (;;) {
+		/* packet_read() writes a '\0' extra byte at the end */
+		char buf[LARGE_PACKET_DATA_MAX + 1];
+
+		packet_len = packet_read(fd_in, NULL, NULL,
+			buf, LARGE_PACKET_DATA_MAX + 1,
+			PACKET_READ_GENTLE_ON_EOF);
+
+		if (packet_len <= 0)
+			break;
+
+		total_got += packet_len;
+
+		/* Then the data itself.. */
+		stream.next_in = (void *)buf;
+		stream.avail_in = packet_len;
+		do {
+			unsigned char *in0 = stream.next_in;
+			ret = git_deflate(&stream, Z_FINISH);
+			git_SHA1_Update(&hash, in0, stream.next_in - in0);
+			write_or_die(fd_out, compressed, stream.next_out - compressed);
+			stream.next_out = compressed;
+			stream.avail_out = sizeof(compressed);
+		} while (ret == Z_OK);
+
+		total_read += packet_len;
+	}
+
+	if (packet_len < 0) {
+		error("unable to read from odb helper '%s': %s",
+		      o->name, strerror(errno));
+		git_deflate_end(&stream);
+		return packet_len;
+	}
+
+	if (ret != Z_STREAM_END) {
+		warning("bad zlib data from odb helper '%s' for %s",
+			o->name, sha1_to_hex(sha1));
+		return -1;
+	}
+
+	ret = git_deflate_end_gently(&stream);
+	if (ret != Z_OK) {
+		warning("deflateEnd on object %s from odb helper '%s' failed (%d)",
+			sha1_to_hex(sha1), o->name, ret);
+		return -1;
+	}
+	git_SHA1_Final(real_sha1, &hash);
+	if (hashcmp(sha1, real_sha1)) {
+		warning("sha1 mismatch from odb helper '%s' for %s (got %s)",
+			o->name, sha1_to_hex(sha1), sha1_to_hex(real_sha1));
+		return -1;
+	}
+	if (total_got != size) {
+		warning("size mismatch from odb helper '%s' for %s (%lu != %"PRIuMAX")",
+			o->name, sha1_to_hex(sha1), total_got, (uintmax_t)size);
+		return -1;
+	}
+
+	return total_read;
+}
+
+static ssize_t read_packetized_git_object_to_fd(struct odb_helper *o,
+						const unsigned char *sha1,
+						int fd_in, int fd_out)
+{
+	ssize_t total_read = 0;
+	unsigned long total_got = 0;
+	int packet_len;
+	git_zstream stream;
+	int zret = Z_STREAM_END;
+	git_SHA_CTX hash;
+	unsigned char real_sha1[20];
+
+	memset(&stream, 0, sizeof(stream));
+	git_inflate_init(&stream);
+	git_SHA1_Init(&hash);
+
+	for (;;) {
+		/* packet_read() writes a '\0' extra byte at the end */
+		char buf[LARGE_PACKET_DATA_MAX + 1];
+
+		packet_len = packet_read(fd_in, NULL, NULL,
+			buf, LARGE_PACKET_DATA_MAX + 1,
+			PACKET_READ_GENTLE_ON_EOF);
+
+		if (packet_len <= 0)
+			break;
+
+		write_or_die(fd_out, buf, packet_len);
+
+		stream.next_in = (unsigned char *)buf;
+		stream.avail_in = packet_len;
+		do {
+			unsigned char inflated[4096];
+			unsigned long got;
+
+			stream.next_out = inflated;
+			stream.avail_out = sizeof(inflated);
+			zret = git_inflate(&stream, Z_SYNC_FLUSH);
+			got = sizeof(inflated) - stream.avail_out;
+
+			git_SHA1_Update(&hash, inflated, got);
+			/* skip header when counting size */
+			if (!total_got) {
+				const unsigned char *p = memchr(inflated, '\0', got);
+				if (p)
+					got -= p - inflated + 1;
+				else
+					got = 0;
+			}
+			total_got += got;
+		} while (stream.avail_in && zret == Z_OK);
+
+		total_read += packet_len;
+	}
+
+	git_inflate_end(&stream);
+
+	if (packet_len < 0)
+		return packet_len;
+
+	git_SHA1_Final(real_sha1, &hash);
+
+	if (zret != Z_STREAM_END) {
+		warning("bad zlib data from odb helper '%s' for %s",
+			o->name, sha1_to_hex(sha1));
+		return -1;
+	}
+	if (hashcmp(real_sha1, sha1)) {
+		warning("sha1 mismatch from odb helper '%s' for %s (got %s)",
+			o->name, sha1_to_hex(sha1), sha1_to_hex(real_sha1));
+		return -1;
+	}
+
+	return total_read;
+}
+
+static int send_get_packets(struct odb_helper *o,
+			    struct object_process *entry,
+			    const unsigned char *sha1,
+			    int fd,
+			    unsigned int *cur_cap,
+			    struct strbuf *status)
+{
+	const char *instruction;
+	int err;
+	struct child_process *process = &entry->subprocess.process;
+
+	if (entry->supported_capabilities & ODB_HELPER_CAP_GET_GIT_OBJ) {
+		*cur_cap = ODB_HELPER_CAP_GET_GIT_OBJ;
+		instruction = "get_git_obj";
+	} else if (entry->supported_capabilities & ODB_HELPER_CAP_GET_RAW_OBJ) {
+		*cur_cap = ODB_HELPER_CAP_GET_RAW_OBJ;
+		instruction = "get_raw_obj";
+	} else if (entry->supported_capabilities & ODB_HELPER_CAP_GET_DIRECT) {
+		*cur_cap = ODB_HELPER_CAP_GET_DIRECT;
+		instruction = "get_direct";
+	} else {
+		BUG("No known ODB_HELPER_CAP_GET_XXX capability!");
+	}
+
+	err = packet_write_fmt_gently(process->in, "command=%s\n", instruction);
+	if (err)
+		return err;
+
+	err = packet_write_fmt_gently(process->in, "sha1=%s\n", sha1_to_hex(sha1));
+	if (err)
+		return err;
+
+	err = packet_flush_gently(process->in);
+	if (err)
+		return err;
+
+	if (*cur_cap == ODB_HELPER_CAP_GET_GIT_OBJ)
+		err = read_packetized_git_object_to_fd(o, sha1, process->out, fd) < 0;
+	else if (*cur_cap == ODB_HELPER_CAP_GET_RAW_OBJ)
+		err = read_packetized_raw_object_to_fd(o, sha1, process->out, fd) < 0;
+
+	return check_object_process_status(process->out, status);
+}
+
+static int get_object_process(struct odb_helper *o, const unsigned char *sha1, int fd)
+{
+	int err;
+	struct strbuf status = STRBUF_INIT;
+	unsigned int cur_cap = 0;
+	struct object_process *entry = launch_object_process(o, 0);
+	if (!entry)
+		return -1;
+
+	err = send_get_packets(o, entry, sha1, fd, &cur_cap, &status);
+
+	return check_object_process_error(err, status.buf, entry,
+					  o->cmd, cur_cap);
+}
+
 struct odb_helper *odb_helper_new(const char *name, int namelen)
 {
 	struct odb_helper *o;
@@ -117,15 +523,12 @@ static int odb_helper_finish(struct odb_helper *o,
 	return 0;
 }
 
-int odb_helper_init(struct odb_helper *o)
+static int init_object_script(struct odb_helper *o)
 {
 	struct odb_helper_cmd cmd;
 	FILE *fh;
 	struct strbuf line = STRBUF_INIT;
 
-	if (!o->script_mode)
-		return 0;
-
 	if (odb_helper_start(o, &cmd, 0, "init") < 0)
 		return -1;
 
@@ -140,6 +543,21 @@ int odb_helper_init(struct odb_helper *o)
 	return 0;
 }
 
+int odb_helper_init(struct odb_helper *o)
+{
+	int res;
+	uint64_t start = getnanotime();
+
+	if (o->script_mode)
+		res = init_object_script(o);
+	else
+		res = init_object_process(o);
+
+	trace_performance_since(start, "odb_helper_init");
+
+	return res;
+}
+
 static int parse_object_line(struct odb_helper_object *o, const char *line)
 {
 	char *end;
@@ -414,28 +832,42 @@ static int odb_helper_get_git_object(struct odb_helper *o,
 	return 0;
 }
 
-int odb_helper_get_direct(struct odb_helper *o,
-			  const unsigned char *sha1)
+static int get_direct_script(struct odb_helper *o, const unsigned char *sha1)
 {
-	struct odb_helper_object *obj;
 	struct odb_helper_cmd cmd;
 
-	obj = odb_helper_lookup(o, sha1);
-	if (!obj)
-		return -1;
-
 	if (odb_helper_start(o, &cmd, 0, "get_direct %s", sha1_to_hex(sha1)) < 0)
 		return -1;
-
 	if (odb_helper_finish(o, &cmd))
 		return -1;
-
 	return 0;
 }
 
-int odb_helper_get_object(struct odb_helper *o,
-			  const unsigned char *sha1,
-			  int fd)
+int odb_helper_get_direct(struct odb_helper *o,
+			  const unsigned char *sha1)
+{
+	int res;
+	uint64_t start;
+
+	if (o->supported_capabilities & ODB_HELPER_CAP_HAVE) {
+		struct odb_helper_object *obj = odb_helper_lookup(o, sha1);
+		if (!obj)
+			return -1;
+	}
+
+	start = getnanotime();
+
+	if (o->script_mode)
+		res = get_direct_script(o, sha1);
+	else
+		res = get_object_process(o, sha1, -1);
+
+	trace_performance_since(start, "odb_helper_get_direct");
+
+	return res;
+}
+
+static int get_object_script(struct odb_helper *o, const unsigned char *sha1, int fd)
 {
 	if (o->supported_capabilities & ODB_HELPER_CAP_GET_GIT_OBJ)
 		return odb_helper_get_git_object(o, sha1, fd);
@@ -447,6 +879,23 @@ int odb_helper_get_object(struct odb_helper *o,
 	BUG("invalid get capability (capabilities: '%d')", o->supported_capabilities);
 }
 
+int odb_helper_get_object(struct odb_helper *o,
+			  const unsigned char *sha1,
+			  int fd)
+{
+	int res;
+	uint64_t start = getnanotime();
+
+	if (o->script_mode)
+		res = get_object_script(o, sha1, fd);
+	else
+		res = get_object_process(o, sha1, fd);
+
+	trace_performance_since(start, "odb_helper_get_object");
+
+	return res;
+}
+
 int odb_helper_put_object(struct odb_helper *o,
 			  const void *buf, size_t len,
 			  const char *type, unsigned char *sha1)
diff --git a/sha1_file.c b/sha1_file.c
index 3735720bfc..fb34f0b18d 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -698,7 +698,17 @@ int check_and_freshen_file(const char *fn, int freshen)
 
 static int check_and_freshen_local(const unsigned char *sha1, int freshen)
 {
-	return check_and_freshen_file(sha1_file_name(sha1), freshen);
+	int ret;
+	int tried_hook = 0;
+
+retry:
+	ret = check_and_freshen_file(sha1_file_name(sha1), freshen);
+	if (!ret && !tried_hook) {
+		tried_hook = 1;
+		if (!external_odb_get_direct(sha1))
+			goto retry;
+	}
+	return ret;
 }
 
 static int check_and_freshen_nonlocal(const unsigned char *sha1, int freshen)
@@ -3017,20 +3027,11 @@ static int sha1_loose_object_info(const unsigned char *sha1,
 	return (status < 0) ? status : 0;
 }
 
-int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi, unsigned flags)
+static int find_cached_or_packed(const unsigned char *sha1, struct object_info *oi,
+				 unsigned flags, struct pack_entry *e, int retry)
 {
-	static struct object_info blank_oi = OBJECT_INFO_INIT;
-	struct pack_entry e;
-	int rtype;
-	const unsigned char *real = (flags & OBJECT_INFO_LOOKUP_REPLACE) ?
-				    lookup_replace_object(sha1) :
-				    sha1;
-
-	if (!oi)
-		oi = &blank_oi;
-
 	if (!(flags & OBJECT_INFO_SKIP_CACHED)) {
-		struct cached_object *co = find_cached_object(real);
+		struct cached_object *co = find_cached_object(sha1);
 		if (co) {
 			if (oi->typep)
 				*(oi->typep) = co->type;
@@ -3049,9 +3050,9 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi,
 		}
 	}
 
-	if (!find_pack_entry(real, &e)) {
+	if (!find_pack_entry(sha1, e)) {
 		/* Most likely it's a loose object. */
-		if (!sha1_loose_object_info(real, oi, flags)) {
+		if (!sha1_loose_object_info(sha1, oi, flags)) {
 			oi->whence = OI_LOOSE;
 			return 0;
 		}
@@ -3061,10 +3062,33 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi,
 			return -1;
 		} else {
 			reprepare_packed_git();
-			if (!find_pack_entry(real, &e))
+			if (!find_pack_entry(sha1, e)) {
+				if (retry && !external_odb_get_direct(sha1))
+					return find_cached_or_packed(sha1, oi, flags, e, 0);
 				return -1;
+			}
 		}
 	}
+	return 1;
+}
+
+int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi, unsigned flags)
+{
+	static struct object_info blank_oi = OBJECT_INFO_INIT;
+	struct pack_entry e;
+	int rtype;
+	enum object_type real_type;
+	int res;
+	const unsigned char *real = (flags & OBJECT_INFO_LOOKUP_REPLACE) ?
+				    lookup_replace_object(sha1) :
+				    sha1;
+
+	if (!oi)
+		oi = &blank_oi;
+
+	res = find_cached_or_packed(real, oi, flags, &e, 1);
+	if (res < 1)
+		return res;
 
 	if (oi == &blank_oi)
 		/*
-- 
2.14.0.rc1.52.gf02fb0ddac.dirty



* [PATCH v5 28/40] Add t0450 to test 'get_direct' mechanism
  2017-08-03  9:18 [PATCH v5 00/40] Add initial experimental external ODB support Christian Couder
                   ` (26 preceding siblings ...)
  2017-08-03  9:19 ` [PATCH v5 27/40] odb-helper: add init_object_process() Christian Couder
@ 2017-08-03  9:19 ` Christian Couder
  2017-08-03  9:19 ` [PATCH v5 29/40] Add t0460 to test passing git objects Christian Couder
                   ` (12 subsequent siblings)
  40 siblings, 0 replies; 73+ messages in thread
From: Christian Couder @ 2017-08-03  9:19 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder, Ben Peart

From: Ben Peart <benpeart@microsoft.com>

Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/t0450-read-object.sh | 28 +++++++++++++++++++++
 t/t0450/read-object    | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 96 insertions(+)
 create mode 100755 t/t0450-read-object.sh
 create mode 100755 t/t0450/read-object

diff --git a/t/t0450-read-object.sh b/t/t0450-read-object.sh
new file mode 100755
index 0000000000..6b97305452
--- /dev/null
+++ b/t/t0450-read-object.sh
@@ -0,0 +1,28 @@
+#!/bin/sh
+
+test_description='tests for long running read-object process'
+
+. ./test-lib.sh
+
+PATH="$PATH:$TEST_DIRECTORY/t0450"
+
+test_expect_success 'setup host repo with a root commit' '
+	test_commit zero &&
+	hash1=$(git ls-tree HEAD | grep zero.t | cut -f1 | cut -d\  -f3)
+'
+
+HELPER="read-object"
+
+test_expect_success 'blobs can be retrieved from the host repo' '
+	git init guest-repo &&
+	(cd guest-repo &&
+	 git config odb.magic.subprocessCommand "$HELPER" &&
+	 git cat-file blob "$hash1" >/dev/null)
+'
+
+test_expect_success 'invalid blobs generate errors' '
+	cd guest-repo &&
+	test_must_fail git cat-file blob "invalid"
+'
+
+test_done
diff --git a/t/t0450/read-object b/t/t0450/read-object
new file mode 100755
index 0000000000..cf22e2f581
--- /dev/null
+++ b/t/t0450/read-object
@@ -0,0 +1,68 @@
+#!/usr/bin/perl
+#
+# Example implementation for the Git read-object protocol version 1
+# See Documentation/technical/read-object-protocol.txt
+#
+# Allows you to test the ability for blobs to be pulled from a host git repo
+# "on demand."  Called when git needs a blob it couldn't find locally due to
+# a lazy clone that only cloned the commits and trees.
+#
+# A lazy clone can be simulated via the following commands from the host repo
+# you wish to create a lazy clone of:
+#
+# cd /host_repo
+# git rev-parse HEAD
+# git init /guest_repo
+# git cat-file --batch-check --batch-all-objects | grep -v 'blob' |
+#	cut -d' ' -f1 | git pack-objects /e/guest_repo/.git/objects/pack/noblobs
+# cd /guest_repo
+# git config core.virtualizeobjects true
+# git reset --hard <sha from rev-parse call above>
+#
+# Please note, this sample is a minimal skeleton. No proper error handling
+# was implemented.
+#
+
+use 5.008;
+use lib (split(/:/, $ENV{GITPERLLIB}));
+use strict;
+use warnings;
+use Git::Packet;
+
+#
+# Point $DIR to the folder where your host git repo is located so we can pull
+# missing objects from it
+#
+my $DIR = "../.git/";
+
+packet_initialize("git-read-object", 1);
+
+packet_read_and_check_capabilities("get_direct");
+packet_write_capabilities("get_direct");
+
+while (1) {
+	my ($res, $command) = packet_txt_read();
+
+	if ( $res == -1 ) {
+		exit 0;
+	}
+
+	$command =~ s/^command=//;
+
+	if ( $command eq "init" ) {
+		packet_bin_read();
+
+		packet_txt_write("status=success");
+		packet_flush();
+	} elsif ( $command eq "get_direct" ) {
+		my ($sha1) = packet_txt_read() =~ /^sha1=([0-9a-f]{40})$/;
+		packet_bin_read();
+
+		system ('git --git-dir="' . $DIR . '" cat-file blob ' . $sha1 . ' | GIT_NO_EXTERNAL_ODB=1 git hash-object -w --stdin >/dev/null 2>&1');
+
+		packet_txt_write(($?) ? "status=error" : "status=success");
+		packet_flush();
+	} else {
+		die "bad command '$command'";
+	}
+}
-- 
2.14.0.rc1.52.gf02fb0ddac.dirty



* [PATCH v5 29/40] Add t0460 to test passing git objects
  2017-08-03  9:18 [PATCH v5 00/40] Add initial experimental external ODB support Christian Couder
                   ` (27 preceding siblings ...)
  2017-08-03  9:19 ` [PATCH v5 28/40] Add t0450 to test 'get_direct' mechanism Christian Couder
@ 2017-08-03  9:19 ` Christian Couder
  2017-08-03  9:19 ` [PATCH v5 30/40] odb-helper: add put_object_process() Christian Couder
                   ` (11 subsequent siblings)
  40 siblings, 0 replies; 73+ messages in thread
From: Christian Couder @ 2017-08-03  9:19 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/t0460-read-object-git.sh | 28 +++++++++++++++++
 t/t0460/read-object-git    | 78 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 106 insertions(+)
 create mode 100755 t/t0460-read-object-git.sh
 create mode 100755 t/t0460/read-object-git

diff --git a/t/t0460-read-object-git.sh b/t/t0460-read-object-git.sh
new file mode 100755
index 0000000000..2873b445f3
--- /dev/null
+++ b/t/t0460-read-object-git.sh
@@ -0,0 +1,28 @@
+#!/bin/sh
+
+test_description='tests for long running read-object process passing git objects'
+
+. ./test-lib.sh
+
+PATH="$PATH:$TEST_DIRECTORY/t0460"
+
+test_expect_success 'setup host repo with a root commit' '
+	test_commit zero &&
+	hash1=$(git ls-tree HEAD | grep zero.t | cut -f1 | cut -d\  -f3)
+'
+
+HELPER="read-object-git"
+
+test_expect_success 'blobs can be retrieved from the host repo' '
+	git init guest-repo &&
+	(cd guest-repo &&
+	 git config odb.magic.subprocessCommand "$HELPER" &&
+	 git cat-file blob "$hash1" >/dev/null)
+'
+
+test_expect_success 'invalid blobs generate errors' '
+	cd guest-repo &&
+	test_must_fail git cat-file blob "invalid"
+'
+
+test_done
diff --git a/t/t0460/read-object-git b/t/t0460/read-object-git
new file mode 100755
index 0000000000..38529e622e
--- /dev/null
+++ b/t/t0460/read-object-git
@@ -0,0 +1,78 @@
+#!/usr/bin/perl
+#
+# Example implementation for the Git read-object protocol version 1
+# See Documentation/technical/read-object-protocol.txt
+#
+# Allows you to test the ability for blobs to be pulled from a host git repo
+# "on demand."  Called when git needs a blob it couldn't find locally due to
+# a lazy clone that only cloned the commits and trees.
+#
+# A lazy clone can be simulated via the following commands from the host repo
+# you wish to create a lazy clone of:
+#
+# cd /host_repo
+# git rev-parse HEAD
+# git init /guest_repo
+# git cat-file --batch-check --batch-all-objects | grep -v 'blob' |
+#	cut -d' ' -f1 | git pack-objects /e/guest_repo/.git/objects/pack/noblobs
+# cd /guest_repo
+# git config core.virtualizeobjects true
+# git reset --hard <sha from rev-parse call above>
+#
+# Please note, this sample is a minimal skeleton. No proper error handling
+# was implemented.
+#
+
+use 5.008;
+use lib (split(/:/, $ENV{GITPERLLIB}));
+use strict;
+use warnings;
+use Git::Packet;
+
+#
+# Point $DIR to the folder where your host git repo is located so we can pull
+# missing objects from it
+#
+my $DIR = "../.git/";
+
+packet_initialize("git-read-object", 1);
+
+packet_read_and_check_capabilities("get_git_obj");
+packet_write_capabilities("get_git_obj");
+
+while (1) {
+	my ($res, $command) = packet_txt_read();
+
+	if ( $res == -1 ) {
+		exit 0;
+	}
+
+	$command =~ s/^command=//;
+
+	if ( $command eq "init" ) {
+		packet_bin_read();
+
+		packet_txt_write("status=success");
+		packet_flush();
+	} elsif ( $command eq "get_git_obj" ) {
+		my ($sha1) = packet_txt_read() =~ /^sha1=([0-9a-f]{40})$/;
+		packet_bin_read();
+
+		my $path = $sha1;
+		$path =~ s{..}{$&/};
+		$path = $DIR . "/objects/" . $path;
+
+		my $contents = do {
+		    local $/;
+		    open my $fh, '<:raw', $path or die "Can't open '$path': $!";
+		    <$fh>
+		};
+
+		packet_bin_write($contents);
+		packet_flush();
+		packet_txt_write("status=success");
+		packet_flush();
+	} else {
+		die "bad command '$command'";
+	}
+}
-- 
2.14.0.rc1.52.gf02fb0ddac.dirty



* [PATCH v5 30/40] odb-helper: add put_object_process()
  2017-08-03  9:18 [PATCH v5 00/40] Add initial experimental external ODB support Christian Couder
                   ` (28 preceding siblings ...)
  2017-08-03  9:19 ` [PATCH v5 29/40] Add t0460 to test passing git objects Christian Couder
@ 2017-08-03  9:19 ` Christian Couder
  2017-08-03  9:19 ` [PATCH v5 31/40] Add t0470 to test passing raw objects Christian Couder
                   ` (10 subsequent siblings)
  40 siblings, 0 replies; 73+ messages in thread
From: Christian Couder @ 2017-08-03  9:19 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

This adds the infrastructure to send objects to a sub-process
that handles the communication with an external ODB.

For now we only handle sending raw blobs using the 'put_raw_obj'
instruction.
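
As a rough illustration of the framing involved (a sketch, not code
from this series): each line of the 'put_raw_obj' instruction is sent
as a pkt-line, that is, a 4-digit hexadecimal length that counts its
own 4 bytes, followed by the payload, and a group of lines ends with
the flush packet "0000". The names pkt_line_format(),
pkt_flush_format() and PKT_MAX below are made up for this example;
Git itself uses packet_write_fmt_gently() and packet_flush_gently().

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical name; mirrors the 65520-byte pkt-line limit. */
#define PKT_MAX 65520

/*
 * Format one pkt-line into 'out': a 4-digit hex length (which
 * includes the 4 length bytes themselves) followed by the payload.
 * Returns the number of bytes written (excluding the trailing NUL),
 * or -1 if the line is too long or does not fit into 'out'.
 */
static int pkt_line_format(char *out, size_t outsz, const char *payload)
{
	size_t len = strlen(payload) + 4;

	if (len > PKT_MAX || len + 1 > outsz)
		return -1;
	snprintf(out, outsz, "%04zx%s", len, payload);
	return (int)len;
}

/* A flush packet is the special 4-byte sequence "0000". */
static int pkt_flush_format(char *out, size_t outsz)
{
	if (outsz < 5)
		return -1;
	memcpy(out, "0000", 5);
	return 4;
}
```

With these helpers, "command=put_raw_obj\n" (20 bytes) would be framed
as "0018command=put_raw_obj\n".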

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 odb-helper.c | 75 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 72 insertions(+), 3 deletions(-)

diff --git a/odb-helper.c b/odb-helper.c
index fce1dff501..db90c0a004 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -445,6 +445,58 @@ static int get_object_process(struct odb_helper *o, const unsigned char *sha1, i
 					  o->cmd, cur_cap);
 }
 
+static int send_put_packets(struct object_process *entry,
+			    const unsigned char *sha1,
+			    const void *buf,
+			    size_t len,
+			    struct strbuf *status)
+{
+	struct child_process *process = &entry->subprocess.process;
+	int err = packet_write_fmt_gently(process->in, "command=put_raw_obj\n");
+	if (err)
+		return err;
+
+	err = packet_write_fmt_gently(process->in, "sha1=%s\n", sha1_to_hex(sha1));
+	if (err)
+		return err;
+
+	err = packet_write_fmt_gently(process->in, "size=%"PRIuMAX"\n", (uintmax_t)len);
+	if (err)
+		return err;
+
+	err = packet_write_fmt_gently(process->in, "kind=blob\n");
+	if (err)
+		return err;
+
+	err = packet_flush_gently(process->in);
+	if (err)
+		return err;
+
+	err = write_packetized_from_buf(buf, len, process->in);
+	if (err)
+		return err;
+
+	return check_object_process_status(process->out, status);
+}
+
+static int put_object_process(struct odb_helper *o,
+			      const void *buf, size_t len,
+			      const char *type, unsigned char *sha1)
+{
+	int err;
+	struct object_process *entry;
+	struct strbuf status = STRBUF_INIT;
+
+	entry = launch_object_process(o, ODB_HELPER_CAP_PUT_RAW_OBJ);
+	if (!entry)
+		return -1;
+
+	err = send_put_packets(entry, sha1, buf, len, &status);
+
+	return check_object_process_error(err, status.buf, entry, o->cmd,
+					  ODB_HELPER_CAP_PUT_RAW_OBJ);
+}
+
 struct odb_helper *odb_helper_new(const char *name, int namelen)
 {
 	struct odb_helper *o;
@@ -896,9 +948,9 @@ int odb_helper_get_object(struct odb_helper *o,
 	return res;
 }
 
-int odb_helper_put_object(struct odb_helper *o,
-			  const void *buf, size_t len,
-			  const char *type, unsigned char *sha1)
+static int put_raw_object_script(struct odb_helper *o,
+				 const void *buf, size_t len,
+				 const char *type, unsigned char *sha1)
 {
 	struct odb_helper_cmd cmd;
 
@@ -924,3 +976,20 @@ int odb_helper_put_object(struct odb_helper *o,
 	odb_helper_finish(o, &cmd);
 	return 0;
 }
+
+int odb_helper_put_object(struct odb_helper *o,
+			  const void *buf, size_t len,
+			  const char *type, unsigned char *sha1)
+{
+	int res;
+	uint64_t start = getnanotime();
+
+	if (o->script_mode)
+		res = put_raw_object_script(o, buf, len, type, sha1);
+	else
+		res = put_object_process(o, buf, len, type, sha1);
+
+	trace_performance_since(start, "odb_helper_put_object");
+
+	return res;
+}
-- 
2.14.0.rc1.52.gf02fb0ddac.dirty



* [PATCH v5 31/40] Add t0470 to test passing raw objects
  2017-08-03  9:18 [PATCH v5 00/40] Add initial experimental external ODB support Christian Couder
                   ` (29 preceding siblings ...)
  2017-08-03  9:19 ` [PATCH v5 30/40] odb-helper: add put_object_process() Christian Couder
@ 2017-08-03  9:19 ` Christian Couder
  2017-08-03  9:19 ` [PATCH v5 32/40] odb-helper: add have_object_process() Christian Couder
                   ` (9 subsequent siblings)
  40 siblings, 0 replies; 73+ messages in thread
From: Christian Couder @ 2017-08-03  9:19 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/t0470-read-object-http-e-odb.sh | 109 ++++++++++++++++++++++++++++++++++++++
 t/t0470/read-object-plain         |  83 +++++++++++++++++++++++++++++
 2 files changed, 192 insertions(+)
 create mode 100755 t/t0470-read-object-http-e-odb.sh
 create mode 100755 t/t0470/read-object-plain

diff --git a/t/t0470-read-object-http-e-odb.sh b/t/t0470-read-object-http-e-odb.sh
new file mode 100755
index 0000000000..774528c04f
--- /dev/null
+++ b/t/t0470-read-object-http-e-odb.sh
@@ -0,0 +1,109 @@
+#!/bin/sh
+
+test_description='tests for read-object process passing plain objects to an HTTPD server'
+
+. ./test-lib.sh
+
+# If we don't specify a port, the current test number will be used
+# which will not work as it is less than 1024, so it can only be used by root.
+LIB_HTTPD_PORT=$(expr ${this_test#t} + 12000)
+
+. "$TEST_DIRECTORY"/lib-httpd.sh
+
+start_httpd apache-e-odb.conf
+
+PATH="$PATH:$TEST_DIRECTORY/t0470"
+
+# odb helper script must see this
+export HTTPD_URL
+
+HELPER="read-object-plain"
+
+test_expect_success 'setup repo with a root commit' '
+	test_commit zero
+'
+
+test_expect_success 'setup another repo from the first one' '
+	git init other-repo &&
+	(cd other-repo &&
+	 git remote add origin .. &&
+	 git pull origin master &&
+	 git checkout master &&
+	 git log)
+'
+
+test_expect_success 'setup the helper in the root repo' '
+	git config odb.magic.subprocessCommand "$HELPER"
+'
+
+UPLOADFILENAME="hello_apache_upload.txt"
+
+UPLOAD_URL="$HTTPD_URL/upload/?sha1=$UPLOADFILENAME&size=123&type=blob"
+
+test_expect_success 'can upload a file' '
+	echo "Hello Apache World!" >hello_to_send.txt &&
+	echo "How are you?" >>hello_to_send.txt &&
+	curl --data-binary @hello_to_send.txt --include "$UPLOAD_URL" >out_upload
+'
+
+LIST_URL="$HTTPD_URL/list/"
+
+test_expect_success 'can list uploaded files' '
+	curl --include "$LIST_URL" >out_list &&
+	grep "$UPLOADFILENAME" out_list
+'
+
+test_expect_success 'can delete uploaded files' '
+	curl --data "delete" --include "$UPLOAD_URL&delete=1" >out_delete &&
+	curl --include "$LIST_URL" >out_list2 &&
+	! grep "$UPLOADFILENAME" out_list2
+'
+
+FILES_DIR="httpd/www/files"
+
+test_expect_success 'new blobs are transferred to the http server' '
+	test_commit one &&
+	hash1=$(git ls-tree HEAD | grep one.t | cut -f1 | cut -d\  -f3) &&
+	echo "$hash1-4-blob" >expected &&
+	ls "$FILES_DIR" >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'blobs can be retrieved from the http server' '
+	git cat-file blob "$hash1" &&
+	git log -p >expected
+'
+
+test_expect_success 'update other repo from the first one' '
+	(cd other-repo &&
+	 git fetch origin "refs/odbs/magic/*:refs/odbs/magic/*" &&
+	 test_must_fail git cat-file blob "$hash1" &&
+	 git config odb.magic.subprocesscommand "$HELPER" &&
+	 git cat-file blob "$hash1" &&
+	 git pull origin master)
+'
+
+test_expect_success 'local clone from the first repo' '
+	mkdir my-clone &&
+	(cd my-clone &&
+	 git clone .. . &&
+	 git cat-file blob "$hash1")
+'
+
+test_expect_success 'no-local clone from the first repo fails' '
+	mkdir my-other-clone &&
+	(cd my-other-clone &&
+	 test_must_fail git clone --no-local .. .) &&
+	rm -rf my-other-clone
+'
+
+test_expect_success 'no-local clone from the first repo with helper succeeds' '
+	mkdir my-other-clone &&
+	(cd my-other-clone &&
+	 git clone -c odb.magic.subprocessCommand="$HELPER" --no-local .. .) &&
+	rm -rf my-other-clone
+'
+
+stop_httpd
+
+test_done
diff --git a/t/t0470/read-object-plain b/t/t0470/read-object-plain
new file mode 100755
index 0000000000..918e7b00b5
--- /dev/null
+++ b/t/t0470/read-object-plain
@@ -0,0 +1,83 @@
+#!/usr/bin/perl
+#
+
+use 5.008;
+use lib (split(/:/, $ENV{GITPERLLIB}));
+use strict;
+use warnings;
+use Git::Packet;
+use LWP::UserAgent;
+use HTTP::Request::Common;
+
+packet_initialize("git-read-object", 1);
+
+packet_read_and_check_capabilities("get_raw_obj", "put_raw_obj");
+packet_write_capabilities("get_raw_obj", "put_raw_obj");
+
+my $http_url = $ENV{HTTPD_URL};
+
+while (1) {
+	my ($res, $command) = packet_txt_read();
+
+	if ( $res == -1 ) {
+		exit 0;
+	}
+
+	$command =~ s/^command=//;
+
+	if ( $command eq "init" ) {
+		packet_bin_read();
+
+		packet_txt_write("status=success");
+		packet_flush();
+	} elsif ( $command eq "get_raw_obj" ) {
+		my ($sha1) = packet_txt_read() =~ /^sha1=([0-9a-f]{40})$/;
+		packet_bin_read();
+
+		my $get_url = $http_url . "/list/?sha1=" . $sha1;
+
+		my $userAgent = LWP::UserAgent->new();
+
+		my $response = $userAgent->get( $get_url );
+
+		if ($response->is_error) {
+		    packet_txt_write("size=0");
+		    packet_txt_write("kind=none");
+		    packet_txt_write("status=notfound");
+		} else {
+		    packet_txt_write("size=" . length($response->content));
+		    packet_txt_write("kind=blob");
+		    packet_bin_write($response->content);
+		    packet_flush();
+		    packet_txt_write("status=success");
+		}
+
+		packet_flush();
+	} elsif ( $command eq "put_raw_obj" ) {
+		my ($sha1) = packet_txt_read() =~ /^sha1=([0-9a-f]{40})$/;
+		my ($size) = packet_txt_read() =~ /^size=([0-9]+)$/;
+		my ($kind) = packet_txt_read() =~ /^kind=(\w+)$/;
+		packet_bin_read();
+
+		# We must read the content we are sent and send it to the right url
+		my ($res, $buf) = packet_bin_read();
+		die "bad packet_bin_read res ($res)" unless ($res == 0);
+		join(",", packet_bin_read()) eq "1," || die "bad send end";
+
+		my $upload_url = $http_url . "/upload/?sha1=" . $sha1 . "&size=" . $size . "&type=blob";
+
+		my $userAgent = LWP::UserAgent->new();
+		my $request = POST $upload_url, Content_Type => 'multipart/form-data', Content => $buf;
+
+		my $response = $userAgent->request($request);
+
+		if ($response->is_error) {
+			packet_txt_write("status=failure");
+		} else {
+			packet_txt_write("status=success");
+		}
+		packet_flush();
+	} else {
+		die "bad command '$command'";
+	}
+}
-- 
2.14.0.rc1.52.gf02fb0ddac.dirty



* [PATCH v5 32/40] odb-helper: add have_object_process()
  2017-08-03  9:18 [PATCH v5 00/40] Add initial experimental external ODB support Christian Couder
                   ` (30 preceding siblings ...)
  2017-08-03  9:19 ` [PATCH v5 31/40] Add t0470 to test passing raw objects Christian Couder
@ 2017-08-03  9:19 ` Christian Couder
  2017-08-03  9:19 ` [PATCH v5 33/40] Add t0480 to test "have" capability and raw objects Christian Couder
                   ` (8 subsequent siblings)
  40 siblings, 0 replies; 73+ messages in thread
From: Christian Couder @ 2017-08-03  9:19 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

This adds the infrastructure to handle 'have' instructions in
process mode.

The answer from the helper sub-process should have the same format
as the output in script mode, that is, lines like this:

sha1 SPACE size SPACE type NEWLINE

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
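For illustration only, a helper author could check the lines it emits
for 'have' with a small shell function like the one below. This is a
sketch under assumptions: parse_have_line is a hypothetical name, the
object type is assumed to be one of blob, tree, commit or tag, and
nothing here claims to mirror what add_have_entry() does in
odb-helper.c.

```shell
# Hypothetical checker for one "sha1 SPACE size SPACE type" line.
# Prints the three fields on success, returns non-zero otherwise.
parse_have_line () {
	# Split on the first two single spaces.
	sha1=${1%% *}
	rest=${1#* }
	size=${rest%% *}
	type=${rest#* }

	# Both separators must be present.
	[ "$rest" != "$1" ] && [ "$type" != "$rest" ] || return 1

	# sha1: exactly 40 lowercase hexadecimal characters.
	[ "${#sha1}" -eq 40 ] || return 1
	case $sha1 in *[!0-9a-f]*) return 1 ;; esac

	# size: a non-empty run of decimal digits.
	case $size in ''|*[!0-9]*) return 1 ;; esac

	# type: assumed to be one of the four Git object types.
	case $type in blob|tree|commit|tag) ;; *) return 1 ;; esac

	printf 'sha1=%s size=%s type=%s\n' "$sha1" "$size" "$type"
}
```

A helper's output could then be checked line by line, for example
with `while read -r line; do parse_have_line "$line" || exit 1; done`.
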
 odb-helper.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 72 insertions(+)

diff --git a/odb-helper.c b/odb-helper.c
index db90c0a004..31fb398469 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -646,6 +646,70 @@ static int odb_helper_object_cmp(const void *va, const void *vb)
 	return hashcmp(a->sha1, b->sha1);
 }
 
+static int send_have_packets(struct odb_helper *o,
+			     struct object_process *entry,
+			     struct strbuf *status)
+{
+	char *line;
+	int packet_len;
+	int total_got = 0;
+	struct child_process *process = &entry->subprocess.process;
+	int err = packet_write_fmt_gently(process->in, "command=have\n");
+
+	if (err)
+		return err;
+
+	err = packet_flush_gently(process->in);
+	if (err)
+		return err;
+
+	for (;;) {
+		/* packet_read() writes a '\0' extra byte at the end */
+		char buf[LARGE_PACKET_DATA_MAX + 1];
+		char *p = buf;
+		int more;
+
+		packet_len = packet_read(process->out, NULL, NULL,
+			buf, LARGE_PACKET_DATA_MAX + 1,
+			PACKET_READ_GENTLE_ON_EOF);
+
+		if (packet_len <= 0)
+			break;
+
+		total_got += packet_len;
+
+		do {
+			char *eol = strchrnul(p, '\n');
+			more = (*eol == '\n');
+			*eol = '\0';
+			if (add_have_entry(o, p))
+				break;
+			p = eol + 1;
+		} while (more);
+	}
+
+	if (packet_len < 0)
+		return packet_len;
+
+	return check_object_process_status(process->out, status);
+}
+
+static int have_object_process(struct odb_helper *o)
+{
+	int err;
+	struct object_process *entry;
+	struct strbuf status = STRBUF_INIT;
+
+	entry = launch_object_process(o, ODB_HELPER_CAP_HAVE);
+	if (!entry)
+		return -1;
+
+	err = send_have_packets(o, entry, &status);
+
+	return check_object_process_error(err, status.buf, entry, o->cmd,
+					  ODB_HELPER_CAP_HAVE);
+}
+
 static void have_object_script(struct odb_helper *o)
 {
 	struct odb_helper_cmd cmd;
@@ -667,12 +731,20 @@ static void have_object_script(struct odb_helper *o)
 
 static void odb_helper_load_have(struct odb_helper *o)
 {
+	uint64_t start;
+
 	if (o->have_valid)
 		return;
 	o->have_valid = 1;
 
+	start = getnanotime();
+
 	if (o->script_mode)
 		have_object_script(o);
+	else
+		have_object_process(o);
+
+	trace_performance_since(start, "odb_helper_load_have");
 
 	qsort(o->have, o->have_nr, sizeof(*o->have), odb_helper_object_cmp);
 }
-- 
2.14.0.rc1.52.gf02fb0ddac.dirty


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v5 33/40] Add t0480 to test "have" capability and raw objects
  2017-08-03  9:18 [PATCH v5 00/40] Add initial experimental external ODB support Christian Couder
                   ` (31 preceding siblings ...)
  2017-08-03  9:19 ` [PATCH v5 32/40] odb-helper: add have_object_process() Christian Couder
@ 2017-08-03  9:19 ` Christian Couder
  2017-08-03  9:19 ` [PATCH v5 34/40] external-odb: use 'odb=magic' attribute to mark odb blobs Christian Couder
                   ` (7 subsequent siblings)
  40 siblings, 0 replies; 73+ messages in thread
From: Christian Couder @ 2017-08-03  9:19 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
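The helper added by this test talks to Git with pkt-line framing
through Git::Packet. As a rough sketch of what ends up on the wire,
assuming ASCII payloads and the usual pkt-line rule that the 4 hex
digit length counts the length prefix itself, text packets and flush
packets could be produced in shell as shown below. pkt_txt and
pkt_flush are hypothetical names used for illustration; they are not
part of Git.

```shell
# Emit one text packet: 4 hex digits giving the total length
# (prefix + payload + trailing newline), then the payload and "\n".
# ${#1} counts characters, so this assumes an ASCII payload.
pkt_txt () {
	printf '%04x%s\n' "$(( ${#1} + 5 ))" "$1"
}

# Emit a flush packet, which ends a list of packets.
pkt_flush () {
	printf '0000'
}
```

With these, `pkt_txt status=success; pkt_flush` writes
`0013status=success`, a newline, and then `0000`.
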
 t/t0480-read-object-have-http-e-odb.sh | 109 +++++++++++++++++++++++++++++++++
 t/t0480/read-object-plain-have         | 103 +++++++++++++++++++++++++++++++
 2 files changed, 212 insertions(+)
 create mode 100755 t/t0480-read-object-have-http-e-odb.sh
 create mode 100755 t/t0480/read-object-plain-have

diff --git a/t/t0480-read-object-have-http-e-odb.sh b/t/t0480-read-object-have-http-e-odb.sh
new file mode 100755
index 0000000000..056a40f2bb
--- /dev/null
+++ b/t/t0480-read-object-have-http-e-odb.sh
@@ -0,0 +1,109 @@
+#!/bin/sh
+
+test_description='tests for read-object process with "have" cap and plain objects'
+
+. ./test-lib.sh
+
+# If we don't specify a port, the current test number will be used
+# which will not work as it is less than 1024, so it can only be used by root.
+LIB_HTTPD_PORT=$(expr ${this_test#t} + 12000)
+
+. "$TEST_DIRECTORY"/lib-httpd.sh
+
+start_httpd apache-e-odb.conf
+
+PATH="$PATH:$TEST_DIRECTORY/t0480"
+
+# odb helper script must see this
+export HTTPD_URL
+
+HELPER="read-object-plain-have"
+
+test_expect_success 'setup repo with a root commit' '
+	test_commit zero
+'
+
+test_expect_success 'setup another repo from the first one' '
+	git init other-repo &&
+	(cd other-repo &&
+	 git remote add origin .. &&
+	 git pull origin master &&
+	 git checkout master &&
+	 git log)
+'
+
+test_expect_success 'setup the helper in the root repo' '
+	git config odb.magic.subprocessCommand "$HELPER"
+'
+
+UPLOADFILENAME="hello_apache_upload.txt"
+
+UPLOAD_URL="$HTTPD_URL/upload/?sha1=$UPLOADFILENAME&size=123&type=blob"
+
+test_expect_success 'can upload a file' '
+	echo "Hello Apache World!" >hello_to_send.txt &&
+	echo "How are you?" >>hello_to_send.txt &&
+	curl --data-binary @hello_to_send.txt --include "$UPLOAD_URL" >out_upload
+'
+
+LIST_URL="$HTTPD_URL/list/"
+
+test_expect_success 'can list uploaded files' '
+	curl --include "$LIST_URL" >out_list &&
+	grep "$UPLOADFILENAME" out_list
+'
+
+test_expect_success 'can delete uploaded files' '
+	curl --data "delete" --include "$UPLOAD_URL&delete=1" >out_delete &&
+	curl --include "$LIST_URL" >out_list2 &&
+	! grep "$UPLOADFILENAME" out_list2
+'
+
+FILES_DIR="httpd/www/files"
+
+test_expect_success 'new blobs are transfered to the http server' '
+	test_commit one &&
+	hash1=$(git ls-tree HEAD | grep one.t | cut -f1 | cut -d\  -f3) &&
+	echo "$hash1-4-blob" >expected &&
+	ls "$FILES_DIR" >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'blobs can be retrieved from the http server' '
+	git cat-file blob "$hash1" &&
+	git log -p >expected
+'
+
+test_expect_success 'update other repo from the first one' '
+	(cd other-repo &&
+	 git fetch origin "refs/odbs/magic/*:refs/odbs/magic/*" &&
+	 test_must_fail git cat-file blob "$hash1" &&
+	 git config odb.magic.subprocessCommand "$HELPER" &&
+	 git cat-file blob "$hash1" &&
+	 git pull origin master)
+'
+
+test_expect_success 'local clone from the first repo' '
+	mkdir my-clone &&
+	(cd my-clone &&
+	 git clone .. . &&
+	 git cat-file blob "$hash1")
+'
+
+test_expect_success 'no-local clone from the first repo fails' '
+	mkdir my-other-clone &&
+	(cd my-other-clone &&
+	 test_must_fail git clone --no-local .. .) &&
+	rm -rf my-other-clone
+'
+
+test_expect_success 'no-local clone from the first repo with helper succeeds' '
+	mkdir my-other-clone &&
+	(cd my-other-clone &&
+	 git clone -c odb.magic.subprocessCommand="$HELPER" --no-local .. .) &&
+	rm -rf my-other-clone
+'
+
+stop_httpd
+
+test_done
diff --git a/t/t0480/read-object-plain-have b/t/t0480/read-object-plain-have
new file mode 100755
index 0000000000..d63e327f33
--- /dev/null
+++ b/t/t0480/read-object-plain-have
@@ -0,0 +1,103 @@
+#!/usr/bin/perl
+#
+
+use 5.008;
+use lib (split(/:/, $ENV{GITPERLLIB}));
+use strict;
+use warnings;
+use Git::Packet;
+use LWP::UserAgent;
+use HTTP::Request::Common;
+
+packet_initialize("git-read-object", 1);
+
+packet_read_and_check_capabilities("get_raw_obj", "put_raw_obj", "have");
+packet_write_capabilities("get_raw_obj", "put_raw_obj", "have");
+
+my $http_url = $ENV{HTTPD_URL};
+
+while (1) {
+	my ($res, $command) = packet_txt_read();
+
+	if ( $res == -1 ) {
+		exit 0;
+	}
+
+	$command =~ s/^command=//;
+
+	if ( $command eq "init" ) {
+		packet_bin_read();
+
+		packet_txt_write("status=success");
+		packet_flush();
+	} elsif ( $command eq "have" ) {
+		# read the flush after the command
+		packet_bin_read();
+
+		my $have_url = $http_url . "/list/";
+
+		my $userAgent = LWP::UserAgent->new();
+		my $response = $userAgent->get( $have_url );
+
+		if ($response->is_error) {
+			packet_bin_write("");
+			packet_flush();
+			packet_txt_write("status=failure");
+		} else {
+			packet_bin_write($response->content);
+			packet_flush();
+			packet_txt_write("status=success");
+		}
+		packet_flush();
+	} elsif ( $command eq "get_raw_obj" ) {
+		my ($sha1) = packet_txt_read() =~ /^sha1=([0-9a-f]{40})$/;
+		packet_bin_read();
+
+		my $get_url = $http_url . "/list/?sha1=" . $sha1;
+
+		my $userAgent = LWP::UserAgent->new();
+
+		my $response = $userAgent->get( $get_url );
+
+		if ($response->is_error) {
+			packet_txt_write("size=0");
+			packet_txt_write("kind=none");
+			packet_txt_write("status=notfound");
+		} else {
+			packet_txt_write("size=" . length($response->content));
+			packet_txt_write("kind=blob");
+			packet_bin_write($response->content);
+			packet_flush();
+			packet_txt_write("status=success");
+		}
+
+		packet_flush();
+	} elsif ( $command eq "put_raw_obj" ) {
+		my ($sha1) = packet_txt_read() =~ /^sha1=([0-9a-f]{40})$/;
+		my ($size) = packet_txt_read() =~ /^size=([0-9]+)$/;
+		my ($kind) = packet_txt_read() =~ /^kind=(\w+)$/;
+
+		packet_bin_read();
+
+		# We must read the content we are sent and send it to the right url
+		my ($res, $buf) = packet_bin_read();
+		die "bad packet_bin_read res ($res)" unless ($res eq 0);
+		( packet_bin_read() eq ( 1, "" ) ) || die "bad send end";
+
+		my $upload_url = $http_url . "/upload/?sha1=" . $sha1 . "&size=" . $size . "&type=blob";
+
+		my $userAgent = LWP::UserAgent->new();
+		my $request = POST $upload_url, Content_Type => 'multipart/form-data', Content => $buf;
+
+		my $response = $userAgent->request($request);
+
+		if ($response->is_error) {
+			packet_txt_write("status=failure");
+		} else {
+			packet_txt_write("status=success");
+		}
+		packet_flush();
+	} else {
+		die "bad command '$command'";
+	}
+}
-- 
2.14.0.rc1.52.gf02fb0ddac.dirty


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v5 34/40] external-odb: use 'odb=magic' attribute to mark odb blobs
  2017-08-03  9:18 [PATCH v5 00/40] Add initial experimental external ODB support Christian Couder
                   ` (32 preceding siblings ...)
  2017-08-03  9:19 ` [PATCH v5 33/40] Add t0480 to test "have" capability and raw objects Christian Couder
@ 2017-08-03  9:19 ` Christian Couder
  2017-08-03  9:19 ` [PATCH v5 35/40] Add Documentation/technical/external-odb.txt Christian Couder
                   ` (6 subsequent siblings)
  40 siblings, 0 replies; 73+ messages in thread
From: Christian Couder @ 2017-08-03  9:19 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

To tell which blobs should be sent to the "magic" external odb,
let's require that the blobs be marked using the 'odb=magic'
attribute.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 external-odb.c                         | 22 ++++++++++++++++++++--
 external-odb.h                         |  3 ++-
 sha1_file.c                            | 20 +++++++++++++++-----
 t/t0400-external-odb.sh                |  3 +++
 t/t0410-transfer-e-odb.sh              |  3 +++
 t/t0420-transfer-http-e-odb.sh         |  3 +++
 t/t0470-read-object-http-e-odb.sh      |  3 +++
 t/t0480-read-object-have-http-e-odb.sh |  3 +++
 8 files changed, 52 insertions(+), 8 deletions(-)

diff --git a/external-odb.c b/external-odb.c
index 084cd32e0b..e103514a46 100644
--- a/external-odb.c
+++ b/external-odb.c
@@ -1,6 +1,7 @@
 #include "cache.h"
 #include "external-odb.h"
 #include "odb-helper.h"
+#include "attr.h"
 
 static struct odb_helper *helpers;
 static struct odb_helper **helpers_tail = &helpers;
@@ -155,8 +156,23 @@ int external_odb_get_object(const unsigned char *sha1)
 	return external_odb_do_get_object(sha1);
 }
 
+static int has_odb_attrs(struct odb_helper *o, const char *path)
+{
+	static struct attr_check *check;
+
+	if (!check)
+		check = attr_check_initl("odb", NULL);
+
+	if (!git_check_attr(path, check)) {
+		const char *value = check->items[0].value;
+		return value ? !strcmp(o->name, value) : 0;
+	}
+	return 0;
+}
+
 int external_odb_put_object(const void *buf, size_t len,
-			    const char *type, unsigned char *sha1)
+			    const char *type, unsigned char *sha1,
+			    const char *path)
 {
 	struct odb_helper *o;
 
@@ -164,12 +180,14 @@ int external_odb_put_object(const void *buf, size_t len,
 		return 1;
 
 	/* For now accept only blobs */
-	if (strcmp(type, "blob"))
+	if (!path || strcmp(type, "blob"))
 		return 1;
 
 	external_odb_init();
 
 	for (o = helpers; o; o = o->next) {
+		if (!has_odb_attrs(o, path))
+			continue;
 		int r = odb_helper_put_object(o, buf, len, type, sha1);
 		if (r <= 0)
 			return r;
diff --git a/external-odb.h b/external-odb.h
index 247b131fd5..9bd7856b60 100644
--- a/external-odb.h
+++ b/external-odb.h
@@ -6,6 +6,7 @@ int external_odb_has_object(const unsigned char *sha1);
 int external_odb_get_object(const unsigned char *sha1);
 int external_odb_get_direct(const unsigned char *sha1);
 int external_odb_put_object(const void *buf, size_t len,
-			    const char *type, unsigned char *sha1);
+			    const char *type, unsigned char *sha1,
+			    const char *path);
 
 #endif /* EXTERNAL_ODB_H */
diff --git a/sha1_file.c b/sha1_file.c
index fb34f0b18d..75188850da 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3487,7 +3487,9 @@ static int freshen_packed_object(const unsigned char *sha1)
 	return 1;
 }
 
-int write_sha1_file(const void *buf, unsigned long len, const char *type, unsigned char *sha1)
+static int write_sha1_file_with_path(const void *buf, unsigned long len,
+				     const char *type, unsigned char *sha1,
+				     const char *path)
 {
 	char hdr[32];
 	int hdrlen = sizeof(hdr);
@@ -3496,13 +3498,19 @@ int write_sha1_file(const void *buf, unsigned long len, const char *type, unsign
 	 * it out into .git/objects/??/?{38} file.
 	 */
 	write_sha1_file_prepare(buf, len, type, sha1, hdr, &hdrlen);
-	if (!external_odb_put_object(buf, len, type, sha1))
+	if (!external_odb_put_object(buf, len, type, sha1, path))
 		return 0;
 	if (freshen_packed_object(sha1) || freshen_loose_object(sha1))
 		return 0;
 	return write_loose_object(sha1, hdr, hdrlen, buf, len, 0);
 }
 
+int write_sha1_file(const void *buf, unsigned long len,
+		    const char *type, unsigned char *sha1)
+{
+	return write_sha1_file_with_path(buf, len, type, sha1, NULL);
+}
+
 int hash_sha1_file_literally(const void *buf, unsigned long len, const char *type,
 			     unsigned char *sha1, unsigned flags)
 {
@@ -3637,7 +3645,8 @@ static int index_mem(unsigned char *sha1, void *buf, size_t size,
 	}
 
 	if (write_object)
-		ret = write_sha1_file(buf, size, typename(type), sha1);
+		ret = write_sha1_file_with_path(buf, size, typename(type),
+						sha1, path);
 	else
 		ret = hash_sha1_file(buf, size, typename(type), sha1);
 	if (re_allocated)
@@ -3659,8 +3668,9 @@ static int index_stream_convert_blob(unsigned char *sha1, int fd,
 				 write_object ? safe_crlf : SAFE_CRLF_FALSE);
 
 	if (write_object)
-		ret = write_sha1_file(sbuf.buf, sbuf.len, typename(OBJ_BLOB),
-				      sha1);
+		ret = write_sha1_file_with_path(sbuf.buf, sbuf.len,
+						typename(OBJ_BLOB),
+						sha1, path);
 	else
 		ret = hash_sha1_file(sbuf.buf, sbuf.len, typename(OBJ_BLOB),
 				     sha1);
diff --git a/t/t0400-external-odb.sh b/t/t0400-external-odb.sh
index fa355bd7bb..f2c45a625c 100755
--- a/t/t0400-external-odb.sh
+++ b/t/t0400-external-odb.sh
@@ -73,6 +73,9 @@ test_expect_success 'helper can add objects to alt repo' '
 
 test_expect_success 'commit adds objects to alt repo' '
 	test_config odb.magic.scriptCommand "$HELPER" &&
+	echo "* odb=magic" >.gitattributes &&
+	GIT_NO_EXTERNAL_ODB=1 git add .gitattributes &&
+	GIT_NO_EXTERNAL_ODB=1 git commit -m "Add .gitattributes" &&
 	test_commit three &&
 	hash3=$(git ls-tree HEAD | grep three.t | cut -f1 | cut -d\  -f3) &&
 	content=$(cd alt-repo && git show "$hash3") &&
diff --git a/t/t0410-transfer-e-odb.sh b/t/t0410-transfer-e-odb.sh
index 065ec7d759..fd3e37918c 100755
--- a/t/t0410-transfer-e-odb.sh
+++ b/t/t0410-transfer-e-odb.sh
@@ -111,6 +111,9 @@ test_expect_success 'setup other repo and its alternate repo' '
 '
 
 test_expect_success 'new blobs are put in first object store' '
+	echo "* odb=magic" >.gitattributes &&
+	GIT_NO_EXTERNAL_ODB=1 git add .gitattributes &&
+	GIT_NO_EXTERNAL_ODB=1 git commit -m "Add .gitattributes" &&
 	test_commit one &&
 	hash1=$(git ls-tree HEAD | grep one.t | cut -f1 | cut -d\  -f3) &&
 	content=$(cd alt-repo1 && git show "$hash1") &&
diff --git a/t/t0420-transfer-http-e-odb.sh b/t/t0420-transfer-http-e-odb.sh
index f84fe950ec..d307af0457 100755
--- a/t/t0420-transfer-http-e-odb.sh
+++ b/t/t0420-transfer-http-e-odb.sh
@@ -94,6 +94,9 @@ test_expect_success 'can delete uploaded files' '
 FILES_DIR="httpd/www/files"
 
 test_expect_success 'new blobs are transfered to the http server' '
+	echo "* odb=magic" >.gitattributes &&
+	GIT_NO_EXTERNAL_ODB=1 git add .gitattributes &&
+	GIT_NO_EXTERNAL_ODB=1 git commit -m "Add .gitattributes" &&
 	test_commit one &&
 	hash1=$(git ls-tree HEAD | grep one.t | cut -f1 | cut -d\  -f3) &&
 	echo "$hash1-4-blob" >expected &&
diff --git a/t/t0470-read-object-http-e-odb.sh b/t/t0470-read-object-http-e-odb.sh
index 774528c04f..d814a43d59 100755
--- a/t/t0470-read-object-http-e-odb.sh
+++ b/t/t0470-read-object-http-e-odb.sh
@@ -62,6 +62,9 @@ test_expect_success 'can delete uploaded files' '
 FILES_DIR="httpd/www/files"
 
 test_expect_success 'new blobs are transfered to the http server' '
+	echo "* odb=magic" >.gitattributes &&
+	GIT_NO_EXTERNAL_ODB=1 git add .gitattributes &&
+	GIT_NO_EXTERNAL_ODB=1 git commit -m "Add .gitattributes" &&
 	test_commit one &&
 	hash1=$(git ls-tree HEAD | grep one.t | cut -f1 | cut -d\  -f3) &&
 	echo "$hash1-4-blob" >expected &&
diff --git a/t/t0480-read-object-have-http-e-odb.sh b/t/t0480-read-object-have-http-e-odb.sh
index 056a40f2bb..fe1fac5ef3 100755
--- a/t/t0480-read-object-have-http-e-odb.sh
+++ b/t/t0480-read-object-have-http-e-odb.sh
@@ -62,6 +62,9 @@ test_expect_success 'can delete uploaded files' '
 FILES_DIR="httpd/www/files"
 
 test_expect_success 'new blobs are transfered to the http server' '
+	echo "* odb=magic" >.gitattributes &&
+	GIT_NO_EXTERNAL_ODB=1 git add .gitattributes &&
+	GIT_NO_EXTERNAL_ODB=1 git commit -m "Add .gitattributes" &&
 	test_commit one &&
 	hash1=$(git ls-tree HEAD | grep one.t | cut -f1 | cut -d\  -f3) &&
 	echo "$hash1-4-blob" >expected &&
-- 
2.14.0.rc1.52.gf02fb0ddac.dirty


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v5 35/40] Add Documentation/technical/external-odb.txt
  2017-08-03  9:18 [PATCH v5 00/40] Add initial experimental external ODB support Christian Couder
                   ` (33 preceding siblings ...)
  2017-08-03  9:19 ` [PATCH v5 34/40] external-odb: use 'odb=magic' attribute to mark odb blobs Christian Couder
@ 2017-08-03  9:19 ` Christian Couder
  2017-08-03 18:38   ` Stefan Beller
  2017-08-28 18:59   ` Ben Peart
  2017-08-03  9:19 ` [PATCH v5 36/40] clone: add 'initial' param to write_remote_refs() Christian Couder
                   ` (5 subsequent siblings)
  40 siblings, 2 replies; 73+ messages in thread
From: Christian Couder @ 2017-08-03  9:19 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

This describes the external odb mechanism's purpose and
how it works.

Helped-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
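To make the script mode described in this document more concrete,
here is a minimal sketch of a script helper backed by a plain
directory. It is only an illustration under assumptions: odb_helper
and ODB_DIR are hypothetical names, and the "<sha1>-<size>-<type>"
file naming is borrowed from what the HTTP tests in this series expect
to find on the server, not something the protocol requires.

```shell
# Hypothetical script-mode helper. ODB_DIR must name an existing
# directory; each object is stored as "<sha1>-<size>-<type>".
odb_helper () {
	case "$1" in
	init)
		# Advertise what this sketch implements.
		echo "capability=get_raw_obj"
		echo "capability=put_raw_obj"
		echo "capability=have"
		;;
	have)
		# One "sha1 size type" line per stored object.
		for f in "$ODB_DIR"/*-*-*
		do
			[ -e "$f" ] || continue
			echo "${f##*/}" | sed 's/-/ /g'
		done
		;;
	get_raw_obj)
		# $2 is the sha1; cat fails if nothing matches, and
		# that failure becomes the exit status.
		cat "$ODB_DIR/$2"-*-*
		;;
	put_raw_obj)
		# $2 $3 $4 are sha1, size and type; content is on stdin.
		cat >"$ODB_DIR/$2-$3-$4"
		;;
	*)
		echo >&2 "unknown instruction: $1"
		return 1
		;;
	esac
}
```

A standalone script would end with `odb_helper "$@"` so that the
instruction and its arguments come from the command line, and its
exit code would tell Git whether a problem occurred.
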
 Documentation/technical/external-odb.txt | 295 +++++++++++++++++++++++++++++++
 1 file changed, 295 insertions(+)
 create mode 100644 Documentation/technical/external-odb.txt

diff --git a/Documentation/technical/external-odb.txt b/Documentation/technical/external-odb.txt
new file mode 100644
index 0000000000..5991221fd5
--- /dev/null
+++ b/Documentation/technical/external-odb.txt
@@ -0,0 +1,295 @@
+External ODBs
+^^^^^^^^^^^^^
+
+The External ODB mechanism makes it possible for Git objects, mostly
+blobs for now though, to be stored in an "external object database"
+(External ODB).
+
+An External ODB can be any object store as long as there is a helper
+program called an "odb helper" that can communicate with Git to
+transfer objects to/from the external odb and to retrieve information
+about available objects in the external odb.
+
+Purpose
+=======
+
+The purpose of this mechanism is to make it possible to handle Git
+objects, especially blobs, in much more flexible ways.
+
+Currently Git can store its objects only in the form of loose objects
+in separate files or packed objects in a pack file.
+
+This is not flexible enough for some important use cases like handling
+really big binary files or handling a really big number of files that
+are fetched only as needed. And it is not realistic to expect that Git
+could fully natively handle many of such use cases.
+
+Furthermore many improvements that are dependent on specific setups
+could be implemented in the way Git objects are managed if it was
+possible to customize how the Git objects are handled. For example a
+restartable clone using the bundle mechanism has often been requested,
+but implementing that would go against the strict rules under
+which Git objects are currently handled.
+
+What Git needs is a mechanism that makes it possible to customize, in
+many different ways, how Git objects are handled. This mechanism
+should, however, interfere as little as possible with the usual way
+in which Git handles its objects.
+
+Helpers
+=======
+
+ODB helpers are commands that have to be registered using either the
+"odb.<odbname>.subprocessCommand" or the "odb.<odbname>.scriptCommand"
+config variables.
+
+Registering such a command tells Git that an external odb called
+<odbname> exists and that the registered command should be used to
+communicate with it.
+
+There are 2 kinds of commands. Commands registered using the
+"odb.<odbname>.subprocessCommand" config variable are called "process
+commands" and the associated mode is called "process mode". Commands
+registered using the "odb.<odbname>.scriptCommand" config variable
+are called "script commands" and the associated mode is called "script
+mode".
+
+Process Mode
+============
+
+In process mode the command is started as a single process invocation
+that should last for the entire life of the single Git command that
+started it.
+
+A packet format (pkt-line, see technical/protocol-common.txt) based
+protocol over standard input and standard output is used for
+communication between Git and the helper command.
+
+After the process command is started, Git sends a welcome message
+("git-read-object-client"), a list of supported protocol version
+numbers, and a flush packet. Git expects to read a welcome response
+message ("git-read-object-server"), exactly one protocol version
+number from the previously sent list, and a flush packet. All further
+communication will be based on the selected version.
+
+The remaining protocol description below documents "version=1". Please
+note that "version=42" in the example below does not exist and is only
+there to illustrate how the protocol would look with more than one
+version.
+
+After the version negotiation Git sends a list of all capabilities
+that it supports and a flush packet. Git expects to read a list of
+desired capabilities, which must be a subset of the supported
+capabilities list, and a flush packet as response:
+
+------------------------
+packet: git> git-read-object-client
+packet: git> version=1
+packet: git> version=42
+packet: git> 0000
+packet: git< git-read-object-server
+packet: git< version=1
+packet: git< 0000
+packet: git> capability=get_raw_obj
+packet: git> capability=have
+packet: git> capability=put_raw_obj
+packet: git> capability=not-yet-invented
+packet: git> 0000
+packet: git< capability=get_raw_obj
+packet: git< 0000
+------------------------
+
+Afterwards Git sends a list of "key=value" pairs terminated with a
+flush packet. The list will contain at least the instruction (based on
+the supported capabilities) and the arguments for the
+instruction. Please note that the process must not send any response
+before it has received the final flush packet.
+
+In general any response from the helper should end with a status
+packet. See the documentation of the 'get_*' instructions below for
+examples of status packets.
+
+After the helper has processed an instruction, it is expected to wait
+for the next "key=value" list containing another instruction.
+
+On exit Git will close the pipe to the helper. The helper is then
+expected to detect EOF and exit gracefully on its own. Git will wait
+until the process has stopped.
+
+Script Mode
+===========
+
+In this mode Git launches the script command each time it wants to
+communicate with the helper. There is no welcome message and no
+protocol version in this mode.
+
+The instruction and associated arguments are passed as arguments when
+launching the script command and if needed further information is
+passed between Git and the command through stdin and stdout.
+
+Capabilities/Instructions
+=========================
+
+The following instructions are currently supported by Git:
+
+- init
+- get_git_obj
+- get_raw_obj
+- get_direct
+- put_raw_obj
+- have
+
+The plan is to also support 'put_git_obj' and 'put_direct' soon, for
+consistency with the 'get_*' instructions.
+
+ - 'init'
+
+All the process and script commands must accept the 'init'
+instruction. It should be the first instruction sent to a command. It
+should not be advertised in the capability exchange. Any argument
+should be ignored.
+
+In process mode, after receiving the 'init' instruction and a flush
+packet, the helper should just send a status packet and then a flush
+packet. See the 'get_*' instructions below for examples of status
+packets.
+
+In script mode the command should print on stdout the capabilities
+that it supports if any. This is the only time in script mode when a
+capability exchange happens.
+
+For example a script command could use the following shell code
+snippet to handle the 'init' instruction:
+
+------------------------
+case "$1" in
+init)
+	echo "capability=get_git_obj"
+	echo "capability=put_raw_obj"
+	echo "capability=have"
+	;;
+------------------------
+
+ - 'get_git_obj <sha1>' and 'get_raw_obj <sha1>'
+
+These instructions should have a hexadecimal <sha1> argument to tell
+which object the helper should send to git.
+
+In process mode the sha1 argument should be followed by a flush packet
+like this:
+
+------------------------
+packet: git> command=get_git_obj
+packet: git> sha1=0a214a649e1b3d5011e14a3dc227753f2bd2be05
+packet: git> 0000
+------------------------
+
+After reading that the helper should send the requested object to Git in a
+packet series followed by a flush packet. If the helper does not experience
+problems then the helper must send a "success" status like the following:
+
+------------------------
+packet: git< status=success
+packet: git< 0000
+------------------------
+
+In case the helper cannot or does not want to send the requested
+object as well as any other object for the lifetime of the Git
+process, then it is expected to respond with an "abort" status at any
+point in the protocol:
+
+------------------------
+packet: git< status=abort
+packet: git< 0000
+------------------------
+
+Git neither stops nor restarts the helper in case the "error"/"abort"
+status is set.
+
+If the helper dies during the communication or does not adhere to the
+protocol then Git will stop and restart it with the next instruction.
+
+In script mode the helper should just send the requested object to Git
+by writing it to stdout and should then exit. The exit code should
+signal to Git if a problem occurred or not.
+
+The only difference between 'get_git_obj' and 'get_raw_obj' is that in
+case of 'get_git_obj' the requested object should be sent as a Git
+object (that is in the same format as loose object files). In case of
+'get_raw_obj' the object should be sent in its raw format (that is the
+same output as `git cat-file <type> <sha1>`).
+
+ - 'get_direct <sha1>'
+
+This instruction is similar to the other 'get_*' instructions except
+that no object should be sent from the helper to Git. Instead the
+helper should directly write the requested object into a loose object
+file in the ".git/objects" directory.
+
+After the helper has sent the "status=success" packet and the
+following flush packet in process mode, or after it has exited in the
+script mode, Git should look again for a loose object file with the
+requested sha1.
+
+ - 'put_raw_obj <sha1> <size> <type>'
+
+This instruction should be followed by three arguments to tell which
+object the helper will receive from git: <sha1>, <size> and
+<type>. The hexadecimal <sha1> argument describes the object that will
+be sent from Git to the helper. The <type> is the object type (blob,
+tree, commit or tag) of this object. The <size> is the size of the
+(decompressed) object content.
+
+In process mode the last argument (the type) should be followed by a
+flush packet.
+
+After reading that the helper should read the announced object from
+Git in a packet series followed by a flush packet.
+
+If the helper does not experience problems when receiving and storing
+or processing the object, then the helper must send a "success" status
+as described for the 'get_*' instructions.
+
+In script mode the helper should just receive the announced object
+from its standard input. After receiving and processing the object,
+the helper should exit and its exit code should signal to Git if a
+problem occurred or not.
+
+ - 'have'
+
+In process mode this instruction should be followed by a flush
+packet. After receiving this packet the helper should send the sha1,
+size and type, in this order, of all the objects it can provide to Git
+(through a 'get_*' instruction). There should be a space character
+between the sha1 and the size and between the size and the type, and
+then a new line character after the type.
+
+If the helper does not experience problems, it must then send a
+"success" status as described for the 'get_*' instructions.
+
+In script mode the helper should send to its standard output the sha1,
+size and type, in this order, of all the objects it can provide to
+Git. There should also be a space character between the sha1 and the
+size and between the size and the type, and then a new line character
+after the type.
+
+After sending this, the script helper should exit and its exit code
+should signal to Git if a problem occurred or not.
+
+Selecting objects
+=================
+
+To select objects that should be handled by an external odb, one can
+use the git attributes system. For now this will only work with blobs
+and this will only work along with the 'put_raw_obj' instruction.
+
+For example if one has an external odb called "magic" and has
+registered an associated process command helper that supports the
+'put_raw_obj' instruction, then one can tell Git that all the .jpg
+files should be handled by the "magic" odb using a .gitattributes file
+that contains:
+
+------------------------
+*.jpg           odb=magic
+------------------------
+
-- 
2.14.0.rc1.52.gf02fb0ddac.dirty


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v5 36/40] clone: add 'initial' param to write_remote_refs()
  2017-08-03  9:18 [PATCH v5 00/40] Add initial experimental external ODB support Christian Couder
                   ` (34 preceding siblings ...)
  2017-08-03  9:19 ` [PATCH v5 35/40] Add Documentation/technical/external-odb.txt Christian Couder
@ 2017-08-03  9:19 ` Christian Couder
  2017-08-03  9:19 ` [PATCH v5 37/40] clone: add --initial-refspec option Christian Couder
                   ` (4 subsequent siblings)
  40 siblings, 0 replies; 73+ messages in thread
From: Christian Couder @ 2017-08-03  9:19 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

We want to make it possible to separate fetching remote refs into
an initial part and a later part. To prepare for that, let's add
an 'initial' boolean parameter to write_remote_refs() to tell this
function if we are performing the initial part or not.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 builtin/clone.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 4b5340c55f..2362dda880 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -573,7 +573,7 @@ static struct ref *wanted_peer_refs(const struct ref *refs,
 	return local_refs;
 }
 
-static void write_remote_refs(const struct ref *local_refs)
+static void write_remote_refs(const struct ref *local_refs, int initial)
 {
 	const struct ref *r;
 
@@ -592,8 +592,13 @@ static void write_remote_refs(const struct ref *local_refs)
 			die("%s", err.buf);
 	}
 
-	if (initial_ref_transaction_commit(t, &err))
-		die("%s", err.buf);
+	if (initial) {
+		if (initial_ref_transaction_commit(t, &err))
+			die("%s", err.buf);
+	} else {
+		if (ref_transaction_commit(t, &err))
+			die("%s", err.buf);
+	}
 
 	strbuf_release(&err);
 	ref_transaction_free(t);
@@ -640,7 +645,8 @@ static void update_remote_refs(const struct ref *refs,
 			       const char *branch_top,
 			       const char *msg,
 			       struct transport *transport,
-			       int check_connectivity)
+			       int check_connectivity,
+			       int initial)
 {
 	const struct ref *rm = mapped_refs;
 
@@ -655,7 +661,7 @@ static void update_remote_refs(const struct ref *refs,
 	}
 
 	if (refs) {
-		write_remote_refs(mapped_refs);
+		write_remote_refs(mapped_refs, initial);
 		if (option_single_branch && !option_no_tags)
 			write_followtags(refs, msg);
 	}
@@ -1164,7 +1170,8 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		transport_fetch_refs(transport, mapped_refs);
 
 	update_remote_refs(refs, mapped_refs, remote_head_points_at,
-			   branch_top.buf, reflog_msg.buf, transport, !is_local);
+			   branch_top.buf, reflog_msg.buf, transport,
+			   !is_local, 0);
 
 	update_head(our_head_points_at, remote_head, reflog_msg.buf);
 
-- 
2.14.0.rc1.52.gf02fb0ddac.dirty


* [PATCH v5 37/40] clone: add --initial-refspec option
  2017-08-03  9:18 [PATCH v5 00/40] Add initial experimental external ODB support Christian Couder
                   ` (35 preceding siblings ...)
  2017-08-03  9:19 ` [PATCH v5 36/40] clone: add 'initial' param to write_remote_refs() Christian Couder
@ 2017-08-03  9:19 ` Christian Couder
  2017-08-03  9:19 ` [PATCH v5 38/40] clone: disable external odb before initial clone Christian Couder
                   ` (3 subsequent siblings)
  40 siblings, 0 replies; 73+ messages in thread
From: Christian Couder @ 2017-08-03  9:19 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

This option makes it possible to separate fetching refs when cloning
into two parts, an initial part and a later normal part.

This way after the initial part, mechanisms like the external odb
mechanism can be used to prefetch some objects using information
that has been made available during the initial fetch.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 builtin/clone.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 54 insertions(+), 1 deletion(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 2362dda880..76e561534d 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -56,6 +56,7 @@ static enum transport_family family;
 static struct string_list option_config = STRING_LIST_INIT_NODUP;
 static struct string_list option_required_reference = STRING_LIST_INIT_NODUP;
 static struct string_list option_optional_reference = STRING_LIST_INIT_NODUP;
+static struct string_list option_initial_refspec = STRING_LIST_INIT_NODUP;
 static int option_dissociate;
 static int max_jobs = -1;
 static struct string_list option_recurse_submodules = STRING_LIST_INIT_NODUP;
@@ -106,6 +107,8 @@ static struct option builtin_clone_options[] = {
 			N_("reference repository")),
 	OPT_STRING_LIST(0, "reference-if-able", &option_optional_reference,
 			N_("repo"), N_("reference repository")),
+	OPT_STRING_LIST(0, "initial-refspec", &option_initial_refspec,
+			N_("refspec"), N_("fetch this refspec first")),
 	OPT_BOOL(0, "dissociate", &option_dissociate,
 		 N_("use --reference only while cloning")),
 	OPT_STRING('o', "origin", &option_origin, N_("name"),
@@ -865,6 +868,47 @@ static void dissociate_from_references(void)
 	free(alternates);
 }
 
+static struct refspec *parse_initial_refspecs(void)
+{
+	const char **refspecs;
+	struct refspec *initial_refspecs;
+	struct string_list_item *rs;
+	int i = 0;
+
+	if (!option_initial_refspec.nr)
+		return NULL;
+
+	refspecs = xcalloc(option_initial_refspec.nr, sizeof(const char *));
+
+	for_each_string_list_item(rs, &option_initial_refspec)
+		refspecs[i++] = rs->string;
+
+	initial_refspecs = parse_fetch_refspec(option_initial_refspec.nr, refspecs);
+
+	free(refspecs);
+
+	return initial_refspecs;
+}
+
+static void fetch_initial_refs(struct transport *transport,
+			       const struct ref *refs,
+			       struct refspec *initial_refspecs,
+			       const char *branch_top,
+			       const char *reflog_msg,
+			       int is_local)
+{
+	int i;
+
+	for (i = 0; i < option_initial_refspec.nr; i++) {
+		struct ref *init_refs = NULL;
+		struct ref **tail = &init_refs;
+		get_fetch_map(refs, &initial_refspecs[i], &tail, 0);
+		transport_fetch_refs(transport, init_refs);
+		update_remote_refs(refs, init_refs, NULL, branch_top, reflog_msg,
+				   transport, !is_local, 1);
+	}
+}
+
 int cmd_clone(int argc, const char **argv, const char *prefix)
 {
 	int is_bundle = 0, is_local;
@@ -888,6 +932,9 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	struct refspec *refspec;
 	const char *fetch_pattern;
 
+	struct refspec *initial_refspecs;
+	int is_initial;
+
 	packet_trace_identity("clone");
 	argc = parse_options(argc, argv, prefix, builtin_clone_options,
 			     builtin_clone_usage, 0);
@@ -1055,6 +1102,8 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	if (option_required_reference.nr || option_optional_reference.nr)
 		setup_reference();
 
+	initial_refspecs = parse_initial_refspecs();
+
 	fetch_pattern = xstrfmt("+%s*:%s*", src_ref_prefix, branch_top.buf);
 	refspec = parse_fetch_refspec(1, &fetch_pattern);
 	free((char *)fetch_pattern);
@@ -1110,6 +1159,9 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	refs = transport_get_remote_refs(transport);
 
 	if (refs) {
+		fetch_initial_refs(transport, refs, initial_refspecs,
+				   branch_top.buf, reflog_msg.buf, is_local);
+
 		mapped_refs = wanted_peer_refs(refs, refspec);
 		/*
 		 * transport_get_remote_refs() may return refs with null sha-1
@@ -1169,9 +1221,10 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	else if (refs && complete_refs_before_fetch)
 		transport_fetch_refs(transport, mapped_refs);
 
+	is_initial = !refs || option_initial_refspec.nr == 0;
 	update_remote_refs(refs, mapped_refs, remote_head_points_at,
 			   branch_top.buf, reflog_msg.buf, transport,
-			   !is_local, 0);
+			   !is_local, is_initial);
 
 	update_head(our_head_points_at, remote_head, reflog_msg.buf);
 
-- 
2.14.0.rc1.52.gf02fb0ddac.dirty


* [PATCH v5 38/40] clone: disable external odb before initial clone
  2017-08-03  9:18 [PATCH v5 00/40] Add initial experimental external ODB support Christian Couder
                   ` (36 preceding siblings ...)
  2017-08-03  9:19 ` [PATCH v5 37/40] clone: add --initial-refspec option Christian Couder
@ 2017-08-03  9:19 ` Christian Couder
  2017-08-03  9:19 ` [PATCH v5 39/40] Add tests for 'clone --initial-refspec' Christian Couder
                   ` (2 subsequent siblings)
  40 siblings, 0 replies; 73+ messages in thread
From: Christian Couder @ 2017-08-03  9:19 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

To make it possible to have the external odb mechanism only kick in
after the initial part of a clone, we should disable it during the
initial part of the clone.

Let's do that by saving and then restoring the value of the
'use_external_odb' global variable.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 builtin/clone.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/builtin/clone.c b/builtin/clone.c
index 76e561534d..dc57eabd40 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -934,6 +934,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 
 	struct refspec *initial_refspecs;
 	int is_initial;
+	int saved_use_external_odb;
 
 	packet_trace_identity("clone");
 	argc = parse_options(argc, argv, prefix, builtin_clone_options,
@@ -1079,6 +1080,10 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 
 	git_config(git_default_config, NULL);
 
+	/* Temporarily disable external ODB before initial clone */
+	saved_use_external_odb = use_external_odb;
+	use_external_odb = 0;
+
 	if (option_bare) {
 		if (option_mirror)
 			src_ref_prefix = "refs/";
@@ -1162,6 +1167,8 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		fetch_initial_refs(transport, refs, initial_refspecs,
 				   branch_top.buf, reflog_msg.buf, is_local);
 
+		use_external_odb = saved_use_external_odb;
+
 		mapped_refs = wanted_peer_refs(refs, refspec);
 		/*
 		 * transport_get_remote_refs() may return refs with null sha-1
@@ -1203,6 +1210,9 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 					option_branch, option_origin);
 
 		warning(_("You appear to have cloned an empty repository."));
+
+		use_external_odb = saved_use_external_odb;
+
 		mapped_refs = NULL;
 		our_head_points_at = NULL;
 		remote_head_points_at = NULL;
-- 
2.14.0.rc1.52.gf02fb0ddac.dirty


* [PATCH v5 39/40] Add tests for 'clone --initial-refspec'
  2017-08-03  9:18 [PATCH v5 00/40] Add initial experimental external ODB support Christian Couder
                   ` (37 preceding siblings ...)
  2017-08-03  9:19 ` [PATCH v5 38/40] clone: disable external odb before initial clone Christian Couder
@ 2017-08-03  9:19 ` Christian Couder
  2017-08-03  9:19 ` [PATCH v5 40/40] Add t0430 to test cloning using bundles Christian Couder
  2017-09-10 12:30 ` [PATCH v5 00/40] Add initial experimental external ODB support Lars Schneider
  40 siblings, 0 replies; 73+ messages in thread
From: Christian Couder @ 2017-08-03  9:19 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/t0420-transfer-http-e-odb.sh         |  7 +++++
 t/t0470-read-object-http-e-odb.sh      |  7 +++++
 t/t0480-read-object-have-http-e-odb.sh |  7 +++++
 t/t5616-clone-initial-refspec.sh       | 48 ++++++++++++++++++++++++++++++++++
 4 files changed, 69 insertions(+)
 create mode 100755 t/t5616-clone-initial-refspec.sh

diff --git a/t/t0420-transfer-http-e-odb.sh b/t/t0420-transfer-http-e-odb.sh
index d307af0457..ed833850c3 100755
--- a/t/t0420-transfer-http-e-odb.sh
+++ b/t/t0420-transfer-http-e-odb.sh
@@ -140,6 +140,13 @@ test_expect_success 'no-local clone from the first repo with helper succeeds' '
 	rm -rf my-other-clone
 '
 
+test_expect_success 'no-local initial-refspec clone succeeds' '
+	mkdir my-other-clone &&
+	(cd my-other-clone &&
+	 git -c odb.magic.scriptCommand="$HELPER" \
+		clone --no-local --initial-refspec "refs/odbs/magic/*:refs/odbs/magic/*" .. .)
+'
+
 stop_httpd
 
 test_done
diff --git a/t/t0470-read-object-http-e-odb.sh b/t/t0470-read-object-http-e-odb.sh
index d814a43d59..7355ca4d51 100755
--- a/t/t0470-read-object-http-e-odb.sh
+++ b/t/t0470-read-object-http-e-odb.sh
@@ -107,6 +107,13 @@ test_expect_success 'no-local clone from the first repo with helper succeeds' '
 	rm -rf my-other-clone
 '
 
+test_expect_success 'no-local initial-refspec clone succeeds' '
+	mkdir my-other-clone &&
+	(cd my-other-clone &&
+	 git -c odb.magic.subprocessCommand="$HELPER" \
+		clone --no-local --initial-refspec "refs/odbs/magic/*:refs/odbs/magic/*" .. .)
+'
+
 stop_httpd
 
 test_done
diff --git a/t/t0480-read-object-have-http-e-odb.sh b/t/t0480-read-object-have-http-e-odb.sh
index fe1fac5ef3..c451d269a7 100755
--- a/t/t0480-read-object-have-http-e-odb.sh
+++ b/t/t0480-read-object-have-http-e-odb.sh
@@ -107,6 +107,13 @@ test_expect_success 'no-local clone from the first repo with helper succeeds' '
 	rm -rf my-other-clone
 '
 
+test_expect_success 'no-local initial-refspec clone succeeds' '
+	mkdir my-other-clone &&
+	(cd my-other-clone &&
+	 git -c odb.magic.subprocessCommand="$HELPER" \
+		clone --no-local --initial-refspec "refs/odbs/magic/*:refs/odbs/magic/*" .. .)
+'
+
 stop_httpd
 
 test_done
diff --git a/t/t5616-clone-initial-refspec.sh b/t/t5616-clone-initial-refspec.sh
new file mode 100755
index 0000000000..ccbc27f83f
--- /dev/null
+++ b/t/t5616-clone-initial-refspec.sh
@@ -0,0 +1,48 @@
+#!/bin/sh
+
+test_description='test clone with --initial-refspec option'
+. ./test-lib.sh
+
+
+test_expect_success 'setup regular repo' '
+	# Make two branches, "master" and "side"
+	echo one >file &&
+	git add file &&
+	git commit -m one &&
+	echo two >file &&
+	git commit -a -m two &&
+	git tag two &&
+	echo three >file &&
+	git commit -a -m three &&
+	git checkout -b side &&
+	echo four >file &&
+	git commit -a -m four &&
+	git checkout master
+'
+
+test_expect_success 'add a special ref pointing to a blob' '
+	hash=$(echo "Hello world!" | git hash-object -w -t blob --stdin) &&
+	git update-ref refs/special/hello "$hash"
+'
+
+test_expect_success 'no-local clone from the first repo' '
+	mkdir my-clone &&
+	(cd my-clone &&
+	 git clone --no-local .. . &&
+	 test_must_fail git cat-file blob "$hash") &&
+	rm -rf my-clone
+'
+
+test_expect_success 'no-local clone with --initial-refspec' '
+	mkdir my-clone &&
+	(cd my-clone &&
+	 git clone --no-local --initial-refspec "refs/special/*:refs/special/*" .. . &&
+	 git cat-file blob "$hash" &&
+	 git rev-parse refs/special/hello >actual &&
+	 echo "$hash" >expected &&
+	 test_cmp expected actual) &&
+	rm -rf my-clone
+'
+
+test_done
+
-- 
2.14.0.rc1.52.gf02fb0ddac.dirty


* [PATCH v5 40/40] Add t0430 to test cloning using bundles
  2017-08-03  9:18 [PATCH v5 00/40] Add initial experimental external ODB support Christian Couder
                   ` (38 preceding siblings ...)
  2017-08-03  9:19 ` [PATCH v5 39/40] Add tests for 'clone --initial-refspec' Christian Couder
@ 2017-08-03  9:19 ` Christian Couder
  2017-09-10 12:30 ` [PATCH v5 00/40] Add initial experimental external ODB support Lars Schneider
  40 siblings, 0 replies; 73+ messages in thread
From: Christian Couder @ 2017-08-03  9:19 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
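Note for readers: a minimal sketch of two small pieces of shell that the
new test script relies on, runnable outside the test suite. The test name
"t0430" stands in for $this_test and the URL is a made-up example value;
both are assumptions for illustration only.

```shell
#!/bin/sh
# Derive a non-privileged httpd port from the test name, as the test does.
# "t0430" is a stand-in for $this_test (assumption for this sketch).
this_test=t0430
port=$(expr "${this_test#t}" + 12000)
echo "port: $port"

# Extract the URL from the content of the blob that
# refs/odbs/magic/bundle points to. The URL below is an example value.
bundle_info="bundle url: http://127.0.0.1:$port/files/file.bundle"
bundle_url=$(printf '%s\n' "$bundle_info" | sed -e 's/bundle url: //')
echo "bundle_url: $bundle_url"
```

expr reads "0430" as a decimal number, whereas shell arithmetic such as
$((0430 + 12000)) would treat the leading zero as octal.
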
 t/t0430-clone-bundle-e-odb.sh | 85 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 85 insertions(+)
 create mode 100755 t/t0430-clone-bundle-e-odb.sh

diff --git a/t/t0430-clone-bundle-e-odb.sh b/t/t0430-clone-bundle-e-odb.sh
new file mode 100755
index 0000000000..ac38ae1be5
--- /dev/null
+++ b/t/t0430-clone-bundle-e-odb.sh
@@ -0,0 +1,85 @@
+#!/bin/sh
+
+test_description='tests for cloning using a bundle through e-odb'
+
+. ./test-lib.sh
+
+# If we don't specify a port, the current test number will be used
+# which will not work as it is less than 1024, so it can only be used by root.
+LIB_HTTPD_PORT=$(expr ${this_test#t} + 12000)
+
+. "$TEST_DIRECTORY"/lib-httpd.sh
+
+start_httpd apache-e-odb.conf
+
+# odb helper script must see this
+export HTTPD_URL
+
+write_script odb-clone-bundle-helper <<\EOF
+die() {
+	printf >&2 "%s\n" "$@"
+	exit 1
+}
+echo >&2 "odb-clone-bundle-helper args:" "$@"
+case "$1" in
+init)
+	ref_hash=$(git rev-parse refs/odbs/magic/bundle) ||
+	die "couldn't find refs/odbs/magic/bundle"
+	GIT_NO_EXTERNAL_ODB=1 git cat-file blob "$ref_hash" >bundle_info ||
+	die "couldn't get blob $ref_hash"
+	bundle_url=$(sed -e 's/bundle url: //' bundle_info)
+	echo >&2 "bundle_url: '$bundle_url'"
+	curl "$bundle_url" -o bundle_file ||
+	die "curl '$bundle_url' failed"
+	GIT_NO_EXTERNAL_ODB=1 git bundle unbundle bundle_file >unbundling_info ||
+	die "unbundling 'bundle_file' failed"
+	;;
+get*)
+	die "odb-clone-bundle-helper '$1' called"
+	;;
+put*)
+	die "odb-clone-bundle-helper '$1' called"
+	;;
+*)
+	die "unknown command '$1'"
+	;;
+esac
+EOF
+HELPER="\"$PWD\"/odb-clone-bundle-helper"
+
+
+test_expect_success 'setup repo with a few commits' '
+	test_commit one &&
+	test_commit two &&
+	test_commit three &&
+	test_commit four
+'
+
+BUNDLE_FILE="file.bundle"
+FILES_DIR="httpd/www/files"
+GET_URL="$HTTPD_URL/files/$BUNDLE_FILE"
+
+test_expect_success 'create a bundle for this repo and check that it can be downloaded' '
+	git bundle create "$BUNDLE_FILE" master &&
+	mkdir "$FILES_DIR" &&
+	cp "$BUNDLE_FILE" "$FILES_DIR/" &&
+	curl "$GET_URL" --output actual &&
+	test_cmp "$BUNDLE_FILE" actual
+'
+
+test_expect_success 'create an e-odb ref for this bundle' '
+	ref_hash=$(echo "bundle url: $GET_URL" | GIT_NO_EXTERNAL_ODB=1 git hash-object -w -t blob --stdin) &&
+	git update-ref refs/odbs/magic/bundle "$ref_hash"
+'
+
+test_expect_success 'clone using the e-odb helper to download and install the bundle' '
+	mkdir my-clone &&
+	(cd my-clone &&
+	 git clone --no-local \
+		-c odb.magic.scriptCommand="$HELPER" \
+		--initial-refspec "refs/odbs/magic/*:refs/odbs/magic/*" .. .)
+'
+
+stop_httpd
+
+test_done
-- 
2.14.0.rc1.52.gf02fb0ddac.dirty


* Re: [PATCH v5 35/40] Add Documentation/technical/external-odb.txt
  2017-08-03  9:19 ` [PATCH v5 35/40] Add Documentation/technical/external-odb.txt Christian Couder
@ 2017-08-03 18:38   ` Stefan Beller
  2017-08-25  6:14     ` Christian Couder
  2017-08-28 18:59   ` Ben Peart
  1 sibling, 1 reply; 73+ messages in thread
From: Stefan Beller @ 2017-08-03 18:38 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

On Thu, Aug 3, 2017 at 2:19 AM, Christian Couder
<christian.couder@gmail.com> wrote:
> This describes the external odb mechanism's purpose and
> how it works.

Thanks for providing this documentation patch!

I read through it sequentially; the questions that came to mind are
inline below.

If the very last paragraph came earlier (or an example), it
would have helped me to understand the big picture better.

>
> Helped-by: Ben Peart <benpeart@microsoft.com>
> Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
> ---
>  Documentation/technical/external-odb.txt | 295 +++++++++++++++++++++++++++++++
>  1 file changed, 295 insertions(+)
>  create mode 100644 Documentation/technical/external-odb.txt
>
> diff --git a/Documentation/technical/external-odb.txt b/Documentation/technical/external-odb.txt
> new file mode 100644
> index 0000000000..5991221fd5
> --- /dev/null
> +++ b/Documentation/technical/external-odb.txt
> @@ -0,0 +1,295 @@
> +External ODBs
> +^^^^^^^^^^^^^
> +
> +The External ODB mechanism makes it possible for Git objects, mostly
> +blobs for now though, to be stored in an "external object database"
> +(External ODB).
> +
> +An External ODB can be any object store as long as there is a helper
> +program called an "odb helper" that can communicate with Git to
> +transfer objects to/from the external odb and to retrieve information
> +about available objects in the external odb.
> +
> +Purpose
> +=======
> +
> +The purpose of this mechanism is to make it possible to handle Git
> +objects, especially blobs, in much more flexible ways.
> +
> +Currently Git can store its objects only in the form of loose objects
> +in separate files or packed objects in a pack file.
> +
> +This is not flexible enough for some important use cases like handling
> +really big binary files or handling a really big number of files that
> +are fetched only as needed. And it is not realistic to expect that Git
> +could fully natively handle many of such use cases.

This is a strong statement. Why is it not realistic? What are these
"many of such use cases"?

> +Furthermore many improvements that are dependent on specific setups
> +could be implemented in the way Git objects are managed if it was
> +possible to customize how the Git objects are handled. For example a
> +restartable clone using the bundle mechanism has often been requested,
> +but implementing that would go against the current strict rules under
> +which the Git objects are currently handled.

So in this example, you would use today's git-clone to obtain a small version
of the repo and then obtain other objects later?

> +What Git needs is a mechanism to make it possible to customize in a lot
> +of different ways how the Git objects are handled.

I do not understand why we need this. Is this aimed to support git LFS,
which by its model has additional objects not natively tracked by Git, that
are fetched later when needed?

> Though this
> +mechanism should try as much as possible to avoid interfering with the
> +usual way in which Git handles its objects.
> +
> +Helpers
> +=======
> +
> +ODB helpers are commands that have to be registered using either the
> +"odb.<odbname>.subprocessCommand" or the "odb.<odbname>.scriptCommand"
> +config variables.
> +
> +Registering such a command tells Git that an external odb called
> +<odbname> exists and that the registered command should be used to
> +communicate with it.
> +
> +There are 2 kinds of commands. Commands registered using the
> +"odb.<odbname>.subprocessCommand" config variable are called "process
> +commands" and the associated mode is called "process mode". Commands
> +registered using the "odb.<odbname>.scriptCommand" config variable
> +are called "script commands" and the associated mode is called "script
> +mode".

So there is the possibility for multiple ODBs by the nature of the config
as we can have multiple <odbname> sections. How does Git know which
odb to talk to? (does it talk to all of them when asking for a random object?)

When writing an object how does Git decide where to store an object
(internally or in one of its ODBs? Maybe in multiple ODBs? Does the user
give rules how to tackle the problem or will Git have some magic to do
the right thing? If so where can I read about that?)

One could think that one ODB is able to learn about objects out of band
i.e. to replace the fetch/clone/push mechanism, whereas another ODB is
capable of efficient fast local storage and yet another one that is optimized
for storing large binary files.

> +Process Mode
> +============
> +
> +In process mode the command is started as a single process invocation
> +that should last for the entire life of the single Git command that
> +started it.
> +
> +A packet format (pkt-line, see technical/protocol-common.txt) based
> +protocol over standard input and standard output is used for
> +communication between Git and the helper command.
> +
> +After the process command is started, Git sends a welcome message
> +("git-read-object-client"), a list of supported protocol version
> +numbers, and a flush packet. Git expects to read a welcome response
> +message ("git-read-object-server"), exactly one protocol version
> +number from the previously sent list, and a flush packet. All further
> +communication will be based on the selected version.
> +
> +The remaining protocol description below documents "version=1". Please
> +note that "version=42" in the example below does not exist and is only
> +there to illustrate how the protocol would look with more than one
> +version.
> +
> +After the version negotiation Git sends a list of all capabilities
> +that it supports and a flush packet. Git expects to read a list of
> +desired capabilities, which must be a subset of the supported
> +capabilities list, and a flush packet as response:
> +
> +------------------------
> +packet: git> git-read-object-client
> +packet: git> version=1
> +packet: git> version=42
> +packet: git> 0000
> +packet: git< git-read-object-server
> +packet: git< version=1
> +packet: git< 0000
> +packet: git> capability=get_raw_obj
> +packet: git> capability=have
> +packet: git> capability=put_raw_obj
> +packet: git> capability=not-yet-invented
> +packet: git> 0000
> +packet: git< capability=get_raw_obj
> +packet: git< 0000
> +------------------------
> +
> +Afterwards Git sends a list of "key=value" pairs terminated with a
> +flush packet. The list will contain at least the instruction (based on
> +the supported capabilities) and the arguments for the
> +instruction. Please note that the process must not send any response
> +before it has received the final flush packet.
> +
> +In general any response from the helper should end with a status
> +packet. See the documentation of the 'get_*' instructions below for
> +examples of status packets.
> +
> +After the helper has processed an instruction, it is expected to wait
> +for the next "key=value" list containing another instruction.
> +
> +On exit Git will close the pipe to the helper. The helper is then
> +expected to detect EOF and exit gracefully on its own. Git will wait
> +until the process has stopped.
> +
> +Script Mode
> +===========
> +
> +In this mode Git launches the script command each time it wants to
> +communicate with the helper. There is no welcome message and no
> +protocol version in this mode.
> +
> +The instruction and associated arguments are passed as arguments when
> +launching the script command and if needed further information is
> +passed between Git and the command through stdin and stdout.
> +
> +Capabilities/Instructions
> +=========================
> +
> +The following instructions are currently supported by Git:
> +
> +- init
> +- get_git_obj
> +- get_raw_obj
> +- get_direct
> +- put_raw_obj
> +- have
> +
> +The plan is to also support 'put_git_obj' and 'put_direct' soon, for
> +consistency with the 'get_*' instructions.
> +
> + - 'init'
> +
> +All the process and script commands must accept the 'init'
> +instruction. It should be the first instruction sent to a command. It
> +should not be advertised in the capability exchange. Any argument
> +should be ignored.
> +
> +In process mode, after receiving the 'init' instruction and a flush
> +packet, the helper should just send a status packet and then a flush
> +packet. See the 'get_*' instructions below for examples of status
> +packets.
> +
> +In script mode the command should print on stdout the capabilities
> +that it supports if any. This is the only time in script mode when a
> +capability exchange happens.
> +
> +For example a script command could use the following shell code
> +snippet to handle the 'init' instruction:
> +
> +------------------------
> +case "$1" in
> +init)
> +       echo "capability=get_git_obj"
> +       echo "capability=put_raw_obj"
> +       echo "capability=have"
> +       ;;
> +------------------------

I can see the rationale for script mode, but not quite for process mode
as in process mode we could do the same init work that is needed after
the welcome message?

Is it kept in process mode to keep consistent with script mode?

I assume this is to set up the ODB, which then can also state things like
"I am not in a state to work, as the network connection is missing"
or ask the user for a password for the encrypted database?
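
For reference, the quoted 'init' snippet can be completed into a small
script-mode dispatcher. This is only a sketch: the function name
odb_helper and the handling of unknown instructions are assumptions made
here, not something the series defines.

```shell
#!/bin/sh
# Hypothetical script-mode dispatch, wrapped in a function so that it
# can be exercised without launching a separate process.
odb_helper() {
	case "$1" in
	init)
		# Advertise the supported capabilities on stdout, one per line.
		echo "capability=get_git_obj"
		echo "capability=put_raw_obj"
		echo "capability=have"
		;;
	*)
		# Report anything else on stderr; the nonzero status is
		# what signals a problem to the caller.
		echo >&2 "unknown command '$1'"
		return 1
		;;
	esac
}
```

A real script helper would end with something like odb_helper "$@" and
use the function's status as its exit code.
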

> + - 'get_git_obj <sha1>' and 'get_raw_obj <sha1>'
> +
> +These instructions should have a hexadecimal <sha1> argument to tell
> +which object the helper should send to git.
> +
> +In process mode the sha1 argument should be followed by a flush packet
> +like this:
> +
> +------------------------
> +packet: git> command=get_git_obj
> +packet: git> sha1=0a214a649e1b3d5011e14a3dc227753f2bd2be05
> +packet: git> 0000
> +------------------------
> +
> +After reading that the helper should send the requested object to Git in a
> +packet series followed by a flush packet. If the helper does not experience
> +problems then the helper must send a "success" status like the following:
> +
> +------------------------
> +packet: git< status=success
> +packet: git< 0000
> +------------------------
> +
> +In case the helper cannot or does not want to send the requested
> +object as well as any other object for the lifetime of the Git
> +process, then it is expected to respond with an "abort" status at any
> +point in the protocol:
> +
> +------------------------
> +packet: git< status=abort
> +packet: git< 0000
> +------------------------
> +
> +Git neither stops nor restarts the helper in case the "error"/"abort"
> +status is set.
> +
> +If the helper dies during the communication or does not adhere to the
> +protocol then Git will stop and restart it with the next instruction.
> +
> +In script mode the helper should just send the requested object to Git
> +by writing it to stdout and should then exit. The exit code should
> +signal to Git if a problem occurred or not.
> +
> +The only difference between 'get_git_obj' and 'get_raw_obj' is that in
> +case of 'get_git_obj' the requested object should be sent as a Git
> +object (that is in the same format as loose object files). In case of
> +'get_raw_obj' the object should be sent in its raw format (that is the
> +same output as `git cat-file <type> <sha1>`).

In case of abort, what are the implications for Git? How do we deliver the
message to the user (should the helper print to stderr, or is there a way
to relay it through Git such that we do not have racy output?)
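
To make the script-mode side of this question concrete, here is a
minimal sketch of a 'get_raw_obj' handler that keeps the object on
stdout and any diagnostics on stderr. The one-file-per-object store
under $ODB_STORE and the message wording are assumptions for
illustration; the documentation only says that the exit code signals
whether a problem occurred.

```shell
#!/bin/sh
# Hypothetical store: one file per object, named by its sha1, in the
# directory given by $ODB_STORE.
get_raw_obj() {
	sha1=$1
	if test -f "$ODB_STORE/$sha1"
	then
		# The raw object content is the only thing written to stdout.
		cat "$ODB_STORE/$sha1"
	else
		# Diagnostics go to stderr so they cannot mix with object data.
		echo >&2 "odb helper: object $sha1 not found"
		return 1
	fi
}
```
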

> + - 'get_direct <sha1>'
> +
> +This instruction is similar to the other 'get_*' instructions except
> +that no object should be sent from the helper to Git. Instead the
> +helper should directly write the requested object into a loose object
> +file in the ".git/objects" directory.
> +
> +After the helper has sent the "status=success" packet and the
> +following flush packet in process mode, or after it has exited in the
> +script mode, Git should look again for a loose object file with the
> +requested sha1.

Does it have to be a loose object or is the helper also allowed
to put a packfile into $GIT_OBJECT_DIRECTORY/pack ?
If so, is it expected to also produce an idx file?
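
For the loose object case, the location a helper would have to write to
follows Git's usual fan-out layout: the first two hex digits of the sha1
name a directory and the remaining 38 name the file. A small sketch (the
function name loose_object_path is made up here):

```shell
#!/bin/sh
# Print the loose object path for a sha1 below a given objects directory.
loose_object_path() {
	objdir=$1
	sha1=$2
	rest=${sha1#??}          # everything after the first two hex digits
	fanout=${sha1%"$rest"}   # the first two hex digits
	printf '%s/%s/%s\n' "$objdir" "$fanout" "$rest"
}
```

This only computes the path; the file itself would still need to be in
the loose object format (zlib-deflated "<type> <size>" header, a NUL
byte, then the content).
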

> +
> + - 'put_raw_obj <sha1> <size> <type>'
> +
> +This instruction should be followed by three arguments to tell which
> +object the helper will receive from git: <sha1>, <size> and
> +<type>. The hexadecimal <sha1> argument describes the object that will
> +be sent from Git to the helper. The <type> is the object type (blob,
> +tree, commit or tag) of this object. The <size> is the size of the
> +(decompressed) object content.

So the type is encoded as strings "blob", "tree" ... Maybe quote them?

The size is "in bytes" (maybe add that unit?). I expect there is no fanciness
allowed such as "3.3MB" as that is not precise enough.
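
To illustrate the kind of strictness I have in mind, a helper could
validate the three arguments along these lines. This is a sketch under
my own assumptions (40 lowercase hex digits for the sha1, a plain
decimal byte count for the size); the function name valid_put_args is
made up.

```shell
#!/bin/sh
# Return 0 if the 'put_raw_obj' arguments look well formed, 1 otherwise.
valid_put_args() {
	sha1=$1
	size=$2
	type=$3
	# sha1: only lowercase hex digits, exactly 40 of them (assumption).
	case "$sha1" in
	*[!0-9a-f]*) return 1 ;;
	esac
	test ${#sha1} -eq 40 || return 1
	# size: non-empty and decimal digits only, so "3.3MB" is rejected.
	case "$size" in
	''|*[!0-9]*) return 1 ;;
	esac
	# type: one of the four object types named in the documentation.
	case "$type" in
	blob|tree|commit|tag) return 0 ;;
	*) return 1 ;;
	esac
}
```
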

> +In process mode the last argument (the type) should be followed by a
> +flush packet.
> +
> +After reading that the helper should read the announced object from
> +Git in a packet series followed by a flush packet.
> +
> +If the helper does not experience problems when receiving and storing
> +or processing the object, then the helper must send a "success" status
> +as described for the 'get_*' instructions.

Otherwise an abort is expected?

> +
> +In script mode the helper should just receive the announced object
> +from its standard input. After receiving and processing the object,
> +the helper should exit and its exit code should signal to Git if a
> +problem occurred or not.
> +
> +- 'have'
> +
> +In process mode this instruction should be followed by a flush
> +packet. After receiving this packet the helper should send the sha1,
> +size and type, in this order, of all the objects it can provide to Git
> +(through a 'get_*' instruction). There should be a space character
> +between the sha1 and the size and between the size and the type, and
> +then a new line character after the type.

As this is also inside a packet, do we need to care about splitting
up the payload? i.e. when we have a lot of objects such that we need
multiple packets to present all 'have's, are we allowed to split
up anywhere or just after a '\n' ?
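
This does not settle the packet-splitting question, but for reference
the record format itself is line oriented. A rough sketch of a script
mode helper producing it could look like the following. The storage
layout is an assumption borrowed from the upload.sh test script in this
series (files named "<sha1>-<size>-<type>"), and list_have is a made-up
name.

```shell
# Print one "<sha1> <size> <type>" line per stored object.
# Assumes (hypothetically) that objects are stored as files named
# "<sha1>-<size>-<type>" directly under the directory given as $1.
list_have () {
	for path in "$1"/*-*-*
	do
		test -f "$path" || continue	# also skips an unexpanded glob
		name=${path##*/}
		sha1=${name%%-*}
		rest=${name#*-}
		size=${rest%%-*}
		type=${rest#*-}
		printf '%s %s %s\n' "$sha1" "$size" "$type"
	done
}
```

A reader that consumes this output line by line only works unchanged in
process mode if packets are split after a new line character, or if the
reader reassembles partial lines itself.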

> +If the helper does not experience problems, then it must then send a
> +"success" status as described for the 'get_*' instructions.
> +
> +In script mode the helper should send to its standard output the sha1,
> +size and type, in this order of all the objects it can provide to
> +Git. There should also be a space character between the sha1 and the
> +size and between the size and the type, and then a new line character
> +after the type.
> +
> +After sending this, the script helper should exit and its exit code
> +should signal to Git if a problem occurred or not.
> +
> +Selecting objects
> +=================
> +
> +To select objects that should be handled by an external odb, one can
> +use the git attributes system. For now this will only work with blobs
> +and this will only work along with the 'put_raw_obj' instruction.
> +
> +For example if one has an external odb called "magic" and has
> +registered an associated process command helper that supports the
> +'put_raw_obj' instruction, then one can tell Git that all the .jpg
> +files should be handled by the "magic" odb using a .gitattributes file
> +that contains:
> +
> +------------------------
> +*.jpg           odb=magic
> +------------------------

Hah that answers some questions that are asked earlier!

What happens if I say

  *.jpg odb=my-magic-store,my-jpeg-store

?

Maybe relevant:
https://public-inbox.org/git/20170725211300.vwlpioy5jes55273@sigill.intra.peff.net/
"Extend the .gitattributes file to also specify file sizes"

> +
> --
> 2.14.0.rc1.52.gf02fb0ddac.dirty
>

Thanks,
Stefan


* Re: [PATCH v5 04/40] Add Git/Packet.pm from parts of t0021/rot13-filter.pl
  2017-08-03  9:18 ` [PATCH v5 04/40] Add Git/Packet.pm from parts of t0021/rot13-filter.pl Christian Couder
@ 2017-08-03 19:11   ` Junio C Hamano
  2017-08-04  6:32     ` Christian Couder
  0 siblings, 1 reply; 73+ messages in thread
From: Junio C Hamano @ 2017-08-03 19:11 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Jeff King, Ben Peart, Jonathan Tan, Nguyen Thai Ngoc Duy,
	Mike Hommey, Lars Schneider, Eric Wong, Christian Couder

Christian Couder <christian.couder@gmail.com> writes:

> This will make it possible to reuse packet reading and writing
> functions in other test scripts.
>
> Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
> ---
>  perl/Git/Packet.pm | 71 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 71 insertions(+)
>  create mode 100644 perl/Git/Packet.pm
>
> diff --git a/perl/Git/Packet.pm b/perl/Git/Packet.pm
> new file mode 100644
> index 0000000000..aaffecbe2a
> --- /dev/null
> +++ b/perl/Git/Packet.pm
> @@ -0,0 +1,71 @@
> +package Git::Packet;
> +use 5.008;
> +use strict;
> +use warnings;
> +BEGIN {
> +	require Exporter;
> +	if ($] < 5.008003) {
> +		*import = \&Exporter::import;
> +	} else {
> +		# Exporter 5.57 which supports this invocation was
> +		# released with perl 5.8.3
> +		Exporter->import('import');
> +	}
> +}

This is merely me being curious, but do we want this boilerplate,
which we do not use in perl/Git.pm but we do in perl/Git/I18N.pm?

> +our @EXPORT = qw(
> +			packet_bin_read
> +			packet_txt_read
> +			packet_bin_write
> +			packet_txt_write
> +			packet_flush
> +		);
> +our @EXPORT_OK = @EXPORT;

We can see that you made sure that the only thing 05/40 needs to do
is to use this package and remove the definition of these subs,
without having to touch any caller by first updating the original
implementation in 03/40 and then exporting these names in 04/40.
Knowing that the preparation is nicely done already, it is a bit
irritating to see that 05/40 is a separate patch, as we need to
switch between the patches to see if there is any difference between
the original implementation of the subs, and the replacement
implemented in here.  It would have been nicer to have changes in
04/40 and 05/40 in a single patch.



* Re: [PATCH v5 08/40] Git/Packet.pm: add capability functions
  2017-08-03  9:18 ` [PATCH v5 08/40] Git/Packet.pm: add capability functions Christian Couder
@ 2017-08-03 19:14   ` Junio C Hamano
  2017-08-04 20:34     ` Christian Couder
  0 siblings, 1 reply; 73+ messages in thread
From: Junio C Hamano @ 2017-08-03 19:14 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Jeff King, Ben Peart, Jonathan Tan, Nguyen Thai Ngoc Duy,
	Mike Hommey, Lars Schneider, Eric Wong, Christian Couder

Christian Couder <christian.couder@gmail.com> writes:

> Add functions to help read and write capabilities.
> Use these functions in 't/t0021/rot13-filter.pl'.
>
> Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
> ---

Steps 06-08/40 all look sensible to me, but they probably fall into
the same bucket as step 03/40, i.e. better done before step 04-05/40
(which should probably be a single patch, as I earlier said) as
preparatory steps.


* Re: [PATCH v5 10/40] Add initial external odb support
  2017-08-03  9:18 ` [PATCH v5 10/40] Add initial external odb support Christian Couder
@ 2017-08-03 19:34   ` Junio C Hamano
  2017-08-03 20:17     ` Jeff King
  2017-09-14 10:14     ` Christian Couder
  0 siblings, 2 replies; 73+ messages in thread
From: Junio C Hamano @ 2017-08-03 19:34 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Jeff King, Ben Peart, Jonathan Tan, Nguyen Thai Ngoc Duy,
	Mike Hommey, Lars Schneider, Eric Wong, Christian Couder

Christian Couder <christian.couder@gmail.com> writes:

> +int external_odb_has_object(const unsigned char *sha1)
> +{
> +	struct odb_helper *o;
> +
> +	external_odb_init();
> +
> +	for (o = helpers; o; o = o->next)
> +		if (odb_helper_has_object(o, sha1))
> +			return 1;
> +	return 0;
> +}
> +
> +int external_odb_get_object(const unsigned char *sha1)
> +{
> +	struct odb_helper *o;
> +	const char *path;
> +
> +	if (!external_odb_has_object(sha1))
> +		return -1;

This probably would not matter, as I do not expect one repository to
connect to and be backed by very many external odb instances, but I
would have expected that the interaction would go more like "ah, we
need this object that is lacking locally. let's see if there is
anybody with the object. now we found who claims to have the object,
let's ask that guy (and nobody else) to give the object to us".

IOW, I would have expected two functions:

 - struct odb_helper *external_odb_with(struct object_id *oid);
 - int external_odb_get(struct object_id *oid, struct odb_helper *odb);

where the latter may start like

    if (!odb) {
	odb = external_odb_with(oid);
	if (!odb)
	    return -1;
    }
    ... go ask that odb for the object ...

> diff --git a/external-odb.h b/external-odb.h
> new file mode 100644
> index 0000000000..9989490c9e
> --- /dev/null
> +++ b/external-odb.h
> @@ -0,0 +1,8 @@
> +#ifndef EXTERNAL_ODB_H
> +#define EXTERNAL_ODB_H
> +
> +const char *external_odb_root(void);
> +int external_odb_has_object(const unsigned char *sha1);
> +int external_odb_get_object(const unsigned char *sha1);

Even though ancient codebase of ours deliberately omitted them, I
think our recent trend is to explicitly spell "extern " in headers.

> diff --git a/odb-helper.h b/odb-helper.h
> new file mode 100644
> index 0000000000..5800661704
> --- /dev/null
> +++ b/odb-helper.h

Likewise.


* Re: [PATCH v5 13/40] external odb: add 'put_raw_obj' support
  2017-08-03  9:18 ` [PATCH v5 13/40] external odb: add 'put_raw_obj' support Christian Couder
@ 2017-08-03 19:50   ` Junio C Hamano
  2017-09-14  9:17     ` Christian Couder
  0 siblings, 1 reply; 73+ messages in thread
From: Junio C Hamano @ 2017-08-03 19:50 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Jeff King, Ben Peart, Jonathan Tan, Nguyen Thai Ngoc Duy,
	Mike Hommey, Lars Schneider, Eric Wong, Christian Couder

Christian Couder <christian.couder@gmail.com> writes:

> Add support for a 'put_raw_obj' capability/instruction to send new
> objects to an external odb. Objects will be sent as they are (in
> their 'raw' format). They will not be converted to Git objects.
>
> For now any new Git object (blob, tree, commit, ...) would be sent
> if 'put_raw_obj' is supported by an odb helper. This is not a great
> default, but let's leave it to following commits to tweak that.
>
> Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
> ---

I thought in an earlier step that I saw this thing initialized in
the codepath that adds alternate object stores, which are read-only
places we "borrow" from.  Being able to write into it is good, but
conceptually it no longer feels correct to initialize it from the
alternate object database initialization codepath.

Another way to say it is that an object store, whether it is local
or external, is not "alt" if it will result in storing new objects
we locally create.  It's just an extension of our local object
store.



* Re: [PATCH v5 14/40] external-odb: accept only blobs for now
  2017-08-03  9:19 ` [PATCH v5 14/40] external-odb: accept only blobs for now Christian Couder
@ 2017-08-03 19:52   ` Junio C Hamano
  2017-09-14  9:59     ` Christian Couder
  0 siblings, 1 reply; 73+ messages in thread
From: Junio C Hamano @ 2017-08-03 19:52 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Jeff King, Ben Peart, Jonathan Tan, Nguyen Thai Ngoc Duy,
	Mike Hommey, Lars Schneider, Eric Wong, Christian Couder

Christian Couder <christian.couder@gmail.com> writes:

> The mechanism to decide which blobs should be sent to which
> external object database will be very simple for now.
> If the external odb helper supports any "put_*" instruction
> all the new blobs will be sent to it.
>
> Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
> ---
>  external-odb.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/external-odb.c b/external-odb.c
> index 82fac702e8..a4f8c72e1c 100644
> --- a/external-odb.c
> +++ b/external-odb.c
> @@ -124,6 +124,10 @@ int external_odb_put_object(const void *buf, size_t len,
>  {
>  	struct odb_helper *o;
>  
> +	/* For now accept only blobs */
> +	if (strcmp(type, "blob"))
> +		return 1;
> +

I somehow doubt that a policy decision like this should be made at
this layer.  Shouldn't it be encoded in the capability the other
side supports, or determined at runtime per each individual object
when a "put" is attempted (i.e. allow the other side to say "You
tell me that you want me to store an object of type X and size Y;
I cannot do that, sorry").
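
As a rough illustration of the per-object alternative, a script mode
helper could refuse a "put" itself and report that through its exit
code, which is the only channel the documentation patch describes for
script mode. Everything named below (handle_put_raw_obj, MAX_SIZE,
STORE_DIR and the file naming) is hypothetical and only serves the
sketch.

```shell
# Hypothetical script-mode handler for 'put_raw_obj <sha1> <size> <type>'.
# MAX_SIZE and STORE_DIR are made-up settings for this sketch.
MAX_SIZE=${MAX_SIZE:-1048576}
STORE_DIR=${STORE_DIR:-/tmp/odb-store}

handle_put_raw_obj () {
	sha1=$1 size=$2 type=$3

	# Refuse anything this store does not want; a non-zero exit code
	# signals to Git that a problem occurred.
	test "$type" = blob || return 1
	test "$size" -le "$MAX_SIZE" || return 1

	# The object content arrives on standard input.
	mkdir -p "$STORE_DIR" &&
	cat >"$STORE_DIR/$sha1-$size-$type"
}
```

The policy then lives in the helper instead of in external-odb.c, at
the cost of launching the helper for objects it will turn down.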

>  	external_odb_init();
>  
>  	for (o = helpers; o; o = o->next) {


* Re: [PATCH v5 19/40] lib-httpd: add upload.sh
  2017-08-03  9:19 ` [PATCH v5 19/40] lib-httpd: add upload.sh Christian Couder
@ 2017-08-03 20:07   ` Junio C Hamano
  2017-09-14  7:43     ` Christian Couder
  0 siblings, 1 reply; 73+ messages in thread
From: Junio C Hamano @ 2017-08-03 20:07 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Jeff King, Ben Peart, Jonathan Tan, Nguyen Thai Ngoc Duy,
	Mike Hommey, Lars Schneider, Eric Wong, Christian Couder

Christian Couder <christian.couder@gmail.com> writes:

> +OLDIFS="$IFS"
> +IFS='&'
> +set -- $QUERY_STRING
> +IFS="$OLDIFS"
> +
> +while test $# -gt 0
> +do
> +    key=${1%=*}
> +    val=${1#*=}

When you see that ${V%X*} and ${V#*X} appear in a pair for the same
variable V and same delimiter X, it almost always indicates a bug
waiting to happen.

What's the definition of "key" here?  A member of known set of short
tokens, all of which consists only of alphanumeric, or something?
Even if you do not currently plan to deal with a value with '=' in
it, it may be prudent to double '%' above (and do not double '#').
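
A small sketch of the difference, using a made-up query token whose
value contains an '=' (split_pair is a hypothetical helper name, not
part of the patch):

```shell
# Split "key=value" at the FIRST '=' so a value may itself contain '='.
split_pair () {
	key=${1%%=*}	# longest "=*" suffix removed: text before the first '='
	val=${1#*=}	# shortest "*=" prefix removed: text after the first '='
}

split_pair 'sha1=abc=def'
echo "key=$key val=$val"		# prints "key=sha1 val=abc=def"

# The pairing from the patch cuts at two different '=' characters:
pair='sha1=abc=def'
echo "key=${pair%=*} val=${pair#*=}"	# prints "key=sha1=abc val=abc=def"
```

For a token without any '=', such as "delete", neither pattern matches
and both forms leave key and val equal to the whole token.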

Style: indent your shell script with tabs.

> +
> +    case "$key" in
> +	"sha1") sha1="$val" ;;
> +	"type") type="$val" ;;
> +	"size") size="$val" ;;
> +	"delete") delete=1 ;;
> +	*) echo >&2 "unknown key '$key'" ;;
> +    esac

Indent your shell script with tabs; case/esac and the labels used
for each case arms all align at the same column.

> +
> +    shift
> +done
> +
> +case "$REQUEST_METHOD" in
> +  POST)

Likewise.

> +    if test "$delete" = "1"
> +    then
> +	rm -f "$FILES_DIR/$sha1-$size-$type"
> +    else
> +	mkdir -p "$FILES_DIR"
> +	cat >"$FILES_DIR/$sha1-$size-$type"
> +    fi
> +
> +    echo 'Status: 204 No Content'
> +    echo
> +    ;;
> +
> +  *)
> +    echo 'Status: 405 Method Not Allowed'
> +    echo
> +esac


* Re: [PATCH v5 10/40] Add initial external odb support
  2017-08-03 19:34   ` Junio C Hamano
@ 2017-08-03 20:17     ` Jeff King
  2017-09-14 10:14     ` Christian Couder
  1 sibling, 0 replies; 73+ messages in thread
From: Jeff King @ 2017-08-03 20:17 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Christian Couder, git, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

On Thu, Aug 03, 2017 at 12:34:25PM -0700, Junio C Hamano wrote:

> > +int external_odb_get_object(const unsigned char *sha1)
> > +{
> > +	struct odb_helper *o;
> > +	const char *path;
> > +
> > +	if (!external_odb_has_object(sha1))
> > +		return -1;
> 
> This probably would not matter, as I do not expect one repository to
> connect to and be backed by very many external odb instances, but I
> would have expected that the interaction would go more like "ah, we
> need this object that is lacking locally. let's see if there is
> anybody with the object. now we found who claims to have the object,
> let's ask that guy (and nobody else) to give the object to us".
> 
> IOW, I would have expected two functions:
> 
>  - struct odb_helper *external_odb_with(struct object_id *oid);
>  - int external_odb_get(struct object_id *oid, struct odb_helper *odb);

One advantage of walking through them linearly and asking "can you get
it?" is that it gracefully handles external odbs which aren't available.
That can be used for redundancy, or for situations where a preferred
odb isn't always available (e.g., a fast server which is only available
when you're on a particular network).
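
In shell terms, the linear walk amounts to something like the sketch
below. It is only an illustration of the fallback behaviour, not the C
code in the patch: get_from_any and the two stand-in helpers are made
up, and the helpers are invoked script-mode style, with the instruction
and the sha1 as arguments.

```shell
# Ask each helper in turn for an object; the first one that succeeds wins.
# The helpers are passed as command names after the sha1. A helper that
# is unreachable simply fails and the loop moves on to the next one.
get_from_any () {
	sha1=$1
	shift
	for helper
	do
		if "$helper" get_raw_obj "$sha1" 2>/dev/null
		then
			return 0
		fi
	done
	return 1
}

# Two made-up helpers standing in for real external odbs.
fast_but_offline () { return 1; }
slow_but_reliable () { printf 'content of %s\n' "$2"; }

get_from_any 1234abcd fast_but_offline slow_but_reliable
```

The sketch ignores the case where a failing helper has already written
partial output before giving up.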

-Peff


* Re: [PATCH v5 25/40] external-odb: add 'get_direct' support
  2017-08-03  9:19 ` [PATCH v5 25/40] external-odb: add 'get_direct' support Christian Couder
@ 2017-08-03 21:40   ` Junio C Hamano
  2017-09-14  8:39     ` Christian Couder
  0 siblings, 1 reply; 73+ messages in thread
From: Junio C Hamano @ 2017-08-03 21:40 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Jeff King, Ben Peart, Jonathan Tan, Nguyen Thai Ngoc Duy,
	Mike Hommey, Lars Schneider, Eric Wong, Christian Couder

Christian Couder <christian.couder@gmail.com> writes:

> This implements the 'get_direct' capability/instruction that makes
> it possible for external odb helper scripts to pass blobs to Git
> by directly writing them as loose objects files.

I am not sure if the assumption is made clear in this series, but I
am (perhaps incorrectly) guessing that it is assumed that the
intended use of this feature is to offload access to large blobs
by not including them in the initial clone.  So from that point of
view, I think it makes tons of sense to let the external helper
directly populate the database bypassing Git (i.e. instead of
feeding data stream and have Git store it) like this "direct" method
does.

How does this compare with (and how well does this work with) what
Jonathan Tan is doing recently?


* Re: [PATCH v5 04/40] Add Git/Packet.pm from parts of t0021/rot13-filter.pl
  2017-08-03 19:11   ` Junio C Hamano
@ 2017-08-04  6:32     ` Christian Couder
  0 siblings, 0 replies; 73+ messages in thread
From: Christian Couder @ 2017-08-04  6:32 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Jeff King, Ben Peart, Jonathan Tan, Nguyen Thai Ngoc Duy,
	Mike Hommey, Lars Schneider, Eric Wong, Christian Couder,
	Ævar Arnfjörð Bjarmason

On Thu, Aug 3, 2017 at 9:11 PM, Junio C Hamano <gitster@pobox.com> wrote:

>> diff --git a/perl/Git/Packet.pm b/perl/Git/Packet.pm
>> new file mode 100644
>> index 0000000000..aaffecbe2a
>> --- /dev/null
>> +++ b/perl/Git/Packet.pm
>> @@ -0,0 +1,71 @@
>> +package Git::Packet;
>> +use 5.008;
>> +use strict;
>> +use warnings;
>> +BEGIN {
>> +     require Exporter;
>> +     if ($] < 5.008003) {
>> +             *import = \&Exporter::import;
>> +     } else {
>> +             # Exporter 5.57 which supports this invocation was
>> +             # released with perl 5.8.3
>> +             Exporter->import('import');
>> +     }
>> +}
>
> This is merely me being curious, but do we want this boilerplate,
> which we do not use in perl/Git.pm but we do in perl/Git/I18N.pm?

I don't know. I copied it as I thought that we wanted to support Perl
versions starting from 5.8.0, but I am ok to remove it or to leave it
depending on what the Perl experts think (CCing AEvar) and what we
decide.

>> +our @EXPORT = qw(
>> +                     packet_bin_read
>> +                     packet_txt_read
>> +                     packet_bin_write
>> +                     packet_txt_write
>> +                     packet_flush
>> +             );
>> +our @EXPORT_OK = @EXPORT;
>
> We can see that you made sure that the only thing 05/40 needs to do
> is to use this package and remove the definition of these subs,
> without having to touch any caller by first updating the original
> implementation in 03/40 and then exporting these names in 04/40.
> Knowing that the preparation is nicely done already, it is a bit
> irritating to see that 05/40 is a separate patch, as we need to
> switch between the patches to see if there is any difference between
> the original implementation of the subs, and the replacement
> implemented in here.  It would have been nicer to have changes in
> 04/40 and 05/40 in a single patch.

Ok, I have squashed 04/40 and 05/40 together in my current version of
this series.


* Re: [PATCH v5 08/40] Git/Packet.pm: add capability functions
  2017-08-03 19:14   ` Junio C Hamano
@ 2017-08-04 20:34     ` Christian Couder
  0 siblings, 0 replies; 73+ messages in thread
From: Christian Couder @ 2017-08-04 20:34 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Jeff King, Ben Peart, Jonathan Tan, Nguyen Thai Ngoc Duy,
	Mike Hommey, Lars Schneider, Eric Wong, Christian Couder

On Thu, Aug 3, 2017 at 9:14 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Christian Couder <christian.couder@gmail.com> writes:
>
>> Add functions to help read and write capabilities.
>> Use these functions in 't/t0021/rot13-filter.pl'.
>>
>> Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
>> ---
>
> Steps 06-08/40 all look sensible to me, but they probably fall into
> the same bucket as step 03/40, i.e. better done before step 04-05/40
> (which should probably be a single patch, as I earlier said) as
> preparatory steps.

Ok, I moved patches 06-08/40 before 04-05/40 in my current working version.


* Re: [PATCH v5 35/40] Add Documentation/technical/external-odb.txt
  2017-08-03 18:38   ` Stefan Beller
@ 2017-08-25  6:14     ` Christian Couder
  2017-08-25 21:23       ` Jonathan Tan
  0 siblings, 1 reply; 73+ messages in thread
From: Christian Couder @ 2017-08-25  6:14 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

On Thu, Aug 3, 2017 at 8:38 PM, Stefan Beller <sbeller@google.com> wrote:
> On Thu, Aug 3, 2017 at 2:19 AM, Christian Couder
> <christian.couder@gmail.com> wrote:
>> This describes the external odb mechanism's purpose and
>> how it works.
>
> Thanks for providing this documentation patch!
>
> I read through it sequentially, see questions that came to mind
> in between.

Thanks for your feedback!

> If the very last paragraph came earlier (or an example), it
> would have helped me to understand the big picture better.

Ok, I added the following at the end of the "helpers" section:

"Early on git commands send an 'init' instruction to the registered
commands. A capability negotiation will take place during this
request/response exchange which will let Git and the helpers know how
they can further collaborate. The attribute system can also be used to
tell Git which objects should be handled by which helper."

>> +Purpose
>> +=======
>> +
>> +The purpose of this mechanism is to make it possible to handle Git
>> +objects, especially blobs, in much more flexible ways.
>> +
>> +Currently Git can store its objects only in the form of loose objects
>> +in separate files or packed objects in a pack file.
>> +
>> +This is not flexible enough for some important use cases like handling
>> +really big binary files or handling a really big number of files that
>> +are fetched only as needed. And it is not realistic to expect that Git
>> +could fully natively handle many of such use cases.
>
> This is a strong statement. Why is it not realistic? What are these
> "many of such use cases"?

What I mean is that the Git default storage (loose objects and packed
objects) cannot easily be optimized for many different kinds of
content. Currently it is optimized for a not huge number of not very
big text files, and it also works quite well when there are not too
many quite small binary files. And then there are tweaks that can be
used to improve things in specific cases (for example if you have very
big text files, you can set "core.bigfilethreshold" to a size bigger
than your text files so that they will still be delta-compressed, as
Peff explained in a recent thread).

As Git is used by more and more people having different needs, I
think it is not realistic to expect that we can optimize its object
storage for all these different needs. So a better strategy is to just
let them store objects in external stores.

If we wanted to optimize for different use cases without letting
people use external stores, we would anyway need to implement
different internal stores which would be a huge burden and which could
lead us to re-implement things like HTTP servers that already exist
outside Git.

About these many use cases, I gave the "really big binary files"
example which is why Git LFS exists (and which GitLab is interested in
better solving), and the "really big number of files that are fetched
only as needed" example which Microsoft is interested in solving. I
could also imagine that some people have both big text files and big
binary files in which case the "core.bigfilethreshold" might not work
well, or that some people already have blobs in some different stores
(like HTTP servers, Docker registries, artifact stores, ...) and want
to fetch them from there as much as possible. And then letting people
use different stores can make clones or fetches restartable which
would solve another problem people have long been complaining about...

I will try to use the above explanations to improve the
statement in the documentation though I don't want it to be as long as
the above. Do you have an idea about what the right balance should be?

>> +Furthermore many improvements that are dependent on specific setups
>> +could be implemented in the way Git objects are managed if it was
>> +possible to customize how the Git objects are handled. For example a
>> +restartable clone using the bundle mechanism has often been requested,
>> +but implementing that would go against the current strict rules under
>> +which the Git objects are currently handled.
>
> So in this example, you would use todays git-clone to obtain a small version
> of the repo and then obtain other objects later?

The problem with explaining how it would work is that the
--initial-refspec option is added to git clone later in the patch
series. And there could be changes in the later part of the patch
series. So I don't want to promise or explain too much here.
But maybe I could add another patch to better explain that at the end
of the series.

>> +What Git needs is a mechanism to make it possible to customize in a lot
>> +of different ways how the Git objects are handled.
>
> I do not understand why we need this. Is this aimed to support git LFS,
> which by its model has additional objects not natively tracked by Git, that
> are fetched later when needed?

It is aimed to support not just something like git LFS, but also many
different use cases (see my above explanations).

>> Though this
>> +mechanism should try as much as possible to avoid interfering with the
>> +usual way in which Git handles its objects.
>> +
>> +Helpers
>> +=======
>> +
>> +ODB helpers are commands that have to be registered using either the
>> +"odb.<odbname>.subprocessCommand" or the "odb.<odbname>.scriptCommand"
>> +config variables.
>> +
>> +Registering such a command tells Git that an external odb called
>> +<odbname> exists and that the registered command should be used to
>> +communicate with it.
>> +
>> +There are 2 kinds of commands. Commands registered using the
>> +"odb.<odbname>.subprocessCommand" config variable are called "process
>> +commands" and the associated mode is called "process mode". Commands
>> +registered using the "odb.<odbname>.scriptCommand" config variables
>> +are called "script commands" and the associated mode is called "script
>> +mode".
>
> So there is the possibility for multiple ODBs by the nature of the config
> as we can have multiple <odbname> sections. How does Git know which
> odb to talk to? (does it talk to all of them when asking for a random object?)
>
> When writing an object how does Git decide where to store an object
> (internally or in one of its ODB? Maybe in multiple ODBs? Does the user
> give rules how to tackle the problem or will Git have some magic to do
> the right thing? If so where can I read about that?)

Yeah, it's possible to configure many ODBs. In this case, after the
'init' instruction, Git will know which instructions are supported by
each ODB. If more than one ODB supports a 'get_*' instruction, yeah,
Git will ask the ODBs supporting a 'get_*' instruction in turn for
each object it did not already find. If more than one ODB supports a
'put_*' instruction and if the attributes for a blob correspond to
more than one of these ODBs, yeah, Git will try to "put" the blob into
these ODBs in turn until it succeeds.

> One could think that one ODB is able to learn about objects out of band
> i.e. to replace the fetch/clone/push mechanism, whereas another ODB is
> capable of efficient fast local storage and yet another one that is optimized
> for storing large binary files.

Yeah, all these things are possible.

Hopefully the following will clarify that:

"The communication happens through instructions that are sent by Git
and that the commands should answer. If it makes sense, Git will send
the same instruction to many commands in the order in which they are
configured."

>> +Process Mode
>> +============
>> +
>> +In process mode the command is started as a single process invocation
>> +that should last for the entire life of the single Git command that
>> +started it.
>> +
>> +A packet format (pkt-line, see technical/protocol-common.txt) based
>> +protocol over standard input and standard output is used for
>> +communication between Git and the helper command.
>> +
>> +After the process command is started, Git sends a welcome message
>> +("git-read-object-client"), a list of supported protocol version
>> +numbers, and a flush packet. Git expects to read a welcome response
>> +message ("git-read-object-server"), exactly one protocol version
>> +number from the previously sent list, and a flush packet. All further
>> +communication will be based on the selected version.
>> +
>> +The remaining protocol description below documents "version=1". Please
>> +note that "version=42" in the example below does not exist and is only
>> +there to illustrate how the protocol would look with more than one
>> +version.
>> +
>> +After the version negotiation Git sends a list of all capabilities
>> +that it supports and a flush packet. Git expects to read a list of
>> +desired capabilities, which must be a subset of the supported
>> +capabilities list, and a flush packet as response:
>> +
>> +------------------------
>> +packet: git> git-read-object-client
>> +packet: git> version=1
>> +packet: git> version=42
>> +packet: git> 0000
>> +packet: git< git-read-object-server
>> +packet: git< version=1
>> +packet: git< 0000
>> +packet: git> capability=get_raw_obj
>> +packet: git> capability=have
>> +packet: git> capability=put_raw_obj
>> +packet: git> capability=not-yet-invented
>> +packet: git> 0000
>> +packet: git< capability=get_raw_obj
>> +packet: git< 0000
>> +------------------------
>> +
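
For readers not familiar with pkt-line, the framing behind the trace
above can be sketched as follows. The 4 hexadecimal digit length prefix
(which counts itself) and the "0000" flush packet come from
technical/protocol-common.txt; the trailing new line on text packets is
assumed here to work as in the existing long-running filter protocol.
pkt_txt and pkt_flush are made-up names.

```shell
# Write one text packet: 4 hex digits of length, the payload, a newline.
# The length counts the 4 length bytes, the payload and the newline.
# ${#1} counts characters, so this sketch assumes ASCII payloads.
pkt_txt () {
	printf '%04x%s\n' $(( ${#1} + 5 )) "$1"
}

# A flush packet is the special length "0000" with no payload.
pkt_flush () {
	printf '0000'
}

# What a helper would send back during version negotiation.
pkt_txt git-read-object-server
pkt_txt version=1
pkt_flush
```

For example "version=1" is 9 characters, so the packet is 14 bytes long
and starts with "000e".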
>> +Afterwards Git sends a list of "key=value" pairs terminated with a
>> +flush packet. The list will contain at least the instruction (based on
>> +the supported capabilities) and the arguments for the
>> +instruction. Please note, that the process must not send any response
>> +before it received the final flush packet.
>> +
>> +In general any response from the helper should end with a status
>> +packet. See the documentation of the 'get_*' instructions below for
>> +examples of status packets.
>> +
>> +After the helper has processed an instruction, it is expected to wait
>> +for the next "key=value" list containing another instruction.
>> +
>> +On exit Git will close the pipe to the helper. The helper is then
>> +expected to detect EOF and exit gracefully on its own. Git will wait
>> +until the process has stopped.
>> +
>> +Script Mode
>> +===========
>> +
>> +In this mode Git launches the script command each time it wants to
>> +communicate with the helper. There is no welcome message and no
>> +protocol version in this mode.
>> +
>> +The instruction and associated arguments are passed as arguments when
>> +launching the script command and if needed further information is
>> +passed between Git and the command through stdin and stdout.
>> +
>> +Capabilities/Instructions
>> +=========================
>> +
>> +The following instructions are currently supported by Git:
>> +
>> +- init
>> +- get_git_obj
>> +- get_raw_obj
>> +- get_direct
>> +- put_raw_obj
>> +- have
>> +
>> +The plan is to also support 'put_git_obj' and 'put_direct' soon, for
>> +consistency with the 'get_*' instructions.
>> +
>> + - 'init'
>> +
>> +All the process and script commands must accept the 'init'
>> +instruction. It should be the first instruction sent to a command. It
>> +should not be advertised in the capability exchange. Any argument
>> +should be ignored.
>> +
>> +In process mode, after receiving the 'init' instruction and a flush
>> +packet, the helper should just send a status packet and then a flush
>> +packet. See the 'get_*' instructions below for examples of status
>> +packets.
>> +
>> +In script mode the command should print on stdout the capabilities
>> +that it supports if any. This is the only time in script mode when a
>> +capability exchange happens.
>> +
>> +For example a script command could use the following shell code
>> +snippet to handle the 'init' instruction:
>> +
>> +------------------------
>> +case "$1" in
>> +init)
>> +       echo "capability=get_git_obj"
>> +       echo "capability=put_raw_obj"
>> +       echo "capability=have"
>> +       ;;
>> +esac
>> +------------------------
>
> I can see the rationale for script mode, but not quite for process mode
> as in process mode we could do the same init work that is needed after
> the welcome message?
>
> Is it kept in process mode to keep consistent with script mode?

Yes, and because I want only the 'init' instruction to be required.
In process mode, if there were no 'init' instruction, how could Git
know whether it is ok to send a 'get' instruction, for example, if it
does not yet know the helper's capabilities? And how could a helper
that only sets things up be called only once?

> I assume this is to setup the ODB, which then can also state  things like
> "I am not in a state to work, as the network connection is missing"
> or ask the user for a password for the encrypted database?

Yeah, the helper can also take advantage of 'init' to set up and check
everything.
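
As a rough illustration only (this is not code from the patch series),
a script mode helper could combine the capability output with such a
check roughly like this. The `store_ready` callable and the error
message are hypothetical; the capability names are the ones used in
the documentation example:

```python
import sys

# Capabilities advertised by this hypothetical helper.
CAPABILITIES = ["get_git_obj", "put_raw_obj", "have"]


def handle_init(store_ready, out=None, err=None):
    """Handle the 'init' instruction of a script mode helper.

    store_ready is a hypothetical callable returning True when the
    external store is usable (network up, credentials available, ...).
    The return value is the exit code the script should use.
    """
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    if not store_ready():
        # Report the problem on stderr and signal failure to Git
        # through a non-zero exit code.
        print("error: external ODB is not reachable", file=err)
        return 1
    for cap in CAPABILITIES:
        print(f"capability={cap}", file=out)
    return 0
```

A wrapper would call `sys.exit(handle_init(...))` when its first
argument is "init".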

Do you think I should clarify something?

>> + - 'get_git_obj <sha1>' and 'get_raw_obj <sha1>'
>> +
>> +These instructions should have a hexadecimal <sha1> argument to tell
>> +which object the helper should send to git.
>> +
>> +In process mode the sha1 argument should be followed by a flush packet
>> +like this:
>> +
>> +------------------------
>> +packet: git> command=get_git_obj
>> +packet: git> sha1=0a214a649e1b3d5011e14a3dc227753f2bd2be05
>> +packet: git> 0000
>> +------------------------
>> +
>> +After reading that the helper should send the requested object to Git in a
>> +packet series followed by a flush packet. If the helper does not experience
>> +problems then the helper must send a "success" status like the following:
>> +
>> +------------------------
>> +packet: git< status=success
>> +packet: git< 0000
>> +------------------------
>> +
>> +In case the helper cannot or does not want to send the requested
>> +object as well as any other object for the lifetime of the Git
>> +process, then it is expected to respond with an "abort" status at any
>> +point in the protocol:
>> +
>> +------------------------
>> +packet: git< status=abort
>> +packet: git< 0000
>> +------------------------
>> +
>> +Git neither stops nor restarts the helper in case the "error"/"abort"
>> +status is set.
>> +
>> +If the helper dies during the communication or does not adhere to the
>> +protocol then Git will stop and restart it with the next instruction.
>> +
>> +In script mode the helper should just send the requested object to Git
>> +by writing it to stdout and should then exit. The exit code should
>> +signal to Git if a problem occurred or not.
>> +
>> +The only difference between 'get_git_obj' and 'get_raw_obj' is that in
>> +case of 'get_git_obj' the requested object should be sent as a Git
>> +object (that is in the same format as loose object files). In case of
>> +'get_raw_obj' the object should be sent in its raw format (that is the
>> +same output as `git cat-file <type> <sha1>`).
>
> In case of abort, what are the implications for Git? How do we deliver the
> message to the user (should the helper print to stderr, or is there a way
> to relay it through Git such that we do not have racy output?)

The helper can print something to stderr. Hopefully it will be printed
using one printf() call or something like that, which will make it not
so racy. (Or what kind of race are you talking about?) And Git will
remove the current instruction from the helper's capabilities, so it
will not send the same instruction again (for the duration of the
current git process).
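
To illustrate the bookkeeping (a sketch with made-up names, not the
actual C code), the Git side could be modeled like this. That other
failure statuses leave the capabilities untouched is my assumption
here:

```python
class HelperState:
    """Git-side view of one helper (hypothetical names)."""

    def __init__(self, capabilities):
        self.capabilities = set(capabilities)

    def can_send(self, instruction):
        return instruction in self.capabilities

    def record_status(self, instruction, status):
        """Record the status of one instruction; return True on success."""
        if status == "abort":
            # The instruction is not sent again for the rest of the
            # current Git process.
            self.capabilities.discard(instruction)
        # Assumption: any other failure status keeps the capability.
        return status == "success"
```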

>> + - 'get_direct <sha1>'
>> +
>> +This instruction is similar to the other 'get_*' instructions except
>> +that no object should be sent from the helper to Git. Instead the
>> +helper should directly write the requested object into a loose object
>> +file in the ".git/objects" directory.
>> +
>> +After the helper has sent the "status=success" packet and the
>> +following flush packet in process mode, or after it has exited in the
>> +script mode, Git should look again for a loose object file with the
>> +requested sha1.
>
> Does it have to be a loose object or is the helper also allowed
> to put a packfile into $GIT_OBJECT_DIRECTORY/pack ?
> If so, is it expected to also produce an idx file?

It could also be a packfile and an idx file, but I expect most
helpers of this kind will just create loose object files.
I will clarify with:

"...Git will look up the requested sha1 again in its loose
object files and pack files."

>> + - 'put_raw_obj <sha1> <size> <type>'
>> +
>> +This instruction should be followed by three arguments to tell which
>> +object the helper will receive from git: <sha1>, <size> and
>> +<type>. The hexadecimal <sha1> argument describes the object that will
>> +be sent from Git to the helper. The <type> is the object type (blob,
>> +tree, commit or tag) of this object. The <size> is the size of the
>> +(decompressed) object content.
>
> So the type is encoded as strings "blob", "tree" ... Maybe quote them?

Ok, they will be quoted in the next version.

> The size is "in bytes" (maybe add that unit?). I expect there is no fanciness
> allowed such as "3.3MB" as that is not precise enough.

Yeah, I added "in bytes".

>> +In process mode the last argument (the type) should be followed by a
>> +flush packet.
>> +
>> +After reading that the helper should read the announced object from
>> +Git in a packet series followed by a flush packet.
>> +
>> +If the helper does not experience problems when receiving and storing
>> +or processing the object, then the helper must send a "success" status
>> +as described for the 'get_*' instructions.
>
> Otherwise an abort is expected?

There are also "notfound" and "error" failures. I will clarify this
with the following:

"Git neither stops nor restarts the helper in case a
"notfound"/"error"/"abort" status is set. An "error" status means a
possibly more transient error than an abort. For a "get_*" instruction,
the helper should send a "notfound" status when the requested object
cannot be found."
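
For illustration, a helper could pick and frame its status for a
"get_*" instruction roughly as below. This is only a sketch: the
`lookup` callable is hypothetical, and the trailing newline in the
status line is an assumption borrowed from how other packet-line based
Git protocols write their lines.

```python
VALID_STATUSES = ("success", "notfound", "error", "abort")
FLUSH = b"0000"


def pkt_line(payload):
    """Frame payload as one pkt-line: 4 hex digits of total length, then data."""
    return b"%04x" % (len(payload) + 4) + payload


def status_packets(status):
    """Return the status packet followed by a flush packet."""
    if status not in VALID_STATUSES:
        raise ValueError("unknown status: " + status)
    # Assumption: the status line ends with a newline.
    return pkt_line(b"status=" + status.encode() + b"\n") + FLUSH


def get_status(lookup, sha1):
    """Choose a status for a 'get_*' request.

    lookup is a hypothetical callable returning the object data, or
    None when the object is unknown; OSError models a transient failure.
    """
    try:
        data = lookup(sha1)
    except OSError:
        return "error"
    return "success" if data is not None else "notfound"
```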

>> +In script mode the helper should just receive the announced object
>> +from its standard input. After receiving and processing the object,
>> +the helper should exit and its exit code should signal to Git if a
>> +problem occurred or not.
>> +
>> +- 'have'
>> +
>> +In process mode this instruction should be followed by a flush
>> +packet. After receiving this packet the helper should send the sha1,
>> +size and type, in this order, of all the objects it can provide to Git
>> +(through a 'get_*' instruction). There should be a space character
>> +between the sha1 and the size and between the size and the type, and
>> +then a new line character after the type.
>
> As this is also inside a packet, do we need to care about splitting
> up the payload? i.e. when we have a lot of objects such that we need
> multiple packets to present all 'have's, are we allowed to split
> up anywhere or just after a '\n' ?

The code only supports splitting just after a '\n'. I will clarify with:

"If many packets are needed to send back all this information, the
split between packets should be made after the new line characters."
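
As a sketch of what this means for a helper (not code from the
series), the 'have' lines could be grouped into packets like this. The
65516 byte payload limit follows from the pkt-line maximum of 65520
bytes including the 4-byte length header; the function and parameter
names are made up for the example:

```python
MAX_PAYLOAD = 65516  # 65520 bytes per pkt-line minus the 4-byte length header


def pkt_line(payload):
    """Frame payload as one pkt-line: 4 hex digits of total length, then data."""
    return b"%04x" % (len(payload) + 4) + payload


def have_packets(objects, max_payload=MAX_PAYLOAD):
    """Build the packets answering a 'have' instruction.

    objects is an iterable of (sha1, size, type) tuples. Lines are never
    split across packets, so every packet ends right after a newline.
    The returned list ends with a flush packet.
    """
    packets = []
    current = b""
    for sha1, size, obj_type in objects:
        line = f"{sha1} {size} {obj_type}\n".encode()
        if len(line) > max_payload:
            raise ValueError("a single 'have' line exceeds the payload limit")
        if current and len(current) + len(line) > max_payload:
            packets.append(pkt_line(current))
            current = b""
        current += line
    if current:
        packets.append(pkt_line(current))
    packets.append(b"0000")
    return packets
```

The helper would then send its status packet and a final flush packet
after this list.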

>> +If the helper does not experience problems, then it must then send a
>> +"success" status as described for the 'get_*' instructions.
>> +
>> +In script mode the helper should send to its standard output the sha1,
>> +size and type, in this order of all the objects it can provide to
>> +Git. There should also be a space character between the sha1 and the
>> +size and between the size and the type, and then a new line character
>> +after the type.
>> +
>> +After sending this, the script helper should exit and its exit code
>> +should signal to Git if a problem occurred or not.
>> +
>> +Selecting objects
>> +=================
>> +
>> +To select objects that should be handled by an external odb, one can
>> +use the git attributes system. For now this will only work with blobs
>> +and this will only work along with the 'put_raw_obj' instruction.
>> +
>> +For example if one has an external odb called "magic" and has
>> +registered an associated process command helper that supports the
>> +'put_raw_obj' instruction, then one can tell Git that all the .jpg
>> +files should be handled by the "magic" odb using a .gitattributes file
>> +that contains:
>> +
>> +------------------------
>> +*.jpg           odb=magic
>> +------------------------
>
> Hah that answers some questions that are asked earlier!
>
> What happens if I say
>
>   *.jpg odb=my-magic-store,my-jpeg-store
>
> ?

I am not sure how the attributes system works but I think it should handle this.
So the above would mean that Git will try to send the .jpg files to
both the "my-magic-store" and the "my-jpeg-store" helpers. The order
depends on which of those appears first in the config files.

> Maybe relevant:
> https://public-inbox.org/git/20170725211300.vwlpioy5jes55273@sigill.intra.peff.net/
> "Extend the .gitattributes file to also specify file sizes"

Yeah, it looks like this could help if some attributes could be set
depending on file sizes.

Thanks,
Christian.

* Re: [PATCH v5 35/40] Add Documentation/technical/external-odb.txt
  2017-08-25  6:14     ` Christian Couder
@ 2017-08-25 21:23       ` Jonathan Tan
  2017-08-29  9:37         ` Christian Couder
  0 siblings, 1 reply; 73+ messages in thread
From: Jonathan Tan @ 2017-08-25 21:23 UTC (permalink / raw)
  To: Christian Couder
  Cc: Stefan Beller, git, Junio C Hamano, Jeff King, Ben Peart,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

On Fri, 25 Aug 2017 08:14:08 +0200
Christian Couder <christian.couder@gmail.com> wrote:

> As Git is used more and more by people having different needs, I
> think it is not realistic to expect that we can optimize its object
> storage for all these different needs. So a better strategy is to just
> let them store objects in external stores.
[snip]
> About these many use cases, I gave the "really big binary files"
> example which is why Git LFS exists (and which GitLab is interested in
> better solving), and the "really big number of files that are fetched
> only as needed" example which Microsoft is interested in solving. I
> could also imagine that some people have both big text files and big
> binary files in which case the "core.bigfilethreshold" might not work
> well, or that some people already have blobs in some different stores
> (like HTTP servers, Docker registries, artifact stores, ...) and want
> to fetch them from there as much as possible. 

Thanks for explaining the use cases - this makes sense, especially the
last one which motivates the different modes for the "get" command
(return raw bytes vs populating the Git repository with loose/packed
objects).

> And then letting people
> use different stores can make clones or fetches restartable which
> would solve another problem people have long been complaining about...

This is unrelated to the rest of my e-mail, but out of curiosity, how
would a different store make clones or fetches restartable? Do you mean
that Git would invoke a "fetch" command through the ODB protocol instead
of using its own native protocol?

> >> +Furthermore many improvements that are dependent on specific setups
> >> +could be implemented in the way Git objects are managed if it was
> >> +possible to customize how the Git objects are handled. For example a
> >> +restartable clone using the bundle mechanism has often been requested,
> >> +but implementing that would go against the current strict rules under
> >> +which the Git objects are currently handled.
> >
> > So in this example, you would use today's git-clone to obtain a small version
> > of the repo and then obtain other objects later?
> 
> The problem with explaining how it would work is that the
> --initial-refspec option is added to git clone later in the patch
> series. And there could be changes in the later part of the patch
> series. So I don't want to promise or explain too much here.
> But maybe I could add another patch to better explain that at the end
> of the series.

Such an explanation, in whatever form (patch or e-mail) would be great,
because I'm not sure of the interaction between fetches and the
connectivity check.

The approach I have taken in my own patches [1] is to (1) declare that
if a lazy remote supplies an object, it promises to have everything
referred to by that object, and (2) we thus only need to check the
objects not from the lazy remote. Translated to the ODB world, (1) is
possible in the Microsoft case and is trivial in all the cases where the
ODB provides only blobs (since blobs don't refer to any other object),
and for (2), a "list" command should suffice.

One constraint is that we do not want to obtain (from the remote) or
store a separate list of what it has, to avoid the overhead. (I saw the
--initial-refspec approach - that would not work if we want to avoid the
overhead.)

For fetches, we remember the objects obtained from that specific remote
by adding a special file, name to be determined (I used ".imported" in
[1]). (The same method is used to note objects lazily downloaded.) The
repack command understands the difference between these two types of
objects (patches for this are in progress).

I'm not sure if this can be translated to the ODB world. The ODB can
declare a special capability that fetch sends to the server in order to
inform the server that it can exclude certain objects, and fetch can
inform the ODB of the packfiles that it has written, but I'm not sure
how the ODB can "remember" what it has. The ODB could mark such packs
with ".managed" to note that it is managed by that ODB, so Git shouldn't
touch it, but this means (for example) that Git can't GC them (and it
seems also quite contradictory for an ODB to manage Git packfiles).

[1] https://public-inbox.org/git/20170804145113.5ceafafa@twelve2.svl.corp.google.com/

* Re: [PATCH v5 35/40] Add Documentation/technical/external-odb.txt
  2017-08-03  9:19 ` [PATCH v5 35/40] Add Documentation/technical/external-odb.txt Christian Couder
  2017-08-03 18:38   ` Stefan Beller
@ 2017-08-28 18:59   ` Ben Peart
  2017-08-29 15:43     ` Christian Couder
  1 sibling, 1 reply; 73+ messages in thread
From: Ben Peart @ 2017-08-28 18:59 UTC (permalink / raw)
  To: Christian Couder, git
  Cc: Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder



On 8/3/2017 5:19 AM, Christian Couder wrote:
> This describes the external odb mechanism's purpose and
> how it works.
> 
> Helped-by: Ben Peart <benpeart@microsoft.com>
> Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
> ---
>   Documentation/technical/external-odb.txt | 295 +++++++++++++++++++++++++++++++
>   1 file changed, 295 insertions(+)
>   create mode 100644 Documentation/technical/external-odb.txt
> 
> diff --git a/Documentation/technical/external-odb.txt b/Documentation/technical/external-odb.txt
> new file mode 100644
> index 0000000000..5991221fd5
> --- /dev/null
> +++ b/Documentation/technical/external-odb.txt
> @@ -0,0 +1,295 @@
> +External ODBs
> +^^^^^^^^^^^^^
> +
> +The External ODB mechanism makes it possible for Git objects, mostly
> +blobs for now though, to be stored in an "external object database"
> +(External ODB).
> +
> +An External ODB can be any object store as long as there is a helper
> +program called an "odb helper" that can communicate with Git to
> +transfer objects to/from the external odb and to retrieve information
> +about available objects in the external odb.
> +
> +Purpose
> +=======
> +
> +The purpose of this mechanism is to make it possible to handle Git
> +objects, especially blobs, in much more flexible ways.
> +
> +Currently Git can store its objects only in the form of loose objects
> +in separate files or packed objects in a pack file.
> +
> +This is not flexible enough for some important use cases like handling
> +really big binary files or handling a really big number of files that
> +are fetched only as needed. And it is not realistic to expect that Git
> +could fully natively handle many of such use cases.
> +
> +Furthermore many improvements that are dependent on specific setups
> +could be implemented in the way Git objects are managed if it was
> +possible to customize how the Git objects are handled. For example a
> +restartable clone using the bundle mechanism has often been requested,
> +but implementing that would go against the current strict rules under
> +which the Git objects are currently handled.
> +
> +What Git needs is a mechanism to make it possible to customize in a lot
> +of different ways how the Git objects are handled. Though this
> +mechanism should try as much as possible to avoid interfering with the
> +usual way in which Git handles its objects.
> +
> +Helpers
> +=======
> +
> +ODB helpers are commands that have to be registered using either the
> +"odb.<odbname>.subprocessCommand" or the "odb.<odbname>.scriptCommand"
> +config variables.
> +
> +Registering such a command tells Git that an external odb called
> +<odbname> exists and that the registered command should be used to
> +communicate with it.
> +

What order are the odb handlers called? Are they called before or after 
the regular object store code for loose, pack and alternates?  Is the 
order configurable?

[...]
> +
> + - 'get_direct <sha1>'
> +
> +This instruction is similar to the other 'get_*' instructions except
> +that no object should be sent from the helper to Git. Instead the
> +helper should directly write the requested object into a loose object
> +file in the ".git/objects" directory.
> +
> +After the helper has sent the "status=success" packet and the
> +following flush packet in process mode, or after it has exited in the
> +script mode, Git should look again for a loose object file with the
> +requested sha1.

When will git call get_direct vs one of the other get_* functions? Could 
the functionality of enabling a helper to populate objects into the 
regular object store be provided by having a ODB helper that returned 
the object data as requested by get_git_obj or get_raw_obj but also 
stored it in the regular object store as a loose object (or pack file) 
for future calls?



* Re: [PATCH v5 35/40] Add Documentation/technical/external-odb.txt
  2017-08-25 21:23       ` Jonathan Tan
@ 2017-08-29  9:37         ` Christian Couder
  0 siblings, 0 replies; 73+ messages in thread
From: Christian Couder @ 2017-08-29  9:37 UTC (permalink / raw)
  To: Jonathan Tan
  Cc: Stefan Beller, git, Junio C Hamano, Jeff King, Ben Peart,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

On Fri, Aug 25, 2017 at 11:23 PM, Jonathan Tan <jonathantanmy@google.com> wrote:
> On Fri, 25 Aug 2017 08:14:08 +0200
> Christian Couder <christian.couder@gmail.com> wrote:
>
>> As Git is used more and more by people having different needs, I
>> think it is not realistic to expect that we can optimize its object
>> storage for all these different needs. So a better strategy is to just
>> let them store objects in external stores.
> [snip]
>> About these many use cases, I gave the "really big binary files"
>> example which is why Git LFS exists (and which GitLab is interested in
>> better solving), and the "really big number of files that are fetched
>> only as needed" example which Microsoft is interested in solving. I
>> could also imagine that some people have both big text files and big
>> binary files in which case the "core.bigfilethreshold" might not work
>> well, or that some people already have blobs in some different stores
>> (like HTTP servers, Docker registries, artifact stores, ...) and want
>> to fetch them from there as much as possible.
>
> Thanks for explaining the use cases - this makes sense, especially the
> last one which motivates the different modes for the "get" command
> (return raw bytes vs populating the Git repository with loose/packed
> objects).

Thanks for this comment. Now the beginning of the "Purpose" section
looks like this:

--------
The purpose of this mechanism is to make it possible to handle Git
objects, especially blobs, in much more flexible ways.

Currently Git can store its objects only in the form of loose objects
in separate files or packed objects in a pack file. These existing
object stores cannot be easily optimized for many different kinds of
content.

So the current stores are not flexible enough for some important use
cases like handling really big binary files or handling a really big
number of files that are fetched only as needed. And it is not
realistic to expect that Git could fully natively handle many of such
use cases. Git would need to natively implement different internal
stores, which would be a huge burden and which could lead to
re-implementing things like HTTP servers, Docker registries or artifact
stores that already exist outside Git.
--------

>> And then letting people
>> use different stores can make clones or fetches restartable which
>> would solve another problem people have long been complaining about...
>
> This is unrelated to the rest of my e-mail, but out of curiosity, how
> would a different store make clones or fetches restartable? Do you mean
> that Git would invoke a "fetch" command through the ODB protocol instead
> of using its own native protocol?

Yeah, the idea is that during a clone (or a fetch), Git could first
fetch some refs with "meta information", for example refs/odb/magic/*
(where "magic" is the odb name) which would tell if some (new) bundles
are available.
If there are (new) bundles available then during the 'init'
instruction, which takes place just after this first fetch, the
external odb helper will notice that and "fetch" the (new) bundles
using a restartable protocol, for example HTTP.

If something goes wrong when the helper "fetches" a bundle, the helper
could force the clone (or the fetch) to error out (after maybe
retrying), and when the user (or the helper itself) tries again to
clone (or fetch), the helper would restart its bundle "fetch" (using
the restartable protocol).

When this "fetch" eventually succeeds, then the helper will unbundle
what it received, and then give back control to the second regular
part of the clone (or fetch). This regular part of the clone (or
fetch) will then try to fetch the usual refs, but as the unbundling
has already updated the content of the usual refs (as well as the
object stores) this fetch will find that everything is up-to-date.

Ok, maybe everything is not quite up-to-date and there are still
things to fetch, but anyway the biggest part of the clone (or fetch)
has already been made using a restartable protocol, so we are doing
much better than if we are not restartable at all.

There are examples in t0430 at the end of the patch series of
restartable clones. They don't really test that one can indeed restart
a clone, but they show that things "just work" when `git clone` is
split into an initial fetch (that updates "meta information" refs),
then the 'init' instruction sent to a helper (that fetches and
unbundles a bundle based on the "meta information" refs) and then the
regular part of the clone.
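
To make the "restartable" part more concrete, here is a minimal sketch
of how a helper could resume a bundle download. It is not taken from
t0430. `read_range(offset, length)` is a hypothetical stand-in for any
transport that can serve a byte range, for example an HTTP Range
request, and it assumes the total size of the bundle is known:

```python
import os


def resume_fetch(read_range, dest_path, total_size, chunk_size=4):
    """Download the missing tail of a bundle into dest_path.

    The download restarts from the size of the partial file left by a
    previous attempt, so bytes already received are not fetched again.
    Returns the number of bytes present in dest_path at the end.
    """
    offset = os.path.getsize(dest_path) if os.path.exists(dest_path) else 0
    with open(dest_path, "ab") as out:
        while offset < total_size:
            data = read_range(offset, min(chunk_size, total_size - offset))
            if not data:
                raise IOError("source returned no data")
            out.write(data)
            offset += len(data)
    return offset
```

If `read_range` fails in the middle, the partial file stays on disk and
the next call continues from there. After a complete download the
helper would unbundle the file using regular git commands.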

>> >> +Furthermore many improvements that are dependent on specific setups
>> >> +could be implemented in the way Git objects are managed if it was
>> >> +possible to customize how the Git objects are handled. For example a
>> >> +restartable clone using the bundle mechanism has often been requested,
>> >> +but implementing that would go against the current strict rules under
>> >> +which the Git objects are currently handled.
>> >
>> > So in this example, you would use today's git-clone to obtain a small version
>> > of the repo and then obtain other objects later?
>>
>> The problem with explaining how it would work is that the
>> --initial-refspec option is added to git clone later in the patch
>> series. And there could be changes in the later part of the patch
>> series. So I don't want to promise or explain too much here.
>> But maybe I could add another patch to better explain that at the end
>> of the series.
>
> Such an explanation, in whatever form (patch or e-mail) would be great,
> because I'm not sure of the interaction between fetches and the
> connectivity check.

Ok, I will add parts of the above explanations to a documentation
patch at the end of the patch series.

> The approach I have taken in my own patches [1] is to (1) declare that
> if a lazy remote supplies an object, it promises to have everything
> referred to by that object, and (2) we thus only need to check the
> objects not from the lazy remote. Translated to the ODB world, (1) is
> possible in the Microsoft case and is trivial in all the cases where the
> ODB provides only blobs (since blobs don't refer to any other object),
> and for (2), a "list" command should suffice.

Ok, the "list" command is a command that any lazy remote should
implement so that other repos can ask it which objects it has,
right?

> One constraint is that we do not want to obtain (from the remote) or
> store a separate list of what it has, to avoid the overhead.

So the answer to the "list" command should be part of the answer which
sends all the objects.

> (I saw the
> --initial-refspec approach - that would not work if we want to avoid the
> overhead.)

The --initial-refspec approach is interesting if you want to fetch a
big number of objects or many big objects, like when you do an initial
clone of a big repo. In this use case a relatively small amount of
time spent in the initial fetch is an acceptable trade-off if the
clone or the fetch is restartable.

Also as the --initial-refspec clone or fetch could alleviate resource
usage of the server, it could be even faster than a regular clone or
fetch in this case.

I don't think the --initial-refspec option should be used all the time
when an external odb is configured. But using an external odb in the
first place means that you have specific requirements which suggests
that the regular way to clone (or fetch or push) might not be very
good for your use cases and for the objects that are stored in the
external ODB.

When you don't use --initial-refspec and an external odb helper is
configured, what happens is that the objects managed by the external
odb are not put in the pack file that is sent, so the receiver should
also have configured an external odb helper that can get the missing
objects, otherwise Git will error out complaining about missing
objects.

This has some drawbacks of course, but at least it makes sure that
users' repositories are properly configured before they can start
working with a server using an external ODB.

> For fetches, we remember the objects obtained from that specific remote
> by adding a special file, name to be determined (I used ".imported" in
> [1]). (The same method is used to note objects lazily downloaded.) The
> repack command understands the difference between these two types of
> objects (patches for this are in progress).
>
> I'm not sure if this can be translated to the ODB world. The ODB can
> declare a special capability that fetch sends to the server in order to
> inform the server that it can exclude certain objects, and fetch can
> inform the ODB of the packfiles that it has written, but I'm not sure
> how the ODB can "remember" what it has.

The current design of the "ODB world" doesn't require a new capability
and I think that is a good thing.
Maybe it will be (or it is already) possible to optimize it using a
new capability, but I think it is a good thing to separate the new
capability if possible.

I know I have not answered all the previous emails and I will try to
answer soon, but I am trying to improve the documentation (and the
code) at the same time, so that hopefully it makes things clearer for
people who will have similar questions later.

> The ODB could mark such packs
> with ".managed" to note that it is managed by that ODB, so Git shouldn't
> touch it, but this means (for example) that Git can't GC them (and it
> seems also quite contradictory for an ODB to manage Git packfiles).

I don't much like the idea of an ODB managing packs. Yeah, the
"get_direct" instruction and cloning using bundles require the helpers
to kind of write loose files or pack files, but they should really use
git commands to do that and then stop managing those files.
The simpler the helper can be, the better.

* Re: [PATCH v5 35/40] Add Documentation/technical/external-odb.txt
  2017-08-28 18:59   ` Ben Peart
@ 2017-08-29 15:43     ` Christian Couder
  2017-08-30 12:50       ` Ben Peart
  0 siblings, 1 reply; 73+ messages in thread
From: Christian Couder @ 2017-08-29 15:43 UTC (permalink / raw)
  To: Ben Peart
  Cc: git, Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

On Mon, Aug 28, 2017 at 8:59 PM, Ben Peart <peartben@gmail.com> wrote:
>
> On 8/3/2017 5:19 AM, Christian Couder wrote:
>>
>> +Helpers
>> +=======
>> +
>> +ODB helpers are commands that have to be registered using either the
>> +"odb.<odbname>.subprocessCommand" or the "odb.<odbname>.scriptCommand"
>> +config variables.
>> +
>> +Registering such a command tells Git that an external odb called
>> +<odbname> exists and that the registered command should be used to
>> +communicate with it.
>
> What order are the odb handlers called? Are they called before or after the
> regular object store code for loose, pack and alternates?  Is the order
> configurable?

For 'get_*_obj' instructions the regular code is called before the odb helpers.
(So the odb helper code is at the end of stat_sha1_file() and of
open_sha1_file() in sha1_file.c.)

For 'put_*_obj' instructions the regular code is called after the odb helpers.
(So the odb helper code is at the beginning of write_sha1_file() in
sha1_file.c.)

And no this order is not configurable, but of course it could be made
configurable.
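
In other words, the ordering is roughly the following. This is a
simplified model with made-up names, not the code in sha1_file.c. That
a successful 'put' skips the regular store is an assumption of this
sketch; the local copy kept after a successful 'get' follows what is
described elsewhere in this thread:

```python
class DictHelper:
    """Toy stand-in for an odb helper backed by a dict (hypothetical)."""

    def __init__(self, name, accepts):
        self.name = name
        self.accepts = accepts  # callable deciding which data this helper takes
        self.objects = {}

    def get(self, sha1):
        return self.objects.get(sha1)

    def put(self, sha1, data):
        if not self.accepts(data):
            return False
        self.objects[sha1] = data
        return True


def read_object(sha1, local_store, helpers):
    """Reading: regular store first, then the helpers in order."""
    if sha1 in local_store:
        return local_store[sha1]
    for helper in helpers:
        data = helper.get(sha1)
        if data is not None:
            local_store[sha1] = data  # Git keeps a local copy
            return data
    raise KeyError(sha1)


def write_object(sha1, data, local_store, helpers):
    """Writing: helpers first, regular store only as a fallback."""
    for helper in helpers:
        if helper.put(sha1, data):
            return helper.name
    local_store[sha1] = data
    return "local"
```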

>> + - 'get_direct <sha1>'
>> +
>> +This instruction is similar to the other 'get_*' instructions except
>> +that no object should be sent from the helper to Git. Instead the
>> +helper should directly write the requested object into a loose object
>> +file in the ".git/objects" directory.
>> +
>> +After the helper has sent the "status=success" packet and the
>> +following flush packet in process mode, or after it has exited in the
>> +script mode, Git should look again for a loose object file with the
>> +requested sha1.
>
> When will git call get_direct vs one of the other get_* functions?

It is called just before exiting when git cannot find an object.
It is not exactly at the same place as other get_* instructions as I
tried to reuse your code and as it looks like it makes it easier to
retry the regular code after the odb helper code.

> Could the
> functionality of enabling a helper to populate objects into the regular
> object store be provided by having a ODB helper that returned the object
> data as requested by get_git_obj or get_raw_obj but also stored it in the
> regular object store as a loose object (or pack file) for future calls?

I am not sure I understand what you mean.
If a helper returns the object data as requested by get_git_obj or
get_raw_obj, then currently Git will itself store the object locally
in its regular object store, so it is redundant for the helper to also
store or try to store the object in the regular object store.


* Re: [PATCH v5 35/40] Add Documentation/technical/external-odb.txt
  2017-08-29 15:43     ` Christian Couder
@ 2017-08-30 12:50       ` Ben Peart
  2017-08-30 14:15         ` Christian Couder
  0 siblings, 1 reply; 73+ messages in thread
From: Ben Peart @ 2017-08-30 12:50 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder



On 8/29/2017 11:43 AM, Christian Couder wrote:
> On Mon, Aug 28, 2017 at 8:59 PM, Ben Peart <peartben@gmail.com> wrote:
>>
>> On 8/3/2017 5:19 AM, Christian Couder wrote:
>>>
>>> +Helpers
>>> +=======
>>> +
>>> +ODB helpers are commands that have to be registered using either the
>>> +"odb.<odbname>.subprocessCommand" or the "odb.<odbname>.scriptCommand"
>>> +config variables.
>>> +
>>> +Registering such a command tells Git that an external odb called
>>> +<odbname> exists and that the registered command should be used to
>>> +communicate with it.
>>
>> What order are the odb handlers called? Are they called before or after the
>> regular object store code for loose, pack and alternates?  Is the order
>> configurable?
> 
> For get_*_object instructions the regular code is called before the odb helpers.
> (So the odb helper code is at the end of stat_sha1_file() and of
> open_sha1_file() in sha1_file.c.)
> 
> For put_*_object instructions the regular code is called after the odb helpers.
> (So the odb helper code is at the beginning of write_sha1_file() in
> sha1_file.c.)
> 
> And no this order is not configurable, but of course it could be made
> configurable.
> 
>>> + - 'get_direct <sha1>'
>>> +
>>> +This instruction is similar as the other 'get_*' instructions except
>>> +that no object should be sent from the helper to Git. Instead the
>>> +helper should directly write the requested object into a loose object
>>> +file in the ".git/objects" directory.
>>> +
>>> +After the helper has sent the "status=success" packet and the
>>> +following flush packet in process mode, or after it has exited in the
>>> +script mode, Git should lookup again for a loose object file with the
>>> +requested sha1.
>>
>> When will git call get_direct vs one of the other get_* functions?
> 
> It is called just before exiting when git cannot find an object.
> It is not exactly at the same place as other get_* instructions as I
> tried to reuse your code and as it looks like it makes it easier to
> retry the regular code after the odb helper code.
> 
>> Could the
>> functionality of enabling a helper to populate objects into the regular
>> object store be provided by having a ODB helper that returned the object
>> data as requested by get_git_obj or get_raw_obj but also stored it in the
>> regular object store as a loose object (or pack file) for future calls?
> 
> I am not sure I understand what you mean.
> If a helper returns the object data as requested by get_git_obj or
> get_raw_obj, then currently Git will itself store the object locally
> in its regular object store, so it is redundant for the helper to also
> store or try to store the object in the regular object store.
> 

Doesn't this mean that objects will "leak out" into the regular object 
store as they are used?  For example, at checkout, all objects in the 
requested commit would be retrieved from the various object stores and 
if they came from a "large blob" ODB handler, they would get retrieved 
from the ODB handler and then written to the regular object store 
(presumably as a loose object).  From then on, the object would be 
retrieved from the regular object store.

This would seem to defeat the goal of enabling specialized object 
handlers to handle large or other "unusual" objects that git normally 
doesn't deal well with.


* Re: [PATCH v5 35/40] Add Documentation/technical/external-odb.txt
  2017-08-30 12:50       ` Ben Peart
@ 2017-08-30 14:15         ` Christian Couder
  0 siblings, 0 replies; 73+ messages in thread
From: Christian Couder @ 2017-08-30 14:15 UTC (permalink / raw)
  To: Ben Peart
  Cc: git, Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider, Eric Wong,
	Christian Couder

On Wed, Aug 30, 2017 at 2:50 PM, Ben Peart <peartben@gmail.com> wrote:
>
>
> On 8/29/2017 11:43 AM, Christian Couder wrote:
>>
>> On Mon, Aug 28, 2017 at 8:59 PM, Ben Peart <peartben@gmail.com> wrote:
>>>
>>>
>>> On 8/3/2017 5:19 AM, Christian Couder wrote:
>>>>
>>>>
>>>> +Helpers
>>>> +=======
>>>> +
>>>> +ODB helpers are commands that have to be registered using either the
>>>> +"odb.<odbname>.subprocessCommand" or the "odb.<odbname>.scriptCommand"
>>>> +config variables.
>>>> +
>>>> +Registering such a command tells Git that an external odb called
>>>> +<odbname> exists and that the registered command should be used to
>>>> +communicate with it.
>>>
>>>
>>> What order are the odb handlers called? Are they called before or after
>>> the
>>> regular object store code for loose, pack and alternates?  Is the order
>>> configurable?
>>
>>
>> For get_*_object instructions the regular code is called before the odb
>> helpers.
>> (So the odb helper code is at the end of stat_sha1_file() and of
>> open_sha1_file() in sha1_file.c.)
>>
>> For put_*_object instructions the regular code is called after the odb
>> helpers.
>> (So the odb helper code is at the beginning of write_sha1_file() in
>> sha1_file.c.)
>>
>> And no this order is not configurable, but of course it could be made
>> configurable.
>>
>>>> + - 'get_direct <sha1>'
>>>> +
>>>> +This instruction is similar as the other 'get_*' instructions except
>>>> +that no object should be sent from the helper to Git. Instead the
>>>> +helper should directly write the requested object into a loose object
>>>> +file in the ".git/objects" directory.
>>>> +
>>>> +After the helper has sent the "status=success" packet and the
>>>> +following flush packet in process mode, or after it has exited in the
>>>> +script mode, Git should lookup again for a loose object file with the
>>>> +requested sha1.
>>>
>>>
>>> When will git call get_direct vs one of the other get_* functions?
>>
>>
>> It is called just before exiting when git cannot find an object.
>> It is not exactly at the same place as other get_* instructions as I
>> tried to reuse your code and as it looks like it makes it easier to
>> retry the regular code after the odb helper code.
>>
>>> Could the
>>> functionality of enabling a helper to populate objects into the regular
>>> object store be provided by having a ODB helper that returned the object
>>> data as requested by get_git_obj or get_raw_obj but also stored it in the
>>> regular object store as a loose object (or pack file) for future calls?
>>
>>
>> I am not sure I understand what you mean.
>> If a helper returns the object data as requested by get_git_obj or
>> get_raw_obj, then currently Git will itself store the object locally
>> in its regular object store, so it is redundant for the helper to also
>> store or try to store the object in the regular object store.
>>
>
> Doesn't this mean that objects will "leak out" into the regular object store
> as they are used?  For example, at checkout, all objects in the requested
> commit would be retrieved from the various object stores and if they came
> from a "large blob" ODB handler, they would get retrieved from the ODB
> handler and then written to the regular object store (presumably as a loose
> object).  From then on, the object would be retrieved from the regular
> object store.
>
> This would seem to defeat the goal of enabling specialized object handlers
> to handle large or other "unusual" objects that git normally doesn't deal
> well with.

Yeah, I agree that storing the objects in the regular object store
should not be done in all cases.
There should be a way to control that.
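
As an illustration of the concern, here is a small shell sketch of the
behaviour discussed above together with one possible control. The
cache_locally variable is purely hypothetical; nothing like it exists in
the series as posted, and the directories and the fetch_object function
are stand-ins as well.

```shell
# Hypothetical stand-ins for the regular store and the external odb.
local_store=$(mktemp -d)
helper_store=$(mktemp -d)
# Hypothetical knob, only here to show what "control" could mean.
cache_locally=true

fetch_object () {
	if test -f "$local_store/$1"
	then
		cat "$local_store/$1"
		return
	fi
	test -f "$helper_store/$1" || return 1
	if test "$cache_locally" = true
	then
		# Behaviour described in the thread: the object "leaks"
		# into the regular store the first time it is used.
		cp "$helper_store/$1" "$local_store/$1"
	fi
	cat "$helper_store/$1"
}

printf 'large blob\n' >"$helper_store/abc123"
fetch_object abc123 >/dev/null
ls "$local_store"
```

With cache_locally=true the listing shows abc123 in the regular store;
with any other value the object would stay only in the external odb.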


* Re: [PATCH v5 11/40] odb-helper: add odb_helper_init() to send 'init' instruction
  2017-08-03  9:18 ` [PATCH v5 11/40] odb-helper: add odb_helper_init() to send 'init' instruction Christian Couder
@ 2017-09-10 12:12   ` Lars Schneider
  2017-09-14  7:18     ` Christian Couder
  0 siblings, 1 reply; 73+ messages in thread
From: Lars Schneider @ 2017-09-10 12:12 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Eric Wong, Christian Couder


> On 03 Aug 2017, at 10:18, Christian Couder <christian.couder@gmail.com> wrote:
> 
> Let's add an odb_helper_init() function to send an 'init'
> instruction to the helpers. This 'init' instruction is
> especially useful to get the capabilities that are supported
> by the helpers.
> 
> So while at it, let's also add a parse_capabilities()
> function to parse them and a supported_capabilities
> variable in struct odb_helper to store them.
> 
> Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
> ---
> ...
> 
> +static void parse_capabilities(char *cap_buf,
> +			       unsigned int *supported_capabilities,
> +			       const char *process_name)
> +{
> +	struct string_list cap_list = STRING_LIST_INIT_NODUP;
> +
> +	string_list_split_in_place(&cap_list, cap_buf, '=', 1);
> +
> +	if (cap_list.nr == 2 && !strcmp(cap_list.items[0].string, "capability")) {
> +		const char *cap_name = cap_list.items[1].string;
> +
> +		if (!strcmp(cap_name, "get_git_obj")) {
> +			*supported_capabilities |= ODB_HELPER_CAP_GET_GIT_OBJ;
> +		} else if (!strcmp(cap_name, "get_raw_obj")) {
> +			*supported_capabilities |= ODB_HELPER_CAP_GET_RAW_OBJ;
> +		} else if (!strcmp(cap_name, "get_direct")) {
> +			*supported_capabilities |= ODB_HELPER_CAP_GET_DIRECT;
> +		} else if (!strcmp(cap_name, "put_git_obj")) {
> +			*supported_capabilities |= ODB_HELPER_CAP_PUT_GIT_OBJ;
> +		} else if (!strcmp(cap_name, "put_raw_obj")) {
> +			*supported_capabilities |= ODB_HELPER_CAP_PUT_RAW_OBJ;
> +		} else if (!strcmp(cap_name, "put_direct")) {
> +			*supported_capabilities |= ODB_HELPER_CAP_PUT_DIRECT;
> +		} else if (!strcmp(cap_name, "have")) {
> +			*supported_capabilities |= ODB_HELPER_CAP_HAVE;
> +		} else {
> +			warning("external process '%s' requested unsupported read-object capability '%s'",
> +				process_name, cap_name);
> +		}

In 1514c8ed ("convert: refactor capabilities negotiation", 2017-06-30) I introduced
a simpler version of the capabilities negotiation. Maybe useful for you here, too? 

- Lars


* Re: [PATCH v5 12/40] t0400: add 'put_raw_obj' instruction to odb-helper script
  2017-08-03  9:18 ` [PATCH v5 12/40] t0400: add 'put_raw_obj' instruction to odb-helper script Christian Couder
@ 2017-09-10 12:12   ` Lars Schneider
  2017-09-14  7:09     ` Christian Couder
  0 siblings, 1 reply; 73+ messages in thread
From: Lars Schneider @ 2017-09-10 12:12 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Eric Wong, Christian Couder


> On 03 Aug 2017, at 10:18, Christian Couder <christian.couder@gmail.com> wrote:
> 
> To properly test passing objects from Git to an external odb
> we need an odb-helper script that supports a 'put'
> capability/instruction.
> 
> For now we will support only sending raw blobs, so the
> supported capability/instruction will be 'put_raw_obj'.

What other kinds of blobs do you imagine we could send?


> While at it let's add a test to check that our odb-helper
> script works well.
> 
> Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
> ---
> t/t0400-external-odb.sh | 24 ++++++++++++++++++++++++
> 1 file changed, 24 insertions(+)
> 
> diff --git a/t/t0400-external-odb.sh b/t/t0400-external-odb.sh
> index ed89f3ab40..3fa0449883 100755
> --- a/t/t0400-external-odb.sh
> +++ b/t/t0400-external-odb.sh
> @@ -7,10 +7,15 @@ test_description='basic tests for external object databases'
> ALT_SOURCE="$PWD/alt-repo/.git"
> export ALT_SOURCE
> write_script odb-helper <<\EOF
> +die() {
> +	printf >&2 "%s\n" "$@"
> +	exit 1
> +}
> GIT_DIR=$ALT_SOURCE; export GIT_DIR
> case "$1" in
> init)
> 	echo "capability=get_git_obj"
> +	echo "capability=put_raw_obj"
> 	echo "capability=have"
> 	;;
> have)
> @@ -20,6 +25,16 @@ have)
> get_git_obj)
> 	cat "$GIT_DIR"/objects/$(echo $2 | sed 's#..#&/#')
> 	;;
> +put_raw_obj)
> +	sha1="$2"
> +	size="$3"
> +	kind="$4"
> +	writen=$(git hash-object -w -t "$kind" --stdin)
> +	test "$writen" = "$sha1" || die "bad sha1 passed '$sha1' vs writen '$writen'"

Typo? Should it be "written"?

- Lars


* Re: [PATCH v5 00/40] Add initial experimental external ODB support
  2017-08-03  9:18 [PATCH v5 00/40] Add initial experimental external ODB support Christian Couder
                   ` (39 preceding siblings ...)
  2017-08-03  9:19 ` [PATCH v5 40/40] Add t0430 to test cloning using bundles Christian Couder
@ 2017-09-10 12:30 ` Lars Schneider
  2017-09-14  7:02   ` Christian Couder
  40 siblings, 1 reply; 73+ messages in thread
From: Lars Schneider @ 2017-09-10 12:30 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Eric Wong, Christian Couder


> On 03 Aug 2017, at 10:18, Christian Couder <christian.couder@gmail.com> wrote:
> 
> ...
> 
> * The "helpers" (registered commands)
> 
> Each helper manages access to one external ODB.
> 
> There are 2 different modes for helper:
> 
>  - Helpers configured using "odb.<odbname>.scriptCommand" are
>    launched each time Git wants to communicate with the <odbname>
>    external ODB. This is called "script mode".
> 
>  - Helpers configured using "odb.<odbname>.subprocessCommand" are
>    launched once as a sub-process (using sub-process.h), and
>    Git communicates with them using packet lines. This is called
>    "process mode".

I am curious, why would we support two modes? Wouldn't that increase
the maintenance cost? Wouldn't the subprocess command be superior?
I imagine the script mode eases testing, right?!


> ...
> 
> These odb refs point to a blob that is stored in the Git
> repository and contain information about the blob stored in the
> external odb. This information can be specific to the external odb.
> The repos can then share this information using commands like:
> 
> `git fetch origin "refs/odbs/<odbname>/*:refs/odbs/<odbname>/*"`
> 
> At the end of the current patch series, "git clone" is taught a
> "--initial-refspec" option, that asks it to first fetch some specified
> refs. This is used in the tests to fetch the odb refs first.
> 
> This way only one "git clone" command can setup a repo using the
> external ODB mechanism as long as the right helper is installed on the
> machine and as long as the following options are used:
> 
>  - "--initial-refspec <odbrefspec>" to fetch the odb refspec
>  - "-c odb.<odbname>.command=<helper>" to configure the helper

The "odb" config could, of course, go into the global git config. 
The odbrefspec is optional, right?

I have the impression there are a number of topics on the list
that tackle the "many/big objects in a Git repo" problem. Is
there a write up about the status of them, how they relate
to each other, and what the current problems are? 
I found the following but it looks abandoned:
https://github.com/jrn/git-large-repositories

- Lars


* Re: [PATCH v5 00/40] Add initial experimental external ODB support
  2017-09-10 12:30 ` [PATCH v5 00/40] Add initial experimental external ODB support Lars Schneider
@ 2017-09-14  7:02   ` Christian Couder
  0 siblings, 0 replies; 73+ messages in thread
From: Christian Couder @ 2017-09-14  7:02 UTC (permalink / raw)
  To: Lars Schneider
  Cc: git, Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Eric Wong, Christian Couder

On Sun, Sep 10, 2017 at 2:30 PM, Lars Schneider
<larsxschneider@gmail.com> wrote:
>
>> On 03 Aug 2017, at 10:18, Christian Couder <christian.couder@gmail.com> wrote:
>>
>> ...
>>
>> * The "helpers" (registered commands)
>>
>> Each helper manages access to one external ODB.
>>
>> There are 2 different modes for helper:
>>
>>  - Helpers configured using "odb.<odbname>.scriptCommand" are
>>    launched each time Git wants to communicate with the <odbname>
>>    external ODB. This is called "script mode".
>>
>>  - Helpers configured using "odb.<odbname>.subprocessCommand" are
>>    launched once as a sub-process (using sub-process.h), and
>>    Git communicates with them using packet lines. This is called
>>    "process mode".
>
> I am curious, why would we support two modes? Wouldn't that increase
> the maintenance cost? Wouldn't the subprocess command be superior?
> I imagine the script mode eases testing, right?!

The script mode makes it much easier to write some helpers. For
example, as shown in t0430 at the end of the patch series, a helper
for a restartable bundle-based clone could basically look like this:

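# die() is defined as in the t0400 helper script.
die() {
    printf >&2 "%s\n" "$@"
    exit 1
}

case "$1" in
init)
    # Look up the odb ref that points at the bundle information blob.
    ref_hash=$(git rev-parse refs/odbs/magic/bundle) ||
    die "couldn't find refs/odbs/magic/bundle"
    GIT_NO_EXTERNAL_ODB=1 git cat-file blob "$ref_hash" >bundle_info ||
    die "couldn't get blob $ref_hash"
    # Extract the bundle URL from the blob and download the bundle.
    bundle_url=$(sed -e 's/bundle url: //' bundle_info)
    curl "$bundle_url" -o bundle_file ||
    die "curl '$bundle_url' failed"
    # Unpack the bundle into the repository.
    GIT_NO_EXTERNAL_ODB=1 git bundle unbundle bundle_file >unbundling_info ||
    die "unbundling 'bundle_file' failed"
    ;;
esac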
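
By contrast, a process mode helper has to frame everything it sends as
packet lines. The following is a minimal sketch of that framing, assuming
Git's usual pkt-line format (four hex digits of total length, then the
payload, with "0000" as the flush packet). The function names pkt_line
and pkt_flush are made up for this example.

```shell
# Encode one pkt-line: 4 hex digits giving the total length (payload
# plus the 4 length bytes themselves), followed by the payload.
# ${#1} counts characters, so this sketch assumes ASCII payloads.
pkt_line () {
	printf '%04x%s' $((${#1} + 4)) "$1"
}

# A flush packet is the special length "0000" with no payload.
pkt_flush () {
	printf '0000'
}

# A literal newline, since payload lines usually end with one.
nl='
'

# A status packet followed by a flush packet.
pkt_line "status=success$nl"
pkt_flush
```

This prints "0013status=success", a newline, and then "0000". In script
mode none of this is needed, since the helper simply exits with a status.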

>> These odb refs point to a blob that is stored in the Git
>> repository and contain information about the blob stored in the
>> external odb. This information can be specific to the external odb.
>> The repos can then share this information using commands like:
>>
>> `git fetch origin "refs/odbs/<odbname>/*:refs/odbs/<odbname>/*"`
>>
>> At the end of the current patch series, "git clone" is taught a
>> "--initial-refspec" option, that asks it to first fetch some specified
>> refs. This is used in the tests to fetch the odb refs first.
>>
>> This way only one "git clone" command can setup a repo using the
>> external ODB mechanism as long as the right helper is installed on the
>> machine and as long as the following options are used:
>>
>>  - "--initial-refspec <odbrefspec>" to fetch the odb refspec
>>  - "-c odb.<odbname>.command=<helper>" to configure the helper
>
> The "odb" config could, of course, go into the global git config.

Sure.

> The odbrefspec is optional, right?

Using "--initial-refspec <odbrefspec>" is optional. There will be more
information in the documentation about this option in the next version
of the series.

> I have the impression there are a number of topics on the list
> that tackle the "many/big objects in a Git repo" problem. Is
> there a write up about the status of them, how they relate
> to each other, and what the current problems are?
> I found the following but it looks abandoned:
> https://github.com/jrn/git-large-repositories

Yeah, it could be interesting to discuss all these topics together. On
the other hand people working on existing patch series, like me, have
to work on them and post new versions, as just discussing the topics
is not enough to move things forward.
Anyway Junio and Jonathan Tan also asked me questions about how my
work relates to Jonathan's, so I will reply to them hopefully soon...


* Re: [PATCH v5 12/40] t0400: add 'put_raw_obj' instruction to odb-helper script
  2017-09-10 12:12   ` Lars Schneider
@ 2017-09-14  7:09     ` Christian Couder
  0 siblings, 0 replies; 73+ messages in thread
From: Christian Couder @ 2017-09-14  7:09 UTC (permalink / raw)
  To: Lars Schneider
  Cc: git, Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Eric Wong, Christian Couder

On Sun, Sep 10, 2017 at 2:12 PM, Lars Schneider
<larsxschneider@gmail.com> wrote:
>
>> On 03 Aug 2017, at 10:18, Christian Couder <christian.couder@gmail.com> wrote:
>>
>> To properly test passing objects from Git to an external odb
>> we need an odb-helper script that supports a 'put'
>> capability/instruction.
>>
>> For now we will support only sending raw blobs, so the
>> supported capability/instruction will be 'put_raw_obj'.
>
> What other kind of blobs do you imagine could we send?

As with the get instructions, there could be a 'put_git_obj' instruction
to send blobs in the Git format and a 'put_direct' instruction to have
the helper read directly from the Git object store.

>> +put_raw_obj)
>> +     sha1="$2"
>> +     size="$3"
>> +     kind="$4"
>> +     writen=$(git hash-object -w -t "$kind" --stdin)
>> +     test "$writen" = "$sha1" || die "bad sha1 passed '$sha1' vs writen '$writen'"
>
> Typo? Should it be "written"?

Yeah, thanks. It's fixed in the current version.
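
For reference, the sanity check done by the 'put_raw_obj' arm can be
sketched without a repository. This is an illustration only: it handles
blobs, the function names blob_sha1 and check_put are made up, and it
assumes sha1sum from coreutils is available. It relies on the fact that
a blob's name is the SHA-1 of the header "blob <size>" plus a NUL byte,
followed by the raw data.

```shell
# Compute the object name Git would assign to a blob read from stdin.
blob_sha1 () {
	tmp=$(mktemp) || return 1
	cat >"$tmp"
	size=$(wc -c <"$tmp" | tr -d ' ')
	{ printf 'blob %s\000' "$size"; cat "$tmp"; } |
	sha1sum | cut -d ' ' -f 1
	rm -f "$tmp"
}

# Same idea as the check in the helper: compare the name that was
# passed in with the name computed from the data actually received.
check_put () {
	written=$(blob_sha1)
	test "$written" = "$1" || {
		echo >&2 "bad sha1 passed '$1' vs written '$written'"
		return 1
	}
}

printf 'hello\n' | check_put ce013625030ba8dba906f756967f9e9ca394464a &&
echo ok
```

The real helper gets the same guarantee from "git hash-object -w", which
also stores the object.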


* Re: [PATCH v5 11/40] odb-helper: add odb_helper_init() to send 'init' instruction
  2017-09-10 12:12   ` Lars Schneider
@ 2017-09-14  7:18     ` Christian Couder
  0 siblings, 0 replies; 73+ messages in thread
From: Christian Couder @ 2017-09-14  7:18 UTC (permalink / raw)
  To: Lars Schneider
  Cc: git, Junio C Hamano, Jeff King, Ben Peart, Jonathan Tan,
	Nguyen Thai Ngoc Duy, Mike Hommey, Eric Wong, Christian Couder

On Sun, Sep 10, 2017 at 2:12 PM, Lars Schneider
<larsxschneider@gmail.com> wrote:
>
>> On 03 Aug 2017, at 10:18, Christian Couder <christian.couder@gmail.com> wrote:
>>
>> +static void parse_capabilities(char *cap_buf,
>> +                            unsigned int *supported_capabilities,
>> +                            const char *process_name)
>> +{
>> +     struct string_list cap_list = STRING_LIST_INIT_NODUP;
>> +
>> +     string_list_split_in_place(&cap_list, cap_buf, '=', 1);
>> +
>> +     if (cap_list.nr == 2 && !strcmp(cap_list.items[0].string, "capability")) {
>> +             const char *cap_name = cap_list.items[1].string;
>> +
>> +             if (!strcmp(cap_name, "get_git_obj")) {
>> +                     *supported_capabilities |= ODB_HELPER_CAP_GET_GIT_OBJ;
>> +             } else if (!strcmp(cap_name, "get_raw_obj")) {
>> +                     *supported_capabilities |= ODB_HELPER_CAP_GET_RAW_OBJ;
>> +             } else if (!strcmp(cap_name, "get_direct")) {
>> +                     *supported_capabilities |= ODB_HELPER_CAP_GET_DIRECT;
>> +             } else if (!strcmp(cap_name, "put_git_obj")) {
>> +                     *supported_capabilities |= ODB_HELPER_CAP_PUT_GIT_OBJ;
>> +             } else if (!strcmp(cap_name, "put_raw_obj")) {
>> +                     *supported_capabilities |= ODB_HELPER_CAP_PUT_RAW_OBJ;
>> +             } else if (!strcmp(cap_name, "put_direct")) {
>> +                     *supported_capabilities |= ODB_HELPER_CAP_PUT_DIRECT;
>> +             } else if (!strcmp(cap_name, "have")) {
>> +                     *supported_capabilities |= ODB_HELPER_CAP_HAVE;
>> +             } else {
>> +                     warning("external process '%s' requested unsupported read-object capability '%s'",
>> +                             process_name, cap_name);
>> +             }
>
> In 1514c8ed ("convert: refactor capabilities negotiation", 2017-06-30) I introduced
> a simpler version of the capabilities negotiation. Maybe useful for you here, too?

Yeah, actually there is also fa64a2fdbe (sub-process: refactor
handshake to common function, 2017-07-26) that Jonathan Tan wrote on
top of your changes and that adds subprocess_handshake(). So the
current code is using it like this:

static int start_object_process_fn(struct subprocess_entry *subprocess)
{
    static int versions[] = {1, 0};
    static struct subprocess_capability capabilities[] = {
        { "get_git_obj", ODB_HELPER_CAP_GET_GIT_OBJ },
        { "get_raw_obj", ODB_HELPER_CAP_GET_RAW_OBJ },
        { "get_direct",  ODB_HELPER_CAP_GET_DIRECT  },
        { "put_git_obj", ODB_HELPER_CAP_PUT_GIT_OBJ },
        { "put_raw_obj", ODB_HELPER_CAP_PUT_RAW_OBJ },
        { "put_direct",  ODB_HELPER_CAP_PUT_DIRECT  },
        { "have",        ODB_HELPER_CAP_HAVE },
        { NULL, 0 }
    };
    struct object_process *entry = (struct object_process *)subprocess;
    return subprocess_handshake(subprocess, "git-read-object", versions, NULL,
                    capabilities,
                    &entry->supported_capabilities);
}
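
The same table-driven idea, mapping each announced capability name to one
bit of a mask, can be sketched in shell for readers who want to experiment
with the "capability=<name>" lines a script mode helper prints on 'init'.
The bit values below are hypothetical (the real ODB_HELPER_CAP_* values
are not shown in this thread), and the cap_bit and parse_capabilities
functions are made up for the example.

```shell
# Hypothetical bit values, one per capability name.
cap_bit () {
	case "$1" in
	get_git_obj) echo 1 ;;
	get_raw_obj) echo 2 ;;
	get_direct)  echo 4 ;;
	put_git_obj) echo 8 ;;
	put_raw_obj) echo 16 ;;
	put_direct)  echo 32 ;;
	have)        echo 64 ;;
	*) return 1 ;;
	esac
}

# Read "capability=<name>" lines on stdin and print the resulting mask.
# Unknown names only produce a warning; other lines are ignored.
parse_capabilities () {
	mask=0
	while IFS= read -r line
	do
		case "$line" in
		capability=*)
			name=${line#capability=}
			if bit=$(cap_bit "$name")
			then
				mask=$((mask | bit))
			else
				echo >&2 "unsupported capability '$name'"
			fi
			;;
		esac
	done
	echo "$mask"
}

# The three capabilities announced by the t0400 helper script.
printf 'capability=get_git_obj\ncapability=put_raw_obj\ncapability=have\n' |
parse_capabilities
```

With these made-up values the example prints 81 (1 + 16 + 64).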


* Re: [PATCH v5 19/40] lib-httpd: add upload.sh
  2017-08-03 20:07   ` Junio C Hamano
@ 2017-09-14  7:43     ` Christian Couder
  0 siblings, 0 replies; 73+ messages in thread
From: Christian Couder @ 2017-09-14  7:43 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Jeff King, Ben Peart, Jonathan Tan, Nguyen Thai Ngoc Duy,
	Mike Hommey, Lars Schneider, Eric Wong, Christian Couder

On Thu, Aug 3, 2017 at 10:07 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Christian Couder <christian.couder@gmail.com> writes:
>
>> +OLDIFS="$IFS"
>> +IFS='&'
>> +set -- $QUERY_STRING
>> +IFS="$OLDIFS"
>> +
>> +while test $# -gt 0
>> +do
>> +    key=${1%=*}
>> +    val=${1#*=}
>
> When you see that ${V%X*} and ${V#*X} appear in a pair for the same
> variable V and same delimiter X, it almost always indicates a bug
> waiting to happen.
>
> What's the definition of "key" here?  A member of known set of short
> tokens, all of which consists only of alphanumeric, or something?

Yeah, the key can be only "sha1", "type", "size" or "delete" as can be
seen later in the code.

> Even if you do not currently plan to deal with a value with '=' in
> it, it may be prudent to double '%' above (and do not double '#').

Yeah I agree. Thanks for spotting this!
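
A small sketch of the difference, using a made-up query parameter whose
value contains '=' and a hypothetical split_pair function:

```shell
# Split "key=value" at the first '=' so that a value containing '='
# stays intact.
split_pair () {
	key=${1%%=*}	# longest suffix match: drop from the first '=' on
	val=${1#*=}	# shortest prefix match: drop up to the first '='
}

split_pair "sha1=abc=def"
echo "key='$key' val='$val'"

# With a single '%' the key would keep part of the value.
pair="sha1=abc=def"
echo "single %: key='${pair%=*}'"
```

The first echo prints key='sha1' val='abc=def', while the single '%'
variant yields key='sha1=abc'. For a bare token such as "delete", both
expansions leave the string unchanged, which is harmless here because
only the key is looked at in that case.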

> Style: indent your shell script with tabs.

Sure.

>> +    case "$key" in
>> +     "sha1") sha1="$val" ;;
>> +     "type") type="$val" ;;
>> +     "size") size="$val" ;;
>> +     "delete") delete=1 ;;
>> +     *) echo >&2 "unknown key '$key'" ;;
>> +    esac
>
> Indent your shell script with tabs; case/esac and the labels used
> for each case arms all align at the same column.

Yeah, it will be properly indented in the version I will send soon.


* Re: [PATCH v5 25/40] external-odb: add 'get_direct' support
  2017-08-03 21:40   ` Junio C Hamano
@ 2017-09-14  8:39     ` Christian Couder
  2017-09-14 18:19       ` Jonathan Tan
  0 siblings, 1 reply; 73+ messages in thread
From: Christian Couder @ 2017-09-14  8:39 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Jeff King, Ben Peart, Jonathan Tan, Nguyen Thai Ngoc Duy,
	Mike Hommey, Lars Schneider, Eric Wong, Christian Couder

On Thu, Aug 3, 2017 at 11:40 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Christian Couder <christian.couder@gmail.com> writes:
>
>> This implements the 'get_direct' capability/instruction that makes
>> it possible for external odb helper scripts to pass blobs to Git
>> by directly writing them as loose objects files.
>
> I am not sure if the assumption is made clear in this series, but I
> am (perhaps incorrectly) guessing that it is assumed that the
> intended use of this feature is to offload access to large blobs
> by not including them in the initial clone.

Yeah, it could be used for that, but that's not the only interesting use case.

It could also be used for example if the working tree contains a huge
number of blobs and it is better to download only the blobs that are
needed when they are needed. In fact the code for 'get_direct' was
taken from Ben Peart's "read-object" patch series (actually from an
earlier version of this patch series):

https://public-inbox.org/git/20170714132651.170708-1-benpeart@microsoft.com/

> So from that point of
> view, I think it makes tons of sense to let the external helper to
> directly populate the database bypassing Git (i.e. instead of
> feeding data stream and have Git store it) like this "direct" method
> does.
>
> How does this compare with (and how well does this work with) what
> Jonathan Tan is doing recently?

From the following email:

https://public-inbox.org/git/20170804145113.5ceafafa@twelve2.svl.corp.google.com/

it looks like his work is fundamentally about changing the rules of
connectivity checks. Objects are split between "homegrown" objects and
"imported" objects which are in separate pack files. Then references
to imported objects are not checked during connectivity check.

I think changing connectivity rules is not necessary to make something
like external odb work. For example when fetching a pack that refers
to objects that are in an external odb, if access to this external odb
has been configured, then the connectivity check will pass as the
missing objects in the pack will be seen as already part of the repo.

Yeah, if some commands like fsck are used, then possibly all the
objects will have to be requested from the external odb, as it may not
be possible to fully check all the objects, especially the blobs,
without accessing all their data. But I think this is a problem that
could be dealt with in different ways. For example we could develop
specific options in fsck so that it doesn't check the sha1 of objects
that are marked with some specific attributes, or that are stored in
external odbs, or that are bigger than some size.


* Re: [PATCH v5 13/40] external odb: add 'put_raw_obj' support
  2017-08-03 19:50   ` Junio C Hamano
@ 2017-09-14  9:17     ` Christian Couder
  0 siblings, 0 replies; 73+ messages in thread
From: Christian Couder @ 2017-09-14  9:17 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Jeff King, Ben Peart, Jonathan Tan, Nguyen Thai Ngoc Duy,
	Mike Hommey, Lars Schneider, Eric Wong, Christian Couder

On Thu, Aug 3, 2017 at 9:50 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Christian Couder <christian.couder@gmail.com> writes:
>
>> Add support for a 'put_raw_obj' capability/instruction to send new
>> objects to an external odb. Objects will be sent as they are (in
>> their 'raw' format). They will not be converted to Git objects.
>>
>> For now any new Git object (blob, tree, commit, ...) would be sent
>> if 'put_raw_obj' is supported by an odb helper. This is not a great
>> default, but let's leave it to following commits to tweak that.
>>
>> Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
>> ---
>
> I thought in an earlier step that I saw this thing initialized in
> the codepath that adds alternate object stores, which are read-only
> places we "borrow" from.  Being able to write into it is good, but
> conceptually it no longer feels correct to initialize it from the
> alternate object database initialization codepath.
>
> Another way to say it is that an object store, whether it is local
> or external, is not "alt" if it will result in storing new objects
> we locally create.  It's just an extension of our local object
> store.

I guess you are talking about the following code in "[PATCH v5 10/40]
Add initial external odb support":

+void prepare_external_alt_odb(void)
+{
+       static int linked_external;
+       const char *path;
+
+       if (linked_external)
+               return;
+
+       path = external_odb_root();
+       if (!access(path, F_OK)) {
+               link_alt_odb_entry(path, NULL, 0, "");
+               linked_external = 1;
+       }
+}
+
 void prepare_alt_odb(void)
 {
        const char *alt;
@@ -650,6 +666,7 @@ void prepare_alt_odb(void)
        link_alt_odb_entries(alt, strlen(alt), PATH_SEP, NULL, 0);

        read_info_alternates(get_object_directory(), 0);
+       prepare_external_alt_odb();
 }

Would it be ok if I do the following:

- rename prepare_external_alt_odb() to just prepare_external_odb(), as
this would avoid confusion between alt_odbs and external odbs
- remove the call to prepare_external_odb() in prepare_alt_odb()
- add a prepare_alt_and_external_odb() that just calls
prepare_alt_odb() and then prepare_external_odb()
- replace all the calls to prepare_alt_odb() with calls to
prepare_alt_and_external_odb()

?
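
The four steps proposed above can be sketched as follows. This is a minimal, self-contained illustration rather than the actual patch: external_odb_root() and prepare_alt_odb() are replaced here by hypothetical stubs, and the call to link_alt_odb_entry() is replaced by a counter so that the example compiles outside of the Git tree.

```c
#include <unistd.h>

/* Counters standing in for Git's real linking work (hypothetical). */
int alt_prepared;
int external_linked;

/* Hypothetical stand-in: the real function returns the external odb root. */
const char *external_odb_root(void)
{
	return ".";
}

/* Stub: the real prepare_alt_odb() links the alternate object stores. */
void prepare_alt_odb(void)
{
	alt_prepared++;
}

/*
 * Renamed from prepare_external_alt_odb(). Same control flow as in
 * patch 10/40, except that linking is replaced by a counter.
 */
void prepare_external_odb(void)
{
	static int linked_external;
	const char *path;

	if (linked_external)
		return;

	path = external_odb_root();
	if (!access(path, F_OK)) {
		/* link_alt_odb_entry(path, NULL, 0, "") in the patch */
		external_linked++;
		linked_external = 1;
	}
}

/* New wrapper that callers of prepare_alt_odb() would switch to. */
void prepare_alt_and_external_odb(void)
{
	prepare_alt_odb();
	prepare_external_odb();
}
```

With this split, prepare_alt_odb() no longer knows anything about external odbs, which is the separation asked for in the review.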


* Re: [PATCH v5 14/40] external-odb: accept only blobs for now
  2017-08-03 19:52   ` Junio C Hamano
@ 2017-09-14  9:59     ` Christian Couder
  0 siblings, 0 replies; 73+ messages in thread
From: Christian Couder @ 2017-09-14  9:59 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Jeff King, Ben Peart, Jonathan Tan, Nguyen Thai Ngoc Duy,
	Mike Hommey, Lars Schneider, Eric Wong, Christian Couder

On Thu, Aug 3, 2017 at 9:52 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Christian Couder <christian.couder@gmail.com> writes:
>
>> The mechanism to decide which blobs should be sent to which
>> external object database will be very simple for now.
>> If the external odb helper supports any "put_*" instruction
>> all the new blobs will be sent to it.
>>
>> Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
>> ---
>>  external-odb.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/external-odb.c b/external-odb.c
>> index 82fac702e8..a4f8c72e1c 100644
>> --- a/external-odb.c
>> +++ b/external-odb.c
>> @@ -124,6 +124,10 @@ int external_odb_put_object(const void *buf, size_t len,
>>  {
>>       struct odb_helper *o;
>>
>> +     /* For now accept only blobs */
>> +     if (strcmp(type, "blob"))
>> +             return 1;
>> +
>
> I somehow doubt that a policy decision like this should be made at
> this layer.  Shouldn't it be encoded in the capability the other
> side supports, or determined at runtime per each individual object
> when a "put" is attempted (i.e. allow the other side to say "You
> tell me that you want me to store an object of type X and size Y;
> I cannot do that, sorry")?

I agree that it would be conceptually better to be able to support
other kinds of objects in external odbs, but realistically most use
cases for 'get_*' and 'put_*' instructions are for storing/retrieving
blobs, as other kinds of objects are in specific formats that are well
supported by the current object store.

I also agree that it would be a nice feature if external odbs could
decide by themselves which objects they accept, and I really want to
leave the door open to a future improvement implementing that using
the capability mechanism or perhaps another mechanism. For now though,
in "[PATCH v5 34/40] external-odb: use 'odb=magic' attribute to mark
odb blobs", the attribute system is used to decide which blobs are put
into which external odb, as it is probably good enough for many use
cases. It is also simple to implement in Git and makes helpers simpler
to implement.
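
For illustration only, the future improvement discussed above (letting each helper decide per object whether it accepts a "put") could look roughly like the sketch below. Nothing here is part of the series: struct odb_helper_sketch, its accepts() callback and put_object_sketch() are hypothetical names, and the return convention (0 when a helper stored the object, 1 when none did) is an assumption that mirrors the "return 1" used for refusal in patch 14/40.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper description; not the series' struct odb_helper. */
struct odb_helper_sketch {
	const char *name;
	/* Policy hook: non-zero if the helper takes this type and size. */
	int (*accepts)(const char *type, size_t len);
	size_t stored;	/* number of objects accepted so far */
	struct odb_helper_sketch *next;
};

/* Example policy equivalent to the hard-coded check in patch 14/40. */
int accept_blobs_only(const char *type, size_t len)
{
	(void)len;
	return !strcmp(type, "blob");
}

/*
 * Offer the object to each helper in turn. Returns 0 if a helper
 * stored it, 1 if every helper refused, in which case the caller
 * would fall back to the local object store.
 */
int put_object_sketch(struct odb_helper_sketch *helpers,
		      const void *buf, size_t len, const char *type)
{
	struct odb_helper_sketch *o;

	(void)buf;	/* a real helper would be sent the data here */
	for (o = helpers; o; o = o->next) {
		if (!o->accepts(type, len))
			continue;
		o->stored++;
		return 0;
	}
	return 1;
}
```

The point of the sketch is only that the policy moves out of external_odb_put_object() and into something each helper controls; whether that is negotiated as a capability or answered at runtime is left open, as in the discussion above.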


* Re: [PATCH v5 10/40] Add initial external odb support
  2017-08-03 19:34   ` Junio C Hamano
  2017-08-03 20:17     ` Jeff King
@ 2017-09-14 10:14     ` Christian Couder
  1 sibling, 0 replies; 73+ messages in thread
From: Christian Couder @ 2017-09-14 10:14 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Jeff King, Ben Peart, Jonathan Tan, Nguyen Thai Ngoc Duy,
	Mike Hommey, Lars Schneider, Eric Wong, Christian Couder

On Thu, Aug 3, 2017 at 9:34 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Christian Couder <christian.couder@gmail.com> writes:
>
>> diff --git a/external-odb.h b/external-odb.h
>> new file mode 100644
>> index 0000000000..9989490c9e
>> --- /dev/null
>> +++ b/external-odb.h
>> @@ -0,0 +1,8 @@
>> +#ifndef EXTERNAL_ODB_H
>> +#define EXTERNAL_ODB_H
>> +
>> +const char *external_odb_root(void);
>> +int external_odb_has_object(const unsigned char *sha1);
>> +int external_odb_get_object(const unsigned char *sha1);
>
> Even though ancient codebase of ours deliberately omitted them, I
> think our recent trend is to explicitly spell "extern " in headers.
>
>> diff --git a/odb-helper.h b/odb-helper.h
>> new file mode 100644
>> index 0000000000..5800661704
>> --- /dev/null
>> +++ b/odb-helper.h
>
> Likewise.

Ok, I am adding "extern " to the headers.


* Re: [PATCH v5 25/40] external-odb: add 'get_direct' support
  2017-09-14  8:39     ` Christian Couder
@ 2017-09-14 18:19       ` Jonathan Tan
  2017-09-15 11:24         ` Christian Couder
  0 siblings, 1 reply; 73+ messages in thread
From: Jonathan Tan @ 2017-09-14 18:19 UTC (permalink / raw)
  To: Christian Couder
  Cc: Junio C Hamano, git, Jeff King, Ben Peart, Nguyen Thai Ngoc Duy,
	Mike Hommey, Lars Schneider, Eric Wong, Christian Couder

On Thu, 14 Sep 2017 10:39:35 +0200
Christian Couder <christian.couder@gmail.com> wrote:

> From the following email:
> 
> https://public-inbox.org/git/20170804145113.5ceafafa@twelve2.svl.corp.google.com/
> 
> it looks like his work is fundamentally about changing the rules of
> connectivity checks. Objects are split between "homegrown" objects and
> "imported" objects which are in separate pack files. Then references
> to imported objects are not checked during connectivity check.
> 
> I think changing connectivity rules is not necessary to make something
> like external odb work. For example when fetching a pack that refers
> to objects that are in an external odb, if access to this external odb
> has been configured, then the connectivity check will pass as the
> missing objects in the pack will be seen as already part of the repo.

There are still some nuances. For example, if an external ODB provides
both a tree and a blob that the tree references, do we fetch the tree in
order to call "have" on all its blobs, or do we trust the ODB that if it
has the tree, it has all the other objects? In my design, I do the
latter, but in the general case where we have multiple ODBs, we might
have to do the former. (And if we do the former, it seems to me that the
connectivity check must be performed "online" - that is, with the ODBs
being able to provide "get".)
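
The two options in the paragraph above can be contrasted with a toy model. This is a sketch under strong assumptions: objects are plain integers, there is a single ODB, and toy_have() and toy_get_tree() are hypothetical stand-ins for the "have" and "get" instructions. It is not taken from either design.

```c
#include <stddef.h>

enum { MAX_CHILDREN = 4 };

/* A tree and the ids of the blobs it references (toy model). */
struct toy_tree {
	int id;
	int nr_children;
	int children[MAX_CHILDREN];
};

struct toy_odb {
	const int *ids;			/* ids answered by "have" */
	size_t nr;
	const struct toy_tree *trees;	/* trees returned by "get" */
	size_t nr_trees;
	int get_calls;			/* counts "online" fetches */
};

int toy_have(const struct toy_odb *odb, int id)
{
	size_t i;

	for (i = 0; i < odb->nr; i++)
		if (odb->ids[i] == id)
			return 1;
	return 0;
}

const struct toy_tree *toy_get_tree(struct toy_odb *odb, int id)
{
	size_t i;

	odb->get_calls++;
	for (i = 0; i < odb->nr_trees; i++)
		if (odb->trees[i].id == id)
			return &odb->trees[i];
	return NULL;
}

/* Latter option: if the ODB has the tree, trust it for the rest. */
int connected_trusting(const struct toy_odb *odb, int tree_id)
{
	return toy_have(odb, tree_id);
}

/* Former option: fetch the tree and call "have" on every blob. */
int connected_verifying(struct toy_odb *odb, int tree_id)
{
	const struct toy_tree *t;
	int i;

	if (!toy_have(odb, tree_id))
		return 0;
	t = toy_get_tree(odb, tree_id);
	if (!t)
		return 0;
	for (i = 0; i < t->nr_children; i++)
		if (!toy_have(odb, t->children[i]))
			return 0;
	return 1;
}
```

The verifying variant is the one that needs "get" to be available during the check, which is why it has to run "online".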

(Also, my work extends all the way to fetch/clone [1], but admittedly I
have been taking it a step at a time and recently have only been
discussing how the local repo should handle the missing object
situation.)

[1] https://public-inbox.org/git/cover.1499800530.git.jonathantanmy@google.com/

> Yeah, if some commands like fsck are used, then possibly all the
> objects will have to be requested from the external odb, as it may not
> be possible to fully check all the objects, especially the blobs,
> without accessing all their data. But I think this is a problem that
> could be dealt with in different ways. For example we could develop
> specific options in fsck so that it doesn't check the sha1 of objects
> that are marked with some specific attributes, or that are stored in
> external odbs, or that are bigger than some size.

The hard part is in dealing with missing commits and trees, I think, not
blobs.


* Re: [PATCH v5 25/40] external-odb: add 'get_direct' support
  2017-09-14 18:19       ` Jonathan Tan
@ 2017-09-15 11:24         ` Christian Couder
  2017-09-15 20:54           ` Jonathan Tan
  0 siblings, 1 reply; 73+ messages in thread
From: Christian Couder @ 2017-09-15 11:24 UTC (permalink / raw)
  To: Jonathan Tan
  Cc: Junio C Hamano, git, Jeff King, Ben Peart, Nguyen Thai Ngoc Duy,
	Mike Hommey, Lars Schneider, Eric Wong, Christian Couder

On Thu, Sep 14, 2017 at 8:19 PM, Jonathan Tan <jonathantanmy@google.com> wrote:
> On Thu, 14 Sep 2017 10:39:35 +0200
> Christian Couder <christian.couder@gmail.com> wrote:
>
>> From the following email:
>>
>> https://public-inbox.org/git/20170804145113.5ceafafa@twelve2.svl.corp.google.com/
>>
>> it looks like his work is fundamentally about changing the rules of
>> connectivity checks. Objects are split between "homegrown" objects and
>> "imported" objects which are in separate pack files. Then references
>> to imported objects are not checked during connectivity check.
>>
>> I think changing connectivity rules is not necessary to make something
>> like external odb work. For example when fetching a pack that refers
>> to objects that are in an external odb, if access to this external odb
>> has been configured, then the connectivity check will pass as the
>> missing objects in the pack will be seen as already part of the repo.
>
> There are still some nuances. For example, if an external ODB provides
> both a tree and a blob that the tree references, do we fetch the tree in
> order to call "have" on all its blobs, or do we trust the ODB that if it
> has the tree, it has all the other objects? In my design, I do the
> latter, but in the general case where we have multiple ODBs, we might
> have to do the former. (And if we do the former, it seems to me that the
> connectivity check must be performed "online" - that is, with the ODBs
> being able to provide "get".)

Yeah, I agree that the problem is more complex if there can be trees
or all kinds of objects in the external odb.
But as I explain in the following email to Junio, I don't think
storing other kinds of objects is one of the most interesting use cases:

https://public-inbox.org/git/CAP8UFD3=nuTRF24CLSoK4HSGm3nxGh8SbZVpMCg7cNcHj2zkBA@mail.gmail.com/

> (Also, my work extends all the way to fetch/clone [1], but admittedly I
> have been taking it a step at a time and recently have only been
> discussing how the local repo should handle the missing object
> situation.)
>
> [1] https://public-inbox.org/git/cover.1499800530.git.jonathantanmy@google.com/

Yeah, I think your work is interesting and could perhaps be useful for
external odbs as there could be situations that would be handled
better using your work or something similar.


* Re: [PATCH v5 25/40] external-odb: add 'get_direct' support
  2017-09-15 11:24         ` Christian Couder
@ 2017-09-15 20:54           ` Jonathan Tan
  0 siblings, 0 replies; 73+ messages in thread
From: Jonathan Tan @ 2017-09-15 20:54 UTC (permalink / raw)
  To: Christian Couder
  Cc: Junio C Hamano, git, Jeff King, Ben Peart, Nguyen Thai Ngoc Duy,
	Mike Hommey, Lars Schneider, Eric Wong, Christian Couder

On Fri, 15 Sep 2017 13:24:50 +0200
Christian Couder <christian.couder@gmail.com> wrote:

> > There are still some nuances. For example, if an external ODB provides
> > both a tree and a blob that the tree references, do we fetch the tree in
> > order to call "have" on all its blobs, or do we trust the ODB that if it
> > has the tree, it has all the other objects? In my design, I do the
> > latter, but in the general case where we have multiple ODBs, we might
> > have to do the former. (And if we do the former, it seems to me that the
> > connectivity check must be performed "online" - that is, with the ODBs
> > being able to provide "get".)
> 
> Yeah, I agree that the problem is more complex if there can be trees
> or all kinds of objects in the external odb.
> But as I explain in the following email to Junio, I don't think
> storing other kinds of objects is one of the most interesting use cases:
> 
> https://public-inbox.org/git/CAP8UFD3=nuTRF24CLSoK4HSGm3nxGh8SbZVpMCg7cNcHj2zkBA@mail.gmail.com/

If we start with only blobs in the ODB, that makes sense (the ODB will
need to supply a fast enough "list" or "have", but, as you wrote before,
a mechanism like fetching an additional ref that contains all the
necessary information whenever we fetch refs would be enough). I agree
that it would work with existing use cases (including yours).

> > (Also, my work extends all the way to fetch/clone [1], but admittedly I
> > have been taking it a step at a time and recently have only been
> > discussing how the local repo should handle the missing object
> > situation.)
> >
> > [1] https://public-inbox.org/git/cover.1499800530.git.jonathantanmy@google.com/
> 
> Yeah, I think your work is interesting and could perhaps be useful for
> external odbs as there could be situations that would be handled
> better using your work or something similar.

Thanks.


end of thread, other threads:[~2017-09-15 20:54 UTC | newest]

Thread overview: 73+ messages
2017-08-03  9:18 [PATCH v5 00/40] Add initial experimental external ODB support Christian Couder
2017-08-03  9:18 ` [PATCH v5 01/40] builtin/clone: get rid of 'value' strbuf Christian Couder
2017-08-03  9:18 ` [PATCH v5 02/40] t0021/rot13-filter: refactor packet reading functions Christian Couder
2017-08-03  9:18 ` [PATCH v5 03/40] t0021/rot13-filter: improve 'if .. elsif .. else' style Christian Couder
2017-08-03  9:18 ` [PATCH v5 04/40] Add Git/Packet.pm from parts of t0021/rot13-filter.pl Christian Couder
2017-08-03 19:11   ` Junio C Hamano
2017-08-04  6:32     ` Christian Couder
2017-08-03  9:18 ` [PATCH v5 05/40] t0021/rot13-filter: use Git/Packet.pm Christian Couder
2017-08-03  9:18 ` [PATCH v5 06/40] Git/Packet.pm: improve error message Christian Couder
2017-08-03  9:18 ` [PATCH v5 07/40] Git/Packet.pm: add packet_initialize() Christian Couder
2017-08-03  9:18 ` [PATCH v5 08/40] Git/Packet.pm: add capability functions Christian Couder
2017-08-03 19:14   ` Junio C Hamano
2017-08-04 20:34     ` Christian Couder
2017-08-03  9:18 ` [PATCH v5 09/40] sha1_file: prepare for external odbs Christian Couder
2017-08-03  9:18 ` [PATCH v5 10/40] Add initial external odb support Christian Couder
2017-08-03 19:34   ` Junio C Hamano
2017-08-03 20:17     ` Jeff King
2017-09-14 10:14     ` Christian Couder
2017-08-03  9:18 ` [PATCH v5 11/40] odb-helper: add odb_helper_init() to send 'init' instruction Christian Couder
2017-09-10 12:12   ` Lars Schneider
2017-09-14  7:18     ` Christian Couder
2017-08-03  9:18 ` [PATCH v5 12/40] t0400: add 'put_raw_obj' instruction to odb-helper script Christian Couder
2017-09-10 12:12   ` Lars Schneider
2017-09-14  7:09     ` Christian Couder
2017-08-03  9:18 ` [PATCH v5 13/40] external odb: add 'put_raw_obj' support Christian Couder
2017-08-03 19:50   ` Junio C Hamano
2017-09-14  9:17     ` Christian Couder
2017-08-03  9:19 ` [PATCH v5 14/40] external-odb: accept only blobs for now Christian Couder
2017-08-03 19:52   ` Junio C Hamano
2017-09-14  9:59     ` Christian Couder
2017-08-03  9:19 ` [PATCH v5 15/40] t0400: add test for external odb write support Christian Couder
2017-08-03  9:19 ` [PATCH v5 16/40] Add GIT_NO_EXTERNAL_ODB env variable Christian Couder
2017-08-03  9:19 ` [PATCH v5 17/40] Add t0410 to test external ODB transfer Christian Couder
2017-08-03  9:19 ` [PATCH v5 18/40] lib-httpd: pass config file to start_httpd() Christian Couder
2017-08-03  9:19 ` [PATCH v5 19/40] lib-httpd: add upload.sh Christian Couder
2017-08-03 20:07   ` Junio C Hamano
2017-09-14  7:43     ` Christian Couder
2017-08-03  9:19 ` [PATCH v5 20/40] lib-httpd: add list.sh Christian Couder
2017-08-03  9:19 ` [PATCH v5 21/40] lib-httpd: add apache-e-odb.conf Christian Couder
2017-08-03  9:19 ` [PATCH v5 22/40] odb-helper: add odb_helper_get_raw_object() Christian Couder
2017-08-03  9:19 ` [PATCH v5 23/40] pack-objects: don't pack objects in external odbs Christian Couder
2017-08-03  9:19 ` [PATCH v5 24/40] Add t0420 to test transfer to HTTP external odb Christian Couder
2017-08-03  9:19 ` [PATCH v5 25/40] external-odb: add 'get_direct' support Christian Couder
2017-08-03 21:40   ` Junio C Hamano
2017-09-14  8:39     ` Christian Couder
2017-09-14 18:19       ` Jonathan Tan
2017-09-15 11:24         ` Christian Couder
2017-09-15 20:54           ` Jonathan Tan
2017-08-03  9:19 ` [PATCH v5 26/40] odb-helper: add 'script_mode' to 'struct odb_helper' Christian Couder
2017-08-03  9:19 ` [PATCH v5 27/40] odb-helper: add init_object_process() Christian Couder
2017-08-03  9:19 ` [PATCH v5 28/40] Add t0450 to test 'get_direct' mechanism Christian Couder
2017-08-03  9:19 ` [PATCH v5 29/40] Add t0460 to test passing git objects Christian Couder
2017-08-03  9:19 ` [PATCH v5 30/40] odb-helper: add put_object_process() Christian Couder
2017-08-03  9:19 ` [PATCH v5 31/40] Add t0470 to test passing raw objects Christian Couder
2017-08-03  9:19 ` [PATCH v5 32/40] odb-helper: add have_object_process() Christian Couder
2017-08-03  9:19 ` [PATCH v5 33/40] Add t0480 to test "have" capability and raw objects Christian Couder
2017-08-03  9:19 ` [PATCH v5 34/40] external-odb: use 'odb=magic' attribute to mark odb blobs Christian Couder
2017-08-03  9:19 ` [PATCH v5 35/40] Add Documentation/technical/external-odb.txt Christian Couder
2017-08-03 18:38   ` Stefan Beller
2017-08-25  6:14     ` Christian Couder
2017-08-25 21:23       ` Jonathan Tan
2017-08-29  9:37         ` Christian Couder
2017-08-28 18:59   ` Ben Peart
2017-08-29 15:43     ` Christian Couder
2017-08-30 12:50       ` Ben Peart
2017-08-30 14:15         ` Christian Couder
2017-08-03  9:19 ` [PATCH v5 36/40] clone: add 'initial' param to write_remote_refs() Christian Couder
2017-08-03  9:19 ` [PATCH v5 37/40] clone: add --initial-refspec option Christian Couder
2017-08-03  9:19 ` [PATCH v5 38/40] clone: disable external odb before initial clone Christian Couder
2017-08-03  9:19 ` [PATCH v5 39/40] Add tests for 'clone --initial-refspec' Christian Couder
2017-08-03  9:19 ` [PATCH v5 40/40] Add t0430 to test cloning using bundles Christian Couder
2017-09-10 12:30 ` [PATCH v5 00/40] Add initial experimental external ODB support Lars Schneider
2017-09-14  7:02   ` Christian Couder
