From: Ben Peart <peartben@gmail.com>
To: Stefan Beller <sbeller@google.com>,
Jonathan Tan <jonathantanmy@google.com>,
Lars Schneider <larsxschneider@gmail.com>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>,
Jonathan Nieder <jrnieder@gmail.com>,
Jeff Hostetler <git@jeffhostetler.com>,
Philip Oakley <philipoakley@iee.org>
Subject: Re: [RFC PATCH v2 4/4] sha1_file: support promised object hook
Date: Thu, 20 Jul 2017 16:58:16 -0400
Message-ID: <75d5c3cd-c1d7-f06a-fc7e-894cde95afa7@gmail.com>
In-Reply-To: <CAGZ79ka6vcF4Douc7EizwL_+_xaorro=gVw=1hfJv56kvN+7oQ@mail.gmail.com>
On 7/20/2017 2:23 PM, Stefan Beller wrote:
> On Wed, Jul 19, 2017 at 5:21 PM, Jonathan Tan <jonathantanmy@google.com> wrote:
>> Teach sha1_file to invoke a hook whenever an object is requested and
>> unavailable but is promised. The hook is a shell command that can be
>> configured through "git config"; this hook takes in a list of hashes and
>> writes (if successful) the corresponding objects to the repo's local
>> storage.
>>
>> The usage of the hook can be suppressed through a flag when invoking
>> has_object_file_with_flags() and other similar functions.
>> parse_or_promise_object() in object.c requires this functionality, and
>> has been modified to use it.
>>
>> This is meant as a temporary measure to ensure that all Git commands
>> work in such a situation. Future patches will update some commands to
>> either tolerate promised objects (without invoking the hook) or be more
>> efficient in invoking the promised objects hook.

I agree that making Git more tolerant of promised objects where possible,
and precomputing the list of promised objects a particular command needs
so they can be downloaded in a single request, are both good
optimizations to add over time.

>>
>> In order to determine the code changes in sha1_file.c necessary, I
>> investigated the following:
>> (1) functions in sha1_file that take in a hash, without the user
>> regarding how the object is stored (loose or packed)
>> (2) functions in sha1_file that operate on packed objects (because I
>> need to check callers that know about the loose/packed distinction
>> and operate on both differently, and ensure that they can handle
>> the concept of objects that are neither loose nor packed)
>>
>> (1) is handled by the modification to sha1_object_info_extended().
>>
>> For (2), I looked at for_each_packed_object and at the packed-related
>> functions that take in a hash. For for_each_packed_object, the callers
>> either already work or are fixed in this patch:
>> - reachable - only to find recent objects
>> - builtin/fsck - already knows about promised objects
>> - builtin/cat-file - fixed in this commit
>>
>> Callers of the other functions do not need to be changed:
>> - parse_pack_index
>> - http - indirectly from http_get_info_packs
>> - find_pack_entry_one
>> - this searches a single pack that is provided as an argument; the
>> caller already knows (through other means) that the sought object
>> is in a specific pack
>> - find_sha1_pack
>> - fast-import - appears to be an optimization to not store a
>> file if it is already in a pack
>> - http-walker - to search through a struct alt_base
>> - http-push - to search through remote packs
>> - has_sha1_pack
>> - builtin/fsck - already knows about promised objects
>> - builtin/count-objects - informational purposes only (check if loose
>> object is also packed)
>> - builtin/prune-packed - check if object to be pruned is packed (if
>> not, don't prune it)
>> - revision - used to exclude packed objects if requested by user
>> - diff - just for optimization
>>

has_sha1_file() also takes a hash and looks the object up "whether local
or in an alternate object database, and whether packed or loose", but it
never calls sha1_object_info_extended(). As a result, we had to add
support in check_and_freshen() for downloading missing objects to get
correct behavior in all cases. I don't think this will work correctly
without it.

>> An alternative design that I considered but rejected:
>>
>> - Adding a hook whenever a packed object is requested, not on any
>> object. That is, whenever we attempt to search the packfiles for an
>> object, if it is missing (from the packfiles and from the loose
>> object storage), to invoke the hook (which must then store it as a
>> packfile), open the packfile the hook generated, and report that the
>> object is found in that new packfile. This reduces the amount of
>> analysis needed (in that we only need to look at how packed objects
>> are handled), but requires that the hook generate packfiles (or for
>> sha1_file to pack whatever loose objects are generated), creating one
>> packfile for each missing object and potentially very many packfiles
>> that must be linearly searched. This may be tolerable now for repos
>> that only have a few missing objects (for example, repos that only
>> want to exclude large blobs), and might be tolerable in the future if
>> we have batching support for the most commonly used commands, but is
>> not tolerable now for repos that exclude a large amount of objects.
>>
>> Helped-by: Ben Peart <benpeart@microsoft.com>
>> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
>> ---
>> Documentation/config.txt | 8 +
>> Documentation/gitrepository-layout.txt | 8 +
>> Documentation/technical/read-object-protocol.txt | 102 ++++++++++++
>> builtin/cat-file.c | 9 ++
>> cache.h | 2 +
>> object.c | 3 +-
>> promised-object.c | 194 +++++++++++++++++++++++
>> promised-object.h | 12 ++
>> sha1_file.c | 44 +++--
>> t/t3907-promised-object.sh | 32 ++++
>> t/t3907/read-object | 114 +++++++++++++
>> 11 files changed, 513 insertions(+), 15 deletions(-)
>> create mode 100644 Documentation/technical/read-object-protocol.txt
>> create mode 100755 t/t3907/read-object
>>
>> diff --git a/Documentation/config.txt b/Documentation/config.txt
>> index d5c9c4cab..c293ac921 100644
>> --- a/Documentation/config.txt
>> +++ b/Documentation/config.txt
>> @@ -393,6 +393,14 @@ The default is false, except linkgit:git-clone[1] or linkgit:git-init[1]
>> will probe and set core.ignoreCase true if appropriate when the repository
>> is created.
>>
>> +core.promisedBlobCommand::
>> + If set, whenever Git attempts to read a blob that is missing from the
>> + local repo but is a promised blob, invoke this shell command to
>> + generate or obtain that blob before reporting an error. This shell
>> + command should take one or more hashes, each terminated by a newline,
>> + as standard input, and (if successful) should write the corresponding
>> + objects to the local repo (packed or loose).
>> +
>> core.precomposeUnicode::
>> This option is only used by Mac OS implementation of Git.
>> When core.precomposeUnicode=true, Git reverts the unicode decomposition
>> diff --git a/Documentation/gitrepository-layout.txt b/Documentation/gitrepository-layout.txt
>> index f51ed4e37..7dea7fe6b 100644
>> --- a/Documentation/gitrepository-layout.txt
>> +++ b/Documentation/gitrepository-layout.txt
>> @@ -47,6 +47,10 @@ use with dumb transports but otherwise is OK as long as
>> `objects/info/alternates` points at the object stores it
>> borrows from.
>> +
>> +. You could have objects that are merely promised by another source.
>> +When Git requires those objects, it will invoke the command in the
>> +`extensions.promisedObjects` configuration variable.
>> ++
>> This directory is ignored if $GIT_COMMON_DIR is set and
>> "$GIT_COMMON_DIR/objects" will be used instead.
>>
>> @@ -91,6 +95,10 @@ objects/info/http-alternates::
>> this object store borrows objects from, to be used when
>> the repository is fetched over HTTP.
>>
>> +objects/promised::
>> + This file records the sha1 object names, types, and sizes of
>> + promised objects.
>> +
>> refs::
>> References are stored in subdirectories of this
>> directory. The 'git prune' command knows to preserve
>> diff --git a/Documentation/technical/read-object-protocol.txt b/Documentation/technical/read-object-protocol.txt
>> new file mode 100644
>> index 000000000..a893b46e7
>> --- /dev/null
>> +++ b/Documentation/technical/read-object-protocol.txt
>> @@ -0,0 +1,102 @@
>> +Read Object Process
>> +^^^^^^^^^^^^^^^^^^^
>
> This protocol reads very similar to the protocol that is used in
> clean/smudge filtering, designed by Lars (cc'd)
>
>> +
>> +The read-object process enables Git to read all missing blobs with a
>> +single process invocation for the entire life of a single Git command.
>> +This is achieved by using a packet format (pkt-line, see technical/
>> +protocol-common.txt) based protocol over standard input and standard
>> +output as follows. All packets, except for the "*CONTENT" packets and
>> +the "0000" flush packet, are considered text and therefore are
>> +terminated by a LF.
>> +
>> +Git starts the process when it encounters the first missing object that
>> +needs to be retrieved. After the process is started, Git sends a welcome
>> +message ("git-read-object-client"), a list of supported protocol version
>> +numbers, and a flush packet. Git expects to read a welcome response
>> +message ("git-read-object-server"), exactly one protocol version number
>> +from the previously sent list, and a flush packet. All further
>> +communication will be based on the selected version.
>> +
>> +The remaining protocol description below documents "version=1". Please
>> +note that "version=42" in the example below does not exist and is only
>> +there to illustrate how the protocol would look with more than one
>> +version.
>> +
>> +After the version negotiation Git sends a list of all capabilities that
>> +it supports and a flush packet. Git expects to read a list of desired
>> +capabilities, which must be a subset of the supported capabilities list,
>> +and a flush packet as response:
>> +------------------------
>> +packet: git> git-read-object-client
>> +packet: git> version=1
>> +packet: git> version=42
>> +packet: git> 0000
>> +packet: git< git-read-object-server
>> +packet: git< version=1
>> +packet: git< 0000
>> +packet: git> capability=get
>> +packet: git> capability=have
>> +packet: git> capability=put
>> +packet: git> capability=not-yet-invented
>> +packet: git> 0000
>> +packet: git< capability=get
>> +packet: git< 0000
>> +------------------------
>> +The only supported capability in version 1 is "get".
>> +
>> +Afterwards Git sends a list of "key=value" pairs terminated with a flush
>> +packet. The list will contain at least the command (based on the
>> +supported capabilities) and the sha1 of the object to retrieve. Please
>> +note that the process must not send any response before it has received
>> +the final flush packet.
>> +
>> +When the process receives the "get" command, it should make the requested
>> +object available in the git object store and then return success. Git will
>> +then check the object store again and this time find it and proceed.
>> +------------------------
>> +packet: git> command=get
>> +packet: git> sha1=0a214a649e1b3d5011e14a3dc227753f2bd2be05
>> +packet: git> 0000
>> +------------------------
>> +
>> +The process is expected to respond with a list of "key=value" pairs
>> +terminated with a flush packet. If the process does not experience
>> +problems then the list must contain a "success" status.
>> +------------------------
>> +packet: git< status=success
>> +packet: git< 0000
>> +------------------------
>> +
>> +In case the process cannot or does not want to process the content, it
>> +is expected to respond with an "error" status.
>> +------------------------
>> +packet: git< status=error
>> +packet: git< 0000
>> +------------------------
>> +
>> +In case the process cannot or does not want to process the content as
>> +well as any future content for the lifetime of the Git process, then it
>> +is expected to respond with an "abort" status at any point in the
>> +protocol.
>> +------------------------
>> +packet: git< status=abort
>> +packet: git< 0000
>> +------------------------
>> +
>> +Git neither stops nor restarts the process in case the "error"/"abort"
>> +status is set.
>> +
>> +If the process dies during the communication or does not adhere to the
>> +protocol then Git will stop the process and restart it with the next
>> +object that needs to be processed.
>> +
>> +After the read-object process has processed an object it is expected to
>> +wait for the next "key=value" list containing a command. Git will close
>> +the command pipe on exit. The process is expected to detect EOF and exit
>> +gracefully on its own. Git will wait until the process has stopped.
>> +
>> +A long running read-object process demo implementation can be found in
>> +`contrib/long-running-read-object/example.pl` located in the Git core
>> +repository. If you develop your own long running process then the
>> +`GIT_TRACE_PACKET` environment variable can be very helpful for
>> +debugging (see linkgit:git[1]).
>> diff --git a/builtin/cat-file.c b/builtin/cat-file.c
>> index 96b786e48..33f636926 100644
>> --- a/builtin/cat-file.c
>> +++ b/builtin/cat-file.c
>> @@ -12,6 +12,7 @@
>> #include "streaming.h"
>> #include "tree-walk.h"
>> #include "sha1-array.h"
>> +#include "promised-object.h"
>>
>> struct batch_options {
>> int enabled;
>> @@ -432,6 +433,13 @@ static int batch_packed_object(const struct object_id *oid,
>> return 0;
>> }
>>
>> +static int batch_promised_object(const struct object_id *oid,
>> + void *data)
>> +{
>> + oid_array_append(data, oid);
>> + return 0;
>> +}
>> +
>> static int batch_objects(struct batch_options *opt)
>> {
>> struct strbuf buf = STRBUF_INIT;
>> @@ -473,6 +481,7 @@ static int batch_objects(struct batch_options *opt)
>>
>> for_each_loose_object(batch_loose_object, &sa, 0);
>> for_each_packed_object(batch_packed_object, &sa, 0);
>> + for_each_promised_object(batch_promised_object, &sa);
>>
>> cb.opt = opt;
>> cb.expand = &data;
>> diff --git a/cache.h b/cache.h
>> index dd94b5ffc..75b71f38b 100644
>> --- a/cache.h
>> +++ b/cache.h
>> @@ -1835,6 +1835,8 @@ struct object_info {
>> #define OBJECT_INFO_SKIP_CACHED 4
>> /* Do not retry packed storage after checking packed and loose storage */
>> #define OBJECT_INFO_QUICK 8
>> +/* Ignore list of promised objects */
>> +#define OBJECT_INFO_IGNORE_PROMISES 16
>> extern int sha1_object_info_extended(const unsigned char *, struct object_info *, unsigned flags);
>> extern int packed_object_info(struct packed_git *pack, off_t offset, struct object_info *);
>>
>> diff --git a/object.c b/object.c
>> index 0aeb95084..23f2a6cbc 100644
>> --- a/object.c
>> +++ b/object.c
>> @@ -285,7 +285,8 @@ struct object *parse_or_promise_object(const struct object_id *oid)
>> {
>> enum object_type type;
>>
>> - if (has_object_file(oid))
>> + if (has_object_file_with_flags(oid, OBJECT_INFO_SKIP_CACHED |
>> + OBJECT_INFO_IGNORE_PROMISES))
>> return parse_object(oid);
>>
>> if (is_promised_object(oid, &type, NULL)) {
>> diff --git a/promised-object.c b/promised-object.c
>> index 487ade437..d8d95ebb2 100644
>> --- a/promised-object.c
>> +++ b/promised-object.c
>> @@ -2,6 +2,12 @@
>> #include "promised-object.h"
>> #include "sha1-lookup.h"
>> #include "strbuf.h"
>> +#include "run-command.h"
>> +#include "sha1-array.h"
>> +#include "config.h"
>> +#include "sigchain.h"
>> +#include "sub-process.h"
>> +#include "pkt-line.h"
>>
>> #define ENTRY_SIZE (GIT_SHA1_RAWSZ + 1 + 8)
>> /*
>> @@ -128,3 +134,191 @@ int fsck_promised_objects(void)
>> }
>> return 0;
>> }
>> +
>> +#define CAP_GET (1u<<0)
>> +
>> +static int subprocess_map_initialized;
>> +static struct hashmap subprocess_map;
>> +
>> +struct read_object_process {
>> + struct subprocess_entry subprocess;
>> + unsigned int supported_capabilities;
>> +};
>> +
>> +int start_read_object_fn(struct subprocess_entry *subprocess)
>> +{
>> + int err;
>> + struct read_object_process *entry = (struct read_object_process *)subprocess;
>> + struct child_process *process;
>> + struct string_list cap_list = STRING_LIST_INIT_NODUP;
>> + char *cap_buf;
>> + const char *cap_name;
>> +
>> + process = subprocess_get_child_process(&entry->subprocess);
>> +
>> + sigchain_push(SIGPIPE, SIG_IGN);
>> +
>> + err = packet_writel(process->in, "git-read-object-client", "version=1", NULL);
>> + if (err)
>> + goto done;
>> +
>> + err = strcmp(packet_read_line(process->out, NULL), "git-read-object-server");
>> + if (err) {
>> + error("external process '%s' does not support read-object protocol version 1", subprocess->cmd);
>> + goto done;
>> + }
>> + err = strcmp(packet_read_line(process->out, NULL), "version=1");
>> + if (err)
>> + goto done;
>> + err = packet_read_line(process->out, NULL) != NULL;
>> + if (err)
>> + goto done;
>> +
>> + err = packet_writel(process->in, "capability=get", NULL);
>> + if (err)
>> + goto done;
>> +
>> + for (;;) {
>> + cap_buf = packet_read_line(process->out, NULL);
>> + if (!cap_buf)
>> + break;
>> + string_list_split_in_place(&cap_list, cap_buf, '=', 1);
>> +
>> + if (cap_list.nr != 2 || strcmp(cap_list.items[0].string, "capability"))
>> + continue;
>> +
>> + cap_name = cap_list.items[1].string;
>> + if (!strcmp(cap_name, "get")) {
>> + entry->supported_capabilities |= CAP_GET;
>> + }
>> + else {
>> + warning(
>> + "external process '%s' requested unsupported read-object capability '%s'",
>> + subprocess->cmd, cap_name
>> + );
>> + }
>> +
>> + string_list_clear(&cap_list, 0);
>> + }
>> +
>> +done:
>> + sigchain_pop(SIGPIPE);
>> +
>> + if (err || errno == EPIPE)
>> + return err ? err : errno;
>> +
>> + return 0;
>> +}
>> +
>> +static int read_object_process(const unsigned char *sha1)
>> +{
>> + int err;
>> + struct read_object_process *entry;
>> + struct child_process *process;
>> + struct strbuf status = STRBUF_INIT;
>> + uint64_t start;
>> +
>> + start = getnanotime();
>> +
>> + if (!repository_format_promised_objects)
>> + die("BUG: if extensions.promisedObjects is not set, there "
>> + "should not be any promised objects");
>> +
>> + if (!subprocess_map_initialized) {
>> + subprocess_map_initialized = 1;
>> + hashmap_init(&subprocess_map, (hashmap_cmp_fn)cmd2process_cmp, 0);
>> + entry = NULL;
>> + } else {
>> + entry = (struct read_object_process *)subprocess_find_entry(&subprocess_map, repository_format_promised_objects);
>> + }
>> + if (!entry) {
>> + entry = xmalloc(sizeof(*entry));
>> + entry->supported_capabilities = 0;
>> +
>> + if (subprocess_start(&subprocess_map, &entry->subprocess, repository_format_promised_objects, start_read_object_fn)) {
>> + free(entry);
>> + return -1;
>> + }
>> + }
>> + process = subprocess_get_child_process(&entry->subprocess);
>> +
>> + if (!(CAP_GET & entry->supported_capabilities))
>> + return -1;
>> +
>> + sigchain_push(SIGPIPE, SIG_IGN);
>> +
>> + err = packet_write_fmt_gently(process->in, "command=get\n");
>> + if (err)
>> + goto done;
>> +
>> + err = packet_write_fmt_gently(process->in, "sha1=%s\n", sha1_to_hex(sha1));
>> + if (err)
>> + goto done;
>> +
>> + err = packet_flush_gently(process->in);
>> + if (err)
>> + goto done;
>> +
>> + err = subprocess_read_status(process->out, &status);
>> + err = err ? err : strcmp(status.buf, "success");
>> +
>> +done:
>> + sigchain_pop(SIGPIPE);
>> +
>> + if (err || errno == EPIPE) {
>> + err = err ? err : errno;
>> + if (!strcmp(status.buf, "error")) {
>> + /* The process signaled a problem with the file. */
>> + }
>> + else if (!strcmp(status.buf, "abort")) {
>> + /*
>> + * The process signaled a permanent problem. Don't try to read
>> + * objects with the same command for the lifetime of the current
>> + * Git process.
>> + */
>> + entry->supported_capabilities &= ~CAP_GET;
>> + }
>> + else {
>> + /*
>> + * Something went wrong with the read-object process.
>> + * Force shutdown and restart if needed.
>> + */
>> + error("external process '%s' failed", repository_format_promised_objects);
>> + subprocess_stop(&subprocess_map, (struct subprocess_entry *)entry);
>> + free(entry);
>> + }
>> + }
>> +
>> + trace_performance_since(start, "read_object_process");
>> +
>> + return err;
>> +}
>> +
>> +int request_promised_objects(const struct oid_array *oids)
>> +{
>> + int oids_requested = 0;
>> + int i;
>> +
>> + for (i = 0; i < oids->nr; i++) {
>> + if (is_promised_object(&oids->oid[i], NULL, NULL))
>> + break;
>> + }
>> +
>> + if (i == oids->nr)
>> + /* Nothing to fetch */
>> + return 0;
>> +
>> + for (; i < oids->nr; i++) {
>> + if (is_promised_object(&oids->oid[i], NULL, NULL)) {
>> + read_object_process(oids->oid[i].hash);
>> + oids_requested++;
>> + }
>> + }
>> +
>> + /*
>> + * The command above may have updated packfiles, so update our record
>> + * of them.
>> + */
>> + reprepare_packed_git();
>> + return oids_requested;
>> +}
>> diff --git a/promised-object.h b/promised-object.h
>> index 7eaedff17..8ad47aa4c 100644
>> --- a/promised-object.h
>> +++ b/promised-object.h
>> @@ -2,6 +2,7 @@
>> #define PROMISED_OBJECT_H
>>
>> #include "cache.h"
>> +#include "sha1-array.h"
>>
>> /*
>> * Returns 1 if oid is the name of a promised object. For non-blobs, 0 is
>> @@ -19,4 +20,15 @@ int for_each_promised_object(each_promised_object_fn, void *);
>> */
>> int fsck_promised_objects(void);
>>
>> +/*
>> + * If any of the given objects are promised objects, invokes
>> + * core.promisedobjectcommand with those objects and returns the number of
>> + * objects requested. No check is made as to whether the invocation actually
>> + * populated the repository with the promised objects.
>> + *
>> + * If none of the given objects are promised objects, this function does not
>> + * invoke anything and returns 0.
>> + */
>> +int request_promised_objects(const struct oid_array *oids);
>> +
>> #endif
>> diff --git a/sha1_file.c b/sha1_file.c
>> index 5862386cd..ded0ef46b 100644
>> --- a/sha1_file.c
>> +++ b/sha1_file.c
>> @@ -28,6 +28,11 @@
>> #include "list.h"
>> #include "mergesort.h"
>> #include "quote.h"
>> +#include "iterator.h"
>> +#include "dir-iterator.h"
>> +#include "sha1-lookup.h"
>> +#include "promised-object.h"
>> +#include "sha1-array.h"
>>
>> #define SZ_FMT PRIuMAX
>> static inline uintmax_t sz_fmt(size_t s) { return s; }
>> @@ -2983,6 +2988,7 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi,
>> const unsigned char *real = (flags & OBJECT_INFO_LOOKUP_REPLACE) ?
>> lookup_replace_object(sha1) :
>> sha1;
>> + int already_retried = 0;
>>
>> if (!oi)
>> oi = &blank_oi;
>> @@ -3007,30 +3013,40 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi,
>> }
>> }
>>
>> - if (!find_pack_entry(real, &e)) {
>> - /* Most likely it's a loose object. */
>> - if (!sha1_loose_object_info(real, oi, flags)) {
>> - oi->whence = OI_LOOSE;
>> - return 0;
>> - }
>> +retry:
>> + if (find_pack_entry(real, &e))
>> + goto found_packed;
>>
>> - /* Not a loose object; someone else may have just packed it. */
>> - if (flags & OBJECT_INFO_QUICK) {
>> - return -1;
>> - } else {
>> - reprepare_packed_git();
>> - if (!find_pack_entry(real, &e))
>> - return -1;
>> + /* Most likely it's a loose object. */
>> + if (!sha1_loose_object_info(real, oi, flags)) {
>> + oi->whence = OI_LOOSE;
>> + return 0;
>> + }
>> +
>> + /* Not a loose object; someone else may have just packed it. */
>> + reprepare_packed_git();
>> + if (find_pack_entry(real, &e))
>> + goto found_packed;
>> +
>> + /* Check if it is a promised blob */
>> + if (!already_retried && !(flags & OBJECT_INFO_IGNORE_PROMISES)) {
>> + struct oid_array promised = OID_ARRAY_INIT;
>> + oid_array_append_sha1(&promised, real);
>> + if (request_promised_objects(&promised)) {
>> + already_retried = 1;
>> + goto retry;
>> }
>> }
>>
>> + return -1;
>> +
>> +found_packed:
>> if (oi == &blank_oi)
>> /*
>> * We know that the caller doesn't actually need the
>> * information below, so return early.
>> */
>> return 0;
>> -
>> rtype = packed_object_info(e.p, e.offset, oi);
>> if (rtype < 0) {
>> mark_bad_packed_object(e.p, real);
>> diff --git a/t/t3907-promised-object.sh b/t/t3907-promised-object.sh
>> index 3e0caf4f9..d9e6a6486 100755
>> --- a/t/t3907-promised-object.sh
>> +++ b/t/t3907-promised-object.sh
>> @@ -38,4 +38,36 @@ test_expect_success '...but fails again with GIT_IGNORE_PROMISED_OBJECTS' '
>> unset GIT_IGNORE_PROMISED_OBJECTS
>> '
>>
>> +test_expect_success 'sha1_object_info_extended (through git cat-file)' '
>> + test_create_repo server &&
>> + test_commit -C server 1 1.t abcdefgh &&
>> + HASH=$(git hash-object server/1.t) &&
>> +
>> + test_create_repo client &&
>> + test_must_fail git -C client cat-file -p "$HASH"
>> +'
>> +
>> +test_expect_success '...succeeds if it is a promised object' '
>> + printf "%s03%016x" "$HASH" "$(wc -c <server/1.t)" |
>> + hex_pack >client/.git/objects/promised &&
>> + git -C client config core.repositoryformatversion 1 &&
>> + git -C client config extensions.promisedobjects \
>> + "\"$TEST_DIRECTORY/t3907/read-object\" \"$(pwd)/server/.git\"" &&
>> + git -C client cat-file -p "$HASH"
>> +'
>> +
>> +test_expect_success 'cat-file --batch-all-objects with promised objects' '
>> + rm -rf client &&
>> + test_create_repo client &&
>> + git -C client config core.repositoryformatversion 1 &&
>> + git -C client config extensions.promisedobjects \
>> + "\"$TEST_DIRECTORY/t3907/read-object\" \"$(pwd)/server/.git\"" &&
>> + printf "%s03%016x" "$HASH" "$(wc -c <server/1.t)" |
>> + hex_pack >client/.git/objects/promised &&
>> +
>> + # Verify that the promised object is printed
>> + git -C client cat-file --batch --batch-all-objects | tee out |
>> + grep abcdefgh
>> +'
>> +
>> test_done
>> diff --git a/t/t3907/read-object b/t/t3907/read-object
>> new file mode 100755
>> index 000000000..9666ad597
>> --- /dev/null
>> +++ b/t/t3907/read-object
>> @@ -0,0 +1,114 @@
>> +#!/usr/bin/perl
>> +#
>> +# Example implementation for the Git read-object protocol version 1
>> +# See Documentation/technical/read-object-protocol.txt
>> +#
>> +# Allows you to test the ability for blobs to be pulled from a host git repo
>> +# "on demand." Called when git needs a blob it couldn't find locally due to
>> +# a lazy clone that only cloned the commits and trees.
>> +#
>> +# A lazy clone can be simulated via the following commands from the host repo
>> +# you wish to create a lazy clone of:
>> +#
>> +# cd /host_repo
>> +# git rev-parse HEAD
>> +# git init /guest_repo
>> +# git cat-file --batch-check --batch-all-objects | grep -v 'blob' |
>> +# cut -d' ' -f1 | git pack-objects /guest_repo/.git/objects/pack/noblobs
>> +# cd /guest_repo
>> +# git config core.virtualizeobjects true
>> +# git reset --hard <sha from rev-parse call above>
>> +#
>> +# Please note, this sample is a minimal skeleton. No proper error handling
>> +# was implemented.
>> +#
>> +
>> +use strict;
>> +use warnings;
>> +
>> +#
>> +# Point $DIR to the folder where your host git repo is located so we can pull
>> +# missing objects from it
>> +#
>> +my $DIR = $ARGV[0];
>> +
>> +sub packet_bin_read {
>> + my $buffer;
>> + my $bytes_read = read STDIN, $buffer, 4;
>> + if ( $bytes_read == 0 ) {
>> +
>> + # EOF - Git stopped talking to us!
>> + exit();
>> + }
>> + elsif ( $bytes_read != 4 ) {
>> + die "invalid packet: '$buffer'";
>> + }
>> + my $pkt_size = hex($buffer);
>> + if ( $pkt_size == 0 ) {
>> + return ( 1, "" );
>> + }
>> + elsif ( $pkt_size > 4 ) {
>> + my $content_size = $pkt_size - 4;
>> + $bytes_read = read STDIN, $buffer, $content_size;
>> + if ( $bytes_read != $content_size ) {
>> + die "invalid packet ($content_size bytes expected; $bytes_read bytes read)";
>> + }
>> + return ( 0, $buffer );
>> + }
>> + else {
>> + die "invalid packet size: $pkt_size";
>> + }
>> +}
>> +
>> +sub packet_txt_read {
>> + my ( $res, $buf ) = packet_bin_read();
>> + unless ( $buf =~ s/\n$// ) {
>> + die "A non-binary line MUST be terminated by an LF.";
>> + }
>> + return ( $res, $buf );
>> +}
>> +
>> +sub packet_bin_write {
>> + my $buf = shift;
>> + print STDOUT sprintf( "%04x", length($buf) + 4 );
>> + print STDOUT $buf;
>> + STDOUT->flush();
>> +}
>> +
>> +sub packet_txt_write {
>> + packet_bin_write( $_[0] . "\n" );
>> +}
>> +
>> +sub packet_flush {
>> + print STDOUT sprintf( "%04x", 0 );
>> + STDOUT->flush();
>> +}
>> +
>> +( packet_txt_read() eq ( 0, "git-read-object-client" ) ) || die "bad initialize";
>> +( packet_txt_read() eq ( 0, "version=1" ) ) || die "bad version";
>> +( packet_bin_read() eq ( 1, "" ) ) || die "bad version end";
>> +
>> +packet_txt_write("git-read-object-server");
>> +packet_txt_write("version=1");
>> +packet_flush();
>> +
>> +( packet_txt_read() eq ( 0, "capability=get" ) ) || die "bad capability";
>> +( packet_bin_read() eq ( 1, "" ) ) || die "bad capability end";
>> +
>> +packet_txt_write("capability=get");
>> +packet_flush();
>> +
>> +while (1) {
>> + my ($command) = packet_txt_read() =~ /^command=([^=]+)$/;
>> +
>> + if ( $command eq "get" ) {
>> + my ($sha1) = packet_txt_read() =~ /^sha1=([0-9a-f]{40})$/;
>> + packet_bin_read();
>> +
>> + system ('git --git-dir="' . $DIR . '" cat-file blob ' . $sha1 . ' | git -c core.virtualizeobjects=false hash-object -w --stdin >/dev/null 2>&1');
>> + packet_txt_write(($?) ? "status=error" : "status=success");
>> + packet_flush();
>> + } else {
>> + die "bad command '$command'";
>> + }
>> +}
>> --
>> 2.14.0.rc0.284.gd933b75aa4-goog
>>