From: Ben Peart <peartben@gmail.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: git@vger.kernel.org, gitster@pobox.com, benpeart@microsoft.com,
pclouds@gmail.com, christian.couder@gmail.com,
git@jeffhostetler.com
Subject: Re: [PATCH v2 1/1] sha1_file: Add support for downloading blobs on demand
Date: Mon, 17 Jul 2017 16:09:17 -0400 [thread overview]
Message-ID: <d3f1f884-7b8a-885f-47cb-eca2b8ef0ecf@gmail.com> (raw)
In-Reply-To: <20170717110602.6fac89ea@twelve2.svl.corp.google.com>
On 7/17/2017 2:06 PM, Jonathan Tan wrote:
> About the difference between this patch and my patch set [1], besides
> the fact that this patch does not spawn separate processes for each
> missing object, which does seem like an improvement to me, this patch
> (i) does not use a list of promised objects (but instead communicates
> with the hook for each missing object), and (ii) provides backwards
> compatibility with other Git code (that does not know about promised
> objects) in a different way.
>
> The costs and benefits of (i) are being discussed here [2]. As for (ii),
> I still think that my approach is better - I have commented more about
> this below.
>
> Maybe the best approach is a combination of both our approaches.
Yes, in the context of the promised objects model patch series, the
value of this patch series is primarily as a sample of how to use the
sub-process mechanism to create a versioned interface for retrieving
objects.
>
> [1] https://public-inbox.org/git/34efd9e9936fdab331655f5a33a098a72dc134f4.1499800530.git.jonathantanmy@google.com/
>
> [2] https://public-inbox.org/git/20170713123951.5cab1adc@twelve2.svl.corp.google.com/
>
> On Fri, 14 Jul 2017 09:26:51 -0400
> Ben Peart <peartben@gmail.com> wrote:
>
>> +------------------------
>> +packet: git> command=get
>> +packet: git> sha1=0a214a649e1b3d5011e14a3dc227753f2bd2be05
>> +packet: git> 0000
>> +------------------------
>
> It would be useful to have this command support more than one SHA-1, so
> that hooks that know how to batch can do so.
>
I agree. Since nothing was using that capability yet, I decided to keep
it simple and not add support for a feature that wasn't being used. The
reason the interface is versioned is so that if/when something does need
that capability, it can be added.
>> +static int subprocess_map_initialized;
>> +static struct hashmap subprocess_map;
>
> The documentation of "tablesize" in "struct hashmap" states that it can
> be used to check if the hashmap is initialized, so
> subprocess_map_initialized is probably unnecessary.
>
Nice. That will make things a little simpler.
>> static int check_and_freshen(const unsigned char *sha1, int freshen)
>> {
>> - return check_and_freshen_local(sha1, freshen) ||
>> - check_and_freshen_nonlocal(sha1, freshen);
>> + int ret;
>> + int already_retried = 0;
>> +
>> +retry:
>> + ret = check_and_freshen_local(sha1, freshen) ||
>> + check_and_freshen_nonlocal(sha1, freshen);
>> + if (!ret && core_virtualize_objects && !already_retried) {
>> + already_retried = 1;
>> + if (!read_object_process(sha1))
>> + goto retry;
>> + }
>> +
>> + return ret;
>> }
>
> Is this change meant to ensure that Git code that operates on loose
> objects directly (bypassing storage-agnostic functions such as
> sha1_object_info_extended() and has_sha1_file()) still work? If yes,
> this patch appears incomplete (for example, read_loose_object() needs to
> be changed too), and this seems like a difficult task - in my patch set
> [1], I ended up deciding to create a separate type of storage and
> instead looked at the code that operates on *packed* objects directly
> (because there were fewer such methods) to ensure that they would work
> correctly in the presence of a separate type of storage.
>
Yes, with this set of patches, we've been running successfully on
completely sparse clones (no commits, trees, or blobs) for several
months. read_loose_object() is only called by fsck when it is
enumerating existing loose objects so does not need to be updated.
We have a few thousand developers making ~100K commits per week so in
our particular usage, I'm fairly confident it works correctly. That
said, it is possible there is some code path I've missed. :)
> [1] https://public-inbox.org/git/34efd9e9936fdab331655f5a33a098a72dc134f4.1499800530.git.jonathantanmy@google.com/
>
next prev parent reply other threads:[~2017-07-17 20:09 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-07-14 13:26 [RFC/PATCH v2 0/1] Add support for downloading blobs on demand Ben Peart
2017-07-14 13:26 ` [PATCH v2 1/1] sha1_file: " Ben Peart
2017-07-14 15:18 ` Christian Couder
2017-07-17 18:06 ` Jonathan Tan
2017-07-17 20:09 ` Ben Peart [this message]
2017-07-17 23:24 ` Jonathan Tan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=d3f1f884-7b8a-885f-47cb-eca2b8ef0ecf@gmail.com \
--to=peartben@gmail.com \
--cc=benpeart@microsoft.com \
--cc=christian.couder@gmail.com \
--cc=git@jeffhostetler.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jonathantanmy@google.com \
--cc=pclouds@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).