archive mirror
 help / color / mirror / Atom feed
From: Ben Peart <>
To: Jonathan Tan <>
Subject: Re: [PATCH v2 1/1] sha1_file: Add support for downloading blobs on demand
Date: Mon, 17 Jul 2017 16:09:17 -0400	[thread overview]
Message-ID: <> (raw)
In-Reply-To: <>

On 7/17/2017 2:06 PM, Jonathan Tan wrote:
> About the difference between this patch and my patch set [1], besides
> the fact that this patch does not spawn separate processes for each
> missing object, which does seem like an improvement to me, this patch
> (i) does not use a list of promised objects (but instead communicates
> with the hook for each missing object), and (ii) provides backwards
> compatibility with other Git code (that does not know about promised
> objects) in a different way.
> The costs and benefits of (i) are being discussed here [2]. As for (ii),
> I still think that my approach is better - I have commented more about
> this below.
> Maybe the best approach is a combination of both our approaches.

Yes, in the context of the promised objects model patch series, the 
value of this patch series is primarily as a sample of how to use the 
sub-process mechanism to create a versioned interface for retrieving 

> [1]
> [2]
> On Fri, 14 Jul 2017 09:26:51 -0400
> Ben Peart <> wrote:
>> +------------------------
>> +packet: git> command=get
>> +packet: git> sha1=0a214a649e1b3d5011e14a3dc227753f2bd2be05
>> +packet: git> 0000
>> +------------------------
> It would be useful to have this command support more than one SHA-1, so
> that hooks that know how to batch can do so.

I agree.  Since nothing was using that capability yet, I decided to keep 
it simple and not add support for a feature that wasn't being used. The 
reason the interface is versioned is so that if/when something does need 
that capability, it can be added.

>> +static int subprocess_map_initialized;
>> +static struct hashmap subprocess_map;
> The documentation of "tablesize" in "struct hashmap" states that it can
> be used to check if the hashmap is initialized, so
> subprocess_map_initialized is probably unnecessary.

Nice.  That will make things a little simpler.

>>   static int check_and_freshen(const unsigned char *sha1, int freshen)
>>   {
>> -	return check_and_freshen_local(sha1, freshen) ||
>> -	       check_and_freshen_nonlocal(sha1, freshen);
>> +	int ret;
>> +	int already_retried = 0;
>> +
>> +retry:
>> +	ret = check_and_freshen_local(sha1, freshen) ||
>> +		check_and_freshen_nonlocal(sha1, freshen);
>> +	if (!ret && core_virtualize_objects && !already_retried) {
>> +		already_retried = 1;
>> +		if (!read_object_process(sha1))
>> +			goto retry;
>> +	}
>> +
>> +	return ret;
>>   }
> Is this change meant to ensure that Git code that operates on loose
> objects directly (bypassing storage-agnostic functions such as
> sha1_object_info_extended() and has_sha1_file()) still work? If yes,
> this patch appears incomplete (for example, read_loose_object() needs to
> be changed too), and this seems like a difficult task - in my patch set
> [1], I ended up deciding to create a separate type of storage and
> instead looked at the code that operates on *packed* objects directly
> (because there were fewer such methods) to ensure that they would work
> correctly in the presence of a separate type of storage.

Yes, with this set of patches, we've been running successfully on 
completely sparse clones (no commits, trees, or blobs) for several 
months.  read_loose_object() is only called by fsck when it is 
enumerating existing loose objects so does not need to be updated.

We have a few thousand developers making ~100K commits per week so in 
our particular usage, I'm fairly confident it works correctly.  That 
said, it is possible there is some code path I've missed. :)

> [1]

  reply	other threads:[~2017-07-17 20:09 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-14 13:26 [RFC/PATCH v2 0/1] Add support for downloading blobs on demand Ben Peart
2017-07-14 13:26 ` [PATCH v2 1/1] sha1_file: " Ben Peart
2017-07-14 15:18   ` Christian Couder
2017-07-17 18:06   ` Jonathan Tan
2017-07-17 20:09     ` Ben Peart [this message]
2017-07-17 23:24       ` Jonathan Tan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).