All of lore.kernel.org
 help / color / mirror / Atom feed
* RFC: Missing blob hook might be invoked infinitely recursively
@ 2017-06-29 18:48 Jonathan Tan
  2017-06-29 19:24 ` Jonathan Nieder
  2017-06-30 18:28 ` Jeff Hostetler
  0 siblings, 2 replies; 3+ messages in thread
From: Jonathan Tan @ 2017-06-29 18:48 UTC (permalink / raw)
  To: git; +Cc: Jeff Hostetler

As some of you may know, I'm currently working on support for partial
clones/fetches in Git (where blobs above a user-specified size threshold
are not downloaded - only their names and sizes are downloaded). To do
this, the client repository needs to be able to download blobs at will
whenever it needs a missing one (for example, upon checkout).

So I have done this by adding support for a hook in Git [1], and
updating the object-reading code in Git to, by default, automatically
invoke this hook whenever necessary. (This means that existing
subsystems will all work by default, in theory at least.) My current
design is for the hook to have maximum flexibility - when invoked with a
list of SHA-1s, it must merely ensure that those objects are in the
local repo, whether packed or loose.

I am also working on a command (fetch-blob) to be bundled with Git to be
used as a default hook, and here is where the problem lies.

Suppose you have missing blob AB12 and CD34 that you now need, so
fetch-blob is invoked. It sends the literals AB12 and CD34 to a new
server endpoint and obtains a packfile, which it then pipes through "git
index-pack". The issue is that "git index-pack" wants to try to access
AB12 and CD34 in the local repo in order to do a SHA-1 collision check,
and therefore fetch-blob is invoked once again, creating infinite
recursion.

This is straightforwardly fixed by making "git index-pack" understand
about missing blobs, but this might be a symptom of this approach being
error-prone (custom hooks that invoke any Git command must be extra
careful).

So I have thought of a few solutions, each with its pros and cons:

1. Require the hook to instead output a packfile to stdout. This means
that that hook no longer needs to access the local repo, and thus has
less dependence on Git commands. But this reduces the flexibility in
that its output must be packed, not loose. (This is fine for the use
cases I'm thinking of, but probably not so for others.)

2. Add support for an environment variable to Git that suppresses access
to the missing blob manifest, in effect, suppressing invocation of the
hook. This allows anyone (the person configuring Git or the hook writer)
to suppress this access, although they might need in-depth knowledge to
know whether the hook is meant to be run with such access suppressed or
required.

3. Like the above, except for a command-line argument to Git.

What do you think? Any solutions that I am missing?

[1] Work in progress, but you can see an earlier version here: https://public-inbox.org/git/b917a463f0ad4ce0ab115203b3f24894961a2e75.1497558851.git.jonathantanmy@google.com/

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: RFC: Missing blob hook might be invoked infinitely recursively
  2017-06-29 18:48 RFC: Missing blob hook might be invoked infinitely recursively Jonathan Tan
@ 2017-06-29 19:24 ` Jonathan Nieder
  2017-06-30 18:28 ` Jeff Hostetler
  1 sibling, 0 replies; 3+ messages in thread
From: Jonathan Nieder @ 2017-06-29 19:24 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, Jeff Hostetler

Hi,

Jonathan Tan wrote:

> Suppose you have missing blob AB12 and CD34 that you now need, so
> fetch-blob is invoked. It sends the literals AB12 and CD34 to a new
> server endpoint and obtains a packfile, which it then pipes through "git
> index-pack". The issue is that "git index-pack" wants to try to access
> AB12 and CD34 in the local repo in order to do a SHA-1 collision check,
> and therefore fetch-blob is invoked once again, creating infinite
> recursion.
[...]
> 2. Add support for an environment variable to Git that suppresses access
> to the missing blob manifest, in effect, suppressing invocation of the
> hook. This allows anyone (the person configuring Git or the hook writer)
> to suppress this access, although they might need in-depth knowledge to
> know whether the hook is meant to be run with such access suppressed or
> required.

Small tweak: what if Git itself sets that environment variable when
invoking the hook?  A fetch-blob hook author can then explicitly unset
the environment variable to request recursion if they need it.

I should credit Shawn Pearce for saying this a little more clearly
offline a moment ago.

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: RFC: Missing blob hook might be invoked infinitely recursively
  2017-06-29 18:48 RFC: Missing blob hook might be invoked infinitely recursively Jonathan Tan
  2017-06-29 19:24 ` Jonathan Nieder
@ 2017-06-30 18:28 ` Jeff Hostetler
  1 sibling, 0 replies; 3+ messages in thread
From: Jeff Hostetler @ 2017-06-30 18:28 UTC (permalink / raw)
  To: Jonathan Tan, git



On 6/29/2017 2:48 PM, Jonathan Tan wrote:
> As some of you may know, I'm currently working on support for partial
> clones/fetches in Git (where blobs above a user-specified size threshold
> are not downloaded - only their names and sizes are downloaded). To do
> this, the client repository needs to be able to download blobs at will
> whenever it needs a missing one (for example, upon checkout).
> 
> So I have done this by adding support for a hook in Git [1], and
> updating the object-reading code in Git to, by default, automatically
> invoke this hook whenever necessary. (This means that existing
> subsystems will all work by default, in theory at least.) My current
> design is for the hook to have maximum flexibility - when invoked with a
> list of SHA-1s, it must merely ensure that those objects are in the
> local repo, whether packed or loose.
> 
> I am also working on a command (fetch-blob) to be bundled with Git to be
> used as a default hook, and here is where the problem lies.
> 
> Suppose you have missing blob AB12 and CD34 that you now need, so
> fetch-blob is invoked. It sends the literals AB12 and CD34 to a new
> server endpoint and obtains a packfile, which it then pipes through "git
> index-pack". The issue is that "git index-pack" wants to try to access
> AB12 and CD34 in the local repo in order to do a SHA-1 collision check,
> and therefore fetch-blob is invoked once again, creating infinite
> recursion.
> 
> This is straightforwardly fixed by making "git index-pack" understand
> about missing blobs, but this might be a symptom of this approach being
> error-prone (custom hooks that invoke any Git command must be extra
> careful).

I'm not sure if this what you meant, but could you pass to index-pack
the set of missing blobs that you passed to fetch-blob?  That is, let
index-pack do it's normal lookups, but just not on the list passed in.

> 
> So I have thought of a few solutions, each with its pros and cons:
> 
> 1. Require the hook to instead output a packfile to stdout. This means
> that that hook no longer needs to access the local repo, and thus has
> less dependence on Git commands. But this reduces the flexibility in
> that its output must be packed, not loose. (This is fine for the use
> cases I'm thinking of, but probably not so for others.)
> 
> 2. Add support for an environment variable to Git that suppresses access
> to the missing blob manifest, in effect, suppressing invocation of the
> hook. This allows anyone (the person configuring Git or the hook writer)
> to suppress this access, although they might need in-depth knowledge to
> know whether the hook is meant to be run with such access suppressed or
> required.
> 
> 3. Like the above, except for a command-line argument to Git.
> 
> What do you think? Any solutions that I am missing?
> 
> [1] Work in progress, but you can see an earlier version here: https://public-inbox.org/git/b917a463f0ad4ce0ab115203b3f24894961a2e75.1497558851.git.jonathantanmy@google.com/
> 

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2017-06-30 18:28 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-29 18:48 RFC: Missing blob hook might be invoked infinitely recursively Jonathan Tan
2017-06-29 19:24 ` Jonathan Nieder
2017-06-30 18:28 ` Jeff Hostetler

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.