git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	Git mailing list <git@vger.kernel.org>,
	Mark Thomas <markbt@efaref.net>,
	Jeff Hostetler <git@jeffhostetler.com>,
	Kevin David <kevin.david@microsoft.com>
Subject: Re: Proposal for missing blob support in Git repos
Date: Tue, 2 May 2017 20:32:05 +0200	[thread overview]
Message-ID: <CACBZZX6jQtO_3zYjnvq0dhtWvUxb7vYLtQUWpFHLw1v-SteHcQ@mail.gmail.com> (raw)
In-Reply-To: <CAGf8dgK05+f4uX-8+iMFvQd0n2JP6YxJ18ag8uDaEH6qc6SgVQ@mail.gmail.com>

On Tue, May 2, 2017 at 7:21 PM, Jonathan Tan <jonathantanmy@google.com> wrote:
> On Mon, May 1, 2017 at 6:41 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> Jonathan Tan <jonathantanmy@google.com> writes:
>>
>>> On 05/01/2017 04:29 PM, Junio C Hamano wrote:
>>>> Jonathan Tan <jonathantanmy@google.com> writes:
>>>>
>>>>> Thanks for your comments. If you're referring to the codepath
>>>>> involving write_sha1_file() (for example, builtin/hash-object ->
>>>>> index_fd or builtin/unpack-objects), that is fine because
>>>>> write_sha1_file() invokes freshen_packed_object() and
>>>>> freshen_loose_object() directly to check if the object already exists
>>>>> (and thus does not invoke the new mechanism in this patch).
>>>>
>>>> Is that a good thing, though?  It means that you an attacker can
>>>> feed one version to the remote object store your "grab blob" hook
>>>> gets the blobs from, and have you add a colliding object locally,
>>>> and the usual "are we recording the same object as existing one?"
>>>> check is bypassed.
>>>
>>> If I understand this correctly, what you mean is the situation where
>>> the hook adds an object to the local repo, overriding another object
>>> of the same name?
>>
>> No.
>>
>> write_sha1_file() pays attention to objects already in the local
>> object store to avoid hash collisions that can be used to replace a
>> known-to-be-good object and that is done as a security measure.
>> What I am reading in your response was that this new mechanism
>> bypasses that, and I was wondering if that is a good thing.
>
> Oh, what I meant was that write_sha1_file() bypasses the new
> mechanism, not that the new mechanism bypasses the checks in
> write_sha1_file().
>
> To be clear, here's what happens when write_sha1_file() is invoked
> (before and after this patch - this patch does not affect
> write_sha1_file at all):
> 1. (some details omitted)
> 2. call freshen_packed_object
> 3, call freshen_loose_object if necessary
> 4. write object (if freshen_packed_object and freshen_loose_object do
> not both return 0)
>
> Nothing changes in this patch (whether a hook is defined or not).

But don't the semantics change in the sense that before
core.missingBlobCommand you couldn't write a new blob SHA1 that was
already part of your history, whereas with this change
write_sha1_file() might write what it considers to be a new blob, but
it's actually colliding with an existing blob, but write_sha1_file()
doesn't know that because knowing would involve asking the hook to
fetch the blob?

> And here's what happens when has_sha1_file (or another function listed
> in the commit message) is invoked:
> 1. check for existence of packed object of the requested name
> 2. check for existence of loose object of the requested name
> 3. check again for existence of packed object of the requested name
> 4. if a hook is defined, invoke the hook and repeat 1-3
>
> Here, in step 4, the hook could do whatever it wants to the repository.

This might be a bit of early bikeshedding, but then again the lack of
early bikeshedding tends to turn into standards.

Wouldn't it be better to name this core.missingObjectCommand & have
the hook take a list on stdin like:

    <id> <TAB> <object_type> <TAB> <object_id> <TAB> <request_type> <TAB> [....]

And have the hook respond:

    <id> <TAB> <status> <TAB> [....]

I.e. what you'd do now is send this to the hook:

    1 <TAB> blob <TAB> <sha1> <TAB> missing

And the hook would respond:

    1 <TAB> ok

But this leaves open the door addressing this potential edge case with
writing new blobs in the future, i.e. write_sha1_file() could call it
as:

    1 <TAB> blob <TAB> <sha1> <TAB> new

And the hook could either respond immediately as:

    1 <TAB> ok

If it's in some #YOLO mode where it's not going to check for colliding
blobs over the network, or alternatively & ask the parent repo if it
has those blobs, and if so print:

    1 <TAB> collision

Or something like that.

This also enables future lazy loading of trees/commits from the same
API, and for the hook to respond out-of-order to the input it gets as
it can, since each request is prefixed with an incrementing request
id.

  reply	other threads:[~2017-05-02 18:32 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-26 22:13 Proposal for missing blob support in Git repos Jonathan Tan
2017-05-01  3:57 ` Junio C Hamano
2017-05-01 19:12   ` Jonathan Tan
2017-05-01 23:29     ` Junio C Hamano
2017-05-02  0:33       ` Jonathan Tan
2017-05-02  0:38         ` Brandon Williams
2017-05-02  1:41         ` Junio C Hamano
2017-05-02 17:21           ` Jonathan Tan
2017-05-02 18:32             ` Ævar Arnfjörð Bjarmason [this message]
2017-05-02 21:45               ` Jonathan Tan
2017-05-04  4:29                 ` Junio C Hamano
2017-05-04 17:09                   ` Jonathan Tan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CACBZZX6jQtO_3zYjnvq0dhtWvUxb7vYLtQUWpFHLw1v-SteHcQ@mail.gmail.com \
    --to=avarab@gmail.com \
    --cc=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jonathantanmy@google.com \
    --cc=kevin.david@microsoft.com \
    --cc=markbt@efaref.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).