All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jonathan Tan <jonathantanmy@google.com>
To: peff@peff.net
Cc: jonathantanmy@google.com, git@vger.kernel.org
Subject: Re: [PATCH 2/2] index-pack: prefetch missing REF_DELTA bases
Date: Thu, 16 May 2019 11:26:46 -0700	[thread overview]
Message-ID: <20190516182646.173332-1-jonathantanmy@google.com> (raw)
In-Reply-To: <20190515231617.GA1395@sigill.intra.peff.net>

> On Tue, May 14, 2019 at 02:10:55PM -0700, Jonathan Tan wrote:
> 
> > Support for lazy fetching should still generally be turned off in
> > index-pack because it is used as part of the lazy fetching process
> > itself (if not, infinite loops may occur), but we do need to fetch the
> > REF_DELTA bases. (When fetching REF_DELTA bases, it is unlikely that
> > those are REF_DELTA themselves, because we do not send "have" when
> > making such fetches.)
> 
> I agree that the current implementation (and probably any sane
> implementation) would not send us a delta if we have not provided any
> haves. But this does mean that a malicious server could send a client
> into an infinite loop.
> 
> Pretty unlikely, but should we put some kind of circuit-breaker into the
> client to ensure this?

I thought of this - such a server could, but it seems to me that it
would be similar to a server streaming random bytes to us without
stopping (which is already possible).

> > To resolve this, prefetch all missing REF_DELTA bases before attempting
> > to resolve them. This both ensures that all bases are attempted to be
> > fetched, and ensures that we make only one request per index-pack
> > invocation, and not one request per missing object.
> 
> Ah, but now things get more tricky.
> 
> You are assuming that the server does not ever send a REF_DELTA unless
> the base object is not present in the pack (it would use OFS_DELTA
> instead). If we imagine a server which did, then there are two
> implications:
> 
>   1. We might pre-fetch a full copy of an object that we don't need.
>      It's just that it's stored as a delta in the pack which we are
>      currently indexing.
> 
>   2. If we pre-fetch multiple objects, some of them may be REF_DELTAs
>      against each other, leading to an infinite loop.
> 
> Off the top of my head, I am pretty sure your assumption holds for all
> versions of Git that support delta-base-offset[1]. But that feels a lot
> less certain to me. I could imagine an alternate server implementation,
> for example, that is gluing together packs and does not try hard to
> order the base before the delta, which would require it to use REF_DELTA
> instead of OFS_DELTA.

A cursory glance makes me think that REF_DELTA against a base object
also in the pack is already correctly handled. Right before the
invocation of conclude_pack() (which calls fix_unresolved_deltas(), the
function I modified), resolve_deltas() is invoked. The latter invokes
resolve_base() (directly or through threaded_second_pass()) which
invokes find_unresolved_deltas(), which invokes
find_unresolved_deltas_1(), which seems to handle both REF_DELTA and
OFS_DELTA.

Snipping the rest as I don't think we need to solve those if we can
handle REF_DELTA being against an object in a pack, but let me know if
you think that some of the points there still need to be addressed.

  parent reply	other threads:[~2019-05-16 18:26 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-14 21:10 [PATCH 0/2] Partial clone fix: handling received REF_DELTA Jonathan Tan
2019-05-14 21:10 ` [PATCH 1/2] t5616: refactor packfile replacement Jonathan Tan
2019-05-15  8:36   ` Johannes Schindelin
2019-05-15 18:22     ` Jonathan Tan
2019-05-14 21:10 ` [PATCH 2/2] index-pack: prefetch missing REF_DELTA bases Jonathan Tan
2019-05-15  8:46   ` Johannes Schindelin
2019-05-15 18:28     ` Jonathan Tan
2019-05-17 18:33       ` Johannes Schindelin
2019-05-15 23:16   ` Jeff King
2019-05-16  1:43     ` Junio C Hamano
2019-05-16  4:04       ` Jeff King
2019-05-16 18:26     ` Jonathan Tan [this message]
2019-05-16 21:12       ` Jeff King
2019-05-16 21:30         ` Jonathan Tan
2019-05-16 21:42           ` Jeff King
2019-05-16 23:15             ` Jonathan Tan
2019-05-17  1:09               ` Jeff King
2019-05-17  1:22                 ` Jeff King
2019-05-17  4:39                   ` Jeff King
2019-05-17  4:42                     ` Jeff King
2019-05-17  7:20                     ` Duy Nguyen
2019-05-17  8:55                       ` Jeff King
2019-05-18 11:39                         ` Duy Nguyen
2019-05-20 23:04                           ` Nicolas Pitre
2019-05-21 21:20                             ` Jeff King
2019-06-03 22:23   ` Jonathan Nieder

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190516182646.173332-1-jonathantanmy@google.com \
    --to=jonathantanmy@google.com \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.