All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 2/2] index-pack: prefetch missing REF_DELTA bases
Date: Thu, 16 May 2019 17:42:58 -0400	[thread overview]
Message-ID: <20190516214257.GD10787@sigill.intra.peff.net> (raw)
In-Reply-To: <20190516213056.221406-1-jonathantanmy@google.com>

On Thu, May 16, 2019 at 02:30:56PM -0700, Jonathan Tan wrote:

> > So I could go either way, though I do think it makes sense for on-demand
> > fetches for partial clones to avoid asking for thin packs as a general
> > principle.
> 
> This should not be a problem since fetch-pack can already know that
> we're doing an on-demand fetch (args->no_dependents), so we should be
> able to either plumb a "no-thin-pack" arg in the same way or rename
> args->no_dependents to also encompass the no-thin-pack option. But this
> can be done separately from this patch set, I think.

Yeah, I think it can be done separately. Though the two may intermingle
if we want to instruct index-pack that it should not try to pre-fetch if
we did not ask for a thin pack.

> > As a matter of fact, should partial clones _always_ avoid
> > asking for thin packs?  That would make this issue go away entirely.
> > 
> > Sometimes it would be more efficient (we do not have to get an extra
> > base object just to resolve the delta we needed) but sometimes worse (if
> > we did actually have the base, it's a win). Whether it's a win would
> > depend on the "hit" rate, and I suspect that is heavily dependent on
> > workload characteristics (what kind of filtering is in use, are we
> > topping up in a non-partial way, etc).
> 
> I think it's best if we still allow servers to serve thin packs. For
> example, if we're excluding only large blobs, clients would still want
> servers to be able to delta against blobs that they have.

Yes, this is getting into the hit-rate thing I mentioned. You're right
that for a reasonably typical case of "no blobs over 10MB" we'd have a
very high hit rate, and disabling thin packs would almost certainly be a
big loss.

I guess even when we have a "miss", the cost is usually not that high
either. If we get A as a delta against B, then in the non-thin-pack case
we transfer all of A. In the thin-pack case with pre-fetch we transfer
all of B, and then the delta. But the delta is often small enough
compared to the total content that it's not that big a deal either way.
There are pathological cases, of course, but that's already true. :)

So you're right, it's probably still a win to use thin packs when we
can.

> > Right, REF_DELTA is definitely correctly handled currently, and I don't
> > think that would break with your patch. It's just that your patch would
> > introduce a bunch of extra traffic as we request bases separately that
> > are already in the pack.
> 
> Ah...I see. For this problem, I think that it can be solved with the
> "if (objects[d->obj_no].real_type != OBJ_REF_DELTA)" check that the
> existing code uses before calling read_object(). I'll include this in
> the next reroll if any other issue comes up.

I'm confused about this. Aren't we pre-fetching before we've actually
resolved deltas? The base could be in the pack as a true base, and we
might have seen it already then. But it could itself be a delta, and we
wouldn't know we have it until we resolve it (this gets into the
lucky/unlucky ordering thing).

-Peff

  reply	other threads:[~2019-05-16 21:43 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-14 21:10 [PATCH 0/2] Partial clone fix: handling received REF_DELTA Jonathan Tan
2019-05-14 21:10 ` [PATCH 1/2] t5616: refactor packfile replacement Jonathan Tan
2019-05-15  8:36   ` Johannes Schindelin
2019-05-15 18:22     ` Jonathan Tan
2019-05-14 21:10 ` [PATCH 2/2] index-pack: prefetch missing REF_DELTA bases Jonathan Tan
2019-05-15  8:46   ` Johannes Schindelin
2019-05-15 18:28     ` Jonathan Tan
2019-05-17 18:33       ` Johannes Schindelin
2019-05-15 23:16   ` Jeff King
2019-05-16  1:43     ` Junio C Hamano
2019-05-16  4:04       ` Jeff King
2019-05-16 18:26     ` Jonathan Tan
2019-05-16 21:12       ` Jeff King
2019-05-16 21:30         ` Jonathan Tan
2019-05-16 21:42           ` Jeff King [this message]
2019-05-16 23:15             ` Jonathan Tan
2019-05-17  1:09               ` Jeff King
2019-05-17  1:22                 ` Jeff King
2019-05-17  4:39                   ` Jeff King
2019-05-17  4:42                     ` Jeff King
2019-05-17  7:20                     ` Duy Nguyen
2019-05-17  8:55                       ` Jeff King
2019-05-18 11:39                         ` Duy Nguyen
2019-05-20 23:04                           ` Nicolas Pitre
2019-05-21 21:20                             ` Jeff King
2019-06-03 22:23   ` Jonathan Nieder

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190516214257.GD10787@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.