From: Jeff King <peff@peff.net>
To: Erik van Zijst <erik.van.zijst@gmail.com>
Cc: git@vger.kernel.org, ssaasen@atlassian.com, mheemskerk@atlassian.com
Subject: Re: [ANNOUNCE] Git Merge Contributor Summit topic planning
Date: Wed, 1 Feb 2017 15:53:00 +0100
Message-ID: <20170201145300.4pn3faodhdb72jly@sigill.intra.peff.net>
In-Reply-To: <1485941532-47993-1-git-send-email-erik.van.zijst@gmail.com>

On Wed, Feb 01, 2017 at 10:32:12AM +0100, Erik van Zijst wrote:

> Clients performing a full clone get redirected to a CDN where they seed
> their new local repo from a pre-built bundle file, and then pull/fetch
> any remaining changes. Mercurial has had native, built-in support for
> this for a while now.
> 
> I imagine other large code hosts could benefit from this as well and
> I'd love to gauge the group's interest for this. Could this make sense
> for Git? Would it have a chance of landing?
> 
> Our spike implements it as an optional capability during ref
> advertisement. What are your thoughts on this?

I think this is definitely an interesting topic to discuss tomorrow.

Here are a few observations from my past thinking on the issue. I
haven't read the proposal from earlier this week yet, so some of them
may be obsolete.

Seeding from a bundle CDN generally solves two problems: getting the
bulk of the data from someplace with higher bandwidth (the CDN), and
getting the bulk of the data over a protocol that can be resumed (the
bundle).

But we don't necessarily have to solve both problems simultaneously.
And you might not want to. Storing a separate bundle on another server
is complicated to configure, and doubles the total amount of disk space
you need (even though half of it is on the CDN). And tying the seed to
a bundle means you can't seed from a non-bundle source.

So for any solution, I'd want to consider how you can put together the
pieces. Can you seed from a non-bundle? Can you seed from yourself and
just get resumability? If so, how hard is it to serve a pseudo-bundle
based on the packfiles you have on disk (i.e., getting resumability
at least in the common cases without paying the disk cost)? That would
mean saving enough data that you could reconstruct the bundle
byte-for-byte when you need to.
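
To make the pseudo-bundle idea a bit more concrete, here is a rough
Python sketch, not a proposal for how Git should do it. It assumes the
server keeps only a small bundle header (the v2 signature line, the ref
tips, and a blank line) next to a packfile that already sits on disk,
and answers byte-range reads as if the two were one file. The function
names are made up for illustration, and it glosses over the hard part:
the on-disk pack has to contain exactly what the header promises.

```python
def bundle_header(refs):
    """Build a v2 bundle header from a {refname: oid} mapping.

    Layout: signature line, one "<oid> <refname>" line per ref,
    then a blank line. The pack data follows immediately after.
    """
    lines = ["# v2 git bundle"]
    lines += [f"{oid} {name}" for name, oid in sorted(refs.items())]
    return ("\n".join(lines) + "\n\n").encode("ascii")


def read_pseudo_bundle(header, pack_path, offset, length):
    """Return up to `length` bytes of header+pack starting at `offset`.

    The bundle is never materialized; bytes come from the in-memory
    header first and then from the existing packfile (hypothetical
    helper, not part of Git).
    """
    out = b""
    if offset < len(header):
        out = header[offset:offset + length]
    remaining = length - len(out)
    if remaining > 0:
        # Translate the bundle offset into an offset within the pack.
        pack_offset = max(0, offset - len(header))
        with open(pack_path, "rb") as f:
            f.seek(pack_offset)
            out += f.read(remaining)
    return out
```

A resuming client would ask for everything from its last byte onward;
the only extra state the server stores is the header, not a second copy
of the pack.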

If you _can_ do that latter part, and you take "I only care about
resumability" to the simplest extreme, you'd probably end up with a
protocol more like:

  Client: I need a packfile with this want/have
  Server: OK, here it is; its opaque id is XYZ.
  ... connection interrupted ...
  Client: It's me again. I have up to byte N of pack XYZ
  Server: OK, resuming
          [or: I don't have XYZ anymore; start from scratch]

Then generating XYZ and generating that bundle are basically the same
task.
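
The exchange above can be sketched as a toy in-memory server in Python.
This is only an illustration under loud assumptions: the class and
method names are invented, `make_pack` stands in for whatever generates
the pack from the want/have negotiation, and using a SHA-256 of the pack
bytes as the opaque id is just one possible choice. A real server would
stream the data and would need a way to regenerate or retain the exact
same bytes.

```python
import hashlib


class ResumablePackServer:
    """Toy model of the resumable-pack exchange (hypothetical API)."""

    def __init__(self):
        self._packs = {}  # opaque id -> exact pack bytes that were sent

    def request(self, wants, haves, make_pack):
        """Client: I need a packfile with this want/have."""
        pack = make_pack(wants, haves)
        # Assumption: the opaque id is derived from the pack contents.
        pack_id = hashlib.sha256(pack).hexdigest()
        self._packs[pack_id] = pack
        return pack_id, pack

    def resume(self, pack_id, offset):
        """Client: I have up to byte `offset` of pack `pack_id`."""
        pack = self._packs.get(pack_id)
        if pack is None:
            return None  # server no longer has it; start from scratch
        return pack[offset:]

    def expire(self, pack_id):
        """Forget a pack, e.g. because the repository was repacked."""
        self._packs.pop(pack_id, None)
```

A client that got cut off after N bytes would call
`resume(pack_id, N)` and append the result to what it already has,
falling back to a fresh `request()` when it gets `None`.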

All just food for thought. I look forward to digging into it more on the
list and in the in-person discussion.

-Peff
