git.vger.kernel.org archive mirror
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jonathan Nieder <jrnieder@gmail.com>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>,
	Patrick Steinhardt <ps@pks.im>,
	Christian Couder <christian.couder@gmail.com>,
	Albert Cui <albertqcui@gmail.com>,
	Jonathan Tan <jonathantanmy@google.com>
Subject: Re: [RFC PATCH 00/13] Add bundle-uri: resumably clones, static "dumb" CDN etc.
Date: Sat, 07 Aug 2021 04:19:59 +0200
Message-ID: <878s1eynew.fsf@evledraar.gmail.com>
In-Reply-To: <YQ2eLRjMRnVpdGVZ@google.com>


On Fri, Aug 06 2021, Jonathan Nieder wrote:

> Ævar Arnfjörð Bjarmason wrote:
>
>> Or perhaps not, but they're currently my best effort to explain the
>> differences between the two and how they interact. So I think it's best
>> to point to those instead of coming up with something in this reply,
>> which'll inevitably be an incomplete rewrite of much of that.
>>
>> In short, there are use-cases that packfile-uri is inherently unsuitable
>> for, or rather changing the packfile-uri feature to support them would
>> pretty much make it indistinguishable from this bundle-uri mechanism,
>> which I think would just add more confusion to the protocol.
>
> Hm.  I was hoping you might say more about those use cases --- e.g. is
> there a concrete installation that wants to take advantage of this?
> By focusing on the real-world example, we'd get a better shared
> understanding of the underlying constraints.

I hacked this up for potential use on GitLab's infrastructure, mainly as
a mechanism to relieve CPU pressure from thundering herds of CI jobs.

Often you need full clones, and you sometimes need to do those from
scratch. When a push has just come in, it's handy to convert those into
incremental fetches on top of a bundle you made recently.

It's not deployed anywhere currently; it's just something I've been
hacking up. I'll be on vacation for much of the rest of this month, and
the plan is to start stressing it on real-world use-cases after that. I
thought I'd send this RFC first.

> After all, both are ways to reduce the bandwidth of a clone or other
> large fetch operation by offloading the bulk of content to static
> serving.

As far as the server side goes, I think it's fair to say that the
packfile-uri support we have in git.git now is fairly immature. I gather
that JGit's version is more advanced, and that's what's serving up things
at Google, e.g. at https://chromium.googlesource.com.

I.e. right now, for git-upload-pack you need to exhaustively enumerate
all the objects to exclude, although there have been some recent on-list
patches for being able to supply tips.

More importantly, your CDN's reliability MUST match that of your git
server, otherwise your clone fails (as the server has already sent the
"other side" of the expected CDN pack).

Furthermore, as the server you MUST be able to tell the client which pack
checksum to expect on the CDN, which requires a very tight coupling
between the git server and the CDN.
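To make that coupling concrete, here is a minimal Python sketch of the client-side check. It assumes the server sends lines shaped like "<pack-hash> <uri>" (modeled loosely on protocol v2's packfile-uris section) and a SHA-1 repository, where a pack ends in a 20-byte SHA-1 trailer of everything before it. The function names are hypothetical, not git's API.

```python
from __future__ import annotations

import hashlib


def parse_packfile_uri_line(line: str) -> tuple[str, str]:
    """Split an assumed '<pack-hash> <uri>' line into (pack_hash, uri)."""
    pack_hash, uri = line.rstrip("\n").split(" ", 1)
    return pack_hash, uri


def pack_matches_advertised_hash(pack: bytes, advertised: str) -> bool:
    """Check a pack fetched from the CDN against the hash the server sent.

    Assumes a SHA-1 repository: a pack starts with a 12-byte header
    ("PACK", version, object count) and ends with a 20-byte SHA-1 of
    everything before it; that trailer doubles as the pack's name.
    """
    if len(pack) < 12 + 20 or not pack.startswith(b"PACK"):
        return False
    body, trailer = pack[:-20], pack[-20:]
    if hashlib.sha1(body).digest() != trailer:
        return False  # truncated or corrupted download
    return trailer.hex() == advertised.lower()
```

If the CDN serves a stale or missing pack, this check fails after the server has already omitted those objects from its own response, which is why the clone cannot simply recover.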

You can't, e.g., say "bootstrap with this bundle/pack" and point to
something like Debian's asynchronously updated FTP mirror network as a
source. The bootstrap data may or may not be there, and it may or may not
be as up-to-date as you'd like.

I think any proposed integration into git.git should mainly consider the
bulk of users; the big hosting providers can always run with their own
patches.

I think this approach opens up the potential for easier, and therefore
wider, CDN integration for git servers run by providers that aren't one
of the big fish. E.g. your CDN generation can be a daily cron job, and the
server can point to it blindly and hope for the best. The client will
optimistically use the CDN data, and recover if it's missing or stale.
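The optimistic "use it if it's there" flow can be sketched in a few lines of Python. This is an illustration under assumptions, not the series' code: `download` and `fetch` are hypothetical callbacks, and catching `OSError` covers network failures such as urllib's `URLError`, which subclasses it.

```python
from __future__ import annotations

from typing import Callable, Optional


def clone_with_optional_bundles(
    bundle_uris: list[str],
    download: Callable[[str], Optional[dict[str, str]]],
    fetch: Callable[[set[str]], str],
) -> str:
    """Use whatever bundles are reachable, then fetch the rest from the server.

    `download` and `fetch` are hypothetical callbacks: `download` returns a
    bundle's {refname: oid} tips or None, `fetch` negotiates with the server.
    """
    haves: set[str] = set()
    for uri in bundle_uris:
        try:
            tips = download(uri)
        except OSError:
            tips = None  # CDN unreachable: not fatal, just skip this bundle
        if tips:
            haves.update(tips.values())
    # With no usable bundle this degrades to an ordinary full clone.
    return fetch(haves)
```

The key design point is that a missing or stale bundle only costs efficiency, never correctness: the server is still asked for everything the bundles didn't provide.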

I think one thing I oversold is the "path to resumable clones". That's
all true, but I don't think it's really any harder to do with
packfile-uri in theory (in practice, just serving up a sensible pack with
it is pretty tedious with git-upload-pack as it stands).

The negotiation aspect of it is also interesting, and something I've been
experimenting with. I.e. the bundles' tips are what the client sends as
its HAVEs. This allows a server to anticipate what dialog newly cloning
clients are likely to have, and even pre-populate a cache of that
locally (it could even serve that diff up as a packfile-uri :).
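As a rough illustration, the following Python sketch turns a bundle's tips into "have" lines in pkt-line framing (4 hex digits giving the total length including themselves, then the payload). It shows only the have lines; a real protocol v2 fetch request also carries the command, capabilities, delimiter, wants and a terminating flush, all omitted here. The helper names are hypothetical.

```python
from __future__ import annotations


def pkt_line(payload: str) -> bytes:
    """Encode one pkt-line: 4 hex digits of total length, then the payload."""
    data = payload.encode()
    return f"{len(data) + 4:04x}".encode() + data


def haves_from_bundle_tips(tips: dict[str, str]) -> bytes:
    """Turn a bundle's {refname: oid} tips into 'have' pkt-lines."""
    # Several refs can point at the same commit; send each oid only once,
    # sorted so the output is deterministic.
    return b"".join(pkt_line(f"have {oid}\n") for oid in sorted(set(tips.values())))
```

Because every client bootstrapping from the same bundle sends the same haves, the server sees a small, predictable set of negotiations, which is what makes caching the resulting pack plausible.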

Right now it retrieves each bundle in full before adding its tips to the
negotiation in a subsequent dialog, but I've successfully experimented
locally with negotiating on the basis of objects we don't even have
yet. I.e. start downloading the bundle(s), and as soon as we have the
header, fire off the dialog with the server to get its PACK on the basis
of those promised tips.
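Acting on the header before the pack arrives is possible because a bundle's header is a short text preamble: a "# v2 git bundle" (or "# v3 git bundle") line, prerequisite lines of the form "-<oid> [comment]", ref lines of the form "<oid> <refname>", and an empty line, followed by the packfile. A minimal Python sketch, assuming the download arrives as an iterator of byte chunks (the function name is hypothetical):

```python
from __future__ import annotations

from typing import Iterable, Optional


def parse_bundle_header(
    chunks: Iterable[bytes],
) -> Optional[tuple[list[str], dict[str, str], bytes]]:
    """Read chunks until the bundle header is complete, then stop.

    Returns (prerequisite oids, {refname: oid}, leftover bytes that start
    the packfile), or None if the stream ends mid-header.
    """
    buf = b""
    for chunk in chunks:
        buf += chunk
        end = buf.find(b"\n\n")  # the header ends with an empty line
        if end != -1:
            break
    else:
        return None
    lines = buf[: end + 1].decode().splitlines()
    if lines[0] not in ("# v2 git bundle", "# v3 git bundle"):
        raise ValueError(f"not a bundle: {lines[0]!r}")
    prereqs: list[str] = []
    refs: dict[str, str] = {}
    for line in lines[1:]:
        if line.startswith("@"):
            continue  # v3 capability line; ignored in this sketch
        if line.startswith("-"):
            # "-<oid> [<comment>]": a commit the receiver must already have
            prereqs.append(line[1:].split(" ", 1)[0])
        else:
            oid, refname = line.split(" ", 1)
            refs[refname] = oid
    return prereqs, refs, buf[end + 2 :]
```

When `chunks` is a live iterator, the caller can hand the returned refs to the negotiation immediately and keep consuming the same iterator (after writing out the leftover bytes) to finish receiving the pack.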

It makes recovery a bit more involved in case the bundles don't have
what we're after, but it allows us to disconnect from the server really
fast and twiddle our thumbs while we finish downloading the bundles to
get the full clone. We already disconnect early in cases where the
bundle(s) already have what we're after.
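One simple way to decide whether the bundles "already have what we're after" is to compare the server's advertised refs with the bundle tips, sketched here in Python. This is an assumption for illustration, not necessarily how the series does it: only exact tip matches count, and the function name is hypothetical.

```python
from __future__ import annotations


def refs_still_needed(
    advertised: dict[str, str], bundle_refs: dict[str, str]
) -> dict[str, str]:
    """Return the advertised refs whose tips no downloaded bundle provides.

    Simplification: only exact tip matches count; an advertised commit that
    is merely an ancestor of a bundle tip would need a real reachability
    check, which this sketch does not do.
    """
    have = set(bundle_refs.values())
    return {name: oid for name, oid in advertised.items() if oid not in have}
```

An empty result means the client can hang up on the server without requesting a pack at all.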

This E-Mail got rather long, but hey, I did point you to parts of this
series that cover some/most of this, and since it wasn't clear what, if
anything, of that you'd read ... :) Hopefully this summary is useful.


Thread overview: 26+ messages
2021-08-05 15:07 [RFC PATCH 00/13] Add bundle-uri: resumably clones, static "dumb" CDN etc Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 01/13] serve: add command to advertise bundle URIs Ævar Arnfjörð Bjarmason
2021-08-10 13:58   ` Derrick Stolee
2021-08-23 13:25     ` Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 02/13] bundle-uri client: add "bundle-uri" parsing + tests Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 03/13] connect.c: refactor sending of agent & object-format Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 04/13] bundle-uri client: add minimal NOOP client Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 05/13] bundle-uri client: add "git ls-remote-bundle-uri" Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 06/13] bundle-uri client: add transfer.injectBundleURI support Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 07/13] bundle-uri client: add boolean transfer.bundleURI setting Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 08/13] bundle.h: make "fd" version of read_bundle_header() public Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 09/13] fetch-pack: add a deref_without_lazy_fetch_extended() Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 10/13] fetch-pack: move --keep=* option filling to a function Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 11/13] index-pack: add --progress-title option Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 12/13] bundle-uri client: support for bundle-uri with "clone" Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 13/13] bundle-uri docs: add design notes Ævar Arnfjörð Bjarmason
2021-08-24 21:48   ` brian m. carlson
2021-08-24 22:33     ` Ævar Arnfjörð Bjarmason
2021-08-06 14:38 ` [RFC PATCH 00/13] Add bundle-uri: resumably clones, static "dumb" CDN etc Jonathan Nieder
2021-08-06 16:26   ` Ævar Arnfjörð Bjarmason
2021-08-06 20:40     ` Jonathan Nieder
2021-08-07  2:19       ` Ævar Arnfjörð Bjarmason [this message]
2021-08-10 13:55 ` Derrick Stolee
2021-08-23 13:28   ` Ævar Arnfjörð Bjarmason
2021-08-24  2:03     ` Derrick Stolee
2021-08-24 22:00       ` Ævar Arnfjörð Bjarmason
