git.vger.kernel.org archive mirror
From: Derrick Stolee <stolee@gmail.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>, git@vger.kernel.org
Cc: Jeff King <peff@peff.net>, Patrick Steinhardt <ps@pks.im>,
	Christian Couder <christian.couder@gmail.com>,
	Albert Cui <albertqcui@gmail.com>,
	Jonathan Tan <jonathantanmy@google.com>,
	Jonathan Nieder <jrnieder@gmail.com>
Subject: Re: [RFC PATCH 00/13] Add bundle-uri: resumably clones, static "dumb" CDN etc.
Date: Tue, 10 Aug 2021 09:55:47 -0400
Message-ID: <e7fe220b-2877-107e-8f7e-ea507a65feff@gmail.com>
In-Reply-To: <RFC-cover-00.13-0000000000-20210805T150534Z-avarab@gmail.com>

On 8/5/2021 11:07 AM, Ævar Arnfjörð Bjarmason wrote:
> We're in the 2.33.0 rc cycle, and I'd hoped to have some more prep
> work for this integrated already, but for now here's something
> interesting I've been working on for early commentary/feedback.

I've taken a brief look at the code, but mostly have thoughts about the
core idea based on reading the documentation. I'll keep my feedback here
restricted to that, especially because I saw some thoughts from Jonathan
that question the idea.

> This adds the the ability to protocol v2 for servers to optimistically
> pre-seed supporting clients with one or more bundles via a new
> "bundle-uri" protocol extension.

To make sure I understand things correctly, I'm going to rewrite the
description of the feature in my own words. Please correct me if I have
misread something.

This presents a new capability for protocol v2 that allows a server to
notify a client that they can download a bundle to bootstrap their object
database, and then come back to the origin server to "catch up" the
remaining new objects since that bundle via a fetch negotiation.
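
As a rough illustration of that flow (not the proposed protocol itself,
which would do this transparently during "git clone"), the same two steps
can be emulated by hand with existing commands. The repository names, the
temporary directory, and the bundle file name below are made up for the
sketch:

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

# A stand-in "origin server" repository with one commit.
git init -q origin
git -C origin -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "first"

# The static artifact a CDN would host: everything reachable right now.
git -C origin bundle create "$work/seed.bundle" --all

# The origin moves on after the bundle was published.
git -C origin -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "second"

# Client step 1: bootstrap the object database from the static bundle.
git clone -q "$work/seed.bundle" client

# Client step 2: point at the real origin and "catch up" with a normal
# fetch; the negotiation only needs to cover objects newer than the bundle.
git -C client remote set-url origin "$work/origin"
git -C client fetch -q origin
```

After the fetch, the client's remote-tracking branch matches the origin's
current tip even though most of the history arrived through the bundle.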

This idea is similar to the packfile-uri feature in that it offloads some
server load to a dumb data store (like a CDN) but differs in a few key
ways:

1. The bundle download does not need to be controlled by the server, and
   the server doesn't even need to know its contents. The client will
   discover the content and share its view in a later negotiation.

2. The packfile-uri could advertise a pack that contains only the large
   reachable blobs, and hence could use the "happy path" for bitmaps to
   compute a packfile containing all reachable objects except these large
   blobs. For the bundle-uri feature to mimic this, the blobs would need
   to be reachable from refs (perhaps shallow tags?) but it would not be
   clear that the negotiation would prevent redownloading those files.
   This is an area to explore and expand upon.

3. The bundle-uri feature is focused on "clone" scenarios, and I don't see
   how it could be used to help "fetch" scenarios. To be fair, I also have
   been vocal about how the packfile-uri feature is very hard to make
   helpful for the "fetch" scenario. The benefits to "clone" seem to be
   worth the effort alone. I think the bundle-uri doubles down on that
   being the core functionality that matters, and that is fine by me. It
   trades the flexibility of the packfile-uri for a lower maintenance
   burden for servers that want to implement it.

The biggest benefit that I see is that the Git server does not need to
worry about whether the bundle has an exact set of data that it expects.
There is no timing issue about whether or not the exact packfile is seeded.
Instead, the server could have a fixed URL for its bundle and update its
contents (atomically) at any schedule without worrying about concurrent
clones being interrupted. From an operational perspective, I find the
bundle-uri a better way to offload "clone" load.
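
One way to get that atomic update, sketched under the assumption that the
bundle is served straight from a directory on a POSIX filesystem (the
"www" directory, the "repo.bundle" name, and the publish_bundle helper are
all hypothetical), is to write the new bundle under a temporary name and
rename it into place:

```shell
#!/bin/sh
set -e
work=$(mktemp -d)

# A stand-in repository to take snapshots of.
git init -q "$work/repo"
git -C "$work/repo" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "snapshot"

# Hypothetical directory that a static web server would expose.
serve_dir="$work/www"
mkdir -p "$serve_dir"

publish_bundle() {
    # Write under a temporary name in the same directory, so that the
    # final rename stays on a single filesystem.
    tmp="$serve_dir/.repo.bundle.$$"
    git -C "$work/repo" bundle create "$tmp" --all
    # rename(2) replaces the target atomically: a reader opening the fixed
    # name sees either the old bundle or the new one, never a partial file.
    mv -f "$tmp" "$serve_dir/repo.bundle"
}

publish_bundle

# Later, on whatever schedule the host prefers:
git -C "$work/repo" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "later"
publish_bundle
```

How a real CDN treats in-flight downloads or cached copies during such a
swap depends on the CDN, so this only covers the origin side.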

This also depends on the follow-up "git fetch" being easy to serve. In
that sense, it can be beneficial to be somewhat aware of the bundles that
are being served: can we store the bundled refs as reachability bitmaps so
we have those available for the negotiation in the following "git fetch"
operations? This choice seems specific to how the server is deciding to
create these bundles.

It also presents interesting ideas for how to create the bundle to focus
on some core portion of the repo. The "thundering herd" of CI machines
that re-clone repos at a high rate will also be likely to use the
"--single-branch" option to reduce the refs that they will ask for in the
negotiation. In that sense, we won't want a snapshot of all refs at a
given time and instead might prefer a snapshot of the default branch or
some set of highly-active branches.
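
A minimal sketch of such a default-branch-only snapshot, using existing
commands (the repository layout and the "topic-a"/"topic-b" branch names
are invented for the example):

```shell
#!/bin/sh
set -e
work=$(mktemp -d)

# A stand-in origin with a default branch and two extra branches.
git init -q "$work/origin"
git -C "$work/origin" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "base"
default=$(git -C "$work/origin" symbolic-ref --short HEAD)
git -C "$work/origin" branch topic-a
git -C "$work/origin" branch topic-b

# Bundle only the default branch instead of a snapshot of all refs.
git -C "$work/origin" bundle create "$work/default.bundle" \
    "refs/heads/$default"

# The bundle advertises a single ref.
git -C "$work/origin" bundle list-heads "$work/default.bundle"

# A CI-style clone that only asks for that one branch.
git clone -q --single-branch --branch "$default" \
    "$work/default.bundle" "$work/ci"
```

Which branches count as "highly active" is a policy decision for the
host; the bundle command simply takes whatever list of refs it is given.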

One question I saw Jonathan ask was "can we modify the packfile-uri
capability to be better?" I think this feature has enough different goals
that they could co-exist. That's the point of protocol v2, right? Servers
can implement and advertise the subset of functionality that they think is
best for their needs.

I hope my ramblings provide some amount of clarity to the discussion, but
I also intend to show support for the idea. If I were given the choice of
which feature to support (I mostly work on the client experience, so take
my choice with a grain of salt), then I would focus on implementing the
bundle-uri capability _before_ the packfile-uri capability. And that's the
best part: more options present more flexibility for different hosts to
make different decisions.

Thanks,
-Stolee

Thread overview: 26+ messages
2021-08-05 15:07 [RFC PATCH 00/13] Add bundle-uri: resumably clones, static "dumb" CDN etc Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 01/13] serve: add command to advertise bundle URIs Ævar Arnfjörð Bjarmason
2021-08-10 13:58   ` Derrick Stolee
2021-08-23 13:25     ` Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 02/13] bundle-uri client: add "bundle-uri" parsing + tests Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 03/13] connect.c: refactor sending of agent & object-format Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 04/13] bundle-uri client: add minimal NOOP client Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 05/13] bundle-uri client: add "git ls-remote-bundle-uri" Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 06/13] bundle-uri client: add transfer.injectBundleURI support Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 07/13] bundle-uri client: add boolean transfer.bundleURI setting Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 08/13] bundle.h: make "fd" version of read_bundle_header() public Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 09/13] fetch-pack: add a deref_without_lazy_fetch_extended() Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 10/13] fetch-pack: move --keep=* option filling to a function Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 11/13] index-pack: add --progress-title option Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 12/13] bundle-uri client: support for bundle-uri with "clone" Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 13/13] bundle-uri docs: add design notes Ævar Arnfjörð Bjarmason
2021-08-24 21:48   ` brian m. carlson
2021-08-24 22:33     ` Ævar Arnfjörð Bjarmason
2021-08-06 14:38 ` [RFC PATCH 00/13] Add bundle-uri: resumably clones, static "dumb" CDN etc Jonathan Nieder
2021-08-06 16:26   ` Ævar Arnfjörð Bjarmason
2021-08-06 20:40     ` Jonathan Nieder
2021-08-07  2:19       ` Ævar Arnfjörð Bjarmason
2021-08-10 13:55 ` Derrick Stolee [this message]
2021-08-23 13:28   ` Ævar Arnfjörð Bjarmason
2021-08-24  2:03     ` Derrick Stolee
2021-08-24 22:00       ` Ævar Arnfjörð Bjarmason
