All of lore.kernel.org
 help / color / mirror / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>,
	Patrick Steinhardt <ps@pks.im>,
	Christian Couder <christian.couder@gmail.com>,
	Albert Cui <albertqcui@gmail.com>,
	Jonathan Tan <jonathantanmy@google.com>,
	Jonathan Nieder <jrnieder@gmail.com>
Subject: Re: [RFC PATCH 00/13] Add bundle-uri: resumably clones, static "dumb" CDN etc.
Date: Mon, 23 Aug 2021 22:03:11 -0400	[thread overview]
Message-ID: <c56bec68-bf00-f3b0-74b4-fd3e47319000@gmail.com> (raw)
In-Reply-To: <87k0kcfdg0.fsf@evledraar.gmail.com>

On 8/23/2021 9:28 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Tue, Aug 10 2021, Derrick Stolee wrote:
> 
>> On 8/5/2021 11:07 AM, Ævar Arnfjörð Bjarmason wrote:> We're in the 2.33.0 rc cycle, and I'd hoped to have some more prep

(I'll skip the parts where you agree with me.)

>> 3. The bundle-uri feature is focused on "clone" scenarios, and I don't see
>>    how it could be used to help "fetch" scenarios. To be fair, I also have
>>    been vocal about how the packfile-uri feature is very hard to make
>>    helpful for the "fetch" scenario. The benefits to "clone" seem to be
>>    worth the effort alone. I think the bundle-api doubles-down on that
>>    being the core functionality that matters, and that is fine by me. It
>>    sacrifices the flexibility of the packfile-uri with a lower maintenance
>>    burden for servers that want to implement it.
> 
> Not correct, I'm going to have it handle incremental fetches.
> 
> What *is* correct is that the initial RFC patches here entirely punt on
> it (I only hook them into clone.c, no fetch.c).
> 
> I focused on the "clone" case to get to a MVP earlier, and as you note I
> think in practice most users of this will want to only use it for clone
> bootstrapping, and won't care so much about the incremental case.
> 
> But for the "fetch" case we'd simply start fetching the N bundles in
> parallel, and read their headers, then abort fetching any whose header
> declares tips that we already have.

To save network time, we would need to read data as it is streaming
from the network and stop the download when we realize we don't need
the bundle? This seems wasteful, but I'm willing to see it work in
reality before judging it too harshly.

It would be better if the origin Git server could see the tips that
the user has and then only recommend bundles that serve new data.
That requires a closer awareness of which bundles contain which
content.

> So e.g. for a repo with a "big" bundle that's 1 week old, and 7
> incremental bundles for the last 7 days of updates we'd quickly find
> that we only need the last 3 if we updated 4 days ago, based on only
> fetching their headers.

I can attest that using time-based packs can be a good way to catch
up based on pre-computed pack-files instead of generating one on
demand. The GVFS protocol has a "prefetch" endpoint that gives all
pre-computed pack-files since a given timestamp. (One big difference
is that those pack-files contain only commits and trees, not blobs.
We should ensure that your proposal can extend to bundles that serve
filtered repos such as --filter=blob:none.)

The thing that works especially well with the GVFS protocol is that
if we are missing a reachable object from one of those pack-files,
then it is downloaded as a one-off request _when needed_. This allows
the pack-files to be computed by independent cache servers that have
their own clocks and compute their own timestamps and generate these
pack-files as "new objects I fetched since the last timestamp".

This is all to say that your timestamp-based approach will work as
long as you are very careful in how the incremental bundles are
constructed. There can really only be one list of bundles at a time,
and somehow we need those servers to atomically swap their list of
bundles when they are reorganized.

> The client spec is very liberal on "MAY" instead of "MUST" for
> supporting clients. E.g. a client might sensibly conclude that their
> most recent commits are recent enough, and that it would probably be a
> waste of time to fetch the headers of the pointed-to bundles, and just
> do a normal fetch instead.

Sensible. I wouldn't expect anything less.

> The response format is also currently:
> 
>     <URI>[ <optional key-value>*]
> 
> And nothing uses the "optional key-value". One of the main things I had
> in mind was to support something like (none of these key-values are
> specified currently):
> 
>     https://example.com/big.bundle tip=deadbeef1
>     https://example.com/incremental-1.bundle prereq=deadbeef1 tip=deadbeef2
>     https://example.com/incremental-2.bundle prereq=deadbeef2 tip=deadbeef3
> 
> Or:
> 
>     https://example.com/big.bundle type=complete
>     https://example.com/incremental-1.bundle type=incremental
>     https://example.com/incremental-2.bundle type=incremental
> 
> I.e. allow the server to up-front to optionally specify some details
> about the pointed-to-bundles, purely so the client can avoid the work
> and roundtrip of trying to fetch the headers of N pointed-to bundles
> just to see which if any it needs for a clone/fetch.

Ok, this seems like a reasonable way forward. If the bundles are
truly focused on data within a core "trunk" (say, the reflog of
the default branch) then this works. It gets more complicated for
repos with many long-lived branches that merge together over the
span of weeks. I'm talking about extreme outliers like the Windows
OS repository, so take this concern with a grain of salt. The
point is that in that world, a single tip isn't enough sometimes.

...
>> One question I saw Jonathan ask was "can we modify the packfile-uri
>> capability to be better?" I think this feature has enough different goals
>> that they could co-exist. That's the point of protocol v2, right? Servers
>> can implement and advertise the subset of functionality that they think is
>> best for their needs.
> 
> *Nod*, I also think as shown with these patches the overall complexity
> after setting up the scaffolding for a new feature isn't very high, and
> much of it is going away in some generally useful prep patches I'm
> trying to get in...
> 
>> I hope my ramblings provide some amount of clarity to the discussion, but
>> also I intend to show support of the idea. If I was given the choice of
>> which feature to support (I mostly work on the client experience, so take
>> my choice with a grain of salt), then I would focus on implementing the
>> bundle-uri capability _before_ the packfile-uri capability. And that's the
>> best part: more options present more flexibility for different hosts to
>> make different decisions.
> 
> Thanks again. I'll keep you CC'd on future submissions if that's OK.

Sounds good. I'll try to prioritize review.

> One thing I could really use to move this forward is code review of some
> of the outstanding serieses I've got, in particular these two are
> immediately applicable to this one. It's based on the former, and I
> needed to steal one patch from the latter:
> 
> https://lore.kernel.org/git/cover-v4-00.10-00000000000-20210805T011823Z-avarab@gmail.com/
> https://lore.kernel.org/git/cover-v2-0.4-00000000000-20210823T110136Z-avarab@gmail.com/

I'll see what I can contribute here.

Thanks,
-Stolee

  reply	other threads:[~2021-08-24  2:03 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-05 15:07 [RFC PATCH 00/13] Add bundle-uri: resumably clones, static "dumb" CDN etc Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 01/13] serve: add command to advertise bundle URIs Ævar Arnfjörð Bjarmason
2021-08-10 13:58   ` Derrick Stolee
2021-08-23 13:25     ` Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 02/13] bundle-uri client: add "bundle-uri" parsing + tests Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 03/13] connect.c: refactor sending of agent & object-format Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 04/13] bundle-uri client: add minimal NOOP client Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 05/13] bundle-uri client: add "git ls-remote-bundle-uri" Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 06/13] bundle-uri client: add transfer.injectBundleURI support Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 07/13] bundle-uri client: add boolean transfer.bundleURI setting Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 08/13] bundle.h: make "fd" version of read_bundle_header() public Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 09/13] fetch-pack: add a deref_without_lazy_fetch_extended() Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 10/13] fetch-pack: move --keep=* option filling to a function Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 11/13] index-pack: add --progress-title option Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 12/13] bundle-uri client: support for bundle-uri with "clone" Ævar Arnfjörð Bjarmason
2021-08-05 15:07 ` [RFC PATCH 13/13] bundle-uri docs: add design notes Ævar Arnfjörð Bjarmason
2021-08-24 21:48   ` brian m. carlson
2021-08-24 22:33     ` Ævar Arnfjörð Bjarmason
2021-08-06 14:38 ` [RFC PATCH 00/13] Add bundle-uri: resumably clones, static "dumb" CDN etc Jonathan Nieder
2021-08-06 16:26   ` Ævar Arnfjörð Bjarmason
2021-08-06 20:40     ` Jonathan Nieder
2021-08-07  2:19       ` Ævar Arnfjörð Bjarmason
2021-08-10 13:55 ` Derrick Stolee
2021-08-23 13:28   ` Ævar Arnfjörð Bjarmason
2021-08-24  2:03     ` Derrick Stolee [this message]
2021-08-24 22:00       ` Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c56bec68-bf00-f3b0-74b4-fd3e47319000@gmail.com \
    --to=stolee@gmail.com \
    --cc=albertqcui@gmail.com \
    --cc=avarab@gmail.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=jrnieder@gmail.com \
    --cc=peff@peff.net \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.