git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Victoria Dye <vdye@github.com>
To: "Ævar Arnfjörð Bjarmason via GitGitGadget"
	<gitgitgadget@gmail.com>,
	git@vger.kernel.org
Cc: gitster@pobox.com, me@ttaylorr.com, newren@gmail.com,
	avarab@gmail.com, mjcheetham@outlook.com, steadmon@google.com,
	chooglen@google.com, jonathantanmy@google.com,
	dyroneteng@gmail.com, Derrick Stolee <derrickstolee@github.com>
Subject: Re: [PATCH 1/9] protocol v2: add server-side "bundle-uri" skeleton
Date: Thu, 10 Nov 2022 17:59:37 -0800	[thread overview]
Message-ID: <508ba524-09d2-81f1-c5f8-815ab766791a@github.com> (raw)
In-Reply-To: <a02eee983185815d94ba1124b43eae43280aa963.1667264854.git.gitgitgadget@gmail.com>

Ævar Arnfjörð Bjarmason via GitGitGadget wrote:
> +
> +PROTOCOL for bundle-uri
> +^^^^^^^^^^^^^^^^^^^^^^^
> +
> +A `bundle-uri` request takes no arguments, and as noted above does not
> +currently advertise a capability value. Both may be added in the
> +future.
> +
> +When the client issues a `command=bundle-uri` the response is a list of

nit: comma after `command=bundle-uri`

I misread this a couple times dropping the "the", so it read like the
`command=bundle-uri` was the *response*, not the request. I think the comma
would help avoid that?

> +key-value pairs provided as packet lines with value `<key>=<value>`. The
> +meaning of these key-value pairs are provided by the config keys in the
> +`bundle.*` namespace (see linkgit:git-config[1]).

What does this ("the meaning of these key-value pairs are provided by the
config keys...") mean? Are the response keys in the `bundle.*` namespace? Or
do the client-side `bundle.*` keys provide some kind of translation of what
the keys mean?

> +
> +Clients are still expected to fully parse the line according to the
> +above format, lines that do not conform to the format SHOULD be
> +discarded. The user MAY be warned in such a case.

Why "still" - is there some reason they *wouldn't* parse the response line? 

Is "the above format" referring to `<key>=<value>` in general, or restricted
to/guaranteed that the `<key>`'s defined by the `bundle.*` namespace? I'm
guessing "still expected to fully parse" == "MUST parse" (using
MUST/SHOULD/MAY nomeclature), it would help to call that out explicitly to
be consistent with the rest of this doc.

> +
> +bundle-uri CLIENT AND SERVER EXPECTATIONS
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +URI CONTENTS::
> +The advertised URIs MUST be in one of two possible formats.
> ++
> +The first possible format is a bundle file that `git bundle verify`

I don't think "format" is the right word to describe this (I'd reserve
"format" for the literal format of the URI string). Maybe something like
this?

	The advertised URIs MUST contain one of two types of content.

	The advertised URI may contain a bundle file that `git bundle
	verify` would accept...

	...
	
	Alternatively, the advertised URI may provide a plaintext file...

> +would accept. I.e. they MUST contain one or more reference tips for
> +use by the client, MUST indicate prerequisites (in any) with standard
> +"-" prefixes, and MUST indicate their "object-format", if
> +applicable. Create "*.bundle" files with `git bundle create`.

The last sentence doesn't add anything that you don't know from the `git
bundle verify` note in the first doesn't already tell you, and feels like a
bit of a non-sequitur as a result. Although, it tangentially raises a
question: do bundle files *have to* have the '.bundle' suffix to pass `git
bundle verify`? If not, are they expected to when coming from these URIs?

> ++
> +The second possible format is a plaintext file that `git config --list`
> +would accept (with the `--file` option). The key-value pairs in this list
> +are in the `bundle.*` namespace (see linkgit:git-config[1]).
> +
> +bundle-uri CLIENT ERROR RECOVERY::
> +A client MUST above all gracefully degrade on errors, whether that
> +error is because of bad missing/data in the bundle URI(s), because
> +that client is too dumb to e.g. understand and fully parse out bundle
> +headers and their prerequisite relationships, or something else.
> ++
> +Server operators should feel confident in turning on "bundle-uri" and
> +not worry if e.g. their CDN goes down that clones or fetches will run
> +into hard failures. Even if the server bundle bundle(s) are
> +incomplete, or bad in some way the client should still end up with a
> +functioning repository, just as if it had chosen not to use this
> +protocol extension.
> ++
> +All subsequent discussion on client and server interaction MUST keep
> +this in mind.
> +
> +bundle-uri SERVER TO CLIENT::
> +The ordering of the returned bundle uris is not significant. Clients
> +MUST parse their headers to discover their contained OIDS and
> +prerequisites. A client MUST consider the content of the bundle(s)
> +themselves and their header as the ultimate source of truth.
> ++
> +A server MAY even return bundle(s) that don't have any direct
> +relationship to the repository being cloned (either through accident,
> +or intentional "clever" configuration), and expect a client to sort
> +out what data they'd like from the bundle(s), if any.
> +
> +bundle-uri CLIENT TO SERVER::
> +The client SHOULD provide reference tips found in the bundle header(s)
> +as 'have' lines in any subsequent `fetch` request. A client MAY also
> +ignore the bundle(s) entirely if doing so is deemed worse for some
> +reason, e.g. if the bundles can't be downloaded, it doesn't like the
> +tips it finds etc.
> +
> +WHEN ADVERTISED BUNDLE(S) REQUIRE NO FURTHER NEGOTIATION::
> +If after issuing `bundle-uri` and `ls-refs`, and getting the header(s)
> +of the bundle(s) the client finds that the ref tips it wants can be
> +retrieved entirety from advertised bundle(s), it MAY disconnect. The

s/entirety/entirely

And to clarify, by "it MAY disconnect", you mean it may disconnect from the
main repository server? Or the bundle server? 

> +results of such a 'clone' or 'fetch' should be indistinguishable from
> +the state attained without using bundle-uri.
> +
> +EARLY CLIENT DISCONNECTIONS AND ERROR RECOVERY::
> +A client MAY perform an early disconnect while still downloading the
> +bundle(s) (having streamed and parsed their headers). In such a case
> +the client MUST gracefully recover from any errors related to
> +finishing the download and validation of the bundle(s).
> ++
> +I.e. a client might need to re-connect and issue a 'fetch' command,
> +and possibly fall back to not making use of 'bundle-uri' at all.
> ++
> +This "MAY" behavior is specified as such (and not a "SHOULD") on the
> +assumption that a server advertising bundle uris is more likely than
> +not to be serving up a relatively large repository, and to be pointing
> +to URIs that have a good chance of being in working order. A client
> +MAY e.g. look at the payload size of the bundles as a heuristic to see
> +if an early disconnect is worth it, should falling back on a full
> +"fetch" dialog be necessary.
> +
> +WHEN ADVERTISED BUNDLE(S) REQUIRE FURTHER NEGOTIATION::
> +A client SHOULD commence a negotiation of a PACK from the server via
> +the "fetch" command using the OID tips found in advertised bundles,
> +even if's still in the process of downloading those bundle(s).
> ++
> +This allows for aggressive early disconnects from any interactive
> +server dialog. The client blindly trusts that the advertised OID tips
> +are relevant, and issues them as 'have' lines, it then requests any
> +tips it would like (usually from the "ls-refs" advertisement) via
> +'want' lines. The server will then compute a (hopefully small) PACK
> +with the expected difference between the tips from the bundle(s) and
> +the data requested.
> ++
> +The only connection the client then needs to keep active is to the
> +concurrently downloading static bundle(s), when those and the
> +incremental PACK are retrieved they should be inflated and
> +validated. Any errors at this point should be gracefully recovered
> +from, see above.
> +
> +bundle-uri PROTOCOL FEATURES
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +As noted above the `<key>=<value>` definitions are documented by the
> +`bundle.*` config namespace.

Same comment as earlier - this is a confusing way to phrase this. If you
mean "the keys are part of the `bundle.*` namespace documented in
linkgit:git-config[1]", I think you can just say that directly. If not, it
would help to clarify the relationship between the `bundle.*` namespace and
these keys.

> +
> +In particular, the `bundle.version` key specifies an integer value. The
> +only accepted value at the moment is `1`, but if the client sees an
> +unexpected value here then the client MUST ignore the bundle list.
> +
> +As long as `bundle.version` is understood, all other unknown keys MAY be
> +ignored by the client. The server will guarantee compatibility with older
> +clients, though newer clients may be better able to use the extra keys to
> +minimize downloads.
> +
> +Any backwards-incompatible addition of pre-URI key-value will be
> +guarded by a new `bundle.version` value or values in 'bundle-uri'
> +capability advertisement itself, and/or by new future `bundle-uri`
> +request arguments.
> +
> +Some example key-value pairs that are not currently implemented but could
> +be implemented in the future include:
> +
> + * Add a "hash=<val>" or "size=<bytes>" advertise the expected hash or
> +   size of the bundle file.
> +
> + * Advertise that one or more bundle files are the same (to e.g. have
> +   clients round-robin or otherwise choose one of N possible files).
> +
> + * A "oid=<OID>" shortcut and "prerequisite=<OID>" shortcut. For
> +   expressing the common case of a bundle with one tip and no
> +   prerequisites, or one tip and one prerequisite.
> ++
> +This would allow for optimizing the common case of servers who'd like
> +to provide one "big bundle" containing only their "main" branch,
> +and/or incremental updates thereof.
> ++
> +A client receiving such a a response MAY assume that they can skip
> +retrieving the header from a bundle at the indicated URI, and thus
> +save themselves and the server(s) the request(s) needed to inspect the
> +headers of that bundle or bundles.

Overall, this document is quite thorough, especially with respect to edge
cases/error handling. I found it a bit confusing at times (at least
partially due to my unfamiliarity with protocol v2), including some
potentially ambiguous phrasing or scenarios (especially those in the
disconnection & error recovery sections) that are difficult to clearly
describe in generic terms.

I think some sections (especially "PROTOCOL for bundle-uri" and "bundle-uri
CLIENT AND SERVER EXPECTATIONS") would benefit from examples of what "good"
and "bad" request/response values & behaviors look like; they would help
illustrate some of those more complex situations. The rest of the patch (the
implementation & tests) looked good to me. 

Thanks for your continued work on this, I'm really excited to see the next
steps of bundle servers in this series!


  parent reply	other threads:[~2022-11-11  1:59 UTC|newest]

Thread overview: 87+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-01  1:07 [PATCH 0/9] Bundle URIs IV: advertise over protocol v2 Derrick Stolee via GitGitGadget
2022-11-01  1:07 ` [PATCH 1/9] protocol v2: add server-side "bundle-uri" skeleton Ævar Arnfjörð Bjarmason via GitGitGadget
2022-11-08 17:08   ` SZEDER Gábor
2022-11-11  1:59   ` Victoria Dye [this message]
2022-11-16 14:08     ` Derrick Stolee
2022-11-01  1:07 ` [PATCH 2/9] bundle-uri client: add minimal NOOP client Ævar Arnfjörð Bjarmason via GitGitGadget
2022-11-01  1:07 ` [PATCH 3/9] bundle-uri client: add helper for testing server Ævar Arnfjörð Bjarmason via GitGitGadget
2022-11-01  1:07 ` [PATCH 4/9] bundle-uri: serve bundle.* keys from config Derrick Stolee via GitGitGadget
2022-11-01  1:07 ` [PATCH 5/9] bundle-uri client: add boolean transfer.bundleURI setting Ævar Arnfjörð Bjarmason via GitGitGadget
2022-11-01  1:07 ` [PATCH 6/9] strbuf: reintroduce strbuf_parent_directory() Derrick Stolee via GitGitGadget
2022-11-03  9:28   ` Phillip Wood
2022-11-03  9:49   ` Ævar Arnfjörð Bjarmason
2022-11-01  1:07 ` [PATCH 7/9] bundle-uri: allow relative URLs in bundle lists Derrick Stolee via GitGitGadget
2022-11-01  1:07 ` [PATCH 8/9] bundle-uri: download bundles from an advertised list Derrick Stolee via GitGitGadget
2022-11-01  1:07 ` [PATCH 9/9] clone: unbundle the advertised bundles Derrick Stolee via GitGitGadget
2022-11-16 19:51 ` [PATCH v2 0/9] Bundle URIs IV: advertise over protocol v2 Derrick Stolee via GitGitGadget
2022-11-16 19:51   ` [PATCH v2 1/9] protocol v2: add server-side "bundle-uri" skeleton Ævar Arnfjörð Bjarmason via GitGitGadget
2022-11-16 19:51   ` [PATCH v2 2/9] bundle-uri client: add minimal NOOP client Ævar Arnfjörð Bjarmason via GitGitGadget
2022-11-29  0:57     ` Victoria Dye
2022-12-02 15:00       ` Derrick Stolee
2022-11-16 19:51   ` [PATCH v2 3/9] bundle-uri client: add helper for testing server Ævar Arnfjörð Bjarmason via GitGitGadget
2022-11-29  0:59     ` Victoria Dye
2022-12-02 15:28       ` Derrick Stolee
2022-11-16 19:51   ` [PATCH v2 4/9] bundle-uri: serve bundle.* keys from config Derrick Stolee via GitGitGadget
2022-11-29  1:00     ` Victoria Dye
2022-11-16 19:51   ` [PATCH v2 5/9] bundle-uri client: add boolean transfer.bundleURI setting Ævar Arnfjörð Bjarmason via GitGitGadget
2022-11-29  1:03     ` Victoria Dye
2022-12-02 15:38       ` Derrick Stolee
2022-11-16 19:51   ` [PATCH v2 6/9] strbuf: introduce strbuf_strip_file_from_path() Derrick Stolee via GitGitGadget
2022-11-29  1:03     ` Victoria Dye
2022-12-02 15:40       ` Derrick Stolee
2022-12-02 18:32     ` Ævar Arnfjörð Bjarmason
2022-12-05 15:11       ` Derrick Stolee
2022-11-16 19:51   ` [PATCH v2 7/9] bundle-uri: allow relative URLs in bundle lists Derrick Stolee via GitGitGadget
2022-11-29  1:25     ` Victoria Dye
2022-12-02 16:03       ` Derrick Stolee
2022-11-16 19:51   ` [PATCH v2 8/9] bundle-uri: download bundles from an advertised list Derrick Stolee via GitGitGadget
2022-11-29  1:51     ` Victoria Dye
2022-11-16 19:51   ` [PATCH v2 9/9] clone: unbundle the advertised bundles Derrick Stolee via GitGitGadget
2022-11-29  1:59     ` Victoria Dye
2022-12-02 16:16       ` Derrick Stolee
2022-12-05 17:50   ` [PATCH v3 00/11] Bundle URIs IV: advertise over protocol v2 Derrick Stolee via GitGitGadget
2022-12-05 17:50     ` [PATCH v3 01/11] protocol v2: add server-side "bundle-uri" skeleton Ævar Arnfjörð Bjarmason via GitGitGadget
2022-12-05 23:31       ` Victoria Dye
2022-12-05 17:50     ` [PATCH v3 02/11] t: create test harness for 'bundle-uri' command Ævar Arnfjörð Bjarmason via GitGitGadget
2022-12-05 17:50     ` [PATCH v3 03/11] clone: request the 'bundle-uri' command when available Ævar Arnfjörð Bjarmason via GitGitGadget
2022-12-05 17:50     ` [PATCH v3 04/11] bundle-uri client: add boolean transfer.bundleURI setting Ævar Arnfjörð Bjarmason via GitGitGadget
2022-12-05 23:32       ` Victoria Dye
2022-12-07 15:20         ` Derrick Stolee
2022-12-05 17:50     ` [PATCH v3 05/11] transport: rename got_remote_heads Derrick Stolee via GitGitGadget
2022-12-05 17:50     ` [PATCH v3 06/11] bundle-uri client: add helper for testing server Ævar Arnfjörð Bjarmason via GitGitGadget
2022-12-05 23:32       ` Victoria Dye
2022-12-05 17:50     ` [PATCH v3 07/11] bundle-uri: serve bundle.* keys from config Derrick Stolee via GitGitGadget
2022-12-05 17:50     ` [PATCH v3 08/11] strbuf: introduce strbuf_strip_file_from_path() Derrick Stolee via GitGitGadget
2022-12-06 10:06       ` Ævar Arnfjörð Bjarmason
2022-12-06 11:37         ` Ævar Arnfjörð Bjarmason
2022-12-07 14:44           ` Derrick Stolee
2022-12-08 12:52             ` Ævar Arnfjörð Bjarmason
2022-12-05 17:50     ` [PATCH v3 09/11] bundle-uri: allow relative URLs in bundle lists Derrick Stolee via GitGitGadget
2022-12-05 23:33       ` Victoria Dye
2022-12-07 15:22         ` Derrick Stolee
2022-12-05 17:50     ` [PATCH v3 10/11] bundle-uri: download bundles from an advertised list Derrick Stolee via GitGitGadget
2022-12-07 12:57       ` Jeff King
2022-12-07 15:27         ` Derrick Stolee
2022-12-07 15:54           ` Derrick Stolee
2022-12-08  6:40             ` Jeff King
2022-12-08  6:36           ` Jeff King
2022-12-08 14:58             ` Derrick Stolee
2022-12-05 17:50     ` [PATCH v3 11/11] clone: unbundle the advertised bundles Derrick Stolee via GitGitGadget
2022-12-05 23:42     ` [PATCH v3 00/11] Bundle URIs IV: advertise over protocol v2 Victoria Dye
2022-12-22 15:14     ` [PATCH v4 " Derrick Stolee via GitGitGadget
2022-12-22 15:14       ` [PATCH v4 01/11] protocol v2: add server-side "bundle-uri" skeleton Ævar Arnfjörð Bjarmason via GitGitGadget
2022-12-22 15:14       ` [PATCH v4 02/11] t: create test harness for 'bundle-uri' command Ævar Arnfjörð Bjarmason via GitGitGadget
2022-12-22 15:14       ` [PATCH v4 03/11] clone: request the 'bundle-uri' command when available Ævar Arnfjörð Bjarmason via GitGitGadget
2022-12-22 15:14       ` [PATCH v4 04/11] bundle-uri client: add boolean transfer.bundleURI setting Ævar Arnfjörð Bjarmason via GitGitGadget
2022-12-22 15:14       ` [PATCH v4 05/11] transport: rename got_remote_heads Derrick Stolee via GitGitGadget
2022-12-22 15:14       ` [PATCH v4 06/11] bundle-uri client: add helper for testing server Ævar Arnfjörð Bjarmason via GitGitGadget
2022-12-30 16:31         ` Jeff King
2023-01-05 19:09           ` Derrick Stolee
2023-01-06  8:48             ` [PATCH] test-bundle-uri: drop unused variables Jeff King
2023-01-06 14:13               ` Derrick Stolee
2022-12-22 15:14       ` [PATCH v4 07/11] bundle-uri: serve bundle.* keys from config Derrick Stolee via GitGitGadget
2022-12-22 15:14       ` [PATCH v4 08/11] strbuf: introduce strbuf_strip_file_from_path() Derrick Stolee via GitGitGadget
2022-12-22 15:14       ` [PATCH v4 09/11] bundle-uri: allow relative URLs in bundle lists Derrick Stolee via GitGitGadget
2022-12-22 15:14       ` [PATCH v4 10/11] bundle-uri: download bundles from an advertised list Derrick Stolee via GitGitGadget
2022-12-22 15:14       ` [PATCH v4 11/11] clone: unbundle the advertised bundles Derrick Stolee via GitGitGadget
2022-12-25 11:35       ` [PATCH v4 00/11] Bundle URIs IV: advertise over protocol v2 Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=508ba524-09d2-81f1-c5f8-815ab766791a@github.com \
    --to=vdye@github.com \
    --cc=avarab@gmail.com \
    --cc=chooglen@google.com \
    --cc=derrickstolee@github.com \
    --cc=dyroneteng@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=jonathantanmy@google.com \
    --cc=me@ttaylorr.com \
    --cc=mjcheetham@outlook.com \
    --cc=newren@gmail.com \
    --cc=steadmon@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).