All of lore.kernel.org
 help / color / mirror / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>, git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>, Jeff King <peff@peff.net>,
	Patrick Steinhardt <ps@pks.im>,
	Christian Couder <christian.couder@gmail.com>,
	Albert Cui <albertqcui@gmail.com>,
	Jonathan Tan <jonathantanmy@google.com>,
	Jonathan Nieder <jrnieder@gmail.com>,
	"brian m . carlson" <sandals@crustytoothpaste.net>,
	"Robin H . Johnson" <robbat2@gentoo.org>
Subject: Re: [PATCH 2/3] protocol v2: specify static seeding of clone/fetch via "bundle-uri"
Date: Tue, 26 Oct 2021 10:00:10 -0400	[thread overview]
Message-ID: <f2269fc7-1688-d62e-02bb-01c5b5e33143@gmail.com> (raw)
In-Reply-To: <patch-2.3-3ac0539c053-20211025T211159Z-avarab@gmail.com>

On 10/25/2021 5:25 PM, Ævar Arnfjörð Bjarmason wrote:
> diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt
> +bundle-uri CLIENT ERROR RECOVERY
> +++++++++++++++++++++++++++++++++
> +
> +A client MUST above all gracefully degrade on errors, whether that
> +error is because of bad missing/data in the bundle URI(s), because
> +that client is too dumb to e.g. understand and fully parse out bundle
> +headers and their prerequisite relationships, or something else.

"too dumb" seems a bit informal to me, especially because you
immediately elaborate on its meaning. You could rewrite as follows:

  ...because
  that client can't understand or fully parse out bundle
  headers and their prerequisite relationships, or something else.

> +Server operators should feel confident in turning on "bundle-uri" and
> +not worry if e.g. their CDN goes down that clones or fetches will run
> +into hard failures. Even if the server bundle bundle(s) are
> +incomplete, or bad in some way the client should still end up with a
> +functioning repository, just as if it had chosen not to use this
> +protocol extension.

Also, insertions of "e.g." in the middle of a sentence don't flow well.

  Server operators should feel confident in turning on "bundle-uri" and
  not worry that failures such as the CDN being unavailable will cause
  clones or fetches to have hard failures. Even if the server bundle(s)
  are invalid, the client should still end up with a functioning
  repository, just as if it had chosen not to use this protocol extension.

(Note: I also removed a "bundle bundle(s)" that was split across a line
break.)

> +bundle-uri SERVER TO CLIENT
> ++++++++++++++++++++++++++++
> +
> +The ordering of the returned bundle uris is not significant. Clients

I'm late to noticing, but shouldn't "URI" be all-caps when not used in
the literal capability string "bundle-uri"?

> +bundle-uri CLIENT TO SERVER
> ++++++++++++++++++++++++++++
> +
> +The client SHOULD provide reference tips found in the bundle header(s)
> +as 'have' lines in any subsequent `fetch` request. A client MAY also
> +ignore the bundle(s) entirely if doing so is deemed worse for some
> +reason, e.g. if the bundles can't be downloaded, it doesn't like the
> +tips it finds etc.

I would just stop after "is deemed worse for some reason." because one
example is obvious and the other is unclear how the client would detect
that situation. (Maybe: tip commit timestamps are really old?)

> +
> +WHEN ADVERTISED BUNDLE(S) REQUIRE NO FURTHER NEGOTIATION
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> +
> +If after issuing `bundle-uri` and `ls-refs`, and getting the header(s)
> +of the bundle(s) the client finds that the ref tips it wants can be
> +retrieved entirety from advertised bundle(s), it MAY disconnect. The

s/entirety/entirely/

> +results of such a 'clone' or 'fetch' should be indistinguishable from
> +the state attained without using bundle-uri.
> +
> +EARLY CLIENT DISCONNECTIONS AND ERROR RECOVERY
> +++++++++++++++++++++++++++++++++++++++++++++++
> +
> +A client MAY perform an early disconnect while still downloading the
> +bundle(s) (having streamed and parsed their headers). In such a case
> +the client MUST gracefully recover from any errors related to
> +finishing the download and validation of the bundle(s).
> +
> +I.e. a client might need to re-connect and issue a 'fetch' command,
> +and possibly fall back to not making use of 'bundle-uri' at all.

Use "For example," over starting a sentence with "i.e.". The examples
of "i.e." and "e.g." already in this document show proper use, which
involves parentheses.

> +This "MAY" behavior is specified as such (and not a "SHOULD") on the
> +assumption that a server advertising bundle uris is more likely than
> +not to be serving up a relatively large repository, and to be pointing
> +to URIs that have a good chance of being in working order. A client
> +MAY e.g. look at the payload size of the bundles as a heuristic to see

Again, here, the entire sentence is an example. This "e.g." can be
removed with no loss of meaning.

> +if an early disconnect is worth it, should falling back on a full
> +"fetch" dialog be necessary.


> +While no per-URI key-value are currently supported currently they're
> +intended to support future features such as:
> +
> + * Add a "hash=<val>" or "size=<bytes>" advertise the expected hash or
> +   size of the bundle file.

I suppose if one wanted to add this server-to-bundle coupling, then some
clients might appreciate it.

> + * Advertise that one or more bundle files are the same (to e.g. have
> +   clients round-robin or otherwise choose one of N possible files).

  * Advertise that one or more bundle files are the same, to allow for
    redundancy without causing duplicated effort.

> +static void send_bundle_uris(struct packet_writer *writer,
> +			     struct string_list *uris)
> +{
> +	struct string_list_item *item;
> +
> +	for_each_string_list_item(item, uris)
> +		packet_writer_write(writer, "%s", item->string);
> +}
> +
> +static int advertise_bundle_uri = -1;
> +static struct string_list bundle_uris = STRING_LIST_INIT_DUP;

I see you put send_bundle_uris() before the global bundle_uris so
it can be independent, but do you expect anyone to call send_bundle_uris()
via a different list?

Should we find a different place to store this data?

> +static int bundle_uri_config(const char *var, const char *value, void *data)
> +{
> +	if (!strcmp(var, "uploadpack.bundleuri")) {
> +		advertise_bundle_uri = 1;
> +		string_list_append(&bundle_uris, value);
> +	}
> +
> +	return 0;
> +}

Here, we are dictating that the URI list is available as a multi-valued
config "uploadpack.bundleuri".

1. Should this be updated in Documentation/config/uploadpack.txt?

2. This seems difficult to extend to your possible future features as
   listed in the protocol docs, mainly because this can only store the
   flat URI string. To add things like hash values, sizes, and prereqs,
   you would need more data included and grouped on a per-URI basis.
   What plans do you have to make extensions here while remaining
   somewhat compatible with downgrading Git versions?

> @@ -136,6 +137,11 @@ static struct protocol_capability capabilities[] = {
>  		.advertise = always_advertise,
>  		.command = cap_object_info,
>  	},
> +	{
> +		.name = "bundle-uri",
> +		.advertise = bundle_uri_advertise,
> +		.command = bundle_uri_command,
> +	},
>  };

I really appreciate that it is this simple to extend protocol v2.

> +test_expect_success 'basics of bundle-uri: dies if not enabled' '
> +	test-tool pkt-line pack >in <<-EOF &&
> +	command=bundle-uri
> +	0000
> +	EOF
> +
> +	cat >err.expect <<-\EOF &&
> +	fatal: invalid command '"'"'bundle-uri'"'"'
> +	EOF
> +
> +	cat >expect <<-\EOF &&
> +	ERR serve: invalid command '"'"'bundle-uri'"'"'
> +	EOF
> +
> +	test_must_fail test-tool serve-v2 --stateless-rpc <in >out 2>err.actual &&
> +	test_cmp err.expect err.actual &&
> +	test_must_be_empty out
> +'
> +
> +

hyper-nit: double newline.

The implementation seems simple enough, which I like. I'm a bit
your current use of Git config as the back-end, only because it is
difficult to be future-proof. As the functionality stands today, the
current config design works just fine. Perhaps we don't need to
worry about the future, because we can design a new, complementary
storage for that extra data. It seems worth exploring for a little
while, though. Perhaps we should take a page out of 'git-remote'
and how it stores named remotes with sub-items for metadata. The
names probably don't need to ever be exposed to users, but it could
be beneficial to anyone implementing this scheme.

[bundle "main"]
	uri = https://example.com/my-bundle
	uri = https://redundant-cdn.com/my-bundle
	size = 120523
	sha256 = {64hexchars}

[bundle "fork"]
	uri = https://cdn.org/my-fork
	size = 334
	sha256 = {...}
	prereq = {oid}

This kind of layout has an immediate grouping of data that should
help any future plan. Notice how I included multiple "uri" lines
in the "main", which helps with your plan for duplicate URIs.

Thanks,
-Stolee

  reply	other threads:[~2021-10-26 14:00 UTC|newest]

Thread overview: 77+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-25 21:25 [PATCH 0/3] bundle-uri: "dumb" static CDN offloading, spec & server implementation Ævar Arnfjörð Bjarmason
2021-10-25 21:25 ` [PATCH 1/3] leak tests: mark t5701-git-serve.sh as passing SANITIZE=leak Ævar Arnfjörð Bjarmason
2021-10-25 21:25 ` [PATCH 2/3] protocol v2: specify static seeding of clone/fetch via "bundle-uri" Ævar Arnfjörð Bjarmason
2021-10-26 14:00   ` Derrick Stolee [this message]
2021-10-26 15:00     ` Ævar Arnfjörð Bjarmason
2021-10-27  1:55       ` Derrick Stolee
2021-10-27 17:49         ` Ævar Arnfjörð Bjarmason
2021-10-27  2:01   ` Derrick Stolee
2021-10-27  8:29     ` Ævar Arnfjörð Bjarmason
2021-10-27 16:31       ` Derrick Stolee
2021-10-27 18:01         ` Ævar Arnfjörð Bjarmason
2021-10-27 19:23           ` Derrick Stolee
2021-10-27 20:22             ` Ævar Arnfjörð Bjarmason
2021-10-29 18:30               ` Derrick Stolee
2021-10-30 14:51           ` Philip Oakley
2021-10-25 21:25 ` [PATCH 3/3] bundle-uri client: add "bundle-uri" parsing + tests Ævar Arnfjörð Bjarmason
2021-10-26 14:05   ` Derrick Stolee
2021-10-29 18:46 ` [PATCH 0/3] bundle-uri: "dumb" static CDN offloading, spec & server implementation Derrick Stolee
2021-10-30  7:21   ` Ævar Arnfjörð Bjarmason
2021-11-01 21:00     ` Derrick Stolee
2021-11-01 23:18       ` Ævar Arnfjörð Bjarmason
2022-03-11 16:24 ` [RFC PATCH v2 00/13] bundle-uri: a "dumb CDN" for git Ævar Arnfjörð Bjarmason
2022-03-11 16:24   ` [RFC PATCH v2 01/13] protocol v2: add server-side "bundle-uri" skeleton Ævar Arnfjörð Bjarmason
2022-03-11 16:24   ` [RFC PATCH v2 02/13] bundle-uri docs: add design notes Ævar Arnfjörð Bjarmason
2022-03-11 16:24   ` [RFC PATCH v2 03/13] bundle-uri client: add "bundle-uri" parsing + tests Ævar Arnfjörð Bjarmason
2022-03-11 16:24   ` [RFC PATCH v2 04/13] connect.c: refactor sending of agent & object-format Ævar Arnfjörð Bjarmason
2022-03-11 16:24   ` [RFC PATCH v2 05/13] bundle-uri client: add minimal NOOP client Ævar Arnfjörð Bjarmason
2022-03-11 16:24   ` [RFC PATCH v2 06/13] bundle-uri client: add "git ls-remote-bundle-uri" Ævar Arnfjörð Bjarmason
2022-03-11 16:24   ` [RFC PATCH v2 07/13] bundle-uri client: add transfer.injectBundleURI support Ævar Arnfjörð Bjarmason
2022-03-11 16:24   ` [RFC PATCH v2 08/13] bundle-uri client: add boolean transfer.bundleURI setting Ævar Arnfjörð Bjarmason
2022-03-11 16:24   ` [RFC PATCH v2 09/13] fetch-pack: add a deref_without_lazy_fetch_extended() Ævar Arnfjörð Bjarmason
2022-03-11 16:24   ` [RFC PATCH v2 10/13] fetch-pack: move --keep=* option filling to a function Ævar Arnfjörð Bjarmason
2022-03-11 16:24   ` [RFC PATCH v2 11/13] bundle.h: make "fd" version of read_bundle_header() public Ævar Arnfjörð Bjarmason
2022-03-11 16:24   ` [RFC PATCH v2 12/13] bundle-uri client: support for bundle-uri with "clone" Ævar Arnfjörð Bjarmason
2022-03-11 16:24   ` [RFC PATCH v2 13/13] bundle-uri: make the download program configurable Ævar Arnfjörð Bjarmason
2022-03-11 21:28   ` [RFC PATCH v2 00/13] bundle-uri: a "dumb CDN" for git Derrick Stolee
2022-04-18 17:23   ` [RFC PATCH v2 00/36] bundle-uri: a "dumb CDN" for git + TOC format Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 01/36] connect.c: refactor sending of agent & object-format Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 02/36] dir API: add a generalized path_match_flags() function Ævar Arnfjörð Bjarmason
2022-04-21 17:26       ` Derrick Stolee
2022-04-18 17:23     ` [RFC PATCH v2 03/36] fetch-pack: add a deref_without_lazy_fetch_extended() Ævar Arnfjörð Bjarmason
2022-04-21 17:28       ` Derrick Stolee
2022-04-18 17:23     ` [RFC PATCH v2 04/36] fetch-pack: move --keep=* option filling to a function Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 05/36] http: make http_get_file() external Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 06/36] remote: move relative_url() Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 07/36] remote: allow relative_url() to return an absolute url Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 08/36] bundle.h: make "fd" version of read_bundle_header() public Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 09/36] protocol v2: add server-side "bundle-uri" skeleton Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 10/36] bundle-uri client: add "bundle-uri" parsing + tests Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 11/36] bundle-uri client: add minimal NOOP client Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 12/36] bundle-uri client: add "git ls-remote-bundle-uri" Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 13/36] bundle-uri client: add transfer.injectBundleURI support Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 14/36] bundle-uri client: add boolean transfer.bundleURI setting Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 15/36] bundle-uri client: support for bundle-uri with "clone" Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 16/36] bundle-uri: make the download program configurable Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 17/36] remote-curl: add 'get' capability Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 18/36] bundle: implement 'fetch' command for direct bundles Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 19/36] bundle: parse table of contents during 'fetch' Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 20/36] bundle: add --filter option to 'fetch' Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 21/36] bundle: allow relative URLs in table of contents Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 22/36] bundle: make it easy to call 'git bundle fetch' Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 23/36] clone: add --bundle-uri option Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 24/36] clone: --bundle-uri cannot be combined with --depth Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 25/36] bundle: only fetch bundles if timestamp is new Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 26/36] fetch: fetch bundles before fetching original data Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 27/36] protocol-caps: implement cap_features() Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 28/36] serve: understand but do not advertise 'features' capability Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 29/36] serve: advertise 'features' when config exists Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 30/36] connect: implement get_recommended_features() Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 31/36] transport: add connections for 'features' capability Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 32/36] clone: use server-recommended bundle URI Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 33/36] t5601: basic bundle URI test Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 34/36] protocol v2: add server-side "bundle-uri" skeleton (docs) Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 35/36] bundle-uri docs: add design notes Ævar Arnfjörð Bjarmason
2022-04-18 17:23     ` [RFC PATCH v2 36/36] docs: document bundle URI standard Ævar Arnfjörð Bjarmason
2022-04-21 19:54     ` [RFC PATCH v2 00/36] bundle-uri: a "dumb CDN" for git + TOC format Derrick Stolee
2022-04-22  9:37       ` Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f2269fc7-1688-d62e-02bb-01c5b5e33143@gmail.com \
    --to=stolee@gmail.com \
    --cc=albertqcui@gmail.com \
    --cc=avarab@gmail.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jonathantanmy@google.com \
    --cc=jrnieder@gmail.com \
    --cc=peff@peff.net \
    --cc=ps@pks.im \
    --cc=robbat2@gentoo.org \
    --cc=sandals@crustytoothpaste.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.