All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jonathan Tan <jonathantanmy@google.com>
To: steadmon@google.com
Cc: git@vger.kernel.org, Jonathan Tan <jonathantanmy@google.com>
Subject: Re: [PATCH v6 1/1] protocol: advertise multiple supported versions
Date: Tue,  5 Feb 2019 11:42:26 -0800	[thread overview]
Message-ID: <20190205194226.172769-1-jonathantanmy@google.com> (raw)
In-Reply-To: <10039ca163a9ca26595d3bbfcf70bc2ca02666f9.1545342797.git.steadmon@google.com>

Thanks. Overall, one major point: I think that the declaration of
supported versions need to be transport-scoped (search this email for
"transport-scoped" for context). And one medium point: it might be
better for protocol.version to be the preferred and maximum version, not
only the preferred version. I have other minor points, which you can
read below.

> Currently the client advertises that it supports the wire protocol
> version set in the protocol.version config. However, not all services
> support the same set of protocol versions. For example, git-receive-pack
> supports v1 and v0, but not v2. 

That's true, except <see next paragraph>.

> If a client connects to git-receive-pack
> and requests v2, it will instead be downgraded to v0.

This is a bit misleading. The client never requests v2 - connect.c has
an override that changes it to v0 (as can be seen from the diff of this
patch) before the request takes place. If we were to create a custom
client that requests v2, the server would die with "support for protocol
v2 not implemented yet". (But maybe this is a bug - the server should
downgrade instead.)

> Other services,
> such as git-upload-archive, do not do any version negotiation checks.
> 
> This patch creates a protocol version registry. Individual client and
> server programs register all the protocol versions they support prior to
> communicating with a remote instance. Versions should be listed in
> preference order; the version specified in protocol.version will
> automatically be moved to the front of the registry.

It took me a while to understand what's going on, so let me try to
summarize:

 - The main issue is that send-pack doesn't support v2, so it would be
   incorrect for the client to advertise v2; this is currently fine
   because of the override in connect.c. A side issue is that the client
   advertises v2 for other things (e.g. archive), which currently works
   only because the server ignores them.

 - To solve both these issues, each command declares which protocols it
   supports.

 - Because we now have such a list, it is not too difficult to pass the
   entire list to the server, so we should do so. (Servers already
   support multiple versions.)

> The protocol version registry is passed to remote helpers via the
> GIT_PROTOCOL environment variable.

This is unfortunate to me, but I can't think of a better way to do it -
we do need to communicate the allowed version list somehow. A transport
option cannot be ignored by an old remote helper, but an envvar can.

Backwards/forwards compatibility notes:

 - If an old Git uses a new remote helper, no GIT_PROTOCOL is passed so
   the remote helper uses v0 regardless of what's configured in the
   repo. This should be fine.
 - If a new Git uses an old remote helper, what's configured gets used
   (unless it's for send-pack, in which case a version override occurs),
   regardless of which versions Git declares that it supports. I can
   envision some situations in which this would trip us up, but we are
   already in this exact situation anyway.

> Additionally, remove special cases around advertising version=0.
> Previously we avoided adding version advertisements to the client's
> initial connection request if the client wanted version=0. However,
> including these advertisements does not change the version negotiation
> behavior, so it's better to have simpler code. As a side effect, this
> means that client operations over SSH will always include a
> "SendEnv=GIT_PROTOCOL" option on the SSH command line.

HTTP too, I think?

Also, this means that the client will send different information to the
server now, but it seems fine (as far as I know, clients already send
Git version information anyway).

Overall, I would write the commit message this way:

    A Git client can communicate with a Git server through various
    commands, and for each command, in various protocol versions. Each
    repo can define protocol.version as 0, 1, or 2 for the protocol
    version that it prefers. But this integer is handled differently by
    each command: if communicating with "upload-pack", the integer is
    passed verbatim to be interpreted by the server; if communicating
    with "receive-pack", the integer is hardcoded to be overridden to 0
    or 1; and if communicating with anything else (e.g. "archive"), the
    integer is passed verbatim but ignored by the server.

    Handle each command the same way by having each client command
    declare which versions it supports, and whenever a command is
    invoked, send its list of versions to the server (instead of one
    single version, as is done currently). The protocol.version is now
    used as a hint to which version the client prefers; if it is present
    in the list, it is moved to the front, otherwise the list is sent
    unchanged.

    The server already supports multiple versions, but currently chooses
    the greatest. Make the server choose the first supported version
    instead.

    Because remote helpers now need this list of versions, pass the list
    of versions as the GIT_PROTOCOL environment variable. (New remote
    helpers will make use of this, and old remote helpers will use the
    repo configuration.)

    <special case about advertising version=0>

    <unnecessary externs>

> diff --git a/builtin/archive.c b/builtin/archive.c
> index e74f675390..8adb9f381b 100644
> --- a/builtin/archive.c
> +++ b/builtin/archive.c
> @@ -8,6 +8,7 @@
>  #include "transport.h"
>  #include "parse-options.h"
>  #include "pkt-line.h"
> +#include "protocol.h"
>  #include "sideband.h"
>  
>  static void create_output_file(const char *output_file)
> @@ -94,6 +95,8 @@ int cmd_archive(int argc, const char **argv, const char *prefix)
>  		OPT_END()
>  	};
>  
> +	register_allowed_protocol_version(protocol_v0);
> +
>  	argc = parse_options(argc, argv, prefix, local_opts, NULL,
>  			     PARSE_OPT_KEEP_ALL);
>  

What happens if a command never declares any version? It might be best
to explicitly allow such a case, and treat it as a single list of
protocol_v0.

> diff --git a/connect.c b/connect.c
> index 94547e5056..57266b6cec 100644
> --- a/connect.c
> +++ b/connect.c
> @@ -1046,7 +1046,7 @@ static enum ssh_variant determine_ssh_variant(const char *ssh_command,
>   */
>  static struct child_process *git_connect_git(int fd[2], char *hostandport,
>  					     const char *path, const char *prog,
> -					     enum protocol_version version,
> +					     const struct strbuf *version_advert,
>  					     int flags)
>  {
>  	struct child_process *conn;
> @@ -1084,12 +1084,9 @@ static struct child_process *git_connect_git(int fd[2], char *hostandport,
>  		    prog, path, 0,
>  		    target_host, 0);
>  
> -	/* If using a new version put that stuff here after a second null byte */
> -	if (version > 0) {
> -		strbuf_addch(&request, '\0');
> -		strbuf_addf(&request, "version=%d%c",
> -			    version, '\0');
> -	}
> +	/* Add version fields after a second null byte */
> +	strbuf_addch(&request, '\0');
> +	strbuf_addf(&request, "%s%c", version_advert->buf, '\0');
>  
>  	packet_write(fd[1], request.buf, request.len);

As expected, the version information for git:// is now written
differently.

> @@ -1104,14 +1101,13 @@ static struct child_process *git_connect_git(int fd[2], char *hostandport,
>   */
>  static void push_ssh_options(struct argv_array *args, struct argv_array *env,
>  			     enum ssh_variant variant, const char *port,
> -			     enum protocol_version version, int flags)
> +			     const struct strbuf *version_advert, int flags)
>  {
> -	if (variant == VARIANT_SSH &&
> -	    version > 0) {
> +	if (variant == VARIANT_SSH) {
>  		argv_array_push(args, "-o");
>  		argv_array_push(args, "SendEnv=" GIT_PROTOCOL_ENVIRONMENT);
> -		argv_array_pushf(env, GIT_PROTOCOL_ENVIRONMENT "=version=%d",
> -				 version);
> +		argv_array_pushf(env, GIT_PROTOCOL_ENVIRONMENT "=%s",
> +				 version_advert->buf);
>  	}
>  
>  	if (flags & CONNECT_IPV4) {

Likewise for ssh://.

[snip other ssh code]

> @@ -1226,16 +1223,10 @@ struct child_process *git_connect(int fd[2], const char *url,
>  {
>  	char *hostandport, *path;
>  	struct child_process *conn;
> +	struct strbuf version_advert = STRBUF_INIT;
>  	enum protocol protocol;
> -	enum protocol_version version = get_protocol_version_config();
>  
> -	/*
> -	 * NEEDSWORK: If we are trying to use protocol v2 and we are planning
> -	 * to perform a push, then fallback to v0 since the client doesn't know
> -	 * how to push yet using v2.
> -	 */
> -	if (version == protocol_v2 && !strcmp("git-receive-pack", prog))
> -		version = protocol_v0;
> +	get_client_protocol_version_advertisement(&version_advert);
>  
>  	/* Without this we cannot rely on waitpid() to tell
>  	 * what happened to our children.

It's great that this hardcoded override is now removed. I see below (in
another file) that the same occurs for http://.

> +void register_allowed_protocol_versions_from_env(void)
> +{
> +	const char *git_protocol = getenv(GIT_PROTOCOL_ENVIRONMENT);
> +	const char *version_str;
> +	struct string_list version_list = STRING_LIST_INIT_DUP;
> +	struct string_list_item *version;
> +
> +	if (!git_protocol)
> +		return;
> +
> +	string_list_split(&version_list, git_protocol, ':', -1);
> +	for_each_string_list_item(version, &version_list) {
> +		if (skip_prefix(version->string, "version=", &version_str))
> +			register_allowed_protocol_version(
> +				parse_protocol_version(version_str));

Probably worth BUG() if does not start with "version="?

> +	}
> +	string_list_clear(&version_list, 0);
> +}

[snip]

> +void get_client_protocol_version_advertisement(struct strbuf *advert)
> +{
> +	int i, tmp_nr = nr_allowed_versions;
> +	enum protocol_version *tmp_allowed_versions, config_version;
> +	strbuf_reset(advert);
> +
> +	version_registration_locked = 1;
> +
> +	config_version = get_protocol_version_config();
> +	if (config_version == protocol_v0) {
> +		strbuf_addstr(advert, "version=0");
> +		return;
> +	}

Why is protocol v0 special-cased like this?

> +	if (tmp_nr > 0) {
> +		ALLOC_ARRAY(tmp_allowed_versions, tmp_nr);
> +		copy_array(tmp_allowed_versions, allowed_versions, tmp_nr,
> +			   sizeof(enum protocol_version));
> +	} else {
> +		ALLOC_ARRAY(tmp_allowed_versions, 1);
> +		tmp_nr = 1;
> +		tmp_allowed_versions[0] = config_version;

In this "else" block, wouldn't this mean that if we forget to declare
any versions, we might send an unsupported version to the server? I
didn't check this, but it might happen if we invoked
transport_fetch_refs() without going through any of the builtins (which
might happen in the case of a lazy fetch in a partial clone).

> +	}
> +
> +	if (tmp_allowed_versions[0] != config_version)
> +		for (i = 1; i < nr_allowed_versions; i++)
> +			if (tmp_allowed_versions[i] == config_version) {
> +				SWAP(tmp_allowed_versions[0],
> +				     tmp_allowed_versions[i]);
> +			}

It doesn't seem right to just directly swap the config_version with the
front.

Also, I think this is more complicated that it needs to be. You could
just search for the config version; if found, loop backwards starting
from the index, shuffling to the right, then put the config version in
element 0.

> +
> +	strbuf_addf(advert, "version=%s",
> +		    format_protocol_version(tmp_allowed_versions[0]));
> +	for (i = 1; i < tmp_nr; i++)
> +		strbuf_addf(advert, ":version=%s",
> +			    format_protocol_version(tmp_allowed_versions[i]));
> +
> +	free(tmp_allowed_versions);
> +}

[snip]

>  	const char *git_protocol = getenv(GIT_PROTOCOL_ENVIRONMENT);
> @@ -38,9 +142,10 @@ enum protocol_version determine_protocol_version_server(void)
>  	/*
>  	 * Determine which protocol version the client has requested.  Since
>  	 * multiple 'version' keys can be sent by the client, indicating that
> -	 * the client is okay to speak any of them, select the greatest version
> -	 * that the client has requested.  This is due to the assumption that
> -	 * the most recent protocol version will be the most state-of-the-art.
> +	 * the client is okay to speak any of them, select the first
> +	 * recognizable version that the client has requested.  This is due to
> +	 * the assumption that the protocol versions will be listed in
> +	 * preference order.
>  	 */
>  	if (git_protocol) {
>  		struct string_list list = STRING_LIST_INIT_DUP;
> @@ -53,8 +158,11 @@ enum protocol_version determine_protocol_version_server(void)
>  
>  			if (skip_prefix(item->string, "version=", &value)) {
>  				v = parse_protocol_version(value);
> -				if (v > version)
> +				if (v != protocol_unknown_version &&
> +				    is_allowed_protocol_version(v)) {
>  					version = v;
> +					break;
> +				}
>  			}
>  		}

Here we see that the server now takes the first version, as discussed.

> diff --git a/protocol.h b/protocol.h
> index 2ad35e433c..88330fd0ee 100644
> --- a/protocol.h
> +++ b/protocol.h
> @@ -14,7 +14,24 @@ enum protocol_version {
>   * 'protocol.version' config.  If unconfigured, a value of 'protocol_v0' is
>   * returned.
>   */
> -extern enum protocol_version get_protocol_version_config(void);
> +enum protocol_version get_protocol_version_config(void);
> +
> +/*
> + * Register an allowable protocol version for a given operation. Registration
> + * must occur before attempting to advertise a version to a server process.
> + */
> +void register_allowed_protocol_version(enum protocol_version version);

I think the allowed protocol versions have to be transport-scoped. A
single builtin command may invoke multiple remote commands (as will be
the case if a lazy fetch in a partial clone is ever required).

> @@ -360,16 +373,10 @@ static struct discovery *discover_refs(const char *service, int for_push)
>  		strbuf_addf(&refs_url, "service=%s", service);
>  	}
>  
> -	/*
> -	 * NEEDSWORK: If we are trying to use protocol v2 and we are planning
> -	 * to perform a push, then fallback to v0 since the client doesn't know
> -	 * how to push yet using v2.
> -	 */
> -	if (version == protocol_v2 && !strcmp("git-receive-pack", service))
> -		version = protocol_v0;
> +	get_client_protocol_version_advertisement(&version_advert);
>  
>  	/* Add the extra Git-Protocol header */
> -	if (get_protocol_http_header(version, &protocol_header))
> +	if (get_client_protocol_http_header(&version_advert, &protocol_header))
>  		string_list_append(&extra_headers, protocol_header.buf);
>  
>  	memset(&http_options, 0, sizeof(http_options));

Great to see that this hardcoded override is also removed for http://.

> @@ -489,10 +489,8 @@ test_expect_success 'push with http:// and a config of v2 does not request v2' '
>  	git -C "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" log -1 --format=%s client_branch >expect &&
>  	test_cmp expect actual &&
>  
> -	# Client didnt request to use protocol v2
> -	! grep "Git-Protocol: version=2" log &&
> -	# Server didnt respond using protocol v2
> -	! grep "git< version 2" log
> +	# Server responded with version 1
> +	grep "git< version 1" log
>  '

Change the test title - the client indeed would request 2, but since 1
is in front, the server will use version 1.

  reply	other threads:[~2019-02-05 19:42 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-02 21:59 [PATCH 0/1] Limit client version advertisements Josh Steadmon
2018-10-02 21:59 ` [PATCH 1/1] protocol: limit max protocol version per service Josh Steadmon
2018-10-02 22:28   ` Stefan Beller
2018-10-03 21:33     ` Josh Steadmon
2018-10-03 22:47       ` Stefan Beller
2018-10-05  0:18         ` Josh Steadmon
2018-10-05 19:25           ` Stefan Beller
2018-10-10 23:53             ` Josh Steadmon
2018-10-12 23:30               ` Jonathan Nieder
2018-10-12 23:32               ` Jonathan Nieder
2018-10-12 23:45                 ` Josh Steadmon
2018-10-12 23:53                   ` Jonathan Nieder
2018-10-13  7:58                     ` Junio C Hamano
2018-10-12  1:02 ` [PATCH v2 0/1] Advertise multiple supported proto versions steadmon
2018-10-12  1:02   ` [PATCH v2 1/1] protocol: advertise multiple supported versions steadmon
2018-10-12 22:30     ` Stefan Beller
2018-10-22 22:55       ` Josh Steadmon
2018-10-23  0:37         ` Stefan Beller
2018-11-12 21:49   ` [PATCH v3 0/1] Advertise multiple supported proto versions steadmon
2018-11-12 21:49     ` [PATCH v3 1/1] protocol: advertise multiple supported versions steadmon
2018-11-12 22:33       ` Stefan Beller
2018-11-13  3:24         ` Re* " Junio C Hamano
2018-11-13 19:21           ` Stefan Beller
2018-11-14  2:31             ` Junio C Hamano
2018-11-13  4:01       ` Junio C Hamano
2018-11-13 22:53         ` Josh Steadmon
2018-11-14  2:38           ` Junio C Hamano
2018-11-14 19:57             ` Josh Steadmon
2018-11-16  2:45               ` Junio C Hamano
2018-11-16 19:59                 ` Josh Steadmon
2018-11-13 13:35       ` Junio C Hamano
2018-11-13 18:28       ` SZEDER Gábor
2018-11-13 23:03         ` Josh Steadmon
2018-11-14  0:47           ` SZEDER Gábor
2018-11-14  2:44         ` Junio C Hamano
2018-11-14 11:01           ` SZEDER Gábor
2018-11-14  2:30     ` [PATCH v4 0/1] Advertise multiple supported proto versions Josh Steadmon
2018-11-14  2:30       ` [PATCH v4 1/1] protocol: advertise multiple supported versions Josh Steadmon
2018-11-14 10:22       ` [PATCH v4 0/1] Advertise multiple supported proto versions Junio C Hamano
2018-11-14 19:51         ` Josh Steadmon
2018-11-16 10:46           ` Junio C Hamano
2018-11-16 22:34       ` [PATCH v5 " Josh Steadmon
2018-11-16 22:34         ` [PATCH v5 1/1] protocol: advertise multiple supported versions Josh Steadmon
2018-12-14 20:20           ` Ævar Arnfjörð Bjarmason
2018-12-14 21:58             ` Josh Steadmon
2018-12-14 22:39               ` Ævar Arnfjörð Bjarmason
2018-12-18 23:05                 ` Josh Steadmon
2018-12-20 21:58 ` [PATCH v6 0/1] Advertise multiple supported proto versions Josh Steadmon
2018-12-20 21:58   ` [PATCH v6 1/1] protocol: advertise multiple supported versions Josh Steadmon
2019-02-05 19:42     ` Jonathan Tan [this message]
2019-02-07 23:58       ` Josh Steadmon
2019-02-11 18:53         ` Jonathan Tan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190205194226.172769-1-jonathantanmy@google.com \
    --to=jonathantanmy@google.com \
    --cc=git@vger.kernel.org \
    --cc=steadmon@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.