From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A44F8C433EF for ; Tue, 3 May 2022 01:16:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229712AbiECBTh (ORCPT ); Mon, 2 May 2022 21:19:37 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50872 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230128AbiECBSN (ORCPT ); Mon, 2 May 2022 21:18:13 -0400 Received: from pb-smtp2.pobox.com (pb-smtp2.pobox.com [64.147.108.71]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E36ECE08 for ; Mon, 2 May 2022 18:14:38 -0700 (PDT) Received: from pb-smtp2.pobox.com (unknown [127.0.0.1]) by pb-smtp2.pobox.com (Postfix) with ESMTP id 9023E1096BB; Mon, 2 May 2022 20:54:40 -0400 (EDT) (envelope-from junio@pobox.com) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed; d=pobox.com; h=from:to:cc :subject:references:date:in-reply-to:message-id:mime-version :content-type; s=sasl; bh=r7+ziDC31rn/V74RgtVDiFRG7FQC1BJ4ZzXvyz VYM3c=; b=rcHTH3I0ybTh3o5IEleQXU1mR8tRyQPOiHL6xdXcGXbE/h1tZk3spq iAe+R11jfwAvXC4a+9IL9G6zDd8sR4XuOAz1X7GiT8ADThXTpvhBVzwhjIOVJzN9 GFYd2jvYJUM80rc/VNR4yUyXFWQnPChZKTQyQ/tzRUf1dH8D5hQEo= Received: from pb-smtp2.nyi.icgroup.com (unknown [127.0.0.1]) by pb-smtp2.pobox.com (Postfix) with ESMTP id 77C041096BA; Mon, 2 May 2022 20:54:40 -0400 (EDT) (envelope-from junio@pobox.com) Received: from pobox.com (unknown [34.83.65.128]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by pb-smtp2.pobox.com (Postfix) with ESMTPSA id C18931096B9; Mon, 2 May 2022 20:54:39 -0400 (EDT) (envelope-from junio@pobox.com) From: Junio C Hamano To: Calvin Wan Cc: git@vger.kernel.org, jonathantanmy@google.com, philipoakley@iee.email, johncai86@gmail.com, me@ttaylorr.com Subject: Re: [PATCH v4 5/8] transport: add client side capability to request object-info References: <20220328191112.3092139-1-calvinwan@google.com> <20220502170904.2770649-1-calvinwan@google.com> <20220502170904.2770649-6-calvinwan@google.com> Date: Mon, 02 May 2022 17:54:38 -0700 In-Reply-To: <20220502170904.2770649-6-calvinwan@google.com> (Calvin Wan's message of "Mon, 2 May 2022 17:09:01 +0000") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Pobox-Relay-ID: 9C8FABF8-CA7B-11EC-B669-CB998F0A682E-77302942!pb-smtp2.pobox.com Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Calvin Wan writes: > The question of version mismatch often comes up with client/server > relationships. There are two cases to consider here (assuming either > client or server functionality for object-info changes between the > different versions): > > 1. client version > server version > - client can request additional attributes from server > 2. server version > client version > - server can return additional info to the client > > The second case is a non-issue since the client would never be able to > request additional info from the server. That assumes that we can never remove attributes once we start supporting them, so it is not a non-issue exactly. The way we make it a non-issue is to make sure that the server returns no more than what the client has asked, i.e. the same way we deal with the first case. > In order to solve the first > case, recall an earlier patch where the server sends back the attributes > even if no object ids are sent by the client. This allows the client to > first check whether the server can accept the requested attributes > before sending the entire request. It requires two round-trips when the server does not know something the client has asked, one without any object names, and then a real request with the list of object names? I think in practice, because the way the previous step is structured, you can send a full request and in the non-error codepath the server side will understand everything and give full response back without an extra round-trip. Only in the error case, the client will see the set of acceptable attributes and no object data (and implicitly treat that as an indication of an error), but then what would the client do after that happens? - The client will just give up; the attributes it wanted but the server couldn't supply are so fundamentally necessary to perform whatever the client wanted to do. - The client will retry the request by asking a known subset of attributes for the same set of objects. This requires an extra round-trip. If an earlier step didn't make .unknown case bypass the response loop, this extra round-trip would have been totally unnecessary. > index 8c7752fc82..c27018d48c 100644 > --- a/fetch-pack.h > +++ b/fetch-pack.h > @@ -69,6 +69,12 @@ struct fetch_pack_args { > unsigned connectivity_checked:1; > }; > > +struct object_info_args { > + const struct string_list *object_info_options; > + const struct string_list *server_options; > + const struct oid_array *oids; > +}; These are const pointers because these lists are not owned by this structure and are borrowed from somebody else? > diff --git a/transport-helper.c b/transport-helper.c > index b4dbbabb0c..093946f7fd 100644 > --- a/transport-helper.c > +++ b/transport-helper.c > @@ -686,13 +686,16 @@ static int fetch_refs(struct transport *transport, > > /* > * If we reach here, then the server, the client, and/or the transport > - * helper does not support protocol v2. --negotiate-only requires > - * protocol v2. > + * helper does not support protocol v2. --negotiate-only and --object-info > + * require protocol v2. > */ OK. > if (data->transport_options.acked_commits) { > warning(_("--negotiate-only requires protocol v2")); > return -1; > } > + if (transport->smart_options->object_info) { > + die(_("--object-info requires protocol v2")); > + } OK. > diff --git a/transport.c b/transport.c > index 3d64a43ab3..08c505e1d0 100644 > --- a/transport.c > +++ b/transport.c > @@ -353,6 +353,79 @@ static struct ref *handshake(struct transport *transport, int for_push, > return refs; > } > > +static int fetch_object_info(struct transport *transport, struct object_info **object_info_data) > +{ > + size_t i; > + int size_index = -1; > + struct git_transport_data *data = transport->data; > + struct object_info_args args; > + struct packet_reader reader; > + struct string_list server_attributes = STRING_LIST_INIT_DUP; > + > + memset(&args, 0, sizeof(args)); > + args.server_options = transport->server_options; > + args.object_info_options = transport->smart_options->object_info_options; It is unclear who fills smart_options->object_info_options and from what. Are we referring to something that is not yet used, or worse yet, code that is not yet written? It seems that the result of applying 1-5/8 does not compile. > + connect_setup(transport, 0); > + packet_reader_init(&reader, data->fd[0], NULL, 0, > + PACKET_READ_CHOMP_NEWLINE | > + PACKET_READ_DIE_ON_ERR_PACKET); > + data->version = discover_version(&reader); > + > + transport->hash_algo = reader.hash_algo; > + > + switch (data->version) { > + case protocol_v2: > + if (!server_supports_v2("object-info", 0)) > + return -1; > + /** > + * Send a request with only attributes first. If server can return all > + * of the requested attributes, then send request with object ids > + */ Doesn't it force the most pessimistic behaviour, i.e. always do a likely-to-be-useless (when the feature becomes widely avaiable) probe round-trip before making a real request? > + send_object_info_request(data->fd[1], &args); > + if (packet_reader_read(&reader) != PACKET_READ_NORMAL) { > + check_stateless_delimiter(transport->stateless_rpc, &reader, "stateless delimiter expected"); > + return -1; > + } All these check_stateless_delimiter() calls in this patch are on overly long lines. > + string_list_split(&server_attributes, reader.line, ' ', -1); > + packet_reader_read(&reader); > + check_stateless_delimiter(transport->stateless_rpc, &reader, "stateless delimiter expected"); > + if (server_attributes.nr != args.object_info_options->nr) > + return -1; So, I am guessing (without having the code that prepares the right hand side) that we are comparing the number of attributes we asked (on the right hand side) and the number of attributes the other side said it supports in response to our request (on the left hand side). This does not protect against a server that responds with attribute names duplicated or any somewhat odd third-party servers that are not identical to what you wrote here. I would rather see us to loop through object_info_options array and see if server_attributes list says all of them are supported more explicitly. That can be done in the loop below, where you'd figure out the "foo_index" for each attribute "foo" you are supporting in the code. > + for (i = 0; i < server_attributes.nr; i++) { > + if (!strcmp(server_attributes.items[i].string, "size")) > + size_index = i + 1; > + } size_index was initialized to -1 and I was expecting we can tell the attribute is not used by checking (size_index < 0), but this seems to make size_index 1 based, not 0 based. Intended? I think this is to skip the object name that comes as the first item in the response, but it would be more "pure" to keep foo_index 0-based and add the offset by whatever constant number of things that come in front (currently, 1 for object name) near a lot closer to where we parse and read the data, i.e. in the while() loop below. > + args.oids = transport->smart_options->object_info_oids; > + send_object_info_request(data->fd[1], &args); And this is the second round-trip. > + break; > + case protocol_v1: > + case protocol_v0: > + die(_("wrong protocol version. expected v2")); > + case protocol_unknown_version: > + BUG("unknown protocol version"); > + } > + > + if (packet_reader_read(&reader) != PACKET_READ_NORMAL) > + die(_("error reading object info header")); > + i = 0; > + while (packet_reader_read(&reader) == PACKET_READ_NORMAL) { > + struct string_list object_info_values = STRING_LIST_INIT_DUP; > + > + string_list_split(&object_info_values, reader.line, ' ', -1); > + if (size_index > 0) { > + if (!strcmp(object_info_values.items[size_index].string, "")) > + die("object-info: not our ref %s", > + object_info_values.items[0].string); > + *(*object_info_data)[i].sizep = strtoul(object_info_values.items[size_index].string, NULL, 10); > + } > + i++; > + } > + check_stateless_delimiter(transport->stateless_rpc, &reader, "stateless delimiter expected"); > + > + return 0; > +} All of the above look written for the best case scenario (i.e. it assumes there won't be any unexpected type of response), which makes a good fuzz target. A malicious server side can throw non-number in the "size" slot and strtoul() does not check for a malformatted number, or it can return more response records than it was requested, and overflow object_info_data[] array the caller of this function prepared. smart_options->object_info_oids may know how many response records we should expect, so at least this while() loop should be able to protect against an object_info_data[] overflow attack using it as the upper limit of i. > static struct ref *get_refs_via_connect(struct transport *transport, int for_push, > struct transport_ls_refs_options *options) > { > @@ -392,8 +465,14 @@ static int fetch_refs_via_pack(struct transport *transport, > args.server_options = transport->server_options; > args.negotiation_tips = data->options.negotiation_tips; > args.reject_shallow_remote = transport->smart_options->reject_shallow; > + args.object_info = transport->smart_options->object_info; > > - if (!data->got_remote_heads) { > + if (transport->smart_options && transport->smart_options->object_info) { > + if (!fetch_object_info(transport, data->options.object_info_data)) > + goto cleanup; > + ret = -1; > + goto cleanup; > + } else if (!data->got_remote_heads) { > int i; > int must_list_refs = 0; > for (i = 0; i < nr_heads; i++) { > diff --git a/transport.h b/transport.h > index 12bc08fc33..dbf60bb710 100644 > --- a/transport.h > +++ b/transport.h > @@ -6,6 +6,7 @@ > #include "remote.h" > #include "list-objects-filter-options.h" > #include "string-list.h" > +#include "object-store.h" > > struct git_transport_options { > unsigned thin : 1; > @@ -31,6 +32,12 @@ struct git_transport_options { > */ > unsigned connectivity_checked:1; > > + /* > + * Transport will attempt to pull only object-info. Fallbacks > + * to pulling entire object if object-info is not supported > + */ > + unsigned object_info : 1; > + > int depth; > const char *deepen_since; > const struct string_list *deepen_not; > @@ -54,6 +61,10 @@ struct git_transport_options { > * common commits to this oidset instead of fetching any packfiles. > */ > struct oidset *acked_commits; > + > + struct oid_array *object_info_oids; > + struct object_info **object_info_data; > + struct string_list *object_info_options; > }; > > enum transport_family {