git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Cc: Benjamin Flesch <benjaminflesch@icloud.com>
Subject: [PATCH 3/9] upload-pack: use oidset for deepen_not list
Date: Wed, 28 Feb 2024 17:37:44 -0500	[thread overview]
Message-ID: <20240228223744.GC1158131@coredump.intra.peff.net> (raw)
In-Reply-To: <20240228223700.GA1157826@coredump.intra.peff.net>

We record the oid of every deepen-not line the client sends to us. For a
well-behaved client, the resulting array should be bounded by the number
of unique refs we have. But because there's no de-duplication, a
malicious client can cause the array to grow unbounded by just sending
the same "refs/heads/foo" over and over (assuming such a ref exists).

Since the deepen-not list is just being fed to a "rev-list --not"
traversal, the order of items doesn't matter. So we can replace the
oid_array with an oidset which notices and skips duplicates.

That bounds the memory in malicious cases to be linear in the number of
unique refs. And even in non-malicious cases, there may be a slight
improvement in memory usage if multiple refs point to the same oid
(though in practice this list is probably pretty tiny anyway, as it
comes from the user specifying "--shallow-exclude" on the client fetch).

Note that in the trace2 output we'll now output the number of
de-duplicated objects, rather than the total number of "deepen-not"
lines we received. This is arguably a more useful value for tracing /
debugging anyway.

Reported-by: Benjamin Flesch <benjaminflesch@icloud.com>
Signed-off-by: Jeff King <peff@peff.net>
---
 upload-pack.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/upload-pack.c b/upload-pack.c
index f6395b210e..ebad9026d7 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -65,7 +65,7 @@ struct upload_pack_data {
 	struct strvec hidden_refs;
 
 	struct object_array shallows;
-	struct oid_array deepen_not;
+	struct oidset deepen_not;
 	struct object_array extra_edge_obj;
 	int depth;
 	timestamp_t deepen_since;
@@ -125,7 +125,7 @@ static void upload_pack_data_init(struct upload_pack_data *data)
 	struct object_array want_obj = OBJECT_ARRAY_INIT;
 	struct object_array have_obj = OBJECT_ARRAY_INIT;
 	struct object_array shallows = OBJECT_ARRAY_INIT;
-	struct oid_array deepen_not = OID_ARRAY_INIT;
+	struct oidset deepen_not = OID_ARRAY_INIT;
 	struct string_list uri_protocols = STRING_LIST_INIT_DUP;
 	struct object_array extra_edge_obj = OBJECT_ARRAY_INIT;
 	struct string_list allowed_filters = STRING_LIST_INIT_DUP;
@@ -158,7 +158,7 @@ static void upload_pack_data_clear(struct upload_pack_data *data)
 	object_array_clear(&data->want_obj);
 	object_array_clear(&data->have_obj);
 	object_array_clear(&data->shallows);
-	oid_array_clear(&data->deepen_not);
+	oidset_clear(&data->deepen_not);
 	object_array_clear(&data->extra_edge_obj);
 	list_objects_filter_release(&data->filter_options);
 	string_list_clear(&data->allowed_filters, 0);
@@ -923,12 +923,13 @@ static int send_shallow_list(struct upload_pack_data *data)
 		strvec_push(&av, "rev-list");
 		if (data->deepen_since)
 			strvec_pushf(&av, "--max-age=%"PRItime, data->deepen_since);
-		if (data->deepen_not.nr) {
+		if (oidset_size(&data->deepen_not)) {
+			const struct object_id *oid;
+			struct oidset_iter iter;
 			strvec_push(&av, "--not");
-			for (i = 0; i < data->deepen_not.nr; i++) {
-				struct object_id *oid = data->deepen_not.oid + i;
+			oidset_iter_init(&data->deepen_not, &iter);
+			while ((oid = oidset_iter_next(&iter)))
 				strvec_push(&av, oid_to_hex(oid));
-			}
 			strvec_push(&av, "--not");
 		}
 		for (i = 0; i < data->want_obj.nr; i++) {
@@ -1004,15 +1005,15 @@ static int process_deepen_since(const char *line, timestamp_t *deepen_since, int
 	return 0;
 }
 
-static int process_deepen_not(const char *line, struct oid_array *deepen_not, int *deepen_rev_list)
+static int process_deepen_not(const char *line, struct oidset *deepen_not, int *deepen_rev_list)
 {
 	const char *arg;
 	if (skip_prefix(line, "deepen-not ", &arg)) {
 		char *ref = NULL;
 		struct object_id oid;
 		if (expand_ref(the_repository, arg, strlen(arg), &oid, &ref) != 1)
 			die("git upload-pack: ambiguous deepen-not: %s", line);
-		oid_array_append(deepen_not, &oid);
+		oidset_insert(deepen_not, &oid);
 		free(ref);
 		*deepen_rev_list = 1;
 		return 1;
@@ -1554,7 +1555,7 @@ static void trace2_fetch_info(struct upload_pack_data *data)
 	jw_object_intmax(&jw, "depth", data->depth);
 	jw_object_intmax(&jw, "shallows", data->shallows.nr);
 	jw_object_bool(&jw, "deepen-since", data->deepen_since);
-	jw_object_intmax(&jw, "deepen-not", data->deepen_not.nr);
+	jw_object_intmax(&jw, "deepen-not", oidset_size(&data->deepen_not));
 	jw_object_bool(&jw, "deepen-relative", data->deepen_relative);
 	if (data->filter_options.choice)
 		jw_object_string(&jw, "filter", list_object_filter_config_name(data->filter_options.choice));
-- 
2.44.0.rc2.424.gbdbf4d014b


  parent reply	other threads:[~2024-02-28 22:37 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-28 22:37 [PATCH 0/9] bound upload-pack memory allocations Jeff King
2024-02-28 22:37 ` [PATCH 1/9] upload-pack: drop separate v2 "haves" array Jeff King
2024-02-28 22:37 ` [PATCH 2/9] upload-pack: switch deepen-not list to an oid_array Jeff King
2024-02-28 22:37 ` Jeff King [this message]
2024-02-28 22:38 ` [PATCH 4/9] upload-pack: use a strmap for want-ref lines Jeff King
2024-02-28 22:38 ` [PATCH 5/9] upload-pack: accept only a single packfile-uri line Jeff King
2024-02-28 22:38 ` [PATCH 6/9] upload-pack: disallow object-info capability by default Jeff King
2024-03-04  8:34   ` Patrick Steinhardt
2024-03-04  9:59     ` Jeff King
2024-04-29 20:49       ` Taylor Blau
2024-02-28 22:39 ` [PATCH 7/9] upload-pack: always turn off save_commit_buffer Jeff King
2024-02-28 22:39 ` [PATCH 8/9] upload-pack: use PARSE_OBJECT_SKIP_HASH_CHECK in more places Jeff King
2024-02-28 22:39 ` [PATCH 9/9] upload-pack: free tree buffers after parsing Jeff King
2024-03-04  8:33   ` Patrick Steinhardt
2024-03-04  9:57     ` Jeff King
2024-03-04 10:00       ` Patrick Steinhardt
2024-02-28 22:47 ` [PATCH 0/9] bound upload-pack memory allocations Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240228223744.GC1158131@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=benjaminflesch@icloud.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).