From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Subject: [PATCH 3/3] receive-pack: eliminate duplicate .have refs
Date: Thu, 19 May 2011 17:34:46 -0400
Message-ID: <20110519213446.GC29793@sigill.intra.peff.net>
In-Reply-To: <20110519213231.GA29702@sigill.intra.peff.net>

When receiving a push, we advertise ref tips from any
alternate repositories, in case that helps the client send a
smaller pack. Since these refs don't actually exist in the
destination repository, we don't transmit the real ref
names, but instead use the pseudo-ref ".have".

If your alternate has a large number of duplicate refs (for
example, because it is aggregating objects from many related
repositories, some of which will have the same tags and
branch tips), then we will send each ".have $sha1" line
multiple times. This is a pointless waste of bandwidth, as
we are simply repeating the same fact to the client over and
over.

This patch eliminates duplicate .have refs early on. It does
so efficiently by collecting the alternate ref tips into a
sha1_array, sorting it, and skipping adjacent duplicates.
This has the side effect of re-ordering the .have lines by
ascending sha1; this isn't a problem, though, as the
original order was meaningless.

There is a similar .have system in fetch-pack, but it
does not suffer from the same problem. For each alternate
ref we consider in fetch-pack, we actually open the object
and mark it with the SEEN flag, so duplicates are
automatically culled.

Signed-off-by: Jeff King <peff@peff.net>
---
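
For readers without a git tree at hand, the sort-and-skip idea
from the commit message can be sketched as a standalone C
fragment. This is only an illustration under stated assumptions:
HASH_LEN, cmp_hash, each_hash_fn, for_each_unique_hash and
print_hash are hypothetical names, not git's API, and a plain
two-dimensional array stands in for struct sha1_array.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define HASH_LEN 20 /* raw SHA-1 length, matching sha1[20] in the patch */

/* Hypothetical stand-in for git's hashcmp(): byte-wise ordering. */
static int cmp_hash(const void *a, const void *b)
{
	return memcmp(a, b, HASH_LEN);
}

typedef void (*each_hash_fn)(const unsigned char hash[HASH_LEN], void *data);

/*
 * Sort the n hashes in place, then call fn once per distinct value.
 * After sorting, equal hashes are adjacent, so comparing each entry
 * with its predecessor is enough to skip duplicates.
 * Returns the number of distinct hashes visited.
 */
static size_t for_each_unique_hash(unsigned char (*hashes)[HASH_LEN], size_t n,
				   each_hash_fn fn, void *data)
{
	size_t i, unique = 0;

	assert(hashes || !n);
	qsort(hashes, n, HASH_LEN, cmp_hash);
	for (i = 0; i < n; i++) {
		if (i > 0 && !memcmp(hashes[i], hashes[i - 1], HASH_LEN))
			continue;
		fn(hashes[i], data);
		unique++;
	}
	return unique;
}

/* Example callback: print the hash in hex (not the wire format). */
static void print_hash(const unsigned char hash[HASH_LEN], void *data)
{
	int i;

	(void)data; /* unused in this sketch */
	for (i = 0; i < HASH_LEN; i++)
		printf("%02x", hash[i]);
	printf("\n");
}
```

Calling for_each_unique_hash(hashes, n, print_hash, NULL) on a
list containing repeats prints each distinct hash once, in
ascending order. The cost is one O(n log n) sort plus a linear
pass, and no extra memory beyond the list itself.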
 builtin/receive-pack.c |   16 +++++++++++++---
 sha1-array.c           |   16 ++++++++++++++++
 sha1-array.h           |    6 ++++++
 3 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 6bb1281..e1a687a 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -10,6 +10,7 @@
 #include "remote.h"
 #include "transport.h"
 #include "string-list.h"
+#include "sha1-array.h"
 
 static const char receive_pack_usage[] = "git receive-pack <git-dir>";
 
@@ -731,14 +732,23 @@ static int delete_only(struct command *commands)
 	return 1;
 }
 
-static void add_one_alternate_ref(const struct ref *ref, void *unused)
+static void add_one_alternate_sha1(const unsigned char sha1[20], void *unused)
 {
-	add_extra_ref(".have", ref->old_sha1, 0);
+	add_extra_ref(".have", sha1, 0);
+}
+
+static void collect_one_alternate_ref(const struct ref *ref, void *data)
+{
+	struct sha1_array *sa = data;
+	sha1_array_append(sa, ref->old_sha1);
 }
 
 static void add_alternate_refs(void)
 {
-	for_each_alternate_ref(add_one_alternate_ref, NULL);
+	struct sha1_array sa = SHA1_ARRAY_INIT;
+	for_each_alternate_ref(collect_one_alternate_ref, &sa);
+	sha1_array_for_each_unique(&sa, add_one_alternate_sha1, NULL);
+	sha1_array_clear(&sa);
 }
 
 int cmd_receive_pack(int argc, const char **argv, const char *prefix)
diff --git a/sha1-array.c b/sha1-array.c
index 5b75a5a..b2f47f9 100644
--- a/sha1-array.c
+++ b/sha1-array.c
@@ -41,3 +41,19 @@ void sha1_array_clear(struct sha1_array *array)
 	array->alloc = 0;
 	array->sorted = 0;
 }
+
+void sha1_array_for_each_unique(struct sha1_array *array,
+				for_each_sha1_fn fn,
+				void *data)
+{
+	int i;
+
+	if (!array->sorted)
+		sha1_array_sort(array);
+
+	for (i = 0; i < array->nr; i++) {
+		if (i > 0 && !hashcmp(array->sha1[i], array->sha1[i-1]))
+			continue;
+		fn(array->sha1[i], data);
+	}
+}
diff --git a/sha1-array.h b/sha1-array.h
index 15d3b6b..4499b5d 100644
--- a/sha1-array.h
+++ b/sha1-array.h
@@ -15,4 +15,10 @@ void sha1_array_sort(struct sha1_array *array);
 int sha1_array_lookup(struct sha1_array *array, const unsigned char *sha1);
 void sha1_array_clear(struct sha1_array *array);
 
+typedef void (*for_each_sha1_fn)(const unsigned char sha1[20],
+				 void *data);
+void sha1_array_for_each_unique(struct sha1_array *array,
+				for_each_sha1_fn fn,
+				void *data);
+
 #endif /* SHA1_ARRAY_H */
-- 
1.7.5.8.ga95ab