From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Subject: [PATCH 09/12] receive-pack: use oidset to de-duplicate .have lines
Date: Mon, 23 Jan 2017 19:47:39 -0500 [thread overview]
Message-ID: <20170124004739.5aowmmkesbanyohm@sigill.intra.peff.net> (raw)
In-Reply-To: <20170124003729.j4ygjcgypdq7hceg@sigill.intra.peff.net>
If you have an alternate object store with a very large
number of refs, the peak memory usage of the sha1_array can
grow high, even if most of them are duplicates that end up
not being printed at all.
The similar for_each_alternate_ref() code-paths in
fetch-pack solve this by using flags in "struct object" to
de-duplicate (and so are relying on obj_hash at the core).
But we don't have a "struct object" at all in this case. We
could call lookup_unknown_object() to get one, but if our
goal is reducing memory footprint, it's not great:
- an unknown object is as large as the largest object type
(a commit), which is bigger than an oidset entry
- we can free the memory after our ref advertisement, but
"struct object" entries persist forever (and the
receive-pack may hang around for a long time, as the
bottleneck is often client upload bandwidth).
So let's use an oidset. Note that unlike a sha1-array it
doesn't sort the output as a side effect. However, our
output is at least stable, because for_each_alternate_ref()
will give us the sha1s in ref-sorted order.
In one particularly pathological case with an alternate that
has 60,000 unique refs out of 80 million total, this reduced
the peak heap usage of "git receive-pack . </dev/null" from
13GB to 14MB.
Signed-off-by: Jeff King <peff@peff.net>
---
builtin/receive-pack.c | 26 ++++++++++++--------------
1 file changed, 12 insertions(+), 14 deletions(-)
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index b9f2c0cc5..27bb52988 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -21,6 +21,7 @@
#include "sigchain.h"
#include "fsck.h"
#include "tmp-objdir.h"
+#include "oidset.h"
static const char * const receive_pack_usage[] = {
N_("git receive-pack <git-dir>"),
@@ -271,27 +272,24 @@ static int show_ref_cb(const char *path_full, const struct object_id *oid,
return 0;
}
-static int show_one_alternate_sha1(const unsigned char sha1[20], void *unused)
+static void show_one_alternate_ref(const char *refname,
+ const struct object_id *oid,
+ void *data)
{
- show_ref(".have", sha1);
- return 0;
-}
+ struct oidset *seen = data;
-static void collect_one_alternate_ref(const char *refname,
- const struct object_id *oid,
- void *data)
-{
- struct sha1_array *sa = data;
- sha1_array_append(sa, oid->hash);
+ if (oidset_insert(seen, oid))
+ return;
+
+ show_ref(".have", oid->hash);
}
static void write_head_info(void)
{
- struct sha1_array sa = SHA1_ARRAY_INIT;
+ static struct oidset seen = OIDSET_INIT;
- for_each_alternate_ref(collect_one_alternate_ref, &sa);
- sha1_array_for_each_unique(&sa, show_one_alternate_sha1, NULL);
- sha1_array_clear(&sa);
+ for_each_alternate_ref(show_one_alternate_ref, &seen);
+ oidset_clear(&seen);
for_each_ref(show_ref_cb, NULL);
if (!sent_capabilities)
show_ref("capabilities^{}", null_sha1);
--
2.11.0.765.g454d2182f
next prev parent reply other threads:[~2017-01-24 0:47 UTC|newest]
Thread overview: 48+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-01-24 0:37 [PATCH 0/12] reducing resource usage of for_each_alternate_ref Jeff King
2017-01-24 0:38 ` [PATCH 01/12] for_each_alternate_ref: handle failure from real_pathdup() Jeff King
2017-01-25 18:26 ` Junio C Hamano
2017-01-24 0:39 ` [PATCH 02/12] for_each_alternate_ref: stop trimming trailing slashes Jeff King
2017-01-24 0:40 ` [PATCH 03/12] for_each_alternate_ref: use strbuf for path allocation Jeff King
2017-01-25 18:29 ` Junio C Hamano
2017-01-25 18:40 ` Jeff King
2017-01-24 0:40 ` [PATCH 04/12] for_each_alternate_ref: pass name/oid instead of ref struct Jeff King
2017-01-24 0:44 ` [PATCH 05/12] for_each_alternate_ref: replace transport code with for-each-ref Jeff King
2017-01-25 19:00 ` Junio C Hamano
2017-01-24 0:45 ` [PATCH 06/12] clone: disable save_commit_buffer Jeff King
2017-01-25 19:11 ` Junio C Hamano
2017-01-25 19:27 ` Jeff King
2017-01-25 19:35 ` Jeff King
2017-01-25 21:07 ` Jeff King
2017-01-24 0:45 ` [PATCH 07/12] fetch-pack: cache results of for_each_alternate_ref Jeff King
2017-01-25 19:21 ` Junio C Hamano
2017-01-25 19:47 ` Jeff King
2017-01-24 0:46 ` [PATCH 08/12] add oidset API Jeff King
2017-01-24 20:26 ` Ramsay Jones
2017-01-24 20:35 ` Jeff King
2017-01-24 0:47 ` Jeff King [this message]
2017-01-25 19:32 ` [PATCH 09/12] receive-pack: use oidset to de-duplicate .have lines Junio C Hamano
2017-01-25 19:54 ` Jeff King
2017-01-24 0:47 ` [PATCH 10/12] receive-pack: fix misleading namespace/.have comment Jeff King
2017-01-24 0:48 ` [PATCH 11/12] receive-pack: treat namespace .have lines like alternates Jeff King
2017-01-25 19:51 ` Junio C Hamano
2017-01-25 19:58 ` Jeff King
2017-01-27 17:45 ` Lukas Fleischer
2017-01-27 17:58 ` Jeff King
2017-01-27 20:42 ` Junio C Hamano
2017-01-24 0:48 ` [PATCH 12/12] receive-pack: avoid duplicates between our refs and alternates Jeff King
2017-01-25 20:02 ` Junio C Hamano
2017-01-25 20:05 ` Jeff King
2017-01-24 1:33 ` [PATCH 0/12] reducing resource usage of for_each_alternate_ref Brandon Williams
2017-01-24 2:12 ` Jeff King
2017-02-08 20:52 ` [PATCH v2 0/11] " Jeff King
2017-02-08 20:52 ` [PATCH v2 01/11] for_each_alternate_ref: handle failure from real_pathdup() Jeff King
2017-02-08 20:52 ` [PATCH v2 02/11] for_each_alternate_ref: stop trimming trailing slashes Jeff King
2017-02-08 20:52 ` [PATCH v2 03/11] for_each_alternate_ref: use strbuf for path allocation Jeff King
2017-02-08 20:52 ` [PATCH v2 04/11] for_each_alternate_ref: pass name/oid instead of ref struct Jeff King
2017-02-08 20:53 ` [PATCH v2 05/11] for_each_alternate_ref: replace transport code with for-each-ref Jeff King
2017-02-08 20:53 ` [PATCH v2 06/11] fetch-pack: cache results of for_each_alternate_ref Jeff King
2017-02-08 20:53 ` [PATCH v2 07/11] add oidset API Jeff King
2017-02-08 20:53 ` [PATCH v2 08/11] receive-pack: use oidset to de-duplicate .have lines Jeff King
2017-02-08 20:53 ` [PATCH v2 09/11] receive-pack: fix misleading namespace/.have comment Jeff King
2017-02-08 20:53 ` [PATCH v2 10/11] receive-pack: treat namespace .have lines like alternates Jeff King
2017-02-08 20:53 ` [PATCH v2 11/11] receive-pack: avoid duplicates between our refs and alternates Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170124004739.5aowmmkesbanyohm@sigill.intra.peff.net \
--to=peff@peff.net \
--cc=git@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).