git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Subject: [PATCH 07/12] fetch-pack: cache results of for_each_alternate_ref
Date: Mon, 23 Jan 2017 19:45:59 -0500	[thread overview]
Message-ID: <20170124004559.vlsrwwphuzdsfqoq@sigill.intra.peff.net> (raw)
In-Reply-To: <20170124003729.j4ygjcgypdq7hceg@sigill.intra.peff.net>

We may run for_each_alternate_ref() twice, once in
find_common() and once in everything_local(). This operation
can be expensive, because it involves running a sub-process
which must freshly load all of the alternate's refs from
disk.

Let's cache and reuse the results between the two calls. We
can make some optimizations based on the particular use
pattern in fetch-pack to keep our memory usage down.

The first is that we only care about the sha1s, not the refs
themselves. So it's OK to store only the sha1s, and to
suppress duplicates. The natural fit would therefore be a
sha1_array.

However, sha1_array's de-duplication happens only after it
has read and sorted all entries. It still stores each
duplicate. For an alternate with a large number of refs
pointing to the same commits, this is a needless expense.

Instead, we'd prefer to eliminate duplicates before putting
them in the cache, which implies using a hash. We can
further note that fetch-pack will call parse_object() on
each alternate sha1. We can therefore keep our cache as a
set of pointers to "struct object". That gives us a place to
put our "already seen" bit with an optimized hash lookup.
And as a bonus, the object stores the sha1 for us, so
pointer-to-object is all we need.

There are two extra optimizations I didn't do here:

  - we actually store an array of pointer-to-object.
    Technically we could just walk the obj_hash table
    looking for entries with the ALTERNATE flag set (because
    our use case doesn't care about the order here).

    But that hash table may be mostly composed of
    non-ALTERNATE entries, so we'd waste time walking over
    them. So it would be a slight win in memory use, but a
    loss in CPU.

  - the items we pull out of the cache are actual "struct
    object"s, but then we feed "obj->sha1" to our
    sub-functions, which promptly call parse_object().

    This second parse is cheap, because it starts with
    lookup_object() and will bail immediately when it sees
    we've already parsed the object. We could save the extra
    hash lookup, but it would involve refactoring the
    functions we call. It may or may not be worth the
    trouble.

Signed-off-by: Jeff King <peff@peff.net>
---
 fetch-pack.c | 52 ++++++++++++++++++++++++++++++++++++++++++----------
 object.h     |  2 +-
 2 files changed, 43 insertions(+), 11 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index 54f84c573..e0f5d5ce8 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -35,6 +35,7 @@ static const char *alternate_shallow_file;
 #define COMMON_REF	(1U << 2)
 #define SEEN		(1U << 3)
 #define POPPED		(1U << 4)
+#define ALTERNATE	(1U << 5)
 
 static int marked;
 
@@ -67,6 +68,41 @@ static inline void print_verbose(const struct fetch_pack_args *args,
 	fputc('\n', stderr);
 }
 
+struct alternate_object_cache {
+	struct object **items;
+	size_t nr, alloc;
+};
+
+static void cache_one_alternate(const char *refname,
+				const struct object_id *oid,
+				void *vcache)
+{
+	struct alternate_object_cache *cache = vcache;
+	struct object *obj = parse_object(oid->hash);
+
+	if (!obj || (obj->flags & ALTERNATE))
+		return;
+
+	obj->flags |= ALTERNATE;
+	ALLOC_GROW(cache->items, cache->nr + 1, cache->alloc);
+	cache->items[cache->nr++] = obj;
+}
+
+static void for_each_cached_alternate(void (*cb)(struct object *))
+{
+	static int initialized;
+	static struct alternate_object_cache cache;
+	size_t i;
+
+	if (!initialized) {
+		for_each_alternate_ref(cache_one_alternate, &cache);
+		initialized = 1;
+	}
+
+	for (i = 0; i < cache.nr; i++)
+		cb(cache.items[i]);
+}
+
 static void rev_list_push(struct commit *commit, int mark)
 {
 	if (!(commit->object.flags & mark)) {
@@ -253,11 +289,9 @@ static void send_request(struct fetch_pack_args *args,
 		write_or_die(fd, buf->buf, buf->len);
 }
 
-static void insert_one_alternate_ref(const char *refname,
-				     const struct object_id *oid,
-				     void *unused)
+static void insert_one_alternate_object(struct object *obj)
 {
-	rev_list_insert_ref(NULL, oid->hash);
+	rev_list_insert_ref(NULL, obj->oid.hash);
 }
 
 #define INITIAL_FLUSH 16
@@ -300,7 +334,7 @@ static int find_common(struct fetch_pack_args *args,
 	marked = 1;
 
 	for_each_ref(rev_list_insert_ref_oid, NULL);
-	for_each_alternate_ref(insert_one_alternate_ref, NULL);
+	for_each_cached_alternate(insert_one_alternate_object);
 
 	fetching = 0;
 	for ( ; refs ; refs = refs->next) {
@@ -621,11 +655,9 @@ static void filter_refs(struct fetch_pack_args *args,
 	*refs = newlist;
 }
 
-static void mark_alternate_complete(const char *refname,
-				    const struct object_id *oid,
-				    void *unused)
+static void mark_alternate_complete(struct object *obj)
 {
-	mark_complete(oid->hash);
+	mark_complete(obj->oid.hash);
 }
 
 static int everything_local(struct fetch_pack_args *args,
@@ -661,7 +693,7 @@ static int everything_local(struct fetch_pack_args *args,
 
 	if (!args->deepen) {
 		for_each_ref(mark_complete_oid, NULL);
-		for_each_alternate_ref(mark_alternate_complete, NULL);
+		for_each_cached_alternate(mark_alternate_complete);
 		commit_list_sort_by_date(&complete);
 		if (cutoff)
 			mark_recent_complete_commits(args, cutoff);
diff --git a/object.h b/object.h
index 614a00675..f52957dcb 100644
--- a/object.h
+++ b/object.h
@@ -29,7 +29,7 @@ struct object_array {
 /*
  * object flag allocation:
  * revision.h:      0---------10                                26
- * fetch-pack.c:    0---4
+ * fetch-pack.c:    0---5
  * walker.c:        0-2
  * upload-pack.c:       4       11----------------19
  * builtin/blame.c:               12-13
-- 
2.11.0.765.g454d2182f


  parent reply	other threads:[~2017-01-24  0:47 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-01-24  0:37 [PATCH 0/12] reducing resource usage of for_each_alternate_ref Jeff King
2017-01-24  0:38 ` [PATCH 01/12] for_each_alternate_ref: handle failure from real_pathdup() Jeff King
2017-01-25 18:26   ` Junio C Hamano
2017-01-24  0:39 ` [PATCH 02/12] for_each_alternate_ref: stop trimming trailing slashes Jeff King
2017-01-24  0:40 ` [PATCH 03/12] for_each_alternate_ref: use strbuf for path allocation Jeff King
2017-01-25 18:29   ` Junio C Hamano
2017-01-25 18:40     ` Jeff King
2017-01-24  0:40 ` [PATCH 04/12] for_each_alternate_ref: pass name/oid instead of ref struct Jeff King
2017-01-24  0:44 ` [PATCH 05/12] for_each_alternate_ref: replace transport code with for-each-ref Jeff King
2017-01-25 19:00   ` Junio C Hamano
2017-01-24  0:45 ` [PATCH 06/12] clone: disable save_commit_buffer Jeff King
2017-01-25 19:11   ` Junio C Hamano
2017-01-25 19:27     ` Jeff King
2017-01-25 19:35       ` Jeff King
2017-01-25 21:07         ` Jeff King
2017-01-24  0:45 ` Jeff King [this message]
2017-01-25 19:21   ` [PATCH 07/12] fetch-pack: cache results of for_each_alternate_ref Junio C Hamano
2017-01-25 19:47     ` Jeff King
2017-01-24  0:46 ` [PATCH 08/12] add oidset API Jeff King
2017-01-24 20:26   ` Ramsay Jones
2017-01-24 20:35     ` Jeff King
2017-01-24  0:47 ` [PATCH 09/12] receive-pack: use oidset to de-duplicate .have lines Jeff King
2017-01-25 19:32   ` Junio C Hamano
2017-01-25 19:54     ` Jeff King
2017-01-24  0:47 ` [PATCH 10/12] receive-pack: fix misleading namespace/.have comment Jeff King
2017-01-24  0:48 ` [PATCH 11/12] receive-pack: treat namespace .have lines like alternates Jeff King
2017-01-25 19:51   ` Junio C Hamano
2017-01-25 19:58     ` Jeff King
2017-01-27 17:45     ` Lukas Fleischer
2017-01-27 17:58       ` Jeff King
2017-01-27 20:42         ` Junio C Hamano
2017-01-24  0:48 ` [PATCH 12/12] receive-pack: avoid duplicates between our refs and alternates Jeff King
2017-01-25 20:02   ` Junio C Hamano
2017-01-25 20:05     ` Jeff King
2017-01-24  1:33 ` [PATCH 0/12] reducing resource usage of for_each_alternate_ref Brandon Williams
2017-01-24  2:12   ` Jeff King
2017-02-08 20:52 ` [PATCH v2 0/11] " Jeff King
2017-02-08 20:52   ` [PATCH v2 01/11] for_each_alternate_ref: handle failure from real_pathdup() Jeff King
2017-02-08 20:52   ` [PATCH v2 02/11] for_each_alternate_ref: stop trimming trailing slashes Jeff King
2017-02-08 20:52   ` [PATCH v2 03/11] for_each_alternate_ref: use strbuf for path allocation Jeff King
2017-02-08 20:52   ` [PATCH v2 04/11] for_each_alternate_ref: pass name/oid instead of ref struct Jeff King
2017-02-08 20:53   ` [PATCH v2 05/11] for_each_alternate_ref: replace transport code with for-each-ref Jeff King
2017-02-08 20:53   ` [PATCH v2 06/11] fetch-pack: cache results of for_each_alternate_ref Jeff King
2017-02-08 20:53   ` [PATCH v2 07/11] add oidset API Jeff King
2017-02-08 20:53   ` [PATCH v2 08/11] receive-pack: use oidset to de-duplicate .have lines Jeff King
2017-02-08 20:53   ` [PATCH v2 09/11] receive-pack: fix misleading namespace/.have comment Jeff King
2017-02-08 20:53   ` [PATCH v2 10/11] receive-pack: treat namespace .have lines like alternates Jeff King
2017-02-08 20:53   ` [PATCH v2 11/11] receive-pack: avoid duplicates between our refs and alternates Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170124004559.vlsrwwphuzdsfqoq@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).