All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>, Ben Maurer <bmaurer@fb.com>,
	Siddharth Agarwal <sid0@fb.com>
Subject: Re: [PATCH/RFC 0/6] reuse deltas found by bitmaps
Date: Wed, 26 Mar 2014 21:13:23 -0400	[thread overview]
Message-ID: <20140327011323.GA9266@sigill.intra.peff.net> (raw)
In-Reply-To: <20140326181300.GA7087@sigill.intra.peff.net>

On Wed, Mar 26, 2014 at 02:13:00PM -0400, Jeff King wrote:

> So I think the next steps are probably:
> 
>   1. Measure the "all objects are preferred bases" approach and confirm
>      that it is bad.

Below is a very rough patch to accomplish this. It just traverses the
"have" bitmap and adds every object with the "exclude" flag. The result
is as comically bad as I expected.

For a big fetch, it seems like it's working (numbers against v1.9.0):

  5311.31: server (128 days)   4.49(7.35+0.23)   4.98(6.82+3.31) +10.9%
  5311.32: size   (128 days)             25.8M             32.0M +24.2%
  5311.33: client (128 days)   7.17(7.38+0.20)   7.33(7.90+0.20) +2.2%

A modest increase in CPU time, and we get back most of our size
(remember that our "bad" case here is ~80M).

But for a small fetch...

  5311.3: server   (1 days)    0.20(0.17+0.03)   4.39(4.03+6.59) +2095.0%
  5311.4: size     (1 days)              57.2K             59.5K +4.1%
  5311.5: client   (1 days)    0.08(0.08+0.00)   0.08(0.08+0.00) +0.0%

Yikes. Besides spending lots of CPU on handling the enlarged object
list, notice that we still only dropped the size in the 128-day case to
32M. Which is almost exactly what the earlier "reuse on-disk deltas"
patch achieved.

What I think is happening is that we manage to reuse those on-disk
deltas (since they are now preferred bases, and we know we can). But we
never actually come up with any _new_ deltas, because the search window
is overwhelmed with candidates. So not only do we waste a huge amount of
CPU, but we just end up at the same not-quite-optimal result as before.

So this is a dead end, but I think it was good to double-check that.

The patch below is messy and would probably be better split into a few
patches, but I don't expect anyone to apply it (or even read it,
really). It's just for reference.

---
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 0ee5f1f..1a5d401 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1026,7 +1026,7 @@ static int add_object_entry_from_bitmap(const unsigned char *sha1,
 	if (have_duplicate_entry(sha1, 0, &index_pos))
 		return 0;
 
-	create_object_entry(sha1, type, name_hash, 0, 0, index_pos, pack, offset);
+	create_object_entry(sha1, type, name_hash, flags, 0, index_pos, pack, offset);
 
 	display_progress(progress_state, to_pack.nr_objects);
 	return 1;
@@ -2436,6 +2436,7 @@ static int get_object_list_from_bitmap(struct rev_info *revs)
 	}
 
 	traverse_bitmap_commit_list(&add_object_entry_from_bitmap);
+	bitmap_have_foreach(&add_object_entry_from_bitmap);
 	return 0;
 }
 
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 1bae7e8..f4e30f5 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -605,6 +605,7 @@ static void show_objects_for_type(
 	struct bitmap *objects,
 	struct ewah_bitmap *type_filter,
 	enum object_type object_type,
+	int flags,
 	show_reachable_fn show_reach)
 {
 	size_t pos = 0, i = 0;
@@ -613,9 +614,6 @@ static void show_objects_for_type(
 	struct ewah_iterator it;
 	eword_t filter;
 
-	if (bitmap_git.reuse_objects == bitmap_git.pack->num_objects)
-		return;
-
 	ewah_iterator_init(&it, type_filter);
 
 	while (i < objects->word_alloc && ewah_iterator_next(&filter, &it)) {
@@ -640,7 +638,7 @@ static void show_objects_for_type(
 			if (bitmap_git.hashes)
 				hash = ntohl(bitmap_git.hashes[entry->nr]);
 
-			show_reach(sha1, object_type, 0, hash, bitmap_git.pack, entry->offset);
+			show_reach(sha1, object_type, flags, hash, bitmap_git.pack, entry->offset);
 		}
 
 		pos += BITS_IN_WORD;
@@ -816,14 +814,17 @@ void traverse_bitmap_commit_list(show_reachable_fn show_reachable)
 {
 	assert(bitmap_git.result);
 
+	if (bitmap_git.reuse_objects == bitmap_git.pack->num_objects)
+		return;
+
 	show_objects_for_type(bitmap_git.result, bitmap_git.commits,
-		OBJ_COMMIT, show_reachable);
+		OBJ_COMMIT, 0, show_reachable);
 	show_objects_for_type(bitmap_git.result, bitmap_git.trees,
-		OBJ_TREE, show_reachable);
+		OBJ_TREE, 0, show_reachable);
 	show_objects_for_type(bitmap_git.result, bitmap_git.blobs,
-		OBJ_BLOB, show_reachable);
+		OBJ_BLOB, 0, show_reachable);
 	show_objects_for_type(bitmap_git.result, bitmap_git.tags,
-		OBJ_TAG, show_reachable);
+		OBJ_TAG, 0, show_reachable);
 
 	show_extended_objects(bitmap_git.result, show_reachable);
 
@@ -1090,3 +1091,18 @@ int bitmap_have(const unsigned char *sha1)
 
 	return bitmap_get(bitmap_git.haves, pos);
 }
+
+void bitmap_have_foreach(show_reachable_fn show_reachable)
+{
+	if (!bitmap_git.haves)
+		return;
+
+	show_objects_for_type(bitmap_git.haves, bitmap_git.commits,
+		OBJ_COMMIT, 1, show_reachable);
+	show_objects_for_type(bitmap_git.haves, bitmap_git.trees,
+		OBJ_TREE, 1, show_reachable);
+	show_objects_for_type(bitmap_git.haves, bitmap_git.blobs,
+		OBJ_BLOB, 1, show_reachable);
+	show_objects_for_type(bitmap_git.haves, bitmap_git.tags,
+		OBJ_TAG, 1, show_reachable);
+}
diff --git a/pack-bitmap.h b/pack-bitmap.h
index a63ee6b..02c08f8 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -50,6 +50,7 @@ int reuse_partial_packfile_from_bitmap(struct packed_git **packfile, uint32_t *e
 int rebuild_existing_bitmaps(struct packing_data *mapping, khash_sha1 *reused_bitmaps, int show_progress);
 
 int bitmap_have(const unsigned char *sha1);
+void bitmap_have_foreach(show_reachable_fn);
 
 void bitmap_writer_show_progress(int show);
 void bitmap_writer_set_checksum(unsigned char *sha1);

  parent reply	other threads:[~2014-03-27  1:13 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-03-26  7:22 [PATCH/RFC 0/6] reuse deltas found by bitmaps Jeff King
2014-03-26  7:22 ` [PATCH 1/6] t/perf-lib: factor boilerplate out of test_perf Jeff King
2014-03-26 22:34   ` Junio C Hamano
2014-03-26  7:22 ` [PATCH 2/6] t/perf/aggregate: factor our percent calculations Jeff King
2014-03-26  7:22 ` [PATCH 3/6] t/perf: add infrastructure for measuring sizes Jeff King
2014-03-26  7:22 ` [PATCH 4/6] t/perf: add perf tests for fetches from a bitmapped server Jeff King
2014-03-26  7:23 ` [PATCH 5/6] pack-bitmap: save "have" bitmap from walk Jeff King
2014-03-26  7:23 ` [PATCH 6/6] pack-objects: reuse deltas for thin "have" objects Jeff King
2014-03-28  4:23   ` Eric Sunshine
2014-03-26 17:33 ` [PATCH/RFC 0/6] reuse deltas found by bitmaps Junio C Hamano
2014-03-26 18:13   ` Jeff King
2014-03-26 22:31     ` Junio C Hamano
2014-03-26 22:36       ` Jeff King
2014-03-27  1:13     ` Jeff King [this message]
2014-03-27 16:36       ` Junio C Hamano
2014-03-26 22:40 ` Siddharth Agarwal
2014-03-27 14:09   ` Siddharth Agarwal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140327011323.GA9266@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=bmaurer@fb.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=sid0@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.