All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] some prune optimizations
@ 2019-02-14  4:31 Jeff King
  2019-02-14  4:35 ` [PATCH 1/3] prune: lazily perform reachability traversal Jeff King
                   ` (2 more replies)
  0 siblings, 3 replies; 57+ messages in thread
From: Jeff King @ 2019-02-14  4:31 UTC (permalink / raw)
  To: git

Here are two optimizations for git-prune that we've been running at
GitHub for ages. No particular reason to share them now; they just
finally floated to the top of my todo pile. Do note that I rebased and
polished them (and the third one is brand new), so the concepts are
proven, but it's possible I introduced a new bug. ;)

  [1/3]: prune: lazily perform reachability traversal
  [2/3]: prune: use bitmaps for reachability traversal
  [3/3]: prune: check SEEN flag for reachability

 builtin/prune.c       | 44 +++++++++++++++++++++++++++++++------------
 reachable.c           | 42 +++++++++++++++++++++++++++++++++++++++++
 t/perf/p5304-prune.sh | 35 ++++++++++++++++++++++++++++++++++
 t/t5304-prune.sh      | 12 ++++++++++++
 4 files changed, 121 insertions(+), 12 deletions(-)
 create mode 100755 t/perf/p5304-prune.sh

-Peff

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH 1/3] prune: lazily perform reachability traversal
  2019-02-14  4:31 [PATCH 0/3] some prune optimizations Jeff King
@ 2019-02-14  4:35 ` Jeff King
  2019-02-14 10:54   ` Eric Sunshine
  2019-02-14  4:37 ` [PATCH 2/3] prune: use bitmaps for " Jeff King
  2019-02-14  4:38 ` [PATCH 3/3] prune: check SEEN flag for reachability Jeff King
  2 siblings, 1 reply; 57+ messages in thread
From: Jeff King @ 2019-02-14  4:35 UTC (permalink / raw)
  To: git

The general strategy of "git prune" is to do a full reachability walk,
then for each loose object see if we found it in our walk. But if we
don't have any loose objects, we don't need to do the expensive walk in
the first place.

This patch postpones that walk until the first time we need to see its
results.

Note that this is really a specific case of a more general optimization,
which is that we could traverse only far enough to find the object under
consideration (i.e., stop the traversal when we find it, then pick up
again when asked about the next object, etc). That could save us in some
instances from having to do a full walk. But it's actually a bit tricky
to do with our traversal code, and you'd need to do a full walk anyway
if you have even a single unreachable object (which you generally do, if
any objects are actually left after running git-repack).

So in practice this lazy-load of the full walk catches one easy but
common case (i.e., you've just repacked via git-gc, and there's nothing
unreachable).

The perf script is fairly contrived, but it does show off the
improvement:

  Test                            HEAD^             HEAD
  -------------------------------------------------------------------------
  5304.4: prune with no objects   3.66(3.60+0.05)   0.00(0.00+0.00) -100.0%

and would let us know if we accidentally regress this optimization.

Note also that we need to take special care with prune_shallow(), which
relies on us having performed the traversal. So this optimization can
only kick in for a non-shallow repository. Since this is easy to get
wrong and is not covered by existing tests, let's add an extra test to
t5304 that covers this case explicitly.

Signed-off-by: Jeff King <peff@peff.net>
---
The diff looks nice with --color-moved. I wish there was a way to
communicate that information in a plaintext email.

 builtin/prune.c       | 43 ++++++++++++++++++++++++++++++++-----------
 t/perf/p5304-prune.sh | 24 ++++++++++++++++++++++++
 t/t5304-prune.sh      | 12 ++++++++++++
 3 files changed, 68 insertions(+), 11 deletions(-)
 create mode 100755 t/perf/p5304-prune.sh

diff --git a/builtin/prune.c b/builtin/prune.c
index 1ec9ddd751..04b6573945 100644
--- a/builtin/prune.c
+++ b/builtin/prune.c
@@ -31,16 +31,40 @@ static int prune_tmp_file(const char *fullpath)
 	return 0;
 }
 
-static int prune_object(const struct object_id *oid, const char *fullpath,
-			void *data)
+static void perform_reachability_traversal(struct rev_info *revs)
 {
-	struct stat st;
+	static int initialized;
+	struct progress *progress = NULL;
+
+	if (initialized)
+		return;
+
+	if (show_progress)
+		progress = start_delayed_progress(_("Checking connectivity"), 0);
+	mark_reachable_objects(revs, 1, expire, progress);
+	stop_progress(&progress);
+	initialized = 1;
+}
+
+static int is_object_reachable(const struct object_id *oid,
+			       struct rev_info *revs)
+{
+	perform_reachability_traversal(revs);
 
 	/*
 	 * Do we know about this object?
 	 * It must have been reachable
 	 */
-	if (lookup_object(the_repository, oid->hash))
+	return !!lookup_object(the_repository, oid->hash);
+}
+
+static int prune_object(const struct object_id *oid, const char *fullpath,
+			void *data)
+{
+	struct rev_info *revs = data;
+	struct stat st;
+
+	if (is_object_reachable(oid, revs))
 		return 0;
 
 	if (lstat(fullpath, &st)) {
@@ -102,7 +126,6 @@ static void remove_temporary_files(const char *path)
 int cmd_prune(int argc, const char **argv, const char *prefix)
 {
 	struct rev_info revs;
-	struct progress *progress = NULL;
 	int exclude_promisor_objects = 0;
 	const struct option options[] = {
 		OPT__DRY_RUN(&show_only, N_("do not remove, show only")),
@@ -142,17 +165,13 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
 
 	if (show_progress == -1)
 		show_progress = isatty(2);
-	if (show_progress)
-		progress = start_delayed_progress(_("Checking connectivity"), 0);
 	if (exclude_promisor_objects) {
 		fetch_if_missing = 0;
 		revs.exclude_promisor_objects = 1;
 	}
 
-	mark_reachable_objects(&revs, 1, expire, progress);
-	stop_progress(&progress);
 	for_each_loose_file_in_objdir(get_object_directory(), prune_object,
-				      prune_cruft, prune_subdir, NULL);
+				      prune_cruft, prune_subdir, &revs);
 
 	prune_packed_objects(show_only ? PRUNE_PACKED_DRY_RUN : 0);
 	remove_temporary_files(get_object_directory());
@@ -160,8 +179,10 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
 	remove_temporary_files(s);
 	free(s);
 
-	if (is_repository_shallow(the_repository))
+	if (is_repository_shallow(the_repository)) {
+		perform_reachability_traversal(&revs);
 		prune_shallow(show_only ? PRUNE_SHOW_ONLY : 0);
+	}
 
 	return 0;
 }
diff --git a/t/perf/p5304-prune.sh b/t/perf/p5304-prune.sh
new file mode 100755
index 0000000000..3c852084eb
--- /dev/null
+++ b/t/perf/p5304-prune.sh
@@ -0,0 +1,24 @@
+#!/bin/sh
+
+test_description='performance tests of prune'
+. ./perf-lib.sh
+
+test_perf_default_repo
+
+test_expect_success 'remove reachable loose objects' '
+	git repack -ad
+'
+
+test_expect_success 'remove unreachable loose objects' '
+	git prune
+'
+
+test_expect_success 'confirm there are no loose objects' '
+	git count-objects | grep ^0
+'
+
+test_perf 'prune with no objects' '
+	git prune
+'
+
+test_done
diff --git a/t/t5304-prune.sh b/t/t5304-prune.sh
index 270da21ac3..2c19a790c1 100755
--- a/t/t5304-prune.sh
+++ b/t/t5304-prune.sh
@@ -274,6 +274,18 @@ test_expect_success 'prune .git/shallow' '
 	test_path_is_missing .git/shallow
 '
 
+test_expect_success 'prune .git/shallow when there are no loose objects' '
+	SHA1=$(echo hi|git commit-tree HEAD^{tree}) &&
+	echo $SHA1 >.git/shallow &&
+	git update-ref refs/heads/shallow-tip $SHA1 &&
+	git repack -ad &&
+	# verify assumption that all loose objects are gone
+	git count-objects | grep ^0 &&
+	git prune &&
+	echo $SHA1 >expect &&
+	test_cmp expect .git/shallow
+'
+
 test_expect_success 'prune: handle alternate object database' '
 	test_create_repo A &&
 	git -C A commit --allow-empty -m "initial commit" &&
-- 
2.21.0.rc0.586.gffba1126a0


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 2/3] prune: use bitmaps for reachability traversal
  2019-02-14  4:31 [PATCH 0/3] some prune optimizations Jeff King
  2019-02-14  4:35 ` [PATCH 1/3] prune: lazily perform reachability traversal Jeff King
@ 2019-02-14  4:37 ` Jeff King
  2019-03-09  2:49   ` bitmaps by default? [was: prune: use bitmaps for reachability traversal] Eric Wong
  2019-04-15 15:00   ` [PATCH 2/3] prune: use bitmaps for reachability traversal Derrick Stolee
  2019-02-14  4:38 ` [PATCH 3/3] prune: check SEEN flag for reachability Jeff King
  2 siblings, 2 replies; 57+ messages in thread
From: Jeff King @ 2019-02-14  4:37 UTC (permalink / raw)
  To: git

Pruning generally has to traverse the whole commit graph in order to
see which objects are reachable. This is the exact problem that
reachability bitmaps were meant to solve, so let's use them (if they're
available, of course).

Here are timings on git.git:

  Test                            HEAD^             HEAD
  ------------------------------------------------------------------------
  5304.6: prune with bitmaps      3.65(3.56+0.09)   1.01(0.92+0.08) -72.3%

And on linux.git:

  Test                            HEAD^               HEAD
  --------------------------------------------------------------------------
  5304.6: prune with bitmaps      35.05(34.79+0.23)   3.00(2.78+0.21) -91.4%

The tests show a pretty optimal case, as we'll have just repacked and
should have pretty good coverage of all refs with our bitmaps. But
that's actually pretty realistic: normally prune is run via "gc" right
after repacking.

A few notes on the implementation:

  - the change is actually in reachable.c, so it would improve
    reachability traversals by "reflog expire --stale-fix", as well.
    Those aren't performed regularly, though (a normal "git gc" doesn't
    use --stale-fix), so they're not really worth measuring. There's a
    low chance of regressing that caller, since the use of bitmaps is
    totally transparent from the caller's perspective.

  - The bitmap case could actually get away without creating a "struct
    object", and instead the caller could just look up each object id in
    the bitmap result. However, this would be a marginal improvement in
    runtime, and it would make the callers much more complicated. They'd
    have to handle both the bitmap and non-bitmap cases separately, and
    in the case of git-prune, we'd also have to tweak prune_shallow(),
    which relies on our SEEN flags.

  - Because we do create real object structs, we go through a few
    contortions to create ones of the right type. This isn't strictly
    necessary (lookup_unknown_object() would suffice), but it's more
    memory efficient to use the correct types, since we already know
    them.

Signed-off-by: Jeff King <peff@peff.net>
---
 reachable.c           | 42 ++++++++++++++++++++++++++++++++++++++++++
 t/perf/p5304-prune.sh | 11 +++++++++++
 2 files changed, 53 insertions(+)

diff --git a/reachable.c b/reachable.c
index 6e9b810d2a..0d00a91de4 100644
--- a/reachable.c
+++ b/reachable.c
@@ -12,6 +12,7 @@
 #include "packfile.h"
 #include "worktree.h"
 #include "object-store.h"
+#include "pack-bitmap.h"
 
 struct connectivity_progress {
 	struct progress *progress;
@@ -158,10 +159,44 @@ int add_unseen_recent_objects_to_traversal(struct rev_info *revs,
 				      FOR_EACH_OBJECT_LOCAL_ONLY);
 }
 
+static void *lookup_object_by_type(struct repository *r,
+				   const struct object_id *oid,
+				   enum object_type type)
+{
+	switch (type) {
+	case OBJ_COMMIT:
+		return lookup_commit(r, oid);
+	case OBJ_TREE:
+		return lookup_tree(r, oid);
+	case OBJ_TAG:
+		return lookup_tag(r, oid);
+	case OBJ_BLOB:
+		return lookup_blob(r, oid);
+	default:
+		die("BUG: unknown object type %d", type);
+	}
+}
+
+static int mark_object_seen(const struct object_id *oid,
+			     enum object_type type,
+			     int exclude,
+			     uint32_t name_hash,
+			     struct packed_git *found_pack,
+			     off_t found_offset)
+{
+	struct object *obj = lookup_object_by_type(the_repository, oid, type);
+	if (!obj)
+		die("unable to create object '%s'", oid_to_hex(oid));
+
+	obj->flags |= SEEN;
+	return 0;
+}
+
 void mark_reachable_objects(struct rev_info *revs, int mark_reflog,
 			    timestamp_t mark_recent, struct progress *progress)
 {
 	struct connectivity_progress cp;
+	struct bitmap_index *bitmap_git;
 
 	/*
 	 * Set up revision parsing, and mark us as being interested
@@ -188,6 +223,13 @@ void mark_reachable_objects(struct rev_info *revs, int mark_reflog,
 	cp.progress = progress;
 	cp.count = 0;
 
+	bitmap_git = prepare_bitmap_walk(revs);
+	if (bitmap_git) {
+		traverse_bitmap_commit_list(bitmap_git, mark_object_seen);
+		free_bitmap_index(bitmap_git);
+		return;
+	}
+
 	/*
 	 * Set up the revision walk - this will move all commits
 	 * from the pending list to the commit walking list.
diff --git a/t/perf/p5304-prune.sh b/t/perf/p5304-prune.sh
index 3c852084eb..83baedb8a4 100755
--- a/t/perf/p5304-prune.sh
+++ b/t/perf/p5304-prune.sh
@@ -21,4 +21,15 @@ test_perf 'prune with no objects' '
 	git prune
 '
 
+test_expect_success 'repack with bitmaps' '
+	git repack -adb
+'
+
+# We have to create the object in each trial run, since otherwise
+# runs after the first see no object and just skip the traversal entirely!
+test_perf 'prune with bitmaps' '
+	echo "probably not present in repo" | git hash-object -w --stdin &&
+	git prune
+'
+
 test_done
-- 
2.21.0.rc0.586.gffba1126a0


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 3/3] prune: check SEEN flag for reachability
  2019-02-14  4:31 [PATCH 0/3] some prune optimizations Jeff King
  2019-02-14  4:35 ` [PATCH 1/3] prune: lazily perform reachability traversal Jeff King
  2019-02-14  4:37 ` [PATCH 2/3] prune: use bitmaps for " Jeff King
@ 2019-02-14  4:38 ` Jeff King
  2 siblings, 0 replies; 57+ messages in thread
From: Jeff King @ 2019-02-14  4:38 UTC (permalink / raw)
  To: git

The git-prune command checks reachability by doing a traversal, and then
checking whether a given object exists in the global object hash. This
can yield false positives if any other part of the code had to create an
object struct for some reason. It's not clear whether this is even
possible, but it's more robust to rely on something a little more
concrete: the SEEN flag set by our traversal.

Note that there is a slight possibility of regression here, as we're
relying on mark_reachable_objects() to consistently set the flag.
However, it has always done so, and we're already relying on that fact
in prune_shallow(), which is called as part of git-prune. So this is
making these two parts of the prune operation more consistent.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/prune.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/builtin/prune.c b/builtin/prune.c
index 04b6573945..97613eccb5 100644
--- a/builtin/prune.c
+++ b/builtin/prune.c
@@ -49,13 +49,12 @@ static void perform_reachability_traversal(struct rev_info *revs)
 static int is_object_reachable(const struct object_id *oid,
 			       struct rev_info *revs)
 {
+	struct object *obj;
+
 	perform_reachability_traversal(revs);
 
-	/*
-	 * Do we know about this object?
-	 * It must have been reachable
-	 */
-	return !!lookup_object(the_repository, oid->hash);
+	obj = lookup_object(the_repository, oid->hash);
+	return obj && (obj->flags & SEEN);
 }
 
 static int prune_object(const struct object_id *oid, const char *fullpath,
-- 
2.21.0.rc0.586.gffba1126a0

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/3] prune: lazily perform reachability traversal
  2019-02-14  4:35 ` [PATCH 1/3] prune: lazily perform reachability traversal Jeff King
@ 2019-02-14 10:54   ` Eric Sunshine
  2019-02-14 11:07     ` Jeff King
  0 siblings, 1 reply; 57+ messages in thread
From: Eric Sunshine @ 2019-02-14 10:54 UTC (permalink / raw)
  To: Jeff King; +Cc: Git List

On Wed, Feb 13, 2019 at 11:35 PM Jeff King <peff@peff.net> wrote:
> diff --git a/t/t5304-prune.sh b/t/t5304-prune.sh
> @@ -274,6 +274,18 @@ test_expect_success 'prune .git/shallow' '
> +test_expect_success 'prune .git/shallow when there are no loose objects' '
> +       SHA1=$(echo hi|git commit-tree HEAD^{tree}) &&

Perhaps name this variable 'oid' rather than 'SHA1' to make brian happy.

> +       echo $SHA1 >.git/shallow &&
> +       git update-ref refs/heads/shallow-tip $SHA1 &&
> +       git repack -ad &&
> +       # verify assumption that all loose objects are gone
> +       git count-objects | grep ^0 &&
> +       git prune &&
> +       echo $SHA1 >expect &&
> +       test_cmp expect .git/shallow
> +'

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/3] prune: lazily perform reachability traversal
  2019-02-14 10:54   ` Eric Sunshine
@ 2019-02-14 11:07     ` Jeff King
  0 siblings, 0 replies; 57+ messages in thread
From: Jeff King @ 2019-02-14 11:07 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List

On Thu, Feb 14, 2019 at 05:54:25AM -0500, Eric Sunshine wrote:

> On Wed, Feb 13, 2019 at 11:35 PM Jeff King <peff@peff.net> wrote:
> > diff --git a/t/t5304-prune.sh b/t/t5304-prune.sh
> > @@ -274,6 +274,18 @@ test_expect_success 'prune .git/shallow' '
> > +test_expect_success 'prune .git/shallow when there are no loose objects' '
> > +       SHA1=$(echo hi|git commit-tree HEAD^{tree}) &&
> 
> Perhaps name this variable 'oid' rather than 'SHA1' to make brian happy.

Heh, that line (and the one below it) are cut-and-paste from the setup
of the test directly above.

Perhaps we should do this on top:

-- >8 --
Subject: [PATCH] t5304: rename "sha1" variables to "oid"

Let's make the script less jarring to read in a post-sha1 world by
using more hash-agnostic variable names.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t5304-prune.sh | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/t/t5304-prune.sh b/t/t5304-prune.sh
index 2c19a790c1..1eeb828a15 100755
--- a/t/t5304-prune.sh
+++ b/t/t5304-prune.sh
@@ -118,10 +118,10 @@ test_expect_success 'prune: do not prune detached HEAD with no reflog' '
 
 test_expect_success 'prune: prune former HEAD after checking out branch' '
 
-	head_sha1=$(git rev-parse HEAD) &&
+	head_oid=$(git rev-parse HEAD) &&
 	git checkout --quiet master &&
 	git prune -v >prune_actual &&
-	grep "$head_sha1" prune_actual
+	grep "$head_oid" prune_actual
 
 '
 
@@ -265,24 +265,24 @@ EOF
 '
 
 test_expect_success 'prune .git/shallow' '
-	SHA1=$(echo hi|git commit-tree HEAD^{tree}) &&
-	echo $SHA1 >.git/shallow &&
+	oid=$(echo hi|git commit-tree HEAD^{tree}) &&
+	echo $oid >.git/shallow &&
 	git prune --dry-run >out &&
-	grep $SHA1 .git/shallow &&
-	grep $SHA1 out &&
+	grep $oid .git/shallow &&
+	grep $oid out &&
 	git prune &&
 	test_path_is_missing .git/shallow
 '
 
 test_expect_success 'prune .git/shallow when there are no loose objects' '
-	SHA1=$(echo hi|git commit-tree HEAD^{tree}) &&
-	echo $SHA1 >.git/shallow &&
-	git update-ref refs/heads/shallow-tip $SHA1 &&
+	oid=$(echo hi|git commit-tree HEAD^{tree}) &&
+	echo $oid >.git/shallow &&
+	git update-ref refs/heads/shallow-tip $oid &&
 	git repack -ad &&
 	# verify assumption that all loose objects are gone
 	git count-objects | grep ^0 &&
 	git prune &&
-	echo $SHA1 >expect &&
+	echo $oid >expect &&
 	test_cmp expect .git/shallow
 '
 
@@ -326,8 +326,8 @@ test_expect_success 'prune: handle HEAD reflog in multiple worktrees' '
 		git reset --hard HEAD^
 	) &&
 	git prune --expire=now &&
-	SHA1=`git hash-object expected` &&
-	git -C third-worktree show "$SHA1" >actual &&
+	oid=`git hash-object expected` &&
+	git -C third-worktree show "$oid" >actual &&
 	test_cmp expected actual
 '
 
-- 
2.21.0.rc1.567.g7f045cdc92


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* bitmaps by default? [was: prune: use bitmaps for reachability traversal]
  2019-02-14  4:37 ` [PATCH 2/3] prune: use bitmaps for " Jeff King
@ 2019-03-09  2:49   ` Eric Wong
  2019-03-10 23:39     ` Jeff King
  2019-04-15 15:00   ` [PATCH 2/3] prune: use bitmaps for reachability traversal Derrick Stolee
  1 sibling, 1 reply; 57+ messages in thread
From: Eric Wong @ 2019-03-09  2:49 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> wrote:
> Pruning generally has to traverse the whole commit graph in order to
> see which objects are reachable. This is the exact problem that
> reachability bitmaps were meant to solve, so let's use them (if they're
> available, of course).

Perhaps this is good impetus for doing bitmaps by default?

It would make life easier for people new to hosting git servers
(and hopefully reduce centralization :)


I started working on it, but t0410-partial-clone.sh fails with
"Failed to write bitmap index. Packfile doesn't have full
closure"; so more work needs to be done w.r.t. default behavior
on partial clones...

Here's my WIP:

diff --git a/builtin/repack.c b/builtin/repack.c
index 67f8978043..ca98d32715 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -14,7 +14,7 @@
 
 static int delta_base_offset = 1;
 static int pack_kept_objects = -1;
-static int write_bitmaps;
+static int write_bitmaps = -1;
 static int use_delta_islands;
 static char *packdir, *packtmp;
 
@@ -344,10 +344,14 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 		die(_("--keep-unreachable and -A are incompatible"));
 
 	if (pack_kept_objects < 0)
-		pack_kept_objects = write_bitmaps;
+		pack_kept_objects = write_bitmaps > 0;
 
-	if (write_bitmaps && !(pack_everything & ALL_INTO_ONE))
-		die(_(incremental_bitmap_conflict_error));
+	if (!(pack_everything & ALL_INTO_ONE)) {
+		if (write_bitmaps > 0)
+			die(_(incremental_bitmap_conflict_error));
+	} else if (write_bitmaps < 0) {
+		write_bitmaps = 1;
+	}
 
 	packdir = mkpathdup("%s/pack", get_object_directory());
 	packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, (int)getpid());
@@ -368,7 +372,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	argv_array_push(&cmd.args, "--indexed-objects");
 	if (repository_format_partial_clone)
 		argv_array_push(&cmd.args, "--exclude-promisor-objects");
-	if (write_bitmaps)
+	if (write_bitmaps > 0)
 		argv_array_push(&cmd.args, "--write-bitmap-index");
 	if (use_delta_islands)
 		argv_array_push(&cmd.args, "--delta-islands");

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: bitmaps by default? [was: prune: use bitmaps for reachability traversal]
  2019-03-09  2:49   ` bitmaps by default? [was: prune: use bitmaps for reachability traversal] Eric Wong
@ 2019-03-10 23:39     ` Jeff King
  2019-03-12  3:13       ` [PATCH] repack: enable bitmaps by default on bare repos Eric Wong
  0 siblings, 1 reply; 57+ messages in thread
From: Jeff King @ 2019-03-10 23:39 UTC (permalink / raw)
  To: Eric Wong; +Cc: git

On Sat, Mar 09, 2019 at 02:49:44AM +0000, Eric Wong wrote:

> Jeff King <peff@peff.net> wrote:
> > Pruning generally has to traverse the whole commit graph in order to
> > see which objects are reachable. This is the exact problem that
> > reachability bitmaps were meant to solve, so let's use them (if they're
> > available, of course).
> 
> Perhaps this is good impetus for doing bitmaps by default?

I'm actually not sure it is, because the prune costs less than making
the bitmaps. Here are some timings on linux.git. This full-graph
traversal is roughly the same cost as the reachability walk that prune
would do internally:

	$ time git rev-list --objects --all >/dev/null
	real	0m47.714s
	user	0m47.113s
	sys	0m0.600s

Here's a normal noop repack as a baseline.

	$ time git repack -ad
	real	1m26.922s
	user	1m20.029s
	sys	0m7.878s

And here's another immediately after with bitmap generation. Generating
the bitmaps takes about 100s, compared to the 47s it would save us on
the prune.

	$ time git repack -adb
	real	3m5.915s
	user	2m59.377s
	sys	0m7.718s

Things are a little rosier if you generate the bitmaps a second time:

	$ time git repack -adb
	real	1m43.571s
	user	1m37.403s
	sys	0m8.179s

We can reuse some of the old bitmaps and it only takes 20 extra seconds,
making it a net win. But I'm not sure how realistic that is. There were
literally no new objects introduced between those two command. If this
were a "real" repack occurring after we'd accumulated a week or two
worth of objects, how long would it take?

A few other random observations:

  - I do suspect there are some real inefficiencies in the way we
    generate bitmaps. It _should_ be about as expensive as the graph
    traversal, but clearly it's not. I think this is because of the way
    the current bitmap code picks commits to bitmap, and then walks
    somewhat haphazardly over the history, trying to accumulate the set
    of objects for each commit. IOW, I believe it may sometimes
    traverse over some sequences of history more than once. So if we
    could make that faster, then the balance would shift in its favor.

  - This is comparing the cost of generating the bitmaps to the time
    saved for _one_ operation. On a repo serving many fetches, the cost
    to generate it is amortized over many requests. But for a normal
    end-user, that's not true (they'd presumably push out their work,
    but that usually only needs to walk a small bit of history anyway).

    The balance would change if we had more operations that used bitmaps
    (e.g., --contains can use them, as can ahead/behind checks). We
    don't do those things yet, but we could. However, those algorithms
    are also using other commit-graph optimizations, and we've discussed
    revamping the bitmap format as part of that work (one problem in
    particular is that to use the current bitmaps you have to parse the
    whole .bitmap file, making it sometimes a net negative to use the
    bitmaps). So I'd consider holding off any decision like "let's make
    this the default" until we see where that work goes.

> It would make life easier for people new to hosting git servers
> (and hopefully reduce centralization :)

I do think they're a net win for people hosting git servers. But if
that's the goal, I think at most you'd want to make bitmaps the default
for bare repos. They're really not much help for normal end-user repos
at this point.

> I started working on it, but t0410-partial-clone.sh fails with
> "Failed to write bitmap index. Packfile doesn't have full
> closure"; so more work needs to be done w.r.t. default behavior
> on partial clones...

Yeah, you can't use bitmaps at all in an incomplete clone. Shallow
clones would probably have the same issue (though in general we just
disable bitmaps entirely in shallow situations, so that might kick in).

-Peff

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH] repack: enable bitmaps by default on bare repos
  2019-03-10 23:39     ` Jeff King
@ 2019-03-12  3:13       ` Eric Wong
  2019-03-12  9:07         ` Ævar Arnfjörð Bjarmason
  2019-03-12 10:49         ` Jeff King
  0 siblings, 2 replies; 57+ messages in thread
From: Eric Wong @ 2019-03-12  3:13 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> wrote:
> On Sat, Mar 09, 2019 at 02:49:44AM +0000, Eric Wong wrote:
> > It would make life easier for people new to hosting git servers
> > (and hopefully reduce centralization :)
> 
> I do think they're a net win for people hosting git servers. But if
> that's the goal, I think at most you'd want to make bitmaps the default
> for bare repos. They're really not much help for normal end-user repos
> at this point.

Fair enough, hopefully this can make life easier for admins
new to hosting git:

----------8<---------
Subject: [PATCH] repack: enable bitmaps by default on bare repos

A typical use case for bare repos is for serving clones and
fetches to clients.  Enable bitmaps by default on bare repos to
make it easier for admins to host git repos in a performant way.

Signed-off-by: Eric Wong <e@80x24.org>
---
 builtin/repack.c  | 16 ++++++++++------
 t/t7700-repack.sh | 14 +++++++++++++-
 2 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index 67f8978043..5d4758b515 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -14,7 +14,7 @@
 
 static int delta_base_offset = 1;
 static int pack_kept_objects = -1;
-static int write_bitmaps;
+static int write_bitmaps = -1;
 static int use_delta_islands;
 static char *packdir, *packtmp;
 
@@ -343,11 +343,15 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	    (unpack_unreachable || (pack_everything & LOOSEN_UNREACHABLE)))
 		die(_("--keep-unreachable and -A are incompatible"));
 
+	if (!(pack_everything & ALL_INTO_ONE)) {
+		if (write_bitmaps > 0)
+			die(_(incremental_bitmap_conflict_error));
+	} else if (write_bitmaps < 0) {
+		write_bitmaps = is_bare_repository();
+	}
+
 	if (pack_kept_objects < 0)
-		pack_kept_objects = write_bitmaps;
-
-	if (write_bitmaps && !(pack_everything & ALL_INTO_ONE))
-		die(_(incremental_bitmap_conflict_error));
+		pack_kept_objects = write_bitmaps > 0;
 
 	packdir = mkpathdup("%s/pack", get_object_directory());
 	packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, (int)getpid());
@@ -368,7 +372,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	argv_array_push(&cmd.args, "--indexed-objects");
 	if (repository_format_partial_clone)
 		argv_array_push(&cmd.args, "--exclude-promisor-objects");
-	if (write_bitmaps)
+	if (write_bitmaps > 0)
 		argv_array_push(&cmd.args, "--write-bitmap-index");
 	if (use_delta_islands)
 		argv_array_push(&cmd.args, "--delta-islands");
diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index 6162e2a8e6..3e0b5c40e4 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -221,5 +221,17 @@ test_expect_success 'repack --keep-pack' '
 	)
 '
 
+test_expect_success 'bitmaps are created by default in bare repos' '
+	git clone --bare .git bare.git &&
+	cd bare.git &&
+	mkdir old &&
+	mv objects/pack/* old &&
+	pack=$(ls old/*.pack) &&
+	test_path_is_file "$pack" &&
+	git unpack-objects -q <"$pack" &&
+	git repack -ad &&
+	bitmap=$(ls objects/pack/*.bitmap) &&
+	test_path_is_file "$bitmap"
+'
+
 test_done
-

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH] repack: enable bitmaps by default on bare repos
  2019-03-12  3:13       ` [PATCH] repack: enable bitmaps by default on bare repos Eric Wong
@ 2019-03-12  9:07         ` Ævar Arnfjörð Bjarmason
  2019-03-12 10:49         ` Jeff King
  1 sibling, 0 replies; 57+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-03-12  9:07 UTC (permalink / raw)
  To: Eric Wong; +Cc: Jeff King, git


On Tue, Mar 12 2019, Eric Wong wrote:

> Jeff King <peff@peff.net> wrote:
>> On Sat, Mar 09, 2019 at 02:49:44AM +0000, Eric Wong wrote:
>> > It would make life easier for people new to hosting git servers
>> > (and hopefully reduce centralization :)
>>
>> I do think they're a net win for people hosting git servers. But if
>> that's the goal, I think at most you'd want to make bitmaps the default
>> for bare repos. They're really not much help for normal end-user repos
>> at this point.
>
> Fair enough, hopefully this can make life easier for admins
> new to hosting git:
>
> ----------8<---------
> Subject: [PATCH] repack: enable bitmaps by default on bare repos
>
> A typical use case for bare repos is for serving clones and
> fetches to clients.  Enable bitmaps by default on bare repos to
> make it easier for admins to host git repos in a performant way.
>
> Signed-off-by: Eric Wong <e@80x24.org>
> ---
>  builtin/repack.c  | 16 ++++++++++------
>  t/t7700-repack.sh | 14 +++++++++++++-
>  2 files changed, 23 insertions(+), 7 deletions(-)
>
> diff --git a/builtin/repack.c b/builtin/repack.c
> index 67f8978043..5d4758b515 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -14,7 +14,7 @@
>
>  static int delta_base_offset = 1;
>  static int pack_kept_objects = -1;
> -static int write_bitmaps;
> +static int write_bitmaps = -1;
>  static int use_delta_islands;
>  static char *packdir, *packtmp;
>
> @@ -343,11 +343,15 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>  	    (unpack_unreachable || (pack_everything & LOOSEN_UNREACHABLE)))
>  		die(_("--keep-unreachable and -A are incompatible"));
>
> +	if (!(pack_everything & ALL_INTO_ONE)) {
> +		if (write_bitmaps > 0)
> +			die(_(incremental_bitmap_conflict_error));
> +	} else if (write_bitmaps < 0) {
> +		write_bitmaps = is_bare_repository();
> +	}
> +
>  	if (pack_kept_objects < 0)
> -		pack_kept_objects = write_bitmaps;
> -
> -	if (write_bitmaps && !(pack_everything & ALL_INTO_ONE))
> -		die(_(incremental_bitmap_conflict_error));
> +		pack_kept_objects = write_bitmaps > 0;
>
>  	packdir = mkpathdup("%s/pack", get_object_directory());
>  	packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, (int)getpid());
> @@ -368,7 +372,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>  	argv_array_push(&cmd.args, "--indexed-objects");
>  	if (repository_format_partial_clone)
>  		argv_array_push(&cmd.args, "--exclude-promisor-objects");
> -	if (write_bitmaps)
> +	if (write_bitmaps > 0)
>  		argv_array_push(&cmd.args, "--write-bitmap-index");
>  	if (use_delta_islands)
>  		argv_array_push(&cmd.args, "--delta-islands");
> diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
> index 6162e2a8e6..3e0b5c40e4 100755
> --- a/t/t7700-repack.sh
> +++ b/t/t7700-repack.sh
> @@ -221,5 +221,17 @@ test_expect_success 'repack --keep-pack' '
>  	)
>  '
>
> +test_expect_success 'bitmaps are created by default in bare repos' '
> +	git clone --bare .git bare.git &&
> +	cd bare.git &&
> +	mkdir old &&
> +	mv objects/pack/* old &&
> +	pack=$(ls old/*.pack) &&
> +	test_path_is_file "$pack" &&
> +	git unpack-objects -q <"$pack" &&
> +	git repack -ad &&
> +	bitmap=$(ls objects/pack/*.bitmap) &&
> +	test_path_is_file "$bitmap"
> +'
> +
>  test_done
> -

Looks sensible in principle, but now the git-repack and git-config docs
talking about repack.writeBitmaps are out-of-date. A v2 should update
those.

Also worth testing that -c repack.writeBitmaps=false on a bare
repository still overrides it, even though glancing at the code it looks
like that case is handled, but without being tested for.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] repack: enable bitmaps by default on bare repos
  2019-03-12  3:13       ` [PATCH] repack: enable bitmaps by default on bare repos Eric Wong
  2019-03-12  9:07         ` Ævar Arnfjörð Bjarmason
@ 2019-03-12 10:49         ` Jeff King
  2019-03-12 12:05           ` Jeff King
  2019-03-13  1:51           ` Eric Wong
  1 sibling, 2 replies; 57+ messages in thread
From: Jeff King @ 2019-03-12 10:49 UTC (permalink / raw)
  To: Eric Wong; +Cc: git

On Tue, Mar 12, 2019 at 03:13:03AM +0000, Eric Wong wrote:

> > I do think they're a net win for people hosting git servers. But if
> > that's the goal, I think at most you'd want to make bitmaps the default
> > for bare repos. They're really not much help for normal end-user repos
> > at this point.
> 
> Fair enough, hopefully this can make life easier for admins
> new to hosting git:
> 
> ----------8<---------
> Subject: [PATCH] repack: enable bitmaps by default on bare repos
> 
> A typical use case for bare repos is for serving clones and
> fetches to clients.  Enable bitmaps by default on bare repos to
> make it easier for admins to host git repos in a performant way.

OK. I still think of bitmaps as something that might need manual care
and feeding, but I think that may be leftover superstition. I can't
offhand think of any real downsides to this.

>  static int delta_base_offset = 1;
>  static int pack_kept_objects = -1;
> -static int write_bitmaps;
> +static int write_bitmaps = -1;

So we'll have "-1" be "not decided yet". Makes sense.

> @@ -343,11 +343,15 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>  	    (unpack_unreachable || (pack_everything & LOOSEN_UNREACHABLE)))
>  		die(_("--keep-unreachable and -A are incompatible"));
>  
> +	if (!(pack_everything & ALL_INTO_ONE)) {
> +		if (write_bitmaps > 0)
> +			die(_(incremental_bitmap_conflict_error));
> +	} else if (write_bitmaps < 0) {
> +		write_bitmaps = is_bare_repository();
> +	}

Might it be easier here to always resolve "-1" into a 0/1? I.e., like:

  if (write_bitmaps < 0)
	write_bitmaps = (pack_everything & ALL_INTO_ONE) && is_bare_repository();

and then the rest of the logic can stay the same, and does not need to
be modified to handle "write_bitmaps < 0"?

> +test_expect_success 'bitmaps are created by default in bare repos' '
> +	git clone --bare .git bare.git &&
> +	cd bare.git &&

Please don't "cd" outside of a subshell, since it impacts further tests
that are added.

> +	mkdir old &&
> +	mv objects/pack/* old &&
> +	pack=$(ls old/*.pack) &&

Are we sure we have just done $pack here? Our repo came from a
local-disk clone, which would have just hard-linked whatever was in the
source repo. So we're subtly relying on the state that other tests have
left.

I'm not sure what we're trying to accomplish with this unpacking,
though. Running "git repack -ad" should generate bitmaps whether the
objects were already in a single pack or not. So I think this test can
just be:

  git clone --bare . bare.git &&
  git -C bare.git repack -ad &&
  bitmap=$(ls objects/pack/*.bitmap)
  test_path_is_file "$bitmap"

I do agree with Ævar it might also be worth testing that disabling
bitmaps explicitly still works. And also that repacking _without_ "-a"
(i.e., an incremental) does not complain about being unable to generate
bitmaps.

-Peff

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] repack: enable bitmaps by default on bare repos
  2019-03-12 10:49         ` Jeff King
@ 2019-03-12 12:05           ` Jeff King
  2019-03-13  1:51           ` Eric Wong
  1 sibling, 0 replies; 57+ messages in thread
From: Jeff King @ 2019-03-12 12:05 UTC (permalink / raw)
  To: Eric Wong; +Cc: git

On Tue, Mar 12, 2019 at 06:49:54AM -0400, Jeff King wrote:

> I'm not sure what we're trying to accomplish with this unpacking,
> though. Running "git repack -ad" should generate bitmaps whether the
> objects were already in a single pack or not. So I think this test can
> just be:
> 
>   git clone --bare . bare.git &&
>   git -C bare.git repack -ad &&
>   bitmap=$(ls objects/pack/*.bitmap)
>   test_path_is_file "$bitmap"

Of course that should be "bare.git/objects/pack/*.bitmap" in the third
line, since we skipped the "cd" entirely.

-Peff

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] repack: enable bitmaps by default on bare repos
  2019-03-12 10:49         ` Jeff King
  2019-03-12 12:05           ` Jeff King
@ 2019-03-13  1:51           ` Eric Wong
  2019-03-13 14:54             ` Jeff King
  1 sibling, 1 reply; 57+ messages in thread
From: Eric Wong @ 2019-03-13  1:51 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Ævar Arnfjörð Bjarmason

Jeff King <peff@peff.net> wrote:
> OK. I still think of bitmaps as something that might need manual care
> and feeding, but I think that may be leftover superstition. I can't
> offhand think of any real downsides to this.

It's a _relatively_ new feature to long-time users like us, so
maybe the "new == immature" mindset sets in[*].  I skimmed some
mail archives but couldn't find any reason not to...

But I did find Ævar's forgotten gitperformance doc and thread
where the topic was brought up:

  https://public-inbox.org/git/20170403211644.26814-1-avarab@gmail.com/

> On Tue, Mar 12, 2019 at 03:13:03AM +0000, Eric Wong wrote:
> > @@ -343,11 +343,15 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
> >  	    (unpack_unreachable || (pack_everything & LOOSEN_UNREACHABLE)))
> >  		die(_("--keep-unreachable and -A are incompatible"));
> >  
> > +	if (!(pack_everything & ALL_INTO_ONE)) {
> > +		if (write_bitmaps > 0)
> > +			die(_(incremental_bitmap_conflict_error));
> > +	} else if (write_bitmaps < 0) {
> > +		write_bitmaps = is_bare_repository();
> > +	}
> 
> Might it be easier here to always resolve "-1" into a 0/1? I.e., like:
> 
>   if (write_bitmaps < 0)
> 	write_bitmaps = (pack_everything & ALL_INTO_ONE) && is_bare_repository();
> 
> and then the rest of the logic can stay the same, and does not need to
> be modified to handle "write_bitmaps < 0"?

Good point, changed in v2.

> > +test_expect_success 'bitmaps are created by default in bare repos' '
> > +	git clone --bare .git bare.git &&
> > +	cd bare.git &&
> 
> Please don't "cd" outside of a subshell, since it impacts further tests
> that are added.

Oops, I got it into my head that each test was already run in a
separate subshell :x  Fixed.

> > +	mkdir old &&
> > +	mv objects/pack/* old &&
> > +	pack=$(ls old/*.pack) &&
> 
> Are we sure we have just done $pack here? Our repo came from a
> local-disk clone, which would have just hard-linked whatever was in the
> source repo. So we're subtly relying on the state that other tests have
> left.
> 
> I'm not sure what we're trying to accomplish with this unpacking,
> though. Running "git repack -ad" should generate bitmaps whether the
> objects were already in a single pack or not. So I think this test can
> just be:

Right, I forgot "repack -a" didn't need loose objects to operate :x

> I do agree with Ævar it might also be worth testing that disabling
> bitmaps explicitly still works. And also that repacking _without_ "-a"
> (i.e., an incremental) does not complain about being unable to generate
> bitmaps.

Yup, now with additional tests for repack.writeBitmaps=false,
repack (w/o "-a") and a config/repack.txt update

------------8<---------
Subject: [PATCH] repack: enable bitmaps by default on bare repos

A typical use case for bare repos is for serving clones and
fetches to clients.  Enable bitmaps by default on bare repos to
make it easier for admins to host git repos in a performant way.

Signed-off-by: Eric Wong <e@80x24.org>
Helped-by: Jeff King <peff@peff.net>
---
 Documentation/config/repack.txt |  2 +-
 builtin/repack.c                |  5 ++++-
 t/t7700-repack.sh               | 19 ++++++++++++++++++-
 3 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/Documentation/config/repack.txt b/Documentation/config/repack.txt
index a5c37813fd..9c413e177e 100644
--- a/Documentation/config/repack.txt
+++ b/Documentation/config/repack.txt
@@ -24,4 +24,4 @@ repack.writeBitmaps::
 	packs created for clones and fetches, at the cost of some disk
 	space and extra time spent on the initial repack.  This has
 	no effect if multiple packfiles are created.
-	Defaults to false.
+	Defaults to true on bare repos, false otherwise.
diff --git a/builtin/repack.c b/builtin/repack.c
index 67f8978043..caca113927 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -14,7 +14,7 @@
 
 static int delta_base_offset = 1;
 static int pack_kept_objects = -1;
-static int write_bitmaps;
+static int write_bitmaps = -1;
 static int use_delta_islands;
 static char *packdir, *packtmp;
 
@@ -343,6 +343,9 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	    (unpack_unreachable || (pack_everything & LOOSEN_UNREACHABLE)))
 		die(_("--keep-unreachable and -A are incompatible"));
 
+	if (write_bitmaps < 0)
+		write_bitmaps = (pack_everything & ALL_INTO_ONE) &&
+				 is_bare_repository();
 	if (pack_kept_objects < 0)
 		pack_kept_objects = write_bitmaps;
 
diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index 6162e2a8e6..6458286dcf 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -221,5 +221,22 @@ test_expect_success 'repack --keep-pack' '
 	)
 '
 
-test_done
+test_expect_success 'bitmaps are created by default in bare repos' '
+	git clone --bare .git bare.git &&
+	git -C bare.git repack -ad &&
+	bitmap=$(ls bare.git/objects/pack/*.bitmap) &&
+	test_path_is_file "$bitmap"
+'
+
+test_expect_success 'incremental repack does not complain' '
+	git -C bare.git repack -q 2>repack.err &&
+	! test -s repack.err
+'
 
+test_expect_success 'bitmaps can be disabled on bare repos' '
+	git -c repack.writeBitmaps=false -C bare.git repack -ad &&
+	bitmap=$(ls bare.git/objects/pack/*.bitmap 2>/dev/null || :) &&
+	test -z "$bitmap"
+'
+
+test_done
-- 
EW

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH] repack: enable bitmaps by default on bare repos
  2019-03-13  1:51           ` Eric Wong
@ 2019-03-13 14:54             ` Jeff King
  2019-03-14  9:12               ` [PATCH v3] " Eric Wong
  0 siblings, 1 reply; 57+ messages in thread
From: Jeff King @ 2019-03-13 14:54 UTC (permalink / raw)
  To: Eric Wong; +Cc: git, Ævar Arnfjörð Bjarmason

On Wed, Mar 13, 2019 at 01:51:33AM +0000, Eric Wong wrote:

> But I did find Ævar's forgotten gitperformance doc and thread
> where the topic was brought up:
> 
>   https://public-inbox.org/git/20170403211644.26814-1-avarab@gmail.com/

One thing that thread reminded me of: we probably also want to default
pack.writebitmaphashcache on. Otherwise the time saved during the object
enumeration can backfire when we spend much more time trying delta
compression (because we don't know the pathnames of any objects).

The reason it defaults to off is for on-disk compatibility with JGit.
But I have very little experience running without the hash-cache on. We
added it very early on because we found performance was not great
without it (I don't know if people running JGit have run into the same
problem and if not, why not).

> ------------8<---------
> Subject: [PATCH] repack: enable bitmaps by default on bare repos
> 
> A typical use case for bare repos is for serving clones and
> fetches to clients.  Enable bitmaps by default on bare repos to
> make it easier for admins to host git repos in a performant way.

This looks good to me, with one minor nit:

> +test_expect_success 'incremental repack does not complain' '
> +	git -C bare.git repack -q 2>repack.err &&
> +	! test -s repack.err
> +'

This last line could use "test_must_be_empty".

-Peff

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH v3] repack: enable bitmaps by default on bare repos
  2019-03-13 14:54             ` Jeff King
@ 2019-03-14  9:12               ` Eric Wong
  2019-03-14 16:02                 ` Jeff King
  2019-04-09 15:10                 ` [PATCH v3] repack: enable bitmaps by default on bare repos Ævar Arnfjörð Bjarmason
  0 siblings, 2 replies; 57+ messages in thread
From: Eric Wong @ 2019-03-14  9:12 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Ævar Arnfjörð Bjarmason

Jeff King <peff@peff.net> wrote:
> On Wed, Mar 13, 2019 at 01:51:33AM +0000, Eric Wong wrote:
> 
> > But I did find Ævar's forgotten gitperformance doc and thread
> > where the topic was brought up:
> > 
> >   https://public-inbox.org/git/20170403211644.26814-1-avarab@gmail.com/
> 
> One thing that thread reminded me of: we probably also want to default
> pack.writebitmaphashcache on. Otherwise the time saved during the object
> enumeration can backfire when we spend much more time trying delta
> compression (because we don't know the pathnames of any objects).

Interesting...  I found a big improvement with public-inbox
just using bitmaps; but have never tried the hash cache.

> The reason it defaults to off is for on-disk compatibility with JGit.

Right.  Our documentation seems to indicate JGit just warns (but
doesn't fall over), so maybe that can be considered separately.

I've never used JGit myself; and was satisfied enough with
bitmaps alone that I never tried the hash-cache.

> But I have very little experience running without the hash-cache on. We
> added it very early on because we found performance was not great
> without it (I don't know if people running JGit have run into the same
> problem and if not, why not).

As far as serving clones and fetches, public-inbox-init has
always created bare repos with bitmaps enabled, but without
the hash-cache for compatibility concerns.

That's a lot of fetches and clones over the years.

> > +test_expect_success 'incremental repack does not complain' '
> > +	git -C bare.git repack -q 2>repack.err &&
> > +	! test -s repack.err
> > +'
> 
> This last line could use "test_must_be_empty".

Thanks for the review!

---------8<-----------
Subject: [PATCH] repack: enable bitmaps by default on bare repos

A typical use case for bare repos is for serving clones and
fetches to clients.  Enable bitmaps by default on bare repos to
make it easier for admins to host git repos in a performant way.

Signed-off-by: Eric Wong <e@80x24.org>
Helped-by: Jeff King <peff@peff.net>
---
 Documentation/config/repack.txt |  2 +-
 builtin/repack.c                |  5 ++++-
 t/t7700-repack.sh               | 19 ++++++++++++++++++-
 3 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/Documentation/config/repack.txt b/Documentation/config/repack.txt
index a5c37813fd..9c413e177e 100644
--- a/Documentation/config/repack.txt
+++ b/Documentation/config/repack.txt
@@ -24,4 +24,4 @@ repack.writeBitmaps::
 	packs created for clones and fetches, at the cost of some disk
 	space and extra time spent on the initial repack.  This has
 	no effect if multiple packfiles are created.
-	Defaults to false.
+	Defaults to true on bare repos, false otherwise.
diff --git a/builtin/repack.c b/builtin/repack.c
index 67f8978043..caca113927 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -14,7 +14,7 @@
 
 static int delta_base_offset = 1;
 static int pack_kept_objects = -1;
-static int write_bitmaps;
+static int write_bitmaps = -1;
 static int use_delta_islands;
 static char *packdir, *packtmp;
 
@@ -343,6 +343,9 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	    (unpack_unreachable || (pack_everything & LOOSEN_UNREACHABLE)))
 		die(_("--keep-unreachable and -A are incompatible"));
 
+	if (write_bitmaps < 0)
+		write_bitmaps = (pack_everything & ALL_INTO_ONE) &&
+				 is_bare_repository();
 	if (pack_kept_objects < 0)
 		pack_kept_objects = write_bitmaps;
 
diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index 6162e2a8e6..86d05160a3 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -221,5 +221,22 @@ test_expect_success 'repack --keep-pack' '
 	)
 '
 
-test_done
+test_expect_success 'bitmaps are created by default in bare repos' '
+	git clone --bare .git bare.git &&
+	git -C bare.git repack -ad &&
+	bitmap=$(ls bare.git/objects/pack/*.bitmap) &&
+	test_path_is_file "$bitmap"
+'
+
+test_expect_success 'incremental repack does not complain' '
+	git -C bare.git repack -q 2>repack.err &&
+	test_must_be_empty repack.err
+'
 
+test_expect_success 'bitmaps can be disabled on bare repos' '
+	git -c repack.writeBitmaps=false -C bare.git repack -ad &&
+	bitmap=$(ls bare.git/objects/pack/*.bitmap 2>/dev/null || :) &&
+	test -z "$bitmap"
+'
+
+test_done
-- 
EW

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH v3] repack: enable bitmaps by default on bare repos
  2019-03-14  9:12               ` [PATCH v3] " Eric Wong
@ 2019-03-14 16:02                 ` Jeff King
  2019-03-15  6:21                   ` [PATCH 0/2] enable bitmap hash-cache by default Jeff King
  2019-04-09 15:10                 ` [PATCH v3] repack: enable bitmaps by default on bare repos Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 57+ messages in thread
From: Jeff King @ 2019-03-14 16:02 UTC (permalink / raw)
  To: Eric Wong; +Cc: git, Ævar Arnfjörð Bjarmason

On Thu, Mar 14, 2019 at 09:12:54AM +0000, Eric Wong wrote:

> > The reason it defaults to off is for on-disk compatibility with JGit.
> 
> Right.  Our documentation seems to indicate JGit just warns (but
> doesn't fall over), so maybe that can be considered separately.

I think it was a hard error in the beginning, but they changed it pretty
soon after we added more flags. So it might be reasonable to just enable
it by default (but it wouldn't hurt to double check the behavior).

I tried running t5310 (which does a back-and-forth with jgit) using this
patch:

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index a154fc29f6..5264bf055a 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -97,7 +97,7 @@ static off_t reuse_packfile_offset;
 static int use_bitmap_index_default = 1;
 static int use_bitmap_index = -1;
 static int write_bitmap_index;
-static uint16_t write_bitmap_options;
+static uint16_t write_bitmap_options = BITMAP_OPT_HASH_CACHE;
 
 static int exclude_promisor_objects;
 

and it seemed happy.

> As far as serving clones and fetches, public-inbox-init has
> always created bare repos with bitmaps enabled, but without
> the hash-cache for compatibility concerns.
> 
> That's a lot of fetches and clones over the years.

The symptom you'd see is that "Compressing objects" during a fetch takes
a long time, and/or produces lousy deltas. But it matters less if:

  - you keep things packed pretty promptly, because we don't bother
    looking for new deltas between objects in the same pack. Just trying
    to clone public-inbox.org/git, it does look like it's mostly packed
    (based on the object counts) but the compression phase still takes
    10+ seconds.

  - how much the names actually help. In your case, I'd think not at
    all, because being based on hashes, they're effectively random. So
    the pack-objects heuristics to try to find deltas between files of
    similar filenames will not help you.

Regarding the second thing, I wondered if the overall packing of your
public-inbox git repo might not be good, so I did a "git repack -adf
--window=1000" on a clone.  Hundreds of CPU minutes later, I was only
able to shave off about 80MB. I'm not sure if that means you
occasionally do very aggressive repacks, or if there simply isn't all
that much delta opportunity (after all, you're not storing many versions
of one file, but rather tons of different emails; I would expect to find
deltas between various versions of a patch, though).

Anyway...

> ---------8<-----------
> Subject: [PATCH] repack: enable bitmaps by default on bare repos
> 
> A typical use case for bare repos is for serving clones and
> fetches to clients.  Enable bitmaps by default on bare repos to
> make it easier for admins to host git repos in a performant way.
> 
> Signed-off-by: Eric Wong <e@80x24.org>
> Helped-by: Jeff King <peff@peff.net>

This version looks good to me. If we're going to flip the hash-cache
default, I think that should be a separate patch anyway.

-Peff

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 0/2] enable bitmap hash-cache by default
  2019-03-14 16:02                 ` Jeff King
@ 2019-03-15  6:21                   ` Jeff King
  2019-03-15  6:22                     ` [PATCH 1/2] t5310: correctly remove bitmaps for jgit test Jeff King
  2019-03-15  6:25                     ` [PATCH 2/2] pack-objects: default to writing bitmap hash-cache Jeff King
  0 siblings, 2 replies; 57+ messages in thread
From: Jeff King @ 2019-03-15  6:21 UTC (permalink / raw)
  To: Eric Wong; +Cc: git, Ævar Arnfjörð Bjarmason

On Thu, Mar 14, 2019 at 12:02:56PM -0400, Jeff King wrote:

> On Thu, Mar 14, 2019 at 09:12:54AM +0000, Eric Wong wrote:
> 
> > > The reason it defaults to off is for on-disk compatibility with JGit.
> > 
> > Right.  Our documentation seems to indicate JGit just warns (but
> > doesn't fall over), so maybe that can be considered separately.
> 
> I think it was a hard error in the beginning, but they changed it pretty
> soon after we added more flags. So it might be reasonable to just enable
> it by default (but it wouldn't hurt to double check the behavior).
> 
> I tried running t5310 (which does a back-and-forth with jgit) using this
> patch:

I dug up the actual JGit change, and it was indeed from 2014. So here's
a more complete series to handle that. There's a minor performance
mystery in the second patch, but I think it might be OK to proceed even
without solving it.

Conceptually these go on top of your patch, but they could be applied
separately.

  [1/2]: t5310: correctly remove bitmaps for jgit test
  [2/2]: pack-objects: default to writing bitmap hash-cache

 Documentation/config/pack.txt      | 4 +---
 builtin/pack-objects.c             | 2 +-
 t/perf/p5310-pack-bitmaps.sh       | 3 +--
 t/perf/p5311-pack-bitmaps-fetch.sh | 1 -
 t/t5310-pack-bitmaps.sh            | 5 ++---
 5 files changed, 5 insertions(+), 10 deletions(-)

-Peff

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH 1/2] t5310: correctly remove bitmaps for jgit test
  2019-03-15  6:21                   ` [PATCH 0/2] enable bitmap hash-cache by default Jeff King
@ 2019-03-15  6:22                     ` Jeff King
  2019-03-15 13:25                       ` SZEDER Gábor
  2019-03-15  6:25                     ` [PATCH 2/2] pack-objects: default to writing bitmap hash-cache Jeff King
  1 sibling, 1 reply; 57+ messages in thread
From: Jeff King @ 2019-03-15  6:22 UTC (permalink / raw)
  To: Eric Wong; +Cc: git, Ævar Arnfjörð Bjarmason

We use "jgit gc" to generate a pack bitmap file, and then make sure our
implementation can read it. To prepare the repo before running jgit, we
try to "rm -f" any existing bitmap files. But we got the path wrong;
we're in a bare repo, so looking in ".git/" finds nothing. Our "rm"
doesn't complain because of the "-f", and when we run "rev-list" there
are two bitmap files (ours and jgit's).

Our reader implementation will ignore one of the bitmap files, but it's
likely non-deterministic which one we will use. We'd prefer the one with
the more recent timestamp (just because of the way the packed_git list
is sorted), but in most test runs they'd have identical timestamps.

So this was probably actually testing something useful about 50% of the
time, and other half just testing that we could read our own bitmaps
(which is covered elsewhere).

Signed-off-by: Jeff King <peff@peff.net>
---
Just a cleanup I noticed in the area; can be applied independently from
patch 2.

 t/t5310-pack-bitmaps.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index 82d7f7f6a5..44a038881a 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -269,7 +269,7 @@ test_expect_success JGIT 'we can read jgit bitmaps' '
 	git clone --bare . compat-jgit.git &&
 	(
 		cd compat-jgit.git &&
-		rm -f .git/objects/pack/*.bitmap &&
+		rm -f objects/pack/*.bitmap &&
 		jgit gc &&
 		git rev-list --test-bitmap HEAD
 	)
-- 
2.21.0.543.g90eed137f3


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 2/2] pack-objects: default to writing bitmap hash-cache
  2019-03-15  6:21                   ` [PATCH 0/2] enable bitmap hash-cache by default Jeff King
  2019-03-15  6:22                     ` [PATCH 1/2] t5310: correctly remove bitmaps for jgit test Jeff King
@ 2019-03-15  6:25                     ` Jeff King
  1 sibling, 0 replies; 57+ messages in thread
From: Jeff King @ 2019-03-15  6:25 UTC (permalink / raw)
  To: Eric Wong; +Cc: git, Ævar Arnfjörð Bjarmason

Enabling pack.writebitmaphashcache should always be a performance win.
It costs only 4 bytes per object on disk, and the timings in ae4f07fbcc
(pack-bitmap: implement optional name_hash cache, 2013-12-21) show it
improving fetch and partial-bitmap clone times by 40-50%.

The only reason we didn't enable it by default at the time is that early
versions of JGit's bitmap reader complained about the presence of
optional header bits it didn't understand. But that was changed in
JGit's d2fa3987a (Use bitcheck to check for presence of OPT_FULL option,
2013-10-30), which made it into JGit v3.5.0 in late 2014.

So let's turn this option on by default. It's backwards-compatible with
all versions of Git, and if you are also using JGit on the same
repository, you'd only run into problems using a version that's almost 5
years old.

We'll drop the manual setting from all of our test scripts, including
perf tests. This isn't strictly necessary, but it has two advantages:

  1. If the hash-cache ever stops being enabled by default, our perf
     regression tests will notice.

  2. We can use the modified perf tests to show off the behavior of an
     otherwise unconfigured repo, as shown below.

These are the results of a few of a perf tests against linux.git that
showed interesting results. You can see the expected speedup in 5310.4,
which was noted in ae4f07fbcc. Curiously, 5310.8 did not improve (and
actually got slower), despite seeing the opposite in ae4f07fbcc.
I don't have an explanation for that.

The tests from p5311 did not exist back then, but do show improvements
(a smaller pack due to better deltas, which we found in less time).

  Test                                    HEAD^                HEAD
  -------------------------------------------------------------------------------------
  5310.4: simulated fetch                 7.39(22.70+0.25)     5.64(11.43+0.22) -23.7%
  5310.8: clone (partial bitmap)          18.45(24.83+1.19)    19.94(28.40+1.36) +8.1%
  5311.31: server (128 days)              0.41(1.13+0.05)      0.34(0.72+0.02) -17.1%
  5311.32: size   (128 days)                         7.4M                 7.0M -4.8%
  5311.33: client (128 days)              1.33(1.49+0.06)      1.29(1.37+0.12) -3.0%

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/config/pack.txt      | 4 +---
 builtin/pack-objects.c             | 2 +-
 t/perf/p5310-pack-bitmaps.sh       | 3 +--
 t/perf/p5311-pack-bitmaps-fetch.sh | 1 -
 t/t5310-pack-bitmaps.sh            | 3 +--
 5 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/Documentation/config/pack.txt b/Documentation/config/pack.txt
index 425c73aa52..9cdcfa7324 100644
--- a/Documentation/config/pack.txt
+++ b/Documentation/config/pack.txt
@@ -124,6 +124,4 @@ pack.writeBitmapHashCache::
 	bitmapped and non-bitmapped objects (e.g., when serving a fetch
 	between an older, bitmapped pack and objects that have been
 	pushed since the last gc). The downside is that it consumes 4
-	bytes per object of disk space, and that JGit's bitmap
-	implementation does not understand it, causing it to complain if
-	Git and JGit are used on the same repository. Defaults to false.
+	bytes per object of disk space. Defaults to true.
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index a154fc29f6..5264bf055a 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -97,7 +97,7 @@ static off_t reuse_packfile_offset;
 static int use_bitmap_index_default = 1;
 static int use_bitmap_index = -1;
 static int write_bitmap_index;
-static uint16_t write_bitmap_options;
+static uint16_t write_bitmap_options = BITMAP_OPT_HASH_CACHE;
 
 static int exclude_promisor_objects;
 
diff --git a/t/perf/p5310-pack-bitmaps.sh b/t/perf/p5310-pack-bitmaps.sh
index bb91dbb173..6a3a42531b 100755
--- a/t/perf/p5310-pack-bitmaps.sh
+++ b/t/perf/p5310-pack-bitmaps.sh
@@ -12,8 +12,7 @@ test_perf_large_repo
 # We intentionally use the deprecated pack.writebitmaps
 # config so that we can test against older versions of git.
 test_expect_success 'setup bitmap config' '
-	git config pack.writebitmaps true &&
-	git config pack.writebitmaphashcache true
+	git config pack.writebitmaps true
 '
 
 test_perf 'repack to disk' '
diff --git a/t/perf/p5311-pack-bitmaps-fetch.sh b/t/perf/p5311-pack-bitmaps-fetch.sh
index b04575951f..47c3fd7581 100755
--- a/t/perf/p5311-pack-bitmaps-fetch.sh
+++ b/t/perf/p5311-pack-bitmaps-fetch.sh
@@ -7,7 +7,6 @@ test_perf_default_repo
 
 test_expect_success 'create bitmapped server repo' '
 	git config pack.writebitmaps true &&
-	git config pack.writebitmaphashcache true &&
 	git repack -ad
 '
 
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index 44a038881a..a26c8ba9a2 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -34,8 +34,7 @@ test_expect_success 'setup repo with moderate-sized history' '
 	bitmaptip=$(git rev-parse master) &&
 	blob=$(echo tagged-blob | git hash-object -w --stdin) &&
 	git tag tagged-blob $blob &&
-	git config repack.writebitmaps true &&
-	git config pack.writebitmaphashcache true
+	git config repack.writebitmaps true
 '
 
 test_expect_success 'full repack creates bitmaps' '
-- 
2.21.0.543.g90eed137f3

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] t5310: correctly remove bitmaps for jgit test
  2019-03-15  6:22                     ` [PATCH 1/2] t5310: correctly remove bitmaps for jgit test Jeff King
@ 2019-03-15 13:25                       ` SZEDER Gábor
  2019-03-15 18:36                         ` Jeff King
  0 siblings, 1 reply; 57+ messages in thread
From: SZEDER Gábor @ 2019-03-15 13:25 UTC (permalink / raw)
  To: Jeff King; +Cc: Eric Wong, git, Ævar Arnfjörð Bjarmason

On Fri, Mar 15, 2019 at 02:22:44AM -0400, Jeff King wrote:
> We use "jgit gc" to generate a pack bitmap file, and then make sure our
> implementation can read it. To prepare the repo before running jgit, we
> try to "rm -f" any existing bitmap files. But we got the path wrong;
> we're in a bare repo, so looking in ".git/" finds nothing. Our "rm"
> doesn't complain because of the "-f", and when we run "rev-list" there
> are two bitmap files (ours and jgit's).

Oh, indeed.

> Our reader implementation will ignore one of the bitmap files, but it's
> likely non-deterministic which one we will use. We'd prefer the one with
> the more recent timestamp (just because of the way the packed_git list
> is sorted), but in most test runs they'd have identical timestamps.
> 
> So this was probably actually testing something useful about 50% of the
> time, and other half just testing that we could read our own bitmaps
> (which is covered elsewhere).

At least it was testing the right thing up until 87a6bb701a
(t5310-pack-bitmaps: make JGit tests work with GIT_TEST_SPLIT_INDEX,
2018-05-10) came along.


>  t/t5310-pack-bitmaps.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
> index 82d7f7f6a5..44a038881a 100755
> --- a/t/t5310-pack-bitmaps.sh
> +++ b/t/t5310-pack-bitmaps.sh
> @@ -269,7 +269,7 @@ test_expect_success JGIT 'we can read jgit bitmaps' '
>  	git clone --bare . compat-jgit.git &&
>  	(
>  		cd compat-jgit.git &&
> -		rm -f .git/objects/pack/*.bitmap &&
> +		rm -f objects/pack/*.bitmap &&
>  		jgit gc &&
>  		git rev-list --test-bitmap HEAD
>  	)
> -- 
> 2.21.0.543.g90eed137f3
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] t5310: correctly remove bitmaps for jgit test
  2019-03-15 13:25                       ` SZEDER Gábor
@ 2019-03-15 18:36                         ` Jeff King
  0 siblings, 0 replies; 57+ messages in thread
From: Jeff King @ 2019-03-15 18:36 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Eric Wong, git, Ævar Arnfjörð Bjarmason

On Fri, Mar 15, 2019 at 02:25:45PM +0100, SZEDER Gábor wrote:

> > Our reader implementation will ignore one of the bitmap files, but it's
> > likely non-deterministic which one we will use. We'd prefer the one with
> > the more recent timestamp (just because of the way the packed_git list
> > is sorted), but in most test runs they'd have identical timestamps.
> > 
> > So this was probably actually testing something useful about 50% of the
> > time, and other half just testing that we could read our own bitmaps
> > (which is covered elsewhere).
> 
> At least it was testing the right thing up until 87a6bb701a
> (t5310-pack-bitmaps: make JGit tests work with GIT_TEST_SPLIT_INDEX,
> 2018-05-10) came along.

Heh, indeed. I didn't even dig into the history, and just assumed I had
botched it in 2013. :)

-Peff

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3] repack: enable bitmaps by default on bare repos
  2019-03-14  9:12               ` [PATCH v3] " Eric Wong
  2019-03-14 16:02                 ` Jeff King
@ 2019-04-09 15:10                 ` Ævar Arnfjörð Bjarmason
  2019-04-10 22:57                   ` Jeff King
  1 sibling, 1 reply; 57+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-04-09 15:10 UTC (permalink / raw)
  To: Eric Wong; +Cc: Jeff King, git


On Thu, Mar 14 2019, Eric Wong wrote:

> Jeff King <peff@peff.net> wrote:
>> On Wed, Mar 13, 2019 at 01:51:33AM +0000, Eric Wong wrote:
>>
>> > But I did find Ævar's forgotten gitperformance doc and thread
>> > where the topic was brought up:
>> >
>> >   https://public-inbox.org/git/20170403211644.26814-1-avarab@gmail.com/
>>
>> One thing that thread reminded me of: we probably also want to default
>> pack.writebitmaphashcache on. Otherwise the time saved during the object
>> enumeration can backfire when we spend much more time trying delta
>> compression (because we don't know the pathnames of any objects).
>
> Interesting...  I found a big improvement with public-inbox
> just using bitmaps; but have never tried the hash cache.
>
>> The reason it defaults to off is for on-disk compatibility with JGit.
>
> Right.  Our documentation seems to indicate JGit just warns (but
> doesn't fall over), so maybe that can be considered separately.
>
> I've never used JGit myself; and was satisfied enough with
> bitmaps alone that I never tried the hash-cache.
>
>> But I have very little experience running without the hash-cache on. We
>> added it very early on because we found performance was not great
>> without it (I don't know if people running JGit have run into the same
>> problem and if not, why not).
>
> As far as serving clones and fetches, public-inbox-init has
> always created bare repos with bitmaps enabled, but without
> the hash-cache for compatibility concerns.
>
> That's a lot of fetches and clones over the years.
>
>> > +test_expect_success 'incremental repack does not complain' '
>> > +	git -C bare.git repack -q 2>repack.err &&
>> > +	! test -s repack.err
>> > +'
>>
>> This last line could use "test_must_be_empty".
>
> Thanks for the review!
>
> ---------8<-----------
> Subject: [PATCH] repack: enable bitmaps by default on bare repos
>
> A typical use case for bare repos is for serving clones and
> fetches to clients.  Enable bitmaps by default on bare repos to
> make it easier for admins to host git repos in a performant way.
>
> Signed-off-by: Eric Wong <e@80x24.org>
> Helped-by: Jeff King <peff@peff.net>
> ---
>  Documentation/config/repack.txt |  2 +-
>  builtin/repack.c                |  5 ++++-
>  t/t7700-repack.sh               | 19 ++++++++++++++++++-
>  3 files changed, 23 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/config/repack.txt b/Documentation/config/repack.txt
> index a5c37813fd..9c413e177e 100644
> --- a/Documentation/config/repack.txt
> +++ b/Documentation/config/repack.txt
> @@ -24,4 +24,4 @@ repack.writeBitmaps::
>  	packs created for clones and fetches, at the cost of some disk
>  	space and extra time spent on the initial repack.  This has
>  	no effect if multiple packfiles are created.
> -	Defaults to false.
> +	Defaults to true on bare repos, false otherwise.
> diff --git a/builtin/repack.c b/builtin/repack.c
> index 67f8978043..caca113927 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -14,7 +14,7 @@
>
>  static int delta_base_offset = 1;
>  static int pack_kept_objects = -1;
> -static int write_bitmaps;
> +static int write_bitmaps = -1;
>  static int use_delta_islands;
>  static char *packdir, *packtmp;
>
> @@ -343,6 +343,9 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>  	    (unpack_unreachable || (pack_everything & LOOSEN_UNREACHABLE)))
>  		die(_("--keep-unreachable and -A are incompatible"));
>
> +	if (write_bitmaps < 0)
> +		write_bitmaps = (pack_everything & ALL_INTO_ONE) &&
> +				 is_bare_repository();
>  	if (pack_kept_objects < 0)
>  		pack_kept_objects = write_bitmaps;
>
> diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
> index 6162e2a8e6..86d05160a3 100755
> --- a/t/t7700-repack.sh
> +++ b/t/t7700-repack.sh
> @@ -221,5 +221,22 @@ test_expect_success 'repack --keep-pack' '
>  	)
>  '
>
> -test_done
> +test_expect_success 'bitmaps are created by default in bare repos' '
> +	git clone --bare .git bare.git &&
> +	git -C bare.git repack -ad &&
> +	bitmap=$(ls bare.git/objects/pack/*.bitmap) &&
> +	test_path_is_file "$bitmap"
> +'
> +
> +test_expect_success 'incremental repack does not complain' '
> +	git -C bare.git repack -q 2>repack.err &&
> +	test_must_be_empty repack.err
> +'
>
> +test_expect_success 'bitmaps can be disabled on bare repos' '
> +	git -c repack.writeBitmaps=false -C bare.git repack -ad &&
> +	bitmap=$(ls bare.git/objects/pack/*.bitmap 2>/dev/null || :) &&
> +	test -z "$bitmap"
> +'
> +
> +test_done

I've found a case where turning bitmaps on does horrible things for
bitmap "push" performance.

As it turns out it's not related to this patch per-se, because I had a
*.bitmap for other reasons, but replying to this because we'd presumably
get the same thing in the bare repo case once this merges down.

I can't share the repo, but I had a report where just a "git push" of a
topic branch that was 2/58 ahead/behind took ~2 minutes just in
"Enumerating objects", but ~500ms without bitmaps.

Using a horrible "print to stderr"[1] monkeypatch I'd get without
bitmaps and reported by trace2 / ts:

    Apr 09 16:45:15 16:45:15.365817 git.c:433                         | d1 | main                     | cmd_name     |     |           |           |            | pack-objects (push/pack-objects)
    Apr 09 16:45:15 16:45:15.366220 builtin/pack-objects.c:3493       | d1 | main                     | region_enter | r1  |  0.000928 |           | pack-objects | label:enumerate-objects
    Apr 09 16:45:15 16:45:15.366241 builtin/pack-objects.c:3495       | d1 | main                     | region_enter | r1  |  0.000950 |           | pack-objects | ..label:enumerate-objects-prepare-packing-data
    Apr 09 16:45:15 16:45:15.366384 builtin/pack-objects.c:3498       | d1 | main                     | region_leave | r1  |  0.001091 |  0.000141 | pack-objects | ..label:enumerate-objects-prepare-packing-data
    Apr 09 16:45:15 16:45:15.366394 builtin/pack-objects.c:3510       | d1 | main                     | region_enter | r1  |  0.001102 |           | pack-objects | ..label:enumerate-objects-get-obj-list
    Apr 09 16:45:15 get obj list 1
    Apr 09 16:45:15 get obj list 2, did 29391 lines
    Apr 09 16:45:15 get obj list 3
    Apr 09 16:45:15 get obj list 4
    Apr 09 16:45:15 get obj list 5
    Apr 09 16:45:15 get obj list 6
    Apr 09 16:45:15 get obj list 7
    Apr 09 16:45:15 get obj list 8
    Apr 09 16:45:15 get obj list 9
    Apr 09 16:45:15 16:45:15.776559 builtin/pack-objects.c:3514       | d1 | main                     | region_leave | r1  |  0.411263 |  0.410161 | pack-objects | ..label:enumerate-objects-get-obj-list
    Apr 09 16:45:15 16:45:15.776577 builtin/pack-objects.c:3517       | d1 | main                     | region_enter | r1  |  0.411285 |           | pack-objects | ..label:enumerate-objects-cleanup-preferred-base
    Apr 09 16:45:15 16:45:15.776584 builtin/pack-objects.c:3520       | d1 | main                     | region_leave | r1  |  0.411292 |  0.000007 | pack-objects | ..label:enumerate-objects-cleanup-preferred-base
    Apr 09 16:45:15 16:45:15.776605 builtin/pack-objects.c:3530       | d1 | main                     | region_leave | r1  |  0.411313 |  0.410385 | pack-objects | label:enumerate-objects
    Apr 09 16:45:15 16:45:15.776609 builtin/pack-objects.c:3542       | d1 | main                     | region_enter | r1  |  0.411318 |           | pack-objects | label:write-pack-file
    Apr 09 16:45:15 16:45:15.794235 builtin/pack-objects.c:3544       | d1 | main                     | region_leave | r1  |  0.428942 |  0.017624 | pack-objects | label:write-pack-file

But with pack.useBitmaps=true:

    Apr 09 16:39:59 16:39:59.139022 git.c:433                         | d1 | main                     | cmd_name     |     |           |           |            | pack-objects (push/pack-objects)
    Apr 09 16:39:59 16:39:59.139398 builtin/pack-objects.c:3493       | d1 | main                     | region_enter | r1  |  0.000869 |           | pack-objects | label:enumerate-objects
    Apr 09 16:39:59 16:39:59.139419 builtin/pack-objects.c:3495       | d1 | main                     | region_enter | r1  |  0.000892 |           | pack-objects | ..label:enumerate-objects-prepare-packing-data
    Apr 09 16:39:59 16:39:59.139551 builtin/pack-objects.c:3498       | d1 | main                     | region_leave | r1  |  0.001023 |  0.000131 | pack-objects | ..label:enumerate-objects-prepare-packing-data
    Apr 09 16:39:59 16:39:59.139559 builtin/pack-objects.c:3510       | d1 | main                     | region_enter | r1  |  0.001032 |           | pack-objects | ..label:enumerate-objects-get-obj-list
    Apr 09 16:39:59 get obj list 1
    Apr 09 16:39:59 get obj list 2, did 29392 lines
    Apr 09 16:39:59 get obj list 3
    Apr 09 16:39:59 prepping walk
    Apr 09 16:39:59 opening packed bitmap...
    Apr 09 16:39:59 opening packed bitmap done
    Apr 09 16:39:59 walking 29392 pending
    Apr 09 16:39:59 done walking 29392 pending
    Apr 09 16:39:59 prepare_bitmap_walk 3
    Apr 09 16:39:59 prepare_bitmap_walk 4
    Apr 09 16:39:59 prepare_bitmap_walk 5
    Apr 09 16:40:00 prepare_bitmap_walk 6
    Apr 09 16:40:00 prepare_bitmap_walk 6.1
    Apr 09 16:41:35 prepare_bitmap_walk 6.2
    Apr 09 16:41:35 prepare_bitmap_walk 7
    Apr 09 16:41:52 prepare_bitmap_walk 8
    Apr 09 16:41:52 walking?
    Apr 09 16:41:52 traversing
    Apr 09 16:41:52 traversing done
    Apr 09 16:41:52 16:41:52.091634 builtin/pack-objects.c:3514       | d1 | main                     | region_leave | r1  | 112.953099 | 112.952067 | pack-objects | ..label:enumerate-objects-get-obj-list
    Apr 09 16:41:52 16:41:52.091655 builtin/pack-objects.c:3517       | d1 | main                     | region_enter | r1  | 112.953128 |           | pack-objects | ..label:enumerate-objects-cleanup-preferred-base
    Apr 09 16:41:52 16:41:52.091668 builtin/pack-objects.c:3520       | d1 | main                     | region_leave | r1  | 112.953141 |  0.000013 | pack-objects | ..label:enumerate-objects-cleanup-preferred-base
    Apr 09 16:41:52 16:41:52.091700 builtin/pack-objects.c:3530       | d1 | main                     | region_leave | r1  | 112.953172 | 112.952303 | pack-objects | label:enumerate-objects
    Apr 09 16:41:52 16:41:52.091706 builtin/pack-objects.c:3542       | d1 | main                     | region_enter | r1  | 112.953179 |           | pack-objects | label:write-pack-file
    Apr 09 16:41:52 16:41:52.111966 builtin/pack-objects.c:3544       | d1 | main                     | region_leave | r1  | 112.973438 |  0.020259 | pack-objects | label:write-pack-file

I.e. almost all the time is in get_object_list_from_bitmap() and around
1m30s in just this in pack-bitmap.c:

    haves_bitmap = find_objects(bitmap_git, revs, haves, NULL);

And then another ~20s in:

    wants_bitmap = find_objects(bitmap_git, revs, wants, haves_bitmap);

This is with 10 packs and where only the largest (initial clone pack)
had a *.bitmap, but I can also reproduce with a 'git repack -A -d -b',
i.e. with only one pack with a *.bitmap, although that makes it a bit
better for the first bit, and almost completely cuts down on the time
spent in the second phase:

    Apr 09 17:08:37 17:08:37.261507 builtin/pack-objects.c:3493       | d1 | main                     | region_enter | r1  |  0.000922 |           | pack-objects | label:enumerate-objects
    Apr 09 17:08:37 17:08:37.261527 builtin/pack-objects.c:3495       | d1 | main                     | region_enter | r1  |  0.000943 |           | pack-objects | ..label:enumerate-objects-prepare-packing-data
    Apr 09 17:08:37 17:08:37.261600 builtin/pack-objects.c:3498       | d1 | main                     | region_leave | r1  |  0.001015 |  0.000072 | pack-objects | ..label:enumerate-objects-prepare-packing-data
    Apr 09 17:08:37 17:08:37.261608 builtin/pack-objects.c:3510       | d1 | main                     | region_enter | r1  |  0.001024 |           | pack-objects | ..label:enumerate-objects-get-obj-list
    Apr 09 17:08:37 get obj list 1
    Apr 09 17:08:37 get obj list 2, did 29380 lines
    Apr 09 17:08:37 get obj list 3
    Apr 09 17:08:37 prepping walk
    Apr 09 17:08:37 opening packed bitmap...
    Apr 09 17:08:37 opening packed bitmap done
    Apr 09 17:08:37 walking 29380 pending
    Apr 09 17:08:37 done walking 29380 pending
    Apr 09 17:08:37 prepare_bitmap_walk 3
    Apr 09 17:08:37 prepare_bitmap_walk 4
    Apr 09 17:08:37 prepare_bitmap_walk 5
    Apr 09 17:08:38 prepare_bitmap_walk 6
    Apr 09 17:08:38 prepare_bitmap_walk 6.1
    Apr 09 17:09:07 prepare_bitmap_walk 6.2
    Apr 09 17:09:07 prepare_bitmap_walk 7
    Apr 09 17:09:09 prepare_bitmap_walk 8
    Apr 09 17:09:09 walking?
    Apr 09 17:09:09 traversing
    Apr 09 17:09:09 traversing done
    Apr 09 17:09:09 17:09:09.229185 builtin/pack-objects.c:3514       | d1 | main                     | region_leave | r1  | 31.968595 | 31.967571 | pack-objects | ..label:enumerate-objects-get-obj-list
    Apr 09 17:09:09 17:09:09.229203 builtin/pack-objects.c:3517       | d1 | main                     | region_enter | r1  | 31.968619 |           | pack-objects | ..label:enumerate-objects-cleanup-preferred-base
    Apr 09 17:09:09 17:09:09.229214 builtin/pack-objects.c:3520       | d1 | main                     | region_leave | r1  | 31.968630 |  0.000011 | pack-objects | ..label:enumerate-objects-cleanup-preferred-base
    Apr 09 17:09:09 17:09:09.229242 builtin/pack-objects.c:3530       | d1 | main                     | region_leave | r1  | 31.968658 | 31.967736 | pack-objects | label:enumerate-objects
    Apr 09 17:09:09 17:09:09.229265 builtin/pack-objects.c:3542       | d1 | main                     | region_enter | r1  | 31.968681 |           | pack-objects | label:write-pack-file
    Apr 09 17:09:09 17:09:09.251998 builtin/pack-objects.c:3544       | d1 | main                     | region_leave | r1  | 31.991412 |  0.022731 | pack-objects | label:write-pack-file

I don't have time to dig more into this now, just wanted to send these
initial results...

1.



    diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
    index a154fc29f6..8b2af1740e 100644
    --- a/builtin/pack-objects.c
    +++ b/builtin/pack-objects.c
    @@ -3052,2 +3052,3 @@ static int get_object_list_from_bitmap(struct rev_info *revs)
     {
    +	fprintf(stderr, "prepping walk\n");
     	if (!(bitmap_git = prepare_bitmap_walk(revs)))
    @@ -3055,2 +3056,3 @@ static int get_object_list_from_bitmap(struct rev_info *revs)

    +	fprintf(stderr, "walking?\n");
     	if (pack_options_allow_reuse() &&
    @@ -3066,3 +3068,5 @@ static int get_object_list_from_bitmap(struct rev_info *revs)

    +	fprintf(stderr, "traversing\n");
     	traverse_bitmap_commit_list(bitmap_git, &add_object_entry_from_bitmap);
    +	fprintf(stderr, "traversing done\n");
     	return 0;
    @@ -3091,2 +3095,3 @@ static void get_object_list(int ac, const char **av)
     	int save_warning;
    +	int lns = 0;

    @@ -3102,3 +3107,5 @@ static void get_object_list(int ac, const char **av)

    +	fprintf(stderr, "get obj list 1\n");
     	while (fgets(line, sizeof(line), stdin) != NULL) {
    +		lns++;
     		int len = strlen(line);
    @@ -3128,4 +3135,6 @@ static void get_object_list(int ac, const char **av)

    +	fprintf(stderr, "get obj list 2, did %d lines\n", lns);
     	warn_on_object_refname_ambiguity = save_warning;

    +	fprintf(stderr, "get obj list 3\n");
     	if (use_bitmap_index && !get_object_list_from_bitmap(&revs))
    @@ -3133,2 +3142,3 @@ static void get_object_list(int ac, const char **av)

    +	fprintf(stderr, "get obj list 4\n");
     	if (use_delta_islands)
    @@ -3136,2 +3146,3 @@ static void get_object_list(int ac, const char **av)

    +	fprintf(stderr, "get obj list 5\n");
     	if (prepare_revision_walk(&revs))
    @@ -3140,2 +3151,3 @@ static void get_object_list(int ac, const char **av)

    +	fprintf(stderr, "get obj list 6\n");
     	if (!fn_show_object)
    @@ -3146,2 +3158,3 @@ static void get_object_list(int ac, const char **av)

    +	fprintf(stderr, "get obj list 7\n");
     	if (unpack_unreachable_expiration) {
    @@ -3157,2 +3170,3 @@ static void get_object_list(int ac, const char **av)

    +	fprintf(stderr, "get obj list 8\n");
     	if (keep_unreachable)
    @@ -3163,2 +3177,3 @@ static void get_object_list(int ac, const char **av)
     		loosen_unused_packed_objects(&revs);
    +	fprintf(stderr, "get obj list 9\n");

    @@ -3478,3 +3493,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
     			    the_repository);
    +	trace2_region_enter("pack-objects", "enumerate-objects-prepare-packing-data",
    +			    the_repository);
     	prepare_packing_data(the_repository, &to_pack);
    +	trace2_region_leave("pack-objects", "enumerate-objects-prepare-packing-data",
    +			    the_repository);

    @@ -3482,11 +3501,28 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
     		progress_state = start_progress(_("Enumerating objects"), 0);
    -	if (!use_internal_rev_list)
    +	if (!use_internal_rev_list) {
    +		trace2_region_enter("pack-objects", "enumerate-objects-read-stdin",
    +				    the_repository);
     		read_object_list_from_stdin();
    -	else {
    +		trace2_region_leave("pack-objects", "enumerate-objects-read-stdin",
    +				    the_repository);
    +	} else {
    +		trace2_region_enter("pack-objects", "enumerate-objects-get-obj-list",
    +				    the_repository);
     		get_object_list(rp.argc, rp.argv);
     		argv_array_clear(&rp);
    +		trace2_region_leave("pack-objects", "enumerate-objects-get-obj-list",
    +				    the_repository);
     	}
    +	trace2_region_enter("pack-objects", "enumerate-objects-cleanup-preferred-base",
    +			    the_repository);
     	cleanup_preferred_base();
    -	if (include_tag && nr_result)
    +	trace2_region_leave("pack-objects", "enumerate-objects-cleanup-preferred-base",
    +			    the_repository);
    +	if (include_tag && nr_result) {
    +		trace2_region_enter("pack-objects", "enumerate-objects-add-tags",
    +				    the_repository);
     		for_each_ref(add_ref_tag, NULL);
    +		trace2_region_leave("pack-objects", "enumerate-objects-add-tags",
    +				    the_repository);
    +	}
     	stop_progress(&progress_state);
    diff --git a/pack-bitmap.c b/pack-bitmap.c
    index 4695aaf6b4..0ab71597fd 100644
    --- a/pack-bitmap.c
    +++ b/pack-bitmap.c
    @@ -693,5 +693,8 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)
     	 * because we may not need to use it */
    +	fprintf(stderr, "opening packed bitmap...\n");
     	if (open_pack_bitmap(revs->repo, bitmap_git) < 0)
     		goto cleanup;
    +	fprintf(stderr, "opening packed bitmap done\n");

    +	fprintf(stderr, "walking %d pending\n", revs->pending.nr);
     	for (i = 0; i < revs->pending.nr; ++i) {
    @@ -720,2 +723,3 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)
     	}
    +	fprintf(stderr, "done walking %d pending\n", revs->pending.nr);

    @@ -726,2 +730,3 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)
     	 */
    +	fprintf(stderr, "prepare_bitmap_walk 3\n");
     	if (haves && !in_bitmapped_pack(bitmap_git, haves))
    @@ -729,2 +734,3 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)

    +	fprintf(stderr, "prepare_bitmap_walk 4\n");
     	/* if we don't want anything, we're done here */
    @@ -738,2 +744,3 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)
     	 */
    +	fprintf(stderr, "prepare_bitmap_walk 5\n");
     	if (load_pack_bitmap(bitmap_git) < 0)
    @@ -741,2 +748,3 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)

    +	fprintf(stderr, "prepare_bitmap_walk 6\n");
     	object_array_clear(&revs->pending);
    @@ -745,3 +753,5 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)
     		revs->ignore_missing_links = 1;
    + 		fprintf(stderr, "prepare_bitmap_walk 6.1\n");
     		haves_bitmap = find_objects(bitmap_git, revs, haves, NULL);
    + 		fprintf(stderr, "prepare_bitmap_walk 6.2\n");
     		reset_revision_walk();
    @@ -752,4 +762,6 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)
     	}
    +	fprintf(stderr, "prepare_bitmap_walk 7\n");

     	wants_bitmap = find_objects(bitmap_git, revs, wants, haves_bitmap);
    +	fprintf(stderr, "prepare_bitmap_walk 8\n");

    diff --git a/revision.c b/revision.c
    index eb8e51bc63..4592d01ee7 100644
    --- a/revision.c
    +++ b/revision.c
    @@ -63,2 +63,4 @@ static void mark_tree_contents_uninteresting(struct repository *r,

    +	fprintf(stderr, "MTCU\n");
    +
     	if (parse_tree_gently(tree, 1) < 0)
    @@ -167,2 +169,4 @@ static void add_children_by_path(struct repository *r,

    +	fprintf(stderr, "ACBP\n");
    +
     	if (!tree)

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3] repack: enable bitmaps by default on bare repos
  2019-04-09 15:10                 ` [PATCH v3] repack: enable bitmaps by default on bare repos Ævar Arnfjörð Bjarmason
@ 2019-04-10 22:57                   ` Jeff King
  2019-04-25  7:16                     ` Junio C Hamano
  2019-05-23 11:30                     ` Jeff King
  0 siblings, 2 replies; 57+ messages in thread
From: Jeff King @ 2019-04-10 22:57 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Eric Wong, git

On Tue, Apr 09, 2019 at 05:10:43PM +0200, Ævar Arnfjörð Bjarmason wrote:

> I've found a case where turning bitmaps on does horrible things for
> bitmap "push" performance.
> [...]
> I can't share the repo, but I had a report where just a "git push" of a
> topic branch that was 2/58 ahead/behind took ~2 minutes just in
> "Enumerating objects", but ~500ms without bitmaps.

That's pretty bad, though I'm not terribly surprised. The worst cases
are ones where we have to traverse a lot to fill in the bitmap. So
either there are a lot of commits newer than the bitmapped pack, or
we've done a bad job of picking old commits to bitmap, requiring us to
walk back to find all of the reachable objects (until we find something
that does have a bitmap).

And "bad" here is somewhat subjective. The other side told us some
"have" lines, and those are what we have to walk back from. _Usually_
those would correspond to ref tips, and the bitmap code tries to put a
bitmap at each ref tip. But if you have tons of refs, it can't always do
so.

> I.e. almost all the time is in get_object_list_from_bitmap() and around
> 1m30s in just this in pack-bitmap.c:
> 
>     haves_bitmap = find_objects(bitmap_git, revs, haves, NULL);
> 
> And then another ~20s in:
> 
>     wants_bitmap = find_objects(bitmap_git, revs, wants, haves_bitmap);

Yeah, that's where I'd expect the time to go. Inside find_objects()
we'll call traverse_commit_list() to do the actual walk.

The first walk for "haves" is looking mostly at old commits. Stuff that
is mentioned in the pack, but for which we have to walk to find the
bitmap.

The second walk for wants is probably mostly looking at new commits.
Things that don't have a bitmap yet, and for which we have to walk
(though it's interesting that it's so much more expensive than the
non-bitmap walk, which has to fully enumerate those trees itself; that
implies that some of the "haves" are recent, too, with respect to the
last pack).

> This is with 10 packs and where only the largest (initial clone pack)
> had a *.bitmap, but I can also reproduce with a 'git repack -A -d -b',
> i.e. with only one pack with a *.bitmap, although that makes it a bit
> better for the first bit, and almost completely cuts down on the time
> spent in the second phase:

Yeah, that makes sense. By repacking you've taken all those new commits
and included them in on-disk bitmaps. So I'd expect the "wants" to get
much shorter, but the "haves" phase staying long means we could do a
better job of picking commits to have on-disk bitmaps.

So two avenues for exploration I think:

  1. I've long suspected that the bitmap selection code isn't ideal.
     Both in terms of what it picks, but also in its runtime (I think it
     ends up walking the same slices of history multiple times in some
     cases).

  2. The answer we get from a bitmap versus a regular traversal are not
     apples-to-apples equivalent. The regular traversal walks down to
     the UNINTERESTING commits, marks the boundaries trees and blobs as
     UNINTERESTING, and then adds in all the interesting trees and blobs
     minus the UNINTERESTING parts. So it can sometimes give the wrong
     answer, claiming something is interesting when it is not.

     Whereas with bitmaps we fill in the trees and blobs as we walk, and
     you get the true answer. But it means we may open up a lot more
     trees than the equivalent traversal would.

     So one thing I've never really experimented with (and indeed, never
     really thought about until writing this email) is that the bitmaps
     could try to do that looser style of traversal, knowing that we
     might err on the side of calling things interesting in a few cases.
     But hopefully spending a lot less time opening trees.

     I'm not even 100% sure what that would look like in code, but just
     thinking about it from a high level, I don't there's a particular
     mathematical reason it couldn't work.

-Peff

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/3] prune: use bitmaps for reachability traversal
  2019-02-14  4:37 ` [PATCH 2/3] prune: use bitmaps for " Jeff King
  2019-03-09  2:49   ` bitmaps by default? [was: prune: use bitmaps for reachability traversal] Eric Wong
@ 2019-04-15 15:00   ` Derrick Stolee
  2019-04-18 19:49     ` Jeff King
  1 sibling, 1 reply; 57+ messages in thread
From: Derrick Stolee @ 2019-04-15 15:00 UTC (permalink / raw)
  To: Jeff King, git

On 2/13/2019 11:37 PM, Jeff King wrote:
> +static void *lookup_object_by_type(struct repository *r,
> +				   const struct object_id *oid,
> +				   enum object_type type)
> +{
> +	switch (type) {
> +	case OBJ_COMMIT:
> +		return lookup_commit(r, oid);
> +	case OBJ_TREE:
> +		return lookup_tree(r, oid);
> +	case OBJ_TAG:
> +		return lookup_tag(r, oid);
> +	case OBJ_BLOB:
> +		return lookup_blob(r, oid);
> +	default:
> +		die("BUG: unknown object type %d", type);
> +	}
> +}
> +
> +static int mark_object_seen(const struct object_id *oid,
> +			     enum object_type type,
> +			     int exclude,
> +			     uint32_t name_hash,
> +			     struct packed_git *found_pack,
> +			     off_t found_offset)
> +{
> +	struct object *obj = lookup_object_by_type(the_repository, oid, type);
> +	if (!obj)
> +		die("unable to create object '%s'", oid_to_hex(oid));
> +
> +	obj->flags |= SEEN;
> +	return 0;
> +}
> +
>  void mark_reachable_objects(struct rev_info *revs, int mark_reflog,
>  			    timestamp_t mark_recent, struct progress *progress)
>  {
>  	struct connectivity_progress cp;
> +	struct bitmap_index *bitmap_git;
>  
>  	/*
>  	 * Set up revision parsing, and mark us as being interested
> @@ -188,6 +223,13 @@ void mark_reachable_objects(struct rev_info *revs, int mark_reflog,
>  	cp.progress = progress;
>  	cp.count = 0;
>  
> +	bitmap_git = prepare_bitmap_walk(revs);
> +	if (bitmap_git) {
> +		traverse_bitmap_commit_list(bitmap_git, mark_object_seen);
> +		free_bitmap_index(bitmap_git);
> +		return;
> +	}
> +

Peff,

This block after "if (bitmap_git)" is not exercised by the (non-performance)
test suite, so the rest of the code above is not tested, either. Could a test
of this "prune" capability be added to the regression tests around the bitmaps?

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/3] prune: use bitmaps for reachability traversal
  2019-04-15 15:00   ` [PATCH 2/3] prune: use bitmaps for reachability traversal Derrick Stolee
@ 2019-04-18 19:49     ` Jeff King
  2019-04-18 20:08       ` [PATCH] t5304: add a test for pruning with bitmaps Jeff King
  0 siblings, 1 reply; 57+ messages in thread
From: Jeff King @ 2019-04-18 19:49 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git

On Mon, Apr 15, 2019 at 11:00:45AM -0400, Derrick Stolee wrote:

> >  void mark_reachable_objects(struct rev_info *revs, int mark_reflog,
> >  			    timestamp_t mark_recent, struct progress *progress)
> [...]
> >  	cp.progress = progress;
> >  	cp.count = 0;
> >  
> > +	bitmap_git = prepare_bitmap_walk(revs);
> > +	if (bitmap_git) {
> > +		traverse_bitmap_commit_list(bitmap_git, mark_object_seen);
> > +		free_bitmap_index(bitmap_git);
> > +		return;
> > +	}
> > +
> 
> This block after "if (bitmap_git)" is not exercised by the (non-performance)
> test suite, so the rest of the code above is not tested, either. Could a test
> of this "prune" capability be added to the regression tests around the bitmaps?

I have somewhat mixed feelings here.  We can add a test with a trivial
bitmap to get code coverage here. But as with many optimizations
(bitmaps, but also your commit graph work), we get a much more thorough
correctness test by re-running the whole test suite (though we do not
yet have one for running with bitmaps on), and a better test that the
optimization is kicking in and working via the t/perf tests.

I dunno. I guess it does not hurt to at least to at least make sure this
code is running in the normal suite. I don't think that will find the
more interesting regressions, but at least saves us the from the most
egregious ones ("oops, the whole thing segfaults because somebody
changed the API" kinds of things).

-Peff

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH] t5304: add a test for pruning with bitmaps
  2019-04-18 19:49     ` Jeff King
@ 2019-04-18 20:08       ` Jeff King
  2019-04-20  1:01         ` Derrick Stolee
  0 siblings, 1 reply; 57+ messages in thread
From: Jeff King @ 2019-04-18 20:08 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, Junio C Hamano

On Thu, Apr 18, 2019 at 03:49:53PM -0400, Jeff King wrote:

> > This block after "if (bitmap_git)" is not exercised by the (non-performance)
> > test suite, so the rest of the code above is not tested, either. Could a test
> > of this "prune" capability be added to the regression tests around the bitmaps?
> 
> I have somewhat mixed feelings here.  We can add a test with a trivial
> bitmap to get code coverage here. But as with many optimizations
> (bitmaps, but also your commit graph work), we get a much more thorough
> correctness test by re-running the whole test suite (though we do not
> yet have one for running with bitmaps on), and a better test that the
> optimization is kicking in and working via the t/perf tests.
> 
> I dunno. I guess it does not hurt to at least to at least make sure this
> code is running in the normal suite. I don't think that will find the
> more interesting regressions, but at least saves us the from the most
> egregious ones ("oops, the whole thing segfaults because somebody
> changed the API" kinds of things).

So here's a test. This goes on top of jk/prune-optim (which is also
already in master).

-- >8 --
Subject: [PATCH] t5304: add a test for pruning with bitmaps

Commit fde67d6896 (prune: use bitmaps for reachability traversal,
2019-02-13) uses bitmaps for pruning when they're available, but only
covers this functionality in the t/perf tests. This makes a kind of
sense, since the point is that the behaviour is indistinguishable before
and after the patch, just faster.

But since the bitmap code path is not exercised at all in the regular
test suite, it leaves us open to a regression where the behavior does in
fact change. The most thorough way to test that would be running the
whole suite with bitmaps enabled. But we don't yet have a way to do
that, and anyway it's expensive to do so. Let's at least add a basic
test that exercises this path and make sure we prune an object we should
(and not one that we shouldn't).

That would hopefully catch the most obvious breakages early.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t5304-prune.sh | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/t/t5304-prune.sh b/t/t5304-prune.sh
index 1eeb828a15..df60f18fb8 100755
--- a/t/t5304-prune.sh
+++ b/t/t5304-prune.sh
@@ -341,4 +341,12 @@ test_expect_success 'prune: handle expire option correctly' '
 	git prune --no-expire
 '
 
+test_expect_success 'trivial prune with bitmaps enabled' '
+	git repack -adb &&
+	blob=$(echo bitmap-unreachable-blob | git hash-object -w --stdin) &&
+	git prune --expire=now &&
+	git cat-file -e HEAD &&
+	test_must_fail git cat-file -e $blob
+'
+
 test_done
-- 
2.21.0.1090.g9d17dac192


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH] t5304: add a test for pruning with bitmaps
  2019-04-18 20:08       ` [PATCH] t5304: add a test for pruning with bitmaps Jeff King
@ 2019-04-20  1:01         ` Derrick Stolee
  2019-04-20  3:24           ` Jeff King
  0 siblings, 1 reply; 57+ messages in thread
From: Derrick Stolee @ 2019-04-20  1:01 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano

On 4/18/2019 4:08 PM, Jeff King wrote:
> On Thu, Apr 18, 2019 at 03:49:53PM -0400, Jeff King wrote:
>> I dunno. I guess it does not hurt to at least to at least make sure this
>> code is running in the normal suite. I don't think that will find the
>> more interesting regressions, but at least saves us the from the most
>> egregious ones ("oops, the whole thing segfaults because somebody
>> changed the API" kinds of things).

That's the level of coverage I was hoping to see. I won't be picky if
the OBJ_TAG case isn't hit or something.

> So here's a test. This goes on top of jk/prune-optim (which is also
> already in master).

[snip]

> +test_expect_success 'trivial prune with bitmaps enabled' '
> +	git repack -adb &&
> +	blob=$(echo bitmap-unreachable-blob | git hash-object -w --stdin) &&
> +	git prune --expire=now &&
> +	git cat-file -e HEAD &&
> +	test_must_fail git cat-file -e $blob
> +'
> +
>  test_done

This test does cover the 'mark_object_seen()' method, which I tested by
removing the "!" in the "if (!obj) die();" condition.

However, my first inclination was to remove the line

	obj->flags |= SEEN;

from the method, and I found that it still worked! This was confusing,
so I looked for places where the SEEN flag was added during bitmap walks,
and it turns out that the objects are marked as SEEN by prepare_bitmap_walk().

This means that we can remove a lot of the implementation from reachable.c
and have the same effect (and drop an iteration through the objects). See the
diff below.

Thoughts?
-Stolee

-->8--

diff --git a/reachable.c b/reachable.c
index 0d00a91de4..7d2762d47f 100644
--- a/reachable.c
+++ b/reachable.c
@@ -159,39 +159,6 @@ int add_unseen_recent_objects_to_traversal(struct rev_info *revs,
                                      FOR_EACH_OBJECT_LOCAL_ONLY);
 }

-static void *lookup_object_by_type(struct repository *r,
-                                  const struct object_id *oid,
-                                  enum object_type type)
-{
-       switch (type) {
-       case OBJ_COMMIT:
-               return lookup_commit(r, oid);
-       case OBJ_TREE:
-               return lookup_tree(r, oid);
-       case OBJ_TAG:
-               return lookup_tag(r, oid);
-       case OBJ_BLOB:
-               return lookup_blob(r, oid);
-       default:
-               die("BUG: unknown object type %d", type);
-       }
-}
-
-static int mark_object_seen(const struct object_id *oid,
-                            enum object_type type,
-                            int exclude,
-                            uint32_t name_hash,
-                            struct packed_git *found_pack,
-                            off_t found_offset)
-{
-       struct object *obj = lookup_object_by_type(the_repository, oid, type);
-       if (!obj)
-               die("unable to create object '%s'", oid_to_hex(oid));
-
-       obj->flags |= SEEN;
-       return 0;
-}
-
 void mark_reachable_objects(struct rev_info *revs, int mark_reflog,
                            timestamp_t mark_recent, struct progress *progress)
 {
@@ -225,7 +192,10 @@ void mark_reachable_objects(struct rev_info *revs, int mark_reflog,

        bitmap_git = prepare_bitmap_walk(revs);
        if (bitmap_git) {
-               traverse_bitmap_commit_list(bitmap_git, mark_object_seen);
+               /*
+                * reachable objects are marked as SEEN
+                * by prepare_bitmap_walk(revs).
+                */
                free_bitmap_index(bitmap_git);
                return;
        }

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH] t5304: add a test for pruning with bitmaps
  2019-04-20  1:01         ` Derrick Stolee
@ 2019-04-20  3:24           ` Jeff King
  2019-04-20 21:01             ` Derrick Stolee
  0 siblings, 1 reply; 57+ messages in thread
From: Jeff King @ 2019-04-20  3:24 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, Junio C Hamano

On Fri, Apr 19, 2019 at 09:01:50PM -0400, Derrick Stolee wrote:

> > +test_expect_success 'trivial prune with bitmaps enabled' '
> > +	git repack -adb &&
> > +	blob=$(echo bitmap-unreachable-blob | git hash-object -w --stdin) &&
> > +	git prune --expire=now &&
> > +	git cat-file -e HEAD &&
> > +	test_must_fail git cat-file -e $blob
> > +'
> > +
> >  test_done
> 
> This test does cover the 'mark_object_seen()' method, which I tested by
> removing the "!" in the "if (!obj) die();" condition.
> 
> However, my first inclination was to remove the line
> 
> 	obj->flags |= SEEN;
> 
> from the method, and I found that it still worked! This was confusing,
> so I looked for places where the SEEN flag was added during bitmap walks,
> and it turns out that the objects are marked as SEEN by prepare_bitmap_walk().

I think this is only _some_ objects. The full bitmap is created by
reading the on-disk bitmaps, and then filling in any necessary gaps by
doing a traditional traversal. We also set it on tip objects to de-dup
the initial revs->pending list.

Try running t5304 with this:

diff --git a/reachable.c b/reachable.c
index eba6f64e01..7ec73ef43f 100644
--- a/reachable.c
+++ b/reachable.c
@@ -191,6 +191,8 @@ static int mark_object_seen(const struct object_id *oid,
 	if (!obj)
 		die("unable to create object '%s'", oid_to_hex(oid));
 
+	if (!(obj->flags & SEEN))
+		die("seen flag not already there");
 	obj->flags |= SEEN;
 	return 0;
 }

which shows that we are indeed freshly setting SEEN on some objects.

Interestingly, I don't _think_ you can cause an object to get pruned
accidentally here, because:

  1. Any object that will miss its SEEN has to be in the bitmapped pack,
     and not found through normal traversal.

  2. git-prune only removes loose objects, not packed ones.

  3. Loose objects cannot be in the bitmapped pack. Therefore, no object
     can hit both cases (1) and (2).

Even if that were the end of the story, I don't think I'd want to rely
on it here. The post-condition of the function should be that SEEN is
set on all reachable objects, whether bitmaps are used or not. And we do
indeed use this elsewhere:

We'll later call prune_shallow(), which uses SEEN to discard unreachable
entries. I'm not sure it even makes sense to have a shallow repo with
bitmaps, though.  Obviously we're lacking the full graph, but I wouldn't
be surprised if the shallow code quietly pretends that all is well and
we generate bogus bitmaps or something. Or maybe it even mostly works as
long as you don't walk over the shallow cut-points.

mark_reachable_objects() is also used for reflog expiration with
--stale-fix. I tried generating a test that would fail with your patch,
but I think it's not quite possible, because --stale-fix will do a
follow-up walk of the graph to see which objects mentioned in the reflog
we have and do not have. So it doesn't actually break things, it just
makes them slower (because we erroneously fail to mark SEEN things that
it's then forced to walk).

So I don't _think_ your patch actually breaks anything user-visible, but
it's a bit like leaving a live grenade in your living room for somebody
to find.

And I think we are indeed covering lookup_object_by_type(), etc in the
t5304 test I added. So AFAICT all of the new code is covered after that?

-Peff

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH] t5304: add a test for pruning with bitmaps
  2019-04-20  3:24           ` Jeff King
@ 2019-04-20 21:01             ` Derrick Stolee
  0 siblings, 0 replies; 57+ messages in thread
From: Derrick Stolee @ 2019-04-20 21:01 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano

On 4/19/2019 11:24 PM, Jeff King wrote:
> Try running t5304 with this:
> 
> diff --git a/reachable.c b/reachable.c
> index eba6f64e01..7ec73ef43f 100644
> --- a/reachable.c
> +++ b/reachable.c
> @@ -191,6 +191,8 @@ static int mark_object_seen(const struct object_id *oid,
>  	if (!obj)
>  		die("unable to create object '%s'", oid_to_hex(oid));
>  
> +	if (!(obj->flags & SEEN))
> +		die("seen flag not already there");
>  	obj->flags |= SEEN;
>  	return 0;
>  }
> 
> which shows that we are indeed freshly setting SEEN on some objects.

Good point! Thanks for clearing that up for me.
 
> Interestingly, I don't _think_ you can cause an object to get pruned
> accidentally here, because:

[snip]

I appreciate the additional context that you gave (and I snipped). This
makes me comfortable accepting this test as an exercise of the code (to
guard against future changes that create failures like null-refs) and we
will need to rely on the performance suite to catch issues like removing
the SEEN markers that I had tested.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3] repack: enable bitmaps by default on bare repos
  2019-04-10 22:57                   ` Jeff King
@ 2019-04-25  7:16                     ` Junio C Hamano
  2019-05-04  1:37                       ` Jeff King
  2019-05-23 11:30                     ` Jeff King
  1 sibling, 1 reply; 57+ messages in thread
From: Junio C Hamano @ 2019-04-25  7:16 UTC (permalink / raw)
  To: Jeff King; +Cc: Ævar Arnfjörð Bjarmason, Eric Wong, git

Jeff King <peff@peff.net> writes:

> On Tue, Apr 09, 2019 at 05:10:43PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> I've found a case where turning bitmaps on does horrible things for
>> bitmap "push" performance.
>> [...]
>> I can't share the repo, but I had a report where just a "git push" of a
>> topic branch that was 2/58 ahead/behind took ~2 minutes just in
>> "Enumerating objects", but ~500ms without bitmaps.
>
> That's pretty bad, though I'm not terribly surprised.

I was revisiting the recent "What's cooking" report, and I am not
sure what the current status of the topic is.

I do not get a feel that the current bitmap implementation has been
widely used in repositories that have vastly different access
patterns---it probably has been tried only by those who can afford
the engineering cost to see if the implementation happens to work
well for their workload and some may have chosen to adopt it while
others didn't.  So it may be very well tuned for the former people
but once we merge this topic down, we'll hear from others with quite
different workload, which may lead to us tuning the code to bit
better to their workload while not hurting other existing users,
hopefully.

Or not.

I am somewhat tempted to make things more exciting by merging it to
'next' soonish, but I guess Ævar and you are not quite ready for
that excitement yet, judging from the following (which looks quite
sensible suggestions)?

Thanks.

> Yeah, that makes sense. By repacking you've taken all those new commits
> and included them in on-disk bitmaps. So I'd expect the "wants" to get
> much shorter, but the "haves" phase staying long means we could do a
> better job of picking commits to have on-disk bitmaps.
>
> So two avenues for exploration I think:
>
>   1. I've long suspected that the bitmap selection code isn't ideal.
>      Both in terms of what it picks, but also in its runtime (I think it
>      ends up walking the same slices of history multiple times in some
>      cases).
>
>   2. The answer we get from a bitmap versus a regular traversal are not
>      apples-to-apples equivalent. The regular traversal walks down to
>      the UNINTERESTING commits, marks the boundaries trees and blobs as
>      UNINTERESTING, and then adds in all the interesting trees and blobs
>      minus the UNINTERESTING parts. So it can sometimes give the wrong
>      answer, claiming something is interesting when it is not.
>
>      Whereas with bitmaps we fill in the trees and blobs as we walk, and
>      you get the true answer. But it means we may open up a lot more
>      trees than the equivalent traversal would.
>
>      So one thing I've never really experimented with (and indeed, never
>      really thought about until writing this email) is that the bitmaps
>      could try to do that looser style of traversal, knowing that we
>      might err on the side of calling things interesting in a few cases.
>      But hopefully spending a lot less time opening trees.
>
>      I'm not even 100% sure what that would look like in code, but just
>      thinking about it from a high level, I don't there's a particular
>      mathematical reason it couldn't work.
>
> -Peff

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3] repack: enable bitmaps by default on bare repos
  2019-04-25  7:16                     ` Junio C Hamano
@ 2019-05-04  1:37                       ` Jeff King
  2019-05-04  6:52                         ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 57+ messages in thread
From: Jeff King @ 2019-05-04  1:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ævar Arnfjörð Bjarmason, Eric Wong, git

On Thu, Apr 25, 2019 at 04:16:46PM +0900, Junio C Hamano wrote:

> I was revisiting the recent "What's cooking" report, and I am not
> sure what the current status of the topic is.
> 
> I do not get a feel that the current bitmap implementation has been
> widely used in repositories that have vastly different access
> patterns---it probably has been tried only by those who can afford
> the engineering cost to see if the implementation happens to work
> well for their workload and some may have chosen to adopt it while
> others didn't.  So it may be very well tuned for the former people
> but once we merge this topic down, we'll hear from others with quite
> different workload, which may lead to us tuning the code to bit
> better to their workload while not hurting other existing users,
> hopefully.
> 
> Or not.

Note that Ævar's case was somebody running bitmaps locally and trying to
push, which I think is generally not a good match for bitmaps (even when
they work, they cost more to generate than what you save if you're only
pushing once).

The goal of Eric's patch was that by kicking in for bare repos, we'd
mostly be hitting servers that are serving up fetches. So if by
"workload" you mean that we some people might use bare repos for other
cases, yeah, there's a potential for confusion or regression there.

If you mean that bitmaps might not work for some workloads even when
we're serving a lot of fetches, I won't say that's _not_ true, but my
experience is that they are generally a net win. Both for the smaller
repositories we see on github.com, but also for big, busy ones that our
on-premises customers use.

  Actually, there is one curiosity with Eric's patch that I haven't
  tested. As I've mentioned before, we store "forks" as single
  repositories pointing to a single shared alternates repository. Since
  the bitmap code only handles one .bitmap per invocation, you really
  want just one big one in the shared repo. If "git repack" in the forks
  started generating one, that would be surprising and annoying.

  In practice this is a pretty extreme corner case. And a lot would
  depend on how you're using "repack" in the fork (e.g., a partial
  repack would know that it can't generate bitmaps anyway). I'm pretty
  sure it would not even impact our setup at all, but I can probably
  come up with a devils advocate one where it would.

> I am somewhat tempted to make things more exciting by merging it to
> 'next' soonish, but I guess Ævar and you are not quite ready for
> that excitement yet, judging from the following (which looks quite
> sensible suggestions)?

It's OK with me for this to go to 'next'. Note that the other two
patches from me could actually graduate separately. One is a
straight-out test fix, and the other should always be a win (and does
nothing if you're not already generating bitmaps).

By the way, there were some timing puzzles mentioned in that second
commit. I re-ran them today and everything was what I'd expect. So I
wonder if I just screwed up the timings before. I can re-write that
commit message if it hasn't made it to 'next' yet.

-Peff

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3] repack: enable bitmaps by default on bare repos
  2019-05-04  1:37                       ` Jeff King
@ 2019-05-04  6:52                         ` Ævar Arnfjörð Bjarmason
  2019-05-04 13:23                           ` SZEDER Gábor
  2019-05-07  7:45                           ` Jeff King
  0 siblings, 2 replies; 57+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-05-04  6:52 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Eric Wong, git, SZEDER Gábor


On Sat, May 04 2019, Jeff King wrote:

> On Thu, Apr 25, 2019 at 04:16:46PM +0900, Junio C Hamano wrote:
>
>> I was revisiting the recent "What's cooking" report, and I am not
>> sure what the current status of the topic is.
>>
>> I do not get a feel that the current bitmap implementation has been
>> widely used in repositories that have vastly different access
>> patterns---it probably has been tried only by those who can afford
>> the engineering cost to see if the implementation happens to work
>> well for their workload and some may have chosen to adopt it while
>> others didn't.  So it may be very well tuned for the former people
>> but once we merge this topic down, we'll hear from others with quite
>> different workload, which may lead to us tuning the code to bit
>> better to their workload while not hurting other existing users,
>> hopefully.
>>
>> Or not.
>
> Note that Ævar's case was somebody running bitmaps locally and trying to
> push, which I think is generally not a good match for bitmaps (even when
> they work, they cost more to generate than what you save if you're only
> pushing once).

Right. It was *not* caused by this "enable bitmaps by default on bare
repos" patch (which I wasn't even running with), but *is* indicative of
a pretty big edge case with enabling bitmaps that *will* happen for some
on such bare repos if we ship the patch.

> The goal of Eric's patch was that by kicking in for bare repos, we'd
> mostly be hitting servers that are serving up fetches. So if by
> "workload" you mean that we some people might use bare repos for other
> cases, yeah, there's a potential for confusion or regression there.

As noted I suspect that's a really rare use-case in practice, and in
reply to Junio's <xmqq1s1qy2ox.fsf@gitster-ct.c.googlers.com> upthread I
think it's fine to just try this. Maybe we'll finally get such use-cases
out of the woodworks by turning it on by default.

As an aside this is the Nth time I notice how crappy that "Enumerating
objects" progress bar is. We do a *lot* of things there, including this
bitmap calculation.

But just splitting it up might result in either no progress (all
individually below 2 seconds), or a lot of noise as you have 20 things
that each take 2 seconds. I wonder if someone's looked at supporting:

    Enumerating Objects (X%) => Calculating bitmaps (Y%)

Where X% is the total progres, and %Y is the sub-progress. I eyeballed
doing this once by "chaining" the progress structs, but there's probably
a less crappy way...


> If you mean that bitmaps might not work for some workloads even when
> we're serving a lot of fetches, I won't say that's _not_ true, but my
> experience is that they are generally a net win. Both for the smaller
> repositories we see on github.com, but also for big, busy ones that our
> on-premises customers use.
>
>   Actually, there is one curiosity with Eric's patch that I haven't
>   tested. As I've mentioned before, we store "forks" as single
>   repositories pointing to a single shared alternates repository. Since
>   the bitmap code only handles one .bitmap per invocation, you really
>   want just one big one in the shared repo. If "git repack" in the forks
>   started generating one, that would be surprising and annoying.
>
>   In practice this is a pretty extreme corner case. And a lot would
>   depend on how you're using "repack" in the fork (e.g., a partial
>   repack would know that it can't generate bitmaps anyway). I'm pretty
>   sure it would not even impact our setup at all, but I can probably
>   come up with a devils advocate one where it would.
>
>> I am somewhat tempted to make things more exciting by merging it to
>> 'next' soonish, but I guess Ævar and you are not quite ready for
>> that excitement yet, judging from the following (which looks quite
>> sensible suggestions)?
>
> It's OK with me for this to go to 'next'. Note that the other two
> patches from me could actually graduate separately. One is a
> straight-out test fix, and the other should always be a win (and does
> nothing if you're not already generating bitmaps).
>
> By the way, there were some timing puzzles mentioned in that second
> commit. I re-ran them today and everything was what I'd expect. So I
> wonder if I just screwed up the timings before. I can re-write that
> commit message if it hasn't made it to 'next' yet.
>
> -Peff

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3] repack: enable bitmaps by default on bare repos
  2019-05-04  6:52                         ` Ævar Arnfjörð Bjarmason
@ 2019-05-04 13:23                           ` SZEDER Gábor
  2019-05-08 20:17                             ` Ævar Arnfjörð Bjarmason
  2019-05-07  7:45                           ` Jeff King
  1 sibling, 1 reply; 57+ messages in thread
From: SZEDER Gábor @ 2019-05-04 13:23 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Jeff King, Junio C Hamano, Eric Wong, git

On Sat, May 04, 2019 at 08:52:01AM +0200, Ævar Arnfjörð Bjarmason wrote:
> As an aside this is the Nth time I notice how crappy that "Enumerating
> objects" progress bar is.

And don't forget that the commit-graph progress bar still prints
nonsense :)

  https://public-inbox.org/git/87ef6ydds8.fsf@evledraar.gmail.com/


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3] repack: enable bitmaps by default on bare repos
  2019-05-04  6:52                         ` Ævar Arnfjörð Bjarmason
  2019-05-04 13:23                           ` SZEDER Gábor
@ 2019-05-07  7:45                           ` Jeff King
  2019-05-07  8:12                             ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 57+ messages in thread
From: Jeff King @ 2019-05-07  7:45 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, Eric Wong, git, SZEDER Gábor

On Sat, May 04, 2019 at 08:52:01AM +0200, Ævar Arnfjörð Bjarmason wrote:

> > Note that Ævar's case was somebody running bitmaps locally and trying to
> > push, which I think is generally not a good match for bitmaps (even when
> > they work, they cost more to generate than what you save if you're only
> > pushing once).
> 
> Right. It was *not* caused by this "enable bitmaps by default on bare
> repos" patch (which I wasn't even running with), but *is* indicative of
> a pretty big edge case with enabling bitmaps that *will* happen for some
> on such bare repos if we ship the patch.

Yeah. To clarify my comments a bit: I do think it would be possible to
hit a weird case like this while serving fetches (i.e., as far as I know
there is nothing in what you saw that is inherent to pushes). But I do
think for serving fetches, bitmaps are overall a big net win (based on
my experiences).

So I think it may come down to a tradeoff: enabling this by default
would probably be a net win across the population, but that's little
comfort to the unlucky somebody who may see it as a regression. I'm not
sure which is more important to maintain.

> As an aside this is the Nth time I notice how crappy that "Enumerating
> objects" progress bar is. We do a *lot* of things there, including this
> bitmap calculation.
> 
> But just splitting it up might result in either no progress (all
> individually below 2 seconds), or a lot of noise as you have 20 things
> that each take 2 seconds. I wonder if someone's looked at supporting:
> 
>     Enumerating Objects (X%) => Calculating bitmaps (Y%)
> 
> Where X% is the total progres, and %Y is the sub-progress. I eyeballed
> doing this once by "chaining" the progress structs, but there's probably
> a less crappy way...

I don't think it needs to be split; I think we just need to update the
object count while we're traversing the bitmaps. The problem is that the
progress object is known to pack-objects.c. Without bitmaps, as the
revision machinery walks the graph, our callbacks bump the progress
meter every time we see an object.

With bitmaps, all that walking happens behind the scenes, and the bitmap
code delivers us the final answer. So we pause for a long time, and then
suddenly it shoots forward.

I think we'd want a way to tell the bitmap code to update our progress
meter as it traverses (both single objects, but also taking into account
when it finds a bitmap and then suddenly bumps the value by a large
amount).

-Peff

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3] repack: enable bitmaps by default on bare repos
  2019-05-07  7:45                           ` Jeff King
@ 2019-05-07  8:12                             ` Ævar Arnfjörð Bjarmason
  2019-05-08  7:11                               ` Jeff King
  0 siblings, 1 reply; 57+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-05-07  8:12 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Eric Wong, git, SZEDER Gábor, Derrick Stolee


On Tue, May 07 2019, Jeff King wrote:

> On Sat, May 04, 2019 at 08:52:01AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> > Note that Ævar's case was somebody running bitmaps locally and trying to
>> > push, which I think is generally not a good match for bitmaps (even when
>> > they work, they cost more to generate than what you save if you're only
>> > pushing once).
>>
>> Right. It was *not* caused by this "enable bitmaps by default on bare
>> repos" patch (which I wasn't even running with), but *is* indicative of
>> a pretty big edge case with enabling bitmaps that *will* happen for some
>> on such bare repos if we ship the patch.
>
> Yeah. To clarify my comments a bit: I do think it would be possible to
> hit a weird case like this while serving fetches (i.e., as far as I know
> there is nothing in what you saw that is inherent to pushes). But I do
> think for serving fetches, bitmaps are overall a big net win (based on
> my experiences).
>
> So I think it may come down to a tradeoff: enabling this by default
> would probably be a net win across the population, but that's little
> comfort to the unlucky somebody who may see it as a regression. I'm not
> sure which is more important to maintain.
>
>> As an aside this is the Nth time I notice how crappy that "Enumerating
>> objects" progress bar is. We do a *lot* of things there, including this
>> bitmap calculation.
>>
>> But just splitting it up might result in either no progress (all
>> individually below 2 seconds), or a lot of noise as you have 20 things
>> that each take 2 seconds. I wonder if someone's looked at supporting:
>>
>>     Enumerating Objects (X%) => Calculating bitmaps (Y%)
>>
>> Where X% is the total progres, and %Y is the sub-progress. I eyeballed
>> doing this once by "chaining" the progress structs, but there's probably
>> a less crappy way...
>
> I don't think it needs to be split; I think we just need to update the
> object count while we're traversing the bitmaps. The problem is that the
> progress object is known to pack-objects.c. Without bitmaps, as the
> revision machinery walks the graph, our callbacks bump the progress
> meter every time we see an object.
>
> With bitmaps, all that walking happens behind the scenes, and the bitmap
> code delivers us the final answer. So we pause for a long time, and then
> suddenly it shoots forward.
>
> I think we'd want a way to tell the bitmap code to update our progress
> meter as it traverses (both single objects, but also taking into account
> when it finds a bitmap and then suddenly bumps the value by a large
> amount).

Not splitting it will fix the progress bar stalling, so it fixes the
problem that the user is wondering if the command is entirely hanging.

But I was hoping to give the user an idea of roughly where we're
spending our time, e.g. so you can see how much the pack.useSparse
setting is helping (or not).

So something where we report sub-progress as we go along, and perhaps
print some brief summary at the end if it took long enough, e.g.:

    Enumerating Objects (X^1%) => Marking trees (Y^1%)
    Enumerating Objects (X^2%) => Calculating bitmaps (Y^2%)

And at the end:

    Enumerating Objects (100%) in ~2m30s -- (~10s marking trees, ~2m10s bitmaps, ~10s other)

I.e. bringing the whole "nested" trace2 regions full circle with the
progress bar where we could elect to trace/show some of that info, and
then you could turn on some trace2 mode/verbose progress to see more.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3] repack: enable bitmaps by default on bare repos
  2019-05-07  8:12                             ` Ævar Arnfjörð Bjarmason
@ 2019-05-08  7:11                               ` Jeff King
  2019-05-08 14:20                                 ` Derrick Stolee
  2019-05-08 16:13                                 ` Ævar Arnfjörð Bjarmason
  0 siblings, 2 replies; 57+ messages in thread
From: Jeff King @ 2019-05-08  7:11 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, Eric Wong, git, SZEDER Gábor, Derrick Stolee

On Tue, May 07, 2019 at 10:12:06AM +0200, Ævar Arnfjörð Bjarmason wrote:

> > I think we'd want a way to tell the bitmap code to update our progress
> > meter as it traverses (both single objects, but also taking into account
> > when it finds a bitmap and then suddenly bumps the value by a large
> > amount).
> 
> Not splitting it will fix the progress bar stalling, so it fixes the
> problem that the user is wondering if the command is entirely hanging.
> 
> But I was hoping to give the user an idea of roughly where we're
> spending our time, e.g. so you can see how much the pack.useSparse
> setting is helping (or not).

Yeah, I think that's a bigger and more complicated problem. I admit that
my main annoyance is just the stall while we fill in the bitmaps (and
it's easy because the bitmap traversal is the same unit of work as a
regular traversal).

> So something where we report sub-progress as we go along, and perhaps
> print some brief summary at the end if it took long enough, e.g.:
> 
>     Enumerating Objects (X^1%) => Marking trees (Y^1%)
>     Enumerating Objects (X^2%) => Calculating bitmaps (Y^2%)
> 
> And at the end:
> 
>     Enumerating Objects (100%) in ~2m30s -- (~10s marking trees, ~2m10s bitmaps, ~10s other)
> 
> I.e. bringing the whole "nested" trace2 regions full circle with the
> progress bar where we could elect to trace/show some of that info, and
> then you could turn on some trace2 mode/verbose progress to see more.

I do wonder if this really needs to be part of the progress bar. The
goal of the progress bar is to give the user a sense that work is
happening, and (if possible, but not for "enumerating") an idea of when
it might finish. If the trace code can already do detailed timings, then
shouldn't we just be encouraging people to use that?

-Peff

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3] repack: enable bitmaps by default on bare repos
  2019-05-08  7:11                               ` Jeff King
@ 2019-05-08 14:20                                 ` Derrick Stolee
  2019-05-08 16:13                                 ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 57+ messages in thread
From: Derrick Stolee @ 2019-05-08 14:20 UTC (permalink / raw)
  To: Jeff King, Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, Eric Wong, git, SZEDER Gábor

On 5/8/2019 3:11 AM, Jeff King wrote:
> On Tue, May 07, 2019 at 10:12:06AM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
>>> I think we'd want a way to tell the bitmap code to update our progress
>>> meter as it traverses (both single objects, but also taking into account
>>> when it finds a bitmap and then suddenly bumps the value by a large
>>> amount).
>>
>> Not splitting it will fix the progress bar stalling, so it fixes the
>> problem that the user is wondering if the command is entirely hanging.
>>
>> But I was hoping to give the user an idea of roughly where we're
>> spending our time, e.g. so you can see how much the pack.useSparse
>> setting is helping (or not).
> 
> Yeah, I think that's a bigger and more complicated problem. I admit that
> my main annoyance is just the stall while we fill in the bitmaps (and
> it's easy because the bitmap traversal is the same unit of work as a
> regular traversal).

The pack.useSparse setting also speeds up a section that is not marked
by progress: that portion usually is walking all UNINTERESTING trees and
the"Enumerating Objects" progress is just for walking the INTERESTING objects.
 
>> So something where we report sub-progress as we go along, and perhaps
>> print some brief summary at the end if it took long enough, e.g.:
>>
>>     Enumerating Objects (X^1%) => Marking trees (Y^1%)
>>     Enumerating Objects (X^2%) => Calculating bitmaps (Y^2%)

I like this idea for splitting the "normal" mechanism, too:

     Enumerating Objects (X^1%) => Marking trees (Y^1%)
     Enumerating Objects (X^2%) => Enumerating objects to pack (Y^2%)

>> I.e. bringing the whole "nested" trace2 regions full circle with the
>> progress bar where we could elect to trace/show some of that info, and
>> then you could turn on some trace2 mode/verbose progress to see more.
> 
> I do wonder if this really needs to be part of the progress bar. The
> goal of the progress bar is to give the user a sense that work is
> happening, and (if possible, but not for "enumerating") an idea of when
> it might finish. If the trace code can already do detailed timings, then
> shouldn't we just be encouraging people to use that?

The problem I've seen (without bitmaps) is that running `git push` can
take a while before _any_ progress is listed.

Good news is: `pack.useSparse` fixed our push problem in the Windows OS
repo. The end-to-end time for `git push` sped up by 7.7x with the change,
and this "blank" time is too fast for users to notice.

Updating the progress could help in cases without pack.useSparse.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3] repack: enable bitmaps by default on bare repos
  2019-05-08  7:11                               ` Jeff King
  2019-05-08 14:20                                 ` Derrick Stolee
@ 2019-05-08 16:13                                 ` Ævar Arnfjörð Bjarmason
  2019-05-08 22:25                                   ` Jeff King
  1 sibling, 1 reply; 57+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-05-08 16:13 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Eric Wong, git, SZEDER Gábor, Derrick Stolee


On Wed, May 08 2019, Jeff King wrote:

> On Tue, May 07, 2019 at 10:12:06AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> > I think we'd want a way to tell the bitmap code to update our progress
>> > meter as it traverses (both single objects, but also taking into account
>> > when it finds a bitmap and then suddenly bumps the value by a large
>> > amount).
>>
>> Not splitting it will fix the progress bar stalling, so it fixes the
>> problem that the user is wondering if the command is entirely hanging.
>>
>> But I was hoping to give the user an idea of roughly where we're
>> spending our time, e.g. so you can see how much the pack.useSparse
>> setting is helping (or not).
>
> Yeah, I think that's a bigger and more complicated problem. I admit that
> my main annoyance is just the stall while we fill in the bitmaps (and
> it's easy because the bitmap traversal is the same unit of work as a
> regular traversal).
>
>> So something where we report sub-progress as we go along, and perhaps
>> print some brief summary at the end if it took long enough, e.g.:
>>
>>     Enumerating Objects (X^1%) => Marking trees (Y^1%)
>>     Enumerating Objects (X^2%) => Calculating bitmaps (Y^2%)
>>
>> And at the end:
>>
>>     Enumerating Objects (100%) in ~2m30s -- (~10s marking trees, ~2m10s bitmaps, ~10s other)
>>
>> I.e. bringing the whole "nested" trace2 regions full circle with the
>> progress bar where we could elect to trace/show some of that info, and
>> then you could turn on some trace2 mode/verbose progress to see more.
>
> I do wonder if this really needs to be part of the progress bar. The
> goal of the progress bar is to give the user a sense that work is
> happening, and (if possible, but not for "enumerating") an idea of when
> it might finish. If the trace code can already do detailed timings, then
> shouldn't we just be encouraging people to use that?

To just show work happening we could save ourselves some horizontal
space and the debates over counting v.s. enumerating with:

     diff --git a/progress.c b/progress.c
     index 0318bdd41b..83336ca391 100644
     --- a/progress.c
     +++ b/progress.c
     @@ -226,3 +226,3 @@ static struct progress *start_progress_delay(const char *title, uint64_t total,
             struct progress *progress = xmalloc(sizeof(*progress));
     -       progress->title = title;
     +       progress->title = "Reticulating splines";
             progress->total = total;

:)

Obviously that's silly, but the point is we do show some user messaging
with these now, and e.g. the other day here on-list (can't be bothered
to find the msgid) someone was lamenting that the N progressbars we show
on "push" were too verbose.

So by coalescing some of the existing bars that do one logical operation
(push) in N steps we could be less verbose without losing the "stuff's
happening" part of it, and would see if something odd was going on,
e.g. the "I/O write" part being proportionally slower on this box than
the other, or when they upgrade bitmaps suddenly showing up as >95% of
the time.

The bit I find interesting about tying it into trace2 is that once you
do that the trace logs can contain e.g. min/max/avg/median/percentile
time for doing some operation we can break into N steps same/similar
steps, which might be interesting for performance analysis.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3] repack: enable bitmaps by default on bare repos
  2019-05-04 13:23                           ` SZEDER Gábor
@ 2019-05-08 20:17                             ` Ævar Arnfjörð Bjarmason
  2019-05-09  4:24                               ` Junio C Hamano
  0 siblings, 1 reply; 57+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-05-08 20:17 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Jeff King, Junio C Hamano, Eric Wong, git, Derrick Stolee


On Sat, May 04 2019, SZEDER Gábor wrote:

> On Sat, May 04, 2019 at 08:52:01AM +0200, Ævar Arnfjörð Bjarmason wrote:
>> As an aside this is the Nth time I notice how crappy that "Enumerating
>> objects" progress bar is.
>
> And don't forget that the commit-graph progress bar still prints
> nonsense :)
>
>   https://public-inbox.org/git/87ef6ydds8.fsf@evledraar.gmail.com/

I forgot for a bit, but then figured I'd get out of Derrick's way with
his more significant commit-graph work (which would have inevitably had
merge conflicts with this patch), and look at the time. Here we are
coming up on rc0.

I'm inclined to just wait until Derrick's refactorings land post-2.22.0
unless a) we need it enough before 2.22.0 to cause him trouble b) Junio
agrees with "a" and would take such a patch to fix this part of the
progress output before 2.22.0.

If people (just you would do) tell me "yes I/we want it" and Junio says
"yeah I'll do that" I'll write this patch in the next couple of days,
otherwise I'll do other stuff.

Junio?

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3] repack: enable bitmaps by default on bare repos
  2019-05-08 16:13                                 ` Ævar Arnfjörð Bjarmason
@ 2019-05-08 22:25                                   ` Jeff King
  0 siblings, 0 replies; 57+ messages in thread
From: Jeff King @ 2019-05-08 22:25 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, Eric Wong, git, SZEDER Gábor, Derrick Stolee

On Wed, May 08, 2019 at 06:13:58PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > I do wonder if this really needs to be part of the progress bar. The
> > goal of the progress bar is to give the user a sense that work is
> > happening, and (if possible, but not for "enumerating") an idea of when
> > it might finish. If the trace code can already do detailed timings, then
> > shouldn't we just be encouraging people to use that?
> [...]
>      -       progress->title = title;
>      +       progress->title = "Reticulating splines";

Heh, OK, that's a fair reductio ad absurdum of my point. I guess what I
was trying to say is that we don't need to go overboard with accuracy as
long as we're giving a vague sense of the work. Precision measurements
can be handled through a different system.

But I don't mind if you can find a way to do these kind of cascaded
progress meters in a way that is pleasing to the eye and not hard to
handle in the callers of the code.

-Peff

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3] repack: enable bitmaps by default on bare repos
  2019-05-08 20:17                             ` Ævar Arnfjörð Bjarmason
@ 2019-05-09  4:24                               ` Junio C Hamano
  0 siblings, 0 replies; 57+ messages in thread
From: Junio C Hamano @ 2019-05-09  4:24 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: SZEDER Gábor, Jeff King, Eric Wong, git, Derrick Stolee

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Sat, May 04 2019, SZEDER Gábor wrote:
> ...
>> And don't forget that the commit-graph progress bar still prints
>> nonsense :)
>
> I'm inclined to just wait until Derrick's refactorings land post-2.22.0

Fine by me.  Thanks.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3] repack: enable bitmaps by default on bare repos
  2019-04-10 22:57                   ` Jeff King
  2019-04-25  7:16                     ` Junio C Hamano
@ 2019-05-23 11:30                     ` Jeff King
  2019-05-23 12:53                       ` Derrick Stolee
                                         ` (2 more replies)
  1 sibling, 3 replies; 57+ messages in thread
From: Jeff King @ 2019-05-23 11:30 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Junio C Hamano, Eric Wong, git

On Wed, Apr 10, 2019 at 06:57:21PM -0400, Jeff King wrote:

>   2. The answer we get from a bitmap versus a regular traversal are not
>      apples-to-apples equivalent. The regular traversal walks down to
>      the UNINTERESTING commits, marks the boundaries trees and blobs as
>      UNINTERESTING, and then adds in all the interesting trees and blobs
>      minus the UNINTERESTING parts. So it can sometimes give the wrong
>      answer, claiming something is interesting when it is not.
> 
>      Whereas with bitmaps we fill in the trees and blobs as we walk, and
>      you get the true answer. But it means we may open up a lot more
>      trees than the equivalent traversal would.
> 
>      So one thing I've never really experimented with (and indeed, never
>      really thought about until writing this email) is that the bitmaps
>      could try to do that looser style of traversal, knowing that we
>      might err on the side of calling things interesting in a few cases.
>      But hopefully spending a lot less time opening trees.
> 
>      I'm not even 100% sure what that would look like in code, but just
>      thinking about it from a high level, I don't there's a particular
>      mathematical reason it couldn't work.

I spent a while thinking and experimenting with this tonight. The result
is the patch below. Ævar, do you still have a copy of the repo that
misbehaved? I'd be curious to see how it fares.

Finding the right trees to explore is a little tricky with bitmaps.  In
a normal traversal, we consider the "edges" to be worth exploring.
I.e., the places where an UNINTERESTING commit is the parent of an
interesting one.

But with bitmaps, we don't have that information in the same way,
because we're trying to avoid walking the commits in the first place. So
e.g., in a history like this:

  A--B--C
      \
       D

Let's imagine we're computing "C..D", and that D has a bitmap on disk
but C does not. When we visit D, we'll see that it has a bitmap, fill in
the results in our cumulative "want" bitmap, and then stop walking its
parents (because we know everything they could tell us already).

Then we visit C. It's not bitmapped, so we keep walking. Unlike a normal
traversal, we can't stop walking even though there are only
UNINTERESTING commits left, because we have to fill the complete bitmap,
including A and B (and you can imagine there might be thousands of
ancestors of A, too). The reason is that we skipped adding ancestors of
D when we saw its bitmap, so no still_interesting() cutoff will work
there.

But how do we know that it's worth looking into the tree of "B"? If we
assume we visit commits before their ancestors (because of the commit
timestamp ordering), then we'd see that "B" is in the "want" bitmap
already, which makes it worth visiting its tree.

Unfortunately we'd then continue on to "A", and it would _also_ look
interesting, because it's also in the "want" bitmap. We don't have an
easy way of knowing that "A" is basically already covered by "B", and is
therefore not worth pursuing. In this example, we could pass down a bit
from B to A as we traverse. But you can imagine another interesting
commit branched from A, which _would_ make A's tree worth considering.

Fundamentally, as soon as we load a bitmap and stop traversing, we lose
all information about the graph structure.

So the patch below just looks at every tree that might be worth
exploring (so both "A" and "B" here, but not "C"). That shouldn't be any
_worse_ than what the current code does (because it looks at all the
trees). It appears to behave correctly, at least so far as passing the
test suite. Running p5310 and p5311 against git.git and linux.git, it
seems to perform worse. I'm not quite sure why that is (i.e., if I
screwed up something in the implementation, or if the algorithm is
fundamentally worse).

There are a lot of rough edges in the patch; I was just trying to see if
the idea was even viable. So don't bother reviewing it as a real
proposal for inclusion. If you do read it, I'd recommend just looking at
the post-image, since it's essentially a rewrite and the diff is a bit
messy.

-Peff

---
 pack-bitmap.c | 398 ++++++++++++++++++++++++--------------------------
 1 file changed, 193 insertions(+), 205 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 6069b2fe55..4a40f62a38 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -12,6 +12,8 @@
 #include "packfile.h"
 #include "repository.h"
 #include "object-store.h"
+#include "blob.h"
+#include "prio-queue.h"
 
 /*
  * An entry on the bitmap index, representing the bitmap for a given
@@ -422,180 +424,22 @@ static int ext_index_add_object(struct bitmap_index *bitmap_git,
 	return bitmap_pos + bitmap_git->pack->num_objects;
 }
 
-struct bitmap_show_data {
-	struct bitmap_index *bitmap_git;
-	struct bitmap *base;
-};
-
-static void show_object(struct object *object, const char *name, void *data_)
-{
-	struct bitmap_show_data *data = data_;
-	int bitmap_pos;
-
-	bitmap_pos = bitmap_position(data->bitmap_git, &object->oid);
-
-	if (bitmap_pos < 0)
-		bitmap_pos = ext_index_add_object(data->bitmap_git, object,
-						  name);
-
-	bitmap_set(data->base, bitmap_pos);
-}
-
-static void show_commit(struct commit *commit, void *data)
-{
-}
-
-static int add_to_include_set(struct bitmap_index *bitmap_git,
-			      struct include_data *data,
-			      const struct object_id *oid,
-			      int bitmap_pos)
-{
-	khiter_t hash_pos;
-
-	if (data->seen && bitmap_get(data->seen, bitmap_pos))
-		return 0;
-
-	if (bitmap_get(data->base, bitmap_pos))
-		return 0;
-
-	hash_pos = kh_get_oid_map(bitmap_git->bitmaps, *oid);
-	if (hash_pos < kh_end(bitmap_git->bitmaps)) {
-		struct stored_bitmap *st = kh_value(bitmap_git->bitmaps, hash_pos);
-		bitmap_or_ewah(data->base, lookup_stored_bitmap(st));
-		return 0;
-	}
-
-	bitmap_set(data->base, bitmap_pos);
-	return 1;
-}
-
-static int should_include(struct commit *commit, void *_data)
-{
-	struct include_data *data = _data;
-	int bitmap_pos;
-
-	bitmap_pos = bitmap_position(data->bitmap_git, &commit->object.oid);
-	if (bitmap_pos < 0)
-		bitmap_pos = ext_index_add_object(data->bitmap_git,
-						  (struct object *)commit,
-						  NULL);
-
-	if (!add_to_include_set(data->bitmap_git, data, &commit->object.oid,
-				bitmap_pos)) {
-		struct commit_list *parent = commit->parents;
-
-		while (parent) {
-			parent->item->object.flags |= SEEN;
-			parent = parent->next;
-		}
-
-		return 0;
-	}
-
-	return 1;
-}
-
-static struct bitmap *find_objects(struct bitmap_index *bitmap_git,
-				   struct rev_info *revs,
-				   struct object_list *roots,
-				   struct bitmap *seen)
+static int add_commit_to_bitmap(struct bitmap_index *bitmap_git,
+				struct bitmap **base, struct commit *commit)
 {
-	struct bitmap *base = NULL;
-	int needs_walk = 0;
-
-	struct object_list *not_mapped = NULL;
-
-	/*
-	 * Go through all the roots for the walk. The ones that have bitmaps
-	 * on the bitmap index will be `or`ed together to form an initial
-	 * global reachability analysis.
-	 *
-	 * The ones without bitmaps in the index will be stored in the
-	 * `not_mapped_list` for further processing.
-	 */
-	while (roots) {
-		struct object *object = roots->item;
-		roots = roots->next;
-
-		if (object->type == OBJ_COMMIT) {
-			khiter_t pos = kh_get_oid_map(bitmap_git->bitmaps, object->oid);
-
-			if (pos < kh_end(bitmap_git->bitmaps)) {
-				struct stored_bitmap *st = kh_value(bitmap_git->bitmaps, pos);
-				struct ewah_bitmap *or_with = lookup_stored_bitmap(st);
-
-				if (base == NULL)
-					base = ewah_to_bitmap(or_with);
-				else
-					bitmap_or_ewah(base, or_with);
-
-				object->flags |= SEEN;
-				continue;
-			}
-		}
-
-		object_list_insert(object, &not_mapped);
-	}
-
-	/*
-	 * Best case scenario: We found bitmaps for all the roots,
-	 * so the resulting `or` bitmap has the full reachability analysis
-	 */
-	if (not_mapped == NULL)
-		return base;
-
-	roots = not_mapped;
-
-	/*
-	 * Let's iterate through all the roots that don't have bitmaps to
-	 * check if we can determine them to be reachable from the existing
-	 * global bitmap.
-	 *
-	 * If we cannot find them in the existing global bitmap, we'll need
-	 * to push them to an actual walk and run it until we can confirm
-	 * they are reachable
-	 */
-	while (roots) {
-		struct object *object = roots->item;
-		int pos;
-
-		roots = roots->next;
-		pos = bitmap_position(bitmap_git, &object->oid);
-
-		if (pos < 0 || base == NULL || !bitmap_get(base, pos)) {
-			object->flags &= ~UNINTERESTING;
-			add_pending_object(revs, object, "");
-			needs_walk = 1;
-		} else {
-			object->flags |= SEEN;
-		}
-	}
-
-	if (needs_walk) {
-		struct include_data incdata;
-		struct bitmap_show_data show_data;
-
-		if (base == NULL)
-			base = bitmap_new();
-
-		incdata.bitmap_git = bitmap_git;
-		incdata.base = base;
-		incdata.seen = seen;
-
-		revs->include_check = should_include;
-		revs->include_check_data = &incdata;
-
-		if (prepare_revision_walk(revs))
-			die("revision walk setup failed");
+	khiter_t pos = kh_get_oid_map(bitmap_git->bitmaps, commit->object.oid);
+	if (pos < kh_end(bitmap_git->bitmaps)) {
+		struct stored_bitmap *st = kh_value(bitmap_git->bitmaps, pos);
+		struct ewah_bitmap *or_with = lookup_stored_bitmap(st);
 
-		show_data.bitmap_git = bitmap_git;
-		show_data.base = base;
+		if (*base == NULL)
+			*base = ewah_to_bitmap(or_with);
+		else
+			bitmap_or_ewah(*base, or_with);
 
-		traverse_commit_list(revs, show_commit, show_object,
-				     &show_data);
+		return 1;
 	}
-
-	return base;
+	return 0;
 }
 
 static void show_extended_objects(struct bitmap_index *bitmap_git,
@@ -665,24 +509,122 @@ static void show_objects_for_type(
 	}
 }
 
-static int in_bitmapped_pack(struct bitmap_index *bitmap_git,
-			     struct object_list *roots)
+static void add_object_to_bitmap(struct bitmap_index *bitmap_git,
+				 struct object *obj,
+				 struct bitmap *bitmap,
+				 struct bitmap *seen,
+				 int ignore_missing);
+
+static void add_tag_to_bitmap(struct bitmap_index *bitmap_git,
+			      struct tag *tag,
+			      struct bitmap *bitmap,
+			      struct bitmap *seen,
+			      int ignore_missing)
+{
+	if (parse_tag(tag) < 0 || !tag->tagged) {
+		if (ignore_missing)
+			return;
+		die("unable to parse tag %s", oid_to_hex(&tag->object.oid));
+	}
+	add_object_to_bitmap(bitmap_git, tag->tagged, bitmap, seen,
+			     ignore_missing);
+}
+
+static void add_tree_to_bitmap(struct bitmap_index *bitmap_git,
+			       struct tree *tree,
+			       struct bitmap *bitmap,
+			       struct bitmap *seen,
+			       int ignore_missing)
 {
-	while (roots) {
-		struct object *object = roots->item;
-		roots = roots->next;
+	struct tree_desc desc;
+	struct name_entry entry;
 
-		if (find_pack_entry_one(object->oid.hash, bitmap_git->pack) > 0)
-			return 1;
+	if (parse_tree(tree) < 0) {
+		if (ignore_missing)
+			return;
+		die("unable to parse tree %s", oid_to_hex(&tree->object.oid));
 	}
 
-	return 0;
+	init_tree_desc(&desc, tree->buffer, tree->size);
+	while (tree_entry(&desc, &entry)) {
+		if (S_ISDIR(entry.mode)) {
+			struct tree *t = lookup_tree(the_repository, &entry.oid);
+			if (!t) {
+				die(_("entry '%s' in tree %s has tree mode, "
+				      "but is not a tree"),
+				    entry.path, oid_to_hex(&tree->object.oid));
+			}
+			add_object_to_bitmap(bitmap_git, &t->object,
+					     bitmap, seen, ignore_missing);
+		} else if (!S_ISGITLINK(entry.mode)) {
+			struct blob *b = lookup_blob(the_repository, &entry.oid);
+			if (!b) {
+				die(_("entry '%s' in tree %s has blob mode, "
+				      "but is not a blob"),
+				    entry.path, oid_to_hex(&tree->object.oid));
+			}
+			add_object_to_bitmap(bitmap_git, &b->object,
+					     bitmap, seen, ignore_missing);
+		}
+	}
+
+	free_tree_buffer(tree);
+}
+
+static void add_object_to_bitmap(struct bitmap_index *bitmap_git,
+				 struct object *obj,
+				 struct bitmap *bitmap,
+				 struct bitmap *seen,
+				 int ignore_missing)
+{
+	int pos = bitmap_position(bitmap_git, &obj->oid);
+
+	if (pos < 0)
+		pos = ext_index_add_object(bitmap_git, obj, NULL);
+
+	if (bitmap_get(bitmap, pos))
+		return; /* already there */
+
+	if (seen && bitmap_get(seen, pos))
+		return; /* seen in other bitmap; not worth digging further */
+
+	bitmap_set(bitmap, pos);
+
+	if (obj->type == OBJ_BLOB)
+		return;
+	else if (obj->type == OBJ_TAG)
+		add_tag_to_bitmap(bitmap_git, (struct tag *)obj,
+				  bitmap, seen, ignore_missing);
+	else if (obj->type == OBJ_TREE)
+		add_tree_to_bitmap(bitmap_git, (struct tree *)obj,
+				   bitmap, seen, ignore_missing);
+	else
+		BUG("unexpected object type: %d", obj->type);
+}
+
+static void add_objects_to_bitmap(struct bitmap_index *bitmap_git,
+				  struct object_list **list,
+				  struct bitmap *bitmap,
+				  struct bitmap *seen,
+				  int ignore_missing)
+{
+	while (*list) {
+		struct object_list *cur = *list;
+
+		add_object_to_bitmap(bitmap_git, cur->item,
+				     bitmap, seen, ignore_missing);
+
+		*list = cur->next;
+		free(cur);
+	}
 }
 
 struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)
 {
 	unsigned int i;
 
+	struct prio_queue commits = { compare_commits_by_commit_date };
+
 	struct object_list *wants = NULL;
 	struct object_list *haves = NULL;
 
@@ -714,24 +656,16 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)
 			object = parse_object_or_die(&tag->tagged->oid, NULL);
 		}
 
-		if (object->flags & UNINTERESTING)
-			object_list_insert(object, &haves);
-		else
-			object_list_insert(object, &wants);
+		if (object->type == OBJ_COMMIT)
+			prio_queue_put(&commits, object);
+		else {
+			if (object->flags & UNINTERESTING)
+				object_list_insert(object, &haves);
+			else
+				object_list_insert(object, &wants);
+		}
 	}
 
-	/*
-	 * if we have a HAVES list, but none of those haves is contained
-	 * in the packfile that has a bitmap, we don't have anything to
-	 * optimize here
-	 */
-	if (haves && !in_bitmapped_pack(bitmap_git, haves))
-		goto cleanup;
-
-	/* if we don't want anything, we're done here */
-	if (!wants)
-		goto cleanup;
-
 	/*
 	 * now we're going to use bitmaps, so load the actual bitmap entries
 	 * from disk. this is the point of no return; after this the rev_list
@@ -742,20 +676,74 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)
 
 	object_array_clear(&revs->pending);
 
-	if (haves) {
-		revs->ignore_missing_links = 1;
-		haves_bitmap = find_objects(bitmap_git, revs, haves, NULL);
-		reset_revision_walk();
-		revs->ignore_missing_links = 0;
+	haves_bitmap = bitmap_new();
+	wants_bitmap = bitmap_new();
 
-		if (haves_bitmap == NULL)
-			BUG("failed to perform bitmap walk");
-	}
+	/*
+	 * First traverse the relevant commits as we would for a normal
+	 * traversal.
+	 */
+	while (commits.nr) {
+		struct commit *commit = prio_queue_get(&commits);
+		struct bitmap **dst_bitmap;
+		int pos;
+		struct commit_list *parent;
+
+		if (commit->object.flags & UNINTERESTING)
+			dst_bitmap = &haves_bitmap;
+		else
+			dst_bitmap = &wants_bitmap;
 
-	wants_bitmap = find_objects(bitmap_git, revs, wants, haves_bitmap);
+		pos = bitmap_position(bitmap_git, &commit->object.oid);
+		if (pos >= 0 && *dst_bitmap && bitmap_get(*dst_bitmap, pos))
+			continue; /* already covered */
 
-	if (!wants_bitmap)
-		BUG("failed to perform bitmap walk");
+		if (add_commit_to_bitmap(bitmap_git, dst_bitmap, commit))
+			continue; /* skip adding parents, they're covered */
+
+
+		/* otherwise mark ourselves and queue dependent objects */
+		if (pos < 0)
+			pos = ext_index_add_object(bitmap_git, &commit->object, NULL);
+		bitmap_set(*dst_bitmap, pos);
+		if (parse_commit(commit)) {
+			if (commit->object.flags & UNINTERESTING)
+				continue;
+			die("unable to parse commit %s",
+			    oid_to_hex(&commit->object.oid));
+		}
+		if (commit->object.flags & UNINTERESTING) {
+			/*
+			 * If an uninteresting commit is in the "wants" bitmap,
+			 * then our tree is worth exploring. This means we may
+			 * miss some trees in the "haves" that are not
+			 * ancestors of "wants" (or that are, but we missed
+			 * because of out-of-order timestamps).
+			 */
+			if (wants_bitmap && bitmap_get(wants_bitmap, pos))
+				object_list_insert(&get_commit_tree(commit)->object,
+						   &haves);
+		} else {
+			/*
+			 * "wants" must always be explored
+			 */
+			object_list_insert(&get_commit_tree(commit)->object,
+					   &wants);
+		}
+
+		for (parent = commit->parents; parent; parent = parent->next) {
+			if (commit->object.flags & UNINTERESTING)
+				parent->item->object.flags |= UNINTERESTING;
+			prio_queue_put(&commits, parent->item);
+		}
+	}
+
+	/*
+	 * At this point we've processed all of the commits, and queued items
+	 * in "haves" and "wants" that need to be marked.
+	 */
+	add_objects_to_bitmap(bitmap_git, &haves, haves_bitmap, NULL, 1);
+	add_objects_to_bitmap(bitmap_git, &wants, wants_bitmap, haves_bitmap, 0);
 
 	if (haves_bitmap)
 		bitmap_and_not(wants_bitmap, haves_bitmap);
-- 
2.22.0.rc1.549.gadb183c4cb


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH v3] repack: enable bitmaps by default on bare repos
  2019-05-23 11:30                     ` Jeff King
@ 2019-05-23 12:53                       ` Derrick Stolee
  2019-05-24  7:24                         ` Jeff King
  2019-05-23 19:26                       ` Ævar Arnfjörð Bjarmason
  2019-05-24 11:31                       ` [PATCH] pack-bitmap: look for an uninteresting bitmap Derrick Stolee
  2 siblings, 1 reply; 57+ messages in thread
From: Derrick Stolee @ 2019-05-23 12:53 UTC (permalink / raw)
  To: Jeff King, Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, Eric Wong, git

On 5/23/2019 7:30 AM, Jeff King wrote:
> On Wed, Apr 10, 2019 at 06:57:21PM -0400, Jeff King wrote:
> 
>>   2. The answer we get from a bitmap versus a regular traversal are not
>>      apples-to-apples equivalent. The regular traversal walks down to
>>      the UNINTERESTING commits, marks the boundaries trees and blobs as
>>      UNINTERESTING, and then adds in all the interesting trees and blobs
>>      minus the UNINTERESTING parts. So it can sometimes give the wrong
>>      answer, claiming something is interesting when it is not.
>>
>>      Whereas with bitmaps we fill in the trees and blobs as we walk, and
>>      you get the true answer. But it means we may open up a lot more
>>      trees than the equivalent traversal would.
>>
>>      So one thing I've never really experimented with (and indeed, never
>>      really thought about until writing this email) is that the bitmaps
>>      could try to do that looser style of traversal, knowing that we
>>      might err on the side of calling things interesting in a few cases.
>>      But hopefully spending a lot less time opening trees.
>>
>>      I'm not even 100% sure what that would look like in code, but just
>>      thinking about it from a high level, I don't there's a particular
>>      mathematical reason it couldn't work.
> 
> I spent a while thinking and experimenting with this tonight. The result
> is the patch below. Ævar, do you still have a copy of the repo that
> misbehaved? I'd be curious to see how it fares.

This patch caught my attention, and I think I understand some of the issues
at hand. I'm not well-versed specifically in Git's bitmap implementation.
The fundamental ideas are there, but my perspective is biased by my own
independent bitmap implementation for Azure Repos. What worked there may not
work at all here.

> Finding the right trees to explore is a little tricky with bitmaps.  In
> a normal traversal, we consider the "edges" to be worth exploring.
> I.e., the places where an UNINTERESTING commit is the parent of an
> interesting one.

This is the "commit frontier" which I bring up again below.

> But with bitmaps, we don't have that information in the same way,
> because we're trying to avoid walking the commits in the first place. So
> e.g., in a history like this:
> 
>   A--B--C
>       \
>        D
> 
> Let's imagine we're computing "C..D", and that D has a bitmap on disk
> but C does not.

(As I read your discussion below, I'm confused. For "C..D", C is a have
and D is a want. We should explore all the haves _first_, then walk the
wants, right?)

> When we visit D, we'll see that it has a bitmap, fill in
> the results in our cumulative "want" bitmap, and then stop walking its
> parents (because we know everything they could tell us already).

I may be misreading something, but we would construct _a_ bitmap for C
by walking its reachable objects until hitting a bitmap we know about.
Perhaps A or B have a bitmap to start the construction. If we never
construct a bitmap for C, then we don't know what to remove from the "D"
bitmap to show the difference.

If "C" is not even in the bitmap pack, then we don't use bitmaps for
this calculation because of this condition:

        /*
         * if we have a HAVES list, but none of those haves is contained
         * in the packfile that has a bitmap, we don't have anything to
         * optimize here
         */
        if (haves && !in_bitmapped_pack(bitmap_git, haves))
                goto cleanup;

> Then we visit C. It's not bitmapped, so we keep walking. Unlike a normal
> traversal, we can't stop walking even though there are only
> UNINTERESTING commits left, because we have to fill the complete bitmap,
> including A and B (and you can imagine there might be thousands of
> ancestors of A, too). The reason is that we skipped adding ancestors of
> D when we saw its bitmap, so no still_interesting() cutoff will work
> there.
> 
> But how do we know that it's worth looking into the tree of "B"? If we
> assume we visit commits before their ancestors (because of the commit
> timestamp ordering), then we'd see that "B" is in the "want" bitmap
> already, which makes it worth visiting its tree.
> 
> Unfortunately we'd then continue on to "A", and it would _also_ look
> interesting, because it's also in the "want" bitmap. We don't have an
> easy way of knowing that "A" is basically already covered by "B", and is
> therefore not worth pursuing. In this example, we could pass down a bit
> from B to A as we traverse. But you can imagine another interesting
> commit branched from A, which _would_ make A's tree worth considering.
> 
> Fundamentally, as soon as we load a bitmap and stop traversing, we lose
> all information about the graph structure.
> 
> So the patch below just looks at every tree that might be worth
> exploring (so both "A" and "B" here, but not "C"). That shouldn't be any
> _worse_ than what the current code does (because it looks at all the
> trees). It appears to behave correctly, at least so far as passing the
> test suite. Running p5310 and p5311 against git.git and linux.git, it
> seems to perform worse. I'm not quite sure why that is (i.e., if I
> screwed up something in the implementation, or if the algorithm is
> fundamentally worse).

I'm of the opinion that the old method was better. It followed a very clear
three-step process:

 1. Compute the "haves" bitmap.

 2. Compute the "wants" bitmap, but don't explore into the "haves" bitmap.

 3. Subtract the "haves" bitmap from the "wants" (in case we added bits to
    the wants before they were in the haves).

But there is hope in your idea to improve some cases. As noted, we give up
if all of the haves are not in the bitmapped pack. By starting with a
commit walk, we can find the "commit frontier," which is the set of commits
that are explored by paint_down_to_common(). In this case, the set {B, C, D}.

After walking commits and finding a set of UNINTERESTING commits, we could
look for any UNINTERESTING commits that have bitmaps (or simply are in the
bitmapped pack) and use those to see our "haves" bitmap. So, if B has a
bitmap, but C is too new for the bitmap, then we can still create the haves
based on B and then take a bitmap diff "D minus B".

In fact, using the "merge base set" for the diff reflects what the non-bitmap
algorithm does in this case. It ignores C and only explores B.

I learned a lot looking at both versions of this method side-by-side. Thanks!

-Stolee

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3] repack: enable bitmaps by default on bare repos
  2019-05-23 11:30                     ` Jeff King
  2019-05-23 12:53                       ` Derrick Stolee
@ 2019-05-23 19:26                       ` Ævar Arnfjörð Bjarmason
  2019-05-24  7:27                         ` Jeff King
  2019-05-24 11:31                       ` [PATCH] pack-bitmap: look for an uninteresting bitmap Derrick Stolee
  2 siblings, 1 reply; 57+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-05-23 19:26 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Eric Wong, git


On Thu, May 23 2019, Jeff King wrote:

> On Wed, Apr 10, 2019 at 06:57:21PM -0400, Jeff King wrote:
>
>>   2. The answer we get from a bitmap versus a regular traversal are not
>>      apples-to-apples equivalent. The regular traversal walks down to
>>      the UNINTERESTING commits, marks the boundaries trees and blobs as
>>      UNINTERESTING, and then adds in all the interesting trees and blobs
>>      minus the UNINTERESTING parts. So it can sometimes give the wrong
>>      answer, claiming something is interesting when it is not.
>>
>>      Whereas with bitmaps we fill in the trees and blobs as we walk, and
>>      you get the true answer. But it means we may open up a lot more
>>      trees than the equivalent traversal would.
>>
>>      So one thing I've never really experimented with (and indeed, never
>>      really thought about until writing this email) is that the bitmaps
>>      could try to do that looser style of traversal, knowing that we
>>      might err on the side of calling things interesting in a few cases.
>>      But hopefully spending a lot less time opening trees.
>>
>>      I'm not even 100% sure what that would look like in code, but just
>>      thinking about it from a high level, I don't there's a particular
>>      mathematical reason it couldn't work.
>
> I spent a while thinking and experimenting with this tonight. The result
> is the patch below. Ævar, do you still have a copy of the repo that
> misbehaved? I'd be curious to see how it fares.

No, sorry. I think we could make an artificial test to emulate it, which
would be something like:

 * ~1 million commits
 * local clone setup to fetch all branches/tags (default 'git clone')
 * local a bit ahead/behind
 * Have old "main" pack with *.bitmap, bunch of other packs/loose objects without that
 * push try to push the local change upstream (or to a topic branch)

I tried briefly to emulate this with git.git with:

    (
        rm -rf /tmp/git /tmp/git.old &&
        git init /tmp/git.old &&
        git clone --bare https://github.com/git/git.git /tmp/git &&
        cd /tmp/git &&
        old_commit=$(git rev-parse v2.20.0^{}) &&
        git rev-list v2.12.0..v2.21.0|parallel 'git branch topic-{} {}' &&
        cd /tmp/git.old &&
        echo /tmp/git/objects >.git/objects/info/alternates &&
        git branch master $old_commit &&
        git reset --hard master &&
        git repack -Adb &&
        rm .git/objects/info/alternates &&
        for c in {1..10}
        do
            >$c &&
            git add $c &&
            git commit -m"bump $c"
        done
    )

But didn't get anywhere, probably because here my topics are all stuff I
have already, and I just have 2x packs.

It would be really cool to have some test tool that could produce
various "shape of repo" states like that. I.e. given an end-state of a
public repo emulate various plausible local client states like that
given some assumptions about how often the local client fetches, what
the GC settings are etc.

> Finding the right trees to explore is a little tricky with bitmaps.  In
> a normal traversal, we consider the "edges" to be worth exploring.
> I.e., the places where an UNINTERESTING commit is the parent of an
> interesting one.
>
> But with bitmaps, we don't have that information in the same way,
> because we're trying to avoid walking the commits in the first place. So
> e.g., in a history like this:
>
>   A--B--C
>       \
>        D
>
> Let's imagine we're computing "C..D", and that D has a bitmap on disk
> but C does not. When we visit D, we'll see that it has a bitmap, fill in
> the results in our cumulative "want" bitmap, and then stop walking its
> parents (because we know everything they could tell us already).
>
> Then we visit C. It's not bitmapped, so we keep walking. Unlike a normal
> traversal, we can't stop walking even though there are only
> UNINTERESTING commits left, because we have to fill the complete bitmap,
> including A and B (and you can imagine there might be thousands of
> ancestors of A, too). The reason is that we skipped adding ancestors of
> D when we saw its bitmap, so no still_interesting() cutoff will work
> there.
>
> But how do we know that it's worth looking into the tree of "B"? If we
> assume we visit commits before their ancestors (because of the commit
> timestamp ordering), then we'd see that "B" is in the "want" bitmap
> already, which makes it worth visiting its tree.
>
> Unfortunately we'd then continue on to "A", and it would _also_ look
> interesting, because it's also in the "want" bitmap. We don't have an
> easy way of knowing that "A" is basically already covered by "B", and is
> therefore not worth pursuing. In this example, we could pass down a bit
> from B to A as we traverse. But you can imagine another interesting
> commit branched from A, which _would_ make A's tree worth considering.
>
> Fundamentally, as soon as we load a bitmap and stop traversing, we lose
> all information about the graph structure.
>
> So the patch below just looks at every tree that might be worth
> exploring (so both "A" and "B" here, but not "C"). That shouldn't be any
> _worse_ than what the current code does (because it looks at all the
> trees). It appears to behave correctly, at least so far as passing the
> test suite. Running p5310 and p5311 against git.git and linux.git, it
> seems to perform worse. I'm not quite sure why that is (i.e., if I
> screwed up something in the implementation, or if the algorithm is
> fundamentally worse).
>
> There are a lot of rough edges in the patch; I was just trying to see if
> the idea was even viable. So don't bother reviewing it as a real
> proposal for inclusion. If you do read it, I'd recommend just looking at
> the post-image, since it's essentially a rewrite and the diff is a bit
> messy.
>
> -Peff
>
> ---
>  pack-bitmap.c | 398 ++++++++++++++++++++++++--------------------------
>  1 file changed, 193 insertions(+), 205 deletions(-)
>
> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index 6069b2fe55..4a40f62a38 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -12,6 +12,8 @@
>  #include "packfile.h"
>  #include "repository.h"
>  #include "object-store.h"
> +#include "blob.h"
> +#include "prio-queue.h"
>
>  /*
>   * An entry on the bitmap index, representing the bitmap for a given
> @@ -422,180 +424,22 @@ static int ext_index_add_object(struct bitmap_index *bitmap_git,
>  	return bitmap_pos + bitmap_git->pack->num_objects;
>  }
>
> -struct bitmap_show_data {
> -	struct bitmap_index *bitmap_git;
> -	struct bitmap *base;
> -};
> -
> -static void show_object(struct object *object, const char *name, void *data_)
> -{
> -	struct bitmap_show_data *data = data_;
> -	int bitmap_pos;
> -
> -	bitmap_pos = bitmap_position(data->bitmap_git, &object->oid);
> -
> -	if (bitmap_pos < 0)
> -		bitmap_pos = ext_index_add_object(data->bitmap_git, object,
> -						  name);
> -
> -	bitmap_set(data->base, bitmap_pos);
> -}
> -
> -static void show_commit(struct commit *commit, void *data)
> -{
> -}
> -
> -static int add_to_include_set(struct bitmap_index *bitmap_git,
> -			      struct include_data *data,
> -			      const struct object_id *oid,
> -			      int bitmap_pos)
> -{
> -	khiter_t hash_pos;
> -
> -	if (data->seen && bitmap_get(data->seen, bitmap_pos))
> -		return 0;
> -
> -	if (bitmap_get(data->base, bitmap_pos))
> -		return 0;
> -
> -	hash_pos = kh_get_oid_map(bitmap_git->bitmaps, *oid);
> -	if (hash_pos < kh_end(bitmap_git->bitmaps)) {
> -		struct stored_bitmap *st = kh_value(bitmap_git->bitmaps, hash_pos);
> -		bitmap_or_ewah(data->base, lookup_stored_bitmap(st));
> -		return 0;
> -	}
> -
> -	bitmap_set(data->base, bitmap_pos);
> -	return 1;
> -}
> -
> -static int should_include(struct commit *commit, void *_data)
> -{
> -	struct include_data *data = _data;
> -	int bitmap_pos;
> -
> -	bitmap_pos = bitmap_position(data->bitmap_git, &commit->object.oid);
> -	if (bitmap_pos < 0)
> -		bitmap_pos = ext_index_add_object(data->bitmap_git,
> -						  (struct object *)commit,
> -						  NULL);
> -
> -	if (!add_to_include_set(data->bitmap_git, data, &commit->object.oid,
> -				bitmap_pos)) {
> -		struct commit_list *parent = commit->parents;
> -
> -		while (parent) {
> -			parent->item->object.flags |= SEEN;
> -			parent = parent->next;
> -		}
> -
> -		return 0;
> -	}
> -
> -	return 1;
> -}
> -
> -static struct bitmap *find_objects(struct bitmap_index *bitmap_git,
> -				   struct rev_info *revs,
> -				   struct object_list *roots,
> -				   struct bitmap *seen)
> +static int add_commit_to_bitmap(struct bitmap_index *bitmap_git,
> +				struct bitmap **base, struct commit *commit)
>  {
> -	struct bitmap *base = NULL;
> -	int needs_walk = 0;
> -
> -	struct object_list *not_mapped = NULL;
> -
> -	/*
> -	 * Go through all the roots for the walk. The ones that have bitmaps
> -	 * on the bitmap index will be `or`ed together to form an initial
> -	 * global reachability analysis.
> -	 *
> -	 * The ones without bitmaps in the index will be stored in the
> -	 * `not_mapped_list` for further processing.
> -	 */
> -	while (roots) {
> -		struct object *object = roots->item;
> -		roots = roots->next;
> -
> -		if (object->type == OBJ_COMMIT) {
> -			khiter_t pos = kh_get_oid_map(bitmap_git->bitmaps, object->oid);
> -
> -			if (pos < kh_end(bitmap_git->bitmaps)) {
> -				struct stored_bitmap *st = kh_value(bitmap_git->bitmaps, pos);
> -				struct ewah_bitmap *or_with = lookup_stored_bitmap(st);
> -
> -				if (base == NULL)
> -					base = ewah_to_bitmap(or_with);
> -				else
> -					bitmap_or_ewah(base, or_with);
> -
> -				object->flags |= SEEN;
> -				continue;
> -			}
> -		}
> -
> -		object_list_insert(object, &not_mapped);
> -	}
> -
> -	/*
> -	 * Best case scenario: We found bitmaps for all the roots,
> -	 * so the resulting `or` bitmap has the full reachability analysis
> -	 */
> -	if (not_mapped == NULL)
> -		return base;
> -
> -	roots = not_mapped;
> -
> -	/*
> -	 * Let's iterate through all the roots that don't have bitmaps to
> -	 * check if we can determine them to be reachable from the existing
> -	 * global bitmap.
> -	 *
> -	 * If we cannot find them in the existing global bitmap, we'll need
> -	 * to push them to an actual walk and run it until we can confirm
> -	 * they are reachable
> -	 */
> -	while (roots) {
> -		struct object *object = roots->item;
> -		int pos;
> -
> -		roots = roots->next;
> -		pos = bitmap_position(bitmap_git, &object->oid);
> -
> -		if (pos < 0 || base == NULL || !bitmap_get(base, pos)) {
> -			object->flags &= ~UNINTERESTING;
> -			add_pending_object(revs, object, "");
> -			needs_walk = 1;
> -		} else {
> -			object->flags |= SEEN;
> -		}
> -	}
> -
> -	if (needs_walk) {
> -		struct include_data incdata;
> -		struct bitmap_show_data show_data;
> -
> -		if (base == NULL)
> -			base = bitmap_new();
> -
> -		incdata.bitmap_git = bitmap_git;
> -		incdata.base = base;
> -		incdata.seen = seen;
> -
> -		revs->include_check = should_include;
> -		revs->include_check_data = &incdata;
> -
> -		if (prepare_revision_walk(revs))
> -			die("revision walk setup failed");
> +	khiter_t pos = kh_get_oid_map(bitmap_git->bitmaps, commit->object.oid);
> +	if (pos < kh_end(bitmap_git->bitmaps)) {
> +		struct stored_bitmap *st = kh_value(bitmap_git->bitmaps, pos);
> +		struct ewah_bitmap *or_with = lookup_stored_bitmap(st);
>
> -		show_data.bitmap_git = bitmap_git;
> -		show_data.base = base;
> +		if (*base == NULL)
> +			*base = ewah_to_bitmap(or_with);
> +		else
> +			bitmap_or_ewah(*base, or_with);
>
> -		traverse_commit_list(revs, show_commit, show_object,
> -				     &show_data);
> +		return 1;
>  	}
> -
> -	return base;
> +	return 0;
>  }
>
>  static void show_extended_objects(struct bitmap_index *bitmap_git,
> @@ -665,24 +509,122 @@ static void show_objects_for_type(
>  	}
>  }
>
> -static int in_bitmapped_pack(struct bitmap_index *bitmap_git,
> -			     struct object_list *roots)
> +static void add_object_to_bitmap(struct bitmap_index *bitmap_git,
> +				 struct object *obj,
> +				 struct bitmap *bitmap,
> +				 struct bitmap *seen,
> +				 int ignore_missing);
> +
> +static void add_tag_to_bitmap(struct bitmap_index *bitmap_git,
> +			      struct tag *tag,
> +			      struct bitmap *bitmap,
> +			      struct bitmap *seen,
> +			      int ignore_missing)
> +{
> +	if (parse_tag(tag) < 0 || !tag->tagged) {
> +		if (ignore_missing)
> +			return;
> +		die("unable to parse tag %s", oid_to_hex(&tag->object.oid));
> +	}
> +	add_object_to_bitmap(bitmap_git, tag->tagged, bitmap, seen,
> +			     ignore_missing);
> +}
> +
> +static void add_tree_to_bitmap(struct bitmap_index *bitmap_git,
> +			       struct tree *tree,
> +			       struct bitmap *bitmap,
> +			       struct bitmap *seen,
> +			       int ignore_missing)
>  {
> -	while (roots) {
> -		struct object *object = roots->item;
> -		roots = roots->next;
> +	struct tree_desc desc;
> +	struct name_entry entry;
>
> -		if (find_pack_entry_one(object->oid.hash, bitmap_git->pack) > 0)
> -			return 1;
> +	if (parse_tree(tree) < 0) {
> +		if (ignore_missing)
> +			return;
> +		die("unable to parse tree %s", oid_to_hex(&tree->object.oid));
>  	}
>
> -	return 0;
> +	init_tree_desc(&desc, tree->buffer, tree->size);
> +	while (tree_entry(&desc, &entry)) {
> +		if (S_ISDIR(entry.mode)) {
> +			struct tree *t = lookup_tree(the_repository, &entry.oid);
> +			if (!t) {
> +				die(_("entry '%s' in tree %s has tree mode, "
> +				      "but is not a tree"),
> +				    entry.path, oid_to_hex(&tree->object.oid));
> +			}
> +			add_object_to_bitmap(bitmap_git, &t->object,
> +					     bitmap, seen, ignore_missing);
> +		} else if (!S_ISGITLINK(entry.mode)) {
> +			struct blob *b = lookup_blob(the_repository, &entry.oid);
> +			if (!b) {
> +				die(_("entry '%s' in tree %s has blob mode, "
> +				      "but is not a blob"),
> +				    entry.path, oid_to_hex(&tree->object.oid));
> +			}
> +			add_object_to_bitmap(bitmap_git, &b->object,
> +					     bitmap, seen, ignore_missing);
> +		}
> +	}
> +
> +	free_tree_buffer(tree);
> +}
> +
> +static void add_object_to_bitmap(struct bitmap_index *bitmap_git,
> +				 struct object *obj,
> +				 struct bitmap *bitmap,
> +				 struct bitmap *seen,
> +				 int ignore_missing)
> +{
> +	int pos = bitmap_position(bitmap_git, &obj->oid);
> +
> +	if (pos < 0)
> +		pos = ext_index_add_object(bitmap_git, obj, NULL);
> +
> +	if (bitmap_get(bitmap, pos))
> +		return; /* already there */
> +
> +	if (seen && bitmap_get(seen, pos))
> +		return; /* seen in other bitmap; not worth digging further */
> +
> +	bitmap_set(bitmap, pos);
> +
> +	if (obj->type == OBJ_BLOB)
> +		return;
> +	else if (obj->type == OBJ_TAG)
> +		add_tag_to_bitmap(bitmap_git, (struct tag *)obj,
> +				  bitmap, seen, ignore_missing);
> +	else if (obj->type == OBJ_TREE)
> +		add_tree_to_bitmap(bitmap_git, (struct tree *)obj,
> +				   bitmap, seen, ignore_missing);
> +	else
> +		BUG("unexpected object type: %d", obj->type);
> +}
> +
> +static void add_objects_to_bitmap(struct bitmap_index *bitmap_git,
> +				  struct object_list **list,
> +				  struct bitmap *bitmap,
> +				  struct bitmap *seen,
> +				  int ignore_missing)
> +{
> +	while (*list) {
> +		struct object_list *cur = *list;
> +
> +		add_object_to_bitmap(bitmap_git, cur->item,
> +				     bitmap, seen, ignore_missing);
> +
> +		*list = cur->next;
> +		free(cur);
> +	}
>  }
>
>  struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)
>  {
>  	unsigned int i;
>
> +	struct prio_queue commits = { compare_commits_by_commit_date };
> +
>  	struct object_list *wants = NULL;
>  	struct object_list *haves = NULL;
>
> @@ -714,24 +656,16 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)
>  			object = parse_object_or_die(&tag->tagged->oid, NULL);
>  		}
>
> -		if (object->flags & UNINTERESTING)
> -			object_list_insert(object, &haves);
> -		else
> -			object_list_insert(object, &wants);
> +		if (object->type == OBJ_COMMIT)
> +			prio_queue_put(&commits, object);
> +		else {
> +			if (object->flags & UNINTERESTING)
> +				object_list_insert(object, &haves);
> +			else
> +				object_list_insert(object, &wants);
> +		}
>  	}
>
> -	/*
> -	 * if we have a HAVES list, but none of those haves is contained
> -	 * in the packfile that has a bitmap, we don't have anything to
> -	 * optimize here
> -	 */
> -	if (haves && !in_bitmapped_pack(bitmap_git, haves))
> -		goto cleanup;
> -
> -	/* if we don't want anything, we're done here */
> -	if (!wants)
> -		goto cleanup;
> -
>  	/*
>  	 * now we're going to use bitmaps, so load the actual bitmap entries
>  	 * from disk. this is the point of no return; after this the rev_list
> @@ -742,20 +676,74 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)
>
>  	object_array_clear(&revs->pending);
>
> -	if (haves) {
> -		revs->ignore_missing_links = 1;
> -		haves_bitmap = find_objects(bitmap_git, revs, haves, NULL);
> -		reset_revision_walk();
> -		revs->ignore_missing_links = 0;
> +	haves_bitmap = bitmap_new();
> +	wants_bitmap = bitmap_new();
>
> -		if (haves_bitmap == NULL)
> -			BUG("failed to perform bitmap walk");
> -	}
> +	/*
> +	 * First traverse the relevant commits as we would for a normal
> +	 * traversal.
> +	 */
> +	while (commits.nr) {
> +		struct commit *commit = prio_queue_get(&commits);
> +		struct bitmap **dst_bitmap;
> +		int pos;
> +		struct commit_list *parent;
> +
> +		if (commit->object.flags & UNINTERESTING)
> +			dst_bitmap = &haves_bitmap;
> +		else
> +			dst_bitmap = &wants_bitmap;
>
> -	wants_bitmap = find_objects(bitmap_git, revs, wants, haves_bitmap);
> +		pos = bitmap_position(bitmap_git, &commit->object.oid);
> +		if (pos >= 0 && *dst_bitmap && bitmap_get(*dst_bitmap, pos))
> +			continue; /* already covered */
>
> -	if (!wants_bitmap)
> -		BUG("failed to perform bitmap walk");
> +		if (add_commit_to_bitmap(bitmap_git, dst_bitmap, commit))
> +			continue; /* skip adding parents, they're covered */
> +
> +
> +		/* otherwise mark ourselves and queue dependent objects */
> +		if (pos < 0)
> +			pos = ext_index_add_object(bitmap_git, &commit->object, NULL);
> +		bitmap_set(*dst_bitmap, pos);
> +		if (parse_commit(commit)) {
> +			if (commit->object.flags & UNINTERESTING)
> +				continue;
> +			die("unable to parse commit %s",
> +			    oid_to_hex(&commit->object.oid));
> +		}
> +		if (commit->object.flags & UNINTERESTING) {
> +			/*
> +			 * If an uninteresting commit is in the "wants" bitmap,
> +			 * then our tree is worth exploring. This means we may
> +			 * miss some trees in the "haves" that are not
> +			 * ancestors of "wants" (or that are, but we missed
> +			 * because of out-of-order timestamps).
> +			 */
> +			if (wants_bitmap && bitmap_get(wants_bitmap, pos))
> +				object_list_insert(&get_commit_tree(commit)->object,
> +						   &haves);
> +		} else {
> +			/*
> +			 * "wants" must always be explored
> +			 */
> +			object_list_insert(&get_commit_tree(commit)->object,
> +					   &wants);
> +		}
> +
> +		for (parent = commit->parents; parent; parent = parent->next) {
> +			if (commit->object.flags & UNINTERESTING)
> +				parent->item->object.flags |= UNINTERESTING;
> +			prio_queue_put(&commits, parent->item);
> +		}
> +	}
> +
> +	/*
> +	 * At this point we've processed all of the commits, and queued items
> +	 * in "haves" and "wants" that need to be marked.
> +	 */
> +	add_objects_to_bitmap(bitmap_git, &haves, haves_bitmap, NULL, 1);
> +	add_objects_to_bitmap(bitmap_git, &wants, wants_bitmap, haves_bitmap, 0);
>
>  	if (haves_bitmap)
>  		bitmap_and_not(wants_bitmap, haves_bitmap);

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3] repack: enable bitmaps by default on bare repos
  2019-05-23 12:53                       ` Derrick Stolee
@ 2019-05-24  7:24                         ` Jeff King
  2019-05-24 10:33                           ` Derrick Stolee
  0 siblings, 1 reply; 57+ messages in thread
From: Jeff King @ 2019-05-24  7:24 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano, Eric Wong, git

On Thu, May 23, 2019 at 08:53:56AM -0400, Derrick Stolee wrote:

> > I spent a while thinking and experimenting with this tonight. The result
> > is the patch below. Ævar, do you still have a copy of the repo that
> > misbehaved? I'd be curious to see how it fares.
> 
> This patch caught my attention, and I think I understand some of the issues
> at hand. I'm not well-versed specifically in Git's bitmap implementation.
> The fundamental ideas are there, but my perspective is biased by my own
> independent bitmap implementation for Azure Repos. What worked there may not
> work at all here.

Thanks for looking at this. There are a lot of short-comings in the
current bitmap implementation, so if there's a better way to do things,
I'm not opposed to moving to a new format. :)

> > Finding the right trees to explore is a little tricky with bitmaps.  In
> > a normal traversal, we consider the "edges" to be worth exploring.
> > I.e., the places where an UNINTERESTING commit is the parent of an
> > interesting one.
> 
> This is the "commit frontier" which I bring up again below.

Right. I actually had trouble coming up with a succinct way of
describing this, and stole the definition from your recent blog post. ;)

> > But with bitmaps, we don't have that information in the same way,
> > because we're trying to avoid walking the commits in the first place. So
> > e.g., in a history like this:
> > 
> >   A--B--C
> >       \
> >        D
> > 
> > Let's imagine we're computing "C..D", and that D has a bitmap on disk
> > but C does not.
> 
> (As I read your discussion below, I'm confused. For "C..D", C is a have
> and D is a want. We should explore all the haves _first_, then walk the
> wants, right?)

I think I may have confused things by starting my description with a
hypothetical combined want/have walk. To take a step back: the problem
we were discussing is that we spend a lot of time reading trees to fill
in the "have" bitmap, even though most of those objects are unlikely to
be in the "want" in the first place (only the frontier trees are really
useful).

The current code does indeed fill the "have" bitmap first (so that while
walking "want", it can avoid walking down paths whose bits we know we're
going to clear eventually anyway). But we can't know the frontier if we
completely fill the "have" bitmap first. We have to walk both sides
together, looking for the frontier.

> > When we visit D, we'll see that it has a bitmap, fill in
> > the results in our cumulative "want" bitmap, and then stop walking its
> > parents (because we know everything they could tell us already).
> 
> I may be misreading something, but we would construct _a_ bitmap for C
> by walking its reachable objects until hitting a bitmap we know about.
> Perhaps A or B have a bitmap to start the construction. If we never
> construct a bitmap for C, then we don't know what to remove from the "D"
> bitmap to show the difference.

Yes, and that's how the current code works. If we walk back to B and it
has a bitmap, we can stop walking there and just OR in its bitmap, and
look at the trees for any intermediate commits (in this case just C) to
fill in the rest.

The problem is that we don't have a bitmap for every commit. So you can
imagine a history like this:

  A_1 -- A_2 -- ... -- A_1000 -- B -- C
                                   \
				    D

where we have a bitmap _only_ for D. Filling in the accurate and true
bitmap for C requires walking a thousand commits (and their trees!),
when the non-bitmap algorithm would find the frontier at B and call that
good enough.

> If "C" is not even in the bitmap pack, then we don't use bitmaps for
> this calculation because of this condition:
> 
>         /*
>          * if we have a HAVES list, but none of those haves is contained
>          * in the packfile that has a bitmap, we don't have anything to
>          * optimize here
>          */
>         if (haves && !in_bitmapped_pack(bitmap_git, haves))
>                 goto cleanup;

Right, but it may be in the pack but without a bitmap. We don't store a
bitmap for every commit. The idea was to save space, but select some key
commits that let us generally find a bitmap with a small amount of
walking. And most of the time it works fine. But in some cases, we seem
to find pathological cases where we do quite a lot of walking.

As I said earlier in the thread, I suspect our commit selection is not
great. It's mostly some heuristics we threw together in 2013, and I
don't think it was tested very thoroughly. So fixing that may be a
simpler approach.

What I was wondering here was whether we could get an easy fix based on
the same frontier ideas that the non-bitmap walk uses.

> I'm of the opinion that the old method was better. It followed a very clear
> three-step process:
> 
>  1. Compute the "haves" bitmap.
> 
>  2. Compute the "wants" bitmap, but don't explore into the "haves" bitmap.
> 
>  3. Subtract the "haves" bitmap from the "wants" (in case we added bits to
>     the wants before they were in the haves).

Right, this is _way_ simpler. But it necessarily means spending effort
to find the complete "haves", because we don't know which parts are
useful.

> But there is hope in your idea to improve some cases. As noted, we give up
> if all of the haves are not in the bitmapped pack. By starting with a
> commit walk, we can find the "commit frontier," which is the set of commits
> that are explored by paint_down_to_common(). In this case, the set {B, C, D}.
> 
> After walking commits and finding a set of UNINTERESTING commits, we could
> look for any UNINTERESTING commits that have bitmaps (or simply are in the
> bitmapped pack) and use those to see our "haves" bitmap. So, if B has a
> bitmap, but C is too new for the bitmap, then we can still create the haves
> based on B and then take a bitmap diff "D minus B".

But doing that commit walk to find the frontier negates part of the
purpose of using the bitmaps, which is avoiding even walking commits.
Going back to a variant of our example:

  A -- B -- C_1 -- .. -- C_1000
        \
	 D_1 -- .. -- D_1000

If we have a bitmap at C_1000 and D_1000, we don't have to walk any
commits at all. But finding the frontier requires walking 2000 commits.

Is there a way to find it with just bitmaps? You can certainly find the
intersection, but you don't have any idea which of the many shared
commits is the merge base. Of course in this example you don't actually
care about the frontier (you have the full answer immediately). But how
do you decide which way to optimize: try to avoid walking commits to
get a quick answer from bitmaps, or prefer to walk some commits to find
the frontier and avoid opening too many trees?

-Peff

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3] repack: enable bitmaps by default on bare repos
  2019-05-23 19:26                       ` Ævar Arnfjörð Bjarmason
@ 2019-05-24  7:27                         ` Jeff King
  2019-05-24  7:55                           ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 57+ messages in thread
From: Jeff King @ 2019-05-24  7:27 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Junio C Hamano, Eric Wong, git

On Thu, May 23, 2019 at 09:26:34PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > I spent a while thinking and experimenting with this tonight. The result
> > is the patch below. Ævar, do you still have a copy of the repo that
> > misbehaved? I'd be curious to see how it fares.
> 
> No, sorry. I think we could make an artificial test to emulate it, which
> would be something like:
> 
>  * ~1 million commits
>  * local clone setup to fetch all branches/tags (default 'git clone')
>  * local a bit ahead/behind
>  * Have old "main" pack with *.bitmap, bunch of other packs/loose objects without that
>  * push try to push the local change upstream (or to a topic branch)
> 
> I tried briefly to emulate this with git.git with:
> [...]
> But didn't get anywhere, probably because here my topics are all stuff I
> have already, and I just have 2x packs.

Yeah, I haven't been able to find a reproduction for this problem at
will. The bitmaps are _supposed_ to be sprinkled around through the
commit graph so that we don't have to walk far. But presumably in your
case that was not so.

I'm not sure what tickles the bitmap-writer to fail so hard. Is it
having too many refs? Weird patterns in the graph? Just a ton of
commits?

-Peff

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3] repack: enable bitmaps by default on bare repos
  2019-05-24  7:27                         ` Jeff King
@ 2019-05-24  7:55                           ` Ævar Arnfjörð Bjarmason
  2019-05-24  8:26                             ` Jeff King
  0 siblings, 1 reply; 57+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-05-24  7:55 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Eric Wong, git


On Fri, May 24 2019, Jeff King wrote:

> On Thu, May 23, 2019 at 09:26:34PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> > I spent a while thinking and experimenting with this tonight. The result
>> > is the patch below. Ævar, do you still have a copy of the repo that
>> > misbehaved? I'd be curious to see how it fares.
>>
>> No, sorry. I think we could make an artificial test to emulate it, which
>> would be something like:
>>
>>  * ~1 million commits
>>  * local clone setup to fetch all branches/tags (default 'git clone')
>>  * local a bit ahead/behind
>>  * Have old "main" pack with *.bitmap, bunch of other packs/loose objects without that
>>  * push try to push the local change upstream (or to a topic branch)
>>
>> I tried briefly to emulate this with git.git with:
>> [...]
>> But didn't get anywhere, probably because here my topics are all stuff I
>> have already, and I just have 2x packs.
>
> Yeah, I haven't been able to find a reproduction for this problem at
> will. The bitmaps are _supposed_ to be sprinkled around through the
> commit graph so that we don't have to walk far. But presumably in your
> case that was not so.
>
> I'm not sure what tickles the bitmap-writer to fail so hard. Is it
> having too many refs? Weird patterns in the graph? Just a ton of
> commits?

Ah, why did only this ancient (big) pack have a bitmap?

The bitmap writer had never failed, this was just a repository where
some automation (on a dev/staging box) cloned a repo, and someone had
once run a manual "repack" to make make a pack with a bitmap.

Then as time passed that pack stayed around, and re-looking at this that
could have only happened because I had gc.bigPackThreshold turned on.

I.e. without that we'd have eventually done a full repack, so the bitmap
would have gone away.

So getting the repo into that state was a series of unlikely events.

I think to the extent that this is an issue we can reproduce in the
future the proper fix for it in lieu of some easy fix in the bitmap code
would be to just teach "gc" to unlink old *.bitmap files if we detect
they're too stale.

I.e. we don't need to deal gracefully with some case where the bitmaps
just cover some tiny part of the graph, we can just teach "gc" to either
update them, or (if we're not currently writing them) unlink them.

That seems to me to be a good idea in general, not just with bitmaps but
also the commit graph. If we're doing a GC and our current settings
aren't such that we'd update those files, shouldn't we just unlink them?

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3] repack: enable bitmaps by default on bare repos
  2019-05-24  7:55                           ` Ævar Arnfjörð Bjarmason
@ 2019-05-24  8:26                             ` Jeff King
  2019-05-24  9:01                               ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 57+ messages in thread
From: Jeff King @ 2019-05-24  8:26 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Derrick Stolee, Junio C Hamano, Eric Wong, git

On Fri, May 24, 2019 at 09:55:05AM +0200, Ævar Arnfjörð Bjarmason wrote:

> > I'm not sure what tickles the bitmap-writer to fail so hard. Is it
> > having too many refs? Weird patterns in the graph? Just a ton of
> > commits?
> 
> Ah, why did only this ancient (big) pack have a bitmap?
> 
> The bitmap writer had never failed, this was just a repository where
> some automation (on a dev/staging box) cloned a repo, and someone had
> once run a manual "repack" to make make a pack with a bitmap.

Just to be clear, by "fail" I didn't mean that the writer failed to
produce. I just meant it had poor commit selection for its bitmap
coverage. But...

> Then as time passed that pack stayed around, and re-looking at this that
> could have only happened because I had gc.bigPackThreshold turned on.
> 
> I.e. without that we'd have eventually done a full repack, so the bitmap
> would have gone away.
> 
> So getting the repo into that state was a series of unlikely events.

Ah, now that's interesting. The issue may have just been that we had a
ton of un-bitmapped commits because they weren't in the bitmapped pack
at all. The logic that Stolee pointed out earlier:

          /*
           * if we have a HAVES list, but none of those haves is contained
           * in the packfile that has a bitmap, we don't have anything to
           * optimize here
           */
          if (haves && !in_bitmapped_pack(bitmap_git, haves))
                  goto cleanup;

is pretty feeble. If you have even _one_ UNINTERESTING tip that's in the
pack, we'll try to use bitmaps. And in your case, you almost certainly
had old tags on both the client and the server there were in the old
bitmapped pack, but then a huge swath of history that had happened since
then. And it was that newer part of the graph that we had to walk to
fill in the bitmap.

So all of this makes pretty good sense to me now. Bitmaps work
incrementally as you add new, un-bitmapped history. But the cost gets
higher and higher the longer you go before repacking and generating new
bitmaps. So your case was very similar to a repo that uses bitmaps but
just hadn't packed in a really long time.

> I think to the extent that this is an issue we can reproduce in the
> future the proper fix for it in lieu of some easy fix in the bitmap code
> would be to just teach "gc" to unlink old *.bitmap files if we detect
> they're too stale.

Yeah. This could happen if you simply accumulated history without ever
running "gc". But in general we can probably assume that "gc" will run
periodically (though there is a real blind spot if you push up a very
huge chunk of history in one pack, since gc counts packs, not objects).

I agree that if gc is deciding to leave a big pack, it should probably
ditch the bitmaps for it.

> I.e. we don't need to deal gracefully with some case where the bitmaps
> just cover some tiny part of the graph, we can just teach "gc" to either
> update them, or (if we're not currently writing them) unlink them.

Unfortunately I don't think we can update them, because all of the
reachable objects need to be in the same pack. So any scheme that isn't
doing a periodic all-into-one repack probably shouldn't be using
bitmaps. I wonder if we need to tweak Eric's bitmaps-by-default logic to
better account for this.

> That seems to me to be a good idea in general, not just with bitmaps but
> also the commit graph. If we're doing a GC and our current settings
> aren't such that we'd update those files, shouldn't we just unlink them?

I don't think the commit graph would suffer from this. It's not tied to
a specific pack, so it can be freely updated on any gc. You still have
the problem when gc does not run. It's possible that auto-gc should have
separate thresholds for different activities (e.g., number of packs
should tell us when to repack objects, number of loose refs should tell
us when to pack refs, number of un-annotated commits should tell us when
to update the commit graph). The trouble is that some of those checks
are non-trivial.

In the long run, I think the plan is for the commit graph to allow cheap
incremental updating, which may make it reasonable to just update it
preemptively after every fetch/push.

-Peff

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3] repack: enable bitmaps by default on bare repos
  2019-05-24  8:26                             ` Jeff King
@ 2019-05-24  9:01                               ` Ævar Arnfjörð Bjarmason
  2019-05-24  9:29                                 ` SZEDER Gábor
  0 siblings, 1 reply; 57+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-05-24  9:01 UTC (permalink / raw)
  To: Jeff King; +Cc: Derrick Stolee, Junio C Hamano, Eric Wong, git


On Fri, May 24 2019, Jeff King wrote:

> On Fri, May 24, 2019 at 09:55:05AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> > I'm not sure what tickles the bitmap-writer to fail so hard. Is it
>> > having too many refs? Weird patterns in the graph? Just a ton of
>> > commits?
>>
>> Ah, why did only this ancient (big) pack have a bitmap?
>>
>> The bitmap writer had never failed, this was just a repository where
>> some automation (on a dev/staging box) cloned a repo, and someone had
>> once run a manual "repack" to make make a pack with a bitmap.
>
> Just to be clear, by "fail" I didn't mean that the writer failed to
> produce. I just meant it had poor commit selection for its bitmap
> coverage. But...
>
>> Then as time passed that pack stayed around, and re-looking at this that
>> could have only happened because I had gc.bigPackThreshold turned on.
>>
>> I.e. without that we'd have eventually done a full repack, so the bitmap
>> would have gone away.
>>
>> So getting the repo into that state was a series of unlikely events.
>
> Ah, now that's interesting. The issue may have just been that we had a
> ton of un-bitmapped commits because they weren't in the bitmapped pack
> at all. The logic that Stolee pointed out earlier:
>
>           /*
>            * if we have a HAVES list, but none of those haves is contained
>            * in the packfile that has a bitmap, we don't have anything to
>            * optimize here
>            */
>           if (haves && !in_bitmapped_pack(bitmap_git, haves))
>                   goto cleanup;
>
> is pretty feeble. If you have even _one_ UNINTERESTING tip that's in the
> pack, we'll try to use bitmaps. And in your case, you almost certainly
> had old tags on both the client and the server there were in the old
> bitmapped pack, but then a huge swath of history that had happened since
> then. And it was that newer part of the graph that we had to walk to
> fill in the bitmap.
>
> So all of this makes pretty good sense to me now. Bitmaps work
> incrementally as you add new, un-bitmapped history. But the cost gets
> higher and higher the longer you go before repacking and generating new
> bitmaps. So your case was very similar to a repo that uses bitmaps but
> just hadn't packed in a really long time.
>
>> I think to the extent that this is an issue we can reproduce in the
>> future the proper fix for it in lieu of some easy fix in the bitmap code
>> would be to just teach "gc" to unlink old *.bitmap files if we detect
>> they're too stale.
>
> Yeah. This could happen if you simply accumulated history without ever
> running "gc". But in general we can probably assume that "gc" will run
> periodically (though there is a real blind spot if you push up a very
> huge chunk of history in one pack, since gc counts packs, not objects).
>
> I agree that if gc is deciding to leave a big pack, it should probably
> ditch the bitmaps for it.
>
>> I.e. we don't need to deal gracefully with some case where the bitmaps
>> just cover some tiny part of the graph, we can just teach "gc" to either
>> update them, or (if we're not currently writing them) unlink them.
>
> Unfortunately I don't think we can update them, because all of the
> reachable objects need to be in the same pack. So any scheme that isn't
> doing a periodic all-into-one repack probably shouldn't be using
> bitmaps. I wonder if we need to tweak Eric's bitmaps-by-default logic to
> better account for this.

I mean either we'd update them via repacking, or have some heuristic to
do away with them.

>> That seems to me to be a good idea in general, not just with bitmaps but
>> also the commit graph. If we're doing a GC and our current settings
>> aren't such that we'd update those files, shouldn't we just unlink them?
>
> I don't think the commit graph would suffer from this. It's not tied to
> a specific pack, so it can be freely updated on any gc. You still have
> the problem when gc does not run. It's possible that auto-gc should have
> separate thresholds for different activities (e.g., number of packs
> should tell us when to repack objects, number of loose refs should tell
> us when to pack refs, number of un-annotated commits should tell us when
> to update the commit graph). The trouble is that some of those checks
> are non-trivial.
>
> In the long run, I think the plan is for the commit graph to allow cheap
> incremental updating, which may make it reasonable to just update it
> preemptively after every fetch/push.

I don't think it's a performance problem to have an old commit-graph
lying around. But if you turn on the commit-graph, run gc a bunch, then
turn it off in config we'll have it lying around forever, even if you do
subsequent gc's.

So I think we should delete such things on the general principle that
the end-state of a gc's shouldn't be the accumulation of the values of
past configuration options if we can help it.

Maybe that screws over other users who did a "commit-graph write"
without setting gc.writeCommitGraph, but I think the only sane thing to
do is to make "gc" fully 'own' such things if its turned on at all. We
don't worry e.g. that we can't repack some recent pack because a user
might have just manually produced it with handcrafted love seconds
earlier.

Some of what you bring up is "let's do incremental GC". I agree, but
think that can be considered separately from what GC should do *when* it
runs, whether that's a "full" or "partial" run.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3] repack: enable bitmaps by default on bare repos
  2019-05-24  9:01                               ` Ævar Arnfjörð Bjarmason
@ 2019-05-24  9:29                                 ` SZEDER Gábor
  2019-05-24 11:17                                   ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 57+ messages in thread
From: SZEDER Gábor @ 2019-05-24  9:29 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Jeff King, Derrick Stolee, Junio C Hamano, Eric Wong, git

On Fri, May 24, 2019 at 11:01:39AM +0200, Ævar Arnfjörð Bjarmason wrote:
> I don't think it's a performance problem to have an old commit-graph
> lying around. But if you turn on the commit-graph, run gc a bunch, then
> turn it off in config we'll have it lying around forever, even if you do
> subsequent gc's.
> 
> So I think we should delete such things on the general principle that
> the end-state of a gc's shouldn't be the accumulation of the values of
> past configuration options if we can help it.
> 
> Maybe that screws over other users who did a "commit-graph write"
> without setting gc.writeCommitGraph, but I think the only sane thing to
> do is to make "gc" fully 'own' such things if its turned on at all.

Note that there is 'core.commitGraph' as well; as long as it's
enabled, no commit-graph files should be deleted.


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3] repack: enable bitmaps by default on bare repos
  2019-05-24  7:24                         ` Jeff King
@ 2019-05-24 10:33                           ` Derrick Stolee
  0 siblings, 0 replies; 57+ messages in thread
From: Derrick Stolee @ 2019-05-24 10:33 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano, Eric Wong, git

On 5/24/2019 3:24 AM, Jeff King wrote:
> On Thu, May 23, 2019 at 08:53:56AM -0400, Derrick Stolee wrote:
> 
>>> I spent a while thinking and experimenting with this tonight. The result
>>> is the patch below. Ævar, do you still have a copy of the repo that
>>> misbehaved? I'd be curious to see how it fares.
>>
>> This patch caught my attention, and I think I understand some of the issues
>> at hand. I'm not well-versed specifically in Git's bitmap implementation.
>> The fundamental ideas are there, but my perspective is biased by my own
>> independent bitmap implementation for Azure Repos. What worked there may not
>> work at all here.
> 
> Thanks for looking at this. There are a lot of short-comings in the
> current bitmap implementation, so if there's a better way to do things,
> I'm not opposed to moving to a new format. :)
> 
>>> Finding the right trees to explore is a little tricky with bitmaps.  In
>>> a normal traversal, we consider the "edges" to be worth exploring.
>>> I.e., the places where an UNINTERESTING commit is the parent of an
>>> interesting one.
>>
>> This is the "commit frontier" which I bring up again below.
> 
> Right. I actually had trouble coming up with a succinct way of
> describing this, and stole the definition from your recent blog post. ;)
> 
>>> But with bitmaps, we don't have that information in the same way,
>>> because we're trying to avoid walking the commits in the first place. So
>>> e.g., in a history like this:
>>>
>>>   A--B--C
>>>       \
>>>        D
>>>
>>> Let's imagine we're computing "C..D", and that D has a bitmap on disk
>>> but C does not.
>>
>> (As I read your discussion below, I'm confused. For "C..D", C is a have
>> and D is a want. We should explore all the haves _first_, then walk the
>> wants, right?)
> 
> I think I may have confused things by starting my description with a
> hypothetical combined want/have walk. To take a step back: the problem
> we were discussing is that we spend a lot of time reading trees to fill
> in the "have" bitmap, even though most of those objects are unlikely to
> be in the "want" in the first place (only the frontier trees are really
> useful).

Thank you for resolving my confusion.

[snip]

> As I said earlier in the thread, I suspect our commit selection is not
> great. It's mostly some heuristics we threw together in 2013, and I
> don't think it was tested very thoroughly. So fixing that may be a
> simpler approach.

It's a hard problem! There are no _sure_ answers here, and what works in
some cases will probably not work in other cases.
 
> What I was wondering here was whether we could get an easy fix based on
> the same frontier ideas that the non-bitmap walk uses.

[snip]
 
> But doing that commit walk to find the frontier negates part of the
> purpose of using the bitmaps, which is avoiding even walking commits.
> Going back to a variant of our example:
> 
>   A -- B -- C_1 -- .. -- C_1000
>         \
> 	 D_1 -- .. -- D_1000
> 
> If we have a bitmap at C_1000 and D_1000, we don't have to walk any
> commits at all. But finding the frontier requires walking 2000 commits.

In my opinion, walking commits is easy (easier with the commit-graph)
and walking trees is hard. We probably _should_ do more commit walking if
it helps avoid walking some trees.
 
> Is there a way to find it with just bitmaps? You can certainly find the
> intersection, but you don't have any idea which of the many shared
> commits is the merge base. Of course in this example you don't actually
> care about the frontier (you have the full answer immediately). But how
> do you decide which way to optimize: try to avoid walking commits to
> get a quick answer from bitmaps, or prefer to walk some commits to find
> the frontier and avoid opening too many trees?

With a new perspective on the problem, I think perhaps trying to solve this
problem with bitmaps is incorrect. If you want to use bitmaps for C..D and
you don't have any bitmaps in the range D..C, then maybe we should just use
the old-fashioned method of walking trees? In your examples above, the
new method would walk trees for the commits in {B, C_i, D_j} while the
bitmap method would walk trees for the commits in {B, C_i, A_k}. I would
expect the list of {A_k} commits to be the largest in most cases.

The approach here would be to do the commit frontier walk, and check for
commits with bitmaps along the way. If we don't find an UNINTERESTING
bitmap, then use the non-bitmap way instead. Perhaps worth a shot.

I'll bring up this code snippet again:

        /*
         * if we have a HAVES list, but none of those haves is contained
         * in the packfile that has a bitmap, we don't have anything to
         * optimize here
         */
        if (haves && !in_bitmapped_pack(bitmap_git, haves))
                goto cleanup;

In addition to this, we can fill the "haves" set with the commits
in D..C (with boundary) and then check if any of those commits have a
precomputed bitmap. If not, goto cleanup.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3] repack: enable bitmaps by default on bare repos
  2019-05-24  9:29                                 ` SZEDER Gábor
@ 2019-05-24 11:17                                   ` Ævar Arnfjörð Bjarmason
  2019-05-24 11:41                                     ` SZEDER Gábor
  0 siblings, 1 reply; 57+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-05-24 11:17 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Jeff King, Derrick Stolee, Junio C Hamano, Eric Wong, git


On Fri, May 24 2019, SZEDER Gábor wrote:

> On Fri, May 24, 2019 at 11:01:39AM +0200, Ævar Arnfjörð Bjarmason wrote:
>> I don't think it's a performance problem to have an old commit-graph
>> lying around. But if you turn on the commit-graph, run gc a bunch, then
>> turn it off in config we'll have it lying around forever, even if you do
>> subsequent gc's.
>>
>> So I think we should delete such things on the general principle that
>> the end-state of a gc's shouldn't be the accumulation of the values of
>> past configuration options if we can help it.
>>
>> Maybe that screws over other users who did a "commit-graph write"
>> without setting gc.writeCommitGraph, but I think the only sane thing to
>> do is to make "gc" fully 'own' such things if its turned on at all.
>
> Note that there is 'core.commitGraph' as well; as long as it's
> enabled, no commit-graph files should be deleted.

Why? If we won't update it or write it if it's not there, why keep it
around?

It means the commit-graph code and anything else (like bitmaps) needs to
deal with stale data for the common and default gc --auto case.

You also can't have e.g. a global core.commitGraph=true config along
with a per-repo gc.writeCommitGraph=true config do what you expect.

Now just because you wanted to write it for some you'll end up keeping
it around forever because you'd also want to optimistically always use
it if it's there.

Note that I'm talking about the *default* gc semantics, they don't have
to cover all advanced use-cases, just be good enough for most, and it's
also important that they're as simple as possible, and don't result in
stuff like "my performance sucks because I turned this config option on
once a year ago for 2 days".

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH] pack-bitmap: look for an uninteresting bitmap
  2019-05-23 11:30                     ` Jeff King
  2019-05-23 12:53                       ` Derrick Stolee
  2019-05-23 19:26                       ` Ævar Arnfjörð Bjarmason
@ 2019-05-24 11:31                       ` Derrick Stolee
  2 siblings, 0 replies; 57+ messages in thread
From: Derrick Stolee @ 2019-05-24 11:31 UTC (permalink / raw)
  To: git; +Cc: peff, avarab, e, Derrick Stolee

If we try to do a range query such as C..D with topology as

 A_0 - ... - A_10000 - B - C_1 - ... - C_1000 - C
                         \
                           D_1 - ... - D_1000 - D

and none of the commits in {A_i, B, C_j, C} have a computed
bitmap, then we will very likely walk many many trees before
computing one for the "have" bitmap.

Instead, perform a commit walk to the boundary of C...D and
look for computed bitmaps in { B, C_j, C }. If any are found,
then it is worth starting from there and building a bitmap.
If not, revert to the old method of walking trees.

NOTE: this is only a proof-of-concept, as it fails a test in
t5310-pack-bitmaps.sh (clearly marked as a failure now).

Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---

On 5/23/2019 7:30 AM, Jeff King wrote:> +	/*
> +	 * First traverse the relevant commits as we would for a normal
> +	 * traversal.
> +	 */
> +	while (commits.nr) {
> +		struct commit *commit = prio_queue_get(&commits);
> +		struct bitmap **dst_bitmap;

I was looking at this code again, and noticed this while() condition.

Shouldn't this use queue_has_nonstale() like in paint_down_to_common()?

Looking at the body of the loop, I don't see a way for the loop to stop
without it walking the entire history of C _and_ D.

Based on that, I wrote the patch below as an experiment. The
has_uninteresting_bitmap_in_frontier() shamelessly steals code from
paint_down_to_common(). Note the failing test, but perhaps there is
something salvageable from this.

Thanks,
-Stolee


 pack-bitmap.c           | 92 ++++++++++++++++++++++++++++++++++++++++-
 t/t5310-pack-bitmaps.sh |  2 +-
 2 files changed, 91 insertions(+), 3 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 6069b2fe55..1f4683663e 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -12,6 +12,7 @@
 #include "packfile.h"
 #include "repository.h"
 #include "object-store.h"
+#include "prio-queue.h"
 
 /*
  * An entry on the bitmap index, representing the bitmap for a given
@@ -679,6 +680,81 @@ static int in_bitmapped_pack(struct bitmap_index *bitmap_git,
 	return 0;
 }
 
+#define PARENT1         (1u<<16)
+#define PARENT2         (1u<<17)
+#define STALE           (1u<<18)
+
+static const int all_flags = { PARENT1 | PARENT2 | STALE };
+
+static int queue_has_nonstale(struct prio_queue *queue)
+{
+	int i;
+	for (i = 0; i < queue->nr; i++) {
+		struct commit *commit = queue->array[i].data;
+		if (!(commit->object.flags & STALE))
+			return 1;
+	}
+	return 0;
+}
+
+static int has_uninteresting_bitmap_in_frontier(struct repository *r,
+						struct commit_list *list,
+						struct bitmap_index *bitmap_git)
+{
+	int res = 0;
+	struct commit_list *iter = list;
+	struct prio_queue queue = { compare_commits_by_commit_date };
+
+	while (iter) {
+		prio_queue_put(&queue, iter->item);
+		iter = iter->next;
+	}
+
+	while (queue_has_nonstale(&queue)) {
+		struct commit *commit = prio_queue_get(&queue);
+		struct commit_list *parents;
+		int flags;
+
+		flags = commit->object.flags & (PARENT1 | PARENT2 | STALE);
+		if (flags == (PARENT1 | PARENT2)) {
+			/* Mark parents of a found merge stale */
+			flags |= STALE;
+		}
+
+		if (flags & PARENT1) {
+			khiter_t pos = kh_get_oid_map(bitmap_git->bitmaps, commit->object.oid);
+
+			if (pos < kh_end(bitmap_git->bitmaps)) {
+				res = 1;
+				goto cleanup;
+			}
+		}
+
+		parents = commit->parents;
+		while (parents) {
+			struct commit *p = parents->item;
+			parents = parents->next;
+			if ((p->object.flags & flags) == flags)
+				continue;
+			if (repo_parse_commit(r, p))
+				goto cleanup;
+			p->object.flags |= flags;
+			prio_queue_put(&queue, p);
+		}
+	}
+
+cleanup:
+	clear_prio_queue(&queue);
+
+	iter = list;
+	while (iter) {
+		clear_commit_marks(iter->item, all_flags);
+		iter = iter->next;
+	}
+
+	return res;
+}
+
 struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)
 {
 	unsigned int i;
@@ -689,6 +765,8 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)
 	struct bitmap *wants_bitmap = NULL;
 	struct bitmap *haves_bitmap = NULL;
 
+	struct commit_list *commits = NULL;
+
 	struct bitmap_index *bitmap_git = xcalloc(1, sizeof(*bitmap_git));
 	/* try to open a bitmapped pack, but don't parse it yet
 	 * because we may not need to use it */
@@ -704,16 +782,22 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)
 		while (object->type == OBJ_TAG) {
 			struct tag *tag = (struct tag *) object;
 
-			if (object->flags & UNINTERESTING)
+			if (object->flags & UNINTERESTING) {
+				object->flags |= PARENT1;
 				object_list_insert(object, &haves);
-			else
+			} else {
+				object->flags |= PARENT2;
 				object_list_insert(object, &wants);
+			}
 
 			if (!tag->tagged)
 				die("bad tag");
 			object = parse_object_or_die(&tag->tagged->oid, NULL);
 		}
 
+		if (object->type == OBJ_COMMIT)
+			commit_list_insert((struct commit *)object, &commits);
+
 		if (object->flags & UNINTERESTING)
 			object_list_insert(object, &haves);
 		else
@@ -740,6 +824,10 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)
 	if (load_pack_bitmap(bitmap_git) < 0)
 		goto cleanup;
 
+	if (!has_uninteresting_bitmap_in_frontier(the_repository, commits, bitmap_git))
+		goto cleanup;
+
+	/* this is the real no-turning-back point! */
 	object_array_clear(&revs->pending);
 
 	if (haves) {
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index a26c8ba9a2..615608fbbf 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -422,7 +422,7 @@ test_expect_success 'fetch without bitmaps ignores delta against old base' '
 '
 
 # And do the same for the bitmap case, where we do expect to find the delta.
-test_expect_success 'fetch with bitmaps can reuse old base' '
+test_expect_failure 'fetch with bitmaps can reuse old base' '
 	test_config pack.usebitmaps true &&
 	test_when_finished "rm -rf client.git" &&
 	git init --bare client.git &&
-- 
2.22.0.rc0


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH v3] repack: enable bitmaps by default on bare repos
  2019-05-24 11:17                                   ` Ævar Arnfjörð Bjarmason
@ 2019-05-24 11:41                                     ` SZEDER Gábor
  2019-05-24 11:58                                       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 57+ messages in thread
From: SZEDER Gábor @ 2019-05-24 11:41 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Jeff King, Derrick Stolee, Junio C Hamano, Eric Wong, git

On Fri, May 24, 2019 at 01:17:06PM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> On Fri, May 24 2019, SZEDER Gábor wrote:
> 
> > On Fri, May 24, 2019 at 11:01:39AM +0200, Ævar Arnfjörð Bjarmason wrote:
> >> I don't think it's a performance problem to have an old commit-graph
> >> lying around. But if you turn on the commit-graph, run gc a bunch, then
> >> turn it off in config we'll have it lying around forever, even if you do
> >> subsequent gc's.
> >>
> >> So I think we should delete such things on the general principle that
> >> the end-state of a gc's shouldn't be the accumulation of the values of
> >> past configuration options if we can help it.
> >>
> >> Maybe that screws over other users who did a "commit-graph write"
> >> without setting gc.writeCommitGraph, but I think the only sane thing to
> >> do is to make "gc" fully 'own' such things if its turned on at all.
> >
> > Note that there is 'core.commitGraph' as well; as long as it's
> > enabled, no commit-graph files should be deleted.
> 
> Why? If we won't update it or write it if it's not there, why keep it
> around?

To read it, if 'core.commitGraph' says that is should be read.

> It means the commit-graph code and anything else (like bitmaps) needs to
> deal with stale data for the common and default gc --auto case.
> 
> You also can't have e.g. a global core.commitGraph=true config along
> with a per-repo gc.writeCommitGraph=true config do what you expect.
> 
> Now just because you wanted to write it for some you'll end up keeping
> it around forever because you'd also want to optimistically always use
> it if it's there.

This is exactly what I expect it to do.


> Note that I'm talking about the *default* gc semantics, they don't have
> to cover all advanced use-cases, just be good enough for most, and it's
> also important that they're as simple as possible, and don't result in
> stuff like "my performance sucks because I turned this config option on
> once a year ago for 2 days".

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3] repack: enable bitmaps by default on bare repos
  2019-05-24 11:41                                     ` SZEDER Gábor
@ 2019-05-24 11:58                                       ` Ævar Arnfjörð Bjarmason
  2019-05-24 12:34                                         ` SZEDER Gábor
  0 siblings, 1 reply; 57+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-05-24 11:58 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Jeff King, Derrick Stolee, Junio C Hamano, Eric Wong, git


On Fri, May 24 2019, SZEDER Gábor wrote:

> On Fri, May 24, 2019 at 01:17:06PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Fri, May 24 2019, SZEDER Gábor wrote:
>>
>> > On Fri, May 24, 2019 at 11:01:39AM +0200, Ævar Arnfjörð Bjarmason wrote:
>> >> I don't think it's a performance problem to have an old commit-graph
>> >> lying around. But if you turn on the commit-graph, run gc a bunch, then
>> >> turn it off in config we'll have it lying around forever, even if you do
>> >> subsequent gc's.
>> >>
>> >> So I think we should delete such things on the general principle that
>> >> the end-state of a gc's shouldn't be the accumulation of the values of
>> >> past configuration options if we can help it.
>> >>
>> >> Maybe that screws over other users who did a "commit-graph write"
>> >> without setting gc.writeCommitGraph, but I think the only sane thing to
>> >> do is to make "gc" fully 'own' such things if its turned on at all.
>> >
>> > Note that there is 'core.commitGraph' as well; as long as it's
>> > enabled, no commit-graph files should be deleted.
>>
>> Why? If we won't update it or write it if it's not there, why keep it
>> around?
>
> To read it, if 'core.commitGraph' says that is should be read.
>
>> It means the commit-graph code and anything else (like bitmaps) needs to
>> deal with stale data for the common and default gc --auto case.
>>
>> You also can't have e.g. a global core.commitGraph=true config along
>> with a per-repo gc.writeCommitGraph=true config do what you expect.
>>
>> Now just because you wanted to write it for some you'll end up keeping
>> it around forever because you'd also want to optimistically always use
>> it if it's there.
>
> This is exactly what I expect it to do.

Do you also expect base packs with an associated bitmap to have an
implicit *.keep flag under gc with pack.writeBitmaps=false and
pack.useBitmaps=true?

>> Note that I'm talking about the *default* gc semantics, they don't have
>> to cover all advanced use-cases, just be good enough for most, and it's
>> also important that they're as simple as possible, and don't result in
>> stuff like "my performance sucks because I turned this config option on
>> once a year ago for 2 days".

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3] repack: enable bitmaps by default on bare repos
  2019-05-24 11:58                                       ` Ævar Arnfjörð Bjarmason
@ 2019-05-24 12:34                                         ` SZEDER Gábor
  2019-05-24 13:41                                           ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 57+ messages in thread
From: SZEDER Gábor @ 2019-05-24 12:34 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Jeff King, Derrick Stolee, Junio C Hamano, Eric Wong, git

On Fri, May 24, 2019 at 01:58:03PM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> On Fri, May 24 2019, SZEDER Gábor wrote:
> 
> > On Fri, May 24, 2019 at 01:17:06PM +0200, Ævar Arnfjörð Bjarmason wrote:
> >>
> >> On Fri, May 24 2019, SZEDER Gábor wrote:
> >>
> >> > On Fri, May 24, 2019 at 11:01:39AM +0200, Ævar Arnfjörð Bjarmason wrote:
> >> >> I don't think it's a performance problem to have an old commit-graph
> >> >> lying around. But if you turn on the commit-graph, run gc a bunch, then
> >> >> turn it off in config we'll have it lying around forever, even if you do
> >> >> subsequent gc's.
> >> >>
> >> >> So I think we should delete such things on the general principle that
> >> >> the end-state of a gc's shouldn't be the accumulation of the values of
> >> >> past configuration options if we can help it.
> >> >>
> >> >> Maybe that screws over other users who did a "commit-graph write"
> >> >> without setting gc.writeCommitGraph, but I think the only sane thing to
> >> >> do is to make "gc" fully 'own' such things if its turned on at all.
> >> >
> >> > Note that there is 'core.commitGraph' as well; as long as it's
> >> > enabled, no commit-graph files should be deleted.
> >>
> >> Why? If we won't update it or write it if it's not there, why keep it
> >> around?
> >
> > To read it, if 'core.commitGraph' says that is should be read.
> >
> >> It means the commit-graph code and anything else (like bitmaps) needs to
> >> deal with stale data for the common and default gc --auto case.
> >>
> >> You also can't have e.g. a global core.commitGraph=true config along
> >> with a per-repo gc.writeCommitGraph=true config do what you expect.
> >>
> >> Now just because you wanted to write it for some you'll end up keeping
> >> it around forever because you'd also want to optimistically always use
> >> it if it's there.
> >
> > This is exactly what I expect it to do.
> 
> Do you also expect base packs with an associated bitmap to have an
> implicit *.keep flag under gc with pack.writeBitmaps=false and
> pack.useBitmaps=true?

I don't understand what an "implicit *.keep flag" is.  However, since
a reachability bitmap is always associated with a pack, but the
commit-graph is not, I don't think this is a valid comparison.

> >> Note that I'm talking about the *default* gc semantics, they don't have
> >> to cover all advanced use-cases, just be good enough for most, and it's
> >> also important that they're as simple as possible, and don't result in
> >> stuff like "my performance sucks because I turned this config option on
> >> once a year ago for 2 days".

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3] repack: enable bitmaps by default on bare repos
  2019-05-24 12:34                                         ` SZEDER Gábor
@ 2019-05-24 13:41                                           ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 57+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-05-24 13:41 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Jeff King, Derrick Stolee, Junio C Hamano, Eric Wong, git


On Fri, May 24 2019, SZEDER Gábor wrote:

> On Fri, May 24, 2019 at 01:58:03PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Fri, May 24 2019, SZEDER Gábor wrote:
>>
>> > On Fri, May 24, 2019 at 01:17:06PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> >>
>> >> On Fri, May 24 2019, SZEDER Gábor wrote:
>> >>
>> >> > On Fri, May 24, 2019 at 11:01:39AM +0200, Ævar Arnfjörð Bjarmason wrote:
>> >> >> I don't think it's a performance problem to have an old commit-graph
>> >> >> lying around. But if you turn on the commit-graph, run gc a bunch, then
>> >> >> turn it off in config we'll have it lying around forever, even if you do
>> >> >> subsequent gc's.
>> >> >>
>> >> >> So I think we should delete such things on the general principle that
>> >> >> the end-state of a gc's shouldn't be the accumulation of the values of
>> >> >> past configuration options if we can help it.
>> >> >>
>> >> >> Maybe that screws over other users who did a "commit-graph write"
>> >> >> without setting gc.writeCommitGraph, but I think the only sane thing to
>> >> >> do is to make "gc" fully 'own' such things if its turned on at all.
>> >> >
>> >> > Note that there is 'core.commitGraph' as well; as long as it's
>> >> > enabled, no commit-graph files should be deleted.
>> >>
>> >> Why? If we won't update it or write it if it's not there, why keep it
>> >> around?
>> >
>> > To read it, if 'core.commitGraph' says that is should be read.
>> >
>> >> It means the commit-graph code and anything else (like bitmaps) needs to
>> >> deal with stale data for the common and default gc --auto case.
>> >>
>> >> You also can't have e.g. a global core.commitGraph=true config along
>> >> with a per-repo gc.writeCommitGraph=true config do what you expect.
>> >>
>> >> Now just because you wanted to write it for some you'll end up keeping
>> >> it around forever because you'd also want to optimistically always use
>> >> it if it's there.
>> >
>> > This is exactly what I expect it to do.
>>
>> Do you also expect base packs with an associated bitmap to have an
>> implicit *.keep flag under gc with pack.writeBitmaps=false and
>> pack.useBitmaps=true?
>
> I don't understand what an "implicit *.keep flag" is[...]

A .keep means we keep the pack, and e.g. gc.bigPackThreshold is
effectively an implicit *.keep flag on a pack matching some critera,
which in this case caused this issue of a stale *.bitmap file (since the
pack wasn't touched, neither was the bitmap).

> [...]However, since a reachability bitmap is always associated with a
> pack, but the commit-graph is not, I don't think this is a valid
> comparison.

I don't either, I'm just trying to understand where you're coming from,
and I still don't.

That bitmaps are associated with specific packs and the commit graph
isn't is an internal implementation detail. Users who care about that
distinction either don't use "git gc" or would be willing to tweak its
settings.

For the rest I think removing existing side-indexes is a good default
for the practical reasons mentioned upthread.

>> >> Note that I'm talking about the *default* gc semantics, they don't have
>> >> to cover all advanced use-cases, just be good enough for most, and it's
>> >> also important that they're as simple as possible, and don't result in
>> >> stuff like "my performance sucks because I turned this config option on
>> >> once a year ago for 2 days".

^ permalink raw reply	[flat|nested] 57+ messages in thread

end of thread, other threads:[~2019-05-24 13:41 UTC | newest]

Thread overview: 57+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-14  4:31 [PATCH 0/3] some prune optimizations Jeff King
2019-02-14  4:35 ` [PATCH 1/3] prune: lazily perform reachability traversal Jeff King
2019-02-14 10:54   ` Eric Sunshine
2019-02-14 11:07     ` Jeff King
2019-02-14  4:37 ` [PATCH 2/3] prune: use bitmaps for " Jeff King
2019-03-09  2:49   ` bitmaps by default? [was: prune: use bitmaps for reachability traversal] Eric Wong
2019-03-10 23:39     ` Jeff King
2019-03-12  3:13       ` [PATCH] repack: enable bitmaps by default on bare repos Eric Wong
2019-03-12  9:07         ` Ævar Arnfjörð Bjarmason
2019-03-12 10:49         ` Jeff King
2019-03-12 12:05           ` Jeff King
2019-03-13  1:51           ` Eric Wong
2019-03-13 14:54             ` Jeff King
2019-03-14  9:12               ` [PATCH v3] " Eric Wong
2019-03-14 16:02                 ` Jeff King
2019-03-15  6:21                   ` [PATCH 0/2] enable bitmap hash-cache by default Jeff King
2019-03-15  6:22                     ` [PATCH 1/2] t5310: correctly remove bitmaps for jgit test Jeff King
2019-03-15 13:25                       ` SZEDER Gábor
2019-03-15 18:36                         ` Jeff King
2019-03-15  6:25                     ` [PATCH 2/2] pack-objects: default to writing bitmap hash-cache Jeff King
2019-04-09 15:10                 ` [PATCH v3] repack: enable bitmaps by default on bare repos Ævar Arnfjörð Bjarmason
2019-04-10 22:57                   ` Jeff King
2019-04-25  7:16                     ` Junio C Hamano
2019-05-04  1:37                       ` Jeff King
2019-05-04  6:52                         ` Ævar Arnfjörð Bjarmason
2019-05-04 13:23                           ` SZEDER Gábor
2019-05-08 20:17                             ` Ævar Arnfjörð Bjarmason
2019-05-09  4:24                               ` Junio C Hamano
2019-05-07  7:45                           ` Jeff King
2019-05-07  8:12                             ` Ævar Arnfjörð Bjarmason
2019-05-08  7:11                               ` Jeff King
2019-05-08 14:20                                 ` Derrick Stolee
2019-05-08 16:13                                 ` Ævar Arnfjörð Bjarmason
2019-05-08 22:25                                   ` Jeff King
2019-05-23 11:30                     ` Jeff King
2019-05-23 12:53                       ` Derrick Stolee
2019-05-24  7:24                         ` Jeff King
2019-05-24 10:33                           ` Derrick Stolee
2019-05-23 19:26                       ` Ævar Arnfjörð Bjarmason
2019-05-24  7:27                         ` Jeff King
2019-05-24  7:55                           ` Ævar Arnfjörð Bjarmason
2019-05-24  8:26                             ` Jeff King
2019-05-24  9:01                               ` Ævar Arnfjörð Bjarmason
2019-05-24  9:29                                 ` SZEDER Gábor
2019-05-24 11:17                                   ` Ævar Arnfjörð Bjarmason
2019-05-24 11:41                                     ` SZEDER Gábor
2019-05-24 11:58                                       ` Ævar Arnfjörð Bjarmason
2019-05-24 12:34                                         ` SZEDER Gábor
2019-05-24 13:41                                           ` Ævar Arnfjörð Bjarmason
2019-05-24 11:31                       ` [PATCH] pack-bitmap: look for an uninteresting bitmap Derrick Stolee
2019-04-15 15:00   ` [PATCH 2/3] prune: use bitmaps for reachability traversal Derrick Stolee
2019-04-18 19:49     ` Jeff King
2019-04-18 20:08       ` [PATCH] t5304: add a test for pruning with bitmaps Jeff King
2019-04-20  1:01         ` Derrick Stolee
2019-04-20  3:24           ` Jeff King
2019-04-20 21:01             ` Derrick Stolee
2019-02-14  4:38 ` [PATCH 3/3] prune: check SEEN flag for reachability Jeff King

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.