git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/13] combining object filters and bitmaps
@ 2020-02-13  2:15 Jeff King
  2020-02-13  2:16 ` [PATCH 01/13] pack-bitmap: factor out type iterator initialization Jeff King
                   ` (13 more replies)
  0 siblings, 14 replies; 73+ messages in thread
From: Jeff King @ 2020-02-13  2:15 UTC (permalink / raw)
  To: git

Partial clones are nice for saving bandwidth, but they can actually be
more expensive in CPU because many of the filters require the server to
do extra work to figure out which objects to send.

In particular, it's impossible for many filters to work with
reachability bitmaps, because the bitmap result doesn't provide enough
information. E.g., we can't evaluate a path-based sparse filter without
actually walking the trees to find out which objects are referred to at
which paths.

But in some cases the bitmaps _do_ give enough information, and it's
just a weakness of our implementation. E.g., it's easy to remove all of
the blobs from a bitmap result. So this patch series implements some of
the easier filters, while providing a framework for the bitmap code to
say "nope, that filter is too complex for me" so we can fallback to a
non-bitmap path.

One thing I'm not wild about is that this basically creates a parallel
universe of filter implementations, as nothing is shared with the
code in list-objects-filter.c (beyond the option parsing stage).
But if you look at the implementations in patches 11 and 12, I think
that's how it has to be in order to be efficient. The existing filters
are inherently about hooking into the traversal, and the whole point of
reachability bitmaps is to avoid that traversal.

So I think the best we can do is carefully write bitmap-aware versions
of specific filters, and then fall back to a regular traversal as
necessary. I'd like to also give server providers some tools for
limiting which filters they're willing to support (so they can protect
themselves from people running expensive filters), but I've left that as
future work for now.

Here's an outline of the series:

  [01/13]: pack-bitmap: factor out type iterator initialization
  [02/13]: pack-bitmap: fix leak of haves/wants object lists

    Some related fixes and cleanups in the bitmap area.

  [03/13]: rev-list: fallback to non-bitmap traversal when filtering
  [04/13]: rev-list: consolidate bitmap-disabling options
  [05/13]: rev-list: factor out bitmap-optimized routines

    It's much easier to exercise this code via rev-list, so this is some
    refactoring around the bitmap support there.

  [06/13]: rev-list: make --count work with --objects
  [07/13]: rev-list: allow bitmaps when counting objects

    These two aren't strictly necessary for the topic at hand, but have
    been bugging me for years. And they were made simple by the earlier
    refactoring, and provide a handy test mechanism.

  [08/13]: pack-bitmap: basic noop bitmap filter infrastructure
  [09/13]: rev-list: use bitmap filters for traversal
  [10/13]: bitmap: add bitmap_unset() function

    More refactoring in preparation for supporting filters.

  [11/13]: pack-bitmap: implement BLOB_NONE filtering
  [12/13]: pack-bitmap: implement BLOB_LIMIT filtering

    Some actual filters (tested with rev-list support added earlier).

  [13/13]: pack-objects: support filters with bitmaps

    And finally turning on support for pack generation. The final
    numeric result is (using linux.git with blob:none):

      Test                              HEAD^               HEAD
      ------------------------------------------------------------------------------
      5310.9: simulated partial clone   38.94(37.28+5.87)   11.06(11.27+4.07) -71.6%

    which isn't too shabby.

 builtin/pack-objects.c             |   3 +-
 builtin/rev-list.c                 | 124 +++++++++++---
 ewah/bitmap.c                      |   8 +
 ewah/ewok.h                        |   1 +
 object.c                           |   9 ++
 object.h                           |   2 +
 pack-bitmap.c                      | 252 +++++++++++++++++++++++++----
 pack-bitmap.h                      |   4 +-
 reachable.c                        |   2 +-
 t/perf/p5310-pack-bitmaps.sh       |  14 ++
 t/t5310-pack-bitmaps.sh            |  20 +++
 t/t6000-rev-list-misc.sh           |  12 ++
 t/t6113-rev-list-bitmap-filters.sh |  77 +++++++++
 13 files changed, 468 insertions(+), 60 deletions(-)
 create mode 100755 t/t6113-rev-list-bitmap-filters.sh

-Peff

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 01/13] pack-bitmap: factor out type iterator initialization
  2020-02-13  2:15 [PATCH 0/13] combining object filters and bitmaps Jeff King
@ 2020-02-13  2:16 ` Jeff King
  2020-02-13 17:45   ` Junio C Hamano
  2020-02-13  2:16 ` [PATCH 02/13] pack-bitmap: fix leak of haves/wants object lists Jeff King
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 73+ messages in thread
From: Jeff King @ 2020-02-13  2:16 UTC (permalink / raw)
  To: git

When count_object_type() wants to iterate over the bitmap of all objects
of a certain type, we have to pair up OBJ_COMMIT with bitmap->commits,
and so forth. Since we're about to add more code to iterate over these
bitmaps, let's pull the initialization into its own function.

We can also use this to simplify traverse_bitmap_commit_list(). It
accomplishes the same thing by manually passing the object type and the
bitmap to show_objects_for_type(), but using our helper we just need the
object type.

Note there's one small code change here: previously we'd simply return
zero when counting an unknown object type, and now we'll BUG(). This
shouldn't matter in practice, as all of the callers pass in only usual
commit/tree/blob/tag types.

Signed-off-by: Jeff King <peff@peff.net>
---
 pack-bitmap.c | 63 +++++++++++++++++++++++++++------------------------
 1 file changed, 33 insertions(+), 30 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index e07c798879..9ca356ee29 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -616,9 +616,35 @@ static void show_extended_objects(struct bitmap_index *bitmap_git,
 	}
 }
 
+static void init_type_iterator(struct ewah_iterator *it,
+			       struct bitmap_index *bitmap_git,
+			       enum object_type type)
+{
+	switch (type) {
+	case OBJ_COMMIT:
+		ewah_iterator_init(it, bitmap_git->commits);
+		break;
+
+	case OBJ_TREE:
+		ewah_iterator_init(it, bitmap_git->trees);
+		break;
+
+	case OBJ_BLOB:
+		ewah_iterator_init(it, bitmap_git->blobs);
+		break;
+
+	case OBJ_TAG:
+		ewah_iterator_init(it, bitmap_git->tags);
+		break;
+
+	default:
+		BUG("object type %d not stored by bitmap type index", type);
+		break;
+	}
+}
+
 static void show_objects_for_type(
 	struct bitmap_index *bitmap_git,
-	struct ewah_bitmap *type_filter,
 	enum object_type object_type,
 	show_reachable_fn show_reach)
 {
@@ -633,7 +659,7 @@ static void show_objects_for_type(
 	if (bitmap_git->reuse_objects == bitmap_git->pack->num_objects)
 		return;
 
-	ewah_iterator_init(&it, type_filter);
+	init_type_iterator(&it, bitmap_git, object_type);
 
 	while (i < objects->word_alloc && ewah_iterator_next(&filter, &it)) {
 		eword_t word = objects->words[i] & filter;
@@ -835,14 +861,10 @@ void traverse_bitmap_commit_list(struct bitmap_index *bitmap_git,
 {
 	assert(bitmap_git->result);
 
-	show_objects_for_type(bitmap_git, bitmap_git->commits,
-		OBJ_COMMIT, show_reachable);
-	show_objects_for_type(bitmap_git, bitmap_git->trees,
-		OBJ_TREE, show_reachable);
-	show_objects_for_type(bitmap_git, bitmap_git->blobs,
-		OBJ_BLOB, show_reachable);
-	show_objects_for_type(bitmap_git, bitmap_git->tags,
-		OBJ_TAG, show_reachable);
+	show_objects_for_type(bitmap_git, OBJ_COMMIT, show_reachable);
+	show_objects_for_type(bitmap_git, OBJ_TREE, show_reachable);
+	show_objects_for_type(bitmap_git, OBJ_BLOB, show_reachable);
+	show_objects_for_type(bitmap_git, OBJ_TAG, show_reachable);
 
 	show_extended_objects(bitmap_git, show_reachable);
 }
@@ -857,26 +879,7 @@ static uint32_t count_object_type(struct bitmap_index *bitmap_git,
 	struct ewah_iterator it;
 	eword_t filter;
 
-	switch (type) {
-	case OBJ_COMMIT:
-		ewah_iterator_init(&it, bitmap_git->commits);
-		break;
-
-	case OBJ_TREE:
-		ewah_iterator_init(&it, bitmap_git->trees);
-		break;
-
-	case OBJ_BLOB:
-		ewah_iterator_init(&it, bitmap_git->blobs);
-		break;
-
-	case OBJ_TAG:
-		ewah_iterator_init(&it, bitmap_git->tags);
-		break;
-
-	default:
-		return 0;
-	}
+	init_type_iterator(&it, bitmap_git, type);
 
 	while (i < objects->word_alloc && ewah_iterator_next(&filter, &it)) {
 		eword_t word = objects->words[i++] & filter;
-- 
2.25.0.785.g49bcbe7794


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH 02/13] pack-bitmap: fix leak of haves/wants object lists
  2020-02-13  2:15 [PATCH 0/13] combining object filters and bitmaps Jeff King
  2020-02-13  2:16 ` [PATCH 01/13] pack-bitmap: factor out type iterator initialization Jeff King
@ 2020-02-13  2:16 ` Jeff King
  2020-02-13 18:12   ` Junio C Hamano
  2020-02-13  2:17 ` [PATCH 03/13] rev-list: fallback to non-bitmap traversal when filtering Jeff King
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 73+ messages in thread
From: Jeff King @ 2020-02-13  2:16 UTC (permalink / raw)
  To: git

When we do a bitmap-aware revision traversal, we create an object_list
for each of the "haves" and "wants" tips. After creating the result
bitmaps these are no longer needed or used, but we never free the list
memory.

Signed-off-by: Jeff King <peff@peff.net>
---
 object.c      | 9 +++++++++
 object.h      | 2 ++
 pack-bitmap.c | 5 +++++
 3 files changed, 16 insertions(+)

diff --git a/object.c b/object.c
index 142ef69399..4d11949b38 100644
--- a/object.c
+++ b/object.c
@@ -307,6 +307,15 @@ int object_list_contains(struct object_list *list, struct object *obj)
 	return 0;
 }
 
+void object_list_free(struct object_list **list)
+{
+	while (*list) {
+		struct object_list *p = *list;
+		*list = p->next;
+		free(p);
+	}
+}
+
 /*
  * A zero-length string to which object_array_entry::name can be
  * initialized without requiring a malloc/free.
diff --git a/object.h b/object.h
index 25f5ab3d54..2dbabfca0a 100644
--- a/object.h
+++ b/object.h
@@ -151,6 +151,8 @@ struct object_list *object_list_insert(struct object *item,
 
 int object_list_contains(struct object_list *list, struct object *obj);
 
+void object_list_free(struct object_list **list);
+
 /* Object array handling .. */
 void add_object_array(struct object *obj, const char *name, struct object_array *array);
 void add_object_array_with_path(struct object *obj, const char *name, struct object_array *array, unsigned mode, const char *path);
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 9ca356ee29..6c06099dc7 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -787,10 +787,15 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)
 	bitmap_git->result = wants_bitmap;
 	bitmap_git->haves = haves_bitmap;
 
+	object_list_free(&wants);
+	object_list_free(&haves);
+
 	return bitmap_git;
 
 cleanup:
 	free_bitmap_index(bitmap_git);
+	object_list_free(&wants);
+	object_list_free(&haves);
 	return NULL;
 }
 
-- 
2.25.0.785.g49bcbe7794


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH 03/13] rev-list: fallback to non-bitmap traversal when filtering
  2020-02-13  2:15 [PATCH 0/13] combining object filters and bitmaps Jeff King
  2020-02-13  2:16 ` [PATCH 01/13] pack-bitmap: factor out type iterator initialization Jeff King
  2020-02-13  2:16 ` [PATCH 02/13] pack-bitmap: fix leak of haves/wants object lists Jeff King
@ 2020-02-13  2:17 ` Jeff King
  2020-02-13 18:19   ` Junio C Hamano
  2020-02-13  2:17 ` [PATCH 04/13] rev-list: consolidate bitmap-disabling options Jeff King
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 73+ messages in thread
From: Jeff King @ 2020-02-13  2:17 UTC (permalink / raw)
  To: git

The "--use-bitmap-index" option is usually aspirational: if we have
bitmaps and the request can be fulfilled more quickly using them we'll
do so, but otherwise fall back to a non-bitmap traversal.

The exception is object filtering, which explicitly dies if the two
options are combined. Let's convert this to the usual fallback behavior.
This is a minor convenience for now (since the caller can easily know
that --filter and --use-bitmap-index don't combine), but will become
much more useful as we start to support _some_ filters with bitmaps, but
not others.

The test infrastructure here is bigger than necessary for checking this
one small feature. But it will serve as the basis for more filtering
bitmap tests in future patches.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/rev-list.c                 |  4 ++--
 t/t6113-rev-list-bitmap-filters.sh | 24 ++++++++++++++++++++++++
 2 files changed, 26 insertions(+), 2 deletions(-)
 create mode 100755 t/t6113-rev-list-bitmap-filters.sh

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index e28d62ec64..bce406bd1e 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -521,8 +521,8 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 	if (revs.show_notes)
 		die(_("rev-list does not support display of notes"));
 
-	if (filter_options.choice && use_bitmap_index)
-		die(_("cannot combine --use-bitmap-index with object filtering"));
+	if (filter_options.choice)
+		use_bitmap_index = 0;
 
 	save_commit_buffer = (revs.verbose_header ||
 			      revs.grep_filter.pattern_list ||
diff --git a/t/t6113-rev-list-bitmap-filters.sh b/t/t6113-rev-list-bitmap-filters.sh
new file mode 100755
index 0000000000..977f8d0930
--- /dev/null
+++ b/t/t6113-rev-list-bitmap-filters.sh
@@ -0,0 +1,24 @@
+#!/bin/sh
+
+test_description='rev-list combining bitmaps and filters'
+. ./test-lib.sh
+
+test_expect_success 'set up bitmapped repo' '
+	# one commit will have bitmaps, the other will not
+	test_commit one &&
+	git repack -adb &&
+	test_commit two
+'
+
+test_expect_success 'filters fallback to non-bitmap traversal' '
+	# use a path-based filter, since they are inherently incompatible with
+	# bitmaps (i.e., this test will never get confused by later code to
+	# combine the features)
+	filter=$(echo "!one" | git hash-object -w --stdin) &&
+	git rev-list --objects --filter=sparse:oid=$filter HEAD >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter=sparse:oid=$filter HEAD >actual &&
+	test_cmp expect actual
+'
+
+test_done
-- 
2.25.0.785.g49bcbe7794


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH 04/13] rev-list: consolidate bitmap-disabling options
  2020-02-13  2:15 [PATCH 0/13] combining object filters and bitmaps Jeff King
                   ` (2 preceding siblings ...)
  2020-02-13  2:17 ` [PATCH 03/13] rev-list: fallback to non-bitmap traversal when filtering Jeff King
@ 2020-02-13  2:17 ` Jeff King
  2020-02-13  2:18 ` [PATCH 05/13] rev-list: factor out bitmap-optimized routines Jeff King
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 73+ messages in thread
From: Jeff King @ 2020-02-13  2:17 UTC (permalink / raw)
  To: git

A few options are incompatible with bitmaps, like --filter or
pathspec-based pruning. Let's put these checks all up front, so that
further code can trust the use_bitmap_index as the final word.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/rev-list.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index bce406bd1e..9635b544e3 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -521,7 +521,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 	if (revs.show_notes)
 		die(_("rev-list does not support display of notes"));
 
-	if (filter_options.choice)
+	if (filter_options.choice || revs.prune)
 		use_bitmap_index = 0;
 
 	save_commit_buffer = (revs.verbose_header ||
@@ -533,7 +533,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 	if (show_progress)
 		progress = start_delayed_progress(show_progress, 0);
 
-	if (use_bitmap_index && !revs.prune) {
+	if (use_bitmap_index) {
 		if (revs.count && !revs.left_right && !revs.cherry_mark) {
 			uint32_t commit_count;
 			int max_count = revs.max_count;
-- 
2.25.0.785.g49bcbe7794


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH 05/13] rev-list: factor out bitmap-optimized routines
  2020-02-13  2:15 [PATCH 0/13] combining object filters and bitmaps Jeff King
                   ` (3 preceding siblings ...)
  2020-02-13  2:17 ` [PATCH 04/13] rev-list: consolidate bitmap-disabling options Jeff King
@ 2020-02-13  2:18 ` Jeff King
  2020-02-13 18:34   ` Junio C Hamano
  2020-02-13  2:19 ` [PATCH 06/13] rev-list: make --count work with --objects Jeff King
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 73+ messages in thread
From: Jeff King @ 2020-02-13  2:18 UTC (permalink / raw)
  To: git

There are a few operations in rev-list that are optimized for bitmaps.
Rather than having the code inline in cmd_rev_list(), let's move them
into helpers. This not only makes the flow of the main function simpler,
but it lets us replace the complex "can we do the optimization?"
conditionals with a series of early returns from the functions. That
also makes it easy to add comments explaining those conditions.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/rev-list.c | 88 +++++++++++++++++++++++++++++++++++-----------
 1 file changed, 67 insertions(+), 21 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 9635b544e3..c2daf40449 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -364,6 +364,69 @@ static inline int parse_missing_action_value(const char *value)
 	return 0;
 }
 
+static int try_bitmap_count(struct rev_info *revs)
+{
+	uint32_t commit_count;
+	int max_count;
+	struct bitmap_index *bitmap_git;
+
+	/* This function only handles counting, not general traversal. */
+	if (!revs->count)
+		return -1;
+
+	/*
+	 * A bitmap result can't know left/right, etc, because we don't
+	 * actually traverse.
+	 */
+	if (revs->left_right || revs->cherry_mark)
+		return -1;
+
+	/*
+	 * This must be saved before doing any walking, since the revision
+	 * machinery will count it down to zero while traversing.
+	 */
+	max_count = revs->max_count;
+
+	bitmap_git = prepare_bitmap_walk(revs);
+	if (!bitmap_git)
+		return -1;
+
+	count_bitmap_commit_list(bitmap_git, &commit_count, NULL, NULL, NULL);
+	if (max_count >= 0 && max_count < commit_count)
+		commit_count = max_count;
+
+	printf("%d\n", commit_count);
+	free_bitmap_index(bitmap_git);
+	return 0;
+}
+
+static int try_bitmap_traversal(struct rev_info *revs)
+{
+	struct bitmap_index *bitmap_git;
+
+	/*
+	 * We can't use a bitmap result with a traversal limit, since the set
+	 * of commits we'd get would be essentially random.
+	 */
+	if (revs->max_count >= 0)
+		return -1;
+
+	/*
+	 * Our bitmap result will return all objects, and we're not
+	 * yet prepared to show only particular types.
+	 */
+	if (!revs->tag_objects || !revs->tree_objects || !revs->blob_objects)
+		return -1;
+
+	bitmap_git = prepare_bitmap_walk(revs);
+	if (!bitmap_git)
+		return -1;
+
+	traverse_bitmap_commit_list(bitmap_git, &show_object_fast);
+	free_bitmap_index(bitmap_git);
+	return 0;
+}
+
 int cmd_rev_list(int argc, const char **argv, const char *prefix)
 {
 	struct rev_info revs;
@@ -534,27 +597,10 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 		progress = start_delayed_progress(show_progress, 0);
 
 	if (use_bitmap_index) {
-		if (revs.count && !revs.left_right && !revs.cherry_mark) {
-			uint32_t commit_count;
-			int max_count = revs.max_count;
-			struct bitmap_index *bitmap_git;
-			if ((bitmap_git = prepare_bitmap_walk(&revs))) {
-				count_bitmap_commit_list(bitmap_git, &commit_count, NULL, NULL, NULL);
-				if (max_count >= 0 && max_count < commit_count)
-					commit_count = max_count;
-				printf("%d\n", commit_count);
-				free_bitmap_index(bitmap_git);
-				return 0;
-			}
-		} else if (revs.max_count < 0 &&
-			   revs.tag_objects && revs.tree_objects && revs.blob_objects) {
-			struct bitmap_index *bitmap_git;
-			if ((bitmap_git = prepare_bitmap_walk(&revs))) {
-				traverse_bitmap_commit_list(bitmap_git, &show_object_fast);
-				free_bitmap_index(bitmap_git);
-				return 0;
-			}
-		}
+		if (!try_bitmap_count(&revs))
+			return 0;
+		if (!try_bitmap_traversal(&revs))
+			return 0;
 	}
 
 	if (prepare_revision_walk(&revs))
-- 
2.25.0.785.g49bcbe7794


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH 06/13] rev-list: make --count work with --objects
  2020-02-13  2:15 [PATCH 0/13] combining object filters and bitmaps Jeff King
                   ` (4 preceding siblings ...)
  2020-02-13  2:18 ` [PATCH 05/13] rev-list: factor out bitmap-optimized routines Jeff King
@ 2020-02-13  2:19 ` Jeff King
  2020-02-13 19:14   ` Junio C Hamano
  2020-02-13  2:20 ` [PATCH 07/13] rev-list: allow bitmaps when counting objects Jeff King
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 73+ messages in thread
From: Jeff King @ 2020-02-13  2:19 UTC (permalink / raw)
  To: git

The current behavior from "rev-list --count --objects" is nonsensical:
we enumerate all of the objects except commits, but then give a count of
commits. This wasn't planned, and is just what the code happens to do.

Instead, let's give the answer the user almost certainly wanted: the
full count of objects.

Note that there are more complicated cases around cherry-marking, etc.
We'll punt on those for now, but let the user know that we can't produce
an answer (rather than giving them something useless).

We'll test both the new feature as well as a vanilla --count of commits,
since that surprisingly doesn't seem to be covered in the existing
tests.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/rev-list.c       | 13 +++++++++++++
 t/t6000-rev-list-misc.sh | 12 ++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index c2daf40449..b4fd507f25 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -253,11 +253,19 @@ static int finish_object(struct object *obj, const char *name, void *cb_data)
 static void show_object(struct object *obj, const char *name, void *cb_data)
 {
 	struct rev_list_info *info = cb_data;
+	struct rev_info *revs = info->revs;
+
 	if (finish_object(obj, name, cb_data))
 		return;
 	display_progress(progress, ++progress_counter);
 	if (info->flags & REV_LIST_QUIET)
 		return;
+
+	if (revs->count) {
+		revs->count_right++;
+		return;
+	}
+
 	if (arg_show_object_names)
 		show_object_with_name(stdout, obj, name);
 	else
@@ -584,6 +592,11 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 	if (revs.show_notes)
 		die(_("rev-list does not support display of notes"));
 
+	if (revs.count &&
+	    (revs.tag_objects || revs.tree_objects || revs.blob_objects) &&
+	    (revs.left_right || revs.cherry_mark))
+		die(_("marked counting is incompatible with --objects"));
+
 	if (filter_options.choice || revs.prune)
 		use_bitmap_index = 0;
 
diff --git a/t/t6000-rev-list-misc.sh b/t/t6000-rev-list-misc.sh
index b8cf82349b..383f2c457d 100755
--- a/t/t6000-rev-list-misc.sh
+++ b/t/t6000-rev-list-misc.sh
@@ -148,4 +148,16 @@ test_expect_success 'rev-list --end-of-options' '
 	test_cmp expect actual
 '
 
+test_expect_success 'rev-list --count' '
+	count=$(git rev-list --count HEAD) &&
+	git rev-list HEAD >actual &&
+	test_line_count = $count actual
+'
+
+test_expect_success 'rev-list --count --objects' '
+	count=$(git rev-list --count --objects HEAD) &&
+	git rev-list --objects HEAD >actual &&
+	test_line_count = $count actual
+'
+
 test_done
-- 
2.25.0.785.g49bcbe7794


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH 07/13] rev-list: allow bitmaps when counting objects
  2020-02-13  2:15 [PATCH 0/13] combining object filters and bitmaps Jeff King
                   ` (5 preceding siblings ...)
  2020-02-13  2:19 ` [PATCH 06/13] rev-list: make --count work with --objects Jeff King
@ 2020-02-13  2:20 ` Jeff King
  2020-02-13 21:47   ` Junio C Hamano
  2020-02-13  2:20 ` [PATCH 08/13] pack-bitmap: basic noop bitmap filter infrastructure Jeff King
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 73+ messages in thread
From: Jeff King @ 2020-02-13  2:20 UTC (permalink / raw)
  To: git

The prior commit taught "--count --objects" to work without bitmaps. We
should be able to get the same answer much more quickly with bitmaps.

Note that we punt on the max_count case here. This perhaps _could_ be
made to work if we find all of the boundary commits and treat them as
UNINTERESTING, subtracting them (and their reachable objects) from the
set we return. That implies an actual commit traversal, but we'd still
be faster due to avoiding opening up any trees. Given the complexity and
the fact that anyone is unlikely to want this, it makes sense to just
fall back to the non-bitmap case for now.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/rev-list.c      | 21 ++++++++++++++++++---
 t/t5310-pack-bitmaps.sh |  6 ++++++
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index b4fd507f25..ce1cfc7da0 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -374,7 +374,10 @@ static inline int parse_missing_action_value(const char *value)
 
 static int try_bitmap_count(struct rev_info *revs)
 {
-	uint32_t commit_count;
+	uint32_t commit_count = 0,
+		 tag_count = 0,
+		 tree_count = 0,
+		 blob_count = 0;
 	int max_count;
 	struct bitmap_index *bitmap_git;
 
@@ -389,6 +392,15 @@ static int try_bitmap_count(struct rev_info *revs)
 	if (revs->left_right || revs->cherry_mark)
 		return -1;
 
+	/*
+	 * If we're counting reachable objects, we can't handle a max count of
+	 * commits to traverse, since we don't know which objects go with which
+	 * commit.
+	 */
+	if (revs->max_count >= 0 &&
+	    (revs->tag_objects || revs->tree_objects || revs->blob_objects))
+		return -1;
+
 	/*
 	 * This must be saved before doing any walking, since the revision
 	 * machinery will count it down to zero while traversing.
@@ -399,11 +411,14 @@ static int try_bitmap_count(struct rev_info *revs)
 	if (!bitmap_git)
 		return -1;
 
-	count_bitmap_commit_list(bitmap_git, &commit_count, NULL, NULL, NULL);
+	count_bitmap_commit_list(bitmap_git, &commit_count,
+				 revs->tree_objects ? &tree_count : NULL,
+				 revs->blob_objects ? &blob_count : NULL,
+				 revs->tag_objects ? &tag_count : NULL);
 	if (max_count >= 0 && max_count < commit_count)
 		commit_count = max_count;
 
-	printf("%d\n", commit_count);
+	printf("%d\n", commit_count + tree_count + blob_count + tag_count);
 	free_bitmap_index(bitmap_git);
 	return 0;
 }
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index 6640329ebf..7ba7d294a5 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -74,6 +74,12 @@ rev_list_tests() {
 		test_cmp expect actual
 	'
 
+	test_expect_success "counting objects via bitmap ($state)" '
+		git rev-list --count --objects HEAD >expect &&
+		git rev-list --use-bitmap-index --count --objects HEAD >actual &&
+		test_cmp expect actual
+	'
+
 	test_expect_success "enumerate --objects ($state)" '
 		git rev-list --objects --use-bitmap-index HEAD >tmp &&
 		cut -d" " -f1 <tmp >tmp2 &&
-- 
2.25.0.785.g49bcbe7794


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH 08/13] pack-bitmap: basic noop bitmap filter infrastructure
  2020-02-13  2:15 [PATCH 0/13] combining object filters and bitmaps Jeff King
                   ` (6 preceding siblings ...)
  2020-02-13  2:20 ` [PATCH 07/13] rev-list: allow bitmaps when counting objects Jeff King
@ 2020-02-13  2:20 ` Jeff King
  2020-02-13  2:21 ` [PATCH 09/13] rev-list: use bitmap filters for traversal Jeff King
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 73+ messages in thread
From: Jeff King @ 2020-02-13  2:20 UTC (permalink / raw)
  To: git

Currently you can't use object filters with bitmaps, but we plan to
support at least some filters with bitmaps. Let's introduce some
infrastructure that will help us do that:

  - prepare_bitmap_walk() now accepts a list_objects_filter_options
    parameter (which can be NULL for no filtering; all the current
    callers pass this)

  - we'll bail early if the filter is incompatible with bitmaps (just as
    we would if there were no bitmaps at all). Currently all filters are
    incompatible.

  - we'll filter the resulting bitmap; since there are no supported
    filters yet, this is always a noop.

There should be no behavior change yet, but we'll support some actual
filters in a future patch.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/pack-objects.c |  2 +-
 builtin/rev-list.c     |  4 ++--
 pack-bitmap.c          | 30 ++++++++++++++++++++++++++++--
 pack-bitmap.h          |  4 +++-
 reachable.c            |  2 +-
 5 files changed, 35 insertions(+), 7 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 393c20a2d7..60c943e42b 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3040,7 +3040,7 @@ static int pack_options_allow_reuse(void)
 
 static int get_object_list_from_bitmap(struct rev_info *revs)
 {
-	if (!(bitmap_git = prepare_bitmap_walk(revs)))
+	if (!(bitmap_git = prepare_bitmap_walk(revs, NULL)))
 		return -1;
 
 	if (pack_options_allow_reuse() &&
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index ce1cfc7da0..1ef180469f 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -407,7 +407,7 @@ static int try_bitmap_count(struct rev_info *revs)
 	 */
 	max_count = revs->max_count;
 
-	bitmap_git = prepare_bitmap_walk(revs);
+	bitmap_git = prepare_bitmap_walk(revs, NULL);
 	if (!bitmap_git)
 		return -1;
 
@@ -441,7 +441,7 @@ static int try_bitmap_traversal(struct rev_info *revs)
 	if (!revs->tag_objects || !revs->tree_objects || !revs->blob_objects)
 		return -1;
 
-	bitmap_git = prepare_bitmap_walk(revs);
+	bitmap_git = prepare_bitmap_walk(revs, NULL);
 	if (!bitmap_git)
 		return -1;
 
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 6c06099dc7..eb8888bb5e 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -12,6 +12,7 @@
 #include "packfile.h"
 #include "repository.h"
 #include "object-store.h"
+#include "list-objects-filter-options.h"
 
 /*
  * An entry on the bitmap index, representing the bitmap for a given
@@ -705,7 +706,25 @@ static int in_bitmapped_pack(struct bitmap_index *bitmap_git,
 	return 0;
 }
 
-struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)
+static int filter_bitmap(struct bitmap_index *bitmap_git,
+			 struct object_list *tip_objects,
+			 struct bitmap *to_filter,
+			 struct list_objects_filter_options *filter)
+{
+	if (!filter || filter->choice == LOFC_DISABLED)
+		return 0;
+
+	/* filter choice not handled */
+	return -1;
+}
+
+static int can_filter_bitmap(struct list_objects_filter_options *filter)
+{
+	return !filter_bitmap(NULL, NULL, NULL, filter);
+}
+
+struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
+					 struct list_objects_filter_options *filter)
 {
 	unsigned int i;
 
@@ -715,9 +734,14 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)
 	struct bitmap *wants_bitmap = NULL;
 	struct bitmap *haves_bitmap = NULL;
 
-	struct bitmap_index *bitmap_git = xcalloc(1, sizeof(*bitmap_git));
+	struct bitmap_index *bitmap_git;
+
+	if (!can_filter_bitmap(filter))
+		return NULL;
+
 	/* try to open a bitmapped pack, but don't parse it yet
 	 * because we may not need to use it */
+	bitmap_git = xcalloc(1, sizeof(*bitmap_git));
 	if (open_pack_bitmap(revs->repo, bitmap_git) < 0)
 		goto cleanup;
 
@@ -784,6 +808,8 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)
 	if (haves_bitmap)
 		bitmap_and_not(wants_bitmap, haves_bitmap);
 
+	filter_bitmap(bitmap_git, wants, wants_bitmap, filter);
+
 	bitmap_git->result = wants_bitmap;
 	bitmap_git->haves = haves_bitmap;
 
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 466c5afa09..04bdb20c3c 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -8,6 +8,7 @@
 struct commit;
 struct repository;
 struct rev_info;
+struct list_objects_filter_options;
 
 static const char BITMAP_IDX_SIGNATURE[] = {'B', 'I', 'T', 'M'};
 
@@ -46,7 +47,8 @@ void count_bitmap_commit_list(struct bitmap_index *, uint32_t *commits,
 void traverse_bitmap_commit_list(struct bitmap_index *,
 				 show_reachable_fn show_reachable);
 void test_bitmap_walk(struct rev_info *revs);
-struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs);
+struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
+					 struct list_objects_filter_options *filter);
 int reuse_partial_packfile_from_bitmap(struct bitmap_index *,
 				       struct packed_git **packfile,
 				       uint32_t *entries, off_t *up_to);
diff --git a/reachable.c b/reachable.c
index 8f50235b28..45cde57cab 100644
--- a/reachable.c
+++ b/reachable.c
@@ -223,7 +223,7 @@ void mark_reachable_objects(struct rev_info *revs, int mark_reflog,
 	cp.progress = progress;
 	cp.count = 0;
 
-	bitmap_git = prepare_bitmap_walk(revs);
+	bitmap_git = prepare_bitmap_walk(revs, NULL);
 	if (bitmap_git) {
 		traverse_bitmap_commit_list(bitmap_git, mark_object_seen);
 		free_bitmap_index(bitmap_git);
-- 
2.25.0.785.g49bcbe7794


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH 09/13] rev-list: use bitmap filters for traversal
  2020-02-13  2:15 [PATCH 0/13] combining object filters and bitmaps Jeff King
                   ` (7 preceding siblings ...)
  2020-02-13  2:20 ` [PATCH 08/13] pack-bitmap: basic noop bitmap filter infrastructure Jeff King
@ 2020-02-13  2:21 ` Jeff King
  2020-02-13 22:22   ` Junio C Hamano
  2020-02-13  2:21 ` [PATCH 10/13] bitmap: add bitmap_unset() function Jeff King
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 73+ messages in thread
From: Jeff King @ 2020-02-13  2:21 UTC (permalink / raw)
  To: git

This just passes the filter-options struct to prepare_bitmap_walk().
Since the bitmap code doesn't actually support any filters yet, it will
fallback to the non-bitmap code if any --filter is specified. But this
lets us exercise that rejection code path, as well as getting us ready
to test filters via rev-list when we _do_ support them.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/rev-list.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 1ef180469f..c6850e318b 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -372,7 +372,8 @@ static inline int parse_missing_action_value(const char *value)
 	return 0;
 }
 
-static int try_bitmap_count(struct rev_info *revs)
+static int try_bitmap_count(struct rev_info *revs,
+			    struct list_objects_filter_options *filter)
 {
 	uint32_t commit_count = 0,
 		 tag_count = 0,
@@ -407,7 +408,7 @@ static int try_bitmap_count(struct rev_info *revs)
 	 */
 	max_count = revs->max_count;
 
-	bitmap_git = prepare_bitmap_walk(revs, NULL);
+	bitmap_git = prepare_bitmap_walk(revs, filter);
 	if (!bitmap_git)
 		return -1;
 
@@ -423,7 +424,8 @@ static int try_bitmap_count(struct rev_info *revs)
 	return 0;
 }
 
-static int try_bitmap_traversal(struct rev_info *revs)
+static int try_bitmap_traversal(struct rev_info *revs,
+				struct list_objects_filter_options *filter)
 {
 	struct bitmap_index *bitmap_git;
 
@@ -441,7 +443,7 @@ static int try_bitmap_traversal(struct rev_info *revs)
 	if (!revs->tag_objects || !revs->tree_objects || !revs->blob_objects)
 		return -1;
 
-	bitmap_git = prepare_bitmap_walk(revs, NULL);
+	bitmap_git = prepare_bitmap_walk(revs, filter);
 	if (!bitmap_git)
 		return -1;
 
@@ -612,7 +614,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 	    (revs.left_right || revs.cherry_mark))
 		die(_("marked counting is incompatible with --objects"));
 
-	if (filter_options.choice || revs.prune)
+	if (revs.prune)
 		use_bitmap_index = 0;
 
 	save_commit_buffer = (revs.verbose_header ||
@@ -625,9 +627,9 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 		progress = start_delayed_progress(show_progress, 0);
 
 	if (use_bitmap_index) {
-		if (!try_bitmap_count(&revs))
+		if (!try_bitmap_count(&revs, &filter_options))
 			return 0;
-		if (!try_bitmap_traversal(&revs))
+		if (!try_bitmap_traversal(&revs, &filter_options))
 			return 0;
 	}
 
-- 
2.25.0.785.g49bcbe7794


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH 10/13] bitmap: add bitmap_unset() function
  2020-02-13  2:15 [PATCH 0/13] combining object filters and bitmaps Jeff King
                   ` (8 preceding siblings ...)
  2020-02-13  2:21 ` [PATCH 09/13] rev-list: use bitmap filters for traversal Jeff King
@ 2020-02-13  2:21 ` Jeff King
  2020-02-13  2:23 ` [PATCH 11/13] pack-bitmap: implement BLOB_NONE filtering Jeff King
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 73+ messages in thread
From: Jeff King @ 2020-02-13  2:21 UTC (permalink / raw)
  To: git

We've never needed to unset an individual bit in a bitmap until now.
Typically they start with all bits unset and we bitmap_set() them, or we
are applying another bitmap as a mask. But the easiest way to apply an
object filter to a bitmap result will be to unset the individual bits.

Signed-off-by: Jeff King <peff@peff.net>
---
 ewah/bitmap.c | 8 ++++++++
 ewah/ewok.h   | 1 +
 2 files changed, 9 insertions(+)

diff --git a/ewah/bitmap.c b/ewah/bitmap.c
index 52f1178db4..1c31b3e249 100644
--- a/ewah/bitmap.c
+++ b/ewah/bitmap.c
@@ -45,6 +45,14 @@ void bitmap_set(struct bitmap *self, size_t pos)
 	self->words[block] |= EWAH_MASK(pos);
 }
 
+void bitmap_unset(struct bitmap *self, size_t pos)
+{
+	size_t block = EWAH_BLOCK(pos);
+
+	if (block < self->word_alloc)
+		self->words[block] &= ~EWAH_MASK(pos);
+}
+
 int bitmap_get(struct bitmap *self, size_t pos)
 {
 	size_t block = EWAH_BLOCK(pos);
diff --git a/ewah/ewok.h b/ewah/ewok.h
index 84b2a29faa..59f4ef7c4f 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -173,6 +173,7 @@ struct bitmap {
 
 struct bitmap *bitmap_new(void);
 void bitmap_set(struct bitmap *self, size_t pos);
+void bitmap_unset(struct bitmap *self, size_t pos);
 int bitmap_get(struct bitmap *self, size_t pos);
 void bitmap_reset(struct bitmap *self);
 void bitmap_free(struct bitmap *self);
-- 
2.25.0.785.g49bcbe7794


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH 11/13] pack-bitmap: implement BLOB_NONE filtering
  2020-02-13  2:15 [PATCH 0/13] combining object filters and bitmaps Jeff King
                   ` (9 preceding siblings ...)
  2020-02-13  2:21 ` [PATCH 10/13] bitmap: add bitmap_unset() function Jeff King
@ 2020-02-13  2:23 ` Jeff King
  2020-02-13  2:25 ` [PATCH 12/13] pack-bitmap: implement BLOB_LIMIT filtering Jeff King
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 73+ messages in thread
From: Jeff King @ 2020-02-13  2:23 UTC (permalink / raw)
  To: git

We can easily support BLOB_NONE filters with bitmaps. Since we know the
types of all of the objects, we just need to clear the result bits of
any blobs.

Note two subtleties in the implementation (which I also called out in
comments):

  - we have to include any blobs that were specifically asked for (and
    not reached through graph traversal) to match the non-bitmap version

  - we have to handle in-pack and "ext_index" objects separately.
    Arguably prepare_bitmap_walk() could be adding these ext_index
    objects to the type bitmaps. But it doesn't for now, so let's match
    the rest of the bitmap code here (it probably wouldn't be an
    efficiency improvement to do so since the cost of extending those
    bitmaps is about the same as our loop here, but it might make the
    code a bit simpler).

The regression tests just compare the bitmap output to the non-bitmap
output. Since the code internally falls back to the non-bitmap path in
some cases, this is at risk of being a trivial noop. To combat this,
we'll check for small differences between the two outputs (see the
comment in the test). This is a little fragile, as it's possible that we
may teach the fallback path for --use-bitmap-index to produce
bitmap-like output in the future. But the exact ordering of objects
would likely be different in such a case, anyway.

Plus we'd catch an unexpected fallback when running the perf suite, as
things would get slower. Here's what it looks like now (on git.git):

Test                                    HEAD^             HEAD
--------------------------------------------------------------------------------
5310.7: rev-list count with blob:none   1.67(1.62+0.05)   0.22(0.21+0.02) -86.8%

Signed-off-by: Jeff King <peff@peff.net>
---
 pack-bitmap.c                      | 74 ++++++++++++++++++++++++++++++
 t/perf/p5310-pack-bitmaps.sh       |  5 ++
 t/t6113-rev-list-bitmap-filters.sh | 35 ++++++++++++++
 3 files changed, 114 insertions(+)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index eb8888bb5e..f430ddc3d2 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -706,6 +706,73 @@ static int in_bitmapped_pack(struct bitmap_index *bitmap_git,
 	return 0;
 }
 
+static struct bitmap *find_tip_blobs(struct bitmap_index *bitmap_git,
+				     struct object_list *tip_objects)
+{
+	struct bitmap *result = bitmap_new();
+	struct object_list *p;
+
+	for (p = tip_objects; p; p = p->next) {
+		int pos;
+
+		if (p->item->type != OBJ_BLOB)
+			continue;
+
+		pos = bitmap_position(bitmap_git, &p->item->oid);
+		if (pos < 0)
+			continue;
+
+		bitmap_set(result, pos);
+	}
+
+	return result;
+}
+
+static void filter_bitmap_blob_none(struct bitmap_index *bitmap_git,
+				    struct object_list *tip_objects,
+				    struct bitmap *to_filter)
+{
+	struct eindex *eindex = &bitmap_git->ext_index;
+	struct bitmap *tips;
+	struct ewah_iterator it;
+	eword_t mask;
+	uint32_t i;
+
+	/*
+	 * The non-bitmap version of this filter never removes
+	 * blobs which the other side specifically asked for,
+	 * so we must match that behavior.
+	 */
+	tips = find_tip_blobs(bitmap_git, tip_objects);
+
+	/*
+	 * We can use the blob type-bitmap to work in whole words
+	 * for the objects that are actually in the bitmapped packfile.
+	 */
+	for (i = 0, init_type_iterator(&it, bitmap_git, OBJ_BLOB);
+	     i < to_filter->word_alloc && ewah_iterator_next(&mask, &it);
+	     i++) {
+		if (i < tips->word_alloc)
+			mask &= ~tips->words[i];
+		to_filter->words[i] &= ~mask;
+	}
+
+	/*
+	 * Clear any blobs that weren't in the packfile (and so would not have
+	 * been caught by the loop above. We'll have to check them
+	 * individually.
+	 */
+	for (i = 0; i < eindex->count; i++) {
+		uint32_t pos = i + bitmap_git->pack->num_objects;
+		if (eindex->objects[i]->type == OBJ_BLOB &&
+		    bitmap_get(to_filter, pos) &&
+		    !bitmap_get(tips, pos))
+			bitmap_unset(to_filter, pos);
+	}
+
+	bitmap_free(tips);
+}
+
 static int filter_bitmap(struct bitmap_index *bitmap_git,
 			 struct object_list *tip_objects,
 			 struct bitmap *to_filter,
@@ -714,6 +781,13 @@ static int filter_bitmap(struct bitmap_index *bitmap_git,
 	if (!filter || filter->choice == LOFC_DISABLED)
 		return 0;
 
+	if (filter->choice == LOFC_BLOB_NONE) {
+		if (bitmap_git)
+			filter_bitmap_blob_none(bitmap_git, tip_objects,
+						to_filter);
+		return 0;
+	}
+
 	/* filter choice not handled */
 	return -1;
 }
diff --git a/t/perf/p5310-pack-bitmaps.sh b/t/perf/p5310-pack-bitmaps.sh
index 6a3a42531b..3383983450 100755
--- a/t/perf/p5310-pack-bitmaps.sh
+++ b/t/perf/p5310-pack-bitmaps.sh
@@ -39,6 +39,11 @@ test_perf 'pack to file (bitmap)' '
 	git pack-objects --use-bitmap-index --all pack1b </dev/null >/dev/null
 '
 
+test_perf 'rev-list count with blob:none' '
+	git rev-list --use-bitmap-index --count --objects --all \
+		--filter=blob:none >/dev/null
+'
+
 test_expect_success 'create partial bitmap state' '
 	# pick a commit to represent the repo tip in the past
 	cutoff=$(git rev-list HEAD~100 -1) &&
diff --git a/t/t6113-rev-list-bitmap-filters.sh b/t/t6113-rev-list-bitmap-filters.sh
index 977f8d0930..feaa6c0989 100755
--- a/t/t6113-rev-list-bitmap-filters.sh
+++ b/t/t6113-rev-list-bitmap-filters.sh
@@ -21,4 +21,39 @@ test_expect_success 'filters fallback to non-bitmap traversal' '
 	test_cmp expect actual
 '
 
+# the bitmap and regular traversals produce subtly different output:
+#
+#   - regular output is in traversal order, whereas bitmap is split by type,
+#     with non-packed objects at the end
+#
+#   - regular output has a space and the pathname appended to non-commit
+#     objects; bitmap output omits this
+#
+# Normalize and compare the two. The second argument should always be the
+# bitmap output.
+cmp_bitmap_traversal () {
+	if cmp "$1" "$2"
+	then
+		echo >&2 "identical raw outputs; are you sure bitmaps were used?"
+		return 1
+	fi &&
+	cut -d' ' -f1 "$1" | sort >"$1.normalized" &&
+	sort "$2" >"$2.normalized" &&
+	test_cmp "$1.normalized" "$2.normalized"
+}
+
+test_expect_success 'blob:none filter' '
+	git rev-list --objects --filter=blob:none HEAD >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter=blob:none HEAD >actual &&
+	cmp_bitmap_traversal expect actual
+'
+
+test_expect_success 'blob:none filter with specified blob' '
+	git rev-list --objects --filter=blob:none HEAD HEAD:two.t >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter=blob:none HEAD HEAD:two.t >actual &&
+	cmp_bitmap_traversal expect actual
+'
+
 test_done
-- 
2.25.0.785.g49bcbe7794


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH 12/13] pack-bitmap: implement BLOB_LIMIT filtering
  2020-02-13  2:15 [PATCH 0/13] combining object filters and bitmaps Jeff King
                   ` (10 preceding siblings ...)
  2020-02-13  2:23 ` [PATCH 11/13] pack-bitmap: implement BLOB_NONE filtering Jeff King
@ 2020-02-13  2:25 ` Jeff King
  2020-02-13 23:17   ` Junio C Hamano
  2020-02-13  2:25 ` [PATCH 13/13] pack-objects: support filters with bitmaps Jeff King
  2020-02-14 18:21 ` [PATCH v2 0/15] combining object filters and bitmaps Jeff King
  13 siblings, 1 reply; 73+ messages in thread
From: Jeff King @ 2020-02-13  2:25 UTC (permalink / raw)
  To: git

Just as the previous commit implemented BLOB_NONE, we can support
BLOB_LIMIT filters by looking at the sizes of any blobs in the result
and unsetting their bits as appropriate. This is slightly more expensive
than BLOB_NONE, but still produces a noticeable speedup (these results
are on git.git):

  Test                                        HEAD~2            HEAD
  ------------------------------------------------------------------------------------
  5310.7: rev-list count with blob:none       1.80(1.77+0.02)   0.22(0.20+0.02) -87.8%
  5310.8: rev-list count with blob:limit=1k   1.99(1.96+0.03)   0.29(0.25+0.03) -85.4%

The implementation is similar to the BLOB_NONE one, with the exception
that we have to go object-by-object while walking the blob-type bitmap
(since we can't mask out the matches, but must look up the size
individually for each blob). The trick with using ctz64() is taken from
show_objects_for_type(), which likewise needs to find individual bits
(but wants to quickly skip over big chunks without blobs).

Signed-off-by: Jeff King <peff@peff.net>
---
 pack-bitmap.c                      | 80 ++++++++++++++++++++++++++++++
 t/perf/p5310-pack-bitmaps.sh       |  5 ++
 t/t6113-rev-list-bitmap-filters.sh | 20 +++++++-
 3 files changed, 104 insertions(+), 1 deletion(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index f430ddc3d2..76cb60e8c3 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -773,6 +773,78 @@ static void filter_bitmap_blob_none(struct bitmap_index *bitmap_git,
 	bitmap_free(tips);
 }
 
+static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
+				     uint32_t pos)
+{
+	struct packed_git *pack = bitmap_git->pack;
+	unsigned long size;
+	struct object_info oi = OBJECT_INFO_INIT;
+
+	oi.sizep = &size;
+
+	if (pos < pack->num_objects) {
+		struct revindex_entry *entry = &pack->revindex[pos];
+		if (packed_object_info(the_repository, pack,
+				       entry->offset, &oi) < 0) {
+			struct object_id oid;
+			nth_packed_object_oid(&oid, pack, entry->nr);
+			die(_("unable to get size of %s"), oid_to_hex(&oid));
+		}
+	} else {
+		struct eindex *eindex = &bitmap_git->ext_index;
+		struct object *obj = eindex->objects[pos - pack->num_objects];
+		if (oid_object_info_extended(the_repository, &obj->oid, &oi, 0) < 0)
+			die(_("unable to get size of %s"), oid_to_hex(&obj->oid));
+	}
+
+	return size;
+}
+
+static void filter_bitmap_blob_limit(struct bitmap_index *bitmap_git,
+				     struct object_list *tip_objects,
+				     struct bitmap *to_filter,
+				     unsigned long limit)
+{
+	struct eindex *eindex = &bitmap_git->ext_index;
+	struct bitmap *tips;
+	struct ewah_iterator it;
+	eword_t mask;
+	uint32_t i;
+
+	tips = find_tip_blobs(bitmap_git, tip_objects);
+
+	for (i = 0, init_type_iterator(&it, bitmap_git, OBJ_BLOB);
+	     i < to_filter->word_alloc && ewah_iterator_next(&mask, &it);
+	     i++) {
+		eword_t word = to_filter->words[i] & mask;
+		unsigned offset;
+
+		for (offset = 0; offset < BITS_IN_EWORD; offset++) {
+			uint32_t pos;
+
+			if ((word >> offset) == 0)
+				break;
+			offset += ewah_bit_ctz64(word >> offset);
+			pos = i * BITS_IN_EWORD + offset;
+
+			if (!bitmap_get(tips, pos) &&
+			    get_size_by_pos(bitmap_git, pos) >= limit)
+				bitmap_unset(to_filter, pos);
+		}
+	}
+
+	for (i = 0; i < eindex->count; i++) {
+		uint32_t pos = i + bitmap_git->pack->num_objects;
+		if (eindex->objects[i]->type == OBJ_BLOB &&
+		    bitmap_get(to_filter, pos) &&
+		    !bitmap_get(tips, pos) &&
+		    get_size_by_pos(bitmap_git, pos) >= limit)
+			bitmap_unset(to_filter, pos);
+	}
+
+	bitmap_free(tips);
+}
+
 static int filter_bitmap(struct bitmap_index *bitmap_git,
 			 struct object_list *tip_objects,
 			 struct bitmap *to_filter,
@@ -788,6 +860,14 @@ static int filter_bitmap(struct bitmap_index *bitmap_git,
 		return 0;
 	}
 
+	if (filter->choice == LOFC_BLOB_LIMIT) {
+		if (bitmap_git)
+			filter_bitmap_blob_limit(bitmap_git, tip_objects,
+						 to_filter,
+						 filter->blob_limit_value);
+		return 0;
+	}
+
 	/* filter choice not handled */
 	return -1;
 }
diff --git a/t/perf/p5310-pack-bitmaps.sh b/t/perf/p5310-pack-bitmaps.sh
index 3383983450..bbe1eb26a9 100755
--- a/t/perf/p5310-pack-bitmaps.sh
+++ b/t/perf/p5310-pack-bitmaps.sh
@@ -44,6 +44,11 @@ test_perf 'rev-list count with blob:none' '
 		--filter=blob:none >/dev/null
 '
 
+test_perf 'rev-list count with blob:limit=1k' '
+	git rev-list --use-bitmap-index --count --objects --all \
+		--filter=blob:limit=1k >/dev/null
+'
+
 test_expect_success 'create partial bitmap state' '
 	# pick a commit to represent the repo tip in the past
 	cutoff=$(git rev-list HEAD~100 -1) &&
diff --git a/t/t6113-rev-list-bitmap-filters.sh b/t/t6113-rev-list-bitmap-filters.sh
index feaa6c0989..0878f72828 100755
--- a/t/t6113-rev-list-bitmap-filters.sh
+++ b/t/t6113-rev-list-bitmap-filters.sh
@@ -6,8 +6,10 @@ test_description='rev-list combining bitmaps and filters'
 test_expect_success 'set up bitmapped repo' '
 	# one commit will have bitmaps, the other will not
 	test_commit one &&
+	test_commit much-larger-blob-one &&
 	git repack -adb &&
-	test_commit two
+	test_commit two &&
+	test_commit much-larger-blob-two
 '
 
 test_expect_success 'filters fallback to non-bitmap traversal' '
@@ -56,4 +58,20 @@ test_expect_success 'blob:none filter with specified blob' '
 	cmp_bitmap_traversal expect actual
 '
 
+test_expect_success 'blob:limit filter' '
+	git rev-list --objects --filter=blob:limit=5 HEAD >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter=blob:limit=5 HEAD >actual &&
+	cmp_bitmap_traversal expect actual
+'
+
+test_expect_success 'blob:limit filter with specified blob' '
+	git rev-list --objects --filter=blob:limit=5 \
+		     HEAD HEAD:much-larger-blob-two.t >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter=blob:limit=5 \
+		     HEAD HEAD:much-larger-blob-two.t >actual &&
+	cmp_bitmap_traversal expect actual
+'
+
 test_done
-- 
2.25.0.785.g49bcbe7794


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH 13/13] pack-objects: support filters with bitmaps
  2020-02-13  2:15 [PATCH 0/13] combining object filters and bitmaps Jeff King
                   ` (11 preceding siblings ...)
  2020-02-13  2:25 ` [PATCH 12/13] pack-bitmap: implement BLOB_LIMIT filtering Jeff King
@ 2020-02-13  2:25 ` Jeff King
  2020-02-14 18:21 ` [PATCH v2 0/15] combining object filters and bitmaps Jeff King
  13 siblings, 0 replies; 73+ messages in thread
From: Jeff King @ 2020-02-13  2:25 UTC (permalink / raw)
  To: git

Just as rev-list recently learned to combine filters and bitmaps, let's
do the same for pack-objects. The infrastructure is all there; we just
need to pass along our filter options, and the pack-bitmap code will
decide to use bitmaps or not.

This unsurprisingly makes things faster for partial clones of large
repositories (here we're cloning linux.git):

  Test                              HEAD^               HEAD
  ------------------------------------------------------------------------------
  5310.9: simulated partial clone   38.94(37.28+5.87)   11.06(11.27+4.07) -71.6%

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/pack-objects.c       |  3 +--
 t/perf/p5310-pack-bitmaps.sh |  4 ++++
 t/t5310-pack-bitmaps.sh      | 14 ++++++++++++++
 3 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 60c943e42b..b09f85bd30 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3040,7 +3040,7 @@ static int pack_options_allow_reuse(void)
 
 static int get_object_list_from_bitmap(struct rev_info *revs)
 {
-	if (!(bitmap_git = prepare_bitmap_walk(revs, NULL)))
+	if (!(bitmap_git = prepare_bitmap_walk(revs, &filter_options)))
 		return -1;
 
 	if (pack_options_allow_reuse() &&
@@ -3418,7 +3418,6 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	if (filter_options.choice) {
 		if (!pack_to_stdout)
 			die(_("cannot use --filter without --stdout"));
-		use_bitmap_index = 0;
 	}
 
 	/*
diff --git a/t/perf/p5310-pack-bitmaps.sh b/t/perf/p5310-pack-bitmaps.sh
index bbe1eb26a9..5ab2104ee8 100755
--- a/t/perf/p5310-pack-bitmaps.sh
+++ b/t/perf/p5310-pack-bitmaps.sh
@@ -49,6 +49,10 @@ test_perf 'rev-list count with blob:limit=1k' '
 		--filter=blob:limit=1k >/dev/null
 '
 
+test_perf 'simulated partial clone' '
+	git pack-objects --stdout --all --filter=blob:none </dev/null >/dev/null
+'
+
 test_expect_success 'create partial bitmap state' '
 	# pick a commit to represent the repo tip in the past
 	cutoff=$(git rev-list HEAD~100 -1) &&
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index 7ba7d294a5..ac9e6acf27 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -105,6 +105,20 @@ test_expect_success 'clone from bitmapped repository' '
 	test_cmp expect actual
 '
 
+test_expect_success 'partial clone from bitmapped repository' '
+	test_config uploadpack.allowfilter true &&
+	git clone --no-local --bare --filter=blob:none . partial-clone.git &&
+	(
+		cd partial-clone.git &&
+		pack=$(echo objects/pack/*.pack) &&
+		git verify-pack -v "$pack" >have &&
+		awk "/blob/ { print \$1 }" <have >blobs &&
+		# we expect this single blob because of the direct ref
+		git rev-parse refs/tags/tagged-blob >expect &&
+		test_cmp expect blobs
+	)
+'
+
 test_expect_success 'setup further non-bitmapped commits' '
 	test_commit_bulk --id=further 10
 '
-- 
2.25.0.785.g49bcbe7794

^ permalink raw reply related	[flat|nested] 73+ messages in thread

* Re: [PATCH 01/13] pack-bitmap: factor out type iterator initialization
  2020-02-13  2:16 ` [PATCH 01/13] pack-bitmap: factor out type iterator initialization Jeff King
@ 2020-02-13 17:45   ` Junio C Hamano
  0 siblings, 0 replies; 73+ messages in thread
From: Junio C Hamano @ 2020-02-13 17:45 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> When count_object_type() wants to iterate over the bitmap of all objects
> of a certain type, we have to pair up OBJ_COMMIT with bitmap->commits,
> and so forth. Since we're about to add more code to iterate over these
> bitmaps, let's pull the initialization into its own function.
>
> We can also use this to simplify traverse_bitmap_commit_list(). It
> accomplishes the same thing by manually passing the object type and the
> bitmap to show_objects_for_type(), but using our helper we just need the
> object type.
>
> Note there's one small code change here: previously we'd simply return
> zero when counting an unknown object type, and now we'll BUG(). This
> shouldn't matter in practice, as all of the callers pass in only usual
> commit/tree/blob/tag types.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  pack-bitmap.c | 63 +++++++++++++++++++++++++++------------------------
>  1 file changed, 33 insertions(+), 30 deletions(-)

Nicely written and makes perfect sense.  show_objects_for_type()
used to have a redundant pair of parameters, but it is impossible
to pass them inconsistently after this change.

Thanks.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 02/13] pack-bitmap: fix leak of haves/wants object lists
  2020-02-13  2:16 ` [PATCH 02/13] pack-bitmap: fix leak of haves/wants object lists Jeff King
@ 2020-02-13 18:12   ` Junio C Hamano
  0 siblings, 0 replies; 73+ messages in thread
From: Junio C Hamano @ 2020-02-13 18:12 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> When we do a bitmap-aware revision traversal, we create an object_list
> for each of the "haves" and "wants" tips. After creating the result
> bitmaps these are no longer needed or used, but we never free the list
> memory.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---

Obviously correct ;-)

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 03/13] rev-list: fallback to non-bitmap traversal when filtering
  2020-02-13  2:17 ` [PATCH 03/13] rev-list: fallback to non-bitmap traversal when filtering Jeff King
@ 2020-02-13 18:19   ` Junio C Hamano
  2020-02-13 18:40     ` Jeff King
  0 siblings, 1 reply; 73+ messages in thread
From: Junio C Hamano @ 2020-02-13 18:19 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> The "--use-bitmap-index" option is usually aspirational: if we have
> bitmaps and the request can be fulfilled more quickly using them we'll
> do so, but otherwise fall back to a non-bitmap traversal.
>
> The exception is object filtering, which explicitly dies if the two
> options are combined. Let's convert this to the usual fallback behavior.
>
> This is a minor convenience for now (since the caller can easily know
> that --filter and --use-bitmap-index don't combine), but will become
> much more useful as we start to support _some_ filters with bitmaps, but
> not others.

Makes sense.  

Perhaps the option should have been called allow-bitmap-index or
something along that line, but it is too late ;-)

> +test_expect_success 'set up bitmapped repo' '
> +	# one commit will have bitmaps, the other will not
> +	test_commit one &&
> +	git repack -adb &&
> +	test_commit two
> +'
> +
> +test_expect_success 'filters fallback to non-bitmap traversal' '
> +	# use a path-based filter, since they are inherently incompatible with
> +	# bitmaps (i.e., this test will never get confused by later code to
> +	# combine the features)
> +	filter=$(echo "!one" | git hash-object -w --stdin) &&
> +	git rev-list --objects --filter=sparse:oid=$filter HEAD >expect &&
> +	git rev-list --use-bitmap-index \
> +		     --objects --filter=sparse:oid=$filter HEAD >actual &&
> +	test_cmp expect actual
> +'

OK.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 05/13] rev-list: factor out bitmap-optimized routines
  2020-02-13  2:18 ` [PATCH 05/13] rev-list: factor out bitmap-optimized routines Jeff King
@ 2020-02-13 18:34   ` Junio C Hamano
  0 siblings, 0 replies; 73+ messages in thread
From: Junio C Hamano @ 2020-02-13 18:34 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> There are a few operations in rev-list that are optimized for bitmaps.
> Rather than having the code inline in cmd_rev_list(), let's move them
> into helpers. This not only makes the flow of the main function simpler,
> but it lets us replace the complex "can we do the optimization?"
> conditionals with a series of early returns from the functions. That
> also makes it easy to add comments explaining those conditions.

... and these new comments makes the resulting code much easier to
follow.  Nicely done.


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 03/13] rev-list: fallback to non-bitmap traversal when filtering
  2020-02-13 18:19   ` Junio C Hamano
@ 2020-02-13 18:40     ` Jeff King
  0 siblings, 0 replies; 73+ messages in thread
From: Jeff King @ 2020-02-13 18:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thu, Feb 13, 2020 at 10:19:19AM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > The "--use-bitmap-index" option is usually aspirational: if we have
> > bitmaps and the request can be fulfilled more quickly using them we'll
> > do so, but otherwise fall back to a non-bitmap traversal.
> >
> > The exception is object filtering, which explicitly dies if the two
> > options are combined. Let's convert this to the usual fallback behavior.
> >
> > This is a minor convenience for now (since the caller can easily know
> > that --filter and --use-bitmap-index don't combine), but will become
> > much more useful as we start to support _some_ filters with bitmaps, but
> > not others.
> 
> Makes sense.  
> 
> Perhaps the option should have been called allow-bitmap-index or
> something along that line, but it is too late ;-)

Yeah. It's also annoyingly long to type, and makes for long lines in the
test scripts. ;)

There are also some weird semantics with the fallback, because the
output may differ depending on whether we use bitmaps (see one of the
later patches). I wouldn't be opposed to cleaning this up and giving it
a new option ("--allow-bitmaps" or something) to keep compatibility, but
it's out of scope here.

The existing option (and my suggestion, as well as most of the internal
code) are guilty of equating "bitmap" with "object reachability bitmap".
There's lots of things one might use bitmaps for, and at some point we
might even expose such a thing via rev-list.

Anyway, that concludes my rant. I don't think any of these are urgent to
fix, and definitely shouldn't be part of this series.

-Peff

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 06/13] rev-list: make --count work with --objects
  2020-02-13  2:19 ` [PATCH 06/13] rev-list: make --count work with --objects Jeff King
@ 2020-02-13 19:14   ` Junio C Hamano
  2020-02-13 20:27     ` Jeff King
  0 siblings, 1 reply; 73+ messages in thread
From: Junio C Hamano @ 2020-02-13 19:14 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> The current behavior from "rev-list --count --objects" is nonsensical:
> we enumerate all of the objects except commits, but then give a count of
> commits. This wasn't planned, and is just what the code happens to do.
>
> Instead, let's give the answer the user almost certainly wanted: the
> full count of objects.
>
> Note that there are more complicated cases around cherry-marking, etc.
> We'll punt on those for now, but let the user know that we can't produce
> an answer (rather than giving them something useless).

Now, finally we start changing the behaviour.  

Is the reason why --left-right and --objects do not work well
together because same trees and blobs can be shared between commits
on both sides?

> diff --git a/t/t6000-rev-list-misc.sh b/t/t6000-rev-list-misc.sh
> index b8cf82349b..383f2c457d 100755
> --- a/t/t6000-rev-list-misc.sh
> +++ b/t/t6000-rev-list-misc.sh
> @@ -148,4 +148,16 @@ test_expect_success 'rev-list --end-of-options' '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success 'rev-list --count' '
> +	count=$(git rev-list --count HEAD) &&
> +	git rev-list HEAD >actual &&
> +	test_line_count = $count actual
> +'
> +
> +test_expect_success 'rev-list --count --objects' '
> +	count=$(git rev-list --count --objects HEAD) &&
> +	git rev-list --objects HEAD >actual &&
> +	test_line_count = $count actual
> +'

Makes sense to define --count as the same as number of objects
shown and verify it.



^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 06/13] rev-list: make --count work with --objects
  2020-02-13 19:14   ` Junio C Hamano
@ 2020-02-13 20:27     ` Jeff King
  0 siblings, 0 replies; 73+ messages in thread
From: Jeff King @ 2020-02-13 20:27 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thu, Feb 13, 2020 at 11:14:06AM -0800, Junio C Hamano wrote:

> > The current behavior from "rev-list --count --objects" is nonsensical:
> > we enumerate all of the objects except commits, but then give a count of
> > commits. This wasn't planned, and is just what the code happens to do.
> >
> > Instead, let's give the answer the user almost certainly wanted: the
> > full count of objects.
> >
> > Note that there are more complicated cases around cherry-marking, etc.
> > We'll punt on those for now, but let the user know that we can't produce
> > an answer (rather than giving them something useless).
> 
> Now, finally we start changing the behaviour.  
> 
> Is the reason why --left-right and --objects do not work well
> together because same trees and blobs can be shared between commits
> on both sides?

Yes, exactly. I think you could probably define some sensible responses
there. E.g., consider this history:

  commit() { echo $1 >$1 && git add $1 && git commit -m $1; }
  commit base
  commit left
  git checkout -b side HEAD^
  commit right

which looks like this:

  $ git log --graph --oneline --all
  * e0b7b94 (master) left
  | * 2a4163a (HEAD -> side) right
  |/  
  * 8d8a806 base

That "base" commit is sort of in the same boat: it's reachable from
either side. By default we don't count it at all:

  $ git rev-list --count --left-right master...side
  1	1

but we do know about it as a boundary commit, and you could ask for it
to be counted on the right:

  $ git rev-list --boundary --left-right master...side
  <e0b7b9473efa49db3c6bf4ceab587f22d1935f28
  >2a4163a0a9bbcb837af2ac2e80e17120798f863a
  -8d8a806249905f101b5ac3f1eb74fa426f90ddf2

  $ git rev-list --count --boundary --left-right master...side
  2	1

So probably it would be sensible to do the same for the objects.
Anything reachable from both is a "boundary" object. But I don't think
we extend the left-right marking to --objects at all now:

  $ git rev-list --boundary --left-right --objects master...side
  <e0b7b9473efa49db3c6bf4ceab587f22d1935f28
  >2a4163a0a9bbcb837af2ac2e80e17120798f863a
  -8d8a806249905f101b5ac3f1eb74fa426f90ddf2
  f48654b5fe8de485e4622598842ca14b04b62c0a
  45cf141ba67d59203f02a54f03162f3fcef57830 left
  4540e3db9d99d518601ecadb81f7d55d55855033
  c376d892e8b105bd712d06ec5162b5f31ce949c3 right

Nor do we even seem to dig into the boundary objects.

So there would need to be more infrastructure in the traversal itself to
be able to do this right, I think. I'm happy to wait until somebody
demonstrates a real use for it. :)

-Peff

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 07/13] rev-list: allow bitmaps when counting objects
  2020-02-13  2:20 ` [PATCH 07/13] rev-list: allow bitmaps when counting objects Jeff King
@ 2020-02-13 21:47   ` Junio C Hamano
  2020-02-13 22:27     ` Jeff King
  0 siblings, 1 reply; 73+ messages in thread
From: Junio C Hamano @ 2020-02-13 21:47 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> -	count_bitmap_commit_list(bitmap_git, &commit_count, NULL, NULL, NULL);
> +	count_bitmap_commit_list(bitmap_git, &commit_count,
> +				 revs->tree_objects ? &tree_count : NULL,
> +				 revs->blob_objects ? &blob_count : NULL,
> +				 revs->tag_objects ? &tag_count : NULL);

So count_bitmap_commit_list() has been prepared to count non-commits
already, but we are just starting to use that feature?

Interesting.

> -	printf("%d\n", commit_count);
> +	printf("%d\n", commit_count + tree_count + blob_count + tag_count);

Thanks.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 09/13] rev-list: use bitmap filters for traversal
  2020-02-13  2:21 ` [PATCH 09/13] rev-list: use bitmap filters for traversal Jeff King
@ 2020-02-13 22:22   ` Junio C Hamano
  2020-02-13 22:34     ` Jeff King
  0 siblings, 1 reply; 73+ messages in thread
From: Junio C Hamano @ 2020-02-13 22:22 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> This just passes the filter-options struct to prepare_bitmap_walk().
> Since the bitmap code doesn't actually support any filters yet, it will
> fallback to the non-bitmap code if any --filter is specified. But this
> lets us exercise that rejection code path, as well as getting us ready
> to test filters via rev-list when we _do_ support them.

So we used to look at filter_options.choice and declared any filter
is incompatible with use_bitmap_index quite early, but now we let
each of the try_bitmap_*() helpers check what is in the filter and
make their own decisions.

Of course, the prepare_bitmap_walk() call at the beginning of these
helpers does not know how to work with any filter at this point in
the series, so all of the above cancel out :-).

Makes sense.

I wonder if the "revs.prune" thing that forces use_bitmap_index off
should also move to prepare_bitmap_walk() at some point in the
series (or after the current series is done).  After all, the point
of introducing try_bitmap_*() helpers was to let these bitmap
specific logic to know what is and is not compatible with the bitmap
routines.

Thanks.

> @@ -441,7 +443,7 @@ static int try_bitmap_traversal(struct rev_info *revs)
>  	if (!revs->tag_objects || !revs->tree_objects || !revs->blob_objects)
>  		return -1;
>  
> -	bitmap_git = prepare_bitmap_walk(revs, NULL);
> +	bitmap_git = prepare_bitmap_walk(revs, filter);
>  	if (!bitmap_git)
>  		return -1;
>  
> @@ -612,7 +614,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
>  	    (revs.left_right || revs.cherry_mark))
>  		die(_("marked counting is incompatible with --objects"));
>  
> -	if (filter_options.choice || revs.prune)
> +	if (revs.prune)
>  		use_bitmap_index = 0;
>  
>  	save_commit_buffer = (revs.verbose_header ||
> @@ -625,9 +627,9 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
>  		progress = start_delayed_progress(show_progress, 0);
>  
>  	if (use_bitmap_index) {
> -		if (!try_bitmap_count(&revs))
> +		if (!try_bitmap_count(&revs, &filter_options))
>  			return 0;
> -		if (!try_bitmap_traversal(&revs))
> +		if (!try_bitmap_traversal(&revs, &filter_options))
>  			return 0;
>  	}

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 07/13] rev-list: allow bitmaps when counting objects
  2020-02-13 21:47   ` Junio C Hamano
@ 2020-02-13 22:27     ` Jeff King
  0 siblings, 0 replies; 73+ messages in thread
From: Jeff King @ 2020-02-13 22:27 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thu, Feb 13, 2020 at 01:47:38PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > -	count_bitmap_commit_list(bitmap_git, &commit_count, NULL, NULL, NULL);
> > +	count_bitmap_commit_list(bitmap_git, &commit_count,
> > +				 revs->tree_objects ? &tree_count : NULL,
> > +				 revs->blob_objects ? &blob_count : NULL,
> > +				 revs->tag_objects ? &tag_count : NULL);
> 
> So count_bitmap_commit_list() has been prepared to count non-commits
> already, but we are just starting to use that feature?
> 
> Interesting.

Yeah, it goes all the way back to fff42755ef (pack-bitmap: add support
for bitmap indexes, 2013-12-21), but was never used for non-commits. I
was pleasantly surprised that it actually worked. ;)

-Peff

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 09/13] rev-list: use bitmap filters for traversal
  2020-02-13 22:22   ` Junio C Hamano
@ 2020-02-13 22:34     ` Jeff King
  0 siblings, 0 replies; 73+ messages in thread
From: Jeff King @ 2020-02-13 22:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thu, Feb 13, 2020 at 02:22:07PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > This just passes the filter-options struct to prepare_bitmap_walk().
> > Since the bitmap code doesn't actually support any filters yet, it will
> > fallback to the non-bitmap code if any --filter is specified. But this
> > lets us exercise that rejection code path, as well as getting us ready
> > to test filters via rev-list when we _do_ support them.
> 
> So we used to look at filter_options.choice and declared any filter
> is incompatible with use_bitmap_index quite early, but now we let
> each of the try_bitmap_*() helpers check what is in the filter and
> make their own decisions.
> 
> Of course, the prepare_bitmap_walk() call at the beginning of these
> helpers does not know how to work with any filter at this point in
> the series, so all of the above cancel out :-).
> 
> Makes sense.
> 
> I wonder if the "revs.prune" thing that forces use_bitmap_index off
> should also move to prepare_bitmap_walk() at some point in the
> series (or after the current series is done).  After all, the point
> of introducing try_bitmap_*() helpers was to let these bitmap
> specific logic to know what is and is not compatible with the bitmap
> routines.

Ah, interesting thought. Yeah, we could push it down to that level to
avoid rev-list having to know details about how the bitmap code works.
That could replace the earlier patch to consolidate the filter/prune
logic. And then in this patch, this hunk:

> > @@ -612,7 +614,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
> >  	    (revs.left_right || revs.cherry_mark))
> >  		die(_("marked counting is incompatible with --objects"));
> >  
> > -	if (filter_options.choice || revs.prune)
> > +	if (revs.prune)
> >  		use_bitmap_index = 0;

would just drop this conditional entirely. I like it.

-Peff

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 12/13] pack-bitmap: implement BLOB_LIMIT filtering
  2020-02-13  2:25 ` [PATCH 12/13] pack-bitmap: implement BLOB_LIMIT filtering Jeff King
@ 2020-02-13 23:17   ` Junio C Hamano
  0 siblings, 0 replies; 73+ messages in thread
From: Junio C Hamano @ 2020-02-13 23:17 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> Just as the previous commit implemented BLOB_NONE, we can support
> BLOB_LIMIT filters by looking at the sizes of any blobs in the result
> and unsetting their bits as appropriate. This is slightly more expensive
> than BLOB_NONE, but still produces a noticeable speedup (these results
> are on git.git):
>
>   Test                                        HEAD~2            HEAD
>   ------------------------------------------------------------------------------------
>   5310.7: rev-list count with blob:none       1.80(1.77+0.02)   0.22(0.20+0.02) -87.8%
>   5310.8: rev-list count with blob:limit=1k   1.99(1.96+0.03)   0.29(0.25+0.03) -85.4%

That's respectable improvement.  packed_object_info() that asks only
for the inflated size is quite cheap when the object is a delta, and
hopefully we have more deltified blobs than deflated ones.

> +static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
> +				     uint32_t pos)
> +{
> +	struct packed_git *pack = bitmap_git->pack;
> +	unsigned long size;
> +	struct object_info oi = OBJECT_INFO_INIT;
> +
> +	oi.sizep = &size;
> +
> +	if (pos < pack->num_objects) {
> +		struct revindex_entry *entry = &pack->revindex[pos];
> +		if (packed_object_info(the_repository, pack,
> +				       entry->offset, &oi) < 0) {
> +			struct object_id oid;
> +			nth_packed_object_oid(&oid, pack, entry->nr);
> +			die(_("unable to get size of %s"), oid_to_hex(&oid));
> +		}
> +	} else {
> +		struct eindex *eindex = &bitmap_git->ext_index;
> +		struct object *obj = eindex->objects[pos - pack->num_objects];
> +		if (oid_object_info_extended(the_repository, &obj->oid, &oi, 0) < 0)
> +			die(_("unable to get size of %s"), oid_to_hex(&obj->oid));
> +	}
> +
> +	return size;
> +}


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v2 0/15] combining object filters and bitmaps
  2020-02-13  2:15 [PATCH 0/13] combining object filters and bitmaps Jeff King
                   ` (12 preceding siblings ...)
  2020-02-13  2:25 ` [PATCH 13/13] pack-objects: support filters with bitmaps Jeff King
@ 2020-02-14 18:21 ` Jeff King
  2020-02-14 18:22   ` [PATCH v2 01/15] pack-bitmap: factor out type iterator initialization Jeff King
                     ` (14 more replies)
  13 siblings, 15 replies; 73+ messages in thread
From: Jeff King @ 2020-02-14 18:21 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

Here's a v2 based on the early feedback from Junio, and a few things I
noticed while addressing it:

 - patch 4 now moves the check for revs.prune into the pack-bitmap code,
   rather than rearranging it within rev-list.

 - I noticed some more opportunities to reuse the "compare the results
   of a bitmap and non-bitmap traversal" code, so I factored that out
   into its own commit (patch 8)

 - I realized another long-standing annoyance is simple to fix: before
   we did not know how to do "rev-list --use-bitmap-index" without
   "--objects", but it was pretty easy to support. That's patch 9.

The range diff below is a little noisy because of the ripple effects of
some of those changes. If you just want to review the substantive
changes, look at patches 4, 8, and 9.

  [01/15]: pack-bitmap: factor out type iterator initialization
  [02/15]: pack-bitmap: fix leak of haves/wants object lists
  [03/15]: rev-list: fallback to non-bitmap traversal when filtering
  [04/15]: pack-bitmap: refuse to do a bitmap traversal with pathspecs
  [05/15]: rev-list: factor out bitmap-optimized routines
  [06/15]: rev-list: make --count work with --objects
  [07/15]: rev-list: allow bitmaps when counting objects
  [08/15]: t5310: factor out bitmap traversal comparison
  [09/15]: rev-list: allow commit-only bitmap traversals
  [10/15]: pack-bitmap: basic noop bitmap filter infrastructure
  [11/15]: rev-list: use bitmap filters for traversal
  [12/15]: bitmap: add bitmap_unset() function
  [13/15]: pack-bitmap: implement BLOB_NONE filtering
  [14/15]: pack-bitmap: implement BLOB_LIMIT filtering
  [15/15]: pack-objects: support filters with bitmaps

 builtin/pack-objects.c             |   6 +-
 builtin/rev-list.c                 | 114 +++++++++---
 ewah/bitmap.c                      |   8 +
 ewah/ewok.h                        |   1 +
 object.c                           |   9 +
 object.h                           |   2 +
 pack-bitmap.c                      | 272 +++++++++++++++++++++++++----
 pack-bitmap.h                      |   5 +-
 reachable.c                        |   4 +-
 t/perf/p5310-pack-bitmaps.sh       |  22 +++
 t/t5310-pack-bitmaps.sh            |  36 +++-
 t/t6000-rev-list-misc.sh           |  12 ++
 t/t6113-rev-list-bitmap-filters.sh |  56 ++++++
 t/test-lib-functions.sh            |  27 +++
 14 files changed, 504 insertions(+), 70 deletions(-)
 create mode 100755 t/t6113-rev-list-bitmap-filters.sh

Range-diff from v1:

 1:  283874f2ac =  1:  4424d745de pack-bitmap: factor out type iterator initialization
 2:  68e7bed41c =  2:  bc0576868f pack-bitmap: fix leak of haves/wants object lists
 3:  1182593fb2 =  3:  9d2ef09b9b rev-list: fallback to non-bitmap traversal when filtering
 4:  03b20e9b66 <  -:  ---------- rev-list: consolidate bitmap-disabling options
 -:  ---------- >  4:  ce7969221a pack-bitmap: refuse to do a bitmap traversal with pathspecs
 5:  91032b0d78 =  5:  a9a5513bb0 rev-list: factor out bitmap-optimized routines
 6:  c7c3a31cf3 !  6:  eab5dd34bc rev-list: make --count work with --objects
    @@ builtin/rev-list.c: int cmd_rev_list(int argc, const char **argv, const char *pr
     +	    (revs.left_right || revs.cherry_mark))
     +		die(_("marked counting is incompatible with --objects"));
     +
    - 	if (filter_options.choice || revs.prune)
    + 	if (filter_options.choice)
      		use_bitmap_index = 0;
      
     
 7:  0aabf6262b =  7:  04af6b9362 rev-list: allow bitmaps when counting objects
 -:  ---------- >  8:  4b93d064b1 t5310: factor out bitmap traversal comparison
 -:  ---------- >  9:  d89509fa8f rev-list: allow commit-only bitmap traversals
 8:  969238ec42 ! 10:  c64d4db8e8 pack-bitmap: basic noop bitmap filter infrastructure
    @@ builtin/rev-list.c: static int try_bitmap_count(struct rev_info *revs)
      		return -1;
      
     @@ builtin/rev-list.c: static int try_bitmap_traversal(struct rev_info *revs)
    - 	if (!revs->tag_objects || !revs->tree_objects || !revs->blob_objects)
    + 	if (revs->max_count >= 0)
      		return -1;
      
     -	bitmap_git = prepare_bitmap_walk(revs);
    @@ pack-bitmap.c: static int in_bitmapped_pack(struct bitmap_index *bitmap_git,
      	unsigned int i;
      
     @@ pack-bitmap.c: struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)
    - 	struct bitmap *wants_bitmap = NULL;
    - 	struct bitmap *haves_bitmap = NULL;
    + 	if (revs->prune)
    + 		return NULL;
      
    --	struct bitmap_index *bitmap_git = xcalloc(1, sizeof(*bitmap_git));
    -+	struct bitmap_index *bitmap_git;
    -+
     +	if (!can_filter_bitmap(filter))
     +		return NULL;
     +
      	/* try to open a bitmapped pack, but don't parse it yet
      	 * because we may not need to use it */
    -+	bitmap_git = xcalloc(1, sizeof(*bitmap_git));
    - 	if (open_pack_bitmap(revs->repo, bitmap_git) < 0)
    - 		goto cleanup;
    - 
    + 	bitmap_git = xcalloc(1, sizeof(*bitmap_git));
     @@ pack-bitmap.c: struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)
      	if (haves_bitmap)
      		bitmap_and_not(wants_bitmap, haves_bitmap);
    @@ pack-bitmap.h
      
      static const char BITMAP_IDX_SIGNATURE[] = {'B', 'I', 'T', 'M'};
      
    -@@ pack-bitmap.h: void count_bitmap_commit_list(struct bitmap_index *, uint32_t *commits,
    - void traverse_bitmap_commit_list(struct bitmap_index *,
    +@@ pack-bitmap.h: void traverse_bitmap_commit_list(struct bitmap_index *,
    + 				 struct rev_info *revs,
      				 show_reachable_fn show_reachable);
      void test_bitmap_walk(struct rev_info *revs);
     -struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs);
    @@ reachable.c: void mark_reachable_objects(struct rev_info *revs, int mark_reflog,
     -	bitmap_git = prepare_bitmap_walk(revs);
     +	bitmap_git = prepare_bitmap_walk(revs, NULL);
      	if (bitmap_git) {
    - 		traverse_bitmap_commit_list(bitmap_git, mark_object_seen);
    + 		traverse_bitmap_commit_list(bitmap_git, revs, mark_object_seen);
      		free_bitmap_index(bitmap_git);
 9:  b4eb8fcb9d ! 11:  caa69ac535 rev-list: use bitmap filters for traversal
    @@ builtin/rev-list.c: static int try_bitmap_count(struct rev_info *revs)
      	struct bitmap_index *bitmap_git;
      
     @@ builtin/rev-list.c: static int try_bitmap_traversal(struct rev_info *revs)
    - 	if (!revs->tag_objects || !revs->tree_objects || !revs->blob_objects)
    + 	if (revs->max_count >= 0)
      		return -1;
      
     -	bitmap_git = prepare_bitmap_walk(revs, NULL);
    @@ builtin/rev-list.c: int cmd_rev_list(int argc, const char **argv, const char *pr
      	    (revs.left_right || revs.cherry_mark))
      		die(_("marked counting is incompatible with --objects"));
      
    --	if (filter_options.choice || revs.prune)
    -+	if (revs.prune)
    - 		use_bitmap_index = 0;
    - 
    +-	if (filter_options.choice)
    +-		use_bitmap_index = 0;
    +-
      	save_commit_buffer = (revs.verbose_header ||
    + 			      revs.grep_filter.pattern_list ||
    + 			      revs.grep_filter.header_list);
     @@ builtin/rev-list.c: int cmd_rev_list(int argc, const char **argv, const char *prefix)
      		progress = start_delayed_progress(show_progress, 0);
      
10:  b518070c12 = 12:  d3e7462138 bitmap: add bitmap_unset() function
11:  438c21082e ! 13:  07ee699520 pack-bitmap: implement BLOB_NONE filtering
    @@ Commit message
             bitmaps is about the same as our loop here, but it might make the
             code a bit simpler).
     
    -    The regression tests just compare the bitmap output to the non-bitmap
    -    output. Since the code internally falls back to the non-bitmap path in
    -    some cases, this is at risk of being a trivial noop. To combat this,
    -    we'll check for small differences between the two outputs (see the
    -    comment in the test). This is a little fragile, as it's possible that we
    -    may teach the fallback path for --use-bitmap-index to produce
    -    bitmap-like output in the future. But the exact ordering of objects
    -    would likely be different in such a case, anyway.
    +    Here are perf results for the new test on git.git:
     
    -    Plus we'd catch an unexpected fallback when running the perf suite, as
    -    things would get slower. Here's what it looks like now (on git.git):
    -
    -    Test                                    HEAD^             HEAD
    -    --------------------------------------------------------------------------------
    -    5310.7: rev-list count with blob:none   1.67(1.62+0.05)   0.22(0.21+0.02) -86.8%
    +      Test                                    HEAD^             HEAD
    +      --------------------------------------------------------------------------------
    +      5310.9: rev-list count with blob:none   1.67(1.62+0.05)   0.22(0.21+0.02) -86.8%
     
      ## pack-bitmap.c ##
     @@ pack-bitmap.c: static int in_bitmapped_pack(struct bitmap_index *bitmap_git,
    @@ pack-bitmap.c: static int filter_bitmap(struct bitmap_index *bitmap_git,
      }
     
      ## t/perf/p5310-pack-bitmaps.sh ##
    -@@ t/perf/p5310-pack-bitmaps.sh: test_perf 'pack to file (bitmap)' '
    - 	git pack-objects --use-bitmap-index --all pack1b </dev/null >/dev/null
    +@@ t/perf/p5310-pack-bitmaps.sh: test_perf 'rev-list (objects)' '
    + 	git rev-list --all --use-bitmap-index --objects >/dev/null
      '
      
     +test_perf 'rev-list count with blob:none' '
    @@ t/t6113-rev-list-bitmap-filters.sh: test_expect_success 'filters fallback to non
      	test_cmp expect actual
      '
      
    -+# the bitmap and regular traversals produce subtly different output:
    -+#
    -+#   - regular output is in traversal order, whereas bitmap is split by type,
    -+#     with non-packed objects at the end
    -+#
    -+#   - regular output has a space and the pathname appended to non-commit
    -+#     objects; bitmap output omits this
    -+#
    -+# Normalize and compare the two. The second argument should always be the
    -+# bitmap output.
    -+cmp_bitmap_traversal () {
    -+	if cmp "$1" "$2"
    -+	then
    -+		echo >&2 "identical raw outputs; are you sure bitmaps were used?"
    -+		return 1
    -+	fi &&
    -+	cut -d' ' -f1 "$1" | sort >"$1.normalized" &&
    -+	sort "$2" >"$2.normalized" &&
    -+	test_cmp "$1.normalized" "$2.normalized"
    -+}
    -+
     +test_expect_success 'blob:none filter' '
     +	git rev-list --objects --filter=blob:none HEAD >expect &&
     +	git rev-list --use-bitmap-index \
     +		     --objects --filter=blob:none HEAD >actual &&
    -+	cmp_bitmap_traversal expect actual
    ++	test_bitmap_traversal expect actual
     +'
     +
     +test_expect_success 'blob:none filter with specified blob' '
     +	git rev-list --objects --filter=blob:none HEAD HEAD:two.t >expect &&
     +	git rev-list --use-bitmap-index \
     +		     --objects --filter=blob:none HEAD HEAD:two.t >actual &&
    -+	cmp_bitmap_traversal expect actual
    ++	test_bitmap_traversal expect actual
     +'
     +
      test_done
12:  05cd57f0ba ! 14:  f35fab09a0 pack-bitmap: implement BLOB_LIMIT filtering
    @@ Commit message
         than BLOB_NONE, but still produces a noticeable speedup (these results
         are on git.git):
     
    -      Test                                        HEAD~2            HEAD
    +      Test                                         HEAD~2            HEAD
           ------------------------------------------------------------------------------------
    -      5310.7: rev-list count with blob:none       1.80(1.77+0.02)   0.22(0.20+0.02) -87.8%
    -      5310.8: rev-list count with blob:limit=1k   1.99(1.96+0.03)   0.29(0.25+0.03) -85.4%
    +      5310.9:  rev-list count with blob:none       1.80(1.77+0.02)   0.22(0.20+0.02) -87.8%
    +      5310.10: rev-list count with blob:limit=1k   1.99(1.96+0.03)   0.29(0.25+0.03) -85.4%
     
         The implementation is similar to the BLOB_NONE one, with the exception
         that we have to go object-by-object while walking the blob-type bitmap
    @@ t/t6113-rev-list-bitmap-filters.sh: test_description='rev-list combining bitmaps
      
      test_expect_success 'filters fallback to non-bitmap traversal' '
     @@ t/t6113-rev-list-bitmap-filters.sh: test_expect_success 'blob:none filter with specified blob' '
    - 	cmp_bitmap_traversal expect actual
    + 	test_bitmap_traversal expect actual
      '
      
     +test_expect_success 'blob:limit filter' '
     +	git rev-list --objects --filter=blob:limit=5 HEAD >expect &&
     +	git rev-list --use-bitmap-index \
     +		     --objects --filter=blob:limit=5 HEAD >actual &&
    -+	cmp_bitmap_traversal expect actual
    ++	test_bitmap_traversal expect actual
     +'
     +
     +test_expect_success 'blob:limit filter with specified blob' '
    @@ t/t6113-rev-list-bitmap-filters.sh: test_expect_success 'blob:none filter with s
     +	git rev-list --use-bitmap-index \
     +		     --objects --filter=blob:limit=5 \
     +		     HEAD HEAD:much-larger-blob-two.t >actual &&
    -+	cmp_bitmap_traversal expect actual
    ++	test_bitmap_traversal expect actual
     +'
     +
      test_done
13:  1e5befb73a ! 15:  797dbfe28f pack-objects: support filters with bitmaps
    @@ Commit message
         This unsurprisingly makes things faster for partial clones of large
         repositories (here we're cloning linux.git):
     
    -      Test                              HEAD^               HEAD
    +      Test                               HEAD^               HEAD
           ------------------------------------------------------------------------------
    -      5310.9: simulated partial clone   38.94(37.28+5.87)   11.06(11.27+4.07) -71.6%
    +      5310.11: simulated partial clone   38.94(37.28+5.87)   11.06(11.27+4.07) -71.6%
     
      ## builtin/pack-objects.c ##
     @@ builtin/pack-objects.c: static int pack_options_allow_reuse(void)

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v2 01/15] pack-bitmap: factor out type iterator initialization
  2020-02-14 18:21 ` [PATCH v2 0/15] combining object filters and bitmaps Jeff King
@ 2020-02-14 18:22   ` Jeff King
  2020-02-15  0:10     ` Taylor Blau
  2020-02-14 18:22   ` [PATCH v2 02/15] pack-bitmap: fix leak of haves/wants object lists Jeff King
                     ` (13 subsequent siblings)
  14 siblings, 1 reply; 73+ messages in thread
From: Jeff King @ 2020-02-14 18:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

When count_object_type() wants to iterate over the bitmap of all objects
of a certain type, we have to pair up OBJ_COMMIT with bitmap->commits,
and so forth. Since we're about to add more code to iterate over these
bitmaps, let's pull the initialization into its own function.

We can also use this to simplify traverse_bitmap_commit_list(). It
accomplishes the same thing by manually passing the object type and the
bitmap to show_objects_for_type(), but using our helper we just need the
object type.

Note there's one small code change here: previously we'd simply return
zero when counting an unknown object type, and now we'll BUG(). This
shouldn't matter in practice, as all of the callers pass in only usual
commit/tree/blob/tag types.

Signed-off-by: Jeff King <peff@peff.net>
---
 pack-bitmap.c | 63 +++++++++++++++++++++++++++------------------------
 1 file changed, 33 insertions(+), 30 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index e07c798879..9ca356ee29 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -616,9 +616,35 @@ static void show_extended_objects(struct bitmap_index *bitmap_git,
 	}
 }
 
+static void init_type_iterator(struct ewah_iterator *it,
+			       struct bitmap_index *bitmap_git,
+			       enum object_type type)
+{
+	switch (type) {
+	case OBJ_COMMIT:
+		ewah_iterator_init(it, bitmap_git->commits);
+		break;
+
+	case OBJ_TREE:
+		ewah_iterator_init(it, bitmap_git->trees);
+		break;
+
+	case OBJ_BLOB:
+		ewah_iterator_init(it, bitmap_git->blobs);
+		break;
+
+	case OBJ_TAG:
+		ewah_iterator_init(it, bitmap_git->tags);
+		break;
+
+	default:
+		BUG("object type %d not stored by bitmap type index", type);
+		break;
+	}
+}
+
 static void show_objects_for_type(
 	struct bitmap_index *bitmap_git,
-	struct ewah_bitmap *type_filter,
 	enum object_type object_type,
 	show_reachable_fn show_reach)
 {
@@ -633,7 +659,7 @@ static void show_objects_for_type(
 	if (bitmap_git->reuse_objects == bitmap_git->pack->num_objects)
 		return;
 
-	ewah_iterator_init(&it, type_filter);
+	init_type_iterator(&it, bitmap_git, object_type);
 
 	while (i < objects->word_alloc && ewah_iterator_next(&filter, &it)) {
 		eword_t word = objects->words[i] & filter;
@@ -835,14 +861,10 @@ void traverse_bitmap_commit_list(struct bitmap_index *bitmap_git,
 {
 	assert(bitmap_git->result);
 
-	show_objects_for_type(bitmap_git, bitmap_git->commits,
-		OBJ_COMMIT, show_reachable);
-	show_objects_for_type(bitmap_git, bitmap_git->trees,
-		OBJ_TREE, show_reachable);
-	show_objects_for_type(bitmap_git, bitmap_git->blobs,
-		OBJ_BLOB, show_reachable);
-	show_objects_for_type(bitmap_git, bitmap_git->tags,
-		OBJ_TAG, show_reachable);
+	show_objects_for_type(bitmap_git, OBJ_COMMIT, show_reachable);
+	show_objects_for_type(bitmap_git, OBJ_TREE, show_reachable);
+	show_objects_for_type(bitmap_git, OBJ_BLOB, show_reachable);
+	show_objects_for_type(bitmap_git, OBJ_TAG, show_reachable);
 
 	show_extended_objects(bitmap_git, show_reachable);
 }
@@ -857,26 +879,7 @@ static uint32_t count_object_type(struct bitmap_index *bitmap_git,
 	struct ewah_iterator it;
 	eword_t filter;
 
-	switch (type) {
-	case OBJ_COMMIT:
-		ewah_iterator_init(&it, bitmap_git->commits);
-		break;
-
-	case OBJ_TREE:
-		ewah_iterator_init(&it, bitmap_git->trees);
-		break;
-
-	case OBJ_BLOB:
-		ewah_iterator_init(&it, bitmap_git->blobs);
-		break;
-
-	case OBJ_TAG:
-		ewah_iterator_init(&it, bitmap_git->tags);
-		break;
-
-	default:
-		return 0;
-	}
+	init_type_iterator(&it, bitmap_git, type);
 
 	while (i < objects->word_alloc && ewah_iterator_next(&filter, &it)) {
 		eword_t word = objects->words[i++] & filter;
-- 
2.25.0.796.gcc29325708


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v2 02/15] pack-bitmap: fix leak of haves/wants object lists
  2020-02-14 18:21 ` [PATCH v2 0/15] combining object filters and bitmaps Jeff King
  2020-02-14 18:22   ` [PATCH v2 01/15] pack-bitmap: factor out type iterator initialization Jeff King
@ 2020-02-14 18:22   ` Jeff King
  2020-02-15  0:15     ` Taylor Blau
  2020-02-18 17:58     ` Derrick Stolee
  2020-02-14 18:22   ` [PATCH v2 03/15] rev-list: fallback to non-bitmap traversal when filtering Jeff King
                     ` (12 subsequent siblings)
  14 siblings, 2 replies; 73+ messages in thread
From: Jeff King @ 2020-02-14 18:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

When we do a bitmap-aware revision traversal, we create an object_list
for each of the "haves" and "wants" tips. After creating the result
bitmaps these are no longer needed or used, but we never free the list
memory.

Signed-off-by: Jeff King <peff@peff.net>
---
 object.c      | 9 +++++++++
 object.h      | 2 ++
 pack-bitmap.c | 5 +++++
 3 files changed, 16 insertions(+)

diff --git a/object.c b/object.c
index 142ef69399..4d11949b38 100644
--- a/object.c
+++ b/object.c
@@ -307,6 +307,15 @@ int object_list_contains(struct object_list *list, struct object *obj)
 	return 0;
 }
 
+void object_list_free(struct object_list **list)
+{
+	while (*list) {
+		struct object_list *p = *list;
+		*list = p->next;
+		free(p);
+	}
+}
+
 /*
  * A zero-length string to which object_array_entry::name can be
  * initialized without requiring a malloc/free.
diff --git a/object.h b/object.h
index 25f5ab3d54..2dbabfca0a 100644
--- a/object.h
+++ b/object.h
@@ -151,6 +151,8 @@ struct object_list *object_list_insert(struct object *item,
 
 int object_list_contains(struct object_list *list, struct object *obj);
 
+void object_list_free(struct object_list **list);
+
 /* Object array handling .. */
 void add_object_array(struct object *obj, const char *name, struct object_array *array);
 void add_object_array_with_path(struct object *obj, const char *name, struct object_array *array, unsigned mode, const char *path);
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 9ca356ee29..6c06099dc7 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -787,10 +787,15 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)
 	bitmap_git->result = wants_bitmap;
 	bitmap_git->haves = haves_bitmap;
 
+	object_list_free(&wants);
+	object_list_free(&haves);
+
 	return bitmap_git;
 
 cleanup:
 	free_bitmap_index(bitmap_git);
+	object_list_free(&wants);
+	object_list_free(&haves);
 	return NULL;
 }
 
-- 
2.25.0.796.gcc29325708


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v2 03/15] rev-list: fallback to non-bitmap traversal when filtering
  2020-02-14 18:21 ` [PATCH v2 0/15] combining object filters and bitmaps Jeff King
  2020-02-14 18:22   ` [PATCH v2 01/15] pack-bitmap: factor out type iterator initialization Jeff King
  2020-02-14 18:22   ` [PATCH v2 02/15] pack-bitmap: fix leak of haves/wants object lists Jeff King
@ 2020-02-14 18:22   ` Jeff King
  2020-02-15  0:22     ` Taylor Blau
  2020-02-14 18:22   ` [PATCH v2 04/15] pack-bitmap: refuse to do a bitmap traversal with pathspecs Jeff King
                     ` (11 subsequent siblings)
  14 siblings, 1 reply; 73+ messages in thread
From: Jeff King @ 2020-02-14 18:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

The "--use-bitmap-index" option is usually aspirational: if we have
bitmaps and the request can be fulfilled more quickly using them we'll
do so, but otherwise fall back to a non-bitmap traversal.

The exception is object filtering, which explicitly dies if the two
options are combined. Let's convert this to the usual fallback behavior.
This is a minor convenience for now (since the caller can easily know
that --filter and --use-bitmap-index don't combine), but will become
much more useful as we start to support _some_ filters with bitmaps, but
not others.

The test infrastructure here is bigger than necessary for checking this
one small feature. But it will serve as the basis for more filtering
bitmap tests in future patches.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/rev-list.c                 |  4 ++--
 t/t6113-rev-list-bitmap-filters.sh | 24 ++++++++++++++++++++++++
 2 files changed, 26 insertions(+), 2 deletions(-)
 create mode 100755 t/t6113-rev-list-bitmap-filters.sh

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index e28d62ec64..bce406bd1e 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -521,8 +521,8 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 	if (revs.show_notes)
 		die(_("rev-list does not support display of notes"));
 
-	if (filter_options.choice && use_bitmap_index)
-		die(_("cannot combine --use-bitmap-index with object filtering"));
+	if (filter_options.choice)
+		use_bitmap_index = 0;
 
 	save_commit_buffer = (revs.verbose_header ||
 			      revs.grep_filter.pattern_list ||
diff --git a/t/t6113-rev-list-bitmap-filters.sh b/t/t6113-rev-list-bitmap-filters.sh
new file mode 100755
index 0000000000..977f8d0930
--- /dev/null
+++ b/t/t6113-rev-list-bitmap-filters.sh
@@ -0,0 +1,24 @@
+#!/bin/sh
+
+test_description='rev-list combining bitmaps and filters'
+. ./test-lib.sh
+
+test_expect_success 'set up bitmapped repo' '
+	# one commit will have bitmaps, the other will not
+	test_commit one &&
+	git repack -adb &&
+	test_commit two
+'
+
+test_expect_success 'filters fallback to non-bitmap traversal' '
+	# use a path-based filter, since they are inherently incompatible with
+	# bitmaps (i.e., this test will never get confused by later code to
+	# combine the features)
+	filter=$(echo "!one" | git hash-object -w --stdin) &&
+	git rev-list --objects --filter=sparse:oid=$filter HEAD >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter=sparse:oid=$filter HEAD >actual &&
+	test_cmp expect actual
+'
+
+test_done
-- 
2.25.0.796.gcc29325708


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v2 04/15] pack-bitmap: refuse to do a bitmap traversal with pathspecs
  2020-02-14 18:21 ` [PATCH v2 0/15] combining object filters and bitmaps Jeff King
                     ` (2 preceding siblings ...)
  2020-02-14 18:22   ` [PATCH v2 03/15] rev-list: fallback to non-bitmap traversal when filtering Jeff King
@ 2020-02-14 18:22   ` Jeff King
  2020-02-14 19:03     ` Junio C Hamano
  2020-02-14 18:22   ` [PATCH v2 05/15] rev-list: factor out bitmap-optimized routines Jeff King
                     ` (10 subsequent siblings)
  14 siblings, 1 reply; 73+ messages in thread
From: Jeff King @ 2020-02-14 18:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

rev-list has refused to use bitmaps with pathspec limiting since
c8a70d3509 (rev-list: disable --use-bitmap-index when pruning commits,
2015-07-01). But this is true not just for rev-list, but for anyone who
calls prepare_bitmap_walk(); the code isn't equipped to handle this
case.  We never noticed because the only other callers would never pass
a pathspec limiter.

But let's push the check down into prepare_bitmap_walk() anyway. That's
a more logical place for it to live, as callers shouldn't need to know
the details (and must be prepared to fall back to a regular traversal
anyway, since there might not be bitmaps in the repository).

It would also prepare us for a day where this case _is_ handled, but
that's pretty unlikely. E.g., we could use bitmaps to generate the set
of commits, and then diff each commit to see if it matches the pathspec.
That would be slightly faster than a naive traversal that actually walks
the commits. But you'd probably do better still to make use of the newer
commit-graph feature to make walking the commits very cheap.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/rev-list.c |  2 +-
 pack-bitmap.c      | 12 +++++++++++-
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index bce406bd1e..4cb5a52dee 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -533,7 +533,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 	if (show_progress)
 		progress = start_delayed_progress(show_progress, 0);
 
-	if (use_bitmap_index && !revs.prune) {
+	if (use_bitmap_index) {
 		if (revs.count && !revs.left_right && !revs.cherry_mark) {
 			uint32_t commit_count;
 			int max_count = revs.max_count;
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 6c06099dc7..a97b717e55 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -715,9 +715,19 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)
 	struct bitmap *wants_bitmap = NULL;
 	struct bitmap *haves_bitmap = NULL;
 
-	struct bitmap_index *bitmap_git = xcalloc(1, sizeof(*bitmap_git));
+	struct bitmap_index *bitmap_git;
+
+	/*
+	 * We can't do pathspec limiting with bitmaps, because we don't know
+	 * which commits are associated with which object changes (let alone
+	 * even which objects are associated with which paths).
+	 */
+	if (revs->prune)
+		return NULL;
+
 	/* try to open a bitmapped pack, but don't parse it yet
 	 * because we may not need to use it */
+	bitmap_git = xcalloc(1, sizeof(*bitmap_git));
 	if (open_pack_bitmap(revs->repo, bitmap_git) < 0)
 		goto cleanup;
 
-- 
2.25.0.796.gcc29325708


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v2 05/15] rev-list: factor out bitmap-optimized routines
  2020-02-14 18:21 ` [PATCH v2 0/15] combining object filters and bitmaps Jeff King
                     ` (3 preceding siblings ...)
  2020-02-14 18:22   ` [PATCH v2 04/15] pack-bitmap: refuse to do a bitmap traversal with pathspecs Jeff King
@ 2020-02-14 18:22   ` Jeff King
  2020-02-15  0:35     ` Taylor Blau
  2020-02-14 18:22   ` [PATCH v2 06/15] rev-list: make --count work with --objects Jeff King
                     ` (9 subsequent siblings)
  14 siblings, 1 reply; 73+ messages in thread
From: Jeff King @ 2020-02-14 18:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

There are a few operations in rev-list that are optimized for bitmaps.
Rather than having the code inline in cmd_rev_list(), let's move them
into helpers. This not only makes the flow of the main function simpler,
but it lets us replace the complex "can we do the optimization?"
conditionals with a series of early returns from the functions. That
also makes it easy to add comments explaining those conditions.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/rev-list.c | 88 +++++++++++++++++++++++++++++++++++-----------
 1 file changed, 67 insertions(+), 21 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 4cb5a52dee..38c5ca5603 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -364,6 +364,69 @@ static inline int parse_missing_action_value(const char *value)
 	return 0;
 }
 
+static int try_bitmap_count(struct rev_info *revs)
+{
+	uint32_t commit_count;
+	int max_count;
+	struct bitmap_index *bitmap_git;
+
+	/* This function only handles counting, not general traversal. */
+	if (!revs->count)
+		return -1;
+
+	/*
+	 * A bitmap result can't know left/right, etc, because we don't
+	 * actually traverse.
+	 */
+	if (revs->left_right || revs->cherry_mark)
+		return -1;
+
+	/*
+	 * This must be saved before doing any walking, since the revision
+	 * machinery will count it down to zero while traversing.
+	 */
+	max_count = revs->max_count;
+
+	bitmap_git = prepare_bitmap_walk(revs);
+	if (!bitmap_git)
+		return -1;
+
+	count_bitmap_commit_list(bitmap_git, &commit_count, NULL, NULL, NULL);
+	if (max_count >= 0 && max_count < commit_count)
+		commit_count = max_count;
+
+	printf("%d\n", commit_count);
+	free_bitmap_index(bitmap_git);
+	return 0;
+}
+
+static int try_bitmap_traversal(struct rev_info *revs)
+{
+	struct bitmap_index *bitmap_git;
+
+	/*
+	 * We can't use a bitmap result with a traversal limit, since the set
+	 * of commits we'd get would be essentially random.
+	 */
+	if (revs->max_count >= 0)
+		return -1;
+
+	/*
+	 * Our bitmap result will return all objects, and we're not
+	 * yet prepared to show only particular types.
+	 */
+	if (!revs->tag_objects || !revs->tree_objects || !revs->blob_objects)
+		return -1;
+
+	bitmap_git = prepare_bitmap_walk(revs);
+	if (!bitmap_git)
+		return -1;
+
+	traverse_bitmap_commit_list(bitmap_git, &show_object_fast);
+	free_bitmap_index(bitmap_git);
+	return 0;
+}
+
 int cmd_rev_list(int argc, const char **argv, const char *prefix)
 {
 	struct rev_info revs;
@@ -534,27 +597,10 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 		progress = start_delayed_progress(show_progress, 0);
 
 	if (use_bitmap_index) {
-		if (revs.count && !revs.left_right && !revs.cherry_mark) {
-			uint32_t commit_count;
-			int max_count = revs.max_count;
-			struct bitmap_index *bitmap_git;
-			if ((bitmap_git = prepare_bitmap_walk(&revs))) {
-				count_bitmap_commit_list(bitmap_git, &commit_count, NULL, NULL, NULL);
-				if (max_count >= 0 && max_count < commit_count)
-					commit_count = max_count;
-				printf("%d\n", commit_count);
-				free_bitmap_index(bitmap_git);
-				return 0;
-			}
-		} else if (revs.max_count < 0 &&
-			   revs.tag_objects && revs.tree_objects && revs.blob_objects) {
-			struct bitmap_index *bitmap_git;
-			if ((bitmap_git = prepare_bitmap_walk(&revs))) {
-				traverse_bitmap_commit_list(bitmap_git, &show_object_fast);
-				free_bitmap_index(bitmap_git);
-				return 0;
-			}
-		}
+		if (!try_bitmap_count(&revs))
+			return 0;
+		if (!try_bitmap_traversal(&revs))
+			return 0;
 	}
 
 	if (prepare_revision_walk(&revs))
-- 
2.25.0.796.gcc29325708


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v2 06/15] rev-list: make --count work with --objects
  2020-02-14 18:21 ` [PATCH v2 0/15] combining object filters and bitmaps Jeff King
                     ` (4 preceding siblings ...)
  2020-02-14 18:22   ` [PATCH v2 05/15] rev-list: factor out bitmap-optimized routines Jeff King
@ 2020-02-14 18:22   ` Jeff King
  2020-02-15  0:42     ` Taylor Blau
  2020-02-18 18:05     ` Derrick Stolee
  2020-02-14 18:22   ` [PATCH v2 07/15] rev-list: allow bitmaps when counting objects Jeff King
                     ` (8 subsequent siblings)
  14 siblings, 2 replies; 73+ messages in thread
From: Jeff King @ 2020-02-14 18:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

The current behavior from "rev-list --count --objects" is nonsensical:
we enumerate all of the objects except commits, but then give a count of
commits. This wasn't planned, and is just what the code happens to do.

Instead, let's give the answer the user almost certainly wanted: the
full count of objects.

Note that there are more complicated cases around cherry-marking, etc.
We'll punt on those for now, but let the user know that we can't produce
an answer (rather than giving them something useless).

We'll test both the new feature as well as a vanilla --count of commits,
since that surprisingly doesn't seem to be covered in the existing
tests.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/rev-list.c       | 13 +++++++++++++
 t/t6000-rev-list-misc.sh | 12 ++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 38c5ca5603..9452123988 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -253,11 +253,19 @@ static int finish_object(struct object *obj, const char *name, void *cb_data)
 static void show_object(struct object *obj, const char *name, void *cb_data)
 {
 	struct rev_list_info *info = cb_data;
+	struct rev_info *revs = info->revs;
+
 	if (finish_object(obj, name, cb_data))
 		return;
 	display_progress(progress, ++progress_counter);
 	if (info->flags & REV_LIST_QUIET)
 		return;
+
+	if (revs->count) {
+		revs->count_right++;
+		return;
+	}
+
 	if (arg_show_object_names)
 		show_object_with_name(stdout, obj, name);
 	else
@@ -584,6 +592,11 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 	if (revs.show_notes)
 		die(_("rev-list does not support display of notes"));
 
+	if (revs.count &&
+	    (revs.tag_objects || revs.tree_objects || revs.blob_objects) &&
+	    (revs.left_right || revs.cherry_mark))
+		die(_("marked counting is incompatible with --objects"));
+
 	if (filter_options.choice)
 		use_bitmap_index = 0;
 
diff --git a/t/t6000-rev-list-misc.sh b/t/t6000-rev-list-misc.sh
index b8cf82349b..383f2c457d 100755
--- a/t/t6000-rev-list-misc.sh
+++ b/t/t6000-rev-list-misc.sh
@@ -148,4 +148,16 @@ test_expect_success 'rev-list --end-of-options' '
 	test_cmp expect actual
 '
 
+test_expect_success 'rev-list --count' '
+	count=$(git rev-list --count HEAD) &&
+	git rev-list HEAD >actual &&
+	test_line_count = $count actual
+'
+
+test_expect_success 'rev-list --count --objects' '
+	count=$(git rev-list --count --objects HEAD) &&
+	git rev-list --objects HEAD >actual &&
+	test_line_count = $count actual
+'
+
 test_done
-- 
2.25.0.796.gcc29325708


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v2 07/15] rev-list: allow bitmaps when counting objects
  2020-02-14 18:21 ` [PATCH v2 0/15] combining object filters and bitmaps Jeff King
                     ` (5 preceding siblings ...)
  2020-02-14 18:22   ` [PATCH v2 06/15] rev-list: make --count work with --objects Jeff King
@ 2020-02-14 18:22   ` Jeff King
  2020-02-15  0:45     ` Taylor Blau
  2020-02-14 18:22   ` [PATCH v2 08/15] t5310: factor out bitmap traversal comparison Jeff King
                     ` (7 subsequent siblings)
  14 siblings, 1 reply; 73+ messages in thread
From: Jeff King @ 2020-02-14 18:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

The prior commit taught "--count --objects" to work without bitmaps. We
should be able to get the same answer much more quickly with bitmaps.

Note that we punt on the max_count case here. This perhaps _could_ be
made to work if we find all of the boundary commits and treat them as
UNINTERESTING, subtracting them (and their reachable objects) from the
set we return. That implies an actual commit traversal, but we'd still
be faster due to avoiding opening up any trees. Given the complexity and
the fact that anyone is unlikely to want this, it makes sense to just
fall back to the non-bitmap case for now.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/rev-list.c      | 21 ++++++++++++++++++---
 t/t5310-pack-bitmaps.sh |  6 ++++++
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 9452123988..70f3207ecc 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -374,7 +374,10 @@ static inline int parse_missing_action_value(const char *value)
 
 static int try_bitmap_count(struct rev_info *revs)
 {
-	uint32_t commit_count;
+	uint32_t commit_count = 0,
+		 tag_count = 0,
+		 tree_count = 0,
+		 blob_count = 0;
 	int max_count;
 	struct bitmap_index *bitmap_git;
 
@@ -389,6 +392,15 @@ static int try_bitmap_count(struct rev_info *revs)
 	if (revs->left_right || revs->cherry_mark)
 		return -1;
 
+	/*
+	 * If we're counting reachable objects, we can't handle a max count of
+	 * commits to traverse, since we don't know which objects go with which
+	 * commit.
+	 */
+	if (revs->max_count >= 0 &&
+	    (revs->tag_objects || revs->tree_objects || revs->blob_objects))
+		return -1;
+
 	/*
 	 * This must be saved before doing any walking, since the revision
 	 * machinery will count it down to zero while traversing.
@@ -399,11 +411,14 @@ static int try_bitmap_count(struct rev_info *revs)
 	if (!bitmap_git)
 		return -1;
 
-	count_bitmap_commit_list(bitmap_git, &commit_count, NULL, NULL, NULL);
+	count_bitmap_commit_list(bitmap_git, &commit_count,
+				 revs->tree_objects ? &tree_count : NULL,
+				 revs->blob_objects ? &blob_count : NULL,
+				 revs->tag_objects ? &tag_count : NULL);
 	if (max_count >= 0 && max_count < commit_count)
 		commit_count = max_count;
 
-	printf("%d\n", commit_count);
+	printf("%d\n", commit_count + tree_count + blob_count + tag_count);
 	free_bitmap_index(bitmap_git);
 	return 0;
 }
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index 6640329ebf..7ba7d294a5 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -74,6 +74,12 @@ rev_list_tests() {
 		test_cmp expect actual
 	'
 
+	test_expect_success "counting objects via bitmap ($state)" '
+		git rev-list --count --objects HEAD >expect &&
+		git rev-list --use-bitmap-index --count --objects HEAD >actual &&
+		test_cmp expect actual
+	'
+
 	test_expect_success "enumerate --objects ($state)" '
 		git rev-list --objects --use-bitmap-index HEAD >tmp &&
 		cut -d" " -f1 <tmp >tmp2 &&
-- 
2.25.0.796.gcc29325708


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v2 08/15] t5310: factor out bitmap traversal comparison
  2020-02-14 18:21 ` [PATCH v2 0/15] combining object filters and bitmaps Jeff King
                     ` (6 preceding siblings ...)
  2020-02-14 18:22   ` [PATCH v2 07/15] rev-list: allow bitmaps when counting objects Jeff King
@ 2020-02-14 18:22   ` Jeff King
  2020-02-15  2:14     ` Taylor Blau
  2020-02-14 18:22   ` [PATCH v2 09/15] rev-list: allow commit-only bitmap traversals Jeff King
                     ` (6 subsequent siblings)
  14 siblings, 1 reply; 73+ messages in thread
From: Jeff King @ 2020-02-14 18:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

We check the results of "rev-list --use-bitmap-index" by comparing it to
the same traversal without the bitmap option. However, this is a little
tricky to do because of some output differences (see the included
comment for details). Let's pull this out into a helper function, since
we'll be adding some similar tests.

While we're at it, let's also try to confirm that the bitmap output did
indeed use bitmaps. Since the code internally falls back to the
non-bitmap path in some cases, the tests are at risk of becoming trivial
noops.

This is a bit fragile, as not all outputs will differ (e.g., looking at
only the commits from a fully-bitmapped pack will end up exactly the
same as the normal traversal order, since it also matches the pack
order). So we'll provide an escape hatch by which tests can disable this
check (which should only be used after manually confirming that bitmaps
kicked in).

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t5310-pack-bitmaps.sh | 10 +++-------
 t/test-lib-functions.sh | 27 +++++++++++++++++++++++++++
 2 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index 7ba7d294a5..b8645ae070 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -81,13 +81,9 @@ rev_list_tests() {
 	'
 
 	test_expect_success "enumerate --objects ($state)" '
-		git rev-list --objects --use-bitmap-index HEAD >tmp &&
-		cut -d" " -f1 <tmp >tmp2 &&
-		sort <tmp2 >actual &&
-		git rev-list --objects HEAD >tmp &&
-		cut -d" " -f1 <tmp >tmp2 &&
-		sort <tmp2 >expect &&
-		test_cmp expect actual
+		git rev-list --objects --use-bitmap-index HEAD >actual &&
+		git rev-list --objects HEAD >expect &&
+		test_bitmap_traversal expect actual
 	'
 
 	test_expect_success "bitmap --objects handles non-commit objects ($state)" '
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 284c52d076..352c213d52 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1516,3 +1516,30 @@ test_set_port () {
 	port=$(($port + ${GIT_TEST_STRESS_JOB_NR:-0}))
 	eval $var=$port
 }
+
+# Compare a file containing rev-list bitmap traversal output to its non-bitmap
+# counterpart. You can't just use test_cmp for this, because the two produce
+# subtly different output:
+#
+#   - regular output is in traversal order, whereas bitmap is split by type,
+#     with non-packed objects at the end
+#
+#   - regular output has a space and the pathname appended to non-commit
+#     objects; bitmap output omits this
+#
+# This function normalizes and compares the two. The second file should
+# always be the bitmap output.
+test_bitmap_traversal () {
+	if test "$1" = "--no-confirm-bitmaps"
+	then
+		shift
+	elif cmp "$1" "$2"
+	then
+		echo >&2 "identical raw outputs; are you sure bitmaps were used?"
+		return 1
+	fi &&
+	cut -d' ' -f1 "$1" | sort >"$1.normalized" &&
+	sort "$2" >"$2.normalized" &&
+	test_cmp "$1.normalized" "$2.normalized" &&
+	rm -f "$1.normalized" "$2.normalized"
+}
-- 
2.25.0.796.gcc29325708


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v2 09/15] rev-list: allow commit-only bitmap traversals
  2020-02-14 18:21 ` [PATCH v2 0/15] combining object filters and bitmaps Jeff King
                     ` (7 preceding siblings ...)
  2020-02-14 18:22   ` [PATCH v2 08/15] t5310: factor out bitmap traversal comparison Jeff King
@ 2020-02-14 18:22   ` Jeff King
  2020-02-18 18:18     ` Derrick Stolee
  2020-02-14 18:22   ` [PATCH v2 10/15] pack-bitmap: basic noop bitmap filter infrastructure Jeff King
                     ` (5 subsequent siblings)
  14 siblings, 1 reply; 73+ messages in thread
From: Jeff King @ 2020-02-14 18:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

Ever since we added reachability bitmap support, we've been able to use
it with rev-list to get the full list of objects, like:

  git rev-list --objects --use-bitmap-index --all

But you can't do so without --objects, since we weren't ready to just
show the commits. However, the internals of the bitmap code are mostly
ready for this: they avoid opening up trees when walking to fill in the
bitmaps. We just need to actually pass in the rev_info to
traverse_bitmap_commit_list() so it knows which types to bother
triggering our callback for.

For completeness, the perf test now covers both the existing --objects
case, as well as the new commits-only behavior (the objects one got way
faster when we introduced bitmaps, but obviously isn't improved now).

Here are numbers for linux.git:

  Test                         HEAD^               HEAD
  ------------------------------------------------------------------------
  5310.7: rev-list (commits)   8.29(8.10+0.19)       1.76(1.72+0.04) -78.8%
  5310.8: rev-list (objects)   8.06(7.94+0.12)       8.14(7.94+0.13) +1.0%

That run was cheating a little, as I didn't have any commit-graph in the
repository, and we'd built it by default these days when running git-gc.
Here are numbers with a commit-graph:

  Test                         HEAD^               HEAD
  ------------------------------------------------------------------------
  5310.7: rev-list (commits)   0.70(0.58+0.12)     0.51(0.46+0.04) -27.1%
  5310.8: rev-list (objects)   6.20(6.09+0.10)     6.27(6.16+0.11) +1.1%

Still an improvement, but a lot less impressive.

We could have the perf script remove any commit-graph to show the
out-sized effect, but it probably makes sense to leave it in what would
be a more typical setup.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/pack-objects.c       |  3 ++-
 builtin/rev-list.c           |  9 +--------
 pack-bitmap.c                | 20 +++++++++++++++-----
 pack-bitmap.h                |  1 +
 reachable.c                  |  2 +-
 t/perf/p5310-pack-bitmaps.sh |  8 ++++++++
 t/t5310-pack-bitmaps.sh      |  6 ++++++
 7 files changed, 34 insertions(+), 15 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 393c20a2d7..06915ebe7f 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3054,7 +3054,8 @@ static int get_object_list_from_bitmap(struct rev_info *revs)
 		display_progress(progress_state, nr_result);
 	}
 
-	traverse_bitmap_commit_list(bitmap_git, &add_object_entry_from_bitmap);
+	traverse_bitmap_commit_list(bitmap_git, revs,
+				    &add_object_entry_from_bitmap);
 	return 0;
 }
 
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 70f3207ecc..937324cef0 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -434,18 +434,11 @@ static int try_bitmap_traversal(struct rev_info *revs)
 	if (revs->max_count >= 0)
 		return -1;
 
-	/*
-	 * Our bitmap result will return all objects, and we're not
-	 * yet prepared to show only particular types.
-	 */
-	if (!revs->tag_objects || !revs->tree_objects || !revs->blob_objects)
-		return -1;
-
 	bitmap_git = prepare_bitmap_walk(revs);
 	if (!bitmap_git)
 		return -1;
 
-	traverse_bitmap_commit_list(bitmap_git, &show_object_fast);
+	traverse_bitmap_commit_list(bitmap_git, revs, &show_object_fast);
 	free_bitmap_index(bitmap_git);
 	return 0;
 }
diff --git a/pack-bitmap.c b/pack-bitmap.c
index a97b717e55..2fbc748b19 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -599,6 +599,7 @@ static struct bitmap *find_objects(struct bitmap_index *bitmap_git,
 }
 
 static void show_extended_objects(struct bitmap_index *bitmap_git,
+				  struct rev_info *revs,
 				  show_reachable_fn show_reach)
 {
 	struct bitmap *objects = bitmap_git->result;
@@ -612,6 +613,11 @@ static void show_extended_objects(struct bitmap_index *bitmap_git,
 			continue;
 
 		obj = eindex->objects[i];
+		if ((obj->type == OBJ_BLOB && !revs->blob_objects) ||
+		    (obj->type == OBJ_TREE && !revs->tree_objects) ||
+		    (obj->type == OBJ_TAG && !revs->tag_objects))
+			continue;
+
 		show_reach(&obj->oid, obj->type, 0, eindex->hashes[i], NULL, 0);
 	}
 }
@@ -872,16 +878,20 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 }
 
 void traverse_bitmap_commit_list(struct bitmap_index *bitmap_git,
+				 struct rev_info *revs,
 				 show_reachable_fn show_reachable)
 {
 	assert(bitmap_git->result);
 
 	show_objects_for_type(bitmap_git, OBJ_COMMIT, show_reachable);
-	show_objects_for_type(bitmap_git, OBJ_TREE, show_reachable);
-	show_objects_for_type(bitmap_git, OBJ_BLOB, show_reachable);
-	show_objects_for_type(bitmap_git, OBJ_TAG, show_reachable);
-
-	show_extended_objects(bitmap_git, show_reachable);
+	if (revs->tree_objects)
+		show_objects_for_type(bitmap_git, OBJ_TREE, show_reachable);
+	if (revs->blob_objects)
+		show_objects_for_type(bitmap_git, OBJ_BLOB, show_reachable);
+	if (revs->tag_objects)
+		show_objects_for_type(bitmap_git, OBJ_TAG, show_reachable);
+
+	show_extended_objects(bitmap_git, revs, show_reachable);
 }
 
 static uint32_t count_object_type(struct bitmap_index *bitmap_git,
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 466c5afa09..b0c06a212e 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -44,6 +44,7 @@ struct bitmap_index *prepare_bitmap_git(struct repository *r);
 void count_bitmap_commit_list(struct bitmap_index *, uint32_t *commits,
 			      uint32_t *trees, uint32_t *blobs, uint32_t *tags);
 void traverse_bitmap_commit_list(struct bitmap_index *,
+				 struct rev_info *revs,
 				 show_reachable_fn show_reachable);
 void test_bitmap_walk(struct rev_info *revs);
 struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs);
diff --git a/reachable.c b/reachable.c
index 8f50235b28..0919f025c4 100644
--- a/reachable.c
+++ b/reachable.c
@@ -225,7 +225,7 @@ void mark_reachable_objects(struct rev_info *revs, int mark_reflog,
 
 	bitmap_git = prepare_bitmap_walk(revs);
 	if (bitmap_git) {
-		traverse_bitmap_commit_list(bitmap_git, mark_object_seen);
+		traverse_bitmap_commit_list(bitmap_git, revs, mark_object_seen);
 		free_bitmap_index(bitmap_git);
 		return;
 	}
diff --git a/t/perf/p5310-pack-bitmaps.sh b/t/perf/p5310-pack-bitmaps.sh
index 6a3a42531b..e52f66ec9e 100755
--- a/t/perf/p5310-pack-bitmaps.sh
+++ b/t/perf/p5310-pack-bitmaps.sh
@@ -39,6 +39,14 @@ test_perf 'pack to file (bitmap)' '
 	git pack-objects --use-bitmap-index --all pack1b </dev/null >/dev/null
 '
 
+test_perf 'rev-list (commits)' '
+	git rev-list --all --use-bitmap-index >/dev/null
+'
+
+test_perf 'rev-list (objects)' '
+	git rev-list --all --use-bitmap-index --objects >/dev/null
+'
+
 test_expect_success 'create partial bitmap state' '
 	# pick a commit to represent the repo tip in the past
 	cutoff=$(git rev-list HEAD~100 -1) &&
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index b8645ae070..2c64d0c441 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -80,6 +80,12 @@ rev_list_tests() {
 		test_cmp expect actual
 	'
 
+	test_expect_success "enumerate commits ($state)" '
+		git rev-list --use-bitmap-index HEAD >actual &&
+		git rev-list HEAD >expect &&
+		test_bitmap_traversal --no-confirm-bitmaps expect actual
+	'
+
 	test_expect_success "enumerate --objects ($state)" '
 		git rev-list --objects --use-bitmap-index HEAD >actual &&
 		git rev-list --objects HEAD >expect &&
-- 
2.25.0.796.gcc29325708


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v2 10/15] pack-bitmap: basic noop bitmap filter infrastructure
  2020-02-14 18:21 ` [PATCH v2 0/15] combining object filters and bitmaps Jeff King
                     ` (8 preceding siblings ...)
  2020-02-14 18:22   ` [PATCH v2 09/15] rev-list: allow commit-only bitmap traversals Jeff King
@ 2020-02-14 18:22   ` Jeff King
  2020-02-14 18:22   ` [PATCH v2 11/15] rev-list: use bitmap filters for traversal Jeff King
                     ` (4 subsequent siblings)
  14 siblings, 0 replies; 73+ messages in thread
From: Jeff King @ 2020-02-14 18:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

Currently you can't use object filters with bitmaps, but we plan to
support at least some filters with bitmaps. Let's introduce some
infrastructure that will help us do that:

  - prepare_bitmap_walk() now accepts a list_objects_filter_options
    parameter (which can be NULL for no filtering; all the current
    callers pass this)

  - we'll bail early if the filter is incompatible with bitmaps (just as
    we would if there were no bitmaps at all). Currently all filters are
    incompatible.

  - we'll filter the resulting bitmap; since there are no supported
    filters yet, this is always a noop.

There should be no behavior change yet, but we'll support some actual
filters in a future patch.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/pack-objects.c |  2 +-
 builtin/rev-list.c     |  4 ++--
 pack-bitmap.c          | 26 +++++++++++++++++++++++++-
 pack-bitmap.h          |  4 +++-
 reachable.c            |  2 +-
 5 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 06915ebe7f..2bb81c2133 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3040,7 +3040,7 @@ static int pack_options_allow_reuse(void)
 
 static int get_object_list_from_bitmap(struct rev_info *revs)
 {
-	if (!(bitmap_git = prepare_bitmap_walk(revs)))
+	if (!(bitmap_git = prepare_bitmap_walk(revs, NULL)))
 		return -1;
 
 	if (pack_options_allow_reuse() &&
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 937324cef0..6ff5e175fa 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -407,7 +407,7 @@ static int try_bitmap_count(struct rev_info *revs)
 	 */
 	max_count = revs->max_count;
 
-	bitmap_git = prepare_bitmap_walk(revs);
+	bitmap_git = prepare_bitmap_walk(revs, NULL);
 	if (!bitmap_git)
 		return -1;
 
@@ -434,7 +434,7 @@ static int try_bitmap_traversal(struct rev_info *revs)
 	if (revs->max_count >= 0)
 		return -1;
 
-	bitmap_git = prepare_bitmap_walk(revs);
+	bitmap_git = prepare_bitmap_walk(revs, NULL);
 	if (!bitmap_git)
 		return -1;
 
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 2fbc748b19..48c8694f92 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -12,6 +12,7 @@
 #include "packfile.h"
 #include "repository.h"
 #include "object-store.h"
+#include "list-objects-filter-options.h"
 
 /*
  * An entry on the bitmap index, representing the bitmap for a given
@@ -711,7 +712,25 @@ static int in_bitmapped_pack(struct bitmap_index *bitmap_git,
 	return 0;
 }
 
-struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)
+static int filter_bitmap(struct bitmap_index *bitmap_git,
+			 struct object_list *tip_objects,
+			 struct bitmap *to_filter,
+			 struct list_objects_filter_options *filter)
+{
+	if (!filter || filter->choice == LOFC_DISABLED)
+		return 0;
+
+	/* filter choice not handled */
+	return -1;
+}
+
+static int can_filter_bitmap(struct list_objects_filter_options *filter)
+{
+	return !filter_bitmap(NULL, NULL, NULL, filter);
+}
+
+struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
+					 struct list_objects_filter_options *filter)
 {
 	unsigned int i;
 
@@ -731,6 +750,9 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)
 	if (revs->prune)
 		return NULL;
 
+	if (!can_filter_bitmap(filter))
+		return NULL;
+
 	/* try to open a bitmapped pack, but don't parse it yet
 	 * because we may not need to use it */
 	bitmap_git = xcalloc(1, sizeof(*bitmap_git));
@@ -800,6 +822,8 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)
 	if (haves_bitmap)
 		bitmap_and_not(wants_bitmap, haves_bitmap);
 
+	filter_bitmap(bitmap_git, wants, wants_bitmap, filter);
+
 	bitmap_git->result = wants_bitmap;
 	bitmap_git->haves = haves_bitmap;
 
diff --git a/pack-bitmap.h b/pack-bitmap.h
index b0c06a212e..956775d0bb 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -8,6 +8,7 @@
 struct commit;
 struct repository;
 struct rev_info;
+struct list_objects_filter_options;
 
 static const char BITMAP_IDX_SIGNATURE[] = {'B', 'I', 'T', 'M'};
 
@@ -47,7 +48,8 @@ void traverse_bitmap_commit_list(struct bitmap_index *,
 				 struct rev_info *revs,
 				 show_reachable_fn show_reachable);
 void test_bitmap_walk(struct rev_info *revs);
-struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs);
+struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
+					 struct list_objects_filter_options *filter);
 int reuse_partial_packfile_from_bitmap(struct bitmap_index *,
 				       struct packed_git **packfile,
 				       uint32_t *entries, off_t *up_to);
diff --git a/reachable.c b/reachable.c
index 0919f025c4..77a60c70a5 100644
--- a/reachable.c
+++ b/reachable.c
@@ -223,7 +223,7 @@ void mark_reachable_objects(struct rev_info *revs, int mark_reflog,
 	cp.progress = progress;
 	cp.count = 0;
 
-	bitmap_git = prepare_bitmap_walk(revs);
+	bitmap_git = prepare_bitmap_walk(revs, NULL);
 	if (bitmap_git) {
 		traverse_bitmap_commit_list(bitmap_git, revs, mark_object_seen);
 		free_bitmap_index(bitmap_git);
-- 
2.25.0.796.gcc29325708


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v2 11/15] rev-list: use bitmap filters for traversal
  2020-02-14 18:21 ` [PATCH v2 0/15] combining object filters and bitmaps Jeff King
                     ` (9 preceding siblings ...)
  2020-02-14 18:22   ` [PATCH v2 10/15] pack-bitmap: basic noop bitmap filter infrastructure Jeff King
@ 2020-02-14 18:22   ` Jeff King
  2020-02-14 18:22   ` [PATCH v2 12/15] bitmap: add bitmap_unset() function Jeff King
                     ` (3 subsequent siblings)
  14 siblings, 0 replies; 73+ messages in thread
From: Jeff King @ 2020-02-14 18:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

This just passes the filter-options struct to prepare_bitmap_walk().
Since the bitmap code doesn't actually support any filters yet, it will
fallback to the non-bitmap code if any --filter is specified. But this
lets us exercise that rejection code path, as well as getting us ready
to test filters via rev-list when we _do_ support them.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/rev-list.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 6ff5e175fa..35e14ad2ed 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -372,7 +372,8 @@ static inline int parse_missing_action_value(const char *value)
 	return 0;
 }
 
-static int try_bitmap_count(struct rev_info *revs)
+static int try_bitmap_count(struct rev_info *revs,
+			    struct list_objects_filter_options *filter)
 {
 	uint32_t commit_count = 0,
 		 tag_count = 0,
@@ -407,7 +408,7 @@ static int try_bitmap_count(struct rev_info *revs)
 	 */
 	max_count = revs->max_count;
 
-	bitmap_git = prepare_bitmap_walk(revs, NULL);
+	bitmap_git = prepare_bitmap_walk(revs, filter);
 	if (!bitmap_git)
 		return -1;
 
@@ -423,7 +424,8 @@ static int try_bitmap_count(struct rev_info *revs)
 	return 0;
 }
 
-static int try_bitmap_traversal(struct rev_info *revs)
+static int try_bitmap_traversal(struct rev_info *revs,
+				struct list_objects_filter_options *filter)
 {
 	struct bitmap_index *bitmap_git;
 
@@ -434,7 +436,7 @@ static int try_bitmap_traversal(struct rev_info *revs)
 	if (revs->max_count >= 0)
 		return -1;
 
-	bitmap_git = prepare_bitmap_walk(revs, NULL);
+	bitmap_git = prepare_bitmap_walk(revs, filter);
 	if (!bitmap_git)
 		return -1;
 
@@ -605,9 +607,6 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 	    (revs.left_right || revs.cherry_mark))
 		die(_("marked counting is incompatible with --objects"));
 
-	if (filter_options.choice)
-		use_bitmap_index = 0;
-
 	save_commit_buffer = (revs.verbose_header ||
 			      revs.grep_filter.pattern_list ||
 			      revs.grep_filter.header_list);
@@ -618,9 +617,9 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 		progress = start_delayed_progress(show_progress, 0);
 
 	if (use_bitmap_index) {
-		if (!try_bitmap_count(&revs))
+		if (!try_bitmap_count(&revs, &filter_options))
 			return 0;
-		if (!try_bitmap_traversal(&revs))
+		if (!try_bitmap_traversal(&revs, &filter_options))
 			return 0;
 	}
 
-- 
2.25.0.796.gcc29325708


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v2 12/15] bitmap: add bitmap_unset() function
  2020-02-14 18:21 ` [PATCH v2 0/15] combining object filters and bitmaps Jeff King
                     ` (10 preceding siblings ...)
  2020-02-14 18:22   ` [PATCH v2 11/15] rev-list: use bitmap filters for traversal Jeff King
@ 2020-02-14 18:22   ` Jeff King
  2020-02-14 18:22   ` [PATCH v2 13/15] pack-bitmap: implement BLOB_NONE filtering Jeff King
                     ` (2 subsequent siblings)
  14 siblings, 0 replies; 73+ messages in thread
From: Jeff King @ 2020-02-14 18:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

We've never needed to unset an individual bit in a bitmap until now.
Typically they start with all bits unset and we bitmap_set() them, or we
are applying another bitmap as a mask. But the easiest way to apply an
object filter to a bitmap result will be to unset the individual bits.

Signed-off-by: Jeff King <peff@peff.net>
---
 ewah/bitmap.c | 8 ++++++++
 ewah/ewok.h   | 1 +
 2 files changed, 9 insertions(+)

diff --git a/ewah/bitmap.c b/ewah/bitmap.c
index 52f1178db4..1c31b3e249 100644
--- a/ewah/bitmap.c
+++ b/ewah/bitmap.c
@@ -45,6 +45,14 @@ void bitmap_set(struct bitmap *self, size_t pos)
 	self->words[block] |= EWAH_MASK(pos);
 }
 
+void bitmap_unset(struct bitmap *self, size_t pos)
+{
+	size_t block = EWAH_BLOCK(pos);
+
+	if (block < self->word_alloc)
+		self->words[block] &= ~EWAH_MASK(pos);
+}
+
 int bitmap_get(struct bitmap *self, size_t pos)
 {
 	size_t block = EWAH_BLOCK(pos);
diff --git a/ewah/ewok.h b/ewah/ewok.h
index 84b2a29faa..59f4ef7c4f 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -173,6 +173,7 @@ struct bitmap {
 
 struct bitmap *bitmap_new(void);
 void bitmap_set(struct bitmap *self, size_t pos);
+void bitmap_unset(struct bitmap *self, size_t pos);
 int bitmap_get(struct bitmap *self, size_t pos);
 void bitmap_reset(struct bitmap *self);
 void bitmap_free(struct bitmap *self);
-- 
2.25.0.796.gcc29325708


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v2 13/15] pack-bitmap: implement BLOB_NONE filtering
  2020-02-14 18:21 ` [PATCH v2 0/15] combining object filters and bitmaps Jeff King
                     ` (11 preceding siblings ...)
  2020-02-14 18:22   ` [PATCH v2 12/15] bitmap: add bitmap_unset() function Jeff King
@ 2020-02-14 18:22   ` Jeff King
  2020-02-18 19:26     ` Derrick Stolee
  2020-02-14 18:22   ` [PATCH v2 14/15] pack-bitmap: implement BLOB_LIMIT filtering Jeff King
  2020-02-14 18:22   ` [PATCH v2 15/15] pack-objects: support filters with bitmaps Jeff King
  14 siblings, 1 reply; 73+ messages in thread
From: Jeff King @ 2020-02-14 18:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

We can easily support BLOB_NONE filters with bitmaps. Since we know the
types of all of the objects, we just need to clear the result bits of
any blobs.

Note two subtleties in the implementation (which I also called out in
comments):

  - we have to include any blobs that were specifically asked for (and
    not reached through graph traversal) to match the non-bitmap version

  - we have to handle in-pack and "ext_index" objects separately.
    Arguably prepare_bitmap_walk() could be adding these ext_index
    objects to the type bitmaps. But it doesn't for now, so let's match
    the rest of the bitmap code here (it probably wouldn't be an
    efficiency improvement to do so since the cost of extending those
    bitmaps is about the same as our loop here, but it might make the
    code a bit simpler).

Here are perf results for the new test on git.git:

  Test                                    HEAD^             HEAD
  --------------------------------------------------------------------------------
  5310.9: rev-list count with blob:none   1.67(1.62+0.05)   0.22(0.21+0.02) -86.8%

Signed-off-by: Jeff King <peff@peff.net>
---
 pack-bitmap.c                      | 74 ++++++++++++++++++++++++++++++
 t/perf/p5310-pack-bitmaps.sh       |  5 ++
 t/t6113-rev-list-bitmap-filters.sh | 14 ++++++
 3 files changed, 93 insertions(+)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 48c8694f92..dcf8a9aadf 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -712,6 +712,73 @@ static int in_bitmapped_pack(struct bitmap_index *bitmap_git,
 	return 0;
 }
 
+static struct bitmap *find_tip_blobs(struct bitmap_index *bitmap_git,
+				     struct object_list *tip_objects)
+{
+	struct bitmap *result = bitmap_new();
+	struct object_list *p;
+
+	for (p = tip_objects; p; p = p->next) {
+		int pos;
+
+		if (p->item->type != OBJ_BLOB)
+			continue;
+
+		pos = bitmap_position(bitmap_git, &p->item->oid);
+		if (pos < 0)
+			continue;
+
+		bitmap_set(result, pos);
+	}
+
+	return result;
+}
+
+static void filter_bitmap_blob_none(struct bitmap_index *bitmap_git,
+				    struct object_list *tip_objects,
+				    struct bitmap *to_filter)
+{
+	struct eindex *eindex = &bitmap_git->ext_index;
+	struct bitmap *tips;
+	struct ewah_iterator it;
+	eword_t mask;
+	uint32_t i;
+
+	/*
+	 * The non-bitmap version of this filter never removes
+	 * blobs which the other side specifically asked for,
+	 * so we must match that behavior.
+	 */
+	tips = find_tip_blobs(bitmap_git, tip_objects);
+
+	/*
+	 * We can use the blob type-bitmap to work in whole words
+	 * for the objects that are actually in the bitmapped packfile.
+	 */
+	for (i = 0, init_type_iterator(&it, bitmap_git, OBJ_BLOB);
+	     i < to_filter->word_alloc && ewah_iterator_next(&mask, &it);
+	     i++) {
+		if (i < tips->word_alloc)
+			mask &= ~tips->words[i];
+		to_filter->words[i] &= ~mask;
+	}
+
+	/*
+	 * Clear any blobs that weren't in the packfile (and so would not have
+	 * been caught by the loop above. We'll have to check them
+	 * individually.
+	 */
+	for (i = 0; i < eindex->count; i++) {
+		uint32_t pos = i + bitmap_git->pack->num_objects;
+		if (eindex->objects[i]->type == OBJ_BLOB &&
+		    bitmap_get(to_filter, pos) &&
+		    !bitmap_get(tips, pos))
+			bitmap_unset(to_filter, pos);
+	}
+
+	bitmap_free(tips);
+}
+
 static int filter_bitmap(struct bitmap_index *bitmap_git,
 			 struct object_list *tip_objects,
 			 struct bitmap *to_filter,
@@ -720,6 +787,13 @@ static int filter_bitmap(struct bitmap_index *bitmap_git,
 	if (!filter || filter->choice == LOFC_DISABLED)
 		return 0;
 
+	if (filter->choice == LOFC_BLOB_NONE) {
+		if (bitmap_git)
+			filter_bitmap_blob_none(bitmap_git, tip_objects,
+						to_filter);
+		return 0;
+	}
+
 	/* filter choice not handled */
 	return -1;
 }
diff --git a/t/perf/p5310-pack-bitmaps.sh b/t/perf/p5310-pack-bitmaps.sh
index e52f66ec9e..936742314c 100755
--- a/t/perf/p5310-pack-bitmaps.sh
+++ b/t/perf/p5310-pack-bitmaps.sh
@@ -47,6 +47,11 @@ test_perf 'rev-list (objects)' '
 	git rev-list --all --use-bitmap-index --objects >/dev/null
 '
 
+test_perf 'rev-list count with blob:none' '
+	git rev-list --use-bitmap-index --count --objects --all \
+		--filter=blob:none >/dev/null
+'
+
 test_expect_success 'create partial bitmap state' '
 	# pick a commit to represent the repo tip in the past
 	cutoff=$(git rev-list HEAD~100 -1) &&
diff --git a/t/t6113-rev-list-bitmap-filters.sh b/t/t6113-rev-list-bitmap-filters.sh
index 977f8d0930..f4e6d582f0 100755
--- a/t/t6113-rev-list-bitmap-filters.sh
+++ b/t/t6113-rev-list-bitmap-filters.sh
@@ -21,4 +21,18 @@ test_expect_success 'filters fallback to non-bitmap traversal' '
 	test_cmp expect actual
 '
 
+test_expect_success 'blob:none filter' '
+	git rev-list --objects --filter=blob:none HEAD >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter=blob:none HEAD >actual &&
+	test_bitmap_traversal expect actual
+'
+
+test_expect_success 'blob:none filter with specified blob' '
+	git rev-list --objects --filter=blob:none HEAD HEAD:two.t >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter=blob:none HEAD HEAD:two.t >actual &&
+	test_bitmap_traversal expect actual
+'
+
 test_done
-- 
2.25.0.796.gcc29325708


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v2 14/15] pack-bitmap: implement BLOB_LIMIT filtering
  2020-02-14 18:21 ` [PATCH v2 0/15] combining object filters and bitmaps Jeff King
                     ` (12 preceding siblings ...)
  2020-02-14 18:22   ` [PATCH v2 13/15] pack-bitmap: implement BLOB_NONE filtering Jeff King
@ 2020-02-14 18:22   ` Jeff King
  2020-02-14 18:22   ` [PATCH v2 15/15] pack-objects: support filters with bitmaps Jeff King
  14 siblings, 0 replies; 73+ messages in thread
From: Jeff King @ 2020-02-14 18:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

Just as the previous commit implemented BLOB_NONE, we can support
BLOB_LIMIT filters by looking at the sizes of any blobs in the result
and unsetting their bits as appropriate. This is slightly more expensive
than BLOB_NONE, but still produces a noticeable speedup (these results
are on git.git):

  Test                                         HEAD~2            HEAD
  ------------------------------------------------------------------------------------
  5310.9:  rev-list count with blob:none       1.80(1.77+0.02)   0.22(0.20+0.02) -87.8%
  5310.10: rev-list count with blob:limit=1k   1.99(1.96+0.03)   0.29(0.25+0.03) -85.4%

The implementation is similar to the BLOB_NONE one, with the exception
that we have to go object-by-object while walking the blob-type bitmap
(since we can't mask out the matches, but must look up the size
individually for each blob). The trick with using ctz64() is taken from
show_objects_for_type(), which likewise needs to find individual bits
(but wants to quickly skip over big chunks without blobs).

Signed-off-by: Jeff King <peff@peff.net>
---
 pack-bitmap.c                      | 80 ++++++++++++++++++++++++++++++
 t/perf/p5310-pack-bitmaps.sh       |  5 ++
 t/t6113-rev-list-bitmap-filters.sh | 20 +++++++-
 3 files changed, 104 insertions(+), 1 deletion(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index dcf8a9aadf..9d680065e4 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -779,6 +779,78 @@ static void filter_bitmap_blob_none(struct bitmap_index *bitmap_git,
 	bitmap_free(tips);
 }
 
+static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
+				     uint32_t pos)
+{
+	struct packed_git *pack = bitmap_git->pack;
+	unsigned long size;
+	struct object_info oi = OBJECT_INFO_INIT;
+
+	oi.sizep = &size;
+
+	if (pos < pack->num_objects) {
+		struct revindex_entry *entry = &pack->revindex[pos];
+		if (packed_object_info(the_repository, pack,
+				       entry->offset, &oi) < 0) {
+			struct object_id oid;
+			nth_packed_object_oid(&oid, pack, entry->nr);
+			die(_("unable to get size of %s"), oid_to_hex(&oid));
+		}
+	} else {
+		struct eindex *eindex = &bitmap_git->ext_index;
+		struct object *obj = eindex->objects[pos - pack->num_objects];
+		if (oid_object_info_extended(the_repository, &obj->oid, &oi, 0) < 0)
+			die(_("unable to get size of %s"), oid_to_hex(&obj->oid));
+	}
+
+	return size;
+}
+
+static void filter_bitmap_blob_limit(struct bitmap_index *bitmap_git,
+				     struct object_list *tip_objects,
+				     struct bitmap *to_filter,
+				     unsigned long limit)
+{
+	struct eindex *eindex = &bitmap_git->ext_index;
+	struct bitmap *tips;
+	struct ewah_iterator it;
+	eword_t mask;
+	uint32_t i;
+
+	tips = find_tip_blobs(bitmap_git, tip_objects);
+
+	for (i = 0, init_type_iterator(&it, bitmap_git, OBJ_BLOB);
+	     i < to_filter->word_alloc && ewah_iterator_next(&mask, &it);
+	     i++) {
+		eword_t word = to_filter->words[i] & mask;
+		unsigned offset;
+
+		for (offset = 0; offset < BITS_IN_EWORD; offset++) {
+			uint32_t pos;
+
+			if ((word >> offset) == 0)
+				break;
+			offset += ewah_bit_ctz64(word >> offset);
+			pos = i * BITS_IN_EWORD + offset;
+
+			if (!bitmap_get(tips, pos) &&
+			    get_size_by_pos(bitmap_git, pos) >= limit)
+				bitmap_unset(to_filter, pos);
+		}
+	}
+
+	for (i = 0; i < eindex->count; i++) {
+		uint32_t pos = i + bitmap_git->pack->num_objects;
+		if (eindex->objects[i]->type == OBJ_BLOB &&
+		    bitmap_get(to_filter, pos) &&
+		    !bitmap_get(tips, pos) &&
+		    get_size_by_pos(bitmap_git, pos) >= limit)
+			bitmap_unset(to_filter, pos);
+	}
+
+	bitmap_free(tips);
+}
+
 static int filter_bitmap(struct bitmap_index *bitmap_git,
 			 struct object_list *tip_objects,
 			 struct bitmap *to_filter,
@@ -794,6 +866,14 @@ static int filter_bitmap(struct bitmap_index *bitmap_git,
 		return 0;
 	}
 
+	if (filter->choice == LOFC_BLOB_LIMIT) {
+		if (bitmap_git)
+			filter_bitmap_blob_limit(bitmap_git, tip_objects,
+						 to_filter,
+						 filter->blob_limit_value);
+		return 0;
+	}
+
 	/* filter choice not handled */
 	return -1;
 }
diff --git a/t/perf/p5310-pack-bitmaps.sh b/t/perf/p5310-pack-bitmaps.sh
index 936742314c..8b43a545c1 100755
--- a/t/perf/p5310-pack-bitmaps.sh
+++ b/t/perf/p5310-pack-bitmaps.sh
@@ -52,6 +52,11 @@ test_perf 'rev-list count with blob:none' '
 		--filter=blob:none >/dev/null
 '
 
+test_perf 'rev-list count with blob:limit=1k' '
+	git rev-list --use-bitmap-index --count --objects --all \
+		--filter=blob:limit=1k >/dev/null
+'
+
 test_expect_success 'create partial bitmap state' '
 	# pick a commit to represent the repo tip in the past
 	cutoff=$(git rev-list HEAD~100 -1) &&
diff --git a/t/t6113-rev-list-bitmap-filters.sh b/t/t6113-rev-list-bitmap-filters.sh
index f4e6d582f0..145603f124 100755
--- a/t/t6113-rev-list-bitmap-filters.sh
+++ b/t/t6113-rev-list-bitmap-filters.sh
@@ -6,8 +6,10 @@ test_description='rev-list combining bitmaps and filters'
 test_expect_success 'set up bitmapped repo' '
 	# one commit will have bitmaps, the other will not
 	test_commit one &&
+	test_commit much-larger-blob-one &&
 	git repack -adb &&
-	test_commit two
+	test_commit two &&
+	test_commit much-larger-blob-two
 '
 
 test_expect_success 'filters fallback to non-bitmap traversal' '
@@ -35,4 +37,20 @@ test_expect_success 'blob:none filter with specified blob' '
 	test_bitmap_traversal expect actual
 '
 
+test_expect_success 'blob:limit filter' '
+	git rev-list --objects --filter=blob:limit=5 HEAD >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter=blob:limit=5 HEAD >actual &&
+	test_bitmap_traversal expect actual
+'
+
+test_expect_success 'blob:limit filter with specified blob' '
+	git rev-list --objects --filter=blob:limit=5 \
+		     HEAD HEAD:much-larger-blob-two.t >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter=blob:limit=5 \
+		     HEAD HEAD:much-larger-blob-two.t >actual &&
+	test_bitmap_traversal expect actual
+'
+
 test_done
-- 
2.25.0.796.gcc29325708


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v2 15/15] pack-objects: support filters with bitmaps
  2020-02-14 18:21 ` [PATCH v2 0/15] combining object filters and bitmaps Jeff King
                     ` (13 preceding siblings ...)
  2020-02-14 18:22   ` [PATCH v2 14/15] pack-bitmap: implement BLOB_LIMIT filtering Jeff King
@ 2020-02-14 18:22   ` Jeff King
  14 siblings, 0 replies; 73+ messages in thread
From: Jeff King @ 2020-02-14 18:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

Just as rev-list recently learned to combine filters and bitmaps, let's
do the same for pack-objects. The infrastructure is all there; we just
need to pass along our filter options, and the pack-bitmap code will
decide to use bitmaps or not.

This unsurprisingly makes things faster for partial clones of large
repositories (here we're cloning linux.git):

  Test                               HEAD^               HEAD
  ------------------------------------------------------------------------------
  5310.11: simulated partial clone   38.94(37.28+5.87)   11.06(11.27+4.07) -71.6%

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/pack-objects.c       |  3 +--
 t/perf/p5310-pack-bitmaps.sh |  4 ++++
 t/t5310-pack-bitmaps.sh      | 14 ++++++++++++++
 3 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 2bb81c2133..6bc9bc1ce2 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3040,7 +3040,7 @@ static int pack_options_allow_reuse(void)
 
 static int get_object_list_from_bitmap(struct rev_info *revs)
 {
-	if (!(bitmap_git = prepare_bitmap_walk(revs, NULL)))
+	if (!(bitmap_git = prepare_bitmap_walk(revs, &filter_options)))
 		return -1;
 
 	if (pack_options_allow_reuse() &&
@@ -3419,7 +3419,6 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	if (filter_options.choice) {
 		if (!pack_to_stdout)
 			die(_("cannot use --filter without --stdout"));
-		use_bitmap_index = 0;
 	}
 
 	/*
diff --git a/t/perf/p5310-pack-bitmaps.sh b/t/perf/p5310-pack-bitmaps.sh
index 8b43a545c1..7743f4f4c9 100755
--- a/t/perf/p5310-pack-bitmaps.sh
+++ b/t/perf/p5310-pack-bitmaps.sh
@@ -57,6 +57,10 @@ test_perf 'rev-list count with blob:limit=1k' '
 		--filter=blob:limit=1k >/dev/null
 '
 
+test_perf 'simulated partial clone' '
+	git pack-objects --stdout --all --filter=blob:none </dev/null >/dev/null
+'
+
 test_expect_success 'create partial bitmap state' '
 	# pick a commit to represent the repo tip in the past
 	cutoff=$(git rev-list HEAD~100 -1) &&
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index 2c64d0c441..8318781d2b 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -107,6 +107,20 @@ test_expect_success 'clone from bitmapped repository' '
 	test_cmp expect actual
 '
 
+test_expect_success 'partial clone from bitmapped repository' '
+	test_config uploadpack.allowfilter true &&
+	git clone --no-local --bare --filter=blob:none . partial-clone.git &&
+	(
+		cd partial-clone.git &&
+		pack=$(echo objects/pack/*.pack) &&
+		git verify-pack -v "$pack" >have &&
+		awk "/blob/ { print \$1 }" <have >blobs &&
+		# we expect this single blob because of the direct ref
+		git rev-parse refs/tags/tagged-blob >expect &&
+		test_cmp expect blobs
+	)
+'
+
 test_expect_success 'setup further non-bitmapped commits' '
 	test_commit_bulk --id=further 10
 '
-- 
2.25.0.796.gcc29325708

^ permalink raw reply related	[flat|nested] 73+ messages in thread

* Re: [PATCH v2 04/15] pack-bitmap: refuse to do a bitmap traversal with pathspecs
  2020-02-14 18:22   ` [PATCH v2 04/15] pack-bitmap: refuse to do a bitmap traversal with pathspecs Jeff King
@ 2020-02-14 19:03     ` Junio C Hamano
  2020-02-14 20:51       ` Jeff King
  0 siblings, 1 reply; 73+ messages in thread
From: Junio C Hamano @ 2020-02-14 19:03 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> It would also prepare us for a day where this case _is_ handled, but
> that's pretty unlikely. E.g., we could use bitmaps to generate the set
> of commits, and then diff each commit to see if it matches the pathspec.

Would the gs/commit-graph-path-filter topic possibly help in this regard?
I was wondering how it would fit within the framework this series sets
up to teach object-filtering and reachability-bitmap codepaths to
work well together.


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v2 04/15] pack-bitmap: refuse to do a bitmap traversal with pathspecs
  2020-02-14 19:03     ` Junio C Hamano
@ 2020-02-14 20:51       ` Jeff King
  0 siblings, 0 replies; 73+ messages in thread
From: Jeff King @ 2020-02-14 20:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Fri, Feb 14, 2020 at 11:03:26AM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > It would also prepare us for a day where this case _is_ handled, but
> > that's pretty unlikely. E.g., we could use bitmaps to generate the set
> > of commits, and then diff each commit to see if it matches the pathspec.
> 
> Would the gs/commit-graph-path-filter topic possibly help in this regard?
> I was wondering how it would fit within the framework this series sets
> up to teach object-filtering and reachability-bitmap codepaths to
> work well together.

I think it would make the scheme I described faster, because that diff
becomes faster. So the commit traversal itself becomes proportionally
more expensive. And if we could speed that up with bitmaps, that's some
improvement. OTOH, my later timings in patch 9 showed that commit graphs
already make that part pretty fast. And they do so without a bunch of
weird restrictions and hassles. So I suspect it's not really worth the
effort to implement this half-way bitmaps approach (for a special case
that it's not even clear anybody cares about).

But if somebody _does_ do so, I don't think we'd need to do anything too
special to integrate with commit-graph-path-filter. The nice thing about
that approach is that the pathspec pruning will just magically return
the same answer, but faster.

-Peff

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v2 01/15] pack-bitmap: factor out type iterator initialization
  2020-02-14 18:22   ` [PATCH v2 01/15] pack-bitmap: factor out type iterator initialization Jeff King
@ 2020-02-15  0:10     ` Taylor Blau
  0 siblings, 0 replies; 73+ messages in thread
From: Taylor Blau @ 2020-02-15  0:10 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano

Hi Peff,

On Fri, Feb 14, 2020 at 01:22:08PM -0500, Jeff King wrote:
> When count_object_type() wants to iterate over the bitmap of all objects
> of a certain type, we have to pair up OBJ_COMMIT with bitmap->commits,
> and so forth. Since we're about to add more code to iterate over these
> bitmaps, let's pull the initialization into its own function.
>
> We can also use this to simplify traverse_bitmap_commit_list(). It
> accomplishes the same thing by manually passing the object type and the
> bitmap to show_objects_for_type(), but using our helper we just need the
> object type.
>
> Note there's one small code change here: previously we'd simply return
> zero when counting an unknown object type, and now we'll BUG(). This
> shouldn't matter in practice, as all of the callers pass in only usual
> commit/tree/blob/tag types.

This all makes sense, as does the refactoring below.

On a relaid note: since you sent 'v2' before I had popped enough items
off of my TODO list to look at your 'v1', I'll start my review in this
thread instead of in v1.

> Signed-off-by: Jeff King <peff@peff.net>

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v2 02/15] pack-bitmap: fix leak of haves/wants object lists
  2020-02-14 18:22   ` [PATCH v2 02/15] pack-bitmap: fix leak of haves/wants object lists Jeff King
@ 2020-02-15  0:15     ` Taylor Blau
  2020-02-15  6:46       ` Jeff King
  2020-02-18 17:58     ` Derrick Stolee
  1 sibling, 1 reply; 73+ messages in thread
From: Taylor Blau @ 2020-02-15  0:15 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano

On Fri, Feb 14, 2020 at 01:22:11PM -0500, Jeff King wrote:
> When we do a bitmap-aware revision traversal, we create an object_list
> for each of the "haves" and "wants" tips. After creating the result
> bitmaps these are no longer needed or used, but we never free the list
> memory.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  object.c      | 9 +++++++++
>  object.h      | 2 ++
>  pack-bitmap.c | 5 +++++
>  3 files changed, 16 insertions(+)
>
> diff --git a/object.c b/object.c
> index 142ef69399..4d11949b38 100644
> --- a/object.c
> +++ b/object.c
> @@ -307,6 +307,15 @@ int object_list_contains(struct object_list *list, struct object *obj)
>  	return 0;
>  }
>
> +void object_list_free(struct object_list **list)
> +{
> +	while (*list) {
> +		struct object_list *p = *list;
> +		*list = p->next;
> +		free(p);
> +	}
> +}

Hmm. I was going to write a comment saying more-or-less, "I think I'm
nitpicking here, but I'm not crazy about this as a 'while' loop". But,
when I wrote this as a 'for' loop instead, I wrote a use-after-free, and
then when I rid the code of that, it wasn't any more readable than the
version above.

>  /*
>   * A zero-length string to which object_array_entry::name can be
>   * initialized without requiring a malloc/free.
> diff --git a/object.h b/object.h
> index 25f5ab3d54..2dbabfca0a 100644
> --- a/object.h
> +++ b/object.h
> @@ -151,6 +151,8 @@ struct object_list *object_list_insert(struct object *item,
>
>  int object_list_contains(struct object_list *list, struct object *obj);
>
> +void object_list_free(struct object_list **list);
> +
>  /* Object array handling .. */
>  void add_object_array(struct object *obj, const char *name, struct object_array *array);
>  void add_object_array_with_path(struct object *obj, const char *name, struct object_array *array, unsigned mode, const char *path);
> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index 9ca356ee29..6c06099dc7 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -787,10 +787,15 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)
>  	bitmap_git->result = wants_bitmap;
>  	bitmap_git->haves = haves_bitmap;
>
> +	object_list_free(&wants);
> +	object_list_free(&haves);
> +

Makes sense.

>  	return bitmap_git;
>
>  cleanup:
>  	free_bitmap_index(bitmap_git);
> +	object_list_free(&wants);
> +	object_list_free(&haves);

Ditto.

>  	return NULL;
>  }
>
> --
> 2.25.0.796.gcc29325708

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v2 03/15] rev-list: fallback to non-bitmap traversal when filtering
  2020-02-14 18:22   ` [PATCH v2 03/15] rev-list: fallback to non-bitmap traversal when filtering Jeff King
@ 2020-02-15  0:22     ` Taylor Blau
  0 siblings, 0 replies; 73+ messages in thread
From: Taylor Blau @ 2020-02-15  0:22 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano

On Fri, Feb 14, 2020 at 01:22:13PM -0500, Jeff King wrote:
> The "--use-bitmap-index" option is usually aspirational: if we have
> bitmaps and the request can be fulfilled more quickly using them we'll
> do so, but otherwise fall back to a non-bitmap traversal.
>
> The exception is object filtering, which explicitly dies if the two
> options are combined. Let's convert this to the usual fallback behavior.
> This is a minor convenience for now (since the caller can easily know
> that --filter and --use-bitmap-index don't combine), but will become
> much more useful as we start to support _some_ filters with bitmaps, but
> not others.

Yeah, I think that this convenience is worth it early on instead of
lumping in these changes in a future patch.

> The test infrastructure here is bigger than necessary for checking this
> one small feature. But it will serve as the basis for more filtering
> bitmap tests in future patches.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  builtin/rev-list.c                 |  4 ++--
>  t/t6113-rev-list-bitmap-filters.sh | 24 ++++++++++++++++++++++++
>  2 files changed, 26 insertions(+), 2 deletions(-)
>  create mode 100755 t/t6113-rev-list-bitmap-filters.sh
>
> diff --git a/builtin/rev-list.c b/builtin/rev-list.c
> index e28d62ec64..bce406bd1e 100644
> --- a/builtin/rev-list.c
> +++ b/builtin/rev-list.c
> @@ -521,8 +521,8 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
>  	if (revs.show_notes)
>  		die(_("rev-list does not support display of notes"));
>
> -	if (filter_options.choice && use_bitmap_index)
> -		die(_("cannot combine --use-bitmap-index with object filtering"));
> +	if (filter_options.choice)
> +		use_bitmap_index = 0;
>
>  	save_commit_buffer = (revs.verbose_header ||
>  			      revs.grep_filter.pattern_list ||
> diff --git a/t/t6113-rev-list-bitmap-filters.sh b/t/t6113-rev-list-bitmap-filters.sh
> new file mode 100755
> index 0000000000..977f8d0930
> --- /dev/null
> +++ b/t/t6113-rev-list-bitmap-filters.sh
> @@ -0,0 +1,24 @@
> +#!/bin/sh
> +
> +test_description='rev-list combining bitmaps and filters'
> +. ./test-lib.sh
> +
> +test_expect_success 'set up bitmapped repo' '
> +	# one commit will have bitmaps, the other will not
> +	test_commit one &&
> +	git repack -adb &&
> +	test_commit two
> +'
> +
> +test_expect_success 'filters fallback to non-bitmap traversal' '
> +	# use a path-based filter, since they are inherently incompatible with
> +	# bitmaps (i.e., this test will never get confused by later code to
> +	# combine the features)
> +	filter=$(echo "!one" | git hash-object -w --stdin) &&
> +	git rev-list --objects --filter=sparse:oid=$filter HEAD >expect &&
> +	git rev-list --use-bitmap-index \
> +		     --objects --filter=sparse:oid=$filter HEAD >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_done
> --
> 2.25.0.796.gcc29325708

Makes sense.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v2 05/15] rev-list: factor out bitmap-optimized routines
  2020-02-14 18:22   ` [PATCH v2 05/15] rev-list: factor out bitmap-optimized routines Jeff King
@ 2020-02-15  0:35     ` Taylor Blau
  0 siblings, 0 replies; 73+ messages in thread
From: Taylor Blau @ 2020-02-15  0:35 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano

On Fri, Feb 14, 2020 at 01:22:18PM -0500, Jeff King wrote:
> There are a few operations in rev-list that are optimized for bitmaps.
> Rather than having the code inline in cmd_rev_list(), let's move them
> into helpers. This not only makes the flow of the main function simpler,
> but it lets us replace the complex "can we do the optimization?"
> conditionals with a series of early returns from the functions. That
> also makes it easy to add comments explaining those conditions.

Makes sense.

> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  builtin/rev-list.c | 88 +++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 67 insertions(+), 21 deletions(-)
>
> diff --git a/builtin/rev-list.c b/builtin/rev-list.c
> index 4cb5a52dee..38c5ca5603 100644
> --- a/builtin/rev-list.c
> +++ b/builtin/rev-list.c
> @@ -364,6 +364,69 @@ static inline int parse_missing_action_value(const char *value)
>  	return 0;
>  }
>
> +static int try_bitmap_count(struct rev_info *revs)
> +{
> +	uint32_t commit_count;
> +	int max_count;
> +	struct bitmap_index *bitmap_git;
> +
> +	/* This function only handles counting, not general traversal. */
> +	if (!revs->count)
> +		return -1;
> +
> +	/*
> +	 * A bitmap result can't know left/right, etc, because we don't
> +	 * actually traverse.
> +	 */
> +	if (revs->left_right || revs->cherry_mark)
> +		return -1;
> +
> +	/*
> +	 * This must be saved before doing any walking, since the revision
> +	 * machinery will count it down to zero while traversing.
> +	 */
> +	max_count = revs->max_count;
> +
> +	bitmap_git = prepare_bitmap_walk(revs);
> +	if (!bitmap_git)
> +		return -1;
> +
> +	count_bitmap_commit_list(bitmap_git, &commit_count, NULL, NULL, NULL);
> +	if (max_count >= 0 && max_count < commit_count)
> +		commit_count = max_count;
> +
> +	printf("%d\n", commit_count);
> +	free_bitmap_index(bitmap_git);
> +	return 0;
> +}
> +
> +static int try_bitmap_traversal(struct rev_info *revs)
> +{
> +	struct bitmap_index *bitmap_git;
> +
> +	/*
> +	 * We can't use a bitmap result with a traversal limit, since the set
> +	 * of commits we'd get would be essentially random.
> +	 */
> +	if (revs->max_count >= 0)
> +		return -1;
> +
> +	/*
> +	 * Our bitmap result will return all objects, and we're not
> +	 * yet prepared to show only particular types.
> +	 */
> +	if (!revs->tag_objects || !revs->tree_objects || !revs->blob_objects)
> +		return -1;
> +
> +	bitmap_git = prepare_bitmap_walk(revs);
> +	if (!bitmap_git)
> +		return -1;
> +
> +	traverse_bitmap_commit_list(bitmap_git, &show_object_fast);
> +	free_bitmap_index(bitmap_git);
> +	return 0;
> +}
> +
>  int cmd_rev_list(int argc, const char **argv, const char *prefix)
>  {
>  	struct rev_info revs;
> @@ -534,27 +597,10 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
>  		progress = start_delayed_progress(show_progress, 0);
>
>  	if (use_bitmap_index) {
> -		if (revs.count && !revs.left_right && !revs.cherry_mark) {
> -			uint32_t commit_count;
> -			int max_count = revs.max_count;
> -			struct bitmap_index *bitmap_git;
> -			if ((bitmap_git = prepare_bitmap_walk(&revs))) {
> -				count_bitmap_commit_list(bitmap_git, &commit_count, NULL, NULL, NULL);
> -				if (max_count >= 0 && max_count < commit_count)
> -					commit_count = max_count;
> -				printf("%d\n", commit_count);
> -				free_bitmap_index(bitmap_git);
> -				return 0;
> -			}
> -		} else if (revs.max_count < 0 &&
> -			   revs.tag_objects && revs.tree_objects && revs.blob_objects) {
> -			struct bitmap_index *bitmap_git;
> -			if ((bitmap_git = prepare_bitmap_walk(&revs))) {
> -				traverse_bitmap_commit_list(bitmap_git, &show_object_fast);
> -				free_bitmap_index(bitmap_git);
> -				return 0;
> -			}
> -		}
> +		if (!try_bitmap_count(&revs))
> +			return 0;
> +		if (!try_bitmap_traversal(&revs))
> +			return 0;
>  	}
>
>  	if (prepare_revision_walk(&revs))
> --
> 2.25.0.796.gcc29325708

The refactoring here is straightforward, and looks correct to my
reading.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v2 06/15] rev-list: make --count work with --objects
  2020-02-14 18:22   ` [PATCH v2 06/15] rev-list: make --count work with --objects Jeff King
@ 2020-02-15  0:42     ` Taylor Blau
  2020-02-15  6:48       ` Jeff King
  2020-02-18 18:05     ` Derrick Stolee
  1 sibling, 1 reply; 73+ messages in thread
From: Taylor Blau @ 2020-02-15  0:42 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano

On Fri, Feb 14, 2020 at 01:22:20PM -0500, Jeff King wrote:
> The current behavior from "rev-list --count --objects" is nonsensical:
> we enumerate all of the objects except commits, but then give a count of
> commits. This wasn't planned, and is just what the code happens to do.
>
> Instead, let's give the answer the user almost certainly wanted: the
> full count of objects.

This makes sense: I've often worried about introducing
backwards-incompatible changes in newer versions of Git, even for
behavior that didn't make sense to begin with.

Of course, backwards-incompatible changes *are* something worth worrying
about, but I don't find that the behavior was sensible to begin with, so
I don't have a problem "breaking" it if "breaking" means making
something nonsensical behave correctly.

> Note that there are more complicated cases around cherry-marking, etc.
> We'll punt on those for now, but let the user know that we can't produce
> an answer (rather than giving them something useless).

Yep, sounds good.

> We'll test both the new feature as well as a vanilla --count of commits,
> since that surprisingly doesn't seem to be covered in the existing
> tests.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  builtin/rev-list.c       | 13 +++++++++++++
>  t/t6000-rev-list-misc.sh | 12 ++++++++++++
>  2 files changed, 25 insertions(+)
>
> diff --git a/builtin/rev-list.c b/builtin/rev-list.c
> index 38c5ca5603..9452123988 100644
> --- a/builtin/rev-list.c
> +++ b/builtin/rev-list.c
> @@ -253,11 +253,19 @@ static int finish_object(struct object *obj, const char *name, void *cb_data)
>  static void show_object(struct object *obj, const char *name, void *cb_data)
>  {
>  	struct rev_list_info *info = cb_data;
> +	struct rev_info *revs = info->revs;
> +
>  	if (finish_object(obj, name, cb_data))
>  		return;
>  	display_progress(progress, ++progress_counter);
>  	if (info->flags & REV_LIST_QUIET)
>  		return;
> +
> +	if (revs->count) {
> +		revs->count_right++;
> +		return;
> +	}
> +

Hmm. This puzzled me at first. Do you think that it could benefit from a
comment?

>  	if (arg_show_object_names)
>  		show_object_with_name(stdout, obj, name);
>  	else
> @@ -584,6 +592,11 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
>  	if (revs.show_notes)
>  		die(_("rev-list does not support display of notes"));
>
> +	if (revs.count &&
> +	    (revs.tag_objects || revs.tree_objects || revs.blob_objects) &&
> +	    (revs.left_right || revs.cherry_mark))
> +		die(_("marked counting is incompatible with --objects"));
> +
>  	if (filter_options.choice)
>  		use_bitmap_index = 0;
>
> diff --git a/t/t6000-rev-list-misc.sh b/t/t6000-rev-list-misc.sh
> index b8cf82349b..383f2c457d 100755
> --- a/t/t6000-rev-list-misc.sh
> +++ b/t/t6000-rev-list-misc.sh
> @@ -148,4 +148,16 @@ test_expect_success 'rev-list --end-of-options' '
>  	test_cmp expect actual
>  '
>
> +test_expect_success 'rev-list --count' '
> +	count=$(git rev-list --count HEAD) &&
> +	git rev-list HEAD >actual &&
> +	test_line_count = $count actual
> +'
> +
> +test_expect_success 'rev-list --count --objects' '
> +	count=$(git rev-list --count --objects HEAD) &&
> +	git rev-list --objects HEAD >actual &&
> +	test_line_count = $count actual
> +'
> +
>  test_done
> --
> 2.25.0.796.gcc29325708

All looks good.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v2 07/15] rev-list: allow bitmaps when counting objects
  2020-02-14 18:22   ` [PATCH v2 07/15] rev-list: allow bitmaps when counting objects Jeff King
@ 2020-02-15  0:45     ` Taylor Blau
  2020-02-15  6:55       ` Jeff King
  0 siblings, 1 reply; 73+ messages in thread
From: Taylor Blau @ 2020-02-15  0:45 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano

On Fri, Feb 14, 2020 at 01:22:22PM -0500, Jeff King wrote:
> The prior commit taught "--count --objects" to work without bitmaps. We
> should be able to get the same answer much more quickly with bitmaps.
>
> Note that we punt on the max_count case here. This perhaps _could_ be
> made to work if we find all of the boundary commits and treat them as
> UNINTERESTING, subtracting them (and their reachable objects) from the
> set we return. That implies an actual commit traversal, but we'd still
> be faster due to avoiding opening up any trees. Given the complexity and
> the fact that anyone is unlikely to want this, it makes sense to just
> fall back to the non-bitmap case for now.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  builtin/rev-list.c      | 21 ++++++++++++++++++---
>  t/t5310-pack-bitmaps.sh |  6 ++++++
>  2 files changed, 24 insertions(+), 3 deletions(-)
>
> diff --git a/builtin/rev-list.c b/builtin/rev-list.c
> index 9452123988..70f3207ecc 100644
> --- a/builtin/rev-list.c
> +++ b/builtin/rev-list.c
> @@ -374,7 +374,10 @@ static inline int parse_missing_action_value(const char *value)
>
>  static int try_bitmap_count(struct rev_info *revs)
>  {
> -	uint32_t commit_count;
> +	uint32_t commit_count = 0,
> +		 tag_count = 0,
> +		 tree_count = 0,
> +		 blob_count = 0;

Hmm, I don't usually see the comma-separated declaration/initialization
in git.git. Is there a reason you did it here? Not that I really mind
one way or the other, just interested.

>  	int max_count;
>  	struct bitmap_index *bitmap_git;
>
> @@ -389,6 +392,15 @@ static int try_bitmap_count(struct rev_info *revs)
>  	if (revs->left_right || revs->cherry_mark)
>  		return -1;
>
> +	/*
> +	 * If we're counting reachable objects, we can't handle a max count of
> +	 * commits to traverse, since we don't know which objects go with which
> +	 * commit.
> +	 */
> +	if (revs->max_count >= 0 &&
> +	    (revs->tag_objects || revs->tree_objects || revs->blob_objects))

An aside unrelated to the patch at hand: the expression

  (revs->tag_objects || revs->tree_objects || revs->blob_objects)

does occur in an awful lot of places throughout this file. Do you
imagine it'd be useful to pull this check out into its own function,
perhaps as a preparatory patch in a later version of this series?

I'm also not fussed if you don't think that such a change would be
useful, it's just an observation I had after seeing this expression a
few times.

> +		return -1;
> +
>  	/*
>  	 * This must be saved before doing any walking, since the revision
>  	 * machinery will count it down to zero while traversing.
> @@ -399,11 +411,14 @@ static int try_bitmap_count(struct rev_info *revs)
>  	if (!bitmap_git)
>  		return -1;
>
> -	count_bitmap_commit_list(bitmap_git, &commit_count, NULL, NULL, NULL);
> +	count_bitmap_commit_list(bitmap_git, &commit_count,
> +				 revs->tree_objects ? &tree_count : NULL,
> +				 revs->blob_objects ? &blob_count : NULL,
> +				 revs->tag_objects ? &tag_count : NULL);
>  	if (max_count >= 0 && max_count < commit_count)
>  		commit_count = max_count;
>
> -	printf("%d\n", commit_count);
> +	printf("%d\n", commit_count + tree_count + blob_count + tag_count);
>  	free_bitmap_index(bitmap_git);
>  	return 0;
>  }
> diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
> index 6640329ebf..7ba7d294a5 100755
> --- a/t/t5310-pack-bitmaps.sh
> +++ b/t/t5310-pack-bitmaps.sh
> @@ -74,6 +74,12 @@ rev_list_tests() {
>  		test_cmp expect actual
>  	'
>
> +	test_expect_success "counting objects via bitmap ($state)" '
> +		git rev-list --count --objects HEAD >expect &&
> +		git rev-list --use-bitmap-index --count --objects HEAD >actual &&
> +		test_cmp expect actual
> +	'
> +
>  	test_expect_success "enumerate --objects ($state)" '
>  		git rev-list --objects --use-bitmap-index HEAD >tmp &&
>  		cut -d" " -f1 <tmp >tmp2 &&
> --
> 2.25.0.796.gcc29325708

Your tests look good to me, too.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v2 08/15] t5310: factor out bitmap traversal comparison
  2020-02-14 18:22   ` [PATCH v2 08/15] t5310: factor out bitmap traversal comparison Jeff King
@ 2020-02-15  2:14     ` Taylor Blau
  2020-02-15  7:00       ` Jeff King
  0 siblings, 1 reply; 73+ messages in thread
From: Taylor Blau @ 2020-02-15  2:14 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano

On Fri, Feb 14, 2020 at 01:22:25PM -0500, Jeff King wrote:
> We check the results of "rev-list --use-bitmap-index" by comparing it to
> the same traversal without the bitmap option. However, this is a little
> tricky to do because of some output differences (see the included
> comment for details). Let's pull this out into a helper function, since
> we'll be adding some similar tests.
>
> While we're at it, let's also try to confirm that the bitmap output did
> indeed use bitmaps. Since the code internally falls back to the
> non-bitmap path in some cases, the tests are at risk of becoming trivial
> noops.
>
> This is a bit fragile, as not all outputs will differ (e.g., looking at
> only the commits from a fully-bitmapped pack will end up exactly the
> same as the normal traversal order, since it also matches the pack
> order). So we'll provide an escape hatch by which tests can disable this
> check (which should only be used after manually confirming that bitmaps
> kicked in).

Thanks for pointing out the rationale behind this.

> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  t/t5310-pack-bitmaps.sh | 10 +++-------
>  t/test-lib-functions.sh | 27 +++++++++++++++++++++++++++
>  2 files changed, 30 insertions(+), 7 deletions(-)
>
> diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
> index 7ba7d294a5..b8645ae070 100755
> --- a/t/t5310-pack-bitmaps.sh
> +++ b/t/t5310-pack-bitmaps.sh
> @@ -81,13 +81,9 @@ rev_list_tests() {
>  	'
>
>  	test_expect_success "enumerate --objects ($state)" '
> -		git rev-list --objects --use-bitmap-index HEAD >tmp &&
> -		cut -d" " -f1 <tmp >tmp2 &&
> -		sort <tmp2 >actual &&
> -		git rev-list --objects HEAD >tmp &&
> -		cut -d" " -f1 <tmp >tmp2 &&
> -		sort <tmp2 >expect &&
> -		test_cmp expect actual
> +		git rev-list --objects --use-bitmap-index HEAD >actual &&
> +		git rev-list --objects HEAD >expect &&
> +		test_bitmap_traversal expect actual
>  	'
>
>  	test_expect_success "bitmap --objects handles non-commit objects ($state)" '
> diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
> index 284c52d076..352c213d52 100644
> --- a/t/test-lib-functions.sh
> +++ b/t/test-lib-functions.sh
> @@ -1516,3 +1516,30 @@ test_set_port () {
>  	port=$(($port + ${GIT_TEST_STRESS_JOB_NR:-0}))
>  	eval $var=$port
>  }
> +
> +# Compare a file containing rev-list bitmap traversal output to its non-bitmap
> +# counterpart. You can't just use test_cmp for this, because the two produce
> +# subtly different output:
> +#
> +#   - regular output is in traversal order, whereas bitmap is split by type,
> +#     with non-packed objects at the end
> +#
> +#   - regular output has a space and the pathname appended to non-commit
> +#     objects; bitmap output omits this
> +#
> +# This function normalizes and compares the two. The second file should
> +# always be the bitmap output.
> +test_bitmap_traversal () {
> +	if test "$1" = "--no-confirm-bitmaps"
> +	then
> +		shift
> +	elif cmp "$1" "$2"

Any reason to prefer 'cmp' here and then use 'test_cmp' below? I can't
think of a good reason one way or another, especially in places where
GIT_TEST_CMP is just 'cmp', but I think the two usages should be
consistent with one another.

If there *is* a favorable argument for 'test_cmp' (instead of a bare
'cmp'), I think that it'd be that the original code a couple of hunks up
uses it.

> +	then
> +		echo >&2 "identical raw outputs; are you sure bitmaps were used?"
> +		return 1
> +	fi &&
> +	cut -d' ' -f1 "$1" | sort >"$1.normalized" &&
> +	sort "$2" >"$2.normalized" &&
> +	test_cmp "$1.normalized" "$2.normalized" &&
> +	rm -f "$1.normalized" "$2.normalized"

This implementation looks good to me.

> +}
> --
> 2.25.0.796.gcc29325708

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v2 02/15] pack-bitmap: fix leak of haves/wants object lists
  2020-02-15  0:15     ` Taylor Blau
@ 2020-02-15  6:46       ` Jeff King
  0 siblings, 0 replies; 73+ messages in thread
From: Jeff King @ 2020-02-15  6:46 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Junio C Hamano

On Fri, Feb 14, 2020 at 04:15:52PM -0800, Taylor Blau wrote:

> > +void object_list_free(struct object_list **list)
> > +{
> > +	while (*list) {
> > +		struct object_list *p = *list;
> > +		*list = p->next;
> > +		free(p);
> > +	}
> > +}
> 
> Hmm. I was going to write a comment saying more-or-less, "I think I'm
> nitpicking here, but I'm not crazy about this as a 'while' loop". But,
> when I wrote this as a 'for' loop instead, I wrote a use-after-free, and
> then when I rid the code of that, it wasn't any more readable than the
> version above.

Yep, linked lists are deceptive that way.

This _could_ be converted to use the generic list.h macros. Or better
still, we could probably ditch it entirely in favor of an object_array.
But I'd prefer to do that separately, if at all.

-Peff

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v2 06/15] rev-list: make --count work with --objects
  2020-02-15  0:42     ` Taylor Blau
@ 2020-02-15  6:48       ` Jeff King
  2020-02-16 23:34         ` Junio C Hamano
  0 siblings, 1 reply; 73+ messages in thread
From: Jeff King @ 2020-02-15  6:48 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Junio C Hamano

On Fri, Feb 14, 2020 at 04:42:16PM -0800, Taylor Blau wrote:

> On Fri, Feb 14, 2020 at 01:22:20PM -0500, Jeff King wrote:
> > The current behavior from "rev-list --count --objects" is nonsensical:
> > we enumerate all of the objects except commits, but then give a count of
> > commits. This wasn't planned, and is just what the code happens to do.
> >
> > Instead, let's give the answer the user almost certainly wanted: the
> > full count of objects.
> 
> This makes sense: I've often worried about introducing
> backwards-incompatible changes in newer versions of Git, even for
> behavior that didn't make sense to begin with.
> 
> Of course, backwards-incompatible changes *are* something worth worrying
> about, but I don't find that the behavior was sensible to begin with, so
> I don't have a problem "breaking" it if "breaking" means making
> something nonsensical behave correctly.

Yeah, I admit I'm guessing that nobody cares about the current behavior,
or that it was unplanned. But it seems sufficiently insane to me to take
a chance on.

> > +	if (revs->count) {
> > +		revs->count_right++;
> > +		return;
> > +	}
> > +
> 
> Hmm. This puzzled me at first. Do you think that it could benefit from a
> comment?

What would it say (i.e., I'm not sure what confused you)?

-Peff

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v2 07/15] rev-list: allow bitmaps when counting objects
  2020-02-15  0:45     ` Taylor Blau
@ 2020-02-15  6:55       ` Jeff King
  2020-02-16 23:36         ` Junio C Hamano
  0 siblings, 1 reply; 73+ messages in thread
From: Jeff King @ 2020-02-15  6:55 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Junio C Hamano

On Fri, Feb 14, 2020 at 04:45:55PM -0800, Taylor Blau wrote:

> >  static int try_bitmap_count(struct rev_info *revs)
> >  {
> > -	uint32_t commit_count;
> > +	uint32_t commit_count = 0,
> > +		 tag_count = 0,
> > +		 tree_count = 0,
> > +		 blob_count = 0;
> 
> Hmm, I don't usually see the comma-separated declaration/initialization
> in git.git. Is there a reason you did it here? Not that I really mind
> one way or the other, just interested.

The variables are all related, and all should have the same type. I'd
complain about a patch that did:

  int ret, count;

because there's no logical reason those two variables have the same
type. They just happen to. And putting them both on the same line is
even worse, because it makes a diff changing one of them noisy.

But in the code quoted above, if one of them changes, they would all
(presumably) change. So I think it communicates something to group them
like this. That said, if we had a style guideline that strictly forbade
this (because it's often applied in _bad_ spots), I wouldn't be that
sad to convert it.

> > +	if (revs->max_count >= 0 &&
> > +	    (revs->tag_objects || revs->tree_objects || revs->blob_objects))
> 
> An aside unrelated to the patch at hand: the expression
> 
>   (revs->tag_objects || revs->tree_objects || revs->blob_objects)
> 
> does occur in an awful lot of places throughout this file. Do you
> imagine it'd be useful to pull this check out into its own function,
> perhaps as a preparatory patch in a later version of this series?
> 
> I'm also not fussed if you don't think that such a change would be
> useful, it's just an observation I had after seeing this expression a
> few times.

The most amusing part about it is that there's actually no way to
specify one type without the other. Most of the revision code tries to
be respectful of the fact that we might one day change that (though
there are some subtleties; e.g., asking for blobs but not trees would
still need to traverse the trees, but just not show them).

So this really could have been revs->objects. :) But I think given the
effort to treat them individually so far, it would be a step backwards
to switch now.

As far as whether there should be something like:

  int revs_show_objects(struct rev_info *revs)
  {
	return revs->tag_objects || revs->tree_objects || revs->blob_objects;
  }

I don't feel strongly either way. I find doing it inline perfectly
readable, but it's entirely possible my brain has been rotted by years
of looking at the revision code.

-Peff

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v2 08/15] t5310: factor out bitmap traversal comparison
  2020-02-15  2:14     ` Taylor Blau
@ 2020-02-15  7:00       ` Jeff King
  0 siblings, 0 replies; 73+ messages in thread
From: Jeff King @ 2020-02-15  7:00 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Junio C Hamano

On Fri, Feb 14, 2020 at 06:14:46PM -0800, Taylor Blau wrote:

> > This is a bit fragile, as not all outputs will differ (e.g., looking at
> > only the commits from a fully-bitmapped pack will end up exactly the
> > same as the normal traversal order, since it also matches the pack
> > order). So we'll provide an escape hatch by which tests can disable this
> > check (which should only be used after manually confirming that bitmaps
> > kicked in).
> 
> Thanks for pointing out the rationale behind this.

I didn't write it this way at first, but the need became quite apparent
when I ran the test from the next patch. :)

I waffled on whether the extra option made the helper too convoluted to
be worthwhile.  Another way to do it would be two separate functions:

  test_for_bitmap_output expect actual &&
  test_bitmap_output_matches expect actual

but as you can see I couldn't come up with good names.

I doubt it matters all that much either way, as long as the docstring is
clear.

> > +test_bitmap_traversal () {
> > +	if test "$1" = "--no-confirm-bitmaps"
> > +	then
> > +		shift
> > +	elif cmp "$1" "$2"
> 
> Any reason to prefer 'cmp' here and then use 'test_cmp' below? I can't
> think of a good reason one way or another, especially in places where
> GIT_TEST_CMP is just 'cmp', but I think the two usages should be
> consistent with one another.

On most platforms GIT_TEST_CMP is "diff -u" (it's only on platforms with
a crappy system diff tool that we replace it with "cmp"). So I thought
it would be confusing for it to dump a diff in the expected case (since
unlike a normal test_cmp, we want the outputs to differ).  "cmp" does
still write a short message to stderr; I thought about redirecting it to
/dev/null, but worried that would make debugging verbose output harder).

> If there *is* a favorable argument for 'test_cmp' (instead of a bare
> 'cmp'), I think that it'd be that the original code a couple of hunks up
> uses it.

Right, but it uses to check that two things are equal (which we also use
it for here).

-Peff

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v2 06/15] rev-list: make --count work with --objects
  2020-02-15  6:48       ` Jeff King
@ 2020-02-16 23:34         ` Junio C Hamano
  2020-02-18  5:24           ` Jeff King
  0 siblings, 1 reply; 73+ messages in thread
From: Junio C Hamano @ 2020-02-16 23:34 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git

Jeff King <peff@peff.net> writes:

>> > +	if (revs->count) {
>> > +		revs->count_right++;
>> > +		return;
>> > +	}
>> > +
>> 
>> Hmm. This puzzled me at first. Do you think that it could benefit from a
>> comment?
>
> What would it say (i.e., I'm not sure what confused you)?

I think the question reader had was "why *right*?"

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v2 07/15] rev-list: allow bitmaps when counting objects
  2020-02-15  6:55       ` Jeff King
@ 2020-02-16 23:36         ` Junio C Hamano
  0 siblings, 0 replies; 73+ messages in thread
From: Junio C Hamano @ 2020-02-16 23:36 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git

Jeff King <peff@peff.net> writes:

>> > +	uint32_t commit_count = 0,
>> > +		 tag_count = 0,
>> > +		 tree_count = 0,
>> > +		 blob_count = 0;
>> 
>> Hmm, I don't usually see the comma-separated declaration/initialization
>> in git.git. Is there a reason you did it here? Not that I really mind
>> one way or the other, just interested.
>
> The variables are all related, and all should have the same type. I'd
> complain about a patch that did:
>
>   int ret, count;
>
> because there's no logical reason those two variables have the same
> type. They just happen to. And putting them both on the same line is
> even worse, because it makes a diff changing one of them noisy.
>
> But in the code quoted above, if one of them changes, they would all
> (presumably) change. So I think it communicates something to group them
> like this.

I often apply exactly the same criteria as above to my code and
review---since it is not just you (or me), perhaps CodingGuideline
can help other readers, but I am OK to delay documenting it until we
find the third person who has been applying this rule that has not
been spelt out explicitly ;-)

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v2 06/15] rev-list: make --count work with --objects
  2020-02-16 23:34         ` Junio C Hamano
@ 2020-02-18  5:24           ` Jeff King
  2020-02-18 17:28             ` Junio C Hamano
  0 siblings, 1 reply; 73+ messages in thread
From: Jeff King @ 2020-02-18  5:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, git

On Sun, Feb 16, 2020 at 03:34:13PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> >> > +	if (revs->count) {
> >> > +		revs->count_right++;
> >> > +		return;
> >> > +	}
> >> > +
> >> 
> >> Hmm. This puzzled me at first. Do you think that it could benefit from a
> >> comment?
> >
> > What would it say (i.e., I'm not sure what confused you)?
> 
> I think the question reader had was "why *right*?"

Ah. The answer is: because it's not SYMMETRIC_LEFT. ;)

The counting code accumulates there by default when there are no side
markings, so that's what it will report when there's no left_right or
cherry_mark (which we know will be the case here, since we die()
otherwise).

-Peff

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v2 06/15] rev-list: make --count work with --objects
  2020-02-18  5:24           ` Jeff King
@ 2020-02-18 17:28             ` Junio C Hamano
  2020-02-18 19:55               ` Jeff King
  0 siblings, 1 reply; 73+ messages in thread
From: Junio C Hamano @ 2020-02-18 17:28 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git

Jeff King <peff@peff.net> writes:

> On Sun, Feb 16, 2020 at 03:34:13PM -0800, Junio C Hamano wrote:
>
>> Jeff King <peff@peff.net> writes:
>> 
>> >> > +	if (revs->count) {
>> >> > +		revs->count_right++;
>> >> > +		return;
>> >> > +	}
>> >> > +
>> >> 
>> >> Hmm. This puzzled me at first. Do you think that it could benefit from a
>> >> comment?
>> >
>> > What would it say (i.e., I'm not sure what confused you)?
>> 
>> I think the question reader had was "why *right*?"
>
> Ah. The answer is: because it's not SYMMETRIC_LEFT. ;)
>
> The counting code accumulates there by default when there are no side
> markings, so that's what it will report when there's no left_right or
> cherry_mark (which we know will be the case here, since we die()
> otherwise).

Yup.

	/*
	 * The object count is always accumulated in the .count_right
	 * field for traversal that is not a left-right traversal,
	 * and cmd_rev_list() made sure that a .count request that
	 * wants to count non-commit objects, which is handled by
	 * the show_object() callback, does not ask for .left_right.
	 */

Overkill?  I dunno.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v2 02/15] pack-bitmap: fix leak of haves/wants object lists
  2020-02-14 18:22   ` [PATCH v2 02/15] pack-bitmap: fix leak of haves/wants object lists Jeff King
  2020-02-15  0:15     ` Taylor Blau
@ 2020-02-18 17:58     ` Derrick Stolee
  2020-02-18 20:02       ` Jeff King
  1 sibling, 1 reply; 73+ messages in thread
From: Derrick Stolee @ 2020-02-18 17:58 UTC (permalink / raw)
  To: Jeff King, git; +Cc: Junio C Hamano

On 2/14/2020 1:22 PM, Jeff King wrote:
> When we do a bitmap-aware revision traversal, we create an object_list
> for each of the "haves" and "wants" tips. After creating the result
> bitmaps these are no longer needed or used, but we never free the list
> memory.

It's surprising that we got away with this for so long. Is it
possible that this loop of freeing memory could provide significant
performance drawbacks? I'm assuming that the free() loop is much
faster than the malloc() loop, so the impact is negligible.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v2 06/15] rev-list: make --count work with --objects
  2020-02-14 18:22   ` [PATCH v2 06/15] rev-list: make --count work with --objects Jeff King
  2020-02-15  0:42     ` Taylor Blau
@ 2020-02-18 18:05     ` Derrick Stolee
  2020-02-18 19:59       ` Jeff King
  1 sibling, 1 reply; 73+ messages in thread
From: Derrick Stolee @ 2020-02-18 18:05 UTC (permalink / raw)
  To: Jeff King, git; +Cc: Junio C Hamano

On 2/14/2020 1:22 PM, Jeff King wrote:
> +test_expect_success 'rev-list --count' '
> +	count=$(git rev-list --count HEAD) &&
> +	git rev-list HEAD >actual &&
> +	test_line_count = $count actual
> +'
> +
> +test_expect_success 'rev-list --count --objects' '
> +	count=$(git rev-list --count --objects HEAD) &&
> +	git rev-list --objects HEAD >actual &&
> +	test_line_count = $count actual
> +'

I suppose these tests work, although I would probably
prefer precomputed explicit expected counts instead
of asking rev-list for the correct answer to a
rev-list command. This is still fine, because we test
that the non-count versions return the correct results,
but I would hate for a bug to affect both modes equally
and cause these tests to pass.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v2 09/15] rev-list: allow commit-only bitmap traversals
  2020-02-14 18:22   ` [PATCH v2 09/15] rev-list: allow commit-only bitmap traversals Jeff King
@ 2020-02-18 18:18     ` Derrick Stolee
  2020-02-18 20:05       ` Jeff King
  0 siblings, 1 reply; 73+ messages in thread
From: Derrick Stolee @ 2020-02-18 18:18 UTC (permalink / raw)
  To: Jeff King, git; +Cc: Junio C Hamano

On 2/14/2020 1:22 PM, Jeff King wrote:
> Here are numbers for linux.git:
> 
>   Test                         HEAD^               HEAD
>   ------------------------------------------------------------------------
>   5310.7: rev-list (commits)   8.29(8.10+0.19)       1.76(1.72+0.04) -78.8%
>   5310.8: rev-list (objects)   8.06(7.94+0.12)       8.14(7.94+0.13) +1.0%

Nice.

> That run was cheating a little, as I didn't have any commit-graph in the
> repository, and we'd built it by default these days when running git-gc.
> Here are numbers with a commit-graph:
> 
>   Test                         HEAD^               HEAD
>   ------------------------------------------------------------------------
>   5310.7: rev-list (commits)   0.70(0.58+0.12)     0.51(0.46+0.04) -27.1%
>   5310.8: rev-list (objects)   6.20(6.09+0.10)     6.27(6.16+0.11) +1.1%
> 
> Still an improvement, but a lot less impressive.

I think this is still impressive, because you are still allocating the
object structs and writing data to the output. The commit-graph code allows
cheating and doing very little work when navigating from one commit to
another. I suppose the biggest difference between these two approaches is
that the object cache contains "parsed" commits at the end (with allocated
parents and initialized commit times) and we needed a priority queue for
the commit walk. I'm impressed that this is still 27% improvement!

> diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
> index b8645ae070..2c64d0c441 100755
> --- a/t/t5310-pack-bitmaps.sh
> +++ b/t/t5310-pack-bitmaps.sh
> @@ -80,6 +80,12 @@ rev_list_tests() {
>  		test_cmp expect actual
>  	'
>  
> +	test_expect_success "enumerate commits ($state)" '
> +		git rev-list --use-bitmap-index HEAD >actual &&
> +		git rev-list HEAD >expect &&
> +		test_bitmap_traversal --no-confirm-bitmaps expect actual
> +	'
> +

I was wondering if there is anything we could add to the "expect"
command that could better guarantee these commits show up in a
different order than the bitmap order, allowing us to drop the
"--no-confirm-bitmaps". Perhaps the issue is the merge structure
of the repository, and how it will cause these orders to always
agree?

I suppose "--topo-order" may actually present an order _even closer_
to the bitmap order.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v2 13/15] pack-bitmap: implement BLOB_NONE filtering
  2020-02-14 18:22   ` [PATCH v2 13/15] pack-bitmap: implement BLOB_NONE filtering Jeff King
@ 2020-02-18 19:26     ` Derrick Stolee
  2020-02-18 19:36       ` Derrick Stolee
  2020-02-18 20:24       ` Jeff King
  0 siblings, 2 replies; 73+ messages in thread
From: Derrick Stolee @ 2020-02-18 19:26 UTC (permalink / raw)
  To: Jeff King, git; +Cc: Junio C Hamano

On 2/14/2020 1:22 PM, Jeff King wrote:
> We can easily support BLOB_NONE filters with bitmaps. Since we know the
> types of all of the objects, we just need to clear the result bits of
> any blobs.
> 
> Note two subtleties in the implementation (which I also called out in
> comments):
> 
>   - we have to include any blobs that were specifically asked for (and
>     not reached through graph traversal) to match the non-bitmap version

I have a concern here, but maybe I'm worrying about nothing. When a
partial clone asks for a pack of missing blobs, will your code create
an empty bitmap and then add bits to that bitmap one-by-one instead
of appending to a simple object list?

In the typical case where we ask for specific commits and trees, we
expect a very small number of blobs to add to the resulting bitmap.
When no commits or trees are included in the wants, then we don't
need the bitmap at all. IIRC an EWAH bitmap is relatively expensive
to update bits one at a time, so this is not incredibly efficient.

I apologize that I don't have the time to dig into this myself.

>   - we have to handle in-pack and "ext_index" objects separately.
>     Arguably prepare_bitmap_walk() could be adding these ext_index
>     objects to the type bitmaps. But it doesn't for now, so let's match
>     the rest of the bitmap code here (it probably wouldn't be an
>     efficiency improvement to do so since the cost of extending those
>     bitmaps is about the same as our loop here, but it might make the
>     code a bit simpler).
> 
> Here are perf results for the new test on git.git:
> 
>   Test                                    HEAD^             HEAD
>   --------------------------------------------------------------------------------
>   5310.9: rev-list count with blob:none   1.67(1.62+0.05)   0.22(0.21+0.02) -86.8%
...
> diff --git a/t/perf/p5310-pack-bitmaps.sh b/t/perf/p5310-pack-bitmaps.sh
> index e52f66ec9e..936742314c 100755
> --- a/t/perf/p5310-pack-bitmaps.sh
> +++ b/t/perf/p5310-pack-bitmaps.sh
> @@ -47,6 +47,11 @@ test_perf 'rev-list (objects)' '
>  	git rev-list --all --use-bitmap-index --objects >/dev/null
>  '
>  
> +test_perf 'rev-list count with blob:none' '
> +	git rev-list --use-bitmap-index --count --objects --all \
> +		--filter=blob:none >/dev/null
> +'

I wondered why you chose to extend these tests instead of using
p5600-partial-clone.sh, but I guess this script definitely creates
the bitmap for the test. When I tested p5600-partial-clone.sh below,
I manually repacked the Linux repo to have a bitmap:

Test                          v2.25.0               HEAD                    
----------------------------------------------------------------------------
5600.2: clone without blobs   79.81(111.34+11.35)   36.00(69.37+7.30) -54.9%
5600.3: checkout of result    45.56(114.59+4.81)    46.43(80.50+5.41) +1.9% 

Perhaps results for these tests would also be appropriate for your
commit messages?

Note the +1.9% for the checkout. It's unlikely that this is actually
something meaningful, but it _could_ be related to my concerns above
about building a blob list from an empty bitmap.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v2 13/15] pack-bitmap: implement BLOB_NONE filtering
  2020-02-18 19:26     ` Derrick Stolee
@ 2020-02-18 19:36       ` Derrick Stolee
  2020-02-18 20:30         ` Jeff King
  2020-02-18 20:24       ` Jeff King
  1 sibling, 1 reply; 73+ messages in thread
From: Derrick Stolee @ 2020-02-18 19:36 UTC (permalink / raw)
  To: Jeff King, git; +Cc: Junio C Hamano

On 2/18/2020 2:26 PM, Derrick Stolee wrote:
> On 2/14/2020 1:22 PM, Jeff King wrote:
>> +test_perf 'rev-list count with blob:none' '
>> +	git rev-list --use-bitmap-index --count --objects --all \
>> +		--filter=blob:none >/dev/null
>> +'
> 
> I wondered why you chose to extend these tests instead of using
> p5600-partial-clone.sh, but I guess this script definitely creates
> the bitmap for the test. When I tested p5600-partial-clone.sh below,
> I manually repacked the Linux repo to have a bitmap:
> 
> Test                          v2.25.0               HEAD                    
> ----------------------------------------------------------------------------
> 5600.2: clone without blobs   79.81(111.34+11.35)   36.00(69.37+7.30) -54.9%
> 5600.3: checkout of result    45.56(114.59+4.81)    46.43(80.50+5.41) +1.9% 
> 
> Perhaps results for these tests would also be appropriate for your
> commit messages?

And speaking of valuable performance tests to update: should we take
the loop of fetch tests in p5311-pack-bitmaps-fetch.sh and make
equivalent versions in p5600-partial-clone.sh? It would be good to
make sure that the incremental fetches also improve in this scenario.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v2 06/15] rev-list: make --count work with --objects
  2020-02-18 17:28             ` Junio C Hamano
@ 2020-02-18 19:55               ` Jeff King
  2020-02-18 21:19                 ` Junio C Hamano
  0 siblings, 1 reply; 73+ messages in thread
From: Jeff King @ 2020-02-18 19:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, git

On Tue, Feb 18, 2020 at 09:28:31AM -0800, Junio C Hamano wrote:

> >> I think the question reader had was "why *right*?"
> >
> > Ah. The answer is: because it's not SYMMETRIC_LEFT. ;)
> >
> > The counting code accumulates there by default when there are no side
> > markings, so that's what it will report when there's no left_right or
> > cherry_mark (which we know will be the case here, since we die()
> > otherwise).
> 
> Yup.
> 
> 	/*
> 	 * The object count is always accumulated in the .count_right
> 	 * field for traversal that is not a left-right traversal,
> 	 * and cmd_rev_list() made sure that a .count request that
> 	 * wants to count non-commit objects, which is handled by
> 	 * the show_object() callback, does not ask for .left_right.
> 	 */
> 
> Overkill?  I dunno.

No, I think it looks good. Want to squash that in (and I'll do likewise
locally)?

-Peff

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v2 06/15] rev-list: make --count work with --objects
  2020-02-18 18:05     ` Derrick Stolee
@ 2020-02-18 19:59       ` Jeff King
  0 siblings, 0 replies; 73+ messages in thread
From: Jeff King @ 2020-02-18 19:59 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, Junio C Hamano

On Tue, Feb 18, 2020 at 01:05:35PM -0500, Derrick Stolee wrote:

> On 2/14/2020 1:22 PM, Jeff King wrote:
> > +test_expect_success 'rev-list --count' '
> > +	count=$(git rev-list --count HEAD) &&
> > +	git rev-list HEAD >actual &&
> > +	test_line_count = $count actual
> > +'
> > +
> > +test_expect_success 'rev-list --count --objects' '
> > +	count=$(git rev-list --count --objects HEAD) &&
> > +	git rev-list --objects HEAD >actual &&
> > +	test_line_count = $count actual
> > +'
> 
> I suppose these tests work, although I would probably
> prefer precomputed explicit expected counts instead
> of asking rev-list for the correct answer to a
> rev-list command. This is still fine, because we test
> that the non-count versions return the correct results,
> but I would hate for a bug to affect both modes equally
> and cause these tests to pass.

I was hoping it would make the tests a bit less brittle. And as you
note, we should be checking the enumeration results themselves more
carefully in other tests, so I find it fairly unlikely to see a bug that
doesn't trigger _any_ test failure.

-Peff

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v2 02/15] pack-bitmap: fix leak of haves/wants object lists
  2020-02-18 17:58     ` Derrick Stolee
@ 2020-02-18 20:02       ` Jeff King
  0 siblings, 0 replies; 73+ messages in thread
From: Jeff King @ 2020-02-18 20:02 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, Junio C Hamano

On Tue, Feb 18, 2020 at 12:58:10PM -0500, Derrick Stolee wrote:

> On 2/14/2020 1:22 PM, Jeff King wrote:
> > When we do a bitmap-aware revision traversal, we create an object_list
> > for each of the "haves" and "wants" tips. After creating the result
> > bitmaps these are no longer needed or used, but we never free the list
> > memory.
> 
> It's surprising that we got away with this for so long. Is it
> possible that this loop of freeing memory could provide significant
> performance drawbacks? I'm assuming that the free() loop is much
> faster than the malloc() loop, so the impact is negligible.

I don't think it's surprising. We're only talking about leaking hundreds
or perhaps thousands of structs. So maybe a few kilobytes at most. And
we'd typically only run one such bitmap traversal per program.

Meanwhile rev-list is generally storing that same set of objects in a
pending array, and I think we don't generally clean up after it (nor is
there even a convenient function to do so).

So I think it's sort of a drop in the bucket of Git's "eh, this pretty
much lasts about the whole program anyway" leaks. I don't think there's
much performance difference either way in freeing or not. It's mostly
just a cleanliness thing. (I do one day dream of being able to run the
test suite through a leakchecker, but we're still a ways off, I think).

-Peff

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v2 09/15] rev-list: allow commit-only bitmap traversals
  2020-02-18 18:18     ` Derrick Stolee
@ 2020-02-18 20:05       ` Jeff King
  2020-02-18 20:11         ` Derrick Stolee
  0 siblings, 1 reply; 73+ messages in thread
From: Jeff King @ 2020-02-18 20:05 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, Junio C Hamano

On Tue, Feb 18, 2020 at 01:18:21PM -0500, Derrick Stolee wrote:

> > +	test_expect_success "enumerate commits ($state)" '
> > +		git rev-list --use-bitmap-index HEAD >actual &&
> > +		git rev-list HEAD >expect &&
> > +		test_bitmap_traversal --no-confirm-bitmaps expect actual
> > +	'
> > +
> 
> I was wondering if there is anything we could add to the "expect"
> command that could better guarantee these commits show up in a
> different order than the bitmap order, allowing us to drop the
> "--no-confirm-bitmaps". Perhaps the issue is the merge structure
> of the repository, and how it will cause these orders to always
> agree?
> 
> I suppose "--topo-order" may actually present an order _even closer_
> to the bitmap order.

I think in the partial-bitmap state it may have a different order,
because we'd put the new commits at the end. But we run this test in
both fully-bitmapped and partial-bitmap conditions.

We could add something like --topo-order to the expected output, but I
think even if it worked, then the bitmap-confirmation in
test_bitmap_traversal wouldn't really be checking anything! I.e., if the
"actual" traversal didn't use bitmaps, we'd still say "oh, these two
outputs are different, so one must have used bitmaps." So saying
"--no-confirm-bitmap" at least documents that we don't expect any
difference.

-Peff

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v2 09/15] rev-list: allow commit-only bitmap traversals
  2020-02-18 20:05       ` Jeff King
@ 2020-02-18 20:11         ` Derrick Stolee
  0 siblings, 0 replies; 73+ messages in thread
From: Derrick Stolee @ 2020-02-18 20:11 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano

On 2/18/2020 3:05 PM, Jeff King wrote:
> On Tue, Feb 18, 2020 at 01:18:21PM -0500, Derrick Stolee wrote:
> 
>>> +	test_expect_success "enumerate commits ($state)" '
>>> +		git rev-list --use-bitmap-index HEAD >actual &&
>>> +		git rev-list HEAD >expect &&
>>> +		test_bitmap_traversal --no-confirm-bitmaps expect actual
>>> +	'
>>> +
>>
>> I was wondering if there is anything we could add to the "expect"
>> command that could better guarantee these commits show up in a
>> different order than the bitmap order, allowing us to drop the
>> "--no-confirm-bitmaps". Perhaps the issue is the merge structure
>> of the repository, and how it will cause these orders to always
>> agree?
>>
>> I suppose "--topo-order" may actually present an order _even closer_
>> to the bitmap order.
> 
> I think in the partial-bitmap state it may have a different order,
> because we'd put the new commits at the end. But we run this test in
> both fully-bitmapped and partial-bitmap conditions.
> 
> We could add something like --topo-order to the expected output, but I
> think even if it worked, then the bitmap-confirmation in
> test_bitmap_traversal wouldn't really be checking anything! I.e., if the
> "actual" traversal didn't use bitmaps, we'd still say "oh, these two
> outputs are different, so one must have used bitmaps." So saying
> "--no-confirm-bitmap" at least documents that we don't expect any
> difference.

That's a pretty good point. Thanks!

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v2 13/15] pack-bitmap: implement BLOB_NONE filtering
  2020-02-18 19:26     ` Derrick Stolee
  2020-02-18 19:36       ` Derrick Stolee
@ 2020-02-18 20:24       ` Jeff King
  1 sibling, 0 replies; 73+ messages in thread
From: Jeff King @ 2020-02-18 20:24 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, Junio C Hamano

On Tue, Feb 18, 2020 at 02:26:53PM -0500, Derrick Stolee wrote:

> On 2/14/2020 1:22 PM, Jeff King wrote:
> > We can easily support BLOB_NONE filters with bitmaps. Since we know the
> > types of all of the objects, we just need to clear the result bits of
> > any blobs.
> > 
> > Note two subtleties in the implementation (which I also called out in
> > comments):
> > 
> >   - we have to include any blobs that were specifically asked for (and
> >     not reached through graph traversal) to match the non-bitmap version
> 
> I have a concern here, but maybe I'm worrying about nothing. When a
> partial clone asks for a pack of missing blobs, will your code create
> an empty bitmap and then add bits to that bitmap one-by-one instead
> of appending to a simple object list?

Yes. They'd all be listed in revs->pending, so we'd add them to our
array of "wants", and then create a bitmap. There's no traversal cost,
but we'd pay the cost to open the bitmap file. But...

> In the typical case where we ask for specific commits and trees, we
> expect a very small number of blobs to add to the resulting bitmap.
> When no commits or trees are included in the wants, then we don't
> need the bitmap at all. IIRC an EWAH bitmap is relatively expensive
> to update bits one at a time, so this is not incredibly efficient.

There's no ewah bitmap at play here at all. The internal bitmap we
create in memory for the walk is a real uncompressed bitmap, so settings
and retrieving bits is pretty cheap.

I'd worry much more about the fact that we had to parse the whole bitmap
file (which for historical format reasons involves actually running over
all of the bytes in the file).

I think that's somewhat orthogonal to this patch, though. The same
pessimality would be true of anybody fetching a couple blobs, whether
they use filters or not (and really, there's no reason that the
follow-up fetch for blobs in a filtered repository would need to use
filters, but for some reason it does).

It would be an easy optimization to say "we have only blobs, don't
bother opening the bitmap file", but I think that should come on top.
However, I have a suspicion that we actually call parse_object() on each
blob before we even get to the bitmap code (via get_reference()). If so,
that's a much larger low-hanging fruit.

> > --- a/t/perf/p5310-pack-bitmaps.sh
> [...]
> 
> I wondered why you chose to extend these tests instead of using
> p5600-partial-clone.sh, but I guess this script definitely creates
> the bitmap for the test. When I tested p5600-partial-clone.sh below,
> I manually repacked the Linux repo to have a bitmap:

Right, when the two features combine, we have to either pick a test
script that hits one or the other, or create a new one. Especially since
this one covers the full and partial bitmap states, I think it's nice to
reuse that work.

> Test                          v2.25.0               HEAD                    
> ----------------------------------------------------------------------------
> 5600.2: clone without blobs   79.81(111.34+11.35)   36.00(69.37+7.30) -54.9%
> 5600.3: checkout of result    45.56(114.59+4.81)    46.43(80.50+5.41) +1.9% 
> 
> Perhaps results for these tests would also be appropriate for your
> commit messages?

I think your p5600.2 is basically the same as what I ended up with
in p5310 for the final commit (when pack-objects finally learns to use
these filters, too). But we want to make sure we are testing with
bitmaps, so I don't think we'd want to count on the sample repo having
bitmaps already.

I think this speaks to the general problems with the perf suite, in that
we probably ought to be testing combinations of repos and tests, rather
than manually creating situations in each script).

> Note the +1.9% for the checkout. It's unlikely that this is actually
> something meaningful, but it _could_ be related to my concerns above
> about building a blob list from an empty bitmap.

In my experience with the perf suite, 2% is most likely noise
(especially for a filesystem-heavy operation like checkout). Note that
the difference in system time covers almost all of the wall-clock time.

It is curious that the user time in your second test dropped so
drastically, but didn't create a wall-clock improvement. I've often seen
weirdness there with the CPU clock changing due to external factors.

-Peff

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v2 13/15] pack-bitmap: implement BLOB_NONE filtering
  2020-02-18 19:36       ` Derrick Stolee
@ 2020-02-18 20:30         ` Jeff King
  0 siblings, 0 replies; 73+ messages in thread
From: Jeff King @ 2020-02-18 20:30 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, Junio C Hamano

On Tue, Feb 18, 2020 at 02:36:55PM -0500, Derrick Stolee wrote:

> > I wondered why you chose to extend these tests instead of using
> > p5600-partial-clone.sh, but I guess this script definitely creates
> > the bitmap for the test. When I tested p5600-partial-clone.sh below,
> > I manually repacked the Linux repo to have a bitmap:
> > 
> > Test                          v2.25.0               HEAD                    
> > ----------------------------------------------------------------------------
> > 5600.2: clone without blobs   79.81(111.34+11.35)   36.00(69.37+7.30) -54.9%
> > 5600.3: checkout of result    45.56(114.59+4.81)    46.43(80.50+5.41) +1.9% 
> > 
> > Perhaps results for these tests would also be appropriate for your
> > commit messages?
> 
> And speaking of valuable performance tests to update: should we take
> the loop of fetch tests in p5311-pack-bitmaps-fetch.sh and make
> equivalent versions in p5600-partial-clone.sh? It would be good to
> make sure that the incremental fetches also improve in this scenario.

I don't think it would make sense to add them permanently, since p5600
doesn't necessarily have bitmaps in effect.

But in any case, those tests are mostly about pack-objects realizing
when we can use on-disk deltas due to the presence of bitmaps. If we're
avoiding sending blobs, then that cuts out a huge chunk of opportunity
for it to make any improvement (it might still have some impact because
of trees, though). So I'd expect the effect to be muted.

I dunno. It's true that I just hypothesized a result which we could
confirm. But the perf tests are quite expensive to run (and p5311 is one
of the more expensive ones). I'm not sure that repeating those tests
combined with partial clones carries a lot of regression-testing value
to merit adding that expense to all future runs.

-Peff

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v2 06/15] rev-list: make --count work with --objects
  2020-02-18 19:55               ` Jeff King
@ 2020-02-18 21:19                 ` Junio C Hamano
  2020-02-18 21:23                   ` Jeff King
  0 siblings, 1 reply; 73+ messages in thread
From: Junio C Hamano @ 2020-02-18 21:19 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git

Jeff King <peff@peff.net> writes:

> On Tue, Feb 18, 2020 at 09:28:31AM -0800, Junio C Hamano wrote:
>
>> >> I think the question reader had was "why *right*?"
>> >
>> > Ah. The answer is: because it's not SYMMETRIC_LEFT. ;)
>> >
>> > The counting code accumulates there by default when there are no side
>> > markings, so that's what it will report when there's no left_right or
>> > cherry_mark (which we know will be the case here, since we die()
>> > otherwise).
>> 
>> Yup.
>> 
>> 	/*
>> 	 * The object count is always accumulated in the .count_right
>> 	 * field for traversal that is not a left-right traversal,
>> 	 * and cmd_rev_list() made sure that a .count request that
>> 	 * wants to count non-commit objects, which is handled by
>> 	 * the show_object() callback, does not ask for .left_right.
>> 	 */
>> 
>> Overkill?  I dunno.
>
> No, I think it looks good. Want to squash that in (and I'll do likewise
> locally)?

It's too late for squashing anything in, as they are in 'next'
already.


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v2 06/15] rev-list: make --count work with --objects
  2020-02-18 21:19                 ` Junio C Hamano
@ 2020-02-18 21:23                   ` Jeff King
  0 siblings, 0 replies; 73+ messages in thread
From: Jeff King @ 2020-02-18 21:23 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, git

On Tue, Feb 18, 2020 at 01:19:51PM -0800, Junio C Hamano wrote:

> >> 	/*
> >> 	 * The object count is always accumulated in the .count_right
> >> 	 * field for traversal that is not a left-right traversal,
> >> 	 * and cmd_rev_list() made sure that a .count request that
> >> 	 * wants to count non-commit objects, which is handled by
> >> 	 * the show_object() callback, does not ask for .left_right.
> >> 	 */
> >> 
> >> Overkill?  I dunno.
> >
> > No, I think it looks good. Want to squash that in (and I'll do likewise
> > locally)?
> 
> It's too late for squashing anything in, as they are in 'next'
> already.

Ah, I hadn't noticed that yet. I can live without it, but I'd also be OK
with a patch on top if you want to add it.

-Peff

^ permalink raw reply	[flat|nested] 73+ messages in thread

end of thread, other threads:[~2020-02-18 21:23 UTC | newest]

Thread overview: 73+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-13  2:15 [PATCH 0/13] combining object filters and bitmaps Jeff King
2020-02-13  2:16 ` [PATCH 01/13] pack-bitmap: factor out type iterator initialization Jeff King
2020-02-13 17:45   ` Junio C Hamano
2020-02-13  2:16 ` [PATCH 02/13] pack-bitmap: fix leak of haves/wants object lists Jeff King
2020-02-13 18:12   ` Junio C Hamano
2020-02-13  2:17 ` [PATCH 03/13] rev-list: fallback to non-bitmap traversal when filtering Jeff King
2020-02-13 18:19   ` Junio C Hamano
2020-02-13 18:40     ` Jeff King
2020-02-13  2:17 ` [PATCH 04/13] rev-list: consolidate bitmap-disabling options Jeff King
2020-02-13  2:18 ` [PATCH 05/13] rev-list: factor out bitmap-optimized routines Jeff King
2020-02-13 18:34   ` Junio C Hamano
2020-02-13  2:19 ` [PATCH 06/13] rev-list: make --count work with --objects Jeff King
2020-02-13 19:14   ` Junio C Hamano
2020-02-13 20:27     ` Jeff King
2020-02-13  2:20 ` [PATCH 07/13] rev-list: allow bitmaps when counting objects Jeff King
2020-02-13 21:47   ` Junio C Hamano
2020-02-13 22:27     ` Jeff King
2020-02-13  2:20 ` [PATCH 08/13] pack-bitmap: basic noop bitmap filter infrastructure Jeff King
2020-02-13  2:21 ` [PATCH 09/13] rev-list: use bitmap filters for traversal Jeff King
2020-02-13 22:22   ` Junio C Hamano
2020-02-13 22:34     ` Jeff King
2020-02-13  2:21 ` [PATCH 10/13] bitmap: add bitmap_unset() function Jeff King
2020-02-13  2:23 ` [PATCH 11/13] pack-bitmap: implement BLOB_NONE filtering Jeff King
2020-02-13  2:25 ` [PATCH 12/13] pack-bitmap: implement BLOB_LIMIT filtering Jeff King
2020-02-13 23:17   ` Junio C Hamano
2020-02-13  2:25 ` [PATCH 13/13] pack-objects: support filters with bitmaps Jeff King
2020-02-14 18:21 ` [PATCH v2 0/15] combining object filters and bitmaps Jeff King
2020-02-14 18:22   ` [PATCH v2 01/15] pack-bitmap: factor out type iterator initialization Jeff King
2020-02-15  0:10     ` Taylor Blau
2020-02-14 18:22   ` [PATCH v2 02/15] pack-bitmap: fix leak of haves/wants object lists Jeff King
2020-02-15  0:15     ` Taylor Blau
2020-02-15  6:46       ` Jeff King
2020-02-18 17:58     ` Derrick Stolee
2020-02-18 20:02       ` Jeff King
2020-02-14 18:22   ` [PATCH v2 03/15] rev-list: fallback to non-bitmap traversal when filtering Jeff King
2020-02-15  0:22     ` Taylor Blau
2020-02-14 18:22   ` [PATCH v2 04/15] pack-bitmap: refuse to do a bitmap traversal with pathspecs Jeff King
2020-02-14 19:03     ` Junio C Hamano
2020-02-14 20:51       ` Jeff King
2020-02-14 18:22   ` [PATCH v2 05/15] rev-list: factor out bitmap-optimized routines Jeff King
2020-02-15  0:35     ` Taylor Blau
2020-02-14 18:22   ` [PATCH v2 06/15] rev-list: make --count work with --objects Jeff King
2020-02-15  0:42     ` Taylor Blau
2020-02-15  6:48       ` Jeff King
2020-02-16 23:34         ` Junio C Hamano
2020-02-18  5:24           ` Jeff King
2020-02-18 17:28             ` Junio C Hamano
2020-02-18 19:55               ` Jeff King
2020-02-18 21:19                 ` Junio C Hamano
2020-02-18 21:23                   ` Jeff King
2020-02-18 18:05     ` Derrick Stolee
2020-02-18 19:59       ` Jeff King
2020-02-14 18:22   ` [PATCH v2 07/15] rev-list: allow bitmaps when counting objects Jeff King
2020-02-15  0:45     ` Taylor Blau
2020-02-15  6:55       ` Jeff King
2020-02-16 23:36         ` Junio C Hamano
2020-02-14 18:22   ` [PATCH v2 08/15] t5310: factor out bitmap traversal comparison Jeff King
2020-02-15  2:14     ` Taylor Blau
2020-02-15  7:00       ` Jeff King
2020-02-14 18:22   ` [PATCH v2 09/15] rev-list: allow commit-only bitmap traversals Jeff King
2020-02-18 18:18     ` Derrick Stolee
2020-02-18 20:05       ` Jeff King
2020-02-18 20:11         ` Derrick Stolee
2020-02-14 18:22   ` [PATCH v2 10/15] pack-bitmap: basic noop bitmap filter infrastructure Jeff King
2020-02-14 18:22   ` [PATCH v2 11/15] rev-list: use bitmap filters for traversal Jeff King
2020-02-14 18:22   ` [PATCH v2 12/15] bitmap: add bitmap_unset() function Jeff King
2020-02-14 18:22   ` [PATCH v2 13/15] pack-bitmap: implement BLOB_NONE filtering Jeff King
2020-02-18 19:26     ` Derrick Stolee
2020-02-18 19:36       ` Derrick Stolee
2020-02-18 20:30         ` Jeff King
2020-02-18 20:24       ` Jeff King
2020-02-14 18:22   ` [PATCH v2 14/15] pack-bitmap: implement BLOB_LIMIT filtering Jeff King
2020-02-14 18:22   ` [PATCH v2 15/15] pack-objects: support filters with bitmaps Jeff King

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).