git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/11] Partial bundles
@ 2022-02-23 17:55 Derrick Stolee via GitGitGadget
  2022-02-23 17:55 ` [PATCH 01/11] index-pack: document and test the --promisor option Derrick Stolee via GitGitGadget
                   ` (14 more replies)
  0 siblings, 15 replies; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-02-23 17:55 UTC (permalink / raw)
  To: git; +Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy, Derrick Stolee

While discussing bundle-URIs [1], it came to my attention that bundles have
no way to specify an object filter, so bundles cannot be used with partial
clones.

[1]
https://lore.kernel.org/git/7fab28bf-54e7-d0e9-110a-53fad6244cc9@gmail.com/

This series provides a way to fix that by adding a 'filter' capability to
the bundle file format and allowing one to create a partial bundle with 'git
bundle create --filter=blob:none '.

There are a couple things that I want to point out about this implementation
that could use some high-level feedback:

 1. I moved the '--filter' parsing into setup_revisions() instead of adding
    another place to parse it. This works for 'git bundle' but it also
    allows it to be parsed successfully in commands such as 'git diff' which
    doesn't make sense. Options such as '--objects' are already being parsed
    there, and they don't make sense either, so I want some thoughts on
    this.

 2. If someone uses 'git clone partial.bdl partial' where 'partial.bdl' is a
    filtered bundle, then the clone will fail with a message such as

fatal: missing blob object '9444604d515c0b162e37e59accd54a0bac50ed2e' fatal:
remote did not send all necessary objects

This might be fine. We don't expect users to clone partial bundles or fetch
partial bundles into an unfiltered repo and these failures are expected. It
is possible that we could put in custom logic to fail faster by reading the
bundle header for a filter.

Generally, the idea is to open this up as a potential way to bootstrap a
clone of a partial clone using a set of precomputed partial bundles.

Thanks, -Stolee

Derrick Stolee (11):
  index-pack: document and test the --promisor option
  revision: put object filter into struct rev_info
  pack-objects: use rev.filter when possible
  pack-bitmap: drop filter in prepare_bitmap_walk()
  list-objects: consolidate traverse_commit_list[_filtered]
  MyFirstObjectWalk: update recommended usage
  bundle: safely handle --objects option
  bundle: parse filter capability
  rev-list: move --filter parsing into revision.c
  bundle: create filtered bundles
  bundle: unbundle promisor packs

 Documentation/MyFirstObjectWalk.txt | 44 ++++++---------
 Documentation/git-index-pack.txt    |  8 +++
 builtin/pack-objects.c              |  9 +--
 builtin/rev-list.c                  | 29 +++-------
 bundle.c                            | 87 ++++++++++++++++++++++++-----
 bundle.h                            |  3 +
 list-objects-filter-options.c       |  2 +-
 list-objects-filter-options.h       |  5 ++
 list-objects.c                      | 25 +++------
 list-objects.h                      | 12 +++-
 pack-bitmap.c                       | 24 ++++----
 pack-bitmap.h                       |  2 -
 reachable.c                         |  2 +-
 revision.c                          | 11 ++++
 revision.h                          |  4 ++
 t/t5300-pack-object.sh              |  4 +-
 t/t6020-bundle-misc.sh              | 48 ++++++++++++++++
 17 files changed, 215 insertions(+), 104 deletions(-)


base-commit: 45fe28c951c3e70666ee4ef8379772851a8e4d32
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1159%2Fderrickstolee%2Fbundle%2Fpartial-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1159/derrickstolee/bundle/partial-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1159
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 01/11] index-pack: document and test the --promisor option
  2022-02-23 17:55 [PATCH 00/11] Partial bundles Derrick Stolee via GitGitGadget
@ 2022-02-23 17:55 ` Derrick Stolee via GitGitGadget
  2022-02-23 17:55 ` [PATCH 02/11] revision: put object filter into struct rev_info Derrick Stolee via GitGitGadget
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-02-23 17:55 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The --promisor option of 'git index-pack' was created in 88e2f9e
(introduce fetch-object: fetch one promisor object, 2017-12-05) but was
untested. It is currently unused within the Git codebase, but that will
change in an upcoming change to 'git bundle unbundle' when there is a
filter capability.

For now, add documentation about the option and add a test to ensure it
is working as expected.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/git-index-pack.txt | 8 ++++++++
 t/t5300-pack-object.sh           | 4 +++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-index-pack.txt b/Documentation/git-index-pack.txt
index 1f1e3592251..4e71c256ecb 100644
--- a/Documentation/git-index-pack.txt
+++ b/Documentation/git-index-pack.txt
@@ -122,6 +122,14 @@ This option cannot be used with --stdin.
 +
 include::object-format-disclaimer.txt[]
 
+--promisor[=<message>]::
+	Before committing the pack-index, create a .promisor file for this
+	pack. Particularly helpful when writing a promisor pack with --fix-thin
+	since the name of the pack is not final until the pack has been fully
+	written. If a `<message>` is provided, then that content will be
+	written to the .promisor file for future reference. See
+	link:technical/partial-clone.html[partial clone] for more information.
+
 NOTES
 -----
 
diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index 2fd845187e7..a11d61206ad 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -315,8 +315,10 @@ test_expect_success \
      git index-pack -o tmp.idx test-3.pack &&
      cmp tmp.idx test-1-${packname_1}.idx &&
 
-     git index-pack test-3.pack &&
+     git index-pack --promisor=message test-3.pack &&
      cmp test-3.idx test-1-${packname_1}.idx &&
+     echo message >expect &&
+     test_cmp expect test-3.promisor &&
 
      cat test-2-${packname_2}.pack >test-3.pack &&
      git index-pack -o tmp.idx test-2-${packname_2}.pack &&
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 02/11] revision: put object filter into struct rev_info
  2022-02-23 17:55 [PATCH 00/11] Partial bundles Derrick Stolee via GitGitGadget
  2022-02-23 17:55 ` [PATCH 01/11] index-pack: document and test the --promisor option Derrick Stolee via GitGitGadget
@ 2022-02-23 17:55 ` Derrick Stolee via GitGitGadget
  2022-03-04 22:15   ` Junio C Hamano
  2022-02-23 17:55 ` [PATCH 03/11] pack-objects: use rev.filter when possible Derrick Stolee via GitGitGadget
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-02-23 17:55 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

Placing a 'struct list_objects_filter_options' pointer within 'struct
rev_info' will assist making some bookkeeping around object filters in
the future.

For now, let's use this new member to remove a static global instance of
the struct from builtin/rev-list.c.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/rev-list.c | 30 ++++++++++++++++--------------
 revision.h         |  4 ++++
 2 files changed, 20 insertions(+), 14 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 777558e9b06..6f2b91d304e 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -62,7 +62,6 @@ static const char rev_list_usage[] =
 static struct progress *progress;
 static unsigned progress_counter;
 
-static struct list_objects_filter_options filter_options;
 static struct oidset omitted_objects;
 static int arg_print_omitted; /* print objects omitted by filter */
 
@@ -400,7 +399,6 @@ static inline int parse_missing_action_value(const char *value)
 }
 
 static int try_bitmap_count(struct rev_info *revs,
-			    struct list_objects_filter_options *filter,
 			    int filter_provided_objects)
 {
 	uint32_t commit_count = 0,
@@ -436,7 +434,8 @@ static int try_bitmap_count(struct rev_info *revs,
 	 */
 	max_count = revs->max_count;
 
-	bitmap_git = prepare_bitmap_walk(revs, filter, filter_provided_objects);
+	bitmap_git = prepare_bitmap_walk(revs, revs->filter,
+					 filter_provided_objects);
 	if (!bitmap_git)
 		return -1;
 
@@ -453,7 +452,6 @@ static int try_bitmap_count(struct rev_info *revs,
 }
 
 static int try_bitmap_traversal(struct rev_info *revs,
-				struct list_objects_filter_options *filter,
 				int filter_provided_objects)
 {
 	struct bitmap_index *bitmap_git;
@@ -465,7 +463,8 @@ static int try_bitmap_traversal(struct rev_info *revs,
 	if (revs->max_count >= 0)
 		return -1;
 
-	bitmap_git = prepare_bitmap_walk(revs, filter, filter_provided_objects);
+	bitmap_git = prepare_bitmap_walk(revs, revs->filter,
+					 filter_provided_objects);
 	if (!bitmap_git)
 		return -1;
 
@@ -475,7 +474,6 @@ static int try_bitmap_traversal(struct rev_info *revs,
 }
 
 static int try_bitmap_disk_usage(struct rev_info *revs,
-				 struct list_objects_filter_options *filter,
 				 int filter_provided_objects)
 {
 	struct bitmap_index *bitmap_git;
@@ -483,7 +481,7 @@ static int try_bitmap_disk_usage(struct rev_info *revs,
 	if (!show_disk_usage)
 		return -1;
 
-	bitmap_git = prepare_bitmap_walk(revs, filter, filter_provided_objects);
+	bitmap_git = prepare_bitmap_walk(revs, revs->filter, filter_provided_objects);
 	if (!bitmap_git)
 		return -1;
 
@@ -597,13 +595,17 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 		}
 
 		if (skip_prefix(arg, ("--" CL_ARG__FILTER "="), &arg)) {
-			parse_list_objects_filter(&filter_options, arg);
-			if (filter_options.choice && !revs.blob_objects)
+			if (!revs.filter)
+				CALLOC_ARRAY(revs.filter, 1);
+			parse_list_objects_filter(revs.filter, arg);
+			if (revs.filter->choice && !revs.blob_objects)
 				die(_("object filtering requires --objects"));
 			continue;
 		}
 		if (!strcmp(arg, ("--no-" CL_ARG__FILTER))) {
-			list_objects_filter_set_no_filter(&filter_options);
+			if (!revs.filter)
+				CALLOC_ARRAY(revs.filter, 1);
+			list_objects_filter_set_no_filter(revs.filter);
 			continue;
 		}
 		if (!strcmp(arg, "--filter-provided-objects")) {
@@ -688,11 +690,11 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 		progress = start_delayed_progress(show_progress, 0);
 
 	if (use_bitmap_index) {
-		if (!try_bitmap_count(&revs, &filter_options, filter_provided_objects))
+		if (!try_bitmap_count(&revs, filter_provided_objects))
 			return 0;
-		if (!try_bitmap_disk_usage(&revs, &filter_options, filter_provided_objects))
+		if (!try_bitmap_disk_usage(&revs, filter_provided_objects))
 			return 0;
-		if (!try_bitmap_traversal(&revs, &filter_options, filter_provided_objects))
+		if (!try_bitmap_traversal(&revs, filter_provided_objects))
 			return 0;
 	}
 
@@ -733,7 +735,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 		oidset_init(&missing_objects, DEFAULT_OIDSET_SIZE);
 
 	traverse_commit_list_filtered(
-		&filter_options, &revs, show_commit, show_object, &info,
+		revs.filter, &revs, show_commit, show_object, &info,
 		(arg_print_omitted ? &omitted_objects : NULL));
 
 	if (arg_print_omitted) {
diff --git a/revision.h b/revision.h
index 3c58c18c63a..1ddb73ab82e 100644
--- a/revision.h
+++ b/revision.h
@@ -81,6 +81,7 @@ struct rev_cmdline_info {
 
 struct oidset;
 struct topo_walk_info;
+struct list_objects_filter_options;
 
 struct rev_info {
 	/* Starting list */
@@ -94,6 +95,9 @@ struct rev_info {
 	/* The end-points specified by the end user */
 	struct rev_cmdline_info cmdline;
 
+	/* Object filter options. NULL for no filtering. */
+	struct list_objects_filter_options *filter;
+
 	/* excluding from --branches, --refs, etc. expansion */
 	struct string_list *ref_excludes;
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 03/11] pack-objects: use rev.filter when possible
  2022-02-23 17:55 [PATCH 00/11] Partial bundles Derrick Stolee via GitGitGadget
  2022-02-23 17:55 ` [PATCH 01/11] index-pack: document and test the --promisor option Derrick Stolee via GitGitGadget
  2022-02-23 17:55 ` [PATCH 02/11] revision: put object filter into struct rev_info Derrick Stolee via GitGitGadget
@ 2022-02-23 17:55 ` Derrick Stolee via GitGitGadget
  2022-03-04 22:25   ` Junio C Hamano
  2022-02-23 17:55 ` [PATCH 04/11] pack-bitmap: drop filter in prepare_bitmap_walk() Derrick Stolee via GitGitGadget
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-02-23 17:55 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

In builtin/pack-objects.c, we use a 'filter_options' global to populate
the --filter=<X> argument. The previous change created a pointer to a
filter option in 'struct rev_info', so we can use that pointer here as a
start to simplifying some usage of object filters.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/pack-objects.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index ba2006f2212..256d9b1798f 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3651,7 +3651,7 @@ static int pack_options_allow_reuse(void)
 
 static int get_object_list_from_bitmap(struct rev_info *revs)
 {
-	if (!(bitmap_git = prepare_bitmap_walk(revs, &filter_options, 0)))
+	if (!(bitmap_git = prepare_bitmap_walk(revs, revs->filter, 0)))
 		return -1;
 
 	if (pack_options_allow_reuse() &&
@@ -3727,6 +3727,7 @@ static void get_object_list(int ac, const char **av)
 	repo_init_revisions(the_repository, &revs, NULL);
 	save_commit_buffer = 0;
 	setup_revisions(ac, av, &revs, &s_r_opt);
+	revs.filter = &filter_options;
 
 	/* make sure shallows are read */
 	is_repository_shallow(the_repository);
@@ -3777,7 +3778,7 @@ static void get_object_list(int ac, const char **av)
 
 	if (!fn_show_object)
 		fn_show_object = show_object;
-	traverse_commit_list_filtered(&filter_options, &revs,
+	traverse_commit_list_filtered(revs.filter, &revs,
 				      show_commit, fn_show_object, NULL,
 				      NULL);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 04/11] pack-bitmap: drop filter in prepare_bitmap_walk()
  2022-02-23 17:55 [PATCH 00/11] Partial bundles Derrick Stolee via GitGitGadget
                   ` (2 preceding siblings ...)
  2022-02-23 17:55 ` [PATCH 03/11] pack-objects: use rev.filter when possible Derrick Stolee via GitGitGadget
@ 2022-02-23 17:55 ` Derrick Stolee via GitGitGadget
  2022-03-04 22:26   ` Junio C Hamano
  2022-02-23 17:55 ` [PATCH 05/11] list-objects: consolidate traverse_commit_list[_filtered] Derrick Stolee via GitGitGadget
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-02-23 17:55 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

Now that all consumers of prepare_bitmap_walk() have populated the
'filter' member of 'struct rev_info', we can drop that extra parameter
from the method and access it directly from the 'struct rev_info'.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/pack-objects.c |  2 +-
 builtin/rev-list.c     |  8 +++-----
 pack-bitmap.c          | 20 +++++++++-----------
 pack-bitmap.h          |  2 --
 reachable.c            |  2 +-
 5 files changed, 14 insertions(+), 20 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 256d9b1798f..57f2cf49696 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3651,7 +3651,7 @@ static int pack_options_allow_reuse(void)
 
 static int get_object_list_from_bitmap(struct rev_info *revs)
 {
-	if (!(bitmap_git = prepare_bitmap_walk(revs, revs->filter, 0)))
+	if (!(bitmap_git = prepare_bitmap_walk(revs, 0)))
 		return -1;
 
 	if (pack_options_allow_reuse() &&
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 6f2b91d304e..556e78aebb9 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -434,8 +434,7 @@ static int try_bitmap_count(struct rev_info *revs,
 	 */
 	max_count = revs->max_count;
 
-	bitmap_git = prepare_bitmap_walk(revs, revs->filter,
-					 filter_provided_objects);
+	bitmap_git = prepare_bitmap_walk(revs, filter_provided_objects);
 	if (!bitmap_git)
 		return -1;
 
@@ -463,8 +462,7 @@ static int try_bitmap_traversal(struct rev_info *revs,
 	if (revs->max_count >= 0)
 		return -1;
 
-	bitmap_git = prepare_bitmap_walk(revs, revs->filter,
-					 filter_provided_objects);
+	bitmap_git = prepare_bitmap_walk(revs, filter_provided_objects);
 	if (!bitmap_git)
 		return -1;
 
@@ -481,7 +479,7 @@ static int try_bitmap_disk_usage(struct rev_info *revs,
 	if (!show_disk_usage)
 		return -1;
 
-	bitmap_git = prepare_bitmap_walk(revs, revs->filter, filter_provided_objects);
+	bitmap_git = prepare_bitmap_walk(revs, filter_provided_objects);
 	if (!bitmap_git)
 		return -1;
 
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 9c666cdb8bd..613f2797cdf 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -739,8 +739,7 @@ static int add_commit_to_bitmap(struct bitmap_index *bitmap_git,
 static struct bitmap *find_objects(struct bitmap_index *bitmap_git,
 				   struct rev_info *revs,
 				   struct object_list *roots,
-				   struct bitmap *seen,
-				   struct list_objects_filter_options *filter)
+				   struct bitmap *seen)
 {
 	struct bitmap *base = NULL;
 	int needs_walk = 0;
@@ -823,7 +822,7 @@ static struct bitmap *find_objects(struct bitmap_index *bitmap_git,
 		show_data.bitmap_git = bitmap_git;
 		show_data.base = base;
 
-		traverse_commit_list_filtered(filter, revs,
+		traverse_commit_list_filtered(revs->filter, revs,
 					      show_commit, show_object,
 					      &show_data, NULL);
 
@@ -1219,7 +1218,6 @@ static int can_filter_bitmap(struct list_objects_filter_options *filter)
 }
 
 struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
-					 struct list_objects_filter_options *filter,
 					 int filter_provided_objects)
 {
 	unsigned int i;
@@ -1240,7 +1238,7 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 	if (revs->prune)
 		return NULL;
 
-	if (!can_filter_bitmap(filter))
+	if (!can_filter_bitmap(revs->filter))
 		return NULL;
 
 	/* try to open a bitmapped pack, but don't parse it yet
@@ -1297,8 +1295,7 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 
 	if (haves) {
 		revs->ignore_missing_links = 1;
-		haves_bitmap = find_objects(bitmap_git, revs, haves, NULL,
-					    filter);
+		haves_bitmap = find_objects(bitmap_git, revs, haves, NULL);
 		reset_revision_walk();
 		revs->ignore_missing_links = 0;
 
@@ -1306,8 +1303,7 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 			BUG("failed to perform bitmap walk");
 	}
 
-	wants_bitmap = find_objects(bitmap_git, revs, wants, haves_bitmap,
-				    filter);
+	wants_bitmap = find_objects(bitmap_git, revs, wants, haves_bitmap);
 
 	if (!wants_bitmap)
 		BUG("failed to perform bitmap walk");
@@ -1315,8 +1311,10 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 	if (haves_bitmap)
 		bitmap_and_not(wants_bitmap, haves_bitmap);
 
-	filter_bitmap(bitmap_git, (filter && filter_provided_objects) ? NULL : wants,
-		      wants_bitmap, filter);
+	filter_bitmap(bitmap_git,
+		      (revs->filter && filter_provided_objects) ? NULL : wants,
+		      wants_bitmap,
+		      revs->filter);
 
 	bitmap_git->result = wants_bitmap;
 	bitmap_git->haves = haves_bitmap;
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 19a63fa1abc..3d3ddd77345 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -10,7 +10,6 @@
 struct commit;
 struct repository;
 struct rev_info;
-struct list_objects_filter_options;
 
 static const char BITMAP_IDX_SIGNATURE[] = {'B', 'I', 'T', 'M'};
 
@@ -54,7 +53,6 @@ void test_bitmap_walk(struct rev_info *revs);
 int test_bitmap_commits(struct repository *r);
 int test_bitmap_hashes(struct repository *r);
 struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
-					 struct list_objects_filter_options *filter,
 					 int filter_provided_objects);
 uint32_t midx_preferred_pack(struct bitmap_index *bitmap_git);
 int reuse_partial_packfile_from_bitmap(struct bitmap_index *,
diff --git a/reachable.c b/reachable.c
index 84e3d0d75ed..b9f4ad886ef 100644
--- a/reachable.c
+++ b/reachable.c
@@ -205,7 +205,7 @@ void mark_reachable_objects(struct rev_info *revs, int mark_reflog,
 	cp.progress = progress;
 	cp.count = 0;
 
-	bitmap_git = prepare_bitmap_walk(revs, NULL, 0);
+	bitmap_git = prepare_bitmap_walk(revs, 0);
 	if (bitmap_git) {
 		traverse_bitmap_commit_list(bitmap_git, revs, mark_object_seen);
 		free_bitmap_index(bitmap_git);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 05/11] list-objects: consolidate traverse_commit_list[_filtered]
  2022-02-23 17:55 [PATCH 00/11] Partial bundles Derrick Stolee via GitGitGadget
                   ` (3 preceding siblings ...)
  2022-02-23 17:55 ` [PATCH 04/11] pack-bitmap: drop filter in prepare_bitmap_walk() Derrick Stolee via GitGitGadget
@ 2022-02-23 17:55 ` Derrick Stolee via GitGitGadget
  2022-03-04 22:30   ` Junio C Hamano
  2022-02-23 17:55 ` [PATCH 06/11] MyFirstObjectWalk: update recommended usage Derrick Stolee via GitGitGadget
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-02-23 17:55 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

Now that all consumers of traverse_commit_list_filtered() populate the
'filter' member of 'struct rev_info', we can drop that parameter from
the method prototype to simplify things. In addition, the only thing
different now between traverse_commit_list_filtered() and
traverse_commit_list() is the presence of the 'omitted' parameter, which
is only non-NULL for one caller. We can consolidate these two methods by
having one call the other and use the simpler form everywhere the
'omitted' paramter would be NULL.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/pack-objects.c |  6 +++---
 builtin/rev-list.c     |  2 +-
 list-objects.c         | 25 ++++++++-----------------
 list-objects.h         | 12 ++++++++++--
 pack-bitmap.c          |  6 +++---
 5 files changed, 25 insertions(+), 26 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 57f2cf49696..0432ae1e499 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3778,9 +3778,9 @@ static void get_object_list(int ac, const char **av)
 
 	if (!fn_show_object)
 		fn_show_object = show_object;
-	traverse_commit_list_filtered(revs.filter, &revs,
-				      show_commit, fn_show_object, NULL,
-				      NULL);
+	traverse_commit_list(&revs,
+			     show_commit, fn_show_object,
+			     NULL);
 
 	if (unpack_unreachable_expiration) {
 		revs.ignore_missing_links = 1;
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 556e78aebb9..3ab727817fd 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -733,7 +733,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 		oidset_init(&missing_objects, DEFAULT_OIDSET_SIZE);
 
 	traverse_commit_list_filtered(
-		revs.filter, &revs, show_commit, show_object, &info,
+		&revs, show_commit, show_object, &info,
 		(arg_print_omitted ? &omitted_objects : NULL));
 
 	if (arg_print_omitted) {
diff --git a/list-objects.c b/list-objects.c
index 2f623f82115..9422625b39e 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -416,22 +416,7 @@ static void do_traverse(struct traversal_context *ctx)
 	strbuf_release(&csp);
 }
 
-void traverse_commit_list(struct rev_info *revs,
-			  show_commit_fn show_commit,
-			  show_object_fn show_object,
-			  void *show_data)
-{
-	struct traversal_context ctx;
-	ctx.revs = revs;
-	ctx.show_commit = show_commit;
-	ctx.show_object = show_object;
-	ctx.show_data = show_data;
-	ctx.filter = NULL;
-	do_traverse(&ctx);
-}
-
 void traverse_commit_list_filtered(
-	struct list_objects_filter_options *filter_options,
 	struct rev_info *revs,
 	show_commit_fn show_commit,
 	show_object_fn show_object,
@@ -444,7 +429,13 @@ void traverse_commit_list_filtered(
 	ctx.show_object = show_object;
 	ctx.show_commit = show_commit;
 	ctx.show_data = show_data;
-	ctx.filter = list_objects_filter__init(omitted, filter_options);
+	if (revs->filter)
+		ctx.filter = list_objects_filter__init(omitted, revs->filter);
+	else
+		ctx.filter = NULL;
+
 	do_traverse(&ctx);
-	list_objects_filter__free(ctx.filter);
+
+	if (ctx.filter)
+		list_objects_filter__free(ctx.filter);
 }
diff --git a/list-objects.h b/list-objects.h
index a952680e466..9eaf4de8449 100644
--- a/list-objects.h
+++ b/list-objects.h
@@ -7,7 +7,6 @@ struct rev_info;
 
 typedef void (*show_commit_fn)(struct commit *, void *);
 typedef void (*show_object_fn)(struct object *, const char *, void *);
-void traverse_commit_list(struct rev_info *, show_commit_fn, show_object_fn, void *);
 
 typedef void (*show_edge_fn)(struct commit *);
 void mark_edges_uninteresting(struct rev_info *revs,
@@ -18,11 +17,20 @@ struct oidset;
 struct list_objects_filter_options;
 
 void traverse_commit_list_filtered(
-	struct list_objects_filter_options *filter_options,
 	struct rev_info *revs,
 	show_commit_fn show_commit,
 	show_object_fn show_object,
 	void *show_data,
 	struct oidset *omitted);
 
+static inline void traverse_commit_list(
+	struct rev_info *revs,
+	show_commit_fn show_commit,
+	show_object_fn show_object,
+	void *show_data)
+{
+	traverse_commit_list_filtered(revs, show_commit,
+				      show_object, show_data, NULL);
+}
+
 #endif /* LIST_OBJECTS_H */
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 613f2797cdf..cbefaedbf43 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -822,9 +822,9 @@ static struct bitmap *find_objects(struct bitmap_index *bitmap_git,
 		show_data.bitmap_git = bitmap_git;
 		show_data.base = base;
 
-		traverse_commit_list_filtered(revs->filter, revs,
-					      show_commit, show_object,
-					      &show_data, NULL);
+		traverse_commit_list(revs,
+				     show_commit, show_object,
+				     &show_data);
 
 		revs->include_check = NULL;
 		revs->include_check_obj = NULL;
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 06/11] MyFirstObjectWalk: update recommended usage
  2022-02-23 17:55 [PATCH 00/11] Partial bundles Derrick Stolee via GitGitGadget
                   ` (4 preceding siblings ...)
  2022-02-23 17:55 ` [PATCH 05/11] list-objects: consolidate traverse_commit_list[_filtered] Derrick Stolee via GitGitGadget
@ 2022-02-23 17:55 ` Derrick Stolee via GitGitGadget
  2022-03-04 22:33   ` Junio C Hamano
  2022-02-23 17:55 ` [PATCH 07/11] bundle: safely handle --objects option Derrick Stolee via GitGitGadget
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-02-23 17:55 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The previous change consolidated traverse_commit_list() and
traverse_commit_list_filtered(). This allows us to simplify the
recommended usage in MyFirstObjectWalk.txt to use this new set of
values.

While here, add some clarification on the difference between the two
methods.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/MyFirstObjectWalk.txt | 44 +++++++++++------------------
 1 file changed, 16 insertions(+), 28 deletions(-)

diff --git a/Documentation/MyFirstObjectWalk.txt b/Documentation/MyFirstObjectWalk.txt
index ca267941f3e..8ec83185b8a 100644
--- a/Documentation/MyFirstObjectWalk.txt
+++ b/Documentation/MyFirstObjectWalk.txt
@@ -522,24 +522,25 @@ function shows that the all-object walk is being performed by
 `traverse_commit_list()` or `traverse_commit_list_filtered()`. Those two
 functions reside in `list-objects.c`; examining the source shows that, despite
 the name, these functions traverse all kinds of objects. Let's have a look at
-the arguments to `traverse_commit_list_filtered()`, which are a superset of the
-arguments to the unfiltered version.
+the arguments to `traverse_commit_list()`.
 
-- `struct list_objects_filter_options *filter_options`: This is a struct which
-  stores a filter-spec as outlined in `Documentation/rev-list-options.txt`.
-- `struct rev_info *revs`: This is the `rev_info` used for the walk.
+- `struct rev_info *revs`: This is the `rev_info` used for the walk. It
+  includes a `filter` member which contains information for how to filter
+  the object list.
 - `show_commit_fn show_commit`: A callback which will be used to handle each
   individual commit object.
 - `show_object_fn show_object`: A callback which will be used to handle each
   non-commit object (so each blob, tree, or tag).
 - `void *show_data`: A context buffer which is passed in turn to `show_commit`
   and `show_object`.
+
+In addition, `traverse_commit_list_filtered()` has an additional paramter:
+
 - `struct oidset *omitted`: A linked-list of object IDs which the provided
   filter caused to be omitted.
 
-It looks like this `traverse_commit_list_filtered()` uses callbacks we provide
-instead of needing us to call it repeatedly ourselves. Cool! Let's add the
-callbacks first.
+It looks like these methods use callbacks we provide instead of needing us
+to call it repeatedly ourselves. Cool! Let's add the callbacks first.
 
 For the sake of this tutorial, we'll simply keep track of how many of each kind
 of object we find. At file scope in `builtin/walken.c` add the following
@@ -712,20 +713,9 @@ help understand. In our case, that means we omit trees and blobs not directly
 referenced by `HEAD` or `HEAD`'s history, because we begin the walk with only
 `HEAD` in the `pending` list.)
 
-First, we'll need to `#include "list-objects-filter-options.h"` and set up the
-`struct list_objects_filter_options` at the top of the function.
-
-----
-static void walken_object_walk(struct rev_info *rev)
-{
-	struct list_objects_filter_options filter_options = { 0 };
-
-	...
-----
-
 For now, we are not going to track the omitted objects, so we'll replace those
 parameters with `NULL`. For the sake of simplicity, we'll add a simple
-build-time branch to use our filter or not. Replace the line calling
+build-time branch to use our filter or not. Preface the line calling
 `traverse_commit_list()` with the following, which will remind us which kind of
 walk we've just performed:
 
@@ -733,19 +723,17 @@ walk we've just performed:
 	if (0) {
 		/* Unfiltered: */
 		trace_printf(_("Unfiltered object walk.\n"));
-		traverse_commit_list(rev, walken_show_commit,
-				walken_show_object, NULL);
 	} else {
 		trace_printf(
 			_("Filtered object walk with filterspec 'tree:1'.\n"));
-		parse_list_objects_filter(&filter_options, "tree:1");
-
-		traverse_commit_list_filtered(&filter_options, rev,
-			walken_show_commit, walken_show_object, NULL, NULL);
+		CALLOC_ARRAY(rev->filter, 1);
+		parse_list_objects_filter(rev->filter, "tree:1");
 	}
+	traverse_commit_list(rev, walken_show_commit,
+			     walken_show_object, NULL);
 ----
 
-`struct list_objects_filter_options` is usually built directly from a command
+The `rev->filter` member is usually built directly from a command
 line argument, so the module provides an easy way to build one from a string.
 Even though we aren't taking user input right now, we can still build one with
 a hardcoded string using `parse_list_objects_filter()`.
@@ -784,7 +772,7 @@ object:
 ----
 	...
 
-		traverse_commit_list_filtered(&filter_options, rev,
+		traverse_commit_list_filtered(rev,
 			walken_show_commit, walken_show_object, NULL, &omitted);
 
 	...
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 07/11] bundle: safely handle --objects option
  2022-02-23 17:55 [PATCH 00/11] Partial bundles Derrick Stolee via GitGitGadget
                   ` (5 preceding siblings ...)
  2022-02-23 17:55 ` [PATCH 06/11] MyFirstObjectWalk: update recommended usage Derrick Stolee via GitGitGadget
@ 2022-02-23 17:55 ` Derrick Stolee via GitGitGadget
  2022-02-28 16:00   ` Jeff Hostetler
                     ` (2 more replies)
  2022-02-23 17:55 ` [PATCH 08/11] bundle: parse filter capability Derrick Stolee via GitGitGadget
                   ` (7 subsequent siblings)
  14 siblings, 3 replies; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-02-23 17:55 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

Since 'git bundle' uses setup_revisions() to specify the object walk,
some options do not make sense to include during the pack-objects child
process. Further, these options are used for a call to
traverse_commit_list() which would then require a callback which is
currently NULL.

By populating the callback we prevent a segfault in the case of adding
the --objects flag. This is really a redundant statement because the
bundles are constructing a pack-file containing all objects in the
discovered commit range.

Adding --objects to a 'git bundle' command might cause a slower command,
but at least it will not have a hard failure when the user supplies this
option. We can also disable walking trees and blobs in advance of this
walk.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle.c               | 10 +++++++++-
 t/t6020-bundle-misc.sh | 12 ++++++++++++
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/bundle.c b/bundle.c
index a0bb687b0f4..dc56db9a50a 100644
--- a/bundle.c
+++ b/bundle.c
@@ -451,6 +451,12 @@ struct bundle_prerequisites_info {
 	int fd;
 };
 
+
+static void ignore_object(struct object *obj, const char *v, void *data)
+{
+	/* Do nothing. */
+}
+
 static void write_bundle_prerequisites(struct commit *commit, void *data)
 {
 	struct bundle_prerequisites_info *bpi = data;
@@ -544,7 +550,9 @@ int create_bundle(struct repository *r, const char *path,
 		die("revision walk setup failed");
 	bpi.fd = bundle_fd;
 	bpi.pending = &revs_copy.pending;
-	traverse_commit_list(&revs, write_bundle_prerequisites, NULL, &bpi);
+
+	revs.blob_objects = revs.tree_objects = 0;
+	traverse_commit_list(&revs, write_bundle_prerequisites, ignore_object, &bpi);
 	object_array_remove_duplicates(&revs_copy.pending);
 
 	/* write bundle refs */
diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh
index b13e8a52a93..6522401617d 100755
--- a/t/t6020-bundle-misc.sh
+++ b/t/t6020-bundle-misc.sh
@@ -475,4 +475,16 @@ test_expect_success 'clone from bundle' '
 	test_cmp expect actual
 '
 
+test_expect_success 'unfiltered bundle with --objects' '
+	git bundle create all-objects.bdl \
+		--all --objects &&
+	git bundle create all.bdl \
+		--all &&
+
+	# Compare the headers of these files.
+	head -11 all.bdl >expect &&
+	head -11 all-objects.bdl >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 08/11] bundle: parse filter capability
  2022-02-23 17:55 [PATCH 00/11] Partial bundles Derrick Stolee via GitGitGadget
                   ` (6 preceding siblings ...)
  2022-02-23 17:55 ` [PATCH 07/11] bundle: safely handle --objects option Derrick Stolee via GitGitGadget
@ 2022-02-23 17:55 ` Derrick Stolee via GitGitGadget
  2022-03-07 15:38   ` Ævar Arnfjörð Bjarmason
  2022-03-07 15:55   ` Ævar Arnfjörð Bjarmason
  2022-02-23 17:55 ` [PATCH 09/11] rev-list: move --filter parsing into revision.c Derrick Stolee via GitGitGadget
                   ` (6 subsequent siblings)
  14 siblings, 2 replies; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-02-23 17:55 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The v3 bundle format has capabilities, allowing newer versions of Git to
create bundles with newer features. Older versions that do not
understand these new capabilities will fail with a helpful warning.

Create a new capability allowing Git to understand that the contained
pack-file is filtered according to some object filter. Typically, this
filter will be "blob:none" for a blobless partial clone.

This change teaches Git to parse this capability, place its value in the
bundle header, and demonstrate this understanding by adding a message to
'git bundle verify'.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle.c                      | 17 ++++++++++++++++-
 bundle.h                      |  3 +++
 list-objects-filter-options.c |  2 +-
 list-objects-filter-options.h |  5 +++++
 4 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/bundle.c b/bundle.c
index dc56db9a50a..2afced4d991 100644
--- a/bundle.c
+++ b/bundle.c
@@ -11,7 +11,7 @@
 #include "run-command.h"
 #include "refs.h"
 #include "strvec.h"
-
+#include "list-objects-filter-options.h"
 
 static const char v2_bundle_signature[] = "# v2 git bundle\n";
 static const char v3_bundle_signature[] = "# v3 git bundle\n";
@@ -33,6 +33,8 @@ void bundle_header_release(struct bundle_header *header)
 {
 	string_list_clear(&header->prerequisites, 1);
 	string_list_clear(&header->references, 1);
+	list_objects_filter_release(header->filter);
+	free(header->filter);
 }
 
 static int parse_capability(struct bundle_header *header, const char *capability)
@@ -45,6 +47,11 @@ static int parse_capability(struct bundle_header *header, const char *capability
 		header->hash_algo = &hash_algos[algo];
 		return 0;
 	}
+	if (skip_prefix(capability, "filter=", &arg)) {
+		CALLOC_ARRAY(header->filter, 1);
+		parse_list_objects_filter(header->filter, arg);
+		return 0;
+	}
 	return error(_("unknown capability '%s'"), capability);
 }
 
@@ -220,6 +227,8 @@ int verify_bundle(struct repository *r,
 	req_nr = revs.pending.nr;
 	setup_revisions(2, argv, &revs, NULL);
 
+	revs.filter = header->filter;
+
 	if (prepare_revision_walk(&revs))
 		die(_("revision walk setup failed"));
 
@@ -259,6 +268,12 @@ int verify_bundle(struct repository *r,
 			     r->nr),
 			  r->nr);
 		list_refs(r, 0, NULL);
+
+		if (header->filter) {
+			printf_ln("The bundle uses this filter: %s",
+				  list_objects_filter_spec(header->filter));
+		}
+
 		r = &header->prerequisites;
 		if (!r->nr) {
 			printf_ln(_("The bundle records a complete history."));
diff --git a/bundle.h b/bundle.h
index 06009fe6b1f..eb026153d56 100644
--- a/bundle.h
+++ b/bundle.h
@@ -5,11 +5,14 @@
 #include "cache.h"
 #include "string-list.h"
 
+struct list_objects_filter_options;
+
 struct bundle_header {
 	unsigned version;
 	struct string_list prerequisites;
 	struct string_list references;
 	const struct git_hash_algo *hash_algo;
+	struct list_objects_filter_options *filter;
 };
 
 #define BUNDLE_HEADER_INIT \
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index fd8d59f653a..b9d10770e4f 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -55,7 +55,7 @@ const char *list_object_filter_config_name(enum list_objects_filter_choice c)
  * expand_list_objects_filter_spec() first).  We also "intern" the arg for the
  * convenience of the current command.
  */
-static int gently_parse_list_objects_filter(
+int gently_parse_list_objects_filter(
 	struct list_objects_filter_options *filter_options,
 	const char *arg,
 	struct strbuf *errbuf)
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index da5b6737e27..347a99c28cf 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -72,6 +72,11 @@ struct list_objects_filter_options {
 /* Normalized command line arguments */
 #define CL_ARG__FILTER "filter"
 
+int gently_parse_list_objects_filter(
+	struct list_objects_filter_options *filter_options,
+	const char *arg,
+	struct strbuf *errbuf);
+
 void list_objects_filter_die_if_populated(
 	struct list_objects_filter_options *filter_options);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 09/11] rev-list: move --filter parsing into revision.c
  2022-02-23 17:55 [PATCH 00/11] Partial bundles Derrick Stolee via GitGitGadget
                   ` (7 preceding siblings ...)
  2022-02-23 17:55 ` [PATCH 08/11] bundle: parse filter capability Derrick Stolee via GitGitGadget
@ 2022-02-23 17:55 ` Derrick Stolee via GitGitGadget
  2022-02-23 17:55 ` [PATCH 10/11] bundle: create filtered bundles Derrick Stolee via GitGitGadget
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-02-23 17:55 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

Now that 'struct rev_info' has a 'filter' member and most consumers of
object filtering are using that member instead of an external struct,
move the parsing of the '--filter' option out of builtin/rev-list.c and
into revision.c.

This use within handle_revision_pseudo_opt() allows us to find the
option within setup_revisions() if the arguments are passed directly. In
the case of a command such as 'git blame', the arguments are first
scanned and checked with parse_revision_opt(), which complains about the
option, so 'git blame --filter=blob:none <file>' does not become valid
with this change.

Some commands, such as 'git diff' gain this option without having it
make an effect. And 'git diff --objects' was already possible, but does
not actually make sense in that builtin.

The key addition that is coming is 'git bundle create --filter=<X>' so
we can create bundles containing promisor packs. More work is required
to make them fully functional, but that will follow.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/rev-list.c | 15 ---------------
 revision.c         | 11 +++++++++++
 2 files changed, 11 insertions(+), 15 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 3ab727817fd..640828149c5 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -591,21 +591,6 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 			show_progress = arg;
 			continue;
 		}
-
-		if (skip_prefix(arg, ("--" CL_ARG__FILTER "="), &arg)) {
-			if (!revs.filter)
-				CALLOC_ARRAY(revs.filter, 1);
-			parse_list_objects_filter(revs.filter, arg);
-			if (revs.filter->choice && !revs.blob_objects)
-				die(_("object filtering requires --objects"));
-			continue;
-		}
-		if (!strcmp(arg, ("--no-" CL_ARG__FILTER))) {
-			if (!revs.filter)
-				CALLOC_ARRAY(revs.filter, 1);
-			list_objects_filter_set_no_filter(revs.filter);
-			continue;
-		}
 		if (!strcmp(arg, "--filter-provided-objects")) {
 			filter_provided_objects = 1;
 			continue;
diff --git a/revision.c b/revision.c
index ad4286fbdde..1d612c1c102 100644
--- a/revision.c
+++ b/revision.c
@@ -32,6 +32,7 @@
 #include "utf8.h"
 #include "bloom.h"
 #include "json-writer.h"
+#include "list-objects-filter-options.h"
 
 volatile show_early_output_fn_t show_early_output;
 
@@ -2669,6 +2670,14 @@ static int handle_revision_pseudo_opt(struct rev_info *revs,
 		revs->no_walk = 0;
 	} else if (!strcmp(arg, "--single-worktree")) {
 		revs->single_worktree = 1;
+	} else if (skip_prefix(arg, ("--" CL_ARG__FILTER "="), &arg)) {
+		if (!revs->filter)
+			CALLOC_ARRAY(revs->filter, 1);
+		parse_list_objects_filter(revs->filter, arg);
+	} else if (!strcmp(arg, ("--no-" CL_ARG__FILTER))) {
+		if (!revs->filter)
+			CALLOC_ARRAY(revs->filter, 1);
+		list_objects_filter_set_no_filter(revs->filter);
 	} else {
 		return 0;
 	}
@@ -2872,6 +2881,8 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
 		die("cannot combine --walk-reflogs with history-limiting options");
 	if (revs->rewrite_parents && revs->children.name)
 		die(_("options '%s' and '%s' cannot be used together"), "--parents", "--children");
+	if (revs->filter && revs->filter->choice && !revs->blob_objects)
+		die(_("object filtering requires --objects"));
 
 	/*
 	 * Limitations on the graph functionality
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 10/11] bundle: create filtered bundles
  2022-02-23 17:55 [PATCH 00/11] Partial bundles Derrick Stolee via GitGitGadget
                   ` (8 preceding siblings ...)
  2022-02-23 17:55 ` [PATCH 09/11] rev-list: move --filter parsing into revision.c Derrick Stolee via GitGitGadget
@ 2022-02-23 17:55 ` Derrick Stolee via GitGitGadget
  2022-03-04 23:35   ` Junio C Hamano
  2022-03-07 15:44   ` Ævar Arnfjörð Bjarmason
  2022-02-23 17:55 ` [PATCH 11/11] bundle: unbundle promisor packs Derrick Stolee via GitGitGadget
                   ` (4 subsequent siblings)
  14 siblings, 2 replies; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-02-23 17:55 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

A previous change allowed Git to parse bundles with the 'filter'
capability. Now, teach Git to create bundles with this option.

Some rearranging of code is required to get the option parsing in the
correct spot. There are now two reasons why we might need capabilities
(a new hash algorithm or an object filter) so that is pulled out into a
place where we can check both at the same time.

The --filter option is parsed as part of setup_revisions(), but it
expected the --objects flag, too. That flag is somewhat implied by 'git
bundle' because it creates a pack-file walking objects, but there is
also a walk that walks the revision range expecting only commits. Make
this parsing work by setting 'revs.tree_objects' and 'revs.blob_objects'
before the call to setup_revisions().

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle.c               | 56 +++++++++++++++++++++++++++++++++---------
 t/t6020-bundle-misc.sh | 30 ++++++++++++++++++++++
 2 files changed, 75 insertions(+), 11 deletions(-)

diff --git a/bundle.c b/bundle.c
index 2afced4d991..e284ef63062 100644
--- a/bundle.c
+++ b/bundle.c
@@ -334,6 +334,9 @@ static int write_pack_data(int bundle_fd, struct rev_info *revs, struct strvec *
 		     "--stdout", "--thin", "--delta-base-offset",
 		     NULL);
 	strvec_pushv(&pack_objects.args, pack_options->v);
+	if (revs->filter)
+		strvec_pushf(&pack_objects.args, "--filter=%s",
+			     list_objects_filter_spec(revs->filter));
 	pack_objects.in = -1;
 	pack_objects.out = bundle_fd;
 	pack_objects.git_cmd = 1;
@@ -507,10 +510,38 @@ int create_bundle(struct repository *r, const char *path,
 	int bundle_to_stdout;
 	int ref_count = 0;
 	struct rev_info revs, revs_copy;
-	int min_version = the_hash_algo == &hash_algos[GIT_HASH_SHA1] ? 2 : 3;
+	int min_version = 2;
 	struct bundle_prerequisites_info bpi;
 	int i;
 
+	/* init revs to list objects for pack-objects later */
+	save_commit_buffer = 0;
+	repo_init_revisions(r, &revs, NULL);
+
+	/*
+	 * Pre-initialize the '--objects' flag so we can parse a
+	 * --filter option successfully.
+	 */
+	revs.tree_objects = revs.blob_objects = 1;
+
+	argc = setup_revisions(argc, argv, &revs, NULL);
+
+	/*
+	 * Reasons to require version 3:
+	 *
+	 * 1. @object-format is required because our hash algorithm is not
+	 *    SHA1.
+	 * 2. @filter is required because we parsed an object filter.
+	 */
+	if (the_hash_algo != &hash_algos[GIT_HASH_SHA1] ||
+	    revs.filter)
+		min_version = 3;
+
+	if (argc > 1) {
+		error(_("unrecognized argument: %s"), argv[1]);
+		goto err;
+	}
+
 	bundle_to_stdout = !strcmp(path, "-");
 	if (bundle_to_stdout)
 		bundle_fd = 1;
@@ -533,17 +564,14 @@ int create_bundle(struct repository *r, const char *path,
 		write_or_die(bundle_fd, capability, strlen(capability));
 		write_or_die(bundle_fd, the_hash_algo->name, strlen(the_hash_algo->name));
 		write_or_die(bundle_fd, "\n", 1);
-	}
-
-	/* init revs to list objects for pack-objects later */
-	save_commit_buffer = 0;
-	repo_init_revisions(r, &revs, NULL);
 
-	argc = setup_revisions(argc, argv, &revs, NULL);
-
-	if (argc > 1) {
-		error(_("unrecognized argument: %s"), argv[1]);
-		goto err;
+		if (revs.filter) {
+			const char *value = expand_list_objects_filter_spec(revs.filter);
+			capability = "@filter=";
+			write_or_die(bundle_fd, capability, strlen(capability));
+			write_or_die(bundle_fd, value, strlen(value));
+			write_or_die(bundle_fd, "\n", 1);
+		}
 	}
 
 	/* save revs.pending in revs_copy for later use */
@@ -566,6 +594,12 @@ int create_bundle(struct repository *r, const char *path,
 	bpi.fd = bundle_fd;
 	bpi.pending = &revs_copy.pending;
 
+	/*
+	 * Nullify the filter here, and any object walking. We only care
+	 * about commits and tags here. The revs_copy has the right
+	 * instances of these values.
+	 */
+	revs.filter = NULL;
 	revs.blob_objects = revs.tree_objects = 0;
 	traverse_commit_list(&revs, write_bundle_prerequisites, ignore_object, &bpi);
 	object_array_remove_duplicates(&revs_copy.pending);
diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh
index 6522401617d..39cfefafb65 100755
--- a/t/t6020-bundle-misc.sh
+++ b/t/t6020-bundle-misc.sh
@@ -487,4 +487,34 @@ test_expect_success 'unfiltered bundle with --objects' '
 	test_cmp expect actual
 '
 
+for filter in "blob:none" "tree:0" "tree:1" "blob:limit=100"
+do
+	test_expect_success 'filtered bundle: $filter' '
+		test_when_finished rm -rf .git/objects/pack &&
+		git bundle create partial.bdl \
+			--all \
+			--filter=$filter &&
+
+		git bundle verify partial.bdl >unfiltered &&
+		make_user_friendly_and_stable_output <unfiltered >actual &&
+
+		cat >expect <<-EOF &&
+		The bundle contains these 10 refs:
+		<COMMIT-P> refs/heads/main
+		<COMMIT-N> refs/heads/release
+		<COMMIT-D> refs/heads/topic/1
+		<COMMIT-H> refs/heads/topic/2
+		<COMMIT-D> refs/pull/1/head
+		<COMMIT-G> refs/pull/2/head
+		<TAG-1> refs/tags/v1
+		<TAG-2> refs/tags/v2
+		<TAG-3> refs/tags/v3
+		<COMMIT-P> HEAD
+		The bundle uses this filter: $filter
+		The bundle records a complete history.
+		EOF
+		test_cmp expect actual
+	'
+done
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 11/11] bundle: unbundle promisor packs
  2022-02-23 17:55 [PATCH 00/11] Partial bundles Derrick Stolee via GitGitGadget
                   ` (9 preceding siblings ...)
  2022-02-23 17:55 ` [PATCH 10/11] bundle: create filtered bundles Derrick Stolee via GitGitGadget
@ 2022-02-23 17:55 ` Derrick Stolee via GitGitGadget
  2022-03-04 23:43   ` Junio C Hamano
  2022-03-07 15:47   ` Ævar Arnfjörð Bjarmason
  2022-02-28 17:00 ` [PATCH 00/11] Partial bundles Jeff Hostetler
                   ` (3 subsequent siblings)
  14 siblings, 2 replies; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-02-23 17:55 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

In order to have a valid pack-file after unbundling a bundle that has
the 'filter' capability, we need to generate a .promisor file. The
bundle does not promise _where_ the objects can be found, but we can
expect that these bundles will be unbundled in repositories with
appropriate promisor remotes that can find those missing objects.

Use the 'git index-pack --promisor=<message>' option to create this
.promisor file. Add "from-bundle" as the message to help anyone diagnose
issues with these promisor packs.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle.c               | 4 ++++
 t/t6020-bundle-misc.sh | 8 +++++++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/bundle.c b/bundle.c
index e284ef63062..3d97de40ef0 100644
--- a/bundle.c
+++ b/bundle.c
@@ -631,6 +631,10 @@ int unbundle(struct repository *r, struct bundle_header *header,
 	struct child_process ip = CHILD_PROCESS_INIT;
 	strvec_pushl(&ip.args, "index-pack", "--fix-thin", "--stdin", NULL);
 
+	/* If there is a filter, then we need to create the promisor pack. */
+	if (header->filter)
+		strvec_push(&ip.args, "--promisor=from-bundle");
+
 	if (extra_index_pack_args) {
 		strvec_pushv(&ip.args, extra_index_pack_args->v);
 		strvec_clear(extra_index_pack_args);
diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh
index 39cfefafb65..344af34db1e 100755
--- a/t/t6020-bundle-misc.sh
+++ b/t/t6020-bundle-misc.sh
@@ -513,7 +513,13 @@ do
 		The bundle uses this filter: $filter
 		The bundle records a complete history.
 		EOF
-		test_cmp expect actual
+		test_cmp expect actual &&
+
+		# This creates the first pack-file in the
+		# .git/objects/pack directory. Look for a .promisor.
+		git bundle unbundle partial.bdl &&
+		ls .git/objects/pack/pack-*.promisor >promisor &&
+		test_line_count = 1 promisor
 	'
 done
 
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* Re: [PATCH 07/11] bundle: safely handle --objects option
  2022-02-23 17:55 ` [PATCH 07/11] bundle: safely handle --objects option Derrick Stolee via GitGitGadget
@ 2022-02-28 16:00   ` Jeff Hostetler
  2022-03-04 22:58     ` Junio C Hamano
  2022-03-04 22:57   ` Junio C Hamano
  2022-03-07 15:35   ` Ævar Arnfjörð Bjarmason
  2 siblings, 1 reply; 114+ messages in thread
From: Jeff Hostetler @ 2022-02-28 16:00 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget, git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy, Derrick Stolee



On 2/23/22 12:55 PM, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>
> 
> Since 'git bundle' uses setup_revisions() to specify the object walk,
> some options do not make sense to include during the pack-objects child
> process. Further, these options are used for a call to
> traverse_commit_list() which would then require a callback which is
> currently NULL.
> 
> By populating the callback we prevent a segfault in the case of adding
> the --objects flag. This is really a redundant statement because the

Nit: I stumbled over "...because the bundles are constructing..."
Is there a better wording here??

> bundles are constructing a pack-file containing all objects in the
> discovered commit range.
> 
> Adding --objects to a 'git bundle' command might cause a slower command,
> but at least it will not have a hard failure when the user supplies this
> option. We can also disable walking trees and blobs in advance of this
> walk.
> 
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>   bundle.c               | 10 +++++++++-
>   t/t6020-bundle-misc.sh | 12 ++++++++++++
>   2 files changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/bundle.c b/bundle.c
> index a0bb687b0f4..dc56db9a50a 100644
> --- a/bundle.c
> +++ b/bundle.c
> @@ -451,6 +451,12 @@ struct bundle_prerequisites_info {
>   	int fd;
>   };
>   
> +
> +static void ignore_object(struct object *obj, const char *v, void *data)
> +{
> +	/* Do nothing. */
> +}
> +
>   static void write_bundle_prerequisites(struct commit *commit, void *data)
>   {
>   	struct bundle_prerequisites_info *bpi = data;
> @@ -544,7 +550,9 @@ int create_bundle(struct repository *r, const char *path,
>   		die("revision walk setup failed");
>   	bpi.fd = bundle_fd;
>   	bpi.pending = &revs_copy.pending;
> -	traverse_commit_list(&revs, write_bundle_prerequisites, NULL, &bpi);
> +
> +	revs.blob_objects = revs.tree_objects = 0;
> +	traverse_commit_list(&revs, write_bundle_prerequisites, ignore_object, &bpi);
>   	object_array_remove_duplicates(&revs_copy.pending);
>   
>   	/* write bundle refs */
> diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh
> index b13e8a52a93..6522401617d 100755
> --- a/t/t6020-bundle-misc.sh
> +++ b/t/t6020-bundle-misc.sh
> @@ -475,4 +475,16 @@ test_expect_success 'clone from bundle' '
>   	test_cmp expect actual
>   '
>   
> +test_expect_success 'unfiltered bundle with --objects' '
> +	git bundle create all-objects.bdl \
> +		--all --objects &&
> +	git bundle create all.bdl \
> +		--all &&
> +
> +	# Compare the headers of these files.
> +	head -11 all.bdl >expect &&
> +	head -11 all-objects.bdl >actual &&
> +	test_cmp expect actual
> +'
> +
>   test_done
> 

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 00/11] Partial bundles
  2022-02-23 17:55 [PATCH 00/11] Partial bundles Derrick Stolee via GitGitGadget
                   ` (10 preceding siblings ...)
  2022-02-23 17:55 ` [PATCH 11/11] bundle: unbundle promisor packs Derrick Stolee via GitGitGadget
@ 2022-02-28 17:00 ` Jeff Hostetler
  2022-02-28 17:54   ` Derrick Stolee
  2022-03-04 19:19 ` Derrick Stolee
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 114+ messages in thread
From: Jeff Hostetler @ 2022-02-28 17:00 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget, git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy, Derrick Stolee



On 2/23/22 12:55 PM, Derrick Stolee via GitGitGadget wrote:
> While discussing bundle-URIs [1], it came to my attention that bundles have
> no way to specify an object filter, so bundles cannot be used with partial
> clones.
> 
> [1]
> https://lore.kernel.org/git/7fab28bf-54e7-d0e9-110a-53fad6244cc9@gmail.com/
> 
> This series provides a way to fix that by adding a 'filter' capability to
> the bundle file format and allowing one to create a partial bundle with 'git
> bundle create --filter=blob:none '.

Nicely done.  There's a lot of refactoring here to move the
filtering code into a more usable place and get rid of some
of the awkward limitations of my original code.  Sorry that
you had to slog thru all of that.

> 
> There are a couple things that I want to point out about this implementation
> that could use some high-level feedback:
> 
>   1. I moved the '--filter' parsing into setup_revisions() instead of adding
>      another place to parse it. This works for 'git bundle' but it also
>      allows it to be parsed successfully in commands such as 'git diff' which
>      doesn't make sense. Options such as '--objects' are already being parsed
>      there, and they don't make sense either, so I want some thoughts on
>      this.

This feels like something that can wait for another task.
Let's keep this series focused on adding partial bundles.

> 
>   2. If someone uses 'git clone partial.bdl partial' where 'partial.bdl' is a
>      filtered bundle, then the clone will fail with a message such as
> 
> fatal: missing blob object '9444604d515c0b162e37e59accd54a0bac50ed2e' fatal:
> remote did not send all necessary objects
> 
> This might be fine. We don't expect users to clone partial bundles or fetch
> partial bundles into an unfiltered repo and these failures are expected. It
> is possible that we could put in custom logic to fail faster by reading the
> bundle header for a filter.
> 
> Generally, the idea is to open this up as a potential way to bootstrap a
> clone of a partial clone using a set of precomputed partial bundles.

I think this is to be expected.

Would it help to have Git do a no-checkout clone when cloning
from a partial bundle?  Maybe that would give the user a chance to set
a real remote (and maybe set the partial clone/fetch config settings)
and then backfill their local clone??   (That might be functional, but
not very user-friendly....)

Or should we just consider this limitation as a placeholder while we
wait for the Bundle URI effort?

Jeff

> 
> Thanks, -Stolee
> 
> Derrick Stolee (11):
>    index-pack: document and test the --promisor option
>    revision: put object filter into struct rev_info
>    pack-objects: use rev.filter when possible
>    pack-bitmap: drop filter in prepare_bitmap_walk()
>    list-objects: consolidate traverse_commit_list[_filtered]
>    MyFirstObjectWalk: update recommended usage
>    bundle: safely handle --objects option
>    bundle: parse filter capability
>    rev-list: move --filter parsing into revision.c
>    bundle: create filtered bundles
>    bundle: unbundle promisor packs
> 
>   Documentation/MyFirstObjectWalk.txt | 44 ++++++---------
>   Documentation/git-index-pack.txt    |  8 +++
>   builtin/pack-objects.c              |  9 +--
>   builtin/rev-list.c                  | 29 +++-------
>   bundle.c                            | 87 ++++++++++++++++++++++++-----
>   bundle.h                            |  3 +
>   list-objects-filter-options.c       |  2 +-
>   list-objects-filter-options.h       |  5 ++
>   list-objects.c                      | 25 +++------
>   list-objects.h                      | 12 +++-
>   pack-bitmap.c                       | 24 ++++----
>   pack-bitmap.h                       |  2 -
>   reachable.c                         |  2 +-
>   revision.c                          | 11 ++++
>   revision.h                          |  4 ++
>   t/t5300-pack-object.sh              |  4 +-
>   t/t6020-bundle-misc.sh              | 48 ++++++++++++++++
>   17 files changed, 215 insertions(+), 104 deletions(-)
> 
> 
> base-commit: 45fe28c951c3e70666ee4ef8379772851a8e4d32
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1159%2Fderrickstolee%2Fbundle%2Fpartial-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1159/derrickstolee/bundle/partial-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1159
> 

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 00/11] Partial bundles
  2022-02-28 17:00 ` [PATCH 00/11] Partial bundles Jeff Hostetler
@ 2022-02-28 17:54   ` Derrick Stolee
  2022-03-01 18:03     ` Jeff Hostetler
  0 siblings, 1 reply; 114+ messages in thread
From: Derrick Stolee @ 2022-02-28 17:54 UTC (permalink / raw)
  To: Jeff Hostetler, Derrick Stolee via GitGitGadget, git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy

On 2/28/2022 12:00 PM, Jeff Hostetler wrote:
> On 2/23/22 12:55 PM, Derrick Stolee via GitGitGadget wrote:
>> While discussing bundle-URIs [1], it came to my attention that bundles have
>> no way to specify an object filter, so bundles cannot be used with partial
>> clones.
>>
>> [1]
>> https://lore.kernel.org/git/7fab28bf-54e7-d0e9-110a-53fad6244cc9@gmail.com/
>>
>> This series provides a way to fix that by adding a 'filter' capability to
>> the bundle file format and allowing one to create a partial bundle with 'git
>> bundle create --filter=blob:none '.
> 
> Nicely done.  There's a lot of refactoring here to move the
> filtering code into a more usable place and get rid of some
> of the awkward limitations of my original code.  Sorry that
> you had to slog thru all of that.
> 
>>
>> There are a couple things that I want to point out about this implementation
>> that could use some high-level feedback:
>>
>>   1. I moved the '--filter' parsing into setup_revisions() instead of adding
>>      another place to parse it. This works for 'git bundle' but it also
>>      allows it to be parsed successfully in commands such as 'git diff' which
>>      doesn't make sense. Options such as '--objects' are already being parsed
>>      there, and they don't make sense either, so I want some thoughts on
>>      this.
> 
> This feels like something that can wait for another task.
> Let's keep this series focused on adding partial bundles.

What do you mean "can wait"? Do you recommend that I _don't_ do this
refactor and instead implement filter parsing directly in bundles?

Or, are you saying that we should not worry about these potential
side-effects of allowing (then ignoring) certain options in other
commands, at least until a later series?
 
>>   2. If someone uses 'git clone partial.bdl partial' where 'partial.bdl' is a
>>      filtered bundle, then the clone will fail with a message such as
>>
>> fatal: missing blob object '9444604d515c0b162e37e59accd54a0bac50ed2e' fatal:
>> remote did not send all necessary objects
>>
>> This might be fine. We don't expect users to clone partial bundles or fetch
>> partial bundles into an unfiltered repo and these failures are expected. It
>> is possible that we could put in custom logic to fail faster by reading the
>> bundle header for a filter.
>>
>> Generally, the idea is to open this up as a potential way to bootstrap a
>> clone of a partial clone using a set of precomputed partial bundles.
> 
> I think this is to be expected.
> 
> Would it help to have Git do a no-checkout clone when cloning
> from a partial bundle?  Maybe that would give the user a chance to set
> a real remote (and maybe set the partial clone/fetch config settings)
> and then backfill their local clone??   (That might be functional, but
> not very user-friendly....)
> 
> Or should we just consider this limitation as a placeholder while we
> wait for the Bundle URI effort?

It would be interesting to have another application of partial
bundles, such as cloning directly from a bundle, then allowing
a remote to be configured. It provides a "build-it-yourself"
approach to bundle URIs for partial clones.

I'm not sure if such an application is required for this series,
or if it could be delayed until after. I'm open to suggestions.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 00/11] Partial bundles
  2022-02-28 17:54   ` Derrick Stolee
@ 2022-03-01 18:03     ` Jeff Hostetler
  0 siblings, 0 replies; 114+ messages in thread
From: Jeff Hostetler @ 2022-03-01 18:03 UTC (permalink / raw)
  To: Derrick Stolee, Derrick Stolee via GitGitGadget, git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy



On 2/28/22 12:54 PM, Derrick Stolee wrote:
> On 2/28/2022 12:00 PM, Jeff Hostetler wrote:
>> On 2/23/22 12:55 PM, Derrick Stolee via GitGitGadget wrote:
>>> While discussing bundle-URIs [1], it came to my attention that bundles have
>>> no way to specify an object filter, so bundles cannot be used with partial
>>> clones.
>>>
>>> [1]
>>> https://lore.kernel.org/git/7fab28bf-54e7-d0e9-110a-53fad6244cc9@gmail.com/
>>>
>>> This series provides a way to fix that by adding a 'filter' capability to
>>> the bundle file format and allowing one to create a partial bundle with 'git
>>> bundle create --filter=blob:none '.
>>
>> Nicely done.  There's a lot of refactoring here to move the
>> filtering code into a more usable place and get rid of some
>> of the awkward limitations of my original code.  Sorry that
>> you had to slog thru all of that.
>>
>>>
>>> There are a couple things that I want to point out about this implementation
>>> that could use some high-level feedback:
>>>
>>>    1. I moved the '--filter' parsing into setup_revisions() instead of adding
>>>       another place to parse it. This works for 'git bundle' but it also
>>>       allows it to be parsed successfully in commands such as 'git diff' which
>>>       doesn't make sense. Options such as '--objects' are already being parsed
>>>       there, and they don't make sense either, so I want some thoughts on
>>>       this.
>>
>> This feels like something that can wait for another task.
>> Let's keep this series focused on adding partial bundles.
> 
> What do you mean "can wait"? Do you recommend that I _don't_ do this
> refactor and instead implement filter parsing directly in bundles?
> 
> Or, are you saying that we should not worry about these potential
> side-effects of allowing (then ignoring) certain options in other
> commands, at least until a later series?

Sorry to be confusing here.  I just meant that the refactoring looks
good and we should continue with it.  And for now let's not worry
about those odd non-sense combinations that may now be possible.

That at some point later we can address the non-sense combinations
with some control flags on the call to the common parser or something.
I started thinking about that while looking at the code and just thought
that it would be a bit of a mess.  And that we didn't need to do any
thing about it now.  Such tweaking would be better done in a later
series.

>   
>>>    2. If someone uses 'git clone partial.bdl partial' where 'partial.bdl' is a
>>>       filtered bundle, then the clone will fail with a message such as
>>>
>>> fatal: missing blob object '9444604d515c0b162e37e59accd54a0bac50ed2e' fatal:
>>> remote did not send all necessary objects
>>>
>>> This might be fine. We don't expect users to clone partial bundles or fetch
>>> partial bundles into an unfiltered repo and these failures are expected. It
>>> is possible that we could put in custom logic to fail faster by reading the
>>> bundle header for a filter.
>>>
>>> Generally, the idea is to open this up as a potential way to bootstrap a
>>> clone of a partial clone using a set of precomputed partial bundles.
>>
>> I think this is to be expected.
>>
>> Would it help to have Git do a no-checkout clone when cloning
>> from a partial bundle?  Maybe that would give the user a chance to set
>> a real remote (and maybe set the partial clone/fetch config settings)
>> and then backfill their local clone??   (That might be functional, but
>> not very user-friendly....)
>>
>> Or should we just consider this limitation as a placeholder while we
>> wait for the Bundle URI effort?
> 
> It would be interesting to have another application of partial
> bundles, such as cloning directly from a bundle, then allowing
> a remote to be configured. It provides a "build-it-yourself"
> approach to bundle URIs for partial clones.
> 
> I'm not sure if such an application is required for this series,
> or if it could be delayed until after. I'm open to suggestions.

I think what you have here is nice and self-contained.  That is,
supporting partial bundles is a nice stop point.  Then have 1 or 2
series that consumes them.

Jeff

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 00/11] Partial bundles
  2022-02-23 17:55 [PATCH 00/11] Partial bundles Derrick Stolee via GitGitGadget
                   ` (11 preceding siblings ...)
  2022-02-28 17:00 ` [PATCH 00/11] Partial bundles Jeff Hostetler
@ 2022-03-04 19:19 ` Derrick Stolee
  2022-03-07 14:55 ` Ævar Arnfjörð Bjarmason
  2022-03-07 21:50 ` [PATCH v2 00/12] " Derrick Stolee via GitGitGadget
  14 siblings, 0 replies; 114+ messages in thread
From: Derrick Stolee @ 2022-03-04 19:19 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget, git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy, Jeff Hostetler

On 2/23/2022 12:55 PM, Derrick Stolee via GitGitGadget wrote:
> While discussing bundle-URIs [1], it came to my attention that bundles have
> no way to specify an object filter, so bundles cannot be used with partial
> clones.
> 
> [1]
> https://lore.kernel.org/git/7fab28bf-54e7-d0e9-110a-53fad6244cc9@gmail.com/
> 
> This series provides a way to fix that by adding a 'filter' capability to
> the bundle file format and allowing one to create a partial bundle with 'git
> bundle create --filter=blob:none '.
> 
> There are a couple things that I want to point out about this implementation
> that could use some high-level feedback:
> 
>  1. I moved the '--filter' parsing into setup_revisions() instead of adding
>     another place to parse it. This works for 'git bundle' but it also
>     allows it to be parsed successfully in commands such as 'git diff' which
>     doesn't make sense. Options such as '--objects' are already being parsed
>     there, and they don't make sense either, so I want some thoughts on
>     this.
> 
>  2. If someone uses 'git clone partial.bdl partial' where 'partial.bdl' is a
>     filtered bundle, then the clone will fail with a message such as
> 
> fatal: missing blob object '9444604d515c0b162e37e59accd54a0bac50ed2e' fatal:
> remote did not send all necessary objects
> 
> This might be fine. We don't expect users to clone partial bundles or fetch
> partial bundles into an unfiltered repo and these failures are expected. It
> is possible that we could put in custom logic to fail faster by reading the
> bundle header for a filter.
> 
> Generally, the idea is to open this up as a potential way to bootstrap a
> clone of a partial clone using a set of precomputed partial bundles.

Thanks Jeff, for providing a review of this series. I hope that at
least one other reviewer could take a look sometime.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 02/11] revision: put object filter into struct rev_info
  2022-02-23 17:55 ` [PATCH 02/11] revision: put object filter into struct rev_info Derrick Stolee via GitGitGadget
@ 2022-03-04 22:15   ` Junio C Hamano
  2022-03-07 13:59     ` Derrick Stolee
  0 siblings, 1 reply; 114+ messages in thread
From: Junio C Hamano @ 2022-03-04 22:15 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, stolee, avarab, zhiyou.jx, jonathantanmy, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

>  static int try_bitmap_count(struct rev_info *revs,
> -			    struct list_objects_filter_options *filter,
>  			    int filter_provided_objects)

This makes quite a lot of sense as filter is now available as
revs->filter.

>  {
>  	uint32_t commit_count = 0,
> @@ -436,7 +434,8 @@ static int try_bitmap_count(struct rev_info *revs,
>  	 */
>  	max_count = revs->max_count;
>  
> -	bitmap_git = prepare_bitmap_walk(revs, filter, filter_provided_objects);
> +	bitmap_git = prepare_bitmap_walk(revs, revs->filter,
> +					 filter_provided_objects);

And we should be able to do the same to prepare_bitmap_walk().  It
is OK if such a change comes later and not as part of this commit.

Perhaps it is deliberate.  Unlike the helpers this step touches,
namely, try_bitmap_count(), try_bitmap_traversal(), and
try_bitmap_disk_usage(), prepare_bitmap_walk() is not a file-scope
static helper and updating it will need touching many more places.

> @@ -597,13 +595,17 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
>  		}
>  
>  		if (skip_prefix(arg, ("--" CL_ARG__FILTER "="), &arg)) {

#leftoverbit.  We need to remember to clean this up, now "--filter"
is well established (I am assuming this literal-string pasting is
because we didn't know what the right and final word to be used as
the option name back when this code was originally written), when
the code around here is quiescent.

> -			parse_list_objects_filter(&filter_options, arg);
> -			if (filter_options.choice && !revs.blob_objects)
> +			if (!revs.filter)
> +				CALLOC_ARRAY(revs.filter, 1);
> +			parse_list_objects_filter(revs.filter, arg);
> +			if (revs.filter->choice && !revs.blob_objects)
>  				die(_("object filtering requires --objects"));
>  			continue;

OK.  The original "filter_options" was a structure and not a pointer
to a structure; now we have a pointer to a structure in revs as a
member so we need an on-demand allocation.  CALLOC_ARRAY() instead
of xcalloc(), when we know we are creating one element and not an
array of elements whose size happens to be one, is not wrong but it
does look strange.  Was there a reason why we avoid xcalloc()?

Makes me also wonder how big the filter_options structure is;
because we will not use unbounded many revs structure, it may have
been a simpler conversion to turn a static struct into an embedded
struct member in a struct (instead of a member of a struct that is a
pointer to a struct).  That way, we did not have to ...

>  		}
>  		if (!strcmp(arg, ("--no-" CL_ARG__FILTER))) {
> -			list_objects_filter_set_no_filter(&filter_options);
> +			if (!revs.filter)
> +				CALLOC_ARRAY(revs.filter, 1);

... repeat the on-demand allocation.  If some code used to pass
&filter_options in a parameter to helper functions, and such calling
sites get rewritten to pass the value in the revs.filter pointer,
and if revs hasn't gone through this codepath, these helper functions
will start receiving NULL in their filter_options parameter, which
they may or may not be prepared to take.  This "we get rid of a
global struct and replace it with an on-demand allocated structure,
pointer to which is stored in the revs structure" rewrite somehow
makes me nervous.

> diff --git a/revision.h b/revision.h
> index 3c58c18c63a..1ddb73ab82e 100644
> --- a/revision.h
> +++ b/revision.h
> @@ -81,6 +81,7 @@ struct rev_cmdline_info {
>  
>  struct oidset;
>  struct topo_walk_info;
> +struct list_objects_filter_options;

Is the forward-declaration the only reason why we needed to have a
pointer to a(n opaque) struct, not an embedded struct, as a member?

>  struct rev_info {
>  	/* Starting list */
> @@ -94,6 +95,9 @@ struct rev_info {
>  	/* The end-points specified by the end user */
>  	struct rev_cmdline_info cmdline;
>  
> +	/* Object filter options. NULL for no filtering. */
> +	struct list_objects_filter_options *filter;
> +
>  	/* excluding from --branches, --refs, etc. expansion */
>  	struct string_list *ref_excludes;

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 03/11] pack-objects: use rev.filter when possible
  2022-02-23 17:55 ` [PATCH 03/11] pack-objects: use rev.filter when possible Derrick Stolee via GitGitGadget
@ 2022-03-04 22:25   ` Junio C Hamano
  0 siblings, 0 replies; 114+ messages in thread
From: Junio C Hamano @ 2022-03-04 22:25 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, stolee, avarab, zhiyou.jx, jonathantanmy, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <derrickstolee@github.com>
>
> In builtin/pack-objects.c, we use a 'filter_options' global to populate
> the --filter=<X> argument. The previous change created a pointer to a
> filter option in 'struct rev_info', so we can use that pointer here as a
> start to simplifying some usage of object filters.

Hmph, it is very unfortunate that we cannot really get rid of the
file-scope static filter_options easily and replace it with an
instance of "struct list_objects_filter_options" embedded in
rev_info, because cmd_pack_objects(), where the place filter-options
are parsed, does not have an instance of "struct rev_info" to use
yet, and it takes some code restructuring to get there.

And that is why ...

> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>  builtin/pack-objects.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index ba2006f2212..256d9b1798f 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -3651,7 +3651,7 @@ static int pack_options_allow_reuse(void)
>  
>  static int get_object_list_from_bitmap(struct rev_info *revs)
>  {
> -	if (!(bitmap_git = prepare_bitmap_walk(revs, &filter_options, 0)))
> +	if (!(bitmap_git = prepare_bitmap_walk(revs, revs->filter, 0)))
>  		return -1;
>  
>  	if (pack_options_allow_reuse() &&
> @@ -3727,6 +3727,7 @@ static void get_object_list(int ac, const char **av)
>  	repo_init_revisions(the_repository, &revs, NULL);
>  	save_commit_buffer = 0;
>  	setup_revisions(ac, av, &revs, &s_r_opt);
> +	revs.filter = &filter_options;

... we need something like this.

Nothing wrong per-se, feels somewhat unsatisfactory, but it would
work OK, which counts ;-)

>  	/* make sure shallows are read */
>  	is_repository_shallow(the_repository);
> @@ -3777,7 +3778,7 @@ static void get_object_list(int ac, const char **av)
>  
>  	if (!fn_show_object)
>  		fn_show_object = show_object;
> -	traverse_commit_list_filtered(&filter_options, &revs,
> +	traverse_commit_list_filtered(revs.filter, &revs,
>  				      show_commit, fn_show_object, NULL,
>  				      NULL);

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 04/11] pack-bitmap: drop filter in prepare_bitmap_walk()
  2022-02-23 17:55 ` [PATCH 04/11] pack-bitmap: drop filter in prepare_bitmap_walk() Derrick Stolee via GitGitGadget
@ 2022-03-04 22:26   ` Junio C Hamano
  0 siblings, 0 replies; 114+ messages in thread
From: Junio C Hamano @ 2022-03-04 22:26 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, stolee, avarab, zhiyou.jx, jonathantanmy, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <derrickstolee@github.com>
>
> Now that all consumers of prepare_bitmap_walk() have populated the
> 'filter' member of 'struct rev_info', we can drop that extra parameter
> from the method and access it directly from the 'struct rev_info'.

This step is a logical consequence from the previous few steps.
Very pleasing.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 05/11] list-objects: consolidate traverse_commit_list[_filtered]
  2022-02-23 17:55 ` [PATCH 05/11] list-objects: consolidate traverse_commit_list[_filtered] Derrick Stolee via GitGitGadget
@ 2022-03-04 22:30   ` Junio C Hamano
  0 siblings, 0 replies; 114+ messages in thread
From: Junio C Hamano @ 2022-03-04 22:30 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, stolee, avarab, zhiyou.jx, jonathantanmy, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <derrickstolee@github.com>
>
> Now that all consumers of traverse_commit_list_filtered() populate the
> 'filter' member of 'struct rev_info', we can drop that parameter from
> the method prototype to simplify things. In addition, the only thing
> different now between traverse_commit_list_filtered() and
> traverse_commit_list() is the presence of the 'omitted' parameter, which
> is only non-NULL for one caller. We can consolidate these two methods by
> having one call the other and use the simpler form everywhere the
> 'omitted' paramter would be NULL.

Nice.  Modulo "paramter" -> "parameter".

>
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>  builtin/pack-objects.c |  6 +++---
>  builtin/rev-list.c     |  2 +-
>  list-objects.c         | 25 ++++++++-----------------
>  list-objects.h         | 12 ++++++++++--
>  pack-bitmap.c          |  6 +++---
>  5 files changed, 25 insertions(+), 26 deletions(-)
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 57f2cf49696..0432ae1e499 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -3778,9 +3778,9 @@ static void get_object_list(int ac, const char **av)
>  
>  	if (!fn_show_object)
>  		fn_show_object = show_object;
> -	traverse_commit_list_filtered(revs.filter, &revs,
> -				      show_commit, fn_show_object, NULL,
> -				      NULL);
> +	traverse_commit_list(&revs,
> +			     show_commit, fn_show_object,
> +			     NULL);
>  
>  	if (unpack_unreachable_expiration) {
>  		revs.ignore_missing_links = 1;
> diff --git a/builtin/rev-list.c b/builtin/rev-list.c
> index 556e78aebb9..3ab727817fd 100644
> --- a/builtin/rev-list.c
> +++ b/builtin/rev-list.c
> @@ -733,7 +733,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
>  		oidset_init(&missing_objects, DEFAULT_OIDSET_SIZE);
>  
>  	traverse_commit_list_filtered(
> -		revs.filter, &revs, show_commit, show_object, &info,
> +		&revs, show_commit, show_object, &info,
>  		(arg_print_omitted ? &omitted_objects : NULL));
>  
>  	if (arg_print_omitted) {
> diff --git a/list-objects.c b/list-objects.c
> index 2f623f82115..9422625b39e 100644
> --- a/list-objects.c
> +++ b/list-objects.c
> @@ -416,22 +416,7 @@ static void do_traverse(struct traversal_context *ctx)
>  	strbuf_release(&csp);
>  }
>  
> -void traverse_commit_list(struct rev_info *revs,
> -			  show_commit_fn show_commit,
> -			  show_object_fn show_object,
> -			  void *show_data)
> -{
> -	struct traversal_context ctx;
> -	ctx.revs = revs;
> -	ctx.show_commit = show_commit;
> -	ctx.show_object = show_object;
> -	ctx.show_data = show_data;
> -	ctx.filter = NULL;
> -	do_traverse(&ctx);
> -}
> -
>  void traverse_commit_list_filtered(
> -	struct list_objects_filter_options *filter_options,
>  	struct rev_info *revs,
>  	show_commit_fn show_commit,
>  	show_object_fn show_object,
> @@ -444,7 +429,13 @@ void traverse_commit_list_filtered(
>  	ctx.show_object = show_object;
>  	ctx.show_commit = show_commit;
>  	ctx.show_data = show_data;
> -	ctx.filter = list_objects_filter__init(omitted, filter_options);
> +	if (revs->filter)
> +		ctx.filter = list_objects_filter__init(omitted, revs->filter);
> +	else
> +		ctx.filter = NULL;
> +
>  	do_traverse(&ctx);
> -	list_objects_filter__free(ctx.filter);
> +
> +	if (ctx.filter)
> +		list_objects_filter__free(ctx.filter);
>  }
> diff --git a/list-objects.h b/list-objects.h
> index a952680e466..9eaf4de8449 100644
> --- a/list-objects.h
> +++ b/list-objects.h
> @@ -7,7 +7,6 @@ struct rev_info;
>  
>  typedef void (*show_commit_fn)(struct commit *, void *);
>  typedef void (*show_object_fn)(struct object *, const char *, void *);
> -void traverse_commit_list(struct rev_info *, show_commit_fn, show_object_fn, void *);
>  
>  typedef void (*show_edge_fn)(struct commit *);
>  void mark_edges_uninteresting(struct rev_info *revs,
> @@ -18,11 +17,20 @@ struct oidset;
>  struct list_objects_filter_options;
>  
>  void traverse_commit_list_filtered(
> -	struct list_objects_filter_options *filter_options,
>  	struct rev_info *revs,
>  	show_commit_fn show_commit,
>  	show_object_fn show_object,
>  	void *show_data,
>  	struct oidset *omitted);
>  
> +static inline void traverse_commit_list(
> +	struct rev_info *revs,
> +	show_commit_fn show_commit,
> +	show_object_fn show_object,
> +	void *show_data)
> +{
> +	traverse_commit_list_filtered(revs, show_commit,
> +				      show_object, show_data, NULL);
> +}
> +
>  #endif /* LIST_OBJECTS_H */
> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index 613f2797cdf..cbefaedbf43 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -822,9 +822,9 @@ static struct bitmap *find_objects(struct bitmap_index *bitmap_git,
>  		show_data.bitmap_git = bitmap_git;
>  		show_data.base = base;
>  
> -		traverse_commit_list_filtered(revs->filter, revs,
> -					      show_commit, show_object,
> -					      &show_data, NULL);
> +		traverse_commit_list(revs,
> +				     show_commit, show_object,
> +				     &show_data);
>  
>  		revs->include_check = NULL;
>  		revs->include_check_obj = NULL;

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 06/11] MyFirstObjectWalk: update recommended usage
  2022-02-23 17:55 ` [PATCH 06/11] MyFirstObjectWalk: update recommended usage Derrick Stolee via GitGitGadget
@ 2022-03-04 22:33   ` Junio C Hamano
  2022-03-07 14:05     ` Derrick Stolee
  0 siblings, 1 reply; 114+ messages in thread
From: Junio C Hamano @ 2022-03-04 22:33 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, stolee, avarab, zhiyou.jx, jonathantanmy, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <derrickstolee@github.com>
>
> The previous change consolidated traverse_commit_list() and
> traverse_commit_list_filtered(). This allows us to simplify the
> recommended usage in MyFirstObjectWalk.txt to use this new set of
> values.
>
> While here, add some clarification on the difference between the two
> methods.
>
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>  Documentation/MyFirstObjectWalk.txt | 44 +++++++++++------------------
>  1 file changed, 16 insertions(+), 28 deletions(-)

Nice simplification.

>
> diff --git a/Documentation/MyFirstObjectWalk.txt b/Documentation/MyFirstObjectWalk.txt
> index ca267941f3e..8ec83185b8a 100644
> --- a/Documentation/MyFirstObjectWalk.txt
> +++ b/Documentation/MyFirstObjectWalk.txt
> @@ -522,24 +522,25 @@ function shows that the all-object walk is being performed by
>  `traverse_commit_list()` or `traverse_commit_list_filtered()`. Those two
>  functions reside in `list-objects.c`; examining the source shows that, despite
>  the name, these functions traverse all kinds of objects. Let's have a look at
> -the arguments to `traverse_commit_list_filtered()`, which are a superset of the
> -arguments to the unfiltered version.
> +the arguments to `traverse_commit_list()`.
>  
> -- `struct list_objects_filter_options *filter_options`: This is a struct which
> -  stores a filter-spec as outlined in `Documentation/rev-list-options.txt`.
> -- `struct rev_info *revs`: This is the `rev_info` used for the walk.
> +- `struct rev_info *revs`: This is the `rev_info` used for the walk. It
> +  includes a `filter` member which contains information for how to filter
> +  the object list.

Perhaps,

    "When its `filter` member is not NULL, it contains ..."

implying that it is valid for `filter` member to be NULL and none of
the following things will happen in such a case.

>  For now, we are not going to track the omitted objects, so we'll replace those
>  parameters with `NULL`. For the sake of simplicity, we'll add a simple
> -build-time branch to use our filter or not. Replace the line calling
> +build-time branch to use our filter or not. Preface the line calling

Good eyes.

>  `traverse_commit_list()` with the following, which will remind us which kind of
>  walk we've just performed:
>  
> @@ -733,19 +723,17 @@ walk we've just performed:
>  	if (0) {
>  		/* Unfiltered: */
>  		trace_printf(_("Unfiltered object walk.\n"));
> -		traverse_commit_list(rev, walken_show_commit,
> -				walken_show_object, NULL);
>  	} else {
>  		trace_printf(
>  			_("Filtered object walk with filterspec 'tree:1'.\n"));
> -		parse_list_objects_filter(&filter_options, "tree:1");
> -
> -		traverse_commit_list_filtered(&filter_options, rev,
> -			walken_show_commit, walken_show_object, NULL, NULL);
> +		CALLOC_ARRAY(rev->filter, 1);
> +		parse_list_objects_filter(rev->filter, "tree:1");
>  	}
> +	traverse_commit_list(rev, walken_show_commit,
> +			     walken_show_object, NULL);
>  ----
>  
> -`struct list_objects_filter_options` is usually built directly from a command
> +The `rev->filter` member is usually built directly from a command
>  line argument, so the module provides an easy way to build one from a string.
>  Even though we aren't taking user input right now, we can still build one with
>  a hardcoded string using `parse_list_objects_filter()`.
> @@ -784,7 +772,7 @@ object:
>  ----
>  	...
>  
> -		traverse_commit_list_filtered(&filter_options, rev,
> +		traverse_commit_list_filtered(rev,
>  			walken_show_commit, walken_show_object, NULL, &omitted);
>  
>  	...

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 07/11] bundle: safely handle --objects option
  2022-02-23 17:55 ` [PATCH 07/11] bundle: safely handle --objects option Derrick Stolee via GitGitGadget
  2022-02-28 16:00   ` Jeff Hostetler
@ 2022-03-04 22:57   ` Junio C Hamano
  2022-03-07 15:35   ` Ævar Arnfjörð Bjarmason
  2 siblings, 0 replies; 114+ messages in thread
From: Junio C Hamano @ 2022-03-04 22:57 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, stolee, avarab, zhiyou.jx, jonathantanmy, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <derrickstolee@github.com>
>
> Since 'git bundle' uses setup_revisions() to specify the object walk,
> some options do not make sense to include during the pack-objects child
> process. Further, these options are used for a call to
> traverse_commit_list() which would then require a callback which is
> currently NULL.
>
> By populating the callback we prevent a segfault in the case of adding
> the --objects flag. This is really a redundant statement because the
> bundles are constructing a pack-file containing all objects in the
> discovered commit range.
>
> Adding --objects to a 'git bundle' command might cause a slower command,
> but at least it will not have a hard failure when the user supplies this
> option. We can also disable walking trees and blobs in advance of this
> walk.

Wow.  That's fun.  

This commit makes me wonder if we are safe with --max-parents=,
--author=, and other nonsense options, but it is obvious that it is
a segfault waiting to happen by passing NULL to object callback,
which makes it worth singling out "--objects" and dedicate a commit
to fix it.


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 07/11] bundle: safely handle --objects option
  2022-02-28 16:00   ` Jeff Hostetler
@ 2022-03-04 22:58     ` Junio C Hamano
  2022-03-07 14:09       ` Derrick Stolee
  0 siblings, 1 reply; 114+ messages in thread
From: Junio C Hamano @ 2022-03-04 22:58 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: Derrick Stolee via GitGitGadget, git, stolee, avarab, zhiyou.jx,
	jonathantanmy, Derrick Stolee

Jeff Hostetler <git@jeffhostetler.com> writes:

> On 2/23/22 12:55 PM, Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <derrickstolee@github.com>
>> Since 'git bundle' uses setup_revisions() to specify the object
>> walk,
>> some options do not make sense to include during the pack-objects child
>> process. Further, these options are used for a call to
>> traverse_commit_list() which would then require a callback which is
>> currently NULL.
>> By populating the callback we prevent a segfault in the case of
>> adding
>> the --objects flag. This is really a redundant statement because the
>
> Nit: I stumbled over "...because the bundles are constructing..."
> Is there a better wording here??

"... because the command is constructing ..." should be sufficient,
I hope?

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 10/11] bundle: create filtered bundles
  2022-02-23 17:55 ` [PATCH 10/11] bundle: create filtered bundles Derrick Stolee via GitGitGadget
@ 2022-03-04 23:35   ` Junio C Hamano
  2022-03-07 14:14     ` Derrick Stolee
  2022-03-07 15:44   ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 114+ messages in thread
From: Junio C Hamano @ 2022-03-04 23:35 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, stolee, avarab, zhiyou.jx, jonathantanmy, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <derrickstolee@github.com>
>
> A previous change allowed Git to parse bundles with the 'filter'
> capability. Now, teach Git to create bundles with this option.
>
> Some rearranging of code is required to get the option parsing in the
> correct spot. There are now two reasons why we might need capabilities
> (a new hash algorithm or an object filter) so that is pulled out into a
> place where we can check both at the same time.
>
> The --filter option is parsed as part of setup_revisions(), but it
> expected the --objects flag, too. That flag is somewhat implied by 'git
> bundle' because it creates a pack-file walking objects, but there is
> also a walk that walks the revision range expecting only commits. Make
> this parsing work by setting 'revs.tree_objects' and 'revs.blob_objects'
> before the call to setup_revisions().
>
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---

Now, the gem of the series ;-)

> @@ -334,6 +334,9 @@ static int write_pack_data(int bundle_fd, struct rev_info *revs, struct strvec *
>  		     "--stdout", "--thin", "--delta-base-offset",
>  		     NULL);
>  	strvec_pushv(&pack_objects.args, pack_options->v);
> +	if (revs->filter)
> +		strvec_pushf(&pack_objects.args, "--filter=%s",
> +			     list_objects_filter_spec(revs->filter));
>  	pack_objects.in = -1;
>  	pack_objects.out = bundle_fd;
>  	pack_objects.git_cmd = 1;

Quite expected.

> @@ -507,10 +510,38 @@ int create_bundle(struct repository *r, const char *path,
>  	int bundle_to_stdout;
>  	int ref_count = 0;
>  	struct rev_info revs, revs_copy;
> -	int min_version = the_hash_algo == &hash_algos[GIT_HASH_SHA1] ? 2 : 3;
> +	int min_version = 2;
>  	struct bundle_prerequisites_info bpi;
>  	int i;
>  
> +	/* init revs to list objects for pack-objects later */
> +	save_commit_buffer = 0;
> +	repo_init_revisions(r, &revs, NULL);
> +
> +	/*
> +	 * Pre-initialize the '--objects' flag so we can parse a
> +	 * --filter option successfully.
> +	 */
> +	revs.tree_objects = revs.blob_objects = 1;

Tricky, but true.

> +	argc = setup_revisions(argc, argv, &revs, NULL);
> +
> +	/*
> +	 * Reasons to require version 3:
> +	 *
> +	 * 1. @object-format is required because our hash algorithm is not
> +	 *    SHA1.
> +	 * 2. @filter is required because we parsed an object filter.
> +	 */

OK.

> +	if (the_hash_algo != &hash_algos[GIT_HASH_SHA1] ||
> +	    revs.filter)

Did we need to wrap?  With these on a single line, the line is way
shorter than the line with "because our hash algorithm is not" on
it.

> +		min_version = 3;
> +
> +	if (argc > 1) {
> +		error(_("unrecognized argument: %s"), argv[1]);
> +		goto err;
> +	}
> +

OK.  We are moving original logic around correctly and there is not
much to see here ;-)

> @@ -533,17 +564,14 @@ int create_bundle(struct repository *r, const char *path,
>  		write_or_die(bundle_fd, capability, strlen(capability));
>  		write_or_die(bundle_fd, the_hash_algo->name, strlen(the_hash_algo->name));
>  		write_or_die(bundle_fd, "\n", 1);
> ...
> +		if (revs.filter) {
> +			const char *value = expand_list_objects_filter_spec(revs.filter);
> +			capability = "@filter=";
> +			write_or_die(bundle_fd, capability, strlen(capability));
> +			write_or_die(bundle_fd, value, strlen(value));
> +			write_or_die(bundle_fd, "\n", 1);
> +		}

This block is added at the end of the code to write the v3 preamble
and it adds the @filter= capability.  Looking good.

> @@ -566,6 +594,12 @@ int create_bundle(struct repository *r, const char *path,
>  	bpi.fd = bundle_fd;
>  	bpi.pending = &revs_copy.pending;
>  
> +	/*
> +	 * Nullify the filter here, and any object walking. We only care
> +	 * about commits and tags here. The revs_copy has the right
> +	 * instances of these values.
> +	 */
> +	revs.filter = NULL;
>  	revs.blob_objects = revs.tree_objects = 0;
>  	traverse_commit_list(&revs, write_bundle_prerequisites, ignore_object, &bpi);
>  	object_array_remove_duplicates(&revs_copy.pending);

OK.  We prepare revs, and we save it to revs_copy, because we
perform two traversals, one to determine which bottom commits are
required to unbundle the bundle (which is done with the instance
"revs"), and then later to actually enumerate the objects to place
in the bundle (using "revs_copy").  Is there a reason why we need to
remove .filter in order to perform the first traversal?

This is a tangent, but I wish we could reliably determine when we
can optimize the first traversal away, by inspecting revs.  If there
are any pending objects with UNINTERESTING bit, or members like
max_count, max_age, min_age are set, we'd end up traversing down to
all roots and the prerequisites list would be empty.

> +	test_expect_success 'filtered bundle: $filter' '
> +		test_when_finished rm -rf .git/objects/pack &&
> +		git bundle create partial.bdl \
> +			--all \
> +			--filter=$filter &&
> +
> +		git bundle verify partial.bdl >unfiltered &&
> +		make_user_friendly_and_stable_output <unfiltered >actual &&
> +
> +		cat >expect <<-EOF &&
> +		The bundle contains these 10 refs:
> +		<COMMIT-P> refs/heads/main
> +		<COMMIT-N> refs/heads/release
> +		<COMMIT-D> refs/heads/topic/1
> +		<COMMIT-H> refs/heads/topic/2
> +		<COMMIT-D> refs/pull/1/head
> +		<COMMIT-G> refs/pull/2/head
> +		<TAG-1> refs/tags/v1
> +		<TAG-2> refs/tags/v2
> +		<TAG-3> refs/tags/v3
> +		<COMMIT-P> HEAD
> +		The bundle uses this filter: $filter
> +		The bundle records a complete history.
> +		EOF
> +		test_cmp expect actual
> +	'

OK.

It is somewhat curious why our bundle tests do not unbundle and
check the resulting contents of the repository we unbundle it in.

> +done
> +
>  test_done

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 11/11] bundle: unbundle promisor packs
  2022-02-23 17:55 ` [PATCH 11/11] bundle: unbundle promisor packs Derrick Stolee via GitGitGadget
@ 2022-03-04 23:43   ` Junio C Hamano
  2022-03-07 14:48     ` Derrick Stolee
  2022-03-07 15:47   ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 114+ messages in thread
From: Junio C Hamano @ 2022-03-04 23:43 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, stolee, avarab, zhiyou.jx, jonathantanmy, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <derrickstolee@github.com>
>
> In order to have a valid pack-file after unbundling a bundle that has
> the 'filter' capability, we need to generate a .promisor file. The
> bundle does not promise _where_ the objects can be found, but we can
> expect that these bundles will be unbundled in repositories with
> appropriate promisor remotes that can find those missing objects.

That sounds like a lot of wishful thinking, but I do not think of a
better way to phrase the idea.  Taking a bundle out of a repository
and unbundling it elsewhere is "git fetch" that could be done to
send objects from the former to the latter repository, so I am OK
with the assumption that the original repository will stay available
for such users who took its contents over sneaker-net instead of
over the wire.

> Use the 'git index-pack --promisor=<message>' option to create this
> .promisor file. Add "from-bundle" as the message to help anyone diagnose
> issues with these promisor packs.
>
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>  bundle.c               | 4 ++++
>  t/t6020-bundle-misc.sh | 8 +++++++-
>  2 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/bundle.c b/bundle.c
> index e284ef63062..3d97de40ef0 100644
> --- a/bundle.c
> +++ b/bundle.c
> @@ -631,6 +631,10 @@ int unbundle(struct repository *r, struct bundle_header *header,
>  	struct child_process ip = CHILD_PROCESS_INIT;
>  	strvec_pushl(&ip.args, "index-pack", "--fix-thin", "--stdin", NULL);
>  
> +	/* If there is a filter, then we need to create the promisor pack. */
> +	if (header->filter)
> +		strvec_push(&ip.args, "--promisor=from-bundle");
> +
>  	if (extra_index_pack_args) {
>  		strvec_pushv(&ip.args, extra_index_pack_args->v);
>  		strvec_clear(extra_index_pack_args);
> diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh
> index 39cfefafb65..344af34db1e 100755
> --- a/t/t6020-bundle-misc.sh
> +++ b/t/t6020-bundle-misc.sh
> @@ -513,7 +513,13 @@ do
>  		The bundle uses this filter: $filter
>  		The bundle records a complete history.
>  		EOF
> -		test_cmp expect actual
> +		test_cmp expect actual &&
> +
> +		# This creates the first pack-file in the
> +		# .git/objects/pack directory. Look for a .promisor.
> +		git bundle unbundle partial.bdl &&
> +		ls .git/objects/pack/pack-*.promisor >promisor &&
> +		test_line_count = 1 promisor

OK.  Do we also want to inspect the contents of the resulting
repository to make sure that the bundle had the right contents?

One idea to do so would probably be

 - prepare a test repository (you already have it)
 - prepare a partial.bdl (you already do this)

 - clone the test repository into a new repository, with the same
   filter
 - create an empty repository, unbundle the partial.bdl

 - take "for-each-ref" and list of objects available in these two
   "partial copies" from the test repository, and compare


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 02/11] revision: put object filter into struct rev_info
  2022-03-04 22:15   ` Junio C Hamano
@ 2022-03-07 13:59     ` Derrick Stolee
  2022-03-07 16:46       ` Junio C Hamano
  0 siblings, 1 reply; 114+ messages in thread
From: Derrick Stolee @ 2022-03-07 13:59 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, stolee, avarab, zhiyou.jx, jonathantanmy

On 3/4/2022 5:15 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>>  static int try_bitmap_count(struct rev_info *revs,
>> -			    struct list_objects_filter_options *filter,
>>  			    int filter_provided_objects)
> 
> This makes quite a lot of sense as filter is now available as
> revs->filter.
> 
>>  {
>>  	uint32_t commit_count = 0,
>> @@ -436,7 +434,8 @@ static int try_bitmap_count(struct rev_info *revs,
>>  	 */
>>  	max_count = revs->max_count;
>>  
>> -	bitmap_git = prepare_bitmap_walk(revs, filter, filter_provided_objects);
>> +	bitmap_git = prepare_bitmap_walk(revs, revs->filter,
>> +					 filter_provided_objects);
> 
> And we should be able to do the same to prepare_bitmap_walk().  It
> is OK if such a change comes later and not as part of this commit.
> 
> Perhaps it is deliberate.  Unlike the helpers this step touches,
> namely, try_bitmap_count(), try_bitmap_traversal(), and
> try_bitmap_disk_usage(), prepare_bitmap_walk() is not a file-scope
> static helper and updating it will need touching many more places.

I'm making a note that this cleanup can happen in a follow-up series.
 
>> @@ -597,13 +595,17 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
>>  		}
>>  
>>  		if (skip_prefix(arg, ("--" CL_ARG__FILTER "="), &arg)) {
> 
> #leftoverbit.  We need to remember to clean this up, now "--filter"
> is well established (I am assuming this literal-string pasting is
> because we didn't know what the right and final word to be used as
> the option name back when this code was originally written), when
> the code around here is quiescent.

Good point.

>> -			parse_list_objects_filter(&filter_options, arg);
>> -			if (filter_options.choice && !revs.blob_objects)
>> +			if (!revs.filter)
>> +				CALLOC_ARRAY(revs.filter, 1);
>> +			parse_list_objects_filter(revs.filter, arg);
>> +			if (revs.filter->choice && !revs.blob_objects)
>>  				die(_("object filtering requires --objects"));
>>  			continue;
> 
> OK.  The original "filter_options" was a structure and not a pointer
> to a structure; now we have a pointer to a structure in revs as a
> member so we need an on-demand allocation.  CALLOC_ARRAY() instead
> of xcalloc(), when we know we are creating one element and not an
> array of elements whose size happens to be one, is not wrong but it
> does look strange.  Was there a reason why we avoid xcalloc()?

I think I've been using CALLOC_ARRAY(..., 1) over "... = xcalloc()"
for simplicity's sake for a while. I see quite a few across the
codebase, too, but I can swap the usage here if you feel that is
important.

> Makes me also wonder how big the filter_options structure is;
> because we will not use unbounded many revs structure, it may have
> been a simpler conversion to turn a static struct into an embedded
> struct member in a struct (instead of a member of a struct that is a
> pointer to a struct).  That way, we did not have to ...
>>>  		}
>>  		if (!strcmp(arg, ("--no-" CL_ARG__FILTER))) {
>> -			list_objects_filter_set_no_filter(&filter_options);
>> +			if (!revs.filter)
>> +				CALLOC_ARRAY(revs.filter, 1);
> 
> ... repeat the on-demand allocation.  If some code used to pass
> &filter_options in a parameter to helper functions, and such calling
> sites get rewritten to pass the value in the revs.filter pointer,
> and if revs hasn't gone through this codepath, these helper functions
> will start receiving NULL in their filter_options parameter, which
> they may or may not be prepared to take.  This "we get rid of a
> global struct and replace it with an on-demand allocated structure,
> pointer to which is stored in the revs structure" rewrite somehow
> makes me nervous.

I think the main idea is that the filter being NULL indicates "no
filter is used. Do a full object walk." If we use a static struct,
then we need to instead use revs->filter.filter_spec.nr, but that
is already being used as a BUG() statement:

const char *list_objects_filter_spec(struct list_objects_filter_options *filter)
{
	if (!filter->filter_spec.nr)
		BUG("no filter_spec available for this filter");

so a non-NULL filter with no specs is considered invalid, while
a NULL filter _is_ currently considered valid.

While we _could_ make that switch to using a static struct and
change our checks to allow empty specs, that would be a more
involved change. Maybe we can leave it for a follow up?

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 06/11] MyFirstObjectWalk: update recommended usage
  2022-03-04 22:33   ` Junio C Hamano
@ 2022-03-07 14:05     ` Derrick Stolee
  2022-03-07 16:47       ` Junio C Hamano
  0 siblings, 1 reply; 114+ messages in thread
From: Derrick Stolee @ 2022-03-07 14:05 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, stolee, avarab, zhiyou.jx, jonathantanmy

On 3/4/2022 5:33 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

>>  the name, these functions traverse all kinds of objects. Let's have a look at
>> -the arguments to `traverse_commit_list_filtered()`, which are a superset of the
>> -arguments to the unfiltered version.
>> +the arguments to `traverse_commit_list()`.
>>  
>> -- `struct list_objects_filter_options *filter_options`: This is a struct which
>> -  stores a filter-spec as outlined in `Documentation/rev-list-options.txt`.
>> -- `struct rev_info *revs`: This is the `rev_info` used for the walk.
>> +- `struct rev_info *revs`: This is the `rev_info` used for the walk. It
>> +  includes a `filter` member which contains information for how to filter
>> +  the object list.
> 
> Perhaps,
> 
>     "When its `filter` member is not NULL, it contains ..."
> 
> implying that it is valid for `filter` member to be NULL and none of
> the following things will happen in such a case.

Definitely room for improvement here. I got hung up on the two uses of
"it" so here is another attempt:

 If its `filter` member is not `NULL`, then `filter` contains
 information for how to filter the object list.

What do you think?

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 07/11] bundle: safely handle --objects option
  2022-03-04 22:58     ` Junio C Hamano
@ 2022-03-07 14:09       ` Derrick Stolee
  0 siblings, 0 replies; 114+ messages in thread
From: Derrick Stolee @ 2022-03-07 14:09 UTC (permalink / raw)
  To: Junio C Hamano, Jeff Hostetler
  Cc: Derrick Stolee via GitGitGadget, git, stolee, avarab, zhiyou.jx,
	jonathantanmy

On 3/4/2022 5:58 PM, Junio C Hamano wrote:
> Jeff Hostetler <git@jeffhostetler.com> writes:
> 
>> On 2/23/22 12:55 PM, Derrick Stolee via GitGitGadget wrote:
>>> From: Derrick Stolee <derrickstolee@github.com>
>>> Since 'git bundle' uses setup_revisions() to specify the object
>>> walk,
>>> some options do not make sense to include during the pack-objects child
>>> process. Further, these options are used for a call to
>>> traverse_commit_list() which would then require a callback which is
>>> currently NULL.
>>> By populating the callback we prevent a segfault in the case of
>>> adding
>>> the --objects flag. This is really a redundant statement because the
>>
>> Nit: I stumbled over "...because the bundles are constructing..."
>> Is there a better wording here??
> 
> "... because the command is constructing ..." should be sufficient,
> I hope?

That works for me. Thanks!
-Stolee

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 10/11] bundle: create filtered bundles
  2022-03-04 23:35   ` Junio C Hamano
@ 2022-03-07 14:14     ` Derrick Stolee
  2022-03-07 16:49       ` Junio C Hamano
  0 siblings, 1 reply; 114+ messages in thread
From: Derrick Stolee @ 2022-03-07 14:14 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, stolee, avarab, zhiyou.jx, jonathantanmy

On 3/4/2022 6:35 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Derrick Stolee <derrickstolee@github.com>
>>
>> A previous change allowed Git to parse bundles with the 'filter'
>> capability. Now, teach Git to create bundles with this option.
>>
>> Some rearranging of code is required to get the option parsing in the
>> correct spot. There are now two reasons why we might need capabilities
>> (a new hash algorithm or an object filter) so that is pulled out into a
>> place where we can check both at the same time.
>>
>> The --filter option is parsed as part of setup_revisions(), but it
>> expected the --objects flag, too. That flag is somewhat implied by 'git
>> bundle' because it creates a pack-file walking objects, but there is
>> also a walk that walks the revision range expecting only commits. Make
>> this parsing work by setting 'revs.tree_objects' and 'revs.blob_objects'
>> before the call to setup_revisions().
>>
>> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
>> ---
> 
> Now, the gem of the series ;-)

:D

>> +	if (the_hash_algo != &hash_algos[GIT_HASH_SHA1] ||
>> +	    revs.filter)
> 
> Did we need to wrap?  With these on a single line, the line is way
> shorter than the line with "because our hash algorithm is not" on
> it.

Perhaps I was thinking about having one line per "reason", which might
be extended in the future. But there's no reason to waste space right
now.

>> +	/*
>> +	 * Nullify the filter here, and any object walking. We only care
>> +	 * about commits and tags here. The revs_copy has the right
>> +	 * instances of these values.
>> +	 */
>> +	revs.filter = NULL;
>>  	revs.blob_objects = revs.tree_objects = 0;
>>  	traverse_commit_list(&revs, write_bundle_prerequisites, ignore_object, &bpi);
>>  	object_array_remove_duplicates(&revs_copy.pending);
> 
> OK.  We prepare revs, and we save it to revs_copy, because we
> perform two traversals, one to determine which bottom commits are
> required to unbundle the bundle (which is done with the instance
> "revs"), and then later to actually enumerate the objects to place
> in the bundle (using "revs_copy").  Is there a reason why we need to
> remove .filter in order to perform the first traversal?
> 
> This is a tangent, but I wish we could reliably determine when we
> can optimize the first traversal away, by inspecting revs.  If there
> are any pending objects with UNINTERESTING bit, or members like
> max_count, max_age, min_age are set, we'd end up traversing down to
> all roots and the prerequisites list would be empty.

Noted for potential follow-up.
 
>> +	test_expect_success 'filtered bundle: $filter' '
...
> 
> OK.
> 
> It is somewhat curious why our bundle tests do not unbundle and
> check the resulting contents of the repository we unbundle it in.

I haven't checked your response yet, but hopefully this is answered in
the next patch which teaches Git how to unbundle bundles with this new
capability.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 11/11] bundle: unbundle promisor packs
  2022-03-04 23:43   ` Junio C Hamano
@ 2022-03-07 14:48     ` Derrick Stolee
  2022-03-07 16:56       ` Junio C Hamano
  0 siblings, 1 reply; 114+ messages in thread
From: Derrick Stolee @ 2022-03-07 14:48 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, stolee, avarab, zhiyou.jx, jonathantanmy

On 3/4/2022 6:43 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Derrick Stolee <derrickstolee@github.com>
>>
>> In order to have a valid pack-file after unbundling a bundle that has
>> the 'filter' capability, we need to generate a .promisor file. The
>> bundle does not promise _where_ the objects can be found, but we can
>> expect that these bundles will be unbundled in repositories with
>> appropriate promisor remotes that can find those missing objects.
> 
> That sounds like a lot of wishful thinking, but I do not think of a
> better way to phrase the idea.  Taking a bundle out of a repository
> and unbundling it elsewhere is "git fetch" that could be done to
> send objects from the former to the latter repository, so I am OK
> with the assumption that the original repository will stay available
> for such users who took its contents over sneaker-net instead of
> over the wire.

As an aside, I'm also concerned about the existing model of promisor
remotes where it depends on each remote, and isn't a repository-wide
state. In particular, if I do a blobless partial clone of git/git and
then add git-for-windows/git as a remote and fetch it, it will break
because git-for-windows/git isn't set up as a promisor remote and we
expect to have every blob reachable from its pack-file (even though
it was not sent because we advertised a commit that can reach it).

I've been thinking about adjusting the config parsing around promisors
to say "I see one promisor remote, so I will assume all remotes are
promisors." It seems to me that this will fix cases like the above
without further breaking any cases (that are not already broken).

But that's a tangent for another time. :)
 
>> Use the 'git index-pack --promisor=<message>' option to create this
>> .promisor file. Add "from-bundle" as the message to help anyone diagnose
>> issues with these promisor packs.
>>
>> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
>> ---
>>  bundle.c               | 4 ++++
>>  t/t6020-bundle-misc.sh | 8 +++++++-
>>  2 files changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/bundle.c b/bundle.c
>> index e284ef63062..3d97de40ef0 100644
>> --- a/bundle.c
>> +++ b/bundle.c
>> @@ -631,6 +631,10 @@ int unbundle(struct repository *r, struct bundle_header *header,
>>  	struct child_process ip = CHILD_PROCESS_INIT;
>>  	strvec_pushl(&ip.args, "index-pack", "--fix-thin", "--stdin", NULL);
>>  
>> +	/* If there is a filter, then we need to create the promisor pack. */
>> +	if (header->filter)
>> +		strvec_push(&ip.args, "--promisor=from-bundle");
>> +
>>  	if (extra_index_pack_args) {
>>  		strvec_pushv(&ip.args, extra_index_pack_args->v);
>>  		strvec_clear(extra_index_pack_args);
>> diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh
>> index 39cfefafb65..344af34db1e 100755
>> --- a/t/t6020-bundle-misc.sh
>> +++ b/t/t6020-bundle-misc.sh
>> @@ -513,7 +513,13 @@ do
>>  		The bundle uses this filter: $filter
>>  		The bundle records a complete history.
>>  		EOF
>> -		test_cmp expect actual
>> +		test_cmp expect actual &&
>> +
>> +		# This creates the first pack-file in the
>> +		# .git/objects/pack directory. Look for a .promisor.
>> +		git bundle unbundle partial.bdl &&
>> +		ls .git/objects/pack/pack-*.promisor >promisor &&
>> +		test_line_count = 1 promisor
> 
> OK.  Do we also want to inspect the contents of the resulting
> repository to make sure that the bundle had the right contents?
> 
> One idea to do so would probably be
> 
>  - prepare a test repository (you already have it)
>  - prepare a partial.bdl (you already do this)
> 
>  - clone the test repository into a new repository, with the same
>    filter
>  - create an empty repository, unbundle the partial.bdl
> 
>  - take "for-each-ref" and list of objects available in these two
>    "partial copies" from the test repository, and compare

Good idea. Thanks!

Of course, looking closer at it... "git bundle unbundle" doesn't
actually store the refs directly in the refspace, but instead only
outputs the refs that it used.

Here is an attempt to verify the refs that are reported match
those in a mirror clone.

--- >8 ---

diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh
index 344af34db1e..a228cbfc4e3 100755
--- a/t/t6020-bundle-misc.sh
+++ b/t/t6020-bundle-misc.sh
@@ -490,7 +490,7 @@ test_expect_success 'unfiltered bundle with --objects' '
 for filter in "blob:none" "tree:0" "tree:1" "blob:limit=100"
 do
 	test_expect_success 'filtered bundle: $filter' '
-		test_when_finished rm -rf .git/objects/pack &&
+		test_when_finished rm -rf .git/objects/pack cloned unbundled &&
 		git bundle create partial.bdl \
 			--all \
 			--filter=$filter &&
@@ -515,11 +515,22 @@ do
 		EOF
 		test_cmp expect actual &&
 
-		# This creates the first pack-file in the
-		# .git/objects/pack directory. Look for a .promisor.
-		git bundle unbundle partial.bdl &&
-		ls .git/objects/pack/pack-*.promisor >promisor &&
-		test_line_count = 1 promisor
+		git init unbundled &&
+		(
+			cd unbundled &&
+			# This creates the first pack-file in the
+			# .git/objects/pack directory. Look for a .promisor.
+			git bundle unbundle ../partial.bdl >ref-list.txt &&
+			ls .git/objects/pack/pack-*.promisor >promisor &&
+			test_line_count = 1 promisor
+		) &&
+
+		git clone --filter=blob:none --mirror "file://$(pwd)" cloned &&
+		git -C cloned for-each-ref \
+			--format="%(objectname) %(refname)" >cloned-refs.txt &&
+		echo "$(git -C cloned rev-parse HEAD) HEAD" >>cloned-refs.txt &&
+		test_cmp cloned-refs.txt unbundled/ref-list.txt
 	'
 done
 
--- >8 ---

I also attempted doing a "git clone --bare partial.bdl unbundled.git" to
get the 'git clone' command to actually place the refs. However, 'git clone'
does not set up the repository filter based on the bundle, so it reports
missing blobs (even though there is no checkout). Making this work would
require that "repository global promisor config" idea that I mentioned in
another reply. I'll make note of this as a potential application of that
idea.

Thanks,
-Stolee

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* Re: [PATCH 00/11] Partial bundles
  2022-02-23 17:55 [PATCH 00/11] Partial bundles Derrick Stolee via GitGitGadget
                   ` (12 preceding siblings ...)
  2022-03-04 19:19 ` Derrick Stolee
@ 2022-03-07 14:55 ` Ævar Arnfjörð Bjarmason
  2022-03-07 14:59   ` Derrick Stolee
  2022-03-07 21:50 ` [PATCH v2 00/12] " Derrick Stolee via GitGitGadget
  14 siblings, 1 reply; 114+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-03-07 14:55 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, stolee, gitster, zhiyou.jx, jonathantanmy, Derrick Stolee


On Wed, Feb 23 2022, Derrick Stolee via GitGitGadget wrote:

Just on:

>  Documentation/MyFirstObjectWalk.txt | 44 ++++++---------
>  Documentation/git-index-pack.txt    |  8 +++
>  builtin/pack-objects.c              |  9 +--
>  builtin/rev-list.c                  | 29 +++-------
>  bundle.c                            | 87 ++++++++++++++++++++++++-----
>  bundle.h                            |  3 +
>  list-objects-filter-options.c       |  2 +-
>  list-objects-filter-options.h       |  5 ++
>  list-objects.c                      | 25 +++------
>  list-objects.h                      | 12 +++-
>  pack-bitmap.c                       | 24 ++++----
>  pack-bitmap.h                       |  2 -
>  reachable.c                         |  2 +-
>  revision.c                          | 11 ++++
>  revision.h                          |  4 ++
>  t/t5300-pack-object.sh              |  4 +-
>  t/t6020-bundle-misc.sh              | 48 ++++++++++++++++
>  17 files changed, 215 insertions(+), 104 deletions(-)

This is missing a corresponding change to
Documentation/technical/bundle-format.txt.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 00/11] Partial bundles
  2022-03-07 14:55 ` Ævar Arnfjörð Bjarmason
@ 2022-03-07 14:59   ` Derrick Stolee
  0 siblings, 0 replies; 114+ messages in thread
From: Derrick Stolee @ 2022-03-07 14:59 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Derrick Stolee via GitGitGadget
  Cc: git, stolee, gitster, zhiyou.jx, jonathantanmy

On 3/7/2022 9:55 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Wed, Feb 23 2022, Derrick Stolee via GitGitGadget wrote:
> 
> Just on:
> 
>>  Documentation/MyFirstObjectWalk.txt | 44 ++++++---------
>>  Documentation/git-index-pack.txt    |  8 +++
>>  builtin/pack-objects.c              |  9 +--
>>  builtin/rev-list.c                  | 29 +++-------
>>  bundle.c                            | 87 ++++++++++++++++++++++++-----
>>  bundle.h                            |  3 +
>>  list-objects-filter-options.c       |  2 +-
>>  list-objects-filter-options.h       |  5 ++
>>  list-objects.c                      | 25 +++------
>>  list-objects.h                      | 12 +++-
>>  pack-bitmap.c                       | 24 ++++----
>>  pack-bitmap.h                       |  2 -
>>  reachable.c                         |  2 +-
>>  revision.c                          | 11 ++++
>>  revision.h                          |  4 ++
>>  t/t5300-pack-object.sh              |  4 +-
>>  t/t6020-bundle-misc.sh              | 48 ++++++++++++++++
>>  17 files changed, 215 insertions(+), 104 deletions(-)
> 
> This is missing a corresponding change to
> Documentation/technical/bundle-format.txt.

Thanks. I don't know how I missed that.

-Stolee

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 07/11] bundle: safely handle --objects option
  2022-02-23 17:55 ` [PATCH 07/11] bundle: safely handle --objects option Derrick Stolee via GitGitGadget
  2022-02-28 16:00   ` Jeff Hostetler
  2022-03-04 22:57   ` Junio C Hamano
@ 2022-03-07 15:35   ` Ævar Arnfjörð Bjarmason
  2 siblings, 0 replies; 114+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-03-07 15:35 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, stolee, gitster, zhiyou.jx, jonathantanmy, Derrick Stolee


On Wed, Feb 23 2022, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <derrickstolee@github.com>
>
> Since 'git bundle' uses setup_revisions() to specify the object walk,
> some options do not make sense to include during the pack-objects child
> process. Further, these options are used for a call to
> traverse_commit_list() which would then require a callback which is
> currently NULL.
>
> By populating the callback we prevent a segfault in the case of adding
> the --objects flag. This is really a redundant statement because the
> bundles are constructing a pack-file containing all objects in the
> discovered commit range.
>
> Adding --objects to a 'git bundle' command might cause a slower command,
> but at least it will not have a hard failure when the user supplies this
> option. We can also disable walking trees and blobs in advance of this
> walk.
>
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>  bundle.c               | 10 +++++++++-
>  t/t6020-bundle-misc.sh | 12 ++++++++++++
>  2 files changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/bundle.c b/bundle.c
> index a0bb687b0f4..dc56db9a50a 100644
> --- a/bundle.c
> +++ b/bundle.c
> @@ -451,6 +451,12 @@ struct bundle_prerequisites_info {
>  	int fd;
>  };
>  
> +
> +static void ignore_object(struct object *obj, const char *v, void *data)
> +{
> +	/* Do nothing. */
> +}
> +
>  static void write_bundle_prerequisites(struct commit *commit, void *data)
>  {
>  	struct bundle_prerequisites_info *bpi = data;
> @@ -544,7 +550,9 @@ int create_bundle(struct repository *r, const char *path,
>  		die("revision walk setup failed");
>  	bpi.fd = bundle_fd;
>  	bpi.pending = &revs_copy.pending;
> -	traverse_commit_list(&revs, write_bundle_prerequisites, NULL, &bpi);
> +
> +	revs.blob_objects = revs.tree_objects = 0;
> +	traverse_commit_list(&revs, write_bundle_prerequisites, ignore_object, &bpi);
>  	object_array_remove_duplicates(&revs_copy.pending);

The callback dummy callback part of it seems like something we'd be
better off doing by just teaching traverse_commit_list() to pay
attention to our "NULL" in this case.

But maybe I'd don't quite get why it either can't say "oh it's, NULL,
don't need to call that", or alternatively die earlier as it notices it
needs to call it, but it wasn't provided.

The same presumably goes for show_commit_fn.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 08/11] bundle: parse filter capability
  2022-02-23 17:55 ` [PATCH 08/11] bundle: parse filter capability Derrick Stolee via GitGitGadget
@ 2022-03-07 15:38   ` Ævar Arnfjörð Bjarmason
  2022-03-07 16:14     ` Derrick Stolee
  2022-03-07 15:55   ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 114+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-03-07 15:38 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, stolee, gitster, zhiyou.jx, jonathantanmy, Derrick Stolee


On Wed, Feb 23 2022, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <derrickstolee@github.com>
>
> The v3 bundle format has capabilities, allowing newer versions of Git to
> create bundles with newer features. Older versions that do not
> understand these new capabilities will fail with a helpful warning.
>
> Create a new capability allowing Git to understand that the contained
> pack-file is filtered according to some object filter. Typically, this
> filter will be "blob:none" for a blobless partial clone.
>
> This change teaches Git to parse this capability, place its value in the
> bundle header, and demonstrate this understanding by adding a message to
> 'git bundle verify'.
>
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>  bundle.c                      | 17 ++++++++++++++++-
>  bundle.h                      |  3 +++
>  list-objects-filter-options.c |  2 +-
>  list-objects-filter-options.h |  5 +++++
>  4 files changed, 25 insertions(+), 2 deletions(-)
>
> diff --git a/bundle.c b/bundle.c
> index dc56db9a50a..2afced4d991 100644
> --- a/bundle.c
> +++ b/bundle.c
> @@ -11,7 +11,7 @@
>  #include "run-command.h"
>  #include "refs.h"
>  #include "strvec.h"
> -
> +#include "list-objects-filter-options.h"
>  
>  static const char v2_bundle_signature[] = "# v2 git bundle\n";
>  static const char v3_bundle_signature[] = "# v3 git bundle\n";
> @@ -33,6 +33,8 @@ void bundle_header_release(struct bundle_header *header)
>  {
>  	string_list_clear(&header->prerequisites, 1);
>  	string_list_clear(&header->references, 1);
> +	list_objects_filter_release(header->filter);
> +	free(header->filter);
>  }
>  
>  static int parse_capability(struct bundle_header *header, const char *capability)
> @@ -45,6 +47,11 @@ static int parse_capability(struct bundle_header *header, const char *capability
>  		header->hash_algo = &hash_algos[algo];
>  		return 0;
>  	}
> +	if (skip_prefix(capability, "filter=", &arg)) {
> +		CALLOC_ARRAY(header->filter, 1);
> +		parse_list_objects_filter(header->filter, arg);
> +		return 0;
> +	}
>  	return error(_("unknown capability '%s'"), capability);
>  }
>  
> @@ -220,6 +227,8 @@ int verify_bundle(struct repository *r,
>  	req_nr = revs.pending.nr;
>  	setup_revisions(2, argv, &revs, NULL);
>  
> +	revs.filter = header->filter;
> +
>  	if (prepare_revision_walk(&revs))
>  		die(_("revision walk setup failed"));
>  
> @@ -259,6 +268,12 @@ int verify_bundle(struct repository *r,
>  			     r->nr),
>  			  r->nr);
>  		list_refs(r, 0, NULL);
> +
> +		if (header->filter) {
> +			printf_ln("The bundle uses this filter: %s",
> +				  list_objects_filter_spec(header->filter));
> +		}
> +
>  		r = &header->prerequisites;
>  		if (!r->nr) {
>  			printf_ln(_("The bundle records a complete history."));
> diff --git a/bundle.h b/bundle.h
> index 06009fe6b1f..eb026153d56 100644
> --- a/bundle.h
> +++ b/bundle.h
> @@ -5,11 +5,14 @@
>  #include "cache.h"
>  #include "string-list.h"
>  
> +struct list_objects_filter_options;
> +

For the other ones we include the relevant header, do the same here (or
if there's a need to not do it, do we need it for the rest too?)

>  struct bundle_header {
>  	unsigned version;
>  	struct string_list prerequisites;
>  	struct string_list references;
>  	const struct git_hash_algo *hash_algo;
> +	struct list_objects_filter_options *filter;
>  };

I haven't tried, but any reason this needs to be a *filter
v.s. embedding it in the struct?

Then we'd just need list_objects_filter_release() and not the free() as
well.

Is it because you're piggy-backing on "if (header->filter)" as "do we
have it" state, better to check .nr?

> @@ -55,7 +55,7 @@ const char *list_object_filter_config_name(enum list_objects_filter_choice c)
>   * expand_list_objects_filter_spec() first).  We also "intern" the arg for the
>   * convenience of the current command.
>   */

These API docs....

> -static int gently_parse_list_objects_filter(
> +int gently_parse_list_objects_filter(
>  	struct list_objects_filter_options *filter_options,
>  	const char *arg,
>  	struct strbuf *errbuf)
> diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
> index da5b6737e27..347a99c28cf 100644
> --- a/list-objects-filter-options.h
> +++ b/list-objects-filter-options.h
> @@ -72,6 +72,11 @@ struct list_objects_filter_options {
>  /* Normalized command line arguments */
>  #define CL_ARG__FILTER "filter"

...should be moved here, presumably.

> +int gently_parse_list_objects_filter(
> +	struct list_objects_filter_options *filter_options,
> +	const char *arg,
> +	struct strbuf *errbuf);
> +

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 10/11] bundle: create filtered bundles
  2022-02-23 17:55 ` [PATCH 10/11] bundle: create filtered bundles Derrick Stolee via GitGitGadget
  2022-03-04 23:35   ` Junio C Hamano
@ 2022-03-07 15:44   ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 114+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-03-07 15:44 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, stolee, gitster, zhiyou.jx, jonathantanmy, Derrick Stolee


On Wed, Feb 23 2022, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <derrickstolee@github.com>

Just something missed by Junio....

> +for filter in "blob:none" "tree:0" "tree:1" "blob:limit=100"
> +do
> +	test_expect_success 'filtered bundle: $filter' '

I think you'll need "" here, the description part of test_expect_success
isn't eval'd.

> +		test_when_finished rm -rf .git/objects/pack &&
> +		git bundle create partial.bdl \
> +			--all \
> +			--filter=$filter &&
> +
> +		git bundle verify partial.bdl >unfiltered &&
> +		make_user_friendly_and_stable_output <unfiltered >actual &&
> +
> +		cat >expect <<-EOF &&
> +		The bundle contains these 10 refs:
> +		<COMMIT-P> refs/heads/main
> +		<COMMIT-N> refs/heads/release
> +		<COMMIT-D> refs/heads/topic/1
> +		<COMMIT-H> refs/heads/topic/2
> +		<COMMIT-D> refs/pull/1/head
> +		<COMMIT-G> refs/pull/2/head
> +		<TAG-1> refs/tags/v1
> +		<TAG-2> refs/tags/v2
> +		<TAG-3> refs/tags/v3
> +		<COMMIT-P> HEAD
> +		The bundle uses this filter: $filter
> +		The bundle records a complete history.
> +		EOF
> +		test_cmp expect actual
> +	'
> +done
> +
>  test_done

I think this needs a corresponding documentation update, now "verify"
just talks about output related to whether the history is complete or
not.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 11/11] bundle: unbundle promisor packs
  2022-02-23 17:55 ` [PATCH 11/11] bundle: unbundle promisor packs Derrick Stolee via GitGitGadget
  2022-03-04 23:43   ` Junio C Hamano
@ 2022-03-07 15:47   ` Ævar Arnfjörð Bjarmason
  2022-03-07 16:10     ` Derrick Stolee
  1 sibling, 1 reply; 114+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-03-07 15:47 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, stolee, gitster, zhiyou.jx, jonathantanmy, Derrick Stolee


On Wed, Feb 23 2022, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <derrickstolee@github.com>
>
> In order to have a valid pack-file after unbundling a bundle that has
> the 'filter' capability, we need to generate a .promisor file. The
> bundle does not promise _where_ the objects can be found, but we can
> expect that these bundles will be unbundled in repositories with
> appropriate promisor remotes that can find those missing objects.
>
> Use the 'git index-pack --promisor=<message>' option to create this
> .promisor file. Add "from-bundle" as the message to help anyone diagnose
> issues with these promisor packs.
>
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>  bundle.c               | 4 ++++
>  t/t6020-bundle-misc.sh | 8 +++++++-
>  2 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/bundle.c b/bundle.c
> index e284ef63062..3d97de40ef0 100644
> --- a/bundle.c
> +++ b/bundle.c
> @@ -631,6 +631,10 @@ int unbundle(struct repository *r, struct bundle_header *header,
>  	struct child_process ip = CHILD_PROCESS_INIT;
>  	strvec_pushl(&ip.args, "index-pack", "--fix-thin", "--stdin", NULL);
>  
> +	/* If there is a filter, then we need to create the promisor pack. */
> +	if (header->filter)
> +		strvec_push(&ip.args, "--promisor=from-bundle");
> +
>  	if (extra_index_pack_args) {
>  		strvec_pushv(&ip.args, extra_index_pack_args->v);
>  		strvec_clear(extra_index_pack_args);
> diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh
> index 39cfefafb65..344af34db1e 100755
> --- a/t/t6020-bundle-misc.sh
> +++ b/t/t6020-bundle-misc.sh
> @@ -513,7 +513,13 @@ do
>  		The bundle uses this filter: $filter
>  		The bundle records a complete history.
>  		EOF
> -		test_cmp expect actual
> +		test_cmp expect actual &&
> +
> +		# This creates the first pack-file in the
> +		# .git/objects/pack directory. Look for a .promisor.
> +		git bundle unbundle partial.bdl &&
> +		ls .git/objects/pack/pack-*.promisor >promisor &&
> +		test_line_count = 1 promisor
>  	'
>  done

Aside from what Junio mentioned, the preceding commit seems to be
incomplete here. I.e. I'd expect to see this replace a case where we
died or whatever before. What happened if we invoked "unbundle" before?

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 08/11] bundle: parse filter capability
  2022-02-23 17:55 ` [PATCH 08/11] bundle: parse filter capability Derrick Stolee via GitGitGadget
  2022-03-07 15:38   ` Ævar Arnfjörð Bjarmason
@ 2022-03-07 15:55   ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 114+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-03-07 15:55 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, stolee, gitster, zhiyou.jx, jonathantanmy, Derrick Stolee


On Wed, Feb 23 2022, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <derrickstolee@github.com>
> [...]
>  static int parse_capability(struct bundle_header *header, const char *capability)
> @@ -45,6 +47,11 @@ static int parse_capability(struct bundle_header *header, const char *capability
>  		header->hash_algo = &hash_algos[algo];
>  		return 0;
>  	}
> +	if (skip_prefix(capability, "filter=", &arg)) {
> +		CALLOC_ARRAY(header->filter, 1);
> +		parse_list_objects_filter(header->filter, arg);
> +		return 0;
> +	}
>  	return error(_("unknown capability '%s'"), capability);
>  }

[Something I should have noted in the other reply, but missed].

Before this series we just had the object-format capability, and now we
have a 2nd one.

As before we'll return errors un unknown capabilities.

I think it's worthwhile to stop here & think how we think about
cross-version compatibility between git versions. I.e. we're not
changing the *format version* here (nor is it needed), but just adding a
new capability that older gits won't know about.

I don't know if this is a case where older versions could limp along in
some cases and still unbundle these, probably that's never happening?

So probably nothing needs to change here, I was just wondering *if* we
had capabilities that were optional in some cases whether we shouldn't
while-we're-at-it give those some prefix indicating that, and have older
versions just issue a warning().

Then have them just try to call index-pack & see if that worked.

But yeah, all of that probably isn't applicable here at all :)

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 11/11] bundle: unbundle promisor packs
  2022-03-07 15:47   ` Ævar Arnfjörð Bjarmason
@ 2022-03-07 16:10     ` Derrick Stolee
  0 siblings, 0 replies; 114+ messages in thread
From: Derrick Stolee @ 2022-03-07 16:10 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Derrick Stolee via GitGitGadget
  Cc: git, stolee, gitster, zhiyou.jx, jonathantanmy

On 3/7/2022 10:47 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Wed, Feb 23 2022, Derrick Stolee via GitGitGadget wrote:
> 
>> From: Derrick Stolee <derrickstolee@github.com>
>>
>> In order to have a valid pack-file after unbundling a bundle that has
>> the 'filter' capability, we need to generate a .promisor file. The
>> bundle does not promise _where_ the objects can be found, but we can
>> expect that these bundles will be unbundled in repositories with
>> appropriate promisor remotes that can find those missing objects.
>>
>> Use the 'git index-pack --promisor=<message>' option to create this
>> .promisor file. Add "from-bundle" as the message to help anyone diagnose
>> issues with these promisor packs.
>>
>> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
>> ---
>>  bundle.c               | 4 ++++
>>  t/t6020-bundle-misc.sh | 8 +++++++-
>>  2 files changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/bundle.c b/bundle.c
>> index e284ef63062..3d97de40ef0 100644
>> --- a/bundle.c
>> +++ b/bundle.c
>> @@ -631,6 +631,10 @@ int unbundle(struct repository *r, struct bundle_header *header,
>>  	struct child_process ip = CHILD_PROCESS_INIT;
>>  	strvec_pushl(&ip.args, "index-pack", "--fix-thin", "--stdin", NULL);
>>  
>> +	/* If there is a filter, then we need to create the promisor pack. */
>> +	if (header->filter)
>> +		strvec_push(&ip.args, "--promisor=from-bundle");
>> +
>>  	if (extra_index_pack_args) {
>>  		strvec_pushv(&ip.args, extra_index_pack_args->v);
>>  		strvec_clear(extra_index_pack_args);
>> diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh
>> index 39cfefafb65..344af34db1e 100755
>> --- a/t/t6020-bundle-misc.sh
>> +++ b/t/t6020-bundle-misc.sh
>> @@ -513,7 +513,13 @@ do
>>  		The bundle uses this filter: $filter
>>  		The bundle records a complete history.
>>  		EOF
>> -		test_cmp expect actual
>> +		test_cmp expect actual &&
>> +
>> +		# This creates the first pack-file in the
>> +		# .git/objects/pack directory. Look for a .promisor.
>> +		git bundle unbundle partial.bdl &&
>> +		ls .git/objects/pack/pack-*.promisor >promisor &&
>> +		test_line_count = 1 promisor
>>  	'
>>  done
> 
> Aside from what Junio mentioned, the preceding commit seems to be
> incomplete here. I.e. I'd expect to see this replace a case where we
> died or whatever before. What happened if we invoked "unbundle" before?

Looking closely, I think the only difference is that this patch
adds the .promisor file. I can push my expanded test to be
earlier in the series so we can verify this.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 08/11] bundle: parse filter capability
  2022-03-07 15:38   ` Ævar Arnfjörð Bjarmason
@ 2022-03-07 16:14     ` Derrick Stolee
  2022-03-07 16:22       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 114+ messages in thread
From: Derrick Stolee @ 2022-03-07 16:14 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Derrick Stolee via GitGitGadget
  Cc: git, stolee, gitster, zhiyou.jx, jonathantanmy

On 3/7/2022 10:38 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Wed, Feb 23 2022, Derrick Stolee via GitGitGadget wrote:
> 
>> From: Derrick Stolee <derrickstolee@github.com>
...
>> diff --git a/bundle.h b/bundle.h
>> index 06009fe6b1f..eb026153d56 100644
>> --- a/bundle.h
>> +++ b/bundle.h
>> @@ -5,11 +5,14 @@
>>  #include "cache.h"
>>  #include "string-list.h"
>>  
>> +struct list_objects_filter_options;
>> +
> 
> For the other ones we include the relevant header, do the same here (or
> if there's a need to not do it, do we need it for the rest too?)

The others are .c files that require looking into the struct. This
declaration is all that's required for this header file.

>>  struct bundle_header {
>>  	unsigned version;
>>  	struct string_list prerequisites;
>>  	struct string_list references;
>>  	const struct git_hash_algo *hash_algo;
>> +	struct list_objects_filter_options *filter;
>>  };
> 
> I haven't tried, but any reason this needs to be a *filter
> v.s. embedding it in the struct?
> 
> Then we'd just need list_objects_filter_release() and not the free() as
> well.
> 
> Is it because you're piggy-backing on "if (header->filter)" as "do we
> have it" state, better to check .nr?

Yes. I replied to Junio before that there is some assumption in
the filtering code that the .nr == 0 case is listed as a BUG()
so we would possibly be breaking expectations in a different
way doing the embedded version.

>> @@ -55,7 +55,7 @@ const char *list_object_filter_config_name(enum list_objects_filter_choice c)
>>   * expand_list_objects_filter_spec() first).  We also "intern" the arg for the
>>   * convenience of the current command.
>>   */
> 
> These API docs....
> 
>> -static int gently_parse_list_objects_filter(
>> +int gently_parse_list_objects_filter(
>>  	struct list_objects_filter_options *filter_options,
>>  	const char *arg,
>>  	struct strbuf *errbuf)
>> diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
>> index da5b6737e27..347a99c28cf 100644
>> --- a/list-objects-filter-options.h
>> +++ b/list-objects-filter-options.h
>> @@ -72,6 +72,11 @@ struct list_objects_filter_options {
>>  /* Normalized command line arguments */
>>  #define CL_ARG__FILTER "filter"
> 
> ...should be moved here, presumably.

Yes. Thanks!
-Stolee

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 08/11] bundle: parse filter capability
  2022-03-07 16:14     ` Derrick Stolee
@ 2022-03-07 16:22       ` Ævar Arnfjörð Bjarmason
  2022-03-07 16:29         ` Derrick Stolee
  0 siblings, 1 reply; 114+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-03-07 16:22 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, stolee, gitster, zhiyou.jx,
	jonathantanmy


On Mon, Mar 07 2022, Derrick Stolee wrote:

> On 3/7/2022 10:38 AM, Ævar Arnfjörð Bjarmason wrote:
>> 
>> On Wed, Feb 23 2022, Derrick Stolee via GitGitGadget wrote:
>> 
>>> From: Derrick Stolee <derrickstolee@github.com>
> ...
>>> diff --git a/bundle.h b/bundle.h
>>> index 06009fe6b1f..eb026153d56 100644
>>> --- a/bundle.h
>>> +++ b/bundle.h
>>> @@ -5,11 +5,14 @@
>>>  #include "cache.h"
>>>  #include "string-list.h"
>>>  
>>> +struct list_objects_filter_options;
>>> +
>> 
>> For the other ones we include the relevant header, do the same here (or
>> if there's a need to not do it, do we need it for the rest too?)
>
> The others are .c files that require looking into the struct. This
> declaration is all that's required for this header file.
>
>>>  struct bundle_header {
>>>  	unsigned version;
>>>  	struct string_list prerequisites;
>>>  	struct string_list references;
>>>  	const struct git_hash_algo *hash_algo;
>>> +	struct list_objects_filter_options *filter;
>>>  };
>> 
>> I haven't tried, but any reason this needs to be a *filter
>> v.s. embedding it in the struct?
>> 
>> Then we'd just need list_objects_filter_release() and not the free() as
>> well.
>> 
>> Is it because you're piggy-backing on "if (header->filter)" as "do we
>> have it" state, better to check .nr?
>
> Yes. I replied to Junio before that there is some assumption in
> the filtering code that the .nr == 0 case is listed as a BUG()
> so we would possibly be breaking expectations in a different
> way doing the embedded version.

Having an "unsigned int using_filter:1" or whatever IMO makes that much
clearer than needing to carefully eyeball code that's already using
embedded structs & see why the one exception that's malloc'd is because
of that or some other reason...

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 08/11] bundle: parse filter capability
  2022-03-07 16:22       ` Ævar Arnfjörð Bjarmason
@ 2022-03-07 16:29         ` Derrick Stolee
  0 siblings, 0 replies; 114+ messages in thread
From: Derrick Stolee @ 2022-03-07 16:29 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Derrick Stolee via GitGitGadget, git, stolee, gitster, zhiyou.jx,
	jonathantanmy

On 3/7/2022 11:22 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Mon, Mar 07 2022, Derrick Stolee wrote:
> 
>> On 3/7/2022 10:38 AM, Ævar Arnfjörð Bjarmason wrote:
>>>
>>> On Wed, Feb 23 2022, Derrick Stolee via GitGitGadget wrote:
>>>
>>>> From: Derrick Stolee <derrickstolee@github.com>
>> ...
>>>> diff --git a/bundle.h b/bundle.h
>>>> index 06009fe6b1f..eb026153d56 100644
>>>> --- a/bundle.h
>>>> +++ b/bundle.h
>>>> @@ -5,11 +5,14 @@
>>>>  #include "cache.h"
>>>>  #include "string-list.h"
>>>>  
>>>> +struct list_objects_filter_options;
>>>> +
>>>
>>> For the other ones we include the relevant header, do the same here (or
>>> if there's a need to not do it, do we need it for the rest too?)
>>
>> The others are .c files that require looking into the struct. This
>> declaration is all that's required for this header file.
>>
>>>>  struct bundle_header {
>>>>  	unsigned version;
>>>>  	struct string_list prerequisites;
>>>>  	struct string_list references;
>>>>  	const struct git_hash_algo *hash_algo;
>>>> +	struct list_objects_filter_options *filter;
>>>>  };
>>>
>>> I haven't tried, but any reason this needs to be a *filter
>>> v.s. embedding it in the struct?
>>>
>>> Then we'd just need list_objects_filter_release() and not the free() as
>>> well.
>>>
>>> Is it because you're piggy-backing on "if (header->filter)" as "do we
>>> have it" state, better to check .nr?
>>
>> Yes. I replied to Junio before that there is some assumption in
>> the filtering code that the .nr == 0 case is listed as a BUG()
>> so we would possibly be breaking expectations in a different
>> way doing the embedded version.
> 
> Having an "unsigned int using_filter:1" or whatever IMO makes that much
> clearer than needing to carefully eyeball code that's already using
> embedded structs & see why the one exception that's malloc'd is because
> of that or some other reason...

I think your recommended "using_filter" is messy. Having this
struct be a pointer instead of embedded self-documents that it
is optional (and can be NULL) but that if it is non-NULL, then
it should be considered and valid.

Here, I'm focusing on not allowing a non-sensical state, such
as using_filter = 0 but filter is actually populated with a
valid filter. The possibility of this state means there is a
higher chance of introducing a bug over time by not keeping
these values coupled.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 02/11] revision: put object filter into struct rev_info
  2022-03-07 13:59     ` Derrick Stolee
@ 2022-03-07 16:46       ` Junio C Hamano
  0 siblings, 0 replies; 114+ messages in thread
From: Junio C Hamano @ 2022-03-07 16:46 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, stolee, avarab, zhiyou.jx,
	jonathantanmy

Derrick Stolee <derrickstolee@github.com> writes:

>> member so we need an on-demand allocation.  CALLOC_ARRAY() instead
>> of xcalloc(), when we know we are creating one element and not an
>> array of elements whose size happens to be one, is not wrong but it
>> does look strange.  Was there a reason why we avoid xcalloc()?
>
> I think I've been using CALLOC_ARRAY(..., 1) over "... = xcalloc()"
> for simplicity's sake for a while. I see quite a few across the
> codebase, too, but I can swap the usage here if you feel that is
> important.

Not at all.  I think it was just me who was confused; CALLOC_ARRAY()
vs xcalloc() is not confusing.  Both are capable of and meant for
allocating an array of these elements, and use of it for a single
element non-array is the same either way.  Nothing to complain about.

>> they may or may not be prepared to take.  This "we get rid of a
>> global struct and replace it with an on-demand allocated structure,
>> pointer to which is stored in the revs structure" rewrite somehow
>> makes me nervous.
>
> I think the main idea is that the filter being NULL indicates "no
> filter is used. Do a full object walk." If we use a static struct,
> then we need to instead use revs->filter.filter_spec.nr, but that
> is already being used as a BUG() statement:

Thanks.  My observation was primarily that it looked deviating from
the original code but that is not an objection unless the original
was without room for improvement.  And in fact in this case, I think
the global variable that was a static struct should have been a
global variable that is a pointer to a struct which is NULL unless
the filtering capability is being used.  So in that sense, the
conversion done in this series is an improvement and going in the
right direction.

> While we _could_ make that switch to using a static struct and
> change our checks to allow empty specs, that would be a more
> involved change. Maybe we can leave it for a follow up?

No, there is no need to.  A pointer that is NULL when unused is the
right thing to do.

Thanks.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 06/11] MyFirstObjectWalk: update recommended usage
  2022-03-07 14:05     ` Derrick Stolee
@ 2022-03-07 16:47       ` Junio C Hamano
  0 siblings, 0 replies; 114+ messages in thread
From: Junio C Hamano @ 2022-03-07 16:47 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, stolee, avarab, zhiyou.jx,
	jonathantanmy

Derrick Stolee <derrickstolee@github.com> writes:

> Definitely room for improvement here. I got hung up on the two uses of
> "it" so here is another attempt:
>
>  If its `filter` member is not `NULL`, then `filter` contains
>  information for how to filter the object list.
>
> What do you think?

Sure.  I won't get hung up on three uses of filter here ;-)

The above reads well.

Thanks.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 10/11] bundle: create filtered bundles
  2022-03-07 14:14     ` Derrick Stolee
@ 2022-03-07 16:49       ` Junio C Hamano
  0 siblings, 0 replies; 114+ messages in thread
From: Junio C Hamano @ 2022-03-07 16:49 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, stolee, avarab, zhiyou.jx,
	jonathantanmy

Derrick Stolee <derrickstolee@github.com> writes:

> Perhaps I was thinking about having one line per "reason", which might
> be extended in the future.

Makes sense.  It wasn't a serious objection but mostly a curiousity
question anyway.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 11/11] bundle: unbundle promisor packs
  2022-03-07 14:48     ` Derrick Stolee
@ 2022-03-07 16:56       ` Junio C Hamano
  2022-03-07 18:57         ` Derrick Stolee
  0 siblings, 1 reply; 114+ messages in thread
From: Junio C Hamano @ 2022-03-07 16:56 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, stolee, avarab, zhiyou.jx,
	jonathantanmy

Derrick Stolee <derrickstolee@github.com> writes:

> Of course, looking closer at it... "git bundle unbundle" doesn't
> actually store the refs directly in the refspace, but instead only
> outputs the refs that it used.

True.  I was more thinking about equivalence between

    cd src_repo
    git clone --no-local --filter=... . ../partial.network.cloned
    git bundle create --filter=... partial.bndl
    git clone partial.bndl ../partial.bundle.cloned

The two resulting repositories should look very similar except for
that the remote.origin.* of the former would expect that it pushes
back to where it was cloned from, while the latter would not.

> +		git init unbundled &&
> +		(
> +			cd unbundled &&
> +			# This creates the first pack-file in the
> +			# .git/objects/pack directory. Look for a .promisor.
> +			git bundle unbundle ../partial.bdl >ref-list.txt &&
> +			ls .git/objects/pack/pack-*.promisor >promisor &&
> +			test_line_count = 1 promisor

And can we enumerate the objects we have in .git/objects, both loose
and packed?

> +		) &&
> +
> +		git clone --filter=blob:none --mirror "file://$(pwd)" cloned &&
> +		git -C cloned for-each-ref \
> +			--format="%(objectname) %(refname)" >cloned-refs.txt &&
> +		echo "$(git -C cloned rev-parse HEAD) HEAD" >>cloned-refs.txt &&
> +		test_cmp cloned-refs.txt unbundled/ref-list.txt

Likewise here?  I think the two should match, and that was what I
was wondering if we should enforce.

>  	'
>  done
>  
> --- >8 ---
>
> I also attempted doing a "git clone --bare partial.bdl unbundled.git" to
> get the 'git clone' command to actually place the refs. However, 'git clone'
> does not set up the repository filter based on the bundle, so it reports
> missing blobs (even though there is no checkout).

Understandable, as cloning from a bundle, if I recall correctly, was
done as yet another special case in "git clone", differently from
the main "over the network" code path.  And from end-user's point of
view, I think updating it is part of introducing the filtered
bundle.

Thanks.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 11/11] bundle: unbundle promisor packs
  2022-03-07 16:56       ` Junio C Hamano
@ 2022-03-07 18:57         ` Derrick Stolee
  2022-03-07 19:40           ` Junio C Hamano
  0 siblings, 1 reply; 114+ messages in thread
From: Derrick Stolee @ 2022-03-07 18:57 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee via GitGitGadget, git, stolee, avarab, zhiyou.jx,
	jonathantanmy

On 3/7/2022 11:56 AM, Junio C Hamano wrote:
> Derrick Stolee <derrickstolee@github.com> writes:
> 
>> Of course, looking closer at it... "git bundle unbundle" doesn't
>> actually store the refs directly in the refspace, but instead only
>> outputs the refs that it used.
> 
> True.  I was more thinking about equivalence between
> 
>     cd src_repo
>     git clone --no-local --filter=... . ../partial.network.cloned
>     git bundle create --filter=... partial.bndl
>     git clone partial.bndl ../partial.bundle.cloned
> 
> The two resulting repositories should look very similar except for
> that the remote.origin.* of the former would expect that it pushes
> back to where it was cloned from, while the latter would not.

Makes sense.

The one downside is that you list cloning form a partial bundle,
but that currently does not work, even if we avoid a checkout.
It fails because the clone command is not parsing the filter
and properly setting repo-global promisor information. (Again,
this is a bigger change to make this possible.)

I also had some struggles getting this to work since local clones
were actually ignoring the filter. I didn't think it was worth
setting up an HTTP or SSH server just for this test. See
workaround below.
 
>> +		git init unbundled &&
>> +		(
>> +			cd unbundled &&
>> +			# This creates the first pack-file in the
>> +			# .git/objects/pack directory. Look for a .promisor.
>> +			git bundle unbundle ../partial.bdl >ref-list.txt &&
>> +			ls .git/objects/pack/pack-*.promisor >promisor &&
>> +			test_line_count = 1 promisor
> 
> And can we enumerate the objects we have in .git/objects, both loose
> and packed?

I can enumerate using 'git rev-list --objects' to compare the
unbundled set to the full clone (adding --filter=$filter to the
full clone's run and --missing=allow-any to the unbundled one).

>> +		) &&
>> +
>> +		git clone --filter=blob:none --mirror "file://$(pwd)" cloned &&
>> +		git -C cloned for-each-ref \
>> +			--format="%(objectname) %(refname)" >cloned-refs.txt &&
>> +		echo "$(git -C cloned rev-parse HEAD) HEAD" >>cloned-refs.txt &&
>> +		test_cmp cloned-refs.txt unbundled/ref-list.txt
> 
> Likewise here?  I think the two should match, and that was what I
> was wondering if we should enforce.
> 
>>  	'
>>  done
>>  
>> --- >8 ---
>>
>> I also attempted doing a "git clone --bare partial.bdl unbundled.git" to
>> get the 'git clone' command to actually place the refs. However, 'git clone'
>> does not set up the repository filter based on the bundle, so it reports
>> missing blobs (even though there is no checkout).
> 
> Understandable, as cloning from a bundle, if I recall correctly, was
> done as yet another special case in "git clone", differently from
> the main "over the network" code path.  And from end-user's point of
> view, I think updating it is part of introducing the filtered
> bundle.

The reason I did not include that here is because of the lack of
repository-global promisor/filter config. I do want to loop back
and make those updates, but perhaps for this series we should add
an error condition into 'git clone' to say "Cannot currently clone
from a filtered bundle" to help users understand the issue?

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 11/11] bundle: unbundle promisor packs
  2022-03-07 18:57         ` Derrick Stolee
@ 2022-03-07 19:40           ` Junio C Hamano
  2022-03-07 19:49             ` Derrick Stolee
  0 siblings, 1 reply; 114+ messages in thread
From: Junio C Hamano @ 2022-03-07 19:40 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, stolee, avarab, zhiyou.jx,
	jonathantanmy

Derrick Stolee <derrickstolee@github.com> writes:

> I also had some struggles getting this to work since local clones
> were actually ignoring the filter. I didn't think it was worth
> setting up an HTTP or SSH server just for this test.

Does "clone --no-local $path" work as a workaround?  It should do
the same thing as "ssh" codepath except that it uses "sh" instead as
the process on the other side is running locally.

>>> I also attempted doing a "git clone --bare partial.bdl unbundled.git" to
>>> get the 'git clone' command to actually place the refs. However, 'git clone'
>>> does not set up the repository filter based on the bundle, so it reports
>>> missing blobs (even though there is no checkout).
>> 
>> Understandable, as cloning from a bundle, if I recall correctly, was
>> done as yet another special case in "git clone", differently from
>> the main "over the network" code path.  And from end-user's point of
>> view, I think updating it is part of introducing the filtered
>> bundle.
>
> The reason I did not include that here is because of the lack of
> repository-global promisor/filter config. I do want to loop back
> and make those updates, but perhaps for this series we should add
> an error condition into 'git clone' to say "Cannot currently clone
> from a filtered bundle" to help users understand the issue?

It would be a workable stepwise solution, I would think.  It is not
like we are robbing an existing feature from users---it merely is
that the support of partial cloning over different "transport" is
uneven, which is to be expected, especially in earlier phase of
introducing a new feature.

Thanks.


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 11/11] bundle: unbundle promisor packs
  2022-03-07 19:40           ` Junio C Hamano
@ 2022-03-07 19:49             ` Derrick Stolee
  2022-03-07 19:54               ` Junio C Hamano
  0 siblings, 1 reply; 114+ messages in thread
From: Derrick Stolee @ 2022-03-07 19:49 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee via GitGitGadget, git, stolee, avarab, zhiyou.jx,
	jonathantanmy

On 3/7/2022 2:40 PM, Junio C Hamano wrote:
> Derrick Stolee <derrickstolee@github.com> writes:
> 
>> I also had some struggles getting this to work since local clones
>> were actually ignoring the filter. I didn't think it was worth
>> setting up an HTTP or SSH server just for this test.
> 
> Does "clone --no-local $path" work as a workaround?  It should do
> the same thing as "ssh" codepath except that it uses "sh" instead as
> the process on the other side is running locally.

I've been trying this:

 git clone --no-local --filter=$filter "file://$(pwd)" cloned &&

which "succeeds" with this in the stderr:

 warning: filtering not recognized by server, ignoring

>>>> I also attempted doing a "git clone --bare partial.bdl unbundled.git" to
>>>> get the 'git clone' command to actually place the refs. However, 'git clone'
>>>> does not set up the repository filter based on the bundle, so it reports
>>>> missing blobs (even though there is no checkout).
>>>
>>> Understandable, as cloning from a bundle, if I recall correctly, was
>>> done as yet another special case in "git clone", differently from
>>> the main "over the network" code path.  And from end-user's point of
>>> view, I think updating it is part of introducing the filtered
>>> bundle.
>>
>> The reason I did not include that here is because of the lack of
>> repository-global promisor/filter config. I do want to loop back
>> and make those updates, but perhaps for this series we should add
>> an error condition into 'git clone' to say "Cannot currently clone
>> from a filtered bundle" to help users understand the issue?
> 
> It would be a workable stepwise solution, I would think.  It is not
> like we are robbing an existing feature from users---it merely is
> that the support of partial cloning over different "transport" is
> uneven, which is to be expected, especially in earlier phase of
> introducing a new feature.

That was my understanding, too.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 11/11] bundle: unbundle promisor packs
  2022-03-07 19:49             ` Derrick Stolee
@ 2022-03-07 19:54               ` Junio C Hamano
  2022-03-07 20:20                 ` Derrick Stolee
  0 siblings, 1 reply; 114+ messages in thread
From: Junio C Hamano @ 2022-03-07 19:54 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, stolee, avarab, zhiyou.jx,
	jonathantanmy

Derrick Stolee <derrickstolee@github.com> writes:

> I've been trying this:
>
>  git clone --no-local --filter=$filter "file://$(pwd)" cloned &&
>
> which "succeeds" with this in the stderr:
>
>  warning: filtering not recognized by server, ignoring

Hmph, and we won't see it when going over ssh to the same
repository?  That is puzzling.

>>> an error condition into 'git clone' to say "Cannot currently clone
>>> from a filtered bundle" to help users understand the issue?
>> 
>> It would be a workable stepwise solution, I would think.  It is not
>> like we are robbing an existing feature from users---it merely is
>> that the support of partial cloning over different "transport" is
>> uneven, which is to be expected, especially in earlier phase of
>> introducing a new feature.
>
> That was my understanding, too.

Good to see us on the same page.  Thanks.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 11/11] bundle: unbundle promisor packs
  2022-03-07 19:54               ` Junio C Hamano
@ 2022-03-07 20:20                 ` Derrick Stolee
  2022-03-07 21:35                   ` Junio C Hamano
  0 siblings, 1 reply; 114+ messages in thread
From: Derrick Stolee @ 2022-03-07 20:20 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee via GitGitGadget, git, stolee, avarab, zhiyou.jx,
	jonathantanmy

On 3/7/2022 2:54 PM, Junio C Hamano wrote:
> Derrick Stolee <derrickstolee@github.com> writes:
> 
>> I've been trying this:
>>
>>  git clone --no-local --filter=$filter "file://$(pwd)" cloned &&
>>
>> which "succeeds" with this in the stderr:
>>
>>  warning: filtering not recognized by server, ignoring
> 
> Hmph, and we won't see it when going over ssh to the same
> repository?  That is puzzling.

Of course, it's not an issue with file://, but an issue with the
defaults. In order to test partial clones, I need to enable these
config options in the server repo:

		test_config uploadpack.allowfilter 1 &&
		test_config uploadpack.allowanysha1inwant 1 &&

Sorry for taking so long to realize this.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 11/11] bundle: unbundle promisor packs
  2022-03-07 20:20                 ` Derrick Stolee
@ 2022-03-07 21:35                   ` Junio C Hamano
  0 siblings, 0 replies; 114+ messages in thread
From: Junio C Hamano @ 2022-03-07 21:35 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, stolee, avarab, zhiyou.jx,
	jonathantanmy

Derrick Stolee <derrickstolee@github.com> writes:

>> Hmph, and we won't see it when going over ssh to the same
>> repository?  That is puzzling.
>
> Of course, it's not an issue with file://, but an issue with the
> defaults. In order to test partial clones, I need to enable these
> config options in the server repo:
>
> 		test_config uploadpack.allowfilter 1 &&
> 		test_config uploadpack.allowanysha1inwant 1 &&
>
> Sorry for taking so long to realize this.

No, thanks for finding that out---the fact that I left it at
"puzzling" without offering that is a strong evidence that I didn't
think of it, either ;-)


^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH v2 00/12] Partial bundles
  2022-02-23 17:55 [PATCH 00/11] Partial bundles Derrick Stolee via GitGitGadget
                   ` (13 preceding siblings ...)
  2022-03-07 14:55 ` Ævar Arnfjörð Bjarmason
@ 2022-03-07 21:50 ` Derrick Stolee via GitGitGadget
  2022-03-07 21:50   ` [PATCH v2 01/12] index-pack: document and test the --promisor option Derrick Stolee via GitGitGadget
                     ` (13 more replies)
  14 siblings, 14 replies; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-03-07 21:50 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Jeff Hostetler, Derrick Stolee

While discussing bundle-URIs [1], it came to my attention that bundles have
no way to specify an object filter, so bundles cannot be used with partial
clones.

[1]
https://lore.kernel.org/git/7fab28bf-54e7-d0e9-110a-53fad6244cc9@gmail.com/

This series provides a way to fix that by adding a 'filter' capability to
the bundle file format and allowing one to create a partial bundle with 'git
bundle create --filter=blob:none '.

There are a couple things that I want to point out about this implementation
that could use some high-level feedback:

 1. I moved the '--filter' parsing into setup_revisions() instead of adding
    another place to parse it. This works for 'git bundle' but it also
    allows it to be parsed successfully in commands such as 'git diff' which
    doesn't make sense. Options such as '--objects' are already being parsed
    there, and they don't make sense either, so I want some thoughts on
    this.

 2. If someone uses 'git clone partial.bdl partial' where 'partial.bdl' is a
    filtered bundle, then the clone will fail with a message such as

fatal: missing blob object '9444604d515c0b162e37e59accd54a0bac50ed2e' fatal:
remote did not send all necessary objects

This might be fine. We don't expect users to clone partial bundles or fetch
partial bundles into an unfiltered repo and these failures are expected. It
is possible that we could put in custom logic to fail faster by reading the
bundle header for a filter.

Generally, the idea is to open this up as a potential way to bootstrap a
clone of a partial clone using a set of precomputed partial bundles.


Updates in v2
=============

Thanks for the reviews, Jeff, Junio, and Ævar!

 * Commit message typos and grammar are improved.

 * Grammar in MyFirstObjectWalk.txt is improved.

 * Unnecessary line wrapping is unwrapped.

 * Final test to check unbundled repo is made more rigorous.

 * The new 'filter' capability is added to
   Documentation/technical/bundle-format.txt

 * Expanded docs for 'git bundle verify'.

 * Moved API docs gently_parse_list_objects_filter() to header.

 * Test name swaps '' with "" to evaluate $filter.

 * Added a new patch that helps git clone <bundle> fail gracefully when
   <bundle> is has a filter capability.

Thanks, -Stolee

Derrick Stolee (12):
  index-pack: document and test the --promisor option
  revision: put object filter into struct rev_info
  pack-objects: use rev.filter when possible
  pack-bitmap: drop filter in prepare_bitmap_walk()
  list-objects: consolidate traverse_commit_list[_filtered]
  MyFirstObjectWalk: update recommended usage
  bundle: safely handle --objects option
  bundle: parse filter capability
  rev-list: move --filter parsing into revision.c
  bundle: create filtered bundles
  bundle: unbundle promisor packs
  clone: fail gracefully when cloning filtered bundle

 Documentation/MyFirstObjectWalk.txt       | 44 +++++-------
 Documentation/git-bundle.txt              |  4 +-
 Documentation/git-index-pack.txt          |  8 +++
 Documentation/technical/bundle-format.txt | 11 ++-
 builtin/clone.c                           | 13 ++++
 builtin/pack-objects.c                    |  9 +--
 builtin/rev-list.c                        | 29 ++------
 bundle.c                                  | 86 +++++++++++++++++++----
 bundle.h                                  |  3 +
 list-objects-filter-options.c             | 17 +----
 list-objects-filter-options.h             | 20 ++++++
 list-objects.c                            | 25 +++----
 list-objects.h                            | 12 +++-
 pack-bitmap.c                             | 24 +++----
 pack-bitmap.h                             |  2 -
 reachable.c                               |  2 +-
 revision.c                                | 11 +++
 revision.h                                |  4 ++
 t/t5300-pack-object.sh                    |  4 +-
 t/t6020-bundle-misc.sh                    | 74 +++++++++++++++++++
 20 files changed, 278 insertions(+), 124 deletions(-)


base-commit: 45fe28c951c3e70666ee4ef8379772851a8e4d32
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1159%2Fderrickstolee%2Fbundle%2Fpartial-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1159/derrickstolee/bundle/partial-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1159

Range-diff vs v1:

  1:  a1eb4dceb0b =  1:  a1eb4dceb0b index-pack: document and test the --promisor option
  2:  3a88c99d9bc =  2:  3a88c99d9bc revision: put object filter into struct rev_info
  3:  d5edb193229 =  3:  d5edb193229 pack-objects: use rev.filter when possible
  4:  888774f6f28 =  4:  888774f6f28 pack-bitmap: drop filter in prepare_bitmap_walk()
  5:  ec57ed5c37f !  5:  bcb76a065bf list-objects: consolidate traverse_commit_list[_filtered]
     @@ Commit message
          traverse_commit_list() is the presence of the 'omitted' parameter, which
          is only non-NULL for one caller. We can consolidate these two methods by
          having one call the other and use the simpler form everywhere the
     -    'omitted' paramter would be NULL.
     +    'omitted' parameter would be NULL.
      
          Signed-off-by: Derrick Stolee <derrickstolee@github.com>
      
  6:  355c503157a !  6:  efc03168818 MyFirstObjectWalk: update recommended usage
     @@ Documentation/MyFirstObjectWalk.txt: function shows that the all-object walk is
      -- `struct list_objects_filter_options *filter_options`: This is a struct which
      -  stores a filter-spec as outlined in `Documentation/rev-list-options.txt`.
      -- `struct rev_info *revs`: This is the `rev_info` used for the walk.
     -+- `struct rev_info *revs`: This is the `rev_info` used for the walk. It
     -+  includes a `filter` member which contains information for how to filter
     -+  the object list.
     ++- `struct rev_info *revs`: This is the `rev_info` used for the walk. If
     ++  its `filter` member is not `NULL`, then `filter` contains information for
     ++  how to filter the object list.
       - `show_commit_fn show_commit`: A callback which will be used to handle each
         individual commit object.
       - `show_object_fn show_object`: A callback which will be used to handle each
  7:  1476a9495c5 !  7:  19694d5b255 bundle: safely handle --objects option
     @@ Commit message
      
          By populating the callback we prevent a segfault in the case of adding
          the --objects flag. This is really a redundant statement because the
     -    bundles are constructing a pack-file containing all objects in the
     +    command is constructing a pack-file containing all objects in the
          discovered commit range.
      
          Adding --objects to a 'git bundle' command might cause a slower command,
  8:  e7dbb46e6ac !  8:  898a7d94513 bundle: parse filter capability
     @@ Commit message
          bundle header, and demonstrate this understanding by adding a message to
          'git bundle verify'.
      
     +    Since we will use gently_parse_list_objects_filter() outside of
     +    list-objects-filter-options.c, make it an external method and move its
     +    API documentation to before its declaration.
     +
          Signed-off-by: Derrick Stolee <derrickstolee@github.com>
      
     + ## Documentation/technical/bundle-format.txt ##
     +@@ Documentation/technical/bundle-format.txt: and the Git bundle v2 format cannot represent a shallow clone repository.
     + == Capabilities
     + 
     + Because there is no opportunity for negotiation, unknown capabilities cause 'git
     +-bundle' to abort.  The only known capability is `object-format`, which specifies
     +-the hash algorithm in use, and can take the same values as the
     +-`extensions.objectFormat` configuration value.
     ++bundle' to abort.
     ++
     ++* `object-format` specifies the hash algorithm in use, and can take the same
     ++  values as the `extensions.objectFormat` configuration value.
     ++
     ++* `filter` specifies an object filter as in the `--filter` option in
     ++  linkgit:git-rev-list[1]. The resulting pack-file must be marked as a
     ++  `.promisor` pack-file after it is unbundled.
     +
       ## bundle.c ##
      @@
       #include "run-command.h"
     @@ bundle.h
      
       ## list-objects-filter-options.c ##
      @@ list-objects-filter-options.c: const char *list_object_filter_config_name(enum list_objects_filter_choice c)
     -  * expand_list_objects_filter_spec() first).  We also "intern" the arg for the
     -  * convenience of the current command.
     -  */
     + 	BUG("list_object_filter_config_name: invalid argument '%d'", c);
     + }
     + 
     +-/*
     +- * Parse value of the argument to the "filter" keyword.
     +- * On the command line this looks like:
     +- *       --filter=<arg>
     +- * and in the pack protocol as:
     +- *       "filter" SP <arg>
     +- *
     +- * The filter keyword will be used by many commands.
     +- * See Documentation/rev-list-options.txt for allowed values for <arg>.
     +- *
     +- * Capture the given arg as the "filter_spec".  This can be forwarded to
     +- * subordinate commands when necessary (although it's better to pass it through
     +- * expand_list_objects_filter_spec() first).  We also "intern" the arg for the
     +- * convenience of the current command.
     +- */
      -static int gently_parse_list_objects_filter(
      +int gently_parse_list_objects_filter(
       	struct list_objects_filter_options *filter_options,
     @@ list-objects-filter-options.h: struct list_objects_filter_options {
       /* Normalized command line arguments */
       #define CL_ARG__FILTER "filter"
       
     ++/*
     ++ * Parse value of the argument to the "filter" keyword.
     ++ * On the command line this looks like:
     ++ *       --filter=<arg>
     ++ * and in the pack protocol as:
     ++ *       "filter" SP <arg>
     ++ *
     ++ * The filter keyword will be used by many commands.
     ++ * See Documentation/rev-list-options.txt for allowed values for <arg>.
     ++ *
     ++ * Capture the given arg as the "filter_spec".  This can be forwarded to
     ++ * subordinate commands when necessary (although it's better to pass it through
     ++ * expand_list_objects_filter_spec() first).  We also "intern" the arg for the
     ++ * convenience of the current command.
     ++ */
      +int gently_parse_list_objects_filter(
      +	struct list_objects_filter_options *filter_options,
      +	const char *arg,
  9:  22c4fe9d4bc =  9:  aaa15d7d512 rev-list: move --filter parsing into revision.c
 10:  5393e74708d ! 10:  82d93fc62e2 bundle: create filtered bundles
     @@ Commit message
      
          Signed-off-by: Derrick Stolee <derrickstolee@github.com>
      
     + ## Documentation/git-bundle.txt ##
     +@@ Documentation/git-bundle.txt: verify <file>::
     + 	cleanly to the current repository.  This includes checks on the
     + 	bundle format itself as well as checking that the prerequisite
     + 	commits exist and are fully linked in the current repository.
     +-	'git bundle' prints a list of missing commits, if any, and exits
     +-	with a non-zero status.
     ++	'git bundle' prints the bundle's object filter and its list of
     ++	missing commits, if any, and exits with a non-zero status.
     + 
     + list-heads <file>::
     + 	Lists the references defined in the bundle.  If followed by a
     +
       ## bundle.c ##
      @@ bundle.c: static int write_pack_data(int bundle_fd, struct rev_info *revs, struct strvec *
       		     "--stdout", "--thin", "--delta-base-offset",
     @@ bundle.c: int create_bundle(struct repository *r, const char *path,
      +	 *    SHA1.
      +	 * 2. @filter is required because we parsed an object filter.
      +	 */
     -+	if (the_hash_algo != &hash_algos[GIT_HASH_SHA1] ||
     -+	    revs.filter)
     ++	if (the_hash_algo != &hash_algos[GIT_HASH_SHA1] || revs.filter)
      +		min_version = 3;
      +
      +	if (argc > 1) {
     @@ t/t6020-bundle-misc.sh: test_expect_success 'unfiltered bundle with --objects' '
       
      +for filter in "blob:none" "tree:0" "tree:1" "blob:limit=100"
      +do
     -+	test_expect_success 'filtered bundle: $filter' '
     -+		test_when_finished rm -rf .git/objects/pack &&
     ++	test_expect_success "filtered bundle: $filter" '
     ++		test_when_finished rm -rf .git/objects/pack cloned unbundled &&
      +		git bundle create partial.bdl \
      +			--all \
      +			--filter=$filter &&
     @@ t/t6020-bundle-misc.sh: test_expect_success 'unfiltered bundle with --objects' '
      +		The bundle uses this filter: $filter
      +		The bundle records a complete history.
      +		EOF
     -+		test_cmp expect actual
     ++		test_cmp expect actual &&
     ++
     ++		test_config uploadpack.allowfilter 1 &&
     ++		test_config uploadpack.allowanysha1inwant 1 &&
     ++		git clone --no-local --filter=$filter --bare "file://$(pwd)" cloned &&
     ++
     ++		git init unbundled &&
     ++		git -C unbundled bundle unbundle ../partial.bdl >ref-list.txt &&
     ++
     ++		# Count the same number of reachable objects.
     ++		reflist=$(git for-each-ref --format="%(objectname)") &&
     ++		git rev-list --objects --filter=$filter --missing=allow-any \
     ++			$reflist >expect &&
     ++		for repo in cloned unbundled
     ++		do
     ++			git -C $repo rev-list --objects --missing=allow-any \
     ++				$reflist >actual &&
     ++			test_cmp expect actual || return 1
     ++		done
      +	'
      +done
      +
 11:  ec51d0a50e6 ! 11:  ef17691a6b7 bundle: unbundle promisor packs
     @@ bundle.c: int unbundle(struct repository *r, struct bundle_header *header,
      
       ## t/t6020-bundle-misc.sh ##
      @@ t/t6020-bundle-misc.sh: do
     - 		The bundle uses this filter: $filter
     - 		The bundle records a complete history.
     - 		EOF
     --		test_cmp expect actual
     -+		test_cmp expect actual &&
     -+
     -+		# This creates the first pack-file in the
     -+		# .git/objects/pack directory. Look for a .promisor.
     -+		git bundle unbundle partial.bdl &&
     -+		ls .git/objects/pack/pack-*.promisor >promisor &&
     -+		test_line_count = 1 promisor
     - 	'
     - done
       
     + 		git init unbundled &&
     + 		git -C unbundled bundle unbundle ../partial.bdl >ref-list.txt &&
     ++		ls unbundled/.git/objects/pack/pack-*.promisor >promisor &&
     ++		test_line_count = 1 promisor &&
     + 
     + 		# Count the same number of reachable objects.
     + 		reflist=$(git for-each-ref --format="%(objectname)") &&
  -:  ----------- > 12:  382b9502f6b clone: fail gracefully when cloning filtered bundle

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH v2 01/12] index-pack: document and test the --promisor option
  2022-03-07 21:50 ` [PATCH v2 00/12] " Derrick Stolee via GitGitGadget
@ 2022-03-07 21:50   ` Derrick Stolee via GitGitGadget
  2022-03-07 21:50   ` [PATCH v2 02/12] revision: put object filter into struct rev_info Derrick Stolee via GitGitGadget
                     ` (12 subsequent siblings)
  13 siblings, 0 replies; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-03-07 21:50 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Jeff Hostetler, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The --promisor option of 'git index-pack' was created in 88e2f9e
(introduce fetch-object: fetch one promisor object, 2017-12-05) but was
untested. It is currently unused within the Git codebase, but that will
change in an upcoming change to 'git bundle unbundle' when there is a
filter capability.

For now, add documentation about the option and add a test to ensure it
is working as expected.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/git-index-pack.txt | 8 ++++++++
 t/t5300-pack-object.sh           | 4 +++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-index-pack.txt b/Documentation/git-index-pack.txt
index 1f1e3592251..4e71c256ecb 100644
--- a/Documentation/git-index-pack.txt
+++ b/Documentation/git-index-pack.txt
@@ -122,6 +122,14 @@ This option cannot be used with --stdin.
 +
 include::object-format-disclaimer.txt[]
 
+--promisor[=<message>]::
+	Before committing the pack-index, create a .promisor file for this
+	pack. Particularly helpful when writing a promisor pack with --fix-thin
+	since the name of the pack is not final until the pack has been fully
+	written. If a `<message>` is provided, then that content will be
+	written to the .promisor file for future reference. See
+	link:technical/partial-clone.html[partial clone] for more information.
+
 NOTES
 -----
 
diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index 2fd845187e7..a11d61206ad 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -315,8 +315,10 @@ test_expect_success \
      git index-pack -o tmp.idx test-3.pack &&
      cmp tmp.idx test-1-${packname_1}.idx &&
 
-     git index-pack test-3.pack &&
+     git index-pack --promisor=message test-3.pack &&
      cmp test-3.idx test-1-${packname_1}.idx &&
+     echo message >expect &&
+     test_cmp expect test-3.promisor &&
 
      cat test-2-${packname_2}.pack >test-3.pack &&
      git index-pack -o tmp.idx test-2-${packname_2}.pack &&
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v2 02/12] revision: put object filter into struct rev_info
  2022-03-07 21:50 ` [PATCH v2 00/12] " Derrick Stolee via GitGitGadget
  2022-03-07 21:50   ` [PATCH v2 01/12] index-pack: document and test the --promisor option Derrick Stolee via GitGitGadget
@ 2022-03-07 21:50   ` Derrick Stolee via GitGitGadget
  2022-03-07 21:50   ` [PATCH v2 03/12] pack-objects: use rev.filter when possible Derrick Stolee via GitGitGadget
                     ` (11 subsequent siblings)
  13 siblings, 0 replies; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-03-07 21:50 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Jeff Hostetler, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

Placing a 'struct list_objects_filter_options' pointer within 'struct
rev_info' will assist making some bookkeeping around object filters in
the future.

For now, let's use this new member to remove a static global instance of
the struct from builtin/rev-list.c.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/rev-list.c | 30 ++++++++++++++++--------------
 revision.h         |  4 ++++
 2 files changed, 20 insertions(+), 14 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 777558e9b06..6f2b91d304e 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -62,7 +62,6 @@ static const char rev_list_usage[] =
 static struct progress *progress;
 static unsigned progress_counter;
 
-static struct list_objects_filter_options filter_options;
 static struct oidset omitted_objects;
 static int arg_print_omitted; /* print objects omitted by filter */
 
@@ -400,7 +399,6 @@ static inline int parse_missing_action_value(const char *value)
 }
 
 static int try_bitmap_count(struct rev_info *revs,
-			    struct list_objects_filter_options *filter,
 			    int filter_provided_objects)
 {
 	uint32_t commit_count = 0,
@@ -436,7 +434,8 @@ static int try_bitmap_count(struct rev_info *revs,
 	 */
 	max_count = revs->max_count;
 
-	bitmap_git = prepare_bitmap_walk(revs, filter, filter_provided_objects);
+	bitmap_git = prepare_bitmap_walk(revs, revs->filter,
+					 filter_provided_objects);
 	if (!bitmap_git)
 		return -1;
 
@@ -453,7 +452,6 @@ static int try_bitmap_count(struct rev_info *revs,
 }
 
 static int try_bitmap_traversal(struct rev_info *revs,
-				struct list_objects_filter_options *filter,
 				int filter_provided_objects)
 {
 	struct bitmap_index *bitmap_git;
@@ -465,7 +463,8 @@ static int try_bitmap_traversal(struct rev_info *revs,
 	if (revs->max_count >= 0)
 		return -1;
 
-	bitmap_git = prepare_bitmap_walk(revs, filter, filter_provided_objects);
+	bitmap_git = prepare_bitmap_walk(revs, revs->filter,
+					 filter_provided_objects);
 	if (!bitmap_git)
 		return -1;
 
@@ -475,7 +474,6 @@ static int try_bitmap_traversal(struct rev_info *revs,
 }
 
 static int try_bitmap_disk_usage(struct rev_info *revs,
-				 struct list_objects_filter_options *filter,
 				 int filter_provided_objects)
 {
 	struct bitmap_index *bitmap_git;
@@ -483,7 +481,7 @@ static int try_bitmap_disk_usage(struct rev_info *revs,
 	if (!show_disk_usage)
 		return -1;
 
-	bitmap_git = prepare_bitmap_walk(revs, filter, filter_provided_objects);
+	bitmap_git = prepare_bitmap_walk(revs, revs->filter, filter_provided_objects);
 	if (!bitmap_git)
 		return -1;
 
@@ -597,13 +595,17 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 		}
 
 		if (skip_prefix(arg, ("--" CL_ARG__FILTER "="), &arg)) {
-			parse_list_objects_filter(&filter_options, arg);
-			if (filter_options.choice && !revs.blob_objects)
+			if (!revs.filter)
+				CALLOC_ARRAY(revs.filter, 1);
+			parse_list_objects_filter(revs.filter, arg);
+			if (revs.filter->choice && !revs.blob_objects)
 				die(_("object filtering requires --objects"));
 			continue;
 		}
 		if (!strcmp(arg, ("--no-" CL_ARG__FILTER))) {
-			list_objects_filter_set_no_filter(&filter_options);
+			if (!revs.filter)
+				CALLOC_ARRAY(revs.filter, 1);
+			list_objects_filter_set_no_filter(revs.filter);
 			continue;
 		}
 		if (!strcmp(arg, "--filter-provided-objects")) {
@@ -688,11 +690,11 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 		progress = start_delayed_progress(show_progress, 0);
 
 	if (use_bitmap_index) {
-		if (!try_bitmap_count(&revs, &filter_options, filter_provided_objects))
+		if (!try_bitmap_count(&revs, filter_provided_objects))
 			return 0;
-		if (!try_bitmap_disk_usage(&revs, &filter_options, filter_provided_objects))
+		if (!try_bitmap_disk_usage(&revs, filter_provided_objects))
 			return 0;
-		if (!try_bitmap_traversal(&revs, &filter_options, filter_provided_objects))
+		if (!try_bitmap_traversal(&revs, filter_provided_objects))
 			return 0;
 	}
 
@@ -733,7 +735,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 		oidset_init(&missing_objects, DEFAULT_OIDSET_SIZE);
 
 	traverse_commit_list_filtered(
-		&filter_options, &revs, show_commit, show_object, &info,
+		revs.filter, &revs, show_commit, show_object, &info,
 		(arg_print_omitted ? &omitted_objects : NULL));
 
 	if (arg_print_omitted) {
diff --git a/revision.h b/revision.h
index 3c58c18c63a..1ddb73ab82e 100644
--- a/revision.h
+++ b/revision.h
@@ -81,6 +81,7 @@ struct rev_cmdline_info {
 
 struct oidset;
 struct topo_walk_info;
+struct list_objects_filter_options;
 
 struct rev_info {
 	/* Starting list */
@@ -94,6 +95,9 @@ struct rev_info {
 	/* The end-points specified by the end user */
 	struct rev_cmdline_info cmdline;
 
+	/* Object filter options. NULL for no filtering. */
+	struct list_objects_filter_options *filter;
+
 	/* excluding from --branches, --refs, etc. expansion */
 	struct string_list *ref_excludes;
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v2 03/12] pack-objects: use rev.filter when possible
  2022-03-07 21:50 ` [PATCH v2 00/12] " Derrick Stolee via GitGitGadget
  2022-03-07 21:50   ` [PATCH v2 01/12] index-pack: document and test the --promisor option Derrick Stolee via GitGitGadget
  2022-03-07 21:50   ` [PATCH v2 02/12] revision: put object filter into struct rev_info Derrick Stolee via GitGitGadget
@ 2022-03-07 21:50   ` Derrick Stolee via GitGitGadget
  2022-03-07 21:50   ` [PATCH v2 04/12] pack-bitmap: drop filter in prepare_bitmap_walk() Derrick Stolee via GitGitGadget
                     ` (10 subsequent siblings)
  13 siblings, 0 replies; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-03-07 21:50 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Jeff Hostetler, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

In builtin/pack-objects.c, we use a 'filter_options' global to populate
the --filter=<X> argument. The previous change created a pointer to a
filter option in 'struct rev_info', so we can use that pointer here as a
start to simplifying some usage of object filters.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/pack-objects.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index ba2006f2212..256d9b1798f 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3651,7 +3651,7 @@ static int pack_options_allow_reuse(void)
 
 static int get_object_list_from_bitmap(struct rev_info *revs)
 {
-	if (!(bitmap_git = prepare_bitmap_walk(revs, &filter_options, 0)))
+	if (!(bitmap_git = prepare_bitmap_walk(revs, revs->filter, 0)))
 		return -1;
 
 	if (pack_options_allow_reuse() &&
@@ -3727,6 +3727,7 @@ static void get_object_list(int ac, const char **av)
 	repo_init_revisions(the_repository, &revs, NULL);
 	save_commit_buffer = 0;
 	setup_revisions(ac, av, &revs, &s_r_opt);
+	revs.filter = &filter_options;
 
 	/* make sure shallows are read */
 	is_repository_shallow(the_repository);
@@ -3777,7 +3778,7 @@ static void get_object_list(int ac, const char **av)
 
 	if (!fn_show_object)
 		fn_show_object = show_object;
-	traverse_commit_list_filtered(&filter_options, &revs,
+	traverse_commit_list_filtered(revs.filter, &revs,
 				      show_commit, fn_show_object, NULL,
 				      NULL);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v2 04/12] pack-bitmap: drop filter in prepare_bitmap_walk()
  2022-03-07 21:50 ` [PATCH v2 00/12] " Derrick Stolee via GitGitGadget
                     ` (2 preceding siblings ...)
  2022-03-07 21:50   ` [PATCH v2 03/12] pack-objects: use rev.filter when possible Derrick Stolee via GitGitGadget
@ 2022-03-07 21:50   ` Derrick Stolee via GitGitGadget
  2022-03-07 21:50   ` [PATCH v2 05/12] list-objects: consolidate traverse_commit_list[_filtered] Derrick Stolee via GitGitGadget
                     ` (9 subsequent siblings)
  13 siblings, 0 replies; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-03-07 21:50 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Jeff Hostetler, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

Now that all consumers of prepare_bitmap_walk() have populated the
'filter' member of 'struct rev_info', we can drop that extra parameter
from the method and access it directly from the 'struct rev_info'.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/pack-objects.c |  2 +-
 builtin/rev-list.c     |  8 +++-----
 pack-bitmap.c          | 20 +++++++++-----------
 pack-bitmap.h          |  2 --
 reachable.c            |  2 +-
 5 files changed, 14 insertions(+), 20 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 256d9b1798f..57f2cf49696 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3651,7 +3651,7 @@ static int pack_options_allow_reuse(void)
 
 static int get_object_list_from_bitmap(struct rev_info *revs)
 {
-	if (!(bitmap_git = prepare_bitmap_walk(revs, revs->filter, 0)))
+	if (!(bitmap_git = prepare_bitmap_walk(revs, 0)))
 		return -1;
 
 	if (pack_options_allow_reuse() &&
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 6f2b91d304e..556e78aebb9 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -434,8 +434,7 @@ static int try_bitmap_count(struct rev_info *revs,
 	 */
 	max_count = revs->max_count;
 
-	bitmap_git = prepare_bitmap_walk(revs, revs->filter,
-					 filter_provided_objects);
+	bitmap_git = prepare_bitmap_walk(revs, filter_provided_objects);
 	if (!bitmap_git)
 		return -1;
 
@@ -463,8 +462,7 @@ static int try_bitmap_traversal(struct rev_info *revs,
 	if (revs->max_count >= 0)
 		return -1;
 
-	bitmap_git = prepare_bitmap_walk(revs, revs->filter,
-					 filter_provided_objects);
+	bitmap_git = prepare_bitmap_walk(revs, filter_provided_objects);
 	if (!bitmap_git)
 		return -1;
 
@@ -481,7 +479,7 @@ static int try_bitmap_disk_usage(struct rev_info *revs,
 	if (!show_disk_usage)
 		return -1;
 
-	bitmap_git = prepare_bitmap_walk(revs, revs->filter, filter_provided_objects);
+	bitmap_git = prepare_bitmap_walk(revs, filter_provided_objects);
 	if (!bitmap_git)
 		return -1;
 
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 9c666cdb8bd..613f2797cdf 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -739,8 +739,7 @@ static int add_commit_to_bitmap(struct bitmap_index *bitmap_git,
 static struct bitmap *find_objects(struct bitmap_index *bitmap_git,
 				   struct rev_info *revs,
 				   struct object_list *roots,
-				   struct bitmap *seen,
-				   struct list_objects_filter_options *filter)
+				   struct bitmap *seen)
 {
 	struct bitmap *base = NULL;
 	int needs_walk = 0;
@@ -823,7 +822,7 @@ static struct bitmap *find_objects(struct bitmap_index *bitmap_git,
 		show_data.bitmap_git = bitmap_git;
 		show_data.base = base;
 
-		traverse_commit_list_filtered(filter, revs,
+		traverse_commit_list_filtered(revs->filter, revs,
 					      show_commit, show_object,
 					      &show_data, NULL);
 
@@ -1219,7 +1218,6 @@ static int can_filter_bitmap(struct list_objects_filter_options *filter)
 }
 
 struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
-					 struct list_objects_filter_options *filter,
 					 int filter_provided_objects)
 {
 	unsigned int i;
@@ -1240,7 +1238,7 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 	if (revs->prune)
 		return NULL;
 
-	if (!can_filter_bitmap(filter))
+	if (!can_filter_bitmap(revs->filter))
 		return NULL;
 
 	/* try to open a bitmapped pack, but don't parse it yet
@@ -1297,8 +1295,7 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 
 	if (haves) {
 		revs->ignore_missing_links = 1;
-		haves_bitmap = find_objects(bitmap_git, revs, haves, NULL,
-					    filter);
+		haves_bitmap = find_objects(bitmap_git, revs, haves, NULL);
 		reset_revision_walk();
 		revs->ignore_missing_links = 0;
 
@@ -1306,8 +1303,7 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 			BUG("failed to perform bitmap walk");
 	}
 
-	wants_bitmap = find_objects(bitmap_git, revs, wants, haves_bitmap,
-				    filter);
+	wants_bitmap = find_objects(bitmap_git, revs, wants, haves_bitmap);
 
 	if (!wants_bitmap)
 		BUG("failed to perform bitmap walk");
@@ -1315,8 +1311,10 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 	if (haves_bitmap)
 		bitmap_and_not(wants_bitmap, haves_bitmap);
 
-	filter_bitmap(bitmap_git, (filter && filter_provided_objects) ? NULL : wants,
-		      wants_bitmap, filter);
+	filter_bitmap(bitmap_git,
+		      (revs->filter && filter_provided_objects) ? NULL : wants,
+		      wants_bitmap,
+		      revs->filter);
 
 	bitmap_git->result = wants_bitmap;
 	bitmap_git->haves = haves_bitmap;
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 19a63fa1abc..3d3ddd77345 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -10,7 +10,6 @@
 struct commit;
 struct repository;
 struct rev_info;
-struct list_objects_filter_options;
 
 static const char BITMAP_IDX_SIGNATURE[] = {'B', 'I', 'T', 'M'};
 
@@ -54,7 +53,6 @@ void test_bitmap_walk(struct rev_info *revs);
 int test_bitmap_commits(struct repository *r);
 int test_bitmap_hashes(struct repository *r);
 struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
-					 struct list_objects_filter_options *filter,
 					 int filter_provided_objects);
 uint32_t midx_preferred_pack(struct bitmap_index *bitmap_git);
 int reuse_partial_packfile_from_bitmap(struct bitmap_index *,
diff --git a/reachable.c b/reachable.c
index 84e3d0d75ed..b9f4ad886ef 100644
--- a/reachable.c
+++ b/reachable.c
@@ -205,7 +205,7 @@ void mark_reachable_objects(struct rev_info *revs, int mark_reflog,
 	cp.progress = progress;
 	cp.count = 0;
 
-	bitmap_git = prepare_bitmap_walk(revs, NULL, 0);
+	bitmap_git = prepare_bitmap_walk(revs, 0);
 	if (bitmap_git) {
 		traverse_bitmap_commit_list(bitmap_git, revs, mark_object_seen);
 		free_bitmap_index(bitmap_git);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v2 05/12] list-objects: consolidate traverse_commit_list[_filtered]
  2022-03-07 21:50 ` [PATCH v2 00/12] " Derrick Stolee via GitGitGadget
                     ` (3 preceding siblings ...)
  2022-03-07 21:50   ` [PATCH v2 04/12] pack-bitmap: drop filter in prepare_bitmap_walk() Derrick Stolee via GitGitGadget
@ 2022-03-07 21:50   ` Derrick Stolee via GitGitGadget
  2022-03-07 21:50   ` [PATCH v2 06/12] MyFirstObjectWalk: update recommended usage Derrick Stolee via GitGitGadget
                     ` (8 subsequent siblings)
  13 siblings, 0 replies; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-03-07 21:50 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Jeff Hostetler, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

Now that all consumers of traverse_commit_list_filtered() populate the
'filter' member of 'struct rev_info', we can drop that parameter from
the method prototype to simplify things. In addition, the only thing
different now between traverse_commit_list_filtered() and
traverse_commit_list() is the presence of the 'omitted' parameter, which
is only non-NULL for one caller. We can consolidate these two methods by
having one call the other and use the simpler form everywhere the
'omitted' parameter would be NULL.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/pack-objects.c |  6 +++---
 builtin/rev-list.c     |  2 +-
 list-objects.c         | 25 ++++++++-----------------
 list-objects.h         | 12 ++++++++++--
 pack-bitmap.c          |  6 +++---
 5 files changed, 25 insertions(+), 26 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 57f2cf49696..0432ae1e499 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3778,9 +3778,9 @@ static void get_object_list(int ac, const char **av)
 
 	if (!fn_show_object)
 		fn_show_object = show_object;
-	traverse_commit_list_filtered(revs.filter, &revs,
-				      show_commit, fn_show_object, NULL,
-				      NULL);
+	traverse_commit_list(&revs,
+			     show_commit, fn_show_object,
+			     NULL);
 
 	if (unpack_unreachable_expiration) {
 		revs.ignore_missing_links = 1;
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 556e78aebb9..3ab727817fd 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -733,7 +733,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 		oidset_init(&missing_objects, DEFAULT_OIDSET_SIZE);
 
 	traverse_commit_list_filtered(
-		revs.filter, &revs, show_commit, show_object, &info,
+		&revs, show_commit, show_object, &info,
 		(arg_print_omitted ? &omitted_objects : NULL));
 
 	if (arg_print_omitted) {
diff --git a/list-objects.c b/list-objects.c
index 2f623f82115..9422625b39e 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -416,22 +416,7 @@ static void do_traverse(struct traversal_context *ctx)
 	strbuf_release(&csp);
 }
 
-void traverse_commit_list(struct rev_info *revs,
-			  show_commit_fn show_commit,
-			  show_object_fn show_object,
-			  void *show_data)
-{
-	struct traversal_context ctx;
-	ctx.revs = revs;
-	ctx.show_commit = show_commit;
-	ctx.show_object = show_object;
-	ctx.show_data = show_data;
-	ctx.filter = NULL;
-	do_traverse(&ctx);
-}
-
 void traverse_commit_list_filtered(
-	struct list_objects_filter_options *filter_options,
 	struct rev_info *revs,
 	show_commit_fn show_commit,
 	show_object_fn show_object,
@@ -444,7 +429,13 @@ void traverse_commit_list_filtered(
 	ctx.show_object = show_object;
 	ctx.show_commit = show_commit;
 	ctx.show_data = show_data;
-	ctx.filter = list_objects_filter__init(omitted, filter_options);
+	if (revs->filter)
+		ctx.filter = list_objects_filter__init(omitted, revs->filter);
+	else
+		ctx.filter = NULL;
+
 	do_traverse(&ctx);
-	list_objects_filter__free(ctx.filter);
+
+	if (ctx.filter)
+		list_objects_filter__free(ctx.filter);
 }
diff --git a/list-objects.h b/list-objects.h
index a952680e466..9eaf4de8449 100644
--- a/list-objects.h
+++ b/list-objects.h
@@ -7,7 +7,6 @@ struct rev_info;
 
 typedef void (*show_commit_fn)(struct commit *, void *);
 typedef void (*show_object_fn)(struct object *, const char *, void *);
-void traverse_commit_list(struct rev_info *, show_commit_fn, show_object_fn, void *);
 
 typedef void (*show_edge_fn)(struct commit *);
 void mark_edges_uninteresting(struct rev_info *revs,
@@ -18,11 +17,20 @@ struct oidset;
 struct list_objects_filter_options;
 
 void traverse_commit_list_filtered(
-	struct list_objects_filter_options *filter_options,
 	struct rev_info *revs,
 	show_commit_fn show_commit,
 	show_object_fn show_object,
 	void *show_data,
 	struct oidset *omitted);
 
+static inline void traverse_commit_list(
+	struct rev_info *revs,
+	show_commit_fn show_commit,
+	show_object_fn show_object,
+	void *show_data)
+{
+	traverse_commit_list_filtered(revs, show_commit,
+				      show_object, show_data, NULL);
+}
+
 #endif /* LIST_OBJECTS_H */
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 613f2797cdf..cbefaedbf43 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -822,9 +822,9 @@ static struct bitmap *find_objects(struct bitmap_index *bitmap_git,
 		show_data.bitmap_git = bitmap_git;
 		show_data.base = base;
 
-		traverse_commit_list_filtered(revs->filter, revs,
-					      show_commit, show_object,
-					      &show_data, NULL);
+		traverse_commit_list(revs,
+				     show_commit, show_object,
+				     &show_data);
 
 		revs->include_check = NULL;
 		revs->include_check_obj = NULL;
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v2 06/12] MyFirstObjectWalk: update recommended usage
  2022-03-07 21:50 ` [PATCH v2 00/12] " Derrick Stolee via GitGitGadget
                     ` (4 preceding siblings ...)
  2022-03-07 21:50   ` [PATCH v2 05/12] list-objects: consolidate traverse_commit_list[_filtered] Derrick Stolee via GitGitGadget
@ 2022-03-07 21:50   ` Derrick Stolee via GitGitGadget
  2022-03-07 21:50   ` [PATCH v2 07/12] bundle: safely handle --objects option Derrick Stolee via GitGitGadget
                     ` (7 subsequent siblings)
  13 siblings, 0 replies; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-03-07 21:50 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Jeff Hostetler, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The previous change consolidated traverse_commit_list() and
traverse_commit_list_filtered(). This allows us to simplify the
recommended usage in MyFirstObjectWalk.txt to use this new set of
values.

While here, add some clarification on the difference between the two
methods.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/MyFirstObjectWalk.txt | 44 +++++++++++------------------
 1 file changed, 16 insertions(+), 28 deletions(-)

diff --git a/Documentation/MyFirstObjectWalk.txt b/Documentation/MyFirstObjectWalk.txt
index ca267941f3e..8d9e85566e6 100644
--- a/Documentation/MyFirstObjectWalk.txt
+++ b/Documentation/MyFirstObjectWalk.txt
@@ -522,24 +522,25 @@ function shows that the all-object walk is being performed by
 `traverse_commit_list()` or `traverse_commit_list_filtered()`. Those two
 functions reside in `list-objects.c`; examining the source shows that, despite
 the name, these functions traverse all kinds of objects. Let's have a look at
-the arguments to `traverse_commit_list_filtered()`, which are a superset of the
-arguments to the unfiltered version.
+the arguments to `traverse_commit_list()`.
 
-- `struct list_objects_filter_options *filter_options`: This is a struct which
-  stores a filter-spec as outlined in `Documentation/rev-list-options.txt`.
-- `struct rev_info *revs`: This is the `rev_info` used for the walk.
+- `struct rev_info *revs`: This is the `rev_info` used for the walk. If
+  its `filter` member is not `NULL`, then `filter` contains information for
+  how to filter the object list.
 - `show_commit_fn show_commit`: A callback which will be used to handle each
   individual commit object.
 - `show_object_fn show_object`: A callback which will be used to handle each
   non-commit object (so each blob, tree, or tag).
 - `void *show_data`: A context buffer which is passed in turn to `show_commit`
   and `show_object`.
+
+In addition, `traverse_commit_list_filtered()` has an additional paramter:
+
 - `struct oidset *omitted`: A linked-list of object IDs which the provided
   filter caused to be omitted.
 
-It looks like this `traverse_commit_list_filtered()` uses callbacks we provide
-instead of needing us to call it repeatedly ourselves. Cool! Let's add the
-callbacks first.
+It looks like these methods use callbacks we provide instead of needing us
+to call it repeatedly ourselves. Cool! Let's add the callbacks first.
 
 For the sake of this tutorial, we'll simply keep track of how many of each kind
 of object we find. At file scope in `builtin/walken.c` add the following
@@ -712,20 +713,9 @@ help understand. In our case, that means we omit trees and blobs not directly
 referenced by `HEAD` or `HEAD`'s history, because we begin the walk with only
 `HEAD` in the `pending` list.)
 
-First, we'll need to `#include "list-objects-filter-options.h"` and set up the
-`struct list_objects_filter_options` at the top of the function.
-
-----
-static void walken_object_walk(struct rev_info *rev)
-{
-	struct list_objects_filter_options filter_options = { 0 };
-
-	...
-----
-
 For now, we are not going to track the omitted objects, so we'll replace those
 parameters with `NULL`. For the sake of simplicity, we'll add a simple
-build-time branch to use our filter or not. Replace the line calling
+build-time branch to use our filter or not. Preface the line calling
 `traverse_commit_list()` with the following, which will remind us which kind of
 walk we've just performed:
 
@@ -733,19 +723,17 @@ walk we've just performed:
 	if (0) {
 		/* Unfiltered: */
 		trace_printf(_("Unfiltered object walk.\n"));
-		traverse_commit_list(rev, walken_show_commit,
-				walken_show_object, NULL);
 	} else {
 		trace_printf(
 			_("Filtered object walk with filterspec 'tree:1'.\n"));
-		parse_list_objects_filter(&filter_options, "tree:1");
-
-		traverse_commit_list_filtered(&filter_options, rev,
-			walken_show_commit, walken_show_object, NULL, NULL);
+		CALLOC_ARRAY(rev->filter, 1);
+		parse_list_objects_filter(rev->filter, "tree:1");
 	}
+	traverse_commit_list(rev, walken_show_commit,
+			     walken_show_object, NULL);
 ----
 
-`struct list_objects_filter_options` is usually built directly from a command
+The `rev->filter` member is usually built directly from a command
 line argument, so the module provides an easy way to build one from a string.
 Even though we aren't taking user input right now, we can still build one with
 a hardcoded string using `parse_list_objects_filter()`.
@@ -784,7 +772,7 @@ object:
 ----
 	...
 
-		traverse_commit_list_filtered(&filter_options, rev,
+		traverse_commit_list_filtered(rev,
 			walken_show_commit, walken_show_object, NULL, &omitted);
 
 	...
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v2 07/12] bundle: safely handle --objects option
  2022-03-07 21:50 ` [PATCH v2 00/12] " Derrick Stolee via GitGitGadget
                     ` (5 preceding siblings ...)
  2022-03-07 21:50   ` [PATCH v2 06/12] MyFirstObjectWalk: update recommended usage Derrick Stolee via GitGitGadget
@ 2022-03-07 21:50   ` Derrick Stolee via GitGitGadget
  2022-03-08  9:37     ` Ævar Arnfjörð Bjarmason
  2022-03-07 21:50   ` [PATCH v2 08/12] bundle: parse filter capability Derrick Stolee via GitGitGadget
                     ` (6 subsequent siblings)
  13 siblings, 1 reply; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-03-07 21:50 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Jeff Hostetler, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

Since 'git bundle' uses setup_revisions() to specify the object walk,
some options do not make sense to include during the pack-objects child
process. Further, these options are used for a call to
traverse_commit_list() which would then require a callback which is
currently NULL.

By populating the callback we prevent a segfault in the case of adding
the --objects flag. This is really a redundant statement because the
command is constructing a pack-file containing all objects in the
discovered commit range.

Adding --objects to a 'git bundle' command might cause a slower command,
but at least it will not have a hard failure when the user supplies this
option. We can also disable walking trees and blobs in advance of this
walk.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle.c               | 10 +++++++++-
 t/t6020-bundle-misc.sh | 12 ++++++++++++
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/bundle.c b/bundle.c
index a0bb687b0f4..dc56db9a50a 100644
--- a/bundle.c
+++ b/bundle.c
@@ -451,6 +451,12 @@ struct bundle_prerequisites_info {
 	int fd;
 };
 
+
+static void ignore_object(struct object *obj, const char *v, void *data)
+{
+	/* Do nothing. */
+}
+
 static void write_bundle_prerequisites(struct commit *commit, void *data)
 {
 	struct bundle_prerequisites_info *bpi = data;
@@ -544,7 +550,9 @@ int create_bundle(struct repository *r, const char *path,
 		die("revision walk setup failed");
 	bpi.fd = bundle_fd;
 	bpi.pending = &revs_copy.pending;
-	traverse_commit_list(&revs, write_bundle_prerequisites, NULL, &bpi);
+
+	revs.blob_objects = revs.tree_objects = 0;
+	traverse_commit_list(&revs, write_bundle_prerequisites, ignore_object, &bpi);
 	object_array_remove_duplicates(&revs_copy.pending);
 
 	/* write bundle refs */
diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh
index b13e8a52a93..6522401617d 100755
--- a/t/t6020-bundle-misc.sh
+++ b/t/t6020-bundle-misc.sh
@@ -475,4 +475,16 @@ test_expect_success 'clone from bundle' '
 	test_cmp expect actual
 '
 
+test_expect_success 'unfiltered bundle with --objects' '
+	git bundle create all-objects.bdl \
+		--all --objects &&
+	git bundle create all.bdl \
+		--all &&
+
+	# Compare the headers of these files.
+	head -11 all.bdl >expect &&
+	head -11 all-objects.bdl >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v2 08/12] bundle: parse filter capability
  2022-03-07 21:50 ` [PATCH v2 00/12] " Derrick Stolee via GitGitGadget
                     ` (6 preceding siblings ...)
  2022-03-07 21:50   ` [PATCH v2 07/12] bundle: safely handle --objects option Derrick Stolee via GitGitGadget
@ 2022-03-07 21:50   ` Derrick Stolee via GitGitGadget
  2022-03-08  9:25     ` Ævar Arnfjörð Bjarmason
  2022-03-07 21:50   ` [PATCH v2 09/12] rev-list: move --filter parsing into revision.c Derrick Stolee via GitGitGadget
                     ` (5 subsequent siblings)
  13 siblings, 1 reply; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-03-07 21:50 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Jeff Hostetler, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The v3 bundle format has capabilities, allowing newer versions of Git to
create bundles with newer features. Older versions that do not
understand these new capabilities will fail with a helpful warning.

Create a new capability allowing Git to understand that the contained
pack-file is filtered according to some object filter. Typically, this
filter will be "blob:none" for a blobless partial clone.

This change teaches Git to parse this capability, place its value in the
bundle header, and demonstrate this understanding by adding a message to
'git bundle verify'.

Since we will use gently_parse_list_objects_filter() outside of
list-objects-filter-options.c, make it an external method and move its
API documentation to before its declaration.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/technical/bundle-format.txt | 11 ++++++++---
 bundle.c                                  | 17 ++++++++++++++++-
 bundle.h                                  |  3 +++
 list-objects-filter-options.c             | 17 +----------------
 list-objects-filter-options.h             | 20 ++++++++++++++++++++
 5 files changed, 48 insertions(+), 20 deletions(-)

diff --git a/Documentation/technical/bundle-format.txt b/Documentation/technical/bundle-format.txt
index bac558d049a..b9be8644cf5 100644
--- a/Documentation/technical/bundle-format.txt
+++ b/Documentation/technical/bundle-format.txt
@@ -71,6 +71,11 @@ and the Git bundle v2 format cannot represent a shallow clone repository.
 == Capabilities
 
 Because there is no opportunity for negotiation, unknown capabilities cause 'git
-bundle' to abort.  The only known capability is `object-format`, which specifies
-the hash algorithm in use, and can take the same values as the
-`extensions.objectFormat` configuration value.
+bundle' to abort.
+
+* `object-format` specifies the hash algorithm in use, and can take the same
+  values as the `extensions.objectFormat` configuration value.
+
+* `filter` specifies an object filter as in the `--filter` option in
+  linkgit:git-rev-list[1]. The resulting pack-file must be marked as a
+  `.promisor` pack-file after it is unbundled.
diff --git a/bundle.c b/bundle.c
index dc56db9a50a..2afced4d991 100644
--- a/bundle.c
+++ b/bundle.c
@@ -11,7 +11,7 @@
 #include "run-command.h"
 #include "refs.h"
 #include "strvec.h"
-
+#include "list-objects-filter-options.h"
 
 static const char v2_bundle_signature[] = "# v2 git bundle\n";
 static const char v3_bundle_signature[] = "# v3 git bundle\n";
@@ -33,6 +33,8 @@ void bundle_header_release(struct bundle_header *header)
 {
 	string_list_clear(&header->prerequisites, 1);
 	string_list_clear(&header->references, 1);
+	list_objects_filter_release(header->filter);
+	free(header->filter);
 }
 
 static int parse_capability(struct bundle_header *header, const char *capability)
@@ -45,6 +47,11 @@ static int parse_capability(struct bundle_header *header, const char *capability
 		header->hash_algo = &hash_algos[algo];
 		return 0;
 	}
+	if (skip_prefix(capability, "filter=", &arg)) {
+		CALLOC_ARRAY(header->filter, 1);
+		parse_list_objects_filter(header->filter, arg);
+		return 0;
+	}
 	return error(_("unknown capability '%s'"), capability);
 }
 
@@ -220,6 +227,8 @@ int verify_bundle(struct repository *r,
 	req_nr = revs.pending.nr;
 	setup_revisions(2, argv, &revs, NULL);
 
+	revs.filter = header->filter;
+
 	if (prepare_revision_walk(&revs))
 		die(_("revision walk setup failed"));
 
@@ -259,6 +268,12 @@ int verify_bundle(struct repository *r,
 			     r->nr),
 			  r->nr);
 		list_refs(r, 0, NULL);
+
+		if (header->filter) {
+			printf_ln("The bundle uses this filter: %s",
+				  list_objects_filter_spec(header->filter));
+		}
+
 		r = &header->prerequisites;
 		if (!r->nr) {
 			printf_ln(_("The bundle records a complete history."));
diff --git a/bundle.h b/bundle.h
index 06009fe6b1f..eb026153d56 100644
--- a/bundle.h
+++ b/bundle.h
@@ -5,11 +5,14 @@
 #include "cache.h"
 #include "string-list.h"
 
+struct list_objects_filter_options;
+
 struct bundle_header {
 	unsigned version;
 	struct string_list prerequisites;
 	struct string_list references;
 	const struct git_hash_algo *hash_algo;
+	struct list_objects_filter_options *filter;
 };
 
 #define BUNDLE_HEADER_INIT \
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index fd8d59f653a..d8597cdee36 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -40,22 +40,7 @@ const char *list_object_filter_config_name(enum list_objects_filter_choice c)
 	BUG("list_object_filter_config_name: invalid argument '%d'", c);
 }
 
-/*
- * Parse value of the argument to the "filter" keyword.
- * On the command line this looks like:
- *       --filter=<arg>
- * and in the pack protocol as:
- *       "filter" SP <arg>
- *
- * The filter keyword will be used by many commands.
- * See Documentation/rev-list-options.txt for allowed values for <arg>.
- *
- * Capture the given arg as the "filter_spec".  This can be forwarded to
- * subordinate commands when necessary (although it's better to pass it through
- * expand_list_objects_filter_spec() first).  We also "intern" the arg for the
- * convenience of the current command.
- */
-static int gently_parse_list_objects_filter(
+int gently_parse_list_objects_filter(
 	struct list_objects_filter_options *filter_options,
 	const char *arg,
 	struct strbuf *errbuf)
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index da5b6737e27..f6fe6a3d2ca 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -72,6 +72,26 @@ struct list_objects_filter_options {
 /* Normalized command line arguments */
 #define CL_ARG__FILTER "filter"
 
+/*
+ * Parse value of the argument to the "filter" keyword.
+ * On the command line this looks like:
+ *       --filter=<arg>
+ * and in the pack protocol as:
+ *       "filter" SP <arg>
+ *
+ * The filter keyword will be used by many commands.
+ * See Documentation/rev-list-options.txt for allowed values for <arg>.
+ *
+ * Capture the given arg as the "filter_spec".  This can be forwarded to
+ * subordinate commands when necessary (although it's better to pass it through
+ * expand_list_objects_filter_spec() first).  We also "intern" the arg for the
+ * convenience of the current command.
+ */
+int gently_parse_list_objects_filter(
+	struct list_objects_filter_options *filter_options,
+	const char *arg,
+	struct strbuf *errbuf);
+
 void list_objects_filter_die_if_populated(
 	struct list_objects_filter_options *filter_options);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v2 09/12] rev-list: move --filter parsing into revision.c
  2022-03-07 21:50 ` [PATCH v2 00/12] " Derrick Stolee via GitGitGadget
                     ` (7 preceding siblings ...)
  2022-03-07 21:50   ` [PATCH v2 08/12] bundle: parse filter capability Derrick Stolee via GitGitGadget
@ 2022-03-07 21:50   ` Derrick Stolee via GitGitGadget
  2022-03-07 21:50   ` [PATCH v2 10/12] bundle: create filtered bundles Derrick Stolee via GitGitGadget
                     ` (4 subsequent siblings)
  13 siblings, 0 replies; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-03-07 21:50 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Jeff Hostetler, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

Now that 'struct rev_info' has a 'filter' member and most consumers of
object filtering are using that member instead of an external struct,
move the parsing of the '--filter' option out of builtin/rev-list.c and
into revision.c.

This use within handle_revision_pseudo_opt() allows us to find the
option within setup_revisions() if the arguments are passed directly. In
the case of a command such as 'git blame', the arguments are first
scanned and checked with parse_revision_opt(), which complains about the
option, so 'git blame --filter=blob:none <file>' does not become valid
with this change.

Some commands, such as 'git diff' gain this option without having it
make an effect. And 'git diff --objects' was already possible, but does
not actually make sense in that builtin.

The key addition that is coming is 'git bundle create --filter=<X>' so
we can create bundles containing promisor packs. More work is required
to make them fully functional, but that will follow.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/rev-list.c | 15 ---------------
 revision.c         | 11 +++++++++++
 2 files changed, 11 insertions(+), 15 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 3ab727817fd..640828149c5 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -591,21 +591,6 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 			show_progress = arg;
 			continue;
 		}
-
-		if (skip_prefix(arg, ("--" CL_ARG__FILTER "="), &arg)) {
-			if (!revs.filter)
-				CALLOC_ARRAY(revs.filter, 1);
-			parse_list_objects_filter(revs.filter, arg);
-			if (revs.filter->choice && !revs.blob_objects)
-				die(_("object filtering requires --objects"));
-			continue;
-		}
-		if (!strcmp(arg, ("--no-" CL_ARG__FILTER))) {
-			if (!revs.filter)
-				CALLOC_ARRAY(revs.filter, 1);
-			list_objects_filter_set_no_filter(revs.filter);
-			continue;
-		}
 		if (!strcmp(arg, "--filter-provided-objects")) {
 			filter_provided_objects = 1;
 			continue;
diff --git a/revision.c b/revision.c
index ad4286fbdde..1d612c1c102 100644
--- a/revision.c
+++ b/revision.c
@@ -32,6 +32,7 @@
 #include "utf8.h"
 #include "bloom.h"
 #include "json-writer.h"
+#include "list-objects-filter-options.h"
 
 volatile show_early_output_fn_t show_early_output;
 
@@ -2669,6 +2670,14 @@ static int handle_revision_pseudo_opt(struct rev_info *revs,
 		revs->no_walk = 0;
 	} else if (!strcmp(arg, "--single-worktree")) {
 		revs->single_worktree = 1;
+	} else if (skip_prefix(arg, ("--" CL_ARG__FILTER "="), &arg)) {
+		if (!revs->filter)
+			CALLOC_ARRAY(revs->filter, 1);
+		parse_list_objects_filter(revs->filter, arg);
+	} else if (!strcmp(arg, ("--no-" CL_ARG__FILTER))) {
+		if (!revs->filter)
+			CALLOC_ARRAY(revs->filter, 1);
+		list_objects_filter_set_no_filter(revs->filter);
 	} else {
 		return 0;
 	}
@@ -2872,6 +2881,8 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
 		die("cannot combine --walk-reflogs with history-limiting options");
 	if (revs->rewrite_parents && revs->children.name)
 		die(_("options '%s' and '%s' cannot be used together"), "--parents", "--children");
+	if (revs->filter && revs->filter->choice && !revs->blob_objects)
+		die(_("object filtering requires --objects"));
 
 	/*
 	 * Limitations on the graph functionality
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v2 10/12] bundle: create filtered bundles
  2022-03-07 21:50 ` [PATCH v2 00/12] " Derrick Stolee via GitGitGadget
                     ` (8 preceding siblings ...)
  2022-03-07 21:50   ` [PATCH v2 09/12] rev-list: move --filter parsing into revision.c Derrick Stolee via GitGitGadget
@ 2022-03-07 21:50   ` Derrick Stolee via GitGitGadget
  2022-03-07 21:50   ` [PATCH v2 11/12] bundle: unbundle promisor packs Derrick Stolee via GitGitGadget
                     ` (3 subsequent siblings)
  13 siblings, 0 replies; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-03-07 21:50 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Jeff Hostetler, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

A previous change allowed Git to parse bundles with the 'filter'
capability. Now, teach Git to create bundles with this option.

Some rearranging of code is required to get the option parsing in the
correct spot. There are now two reasons why we might need capabilities
(a new hash algorithm or an object filter) so that is pulled out into a
place where we can check both at the same time.

The --filter option is parsed as part of setup_revisions(), but it
expected the --objects flag, too. That flag is somewhat implied by 'git
bundle' because it creates a pack-file walking objects, but there is
also a walk that walks the revision range expecting only commits. Make
this parsing work by setting 'revs.tree_objects' and 'revs.blob_objects'
before the call to setup_revisions().

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/git-bundle.txt |  4 +--
 bundle.c                     | 55 ++++++++++++++++++++++++++++--------
 t/t6020-bundle-misc.sh       | 48 +++++++++++++++++++++++++++++++
 3 files changed, 94 insertions(+), 13 deletions(-)

diff --git a/Documentation/git-bundle.txt b/Documentation/git-bundle.txt
index 72ab8139052..831c4788a94 100644
--- a/Documentation/git-bundle.txt
+++ b/Documentation/git-bundle.txt
@@ -75,8 +75,8 @@ verify <file>::
 	cleanly to the current repository.  This includes checks on the
 	bundle format itself as well as checking that the prerequisite
 	commits exist and are fully linked in the current repository.
-	'git bundle' prints a list of missing commits, if any, and exits
-	with a non-zero status.
+	'git bundle' prints the bundle's object filter and its list of
+	missing commits, if any, and exits with a non-zero status.
 
 list-heads <file>::
 	Lists the references defined in the bundle.  If followed by a
diff --git a/bundle.c b/bundle.c
index 2afced4d991..0016d70310c 100644
--- a/bundle.c
+++ b/bundle.c
@@ -334,6 +334,9 @@ static int write_pack_data(int bundle_fd, struct rev_info *revs, struct strvec *
 		     "--stdout", "--thin", "--delta-base-offset",
 		     NULL);
 	strvec_pushv(&pack_objects.args, pack_options->v);
+	if (revs->filter)
+		strvec_pushf(&pack_objects.args, "--filter=%s",
+			     list_objects_filter_spec(revs->filter));
 	pack_objects.in = -1;
 	pack_objects.out = bundle_fd;
 	pack_objects.git_cmd = 1;
@@ -507,10 +510,37 @@ int create_bundle(struct repository *r, const char *path,
 	int bundle_to_stdout;
 	int ref_count = 0;
 	struct rev_info revs, revs_copy;
-	int min_version = the_hash_algo == &hash_algos[GIT_HASH_SHA1] ? 2 : 3;
+	int min_version = 2;
 	struct bundle_prerequisites_info bpi;
 	int i;
 
+	/* init revs to list objects for pack-objects later */
+	save_commit_buffer = 0;
+	repo_init_revisions(r, &revs, NULL);
+
+	/*
+	 * Pre-initialize the '--objects' flag so we can parse a
+	 * --filter option successfully.
+	 */
+	revs.tree_objects = revs.blob_objects = 1;
+
+	argc = setup_revisions(argc, argv, &revs, NULL);
+
+	/*
+	 * Reasons to require version 3:
+	 *
+	 * 1. @object-format is required because our hash algorithm is not
+	 *    SHA1.
+	 * 2. @filter is required because we parsed an object filter.
+	 */
+	if (the_hash_algo != &hash_algos[GIT_HASH_SHA1] || revs.filter)
+		min_version = 3;
+
+	if (argc > 1) {
+		error(_("unrecognized argument: %s"), argv[1]);
+		goto err;
+	}
+
 	bundle_to_stdout = !strcmp(path, "-");
 	if (bundle_to_stdout)
 		bundle_fd = 1;
@@ -533,17 +563,14 @@ int create_bundle(struct repository *r, const char *path,
 		write_or_die(bundle_fd, capability, strlen(capability));
 		write_or_die(bundle_fd, the_hash_algo->name, strlen(the_hash_algo->name));
 		write_or_die(bundle_fd, "\n", 1);
-	}
-
-	/* init revs to list objects for pack-objects later */
-	save_commit_buffer = 0;
-	repo_init_revisions(r, &revs, NULL);
 
-	argc = setup_revisions(argc, argv, &revs, NULL);
-
-	if (argc > 1) {
-		error(_("unrecognized argument: %s"), argv[1]);
-		goto err;
+		if (revs.filter) {
+			const char *value = expand_list_objects_filter_spec(revs.filter);
+			capability = "@filter=";
+			write_or_die(bundle_fd, capability, strlen(capability));
+			write_or_die(bundle_fd, value, strlen(value));
+			write_or_die(bundle_fd, "\n", 1);
+		}
 	}
 
 	/* save revs.pending in revs_copy for later use */
@@ -566,6 +593,12 @@ int create_bundle(struct repository *r, const char *path,
 	bpi.fd = bundle_fd;
 	bpi.pending = &revs_copy.pending;
 
+	/*
+	 * Nullify the filter here, and any object walking. We only care
+	 * about commits and tags here. The revs_copy has the right
+	 * instances of these values.
+	 */
+	revs.filter = NULL;
 	revs.blob_objects = revs.tree_objects = 0;
 	traverse_commit_list(&revs, write_bundle_prerequisites, ignore_object, &bpi);
 	object_array_remove_duplicates(&revs_copy.pending);
diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh
index 6522401617d..f10cf011519 100755
--- a/t/t6020-bundle-misc.sh
+++ b/t/t6020-bundle-misc.sh
@@ -487,4 +487,52 @@ test_expect_success 'unfiltered bundle with --objects' '
 	test_cmp expect actual
 '
 
+for filter in "blob:none" "tree:0" "tree:1" "blob:limit=100"
+do
+	test_expect_success "filtered bundle: $filter" '
+		test_when_finished rm -rf .git/objects/pack cloned unbundled &&
+		git bundle create partial.bdl \
+			--all \
+			--filter=$filter &&
+
+		git bundle verify partial.bdl >unfiltered &&
+		make_user_friendly_and_stable_output <unfiltered >actual &&
+
+		cat >expect <<-EOF &&
+		The bundle contains these 10 refs:
+		<COMMIT-P> refs/heads/main
+		<COMMIT-N> refs/heads/release
+		<COMMIT-D> refs/heads/topic/1
+		<COMMIT-H> refs/heads/topic/2
+		<COMMIT-D> refs/pull/1/head
+		<COMMIT-G> refs/pull/2/head
+		<TAG-1> refs/tags/v1
+		<TAG-2> refs/tags/v2
+		<TAG-3> refs/tags/v3
+		<COMMIT-P> HEAD
+		The bundle uses this filter: $filter
+		The bundle records a complete history.
+		EOF
+		test_cmp expect actual &&
+
+		test_config uploadpack.allowfilter 1 &&
+		test_config uploadpack.allowanysha1inwant 1 &&
+		git clone --no-local --filter=$filter --bare "file://$(pwd)" cloned &&
+
+		git init unbundled &&
+		git -C unbundled bundle unbundle ../partial.bdl >ref-list.txt &&
+
+		# Count the same number of reachable objects.
+		reflist=$(git for-each-ref --format="%(objectname)") &&
+		git rev-list --objects --filter=$filter --missing=allow-any \
+			$reflist >expect &&
+		for repo in cloned unbundled
+		do
+			git -C $repo rev-list --objects --missing=allow-any \
+				$reflist >actual &&
+			test_cmp expect actual || return 1
+		done
+	'
+done
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v2 11/12] bundle: unbundle promisor packs
  2022-03-07 21:50 ` [PATCH v2 00/12] " Derrick Stolee via GitGitGadget
                     ` (9 preceding siblings ...)
  2022-03-07 21:50   ` [PATCH v2 10/12] bundle: create filtered bundles Derrick Stolee via GitGitGadget
@ 2022-03-07 21:50   ` Derrick Stolee via GitGitGadget
  2022-03-07 21:50   ` [PATCH v2 12/12] clone: fail gracefully when cloning filtered bundle Derrick Stolee via GitGitGadget
                     ` (2 subsequent siblings)
  13 siblings, 0 replies; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-03-07 21:50 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Jeff Hostetler, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

In order to have a valid pack-file after unbundling a bundle that has
the 'filter' capability, we need to generate a .promisor file. The
bundle does not promise _where_ the objects can be found, but we can
expect that these bundles will be unbundled in repositories with
appropriate promisor remotes that can find those missing objects.

Use the 'git index-pack --promisor=<message>' option to create this
.promisor file. Add "from-bundle" as the message to help anyone diagnose
issues with these promisor packs.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle.c               | 4 ++++
 t/t6020-bundle-misc.sh | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/bundle.c b/bundle.c
index 0016d70310c..9ca6a7eb1c2 100644
--- a/bundle.c
+++ b/bundle.c
@@ -630,6 +630,10 @@ int unbundle(struct repository *r, struct bundle_header *header,
 	struct child_process ip = CHILD_PROCESS_INIT;
 	strvec_pushl(&ip.args, "index-pack", "--fix-thin", "--stdin", NULL);
 
+	/* If there is a filter, then we need to create the promisor pack. */
+	if (header->filter)
+		strvec_push(&ip.args, "--promisor=from-bundle");
+
 	if (extra_index_pack_args) {
 		strvec_pushv(&ip.args, extra_index_pack_args->v);
 		strvec_clear(extra_index_pack_args);
diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh
index f10cf011519..42e8cf2eb29 100755
--- a/t/t6020-bundle-misc.sh
+++ b/t/t6020-bundle-misc.sh
@@ -521,6 +521,8 @@ do
 
 		git init unbundled &&
 		git -C unbundled bundle unbundle ../partial.bdl >ref-list.txt &&
+		ls unbundled/.git/objects/pack/pack-*.promisor >promisor &&
+		test_line_count = 1 promisor &&
 
 		# Count the same number of reachable objects.
 		reflist=$(git for-each-ref --format="%(objectname)") &&
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v2 12/12] clone: fail gracefully when cloning filtered bundle
  2022-03-07 21:50 ` [PATCH v2 00/12] " Derrick Stolee via GitGitGadget
                     ` (10 preceding siblings ...)
  2022-03-07 21:50   ` [PATCH v2 11/12] bundle: unbundle promisor packs Derrick Stolee via GitGitGadget
@ 2022-03-07 21:50   ` Derrick Stolee via GitGitGadget
  2022-03-07 22:11   ` [PATCH v2 00/12] Partial bundles Junio C Hamano
  2022-03-08 14:39   ` [PATCH v3 " Derrick Stolee via GitGitGadget
  13 siblings, 0 replies; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-03-07 21:50 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Jeff Hostetler, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

Users can create a new repository using 'git clone <bundle-file>'. The
new "@filter" capability for bundles means that we can generate a bundle
that does not contain all reachable objects, even if the header has no
negative commit OIDs.

It is feasible to think that we could make a filtered bundle work with
the command

  git clone --filter=$filter --bare <bundle-file>

or possibly replacing --bare with --no-checkout. However, this requires
having some repository-global config that specifies the specified object
filter and notifies Git about the existence of promisor pack-files.
Without a remote, that is currently impossible.

As a stop-gap, parse the bundle header during 'git clone' and die() with
a helpful error message instead of the current behavior of failing due
to "missing objects".

Most of the existing logic for handling bundle clones actually happens
in fetch-pack.c, but that logic is the same as if the user specified
'git fetch <bundle>', so we want to avoid failing to fetch a filtered
bundle when in an existing repository that has the proper config set up
for at least one remote.

Carefully comment around the test that this is not the desired long-term
behavior of 'git clone' in this case, but instead that we need to do
more work before that is possible.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/clone.c        | 13 +++++++++++++
 t/t6020-bundle-misc.sh | 12 ++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/builtin/clone.c b/builtin/clone.c
index 9c29093b352..1c4a3143802 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -33,6 +33,7 @@
 #include "packfile.h"
 #include "list-objects-filter-options.h"
 #include "hook.h"
+#include "bundle.h"
 
 /*
  * Overall FIXMEs:
@@ -1138,6 +1139,18 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		warning(_("--local is ignored"));
 	transport->cloning = 1;
 
+	if (is_bundle) {
+		struct bundle_header header = { 0 };
+		int fd = read_bundle_header(path, &header);
+		int has_filter = !!header.filter;
+
+		if (fd > 0)
+			close(fd);
+		bundle_header_release(&header);
+		if (has_filter)
+			die(_("cannot clone from filtered bundle"));
+	}
+
 	transport_set_option(transport, TRANS_OPT_KEEP, "yes");
 
 	if (reject_shallow)
diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh
index 42e8cf2eb29..5160cb0a75c 100755
--- a/t/t6020-bundle-misc.sh
+++ b/t/t6020-bundle-misc.sh
@@ -537,4 +537,16 @@ do
 	'
 done
 
+# NEEDSWORK: 'git clone --bare' should be able to clone from a filtered
+# bundle, but that requires a change to promisor/filter config options.
+# For now, we fail gracefully with a helpful error. This behavior can be
+# changed in the future to succeed as much as possible.
+test_expect_success 'cloning from filtered bundle has useful error' '
+	git bundle create partial.bdl \
+		--all \
+		--filter=blob:none &&
+	test_must_fail git clone --bare partial.bdl partial 2>err &&
+	grep "cannot clone from filtered bundle" err
+'
+
 test_done
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* Re: [PATCH v2 00/12] Partial bundles
  2022-03-07 21:50 ` [PATCH v2 00/12] " Derrick Stolee via GitGitGadget
                     ` (11 preceding siblings ...)
  2022-03-07 21:50   ` [PATCH v2 12/12] clone: fail gracefully when cloning filtered bundle Derrick Stolee via GitGitGadget
@ 2022-03-07 22:11   ` Junio C Hamano
  2022-03-08 14:39   ` [PATCH v3 " Derrick Stolee via GitGitGadget
  13 siblings, 0 replies; 114+ messages in thread
From: Junio C Hamano @ 2022-03-07 22:11 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, stolee, avarab, zhiyou.jx, jonathantanmy, Jeff Hostetler,
	Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> While discussing bundle-URIs [1], it came to my attention that bundles have
> no way to specify an object filter, so bundles cannot be used with partial
> clones.

Looking good.  Will replace and queue.

Thanks.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v2 08/12] bundle: parse filter capability
  2022-03-07 21:50   ` [PATCH v2 08/12] bundle: parse filter capability Derrick Stolee via GitGitGadget
@ 2022-03-08  9:25     ` Ævar Arnfjörð Bjarmason
  2022-03-08 13:43       ` Derrick Stolee
  0 siblings, 1 reply; 114+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-03-08  9:25 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, stolee, gitster, zhiyou.jx, jonathantanmy, Jeff Hostetler,
	Derrick Stolee


On Mon, Mar 07 2022, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <derrickstolee@github.com>
>
> The v3 bundle format has capabilities, allowing newer versions of Git to
> create bundles with newer features. Older versions that do not
> understand these new capabilities will fail with a helpful warning.
>
> Create a new capability allowing Git to understand that the contained
> pack-file is filtered according to some object filter. Typically, this
> filter will be "blob:none" for a blobless partial clone.
>
> This change teaches Git to parse this capability, place its value in the
> bundle header, and demonstrate this understanding by adding a message to
> 'git bundle verify'.
>
> Since we will use gently_parse_list_objects_filter() outside of
> list-objects-filter-options.c, make it an external method and move its
> API documentation to before its declaration.
> [...]
> --- a/bundle.c
> +++ b/bundle.c
> @@ -11,7 +11,7 @@
>  #include "run-command.h"
>  #include "refs.h"
>  #include "strvec.h"
> -
> +#include "list-objects-filter-options.h"
>  
>  static const char v2_bundle_signature[] = "# v2 git bundle\n";
>  static const char v3_bundle_signature[] = "# v3 git bundle\n";
> @@ -33,6 +33,8 @@ void bundle_header_release(struct bundle_header *header)
>  {
>  	string_list_clear(&header->prerequisites, 1);
>  	string_list_clear(&header->references, 1);
> +	list_objects_filter_release(header->filter);
> +	free(header->filter);
>  }
>  
>  static int parse_capability(struct bundle_header *header, const char *capability)
> @@ -45,6 +47,11 @@ static int parse_capability(struct bundle_header *header, const char *capability
>  		header->hash_algo = &hash_algos[algo];
>  		return 0;
>  	}
> +	if (skip_prefix(capability, "filter=", &arg)) {
> +		CALLOC_ARRAY(header->filter, 1);
> +		parse_list_objects_filter(header->filter, arg);
> +		return 0;
> +	}
>  	return error(_("unknown capability '%s'"), capability);
>  }
>  

Re the comment I had on the v1 about embedding this data in the struct
instead:
https://lore.kernel.org/git/220307.86y21lycne.gmgdl@evledraar.gmail.com/

The below diff passes all your tests, i.e. re using NULL as a marker I
think you may have missed that the API already has a LOFC_DISABLED for
this (and grepping reveals similar API use of it).

I'm not 100% sure it's correct, but if it isn't that's also going to
suggest missing test coverage in this series.

In any case you want the BUNDLE_HEADER_INIT change, your version is
buggy in making that header use NODUP strings by hardcoding { 0 }.

diff --git a/builtin/clone.c b/builtin/clone.c
index 52e50f17baf..000379eea7f 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1172,9 +1172,9 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	transport->cloning = 1;
 
 	if (is_bundle) {
-		struct bundle_header header = { 0 };
+		struct bundle_header header = BUNDLE_HEADER_INIT;
 		int fd = read_bundle_header(path, &header);
-		int has_filter = !!header.filter;
+		int has_filter = header.filter.choice != LOFC_DISABLED;
 
 		if (fd > 0)
 			close(fd);
diff --git a/bundle.c b/bundle.c
index 9ca6a7eb1c2..3846108f7a6 100644
--- a/bundle.c
+++ b/bundle.c
@@ -33,8 +33,7 @@ void bundle_header_release(struct bundle_header *header)
 {
 	string_list_clear(&header->prerequisites, 1);
 	string_list_clear(&header->references, 1);
-	list_objects_filter_release(header->filter);
-	free(header->filter);
+	list_objects_filter_release(&header->filter);
 }
 
 static int parse_capability(struct bundle_header *header, const char *capability)
@@ -48,8 +47,7 @@ static int parse_capability(struct bundle_header *header, const char *capability
 		return 0;
 	}
 	if (skip_prefix(capability, "filter=", &arg)) {
-		CALLOC_ARRAY(header->filter, 1);
-		parse_list_objects_filter(header->filter, arg);
+		parse_list_objects_filter(&header->filter, arg);
 		return 0;
 	}
 	return error(_("unknown capability '%s'"), capability);
@@ -227,7 +225,7 @@ int verify_bundle(struct repository *r,
 	req_nr = revs.pending.nr;
 	setup_revisions(2, argv, &revs, NULL);
 
-	revs.filter = header->filter;
+	revs.filter = &header->filter;
 
 	if (prepare_revision_walk(&revs))
 		die(_("revision walk setup failed"));
@@ -269,9 +267,9 @@ int verify_bundle(struct repository *r,
 			  r->nr);
 		list_refs(r, 0, NULL);
 
-		if (header->filter) {
+		if (header->filter.choice != LOFC_DISABLED) {
 			printf_ln("The bundle uses this filter: %s",
-				  list_objects_filter_spec(header->filter));
+				  list_objects_filter_spec(&header->filter));
 		}
 
 		r = &header->prerequisites;
@@ -631,7 +629,7 @@ int unbundle(struct repository *r, struct bundle_header *header,
 	strvec_pushl(&ip.args, "index-pack", "--fix-thin", "--stdin", NULL);
 
 	/* If there is a filter, then we need to create the promisor pack. */
-	if (header->filter)
+	if (header->filter.choice != LOFC_DISABLED)
 		strvec_push(&ip.args, "--promisor=from-bundle");
 
 	if (extra_index_pack_args) {
diff --git a/bundle.h b/bundle.h
index eb026153d56..7fef2108f43 100644
--- a/bundle.h
+++ b/bundle.h
@@ -4,15 +4,14 @@
 #include "strvec.h"
 #include "cache.h"
 #include "string-list.h"
-
-struct list_objects_filter_options;
+#include "list-objects-filter-options.h"
 
 struct bundle_header {
 	unsigned version;
 	struct string_list prerequisites;
 	struct string_list references;
 	const struct git_hash_algo *hash_algo;
-	struct list_objects_filter_options *filter;
+	struct list_objects_filter_options filter;
 };
 
 #define BUNDLE_HEADER_INIT \

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* Re: [PATCH v2 07/12] bundle: safely handle --objects option
  2022-03-07 21:50   ` [PATCH v2 07/12] bundle: safely handle --objects option Derrick Stolee via GitGitGadget
@ 2022-03-08  9:37     ` Ævar Arnfjörð Bjarmason
  2022-03-08 13:45       ` Derrick Stolee
  0 siblings, 1 reply; 114+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-03-08  9:37 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, stolee, gitster, zhiyou.jx, jonathantanmy, Jeff Hostetler,
	Derrick Stolee


On Mon, Mar 07 2022, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <derrickstolee@github.com>
>
> Since 'git bundle' uses setup_revisions() to specify the object walk,
> some options do not make sense to include during the pack-objects child
> process. Further, these options are used for a call to
> traverse_commit_list() which would then require a callback which is
> currently NULL.
>
> By populating the callback we prevent a segfault in the case of adding
> the --objects flag. This is really a redundant statement because the
> command is constructing a pack-file containing all objects in the
> discovered commit range.
>
> Adding --objects to a 'git bundle' command might cause a slower command,
> but at least it will not have a hard failure when the user supplies this
> option. We can also disable walking trees and blobs in advance of this
> walk.
>
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>  bundle.c               | 10 +++++++++-
>  t/t6020-bundle-misc.sh | 12 ++++++++++++
>  2 files changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/bundle.c b/bundle.c
> index a0bb687b0f4..dc56db9a50a 100644
> --- a/bundle.c
> +++ b/bundle.c
> @@ -451,6 +451,12 @@ struct bundle_prerequisites_info {
>  	int fd;
>  };
>  
> +
> +static void ignore_object(struct object *obj, const char *v, void *data)
> +{
> +	/* Do nothing. */
> +}
> +
>  static void write_bundle_prerequisites(struct commit *commit, void *data)
>  {
>  	struct bundle_prerequisites_info *bpi = data;
> @@ -544,7 +550,9 @@ int create_bundle(struct repository *r, const char *path,
>  		die("revision walk setup failed");
>  	bpi.fd = bundle_fd;
>  	bpi.pending = &revs_copy.pending;
> -	traverse_commit_list(&revs, write_bundle_prerequisites, NULL, &bpi);
> +
> +	revs.blob_objects = revs.tree_objects = 0;
> +	traverse_commit_list(&revs, write_bundle_prerequisites, ignore_object, &bpi);
>  	object_array_remove_duplicates(&revs_copy.pending);
>  
>  	/* write bundle refs */
> diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh
> index b13e8a52a93..6522401617d 100755
> --- a/t/t6020-bundle-misc.sh
> +++ b/t/t6020-bundle-misc.sh
> @@ -475,4 +475,16 @@ test_expect_success 'clone from bundle' '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success 'unfiltered bundle with --objects' '
> +	git bundle create all-objects.bdl \
> +		--all --objects &&
> +	git bundle create all.bdl \
> +		--all &&
> +
> +	# Compare the headers of these files.
> +	head -11 all.bdl >expect &&
> +	head -11 all-objects.bdl >actual &&
> +	test_cmp expect actual
> +'
> +
>  test_done

Re this comment on v1: https://lore.kernel.org/git/220307.86fsntzsda.gmgdl@evledraar.gmail.com/

This series also passes your tests with this on top:
	
	diff --git a/bundle.c b/bundle.c
	index 3846108f7a6..1f022f53336 100644
	--- a/bundle.c
	+++ b/bundle.c
	@@ -468,11 +468,6 @@ struct bundle_prerequisites_info {
	 };
	 
	 
	-static void ignore_object(struct object *obj, const char *v, void *data)
	-{
	-	/* Do nothing. */
	-}
	-
	 static void write_bundle_prerequisites(struct commit *commit, void *data)
	 {
	 	struct bundle_prerequisites_info *bpi = data;
	@@ -598,7 +593,7 @@ int create_bundle(struct repository *r, const char *path,
	 	 */
	 	revs.filter = NULL;
	 	revs.blob_objects = revs.tree_objects = 0;
	-	traverse_commit_list(&revs, write_bundle_prerequisites, ignore_object, &bpi);
	+	traverse_commit_list(&revs, write_bundle_prerequisites, NULL, &bpi);
	 	object_array_remove_duplicates(&revs_copy.pending);
	 
	 	/* write bundle refs */
	diff --git a/list-objects.c b/list-objects.c
	index 9422625b39e..d44a1db2262 100644
	--- a/list-objects.c
	+++ b/list-objects.c
	@@ -227,7 +227,7 @@ static void process_tag(struct traversal_context *ctx,
	 					       ctx->filter);
	 	if (r & LOFR_MARK_SEEN)
	 		tag->object.flags |= SEEN;
	-	if (r & LOFR_DO_SHOW)
	+	if (r & LOFR_DO_SHOW && ctx->show_object)
	 		ctx->show_object(&tag->object, name, ctx->show_data);
	 }
	 

Aside from whether that's a good idea, doesn't that at least point to
missing test coverage here, see traverse_non_commits() and other paths
in list-objects.c that'll call ctx->show_object().

I think an actually sensible patch for this is the below, i.e. the API
is conflating "do show" with "should we show AND we have a callback?":
	
	diff --git a/bundle.c b/bundle.c
	index 3846108f7a6..1f022f53336 100644
	--- a/bundle.c
	+++ b/bundle.c
	@@ -468,11 +468,6 @@ struct bundle_prerequisites_info {
	 };
	 
	 
	-static void ignore_object(struct object *obj, const char *v, void *data)
	-{
	-	/* Do nothing. */
	-}
	-
	 static void write_bundle_prerequisites(struct commit *commit, void *data)
	 {
	 	struct bundle_prerequisites_info *bpi = data;
	@@ -598,7 +593,7 @@ int create_bundle(struct repository *r, const char *path,
	 	 */
	 	revs.filter = NULL;
	 	revs.blob_objects = revs.tree_objects = 0;
	-	traverse_commit_list(&revs, write_bundle_prerequisites, ignore_object, &bpi);
	+	traverse_commit_list(&revs, write_bundle_prerequisites, NULL, &bpi);
	 	object_array_remove_duplicates(&revs_copy.pending);
	 
	 	/* write bundle refs */
	diff --git a/list-objects.c b/list-objects.c
	index 9422625b39e..1725cb252a9 100644
	--- a/list-objects.c
	+++ b/list-objects.c
	@@ -21,6 +21,22 @@ struct traversal_context {
	 	struct filter *filter;
	 };
	 
	+static void show_commit(struct traversal_context *ctx, struct commit *commit,
	+			void *data)
	+{
	+	if (!ctx->show_commit)
	+		return;
	+	ctx->show_commit(commit, data);
	+}
	+
	+static void show_object(struct traversal_context *ctx, struct object *object,
	+			const char *path, void *data)
	+{
	+	if (!ctx->show_object)
	+		return;
	+	ctx->show_object(object, path, data);
	+}
	+
	 static void process_blob(struct traversal_context *ctx,
	 			 struct blob *blob,
	 			 struct strbuf *path,
	@@ -60,7 +76,7 @@ static void process_blob(struct traversal_context *ctx,
	 	if (r & LOFR_MARK_SEEN)
	 		obj->flags |= SEEN;
	 	if (r & LOFR_DO_SHOW)
	-		ctx->show_object(obj, path->buf, ctx->show_data);
	+		show_object(ctx, obj, path->buf, ctx->show_data);
	 	strbuf_setlen(path, pathlen);
	 }
	 
	@@ -194,7 +210,7 @@ static void process_tree(struct traversal_context *ctx,
	 	if (r & LOFR_MARK_SEEN)
	 		obj->flags |= SEEN;
	 	if (r & LOFR_DO_SHOW)
	-		ctx->show_object(obj, base->buf, ctx->show_data);
	+		show_object(ctx, obj, base->buf, ctx->show_data);
	 	if (base->len)
	 		strbuf_addch(base, '/');
	 
	@@ -210,7 +226,7 @@ static void process_tree(struct traversal_context *ctx,
	 	if (r & LOFR_MARK_SEEN)
	 		obj->flags |= SEEN;
	 	if (r & LOFR_DO_SHOW)
	-		ctx->show_object(obj, base->buf, ctx->show_data);
	+		show_object(ctx, obj, base->buf, ctx->show_data);
	 
	 	strbuf_setlen(base, baselen);
	 	free_tree_buffer(tree);
	@@ -228,7 +244,7 @@ static void process_tag(struct traversal_context *ctx,
	 	if (r & LOFR_MARK_SEEN)
	 		tag->object.flags |= SEEN;
	 	if (r & LOFR_DO_SHOW)
	-		ctx->show_object(&tag->object, name, ctx->show_data);
	+		show_object(ctx, &tag->object, name, ctx->show_data);
	 }
	 
	 static void mark_edge_parents_uninteresting(struct commit *commit,
	@@ -402,7 +418,7 @@ static void do_traverse(struct traversal_context *ctx)
	 		if (r & LOFR_MARK_SEEN)
	 			commit->object.flags |= SEEN;
	 		if (r & LOFR_DO_SHOW)
	-			ctx->show_commit(commit, ctx->show_data);
	+			show_commit(ctx, commit, ctx->show_data);
	 
	 		if (ctx->revs->tree_blobs_in_commit_order)
	 			/*

I think that'll do what you want, and also seems to set us up for safer
API use going forward, i.e. we have a couple of NULL-passing callers
already.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v2 08/12] bundle: parse filter capability
  2022-03-08  9:25     ` Ævar Arnfjörð Bjarmason
@ 2022-03-08 13:43       ` Derrick Stolee
  0 siblings, 0 replies; 114+ messages in thread
From: Derrick Stolee @ 2022-03-08 13:43 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Derrick Stolee via GitGitGadget
  Cc: git, stolee, gitster, zhiyou.jx, jonathantanmy, Jeff Hostetler

On 3/8/2022 4:25 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Mon, Mar 07 2022, Derrick Stolee via GitGitGadget wrote:
> 
>> From: Derrick Stolee <derrickstolee@github.com>
>>
>> The v3 bundle format has capabilities, allowing newer versions of Git to
>> create bundles with newer features. Older versions that do not
>> understand these new capabilities will fail with a helpful warning.
>>
>> Create a new capability allowing Git to understand that the contained
>> pack-file is filtered according to some object filter. Typically, this
>> filter will be "blob:none" for a blobless partial clone.
>>
>> This change teaches Git to parse this capability, place its value in the
>> bundle header, and demonstrate this understanding by adding a message to
>> 'git bundle verify'.
>>
>> Since we will use gently_parse_list_objects_filter() outside of
>> list-objects-filter-options.c, make it an external method and move its
>> API documentation to before its declaration.
>> [...]
>> --- a/bundle.c
>> +++ b/bundle.c
>> @@ -11,7 +11,7 @@
>>  #include "run-command.h"
>>  #include "refs.h"
>>  #include "strvec.h"
>> -
>> +#include "list-objects-filter-options.h"
>>  
>>  static const char v2_bundle_signature[] = "# v2 git bundle\n";
>>  static const char v3_bundle_signature[] = "# v3 git bundle\n";
>> @@ -33,6 +33,8 @@ void bundle_header_release(struct bundle_header *header)
>>  {
>>  	string_list_clear(&header->prerequisites, 1);
>>  	string_list_clear(&header->references, 1);
>> +	list_objects_filter_release(header->filter);
>> +	free(header->filter);
>>  }
>>  
>>  static int parse_capability(struct bundle_header *header, const char *capability)
>> @@ -45,6 +47,11 @@ static int parse_capability(struct bundle_header *header, const char *capability
>>  		header->hash_algo = &hash_algos[algo];
>>  		return 0;
>>  	}
>> +	if (skip_prefix(capability, "filter=", &arg)) {
>> +		CALLOC_ARRAY(header->filter, 1);
>> +		parse_list_objects_filter(header->filter, arg);
>> +		return 0;
>> +	}
>>  	return error(_("unknown capability '%s'"), capability);
>>  }
>>  
> 
> Re the comment I had on the v1 about embedding this data in the struct
> instead:
> https://lore.kernel.org/git/220307.86y21lycne.gmgdl@evledraar.gmail.com/
> 
> The below diff passes all your tests, i.e. re using NULL as a marker I
> think you may have missed that the API already has a LOFC_DISABLED for
> this (and grepping reveals similar API use of it).

I did miss this LOFC_DISABLED use, which must be the correct way to
interpret an "empty" filter set (re: earlier concerns that a .nr == 0
was used as a BUG() statement in some places).

> I'm not 100% sure it's correct, but if it isn't that's also going to
> suggest missing test coverage in this series.
> 
> In any case you want the BUNDLE_HEADER_INIT change, your version is
> buggy in making that header use NODUP strings by hardcoding { 0 }.

Thanks for pointing this out.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v2 07/12] bundle: safely handle --objects option
  2022-03-08  9:37     ` Ævar Arnfjörð Bjarmason
@ 2022-03-08 13:45       ` Derrick Stolee
  2022-03-08 13:53         ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 114+ messages in thread
From: Derrick Stolee @ 2022-03-08 13:45 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Derrick Stolee via GitGitGadget
  Cc: git, stolee, gitster, zhiyou.jx, jonathantanmy, Jeff Hostetler

On 3/8/2022 4:37 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> Re this comment on v1: https://lore.kernel.org/git/220307.86fsntzsda.gmgdl@evledraar.gmail.com/
> Aside from whether that's a good idea, doesn't that at least point to
> missing test coverage here, see traverse_non_commits() and other paths
> in list-objects.c that'll call ctx->show_object().
> 
> I think an actually sensible patch for this is the below, i.e. the API
> is conflating "do show" with "should we show AND we have a callback?":
...
> I think that'll do what you want, and also seems to set us up for safer
> API use going forward, i.e. we have a couple of NULL-passing callers
> already.

Squashing this change into the commit makes most sense to attribute
authorship to you. May I forge your sign-off in that patch for v3?

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v2 07/12] bundle: safely handle --objects option
  2022-03-08 13:45       ` Derrick Stolee
@ 2022-03-08 13:53         ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 114+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-03-08 13:53 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, stolee, gitster, zhiyou.jx,
	jonathantanmy, Jeff Hostetler


On Tue, Mar 08 2022, Derrick Stolee wrote:

> On 3/8/2022 4:37 AM, Ævar Arnfjörð Bjarmason wrote:
>> 
>> Re this comment on v1: https://lore.kernel.org/git/220307.86fsntzsda.gmgdl@evledraar.gmail.com/
>> Aside from whether that's a good idea, doesn't that at least point to
>> missing test coverage here, see traverse_non_commits() and other paths
>> in list-objects.c that'll call ctx->show_object().
>> 
>> I think an actually sensible patch for this is the below, i.e. the API
>> is conflating "do show" with "should we show AND we have a callback?":
> ...
>> I think that'll do what you want, and also seems to set us up for safer
>> API use going forward, i.e. we have a couple of NULL-passing callers
>> already.
>
> Squashing this change into the commit makes most sense to attribute
> authorship to you. May I forge your sign-off in that patch for v3?

Sounds good! :)

(Also, for anything inline or throw-away like that that I post on list
it's safe to assume my Signed-off-by)

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH v3 00/12] Partial bundles
  2022-03-07 21:50 ` [PATCH v2 00/12] " Derrick Stolee via GitGitGadget
                     ` (12 preceding siblings ...)
  2022-03-07 22:11   ` [PATCH v2 00/12] Partial bundles Junio C Hamano
@ 2022-03-08 14:39   ` Derrick Stolee via GitGitGadget
  2022-03-08 14:39     ` [PATCH v3 01/12] index-pack: document and test the --promisor option Derrick Stolee via GitGitGadget
                       ` (12 more replies)
  13 siblings, 13 replies; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-03-08 14:39 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Jeff Hostetler, Derrick Stolee

While discussing bundle-URIs [1], it came to my attention that bundles have
no way to specify an object filter, so bundles cannot be used with partial
clones.

[1]
https://lore.kernel.org/git/7fab28bf-54e7-d0e9-110a-53fad6244cc9@gmail.com/

This series provides a way to fix that by adding a 'filter' capability to
the bundle file format and allowing one to create a partial bundle with 'git
bundle create --filter=blob:none '.

There are a couple things that I want to point out about this implementation
that could use some high-level feedback:

 1. I moved the '--filter' parsing into setup_revisions() instead of adding
    another place to parse it. This works for 'git bundle' but it also
    allows it to be parsed successfully in commands such as 'git diff' which
    doesn't make sense. Options such as '--objects' are already being parsed
    there, and they don't make sense either, so I want some thoughts on
    this.

 2. If someone uses 'git clone partial.bdl partial' where 'partial.bdl' is a
    filtered bundle, then the clone will fail with a message such as

fatal: missing blob object '9444604d515c0b162e37e59accd54a0bac50ed2e' fatal:
remote did not send all necessary objects

This might be fine. We don't expect users to clone partial bundles or fetch
partial bundles into an unfiltered repo and these failures are expected. It
is possible that we could put in custom logic to fail faster by reading the
bundle header for a filter.

Generally, the idea is to open this up as a potential way to bootstrap a
clone of a partial clone using a set of precomputed partial bundles.


Updates in v3
=============

 * 'struct bundle_header' now has 'filter' embedded statically, using
   filter.choice to indicate if it is an empty filter.

 * list-objects.c is now more robust to NULL function pointers.


Updates in v2
=============

Thanks for the reviews, Jeff, Junio, and Ævar!

 * Commit message typos and grammar are improved.

 * Grammar in MyFirstObjectWalk.txt is improved.

 * Unnecessary line wrapping is unwrapped.

 * Final test to check unbundled repo is made more rigorous.

 * The new 'filter' capability is added to
   Documentation/technical/bundle-format.txt

 * Expanded docs for 'git bundle verify'.

 * Moved API docs gently_parse_list_objects_filter() to header.

 * Test name swaps '' with "" to evaluate $filter.

 * Added a new patch that helps git clone <bundle> fail gracefully when
   <bundle> is has a filter capability.

Thanks, -Stolee

Derrick Stolee (11):
  index-pack: document and test the --promisor option
  revision: put object filter into struct rev_info
  pack-objects: use rev.filter when possible
  pack-bitmap: drop filter in prepare_bitmap_walk()
  list-objects: consolidate traverse_commit_list[_filtered]
  MyFirstObjectWalk: update recommended usage
  bundle: parse filter capability
  rev-list: move --filter parsing into revision.c
  bundle: create filtered bundles
  bundle: unbundle promisor packs
  clone: fail gracefully when cloning filtered bundle

Ævar Arnfjörð Bjarmason (1):
  list-objects: handle NULL function pointers

 Documentation/MyFirstObjectWalk.txt       | 44 +++++--------
 Documentation/git-bundle.txt              |  4 +-
 Documentation/git-index-pack.txt          |  8 +++
 Documentation/technical/bundle-format.txt | 11 +++-
 builtin/clone.c                           | 13 ++++
 builtin/pack-objects.c                    |  9 +--
 builtin/rev-list.c                        | 29 +++------
 bundle.c                                  | 76 +++++++++++++++++++----
 bundle.h                                  |  2 +
 list-objects-filter-options.c             | 17 +----
 list-objects-filter-options.h             | 20 ++++++
 list-objects.c                            | 52 +++++++++-------
 list-objects.h                            | 12 +++-
 pack-bitmap.c                             | 24 ++++---
 pack-bitmap.h                             |  2 -
 reachable.c                               |  2 +-
 revision.c                                | 11 ++++
 revision.h                                |  4 ++
 t/t5300-pack-object.sh                    |  4 +-
 t/t6020-bundle-misc.sh                    | 74 ++++++++++++++++++++++
 20 files changed, 290 insertions(+), 128 deletions(-)


base-commit: 45fe28c951c3e70666ee4ef8379772851a8e4d32
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1159%2Fderrickstolee%2Fbundle%2Fpartial-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1159/derrickstolee/bundle/partial-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1159

Range-diff vs v2:

  1:  a1eb4dceb0b =  1:  a1eb4dceb0b index-pack: document and test the --promisor option
  2:  3a88c99d9bc =  2:  3a88c99d9bc revision: put object filter into struct rev_info
  3:  d5edb193229 =  3:  d5edb193229 pack-objects: use rev.filter when possible
  4:  888774f6f28 =  4:  888774f6f28 pack-bitmap: drop filter in prepare_bitmap_walk()
  5:  bcb76a065bf =  5:  bcb76a065bf list-objects: consolidate traverse_commit_list[_filtered]
  6:  efc03168818 =  6:  efc03168818 MyFirstObjectWalk: update recommended usage
  7:  19694d5b255 !  7:  782182a26e3 bundle: safely handle --objects option
     @@
       ## Metadata ##
     -Author: Derrick Stolee <derrickstolee@github.com>
     +Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
      
       ## Commit message ##
     -    bundle: safely handle --objects option
     +    list-objects: handle NULL function pointers
      
     -    Since 'git bundle' uses setup_revisions() to specify the object walk,
     -    some options do not make sense to include during the pack-objects child
     -    process. Further, these options are used for a call to
     -    traverse_commit_list() which would then require a callback which is
     -    currently NULL.
     +    If a caller to traverse_commit_list() specifies the options for the
     +    --objects flag but does not specify a show_object function pointer, the
     +    result is a segfault. This is currently visible by running 'git bundle
     +    create --objects HEAD'.
      
     -    By populating the callback we prevent a segfault in the case of adding
     -    the --objects flag. This is really a redundant statement because the
     -    command is constructing a pack-file containing all objects in the
     -    discovered commit range.
     +    We could fix this problem by supplying a no-op callback in
     +    builtin/bundle.c, but that only solves the problem for one builtin,
     +    leaving this segfault open for other callers.
      
     -    Adding --objects to a 'git bundle' command might cause a slower command,
     -    but at least it will not have a hard failure when the user supplies this
     -    option. We can also disable walking trees and blobs in advance of this
     -    walk.
     +    Replace all callers of the show_commit and show_object function pointers
     +    in list-objects.c to be local methods show_commit() and show_object()
     +    which check that the given contex has non-NULL functions before passing
     +    the necessary data. One extra benefit is that it reduces duplication
     +    due to passing ctx->show_data to every caller.
      
     +    Test that this segfault no longer occurs for 'git bundle'.
     +
     +    Co-authored-by: Derrick Stolee <derrickstolee@github.com>
     +    Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
          Signed-off-by: Derrick Stolee <derrickstolee@github.com>
      
       ## bundle.c ##
     -@@ bundle.c: struct bundle_prerequisites_info {
     - 	int fd;
     - };
     - 
     -+
     -+static void ignore_object(struct object *obj, const char *v, void *data)
     -+{
     -+	/* Do nothing. */
     -+}
     -+
     - static void write_bundle_prerequisites(struct commit *commit, void *data)
     - {
     - 	struct bundle_prerequisites_info *bpi = data;
      @@ bundle.c: int create_bundle(struct repository *r, const char *path,
       		die("revision walk setup failed");
       	bpi.fd = bundle_fd;
       	bpi.pending = &revs_copy.pending;
     --	traverse_commit_list(&revs, write_bundle_prerequisites, NULL, &bpi);
      +
      +	revs.blob_objects = revs.tree_objects = 0;
     -+	traverse_commit_list(&revs, write_bundle_prerequisites, ignore_object, &bpi);
     + 	traverse_commit_list(&revs, write_bundle_prerequisites, NULL, &bpi);
       	object_array_remove_duplicates(&revs_copy.pending);
       
     - 	/* write bundle refs */
     +
     + ## list-objects.c ##
     +@@ list-objects.c: struct traversal_context {
     + 	struct filter *filter;
     + };
     + 
     ++static void show_commit(struct traversal_context *ctx,
     ++			struct commit *commit)
     ++{
     ++	if (!ctx->show_commit)
     ++		return;
     ++	ctx->show_commit(commit, ctx->show_data);
     ++}
     ++
     ++static void show_object(struct traversal_context *ctx,
     ++			struct object *object,
     ++			const char *name)
     ++{
     ++	if (!ctx->show_object)
     ++		return;
     ++	ctx->show_object(object, name, ctx->show_data);
     ++}
     ++
     + static void process_blob(struct traversal_context *ctx,
     + 			 struct blob *blob,
     + 			 struct strbuf *path,
     +@@ list-objects.c: static void process_blob(struct traversal_context *ctx,
     + 	if (r & LOFR_MARK_SEEN)
     + 		obj->flags |= SEEN;
     + 	if (r & LOFR_DO_SHOW)
     +-		ctx->show_object(obj, path->buf, ctx->show_data);
     ++		show_object(ctx, obj, path->buf);
     + 	strbuf_setlen(path, pathlen);
     + }
     + 
     +@@ list-objects.c: static void process_tree(struct traversal_context *ctx,
     + 	if (r & LOFR_MARK_SEEN)
     + 		obj->flags |= SEEN;
     + 	if (r & LOFR_DO_SHOW)
     +-		ctx->show_object(obj, base->buf, ctx->show_data);
     ++		show_object(ctx, obj, base->buf);
     + 	if (base->len)
     + 		strbuf_addch(base, '/');
     + 
     +@@ list-objects.c: static void process_tree(struct traversal_context *ctx,
     + 	if (r & LOFR_MARK_SEEN)
     + 		obj->flags |= SEEN;
     + 	if (r & LOFR_DO_SHOW)
     +-		ctx->show_object(obj, base->buf, ctx->show_data);
     ++		show_object(ctx, obj, base->buf);
     + 
     + 	strbuf_setlen(base, baselen);
     + 	free_tree_buffer(tree);
     +@@ list-objects.c: static void process_tag(struct traversal_context *ctx,
     + 	if (r & LOFR_MARK_SEEN)
     + 		tag->object.flags |= SEEN;
     + 	if (r & LOFR_DO_SHOW)
     +-		ctx->show_object(&tag->object, name, ctx->show_data);
     ++		show_object(ctx, &tag->object, name);
     + }
     + 
     + static void mark_edge_parents_uninteresting(struct commit *commit,
     +@@ list-objects.c: static void do_traverse(struct traversal_context *ctx)
     + 		if (r & LOFR_MARK_SEEN)
     + 			commit->object.flags |= SEEN;
     + 		if (r & LOFR_DO_SHOW)
     +-			ctx->show_commit(commit, ctx->show_data);
     ++			show_commit(ctx, commit);
     + 
     + 		if (ctx->revs->tree_blobs_in_commit_order)
     + 			/*
      
       ## t/t6020-bundle-misc.sh ##
      @@ t/t6020-bundle-misc.sh: test_expect_success 'clone from bundle' '
  8:  898a7d94513 !  8:  025f38290f5 bundle: parse filter capability
     @@ bundle.c: void bundle_header_release(struct bundle_header *header)
       {
       	string_list_clear(&header->prerequisites, 1);
       	string_list_clear(&header->references, 1);
     -+	list_objects_filter_release(header->filter);
     -+	free(header->filter);
     ++	list_objects_filter_release(&header->filter);
       }
       
       static int parse_capability(struct bundle_header *header, const char *capability)
     @@ bundle.c: static int parse_capability(struct bundle_header *header, const char *
       		return 0;
       	}
      +	if (skip_prefix(capability, "filter=", &arg)) {
     -+		CALLOC_ARRAY(header->filter, 1);
     -+		parse_list_objects_filter(header->filter, arg);
     ++		parse_list_objects_filter(&header->filter, arg);
      +		return 0;
      +	}
       	return error(_("unknown capability '%s'"), capability);
     @@ bundle.c: int verify_bundle(struct repository *r,
       	req_nr = revs.pending.nr;
       	setup_revisions(2, argv, &revs, NULL);
       
     -+	revs.filter = header->filter;
     ++	revs.filter = &header->filter;
      +
       	if (prepare_revision_walk(&revs))
       		die(_("revision walk setup failed"));
     @@ bundle.c: int verify_bundle(struct repository *r,
       			  r->nr);
       		list_refs(r, 0, NULL);
      +
     -+		if (header->filter) {
     ++		if (header->filter.choice != LOFC_DISABLED) {
      +			printf_ln("The bundle uses this filter: %s",
     -+				  list_objects_filter_spec(header->filter));
     ++				  list_objects_filter_spec(&header->filter));
      +		}
      +
       		r = &header->prerequisites;
     @@ bundle.c: int verify_bundle(struct repository *r,
      
       ## bundle.h ##
      @@
     + #include "strvec.h"
       #include "cache.h"
       #include "string-list.h"
     ++#include "list-objects-filter-options.h"
       
     -+struct list_objects_filter_options;
     -+
       struct bundle_header {
       	unsigned version;
       	struct string_list prerequisites;
       	struct string_list references;
       	const struct git_hash_algo *hash_algo;
     -+	struct list_objects_filter_options *filter;
     ++	struct list_objects_filter_options filter;
       };
       
       #define BUNDLE_HEADER_INIT \
  9:  aaa15d7d512 =  9:  2c8e8a6c2a5 rev-list: move --filter parsing into revision.c
 10:  82d93fc62e2 ! 10:  470b6f73e28 bundle: create filtered bundles
     @@ bundle.c: int create_bundle(struct repository *r, const char *path,
      +	 */
      +	revs.filter = NULL;
       	revs.blob_objects = revs.tree_objects = 0;
     - 	traverse_commit_list(&revs, write_bundle_prerequisites, ignore_object, &bpi);
     + 	traverse_commit_list(&revs, write_bundle_prerequisites, NULL, &bpi);
       	object_array_remove_duplicates(&revs_copy.pending);
      
       ## t/t6020-bundle-misc.sh ##
 11:  ef17691a6b7 ! 11:  e85ca2770a3 bundle: unbundle promisor packs
     @@ bundle.c: int unbundle(struct repository *r, struct bundle_header *header,
       	strvec_pushl(&ip.args, "index-pack", "--fix-thin", "--stdin", NULL);
       
      +	/* If there is a filter, then we need to create the promisor pack. */
     -+	if (header->filter)
     ++	if (header->filter.choice != LOFC_DISABLED)
      +		strvec_push(&ip.args, "--promisor=from-bundle");
      +
       	if (extra_index_pack_args) {
 12:  382b9502f6b ! 12:  805e1d11722 clone: fail gracefully when cloning filtered bundle
     @@ builtin/clone.c: int cmd_clone(int argc, const char **argv, const char *prefix)
      +	if (is_bundle) {
      +		struct bundle_header header = { 0 };
      +		int fd = read_bundle_header(path, &header);
     -+		int has_filter = !!header.filter;
     ++		int has_filter = header.filter.choice != LOFC_DISABLED;
      +
      +		if (fd > 0)
      +			close(fd);

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH v3 01/12] index-pack: document and test the --promisor option
  2022-03-08 14:39   ` [PATCH v3 " Derrick Stolee via GitGitGadget
@ 2022-03-08 14:39     ` Derrick Stolee via GitGitGadget
  2022-03-08 14:39     ` [PATCH v3 02/12] revision: put object filter into struct rev_info Derrick Stolee via GitGitGadget
                       ` (11 subsequent siblings)
  12 siblings, 0 replies; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-03-08 14:39 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Jeff Hostetler, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The --promisor option of 'git index-pack' was created in 88e2f9e
(introduce fetch-object: fetch one promisor object, 2017-12-05) but was
untested. It is currently unused within the Git codebase, but that will
change in an upcoming change to 'git bundle unbundle' when there is a
filter capability.

For now, add documentation about the option and add a test to ensure it
is working as expected.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/git-index-pack.txt | 8 ++++++++
 t/t5300-pack-object.sh           | 4 +++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-index-pack.txt b/Documentation/git-index-pack.txt
index 1f1e3592251..4e71c256ecb 100644
--- a/Documentation/git-index-pack.txt
+++ b/Documentation/git-index-pack.txt
@@ -122,6 +122,14 @@ This option cannot be used with --stdin.
 +
 include::object-format-disclaimer.txt[]
 
+--promisor[=<message>]::
+	Before committing the pack-index, create a .promisor file for this
+	pack. Particularly helpful when writing a promisor pack with --fix-thin
+	since the name of the pack is not final until the pack has been fully
+	written. If a `<message>` is provided, then that content will be
+	written to the .promisor file for future reference. See
+	link:technical/partial-clone.html[partial clone] for more information.
+
 NOTES
 -----
 
diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index 2fd845187e7..a11d61206ad 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -315,8 +315,10 @@ test_expect_success \
      git index-pack -o tmp.idx test-3.pack &&
      cmp tmp.idx test-1-${packname_1}.idx &&
 
-     git index-pack test-3.pack &&
+     git index-pack --promisor=message test-3.pack &&
      cmp test-3.idx test-1-${packname_1}.idx &&
+     echo message >expect &&
+     test_cmp expect test-3.promisor &&
 
      cat test-2-${packname_2}.pack >test-3.pack &&
      git index-pack -o tmp.idx test-2-${packname_2}.pack &&
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v3 02/12] revision: put object filter into struct rev_info
  2022-03-08 14:39   ` [PATCH v3 " Derrick Stolee via GitGitGadget
  2022-03-08 14:39     ` [PATCH v3 01/12] index-pack: document and test the --promisor option Derrick Stolee via GitGitGadget
@ 2022-03-08 14:39     ` Derrick Stolee via GitGitGadget
  2022-03-08 14:39     ` [PATCH v3 03/12] pack-objects: use rev.filter when possible Derrick Stolee via GitGitGadget
                       ` (10 subsequent siblings)
  12 siblings, 0 replies; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-03-08 14:39 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Jeff Hostetler, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

Placing a 'struct list_objects_filter_options' pointer within 'struct
rev_info' will assist making some bookkeeping around object filters in
the future.

For now, let's use this new member to remove a static global instance of
the struct from builtin/rev-list.c.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/rev-list.c | 30 ++++++++++++++++--------------
 revision.h         |  4 ++++
 2 files changed, 20 insertions(+), 14 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 777558e9b06..6f2b91d304e 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -62,7 +62,6 @@ static const char rev_list_usage[] =
 static struct progress *progress;
 static unsigned progress_counter;
 
-static struct list_objects_filter_options filter_options;
 static struct oidset omitted_objects;
 static int arg_print_omitted; /* print objects omitted by filter */
 
@@ -400,7 +399,6 @@ static inline int parse_missing_action_value(const char *value)
 }
 
 static int try_bitmap_count(struct rev_info *revs,
-			    struct list_objects_filter_options *filter,
 			    int filter_provided_objects)
 {
 	uint32_t commit_count = 0,
@@ -436,7 +434,8 @@ static int try_bitmap_count(struct rev_info *revs,
 	 */
 	max_count = revs->max_count;
 
-	bitmap_git = prepare_bitmap_walk(revs, filter, filter_provided_objects);
+	bitmap_git = prepare_bitmap_walk(revs, revs->filter,
+					 filter_provided_objects);
 	if (!bitmap_git)
 		return -1;
 
@@ -453,7 +452,6 @@ static int try_bitmap_count(struct rev_info *revs,
 }
 
 static int try_bitmap_traversal(struct rev_info *revs,
-				struct list_objects_filter_options *filter,
 				int filter_provided_objects)
 {
 	struct bitmap_index *bitmap_git;
@@ -465,7 +463,8 @@ static int try_bitmap_traversal(struct rev_info *revs,
 	if (revs->max_count >= 0)
 		return -1;
 
-	bitmap_git = prepare_bitmap_walk(revs, filter, filter_provided_objects);
+	bitmap_git = prepare_bitmap_walk(revs, revs->filter,
+					 filter_provided_objects);
 	if (!bitmap_git)
 		return -1;
 
@@ -475,7 +474,6 @@ static int try_bitmap_traversal(struct rev_info *revs,
 }
 
 static int try_bitmap_disk_usage(struct rev_info *revs,
-				 struct list_objects_filter_options *filter,
 				 int filter_provided_objects)
 {
 	struct bitmap_index *bitmap_git;
@@ -483,7 +481,7 @@ static int try_bitmap_disk_usage(struct rev_info *revs,
 	if (!show_disk_usage)
 		return -1;
 
-	bitmap_git = prepare_bitmap_walk(revs, filter, filter_provided_objects);
+	bitmap_git = prepare_bitmap_walk(revs, revs->filter, filter_provided_objects);
 	if (!bitmap_git)
 		return -1;
 
@@ -597,13 +595,17 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 		}
 
 		if (skip_prefix(arg, ("--" CL_ARG__FILTER "="), &arg)) {
-			parse_list_objects_filter(&filter_options, arg);
-			if (filter_options.choice && !revs.blob_objects)
+			if (!revs.filter)
+				CALLOC_ARRAY(revs.filter, 1);
+			parse_list_objects_filter(revs.filter, arg);
+			if (revs.filter->choice && !revs.blob_objects)
 				die(_("object filtering requires --objects"));
 			continue;
 		}
 		if (!strcmp(arg, ("--no-" CL_ARG__FILTER))) {
-			list_objects_filter_set_no_filter(&filter_options);
+			if (!revs.filter)
+				CALLOC_ARRAY(revs.filter, 1);
+			list_objects_filter_set_no_filter(revs.filter);
 			continue;
 		}
 		if (!strcmp(arg, "--filter-provided-objects")) {
@@ -688,11 +690,11 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 		progress = start_delayed_progress(show_progress, 0);
 
 	if (use_bitmap_index) {
-		if (!try_bitmap_count(&revs, &filter_options, filter_provided_objects))
+		if (!try_bitmap_count(&revs, filter_provided_objects))
 			return 0;
-		if (!try_bitmap_disk_usage(&revs, &filter_options, filter_provided_objects))
+		if (!try_bitmap_disk_usage(&revs, filter_provided_objects))
 			return 0;
-		if (!try_bitmap_traversal(&revs, &filter_options, filter_provided_objects))
+		if (!try_bitmap_traversal(&revs, filter_provided_objects))
 			return 0;
 	}
 
@@ -733,7 +735,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 		oidset_init(&missing_objects, DEFAULT_OIDSET_SIZE);
 
 	traverse_commit_list_filtered(
-		&filter_options, &revs, show_commit, show_object, &info,
+		revs.filter, &revs, show_commit, show_object, &info,
 		(arg_print_omitted ? &omitted_objects : NULL));
 
 	if (arg_print_omitted) {
diff --git a/revision.h b/revision.h
index 3c58c18c63a..1ddb73ab82e 100644
--- a/revision.h
+++ b/revision.h
@@ -81,6 +81,7 @@ struct rev_cmdline_info {
 
 struct oidset;
 struct topo_walk_info;
+struct list_objects_filter_options;
 
 struct rev_info {
 	/* Starting list */
@@ -94,6 +95,9 @@ struct rev_info {
 	/* The end-points specified by the end user */
 	struct rev_cmdline_info cmdline;
 
+	/* Object filter options. NULL for no filtering. */
+	struct list_objects_filter_options *filter;
+
 	/* excluding from --branches, --refs, etc. expansion */
 	struct string_list *ref_excludes;
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v3 03/12] pack-objects: use rev.filter when possible
  2022-03-08 14:39   ` [PATCH v3 " Derrick Stolee via GitGitGadget
  2022-03-08 14:39     ` [PATCH v3 01/12] index-pack: document and test the --promisor option Derrick Stolee via GitGitGadget
  2022-03-08 14:39     ` [PATCH v3 02/12] revision: put object filter into struct rev_info Derrick Stolee via GitGitGadget
@ 2022-03-08 14:39     ` Derrick Stolee via GitGitGadget
  2022-03-08 14:39     ` [PATCH v3 04/12] pack-bitmap: drop filter in prepare_bitmap_walk() Derrick Stolee via GitGitGadget
                       ` (9 subsequent siblings)
  12 siblings, 0 replies; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-03-08 14:39 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Jeff Hostetler, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

In builtin/pack-objects.c, we use a 'filter_options' global to populate
the --filter=<X> argument. The previous change created a pointer to a
filter option in 'struct rev_info', so we can use that pointer here as a
start to simplifying some usage of object filters.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/pack-objects.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index ba2006f2212..256d9b1798f 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3651,7 +3651,7 @@ static int pack_options_allow_reuse(void)
 
 static int get_object_list_from_bitmap(struct rev_info *revs)
 {
-	if (!(bitmap_git = prepare_bitmap_walk(revs, &filter_options, 0)))
+	if (!(bitmap_git = prepare_bitmap_walk(revs, revs->filter, 0)))
 		return -1;
 
 	if (pack_options_allow_reuse() &&
@@ -3727,6 +3727,7 @@ static void get_object_list(int ac, const char **av)
 	repo_init_revisions(the_repository, &revs, NULL);
 	save_commit_buffer = 0;
 	setup_revisions(ac, av, &revs, &s_r_opt);
+	revs.filter = &filter_options;
 
 	/* make sure shallows are read */
 	is_repository_shallow(the_repository);
@@ -3777,7 +3778,7 @@ static void get_object_list(int ac, const char **av)
 
 	if (!fn_show_object)
 		fn_show_object = show_object;
-	traverse_commit_list_filtered(&filter_options, &revs,
+	traverse_commit_list_filtered(revs.filter, &revs,
 				      show_commit, fn_show_object, NULL,
 				      NULL);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v3 04/12] pack-bitmap: drop filter in prepare_bitmap_walk()
  2022-03-08 14:39   ` [PATCH v3 " Derrick Stolee via GitGitGadget
                       ` (2 preceding siblings ...)
  2022-03-08 14:39     ` [PATCH v3 03/12] pack-objects: use rev.filter when possible Derrick Stolee via GitGitGadget
@ 2022-03-08 14:39     ` Derrick Stolee via GitGitGadget
  2022-03-08 14:39     ` [PATCH v3 05/12] list-objects: consolidate traverse_commit_list[_filtered] Derrick Stolee via GitGitGadget
                       ` (8 subsequent siblings)
  12 siblings, 0 replies; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-03-08 14:39 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Jeff Hostetler, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

Now that all consumers of prepare_bitmap_walk() have populated the
'filter' member of 'struct rev_info', we can drop that extra parameter
from the method and access it directly from the 'struct rev_info'.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/pack-objects.c |  2 +-
 builtin/rev-list.c     |  8 +++-----
 pack-bitmap.c          | 20 +++++++++-----------
 pack-bitmap.h          |  2 --
 reachable.c            |  2 +-
 5 files changed, 14 insertions(+), 20 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 256d9b1798f..57f2cf49696 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3651,7 +3651,7 @@ static int pack_options_allow_reuse(void)
 
 static int get_object_list_from_bitmap(struct rev_info *revs)
 {
-	if (!(bitmap_git = prepare_bitmap_walk(revs, revs->filter, 0)))
+	if (!(bitmap_git = prepare_bitmap_walk(revs, 0)))
 		return -1;
 
 	if (pack_options_allow_reuse() &&
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 6f2b91d304e..556e78aebb9 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -434,8 +434,7 @@ static int try_bitmap_count(struct rev_info *revs,
 	 */
 	max_count = revs->max_count;
 
-	bitmap_git = prepare_bitmap_walk(revs, revs->filter,
-					 filter_provided_objects);
+	bitmap_git = prepare_bitmap_walk(revs, filter_provided_objects);
 	if (!bitmap_git)
 		return -1;
 
@@ -463,8 +462,7 @@ static int try_bitmap_traversal(struct rev_info *revs,
 	if (revs->max_count >= 0)
 		return -1;
 
-	bitmap_git = prepare_bitmap_walk(revs, revs->filter,
-					 filter_provided_objects);
+	bitmap_git = prepare_bitmap_walk(revs, filter_provided_objects);
 	if (!bitmap_git)
 		return -1;
 
@@ -481,7 +479,7 @@ static int try_bitmap_disk_usage(struct rev_info *revs,
 	if (!show_disk_usage)
 		return -1;
 
-	bitmap_git = prepare_bitmap_walk(revs, revs->filter, filter_provided_objects);
+	bitmap_git = prepare_bitmap_walk(revs, filter_provided_objects);
 	if (!bitmap_git)
 		return -1;
 
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 9c666cdb8bd..613f2797cdf 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -739,8 +739,7 @@ static int add_commit_to_bitmap(struct bitmap_index *bitmap_git,
 static struct bitmap *find_objects(struct bitmap_index *bitmap_git,
 				   struct rev_info *revs,
 				   struct object_list *roots,
-				   struct bitmap *seen,
-				   struct list_objects_filter_options *filter)
+				   struct bitmap *seen)
 {
 	struct bitmap *base = NULL;
 	int needs_walk = 0;
@@ -823,7 +822,7 @@ static struct bitmap *find_objects(struct bitmap_index *bitmap_git,
 		show_data.bitmap_git = bitmap_git;
 		show_data.base = base;
 
-		traverse_commit_list_filtered(filter, revs,
+		traverse_commit_list_filtered(revs->filter, revs,
 					      show_commit, show_object,
 					      &show_data, NULL);
 
@@ -1219,7 +1218,6 @@ static int can_filter_bitmap(struct list_objects_filter_options *filter)
 }
 
 struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
-					 struct list_objects_filter_options *filter,
 					 int filter_provided_objects)
 {
 	unsigned int i;
@@ -1240,7 +1238,7 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 	if (revs->prune)
 		return NULL;
 
-	if (!can_filter_bitmap(filter))
+	if (!can_filter_bitmap(revs->filter))
 		return NULL;
 
 	/* try to open a bitmapped pack, but don't parse it yet
@@ -1297,8 +1295,7 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 
 	if (haves) {
 		revs->ignore_missing_links = 1;
-		haves_bitmap = find_objects(bitmap_git, revs, haves, NULL,
-					    filter);
+		haves_bitmap = find_objects(bitmap_git, revs, haves, NULL);
 		reset_revision_walk();
 		revs->ignore_missing_links = 0;
 
@@ -1306,8 +1303,7 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 			BUG("failed to perform bitmap walk");
 	}
 
-	wants_bitmap = find_objects(bitmap_git, revs, wants, haves_bitmap,
-				    filter);
+	wants_bitmap = find_objects(bitmap_git, revs, wants, haves_bitmap);
 
 	if (!wants_bitmap)
 		BUG("failed to perform bitmap walk");
@@ -1315,8 +1311,10 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 	if (haves_bitmap)
 		bitmap_and_not(wants_bitmap, haves_bitmap);
 
-	filter_bitmap(bitmap_git, (filter && filter_provided_objects) ? NULL : wants,
-		      wants_bitmap, filter);
+	filter_bitmap(bitmap_git,
+		      (revs->filter && filter_provided_objects) ? NULL : wants,
+		      wants_bitmap,
+		      revs->filter);
 
 	bitmap_git->result = wants_bitmap;
 	bitmap_git->haves = haves_bitmap;
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 19a63fa1abc..3d3ddd77345 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -10,7 +10,6 @@
 struct commit;
 struct repository;
 struct rev_info;
-struct list_objects_filter_options;
 
 static const char BITMAP_IDX_SIGNATURE[] = {'B', 'I', 'T', 'M'};
 
@@ -54,7 +53,6 @@ void test_bitmap_walk(struct rev_info *revs);
 int test_bitmap_commits(struct repository *r);
 int test_bitmap_hashes(struct repository *r);
 struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
-					 struct list_objects_filter_options *filter,
 					 int filter_provided_objects);
 uint32_t midx_preferred_pack(struct bitmap_index *bitmap_git);
 int reuse_partial_packfile_from_bitmap(struct bitmap_index *,
diff --git a/reachable.c b/reachable.c
index 84e3d0d75ed..b9f4ad886ef 100644
--- a/reachable.c
+++ b/reachable.c
@@ -205,7 +205,7 @@ void mark_reachable_objects(struct rev_info *revs, int mark_reflog,
 	cp.progress = progress;
 	cp.count = 0;
 
-	bitmap_git = prepare_bitmap_walk(revs, NULL, 0);
+	bitmap_git = prepare_bitmap_walk(revs, 0);
 	if (bitmap_git) {
 		traverse_bitmap_commit_list(bitmap_git, revs, mark_object_seen);
 		free_bitmap_index(bitmap_git);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v3 05/12] list-objects: consolidate traverse_commit_list[_filtered]
  2022-03-08 14:39   ` [PATCH v3 " Derrick Stolee via GitGitGadget
                       ` (3 preceding siblings ...)
  2022-03-08 14:39     ` [PATCH v3 04/12] pack-bitmap: drop filter in prepare_bitmap_walk() Derrick Stolee via GitGitGadget
@ 2022-03-08 14:39     ` Derrick Stolee via GitGitGadget
  2022-03-09 13:24       ` Ævar Arnfjörð Bjarmason
  2022-03-08 14:39     ` [PATCH v3 06/12] MyFirstObjectWalk: update recommended usage Derrick Stolee via GitGitGadget
                       ` (7 subsequent siblings)
  12 siblings, 1 reply; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-03-08 14:39 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Jeff Hostetler, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

Now that all consumers of traverse_commit_list_filtered() populate the
'filter' member of 'struct rev_info', we can drop that parameter from
the method prototype to simplify things. In addition, the only thing
different now between traverse_commit_list_filtered() and
traverse_commit_list() is the presence of the 'omitted' parameter, which
is only non-NULL for one caller. We can consolidate these two methods by
having one call the other and use the simpler form everywhere the
'omitted' parameter would be NULL.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/pack-objects.c |  6 +++---
 builtin/rev-list.c     |  2 +-
 list-objects.c         | 25 ++++++++-----------------
 list-objects.h         | 12 ++++++++++--
 pack-bitmap.c          |  6 +++---
 5 files changed, 25 insertions(+), 26 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 57f2cf49696..0432ae1e499 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3778,9 +3778,9 @@ static void get_object_list(int ac, const char **av)
 
 	if (!fn_show_object)
 		fn_show_object = show_object;
-	traverse_commit_list_filtered(revs.filter, &revs,
-				      show_commit, fn_show_object, NULL,
-				      NULL);
+	traverse_commit_list(&revs,
+			     show_commit, fn_show_object,
+			     NULL);
 
 	if (unpack_unreachable_expiration) {
 		revs.ignore_missing_links = 1;
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 556e78aebb9..3ab727817fd 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -733,7 +733,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 		oidset_init(&missing_objects, DEFAULT_OIDSET_SIZE);
 
 	traverse_commit_list_filtered(
-		revs.filter, &revs, show_commit, show_object, &info,
+		&revs, show_commit, show_object, &info,
 		(arg_print_omitted ? &omitted_objects : NULL));
 
 	if (arg_print_omitted) {
diff --git a/list-objects.c b/list-objects.c
index 2f623f82115..9422625b39e 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -416,22 +416,7 @@ static void do_traverse(struct traversal_context *ctx)
 	strbuf_release(&csp);
 }
 
-void traverse_commit_list(struct rev_info *revs,
-			  show_commit_fn show_commit,
-			  show_object_fn show_object,
-			  void *show_data)
-{
-	struct traversal_context ctx;
-	ctx.revs = revs;
-	ctx.show_commit = show_commit;
-	ctx.show_object = show_object;
-	ctx.show_data = show_data;
-	ctx.filter = NULL;
-	do_traverse(&ctx);
-}
-
 void traverse_commit_list_filtered(
-	struct list_objects_filter_options *filter_options,
 	struct rev_info *revs,
 	show_commit_fn show_commit,
 	show_object_fn show_object,
@@ -444,7 +429,13 @@ void traverse_commit_list_filtered(
 	ctx.show_object = show_object;
 	ctx.show_commit = show_commit;
 	ctx.show_data = show_data;
-	ctx.filter = list_objects_filter__init(omitted, filter_options);
+	if (revs->filter)
+		ctx.filter = list_objects_filter__init(omitted, revs->filter);
+	else
+		ctx.filter = NULL;
+
 	do_traverse(&ctx);
-	list_objects_filter__free(ctx.filter);
+
+	if (ctx.filter)
+		list_objects_filter__free(ctx.filter);
 }
diff --git a/list-objects.h b/list-objects.h
index a952680e466..9eaf4de8449 100644
--- a/list-objects.h
+++ b/list-objects.h
@@ -7,7 +7,6 @@ struct rev_info;
 
 typedef void (*show_commit_fn)(struct commit *, void *);
 typedef void (*show_object_fn)(struct object *, const char *, void *);
-void traverse_commit_list(struct rev_info *, show_commit_fn, show_object_fn, void *);
 
 typedef void (*show_edge_fn)(struct commit *);
 void mark_edges_uninteresting(struct rev_info *revs,
@@ -18,11 +17,20 @@ struct oidset;
 struct list_objects_filter_options;
 
 void traverse_commit_list_filtered(
-	struct list_objects_filter_options *filter_options,
 	struct rev_info *revs,
 	show_commit_fn show_commit,
 	show_object_fn show_object,
 	void *show_data,
 	struct oidset *omitted);
 
+static inline void traverse_commit_list(
+	struct rev_info *revs,
+	show_commit_fn show_commit,
+	show_object_fn show_object,
+	void *show_data)
+{
+	traverse_commit_list_filtered(revs, show_commit,
+				      show_object, show_data, NULL);
+}
+
 #endif /* LIST_OBJECTS_H */
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 613f2797cdf..cbefaedbf43 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -822,9 +822,9 @@ static struct bitmap *find_objects(struct bitmap_index *bitmap_git,
 		show_data.bitmap_git = bitmap_git;
 		show_data.base = base;
 
-		traverse_commit_list_filtered(revs->filter, revs,
-					      show_commit, show_object,
-					      &show_data, NULL);
+		traverse_commit_list(revs,
+				     show_commit, show_object,
+				     &show_data);
 
 		revs->include_check = NULL;
 		revs->include_check_obj = NULL;
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v3 06/12] MyFirstObjectWalk: update recommended usage
  2022-03-08 14:39   ` [PATCH v3 " Derrick Stolee via GitGitGadget
                       ` (4 preceding siblings ...)
  2022-03-08 14:39     ` [PATCH v3 05/12] list-objects: consolidate traverse_commit_list[_filtered] Derrick Stolee via GitGitGadget
@ 2022-03-08 14:39     ` Derrick Stolee via GitGitGadget
  2022-03-08 14:39     ` [PATCH v3 07/12] list-objects: handle NULL function pointers Ævar Arnfjörð Bjarmason via GitGitGadget
                       ` (6 subsequent siblings)
  12 siblings, 0 replies; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-03-08 14:39 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Jeff Hostetler, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The previous change consolidated traverse_commit_list() and
traverse_commit_list_filtered(). This allows us to simplify the
recommended usage in MyFirstObjectWalk.txt to use this new set of
values.

While here, add some clarification on the difference between the two
methods.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/MyFirstObjectWalk.txt | 44 +++++++++++------------------
 1 file changed, 16 insertions(+), 28 deletions(-)

diff --git a/Documentation/MyFirstObjectWalk.txt b/Documentation/MyFirstObjectWalk.txt
index ca267941f3e..8d9e85566e6 100644
--- a/Documentation/MyFirstObjectWalk.txt
+++ b/Documentation/MyFirstObjectWalk.txt
@@ -522,24 +522,25 @@ function shows that the all-object walk is being performed by
 `traverse_commit_list()` or `traverse_commit_list_filtered()`. Those two
 functions reside in `list-objects.c`; examining the source shows that, despite
 the name, these functions traverse all kinds of objects. Let's have a look at
-the arguments to `traverse_commit_list_filtered()`, which are a superset of the
-arguments to the unfiltered version.
+the arguments to `traverse_commit_list()`.
 
-- `struct list_objects_filter_options *filter_options`: This is a struct which
-  stores a filter-spec as outlined in `Documentation/rev-list-options.txt`.
-- `struct rev_info *revs`: This is the `rev_info` used for the walk.
+- `struct rev_info *revs`: This is the `rev_info` used for the walk. If
+  its `filter` member is not `NULL`, then `filter` contains information for
+  how to filter the object list.
 - `show_commit_fn show_commit`: A callback which will be used to handle each
   individual commit object.
 - `show_object_fn show_object`: A callback which will be used to handle each
   non-commit object (so each blob, tree, or tag).
 - `void *show_data`: A context buffer which is passed in turn to `show_commit`
   and `show_object`.
+
+In addition, `traverse_commit_list_filtered()` has an additional paramter:
+
 - `struct oidset *omitted`: A linked-list of object IDs which the provided
   filter caused to be omitted.
 
-It looks like this `traverse_commit_list_filtered()` uses callbacks we provide
-instead of needing us to call it repeatedly ourselves. Cool! Let's add the
-callbacks first.
+It looks like these methods use callbacks we provide instead of needing us
+to call it repeatedly ourselves. Cool! Let's add the callbacks first.
 
 For the sake of this tutorial, we'll simply keep track of how many of each kind
 of object we find. At file scope in `builtin/walken.c` add the following
@@ -712,20 +713,9 @@ help understand. In our case, that means we omit trees and blobs not directly
 referenced by `HEAD` or `HEAD`'s history, because we begin the walk with only
 `HEAD` in the `pending` list.)
 
-First, we'll need to `#include "list-objects-filter-options.h"` and set up the
-`struct list_objects_filter_options` at the top of the function.
-
-----
-static void walken_object_walk(struct rev_info *rev)
-{
-	struct list_objects_filter_options filter_options = { 0 };
-
-	...
-----
-
 For now, we are not going to track the omitted objects, so we'll replace those
 parameters with `NULL`. For the sake of simplicity, we'll add a simple
-build-time branch to use our filter or not. Replace the line calling
+build-time branch to use our filter or not. Preface the line calling
 `traverse_commit_list()` with the following, which will remind us which kind of
 walk we've just performed:
 
@@ -733,19 +723,17 @@ walk we've just performed:
 	if (0) {
 		/* Unfiltered: */
 		trace_printf(_("Unfiltered object walk.\n"));
-		traverse_commit_list(rev, walken_show_commit,
-				walken_show_object, NULL);
 	} else {
 		trace_printf(
 			_("Filtered object walk with filterspec 'tree:1'.\n"));
-		parse_list_objects_filter(&filter_options, "tree:1");
-
-		traverse_commit_list_filtered(&filter_options, rev,
-			walken_show_commit, walken_show_object, NULL, NULL);
+		CALLOC_ARRAY(rev->filter, 1);
+		parse_list_objects_filter(rev->filter, "tree:1");
 	}
+	traverse_commit_list(rev, walken_show_commit,
+			     walken_show_object, NULL);
 ----
 
-`struct list_objects_filter_options` is usually built directly from a command
+The `rev->filter` member is usually built directly from a command
 line argument, so the module provides an easy way to build one from a string.
 Even though we aren't taking user input right now, we can still build one with
 a hardcoded string using `parse_list_objects_filter()`.
@@ -784,7 +772,7 @@ object:
 ----
 	...
 
-		traverse_commit_list_filtered(&filter_options, rev,
+		traverse_commit_list_filtered(rev,
 			walken_show_commit, walken_show_object, NULL, &omitted);
 
 	...
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v3 07/12] list-objects: handle NULL function pointers
  2022-03-08 14:39   ` [PATCH v3 " Derrick Stolee via GitGitGadget
                       ` (5 preceding siblings ...)
  2022-03-08 14:39     ` [PATCH v3 06/12] MyFirstObjectWalk: update recommended usage Derrick Stolee via GitGitGadget
@ 2022-03-08 14:39     ` Ævar Arnfjörð Bjarmason via GitGitGadget
  2022-03-08 17:26       ` Junio C Hamano
  2022-03-08 14:39     ` [PATCH v3 08/12] bundle: parse filter capability Derrick Stolee via GitGitGadget
                       ` (5 subsequent siblings)
  12 siblings, 1 reply; 114+ messages in thread
From: Ævar Arnfjörð Bjarmason via GitGitGadget @ 2022-03-08 14:39 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Jeff Hostetler, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

From: =?UTF-8?q?=C3=86var=20Arnfj=C3=B6r=C3=B0=20Bjarmason?=
 <avarab@gmail.com>

If a caller to traverse_commit_list() specifies the options for the
--objects flag but does not specify a show_object function pointer, the
result is a segfault. This is currently visible by running 'git bundle
create --objects HEAD'.

We could fix this problem by supplying a no-op callback in
builtin/bundle.c, but that only solves the problem for one builtin,
leaving this segfault open for other callers.

Replace all callers of the show_commit and show_object function pointers
in list-objects.c to be local methods show_commit() and show_object()
which check that the given contex has non-NULL functions before passing
the necessary data. One extra benefit is that it reduces duplication
due to passing ctx->show_data to every caller.

Test that this segfault no longer occurs for 'git bundle'.

Co-authored-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle.c               |  2 ++
 list-objects.c         | 27 ++++++++++++++++++++++-----
 t/t6020-bundle-misc.sh | 12 ++++++++++++
 3 files changed, 36 insertions(+), 5 deletions(-)

diff --git a/bundle.c b/bundle.c
index a0bb687b0f4..7ba60a573d7 100644
--- a/bundle.c
+++ b/bundle.c
@@ -544,6 +544,8 @@ int create_bundle(struct repository *r, const char *path,
 		die("revision walk setup failed");
 	bpi.fd = bundle_fd;
 	bpi.pending = &revs_copy.pending;
+
+	revs.blob_objects = revs.tree_objects = 0;
 	traverse_commit_list(&revs, write_bundle_prerequisites, NULL, &bpi);
 	object_array_remove_duplicates(&revs_copy.pending);
 
diff --git a/list-objects.c b/list-objects.c
index 9422625b39e..0af0bef1dbc 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -21,6 +21,23 @@ struct traversal_context {
 	struct filter *filter;
 };
 
+static void show_commit(struct traversal_context *ctx,
+			struct commit *commit)
+{
+	if (!ctx->show_commit)
+		return;
+	ctx->show_commit(commit, ctx->show_data);
+}
+
+static void show_object(struct traversal_context *ctx,
+			struct object *object,
+			const char *name)
+{
+	if (!ctx->show_object)
+		return;
+	ctx->show_object(object, name, ctx->show_data);
+}
+
 static void process_blob(struct traversal_context *ctx,
 			 struct blob *blob,
 			 struct strbuf *path,
@@ -60,7 +77,7 @@ static void process_blob(struct traversal_context *ctx,
 	if (r & LOFR_MARK_SEEN)
 		obj->flags |= SEEN;
 	if (r & LOFR_DO_SHOW)
-		ctx->show_object(obj, path->buf, ctx->show_data);
+		show_object(ctx, obj, path->buf);
 	strbuf_setlen(path, pathlen);
 }
 
@@ -194,7 +211,7 @@ static void process_tree(struct traversal_context *ctx,
 	if (r & LOFR_MARK_SEEN)
 		obj->flags |= SEEN;
 	if (r & LOFR_DO_SHOW)
-		ctx->show_object(obj, base->buf, ctx->show_data);
+		show_object(ctx, obj, base->buf);
 	if (base->len)
 		strbuf_addch(base, '/');
 
@@ -210,7 +227,7 @@ static void process_tree(struct traversal_context *ctx,
 	if (r & LOFR_MARK_SEEN)
 		obj->flags |= SEEN;
 	if (r & LOFR_DO_SHOW)
-		ctx->show_object(obj, base->buf, ctx->show_data);
+		show_object(ctx, obj, base->buf);
 
 	strbuf_setlen(base, baselen);
 	free_tree_buffer(tree);
@@ -228,7 +245,7 @@ static void process_tag(struct traversal_context *ctx,
 	if (r & LOFR_MARK_SEEN)
 		tag->object.flags |= SEEN;
 	if (r & LOFR_DO_SHOW)
-		ctx->show_object(&tag->object, name, ctx->show_data);
+		show_object(ctx, &tag->object, name);
 }
 
 static void mark_edge_parents_uninteresting(struct commit *commit,
@@ -402,7 +419,7 @@ static void do_traverse(struct traversal_context *ctx)
 		if (r & LOFR_MARK_SEEN)
 			commit->object.flags |= SEEN;
 		if (r & LOFR_DO_SHOW)
-			ctx->show_commit(commit, ctx->show_data);
+			show_commit(ctx, commit);
 
 		if (ctx->revs->tree_blobs_in_commit_order)
 			/*
diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh
index b13e8a52a93..6522401617d 100755
--- a/t/t6020-bundle-misc.sh
+++ b/t/t6020-bundle-misc.sh
@@ -475,4 +475,16 @@ test_expect_success 'clone from bundle' '
 	test_cmp expect actual
 '
 
+test_expect_success 'unfiltered bundle with --objects' '
+	git bundle create all-objects.bdl \
+		--all --objects &&
+	git bundle create all.bdl \
+		--all &&
+
+	# Compare the headers of these files.
+	head -11 all.bdl >expect &&
+	head -11 all-objects.bdl >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v3 08/12] bundle: parse filter capability
  2022-03-08 14:39   ` [PATCH v3 " Derrick Stolee via GitGitGadget
                       ` (6 preceding siblings ...)
  2022-03-08 14:39     ` [PATCH v3 07/12] list-objects: handle NULL function pointers Ævar Arnfjörð Bjarmason via GitGitGadget
@ 2022-03-08 14:39     ` Derrick Stolee via GitGitGadget
  2022-03-08 17:29       ` Junio C Hamano
  2022-03-09 13:30       ` Ævar Arnfjörð Bjarmason
  2022-03-08 14:39     ` [PATCH v3 09/12] rev-list: move --filter parsing into revision.c Derrick Stolee via GitGitGadget
                       ` (4 subsequent siblings)
  12 siblings, 2 replies; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-03-08 14:39 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Jeff Hostetler, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The v3 bundle format has capabilities, allowing newer versions of Git to
create bundles with newer features. Older versions that do not
understand these new capabilities will fail with a helpful warning.

Create a new capability allowing Git to understand that the contained
pack-file is filtered according to some object filter. Typically, this
filter will be "blob:none" for a blobless partial clone.

This change teaches Git to parse this capability, place its value in the
bundle header, and demonstrate this understanding by adding a message to
'git bundle verify'.

Since we will use gently_parse_list_objects_filter() outside of
list-objects-filter-options.c, make it an external method and move its
API documentation to before its declaration.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/technical/bundle-format.txt | 11 ++++++++---
 bundle.c                                  | 15 ++++++++++++++-
 bundle.h                                  |  2 ++
 list-objects-filter-options.c             | 17 +----------------
 list-objects-filter-options.h             | 20 ++++++++++++++++++++
 5 files changed, 45 insertions(+), 20 deletions(-)

diff --git a/Documentation/technical/bundle-format.txt b/Documentation/technical/bundle-format.txt
index bac558d049a..b9be8644cf5 100644
--- a/Documentation/technical/bundle-format.txt
+++ b/Documentation/technical/bundle-format.txt
@@ -71,6 +71,11 @@ and the Git bundle v2 format cannot represent a shallow clone repository.
 == Capabilities
 
 Because there is no opportunity for negotiation, unknown capabilities cause 'git
-bundle' to abort.  The only known capability is `object-format`, which specifies
-the hash algorithm in use, and can take the same values as the
-`extensions.objectFormat` configuration value.
+bundle' to abort.
+
+* `object-format` specifies the hash algorithm in use, and can take the same
+  values as the `extensions.objectFormat` configuration value.
+
+* `filter` specifies an object filter as in the `--filter` option in
+  linkgit:git-rev-list[1]. The resulting pack-file must be marked as a
+  `.promisor` pack-file after it is unbundled.
diff --git a/bundle.c b/bundle.c
index 7ba60a573d7..41922565627 100644
--- a/bundle.c
+++ b/bundle.c
@@ -11,7 +11,7 @@
 #include "run-command.h"
 #include "refs.h"
 #include "strvec.h"
-
+#include "list-objects-filter-options.h"
 
 static const char v2_bundle_signature[] = "# v2 git bundle\n";
 static const char v3_bundle_signature[] = "# v3 git bundle\n";
@@ -33,6 +33,7 @@ void bundle_header_release(struct bundle_header *header)
 {
 	string_list_clear(&header->prerequisites, 1);
 	string_list_clear(&header->references, 1);
+	list_objects_filter_release(&header->filter);
 }
 
 static int parse_capability(struct bundle_header *header, const char *capability)
@@ -45,6 +46,10 @@ static int parse_capability(struct bundle_header *header, const char *capability
 		header->hash_algo = &hash_algos[algo];
 		return 0;
 	}
+	if (skip_prefix(capability, "filter=", &arg)) {
+		parse_list_objects_filter(&header->filter, arg);
+		return 0;
+	}
 	return error(_("unknown capability '%s'"), capability);
 }
 
@@ -220,6 +225,8 @@ int verify_bundle(struct repository *r,
 	req_nr = revs.pending.nr;
 	setup_revisions(2, argv, &revs, NULL);
 
+	revs.filter = &header->filter;
+
 	if (prepare_revision_walk(&revs))
 		die(_("revision walk setup failed"));
 
@@ -259,6 +266,12 @@ int verify_bundle(struct repository *r,
 			     r->nr),
 			  r->nr);
 		list_refs(r, 0, NULL);
+
+		if (header->filter.choice != LOFC_DISABLED) {
+			printf_ln("The bundle uses this filter: %s",
+				  list_objects_filter_spec(&header->filter));
+		}
+
 		r = &header->prerequisites;
 		if (!r->nr) {
 			printf_ln(_("The bundle records a complete history."));
diff --git a/bundle.h b/bundle.h
index 06009fe6b1f..7fef2108f43 100644
--- a/bundle.h
+++ b/bundle.h
@@ -4,12 +4,14 @@
 #include "strvec.h"
 #include "cache.h"
 #include "string-list.h"
+#include "list-objects-filter-options.h"
 
 struct bundle_header {
 	unsigned version;
 	struct string_list prerequisites;
 	struct string_list references;
 	const struct git_hash_algo *hash_algo;
+	struct list_objects_filter_options filter;
 };
 
 #define BUNDLE_HEADER_INIT \
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index fd8d59f653a..d8597cdee36 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -40,22 +40,7 @@ const char *list_object_filter_config_name(enum list_objects_filter_choice c)
 	BUG("list_object_filter_config_name: invalid argument '%d'", c);
 }
 
-/*
- * Parse value of the argument to the "filter" keyword.
- * On the command line this looks like:
- *       --filter=<arg>
- * and in the pack protocol as:
- *       "filter" SP <arg>
- *
- * The filter keyword will be used by many commands.
- * See Documentation/rev-list-options.txt for allowed values for <arg>.
- *
- * Capture the given arg as the "filter_spec".  This can be forwarded to
- * subordinate commands when necessary (although it's better to pass it through
- * expand_list_objects_filter_spec() first).  We also "intern" the arg for the
- * convenience of the current command.
- */
-static int gently_parse_list_objects_filter(
+int gently_parse_list_objects_filter(
 	struct list_objects_filter_options *filter_options,
 	const char *arg,
 	struct strbuf *errbuf)
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index da5b6737e27..f6fe6a3d2ca 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -72,6 +72,26 @@ struct list_objects_filter_options {
 /* Normalized command line arguments */
 #define CL_ARG__FILTER "filter"
 
+/*
+ * Parse value of the argument to the "filter" keyword.
+ * On the command line this looks like:
+ *       --filter=<arg>
+ * and in the pack protocol as:
+ *       "filter" SP <arg>
+ *
+ * The filter keyword will be used by many commands.
+ * See Documentation/rev-list-options.txt for allowed values for <arg>.
+ *
+ * Capture the given arg as the "filter_spec".  This can be forwarded to
+ * subordinate commands when necessary (although it's better to pass it through
+ * expand_list_objects_filter_spec() first).  We also "intern" the arg for the
+ * convenience of the current command.
+ */
+int gently_parse_list_objects_filter(
+	struct list_objects_filter_options *filter_options,
+	const char *arg,
+	struct strbuf *errbuf);
+
 void list_objects_filter_die_if_populated(
 	struct list_objects_filter_options *filter_options);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v3 09/12] rev-list: move --filter parsing into revision.c
  2022-03-08 14:39   ` [PATCH v3 " Derrick Stolee via GitGitGadget
                       ` (7 preceding siblings ...)
  2022-03-08 14:39     ` [PATCH v3 08/12] bundle: parse filter capability Derrick Stolee via GitGitGadget
@ 2022-03-08 14:39     ` Derrick Stolee via GitGitGadget
  2022-03-08 14:39     ` [PATCH v3 10/12] bundle: create filtered bundles Derrick Stolee via GitGitGadget
                       ` (3 subsequent siblings)
  12 siblings, 0 replies; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-03-08 14:39 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Jeff Hostetler, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

Now that 'struct rev_info' has a 'filter' member and most consumers of
object filtering are using that member instead of an external struct,
move the parsing of the '--filter' option out of builtin/rev-list.c and
into revision.c.

This use within handle_revision_pseudo_opt() allows us to find the
option within setup_revisions() if the arguments are passed directly. In
the case of a command such as 'git blame', the arguments are first
scanned and checked with parse_revision_opt(), which complains about the
option, so 'git blame --filter=blob:none <file>' does not become valid
with this change.

Some commands, such as 'git diff' gain this option without having it
make an effect. And 'git diff --objects' was already possible, but does
not actually make sense in that builtin.

The key addition that is coming is 'git bundle create --filter=<X>' so
we can create bundles containing promisor packs. More work is required
to make them fully functional, but that will follow.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/rev-list.c | 15 ---------------
 revision.c         | 11 +++++++++++
 2 files changed, 11 insertions(+), 15 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 3ab727817fd..640828149c5 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -591,21 +591,6 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 			show_progress = arg;
 			continue;
 		}
-
-		if (skip_prefix(arg, ("--" CL_ARG__FILTER "="), &arg)) {
-			if (!revs.filter)
-				CALLOC_ARRAY(revs.filter, 1);
-			parse_list_objects_filter(revs.filter, arg);
-			if (revs.filter->choice && !revs.blob_objects)
-				die(_("object filtering requires --objects"));
-			continue;
-		}
-		if (!strcmp(arg, ("--no-" CL_ARG__FILTER))) {
-			if (!revs.filter)
-				CALLOC_ARRAY(revs.filter, 1);
-			list_objects_filter_set_no_filter(revs.filter);
-			continue;
-		}
 		if (!strcmp(arg, "--filter-provided-objects")) {
 			filter_provided_objects = 1;
 			continue;
diff --git a/revision.c b/revision.c
index ad4286fbdde..1d612c1c102 100644
--- a/revision.c
+++ b/revision.c
@@ -32,6 +32,7 @@
 #include "utf8.h"
 #include "bloom.h"
 #include "json-writer.h"
+#include "list-objects-filter-options.h"
 
 volatile show_early_output_fn_t show_early_output;
 
@@ -2669,6 +2670,14 @@ static int handle_revision_pseudo_opt(struct rev_info *revs,
 		revs->no_walk = 0;
 	} else if (!strcmp(arg, "--single-worktree")) {
 		revs->single_worktree = 1;
+	} else if (skip_prefix(arg, ("--" CL_ARG__FILTER "="), &arg)) {
+		if (!revs->filter)
+			CALLOC_ARRAY(revs->filter, 1);
+		parse_list_objects_filter(revs->filter, arg);
+	} else if (!strcmp(arg, ("--no-" CL_ARG__FILTER))) {
+		if (!revs->filter)
+			CALLOC_ARRAY(revs->filter, 1);
+		list_objects_filter_set_no_filter(revs->filter);
 	} else {
 		return 0;
 	}
@@ -2872,6 +2881,8 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
 		die("cannot combine --walk-reflogs with history-limiting options");
 	if (revs->rewrite_parents && revs->children.name)
 		die(_("options '%s' and '%s' cannot be used together"), "--parents", "--children");
+	if (revs->filter && revs->filter->choice && !revs->blob_objects)
+		die(_("object filtering requires --objects"));
 
 	/*
 	 * Limitations on the graph functionality
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v3 10/12] bundle: create filtered bundles
  2022-03-08 14:39   ` [PATCH v3 " Derrick Stolee via GitGitGadget
                       ` (8 preceding siblings ...)
  2022-03-08 14:39     ` [PATCH v3 09/12] rev-list: move --filter parsing into revision.c Derrick Stolee via GitGitGadget
@ 2022-03-08 14:39     ` Derrick Stolee via GitGitGadget
  2022-03-08 14:39     ` [PATCH v3 11/12] bundle: unbundle promisor packs Derrick Stolee via GitGitGadget
                       ` (2 subsequent siblings)
  12 siblings, 0 replies; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-03-08 14:39 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Jeff Hostetler, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

A previous change allowed Git to parse bundles with the 'filter'
capability. Now, teach Git to create bundles with this option.

Some rearranging of code is required to get the option parsing in the
correct spot. There are now two reasons why we might need capabilities
(a new hash algorithm or an object filter) so that is pulled out into a
place where we can check both at the same time.

The --filter option is parsed as part of setup_revisions(), but it
expected the --objects flag, too. That flag is somewhat implied by 'git
bundle' because it creates a pack-file walking objects, but there is
also a walk that walks the revision range expecting only commits. Make
this parsing work by setting 'revs.tree_objects' and 'revs.blob_objects'
before the call to setup_revisions().

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/git-bundle.txt |  4 +--
 bundle.c                     | 55 ++++++++++++++++++++++++++++--------
 t/t6020-bundle-misc.sh       | 48 +++++++++++++++++++++++++++++++
 3 files changed, 94 insertions(+), 13 deletions(-)

diff --git a/Documentation/git-bundle.txt b/Documentation/git-bundle.txt
index 72ab8139052..831c4788a94 100644
--- a/Documentation/git-bundle.txt
+++ b/Documentation/git-bundle.txt
@@ -75,8 +75,8 @@ verify <file>::
 	cleanly to the current repository.  This includes checks on the
 	bundle format itself as well as checking that the prerequisite
 	commits exist and are fully linked in the current repository.
-	'git bundle' prints a list of missing commits, if any, and exits
-	with a non-zero status.
+	'git bundle' prints the bundle's object filter and its list of
+	missing commits, if any, and exits with a non-zero status.
 
 list-heads <file>::
 	Lists the references defined in the bundle.  If followed by a
diff --git a/bundle.c b/bundle.c
index 41922565627..393216c6246 100644
--- a/bundle.c
+++ b/bundle.c
@@ -332,6 +332,9 @@ static int write_pack_data(int bundle_fd, struct rev_info *revs, struct strvec *
 		     "--stdout", "--thin", "--delta-base-offset",
 		     NULL);
 	strvec_pushv(&pack_objects.args, pack_options->v);
+	if (revs->filter)
+		strvec_pushf(&pack_objects.args, "--filter=%s",
+			     list_objects_filter_spec(revs->filter));
 	pack_objects.in = -1;
 	pack_objects.out = bundle_fd;
 	pack_objects.git_cmd = 1;
@@ -499,10 +502,37 @@ int create_bundle(struct repository *r, const char *path,
 	int bundle_to_stdout;
 	int ref_count = 0;
 	struct rev_info revs, revs_copy;
-	int min_version = the_hash_algo == &hash_algos[GIT_HASH_SHA1] ? 2 : 3;
+	int min_version = 2;
 	struct bundle_prerequisites_info bpi;
 	int i;
 
+	/* init revs to list objects for pack-objects later */
+	save_commit_buffer = 0;
+	repo_init_revisions(r, &revs, NULL);
+
+	/*
+	 * Pre-initialize the '--objects' flag so we can parse a
+	 * --filter option successfully.
+	 */
+	revs.tree_objects = revs.blob_objects = 1;
+
+	argc = setup_revisions(argc, argv, &revs, NULL);
+
+	/*
+	 * Reasons to require version 3:
+	 *
+	 * 1. @object-format is required because our hash algorithm is not
+	 *    SHA1.
+	 * 2. @filter is required because we parsed an object filter.
+	 */
+	if (the_hash_algo != &hash_algos[GIT_HASH_SHA1] || revs.filter)
+		min_version = 3;
+
+	if (argc > 1) {
+		error(_("unrecognized argument: %s"), argv[1]);
+		goto err;
+	}
+
 	bundle_to_stdout = !strcmp(path, "-");
 	if (bundle_to_stdout)
 		bundle_fd = 1;
@@ -525,17 +555,14 @@ int create_bundle(struct repository *r, const char *path,
 		write_or_die(bundle_fd, capability, strlen(capability));
 		write_or_die(bundle_fd, the_hash_algo->name, strlen(the_hash_algo->name));
 		write_or_die(bundle_fd, "\n", 1);
-	}
-
-	/* init revs to list objects for pack-objects later */
-	save_commit_buffer = 0;
-	repo_init_revisions(r, &revs, NULL);
 
-	argc = setup_revisions(argc, argv, &revs, NULL);
-
-	if (argc > 1) {
-		error(_("unrecognized argument: %s"), argv[1]);
-		goto err;
+		if (revs.filter) {
+			const char *value = expand_list_objects_filter_spec(revs.filter);
+			capability = "@filter=";
+			write_or_die(bundle_fd, capability, strlen(capability));
+			write_or_die(bundle_fd, value, strlen(value));
+			write_or_die(bundle_fd, "\n", 1);
+		}
 	}
 
 	/* save revs.pending in revs_copy for later use */
@@ -558,6 +585,12 @@ int create_bundle(struct repository *r, const char *path,
 	bpi.fd = bundle_fd;
 	bpi.pending = &revs_copy.pending;
 
+	/*
+	 * Nullify the filter here, and any object walking. We only care
+	 * about commits and tags here. The revs_copy has the right
+	 * instances of these values.
+	 */
+	revs.filter = NULL;
 	revs.blob_objects = revs.tree_objects = 0;
 	traverse_commit_list(&revs, write_bundle_prerequisites, NULL, &bpi);
 	object_array_remove_duplicates(&revs_copy.pending);
diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh
index 6522401617d..f10cf011519 100755
--- a/t/t6020-bundle-misc.sh
+++ b/t/t6020-bundle-misc.sh
@@ -487,4 +487,52 @@ test_expect_success 'unfiltered bundle with --objects' '
 	test_cmp expect actual
 '
 
+for filter in "blob:none" "tree:0" "tree:1" "blob:limit=100"
+do
+	test_expect_success "filtered bundle: $filter" '
+		test_when_finished rm -rf .git/objects/pack cloned unbundled &&
+		git bundle create partial.bdl \
+			--all \
+			--filter=$filter &&
+
+		git bundle verify partial.bdl >unfiltered &&
+		make_user_friendly_and_stable_output <unfiltered >actual &&
+
+		cat >expect <<-EOF &&
+		The bundle contains these 10 refs:
+		<COMMIT-P> refs/heads/main
+		<COMMIT-N> refs/heads/release
+		<COMMIT-D> refs/heads/topic/1
+		<COMMIT-H> refs/heads/topic/2
+		<COMMIT-D> refs/pull/1/head
+		<COMMIT-G> refs/pull/2/head
+		<TAG-1> refs/tags/v1
+		<TAG-2> refs/tags/v2
+		<TAG-3> refs/tags/v3
+		<COMMIT-P> HEAD
+		The bundle uses this filter: $filter
+		The bundle records a complete history.
+		EOF
+		test_cmp expect actual &&
+
+		test_config uploadpack.allowfilter 1 &&
+		test_config uploadpack.allowanysha1inwant 1 &&
+		git clone --no-local --filter=$filter --bare "file://$(pwd)" cloned &&
+
+		git init unbundled &&
+		git -C unbundled bundle unbundle ../partial.bdl >ref-list.txt &&
+
+		# Count the same number of reachable objects.
+		reflist=$(git for-each-ref --format="%(objectname)") &&
+		git rev-list --objects --filter=$filter --missing=allow-any \
+			$reflist >expect &&
+		for repo in cloned unbundled
+		do
+			git -C $repo rev-list --objects --missing=allow-any \
+				$reflist >actual &&
+			test_cmp expect actual || return 1
+		done
+	'
+done
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v3 11/12] bundle: unbundle promisor packs
  2022-03-08 14:39   ` [PATCH v3 " Derrick Stolee via GitGitGadget
                       ` (9 preceding siblings ...)
  2022-03-08 14:39     ` [PATCH v3 10/12] bundle: create filtered bundles Derrick Stolee via GitGitGadget
@ 2022-03-08 14:39     ` Derrick Stolee via GitGitGadget
  2022-03-08 14:39     ` [PATCH v3 12/12] clone: fail gracefully when cloning filtered bundle Derrick Stolee via GitGitGadget
  2022-03-09 16:01     ` [PATCH v4 00/13] Partial bundles Derrick Stolee via GitGitGadget
  12 siblings, 0 replies; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-03-08 14:39 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Jeff Hostetler, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

In order to have a valid pack-file after unbundling a bundle that has
the 'filter' capability, we need to generate a .promisor file. The
bundle does not promise _where_ the objects can be found, but we can
expect that these bundles will be unbundled in repositories with
appropriate promisor remotes that can find those missing objects.

Use the 'git index-pack --promisor=<message>' option to create this
.promisor file. Add "from-bundle" as the message to help anyone diagnose
issues with these promisor packs.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle.c               | 4 ++++
 t/t6020-bundle-misc.sh | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/bundle.c b/bundle.c
index 393216c6246..322810dc1d6 100644
--- a/bundle.c
+++ b/bundle.c
@@ -622,6 +622,10 @@ int unbundle(struct repository *r, struct bundle_header *header,
 	struct child_process ip = CHILD_PROCESS_INIT;
 	strvec_pushl(&ip.args, "index-pack", "--fix-thin", "--stdin", NULL);
 
+	/* If there is a filter, then we need to create the promisor pack. */
+	if (header->filter.choice != LOFC_DISABLED)
+		strvec_push(&ip.args, "--promisor=from-bundle");
+
 	if (extra_index_pack_args) {
 		strvec_pushv(&ip.args, extra_index_pack_args->v);
 		strvec_clear(extra_index_pack_args);
diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh
index f10cf011519..42e8cf2eb29 100755
--- a/t/t6020-bundle-misc.sh
+++ b/t/t6020-bundle-misc.sh
@@ -521,6 +521,8 @@ do
 
 		git init unbundled &&
 		git -C unbundled bundle unbundle ../partial.bdl >ref-list.txt &&
+		ls unbundled/.git/objects/pack/pack-*.promisor >promisor &&
+		test_line_count = 1 promisor &&
 
 		# Count the same number of reachable objects.
 		reflist=$(git for-each-ref --format="%(objectname)") &&
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v3 12/12] clone: fail gracefully when cloning filtered bundle
  2022-03-08 14:39   ` [PATCH v3 " Derrick Stolee via GitGitGadget
                       ` (10 preceding siblings ...)
  2022-03-08 14:39     ` [PATCH v3 11/12] bundle: unbundle promisor packs Derrick Stolee via GitGitGadget
@ 2022-03-08 14:39     ` Derrick Stolee via GitGitGadget
  2022-03-08 16:10       ` Derrick Stolee
  2022-03-09 16:01     ` [PATCH v4 00/13] Partial bundles Derrick Stolee via GitGitGadget
  12 siblings, 1 reply; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-03-08 14:39 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Jeff Hostetler, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

Users can create a new repository using 'git clone <bundle-file>'. The
new "@filter" capability for bundles means that we can generate a bundle
that does not contain all reachable objects, even if the header has no
negative commit OIDs.

It is feasible to think that we could make a filtered bundle work with
the command

  git clone --filter=$filter --bare <bundle-file>

or possibly replacing --bare with --no-checkout. However, this requires
having some repository-global config that specifies the specified object
filter and notifies Git about the existence of promisor pack-files.
Without a remote, that is currently impossible.

As a stop-gap, parse the bundle header during 'git clone' and die() with
a helpful error message instead of the current behavior of failing due
to "missing objects".

Most of the existing logic for handling bundle clones actually happens
in fetch-pack.c, but that logic is the same as if the user specified
'git fetch <bundle>', so we want to avoid failing to fetch a filtered
bundle when in an existing repository that has the proper config set up
for at least one remote.

Carefully comment around the test that this is not the desired long-term
behavior of 'git clone' in this case, but instead that we need to do
more work before that is possible.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/clone.c        | 13 +++++++++++++
 t/t6020-bundle-misc.sh | 12 ++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/builtin/clone.c b/builtin/clone.c
index 9c29093b352..623a5040b1c 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -33,6 +33,7 @@
 #include "packfile.h"
 #include "list-objects-filter-options.h"
 #include "hook.h"
+#include "bundle.h"
 
 /*
  * Overall FIXMEs:
@@ -1138,6 +1139,18 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		warning(_("--local is ignored"));
 	transport->cloning = 1;
 
+	if (is_bundle) {
+		struct bundle_header header = { 0 };
+		int fd = read_bundle_header(path, &header);
+		int has_filter = header.filter.choice != LOFC_DISABLED;
+
+		if (fd > 0)
+			close(fd);
+		bundle_header_release(&header);
+		if (has_filter)
+			die(_("cannot clone from filtered bundle"));
+	}
+
 	transport_set_option(transport, TRANS_OPT_KEEP, "yes");
 
 	if (reject_shallow)
diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh
index 42e8cf2eb29..5160cb0a75c 100755
--- a/t/t6020-bundle-misc.sh
+++ b/t/t6020-bundle-misc.sh
@@ -537,4 +537,16 @@ do
 	'
 done
 
+# NEEDSWORK: 'git clone --bare' should be able to clone from a filtered
+# bundle, but that requires a change to promisor/filter config options.
+# For now, we fail gracefully with a helpful error. This behavior can be
+# changed in the future to succeed as much as possible.
+test_expect_success 'cloning from filtered bundle has useful error' '
+	git bundle create partial.bdl \
+		--all \
+		--filter=blob:none &&
+	test_must_fail git clone --bare partial.bdl partial 2>err &&
+	grep "cannot clone from filtered bundle" err
+'
+
 test_done
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* Re: [PATCH v3 12/12] clone: fail gracefully when cloning filtered bundle
  2022-03-08 14:39     ` [PATCH v3 12/12] clone: fail gracefully when cloning filtered bundle Derrick Stolee via GitGitGadget
@ 2022-03-08 16:10       ` Derrick Stolee
  2022-03-08 17:19         ` Junio C Hamano
  0 siblings, 1 reply; 114+ messages in thread
From: Derrick Stolee @ 2022-03-08 16:10 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget, git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy, Jeff Hostetler

On 3/8/2022 9:39 AM, Derrick Stolee via GitGitGadget wrote:

> +	if (is_bundle) {
> +		struct bundle_header header = { 0 };
> +		int fd = read_bundle_header(path, &header);
> +		int has_filter = header.filter.choice != LOFC_DISABLED;

Of course, as I was sending an email replying to What's Cooking, I
realized that I missed one of the suggestions, which is fixed with
this diff:

--- >8 ---

diff --git a/builtin/clone.c b/builtin/clone.c
index 623a5040b1..e57504c2aa 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1140,7 +1140,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	transport->cloning = 1;
 
 	if (is_bundle) {
-		struct bundle_header header = { 0 };
+		struct bundle_header header = BUNDLE_HEADER_INIT;
 		int fd = read_bundle_header(path, &header);
 		int has_filter = header.filter.choice != LOFC_DISABLED;
 


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* Re: [PATCH v3 12/12] clone: fail gracefully when cloning filtered bundle
  2022-03-08 16:10       ` Derrick Stolee
@ 2022-03-08 17:19         ` Junio C Hamano
  0 siblings, 0 replies; 114+ messages in thread
From: Junio C Hamano @ 2022-03-08 17:19 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, stolee, avarab, zhiyou.jx,
	jonathantanmy, Jeff Hostetler

Derrick Stolee <derrickstolee@github.com> writes:

> On 3/8/2022 9:39 AM, Derrick Stolee via GitGitGadget wrote:
>
>> +	if (is_bundle) {
>> +		struct bundle_header header = { 0 };
>> +		int fd = read_bundle_header(path, &header);
>> +		int has_filter = header.filter.choice != LOFC_DISABLED;
>
> Of course, as I was sending an email replying to What's Cooking, I
> realized that I missed one of the suggestions, which is fixed with
> this diff:
>
> --- >8 ---
>
> diff --git a/builtin/clone.c b/builtin/clone.c
> index 623a5040b1..e57504c2aa 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -1140,7 +1140,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>  	transport->cloning = 1;
>  
>  	if (is_bundle) {
> -		struct bundle_header header = { 0 };
> +		struct bundle_header header = BUNDLE_HEADER_INIT;
>  		int fd = read_bundle_header(path, &header);
>  		int has_filter = header.filter.choice != LOFC_DISABLED;

Let me squash it into 12/12, then.

Thanks.


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v3 07/12] list-objects: handle NULL function pointers
  2022-03-08 14:39     ` [PATCH v3 07/12] list-objects: handle NULL function pointers Ævar Arnfjörð Bjarmason via GitGitGadget
@ 2022-03-08 17:26       ` Junio C Hamano
  2022-03-09 13:40         ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 114+ messages in thread
From: Junio C Hamano @ 2022-03-08 17:26 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason via GitGitGadget
  Cc: git, stolee, avarab, zhiyou.jx, jonathantanmy, Jeff Hostetler,
	Derrick Stolee

"Ævar Arnfjörð Bjarmason via GitGitGadget"  <gitgitgadget@gmail.com>
writes:

> Replace all callers of the show_commit and show_object function pointers
> in list-objects.c to be local methods show_commit() and show_object()

"to be local methods" -> "to call helper functions"

> which check that the given contex has non-NULL functions before passing

"contex" -> "context"

> the necessary data. One extra benefit is that it reduces duplication
> due to passing ctx->show_data to every caller.

> -		ctx->show_object(obj, path->buf, ctx->show_data);
> +		show_object(ctx, obj, path->buf);

I guess this is the "reduced duplication" refers to.  The helper
does make it easier to follow and reason about: "show the given
object at the path in this context" is what it asks.

> diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh
> index b13e8a52a93..6522401617d 100755
> --- a/t/t6020-bundle-misc.sh
> +++ b/t/t6020-bundle-misc.sh
> @@ -475,4 +475,16 @@ test_expect_success 'clone from bundle' '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success 'unfiltered bundle with --objects' '
> +	git bundle create all-objects.bdl \
> +		--all --objects &&
> +	git bundle create all.bdl \
> +		--all &&
> +
> +	# Compare the headers of these files.
> +	head -11 all.bdl >expect &&
> +	head -11 all-objects.bdl >actual &&

"head -n 11" but more importantly, why eleven and not ten or twelve?
Is that a number this code can automatically learn from the given
.bdl file?

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v3 08/12] bundle: parse filter capability
  2022-03-08 14:39     ` [PATCH v3 08/12] bundle: parse filter capability Derrick Stolee via GitGitGadget
@ 2022-03-08 17:29       ` Junio C Hamano
  2022-03-09 14:35         ` Derrick Stolee
  2022-03-09 13:30       ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 114+ messages in thread
From: Junio C Hamano @ 2022-03-08 17:29 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, stolee, avarab, zhiyou.jx, jonathantanmy, Jeff Hostetler,
	Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> diff --git a/bundle.h b/bundle.h
> index 06009fe6b1f..7fef2108f43 100644
> --- a/bundle.h
> +++ b/bundle.h
> @@ -4,12 +4,14 @@
>  #include "strvec.h"
>  #include "cache.h"
>  #include "string-list.h"
> +#include "list-objects-filter-options.h"
>  
>  struct bundle_header {
>  	unsigned version;
>  	struct string_list prerequisites;
>  	struct string_list references;
>  	const struct git_hash_algo *hash_algo;
> +	struct list_objects_filter_options filter;
>  };

This used to be a pointer to the struct, with "NULL means do not
filter" semantics, with .nr==0 as BUG().  Which was the same
justification used when an earlier step added a pointer to the
filter struct to rev_info.

Should the same logic applies there to make it into an embedded
struct in rev_info as well?

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v3 05/12] list-objects: consolidate traverse_commit_list[_filtered]
  2022-03-08 14:39     ` [PATCH v3 05/12] list-objects: consolidate traverse_commit_list[_filtered] Derrick Stolee via GitGitGadget
@ 2022-03-09 13:24       ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 114+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-03-09 13:24 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, stolee, gitster, zhiyou.jx, jonathantanmy, Jeff Hostetler,
	Derrick Stolee


On Tue, Mar 08 2022, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <derrickstolee@github.com>
> [...]
>  void traverse_commit_list_filtered(
> -	struct list_objects_filter_options *filter_options,
>  	struct rev_info *revs,
>  	show_commit_fn show_commit,
>  	show_object_fn show_object,
> @@ -444,7 +429,13 @@ void traverse_commit_list_filtered(
>  	ctx.show_object = show_object;
>  	ctx.show_commit = show_commit;
>  	ctx.show_data = show_data;
> -	ctx.filter = list_objects_filter__init(omitted, filter_options);
> +	if (revs->filter)
> +		ctx.filter = list_objects_filter__init(omitted, revs->filter);
> +	else
> +		ctx.filter = NULL;
> +

Purely an optional nit, but here we could also let the initializer take
care of the default NULL-ing:

diff --git a/list-objects.c b/list-objects.c
index 2f623f82115..52c19d54019 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -421,12 +421,14 @@ void traverse_commit_list(struct rev_info *revs,
 			  show_object_fn show_object,
 			  void *show_data)
 {
-	struct traversal_context ctx;
-	ctx.revs = revs;
-	ctx.show_commit = show_commit;
-	ctx.show_object = show_object;
-	ctx.show_data = show_data;
-	ctx.filter = NULL;
+	struct traversal_context ctx = {
+		.revs = revs,
+		.show_commit = show_commit,
+		ctx.show_object = show_object,
+		ctx.show_data = show_data,
+	};
+	if (revs->filter)
+		ctx.filter = ...;
 	do_traverse(&ctx);
 }
 

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* Re: [PATCH v3 08/12] bundle: parse filter capability
  2022-03-08 14:39     ` [PATCH v3 08/12] bundle: parse filter capability Derrick Stolee via GitGitGadget
  2022-03-08 17:29       ` Junio C Hamano
@ 2022-03-09 13:30       ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 114+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-03-09 13:30 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, stolee, gitster, zhiyou.jx, jonathantanmy, Jeff Hostetler,
	Derrick Stolee


On Tue, Mar 08 2022, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <derrickstolee@github.com>
> [...]
>  static const char v2_bundle_signature[] = "# v2 git bundle\n";
>  static const char v3_bundle_signature[] = "# v3 git bundle\n";
> @@ -33,6 +33,7 @@ void bundle_header_release(struct bundle_header *header)
>  {
>  	string_list_clear(&header->prerequisites, 1);
>  	string_list_clear(&header->references, 1);
> +	list_objects_filter_release(&header->filter);
>  }
>  
>  static int parse_capability(struct bundle_header *header, const char *capability)
> @@ -45,6 +46,10 @@ static int parse_capability(struct bundle_header *header, const char *capability
>  		header->hash_algo = &hash_algos[algo];
>  		return 0;
>  	}
> +	if (skip_prefix(capability, "filter=", &arg)) {
> +		parse_list_objects_filter(&header->filter, arg);
> +		return 0;
> +	}
>  	return error(_("unknown capability '%s'"), capability);
>  }

I haven't tested, but I did wonder if our purely "check reachability"
(or equivalent) "verify" would be slowed down by doing whatever filter
magic we're doing here, but then remembered/saw that we only parse the
header, so it can't be that bad :)

I.e. this is only checking the syntax of the filter, surely, and then
spits it back at us. That makes sense.

I think that this hunk from the subsequent 10/12 is in the wrong place
though, and should be here when we change "verify" (not "create" later):
	
	diff --git a/Documentation/git-bundle.txt b/Documentation/git-bundle.txt
	index 72ab8139052..831c4788a94 100644
	--- a/Documentation/git-bundle.txt
	+++ b/Documentation/git-bundle.txt
	@@ -75,8 +75,8 @@ verify <file>::
	 	cleanly to the current repository.  This includes checks on the
	 	bundle format itself as well as checking that the prerequisite
	 	commits exist and are fully linked in the current repository.
	-	'git bundle' prints a list of missing commits, if any, and exits
	-	with a non-zero status.
	+	'git bundle' prints the bundle's object filter and its list of
	+	missing commits, if any, and exits with a non-zero status.
	 
	 list-heads <file>::
	 	Lists the references defined in the bundle.  If followed by a

I think instead of starting to list every header we might add in the
future there it would make sense to just add after the pre-image
full-stop something like:

    We'll also print out information about any known capabilities, such
    as "object filter". See "Capabilities" in technical/...

Which future proofs it a bit...

> [...]

I think it would also be great to have tests for intentionally
corrupting the header and seeing what the "verify" output is, i.e. do we
die right away, proceed to validate the rest. So just (somewhat
pseudocode):

    git bundle create [...] my.bdl &&
    sed 's/@filter: .*/@filter: bad/' <my.bdl >bad.bdl &&
    test_must_fail git bundle verify bad.bdl 2>actual &&
    [...]
    test_cmp expect actual

Overall this series looks really good at this point, and I'm down to
minor shades of colors on the bikeshedding :)

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v3 07/12] list-objects: handle NULL function pointers
  2022-03-08 17:26       ` Junio C Hamano
@ 2022-03-09 13:40         ` Ævar Arnfjörð Bjarmason
  2022-03-09 14:16           ` Derrick Stolee
  2022-03-09 18:32           ` Junio C Hamano
  0 siblings, 2 replies; 114+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-03-09 13:40 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ævar Arnfjörð Bjarmason via GitGitGadget, git,
	stolee, zhiyou.jx, jonathantanmy, Jeff Hostetler, Derrick Stolee


On Tue, Mar 08 2022, Junio C Hamano wrote:

> "Ævar Arnfjörð Bjarmason via GitGitGadget"  <gitgitgadget@gmail.com>
> writes:
>
>> Replace all callers of the show_commit and show_object function pointers
>> in list-objects.c to be local methods show_commit() and show_object()
>
> "to be local methods" -> "to call helper functions"
>
>> which check that the given contex has non-NULL functions before passing
>
> "contex" -> "context"
>
>> the necessary data. One extra benefit is that it reduces duplication
>> due to passing ctx->show_data to every caller.
>
>> -		ctx->show_object(obj, path->buf, ctx->show_data);
>> +		show_object(ctx, obj, path->buf);
>
> I guess this is the "reduced duplication" refers to.  The helper
> does make it easier to follow and reason about: "show the given
> object at the path in this context" is what it asks.
>
>> diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh
>> index b13e8a52a93..6522401617d 100755
>> --- a/t/t6020-bundle-misc.sh
>> +++ b/t/t6020-bundle-misc.sh
>> @@ -475,4 +475,16 @@ test_expect_success 'clone from bundle' '
>>  	test_cmp expect actual
>>  '
>>  
>> +test_expect_success 'unfiltered bundle with --objects' '
>> +	git bundle create all-objects.bdl \
>> +		--all --objects &&
>> +	git bundle create all.bdl \
>> +		--all &&
>> +
>> +	# Compare the headers of these files.
>> +	head -11 all.bdl >expect &&
>> +	head -11 all-objects.bdl >actual &&
>
> "head -n 11" but more importantly, why eleven and not ten or twelve?
> Is that a number this code can automatically learn from the given
> .bdl file?

I suspect what's wanted here is "print all stuff before the "\n\n"
header/PACK delimiter, which is better done with "sed" like this:

	sed -n -e '/^$/q' -e 'p'

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v3 07/12] list-objects: handle NULL function pointers
  2022-03-09 13:40         ` Ævar Arnfjörð Bjarmason
@ 2022-03-09 14:16           ` Derrick Stolee
  2022-03-09 18:32           ` Junio C Hamano
  1 sibling, 0 replies; 114+ messages in thread
From: Derrick Stolee @ 2022-03-09 14:16 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Junio C Hamano
  Cc: Ævar Arnfjörð Bjarmason via GitGitGadget, git,
	stolee, zhiyou.jx, jonathantanmy, Jeff Hostetler

On 3/9/2022 8:40 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Tue, Mar 08 2022, Junio C Hamano wrote:
> 
>> "Ævar Arnfjörð Bjarmason via GitGitGadget"  <gitgitgadget@gmail.com>
>> writes:
>>
>>> Replace all callers of the show_commit and show_object function pointers
>>> in list-objects.c to be local methods show_commit() and show_object()
>>
>> "to be local methods" -> "to call helper functions"
>>
>>> which check that the given contex has non-NULL functions before passing
>>
>> "contex" -> "context"
>>
>>> the necessary data. One extra benefit is that it reduces duplication
>>> due to passing ctx->show_data to every caller.
>>
>>> -		ctx->show_object(obj, path->buf, ctx->show_data);
>>> +		show_object(ctx, obj, path->buf);
>>
>> I guess this is the "reduced duplication" refers to.  The helper
>> does make it easier to follow and reason about: "show the given
>> object at the path in this context" is what it asks.
>>
>>> diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh
>>> index b13e8a52a93..6522401617d 100755
>>> --- a/t/t6020-bundle-misc.sh
>>> +++ b/t/t6020-bundle-misc.sh
>>> @@ -475,4 +475,16 @@ test_expect_success 'clone from bundle' '
>>>  	test_cmp expect actual
>>>  '
>>>  
>>> +test_expect_success 'unfiltered bundle with --objects' '
>>> +	git bundle create all-objects.bdl \
>>> +		--all --objects &&
>>> +	git bundle create all.bdl \
>>> +		--all &&
>>> +
>>> +	# Compare the headers of these files.
>>> +	head -11 all.bdl >expect &&
>>> +	head -11 all-objects.bdl >actual &&
>>
>> "head -n 11" but more importantly, why eleven and not ten or twelve?
>> Is that a number this code can automatically learn from the given
>> .bdl file?
> 
> I suspect what's wanted here is "print all stuff before the "\n\n"
> header/PACK delimiter, which is better done with "sed" like this:
> 
> 	sed -n -e '/^$/q' -e 'p'

Thanks for this tip. That is indeed the intention.

-Stolee

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v3 08/12] bundle: parse filter capability
  2022-03-08 17:29       ` Junio C Hamano
@ 2022-03-09 14:35         ` Derrick Stolee
  0 siblings, 0 replies; 114+ messages in thread
From: Derrick Stolee @ 2022-03-09 14:35 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, stolee, avarab, zhiyou.jx, jonathantanmy, Jeff Hostetler

On 3/8/2022 12:29 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> diff --git a/bundle.h b/bundle.h
>> index 06009fe6b1f..7fef2108f43 100644
>> --- a/bundle.h
>> +++ b/bundle.h
>> @@ -4,12 +4,14 @@
>>  #include "strvec.h"
>>  #include "cache.h"
>>  #include "string-list.h"
>> +#include "list-objects-filter-options.h"
>>  
>>  struct bundle_header {
>>  	unsigned version;
>>  	struct string_list prerequisites;
>>  	struct string_list references;
>>  	const struct git_hash_algo *hash_algo;
>> +	struct list_objects_filter_options filter;
>>  };
> 
> This used to be a pointer to the struct, with "NULL means do not
> filter" semantics, with .nr==0 as BUG().  Which was the same
> justification used when an earlier step added a pointer to the
> filter struct to rev_info.
> 
> Should the same logic applies there to make it into an embedded
> struct in rev_info as well?

You're absolutely right. Making the change will make the
range-diff look absolutely horrid, but the change isn't
terribly hard for the most part.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH v4 00/13] Partial bundles
  2022-03-08 14:39   ` [PATCH v3 " Derrick Stolee via GitGitGadget
                       ` (11 preceding siblings ...)
  2022-03-08 14:39     ` [PATCH v3 12/12] clone: fail gracefully when cloning filtered bundle Derrick Stolee via GitGitGadget
@ 2022-03-09 16:01     ` Derrick Stolee via GitGitGadget
  2022-03-09 16:01       ` [PATCH v4 01/13] index-pack: document and test the --promisor option Derrick Stolee via GitGitGadget
                         ` (12 more replies)
  12 siblings, 13 replies; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-03-09 16:01 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Jeff Hostetler, Derrick Stolee

While discussing bundle-URIs [1], it came to my attention that bundles have
no way to specify an object filter, so bundles cannot be used with partial
clones.

[1]
https://lore.kernel.org/git/7fab28bf-54e7-d0e9-110a-53fad6244cc9@gmail.com/

This series provides a way to fix that by adding a 'filter' capability to
the bundle file format and allowing one to create a partial bundle with 'git
bundle create --filter=blob:none '.

There are a couple things that I want to point out about this implementation
that could use some high-level feedback:

 1. I moved the '--filter' parsing into setup_revisions() instead of adding
    another place to parse it. This works for 'git bundle' but it also
    allows it to be parsed successfully in commands such as 'git diff' which
    doesn't make sense. Options such as '--objects' are already being parsed
    there, and they don't make sense either, so I want some thoughts on
    this.

 2. If someone uses 'git clone partial.bdl partial' where 'partial.bdl' is a
    filtered bundle, then the clone will fail with a message such as

fatal: missing blob object '9444604d515c0b162e37e59accd54a0bac50ed2e' fatal:
remote did not send all necessary objects

This might be fine. We don't expect users to clone partial bundles or fetch
partial bundles into an unfiltered repo and these failures are expected. It
is possible that we could put in custom logic to fail faster by reading the
bundle header for a filter.

Generally, the idea is to open this up as a potential way to bootstrap a
clone of a partial clone using a set of precomputed partial bundles.


Updates in v4
=============

 * 'struct rev_info' now has 'filter' embedded statically, using
   filter.choice to indicate if it is an empty filter. This makes the
   range-diff look really messy, as lots of '&' characters are inserted,
   especially in the middle patches. The end result looks very similar.

 * To accommodate previous lines that were a pointer copy, create
   list_objects_filter_copy() to assist with deep copies of filters.

 * Commit message typo fixes.

 * Documentation improvements.

 * Tests use 'sed' over 'head' to be more robust to future changes.

 * Initialization of a 'struct traversal_context' is made more compact.

 * "filter.choice != LOFC_DISABLED" is replaced by "filter.choice", since
   LOFC_DISABLED is zero by design.


Updates in v3
=============

 * 'struct bundle_header' now has 'filter' embedded statically, using
   filter.choice to indicate if it is an empty filter.

 * list-objects.c is now more robust to NULL function pointers.


Updates in v2
=============

Thanks for the reviews, Jeff, Junio, and Ævar!

 * Commit message typos and grammar are improved.

 * Grammar in MyFirstObjectWalk.txt is improved.

 * Unnecessary line wrapping is unwrapped.

 * Final test to check unbundled repo is made more rigorous.

 * The new 'filter' capability is added to
   Documentation/technical/bundle-format.txt

 * Expanded docs for 'git bundle verify'.

 * Moved API docs gently_parse_list_objects_filter() to header.

 * Test name swaps '' with "" to evaluate $filter.

 * Added a new patch that helps git clone <bundle> fail gracefully when
   <bundle> is has a filter capability.

Thanks, -Stolee

Derrick Stolee (12):
  index-pack: document and test the --promisor option
  list-objects-filter-options: create copy helper
  revision: put object filter into struct rev_info
  pack-objects: use rev.filter when possible
  pack-bitmap: drop filter in prepare_bitmap_walk()
  list-objects: consolidate traverse_commit_list[_filtered]
  MyFirstObjectWalk: update recommended usage
  bundle: parse filter capability
  rev-list: move --filter parsing into revision.c
  bundle: create filtered bundles
  bundle: unbundle promisor packs
  clone: fail gracefully when cloning filtered bundle

Ævar Arnfjörð Bjarmason (1):
  list-objects: handle NULL function pointers

 Documentation/MyFirstObjectWalk.txt       | 44 +++++---------
 Documentation/git-bundle.txt              |  7 ++-
 Documentation/git-index-pack.txt          |  8 +++
 Documentation/technical/bundle-format.txt | 11 +++-
 builtin/clone.c                           | 13 ++++
 builtin/pack-objects.c                    |  9 +--
 builtin/rev-list.c                        | 29 +++------
 bundle.c                                  | 74 +++++++++++++++++++----
 bundle.h                                  |  2 +
 list-objects-filter-options.c             | 36 ++++++-----
 list-objects-filter-options.h             | 24 ++++++++
 list-objects.c                            | 61 ++++++++++---------
 list-objects.h                            | 12 +++-
 pack-bitmap.c                             | 24 ++++----
 pack-bitmap.h                             |  2 -
 reachable.c                               |  2 +-
 revision.c                                |  7 +++
 revision.h                                |  7 +++
 t/t5300-pack-object.sh                    |  4 +-
 t/t6020-bundle-misc.sh                    | 74 +++++++++++++++++++++++
 20 files changed, 317 insertions(+), 133 deletions(-)


base-commit: 45fe28c951c3e70666ee4ef8379772851a8e4d32
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1159%2Fderrickstolee%2Fbundle%2Fpartial-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1159/derrickstolee/bundle/partial-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/1159

Range-diff vs v3:

  1:  a1eb4dceb0b =  1:  a1eb4dceb0b index-pack: document and test the --promisor option
  -:  ----------- >  2:  a4c5a727ef1 list-objects-filter-options: create copy helper
  2:  3a88c99d9bc !  3:  4ac09ddbfaa revision: put object filter into struct rev_info
     @@ Metadata
       ## Commit message ##
          revision: put object filter into struct rev_info
      
     -    Placing a 'struct list_objects_filter_options' pointer within 'struct
     -    rev_info' will assist making some bookkeeping around object filters in
     -    the future.
     +    Placing a 'struct list_objects_filter_options' within 'struct rev_info'
     +    will assist making some bookkeeping around object filters in the future.
      
          For now, let's use this new member to remove a static global instance of
          the struct from builtin/rev-list.c.
     @@ builtin/rev-list.c: static int try_bitmap_count(struct rev_info *revs,
       	max_count = revs->max_count;
       
      -	bitmap_git = prepare_bitmap_walk(revs, filter, filter_provided_objects);
     -+	bitmap_git = prepare_bitmap_walk(revs, revs->filter,
     ++	bitmap_git = prepare_bitmap_walk(revs, &revs->filter,
      +					 filter_provided_objects);
       	if (!bitmap_git)
       		return -1;
     @@ builtin/rev-list.c: static int try_bitmap_traversal(struct rev_info *revs,
       		return -1;
       
      -	bitmap_git = prepare_bitmap_walk(revs, filter, filter_provided_objects);
     -+	bitmap_git = prepare_bitmap_walk(revs, revs->filter,
     ++	bitmap_git = prepare_bitmap_walk(revs, &revs->filter,
      +					 filter_provided_objects);
       	if (!bitmap_git)
       		return -1;
     @@ builtin/rev-list.c: static int try_bitmap_disk_usage(struct rev_info *revs,
       		return -1;
       
      -	bitmap_git = prepare_bitmap_walk(revs, filter, filter_provided_objects);
     -+	bitmap_git = prepare_bitmap_walk(revs, revs->filter, filter_provided_objects);
     ++	bitmap_git = prepare_bitmap_walk(revs, &revs->filter, filter_provided_objects);
       	if (!bitmap_git)
       		return -1;
       
     @@ builtin/rev-list.c: int cmd_rev_list(int argc, const char **argv, const char *pr
       		if (skip_prefix(arg, ("--" CL_ARG__FILTER "="), &arg)) {
      -			parse_list_objects_filter(&filter_options, arg);
      -			if (filter_options.choice && !revs.blob_objects)
     -+			if (!revs.filter)
     -+				CALLOC_ARRAY(revs.filter, 1);
     -+			parse_list_objects_filter(revs.filter, arg);
     -+			if (revs.filter->choice && !revs.blob_objects)
     ++			parse_list_objects_filter(&revs.filter, arg);
     ++			if (revs.filter.choice && !revs.blob_objects)
       				die(_("object filtering requires --objects"));
       			continue;
       		}
       		if (!strcmp(arg, ("--no-" CL_ARG__FILTER))) {
      -			list_objects_filter_set_no_filter(&filter_options);
     -+			if (!revs.filter)
     -+				CALLOC_ARRAY(revs.filter, 1);
     -+			list_objects_filter_set_no_filter(revs.filter);
     ++			list_objects_filter_set_no_filter(&revs.filter);
       			continue;
       		}
       		if (!strcmp(arg, "--filter-provided-objects")) {
     @@ builtin/rev-list.c: int cmd_rev_list(int argc, const char **argv, const char *pr
       
       	traverse_commit_list_filtered(
      -		&filter_options, &revs, show_commit, show_object, &info,
     -+		revs.filter, &revs, show_commit, show_object, &info,
     ++		&revs.filter, &revs, show_commit, show_object, &info,
       		(arg_print_omitted ? &omitted_objects : NULL));
       
       	if (arg_print_omitted) {
      
       ## revision.h ##
     -@@ revision.h: struct rev_cmdline_info {
     - 
     - struct oidset;
     - struct topo_walk_info;
     -+struct list_objects_filter_options;
     - 
     - struct rev_info {
     - 	/* Starting list */
     +@@
     + #include "pretty.h"
     + #include "diff.h"
     + #include "commit-slab-decl.h"
     ++#include "list-objects-filter-options.h"
     + 
     + /**
     +  * The revision walking API offers functions to build a list of revisions
      @@ revision.h: struct rev_info {
       	/* The end-points specified by the end user */
       	struct rev_cmdline_info cmdline;
       
     -+	/* Object filter options. NULL for no filtering. */
     -+	struct list_objects_filter_options *filter;
     ++	/*
     ++	 * Object filter options. No filtering is specified
     ++	 * if and only if filter.choice is zero.
     ++	 */
     ++	struct list_objects_filter_options filter;
      +
       	/* excluding from --branches, --refs, etc. expansion */
       	struct string_list *ref_excludes;
  3:  d5edb193229 !  4:  ed22a77782b pack-objects: use rev.filter when possible
     @@ builtin/pack-objects.c: static int pack_options_allow_reuse(void)
       static int get_object_list_from_bitmap(struct rev_info *revs)
       {
      -	if (!(bitmap_git = prepare_bitmap_walk(revs, &filter_options, 0)))
     -+	if (!(bitmap_git = prepare_bitmap_walk(revs, revs->filter, 0)))
     ++	if (!(bitmap_git = prepare_bitmap_walk(revs, &revs->filter, 0)))
       		return -1;
       
       	if (pack_options_allow_reuse() &&
     @@ builtin/pack-objects.c: static void get_object_list(int ac, const char **av)
       	repo_init_revisions(the_repository, &revs, NULL);
       	save_commit_buffer = 0;
       	setup_revisions(ac, av, &revs, &s_r_opt);
     -+	revs.filter = &filter_options;
     ++	list_objects_filter_copy(&revs.filter, &filter_options);
       
       	/* make sure shallows are read */
       	is_repository_shallow(the_repository);
     @@ builtin/pack-objects.c: static void get_object_list(int ac, const char **av)
       	if (!fn_show_object)
       		fn_show_object = show_object;
      -	traverse_commit_list_filtered(&filter_options, &revs,
     -+	traverse_commit_list_filtered(revs.filter, &revs,
     ++	traverse_commit_list_filtered(&revs.filter, &revs,
       				      show_commit, fn_show_object, NULL,
       				      NULL);
       
  4:  888774f6f28 !  5:  346baa78ec5 pack-bitmap: drop filter in prepare_bitmap_walk()
     @@ builtin/pack-objects.c: static int pack_options_allow_reuse(void)
       
       static int get_object_list_from_bitmap(struct rev_info *revs)
       {
     --	if (!(bitmap_git = prepare_bitmap_walk(revs, revs->filter, 0)))
     +-	if (!(bitmap_git = prepare_bitmap_walk(revs, &revs->filter, 0)))
      +	if (!(bitmap_git = prepare_bitmap_walk(revs, 0)))
       		return -1;
       
     @@ builtin/rev-list.c: static int try_bitmap_count(struct rev_info *revs,
       	 */
       	max_count = revs->max_count;
       
     --	bitmap_git = prepare_bitmap_walk(revs, revs->filter,
     +-	bitmap_git = prepare_bitmap_walk(revs, &revs->filter,
      -					 filter_provided_objects);
      +	bitmap_git = prepare_bitmap_walk(revs, filter_provided_objects);
       	if (!bitmap_git)
     @@ builtin/rev-list.c: static int try_bitmap_traversal(struct rev_info *revs,
       	if (revs->max_count >= 0)
       		return -1;
       
     --	bitmap_git = prepare_bitmap_walk(revs, revs->filter,
     +-	bitmap_git = prepare_bitmap_walk(revs, &revs->filter,
      -					 filter_provided_objects);
      +	bitmap_git = prepare_bitmap_walk(revs, filter_provided_objects);
       	if (!bitmap_git)
     @@ builtin/rev-list.c: static int try_bitmap_disk_usage(struct rev_info *revs,
       	if (!show_disk_usage)
       		return -1;
       
     --	bitmap_git = prepare_bitmap_walk(revs, revs->filter, filter_provided_objects);
     +-	bitmap_git = prepare_bitmap_walk(revs, &revs->filter, filter_provided_objects);
      +	bitmap_git = prepare_bitmap_walk(revs, filter_provided_objects);
       	if (!bitmap_git)
       		return -1;
     @@ pack-bitmap.c: static struct bitmap *find_objects(struct bitmap_index *bitmap_gi
       		show_data.base = base;
       
      -		traverse_commit_list_filtered(filter, revs,
     -+		traverse_commit_list_filtered(revs->filter, revs,
     ++		traverse_commit_list_filtered(&revs->filter, revs,
       					      show_commit, show_object,
       					      &show_data, NULL);
       
     @@ pack-bitmap.c: struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
       		return NULL;
       
      -	if (!can_filter_bitmap(filter))
     -+	if (!can_filter_bitmap(revs->filter))
     ++	if (!can_filter_bitmap(&revs->filter))
       		return NULL;
       
       	/* try to open a bitmapped pack, but don't parse it yet
     @@ pack-bitmap.c: struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
      -	filter_bitmap(bitmap_git, (filter && filter_provided_objects) ? NULL : wants,
      -		      wants_bitmap, filter);
      +	filter_bitmap(bitmap_git,
     -+		      (revs->filter && filter_provided_objects) ? NULL : wants,
     ++		      (revs->filter.choice && filter_provided_objects) ? NULL : wants,
      +		      wants_bitmap,
     -+		      revs->filter);
     ++		      &revs->filter);
       
       	bitmap_git->result = wants_bitmap;
       	bitmap_git->haves = haves_bitmap;
  5:  bcb76a065bf !  6:  34afea8fcd6 list-objects: consolidate traverse_commit_list[_filtered]
     @@ builtin/pack-objects.c: static void get_object_list(int ac, const char **av)
       
       	if (!fn_show_object)
       		fn_show_object = show_object;
     --	traverse_commit_list_filtered(revs.filter, &revs,
     +-	traverse_commit_list_filtered(&revs.filter, &revs,
      -				      show_commit, fn_show_object, NULL,
      -				      NULL);
      +	traverse_commit_list(&revs,
     @@ builtin/rev-list.c: int cmd_rev_list(int argc, const char **argv, const char *pr
       		oidset_init(&missing_objects, DEFAULT_OIDSET_SIZE);
       
       	traverse_commit_list_filtered(
     --		revs.filter, &revs, show_commit, show_object, &info,
     +-		&revs.filter, &revs, show_commit, show_object, &info,
      +		&revs, show_commit, show_object, &info,
       		(arg_print_omitted ? &omitted_objects : NULL));
       
     @@ list-objects.c: static void do_traverse(struct traversal_context *ctx)
       	struct rev_info *revs,
       	show_commit_fn show_commit,
       	show_object_fn show_object,
     -@@ list-objects.c: void traverse_commit_list_filtered(
     - 	ctx.show_object = show_object;
     - 	ctx.show_commit = show_commit;
     - 	ctx.show_data = show_data;
     --	ctx.filter = list_objects_filter__init(omitted, filter_options);
     -+	if (revs->filter)
     -+		ctx.filter = list_objects_filter__init(omitted, revs->filter);
     -+	else
     -+		ctx.filter = NULL;
     + 	void *show_data,
     + 	struct oidset *omitted)
     + {
     +-	struct traversal_context ctx;
     ++	struct traversal_context ctx = {
     ++		.revs = revs,
     ++		.show_object = show_object,
     ++		.show_commit = show_commit,
     ++		.show_data = show_data,
     ++	};
      +
     ++	if (revs->filter.choice)
     ++		ctx.filter = list_objects_filter__init(omitted, &revs->filter);
     + 
     +-	ctx.revs = revs;
     +-	ctx.show_object = show_object;
     +-	ctx.show_commit = show_commit;
     +-	ctx.show_data = show_data;
     +-	ctx.filter = list_objects_filter__init(omitted, filter_options);
       	do_traverse(&ctx);
      -	list_objects_filter__free(ctx.filter);
      +
     @@ pack-bitmap.c: static struct bitmap *find_objects(struct bitmap_index *bitmap_gi
       		show_data.bitmap_git = bitmap_git;
       		show_data.base = base;
       
     --		traverse_commit_list_filtered(revs->filter, revs,
     +-		traverse_commit_list_filtered(&revs->filter, revs,
      -					      show_commit, show_object,
      -					      &show_data, NULL);
      +		traverse_commit_list(revs,
  6:  efc03168818 =  7:  e522bf61b68 MyFirstObjectWalk: update recommended usage
  7:  782182a26e3 !  8:  7287aaec598 list-objects: handle NULL function pointers
     @@ Commit message
          leaving this segfault open for other callers.
      
          Replace all callers of the show_commit and show_object function pointers
     -    in list-objects.c to be local methods show_commit() and show_object()
     -    which check that the given contex has non-NULL functions before passing
     -    the necessary data. One extra benefit is that it reduces duplication
     -    due to passing ctx->show_data to every caller.
     +    in list-objects.c to call helper functions show_commit() and
     +    show_object() which check that the given context has non-NULL functions
     +    before passing the necessary data. One extra benefit is that it reduces
     +    duplication due to passing ctx->show_data to every caller.
      
          Test that this segfault no longer occurs for 'git bundle'.
      
     @@ t/t6020-bundle-misc.sh: test_expect_success 'clone from bundle' '
      +		--all &&
      +
      +	# Compare the headers of these files.
     -+	head -11 all.bdl >expect &&
     -+	head -11 all-objects.bdl >actual &&
     ++	sed -n -e "/^$/q" -e "p" all.bdl >expect &&
     ++	sed -n -e "/^$/q" -e "p" all-objects.bdl >actual &&
      +	test_cmp expect actual
      +'
      +
  8:  025f38290f5 !  9:  faf7a38b0e5 bundle: parse filter capability
     @@ Commit message
      
          Signed-off-by: Derrick Stolee <derrickstolee@github.com>
      
     + ## Documentation/git-bundle.txt ##
     +@@ Documentation/git-bundle.txt: verify <file>::
     + 	cleanly to the current repository.  This includes checks on the
     + 	bundle format itself as well as checking that the prerequisite
     + 	commits exist and are fully linked in the current repository.
     +-	'git bundle' prints a list of missing commits, if any, and exits
     +-	with a non-zero status.
     ++	Information about additional capabilities, such as "object filter",
     ++	is printed. See "Capabilities" in link:technical/bundle-format.html
     ++	for more information. Finally, 'git bundle' prints a list of
     ++	missing commits, if any. The exit code is zero for success, but
     ++	will be nonzero if the bundle file is invalid.
     + 
     + list-heads <file>::
     + 	Lists the references defined in the bundle.  If followed by a
     +
       ## Documentation/technical/bundle-format.txt ##
      @@ Documentation/technical/bundle-format.txt: and the Git bundle v2 format cannot represent a shallow clone repository.
       == Capabilities
     @@ bundle.c: int verify_bundle(struct repository *r,
       	req_nr = revs.pending.nr;
       	setup_revisions(2, argv, &revs, NULL);
       
     -+	revs.filter = &header->filter;
     ++	list_objects_filter_copy(&revs.filter, &header->filter);
      +
       	if (prepare_revision_walk(&revs))
       		die(_("revision walk setup failed"));
     @@ bundle.c: int verify_bundle(struct repository *r,
       			  r->nr);
       		list_refs(r, 0, NULL);
      +
     -+		if (header->filter.choice != LOFC_DISABLED) {
     ++		if (header->filter.choice) {
      +			printf_ln("The bundle uses this filter: %s",
      +				  list_objects_filter_spec(&header->filter));
      +		}
  9:  2c8e8a6c2a5 ! 10:  05d7322fdfc rev-list: move --filter parsing into revision.c
     @@ builtin/rev-list.c: int cmd_rev_list(int argc, const char **argv, const char *pr
       		}
      -
      -		if (skip_prefix(arg, ("--" CL_ARG__FILTER "="), &arg)) {
     --			if (!revs.filter)
     --				CALLOC_ARRAY(revs.filter, 1);
     --			parse_list_objects_filter(revs.filter, arg);
     --			if (revs.filter->choice && !revs.blob_objects)
     +-			parse_list_objects_filter(&revs.filter, arg);
     +-			if (revs.filter.choice && !revs.blob_objects)
      -				die(_("object filtering requires --objects"));
      -			continue;
      -		}
      -		if (!strcmp(arg, ("--no-" CL_ARG__FILTER))) {
     --			if (!revs.filter)
     --				CALLOC_ARRAY(revs.filter, 1);
     --			list_objects_filter_set_no_filter(revs.filter);
     +-			list_objects_filter_set_no_filter(&revs.filter);
      -			continue;
      -		}
       		if (!strcmp(arg, "--filter-provided-objects")) {
     @@ revision.c: static int handle_revision_pseudo_opt(struct rev_info *revs,
       	} else if (!strcmp(arg, "--single-worktree")) {
       		revs->single_worktree = 1;
      +	} else if (skip_prefix(arg, ("--" CL_ARG__FILTER "="), &arg)) {
     -+		if (!revs->filter)
     -+			CALLOC_ARRAY(revs->filter, 1);
     -+		parse_list_objects_filter(revs->filter, arg);
     ++		parse_list_objects_filter(&revs->filter, arg);
      +	} else if (!strcmp(arg, ("--no-" CL_ARG__FILTER))) {
     -+		if (!revs->filter)
     -+			CALLOC_ARRAY(revs->filter, 1);
     -+		list_objects_filter_set_no_filter(revs->filter);
     ++		list_objects_filter_set_no_filter(&revs->filter);
       	} else {
       		return 0;
       	}
     @@ revision.c: int setup_revisions(int argc, const char **argv, struct rev_info *re
       		die("cannot combine --walk-reflogs with history-limiting options");
       	if (revs->rewrite_parents && revs->children.name)
       		die(_("options '%s' and '%s' cannot be used together"), "--parents", "--children");
     -+	if (revs->filter && revs->filter->choice && !revs->blob_objects)
     ++	if (revs->filter.choice && !revs->blob_objects)
      +		die(_("object filtering requires --objects"));
       
       	/*
 10:  470b6f73e28 ! 11:  7435095bbc9 bundle: create filtered bundles
     @@ Commit message
      
          Signed-off-by: Derrick Stolee <derrickstolee@github.com>
      
     - ## Documentation/git-bundle.txt ##
     -@@ Documentation/git-bundle.txt: verify <file>::
     - 	cleanly to the current repository.  This includes checks on the
     - 	bundle format itself as well as checking that the prerequisite
     - 	commits exist and are fully linked in the current repository.
     --	'git bundle' prints a list of missing commits, if any, and exits
     --	with a non-zero status.
     -+	'git bundle' prints the bundle's object filter and its list of
     -+	missing commits, if any, and exits with a non-zero status.
     - 
     - list-heads <file>::
     - 	Lists the references defined in the bundle.  If followed by a
     -
       ## bundle.c ##
      @@ bundle.c: static int write_pack_data(int bundle_fd, struct rev_info *revs, struct strvec *
       		     "--stdout", "--thin", "--delta-base-offset",
       		     NULL);
       	strvec_pushv(&pack_objects.args, pack_options->v);
     -+	if (revs->filter)
     ++	if (revs->filter.choice)
      +		strvec_pushf(&pack_objects.args, "--filter=%s",
     -+			     list_objects_filter_spec(revs->filter));
     ++			     list_objects_filter_spec(&revs->filter));
       	pack_objects.in = -1;
       	pack_objects.out = bundle_fd;
       	pack_objects.git_cmd = 1;
     @@ bundle.c: int create_bundle(struct repository *r, const char *path,
      +	 *    SHA1.
      +	 * 2. @filter is required because we parsed an object filter.
      +	 */
     -+	if (the_hash_algo != &hash_algos[GIT_HASH_SHA1] || revs.filter)
     ++	if (the_hash_algo != &hash_algos[GIT_HASH_SHA1] || revs.filter.choice)
      +		min_version = 3;
      +
      +	if (argc > 1) {
     @@ bundle.c: int create_bundle(struct repository *r, const char *path,
      -	if (argc > 1) {
      -		error(_("unrecognized argument: %s"), argv[1]);
      -		goto err;
     -+		if (revs.filter) {
     -+			const char *value = expand_list_objects_filter_spec(revs.filter);
     ++		if (revs.filter.choice) {
     ++			const char *value = expand_list_objects_filter_spec(&revs.filter);
      +			capability = "@filter=";
      +			write_or_die(bundle_fd, capability, strlen(capability));
      +			write_or_die(bundle_fd, value, strlen(value));
     @@ bundle.c: int create_bundle(struct repository *r, const char *path,
       	bpi.pending = &revs_copy.pending;
       
      +	/*
     -+	 * Nullify the filter here, and any object walking. We only care
     -+	 * about commits and tags here. The revs_copy has the right
     -+	 * instances of these values.
     ++	 * Remove any object walking here. We only care about commits and
     ++	 * tags here. The revs_copy has the right instances of these values.
      +	 */
     -+	revs.filter = NULL;
       	revs.blob_objects = revs.tree_objects = 0;
       	traverse_commit_list(&revs, write_bundle_prerequisites, NULL, &bpi);
       	object_array_remove_duplicates(&revs_copy.pending);
 11:  e85ca2770a3 ! 12:  77a62156332 bundle: unbundle promisor packs
     @@ bundle.c: int unbundle(struct repository *r, struct bundle_header *header,
       	strvec_pushl(&ip.args, "index-pack", "--fix-thin", "--stdin", NULL);
       
      +	/* If there is a filter, then we need to create the promisor pack. */
     -+	if (header->filter.choice != LOFC_DISABLED)
     ++	if (header->filter.choice)
      +		strvec_push(&ip.args, "--promisor=from-bundle");
      +
       	if (extra_index_pack_args) {
 12:  805e1d11722 ! 13:  de6a1a868d3 clone: fail gracefully when cloning filtered bundle
     @@ builtin/clone.c: int cmd_clone(int argc, const char **argv, const char *prefix)
       	transport->cloning = 1;
       
      +	if (is_bundle) {
     -+		struct bundle_header header = { 0 };
     ++		struct bundle_header header = BUNDLE_HEADER_INIT;
      +		int fd = read_bundle_header(path, &header);
      +		int has_filter = header.filter.choice != LOFC_DISABLED;
      +

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH v4 01/13] index-pack: document and test the --promisor option
  2022-03-09 16:01     ` [PATCH v4 00/13] Partial bundles Derrick Stolee via GitGitGadget
@ 2022-03-09 16:01       ` Derrick Stolee via GitGitGadget
  2022-03-09 16:01       ` [PATCH v4 02/13] list-objects-filter-options: create copy helper Derrick Stolee via GitGitGadget
                         ` (11 subsequent siblings)
  12 siblings, 0 replies; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-03-09 16:01 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Jeff Hostetler, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The --promisor option of 'git index-pack' was created in 88e2f9e
(introduce fetch-object: fetch one promisor object, 2017-12-05) but was
untested. It is currently unused within the Git codebase, but that will
change in an upcoming change to 'git bundle unbundle' when there is a
filter capability.

For now, add documentation about the option and add a test to ensure it
is working as expected.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/git-index-pack.txt | 8 ++++++++
 t/t5300-pack-object.sh           | 4 +++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-index-pack.txt b/Documentation/git-index-pack.txt
index 1f1e3592251..4e71c256ecb 100644
--- a/Documentation/git-index-pack.txt
+++ b/Documentation/git-index-pack.txt
@@ -122,6 +122,14 @@ This option cannot be used with --stdin.
 +
 include::object-format-disclaimer.txt[]
 
+--promisor[=<message>]::
+	Before committing the pack-index, create a .promisor file for this
+	pack. Particularly helpful when writing a promisor pack with --fix-thin
+	since the name of the pack is not final until the pack has been fully
+	written. If a `<message>` is provided, then that content will be
+	written to the .promisor file for future reference. See
+	link:technical/partial-clone.html[partial clone] for more information.
+
 NOTES
 -----
 
diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index 2fd845187e7..a11d61206ad 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -315,8 +315,10 @@ test_expect_success \
      git index-pack -o tmp.idx test-3.pack &&
      cmp tmp.idx test-1-${packname_1}.idx &&
 
-     git index-pack test-3.pack &&
+     git index-pack --promisor=message test-3.pack &&
      cmp test-3.idx test-1-${packname_1}.idx &&
+     echo message >expect &&
+     test_cmp expect test-3.promisor &&
 
      cat test-2-${packname_2}.pack >test-3.pack &&
      git index-pack -o tmp.idx test-2-${packname_2}.pack &&
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v4 02/13] list-objects-filter-options: create copy helper
  2022-03-09 16:01     ` [PATCH v4 00/13] Partial bundles Derrick Stolee via GitGitGadget
  2022-03-09 16:01       ` [PATCH v4 01/13] index-pack: document and test the --promisor option Derrick Stolee via GitGitGadget
@ 2022-03-09 16:01       ` Derrick Stolee via GitGitGadget
  2022-03-09 16:01       ` [PATCH v4 03/13] revision: put object filter into struct rev_info Derrick Stolee via GitGitGadget
                         ` (10 subsequent siblings)
  12 siblings, 0 replies; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-03-09 16:01 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Jeff Hostetler, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

As we add more embedded members with type 'struct
list_objects_filter_options', it will be important to easily perform a
deep copy across multiple such structs. Create
list_objects_filter_copy() to satisfy this need.

This method is recursive to match the recursive nature of the struct.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 list-objects-filter-options.c | 19 +++++++++++++++++++
 list-objects-filter-options.h |  4 ++++
 2 files changed, 23 insertions(+)

diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index fd8d59f653a..449d53af69f 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -415,3 +415,22 @@ void partial_clone_get_default_filter_spec(
 					 &errbuf);
 	strbuf_release(&errbuf);
 }
+
+void list_objects_filter_copy(
+	struct list_objects_filter_options *dest,
+	const struct list_objects_filter_options *src)
+{
+	int i;
+	struct string_list_item *item;
+
+	/* Copy everything. We will overwrite the pointers shortly. */
+	memcpy(dest, src, sizeof(struct list_objects_filter_options));
+
+	string_list_init_dup(&dest->filter_spec);
+	for_each_string_list_item(item, &src->filter_spec)
+		string_list_append(&dest->filter_spec, item->string);
+
+	ALLOC_ARRAY(dest->sub, dest->sub_alloc);
+	for (i = 0; i < src->sub_nr; i++)
+		list_objects_filter_copy(&dest->sub[i], &src->sub[i]);
+}
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index da5b6737e27..425c38cae9d 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -132,4 +132,8 @@ void partial_clone_get_default_filter_spec(
 	struct list_objects_filter_options *filter_options,
 	const char *remote);
 
+void list_objects_filter_copy(
+	struct list_objects_filter_options *dest,
+	const struct list_objects_filter_options *src);
+
 #endif /* LIST_OBJECTS_FILTER_OPTIONS_H */
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v4 03/13] revision: put object filter into struct rev_info
  2022-03-09 16:01     ` [PATCH v4 00/13] Partial bundles Derrick Stolee via GitGitGadget
  2022-03-09 16:01       ` [PATCH v4 01/13] index-pack: document and test the --promisor option Derrick Stolee via GitGitGadget
  2022-03-09 16:01       ` [PATCH v4 02/13] list-objects-filter-options: create copy helper Derrick Stolee via GitGitGadget
@ 2022-03-09 16:01       ` Derrick Stolee via GitGitGadget
  2022-03-09 18:48         ` Junio C Hamano
  2022-03-09 16:01       ` [PATCH v4 04/13] pack-objects: use rev.filter when possible Derrick Stolee via GitGitGadget
                         ` (9 subsequent siblings)
  12 siblings, 1 reply; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-03-09 16:01 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Jeff Hostetler, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

Placing a 'struct list_objects_filter_options' within 'struct rev_info'
will assist making some bookkeeping around object filters in the future.

For now, let's use this new member to remove a static global instance of
the struct from builtin/rev-list.c.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/rev-list.c | 26 ++++++++++++--------------
 revision.h         |  7 +++++++
 2 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 777558e9b06..1beb578cc51 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -62,7 +62,6 @@ static const char rev_list_usage[] =
 static struct progress *progress;
 static unsigned progress_counter;
 
-static struct list_objects_filter_options filter_options;
 static struct oidset omitted_objects;
 static int arg_print_omitted; /* print objects omitted by filter */
 
@@ -400,7 +399,6 @@ static inline int parse_missing_action_value(const char *value)
 }
 
 static int try_bitmap_count(struct rev_info *revs,
-			    struct list_objects_filter_options *filter,
 			    int filter_provided_objects)
 {
 	uint32_t commit_count = 0,
@@ -436,7 +434,8 @@ static int try_bitmap_count(struct rev_info *revs,
 	 */
 	max_count = revs->max_count;
 
-	bitmap_git = prepare_bitmap_walk(revs, filter, filter_provided_objects);
+	bitmap_git = prepare_bitmap_walk(revs, &revs->filter,
+					 filter_provided_objects);
 	if (!bitmap_git)
 		return -1;
 
@@ -453,7 +452,6 @@ static int try_bitmap_count(struct rev_info *revs,
 }
 
 static int try_bitmap_traversal(struct rev_info *revs,
-				struct list_objects_filter_options *filter,
 				int filter_provided_objects)
 {
 	struct bitmap_index *bitmap_git;
@@ -465,7 +463,8 @@ static int try_bitmap_traversal(struct rev_info *revs,
 	if (revs->max_count >= 0)
 		return -1;
 
-	bitmap_git = prepare_bitmap_walk(revs, filter, filter_provided_objects);
+	bitmap_git = prepare_bitmap_walk(revs, &revs->filter,
+					 filter_provided_objects);
 	if (!bitmap_git)
 		return -1;
 
@@ -475,7 +474,6 @@ static int try_bitmap_traversal(struct rev_info *revs,
 }
 
 static int try_bitmap_disk_usage(struct rev_info *revs,
-				 struct list_objects_filter_options *filter,
 				 int filter_provided_objects)
 {
 	struct bitmap_index *bitmap_git;
@@ -483,7 +481,7 @@ static int try_bitmap_disk_usage(struct rev_info *revs,
 	if (!show_disk_usage)
 		return -1;
 
-	bitmap_git = prepare_bitmap_walk(revs, filter, filter_provided_objects);
+	bitmap_git = prepare_bitmap_walk(revs, &revs->filter, filter_provided_objects);
 	if (!bitmap_git)
 		return -1;
 
@@ -597,13 +595,13 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 		}
 
 		if (skip_prefix(arg, ("--" CL_ARG__FILTER "="), &arg)) {
-			parse_list_objects_filter(&filter_options, arg);
-			if (filter_options.choice && !revs.blob_objects)
+			parse_list_objects_filter(&revs.filter, arg);
+			if (revs.filter.choice && !revs.blob_objects)
 				die(_("object filtering requires --objects"));
 			continue;
 		}
 		if (!strcmp(arg, ("--no-" CL_ARG__FILTER))) {
-			list_objects_filter_set_no_filter(&filter_options);
+			list_objects_filter_set_no_filter(&revs.filter);
 			continue;
 		}
 		if (!strcmp(arg, "--filter-provided-objects")) {
@@ -688,11 +686,11 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 		progress = start_delayed_progress(show_progress, 0);
 
 	if (use_bitmap_index) {
-		if (!try_bitmap_count(&revs, &filter_options, filter_provided_objects))
+		if (!try_bitmap_count(&revs, filter_provided_objects))
 			return 0;
-		if (!try_bitmap_disk_usage(&revs, &filter_options, filter_provided_objects))
+		if (!try_bitmap_disk_usage(&revs, filter_provided_objects))
 			return 0;
-		if (!try_bitmap_traversal(&revs, &filter_options, filter_provided_objects))
+		if (!try_bitmap_traversal(&revs, filter_provided_objects))
 			return 0;
 	}
 
@@ -733,7 +731,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 		oidset_init(&missing_objects, DEFAULT_OIDSET_SIZE);
 
 	traverse_commit_list_filtered(
-		&filter_options, &revs, show_commit, show_object, &info,
+		&revs.filter, &revs, show_commit, show_object, &info,
 		(arg_print_omitted ? &omitted_objects : NULL));
 
 	if (arg_print_omitted) {
diff --git a/revision.h b/revision.h
index 3c58c18c63a..b1669a8cc33 100644
--- a/revision.h
+++ b/revision.h
@@ -8,6 +8,7 @@
 #include "pretty.h"
 #include "diff.h"
 #include "commit-slab-decl.h"
+#include "list-objects-filter-options.h"
 
 /**
  * The revision walking API offers functions to build a list of revisions
@@ -94,6 +95,12 @@ struct rev_info {
 	/* The end-points specified by the end user */
 	struct rev_cmdline_info cmdline;
 
+	/*
+	 * Object filter options. No filtering is specified
+	 * if and only if filter.choice is zero.
+	 */
+	struct list_objects_filter_options filter;
+
 	/* excluding from --branches, --refs, etc. expansion */
 	struct string_list *ref_excludes;
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v4 04/13] pack-objects: use rev.filter when possible
  2022-03-09 16:01     ` [PATCH v4 00/13] Partial bundles Derrick Stolee via GitGitGadget
                         ` (2 preceding siblings ...)
  2022-03-09 16:01       ` [PATCH v4 03/13] revision: put object filter into struct rev_info Derrick Stolee via GitGitGadget
@ 2022-03-09 16:01       ` Derrick Stolee via GitGitGadget
  2022-03-10 13:11         ` Ævar Arnfjörð Bjarmason
  2022-03-09 16:01       ` [PATCH v4 05/13] pack-bitmap: drop filter in prepare_bitmap_walk() Derrick Stolee via GitGitGadget
                         ` (8 subsequent siblings)
  12 siblings, 1 reply; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-03-09 16:01 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Jeff Hostetler, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

In builtin/pack-objects.c, we use a 'filter_options' global to populate
the --filter=<X> argument. The previous change created a pointer to a
filter option in 'struct rev_info', so we can use that pointer here as a
start to simplifying some usage of object filters.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/pack-objects.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index ba2006f2212..e5b7d015d7d 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3651,7 +3651,7 @@ static int pack_options_allow_reuse(void)
 
 static int get_object_list_from_bitmap(struct rev_info *revs)
 {
-	if (!(bitmap_git = prepare_bitmap_walk(revs, &filter_options, 0)))
+	if (!(bitmap_git = prepare_bitmap_walk(revs, &revs->filter, 0)))
 		return -1;
 
 	if (pack_options_allow_reuse() &&
@@ -3727,6 +3727,7 @@ static void get_object_list(int ac, const char **av)
 	repo_init_revisions(the_repository, &revs, NULL);
 	save_commit_buffer = 0;
 	setup_revisions(ac, av, &revs, &s_r_opt);
+	list_objects_filter_copy(&revs.filter, &filter_options);
 
 	/* make sure shallows are read */
 	is_repository_shallow(the_repository);
@@ -3777,7 +3778,7 @@ static void get_object_list(int ac, const char **av)
 
 	if (!fn_show_object)
 		fn_show_object = show_object;
-	traverse_commit_list_filtered(&filter_options, &revs,
+	traverse_commit_list_filtered(&revs.filter, &revs,
 				      show_commit, fn_show_object, NULL,
 				      NULL);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v4 05/13] pack-bitmap: drop filter in prepare_bitmap_walk()
  2022-03-09 16:01     ` [PATCH v4 00/13] Partial bundles Derrick Stolee via GitGitGadget
                         ` (3 preceding siblings ...)
  2022-03-09 16:01       ` [PATCH v4 04/13] pack-objects: use rev.filter when possible Derrick Stolee via GitGitGadget
@ 2022-03-09 16:01       ` Derrick Stolee via GitGitGadget
  2022-03-09 16:01       ` [PATCH v4 06/13] list-objects: consolidate traverse_commit_list[_filtered] Derrick Stolee via GitGitGadget
                         ` (7 subsequent siblings)
  12 siblings, 0 replies; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-03-09 16:01 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Jeff Hostetler, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

Now that all consumers of prepare_bitmap_walk() have populated the
'filter' member of 'struct rev_info', we can drop that extra parameter
from the method and access it directly from the 'struct rev_info'.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/pack-objects.c |  2 +-
 builtin/rev-list.c     |  8 +++-----
 pack-bitmap.c          | 20 +++++++++-----------
 pack-bitmap.h          |  2 --
 reachable.c            |  2 +-
 5 files changed, 14 insertions(+), 20 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index e5b7d015d7d..bafce542778 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3651,7 +3651,7 @@ static int pack_options_allow_reuse(void)
 
 static int get_object_list_from_bitmap(struct rev_info *revs)
 {
-	if (!(bitmap_git = prepare_bitmap_walk(revs, &revs->filter, 0)))
+	if (!(bitmap_git = prepare_bitmap_walk(revs, 0)))
 		return -1;
 
 	if (pack_options_allow_reuse() &&
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 1beb578cc51..ab7558bd66a 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -434,8 +434,7 @@ static int try_bitmap_count(struct rev_info *revs,
 	 */
 	max_count = revs->max_count;
 
-	bitmap_git = prepare_bitmap_walk(revs, &revs->filter,
-					 filter_provided_objects);
+	bitmap_git = prepare_bitmap_walk(revs, filter_provided_objects);
 	if (!bitmap_git)
 		return -1;
 
@@ -463,8 +462,7 @@ static int try_bitmap_traversal(struct rev_info *revs,
 	if (revs->max_count >= 0)
 		return -1;
 
-	bitmap_git = prepare_bitmap_walk(revs, &revs->filter,
-					 filter_provided_objects);
+	bitmap_git = prepare_bitmap_walk(revs, filter_provided_objects);
 	if (!bitmap_git)
 		return -1;
 
@@ -481,7 +479,7 @@ static int try_bitmap_disk_usage(struct rev_info *revs,
 	if (!show_disk_usage)
 		return -1;
 
-	bitmap_git = prepare_bitmap_walk(revs, &revs->filter, filter_provided_objects);
+	bitmap_git = prepare_bitmap_walk(revs, filter_provided_objects);
 	if (!bitmap_git)
 		return -1;
 
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 9c666cdb8bd..37fa4905796 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -739,8 +739,7 @@ static int add_commit_to_bitmap(struct bitmap_index *bitmap_git,
 static struct bitmap *find_objects(struct bitmap_index *bitmap_git,
 				   struct rev_info *revs,
 				   struct object_list *roots,
-				   struct bitmap *seen,
-				   struct list_objects_filter_options *filter)
+				   struct bitmap *seen)
 {
 	struct bitmap *base = NULL;
 	int needs_walk = 0;
@@ -823,7 +822,7 @@ static struct bitmap *find_objects(struct bitmap_index *bitmap_git,
 		show_data.bitmap_git = bitmap_git;
 		show_data.base = base;
 
-		traverse_commit_list_filtered(filter, revs,
+		traverse_commit_list_filtered(&revs->filter, revs,
 					      show_commit, show_object,
 					      &show_data, NULL);
 
@@ -1219,7 +1218,6 @@ static int can_filter_bitmap(struct list_objects_filter_options *filter)
 }
 
 struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
-					 struct list_objects_filter_options *filter,
 					 int filter_provided_objects)
 {
 	unsigned int i;
@@ -1240,7 +1238,7 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 	if (revs->prune)
 		return NULL;
 
-	if (!can_filter_bitmap(filter))
+	if (!can_filter_bitmap(&revs->filter))
 		return NULL;
 
 	/* try to open a bitmapped pack, but don't parse it yet
@@ -1297,8 +1295,7 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 
 	if (haves) {
 		revs->ignore_missing_links = 1;
-		haves_bitmap = find_objects(bitmap_git, revs, haves, NULL,
-					    filter);
+		haves_bitmap = find_objects(bitmap_git, revs, haves, NULL);
 		reset_revision_walk();
 		revs->ignore_missing_links = 0;
 
@@ -1306,8 +1303,7 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 			BUG("failed to perform bitmap walk");
 	}
 
-	wants_bitmap = find_objects(bitmap_git, revs, wants, haves_bitmap,
-				    filter);
+	wants_bitmap = find_objects(bitmap_git, revs, wants, haves_bitmap);
 
 	if (!wants_bitmap)
 		BUG("failed to perform bitmap walk");
@@ -1315,8 +1311,10 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 	if (haves_bitmap)
 		bitmap_and_not(wants_bitmap, haves_bitmap);
 
-	filter_bitmap(bitmap_git, (filter && filter_provided_objects) ? NULL : wants,
-		      wants_bitmap, filter);
+	filter_bitmap(bitmap_git,
+		      (revs->filter.choice && filter_provided_objects) ? NULL : wants,
+		      wants_bitmap,
+		      &revs->filter);
 
 	bitmap_git->result = wants_bitmap;
 	bitmap_git->haves = haves_bitmap;
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 19a63fa1abc..3d3ddd77345 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -10,7 +10,6 @@
 struct commit;
 struct repository;
 struct rev_info;
-struct list_objects_filter_options;
 
 static const char BITMAP_IDX_SIGNATURE[] = {'B', 'I', 'T', 'M'};
 
@@ -54,7 +53,6 @@ void test_bitmap_walk(struct rev_info *revs);
 int test_bitmap_commits(struct repository *r);
 int test_bitmap_hashes(struct repository *r);
 struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
-					 struct list_objects_filter_options *filter,
 					 int filter_provided_objects);
 uint32_t midx_preferred_pack(struct bitmap_index *bitmap_git);
 int reuse_partial_packfile_from_bitmap(struct bitmap_index *,
diff --git a/reachable.c b/reachable.c
index 84e3d0d75ed..b9f4ad886ef 100644
--- a/reachable.c
+++ b/reachable.c
@@ -205,7 +205,7 @@ void mark_reachable_objects(struct rev_info *revs, int mark_reflog,
 	cp.progress = progress;
 	cp.count = 0;
 
-	bitmap_git = prepare_bitmap_walk(revs, NULL, 0);
+	bitmap_git = prepare_bitmap_walk(revs, 0);
 	if (bitmap_git) {
 		traverse_bitmap_commit_list(bitmap_git, revs, mark_object_seen);
 		free_bitmap_index(bitmap_git);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v4 06/13] list-objects: consolidate traverse_commit_list[_filtered]
  2022-03-09 16:01     ` [PATCH v4 00/13] Partial bundles Derrick Stolee via GitGitGadget
                         ` (4 preceding siblings ...)
  2022-03-09 16:01       ` [PATCH v4 05/13] pack-bitmap: drop filter in prepare_bitmap_walk() Derrick Stolee via GitGitGadget
@ 2022-03-09 16:01       ` Derrick Stolee via GitGitGadget
  2022-03-09 16:01       ` [PATCH v4 07/13] MyFirstObjectWalk: update recommended usage Derrick Stolee via GitGitGadget
                         ` (6 subsequent siblings)
  12 siblings, 0 replies; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-03-09 16:01 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Jeff Hostetler, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

Now that all consumers of traverse_commit_list_filtered() populate the
'filter' member of 'struct rev_info', we can drop that parameter from
the method prototype to simplify things. In addition, the only thing
different now between traverse_commit_list_filtered() and
traverse_commit_list() is the presence of the 'omitted' parameter, which
is only non-NULL for one caller. We can consolidate these two methods by
having one call the other and use the simpler form everywhere the
'omitted' parameter would be NULL.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/pack-objects.c |  6 +++---
 builtin/rev-list.c     |  2 +-
 list-objects.c         | 34 ++++++++++++----------------------
 list-objects.h         | 12 ++++++++++--
 pack-bitmap.c          |  6 +++---
 5 files changed, 29 insertions(+), 31 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index bafce542778..b18724e32a3 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3778,9 +3778,9 @@ static void get_object_list(int ac, const char **av)
 
 	if (!fn_show_object)
 		fn_show_object = show_object;
-	traverse_commit_list_filtered(&revs.filter, &revs,
-				      show_commit, fn_show_object, NULL,
-				      NULL);
+	traverse_commit_list(&revs,
+			     show_commit, fn_show_object,
+			     NULL);
 
 	if (unpack_unreachable_expiration) {
 		revs.ignore_missing_links = 1;
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index ab7558bd66a..ec433cb6d37 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -729,7 +729,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 		oidset_init(&missing_objects, DEFAULT_OIDSET_SIZE);
 
 	traverse_commit_list_filtered(
-		&revs.filter, &revs, show_commit, show_object, &info,
+		&revs, show_commit, show_object, &info,
 		(arg_print_omitted ? &omitted_objects : NULL));
 
 	if (arg_print_omitted) {
diff --git a/list-objects.c b/list-objects.c
index 2f623f82115..117f734398c 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -416,35 +416,25 @@ static void do_traverse(struct traversal_context *ctx)
 	strbuf_release(&csp);
 }
 
-void traverse_commit_list(struct rev_info *revs,
-			  show_commit_fn show_commit,
-			  show_object_fn show_object,
-			  void *show_data)
-{
-	struct traversal_context ctx;
-	ctx.revs = revs;
-	ctx.show_commit = show_commit;
-	ctx.show_object = show_object;
-	ctx.show_data = show_data;
-	ctx.filter = NULL;
-	do_traverse(&ctx);
-}
-
 void traverse_commit_list_filtered(
-	struct list_objects_filter_options *filter_options,
 	struct rev_info *revs,
 	show_commit_fn show_commit,
 	show_object_fn show_object,
 	void *show_data,
 	struct oidset *omitted)
 {
-	struct traversal_context ctx;
+	struct traversal_context ctx = {
+		.revs = revs,
+		.show_object = show_object,
+		.show_commit = show_commit,
+		.show_data = show_data,
+	};
+
+	if (revs->filter.choice)
+		ctx.filter = list_objects_filter__init(omitted, &revs->filter);
 
-	ctx.revs = revs;
-	ctx.show_object = show_object;
-	ctx.show_commit = show_commit;
-	ctx.show_data = show_data;
-	ctx.filter = list_objects_filter__init(omitted, filter_options);
 	do_traverse(&ctx);
-	list_objects_filter__free(ctx.filter);
+
+	if (ctx.filter)
+		list_objects_filter__free(ctx.filter);
 }
diff --git a/list-objects.h b/list-objects.h
index a952680e466..9eaf4de8449 100644
--- a/list-objects.h
+++ b/list-objects.h
@@ -7,7 +7,6 @@ struct rev_info;
 
 typedef void (*show_commit_fn)(struct commit *, void *);
 typedef void (*show_object_fn)(struct object *, const char *, void *);
-void traverse_commit_list(struct rev_info *, show_commit_fn, show_object_fn, void *);
 
 typedef void (*show_edge_fn)(struct commit *);
 void mark_edges_uninteresting(struct rev_info *revs,
@@ -18,11 +17,20 @@ struct oidset;
 struct list_objects_filter_options;
 
 void traverse_commit_list_filtered(
-	struct list_objects_filter_options *filter_options,
 	struct rev_info *revs,
 	show_commit_fn show_commit,
 	show_object_fn show_object,
 	void *show_data,
 	struct oidset *omitted);
 
+static inline void traverse_commit_list(
+	struct rev_info *revs,
+	show_commit_fn show_commit,
+	show_object_fn show_object,
+	void *show_data)
+{
+	traverse_commit_list_filtered(revs, show_commit,
+				      show_object, show_data, NULL);
+}
+
 #endif /* LIST_OBJECTS_H */
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 37fa4905796..97909d48da3 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -822,9 +822,9 @@ static struct bitmap *find_objects(struct bitmap_index *bitmap_git,
 		show_data.bitmap_git = bitmap_git;
 		show_data.base = base;
 
-		traverse_commit_list_filtered(&revs->filter, revs,
-					      show_commit, show_object,
-					      &show_data, NULL);
+		traverse_commit_list(revs,
+				     show_commit, show_object,
+				     &show_data);
 
 		revs->include_check = NULL;
 		revs->include_check_obj = NULL;
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v4 07/13] MyFirstObjectWalk: update recommended usage
  2022-03-09 16:01     ` [PATCH v4 00/13] Partial bundles Derrick Stolee via GitGitGadget
                         ` (5 preceding siblings ...)
  2022-03-09 16:01       ` [PATCH v4 06/13] list-objects: consolidate traverse_commit_list[_filtered] Derrick Stolee via GitGitGadget
@ 2022-03-09 16:01       ` Derrick Stolee via GitGitGadget
  2022-03-09 16:01       ` [PATCH v4 08/13] list-objects: handle NULL function pointers Ævar Arnfjörð Bjarmason via GitGitGadget
                         ` (5 subsequent siblings)
  12 siblings, 0 replies; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-03-09 16:01 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Jeff Hostetler, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The previous change consolidated traverse_commit_list() and
traverse_commit_list_filtered(). This allows us to simplify the
recommended usage in MyFirstObjectWalk.txt to use this new set of
values.

While here, add some clarification on the difference between the two
methods.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/MyFirstObjectWalk.txt | 44 +++++++++++------------------
 1 file changed, 16 insertions(+), 28 deletions(-)

diff --git a/Documentation/MyFirstObjectWalk.txt b/Documentation/MyFirstObjectWalk.txt
index ca267941f3e..8d9e85566e6 100644
--- a/Documentation/MyFirstObjectWalk.txt
+++ b/Documentation/MyFirstObjectWalk.txt
@@ -522,24 +522,25 @@ function shows that the all-object walk is being performed by
 `traverse_commit_list()` or `traverse_commit_list_filtered()`. Those two
 functions reside in `list-objects.c`; examining the source shows that, despite
 the name, these functions traverse all kinds of objects. Let's have a look at
-the arguments to `traverse_commit_list_filtered()`, which are a superset of the
-arguments to the unfiltered version.
+the arguments to `traverse_commit_list()`.
 
-- `struct list_objects_filter_options *filter_options`: This is a struct which
-  stores a filter-spec as outlined in `Documentation/rev-list-options.txt`.
-- `struct rev_info *revs`: This is the `rev_info` used for the walk.
+- `struct rev_info *revs`: This is the `rev_info` used for the walk. If
+  its `filter` member is not `NULL`, then `filter` contains information for
+  how to filter the object list.
 - `show_commit_fn show_commit`: A callback which will be used to handle each
   individual commit object.
 - `show_object_fn show_object`: A callback which will be used to handle each
   non-commit object (so each blob, tree, or tag).
 - `void *show_data`: A context buffer which is passed in turn to `show_commit`
   and `show_object`.
+
+In addition, `traverse_commit_list_filtered()` has an additional paramter:
+
 - `struct oidset *omitted`: A linked-list of object IDs which the provided
   filter caused to be omitted.
 
-It looks like this `traverse_commit_list_filtered()` uses callbacks we provide
-instead of needing us to call it repeatedly ourselves. Cool! Let's add the
-callbacks first.
+It looks like these methods use callbacks we provide instead of needing us
+to call it repeatedly ourselves. Cool! Let's add the callbacks first.
 
 For the sake of this tutorial, we'll simply keep track of how many of each kind
 of object we find. At file scope in `builtin/walken.c` add the following
@@ -712,20 +713,9 @@ help understand. In our case, that means we omit trees and blobs not directly
 referenced by `HEAD` or `HEAD`'s history, because we begin the walk with only
 `HEAD` in the `pending` list.)
 
-First, we'll need to `#include "list-objects-filter-options.h"` and set up the
-`struct list_objects_filter_options` at the top of the function.
-
-----
-static void walken_object_walk(struct rev_info *rev)
-{
-	struct list_objects_filter_options filter_options = { 0 };
-
-	...
-----
-
 For now, we are not going to track the omitted objects, so we'll replace those
 parameters with `NULL`. For the sake of simplicity, we'll add a simple
-build-time branch to use our filter or not. Replace the line calling
+build-time branch to use our filter or not. Preface the line calling
 `traverse_commit_list()` with the following, which will remind us which kind of
 walk we've just performed:
 
@@ -733,19 +723,17 @@ walk we've just performed:
 	if (0) {
 		/* Unfiltered: */
 		trace_printf(_("Unfiltered object walk.\n"));
-		traverse_commit_list(rev, walken_show_commit,
-				walken_show_object, NULL);
 	} else {
 		trace_printf(
 			_("Filtered object walk with filterspec 'tree:1'.\n"));
-		parse_list_objects_filter(&filter_options, "tree:1");
-
-		traverse_commit_list_filtered(&filter_options, rev,
-			walken_show_commit, walken_show_object, NULL, NULL);
+		CALLOC_ARRAY(rev->filter, 1);
+		parse_list_objects_filter(rev->filter, "tree:1");
 	}
+	traverse_commit_list(rev, walken_show_commit,
+			     walken_show_object, NULL);
 ----
 
-`struct list_objects_filter_options` is usually built directly from a command
+The `rev->filter` member is usually built directly from a command
 line argument, so the module provides an easy way to build one from a string.
 Even though we aren't taking user input right now, we can still build one with
 a hardcoded string using `parse_list_objects_filter()`.
@@ -784,7 +772,7 @@ object:
 ----
 	...
 
-		traverse_commit_list_filtered(&filter_options, rev,
+		traverse_commit_list_filtered(rev,
 			walken_show_commit, walken_show_object, NULL, &omitted);
 
 	...
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v4 08/13] list-objects: handle NULL function pointers
  2022-03-09 16:01     ` [PATCH v4 00/13] Partial bundles Derrick Stolee via GitGitGadget
                         ` (6 preceding siblings ...)
  2022-03-09 16:01       ` [PATCH v4 07/13] MyFirstObjectWalk: update recommended usage Derrick Stolee via GitGitGadget
@ 2022-03-09 16:01       ` Ævar Arnfjörð Bjarmason via GitGitGadget
  2022-03-09 16:01       ` [PATCH v4 09/13] bundle: parse filter capability Derrick Stolee via GitGitGadget
                         ` (4 subsequent siblings)
  12 siblings, 0 replies; 114+ messages in thread
From: Ævar Arnfjörð Bjarmason via GitGitGadget @ 2022-03-09 16:01 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Jeff Hostetler, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

From: =?UTF-8?q?=C3=86var=20Arnfj=C3=B6r=C3=B0=20Bjarmason?=
 <avarab@gmail.com>

If a caller to traverse_commit_list() specifies the options for the
--objects flag but does not specify a show_object function pointer, the
result is a segfault. This is currently visible by running 'git bundle
create --objects HEAD'.

We could fix this problem by supplying a no-op callback in
builtin/bundle.c, but that only solves the problem for one builtin,
leaving this segfault open for other callers.

Replace all callers of the show_commit and show_object function pointers
in list-objects.c to call helper functions show_commit() and
show_object() which check that the given context has non-NULL functions
before passing the necessary data. One extra benefit is that it reduces
duplication due to passing ctx->show_data to every caller.

Test that this segfault no longer occurs for 'git bundle'.

Co-authored-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle.c               |  2 ++
 list-objects.c         | 27 ++++++++++++++++++++++-----
 t/t6020-bundle-misc.sh | 12 ++++++++++++
 3 files changed, 36 insertions(+), 5 deletions(-)

diff --git a/bundle.c b/bundle.c
index a0bb687b0f4..7ba60a573d7 100644
--- a/bundle.c
+++ b/bundle.c
@@ -544,6 +544,8 @@ int create_bundle(struct repository *r, const char *path,
 		die("revision walk setup failed");
 	bpi.fd = bundle_fd;
 	bpi.pending = &revs_copy.pending;
+
+	revs.blob_objects = revs.tree_objects = 0;
 	traverse_commit_list(&revs, write_bundle_prerequisites, NULL, &bpi);
 	object_array_remove_duplicates(&revs_copy.pending);
 
diff --git a/list-objects.c b/list-objects.c
index 117f734398c..250d9de41cb 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -21,6 +21,23 @@ struct traversal_context {
 	struct filter *filter;
 };
 
+static void show_commit(struct traversal_context *ctx,
+			struct commit *commit)
+{
+	if (!ctx->show_commit)
+		return;
+	ctx->show_commit(commit, ctx->show_data);
+}
+
+static void show_object(struct traversal_context *ctx,
+			struct object *object,
+			const char *name)
+{
+	if (!ctx->show_object)
+		return;
+	ctx->show_object(object, name, ctx->show_data);
+}
+
 static void process_blob(struct traversal_context *ctx,
 			 struct blob *blob,
 			 struct strbuf *path,
@@ -60,7 +77,7 @@ static void process_blob(struct traversal_context *ctx,
 	if (r & LOFR_MARK_SEEN)
 		obj->flags |= SEEN;
 	if (r & LOFR_DO_SHOW)
-		ctx->show_object(obj, path->buf, ctx->show_data);
+		show_object(ctx, obj, path->buf);
 	strbuf_setlen(path, pathlen);
 }
 
@@ -194,7 +211,7 @@ static void process_tree(struct traversal_context *ctx,
 	if (r & LOFR_MARK_SEEN)
 		obj->flags |= SEEN;
 	if (r & LOFR_DO_SHOW)
-		ctx->show_object(obj, base->buf, ctx->show_data);
+		show_object(ctx, obj, base->buf);
 	if (base->len)
 		strbuf_addch(base, '/');
 
@@ -210,7 +227,7 @@ static void process_tree(struct traversal_context *ctx,
 	if (r & LOFR_MARK_SEEN)
 		obj->flags |= SEEN;
 	if (r & LOFR_DO_SHOW)
-		ctx->show_object(obj, base->buf, ctx->show_data);
+		show_object(ctx, obj, base->buf);
 
 	strbuf_setlen(base, baselen);
 	free_tree_buffer(tree);
@@ -228,7 +245,7 @@ static void process_tag(struct traversal_context *ctx,
 	if (r & LOFR_MARK_SEEN)
 		tag->object.flags |= SEEN;
 	if (r & LOFR_DO_SHOW)
-		ctx->show_object(&tag->object, name, ctx->show_data);
+		show_object(ctx, &tag->object, name);
 }
 
 static void mark_edge_parents_uninteresting(struct commit *commit,
@@ -402,7 +419,7 @@ static void do_traverse(struct traversal_context *ctx)
 		if (r & LOFR_MARK_SEEN)
 			commit->object.flags |= SEEN;
 		if (r & LOFR_DO_SHOW)
-			ctx->show_commit(commit, ctx->show_data);
+			show_commit(ctx, commit);
 
 		if (ctx->revs->tree_blobs_in_commit_order)
 			/*
diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh
index b13e8a52a93..df5ff561fa5 100755
--- a/t/t6020-bundle-misc.sh
+++ b/t/t6020-bundle-misc.sh
@@ -475,4 +475,16 @@ test_expect_success 'clone from bundle' '
 	test_cmp expect actual
 '
 
+test_expect_success 'unfiltered bundle with --objects' '
+	git bundle create all-objects.bdl \
+		--all --objects &&
+	git bundle create all.bdl \
+		--all &&
+
+	# Compare the headers of these files.
+	sed -n -e "/^$/q" -e "p" all.bdl >expect &&
+	sed -n -e "/^$/q" -e "p" all-objects.bdl >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v4 09/13] bundle: parse filter capability
  2022-03-09 16:01     ` [PATCH v4 00/13] Partial bundles Derrick Stolee via GitGitGadget
                         ` (7 preceding siblings ...)
  2022-03-09 16:01       ` [PATCH v4 08/13] list-objects: handle NULL function pointers Ævar Arnfjörð Bjarmason via GitGitGadget
@ 2022-03-09 16:01       ` Derrick Stolee via GitGitGadget
  2022-03-09 18:41         ` Junio C Hamano
  2022-03-09 16:01       ` [PATCH v4 10/13] rev-list: move --filter parsing into revision.c Derrick Stolee via GitGitGadget
                         ` (3 subsequent siblings)
  12 siblings, 1 reply; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-03-09 16:01 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Jeff Hostetler, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The v3 bundle format has capabilities, allowing newer versions of Git to
create bundles with newer features. Older versions that do not
understand these new capabilities will fail with a helpful warning.

Create a new capability allowing Git to understand that the contained
pack-file is filtered according to some object filter. Typically, this
filter will be "blob:none" for a blobless partial clone.

This change teaches Git to parse this capability, place its value in the
bundle header, and demonstrate this understanding by adding a message to
'git bundle verify'.

Since we will use gently_parse_list_objects_filter() outside of
list-objects-filter-options.c, make it an external method and move its
API documentation to before its declaration.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/git-bundle.txt              |  7 +++++--
 Documentation/technical/bundle-format.txt | 11 ++++++++---
 bundle.c                                  | 15 ++++++++++++++-
 bundle.h                                  |  2 ++
 list-objects-filter-options.c             | 17 +----------------
 list-objects-filter-options.h             | 20 ++++++++++++++++++++
 6 files changed, 50 insertions(+), 22 deletions(-)

diff --git a/Documentation/git-bundle.txt b/Documentation/git-bundle.txt
index 72ab8139052..ac4c4352aae 100644
--- a/Documentation/git-bundle.txt
+++ b/Documentation/git-bundle.txt
@@ -75,8 +75,11 @@ verify <file>::
 	cleanly to the current repository.  This includes checks on the
 	bundle format itself as well as checking that the prerequisite
 	commits exist and are fully linked in the current repository.
-	'git bundle' prints a list of missing commits, if any, and exits
-	with a non-zero status.
+	Information about additional capabilities, such as "object filter",
+	is printed. See "Capabilities" in link:technical/bundle-format.html
+	for more information. Finally, 'git bundle' prints a list of
+	missing commits, if any. The exit code is zero for success, but
+	will be nonzero if the bundle file is invalid.
 
 list-heads <file>::
 	Lists the references defined in the bundle.  If followed by a
diff --git a/Documentation/technical/bundle-format.txt b/Documentation/technical/bundle-format.txt
index bac558d049a..b9be8644cf5 100644
--- a/Documentation/technical/bundle-format.txt
+++ b/Documentation/technical/bundle-format.txt
@@ -71,6 +71,11 @@ and the Git bundle v2 format cannot represent a shallow clone repository.
 == Capabilities
 
 Because there is no opportunity for negotiation, unknown capabilities cause 'git
-bundle' to abort.  The only known capability is `object-format`, which specifies
-the hash algorithm in use, and can take the same values as the
-`extensions.objectFormat` configuration value.
+bundle' to abort.
+
+* `object-format` specifies the hash algorithm in use, and can take the same
+  values as the `extensions.objectFormat` configuration value.
+
+* `filter` specifies an object filter as in the `--filter` option in
+  linkgit:git-rev-list[1]. The resulting pack-file must be marked as a
+  `.promisor` pack-file after it is unbundled.
diff --git a/bundle.c b/bundle.c
index 7ba60a573d7..41e75efab9a 100644
--- a/bundle.c
+++ b/bundle.c
@@ -11,7 +11,7 @@
 #include "run-command.h"
 #include "refs.h"
 #include "strvec.h"
-
+#include "list-objects-filter-options.h"
 
 static const char v2_bundle_signature[] = "# v2 git bundle\n";
 static const char v3_bundle_signature[] = "# v3 git bundle\n";
@@ -33,6 +33,7 @@ void bundle_header_release(struct bundle_header *header)
 {
 	string_list_clear(&header->prerequisites, 1);
 	string_list_clear(&header->references, 1);
+	list_objects_filter_release(&header->filter);
 }
 
 static int parse_capability(struct bundle_header *header, const char *capability)
@@ -45,6 +46,10 @@ static int parse_capability(struct bundle_header *header, const char *capability
 		header->hash_algo = &hash_algos[algo];
 		return 0;
 	}
+	if (skip_prefix(capability, "filter=", &arg)) {
+		parse_list_objects_filter(&header->filter, arg);
+		return 0;
+	}
 	return error(_("unknown capability '%s'"), capability);
 }
 
@@ -220,6 +225,8 @@ int verify_bundle(struct repository *r,
 	req_nr = revs.pending.nr;
 	setup_revisions(2, argv, &revs, NULL);
 
+	list_objects_filter_copy(&revs.filter, &header->filter);
+
 	if (prepare_revision_walk(&revs))
 		die(_("revision walk setup failed"));
 
@@ -259,6 +266,12 @@ int verify_bundle(struct repository *r,
 			     r->nr),
 			  r->nr);
 		list_refs(r, 0, NULL);
+
+		if (header->filter.choice) {
+			printf_ln("The bundle uses this filter: %s",
+				  list_objects_filter_spec(&header->filter));
+		}
+
 		r = &header->prerequisites;
 		if (!r->nr) {
 			printf_ln(_("The bundle records a complete history."));
diff --git a/bundle.h b/bundle.h
index 06009fe6b1f..7fef2108f43 100644
--- a/bundle.h
+++ b/bundle.h
@@ -4,12 +4,14 @@
 #include "strvec.h"
 #include "cache.h"
 #include "string-list.h"
+#include "list-objects-filter-options.h"
 
 struct bundle_header {
 	unsigned version;
 	struct string_list prerequisites;
 	struct string_list references;
 	const struct git_hash_algo *hash_algo;
+	struct list_objects_filter_options filter;
 };
 
 #define BUNDLE_HEADER_INIT \
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index 449d53af69f..f02d8df1422 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -40,22 +40,7 @@ const char *list_object_filter_config_name(enum list_objects_filter_choice c)
 	BUG("list_object_filter_config_name: invalid argument '%d'", c);
 }
 
-/*
- * Parse value of the argument to the "filter" keyword.
- * On the command line this looks like:
- *       --filter=<arg>
- * and in the pack protocol as:
- *       "filter" SP <arg>
- *
- * The filter keyword will be used by many commands.
- * See Documentation/rev-list-options.txt for allowed values for <arg>.
- *
- * Capture the given arg as the "filter_spec".  This can be forwarded to
- * subordinate commands when necessary (although it's better to pass it through
- * expand_list_objects_filter_spec() first).  We also "intern" the arg for the
- * convenience of the current command.
- */
-static int gently_parse_list_objects_filter(
+int gently_parse_list_objects_filter(
 	struct list_objects_filter_options *filter_options,
 	const char *arg,
 	struct strbuf *errbuf)
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index 425c38cae9d..2eb6c983949 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -72,6 +72,26 @@ struct list_objects_filter_options {
 /* Normalized command line arguments */
 #define CL_ARG__FILTER "filter"
 
+/*
+ * Parse value of the argument to the "filter" keyword.
+ * On the command line this looks like:
+ *       --filter=<arg>
+ * and in the pack protocol as:
+ *       "filter" SP <arg>
+ *
+ * The filter keyword will be used by many commands.
+ * See Documentation/rev-list-options.txt for allowed values for <arg>.
+ *
+ * Capture the given arg as the "filter_spec".  This can be forwarded to
+ * subordinate commands when necessary (although it's better to pass it through
+ * expand_list_objects_filter_spec() first).  We also "intern" the arg for the
+ * convenience of the current command.
+ */
+int gently_parse_list_objects_filter(
+	struct list_objects_filter_options *filter_options,
+	const char *arg,
+	struct strbuf *errbuf);
+
 void list_objects_filter_die_if_populated(
 	struct list_objects_filter_options *filter_options);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v4 10/13] rev-list: move --filter parsing into revision.c
  2022-03-09 16:01     ` [PATCH v4 00/13] Partial bundles Derrick Stolee via GitGitGadget
                         ` (8 preceding siblings ...)
  2022-03-09 16:01       ` [PATCH v4 09/13] bundle: parse filter capability Derrick Stolee via GitGitGadget
@ 2022-03-09 16:01       ` Derrick Stolee via GitGitGadget
  2022-03-09 16:01       ` [PATCH v4 11/13] bundle: create filtered bundles Derrick Stolee via GitGitGadget
                         ` (2 subsequent siblings)
  12 siblings, 0 replies; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-03-09 16:01 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Jeff Hostetler, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

Now that 'struct rev_info' has a 'filter' member and most consumers of
object filtering are using that member instead of an external struct,
move the parsing of the '--filter' option out of builtin/rev-list.c and
into revision.c.

This use within handle_revision_pseudo_opt() allows us to find the
option within setup_revisions() if the arguments are passed directly. In
the case of a command such as 'git blame', the arguments are first
scanned and checked with parse_revision_opt(), which complains about the
option, so 'git blame --filter=blob:none <file>' does not become valid
with this change.

Some commands, such as 'git diff' gain this option without having it
make an effect. And 'git diff --objects' was already possible, but does
not actually make sense in that builtin.

The key addition that is coming is 'git bundle create --filter=<X>' so
we can create bundles containing promisor packs. More work is required
to make them fully functional, but that will follow.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/rev-list.c | 11 -----------
 revision.c         |  7 +++++++
 2 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index ec433cb6d37..640828149c5 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -591,17 +591,6 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 			show_progress = arg;
 			continue;
 		}
-
-		if (skip_prefix(arg, ("--" CL_ARG__FILTER "="), &arg)) {
-			parse_list_objects_filter(&revs.filter, arg);
-			if (revs.filter.choice && !revs.blob_objects)
-				die(_("object filtering requires --objects"));
-			continue;
-		}
-		if (!strcmp(arg, ("--no-" CL_ARG__FILTER))) {
-			list_objects_filter_set_no_filter(&revs.filter);
-			continue;
-		}
 		if (!strcmp(arg, "--filter-provided-objects")) {
 			filter_provided_objects = 1;
 			continue;
diff --git a/revision.c b/revision.c
index ad4286fbdde..15efe23c40f 100644
--- a/revision.c
+++ b/revision.c
@@ -32,6 +32,7 @@
 #include "utf8.h"
 #include "bloom.h"
 #include "json-writer.h"
+#include "list-objects-filter-options.h"
 
 volatile show_early_output_fn_t show_early_output;
 
@@ -2669,6 +2670,10 @@ static int handle_revision_pseudo_opt(struct rev_info *revs,
 		revs->no_walk = 0;
 	} else if (!strcmp(arg, "--single-worktree")) {
 		revs->single_worktree = 1;
+	} else if (skip_prefix(arg, ("--" CL_ARG__FILTER "="), &arg)) {
+		parse_list_objects_filter(&revs->filter, arg);
+	} else if (!strcmp(arg, ("--no-" CL_ARG__FILTER))) {
+		list_objects_filter_set_no_filter(&revs->filter);
 	} else {
 		return 0;
 	}
@@ -2872,6 +2877,8 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
 		die("cannot combine --walk-reflogs with history-limiting options");
 	if (revs->rewrite_parents && revs->children.name)
 		die(_("options '%s' and '%s' cannot be used together"), "--parents", "--children");
+	if (revs->filter.choice && !revs->blob_objects)
+		die(_("object filtering requires --objects"));
 
 	/*
 	 * Limitations on the graph functionality
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v4 11/13] bundle: create filtered bundles
  2022-03-09 16:01     ` [PATCH v4 00/13] Partial bundles Derrick Stolee via GitGitGadget
                         ` (9 preceding siblings ...)
  2022-03-09 16:01       ` [PATCH v4 10/13] rev-list: move --filter parsing into revision.c Derrick Stolee via GitGitGadget
@ 2022-03-09 16:01       ` Derrick Stolee via GitGitGadget
  2022-03-09 16:01       ` [PATCH v4 12/13] bundle: unbundle promisor packs Derrick Stolee via GitGitGadget
  2022-03-09 16:01       ` [PATCH v4 13/13] clone: fail gracefully when cloning filtered bundle Derrick Stolee via GitGitGadget
  12 siblings, 0 replies; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-03-09 16:01 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Jeff Hostetler, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

A previous change allowed Git to parse bundles with the 'filter'
capability. Now, teach Git to create bundles with this option.

Some rearranging of code is required to get the option parsing in the
correct spot. There are now two reasons why we might need capabilities
(a new hash algorithm or an object filter) so that is pulled out into a
place where we can check both at the same time.

The --filter option is parsed as part of setup_revisions(), but it
expected the --objects flag, too. That flag is somewhat implied by 'git
bundle' because it creates a pack-file walking objects, but there is
also a walk that walks the revision range expecting only commits. Make
this parsing work by setting 'revs.tree_objects' and 'revs.blob_objects'
before the call to setup_revisions().

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle.c               | 53 +++++++++++++++++++++++++++++++++---------
 t/t6020-bundle-misc.sh | 48 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 90 insertions(+), 11 deletions(-)

diff --git a/bundle.c b/bundle.c
index 41e75efab9a..9370a6e307c 100644
--- a/bundle.c
+++ b/bundle.c
@@ -332,6 +332,9 @@ static int write_pack_data(int bundle_fd, struct rev_info *revs, struct strvec *
 		     "--stdout", "--thin", "--delta-base-offset",
 		     NULL);
 	strvec_pushv(&pack_objects.args, pack_options->v);
+	if (revs->filter.choice)
+		strvec_pushf(&pack_objects.args, "--filter=%s",
+			     list_objects_filter_spec(&revs->filter));
 	pack_objects.in = -1;
 	pack_objects.out = bundle_fd;
 	pack_objects.git_cmd = 1;
@@ -499,10 +502,37 @@ int create_bundle(struct repository *r, const char *path,
 	int bundle_to_stdout;
 	int ref_count = 0;
 	struct rev_info revs, revs_copy;
-	int min_version = the_hash_algo == &hash_algos[GIT_HASH_SHA1] ? 2 : 3;
+	int min_version = 2;
 	struct bundle_prerequisites_info bpi;
 	int i;
 
+	/* init revs to list objects for pack-objects later */
+	save_commit_buffer = 0;
+	repo_init_revisions(r, &revs, NULL);
+
+	/*
+	 * Pre-initialize the '--objects' flag so we can parse a
+	 * --filter option successfully.
+	 */
+	revs.tree_objects = revs.blob_objects = 1;
+
+	argc = setup_revisions(argc, argv, &revs, NULL);
+
+	/*
+	 * Reasons to require version 3:
+	 *
+	 * 1. @object-format is required because our hash algorithm is not
+	 *    SHA1.
+	 * 2. @filter is required because we parsed an object filter.
+	 */
+	if (the_hash_algo != &hash_algos[GIT_HASH_SHA1] || revs.filter.choice)
+		min_version = 3;
+
+	if (argc > 1) {
+		error(_("unrecognized argument: %s"), argv[1]);
+		goto err;
+	}
+
 	bundle_to_stdout = !strcmp(path, "-");
 	if (bundle_to_stdout)
 		bundle_fd = 1;
@@ -525,17 +555,14 @@ int create_bundle(struct repository *r, const char *path,
 		write_or_die(bundle_fd, capability, strlen(capability));
 		write_or_die(bundle_fd, the_hash_algo->name, strlen(the_hash_algo->name));
 		write_or_die(bundle_fd, "\n", 1);
-	}
-
-	/* init revs to list objects for pack-objects later */
-	save_commit_buffer = 0;
-	repo_init_revisions(r, &revs, NULL);
 
-	argc = setup_revisions(argc, argv, &revs, NULL);
-
-	if (argc > 1) {
-		error(_("unrecognized argument: %s"), argv[1]);
-		goto err;
+		if (revs.filter.choice) {
+			const char *value = expand_list_objects_filter_spec(&revs.filter);
+			capability = "@filter=";
+			write_or_die(bundle_fd, capability, strlen(capability));
+			write_or_die(bundle_fd, value, strlen(value));
+			write_or_die(bundle_fd, "\n", 1);
+		}
 	}
 
 	/* save revs.pending in revs_copy for later use */
@@ -558,6 +585,10 @@ int create_bundle(struct repository *r, const char *path,
 	bpi.fd = bundle_fd;
 	bpi.pending = &revs_copy.pending;
 
+	/*
+	 * Remove any object walking here. We only care about commits and
+	 * tags here. The revs_copy has the right instances of these values.
+	 */
 	revs.blob_objects = revs.tree_objects = 0;
 	traverse_commit_list(&revs, write_bundle_prerequisites, NULL, &bpi);
 	object_array_remove_duplicates(&revs_copy.pending);
diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh
index df5ff561fa5..6e97c044ee7 100755
--- a/t/t6020-bundle-misc.sh
+++ b/t/t6020-bundle-misc.sh
@@ -487,4 +487,52 @@ test_expect_success 'unfiltered bundle with --objects' '
 	test_cmp expect actual
 '
 
+for filter in "blob:none" "tree:0" "tree:1" "blob:limit=100"
+do
+	test_expect_success "filtered bundle: $filter" '
+		test_when_finished rm -rf .git/objects/pack cloned unbundled &&
+		git bundle create partial.bdl \
+			--all \
+			--filter=$filter &&
+
+		git bundle verify partial.bdl >unfiltered &&
+		make_user_friendly_and_stable_output <unfiltered >actual &&
+
+		cat >expect <<-EOF &&
+		The bundle contains these 10 refs:
+		<COMMIT-P> refs/heads/main
+		<COMMIT-N> refs/heads/release
+		<COMMIT-D> refs/heads/topic/1
+		<COMMIT-H> refs/heads/topic/2
+		<COMMIT-D> refs/pull/1/head
+		<COMMIT-G> refs/pull/2/head
+		<TAG-1> refs/tags/v1
+		<TAG-2> refs/tags/v2
+		<TAG-3> refs/tags/v3
+		<COMMIT-P> HEAD
+		The bundle uses this filter: $filter
+		The bundle records a complete history.
+		EOF
+		test_cmp expect actual &&
+
+		test_config uploadpack.allowfilter 1 &&
+		test_config uploadpack.allowanysha1inwant 1 &&
+		git clone --no-local --filter=$filter --bare "file://$(pwd)" cloned &&
+
+		git init unbundled &&
+		git -C unbundled bundle unbundle ../partial.bdl >ref-list.txt &&
+
+		# Count the same number of reachable objects.
+		reflist=$(git for-each-ref --format="%(objectname)") &&
+		git rev-list --objects --filter=$filter --missing=allow-any \
+			$reflist >expect &&
+		for repo in cloned unbundled
+		do
+			git -C $repo rev-list --objects --missing=allow-any \
+				$reflist >actual &&
+			test_cmp expect actual || return 1
+		done
+	'
+done
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v4 12/13] bundle: unbundle promisor packs
  2022-03-09 16:01     ` [PATCH v4 00/13] Partial bundles Derrick Stolee via GitGitGadget
                         ` (10 preceding siblings ...)
  2022-03-09 16:01       ` [PATCH v4 11/13] bundle: create filtered bundles Derrick Stolee via GitGitGadget
@ 2022-03-09 16:01       ` Derrick Stolee via GitGitGadget
  2022-03-09 16:01       ` [PATCH v4 13/13] clone: fail gracefully when cloning filtered bundle Derrick Stolee via GitGitGadget
  12 siblings, 0 replies; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-03-09 16:01 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Jeff Hostetler, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

In order to have a valid pack-file after unbundling a bundle that has
the 'filter' capability, we need to generate a .promisor file. The
bundle does not promise _where_ the objects can be found, but we can
expect that these bundles will be unbundled in repositories with
appropriate promisor remotes that can find those missing objects.

Use the 'git index-pack --promisor=<message>' option to create this
.promisor file. Add "from-bundle" as the message to help anyone diagnose
issues with these promisor packs.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle.c               | 4 ++++
 t/t6020-bundle-misc.sh | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/bundle.c b/bundle.c
index 9370a6e307c..56681c21131 100644
--- a/bundle.c
+++ b/bundle.c
@@ -620,6 +620,10 @@ int unbundle(struct repository *r, struct bundle_header *header,
 	struct child_process ip = CHILD_PROCESS_INIT;
 	strvec_pushl(&ip.args, "index-pack", "--fix-thin", "--stdin", NULL);
 
+	/* If there is a filter, then we need to create the promisor pack. */
+	if (header->filter.choice)
+		strvec_push(&ip.args, "--promisor=from-bundle");
+
 	if (extra_index_pack_args) {
 		strvec_pushv(&ip.args, extra_index_pack_args->v);
 		strvec_clear(extra_index_pack_args);
diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh
index 6e97c044ee7..7c6db670221 100755
--- a/t/t6020-bundle-misc.sh
+++ b/t/t6020-bundle-misc.sh
@@ -521,6 +521,8 @@ do
 
 		git init unbundled &&
 		git -C unbundled bundle unbundle ../partial.bdl >ref-list.txt &&
+		ls unbundled/.git/objects/pack/pack-*.promisor >promisor &&
+		test_line_count = 1 promisor &&
 
 		# Count the same number of reachable objects.
 		reflist=$(git for-each-ref --format="%(objectname)") &&
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v4 13/13] clone: fail gracefully when cloning filtered bundle
  2022-03-09 16:01     ` [PATCH v4 00/13] Partial bundles Derrick Stolee via GitGitGadget
                         ` (11 preceding siblings ...)
  2022-03-09 16:01       ` [PATCH v4 12/13] bundle: unbundle promisor packs Derrick Stolee via GitGitGadget
@ 2022-03-09 16:01       ` Derrick Stolee via GitGitGadget
  12 siblings, 0 replies; 114+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-03-09 16:01 UTC (permalink / raw)
  To: git
  Cc: stolee, avarab, gitster, zhiyou.jx, jonathantanmy,
	Jeff Hostetler, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

Users can create a new repository using 'git clone <bundle-file>'. The
new "@filter" capability for bundles means that we can generate a bundle
that does not contain all reachable objects, even if the header has no
negative commit OIDs.

It is feasible to think that we could make a filtered bundle work with
the command

  git clone --filter=$filter --bare <bundle-file>

or possibly replacing --bare with --no-checkout. However, this requires
having some repository-global config that specifies the specified object
filter and notifies Git about the existence of promisor pack-files.
Without a remote, that is currently impossible.

As a stop-gap, parse the bundle header during 'git clone' and die() with
a helpful error message instead of the current behavior of failing due
to "missing objects".

Most of the existing logic for handling bundle clones actually happens
in fetch-pack.c, but that logic is the same as if the user specified
'git fetch <bundle>', so we want to avoid failing to fetch a filtered
bundle when in an existing repository that has the proper config set up
for at least one remote.

Carefully comment around the test that this is not the desired long-term
behavior of 'git clone' in this case, but instead that we need to do
more work before that is possible.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/clone.c        | 13 +++++++++++++
 t/t6020-bundle-misc.sh | 12 ++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/builtin/clone.c b/builtin/clone.c
index 9c29093b352..e57504c2aa8 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -33,6 +33,7 @@
 #include "packfile.h"
 #include "list-objects-filter-options.h"
 #include "hook.h"
+#include "bundle.h"
 
 /*
  * Overall FIXMEs:
@@ -1138,6 +1139,18 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		warning(_("--local is ignored"));
 	transport->cloning = 1;
 
+	if (is_bundle) {
+		struct bundle_header header = BUNDLE_HEADER_INIT;
+		int fd = read_bundle_header(path, &header);
+		int has_filter = header.filter.choice != LOFC_DISABLED;
+
+		if (fd > 0)
+			close(fd);
+		bundle_header_release(&header);
+		if (has_filter)
+			die(_("cannot clone from filtered bundle"));
+	}
+
 	transport_set_option(transport, TRANS_OPT_KEEP, "yes");
 
 	if (reject_shallow)
diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh
index 7c6db670221..ed95d195427 100755
--- a/t/t6020-bundle-misc.sh
+++ b/t/t6020-bundle-misc.sh
@@ -537,4 +537,16 @@ do
 	'
 done
 
+# NEEDSWORK: 'git clone --bare' should be able to clone from a filtered
+# bundle, but that requires a change to promisor/filter config options.
+# For now, we fail gracefully with a helpful error. This behavior can be
+# changed in the future to succeed as much as possible.
+test_expect_success 'cloning from filtered bundle has useful error' '
+	git bundle create partial.bdl \
+		--all \
+		--filter=blob:none &&
+	test_must_fail git clone --bare partial.bdl partial 2>err &&
+	grep "cannot clone from filtered bundle" err
+'
+
 test_done
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* Re: [PATCH v3 07/12] list-objects: handle NULL function pointers
  2022-03-09 13:40         ` Ævar Arnfjörð Bjarmason
  2022-03-09 14:16           ` Derrick Stolee
@ 2022-03-09 18:32           ` Junio C Hamano
  1 sibling, 0 replies; 114+ messages in thread
From: Junio C Hamano @ 2022-03-09 18:32 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Ævar Arnfjörð Bjarmason via GitGitGadget, git,
	stolee, zhiyou.jx, jonathantanmy, Jeff Hostetler, Derrick Stolee

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> I suspect what's wanted here is "print all stuff before the "\n\n"
> header/PACK delimiter, which is better done with "sed" like this:
>
> 	sed -n -e '/^$/q' -e 'p'

I see.  Or just "sed -e '/^$/q'" would also be fine (i.e. "print
everything up to, including the first blank line") for comparison
purposes and may be simpler.

Thanks.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v4 09/13] bundle: parse filter capability
  2022-03-09 16:01       ` [PATCH v4 09/13] bundle: parse filter capability Derrick Stolee via GitGitGadget
@ 2022-03-09 18:41         ` Junio C Hamano
  2022-03-09 18:55           ` Derrick Stolee
  0 siblings, 1 reply; 114+ messages in thread
From: Junio C Hamano @ 2022-03-09 18:41 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, stolee, avarab, zhiyou.jx, jonathantanmy, Jeff Hostetler,
	Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> diff --git a/Documentation/git-bundle.txt b/Documentation/git-bundle.txt
> index 72ab8139052..ac4c4352aae 100644
> --- a/Documentation/git-bundle.txt
> +++ b/Documentation/git-bundle.txt
> @@ -75,8 +75,11 @@ verify <file>::
>  	cleanly to the current repository.  This includes checks on the
>  	bundle format itself as well as checking that the prerequisite
>  	commits exist and are fully linked in the current repository.
> -	'git bundle' prints a list of missing commits, if any, and exits
> -	with a non-zero status.
> +	Information about additional capabilities, such as "object filter",
> +	is printed. See "Capabilities" in link:technical/bundle-format.html
> +	for more information. Finally, 'git bundle' prints a list of
> +	missing commits, if any. The exit code is zero for success, but
> +	will be nonzero if the bundle file is invalid.

Hmph.  I wasn't expecting this change (not objecting, but mostly am
surprised) relative to the previous round where the filter was
mentioned only when we issue an error message.  I was expecting to
see something like "list-filters <file>", which is analog to the
"list-heads <file>", to help those who want to programatically build
around the "git bundle" command output.  Or "--list-capabilities" to
accomodate the current, this, and future capabilities.  We already
have the object-format thing before this series.  Do we have an
interface to expose that out of a given bundle file?


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v4 03/13] revision: put object filter into struct rev_info
  2022-03-09 16:01       ` [PATCH v4 03/13] revision: put object filter into struct rev_info Derrick Stolee via GitGitGadget
@ 2022-03-09 18:48         ` Junio C Hamano
  0 siblings, 0 replies; 114+ messages in thread
From: Junio C Hamano @ 2022-03-09 18:48 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, stolee, avarab, zhiyou.jx, jonathantanmy, Jeff Hostetler,
	Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> diff --git a/revision.h b/revision.h
> index 3c58c18c63a..b1669a8cc33 100644
> --- a/revision.h
> +++ b/revision.h
> @@ -8,6 +8,7 @@
>  #include "pretty.h"
>  #include "diff.h"
>  #include "commit-slab-decl.h"
> +#include "list-objects-filter-options.h"
>  
>  /**
>   * The revision walking API offers functions to build a list of revisions
> @@ -94,6 +95,12 @@ struct rev_info {
>  	/* The end-points specified by the end user */
>  	struct rev_cmdline_info cmdline;
>  
> +	/*
> +	 * Object filter options. No filtering is specified
> +	 * if and only if filter.choice is zero.
> +	 */
> +	struct list_objects_filter_options filter;

I wondered if s/zero/LOFC_DISABLED/ would make it more helpful, but
seeing changes like the one i the later "parse filter capability"
step, which can just become

-	if (header->filter.choice != LOFC_DISABLED) {
+	if (header->filter.choice) {
		... do things ...

relative to the previous round, I think what is in the posted patch
indeed is a better way to help developers.


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v4 09/13] bundle: parse filter capability
  2022-03-09 18:41         ` Junio C Hamano
@ 2022-03-09 18:55           ` Derrick Stolee
  0 siblings, 0 replies; 114+ messages in thread
From: Derrick Stolee @ 2022-03-09 18:55 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, stolee, avarab, zhiyou.jx, jonathantanmy, Jeff Hostetler

On 3/9/2022 1:41 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> diff --git a/Documentation/git-bundle.txt b/Documentation/git-bundle.txt
>> index 72ab8139052..ac4c4352aae 100644
>> --- a/Documentation/git-bundle.txt
>> +++ b/Documentation/git-bundle.txt
>> @@ -75,8 +75,11 @@ verify <file>::
>>  	cleanly to the current repository.  This includes checks on the
>>  	bundle format itself as well as checking that the prerequisite
>>  	commits exist and are fully linked in the current repository.
>> -	'git bundle' prints a list of missing commits, if any, and exits
>> -	with a non-zero status.
>> +	Information about additional capabilities, such as "object filter",
>> +	is printed. See "Capabilities" in link:technical/bundle-format.html
>> +	for more information. Finally, 'git bundle' prints a list of
>> +	missing commits, if any. The exit code is zero for success, but
>> +	will be nonzero if the bundle file is invalid.
> 
> Hmph.  I wasn't expecting this change (not objecting, but mostly am
> surprised) relative to the previous round where the filter was
> mentioned only when we issue an error message.  I was expecting to
> see something like "list-filters <file>", which is analog to the
> "list-heads <file>", to help those who want to programatically build
> around the "git bundle" command output.  Or "--list-capabilities" to
> accomodate the current, this, and future capabilities.  We already
> have the object-format thing before this series.  Do we have an
> interface to expose that out of a given bundle file?
 
The object format does _not_ appear to be output by 'git bundle verify'.
I just ran t6020 under GIT_TEST_DEFAULT_HASH=sha256 and see the
capability being written, but not output by verify:


The bundle contains these 10 refs:
d519553fbcf280df4448d588c25a51872f2d8dec95ba65a8a1bd3c64a5eec664 refs/heads/main
265b1effb3fdb80e04f7ea64e717f5677ddf57d00145dce7c508ba1f5ddb9081 refs/heads/release
611ac8182ea26d7aad227873b70f584593af1aa584bbdd37b36055e71be6ccd7 refs/heads/topic/1
4251af01ec70cdca692a3d15d78ccb9a6ca92ef344bd2dbc3bac20081347ae9b refs/heads/topic/2
611ac8182ea26d7aad227873b70f584593af1aa584bbdd37b36055e71be6ccd7 refs/pull/1/head
ec7e40a591df46923b25fd44bd86a2a80927f343d141f55ddf295c5d2d57959e refs/pull/2/head
754e9363bbfce179d35ccc48ae3a3c81db95a489cc632fafe5c10b25aed29d74 refs/tags/v1
a96c78650835f2041c49dce964bb759add14cfc4d35af3b7ee2b22289f9ba817 refs/tags/v2
398a930e72d21ea455c982227cca3c8fb5feb88c31f1d42226a3e6c42ff8db8f refs/tags/v3
d519553fbcf280df4448d588c25a51872f2d8dec95ba65a8a1bd3c64a5eec664 HEAD
The bundle uses this filter: blob:none
The bundle records a complete history.
partial.bdl is okay


This output does not seem to be designed for machine parsing, so
this extension of "The bundle uses this filter:" shouldn't be
problematic. A similar addition for object-format should be
possible as a follow-up.

Further, it seems you are asking for something that just reads
the header and supplies valuable data from the header such as
the refs and the capabilities. Perhaps 'git bundle header <file>'?

(We already have 'git bundle list-heads', so maybe we should
leave the refs out of 'git bundle header' output?)

I can add both as follow-ups to my existing list of things.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v4 04/13] pack-objects: use rev.filter when possible
  2022-03-09 16:01       ` [PATCH v4 04/13] pack-objects: use rev.filter when possible Derrick Stolee via GitGitGadget
@ 2022-03-10 13:11         ` Ævar Arnfjörð Bjarmason
  2022-03-10 13:33           ` Derrick Stolee
  0 siblings, 1 reply; 114+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-03-10 13:11 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, stolee, gitster, zhiyou.jx, jonathantanmy, Jeff Hostetler,
	Derrick Stolee


On Wed, Mar 09 2022, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <derrickstolee@github.com>
>
> In builtin/pack-objects.c, we use a 'filter_options' global to populate
> the --filter=<X> argument. The previous change created a pointer to a
> filter option in 'struct rev_info', so we can use that pointer here as a
> start to simplifying some usage of object filters.
>
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>  builtin/pack-objects.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index ba2006f2212..e5b7d015d7d 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -3651,7 +3651,7 @@ static int pack_options_allow_reuse(void)
>  
>  static int get_object_list_from_bitmap(struct rev_info *revs)
>  {
> -	if (!(bitmap_git = prepare_bitmap_walk(revs, &filter_options, 0)))
> +	if (!(bitmap_git = prepare_bitmap_walk(revs, &revs->filter, 0)))
>  		return -1;
>  
>  	if (pack_options_allow_reuse() &&
> @@ -3727,6 +3727,7 @@ static void get_object_list(int ac, const char **av)
>  	repo_init_revisions(the_repository, &revs, NULL);
>  	save_commit_buffer = 0;
>  	setup_revisions(ac, av, &revs, &s_r_opt);
> +	list_objects_filter_copy(&revs.filter, &filter_options);
>  
>  	/* make sure shallows are read */
>  	is_repository_shallow(the_repository);
> @@ -3777,7 +3778,7 @@ static void get_object_list(int ac, const char **av)
>  
>  	if (!fn_show_object)
>  		fn_show_object = show_object;
> -	traverse_commit_list_filtered(&filter_options, &revs,
> +	traverse_commit_list_filtered(&revs.filter, &revs,
>  				      show_commit, fn_show_object, NULL,
>  				      NULL);

Re your
https://lore.kernel.org/git/77c8ef4b-5dce-401b-e703-cfa32e18c853@github.com/
I was looking at how to handle the interaction between this and my
revisions_release() series.

This adds a new memory leak, which can be seen with:

    make SANITIZE=leak
    (cd t && ./t5532-fetch-proxy.sh -vixd)

I.e. this part is new:
    
    remote: Direct leak of 1 byte(s) in 1 object(s) allocated from:
    remote:     #0 0x4552f8 in __interceptor_malloc (git+0x4552f8)
    remote:     #1 0x75a089 in do_xmalloc wrapper.c:41:8
    remote:     #2 0x75a046 in xmalloc wrapper.c:62:9
    remote:     #3 0x62c692 in list_objects_filter_copy list-objects-filter-options.c:433:2
    remote:     #4 0x4f70bf in get_object_list builtin/pack-objects.c:3730:2
    remote:     #5 0x4f5e0e in cmd_pack_objects builtin/pack-objects.c:4157:3
    remote:     #6 0x4592ba in run_builtin git.c:465:11
    remote:     #7 0x457d71 in handle_builtin git.c:718:3
    remote:     #8 0x458ca5 in run_argv git.c:785:4
    remote:     #9 0x457b30 in cmd_main git.c:916:19
    remote:     #10 0x563179 in main common-main.c:56:11
    remote:     #11 0x7fddd2da67ec in __libc_start_main csu/../csu/libc-start.c:332:16
    remote:     #12 0x4300e9 in _start (git+0x4300e9)
    
Of course it's not "new" in the sense that we in effect leaked this
before, but it was still reachable, you're just changing it so that
instead of being stored in the static "filter_options" variable in
pack-objects.c we're storing it in "struct rev_info", which has a
different lifetime.

I think instead of me rebasing my series on top of yours and tangling
the two up a better option is to just add a change to this, so after
list_objects_filter_copy() do:

    UNLEAK(revs.filter);

Or, alternatively adding this to the end of the function (in which case
Junio will need to deal with a minor textual conflict):

    list_objects_filter_release(&revs.filter);

Both of those make my series merged with "seen" (which has this change)
pass with SANITIZE=leak + GIT_TEST_PASSING_SANITIZE_LEAK=true again.

You could do the same in your later change adding
list_objects_filter_copy() to verify_bundle(), that one also adds a new
leak, but happens not to cause test failures since the bundle.c code
isn't otherwise marked as passing with SANITIZE=leak, it fails in
various other ways.

Obviously we should do something about the actual leak eventually, but
that can be done in some follow-up work to finish up the missing bits of
release_revisions(), i.e. adding list_objects_filter_release() etc. to
release_revisions().

So I think just adding UNLEAK() here (and optionally, also to the
bundle.c code) is the least invasive thing, if you & Junio are OK with
that approach.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v4 04/13] pack-objects: use rev.filter when possible
  2022-03-10 13:11         ` Ævar Arnfjörð Bjarmason
@ 2022-03-10 13:33           ` Derrick Stolee
  2022-03-10 14:24             ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 114+ messages in thread
From: Derrick Stolee @ 2022-03-10 13:33 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Derrick Stolee via GitGitGadget
  Cc: git, stolee, gitster, zhiyou.jx, jonathantanmy, Jeff Hostetler

On 3/10/2022 8:11 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Wed, Mar 09 2022, Derrick Stolee via GitGitGadget wrote:
> 
>> From: Derrick Stolee <derrickstolee@github.com>
>>
>> In builtin/pack-objects.c, we use a 'filter_options' global to populate
>> the --filter=<X> argument. The previous change created a pointer to a
>> filter option in 'struct rev_info', so we can use that pointer here as a
>> start to simplifying some usage of object filters.
>>
>> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
>> ---
>>  builtin/pack-objects.c | 5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
>> index ba2006f2212..e5b7d015d7d 100644
>> --- a/builtin/pack-objects.c
>> +++ b/builtin/pack-objects.c
>> @@ -3651,7 +3651,7 @@ static int pack_options_allow_reuse(void)
>>  
>>  static int get_object_list_from_bitmap(struct rev_info *revs)
>>  {
>> -	if (!(bitmap_git = prepare_bitmap_walk(revs, &filter_options, 0)))
>> +	if (!(bitmap_git = prepare_bitmap_walk(revs, &revs->filter, 0)))
>>  		return -1;
>>  
>>  	if (pack_options_allow_reuse() &&
>> @@ -3727,6 +3727,7 @@ static void get_object_list(int ac, const char **av)
>>  	repo_init_revisions(the_repository, &revs, NULL);
>>  	save_commit_buffer = 0;
>>  	setup_revisions(ac, av, &revs, &s_r_opt);
>> +	list_objects_filter_copy(&revs.filter, &filter_options);
>>  
>>  	/* make sure shallows are read */
>>  	is_repository_shallow(the_repository);
>> @@ -3777,7 +3778,7 @@ static void get_object_list(int ac, const char **av)
>>  
>>  	if (!fn_show_object)
>>  		fn_show_object = show_object;
>> -	traverse_commit_list_filtered(&filter_options, &revs,
>> +	traverse_commit_list_filtered(&revs.filter, &revs,
>>  				      show_commit, fn_show_object, NULL,
>>  				      NULL);
> 
> Re your
> https://lore.kernel.org/git/77c8ef4b-5dce-401b-e703-cfa32e18c853@github.com/
> I was looking at how to handle the interaction between this and my
> revisions_release() series.
> 
> This adds a new memory leak, which can be seen with:
> 
>     make SANITIZE=leak
>     (cd t && ./t5532-fetch-proxy.sh -vixd)
> 
> I.e. this part is new:
>     
>     remote: Direct leak of 1 byte(s) in 1 object(s) allocated from:
>     remote:     #0 0x4552f8 in __interceptor_malloc (git+0x4552f8)
>     remote:     #1 0x75a089 in do_xmalloc wrapper.c:41:8
>     remote:     #2 0x75a046 in xmalloc wrapper.c:62:9
>     remote:     #3 0x62c692 in list_objects_filter_copy list-objects-filter-options.c:433:2
>     remote:     #4 0x4f70bf in get_object_list builtin/pack-objects.c:3730:2
>     remote:     #5 0x4f5e0e in cmd_pack_objects builtin/pack-objects.c:4157:3
>     remote:     #6 0x4592ba in run_builtin git.c:465:11
>     remote:     #7 0x457d71 in handle_builtin git.c:718:3
>     remote:     #8 0x458ca5 in run_argv git.c:785:4
>     remote:     #9 0x457b30 in cmd_main git.c:916:19
>     remote:     #10 0x563179 in main common-main.c:56:11
>     remote:     #11 0x7fddd2da67ec in __libc_start_main csu/../csu/libc-start.c:332:16
>     remote:     #12 0x4300e9 in _start (git+0x4300e9)
>     
> Of course it's not "new" in the sense that we in effect leaked this
> before, but it was still reachable, you're just changing it so that
> instead of being stored in the static "filter_options" variable in
> pack-objects.c we're storing it in "struct rev_info", which has a
> different lifetime.

True, and 'struct rev_info' is not being release in any way, either,
right?

> I think instead of me rebasing my series on top of yours and tangling
> the two up a better option is to just add a change to this, so after
> list_objects_filter_copy() do:
> 
>     UNLEAK(revs.filter);
> 
> Or, alternatively adding this to the end of the function (in which case
> Junio will need to deal with a minor textual conflict):
> 
>     list_objects_filter_release(&revs.filter);
> 
> Both of those make my series merged with "seen" (which has this change)
> pass with SANITIZE=leak + GIT_TEST_PASSING_SANITIZE_LEAK=true again.
> 
> You could do the same in your later change adding
> list_objects_filter_copy() to verify_bundle(), that one also adds a new
> leak, but happens not to cause test failures since the bundle.c code
> isn't otherwise marked as passing with SANITIZE=leak, it fails in
> various other ways.
> 
> Obviously we should do something about the actual leak eventually, but
> that can be done in some follow-up work to finish up the missing bits of
> release_revisions(), i.e. adding list_objects_filter_release() etc. to
> release_revisions().

I understand that you like to "show your work" by marking tests as
safe for leak-check by making the smallest changes possible, but your
series has a lot of small patches that do nothing but add a free() or
release_*() call instead of implementing the "right" release_revisions()
from the start.
 
> So I think just adding UNLEAK() here (and optionally, also to the
> bundle.c code) is the least invasive thing, if you & Junio are OK with
> that approach.

Could you send a patch that does just that, so we are sure to cover
the warnings you are seeing in your tests?

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v4 04/13] pack-objects: use rev.filter when possible
  2022-03-10 13:33           ` Derrick Stolee
@ 2022-03-10 14:24             ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 114+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-03-10 14:24 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, stolee, gitster, zhiyou.jx,
	jonathantanmy, Jeff Hostetler


On Thu, Mar 10 2022, Derrick Stolee wrote:

> On 3/10/2022 8:11 AM, Ævar Arnfjörð Bjarmason wrote:
>> 
>> On Wed, Mar 09 2022, Derrick Stolee via GitGitGadget wrote:
>> 
>>> From: Derrick Stolee <derrickstolee@github.com>
>>>
>>> In builtin/pack-objects.c, we use a 'filter_options' global to populate
>>> the --filter=<X> argument. The previous change created a pointer to a
>>> filter option in 'struct rev_info', so we can use that pointer here as a
>>> start to simplifying some usage of object filters.
>>>
>>> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
>>> ---
>>>  builtin/pack-objects.c | 5 +++--
>>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
>>> index ba2006f2212..e5b7d015d7d 100644
>>> --- a/builtin/pack-objects.c
>>> +++ b/builtin/pack-objects.c
>>> @@ -3651,7 +3651,7 @@ static int pack_options_allow_reuse(void)
>>>  
>>>  static int get_object_list_from_bitmap(struct rev_info *revs)
>>>  {
>>> -	if (!(bitmap_git = prepare_bitmap_walk(revs, &filter_options, 0)))
>>> +	if (!(bitmap_git = prepare_bitmap_walk(revs, &revs->filter, 0)))
>>>  		return -1;
>>>  
>>>  	if (pack_options_allow_reuse() &&
>>> @@ -3727,6 +3727,7 @@ static void get_object_list(int ac, const char **av)
>>>  	repo_init_revisions(the_repository, &revs, NULL);
>>>  	save_commit_buffer = 0;
>>>  	setup_revisions(ac, av, &revs, &s_r_opt);
>>> +	list_objects_filter_copy(&revs.filter, &filter_options);
>>>  
>>>  	/* make sure shallows are read */
>>>  	is_repository_shallow(the_repository);
>>> @@ -3777,7 +3778,7 @@ static void get_object_list(int ac, const char **av)
>>>  
>>>  	if (!fn_show_object)
>>>  		fn_show_object = show_object;
>>> -	traverse_commit_list_filtered(&filter_options, &revs,
>>> +	traverse_commit_list_filtered(&revs.filter, &revs,
>>>  				      show_commit, fn_show_object, NULL,
>>>  				      NULL);
>> 
>> Re your
>> https://lore.kernel.org/git/77c8ef4b-5dce-401b-e703-cfa32e18c853@github.com/
>> I was looking at how to handle the interaction between this and my
>> revisions_release() series.
>> 
>> This adds a new memory leak, which can be seen with:
>> 
>>     make SANITIZE=leak
>>     (cd t && ./t5532-fetch-proxy.sh -vixd)
>> 
>> I.e. this part is new:
>>     
>>     remote: Direct leak of 1 byte(s) in 1 object(s) allocated from:
>>     remote:     #0 0x4552f8 in __interceptor_malloc (git+0x4552f8)
>>     remote:     #1 0x75a089 in do_xmalloc wrapper.c:41:8
>>     remote:     #2 0x75a046 in xmalloc wrapper.c:62:9
>>     remote:     #3 0x62c692 in list_objects_filter_copy list-objects-filter-options.c:433:2
>>     remote:     #4 0x4f70bf in get_object_list builtin/pack-objects.c:3730:2
>>     remote:     #5 0x4f5e0e in cmd_pack_objects builtin/pack-objects.c:4157:3
>>     remote:     #6 0x4592ba in run_builtin git.c:465:11
>>     remote:     #7 0x457d71 in handle_builtin git.c:718:3
>>     remote:     #8 0x458ca5 in run_argv git.c:785:4
>>     remote:     #9 0x457b30 in cmd_main git.c:916:19
>>     remote:     #10 0x563179 in main common-main.c:56:11
>>     remote:     #11 0x7fddd2da67ec in __libc_start_main csu/../csu/libc-start.c:332:16
>>     remote:     #12 0x4300e9 in _start (git+0x4300e9)
>>     
>> Of course it's not "new" in the sense that we in effect leaked this
>> before, but it was still reachable, you're just changing it so that
>> instead of being stored in the static "filter_options" variable in
>> pack-objects.c we're storing it in "struct rev_info", which has a
>> different lifetime.
>
> True, and 'struct rev_info' is not being release in any way, either,
> right?

Yes, sorry to not be clear there. There's 2x other leaks just in that
one test on "master", my series addresses those, but then the 3rd
"added" here will be left behind.

>> I think instead of me rebasing my series on top of yours and tangling
>> the two up a better option is to just add a change to this, so after
>> list_objects_filter_copy() do:
>> 
>>     UNLEAK(revs.filter);
>> 
>> Or, alternatively adding this to the end of the function (in which case
>> Junio will need to deal with a minor textual conflict):
>> 
>>     list_objects_filter_release(&revs.filter);
>> 
>> Both of those make my series merged with "seen" (which has this change)
>> pass with SANITIZE=leak + GIT_TEST_PASSING_SANITIZE_LEAK=true again.
>> 
>> You could do the same in your later change adding
>> list_objects_filter_copy() to verify_bundle(), that one also adds a new
>> leak, but happens not to cause test failures since the bundle.c code
>> isn't otherwise marked as passing with SANITIZE=leak, it fails in
>> various other ways.
>> 
>> Obviously we should do something about the actual leak eventually, but
>> that can be done in some follow-up work to finish up the missing bits of
>> release_revisions(), i.e. adding list_objects_filter_release() etc. to
>> release_revisions().
>
> I understand that you like to "show your work" by marking tests as
> safe for leak-check by making the smallest changes possible, but your
> series has a lot of small patches that do nothing but add a free() or
> release_*() call instead of implementing the "right" release_revisions()
> from the start.

Yes, another way to do it would be to add the end-state
release_revisions() I have and incrementally add it everywhere.

The way I opted to do it admittedly results in a bit more churn, but was
(and still is) very useful to bisect and debug any changes.

I.e. I can easily pinpoint for any failures what exact rev.* member
being released caused a failure, which I couldn't do if I added the
end-state release_revisions() and then started adding it to code in
various places.

Then a failure due to adding it to say pack-objects wouldn't be easily
distinguishable from a failure due to releasing rev.SOMETHING in
pack-objects.

But in any case, the interaction with tips of these two sets of patches
(this series & mine) would be the same whatever the intra-series
progression I opted for is.

>> So I think just adding UNLEAK() here (and optionally, also to the
>> bundle.c code) is the least invasive thing, if you & Junio are OK with
>> that approach.
>
> Could you send a patch that does just that, so we are sure to cover
> the warnings you are seeing in your tests?

I'm suggesting fixing the following up into this series, the first hunk
is needed for the interaction with the release_revisions() series, the
second is there for completeness, but isn't required currently:

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 1a0b0950a28..ffe6197729c 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3987,6 +3987,7 @@ static void get_object_list(int ac, const char **av)
 	save_commit_buffer = 0;
 	setup_revisions(ac, av, &revs, &s_r_opt);
 	list_objects_filter_copy(&revs.filter, &filter_options);
+	UNLEAK(revs.filter);
 
 	/* make sure shallows are read */
 	is_repository_shallow(the_repository);
diff --git a/bundle.c b/bundle.c
index e359370cfcd..90cfea0c984 100644
--- a/bundle.c
+++ b/bundle.c
@@ -226,6 +226,7 @@ int verify_bundle(struct repository *r,
 	setup_revisions(2, argv, &revs, NULL);
 
 	list_objects_filter_copy(&revs.filter, &header->filter);
+	UNLEAK(revs.filter);
 
 	if (prepare_revision_walk(&revs))
 		die(_("revision walk setup failed"));

^ permalink raw reply related	[flat|nested] 114+ messages in thread

end of thread, other threads:[~2022-03-10 14:35 UTC | newest]

Thread overview: 114+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-23 17:55 [PATCH 00/11] Partial bundles Derrick Stolee via GitGitGadget
2022-02-23 17:55 ` [PATCH 01/11] index-pack: document and test the --promisor option Derrick Stolee via GitGitGadget
2022-02-23 17:55 ` [PATCH 02/11] revision: put object filter into struct rev_info Derrick Stolee via GitGitGadget
2022-03-04 22:15   ` Junio C Hamano
2022-03-07 13:59     ` Derrick Stolee
2022-03-07 16:46       ` Junio C Hamano
2022-02-23 17:55 ` [PATCH 03/11] pack-objects: use rev.filter when possible Derrick Stolee via GitGitGadget
2022-03-04 22:25   ` Junio C Hamano
2022-02-23 17:55 ` [PATCH 04/11] pack-bitmap: drop filter in prepare_bitmap_walk() Derrick Stolee via GitGitGadget
2022-03-04 22:26   ` Junio C Hamano
2022-02-23 17:55 ` [PATCH 05/11] list-objects: consolidate traverse_commit_list[_filtered] Derrick Stolee via GitGitGadget
2022-03-04 22:30   ` Junio C Hamano
2022-02-23 17:55 ` [PATCH 06/11] MyFirstObjectWalk: update recommended usage Derrick Stolee via GitGitGadget
2022-03-04 22:33   ` Junio C Hamano
2022-03-07 14:05     ` Derrick Stolee
2022-03-07 16:47       ` Junio C Hamano
2022-02-23 17:55 ` [PATCH 07/11] bundle: safely handle --objects option Derrick Stolee via GitGitGadget
2022-02-28 16:00   ` Jeff Hostetler
2022-03-04 22:58     ` Junio C Hamano
2022-03-07 14:09       ` Derrick Stolee
2022-03-04 22:57   ` Junio C Hamano
2022-03-07 15:35   ` Ævar Arnfjörð Bjarmason
2022-02-23 17:55 ` [PATCH 08/11] bundle: parse filter capability Derrick Stolee via GitGitGadget
2022-03-07 15:38   ` Ævar Arnfjörð Bjarmason
2022-03-07 16:14     ` Derrick Stolee
2022-03-07 16:22       ` Ævar Arnfjörð Bjarmason
2022-03-07 16:29         ` Derrick Stolee
2022-03-07 15:55   ` Ævar Arnfjörð Bjarmason
2022-02-23 17:55 ` [PATCH 09/11] rev-list: move --filter parsing into revision.c Derrick Stolee via GitGitGadget
2022-02-23 17:55 ` [PATCH 10/11] bundle: create filtered bundles Derrick Stolee via GitGitGadget
2022-03-04 23:35   ` Junio C Hamano
2022-03-07 14:14     ` Derrick Stolee
2022-03-07 16:49       ` Junio C Hamano
2022-03-07 15:44   ` Ævar Arnfjörð Bjarmason
2022-02-23 17:55 ` [PATCH 11/11] bundle: unbundle promisor packs Derrick Stolee via GitGitGadget
2022-03-04 23:43   ` Junio C Hamano
2022-03-07 14:48     ` Derrick Stolee
2022-03-07 16:56       ` Junio C Hamano
2022-03-07 18:57         ` Derrick Stolee
2022-03-07 19:40           ` Junio C Hamano
2022-03-07 19:49             ` Derrick Stolee
2022-03-07 19:54               ` Junio C Hamano
2022-03-07 20:20                 ` Derrick Stolee
2022-03-07 21:35                   ` Junio C Hamano
2022-03-07 15:47   ` Ævar Arnfjörð Bjarmason
2022-03-07 16:10     ` Derrick Stolee
2022-02-28 17:00 ` [PATCH 00/11] Partial bundles Jeff Hostetler
2022-02-28 17:54   ` Derrick Stolee
2022-03-01 18:03     ` Jeff Hostetler
2022-03-04 19:19 ` Derrick Stolee
2022-03-07 14:55 ` Ævar Arnfjörð Bjarmason
2022-03-07 14:59   ` Derrick Stolee
2022-03-07 21:50 ` [PATCH v2 00/12] " Derrick Stolee via GitGitGadget
2022-03-07 21:50   ` [PATCH v2 01/12] index-pack: document and test the --promisor option Derrick Stolee via GitGitGadget
2022-03-07 21:50   ` [PATCH v2 02/12] revision: put object filter into struct rev_info Derrick Stolee via GitGitGadget
2022-03-07 21:50   ` [PATCH v2 03/12] pack-objects: use rev.filter when possible Derrick Stolee via GitGitGadget
2022-03-07 21:50   ` [PATCH v2 04/12] pack-bitmap: drop filter in prepare_bitmap_walk() Derrick Stolee via GitGitGadget
2022-03-07 21:50   ` [PATCH v2 05/12] list-objects: consolidate traverse_commit_list[_filtered] Derrick Stolee via GitGitGadget
2022-03-07 21:50   ` [PATCH v2 06/12] MyFirstObjectWalk: update recommended usage Derrick Stolee via GitGitGadget
2022-03-07 21:50   ` [PATCH v2 07/12] bundle: safely handle --objects option Derrick Stolee via GitGitGadget
2022-03-08  9:37     ` Ævar Arnfjörð Bjarmason
2022-03-08 13:45       ` Derrick Stolee
2022-03-08 13:53         ` Ævar Arnfjörð Bjarmason
2022-03-07 21:50   ` [PATCH v2 08/12] bundle: parse filter capability Derrick Stolee via GitGitGadget
2022-03-08  9:25     ` Ævar Arnfjörð Bjarmason
2022-03-08 13:43       ` Derrick Stolee
2022-03-07 21:50   ` [PATCH v2 09/12] rev-list: move --filter parsing into revision.c Derrick Stolee via GitGitGadget
2022-03-07 21:50   ` [PATCH v2 10/12] bundle: create filtered bundles Derrick Stolee via GitGitGadget
2022-03-07 21:50   ` [PATCH v2 11/12] bundle: unbundle promisor packs Derrick Stolee via GitGitGadget
2022-03-07 21:50   ` [PATCH v2 12/12] clone: fail gracefully when cloning filtered bundle Derrick Stolee via GitGitGadget
2022-03-07 22:11   ` [PATCH v2 00/12] Partial bundles Junio C Hamano
2022-03-08 14:39   ` [PATCH v3 " Derrick Stolee via GitGitGadget
2022-03-08 14:39     ` [PATCH v3 01/12] index-pack: document and test the --promisor option Derrick Stolee via GitGitGadget
2022-03-08 14:39     ` [PATCH v3 02/12] revision: put object filter into struct rev_info Derrick Stolee via GitGitGadget
2022-03-08 14:39     ` [PATCH v3 03/12] pack-objects: use rev.filter when possible Derrick Stolee via GitGitGadget
2022-03-08 14:39     ` [PATCH v3 04/12] pack-bitmap: drop filter in prepare_bitmap_walk() Derrick Stolee via GitGitGadget
2022-03-08 14:39     ` [PATCH v3 05/12] list-objects: consolidate traverse_commit_list[_filtered] Derrick Stolee via GitGitGadget
2022-03-09 13:24       ` Ævar Arnfjörð Bjarmason
2022-03-08 14:39     ` [PATCH v3 06/12] MyFirstObjectWalk: update recommended usage Derrick Stolee via GitGitGadget
2022-03-08 14:39     ` [PATCH v3 07/12] list-objects: handle NULL function pointers Ævar Arnfjörð Bjarmason via GitGitGadget
2022-03-08 17:26       ` Junio C Hamano
2022-03-09 13:40         ` Ævar Arnfjörð Bjarmason
2022-03-09 14:16           ` Derrick Stolee
2022-03-09 18:32           ` Junio C Hamano
2022-03-08 14:39     ` [PATCH v3 08/12] bundle: parse filter capability Derrick Stolee via GitGitGadget
2022-03-08 17:29       ` Junio C Hamano
2022-03-09 14:35         ` Derrick Stolee
2022-03-09 13:30       ` Ævar Arnfjörð Bjarmason
2022-03-08 14:39     ` [PATCH v3 09/12] rev-list: move --filter parsing into revision.c Derrick Stolee via GitGitGadget
2022-03-08 14:39     ` [PATCH v3 10/12] bundle: create filtered bundles Derrick Stolee via GitGitGadget
2022-03-08 14:39     ` [PATCH v3 11/12] bundle: unbundle promisor packs Derrick Stolee via GitGitGadget
2022-03-08 14:39     ` [PATCH v3 12/12] clone: fail gracefully when cloning filtered bundle Derrick Stolee via GitGitGadget
2022-03-08 16:10       ` Derrick Stolee
2022-03-08 17:19         ` Junio C Hamano
2022-03-09 16:01     ` [PATCH v4 00/13] Partial bundles Derrick Stolee via GitGitGadget
2022-03-09 16:01       ` [PATCH v4 01/13] index-pack: document and test the --promisor option Derrick Stolee via GitGitGadget
2022-03-09 16:01       ` [PATCH v4 02/13] list-objects-filter-options: create copy helper Derrick Stolee via GitGitGadget
2022-03-09 16:01       ` [PATCH v4 03/13] revision: put object filter into struct rev_info Derrick Stolee via GitGitGadget
2022-03-09 18:48         ` Junio C Hamano
2022-03-09 16:01       ` [PATCH v4 04/13] pack-objects: use rev.filter when possible Derrick Stolee via GitGitGadget
2022-03-10 13:11         ` Ævar Arnfjörð Bjarmason
2022-03-10 13:33           ` Derrick Stolee
2022-03-10 14:24             ` Ævar Arnfjörð Bjarmason
2022-03-09 16:01       ` [PATCH v4 05/13] pack-bitmap: drop filter in prepare_bitmap_walk() Derrick Stolee via GitGitGadget
2022-03-09 16:01       ` [PATCH v4 06/13] list-objects: consolidate traverse_commit_list[_filtered] Derrick Stolee via GitGitGadget
2022-03-09 16:01       ` [PATCH v4 07/13] MyFirstObjectWalk: update recommended usage Derrick Stolee via GitGitGadget
2022-03-09 16:01       ` [PATCH v4 08/13] list-objects: handle NULL function pointers Ævar Arnfjörð Bjarmason via GitGitGadget
2022-03-09 16:01       ` [PATCH v4 09/13] bundle: parse filter capability Derrick Stolee via GitGitGadget
2022-03-09 18:41         ` Junio C Hamano
2022-03-09 18:55           ` Derrick Stolee
2022-03-09 16:01       ` [PATCH v4 10/13] rev-list: move --filter parsing into revision.c Derrick Stolee via GitGitGadget
2022-03-09 16:01       ` [PATCH v4 11/13] bundle: create filtered bundles Derrick Stolee via GitGitGadget
2022-03-09 16:01       ` [PATCH v4 12/13] bundle: unbundle promisor packs Derrick Stolee via GitGitGadget
2022-03-09 16:01       ` [PATCH v4 13/13] clone: fail gracefully when cloning filtered bundle Derrick Stolee via GitGitGadget

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).