All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/5] Partial bundle follow ups
@ 2022-03-22 17:28 Derrick Stolee via GitGitGadget
  2022-03-22 17:28 ` [PATCH 1/5] list-objects-filter: remove CL_ARG__FILTER Derrick Stolee via GitGitGadget
                   ` (6 more replies)
  0 siblings, 7 replies; 25+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-03-22 17:28 UTC (permalink / raw)
  To: git; +Cc: gitster, avarab, Derrick Stolee


For Junio
=========

This is based on master, which recently took ds/partial-bundles.

There are a couple conflicts with 'seen':

 * rc/fetch-refetch adds the "--refetch" option right next to where I remove
   an instance of CL_ARG__FILTER.
 * tb/cruft-packs updates the add_unseen_recent_objects_to_traversal()
   method, but this series changes one call from using "&revs" to using
   "revs".

Hopefully these are easy conflicts to resolve.


Series Goals
============

These are a few cleanups that were discussed as part of ds/partial-bundles.

 * Patch 1 removes the CL_ARG__FILTER macro.
 * Patches 2-3 help parse --filter directly into a revs.filter member
   instead of copying it from another filter_options.
 * Patches 4-5 add output of the hash function capability in 'git bundle
   verify'. It also moves the capability output to the end, since it looks a
   bit strange in the current location when there are boundary commits.


Series Non-Goals
================

Some items were discussed that I either attempted and dropped, or delayed
for another series:

 * create_bundle() in bundle.c does two commit walks: first to discover the
   boundary commits to write into the bundle header, and then another that
   happens when constructing the pack-file. It looks like this could be
   avoided if there will not be any boundary, but there are additional
   checks in write_bundle_refs() that look for the SHOWN bit on the commit
   objects in order to determine if a requested ref would be excluded by the
   rev-list options. This behavior seems important, so I did not remove it.

 * 'git clone --bare partial.bdl" should work when partial.bdl is a partial
   bundle. However, this requires changing the way we configure partial
   cloned repositories to not rely on a remote in order to understand an
   object filter. I'm working on this as a parallel series that will likely
   require more discussion than these small cleanups.

 * Ævar pointed out some newly-visible memory leaks due to moving the filter
   out of a static global. After looking at the recommended change, it seems
   that we actually need to be more careful about freeing the revs and not
   just revs.filter.

Thanks, -Stolee

Derrick Stolee (5):
  list-objects-filter: remove CL_ARG__FILTER
  pack-objects: move revs out of get_object_list()
  pack-objects: parse --filter directly into revs.filter
  bundle: move capabilities to end of 'verify'
  bundle: output hash information in 'verify'

 Documentation/git-bundle.txt  | 10 +++++-----
 builtin/fetch-pack.c          |  4 ++--
 builtin/pack-objects.c        | 37 +++++++++++++++++------------------
 bundle.c                      | 11 ++++++-----
 list-objects-filter-options.h |  5 +----
 revision.c                    |  4 ++--
 t/t6020-bundle-misc.sh        | 26 ++++++++++++++++--------
 7 files changed, 52 insertions(+), 45 deletions(-)


base-commit: f01e51a7cfd75131b7266131b1f7540ce0a8e5c1
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1186%2Fderrickstolee%2Fpartial-bundle-follow-up-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1186/derrickstolee/partial-bundle-follow-up-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1186
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/5] list-objects-filter: remove CL_ARG__FILTER
  2022-03-22 17:28 [PATCH 0/5] Partial bundle follow ups Derrick Stolee via GitGitGadget
@ 2022-03-22 17:28 ` Derrick Stolee via GitGitGadget
  2022-03-22 17:28 ` [PATCH 2/5] pack-objects: move revs out of get_object_list() Derrick Stolee via GitGitGadget
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 25+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-03-22 17:28 UTC (permalink / raw)
  To: git; +Cc: gitster, avarab, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

We have established the command-line interface for the --[no-]filter
options for a while now, so we do not need a helper to make this
editable in the future.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/fetch-pack.c          | 4 ++--
 list-objects-filter-options.h | 5 +----
 revision.c                    | 4 ++--
 3 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index c2d96f4c89a..c4b9104f9b5 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -153,11 +153,11 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 			args.from_promisor = 1;
 			continue;
 		}
-		if (skip_prefix(arg, ("--" CL_ARG__FILTER "="), &arg)) {
+		if (skip_prefix(arg, ("--filter="), &arg)) {
 			parse_list_objects_filter(&args.filter_options, arg);
 			continue;
 		}
-		if (!strcmp(arg, ("--no-" CL_ARG__FILTER))) {
+		if (!strcmp(arg, ("--no-filter"))) {
 			list_objects_filter_set_no_filter(&args.filter_options);
 			continue;
 		}
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index 2eb6c983949..90e4bc96252 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -69,9 +69,6 @@ struct list_objects_filter_options {
 	 */
 };
 
-/* Normalized command line arguments */
-#define CL_ARG__FILTER "filter"
-
 /*
  * Parse value of the argument to the "filter" keyword.
  * On the command line this looks like:
@@ -111,7 +108,7 @@ int opt_parse_list_objects_filter(const struct option *opt,
 				  const char *arg, int unset);
 
 #define OPT_PARSE_LIST_OBJECTS_FILTER(fo) \
-	OPT_CALLBACK(0, CL_ARG__FILTER, fo, N_("args"), \
+	OPT_CALLBACK(0, "filter", fo, N_("args"), \
 	  N_("object filtering"), \
 	  opt_parse_list_objects_filter)
 
diff --git a/revision.c b/revision.c
index 2646b78990e..7d435f80480 100644
--- a/revision.c
+++ b/revision.c
@@ -2691,9 +2691,9 @@ static int handle_revision_pseudo_opt(struct rev_info *revs,
 		revs->no_walk = 0;
 	} else if (!strcmp(arg, "--single-worktree")) {
 		revs->single_worktree = 1;
-	} else if (skip_prefix(arg, ("--" CL_ARG__FILTER "="), &arg)) {
+	} else if (skip_prefix(arg, ("--filter="), &arg)) {
 		parse_list_objects_filter(&revs->filter, arg);
-	} else if (!strcmp(arg, ("--no-" CL_ARG__FILTER))) {
+	} else if (!strcmp(arg, ("--no-filter"))) {
 		list_objects_filter_set_no_filter(&revs->filter);
 	} else {
 		return 0;
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 2/5] pack-objects: move revs out of get_object_list()
  2022-03-22 17:28 [PATCH 0/5] Partial bundle follow ups Derrick Stolee via GitGitGadget
  2022-03-22 17:28 ` [PATCH 1/5] list-objects-filter: remove CL_ARG__FILTER Derrick Stolee via GitGitGadget
@ 2022-03-22 17:28 ` Derrick Stolee via GitGitGadget
  2022-03-22 17:28 ` [PATCH 3/5] pack-objects: parse --filter directly into revs.filter Derrick Stolee via GitGitGadget
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 25+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-03-22 17:28 UTC (permalink / raw)
  To: git; +Cc: gitster, avarab, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

We intend to parse the --filter option directly into revs.filter, but we
first need to move the 'revs' variable out of get_object_list() and pass
it as a pointer instead. This change only deals with the issues of
making that work.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/pack-objects.c | 33 ++++++++++++++++++---------------
 1 file changed, 18 insertions(+), 15 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 829ca359cf9..f29bef9d0b6 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3714,9 +3714,8 @@ static void mark_bitmap_preferred_tips(void)
 	}
 }
 
-static void get_object_list(int ac, const char **av)
+static void get_object_list(struct rev_info *revs, int ac, const char **av)
 {
-	struct rev_info revs;
 	struct setup_revision_opt s_r_opt = {
 		.allow_exclude_promisor_objects = 1,
 	};
@@ -3724,10 +3723,10 @@ static void get_object_list(int ac, const char **av)
 	int flags = 0;
 	int save_warning;
 
-	repo_init_revisions(the_repository, &revs, NULL);
+	repo_init_revisions(the_repository, revs, NULL);
 	save_commit_buffer = 0;
-	setup_revisions(ac, av, &revs, &s_r_opt);
-	list_objects_filter_copy(&revs.filter, &filter_options);
+	setup_revisions(ac, av, revs, &s_r_opt);
+	list_objects_filter_copy(&revs->filter, &filter_options);
 
 	/* make sure shallows are read */
 	is_repository_shallow(the_repository);
@@ -3757,13 +3756,13 @@ static void get_object_list(int ac, const char **av)
 			}
 			die(_("not a rev '%s'"), line);
 		}
-		if (handle_revision_arg(line, &revs, flags, REVARG_CANNOT_BE_FILENAME))
+		if (handle_revision_arg(line, revs, flags, REVARG_CANNOT_BE_FILENAME))
 			die(_("bad revision '%s'"), line);
 	}
 
 	warn_on_object_refname_ambiguity = save_warning;
 
-	if (use_bitmap_index && !get_object_list_from_bitmap(&revs))
+	if (use_bitmap_index && !get_object_list_from_bitmap(revs))
 		return;
 
 	if (use_delta_islands)
@@ -3772,24 +3771,24 @@ static void get_object_list(int ac, const char **av)
 	if (write_bitmap_index)
 		mark_bitmap_preferred_tips();
 
-	if (prepare_revision_walk(&revs))
+	if (prepare_revision_walk(revs))
 		die(_("revision walk setup failed"));
-	mark_edges_uninteresting(&revs, show_edge, sparse);
+	mark_edges_uninteresting(revs, show_edge, sparse);
 
 	if (!fn_show_object)
 		fn_show_object = show_object;
-	traverse_commit_list(&revs,
+	traverse_commit_list(revs,
 			     show_commit, fn_show_object,
 			     NULL);
 
 	if (unpack_unreachable_expiration) {
-		revs.ignore_missing_links = 1;
-		if (add_unseen_recent_objects_to_traversal(&revs,
+		revs->ignore_missing_links = 1;
+		if (add_unseen_recent_objects_to_traversal(revs,
 				unpack_unreachable_expiration))
 			die(_("unable to add recent objects"));
-		if (prepare_revision_walk(&revs))
+		if (prepare_revision_walk(revs))
 			die(_("revision walk setup failed"));
-		traverse_commit_list(&revs, record_recent_commit,
+		traverse_commit_list(revs, record_recent_commit,
 				     record_recent_object, NULL);
 	}
 
@@ -3872,6 +3871,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	int rev_list_index = 0;
 	int stdin_packs = 0;
 	struct string_list keep_pack_list = STRING_LIST_INIT_NODUP;
+	struct rev_info revs;
+
 	struct option pack_objects_options[] = {
 		OPT_SET_INT('q', "quiet", &progress,
 			    N_("do not show progress meter"), 0),
@@ -3976,6 +3977,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 
 	read_replace_refs = 0;
 
+	repo_init_revisions(the_repository, &revs, NULL);
+
 	sparse = git_env_bool("GIT_TEST_PACK_SPARSE", -1);
 	if (the_repository->gitdir) {
 		prepare_repo_settings(the_repository);
@@ -4154,7 +4157,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	} else if (!use_internal_rev_list) {
 		read_object_list_from_stdin();
 	} else {
-		get_object_list(rp.nr, rp.v);
+		get_object_list(&revs, rp.nr, rp.v);
 	}
 	cleanup_preferred_base();
 	if (include_tag && nr_result)
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 3/5] pack-objects: parse --filter directly into revs.filter
  2022-03-22 17:28 [PATCH 0/5] Partial bundle follow ups Derrick Stolee via GitGitGadget
  2022-03-22 17:28 ` [PATCH 1/5] list-objects-filter: remove CL_ARG__FILTER Derrick Stolee via GitGitGadget
  2022-03-22 17:28 ` [PATCH 2/5] pack-objects: move revs out of get_object_list() Derrick Stolee via GitGitGadget
@ 2022-03-22 17:28 ` Derrick Stolee via GitGitGadget
  2022-03-22 19:37   ` [-SPAM-] " Ramsay Jones
  2022-03-22 21:15   ` Ævar Arnfjörð Bjarmason
  2022-03-22 17:28 ` [PATCH 4/5] bundle: move capabilities to end of 'verify' Derrick Stolee via GitGitGadget
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 25+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-03-22 17:28 UTC (permalink / raw)
  To: git; +Cc: gitster, avarab, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The previous change moved the 'revs' variable into cmd_pack_objects()
and now we can remote the global filter_options in favor of revs.filter.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/pack-objects.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index f29bef9d0b6..d39f668ad56 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -237,8 +237,6 @@ static unsigned long cache_max_small_delta_size = 1000;
 
 static unsigned long window_memory_limit = 0;
 
-static struct list_objects_filter_options filter_options;
-
 static struct string_list uri_protocols = STRING_LIST_INIT_NODUP;
 
 enum missing_action {
@@ -3723,10 +3721,8 @@ static void get_object_list(struct rev_info *revs, int ac, const char **av)
 	int flags = 0;
 	int save_warning;
 
-	repo_init_revisions(the_repository, revs, NULL);
 	save_commit_buffer = 0;
 	setup_revisions(ac, av, revs, &s_r_opt);
-	list_objects_filter_copy(&revs->filter, &filter_options);
 
 	/* make sure shallows are read */
 	is_repository_shallow(the_repository);
@@ -3958,7 +3954,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 			      &write_bitmap_index,
 			      N_("write a bitmap index if possible"),
 			      WRITE_BITMAP_QUIET, PARSE_OPT_HIDDEN),
-		OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options),
+		OPT_PARSE_LIST_OBJECTS_FILTER(&revs.filter),
 		OPT_CALLBACK_F(0, "missing", NULL, N_("action"),
 		  N_("handling for missing objects"), PARSE_OPT_NONEG,
 		  option_parse_missing_action),
@@ -4080,7 +4076,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	if (!rev_list_all || !rev_list_reflog || !rev_list_index)
 		unpack_unreachable_expiration = 0;
 
-	if (filter_options.choice) {
+	if (revs.filter.choice) {
 		if (!pack_to_stdout)
 			die(_("cannot use --filter without --stdout"));
 		if (stdin_packs)
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 4/5] bundle: move capabilities to end of 'verify'
  2022-03-22 17:28 [PATCH 0/5] Partial bundle follow ups Derrick Stolee via GitGitGadget
                   ` (2 preceding siblings ...)
  2022-03-22 17:28 ` [PATCH 3/5] pack-objects: parse --filter directly into revs.filter Derrick Stolee via GitGitGadget
@ 2022-03-22 17:28 ` Derrick Stolee via GitGitGadget
  2022-03-23  7:08   ` Bagas Sanjaya
  2022-03-22 17:28 ` [PATCH 5/5] bundle: output hash information in 'verify' Derrick Stolee via GitGitGadget
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-03-22 17:28 UTC (permalink / raw)
  To: git; +Cc: gitster, avarab, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The 'filter' capability was added in 105c6f14a (bundle: parse filter
capability, 2022-03-09), but was added in a strange place in the 'git
bundle verify' output.

The tests for this show output like the following:

	The bundle contains these 2 refs:
	<COMMIT1> <REF1>
	<COMMIT2> <REF2>
	The bundle uses this filter: blob:none
	The bundle records a complete history.

This looks very odd if we have a thin bundle that contains boundary
commits instead of a complete history:

	The bundle contains these 2 refs:
	<COMMIT1> <REF1>
	<COMMIT2> <REF2>
	The bundle uses this filter: blob:none
	The bundle requires these 2 refs:
	<COMMIT3>
	<COMMIT4>

This separation between tip refs and boundary refs is unfortunate. Move
the filter capability output to the end of the output. Update the
documentation to match.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/git-bundle.txt | 10 +++++-----
 bundle.c                     |  9 ++++-----
 t/t6020-bundle-misc.sh       |  2 +-
 3 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/Documentation/git-bundle.txt b/Documentation/git-bundle.txt
index ac4c4352aae..7685b570455 100644
--- a/Documentation/git-bundle.txt
+++ b/Documentation/git-bundle.txt
@@ -75,11 +75,11 @@ verify <file>::
 	cleanly to the current repository.  This includes checks on the
 	bundle format itself as well as checking that the prerequisite
 	commits exist and are fully linked in the current repository.
-	Information about additional capabilities, such as "object filter",
-	is printed. See "Capabilities" in link:technical/bundle-format.html
-	for more information. Finally, 'git bundle' prints a list of
-	missing commits, if any. The exit code is zero for success, but
-	will be nonzero if the bundle file is invalid.
+	Then, 'git bundle' prints a list of missing commits, if any.
+	Finally, information about additional capabilities, such as "object
+	filter", is printed. See "Capabilities" in link:technical/bundle-format.html
+	for more information. The exit code is zero for success, but will
+	be nonzero if the bundle file is invalid.
 
 list-heads <file>::
 	Lists the references defined in the bundle.  If followed by a
diff --git a/bundle.c b/bundle.c
index e359370cfcd..276b55f8ce2 100644
--- a/bundle.c
+++ b/bundle.c
@@ -267,11 +267,6 @@ int verify_bundle(struct repository *r,
 			  (uintmax_t)r->nr);
 		list_refs(r, 0, NULL);
 
-		if (header->filter.choice) {
-			printf_ln("The bundle uses this filter: %s",
-				  list_objects_filter_spec(&header->filter));
-		}
-
 		r = &header->prerequisites;
 		if (!r->nr) {
 			printf_ln(_("The bundle records a complete history."));
@@ -282,6 +277,10 @@ int verify_bundle(struct repository *r,
 				  (uintmax_t)r->nr);
 			list_refs(r, 0, NULL);
 		}
+
+		if (header->filter.choice)
+			printf_ln("The bundle uses this filter: %s",
+				  list_objects_filter_spec(&header->filter));
 	}
 	return ret;
 }
diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh
index ed95d195427..c4ab1367afc 100755
--- a/t/t6020-bundle-misc.sh
+++ b/t/t6020-bundle-misc.sh
@@ -510,8 +510,8 @@ do
 		<TAG-2> refs/tags/v2
 		<TAG-3> refs/tags/v3
 		<COMMIT-P> HEAD
-		The bundle uses this filter: $filter
 		The bundle records a complete history.
+		The bundle uses this filter: $filter
 		EOF
 		test_cmp expect actual &&
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 5/5] bundle: output hash information in 'verify'
  2022-03-22 17:28 [PATCH 0/5] Partial bundle follow ups Derrick Stolee via GitGitGadget
                   ` (3 preceding siblings ...)
  2022-03-22 17:28 ` [PATCH 4/5] bundle: move capabilities to end of 'verify' Derrick Stolee via GitGitGadget
@ 2022-03-22 17:28 ` Derrick Stolee via GitGitGadget
  2022-03-23 21:27 ` [PATCH 0/5] Partial bundle follow ups Junio C Hamano
  2022-03-25 14:25 ` [PATCH] pack-objects: lazily set up "struct rev_info", don't leak Ævar Arnfjörð Bjarmason
  6 siblings, 0 replies; 25+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-03-22 17:28 UTC (permalink / raw)
  To: git; +Cc: gitster, avarab, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The previous change moved the 'filter' capability to the end of the 'git
bundle verify' output. Now, add the 'object-format' capability to the
output, when it exists.

This change makes 'git bundle verify' output the hash used in all cases,
even if the capability is not in the bundle. This means that v2 bundles
will always output that they use "sha1". This might look noisy to some
users, but it does simplify the implementation and the test strategy for
this feature.

Since 'verify' ends early when a prerequisite commit is missing, we need
to insert this hash message carefully into our expected test output
throughout t6020.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle.c               |  2 ++
 t/t6020-bundle-misc.sh | 24 +++++++++++++++++-------
 2 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/bundle.c b/bundle.c
index 276b55f8ce2..d50cfb5aa7e 100644
--- a/bundle.c
+++ b/bundle.c
@@ -278,6 +278,8 @@ int verify_bundle(struct repository *r,
 			list_refs(r, 0, NULL);
 		}
 
+		printf_ln("The bundle uses this hash algorithm: %s",
+			  header->hash_algo->name);
 		if (header->filter.choice)
 			printf_ln("The bundle uses this filter: %s",
 				  list_objects_filter_spec(&header->filter));
diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh
index c4ab1367afc..833205125ab 100755
--- a/t/t6020-bundle-misc.sh
+++ b/t/t6020-bundle-misc.sh
@@ -122,6 +122,8 @@ format_and_save_expect () {
 	sed -e 's/Z$//' >expect
 }
 
+HASH_MESSAGE="The bundle uses this hash algorithm: $GIT_DEFAULT_HASH"
+
 #            (C)   (D, pull/1/head, topic/1)
 #             o --- o
 #            /       \                              (L)
@@ -194,11 +196,12 @@ test_expect_success 'create bundle from special rev: main^!' '
 
 	git bundle verify special-rev.bdl |
 		make_user_friendly_and_stable_output >actual &&
-	format_and_save_expect <<-\EOF &&
+	format_and_save_expect <<-EOF &&
 	The bundle contains this ref:
 	<COMMIT-P> refs/heads/main
 	The bundle requires this ref:
 	<COMMIT-O> Z
+	$HASH_MESSAGE
 	EOF
 	test_cmp expect actual &&
 
@@ -215,12 +218,13 @@ test_expect_success 'create bundle with --max-count option' '
 
 	git bundle verify max-count.bdl |
 		make_user_friendly_and_stable_output >actual &&
-	format_and_save_expect <<-\EOF &&
+	format_and_save_expect <<-EOF &&
 	The bundle contains these 2 refs:
 	<COMMIT-P> refs/heads/main
 	<TAG-1> refs/tags/v1
 	The bundle requires this ref:
 	<COMMIT-O> Z
+	$HASH_MESSAGE
 	EOF
 	test_cmp expect actual &&
 
@@ -240,7 +244,7 @@ test_expect_success 'create bundle with --since option' '
 
 	git bundle verify since.bdl |
 		make_user_friendly_and_stable_output >actual &&
-	format_and_save_expect <<-\EOF &&
+	format_and_save_expect <<-EOF &&
 	The bundle contains these 5 refs:
 	<COMMIT-P> refs/heads/main
 	<COMMIT-N> refs/heads/release
@@ -250,6 +254,7 @@ test_expect_success 'create bundle with --since option' '
 	The bundle requires these 2 refs:
 	<COMMIT-M> Z
 	<COMMIT-K> Z
+	$HASH_MESSAGE
 	EOF
 	test_cmp expect actual &&
 
@@ -267,11 +272,12 @@ test_expect_success 'create bundle 1 - no prerequisites' '
 	EOF
 	git bundle create stdin-1.bdl --stdin <input &&
 
-	cat >expect <<-\EOF &&
+	format_and_save_expect <<-EOF &&
 	The bundle contains these 2 refs:
 	<COMMIT-D> refs/heads/topic/1
 	<COMMIT-H> refs/heads/topic/2
 	The bundle records a complete history.
+	$HASH_MESSAGE
 	EOF
 
 	# verify bundle, which has no prerequisites
@@ -308,13 +314,14 @@ test_expect_success 'create bundle 2 - has prerequisites' '
 		--stdin \
 		release <input &&
 
-	format_and_save_expect <<-\EOF &&
+	format_and_save_expect <<-EOF &&
 	The bundle contains this ref:
 	<COMMIT-N> refs/heads/release
 	The bundle requires these 3 refs:
 	<COMMIT-D> Z
 	<COMMIT-E> Z
 	<COMMIT-G> Z
+	$HASH_MESSAGE
 	EOF
 
 	git bundle verify 2.bdl |
@@ -367,13 +374,14 @@ test_expect_success 'create bundle 3 - two refs, same object' '
 		--stdin \
 		main HEAD <input &&
 
-	format_and_save_expect <<-\EOF &&
+	format_and_save_expect <<-EOF &&
 	The bundle contains these 2 refs:
 	<COMMIT-P> refs/heads/main
 	<COMMIT-P> HEAD
 	The bundle requires these 2 refs:
 	<COMMIT-M> Z
 	<COMMIT-K> Z
+	$HASH_MESSAGE
 	EOF
 
 	git bundle verify 3.bdl |
@@ -409,12 +417,13 @@ test_expect_success 'create bundle 4 - with tags' '
 		--stdin \
 		--all <input &&
 
-	cat >expect <<-\EOF &&
+	cat >expect <<-EOF &&
 	The bundle contains these 3 refs:
 	<TAG-1> refs/tags/v1
 	<TAG-2> refs/tags/v2
 	<TAG-3> refs/tags/v3
 	The bundle records a complete history.
+	$HASH_MESSAGE
 	EOF
 
 	git bundle verify 4.bdl |
@@ -511,6 +520,7 @@ do
 		<TAG-3> refs/tags/v3
 		<COMMIT-P> HEAD
 		The bundle records a complete history.
+		$HASH_MESSAGE
 		The bundle uses this filter: $filter
 		EOF
 		test_cmp expect actual &&
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [-SPAM-] [PATCH 3/5] pack-objects: parse --filter directly into revs.filter
  2022-03-22 17:28 ` [PATCH 3/5] pack-objects: parse --filter directly into revs.filter Derrick Stolee via GitGitGadget
@ 2022-03-22 19:37   ` Ramsay Jones
  2022-03-23 13:48     ` Derrick Stolee
  2022-03-22 21:15   ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 25+ messages in thread
From: Ramsay Jones @ 2022-03-22 19:37 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget, git; +Cc: gitster, avarab, Derrick Stolee



On 22/03/2022 17:28, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>
> 
> The previous change moved the 'revs' variable into cmd_pack_objects()
> and now we can remote the global filter_options in favor of revs.filter.

s/remote/remove/

ATB,
Ramsay Jones


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/5] pack-objects: parse --filter directly into revs.filter
  2022-03-22 17:28 ` [PATCH 3/5] pack-objects: parse --filter directly into revs.filter Derrick Stolee via GitGitGadget
  2022-03-22 19:37   ` [-SPAM-] " Ramsay Jones
@ 2022-03-22 21:15   ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 25+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-03-22 21:15 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, gitster, Derrick Stolee


On Tue, Mar 22 2022, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <derrickstolee@github.com>
>
> The previous change moved the 'revs' variable into cmd_pack_objects()
> and now we can remote the global filter_options in favor of revs.filter.

grammar: s/remote/remove/

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/5] bundle: move capabilities to end of 'verify'
  2022-03-22 17:28 ` [PATCH 4/5] bundle: move capabilities to end of 'verify' Derrick Stolee via GitGitGadget
@ 2022-03-23  7:08   ` Bagas Sanjaya
  2022-03-23 13:39     ` Derrick Stolee
  0 siblings, 1 reply; 25+ messages in thread
From: Bagas Sanjaya @ 2022-03-23  7:08 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget, git; +Cc: gitster, avarab, Derrick Stolee

On 23/03/22 00.28, Derrick Stolee via GitGitGadget wrote:

> -	Information about additional capabilities, such as "object filter",
> -	is printed. See "Capabilities" in link:technical/bundle-format.html
> -	for more information. Finally, 'git bundle' prints a list of
> -	missing commits, if any. The exit code is zero for success, but
> -	will be nonzero if the bundle file is invalid.
> +	Then, 'git bundle' prints a list of missing commits, if any.
> +	Finally, information about additional capabilities, such as "object
> +	filter", is printed. See "Capabilities" in link:technical/bundle-format.html
> +	for more information. The exit code is zero for success, but will
> +	be nonzero if the bundle file is invalid.

That means, nonzero exit code for corrupted bundle files, right?

-- 
An old man doll... just what I always wanted! - Clara

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/5] bundle: move capabilities to end of 'verify'
  2022-03-23  7:08   ` Bagas Sanjaya
@ 2022-03-23 13:39     ` Derrick Stolee
  0 siblings, 0 replies; 25+ messages in thread
From: Derrick Stolee @ 2022-03-23 13:39 UTC (permalink / raw)
  To: Bagas Sanjaya, Derrick Stolee via GitGitGadget, git; +Cc: gitster, avarab

On 3/23/2022 3:08 AM, Bagas Sanjaya wrote:
> On 23/03/22 00.28, Derrick Stolee via GitGitGadget wrote:
> 
>> -    Information about additional capabilities, such as "object filter",
>> -    is printed. See "Capabilities" in link:technical/bundle-format.html
>> -    for more information. Finally, 'git bundle' prints a list of
>> -    missing commits, if any. The exit code is zero for success, but
>> -    will be nonzero if the bundle file is invalid.
>> +    Then, 'git bundle' prints a list of missing commits, if any.
>> +    Finally, information about additional capabilities, such as "object
>> +    filter", is printed. See "Capabilities" in link:technical/bundle-format.html
>> +    for more information. The exit code is zero for success, but will
>> +    be nonzero if the bundle file is invalid.
> 
> That means, nonzero exit code for corrupted bundle files, right?

Yes. That last sentence is unchanged from the earlier version
(modulo line wrapping).

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [-SPAM-] [PATCH 3/5] pack-objects: parse --filter directly into revs.filter
  2022-03-22 19:37   ` [-SPAM-] " Ramsay Jones
@ 2022-03-23 13:48     ` Derrick Stolee
  0 siblings, 0 replies; 25+ messages in thread
From: Derrick Stolee @ 2022-03-23 13:48 UTC (permalink / raw)
  To: Ramsay Jones, Derrick Stolee via GitGitGadget, git; +Cc: gitster, avarab

On 3/22/2022 3:37 PM, Ramsay Jones wrote:> On 22/03/2022 17:28, Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <derrickstolee@github.com>
>>
>> The previous change moved the 'revs' variable into cmd_pack_objects()
>> and now we can remote the global filter_options in favor of revs.filter.
> 
> s/remote/remove/

Thanks (you and Ævar both).
-Stolee

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/5] Partial bundle follow ups
  2022-03-22 17:28 [PATCH 0/5] Partial bundle follow ups Derrick Stolee via GitGitGadget
                   ` (4 preceding siblings ...)
  2022-03-22 17:28 ` [PATCH 5/5] bundle: output hash information in 'verify' Derrick Stolee via GitGitGadget
@ 2022-03-23 21:27 ` Junio C Hamano
  2022-03-25 14:25 ` [PATCH] pack-objects: lazily set up "struct rev_info", don't leak Ævar Arnfjörð Bjarmason
  6 siblings, 0 replies; 25+ messages in thread
From: Junio C Hamano @ 2022-03-23 21:27 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, avarab, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> This is based on master, which recently took ds/partial-bundles.
>
> There are a couple conflicts with 'seen':
>
>  * rc/fetch-refetch adds the "--refetch" option right next to where I remove
>    an instance of CL_ARG__FILTER.
>  * tb/cruft-packs updates the add_unseen_recent_objects_to_traversal()
>    method, but this series changes one call from using "&revs" to using
>    "revs".

Demonstrating that the submitter has already made an effort to make
sure the new topic plays well with other topics in flight this way
is very much appreciated.

> These are a few cleanups that were discussed as part of ds/partial-bundles.
>
>  * Patch 1 removes the CL_ARG__FILTER macro.
>  * Patches 2-3 help parse --filter directly into a revs.filter member
>    instead of copying it from another filter_options.
>  * Patches 4-5 add output of the hash function capability in 'git bundle
>    verify'. It also moves the capability output to the end, since it looks a
>    bit strange in the current location when there are boundary commits.

OK.

>  * create_bundle() in bundle.c does two commit walks: first to discover the
>    boundary commits to write into the bundle header, and then another that
>    happens when constructing the pack-file. It looks like this could be
>    avoided if there will not be any boundary, but there are additional
>    checks in write_bundle_refs() that look for the SHOWN bit on the commit
>    objects in order to determine if a requested ref would be excluded by the
>    rev-list options. This behavior seems important, so I did not remove it.

Good thinking.  Thanks.

>  * 'git clone --bare partial.bdl" should work when partial.bdl is a partial
>    bundle. However, this requires changing the way we configure partial
>    cloned repositories to not rely on a remote in order to understand an
>    object filter. I'm working on this as a parallel series that will likely
>    require more discussion than these small cleanups.

Leaving it outside the topic, saying that you cannot yet clone from
such a bundle, is perfectly fine, I would think.  Thanks.

>  * Ævar pointed out some newly-visible memory leaks due to moving the filter
>    out of a static global. After looking at the recommended change, it seems
>    that we actually need to be more careful about freeing the revs and not
>    just revs.filter.

OK.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] pack-objects: lazily set up "struct rev_info", don't leak
  2022-03-22 17:28 [PATCH 0/5] Partial bundle follow ups Derrick Stolee via GitGitGadget
                   ` (5 preceding siblings ...)
  2022-03-23 21:27 ` [PATCH 0/5] Partial bundle follow ups Junio C Hamano
@ 2022-03-25 14:25 ` Ævar Arnfjörð Bjarmason
  2022-03-25 14:57   ` Derrick Stolee
                     ` (2 more replies)
  6 siblings, 3 replies; 25+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-03-25 14:25 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Derrick Stolee, Bagas Sanjaya,
	Ævar Arnfjörð Bjarmason

In the preceding [1] (pack-objects: move revs out of
get_object_list(), 2022-03-22) the "repo_init_revisions()" was moved
to cmd_pack_objects() so that it unconditionally took place for all
invocations of "git pack-objects".

We'd thus start leaking memory, which is easily reproduced in
e.g. git.git by feeding e83c5163316 (Initial revision of "git", the
information manager from hell, 2005-04-07) to "git pack-objects";

    $ echo e83c5163316f89bfbde7d9ab23ca2e25604af290 | ./git pack-objects initial
    [...]
	==19130==ERROR: LeakSanitizer: detected memory leaks

	Direct leak of 7120 byte(s) in 1 object(s) allocated from:
	    #0 0x455308 in __interceptor_malloc (/home/avar/g/git/git+0x455308)
	    #1 0x75b399 in do_xmalloc /home/avar/g/git/wrapper.c:41:8
	    #2 0x75b356 in xmalloc /home/avar/g/git/wrapper.c:62:9
	    #3 0x5d7609 in prep_parse_options /home/avar/g/git/diff.c:5647:2
	    #4 0x5d415a in repo_diff_setup /home/avar/g/git/diff.c:4621:2
	    #5 0x6dffbb in repo_init_revisions /home/avar/g/git/revision.c:1853:2
	    #6 0x4f599d in cmd_pack_objects /home/avar/g/git/builtin/pack-objects.c:3980:2
	    #7 0x4592ca in run_builtin /home/avar/g/git/git.c:465:11
	    #8 0x457d81 in handle_builtin /home/avar/g/git/git.c:718:3
	    #9 0x458ca5 in run_argv /home/avar/g/git/git.c:785:4
	    #10 0x457b40 in cmd_main /home/avar/g/git/git.c:916:19
	    #11 0x562259 in main /home/avar/g/git/common-main.c:56:11
	    #12 0x7fce792ac7ec in __libc_start_main csu/../csu/libc-start.c:332:16
	    #13 0x4300f9 in _start (/home/avar/g/git/git+0x4300f9)

	SUMMARY: LeakSanitizer: 7120 byte(s) leaked in 1 allocation(s).
	Aborted

Narrowly fixing that commit would have been easy, just add call
repo_init_revisions() right before get_object_list(), which is
effectively what was done before that commit.

But an unstated constraint when setting it up early is that it was
needed for the subsequent [2] (pack-objects: parse --filter directly
into revs.filter, 2022-03-22), i.e. we might have a --filter
command-line option, and need to either have the "struct rev_info"
setup when we encounter that option, or later.

Let's just change the control flow so that we'll instead set up the
"struct rev_info" only when we need it. Doing so leads to a bit more
verbosity, but it's a lot clearer what we're doing and why.

We could furthermore combine the two get_object_list() invocations
here by having repo_init_revisions() invoked on &pfd.revs, but I think
clearly separating the two makes the flow clearer. Likewise
redundantly but explicitly (i.e. redundant v.s. a "{ 0 }") "0" to
"have_revs" early in cmd_pack_objects().

This does add the future constraint to opt_parse_list_objects_filter()
that we'll need to adjust this wrapper code if it looks at any other
value of the "struct option" than the "value" member.

But that regression should be relatively easy to spot. I'm
intentionally not initializing the "struct wrap" with e.g. "{ 0 }" so
that various memory sanity checkers would spot that, we just
initialize the "value" in po_filter_cb(). By doing this e.g. we'll die
on e.g. this test if we were to use another member of "opt" in
opt_parse_list_objects_filter()>

    ./t5317-pack-objects-filter-objects.sh -vixd --valgrind-only=3

While we're at it add parentheses around the arguments to the OPT_*
macros in in list-objects-filter-options.h, as we need to change those
lines anyway. It doesn't matter in this case, but is good general
practice.

1. https://lore.kernel.org/git/619b757d98465dbc4995bdc11a5282fbfcbd3daa.1647970119.git.gitgitgadget@gmail.com
2. https://lore.kernel.org/git/97de926904988b89b5663bd4c59c011a1723a8f5.1647970119.git.gitgitgadget@gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---

This is on top of ds/partial-bundle-more: I thought the fix for this
new leak was involved enough to propose it as a commit-on-top rather
than a fixup for a re-roll, especially since aside from the newly
leaked memory I don't think ds/partial-bundle-more is breaking
anything by doing that.

Except that is, interacting badly with my release_revisions() series
in "seen", which currently causes the "linux-leaks" job to fail there:
https://lore.kernel.org/git/cover-v2-00.27-00000000000-20220323T203149Z-avarab@gmail.com/

This is proper fix for the issue the interaction with my topic
revealed (not caused, we just started testing for this leak there),
i.e. it obsoletes the suggestion of adding an UNLEAK() there.

 builtin/pack-objects.c        | 30 +++++++++++++++++++++++++-----
 list-objects-filter-options.h |  8 +++++---
 2 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index d39f668ad56..735080a4a95 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3857,6 +3857,23 @@ static int option_parse_unpack_unreachable(const struct option *opt,
 	return 0;
 }
 
+struct po_filter_data {
+	unsigned have_revs:1;
+	struct rev_info revs;
+};
+
+static int po_filter_cb(const struct option *opt, const char *arg, int unset)
+{
+	struct po_filter_data *data = opt->value;
+	struct option wrap; /* don't initialize! */
+
+	repo_init_revisions(the_repository, &data->revs, NULL);
+	wrap.value = &data->revs.filter;
+	data->have_revs = 1;
+
+	return opt_parse_list_objects_filter(&wrap, arg, unset);
+}
+
 int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 {
 	int use_internal_rev_list = 0;
@@ -3867,7 +3884,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	int rev_list_index = 0;
 	int stdin_packs = 0;
 	struct string_list keep_pack_list = STRING_LIST_INIT_NODUP;
-	struct rev_info revs;
+	struct po_filter_data pfd = { .have_revs = 0 };
 
 	struct option pack_objects_options[] = {
 		OPT_SET_INT('q', "quiet", &progress,
@@ -3954,7 +3971,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 			      &write_bitmap_index,
 			      N_("write a bitmap index if possible"),
 			      WRITE_BITMAP_QUIET, PARSE_OPT_HIDDEN),
-		OPT_PARSE_LIST_OBJECTS_FILTER(&revs.filter),
+		OPT_PARSE_LIST_OBJECTS_FILTER_CB(&pfd, po_filter_cb),
 		OPT_CALLBACK_F(0, "missing", NULL, N_("action"),
 		  N_("handling for missing objects"), PARSE_OPT_NONEG,
 		  option_parse_missing_action),
@@ -3973,8 +3990,6 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 
 	read_replace_refs = 0;
 
-	repo_init_revisions(the_repository, &revs, NULL);
-
 	sparse = git_env_bool("GIT_TEST_PACK_SPARSE", -1);
 	if (the_repository->gitdir) {
 		prepare_repo_settings(the_repository);
@@ -4076,7 +4091,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	if (!rev_list_all || !rev_list_reflog || !rev_list_index)
 		unpack_unreachable_expiration = 0;
 
-	if (revs.filter.choice) {
+	if (pfd.have_revs && pfd.revs.filter.choice) {
 		if (!pack_to_stdout)
 			die(_("cannot use --filter without --stdout"));
 		if (stdin_packs)
@@ -4152,7 +4167,12 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 			add_unreachable_loose_objects();
 	} else if (!use_internal_rev_list) {
 		read_object_list_from_stdin();
+	} else if (pfd.have_revs) {
+		get_object_list(&pfd.revs, rp.nr, rp.v);
 	} else {
+		struct rev_info revs;
+
+		repo_init_revisions(the_repository, &revs, NULL);
 		get_object_list(&revs, rp.nr, rp.v);
 	}
 	cleanup_preferred_base();
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index 90e4bc96252..910f4a46e91 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -107,10 +107,12 @@ void parse_list_objects_filter(
 int opt_parse_list_objects_filter(const struct option *opt,
 				  const char *arg, int unset);
 
+#define OPT_PARSE_LIST_OBJECTS_FILTER_CB(fo, cb) \
+	OPT_CALLBACK(0, "filter", (fo), N_("args"), \
+		     N_("object filtering"), (cb))
+
 #define OPT_PARSE_LIST_OBJECTS_FILTER(fo) \
-	OPT_CALLBACK(0, "filter", fo, N_("args"), \
-	  N_("object filtering"), \
-	  opt_parse_list_objects_filter)
+	OPT_PARSE_LIST_OBJECTS_FILTER_CB((fo), opt_parse_list_objects_filter)
 
 /*
  * Translates abbreviated numbers in the filter's filter_spec into their
-- 
2.35.1.1501.g363d33e530b


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH] pack-objects: lazily set up "struct rev_info", don't leak
  2022-03-25 14:25 ` [PATCH] pack-objects: lazily set up "struct rev_info", don't leak Ævar Arnfjörð Bjarmason
@ 2022-03-25 14:57   ` Derrick Stolee
  2022-03-25 16:00     ` Ævar Arnfjörð Bjarmason
  2022-03-25 18:53   ` Junio C Hamano
  2022-03-28 15:43   ` [PATCH v2] " Ævar Arnfjörð Bjarmason
  2 siblings, 1 reply; 25+ messages in thread
From: Derrick Stolee @ 2022-03-25 14:57 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, git; +Cc: Junio C Hamano, Bagas Sanjaya

On 3/25/2022 10:25 AM, Ævar Arnfjörð Bjarmason wrote:
> In the preceding [1] (pack-objects: move revs out of
> get_object_list(), 2022-03-22) the "repo_init_revisions()" was moved
> to cmd_pack_objects() so that it unconditionally took place for all
> invocations of "git pack-objects".
> 
> We'd thus start leaking memory, which is easily reproduced in
> e.g. git.git by feeding e83c5163316 (Initial revision of "git", the
> information manager from hell, 2005-04-07) to "git pack-objects";
> 
>     $ echo e83c5163316f89bfbde7d9ab23ca2e25604af290 | ./git pack-objects initial
>     [...]
> 	==19130==ERROR: LeakSanitizer: detected memory leaks
> 
> 	Direct leak of 7120 byte(s) in 1 object(s) allocated from:
> 	    #0 0x455308 in __interceptor_malloc (/home/avar/g/git/git+0x455308)
> 	    #1 0x75b399 in do_xmalloc /home/avar/g/git/wrapper.c:41:8
> 	    #2 0x75b356 in xmalloc /home/avar/g/git/wrapper.c:62:9
> 	    #3 0x5d7609 in prep_parse_options /home/avar/g/git/diff.c:5647:2
> 	    #4 0x5d415a in repo_diff_setup /home/avar/g/git/diff.c:4621:2
> 	    #5 0x6dffbb in repo_init_revisions /home/avar/g/git/revision.c:1853:2
> 	    #6 0x4f599d in cmd_pack_objects /home/avar/g/git/builtin/pack-objects.c:3980:2
> 	    #7 0x4592ca in run_builtin /home/avar/g/git/git.c:465:11
> 	    #8 0x457d81 in handle_builtin /home/avar/g/git/git.c:718:3
> 	    #9 0x458ca5 in run_argv /home/avar/g/git/git.c:785:4
> 	    #10 0x457b40 in cmd_main /home/avar/g/git/git.c:916:19
> 	    #11 0x562259 in main /home/avar/g/git/common-main.c:56:11
> 	    #12 0x7fce792ac7ec in __libc_start_main csu/../csu/libc-start.c:332:16
> 	    #13 0x4300f9 in _start (/home/avar/g/git/git+0x4300f9)
> 
> 	SUMMARY: LeakSanitizer: 7120 byte(s) leaked in 1 allocation(s).
> 	Aborted
> 
> Narrowly fixing that commit would have been easy, just add call
> repo_init_revisions() right before get_object_list(), which is
> effectively what was done before that commit.
> 
> But an unstated constraint when setting it up early is that it was
> needed for the subsequent [2] (pack-objects: parse --filter directly
> into revs.filter, 2022-03-22), i.e. we might have a --filter
> command-line option, and need to either have the "struct rev_info"
> setup when we encounter that option, or later.
> 
> Let's just change the control flow so that we'll instead set up the
> "struct rev_info" only when we need it. Doing so leads to a bit more
> verbosity, but it's a lot clearer what we're doing and why.

This makes sense.

> We could furthermore combine the two get_object_list() invocations
> here by having repo_init_revisions() invoked on &pfd.revs, but I think
> clearly separating the two makes the flow clearer. Likewise
> redundantly but explicitly (i.e. redundant v.s. a "{ 0 }") "0" to
> "have_revs" early in cmd_pack_objects().

I disagree, especially when you later want to make sure we free
the data from revs using your release_revisions().

> This does add the future constraint to opt_parse_list_objects_filter()
> that we'll need to adjust this wrapper code if it looks at any other
> value of the "struct option" than the "value" member.

So we are coupling ourselves to the implementation of this method.

> But that regression should be relatively easy to spot. I'm
> intentionally not initializing the "struct wrap" with e.g. "{ 0 }" so
> that various memory sanity checkers would spot that, we just
> initialize the "value" in po_filter_cb(). By doing this e.g. we'll die
> on e.g. this test if we were to use another member of "opt" in
> opt_parse_list_objects_filter()>

So you are using uninitialized memory as a way to discover any
necessary changes to that coupling. I'm not sure this is super-safe
because we don't necessarily run memory checkers during CI builds.

I'd rather have a consistently initialized chunk of data that would
behave predictably (and hopefully we'd discover it is behaving
incorrectly with that predictable behavior).


> While we're at it add parentheses around the arguments to the OPT_*
> macros in in list-objects-filter-options.h, as we need to change those
> lines anyway. It doesn't matter in this case, but is good general
> practice.
> 
> 1. https://lore.kernel.org/git/619b757d98465dbc4995bdc11a5282fbfcbd3daa.1647970119.git.gitgitgadget@gmail.com
> 2. https://lore.kernel.org/git/97de926904988b89b5663bd4c59c011a1723a8f5.1647970119.git.gitgitgadget@gmail.com/
> 
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
> 
> This is on top of ds/partial-bundle-more: I thought the fix for this
> new leak was involved enough to propose it as a commit-on-top rather
> than a fixup for a re-roll, especially since aside from the newly
> leaked memory I don't think ds/partial-bundle-more is breaking
> anything by doing that.
> 
> Except that is, interacting badly with my release_revisions() series
> in "seen", which currently causes the "linux-leaks" job to fail there:
> https://lore.kernel.org/git/cover-v2-00.27-00000000000-20220323T203149Z-avarab@gmail.com/
> 
> This is proper fix for the issue the interaction with my topic
> revealed (not caused, we just started testing for this leak there),
> i.e. it obsoletes the suggestion of adding an UNLEAK() there.
> 
>  builtin/pack-objects.c        | 30 +++++++++++++++++++++++++-----
>  list-objects-filter-options.h |  8 +++++---
>  2 files changed, 30 insertions(+), 8 deletions(-)
> 
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index d39f668ad56..735080a4a95 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -3857,6 +3857,23 @@ static int option_parse_unpack_unreachable(const struct option *opt,
>  	return 0;
>  }
>  
> +struct po_filter_data {
> +	unsigned have_revs:1;
> +	struct rev_info revs;
> +};
> +
> +static int po_filter_cb(const struct option *opt, const char *arg, int unset)
> +{
> +	struct po_filter_data *data = opt->value;
> +	struct option wrap; /* don't initialize! */
> +
> +	repo_init_revisions(the_repository, &data->revs, NULL);
> +	wrap.value = &data->revs.filter;
> +	data->have_revs = 1;
> +
> +	return opt_parse_list_objects_filter(&wrap, arg, unset);
> +}

The coupling here is unfortunate, but unavoidable. The future-proof
way to do it would be to modify opt->value and pass the rest of its
members as-is, but it's marked const so that's not an option.

One way to help make this coupling more obvious would be to move
this method into list-filter-options.c so we can have their
implementations adjacent and even refer to them.

Here is a potential version that looks like that:

--- >8 ---

diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index f02d8df142..55c7131814 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -281,6 +281,10 @@ void parse_list_objects_filter(
 		die("%s", errbuf.buf);
 }
 
+/*
+ * If you change this to use any member in 'opt' other than 'value',
+ * then be sure to update opt_parse_list_objects_filter_in_revs().
+ */
 int opt_parse_list_objects_filter(const struct option *opt,
 				  const char *arg, int unset)
 {
@@ -293,6 +297,18 @@ int opt_parse_list_objects_filter(const struct option *opt,
 	return 0;
 }
 
+int opt_parse_list_objects_filter_in_revs(const struct option *opt,
+					  const char *arg, int unset)
+{
+	struct rev_info_maybe_empty *ri = opt->value;
+	struct option wrap = { .value = &ri->revs.filter };
+
+	repo_init_revisions(the_repository, &ri->revs, NULL);
+	ri->has_revs = 1;
+
+	return opt_parse_list_objects_filter(&wrap, arg, unset);
+}
+
 const char *list_objects_filter_spec(struct list_objects_filter_options *filter)
 {
 	if (!filter->filter_spec.nr)
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index 90e4bc9625..cf8b710a76 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -106,6 +106,8 @@ void parse_list_objects_filter(
 
 int opt_parse_list_objects_filter(const struct option *opt,
 				  const char *arg, int unset);
+int opt_parse_list_objects_filter_in_revs(const struct option *opt,
+					  const char *arg, int unset);
 
 #define OPT_PARSE_LIST_OBJECTS_FILTER(fo) \
 	OPT_CALLBACK(0, "filter", fo, N_("args"), \
diff --git a/revision.h b/revision.h
index 5bc59c7bfe..95aa3ee891 100644
--- a/revision.h
+++ b/revision.h
@@ -329,6 +329,11 @@ struct rev_info {
 	struct tmp_objdir *remerge_objdir;
 };
 
+struct rev_info_maybe_empty {
+	int has_revs;
+	struct rev_info revs;
+};
+
 int ref_excluded(struct string_list *, const char *path);
 void clear_ref_exclusion(struct string_list **);
 void add_ref_exclusion(struct string_list **, const char *exclude);

--- >8 ---


> +	} else if (pfd.have_revs) {
> +		get_object_list(&pfd.revs, rp.nr, rp.v);
>  	} else {
> +		struct rev_info revs;
> +
> +		repo_init_revisions(the_repository, &revs, NULL);
>  		get_object_list(&revs, rp.nr, rp.v);
>  	}

Here, I think it would be better to have

	else {
		if (!pfd.have_revs) {
			repo_init_revisions(the_repository, &pfd.revs, NULL);
			pfd.have_revs = 1;
		}
		get_object_list(&pfd.revs, rp.nr, rp.v);
	}

and then later you can add

	if (pfd.have_revs)
		release_revisions(&pfd.revs);

to clear the memory in exactly one place.

Thanks,
-Stolee

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH] pack-objects: lazily set up "struct rev_info", don't leak
  2022-03-25 14:57   ` Derrick Stolee
@ 2022-03-25 16:00     ` Ævar Arnfjörð Bjarmason
  2022-03-25 16:41       ` Derrick Stolee
  0 siblings, 1 reply; 25+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-03-25 16:00 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, Junio C Hamano, Bagas Sanjaya


On Fri, Mar 25 2022, Derrick Stolee wrote:

> On 3/25/2022 10:25 AM, Ævar Arnfjörð Bjarmason wrote:
>> In the preceding [1] (pack-objects: move revs out of
>> get_object_list(), 2022-03-22) the "repo_init_revisions()" was moved
>> to cmd_pack_objects() so that it unconditionally took place for all
>> invocations of "git pack-objects".
>> 
>> We'd thus start leaking memory, which is easily reproduced in
>> e.g. git.git by feeding e83c5163316 (Initial revision of "git", the
>> information manager from hell, 2005-04-07) to "git pack-objects";
>> 
>>     $ echo e83c5163316f89bfbde7d9ab23ca2e25604af290 | ./git pack-objects initial
>>     [...]
>> 	==19130==ERROR: LeakSanitizer: detected memory leaks
>> 
>> 	Direct leak of 7120 byte(s) in 1 object(s) allocated from:
>> 	    #0 0x455308 in __interceptor_malloc (/home/avar/g/git/git+0x455308)
>> 	    #1 0x75b399 in do_xmalloc /home/avar/g/git/wrapper.c:41:8
>> 	    #2 0x75b356 in xmalloc /home/avar/g/git/wrapper.c:62:9
>> 	    #3 0x5d7609 in prep_parse_options /home/avar/g/git/diff.c:5647:2
>> 	    #4 0x5d415a in repo_diff_setup /home/avar/g/git/diff.c:4621:2
>> 	    #5 0x6dffbb in repo_init_revisions /home/avar/g/git/revision.c:1853:2
>> 	    #6 0x4f599d in cmd_pack_objects /home/avar/g/git/builtin/pack-objects.c:3980:2
>> 	    #7 0x4592ca in run_builtin /home/avar/g/git/git.c:465:11
>> 	    #8 0x457d81 in handle_builtin /home/avar/g/git/git.c:718:3
>> 	    #9 0x458ca5 in run_argv /home/avar/g/git/git.c:785:4
>> 	    #10 0x457b40 in cmd_main /home/avar/g/git/git.c:916:19
>> 	    #11 0x562259 in main /home/avar/g/git/common-main.c:56:11
>> 	    #12 0x7fce792ac7ec in __libc_start_main csu/../csu/libc-start.c:332:16
>> 	    #13 0x4300f9 in _start (/home/avar/g/git/git+0x4300f9)
>> 
>> 	SUMMARY: LeakSanitizer: 7120 byte(s) leaked in 1 allocation(s).
>> 	Aborted
>> 
>> Narrowly fixing that commit would have been easy, just add call
>> repo_init_revisions() right before get_object_list(), which is
>> effectively what was done before that commit.
>> 
>> But an unstated constraint when setting it up early is that it was
>> needed for the subsequent [2] (pack-objects: parse --filter directly
>> into revs.filter, 2022-03-22), i.e. we might have a --filter
>> command-line option, and need to either have the "struct rev_info"
>> setup when we encounter that option, or later.
>> 
>> Let's just change the control flow so that we'll instead set up the
>> "struct rev_info" only when we need it. Doing so leads to a bit more
>> verbosity, but it's a lot clearer what we're doing and why.
>
> This makes sense.
>
>> We could furthermore combine the two get_object_list() invocations
>> here by having repo_init_revisions() invoked on &pfd.revs, but I think
>> clearly separating the two makes the flow clearer. Likewise
>> redundantly but explicitly (i.e. redundant v.s. a "{ 0 }") "0" to
>> "have_revs" early in cmd_pack_objects().
>
> I disagree, especially when you later want to make sure we free
> the data from revs using your release_revisions().

Because we'll need two release_revisions() instead of one?

>> This does add the future constraint to opt_parse_list_objects_filter()
>> that we'll need to adjust this wrapper code if it looks at any other
>> value of the "struct option" than the "value" member.
>
> So we are coupling ourselves to the implementation of this method.

Sure, but aren't all OPT_* where the macro is providing some custom
callback coupling themselves to how it's implemented?

>> But that regression should be relatively easy to spot. I'm
>> intentionally not initializing the "struct wrap" with e.g. "{ 0 }" so
>> that various memory sanity checkers would spot that, we just
>> initialize the "value" in po_filter_cb(). By doing this e.g. we'll die
>> on e.g. this test if we were to use another member of "opt" in
>> opt_parse_list_objects_filter()>
>
> So you are using uninitialized memory as a way to discover any
> necessary changes to that coupling. I'm not sure this is super-safe
> because we don't necessarily run memory checkers during CI builds.
>
> I'd rather have a consistently initialized chunk of data that would
> behave predictably (and hopefully we'd discover it is behaving
> incorrectly with that predictable behavior).

I'd like to keep this as-is, i.e. if you zero it out it might also
behave unpredictably or even in undefined ways (e.g. a NULL ptr
dereference). Likewise in my version, but it has the benefit of being
caught by some tooling we have.

>> +struct po_filter_data {
>> +	unsigned have_revs:1;
>> +	struct rev_info revs;
>> +};
>> +
>> +static int po_filter_cb(const struct option *opt, const char *arg, int unset)
>> +{
>> +	struct po_filter_data *data = opt->value;
>> +	struct option wrap; /* don't initialize! */
>> +
>> +	repo_init_revisions(the_repository, &data->revs, NULL);
>> +	wrap.value = &data->revs.filter;
>> +	data->have_revs = 1;
>> +
>> +	return opt_parse_list_objects_filter(&wrap, arg, unset);
>> +}
>
> The coupling here is unfortunate, but unavoidable. The future-proof
> way to do it would be to modify opt->value and pass the rest of its
> members as-is, but it's marked const so that's not an option.

We could always cast it..., but that would probably be more nasty.

> One way to help make this coupling more obvious would be to move
> this method into list-filter-options.c so we can have their
> implementations adjacent and even refer to them.
>
> Here is a potential version that looks like that:
>
> --- >8 ---
>
> diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
> index f02d8df142..55c7131814 100644
> --- a/list-objects-filter-options.c
> +++ b/list-objects-filter-options.c
> @@ -281,6 +281,10 @@ void parse_list_objects_filter(
>  		die("%s", errbuf.buf);
>  }
>  
> +/*
> + * If you change this to use any member in 'opt' other than 'value',
> + * then be sure to update opt_parse_list_objects_filter_in_revs().
> + */
>  int opt_parse_list_objects_filter(const struct option *opt,
>  				  const char *arg, int unset)
>  {
> @@ -293,6 +297,18 @@ int opt_parse_list_objects_filter(const struct option *opt,
>  	return 0;
>  }
>  
> +int opt_parse_list_objects_filter_in_revs(const struct option *opt,
> +					  const char *arg, int unset)
> +{
> +	struct rev_info_maybe_empty *ri = opt->value;
> +	struct option wrap = { .value = &ri->revs.filter };
> +
> +	repo_init_revisions(the_repository, &ri->revs, NULL);
> +	ri->has_revs = 1;
> +
> +	return opt_parse_list_objects_filter(&wrap, arg, unset);
> +}
> +
>  const char *list_objects_filter_spec(struct list_objects_filter_options *filter)
>  {
>  	if (!filter->filter_spec.nr)
> diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
> index 90e4bc9625..cf8b710a76 100644
> --- a/list-objects-filter-options.h
> +++ b/list-objects-filter-options.h
> @@ -106,6 +106,8 @@ void parse_list_objects_filter(
>  
>  int opt_parse_list_objects_filter(const struct option *opt,
>  				  const char *arg, int unset);
> +int opt_parse_list_objects_filter_in_revs(const struct option *opt,
> +					  const char *arg, int unset);
>  
>  #define OPT_PARSE_LIST_OBJECTS_FILTER(fo) \
>  	OPT_CALLBACK(0, "filter", fo, N_("args"), \
> diff --git a/revision.h b/revision.h
> index 5bc59c7bfe..95aa3ee891 100644
> --- a/revision.h
> +++ b/revision.h
> @@ -329,6 +329,11 @@ struct rev_info {
>  	struct tmp_objdir *remerge_objdir;
>  };
>  
> +struct rev_info_maybe_empty {
> +	int has_revs;
> +	struct rev_info revs;
> +};
> +
>  int ref_excluded(struct string_list *, const char *path);
>  void clear_ref_exclusion(struct string_list **);
>  void add_ref_exclusion(struct string_list **, const char *exclude);
>
> --- >8 ---
>
>
>> +	} else if (pfd.have_revs) {
>> +		get_object_list(&pfd.revs, rp.nr, rp.v);
>>  	} else {
>> +		struct rev_info revs;
>> +
>> +		repo_init_revisions(the_repository, &revs, NULL);
>>  		get_object_list(&revs, rp.nr, rp.v);
>>  	}
>
> Here, I think it would be better to have
>
> 	else {
> 		if (!pfd.have_revs) {
> 			repo_init_revisions(the_repository, &pfd.revs, NULL);
> 			pfd.have_revs = 1;
> 		}
> 		get_object_list(&pfd.revs, rp.nr, rp.v);
> 	}

Conceptually I think the saving of that one line isn't worth it to the
reader.

Then you'd need to read up and see exactly how pfd.revs gets mutated,
and its callback etc., only to see we're just doing this to save
ourselves a variable deceleration and a call to release_revisions().

> and then later you can add
>
> 	if (pfd.have_revs)
> 		release_revisions(&pfd.revs);
>
> to clear the memory in exactly one place.

Yeah, it would work. I'd just prefer control flow that's trivial to
reason about over saving a couple of lines here.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] pack-objects: lazily set up "struct rev_info", don't leak
  2022-03-25 16:00     ` Ævar Arnfjörð Bjarmason
@ 2022-03-25 16:41       ` Derrick Stolee
  2022-03-25 17:34         ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 25+ messages in thread
From: Derrick Stolee @ 2022-03-25 16:41 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano, Bagas Sanjaya

On 3/25/2022 12:00 PM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Fri, Mar 25 2022, Derrick Stolee wrote:
> 
>> On 3/25/2022 10:25 AM, Ævar Arnfjörð Bjarmason wrote:
>>> In the preceding [1] (pack-objects: move revs out of
>>> get_object_list(), 2022-03-22) the "repo_init_revisions()" was moved
>>> to cmd_pack_objects() so that it unconditionally took place for all
>>> invocations of "git pack-objects".
>>>
>>> We'd thus start leaking memory, which is easily reproduced in
>>> e.g. git.git by feeding e83c5163316 (Initial revision of "git", the
>>> information manager from hell, 2005-04-07) to "git pack-objects";
>>>
>>>     $ echo e83c5163316f89bfbde7d9ab23ca2e25604af290 | ./git pack-objects initial
>>>     [...]
>>> 	==19130==ERROR: LeakSanitizer: detected memory leaks
>>>
>>> 	Direct leak of 7120 byte(s) in 1 object(s) allocated from:
>>> 	    #0 0x455308 in __interceptor_malloc (/home/avar/g/git/git+0x455308)
>>> 	    #1 0x75b399 in do_xmalloc /home/avar/g/git/wrapper.c:41:8
>>> 	    #2 0x75b356 in xmalloc /home/avar/g/git/wrapper.c:62:9
>>> 	    #3 0x5d7609 in prep_parse_options /home/avar/g/git/diff.c:5647:2
>>> 	    #4 0x5d415a in repo_diff_setup /home/avar/g/git/diff.c:4621:2
>>> 	    #5 0x6dffbb in repo_init_revisions /home/avar/g/git/revision.c:1853:2
>>> 	    #6 0x4f599d in cmd_pack_objects /home/avar/g/git/builtin/pack-objects.c:3980:2
>>> 	    #7 0x4592ca in run_builtin /home/avar/g/git/git.c:465:11
>>> 	    #8 0x457d81 in handle_builtin /home/avar/g/git/git.c:718:3
>>> 	    #9 0x458ca5 in run_argv /home/avar/g/git/git.c:785:4
>>> 	    #10 0x457b40 in cmd_main /home/avar/g/git/git.c:916:19
>>> 	    #11 0x562259 in main /home/avar/g/git/common-main.c:56:11
>>> 	    #12 0x7fce792ac7ec in __libc_start_main csu/../csu/libc-start.c:332:16
>>> 	    #13 0x4300f9 in _start (/home/avar/g/git/git+0x4300f9)
>>>
>>> 	SUMMARY: LeakSanitizer: 7120 byte(s) leaked in 1 allocation(s).
>>> 	Aborted
>>>
>>> Narrowly fixing that commit would have been easy, just add call
>>> repo_init_revisions() right before get_object_list(), which is
>>> effectively what was done before that commit.
>>>
>>> But an unstated constraint when setting it up early is that it was
>>> needed for the subsequent [2] (pack-objects: parse --filter directly
>>> into revs.filter, 2022-03-22), i.e. we might have a --filter
>>> command-line option, and need to either have the "struct rev_info"
>>> setup when we encounter that option, or later.
>>>
>>> Let's just change the control flow so that we'll instead set up the
>>> "struct rev_info" only when we need it. Doing so leads to a bit more
>>> verbosity, but it's a lot clearer what we're doing and why.
>>
>> This makes sense.
>>
>>> We could furthermore combine the two get_object_list() invocations
>>> here by having repo_init_revisions() invoked on &pfd.revs, but I think
>>> clearly separating the two makes the flow clearer. Likewise
>>> redundantly but explicitly (i.e. redundant v.s. a "{ 0 }") "0" to
>>> "have_revs" early in cmd_pack_objects().
>>
>> I disagree, especially when you later want to make sure we free
>> the data from revs using your release_revisions().
> 
> Because we'll need two release_revisions() instead of one?

And it would be easy to miss one of the two if you are only testing
certain paths with leak-check on.

>>> This does add the future constraint to opt_parse_list_objects_filter()
>>> that we'll need to adjust this wrapper code if it looks at any other
>>> value of the "struct option" than the "value" member.
>>
>> So we are coupling ourselves to the implementation of this method.
> 
> Sure, but aren't all OPT_* where the macro is providing some custom
> callback coupling themselves to how it's implemented?

No, I mean that the callback function you are making here is coupling
itself to the existing callback function in a novel way.

I tried to find another example of a nested callback, but I didn't
even find another instance of creating a new 'struct option'.

>>> But that regression should be relatively easy to spot. I'm
>>> intentionally not initializing the "struct wrap" with e.g. "{ 0 }" so
>>> that various memory sanity checkers would spot that, we just
>>> initialize the "value" in po_filter_cb(). By doing this e.g. we'll die
>>> on e.g. this test if we were to use another member of "opt" in
>>> opt_parse_list_objects_filter()>
>>
>> So you are using uninitialized memory as a way to discover any
>> necessary changes to that coupling. I'm not sure this is super-safe
>> because we don't necessarily run memory checkers during CI builds.
>>
>> I'd rather have a consistently initialized chunk of data that would
>> behave predictably (and hopefully we'd discover it is behaving
>> incorrectly with that predictable behavior).
> 
> I'd like to keep this as-is, i.e. if you zero it out it might also
> behave unpredictably or even in undefined ways (e.g. a NULL ptr
> dereference). Likewise in my version, but it has the benefit of being
> caught by some tooling we have.

I guess by "predictable" I mean "well-defined" or "deterministic".

I don't want the build options to change how this works (for instance,
debug builds sometimes initialize data to zero).

>> +struct rev_info_maybe_empty {
>> +	int has_revs;
>> +	struct rev_info revs;
>> +};

Thinking about this a second time, perhaps it would be best to add
an "unsigned initialized:1;" to struct rev_info so we can look at
such a struct and know whether or not repo_init_revisions() has
been run or not. Avoids the custom struct and unifies a few things.

In particular, release_revisions() could choose to do nothing if
revs->initialized is false.

Further, a second repo_init_revisions() could do nothing if
revs->initialized is true. This allows us to safely "re-init"
without our own "if (has_revs)" checks...
>> 	else {
>> 		if (!pfd.have_revs) {
>> 			repo_init_revisions(the_repository, &pfd.revs, NULL);
>> 			pfd.have_revs = 1;
>> 		}
>> 		get_object_list(&pfd.revs, rp.nr, rp.v);
>> 	}

...making this just

	else {
		repo_init_revisions(the_repository, &revs, NULL);
		get_object_list(&revs. rp.nr, rp.v);
	}

> Conceptually I think the saving of that one line isn't worth it to the
> reader.
> 
> Then you'd need to read up and see exactly how pfd.revs gets mutated,
> and its callback etc., only to see we're just doing this to save
> ourselves a variable deceleration and a call to release_revisions().
> 
>> and then later you can add
>>
>> 	if (pfd.have_revs)
>> 		release_revisions(&pfd.revs);

And this would just be

	release_revisions(&revs);

>> to clear the memory in exactly one place.
> 
> Yeah, it would work. I'd just prefer control flow that's trivial to
> reason about over saving a couple of lines here.

I think having multiple revision things that can live at different
levels (one embedded in a custom struct, one not) is not trivial to
reason about. If we change the "is this initialized" indicator to be
within 'struct rev_info', then this gets simpler.

It seems to me that it is easier to track a single struct and release
the memory in one place based on its lifespan.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] pack-objects: lazily set up "struct rev_info", don't leak
  2022-03-25 16:41       ` Derrick Stolee
@ 2022-03-25 17:34         ` Ævar Arnfjörð Bjarmason
  2022-03-25 19:08           ` Derrick Stolee
  0 siblings, 1 reply; 25+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-03-25 17:34 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, Junio C Hamano, Bagas Sanjaya


On Fri, Mar 25 2022, Derrick Stolee wrote:

> On 3/25/2022 12:00 PM, Ævar Arnfjörð Bjarmason wrote:
>> 
>> On Fri, Mar 25 2022, Derrick Stolee wrote:
>> [...]
>>>> We could furthermore combine the two get_object_list() invocations
>>>> here by having repo_init_revisions() invoked on &pfd.revs, but I think
>>>> clearly separating the two makes the flow clearer. Likewise
>>>> redundantly but explicitly (i.e. redundant v.s. a "{ 0 }") "0" to
>>>> "have_revs" early in cmd_pack_objects().
>>>
>>> I disagree, especially when you later want to make sure we free
>>> the data from revs using your release_revisions().
>> 
>> Because we'll need two release_revisions() instead of one?
>
> And it would be easy to miss one of the two if you are only testing
> certain paths with leak-check on.
>
>>>> This does add the future constraint to opt_parse_list_objects_filter()
>>>> that we'll need to adjust this wrapper code if it looks at any other
>>>> value of the "struct option" than the "value" member.
>>>
>>> So we are coupling ourselves to the implementation of this method.
>> 
>> Sure, but aren't all OPT_* where the macro is providing some custom
>> callback coupling themselves to how it's implemented?
>
> No, I mean that the callback function you are making here is coupling
> itself to the existing callback function in a novel way.

Yeah, I did start by just copy/pasting its contents, or I could have
both of them call a semi-"private" wrapper, would you prefer one of
those instead?

> I tried to find another example of a nested callback, but I didn't
> even find another instance of creating a new 'struct option'.

I haven't seen a nested one, FWIW OPT_ALIAS() creates a new "struct
option", but in a completely unrelated way, what I'm doing here is
probably novel within the codebase...

>>>> But that regression should be relatively easy to spot. I'm
>>>> intentionally not initializing the "struct wrap" with e.g. "{ 0 }" so
>>>> that various memory sanity checkers would spot that, we just
>>>> initialize the "value" in po_filter_cb(). By doing this e.g. we'll die
>>>> on e.g. this test if we were to use another member of "opt" in
>>>> opt_parse_list_objects_filter()>
>>>
>>> So you are using uninitialized memory as a way to discover any
>>> necessary changes to that coupling. I'm not sure this is super-safe
>>> because we don't necessarily run memory checkers during CI builds.
>>>
>>> I'd rather have a consistently initialized chunk of data that would
>>> behave predictably (and hopefully we'd discover it is behaving
>>> incorrectly with that predictable behavior).
>> 
>> I'd like to keep this as-is, i.e. if you zero it out it might also
>> behave unpredictably or even in undefined ways (e.g. a NULL ptr
>> dereference). Likewise in my version, but it has the benefit of being
>> caught by some tooling we have.
>
> I guess by "predictable" I mean "well-defined" or "deterministic".
>
> I don't want the build options to change how this works (for instance,
> debug builds sometimes initialize data to zero).

Yeah I agree that does suck.

Honestly I don't care that much in this case, I just wanted to resolve
the topic conflict.

In the general case though I think it's much better to have a flaky
failure caught on some platforms, via valgrind or whatever because we
used uninitialized memory than to have a silent failure (e.g. because
it's always an int=0, but we expected something else...)

>>> +struct rev_info_maybe_empty {
>>> +	int has_revs;
>>> +	struct rev_info revs;
>>> +};
>
> Thinking about this a second time, perhaps it would be best to add
> an "unsigned initialized:1;" to struct rev_info so we can look at
> such a struct and know whether or not repo_init_revisions() has
> been run or not. Avoids the custom struct and unifies a few things.
>
> In particular, release_revisions() could choose to do nothing if
> revs->initialized is false.

This plan won't work because that behavior is both undefined per the
standard, and something that's wildly undefined in practice.

I.e. we initialize it on the stack, so it'll point to uninitialized
memory, sometimes that bit will be 0, sometimes 1...

If you mean just initialize it to { 0 } or whatever that would work,
yes, but if we're going to refactor all the callers to do that we might
as well refactor the few missing bits that would be needed to initialize
it statically, and drop the dynamic by default initialization...

> Further, a second repo_init_revisions() could do nothing if
> revs->initialized is true. This allows us to safely "re-init"
> without our own "if (has_revs)" checks...

Yeah if you had a previous repo_init_revisions() you could rely on that.

>>> 	else {
>>> 		if (!pfd.have_revs) {
>>> 			repo_init_revisions(the_repository, &pfd.revs, NULL);
>>> 			pfd.have_revs = 1;
>>> 		}
>>> 		get_object_list(&pfd.revs, rp.nr, rp.v);
>>> 	}
>
> ...making this just
>
> 	else {
> 		repo_init_revisions(the_repository, &revs, NULL);
> 		get_object_list(&revs. rp.nr, rp.v);
> 	}
>
>> Conceptually I think the saving of that one line isn't worth it to the
>> reader.
>> 
>> Then you'd need to read up and see exactly how pfd.revs gets mutated,
>> and its callback etc., only to see we're just doing this to save
>> ourselves a variable deceleration and a call to release_revisions().
>> 
>>> and then later you can add
>>>
>>> 	if (pfd.have_revs)
>>> 		release_revisions(&pfd.revs);
>
> And this would just be
>
> 	release_revisions(&revs);
>
>>> to clear the memory in exactly one place.
>> 
>> Yeah, it would work. I'd just prefer control flow that's trivial to
>> reason about over saving a couple of lines here.
>
> I think having multiple revision things that can live at different
> levels (one embedded in a custom struct, one not) is not trivial to
> reason about. If we change the "is this initialized" indicator to be
> within 'struct rev_info', then this gets simpler.

I meant: if you want to read that code through without considering the
"filter" case it's obvious that you can skip the whole "pfd.revs" part
in that case, and that it's only something to do with "filter".

> It seems to me that it is easier to track a single struct and release
> the memory in one place based on its lifespan.

I submitted a patch on top because it looked like your topic was going
to be merged to "next" soon, and I really didn't care much in this case.

But FWIW I think a much more obvious thing to do overall would be to
skip the whole "filter bust me in rev_info" refactoring part of your
series and just add a trivial list_objects_filter_copy_attach() method,
or do it inline with memcpy/memset.

I.e. to not touch the "filter" etc. callback stuff at all, still pass it
to get_object_list(). Can't 2/5 and 3/5 in your series be replaced by
this simpler and smaller change?:

	diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
	index 829ca359cf9..4ce76a378c8 100644
	--- a/builtin/pack-objects.c
	+++ b/builtin/pack-objects.c
	@@ -237,8 +237,6 @@ static unsigned long cache_max_small_delta_size = 1000;
	 
	 static unsigned long window_memory_limit = 0;
	 
	-static struct list_objects_filter_options filter_options;
	-
	 static struct string_list uri_protocols = STRING_LIST_INIT_NODUP;
	 
	 enum missing_action {
	@@ -3714,7 +3712,8 @@ static void mark_bitmap_preferred_tips(void)
	 	}
	 }
	 
	-static void get_object_list(int ac, const char **av)
	+static void get_object_list(int ac, const char **av,
	+			    struct list_objects_filter_options *filter)
	 {
	 	struct rev_info revs;
	 	struct setup_revision_opt s_r_opt = {
	@@ -3727,7 +3726,9 @@ static void get_object_list(int ac, const char **av)
	 	repo_init_revisions(the_repository, &revs, NULL);
	 	save_commit_buffer = 0;
	 	setup_revisions(ac, av, &revs, &s_r_opt);
	-	list_objects_filter_copy(&revs.filter, &filter_options);
	+	/* attach our CLI --filter to rev_info's filter */
	+	memcpy(&revs.filter, filter, sizeof(*filter));
	+	memset(filter, 0, sizeof(*filter));
	 
	 	/* make sure shallows are read */
	 	is_repository_shallow(the_repository);
	@@ -3872,6 +3873,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
	 	int rev_list_index = 0;
	 	int stdin_packs = 0;
	 	struct string_list keep_pack_list = STRING_LIST_INIT_NODUP;
	+	struct list_objects_filter_options filter_options = { 0 };
	 	struct option pack_objects_options[] = {
	 		OPT_SET_INT('q', "quiet", &progress,
	 			    N_("do not show progress meter"), 0),
	@@ -4154,7 +4156,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
	 	} else if (!use_internal_rev_list) {
	 		read_object_list_from_stdin();
	 	} else {
	-		get_object_list(rp.nr, rp.v);
	+		get_object_list(rp.nr, rp.v, &filter_options);
	 	}
	 	cleanup_preferred_base();
	 	if (include_tag && nr_result)

And even most of that could be omitted by not removing the global
"static struct" since pack-objects is a one-off anyway ... :)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] pack-objects: lazily set up "struct rev_info", don't leak
  2022-03-25 14:25 ` [PATCH] pack-objects: lazily set up "struct rev_info", don't leak Ævar Arnfjörð Bjarmason
  2022-03-25 14:57   ` Derrick Stolee
@ 2022-03-25 18:53   ` Junio C Hamano
  2022-03-26  1:09     ` Ævar Arnfjörð Bjarmason
  2022-03-28 15:43   ` [PATCH v2] " Ævar Arnfjörð Bjarmason
  2 siblings, 1 reply; 25+ messages in thread
From: Junio C Hamano @ 2022-03-25 18:53 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Derrick Stolee, Bagas Sanjaya

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> In the preceding [1] (pack-objects: move revs out of
> get_object_list(), 2022-03-22) the "repo_init_revisions()" was moved
> to cmd_pack_objects() so that it unconditionally took place for all
> invocations of "git pack-objects".
>
> We'd thus start leaking memory, which is easily reproduced in
> e.g. git.git by feeding e83c5163316 (Initial revision of "git", the
> information manager from hell, 2005-04-07) to "git pack-objects";
> ...
> Narrowly fixing that commit would have been easy, just add call
> repo_init_revisions() right before get_object_list(), which is
> effectively what was done before that commit.
>
> But an unstated constraint when setting it up early is that it was
> needed for the subsequent [2] (pack-objects: parse --filter directly
> into revs.filter, 2022-03-22), i.e. we might have a --filter
> command-line option, and need to either have the "struct rev_info"
> setup when we encounter that option, or later.
>
> Let's just change the control flow so that we'll instead set up the
> "struct rev_info" only when we need it. Doing so leads to a bit more
> verbosity, but it's a lot clearer what we're doing and why.

Is this about "we take it as given that the use of rev_info leaks
until we fix revisions API, so let's keep its use limited to avoid
unnecessary leaks"?

If so, it sort-of makes sense, but smells like a roundabout way to
address the issue.  An obvious alternative is to wait until both the
topic and the "plug revision API" topic graduate and then add a
"release" call to release the resource in the same sope as the
unconditional call to init_revisions at the end.  I do not quite get
what on-demand lazy set-up buys us.  What we need to lazily set-up,
when we do lazily set-up, needs to be released either way, no?


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] pack-objects: lazily set up "struct rev_info", don't leak
  2022-03-25 17:34         ` Ævar Arnfjörð Bjarmason
@ 2022-03-25 19:08           ` Derrick Stolee
  2022-03-26  0:52             ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 25+ messages in thread
From: Derrick Stolee @ 2022-03-25 19:08 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano, Bagas Sanjaya

On 3/25/2022 1:34 PM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Fri, Mar 25 2022, Derrick Stolee wrote:
> 
>> On 3/25/2022 12:00 PM, Ævar Arnfjörð Bjarmason wrote:
>>>> +struct rev_info_maybe_empty {
>>>> +	int has_revs;
>>>> +	struct rev_info revs;
>>>> +};
>>
>> Thinking about this a second time, perhaps it would be best to add
>> an "unsigned initialized:1;" to struct rev_info so we can look at
>> such a struct and know whether or not repo_init_revisions() has
>> been run or not. Avoids the custom struct and unifies a few things.
>>
>> In particular, release_revisions() could choose to do nothing if
>> revs->initialized is false.
> 
> This plan won't work because that behavior is both undefined per the
> standard, and something that's wildly undefined in practice.
> 
> I.e. we initialize it on the stack, so it'll point to uninitialized
> memory, sometimes that bit will be 0, sometimes 1...
> 
> If you mean just initialize it to { 0 } or whatever that would work,
> yes, but if we're going to refactor all the callers to do that we might
> as well refactor the few missing bits that would be needed to initialize
> it statically, and drop the dynamic by default initialization...

Yes, I was assuming that we initialize all structs to all-zero,
but the existing failure to do this will cause such a change too
large for this issue.

> But FWIW I think a much more obvious thing to do overall would be to
> skip the whole "filter bust me in rev_info" refactoring part of your
> series and just add a trivial list_objects_filter_copy_attach() method,
> or do it inline with memcpy/memset.
> 
> I.e. to not touch the "filter" etc. callback stuff at all, still pass it
> to get_object_list(). Can't 2/5 and 3/5 in your series be replaced by
> this simpler and smaller change?:

> 	-	list_objects_filter_copy(&revs.filter, &filter_options);
> 	+	/* attach our CLI --filter to rev_info's filter */
> 	+	memcpy(&revs.filter, filter, sizeof(*filter));
> 	+	memset(filter, 0, sizeof(*filter));

Here, you are removing a deep copy with a shallow copy. After this,
freeing the arrays within revs.filter would cause a double-free when
freeing the arrays in the original filter_options.

If you went this way, then you could do a s/&filter_options/filter/
in the existing line.

> 	 	/* make sure shallows are read */
> 	 	is_repository_shallow(the_repository);
> 	@@ -3872,6 +3873,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
> 	 	int rev_list_index = 0;
> 	 	int stdin_packs = 0;
> 	 	struct string_list keep_pack_list = STRING_LIST_INIT_NODUP;
> 	+	struct list_objects_filter_options filter_options = { 0 };
> 	 	struct option pack_objects_options[] = {
> 	 		OPT_SET_INT('q', "quiet", &progress,
> 	 			    N_("do not show progress meter"), 0),
> 	@@ -4154,7 +4156,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
> 	 	} else if (!use_internal_rev_list) {
> 	 		read_object_list_from_stdin();
> 	 	} else {
> 	-		get_object_list(rp.nr, rp.v);
> 	+		get_object_list(rp.nr, rp.v, &filter_options);
> 	 	}
> 	 	cleanup_preferred_base();
> 	 	if (include_tag && nr_result)
> 
> And even most of that could be omitted by not removing the global
> "static struct" since pack-objects is a one-off anyway ... :)

Even if you fix the deep/shallow copy above, you still need to
clean up the filter in two places.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] pack-objects: lazily set up "struct rev_info", don't leak
  2022-03-25 19:08           ` Derrick Stolee
@ 2022-03-26  0:52             ` Ævar Arnfjörð Bjarmason
  2022-03-28 14:04               ` Derrick Stolee
  0 siblings, 1 reply; 25+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-03-26  0:52 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, Junio C Hamano, Bagas Sanjaya


On Fri, Mar 25 2022, Derrick Stolee wrote:

> On 3/25/2022 1:34 PM, Ævar Arnfjörð Bjarmason wrote:
>> 
>> On Fri, Mar 25 2022, Derrick Stolee wrote:
>> 
>>> On 3/25/2022 12:00 PM, Ævar Arnfjörð Bjarmason wrote:
>>>>> +struct rev_info_maybe_empty {
>>>>> +	int has_revs;
>>>>> +	struct rev_info revs;
>>>>> +};
>>>
>>> Thinking about this a second time, perhaps it would be best to add
>>> an "unsigned initialized:1;" to struct rev_info so we can look at
>>> such a struct and know whether or not repo_init_revisions() has
>>> been run or not. Avoids the custom struct and unifies a few things.
>>>
>>> In particular, release_revisions() could choose to do nothing if
>>> revs->initialized is false.
>> 
>> This plan won't work because that behavior is both undefined per the
>> standard, and something that's wildly undefined in practice.
>> 
>> I.e. we initialize it on the stack, so it'll point to uninitialized
>> memory, sometimes that bit will be 0, sometimes 1...
>> 
>> If you mean just initialize it to { 0 } or whatever that would work,
>> yes, but if we're going to refactor all the callers to do that we might
>> as well refactor the few missing bits that would be needed to initialize
>> it statically, and drop the dynamic by default initialization...
>
> Yes, I was assuming that we initialize all structs to all-zero,
> but the existing failure to do this will cause such a change too
> large for this issue.

I don't see how that wouldn't be a regression on the upthread patch in
the sense that yes, we could of course initialize it, but the whole
point of not doing so was to have our tooling detect if the downstream
code assumed it could start using a struct member we hadn't filled in.

By initializing it we'll never know.

But yes, if you consider that a non-goal then init to "{ 0 }" makes the
most sense.

>> But FWIW I think a much more obvious thing to do overall would be to
>> skip the whole "filter bust me in rev_info" refactoring part of your
>> series and just add a trivial list_objects_filter_copy_attach() method,
>> or do it inline with memcpy/memset.
>> 
>> I.e. to not touch the "filter" etc. callback stuff at all, still pass it
>> to get_object_list(). Can't 2/5 and 3/5 in your series be replaced by
>> this simpler and smaller change?:
>
>> 	-	list_objects_filter_copy(&revs.filter, &filter_options);
>> 	+	/* attach our CLI --filter to rev_info's filter */
>> 	+	memcpy(&revs.filter, filter, sizeof(*filter));
>> 	+	memset(filter, 0, sizeof(*filter));
>
> Here, you are removing a deep copy with a shallow copy. After this,
> freeing the arrays within revs.filter would cause a double-free when
> freeing the arrays in the original filter_options.

Yes, and that's what we want, right? I.e. we don't want a copy, but to
use the &filter for parse_options(), then once that's populated we
shallow-copy that to "struct rev_info"'s "filter", and forget about our
own copy (i.e. the memset there is redundant, but just a "let's not use
this again) marker.

Of course this will leak now, but once merged with my
release_revisions() patch will work, and we'll free what we allocated
(once!).

> If you went this way, then you could do a s/&filter_options/filter/
> in the existing line.
>
>> 	 	/* make sure shallows are read */
>> 	 	is_repository_shallow(the_repository);
>> 	@@ -3872,6 +3873,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>> 	 	int rev_list_index = 0;
>> 	 	int stdin_packs = 0;
>> 	 	struct string_list keep_pack_list = STRING_LIST_INIT_NODUP;
>> 	+	struct list_objects_filter_options filter_options = { 0 };
>> 	 	struct option pack_objects_options[] = {
>> 	 		OPT_SET_INT('q', "quiet", &progress,
>> 	 			    N_("do not show progress meter"), 0),
>> 	@@ -4154,7 +4156,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>> 	 	} else if (!use_internal_rev_list) {
>> 	 		read_object_list_from_stdin();
>> 	 	} else {
>> 	-		get_object_list(rp.nr, rp.v);
>> 	+		get_object_list(rp.nr, rp.v, &filter_options);
>> 	 	}
>> 	 	cleanup_preferred_base();
>> 	 	if (include_tag && nr_result)
>> 
>> And even most of that could be omitted by not removing the global
>> "static struct" since pack-objects is a one-off anyway ... :)
>
> Even if you fix the deep/shallow copy above, you still need to
> clean up the filter in two places.

If you "fix" the shallow copying you need to free it twice, but if you
don't you free it once.

I.e. this is conceptually the same as strbuf_detach() + strbuf_attach().

But maybe I'm missing something...

(If I am it's rather worrying that it passed all our tests, both in your
series + merged with the release_revisions() series).

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] pack-objects: lazily set up "struct rev_info", don't leak
  2022-03-25 18:53   ` Junio C Hamano
@ 2022-03-26  1:09     ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 25+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-03-26  1:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Derrick Stolee, Bagas Sanjaya


On Fri, Mar 25 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> In the preceding [1] (pack-objects: move revs out of
>> get_object_list(), 2022-03-22) the "repo_init_revisions()" was moved
>> to cmd_pack_objects() so that it unconditionally took place for all
>> invocations of "git pack-objects".
>>
>> We'd thus start leaking memory, which is easily reproduced in
>> e.g. git.git by feeding e83c5163316 (Initial revision of "git", the
>> information manager from hell, 2005-04-07) to "git pack-objects";
>> ...
>> Narrowly fixing that commit would have been easy, just add call
>> repo_init_revisions() right before get_object_list(), which is
>> effectively what was done before that commit.
>>
>> But an unstated constraint when setting it up early is that it was
>> needed for the subsequent [2] (pack-objects: parse --filter directly
>> into revs.filter, 2022-03-22), i.e. we might have a --filter
>> command-line option, and need to either have the "struct rev_info"
>> setup when we encounter that option, or later.
>>
>> Let's just change the control flow so that we'll instead set up the
>> "struct rev_info" only when we need it. Doing so leads to a bit more
>> verbosity, but it's a lot clearer what we're doing and why.
>
> Is this about "we take it as given that the use of rev_info leaks
> until we fix revisions API, so let's keep its use limited to avoid
> unnecessary leaks"?

Not exactly,. When you use the revisions API to do "filter" stuff in
this codepath it leaks both before & after Derrick's patches, so nothing
has changed in that case, but...

> If so, it sort-of makes sense, but smells like a roundabout way to
> address the issue.  An obvious alternative is to wait until both the
> topic and the "plug revision API" topic graduate and then add a
> "release" call to release the resource in the same sope as the
> unconditional call to init_revisions at the end.  I do not quite get
> what on-demand lazy set-up buys us.  What we need to lazily set-up,
> when we do lazily set-up, needs to be released either way, no?

...We were doing lazy setup of "struct rev_info" before the parent
series, and as a result it introduces a new memory leak. We do a
malloc() for some diff.c code that revisions.c uses unconditionally,
which then don't use at all in some common cases.

The patch I've submitted on top just restores the previous state of the
initialization being lazy, but in a way that has to be adapted for other
code changes the series made.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] pack-objects: lazily set up "struct rev_info", don't leak
  2022-03-26  0:52             ` Ævar Arnfjörð Bjarmason
@ 2022-03-28 14:04               ` Derrick Stolee
  0 siblings, 0 replies; 25+ messages in thread
From: Derrick Stolee @ 2022-03-28 14:04 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano, Bagas Sanjaya

On 3/25/2022 8:52 PM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Fri, Mar 25 2022, Derrick Stolee wrote:
> 
>> On 3/25/2022 1:34 PM, Ævar Arnfjörð Bjarmason wrote:>>> 	-	list_objects_filter_copy(&revs.filter, &filter_options);
>>> 	+	/* attach our CLI --filter to rev_info's filter */
>>> 	+	memcpy(&revs.filter, filter, sizeof(*filter));
>>> 	+	memset(filter, 0, sizeof(*filter));
>>
>> Here, you are removing a deep copy with a shallow copy. After this,
>> freeing the arrays within revs.filter would cause a double-free when
>> freeing the arrays in the original filter_options.
> 
> Yes, and that's what we want, right? I.e. we don't want a copy, but to
> use the &filter for parse_options(), then once that's populated we
> shallow-copy that to "struct rev_info"'s "filter", and forget about our
> own copy (i.e. the memset there is redundant, but just a "let's not use
> this again) marker.
> 
> Of course this will leak now, but once merged with my
> release_revisions() patch will work, and we'll free what we allocated
> (once!).


>> Even if you fix the deep/shallow copy above, you still need to
>> clean up the filter in two places.
> 
> If you "fix" the shallow copying you need to free it twice, but if you
> don't you free it once.
> 
> I.e. this is conceptually the same as strbuf_detach() + strbuf_attach().
> 
> But maybe I'm missing something...
> 
> (If I am it's rather worrying that it passed all our tests, both in your
> series + merged with the release_revisions() series).

My problem is that you need to know that the filter data was
"detached" in a different scope than it is defined.

 * filter_options is defined in cmd_pack_objects()
 * The detach and reattach to revs is in get_object_list()

Your model requires internal information from get_object_list()
to know that you shouldn't release filter_options within
cmd_pack_objects(), which I think is a code smell. Better to
have something allocated in cmd_pack_objects() be freed in that
same method so it is visually detectable that we are freeing
correctly.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v2] pack-objects: lazily set up "struct rev_info", don't leak
  2022-03-25 14:25 ` [PATCH] pack-objects: lazily set up "struct rev_info", don't leak Ævar Arnfjörð Bjarmason
  2022-03-25 14:57   ` Derrick Stolee
  2022-03-25 18:53   ` Junio C Hamano
@ 2022-03-28 15:43   ` Ævar Arnfjörð Bjarmason
  2022-03-28 15:58     ` Derrick Stolee
  2022-03-28 17:10     ` Junio C Hamano
  2 siblings, 2 replies; 25+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-03-28 15:43 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Derrick Stolee, Bagas Sanjaya,
	Ævar Arnfjörð Bjarmason

In the preceding [1] (pack-objects: move revs out of
get_object_list(), 2022-03-22) the "repo_init_revisions()" was moved
to cmd_pack_objects() so that it unconditionally took place for all
invocations of "git pack-objects".

We'd thus start leaking memory, which is easily reproduced in
e.g. git.git by feeding e83c5163316 (Initial revision of "git", the
information manager from hell, 2005-04-07) to "git pack-objects";

    $ echo e83c5163316f89bfbde7d9ab23ca2e25604af290 | ./git pack-objects initial
    [...]
	==19130==ERROR: LeakSanitizer: detected memory leaks

	Direct leak of 7120 byte(s) in 1 object(s) allocated from:
	    #0 0x455308 in __interceptor_malloc (/home/avar/g/git/git+0x455308)
	    #1 0x75b399 in do_xmalloc /home/avar/g/git/wrapper.c:41:8
	    #2 0x75b356 in xmalloc /home/avar/g/git/wrapper.c:62:9
	    #3 0x5d7609 in prep_parse_options /home/avar/g/git/diff.c:5647:2
	    #4 0x5d415a in repo_diff_setup /home/avar/g/git/diff.c:4621:2
	    #5 0x6dffbb in repo_init_revisions /home/avar/g/git/revision.c:1853:2
	    #6 0x4f599d in cmd_pack_objects /home/avar/g/git/builtin/pack-objects.c:3980:2
	    #7 0x4592ca in run_builtin /home/avar/g/git/git.c:465:11
	    #8 0x457d81 in handle_builtin /home/avar/g/git/git.c:718:3
	    #9 0x458ca5 in run_argv /home/avar/g/git/git.c:785:4
	    #10 0x457b40 in cmd_main /home/avar/g/git/git.c:916:19
	    #11 0x562259 in main /home/avar/g/git/common-main.c:56:11
	    #12 0x7fce792ac7ec in __libc_start_main csu/../csu/libc-start.c:332:16
	    #13 0x4300f9 in _start (/home/avar/g/git/git+0x4300f9)

	SUMMARY: LeakSanitizer: 7120 byte(s) leaked in 1 allocation(s).
	Aborted

Narrowly fixing that commit would have been easy, just add call
repo_init_revisions() right before get_object_list(), which is
effectively what was done before that commit.

But an unstated constraint when setting it up early is that it was
needed for the subsequent [2] (pack-objects: parse --filter directly
into revs.filter, 2022-03-22), i.e. we might have a --filter
command-line option, and need to either have the "struct rev_info"
setup when we encounter that option, or later.

Let's just change the control flow so that we'll instead set up the
"struct rev_info" only when we need it. Doing so leads to a bit more
verbosity, but it's a lot clearer what we're doing and why.

An earlier version of this commit[3] went behind
opt_parse_list_objects_filter()'s back by faking up a "struct option"
before calling it. Let's avoid that and instead create a blessed API
for this pattern.

We could furthermore combine the two get_object_list() invocations
here by having repo_init_revisions() invoked on &pfd.revs, but I think
clearly separating the two makes the flow clearer. Likewise
redundantly but explicitly (i.e. redundant v.s. a "{ 0 }") "0" to
"have_revs" early in cmd_pack_objects().

While we're at it add parentheses around the arguments to the OPT_*
macros in in list-objects-filter-options.h, as we need to change those
lines anyway. It doesn't matter in this case, but is good general
practice.

1. https://lore.kernel.org/git/619b757d98465dbc4995bdc11a5282fbfcbd3daa.1647970119.git.gitgitgadget@gmail.com
2. https://lore.kernel.org/git/97de926904988b89b5663bd4c59c011a1723a8f5.1647970119.git.gitgitgadget@gmail.com
3. https://lore.kernel.org/git/patch-1.1-193534b0f07-20220325T121715Z-avarab@gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---

Hopefully this v2 addresses enough of the concern on v1 to move
forward with squashing the linux-leaks failure in "seen" currently.

There's still "struct rev_info" in play here, which I still think
makes sense, but this gets entirely rid of the evil magic trickery
with faking up a "struct option" in favor of a defined API for doing
the lazy initialization.

Range-diff against v1:
1:  193534b0f07 ! 1:  9951d92176e pack-objects: lazily set up "struct rev_info", don't leak
    @@ Commit message
         "struct rev_info" only when we need it. Doing so leads to a bit more
         verbosity, but it's a lot clearer what we're doing and why.
     
    +    An earlier version of this commit[3] went behind
    +    opt_parse_list_objects_filter()'s back by faking up a "struct option"
    +    before calling it. Let's avoid that and instead create a blessed API
    +    for this pattern.
    +
         We could furthermore combine the two get_object_list() invocations
         here by having repo_init_revisions() invoked on &pfd.revs, but I think
         clearly separating the two makes the flow clearer. Likewise
         redundantly but explicitly (i.e. redundant v.s. a "{ 0 }") "0" to
         "have_revs" early in cmd_pack_objects().
     
    -    This does add the future constraint to opt_parse_list_objects_filter()
    -    that we'll need to adjust this wrapper code if it looks at any other
    -    value of the "struct option" than the "value" member.
    -
    -    But that regression should be relatively easy to spot. I'm
    -    intentionally not initializing the "struct wrap" with e.g. "{ 0 }" so
    -    that various memory sanity checkers would spot that, we just
    -    initialize the "value" in po_filter_cb(). By doing this e.g. we'll die
    -    on e.g. this test if we were to use another member of "opt" in
    -    opt_parse_list_objects_filter()>
    -
    -        ./t5317-pack-objects-filter-objects.sh -vixd --valgrind-only=3
    -
         While we're at it add parentheses around the arguments to the OPT_*
         macros in in list-objects-filter-options.h, as we need to change those
         lines anyway. It doesn't matter in this case, but is good general
         practice.
     
         1. https://lore.kernel.org/git/619b757d98465dbc4995bdc11a5282fbfcbd3daa.1647970119.git.gitgitgadget@gmail.com
    -    2. https://lore.kernel.org/git/97de926904988b89b5663bd4c59c011a1723a8f5.1647970119.git.gitgitgadget@gmail.com/
    +    2. https://lore.kernel.org/git/97de926904988b89b5663bd4c59c011a1723a8f5.1647970119.git.gitgitgadget@gmail.com
    +    3. https://lore.kernel.org/git/patch-1.1-193534b0f07-20220325T121715Z-avarab@gmail.com/
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    @@ builtin/pack-objects.c: static int option_parse_unpack_unreachable(const struct
     +	struct rev_info revs;
     +};
     +
    -+static int po_filter_cb(const struct option *opt, const char *arg, int unset)
    ++static struct list_objects_filter_options *po_filter_revs_init(void *value)
     +{
    -+	struct po_filter_data *data = opt->value;
    -+	struct option wrap; /* don't initialize! */
    ++	struct po_filter_data *data = value;
     +
     +	repo_init_revisions(the_repository, &data->revs, NULL);
    -+	wrap.value = &data->revs.filter;
     +	data->have_revs = 1;
     +
    -+	return opt_parse_list_objects_filter(&wrap, arg, unset);
    ++	return &data->revs.filter;
     +}
     +
      int cmd_pack_objects(int argc, const char **argv, const char *prefix)
    @@ builtin/pack-objects.c: int cmd_pack_objects(int argc, const char **argv, const
      			      N_("write a bitmap index if possible"),
      			      WRITE_BITMAP_QUIET, PARSE_OPT_HIDDEN),
     -		OPT_PARSE_LIST_OBJECTS_FILTER(&revs.filter),
    -+		OPT_PARSE_LIST_OBJECTS_FILTER_CB(&pfd, po_filter_cb),
    ++		OPT_PARSE_LIST_OBJECTS_FILTER_INIT(&pfd, po_filter_revs_init),
      		OPT_CALLBACK_F(0, "missing", NULL, N_("action"),
      		  N_("handling for missing objects"), PARSE_OPT_NONEG,
      		  option_parse_missing_action),
    @@ builtin/pack-objects.c: int cmd_pack_objects(int argc, const char **argv, const
      	}
      	cleanup_preferred_base();
     
    + ## list-objects-filter-options.c ##
    +@@ list-objects-filter-options.c: int opt_parse_list_objects_filter(const struct option *opt,
    + 				  const char *arg, int unset)
    + {
    + 	struct list_objects_filter_options *filter_options = opt->value;
    ++	opt_lof_init init = (opt_lof_init)opt->defval;
    ++
    ++	if (init)
    ++		filter_options = init(opt->value);
    + 
    + 	if (unset || !arg)
    + 		list_objects_filter_set_no_filter(filter_options);
    +
      ## list-objects-filter-options.h ##
     @@ list-objects-filter-options.h: void parse_list_objects_filter(
    + 	struct list_objects_filter_options *filter_options,
    + 	const char *arg);
    + 
    ++/**
    ++ * The opt->value to opt_parse_list_objects_filter() is either a
    ++ * "struct list_objects_filter_option *" when using
    ++ * OPT_PARSE_LIST_OBJECTS_FILTER().
    ++ *
    ++ * Or, if using no "struct option" field is used by the callback,
    ++ * except the "defval" which is expected to be an "opt_lof_init"
    ++ * function, which is called with the "opt->value" and must return a
    ++ * pointer to the ""struct list_objects_filter_option *" to be used.
    ++ *
    ++ * The OPT_PARSE_LIST_OBJECTS_FILTER_INIT() can be used e.g. the
    ++ * "struct list_objects_filter_option" is embedded in a "struct
    ++ * rev_info", which the "defval" could be tasked with lazily
    ++ * initializing. See cmd_pack_objects() for an example.
    ++ */
      int opt_parse_list_objects_filter(const struct option *opt,
      				  const char *arg, int unset);
    ++typedef struct list_objects_filter_options *(*opt_lof_init)(void *);
    ++#define OPT_PARSE_LIST_OBJECTS_FILTER_INIT(fo, init) \
    ++	{ OPTION_CALLBACK, 0, "filter", (fo), N_("args"), \
    ++	  N_("object filtering"), 0, opt_parse_list_objects_filter, \
    ++	  (intptr_t)(init) }
      
    -+#define OPT_PARSE_LIST_OBJECTS_FILTER_CB(fo, cb) \
    -+	OPT_CALLBACK(0, "filter", (fo), N_("args"), \
    -+		     N_("object filtering"), (cb))
    -+
      #define OPT_PARSE_LIST_OBJECTS_FILTER(fo) \
     -	OPT_CALLBACK(0, "filter", fo, N_("args"), \
     -	  N_("object filtering"), \
     -	  opt_parse_list_objects_filter)
    -+	OPT_PARSE_LIST_OBJECTS_FILTER_CB((fo), opt_parse_list_objects_filter)
    ++	OPT_PARSE_LIST_OBJECTS_FILTER_INIT((fo), NULL)
      
      /*
       * Translates abbreviated numbers in the filter's filter_spec into their

 builtin/pack-objects.c        | 28 +++++++++++++++++++++++-----
 list-objects-filter-options.c |  4 ++++
 list-objects-filter-options.h | 24 +++++++++++++++++++++---
 3 files changed, 48 insertions(+), 8 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index d39f668ad56..09680fb6a8b 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3857,6 +3857,21 @@ static int option_parse_unpack_unreachable(const struct option *opt,
 	return 0;
 }
 
+struct po_filter_data {
+	unsigned have_revs:1;
+	struct rev_info revs;
+};
+
+static struct list_objects_filter_options *po_filter_revs_init(void *value)
+{
+	struct po_filter_data *data = value;
+
+	repo_init_revisions(the_repository, &data->revs, NULL);
+	data->have_revs = 1;
+
+	return &data->revs.filter;
+}
+
 int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 {
 	int use_internal_rev_list = 0;
@@ -3867,7 +3882,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	int rev_list_index = 0;
 	int stdin_packs = 0;
 	struct string_list keep_pack_list = STRING_LIST_INIT_NODUP;
-	struct rev_info revs;
+	struct po_filter_data pfd = { .have_revs = 0 };
 
 	struct option pack_objects_options[] = {
 		OPT_SET_INT('q', "quiet", &progress,
@@ -3954,7 +3969,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 			      &write_bitmap_index,
 			      N_("write a bitmap index if possible"),
 			      WRITE_BITMAP_QUIET, PARSE_OPT_HIDDEN),
-		OPT_PARSE_LIST_OBJECTS_FILTER(&revs.filter),
+		OPT_PARSE_LIST_OBJECTS_FILTER_INIT(&pfd, po_filter_revs_init),
 		OPT_CALLBACK_F(0, "missing", NULL, N_("action"),
 		  N_("handling for missing objects"), PARSE_OPT_NONEG,
 		  option_parse_missing_action),
@@ -3973,8 +3988,6 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 
 	read_replace_refs = 0;
 
-	repo_init_revisions(the_repository, &revs, NULL);
-
 	sparse = git_env_bool("GIT_TEST_PACK_SPARSE", -1);
 	if (the_repository->gitdir) {
 		prepare_repo_settings(the_repository);
@@ -4076,7 +4089,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	if (!rev_list_all || !rev_list_reflog || !rev_list_index)
 		unpack_unreachable_expiration = 0;
 
-	if (revs.filter.choice) {
+	if (pfd.have_revs && pfd.revs.filter.choice) {
 		if (!pack_to_stdout)
 			die(_("cannot use --filter without --stdout"));
 		if (stdin_packs)
@@ -4152,7 +4165,12 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 			add_unreachable_loose_objects();
 	} else if (!use_internal_rev_list) {
 		read_object_list_from_stdin();
+	} else if (pfd.have_revs) {
+		get_object_list(&pfd.revs, rp.nr, rp.v);
 	} else {
+		struct rev_info revs;
+
+		repo_init_revisions(the_repository, &revs, NULL);
 		get_object_list(&revs, rp.nr, rp.v);
 	}
 	cleanup_preferred_base();
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index f02d8df1422..4b25287886d 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -285,6 +285,10 @@ int opt_parse_list_objects_filter(const struct option *opt,
 				  const char *arg, int unset)
 {
 	struct list_objects_filter_options *filter_options = opt->value;
+	opt_lof_init init = (opt_lof_init)opt->defval;
+
+	if (init)
+		filter_options = init(opt->value);
 
 	if (unset || !arg)
 		list_objects_filter_set_no_filter(filter_options);
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index 90e4bc96252..ffc02d77e76 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -104,13 +104,31 @@ void parse_list_objects_filter(
 	struct list_objects_filter_options *filter_options,
 	const char *arg);
 
+/**
+ * The opt->value to opt_parse_list_objects_filter() is either a
+ * "struct list_objects_filter_option *" when using
+ * OPT_PARSE_LIST_OBJECTS_FILTER().
+ *
+ * Or, if using no "struct option" field is used by the callback,
+ * except the "defval" which is expected to be an "opt_lof_init"
+ * function, which is called with the "opt->value" and must return a
+ * pointer to the ""struct list_objects_filter_option *" to be used.
+ *
+ * The OPT_PARSE_LIST_OBJECTS_FILTER_INIT() can be used e.g. the
+ * "struct list_objects_filter_option" is embedded in a "struct
+ * rev_info", which the "defval" could be tasked with lazily
+ * initializing. See cmd_pack_objects() for an example.
+ */
 int opt_parse_list_objects_filter(const struct option *opt,
 				  const char *arg, int unset);
+typedef struct list_objects_filter_options *(*opt_lof_init)(void *);
+#define OPT_PARSE_LIST_OBJECTS_FILTER_INIT(fo, init) \
+	{ OPTION_CALLBACK, 0, "filter", (fo), N_("args"), \
+	  N_("object filtering"), 0, opt_parse_list_objects_filter, \
+	  (intptr_t)(init) }
 
 #define OPT_PARSE_LIST_OBJECTS_FILTER(fo) \
-	OPT_CALLBACK(0, "filter", fo, N_("args"), \
-	  N_("object filtering"), \
-	  opt_parse_list_objects_filter)
+	OPT_PARSE_LIST_OBJECTS_FILTER_INIT((fo), NULL)
 
 /*
  * Translates abbreviated numbers in the filter's filter_spec into their
-- 
2.35.1.1541.g9c2d54e20ab


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v2] pack-objects: lazily set up "struct rev_info", don't leak
  2022-03-28 15:43   ` [PATCH v2] " Ævar Arnfjörð Bjarmason
@ 2022-03-28 15:58     ` Derrick Stolee
  2022-03-28 17:10     ` Junio C Hamano
  1 sibling, 0 replies; 25+ messages in thread
From: Derrick Stolee @ 2022-03-28 15:58 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, git; +Cc: Junio C Hamano, Bagas Sanjaya

On 3/28/2022 11:43 AM, Ævar Arnfjörð Bjarmason wrote:
> An earlier version of this commit[3] went behind
> opt_parse_list_objects_filter()'s back by faking up a "struct option"
> before calling it. Let's avoid that and instead create a blessed API
> for this pattern.

...

> diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
> index f02d8df1422..4b25287886d 100644
> --- a/list-objects-filter-options.c
> +++ b/list-objects-filter-options.c
> @@ -285,6 +285,10 @@ int opt_parse_list_objects_filter(const struct option *opt,
>  				  const char *arg, int unset)
>  {
>  	struct list_objects_filter_options *filter_options = opt->value;
> +	opt_lof_init init = (opt_lof_init)opt->defval;
> +
> +	if (init)
> +		filter_options = init(opt->value);
>  
>  	if (unset || !arg)
>  		list_objects_filter_set_no_filter(filter_options);
> diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
> index 90e4bc96252..ffc02d77e76 100644
> --- a/list-objects-filter-options.h
> +++ b/list-objects-filter-options.h
> @@ -104,13 +104,31 @@ void parse_list_objects_filter(
>  	struct list_objects_filter_options *filter_options,
>  	const char *arg);
>  
> +/**
> + * The opt->value to opt_parse_list_objects_filter() is either a
> + * "struct list_objects_filter_option *" when using
> + * OPT_PARSE_LIST_OBJECTS_FILTER().
> + *
> + * Or, if using no "struct option" field is used by the callback,
> + * except the "defval" which is expected to be an "opt_lof_init"
> + * function, which is called with the "opt->value" and must return a
> + * pointer to the ""struct list_objects_filter_option *" to be used.
> + *
> + * The OPT_PARSE_LIST_OBJECTS_FILTER_INIT() can be used e.g. the
> + * "struct list_objects_filter_option" is embedded in a "struct
> + * rev_info", which the "defval" could be tasked with lazily
> + * initializing. See cmd_pack_objects() for an example.
> + */
>  int opt_parse_list_objects_filter(const struct option *opt,
>  				  const char *arg, int unset);
> +typedef struct list_objects_filter_options *(*opt_lof_init)(void *);
> +#define OPT_PARSE_LIST_OBJECTS_FILTER_INIT(fo, init) \
> +	{ OPTION_CALLBACK, 0, "filter", (fo), N_("args"), \
> +	  N_("object filtering"), 0, opt_parse_list_objects_filter, \
> +	  (intptr_t)(init) }
>  
>  #define OPT_PARSE_LIST_OBJECTS_FILTER(fo) \
> -	OPT_CALLBACK(0, "filter", fo, N_("args"), \
> -	  N_("object filtering"), \
> -	  opt_parse_list_objects_filter)
> +	OPT_PARSE_LIST_OBJECTS_FILTER_INIT((fo), NULL)

I like this way of combining the macros into one implementation
(guarded with a NULL check).

This version looks good to me.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2] pack-objects: lazily set up "struct rev_info", don't leak
  2022-03-28 15:43   ` [PATCH v2] " Ævar Arnfjörð Bjarmason
  2022-03-28 15:58     ` Derrick Stolee
@ 2022-03-28 17:10     ` Junio C Hamano
  1 sibling, 0 replies; 25+ messages in thread
From: Junio C Hamano @ 2022-03-28 17:10 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Derrick Stolee, Bagas Sanjaya

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
> index 90e4bc96252..ffc02d77e76 100644
> --- a/list-objects-filter-options.h
> +++ b/list-objects-filter-options.h
> @@ -104,13 +104,31 @@ void parse_list_objects_filter(
> ...
> +typedef struct list_objects_filter_options *(*opt_lof_init)(void *);

Do we want or need to expose this type to anybody outside the
list_objects_filter_options.c file?  After all, the client has to
define a function of that type like so:

> +static struct list_objects_filter_options *po_filter_revs_init(void *value)
> +{
> +	struct po_filter_data *data = value;
> +
> +	repo_init_revisions(the_repository, &data->revs, NULL);
> +	data->have_revs = 1;
> +
> +	return &data->revs.filter;
> +}

and not

	static opt_lof_init po_filter_revs_init
	{
		body of the function
	}

and the use of it, i.e. 

> @@ -3954,7 +3969,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>  			      &write_bitmap_index,
>  			      N_("write a bitmap index if possible"),
>  			      WRITE_BITMAP_QUIET, PARSE_OPT_HIDDEN),
> -		OPT_PARSE_LIST_OBJECTS_FILTER(&revs.filter),
> +		OPT_PARSE_LIST_OBJECTS_FILTER_INIT(&pfd, po_filter_revs_init),

casts the type away to (intptr_t) so having a handy way to spell the
type out does not contribute to type safety, either.

Other than that, looks good to me.

Thanks, will queue directly on top of Derrick's partial-bundle
clean-up series.


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2022-03-28 17:10 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-22 17:28 [PATCH 0/5] Partial bundle follow ups Derrick Stolee via GitGitGadget
2022-03-22 17:28 ` [PATCH 1/5] list-objects-filter: remove CL_ARG__FILTER Derrick Stolee via GitGitGadget
2022-03-22 17:28 ` [PATCH 2/5] pack-objects: move revs out of get_object_list() Derrick Stolee via GitGitGadget
2022-03-22 17:28 ` [PATCH 3/5] pack-objects: parse --filter directly into revs.filter Derrick Stolee via GitGitGadget
2022-03-22 19:37   ` [-SPAM-] " Ramsay Jones
2022-03-23 13:48     ` Derrick Stolee
2022-03-22 21:15   ` Ævar Arnfjörð Bjarmason
2022-03-22 17:28 ` [PATCH 4/5] bundle: move capabilities to end of 'verify' Derrick Stolee via GitGitGadget
2022-03-23  7:08   ` Bagas Sanjaya
2022-03-23 13:39     ` Derrick Stolee
2022-03-22 17:28 ` [PATCH 5/5] bundle: output hash information in 'verify' Derrick Stolee via GitGitGadget
2022-03-23 21:27 ` [PATCH 0/5] Partial bundle follow ups Junio C Hamano
2022-03-25 14:25 ` [PATCH] pack-objects: lazily set up "struct rev_info", don't leak Ævar Arnfjörð Bjarmason
2022-03-25 14:57   ` Derrick Stolee
2022-03-25 16:00     ` Ævar Arnfjörð Bjarmason
2022-03-25 16:41       ` Derrick Stolee
2022-03-25 17:34         ` Ævar Arnfjörð Bjarmason
2022-03-25 19:08           ` Derrick Stolee
2022-03-26  0:52             ` Ævar Arnfjörð Bjarmason
2022-03-28 14:04               ` Derrick Stolee
2022-03-25 18:53   ` Junio C Hamano
2022-03-26  1:09     ` Ævar Arnfjörð Bjarmason
2022-03-28 15:43   ` [PATCH v2] " Ævar Arnfjörð Bjarmason
2022-03-28 15:58     ` Derrick Stolee
2022-03-28 17:10     ` Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.