All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"Derrick Stolee" <derrickstolee@github.com>,
	"Bagas Sanjaya" <bagasdotme@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: [PATCH v2] pack-objects: lazily set up "struct rev_info", don't leak
Date: Mon, 28 Mar 2022 17:43:18 +0200	[thread overview]
Message-ID: <patch-v2-1.1-9951d92176e-20220328T154049Z-avarab@gmail.com> (raw)
In-Reply-To: <patch-1.1-193534b0f07-20220325T121715Z-avarab@gmail.com>

In the preceding [1] (pack-objects: move revs out of
get_object_list(), 2022-03-22) the "repo_init_revisions()" was moved
to cmd_pack_objects() so that it unconditionally took place for all
invocations of "git pack-objects".

We'd thus start leaking memory, which is easily reproduced in
e.g. git.git by feeding e83c5163316 (Initial revision of "git", the
information manager from hell, 2005-04-07) to "git pack-objects";

    $ echo e83c5163316f89bfbde7d9ab23ca2e25604af290 | ./git pack-objects initial
    [...]
	==19130==ERROR: LeakSanitizer: detected memory leaks

	Direct leak of 7120 byte(s) in 1 object(s) allocated from:
	    #0 0x455308 in __interceptor_malloc (/home/avar/g/git/git+0x455308)
	    #1 0x75b399 in do_xmalloc /home/avar/g/git/wrapper.c:41:8
	    #2 0x75b356 in xmalloc /home/avar/g/git/wrapper.c:62:9
	    #3 0x5d7609 in prep_parse_options /home/avar/g/git/diff.c:5647:2
	    #4 0x5d415a in repo_diff_setup /home/avar/g/git/diff.c:4621:2
	    #5 0x6dffbb in repo_init_revisions /home/avar/g/git/revision.c:1853:2
	    #6 0x4f599d in cmd_pack_objects /home/avar/g/git/builtin/pack-objects.c:3980:2
	    #7 0x4592ca in run_builtin /home/avar/g/git/git.c:465:11
	    #8 0x457d81 in handle_builtin /home/avar/g/git/git.c:718:3
	    #9 0x458ca5 in run_argv /home/avar/g/git/git.c:785:4
	    #10 0x457b40 in cmd_main /home/avar/g/git/git.c:916:19
	    #11 0x562259 in main /home/avar/g/git/common-main.c:56:11
	    #12 0x7fce792ac7ec in __libc_start_main csu/../csu/libc-start.c:332:16
	    #13 0x4300f9 in _start (/home/avar/g/git/git+0x4300f9)

	SUMMARY: LeakSanitizer: 7120 byte(s) leaked in 1 allocation(s).
	Aborted

Narrowly fixing that commit would have been easy, just add call
repo_init_revisions() right before get_object_list(), which is
effectively what was done before that commit.

But an unstated constraint when setting it up early is that it was
needed for the subsequent [2] (pack-objects: parse --filter directly
into revs.filter, 2022-03-22), i.e. we might have a --filter
command-line option, and need to either have the "struct rev_info"
setup when we encounter that option, or later.

Let's just change the control flow so that we'll instead set up the
"struct rev_info" only when we need it. Doing so leads to a bit more
verbosity, but it's a lot clearer what we're doing and why.

An earlier version of this commit[3] went behind
opt_parse_list_objects_filter()'s back by faking up a "struct option"
before calling it. Let's avoid that and instead create a blessed API
for this pattern.

We could furthermore combine the two get_object_list() invocations
here by having repo_init_revisions() invoked on &pfd.revs, but I think
clearly separating the two makes the flow clearer. Likewise
redundantly but explicitly (i.e. redundant v.s. a "{ 0 }") "0" to
"have_revs" early in cmd_pack_objects().

While we're at it add parentheses around the arguments to the OPT_*
macros in in list-objects-filter-options.h, as we need to change those
lines anyway. It doesn't matter in this case, but is good general
practice.

1. https://lore.kernel.org/git/619b757d98465dbc4995bdc11a5282fbfcbd3daa.1647970119.git.gitgitgadget@gmail.com
2. https://lore.kernel.org/git/97de926904988b89b5663bd4c59c011a1723a8f5.1647970119.git.gitgitgadget@gmail.com
3. https://lore.kernel.org/git/patch-1.1-193534b0f07-20220325T121715Z-avarab@gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---

Hopefully this v2 addresses enough of the concern on v1 to move
forward with squashing the linux-leaks failure in "seen" currently.

There's still "struct rev_info" in play here, which I still think
makes sense, but this gets entirely rid of the evil magic trickery
with faking up a "struct option" in favor of a defined API for doing
the lazy initialization.

Range-diff against v1:
1:  193534b0f07 ! 1:  9951d92176e pack-objects: lazily set up "struct rev_info", don't leak
    @@ Commit message
         "struct rev_info" only when we need it. Doing so leads to a bit more
         verbosity, but it's a lot clearer what we're doing and why.
     
    +    An earlier version of this commit[3] went behind
    +    opt_parse_list_objects_filter()'s back by faking up a "struct option"
    +    before calling it. Let's avoid that and instead create a blessed API
    +    for this pattern.
    +
         We could furthermore combine the two get_object_list() invocations
         here by having repo_init_revisions() invoked on &pfd.revs, but I think
         clearly separating the two makes the flow clearer. Likewise
         redundantly but explicitly (i.e. redundant v.s. a "{ 0 }") "0" to
         "have_revs" early in cmd_pack_objects().
     
    -    This does add the future constraint to opt_parse_list_objects_filter()
    -    that we'll need to adjust this wrapper code if it looks at any other
    -    value of the "struct option" than the "value" member.
    -
    -    But that regression should be relatively easy to spot. I'm
    -    intentionally not initializing the "struct wrap" with e.g. "{ 0 }" so
    -    that various memory sanity checkers would spot that, we just
    -    initialize the "value" in po_filter_cb(). By doing this e.g. we'll die
    -    on e.g. this test if we were to use another member of "opt" in
    -    opt_parse_list_objects_filter()>
    -
    -        ./t5317-pack-objects-filter-objects.sh -vixd --valgrind-only=3
    -
         While we're at it add parentheses around the arguments to the OPT_*
         macros in in list-objects-filter-options.h, as we need to change those
         lines anyway. It doesn't matter in this case, but is good general
         practice.
     
         1. https://lore.kernel.org/git/619b757d98465dbc4995bdc11a5282fbfcbd3daa.1647970119.git.gitgitgadget@gmail.com
    -    2. https://lore.kernel.org/git/97de926904988b89b5663bd4c59c011a1723a8f5.1647970119.git.gitgitgadget@gmail.com/
    +    2. https://lore.kernel.org/git/97de926904988b89b5663bd4c59c011a1723a8f5.1647970119.git.gitgitgadget@gmail.com
    +    3. https://lore.kernel.org/git/patch-1.1-193534b0f07-20220325T121715Z-avarab@gmail.com/
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    @@ builtin/pack-objects.c: static int option_parse_unpack_unreachable(const struct
     +	struct rev_info revs;
     +};
     +
    -+static int po_filter_cb(const struct option *opt, const char *arg, int unset)
    ++static struct list_objects_filter_options *po_filter_revs_init(void *value)
     +{
    -+	struct po_filter_data *data = opt->value;
    -+	struct option wrap; /* don't initialize! */
    ++	struct po_filter_data *data = value;
     +
     +	repo_init_revisions(the_repository, &data->revs, NULL);
    -+	wrap.value = &data->revs.filter;
     +	data->have_revs = 1;
     +
    -+	return opt_parse_list_objects_filter(&wrap, arg, unset);
    ++	return &data->revs.filter;
     +}
     +
      int cmd_pack_objects(int argc, const char **argv, const char *prefix)
    @@ builtin/pack-objects.c: int cmd_pack_objects(int argc, const char **argv, const
      			      N_("write a bitmap index if possible"),
      			      WRITE_BITMAP_QUIET, PARSE_OPT_HIDDEN),
     -		OPT_PARSE_LIST_OBJECTS_FILTER(&revs.filter),
    -+		OPT_PARSE_LIST_OBJECTS_FILTER_CB(&pfd, po_filter_cb),
    ++		OPT_PARSE_LIST_OBJECTS_FILTER_INIT(&pfd, po_filter_revs_init),
      		OPT_CALLBACK_F(0, "missing", NULL, N_("action"),
      		  N_("handling for missing objects"), PARSE_OPT_NONEG,
      		  option_parse_missing_action),
    @@ builtin/pack-objects.c: int cmd_pack_objects(int argc, const char **argv, const
      	}
      	cleanup_preferred_base();
     
    + ## list-objects-filter-options.c ##
    +@@ list-objects-filter-options.c: int opt_parse_list_objects_filter(const struct option *opt,
    + 				  const char *arg, int unset)
    + {
    + 	struct list_objects_filter_options *filter_options = opt->value;
    ++	opt_lof_init init = (opt_lof_init)opt->defval;
    ++
    ++	if (init)
    ++		filter_options = init(opt->value);
    + 
    + 	if (unset || !arg)
    + 		list_objects_filter_set_no_filter(filter_options);
    +
      ## list-objects-filter-options.h ##
     @@ list-objects-filter-options.h: void parse_list_objects_filter(
    + 	struct list_objects_filter_options *filter_options,
    + 	const char *arg);
    + 
    ++/**
    ++ * The opt->value to opt_parse_list_objects_filter() is either a
    ++ * "struct list_objects_filter_option *" when using
    ++ * OPT_PARSE_LIST_OBJECTS_FILTER().
    ++ *
    ++ * Or, if using no "struct option" field is used by the callback,
    ++ * except the "defval" which is expected to be an "opt_lof_init"
    ++ * function, which is called with the "opt->value" and must return a
    ++ * pointer to the ""struct list_objects_filter_option *" to be used.
    ++ *
    ++ * The OPT_PARSE_LIST_OBJECTS_FILTER_INIT() can be used e.g. the
    ++ * "struct list_objects_filter_option" is embedded in a "struct
    ++ * rev_info", which the "defval" could be tasked with lazily
    ++ * initializing. See cmd_pack_objects() for an example.
    ++ */
      int opt_parse_list_objects_filter(const struct option *opt,
      				  const char *arg, int unset);
    ++typedef struct list_objects_filter_options *(*opt_lof_init)(void *);
    ++#define OPT_PARSE_LIST_OBJECTS_FILTER_INIT(fo, init) \
    ++	{ OPTION_CALLBACK, 0, "filter", (fo), N_("args"), \
    ++	  N_("object filtering"), 0, opt_parse_list_objects_filter, \
    ++	  (intptr_t)(init) }
      
    -+#define OPT_PARSE_LIST_OBJECTS_FILTER_CB(fo, cb) \
    -+	OPT_CALLBACK(0, "filter", (fo), N_("args"), \
    -+		     N_("object filtering"), (cb))
    -+
      #define OPT_PARSE_LIST_OBJECTS_FILTER(fo) \
     -	OPT_CALLBACK(0, "filter", fo, N_("args"), \
     -	  N_("object filtering"), \
     -	  opt_parse_list_objects_filter)
    -+	OPT_PARSE_LIST_OBJECTS_FILTER_CB((fo), opt_parse_list_objects_filter)
    ++	OPT_PARSE_LIST_OBJECTS_FILTER_INIT((fo), NULL)
      
      /*
       * Translates abbreviated numbers in the filter's filter_spec into their

 builtin/pack-objects.c        | 28 +++++++++++++++++++++++-----
 list-objects-filter-options.c |  4 ++++
 list-objects-filter-options.h | 24 +++++++++++++++++++++---
 3 files changed, 48 insertions(+), 8 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index d39f668ad56..09680fb6a8b 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3857,6 +3857,21 @@ static int option_parse_unpack_unreachable(const struct option *opt,
 	return 0;
 }
 
+struct po_filter_data {
+	unsigned have_revs:1;
+	struct rev_info revs;
+};
+
+static struct list_objects_filter_options *po_filter_revs_init(void *value)
+{
+	struct po_filter_data *data = value;
+
+	repo_init_revisions(the_repository, &data->revs, NULL);
+	data->have_revs = 1;
+
+	return &data->revs.filter;
+}
+
 int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 {
 	int use_internal_rev_list = 0;
@@ -3867,7 +3882,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	int rev_list_index = 0;
 	int stdin_packs = 0;
 	struct string_list keep_pack_list = STRING_LIST_INIT_NODUP;
-	struct rev_info revs;
+	struct po_filter_data pfd = { .have_revs = 0 };
 
 	struct option pack_objects_options[] = {
 		OPT_SET_INT('q', "quiet", &progress,
@@ -3954,7 +3969,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 			      &write_bitmap_index,
 			      N_("write a bitmap index if possible"),
 			      WRITE_BITMAP_QUIET, PARSE_OPT_HIDDEN),
-		OPT_PARSE_LIST_OBJECTS_FILTER(&revs.filter),
+		OPT_PARSE_LIST_OBJECTS_FILTER_INIT(&pfd, po_filter_revs_init),
 		OPT_CALLBACK_F(0, "missing", NULL, N_("action"),
 		  N_("handling for missing objects"), PARSE_OPT_NONEG,
 		  option_parse_missing_action),
@@ -3973,8 +3988,6 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 
 	read_replace_refs = 0;
 
-	repo_init_revisions(the_repository, &revs, NULL);
-
 	sparse = git_env_bool("GIT_TEST_PACK_SPARSE", -1);
 	if (the_repository->gitdir) {
 		prepare_repo_settings(the_repository);
@@ -4076,7 +4089,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	if (!rev_list_all || !rev_list_reflog || !rev_list_index)
 		unpack_unreachable_expiration = 0;
 
-	if (revs.filter.choice) {
+	if (pfd.have_revs && pfd.revs.filter.choice) {
 		if (!pack_to_stdout)
 			die(_("cannot use --filter without --stdout"));
 		if (stdin_packs)
@@ -4152,7 +4165,12 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 			add_unreachable_loose_objects();
 	} else if (!use_internal_rev_list) {
 		read_object_list_from_stdin();
+	} else if (pfd.have_revs) {
+		get_object_list(&pfd.revs, rp.nr, rp.v);
 	} else {
+		struct rev_info revs;
+
+		repo_init_revisions(the_repository, &revs, NULL);
 		get_object_list(&revs, rp.nr, rp.v);
 	}
 	cleanup_preferred_base();
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index f02d8df1422..4b25287886d 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -285,6 +285,10 @@ int opt_parse_list_objects_filter(const struct option *opt,
 				  const char *arg, int unset)
 {
 	struct list_objects_filter_options *filter_options = opt->value;
+	opt_lof_init init = (opt_lof_init)opt->defval;
+
+	if (init)
+		filter_options = init(opt->value);
 
 	if (unset || !arg)
 		list_objects_filter_set_no_filter(filter_options);
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index 90e4bc96252..ffc02d77e76 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -104,13 +104,31 @@ void parse_list_objects_filter(
 	struct list_objects_filter_options *filter_options,
 	const char *arg);
 
+/**
+ * The opt->value to opt_parse_list_objects_filter() is either a
+ * "struct list_objects_filter_option *" when using
+ * OPT_PARSE_LIST_OBJECTS_FILTER().
+ *
+ * Or, if using no "struct option" field is used by the callback,
+ * except the "defval" which is expected to be an "opt_lof_init"
+ * function, which is called with the "opt->value" and must return a
+ * pointer to the ""struct list_objects_filter_option *" to be used.
+ *
+ * The OPT_PARSE_LIST_OBJECTS_FILTER_INIT() can be used e.g. the
+ * "struct list_objects_filter_option" is embedded in a "struct
+ * rev_info", which the "defval" could be tasked with lazily
+ * initializing. See cmd_pack_objects() for an example.
+ */
 int opt_parse_list_objects_filter(const struct option *opt,
 				  const char *arg, int unset);
+typedef struct list_objects_filter_options *(*opt_lof_init)(void *);
+#define OPT_PARSE_LIST_OBJECTS_FILTER_INIT(fo, init) \
+	{ OPTION_CALLBACK, 0, "filter", (fo), N_("args"), \
+	  N_("object filtering"), 0, opt_parse_list_objects_filter, \
+	  (intptr_t)(init) }
 
 #define OPT_PARSE_LIST_OBJECTS_FILTER(fo) \
-	OPT_CALLBACK(0, "filter", fo, N_("args"), \
-	  N_("object filtering"), \
-	  opt_parse_list_objects_filter)
+	OPT_PARSE_LIST_OBJECTS_FILTER_INIT((fo), NULL)
 
 /*
  * Translates abbreviated numbers in the filter's filter_spec into their
-- 
2.35.1.1541.g9c2d54e20ab


  parent reply	other threads:[~2022-03-28 15:43 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-22 17:28 [PATCH 0/5] Partial bundle follow ups Derrick Stolee via GitGitGadget
2022-03-22 17:28 ` [PATCH 1/5] list-objects-filter: remove CL_ARG__FILTER Derrick Stolee via GitGitGadget
2022-03-22 17:28 ` [PATCH 2/5] pack-objects: move revs out of get_object_list() Derrick Stolee via GitGitGadget
2022-03-22 17:28 ` [PATCH 3/5] pack-objects: parse --filter directly into revs.filter Derrick Stolee via GitGitGadget
2022-03-22 19:37   ` [-SPAM-] " Ramsay Jones
2022-03-23 13:48     ` Derrick Stolee
2022-03-22 21:15   ` Ævar Arnfjörð Bjarmason
2022-03-22 17:28 ` [PATCH 4/5] bundle: move capabilities to end of 'verify' Derrick Stolee via GitGitGadget
2022-03-23  7:08   ` Bagas Sanjaya
2022-03-23 13:39     ` Derrick Stolee
2022-03-22 17:28 ` [PATCH 5/5] bundle: output hash information in 'verify' Derrick Stolee via GitGitGadget
2022-03-23 21:27 ` [PATCH 0/5] Partial bundle follow ups Junio C Hamano
2022-03-25 14:25 ` [PATCH] pack-objects: lazily set up "struct rev_info", don't leak Ævar Arnfjörð Bjarmason
2022-03-25 14:57   ` Derrick Stolee
2022-03-25 16:00     ` Ævar Arnfjörð Bjarmason
2022-03-25 16:41       ` Derrick Stolee
2022-03-25 17:34         ` Ævar Arnfjörð Bjarmason
2022-03-25 19:08           ` Derrick Stolee
2022-03-26  0:52             ` Ævar Arnfjörð Bjarmason
2022-03-28 14:04               ` Derrick Stolee
2022-03-25 18:53   ` Junio C Hamano
2022-03-26  1:09     ` Ævar Arnfjörð Bjarmason
2022-03-28 15:43   ` Ævar Arnfjörð Bjarmason [this message]
2022-03-28 15:58     ` [PATCH v2] " Derrick Stolee
2022-03-28 17:10     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=patch-v2-1.1-9951d92176e-20220328T154049Z-avarab@gmail.com \
    --to=avarab@gmail.com \
    --cc=bagasdotme@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.