From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Derrick Stolee <derrickstolee@github.com>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Bagas Sanjaya <bagasdotme@gmail.com>
Subject: Re: [PATCH] pack-objects: lazily set up "struct rev_info", don't leak
Date: Sat, 26 Mar 2022 01:52:42 +0100
Message-ID: <220326.865yo1leay.gmgdl@evledraar.gmail.com>
In-Reply-To: <d90bb9c8-3155-ca5f-8363-154876a7ad0a@github.com>


On Fri, Mar 25 2022, Derrick Stolee wrote:

> On 3/25/2022 1:34 PM, Ævar Arnfjörð Bjarmason wrote:
>> 
>> On Fri, Mar 25 2022, Derrick Stolee wrote:
>> 
>>> On 3/25/2022 12:00 PM, Ævar Arnfjörð Bjarmason wrote:
>>>>> +struct rev_info_maybe_empty {
>>>>> +	int has_revs;
>>>>> +	struct rev_info revs;
>>>>> +};
>>>
>>> Thinking about this a second time, perhaps it would be best to add
>>> an "unsigned initialized:1;" to struct rev_info so we can look at
>>> such a struct and know whether repo_init_revisions() has been run.
>>> Avoids the custom struct and unifies a few things.
>>>
>>> In particular, release_revisions() could choose to do nothing if
>>> revs->initialized is false.
>> 
>> This plan won't work because that behavior is both undefined per the
>> standard, and something that's wildly undefined in practice.
>> 
>> I.e. we initialize it on the stack, so it'll point to uninitialized
>> memory, sometimes that bit will be 0, sometimes 1...
>> 
>> If you mean just initialize it to { 0 } or whatever that would work,
>> yes, but if we're going to refactor all the callers to do that we might
>> as well refactor the few missing bits that would be needed to initialize
>> it statically, and drop the dynamic by default initialization...
>
> Yes, I was assuming that we initialize all structs to all-zero,
> but the existing failure to do this would make such a change too
> large for this issue.

I don't see how that wouldn't be a regression on the upthread patch.
Yes, we could of course initialize it, but the whole point of not doing
so was to have our tooling detect when downstream code assumes it can
start using a struct member we haven't filled in.

By initializing it we'll never know.

But yes, if you consider that a non-goal then init to "{ 0 }" makes the
most sense.
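
As a minimal sketch of the "{ 0 }" plus "initialized" bit idea discussed
above: the toy_* names below are hypothetical stand-ins, not git's real
"struct rev_info", repo_init_revisions() or release_revisions(). The
sketch only shows why the flag is reliable solely when every caller
zero-initializes the struct, and that doing so also means a tool that
flags reads of uninitialized memory has nothing left to report.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for "struct rev_info"; not git's real layout. */
struct toy_revs {
	unsigned initialized:1;
	char *pending; /* heap data owned by the struct once initialized */
};

/* Stand-in for repo_init_revisions(): allocates and marks as set up. */
static void toy_init_revs(struct toy_revs *revs)
{
	revs->pending = calloc(1, 16);
	assert(revs->pending);
	revs->initialized = 1;
}

/*
 * Stand-in for release_revisions(): does nothing unless init ran.
 * Returns 1 if something was freed, 0 otherwise. This is only safe
 * if the caller zero-initialized the struct in the first place.
 */
static int toy_release_revs(struct toy_revs *revs)
{
	if (!revs->initialized)
		return 0;
	free(revs->pending);
	memset(revs, 0, sizeof(*revs));
	return 1;
}

/* Returns how many release calls actually freed something. */
static int toy_demo(int want_revs)
{
	struct toy_revs revs = { 0 }; /* without this the flag is garbage */
	int freed = 0;

	if (want_revs)
		toy_init_revs(&revs);
	freed += toy_release_revs(&revs);
	freed += toy_release_revs(&revs); /* second call is a no-op */
	return freed;
}
```

Dropping the "= { 0 }" in toy_demo() would make the first
toy_release_revs() call read an indeterminate bit, which is the failure
mode described in the quoted text.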

>> But FWIW I think a much more obvious thing to do overall would be to
>> skip the whole "filter must be in rev_info" refactoring part of your
>> series and just add a trivial list_objects_filter_copy_attach() method,
>> or do it inline with memcpy/memset.
>> 
>> I.e. to not touch the "filter" etc. callback stuff at all, still pass it
>> to get_object_list(). Can't 2/5 and 3/5 in your series be replaced by
>> this simpler and smaller change?:
>
>> 	-	list_objects_filter_copy(&revs.filter, &filter_options);
>> 	+	/* attach our CLI --filter to rev_info's filter */
>> 	+	memcpy(&revs.filter, filter, sizeof(*filter));
>> 	+	memset(filter, 0, sizeof(*filter));
>
> Here, you are replacing a deep copy with a shallow copy. After this,
> freeing the arrays within revs.filter would cause a double-free when
> freeing the arrays in the original filter_options.

Yes, and that's what we want, right? I.e. we don't want a copy, but to
use the &filter for parse_options(), then once that's populated we
shallow-copy that to "struct rev_info"'s "filter", and forget about our
own copy (i.e. the memset there is redundant, but just a "let's not use
this again" marker).

Of course this will leak now, but once merged with my
release_revisions() patch it will work, and we'll free what we
allocated (once!).
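
The memcpy() + memset() "attach" described above can be sketched in
isolation. Everything named toy_* is a hypothetical stand-in (the real
"struct list_objects_filter_options" has more members); the point is
only that after the shallow copy exactly one struct owns the
allocation, so releasing both ends up freeing it once.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for "struct list_objects_filter_options". */
struct toy_filter {
	char *spec; /* heap-allocated, owned by whoever holds the struct */
};

/* Move ownership: shallow-copy src into dst, then clear src. */
static void toy_filter_attach(struct toy_filter *dst, struct toy_filter *src)
{
	memcpy(dst, src, sizeof(*dst));
	memset(src, 0, sizeof(*src)); /* "let's not use this again" marker */
}

/* Returns 1 if there was something to free, 0 otherwise. */
static int toy_filter_release(struct toy_filter *f)
{
	int had = f->spec != NULL;

	free(f->spec);
	f->spec = NULL;
	return had;
}

/* Returns the number of allocations actually freed. */
static int toy_attach_demo(void)
{
	static const char spec[] = "blob:none";
	struct toy_filter cli = { 0 }, revs_filter = { 0 };
	int frees = 0;

	/* Pretend parse_options() populated the CLI-side struct. */
	cli.spec = malloc(sizeof(spec));
	assert(cli.spec);
	memcpy(cli.spec, spec, sizeof(spec));

	toy_filter_attach(&revs_filter, &cli);
	assert(!cli.spec); /* the source no longer owns anything */

	frees += toy_filter_release(&revs_filter);
	frees += toy_filter_release(&cli); /* harmless: nothing left */
	return frees;
}
```

Without the memset() the second release would free the same pointer
again, which is the double-free concern raised in the quoted reply.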

> If you went this way, then you could do a s/&filter_options/filter/
> in the existing line.
>
>> 	 	/* make sure shallows are read */
>> 	 	is_repository_shallow(the_repository);
>> 	@@ -3872,6 +3873,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>> 	 	int rev_list_index = 0;
>> 	 	int stdin_packs = 0;
>> 	 	struct string_list keep_pack_list = STRING_LIST_INIT_NODUP;
>> 	+	struct list_objects_filter_options filter_options = { 0 };
>> 	 	struct option pack_objects_options[] = {
>> 	 		OPT_SET_INT('q', "quiet", &progress,
>> 	 			    N_("do not show progress meter"), 0),
>> 	@@ -4154,7 +4156,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>> 	 	} else if (!use_internal_rev_list) {
>> 	 		read_object_list_from_stdin();
>> 	 	} else {
>> 	-		get_object_list(rp.nr, rp.v);
>> 	+		get_object_list(rp.nr, rp.v, &filter_options);
>> 	 	}
>> 	 	cleanup_preferred_base();
>> 	 	if (include_tag && nr_result)
>> 
>> And even most of that could be omitted by not removing the global
>> "static struct" since pack-objects is a one-off anyway ... :)
>
> Even if you fix the deep/shallow copy above, you still need to
> clean up the filter in two places.

If you "fix" the shallow copying you need to free it twice, but if you
don't you free it once.

I.e. this is conceptually the same as strbuf_detach() + strbuf_attach().
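
For contrast, a rough sketch of the deep-copy alternative, again with
hypothetical toy_* names rather than git's real
list_objects_filter_copy(): each side gets its own allocation, so both
have to be released, i.e. two frees instead of one.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical options struct with a single owned string. */
struct toy_opts {
	char *spec;
};

/* Deep copy: dst gets its own allocation; src stays valid. */
static void toy_opts_copy(struct toy_opts *dst, const struct toy_opts *src)
{
	size_t len = strlen(src->spec) + 1;

	dst->spec = malloc(len);
	assert(dst->spec);
	memcpy(dst->spec, src->spec, len);
}

/* Returns 1 if there was something to free, 0 otherwise. */
static int toy_opts_free(struct toy_opts *o)
{
	int had = o->spec != NULL;

	free(o->spec);
	o->spec = NULL;
	return had;
}

/* Returns the number of allocations that had to be freed. */
static int toy_copy_demo(void)
{
	static const char spec[] = "blob:none";
	struct toy_opts cli = { 0 }, revs_filter = { 0 };
	int frees = 0;

	cli.spec = malloc(sizeof(spec));
	assert(cli.spec);
	memcpy(cli.spec, spec, sizeof(spec));

	toy_opts_copy(&revs_filter, &cli);
	assert(cli.spec != revs_filter.spec); /* two distinct buffers */

	frees += toy_opts_free(&revs_filter);
	frees += toy_opts_free(&cli); /* the original must be freed too */
	return frees;
}
```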

But maybe I'm missing something...

(If I am it's rather worrying that it passed all our tests, both in your
series + merged with the release_revisions() series).
