git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jeff King <peff@peff.net>
Cc: "Andrzej Hunt" <andrzej@ahunt.org>,
	git@vger.kernel.org, "Junio C Hamano" <gitster@pobox.com>,
	"Lénaïc Huard" <lenaic@lhuard.fr>,
	"Derrick Stolee" <dstolee@microsoft.com>,
	"Felipe Contreras" <felipe.contreras@gmail.com>,
	"SZEDER Gábor" <szeder.dev@gmail.com>,
	"Đoàn Trần Công Danh" <congdanhqx@gmail.com>,
	"Eric Sunshine" <sunshine@sunshineco.com>,
	"Elijah Newren" <newren@gmail.com>
Subject: Re: [PATCH v2 2/4] SANITIZE tests: fix memory leaks in t13*config*, add to whitelist
Date: Fri, 16 Jul 2021 09:46:33 +0200	[thread overview]
Message-ID: <87wnpqy8zd.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <YPCrvOce5qRWk6Rq@coredump.intra.peff.net>


On Thu, Jul 15 2021, Jeff King wrote:

> On Wed, Jul 14, 2021 at 08:57:37PM +0200, Andrzej Hunt wrote:
>
>> > @@ -1331,8 +1336,10 @@ static int git_default_core_config(const char *var, const char *value, void *cb)
>> >   	if (!strcmp(var, "core.attributesfile"))
>> >   		return git_config_pathname(&git_attributes_file, var, value);
>> > -	if (!strcmp(var, "core.hookspath"))
>> > +	if (!strcmp(var, "core.hookspath")) {
>> > +		UNLEAK(git_hooks_path);
>> >   		return git_config_pathname(&git_hooks_path, var, value);
>> > +	}
>> 
>> Why is the UNLEAK necessary here? We generally want to limit use of UNLEAK
>> to cmd_* functions or direct helpers. git_default_core_config() seems
>> generic enough that it could be called from anywhere, and using UNLEAK here
>> means we're potentially masking a real leak?
>> 
>> IIUC the leak here happens because:
>> - git_hooks_path is a global variable - hence it's unlikely we'd ever
>>   bother cleaning it up.
>> - git_default_core_config() gets called a first time with
>>   core.hookspath, and we end up allocating new memory into
>>   git_hooks_path.
>> - git_default_core_config() gets called again with core.hookspath,
>>   and we overwrite git_hooks_path with a new string which leaks
>>   the string that git_hooks_path used to point to.
>> 
>> So I think the real fix is to free(git_hooks_path) instead of an UNLEAK?
>> (Looking at the surrounding code, it looks like the same pattern of leak
>> might be repeated for other similar globals - is it worth auditing those
>> while we're here?)
>
> This is a common leak pattern in Git. We do something like:
>
>   static const char *foo = "default";
>   ...
>   int config_cb(const char *var, const char *value, void *)
>   {
>           if (!strcmp(var, "core.foo"))
> 	          foo = xstrdup(value);
>   }
>
> So we leak if the variable appears twice. But we can't just call
> "free(foo)" here. In the first call, it's pointing to a string literal!
>
> In the case of git_hooks_path, it defaults to NULL, so this works out
> OK. But it's setting up a trap for somebody later on, who assigns it a
> default value (and the compiler won't help; it's a "const char *", so
> the assignment is fine, and the free() would already be casting away the
> constness).
>
> I see a few possible solutions:
>
>   - instead of strdup'ing long-lived config values, strintern() them.
>     This is really leaking them, but in a way that we hold on to the old
>     values. This is actually more or less what UNLEAK() is doing under
>     the hood (saving a reference to the old buffer, even the variable is
>     overwritten).
>
>   - find a way to tell when a string comes from the heap versus a
>     literal. I don't think you can do this portably without keeping your
>     own separate flag. We could abstract away some of the pain with a
>     struct like:
>
>        struct def_string {
>                /* might point to heap memory; const because you must
>                 * check flag before modifying */
>                const char *value;
>                int from_heap;
>        }
>
>        /* regular static initialization is OK if you don't want a default */
>        #define DEF_STRING_INIT(str) { .value = str }
>
>        static void def_string_set(struct def_string *ds, const char *value)
>        {
>                if (ds->from_heap)
>                        free(ds->value);
>                ds->value = xstrdup(value);
>                ds->from_heap = 1;
>        }
>
>     The annoying thing is all of the users need to refer to
>     git_hook_path.value instead of just git_hook_path. If you don't mind
>     a little macro hackery, we could get around that by declaring pairs
>     of variables. Like:
>
>       #define DEF_STRING_DECLARE(name, value) \
>       const char *name = value; \
>       int name##_from_heap
>
>       #define DEF_STRING_SET(name, value) do { \
>               if (name##_from_heap) \
>                       free(name); \
>               name = xstrdup(value); \
>               name##_from_heap = 1; \
>       } while(0)
>
> I can't say I _love_ any of that, but I think it would work (and
> probably we'd adapt our helpers like git_config_pathname() to take a
> def_string. Or I guess just have a def_string_free() which can be called
> before writing into them).
>
> But maybe there's a better solution I'm missing.

Instead of: "int from_heap" in your "def_string" I think we should just
use "struct string_list_item". I.e. you want a void* here. Why?

<Digression>

I have an unsent series for handling some more common cases in the
string-list API. I started writing it due to a very related problem,
i.e. that we conflate "string init dup/nodup" with "do we want to
free?".

We (ab)use the "strdup_strings" in a few places to free that sort of
thing at the end if we have heap-allocated strings, but ones we did not
strdup ourselves, e.g. this in merge-ort.c (not picking on Elijah (CC'd)
here, it's common in lots of places, and this one was pretty much lifted
from merge-recursive).

        opti->paths_to_free.strdup_strings = 1;
        string_list_clear(&opti->paths_to_free, 0);
        opti->paths_to_free.strdup_strings = 0;

So I improved the string-list and strmap free functions so you can
instead do:

    string_list_clear_strings((&opti->paths_to_free, 0);

And that along with some other changes allows you to clear (or not) any
combination of the string, util, or have a callback function of your own
run (but be ensured to run all of those before we get to any of the
other freeing).

</Digression>

You must be thinking what any of this has to do with heap strings in C,
well one common case you've not discussed is that we sometimes do the
equivalent of, with string-list.h or not (somewhat pseudocode);

	void add_to_list(struct string_list *list, char *on_heap_now_we_own_it)
	{
		char *ptr = on_heap_now_we_own_it;
		char *mydup = xstrdup("foo");

	        ptr++; /* skip first byte */
		string_list_append(list, ptr);
		string_list_append(list, mydup);
	}

And:

        struct string_list list = STRING_LIST_INIT_NODUP;
        /* other stuff here, we get strings from somewhere etc. */
        add_to_list(list, some_string);

So now you're left with needing to free both at the end, but we since we
did ptr++ there we can't free() that (we'd need to free(ptr - 1), but
how to keep track of that?).

Well, tying this back to my clear() improvements for string-list.h I
thought a really neat solution to this was:

    string_list_append(list, ptr)->util = on_heap_now_we_own_it;
    string_list_append(list, mydup)->util = mydup;

I.e. by convention we store the pointer we need to free (if any) in the
"util" field.

And then if you get a string not from the heap you just leave the "util"
as NULL, and at the end you just free() all your "util" fields, and it
just so happens that some of them are the same as the "string" field.

We're not in the habit of passing loose "string_list_item" around now,
but I don't see why we wouldn't (possibly with a change to extract that
bit out, so we could use it in other places).

The neat thing about doing this is also that you're not left with every
API boundary needing to deal with your new "def_string", a lot of them
use string_list already, and hardly need to change anything, to the
extent that we do need to change anything having a "void *util" is a lot
more generally usable. You end up getting memory management for free as
you gain a feature to pass arbitrary data along with your items.

  parent reply	other threads:[~2021-07-16  8:10 UTC|newest]

Thread overview: 125+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-09 14:38 UNLEAK(), leak checking in the default tests etc Ævar Arnfjörð Bjarmason
2021-06-09 17:44 ` Andrzej Hunt
2021-06-09 20:36   ` Felipe Contreras
2021-06-10 10:46   ` Jeff King
2021-06-10 10:56   ` Ævar Arnfjörð Bjarmason
2021-06-10 13:38     ` Jeff King
2021-06-10 15:32       ` Andrzej Hunt
2021-06-10 16:36         ` Jeff King
2021-06-11 15:44           ` Andrzej Hunt
2021-06-10 19:01 ` SZEDER Gábor
2021-07-14  0:11 ` [PATCH 0/4] add a test mode for SANITIZE=leak, run it in CI Ævar Arnfjörð Bjarmason
2021-07-14  0:11   ` [PATCH 1/4] tests: " Ævar Arnfjörð Bjarmason
2021-07-14  3:23     ` Đoàn Trần Công Danh
2021-07-14  0:11   ` [PATCH 2/4] SANITIZE tests: fix memory leaks in t13*config*, add to whitelist Ævar Arnfjörð Bjarmason
2021-07-14  0:11   ` [PATCH 3/4] SANITIZE tests: fix memory leaks in t5701*, " Ævar Arnfjörð Bjarmason
2021-07-14  0:11   ` [PATCH 4/4] SANITIZE tests: fix leak in mailmap.c Ævar Arnfjörð Bjarmason
2021-07-14  2:19     ` Eric Sunshine
2021-07-14 17:23   ` [PATCH v2 0/4] add a test mode for SANITIZE=leak, run it in CI Ævar Arnfjörð Bjarmason
2021-07-14 17:23     ` [PATCH v2 1/4] tests: " Ævar Arnfjörð Bjarmason
2021-07-14 18:42       ` Andrzej Hunt
2021-07-14 22:39         ` Ævar Arnfjörð Bjarmason
2021-07-15 21:14         ` Jeff King
2021-07-15 21:06       ` Jeff King
2021-07-16 14:46         ` Ævar Arnfjörð Bjarmason
2021-07-16 18:09           ` Jeff King
2021-07-16 18:45             ` Jeff King
2021-07-16 18:56             ` Ævar Arnfjörð Bjarmason
2021-07-16 19:22               ` Jeff King
2021-07-14 17:23     ` [PATCH v2 2/4] SANITIZE tests: fix memory leaks in t13*config*, add to whitelist Ævar Arnfjörð Bjarmason
2021-07-14 18:57       ` Andrzej Hunt
2021-07-14 22:56         ` Ævar Arnfjörð Bjarmason
2021-07-15 21:42         ` Jeff King
2021-07-16  5:18           ` Andrzej Hunt
2021-07-16 21:20             ` Jeff King
2021-07-16  7:46           ` Ævar Arnfjörð Bjarmason [this message]
2021-07-16 21:16             ` Jeff King
2021-08-31 12:47               ` Ævar Arnfjörð Bjarmason
2021-09-01  7:53                 ` Jeff King
2021-09-01 11:45                   ` Ævar Arnfjörð Bjarmason
2021-07-14 17:23     ` [PATCH v2 3/4] SANITIZE tests: fix memory leaks in t5701*, " Ævar Arnfjörð Bjarmason
2021-07-15 17:37       ` Andrzej Hunt
2021-07-15 21:43       ` Jeff King
2021-08-31 13:46       ` [PATCH] protocol-caps.c: fix memory leak in send_info() Ævar Arnfjörð Bjarmason
2021-08-31 15:32         ` Bruno Albuquerque
2021-08-31 18:15           ` Junio C Hamano
     [not found]         ` <CAPeR6H69a_HMwWnpHzssaCm_ow=ic7AnzMdZVQJQ2ECRDaWzaA@mail.gmail.com>
2021-08-31 20:08           ` Ævar Arnfjörð Bjarmason
2021-07-14 17:23     ` [PATCH v2 4/4] SANITIZE tests: fix leak in mailmap.c Ævar Arnfjörð Bjarmason
2021-08-31 13:42       ` [PATCH] mailmap.c: fix a memory leak in free_mailap_{info,entry}() Ævar Arnfjörð Bjarmason
2021-08-31 16:22         ` Eric Sunshine
2021-08-31 19:38         ` Jeff King
2021-08-31 19:46           ` Junio C Hamano
2021-07-15 17:37     ` [PATCH v2 0/4] add a test mode for SANITIZE=leak, run it in CI Andrzej Hunt
2021-08-31 13:35     ` [PATCH v3 0/8] " Ævar Arnfjörð Bjarmason
2021-09-01  9:56       ` Jeff King
2021-09-01 10:42         ` Jeff King
2021-09-02 12:25         ` Ævar Arnfjörð Bjarmason
2021-09-03 11:13           ` Jeff King
2021-09-07 15:33       ` [PATCH v4 0/3] " Ævar Arnfjörð Bjarmason
2021-09-07 15:33         ` [PATCH v4 1/3] Makefile: add SANITIZE=leak flag to GIT-BUILD-OPTIONS Ævar Arnfjörð Bjarmason
2021-09-07 15:33         ` [PATCH v4 2/3] CI: refactor "if" to "case" statement Ævar Arnfjörð Bjarmason
2021-09-07 15:33         ` [PATCH v4 3/3] tests: add a test mode for SANITIZE=leak, run it in CI Ævar Arnfjörð Bjarmason
2021-09-07 16:29           ` Eric Sunshine
2021-09-07 16:51           ` Jeff King
2021-09-07 16:44         ` [PATCH v4 0/3] " Jeff King
2021-09-07 18:22           ` Junio C Hamano
2021-09-07 21:30         ` [PATCH v5 " Ævar Arnfjörð Bjarmason
2021-09-07 21:30           ` [PATCH v5 1/3] Makefile: add SANITIZE=leak flag to GIT-BUILD-OPTIONS Ævar Arnfjörð Bjarmason
2021-09-07 21:30           ` [PATCH v5 2/3] CI: refactor "if" to "case" statement Ævar Arnfjörð Bjarmason
2021-09-07 21:30           ` [PATCH v5 3/3] tests: add a test mode for SANITIZE=leak, run it in CI Ævar Arnfjörð Bjarmason
2021-09-08  4:46             ` Eric Sunshine
2021-09-16  3:56             ` [PATCH] fixup! " Carlo Marcelo Arenas Belón
2021-09-16  6:14               ` Ævar Arnfjörð Bjarmason
2021-09-08 11:02           ` [PATCH v5 0/3] " Junio C Hamano
2021-09-08 12:03             ` Ævar Arnfjörð Bjarmason
2021-09-09 23:10               ` Emily Shaffer
2021-09-16 10:48           ` [PATCH v6 0/2] " Ævar Arnfjörð Bjarmason
2021-09-16 10:48             ` [PATCH v6 1/2] Makefile: add SANITIZE=leak flag to GIT-BUILD-OPTIONS Ævar Arnfjörð Bjarmason
2021-09-16 10:48             ` [PATCH v6 2/2] tests: add a test mode for SANITIZE=leak, run it in CI Ævar Arnfjörð Bjarmason
2021-09-19  8:03             ` [PATCH v7 0/2] " Ævar Arnfjörð Bjarmason
2021-09-19  8:03               ` [PATCH v7 1/2] Makefile: add SANITIZE=leak flag to GIT-BUILD-OPTIONS Ævar Arnfjörð Bjarmason
2021-09-19  8:03               ` [PATCH v7 2/2] tests: add a test mode for SANITIZE=leak, run it in CI Ævar Arnfjörð Bjarmason
2021-09-22 11:17                 ` [PATCH] fixup! " Carlo Marcelo Arenas Belón
2021-09-23  1:50                   ` Ævar Arnfjörð Bjarmason
2021-09-23  9:20               ` [PATCH v8 0/2] " Ævar Arnfjörð Bjarmason
2021-09-23  9:20                 ` [PATCH v8 1/2] Makefile: add SANITIZE=leak flag to GIT-BUILD-OPTIONS Ævar Arnfjörð Bjarmason
2021-09-23  9:20                 ` [PATCH v8 2/2] tests: add a test mode for SANITIZE=leak, run it in CI Ævar Arnfjörð Bjarmason
2021-11-03 22:44                   ` Re* " Junio C Hamano
2021-11-03 23:57                     ` Junio C Hamano
2021-11-04 10:06                     ` Ævar Arnfjörð Bjarmason
2021-11-16 18:31                       ` [PATCH] t0006: date_mode can leak .strftime_fmt member Ævar Arnfjörð Bjarmason
2021-11-16 19:04                         ` Junio C Hamano
2021-11-16 19:31                         ` Jeff King
2022-02-02 21:03                           ` [PATCH 0/5] date.[ch] API: split from cache.h, add API docs, stop leaking memory Ævar Arnfjörð Bjarmason
2022-02-02 21:03                             ` [PATCH 1/5] cache.h: remove always unused show_date_human() declaration Ævar Arnfjörð Bjarmason
2022-02-02 21:03                             ` [PATCH 2/5] date API: create a date.h, split from cache.h Ævar Arnfjörð Bjarmason
2022-02-02 21:19                               ` Ævar Arnfjörð Bjarmason
2022-02-15  3:04                               ` Junio C Hamano
2022-02-02 21:03                             ` [PATCH 3/5] date API: provide and use a DATE_MODE_INIT Ævar Arnfjörð Bjarmason
2022-02-02 21:03                             ` [PATCH 4/5] date API: add basic API docs Ævar Arnfjörð Bjarmason
2022-02-15  2:14                               ` Junio C Hamano
2022-02-02 21:03                             ` [PATCH 5/5] date API: add and use a date_mode_release() Ævar Arnfjörð Bjarmason
2022-02-15  0:28                               ` Junio C Hamano
2022-02-04 23:53                             ` [PATCH v2 0/5] date.[ch] API: split from cache.h, add API docs, stop leaking memory Ævar Arnfjörð Bjarmason
2022-02-04 23:53                               ` [PATCH v2 1/5] cache.h: remove always unused show_date_human() declaration Ævar Arnfjörð Bjarmason
2022-02-04 23:53                               ` [PATCH v2 2/5] date API: create a date.h, split from cache.h Ævar Arnfjörð Bjarmason
2022-02-04 23:53                               ` [PATCH v2 3/5] date API: provide and use a DATE_MODE_INIT Ævar Arnfjörð Bjarmason
2022-02-04 23:53                               ` [PATCH v2 4/5] date API: add basic API docs Ævar Arnfjörð Bjarmason
2022-02-04 23:53                               ` [PATCH v2 5/5] date API: add and use a date_mode_release() Ævar Arnfjörð Bjarmason
2022-02-14 17:25                               ` [PATCH v2 0/5] date.[ch] API: split from cache.h, add API docs, stop leaking memory Ævar Arnfjörð Bjarmason
2022-02-14 19:52                                 ` Junio C Hamano
2022-02-16  8:14                               ` [PATCH v3 " Ævar Arnfjörð Bjarmason
2022-02-16  8:14                                 ` [PATCH v3 1/5] cache.h: remove always unused show_date_human() declaration Ævar Arnfjörð Bjarmason
2022-02-16  8:14                                 ` [PATCH v3 2/5] date API: create a date.h, split from cache.h Ævar Arnfjörð Bjarmason
2022-02-16  8:14                                 ` [PATCH v3 3/5] date API: provide and use a DATE_MODE_INIT Ævar Arnfjörð Bjarmason
2022-02-16  8:14                                 ` [PATCH v3 4/5] date API: add basic API docs Ævar Arnfjörð Bjarmason
2022-02-16  8:14                                 ` [PATCH v3 5/5] date API: add and use a date_mode_release() Ævar Arnfjörð Bjarmason
2022-02-16 17:45                                 ` [PATCH v3 0/5] date.[ch] API: split from cache.h, add API docs, stop leaking memory Junio C Hamano
     [not found]     ` <cover-v3-0.8-00000000000-20210831T132607Z-avarab@gmail.com>
2021-08-31 13:35       ` [PATCH v3 1/8] Makefile: add SANITIZE=leak flag to GIT-BUILD-OPTIONS Ævar Arnfjörð Bjarmason
2021-08-31 13:35       ` [PATCH v3 2/8] CI: refactor "if" to "case" statement Ævar Arnfjörð Bjarmason
2021-08-31 13:35       ` [PATCH v3 3/8] tests: add a test mode for SANITIZE=leak, run it in CI Ævar Arnfjörð Bjarmason
2021-08-31 13:35       ` [PATCH v3 4/8] tests: annotate t000*.sh with TEST_PASSES_SANITIZE_LEAK=true Ævar Arnfjörð Bjarmason
2021-08-31 13:35       ` [PATCH v3 5/8] tests: annotate t001*.sh " Ævar Arnfjörð Bjarmason
2021-08-31 13:35       ` [PATCH v3 6/8] tests: annotate t002*.sh " Ævar Arnfjörð Bjarmason
2021-08-31 13:35       ` [PATCH v3 7/8] tests: annotate select t0*.sh " Ævar Arnfjörð Bjarmason
2021-08-31 13:35       ` [PATCH v3 8/8] tests: annotate select t*.sh " Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87wnpqy8zd.fsf@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=andrzej@ahunt.org \
    --cc=congdanhqx@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=felipe.contreras@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=lenaic@lhuard.fr \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    --cc=sunshine@sunshineco.com \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).