git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Atneya Nair <atneya@google.com>,
	git@vger.kernel.org, jeffhost@microsoft.com, me@ttaylorr.com,
	nasamuffin@google.com, Tanay Abhra <tanayabh@gmail.com>,
	Glen Choo <glencbz@gmail.com>
Subject: Re: [RFC PATCH 2/3] Make ce_compare_gitlink thread-safe
Date: Tue, 5 Mar 2024 20:23:23 -0500	[thread overview]
Message-ID: <20240306012323.GA3817803@coredump.intra.peff.net> (raw)
In-Reply-To: <xmqqsf141pf5.fsf@gitster.g>

On Tue, Mar 05, 2024 at 10:53:02AM -0800, Junio C Hamano wrote:

> Junio C Hamano <gitster@pobox.com> writes:
> 
> > The use of strintern() comes originally from 3df8fd62 ...
> > ..., so they may
> > know how safe the change on the config side would be (I still do
> > not understand why you'd want to do this in the first place, though,
> > especially if you are protecting the callsites with mutex).
> 
> The risks of turning code that uses strintern() to use strdup() are
> 
>  * you will leak the allocated string unless you explicitly free the
>    string you now own.
> 
>  * you may consume too much memory if you are creating too many
>    copies of the same string (e.g. if you need filename for each
>    line in a file in an application, the memory consumption can
>    become 1000-fold).
> 
>  * the code may be taking advantage of the fact that two such
>    strings can be compared for (in)equality simply by comparing
>    their addresses, which you would need to adjust to use !strcmp()
>    and the like.

There is one more, I think: if you _do_ free the allocated string to
avoid the leak you mention, then some other code which was relying on
the lifetime of that string to be effectively infinite will now have a
user-after-free.

And I think that may be the case here. The "kvi" struct itself is local
to do_config_from(). But when we load the caching configset, we do so in
configset_add_value() which makes a shallow copy of the kvi struct. And
then that shallow copy may live on and be accessed with further calls to
git_config().

So doing just this:

diff --git a/config.c b/config.c
index 3cfeb3d8bd..2f6c83ffe7 100644
--- a/config.c
+++ b/config.c
@@ -1017,7 +1017,7 @@ static void kvi_from_source(struct config_source *cs,
 			    enum config_scope scope,
 			    struct key_value_info *out)
 {
-	out->filename = strintern(cs->name);
+	out->filename = xstrdup(cs->name);
 	out->origin_type = cs->origin_type;
 	out->linenr = cs->linenr;
 	out->scope = scope;
@@ -1855,6 +1855,7 @@ static int do_config_from(struct config_source *top, config_fn_t fn,
 
 	ret = git_parse_source(top, fn, &kvi, data, opts);
 
+	free((char *)kvi.filename);
 	strbuf_release(&top->value);
 	strbuf_release(&top->var);
 

will cause t4013.199 (among others) to fail when built with
SANITIZE=address, as it detects the user-after-free. I think you'd need
this on top:

diff --git a/config.c b/config.c
index 2f6c83ffe7..9854ca002d 100644
--- a/config.c
+++ b/config.c
@@ -2262,6 +2262,7 @@ static int configset_add_value(const struct key_value_info *kvi_p,
 	l_item->value_index = e->value_list.nr - 1;
 
 	*kv_info = *kvi_p;
+	kv_info->filename = xstrdup_or_null(kvi_p->filename); /* deep copy! */
 	si->util = kv_info;
 
 	return 0;

though probably an actual kvi_copy() function would be less horrible.

A few other things to note, looking at this code:

  - isn't kvi->path in the same boat? We do not duplicate it at all, so
    it seems like the shallow copy made in the configset could cause a
    user-after-free.

  - the "fix" I showed above hits your point 2: now we are making a lot
    more copies of that string. I will note that we're already making a
    lot of copies of the kvi struct in the first place, so unless you
    have really long pathnames, it probably isn't a big difference.

    But it possibly could make sense to have the configset own a single
    duplicate string, and then let the kvi structs it holds point to
    that string. But IMHO all of this should be details of the configset
    code, and the main config-iteration code should not have to worry
    about this at all. I.e., I think kvi_from_source() should not be
    duplicating anything in the first place.

-Peff

  reply	other threads:[~2024-03-06  1:23 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-05  1:21 [RFC PATCH 0/3] Parallel submodule status Atneya Nair
2024-03-05  1:21 ` [RFC PATCH 1/3] Make read_gitfile and resolve_gitfile thread safe Atneya Nair
2024-03-05  2:22   ` Junio C Hamano
2024-03-05  4:29   ` Eric Sunshine
2024-03-12 20:38     ` Atneya Nair
2024-03-06  8:13   ` Jean-Noël Avila
2024-03-06 16:57     ` Junio C Hamano
2024-03-12 20:44       ` Atneya Nair
2024-03-05  1:21 ` [RFC PATCH 2/3] Make ce_compare_gitlink thread-safe Atneya Nair
2024-03-05 17:08   ` Junio C Hamano
2024-03-05 18:53     ` Junio C Hamano
2024-03-06  1:23       ` Jeff King [this message]
2024-03-06  1:58         ` Junio C Hamano
2024-03-12 22:13         ` Atneya Nair
2024-03-12 22:15     ` Atneya Nair
2024-03-13 17:51       ` Junio C Hamano
2024-03-05  1:21 ` [RFC PATCH 3/3] Preload submodule state in refresh_index Atneya Nair

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240306012323.GA3817803@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=atneya@google.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=glencbz@gmail.com \
    --cc=jeffhost@microsoft.com \
    --cc=me@ttaylorr.com \
    --cc=nasamuffin@google.com \
    --cc=tanayabh@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).