All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] Squash leaks in t0000
@ 2021-09-18 13:49 Andrzej Hunt via GitGitGadget
  2021-09-18 13:49 ` [PATCH 1/2] log: UNLEAK rev to silence a large number of leaks Andrzej Hunt via GitGitGadget
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Andrzej Hunt via GitGitGadget @ 2021-09-18 13:49 UTC (permalink / raw)
  To: git; +Cc: Carlo Arenas, Andrzej Hunt

Carlo points out that t0000 currently doesn't pass with leak-checking
enabled in:
https://public-inbox.org/git/CAPUEsphMUNYRACmK-nksotP1RrMn09mNGFdEHLLuNEWH4AcU7Q@mail.gmail.com/T/#m7e40220195d98aee4be7e8593d30094b88a6ee71

Here's a series that I've sat on for a while, which adds some UNLEAK's to
"fix" this situation - see the individual patches for a justification of why
an UNLEAK seems appropriate.

ATB, Andrzej

Andrzej Hunt (2):
  log: UNLEAK rev to silence a large number of leaks
  log: UNLEAK original pending objects

 builtin/log.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)


base-commit: 186eaaae567db501179c0af0bf89b34cbea02c26
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1092%2Fahunt%2Fleaks-t0000-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1092/ahunt/leaks-t0000-v1
Pull-Request: https://github.com/git/git/pull/1092
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/2] log: UNLEAK rev to silence a large number of leaks
  2021-09-18 13:49 [PATCH 0/2] Squash leaks in t0000 Andrzej Hunt via GitGitGadget
@ 2021-09-18 13:49 ` Andrzej Hunt via GitGitGadget
  2021-09-18 20:06   ` Carlo Marcelo Arenas Belón
  2021-09-18 13:49 ` [PATCH 2/2] log: UNLEAK original pending objects Andrzej Hunt via GitGitGadget
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Andrzej Hunt via GitGitGadget @ 2021-09-18 13:49 UTC (permalink / raw)
  To: git; +Cc: Carlo Arenas, Andrzej Hunt, Andrzej Hunt

From: Andrzej Hunt <ajrhunt@google.com>

cmd_show puts a lot of data into rev, and doesn't clean it up before
returning. That's reasonable - we use most if not all of rev up until
cmd_show is finished - there's not much value in doing a proper cleanup.
Therefore we take the easy way out and UNLEAK rev.

The UNLEAK has to be performed early on, as cmd_show might return via
cmd_log_walk() in the next few lines, or it might continue to the
no-walk implementation below.

This patch silences the following leaks which were found when running
t0000 against LSAN:

Direct leak of 41 byte(s) in 1 object(s) allocated from:
    #0 0x486834 in strdup /home/abuild/rpmbuild/BUILD/llvm-11.0.0.src/build/../projects/compiler-rt/lib/asan/asan_interceptors.cpp:452:3
    #1 0x9ab168 in xstrdup /home/ahunt/oss-fuzz/git/wrapper.c:29:14
    #2 0x83cced in add_object_array_with_path /home/ahunt/oss-fuzz/git/object.c:349:17
    #3 0x8f4f5a in add_pending_object_with_path /home/ahunt/oss-fuzz/git/revision.c:329:2
    #4 0x8eb2b6 in handle_revision_arg_1 /home/ahunt/oss-fuzz/git/revision.c:2082:2
    #5 0x8eadad in handle_revision_arg /home/ahunt/oss-fuzz/git/revision.c:2089:12
    #6 0x8eea99 in setup_revisions /home/ahunt/oss-fuzz/git/revision.c:2756:7
    #7 0x59c024 in cmd_log_init_finish /home/ahunt/oss-fuzz/git/builtin/log.c:206:9
    #8 0x5998d8 in cmd_log_init /home/ahunt/oss-fuzz/git/builtin/log.c:275:2
    #9 0x599f9b in cmd_show /home/ahunt/oss-fuzz/git/builtin/log.c:641:2
    #10 0x4cd92d in run_builtin /home/ahunt/oss-fuzz/git/git.c:453:11
    #11 0x4cb5fa in handle_builtin /home/ahunt/oss-fuzz/git/git.c:704:3
    #12 0x4ccf57 in run_argv /home/ahunt/oss-fuzz/git/git.c:771:4
    #13 0x4caf49 in cmd_main /home/ahunt/oss-fuzz/git/git.c:902:19
    #14 0x69ce3e in main /home/ahunt/oss-fuzz/git/common-main.c:52:11
    #15 0x7f7c56197349 in __libc_start_main (/lib64/libc.so.6+0x24349)

Direct leak of 32 byte(s) in 1 object(s) allocated from:
    #0 0x49a9d2 in calloc /home/abuild/rpmbuild/BUILD/llvm-11.0.0.src/build/../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:154:3
    #1 0x9ab4c2 in xcalloc /home/ahunt/oss-fuzz/git/wrapper.c:140:8
    #2 0x59c269 in cmd_log_init_finish /home/ahunt/oss-fuzz/git/builtin/log.c:233:18
    #3 0x5998d8 in cmd_log_init /home/ahunt/oss-fuzz/git/builtin/log.c:275:2
    #4 0x599f9b in cmd_show /home/ahunt/oss-fuzz/git/builtin/log.c:641:2
    #5 0x4cd92d in run_builtin /home/ahunt/oss-fuzz/git/git.c:453:11
    #6 0x4cb5fa in handle_builtin /home/ahunt/oss-fuzz/git/git.c:704:3
    #7 0x4ccf57 in run_argv /home/ahunt/oss-fuzz/git/git.c:771:4
    #8 0x4caf49 in cmd_main /home/ahunt/oss-fuzz/git/git.c:902:19
    #9 0x69ce3e in main /home/ahunt/oss-fuzz/git/common-main.c:52:11
    #10 0x7f7c56197349 in __libc_start_main (/lib64/libc.so.6+0x24349)

Indirect leak of 41 byte(s) in 1 object(s) allocated from:
    #0 0x486834 in strdup /home/abuild/rpmbuild/BUILD/llvm-11.0.0.src/build/../projects/compiler-rt/lib/asan/asan_interceptors.cpp:452:3
    #1 0x9ab168 in xstrdup /home/ahunt/oss-fuzz/git/wrapper.c:29:14
    #2 0x8f5e30 in add_rev_cmdline /home/ahunt/oss-fuzz/git/revision.c:1482:23
    #3 0x8eb26d in handle_revision_arg_1 /home/ahunt/oss-fuzz/git/revision.c:2081:2
    #4 0x8eadad in handle_revision_arg /home/ahunt/oss-fuzz/git/revision.c:2089:12
    #5 0x8eea99 in setup_revisions /home/ahunt/oss-fuzz/git/revision.c:2756:7
    #6 0x59c024 in cmd_log_init_finish /home/ahunt/oss-fuzz/git/builtin/log.c:206:9
    #7 0x5998d8 in cmd_log_init /home/ahunt/oss-fuzz/git/builtin/log.c:275:2
    #8 0x599f9b in cmd_show /home/ahunt/oss-fuzz/git/builtin/log.c:641:2
    #9 0x4cd92d in run_builtin /home/ahunt/oss-fuzz/git/git.c:453:11
    #10 0x4cb5fa in handle_builtin /home/ahunt/oss-fuzz/git/git.c:704:3
    #11 0x4ccf57 in run_argv /home/ahunt/oss-fuzz/git/git.c:771:4
    #12 0x4caf49 in cmd_main /home/ahunt/oss-fuzz/git/git.c:902:19
    #13 0x69ce3e in main /home/ahunt/oss-fuzz/git/common-main.c:52:11
    #14 0x7fc4b3f06349 in __libc_start_main (/lib64/libc.so.6+0x24349)

Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
---
 builtin/log.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/builtin/log.c b/builtin/log.c
index f75d87e8d7f..6faaddf17a6 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -644,6 +644,7 @@ int cmd_show(int argc, const char **argv, const char *prefix)
 	opt.def = "HEAD";
 	opt.tweak = show_setup_revisions_tweak;
 	cmd_log_init(argc, argv, prefix, &rev, &opt);
+	UNLEAK(rev);
 
 	if (!rev.no_walk)
 		return cmd_log_walk(&rev);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/2] log: UNLEAK original pending objects
  2021-09-18 13:49 [PATCH 0/2] Squash leaks in t0000 Andrzej Hunt via GitGitGadget
  2021-09-18 13:49 ` [PATCH 1/2] log: UNLEAK rev to silence a large number of leaks Andrzej Hunt via GitGitGadget
@ 2021-09-18 13:49 ` Andrzej Hunt via GitGitGadget
  2021-09-18 17:28 ` [PATCH 0/2] Squash leaks in t0000 Carlo Arenas
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: Andrzej Hunt via GitGitGadget @ 2021-09-18 13:49 UTC (permalink / raw)
  To: git; +Cc: Carlo Arenas, Andrzej Hunt, Andrzej Hunt

From: Andrzej Hunt <ajrhunt@google.com>

cmd_show() uses objects to point to rev.pending's objects. Later, it might
detach rev.pending's objects: when handling an OBJ_COMMIT it will reset
rev.pending (without freeing its objects). Detaching (as opposed to freeing)
is necessary because cmd_show() continues iterating over the original objects
array.

We choose to UNLEAK because there's no real advantage to cleaning up
properly (cmd_show() exits immediately after looping over these
objects). A number of alternatives exist, but are all significantly more
complex for no gain:

Alternative 1:
  Convert objects into an object_array, and memcpy rev.pending into it
  (followed by detaching rev.pending immediately - making objects the
  owner of what used to be rev.pending). Then we could safely
  objects_array_clear() at the end of cmd_show(). And we can rely on
  a preexisting UNLEAK(rev) to avoid having to clean up rev.pending.
  This is a more complex and riskier approach vs a simple UNLEAK,
  and doesn't add any user-visible value.

Alternative 2:
  A variation on alternative 1. We make objects own the object_array as
  before. Once we're done, we free the new rev.pending array (which
  might be empty), and we memcpy objects back into rev.pending, relying
  on the existin UNLEAK(rev) to avoid having to free rev.pending.

ASAN output from t0000:

Direct leak of 41 byte(s) in 1 object(s) allocated from:
    #0 0x487504 in strdup ../projects/compiler-rt/lib/asan/asan_interceptors.cpp:437:3
    #1 0x9e4ef8 in xstrdup wrapper.c:29:14
    #2 0x86395d in add_object_array_with_path object.c:366:17
    #3 0x9264fc in add_pending_object_with_path revision.c:330:2
    #4 0x91c4e0 in handle_revision_arg_1 revision.c:2086:2
    #5 0x91bfcd in handle_revision_arg revision.c:2093:12
    #6 0x91ff5a in setup_revisions revision.c:2780:7
    #7 0x5a7678 in cmd_log_init_finish builtin/log.c:206:9
    #8 0x5a4f18 in cmd_log_init builtin/log.c:278:2
    #9 0x5a55d1 in cmd_show builtin/log.c:646:2
    #10 0x4cff30 in run_builtin git.c:461:11
    #11 0x4cdb00 in handle_builtin git.c:713:3
    #12 0x4cf527 in run_argv git.c:780:4
    #13 0x4cd426 in cmd_main git.c:911:19
    #14 0x6b2eb5 in main common-main.c:52:11
    #15 0x7f74fc9bd349 in __libc_start_main (/lib64/libc.so.6+0x24349)

SUMMARY: AddressSanitizer: 41 byte(s) leaked in 1 allocation(s).

Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
---
 builtin/log.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/builtin/log.c b/builtin/log.c
index 6faaddf17a6..769ee6a9258 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -702,7 +702,8 @@ int cmd_show(int argc, const char **argv, const char *prefix)
 			ret = error(_("unknown type: %d"), o->type);
 		}
 	}
-	free(objects);
+	UNLEAK(objects);
+
 	return ret;
 }
 
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] Squash leaks in t0000
  2021-09-18 13:49 [PATCH 0/2] Squash leaks in t0000 Andrzej Hunt via GitGitGadget
  2021-09-18 13:49 ` [PATCH 1/2] log: UNLEAK rev to silence a large number of leaks Andrzej Hunt via GitGitGadget
  2021-09-18 13:49 ` [PATCH 2/2] log: UNLEAK original pending objects Andrzej Hunt via GitGitGadget
@ 2021-09-18 17:28 ` Carlo Arenas
  2021-09-19 15:38   ` Andrzej Hunt
  2021-09-19 10:58 ` Ævar Arnfjörð Bjarmason
  2021-09-20 17:55 ` Junio C Hamano
  4 siblings, 1 reply; 15+ messages in thread
From: Carlo Arenas @ 2021-09-18 17:28 UTC (permalink / raw)
  To: Andrzej Hunt via GitGitGadget; +Cc: git, Andrzej Hunt

On Sat, Sep 18, 2021 at 6:49 AM Andrzej Hunt via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> Carlo points out that t0000 currently doesn't pass with leak-checking
> enabled in:
> https://public-inbox.org/git/CAPUEsphMUNYRACmK-nksotP1RrMn09mNGFdEHLLuNEWH4AcU7Q@mail.gmail.com/T/#m7e40220195d98aee4be7e8593d30094b88a6ee71

Did you figure out why it doesn't trigger on maint even if the code
seems to be mostly the same?
At least seems to trigger consistently in master, next and seen.

> Here's a series that I've sat on for a while, which adds some UNLEAK's to
> "fix" this situation - see the individual patches for a justification of why
> an UNLEAK seems appropriate.

While I see that UNLEAK in this specific case, might be an ok "fix", I
have to admit that not finding a repo_clear_revisions() (or equivalent
function) that could be used to clear revs seems like a problem worth
fixing as well for the future.

Will reply with my WIP so we can see if it could work either as an
alternative to this, or at least lay some foundations so that a long
running process that needs to use a `struct revision` or some of this
logic can in the future without having to deal with leaks.

Thanks and "Reviewed-by: Carlo Marcelo Arenas Belón
<carenas@gmail.com>" if needed.

Carlo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] log: UNLEAK rev to silence a large number of leaks
  2021-09-18 13:49 ` [PATCH 1/2] log: UNLEAK rev to silence a large number of leaks Andrzej Hunt via GitGitGadget
@ 2021-09-18 20:06   ` Carlo Marcelo Arenas Belón
  2021-09-19 15:51     ` Andrzej Hunt
  2021-09-19 16:13     ` Ævar Arnfjörð Bjarmason
  0 siblings, 2 replies; 15+ messages in thread
From: Carlo Marcelo Arenas Belón @ 2021-09-18 20:06 UTC (permalink / raw)
  To: Andrzej Hunt via GitGitGadget; +Cc: git, Andrzej Hunt, Andrzej Hunt

My equivalent version for these fixes is obviously more verbose but IMHO
not that ugly (and as safe)

It avoids the need to UNLEAK early by changing the program flow also for
the early return so the cleanup could be centralized in one single
function.

Both, the cmdline and mailmap arrays (and the objects they accumulate)
are cleaned in a "reusable" way.

Note that the cleaning of the "name" in the cmdline item throws a warning
as shown below which I intentionally didn't fix, as it would seem that
either the use of const there or the need to strdup is wrong.  So hope
someone that knows this code better could chime in.

Carlo
------ >8 ------
Subject: [PATCH] builtin/log: leaks from `git show` in t0000

obviously not ready, since the following will need to be corrected:

  revision.c:1496:8: warning: passing 'const char *' to parameter of type 'void *' discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
                  free(info->rev[i].name);
                       ^~~~~~~~~~~~~~~~~

Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
---
 builtin/log.c |  8 ++++++--
 revision.c    | 20 ++++++++++++++++++++
 revision.h    |  5 +++++
 3 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/builtin/log.c b/builtin/log.c
index f75d87e8d7..1b1c1f53f4 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -645,8 +645,10 @@ int cmd_show(int argc, const char **argv, const char *prefix)
 	opt.tweak = show_setup_revisions_tweak;
 	cmd_log_init(argc, argv, prefix, &rev, &opt);
 
-	if (!rev.no_walk)
-		return cmd_log_walk(&rev);
+	if (!rev.no_walk) {
+		ret = cmd_log_walk(&rev);
+		goto done;
+	}
 
 	count = rev.pending.nr;
 	objects = rev.pending.objects;
@@ -702,6 +704,8 @@ int cmd_show(int argc, const char **argv, const char *prefix)
 		}
 	}
 	free(objects);
+done:
+	repo_clear_revisions(&rev);
 	return ret;
 }
 
diff --git a/revision.c b/revision.c
index 0dabb5a0bc..ce62192dd8 100644
--- a/revision.c
+++ b/revision.c
@@ -1487,6 +1487,18 @@ static void add_rev_cmdline(struct rev_info *revs,
 	info->nr++;
 }
 
+static void clear_rev_cmdline(struct rev_info *revs)
+{
+	struct rev_cmdline_info *info = &revs->cmdline;
+	size_t i, nr = info->nr;
+
+	for (i = 0; i < nr; i++)
+		free(info->rev[i].name);
+
+	FREE_AND_NULL(info->rev);
+	info->nr = info->alloc = 0;
+}
+
 static void add_rev_cmdline_list(struct rev_info *revs,
 				 struct commit_list *commit_list,
 				 int whence,
@@ -1845,6 +1857,14 @@ void repo_init_revisions(struct repository *r,
 	init_display_notes(&revs->notes_opt);
 }
 
+void repo_clear_revisions(struct rev_info *revs)
+{
+	if (revs->mailmap)
+		clear_mailmap(revs->mailmap);
+	FREE_AND_NULL(revs->mailmap);
+	clear_rev_cmdline(revs);
+}
+
 static void add_pending_commit_list(struct rev_info *revs,
 				    struct commit_list *commit_list,
 				    unsigned int flags)
diff --git a/revision.h b/revision.h
index 0c65a760ee..f695c41cee 100644
--- a/revision.h
+++ b/revision.h
@@ -358,6 +358,11 @@ void repo_init_revisions(struct repository *r,
 			 struct rev_info *revs,
 			 const char *prefix);
 
+/*
+ * Free all structures dynamically allocated for the provided rev_info
+ */
+void repo_clear_revisions(struct rev_info *revs);
+
 /**
  * Parse revision information, filling in the `rev_info` structure, and
  * removing the used arguments from the argument list. Returns the number
-- 
2.33.0.911.gbe391d4e11


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] Squash leaks in t0000
  2021-09-18 13:49 [PATCH 0/2] Squash leaks in t0000 Andrzej Hunt via GitGitGadget
                   ` (2 preceding siblings ...)
  2021-09-18 17:28 ` [PATCH 0/2] Squash leaks in t0000 Carlo Arenas
@ 2021-09-19 10:58 ` Ævar Arnfjörð Bjarmason
  2021-09-20 17:55 ` Junio C Hamano
  4 siblings, 0 replies; 15+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-19 10:58 UTC (permalink / raw)
  To: Andrzej Hunt via GitGitGadget; +Cc: git, Carlo Arenas, Andrzej Hunt


On Sat, Sep 18 2021, Andrzej Hunt via GitGitGadget wrote:

> Carlo points out that t0000 currently doesn't pass with leak-checking
> enabled in:
> https://public-inbox.org/git/CAPUEsphMUNYRACmK-nksotP1RrMn09mNGFdEHLLuNEWH4AcU7Q@mail.gmail.com/T/#m7e40220195d98aee4be7e8593d30094b88a6ee71
>
> Here's a series that I've sat on for a while, which adds some UNLEAK's to
> "fix" this situation - see the individual patches for a justification of why
> an UNLEAK seems appropriate.
>
> ATB, Andrzej
>
> Andrzej Hunt (2):
>   log: UNLEAK rev to silence a large number of leaks
>   log: UNLEAK original pending objects
>
>  builtin/log.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)

I sent a re-roll of that series[1] that bypasses the issue by no longer
running t0000-basic.sh, so there won't be an immediate need for a fixup
series like this.

As for these patches & approach, I think that these unleak fixes that
just narrowly squash some failure in a specific test aren't worth doing,
and are actually counter-productive.

We should instead eventually fix the leaks more generally and make the
built-ins use those APIs.

Maybe our differing approaches there are because we've got different
end-goals in mind. My end-goal is three-fold:

 A. Make git's core APIs nicer, in most cases that we're not freeing
    memory is a result of a rather messy API that's not quite sure who
    should be managing its memory. This usually makes using it correctly
    harder in other ways.

 B. Make those APIs not leak memory, so we can use them as libraries.

 C. Have regression tests testing [*B*]

Given that, I think that fixing memory leaks in built-in when we're
about to exit is completely pointless as a goal in itself. We're about
to exit anyway, why care that we're leaking memory?

The only reason, I think, is that we're doing it as a proxy to get to a
combination of [*A*] and [*B*] above. Once we know that we can run "git
log" in various modes without it leaking, it's likely that most or all
of the revisions walking API, refname resolution, object lookup
etc. isn't leaking.

I have a WIP branch that would obsolete this[2], see the commit at its
tip. As shown there you're fixing a leak in cmd_show(), but omit the
same leak in its sister functions.

At that point we won't need these UNLEAK(), and as a follow-up any
concerns about spending too much time in a built-in just to clean up
could rather easily be done with something like a GIT_DESTRUCT_LEVEL[3],
i.e. we'd conditionally skip the freeing in some cases.

I'm not saying that there's no point in adding UNLEAK() somewhere, but I
really don't see it in this case. We didn't *need* to mark
t0000-basic.sh as leak-free right away, I just did so because it was the
first test, and I naïvely thought it would stay that way while my series
cooked.

I'd think that when building on top of my SANITIZE=leak series you'd
want instead of UNLEAK() to instead label the test as
TEST_PASSES_SANITIZE_LEAK=true, but just omit some specific breakages
with a use of the "SANITIZE_LEAK" prerequisite.

Maybe there's cases where you'd want to use
TEST_PASSES_SANITIZE_LEAK=true, but the leak is so deep in the guts of
some API that a transitional UNLEAK() is worth it, *and* you can't just
mark some other test that mostly tests the command you're interested in
with TEST_PASSES_SANITIZE_LEAK=true.

But so far I haven't seen such cases, e.g. there's cases where "git tag"
leaks in obscure cases, but not in some common cases with some of my
preliminary fixes. In that state I can usually find a test that uses
"git tag" in some way and mark that as TEST_PASSES_SANITIZE_LEAK=true,
instead of sprinkling UNLEAK() in builtin/tag.c just so I can mark the
main "git tag" test as passing.

1. 62833https://lore.kernel.org/git/cover-v7-0.2-00000000000-20210919T075619Z-avarab@gmail.com/
2. https://github.com/git/git/compare/master...avar:avar/tests-post-add-sanitize-leak-test-mode-fix-leaks
3. https://lore.kernel.org/git/87y2bi0vvl.fsf@evledraar.gmail.com/

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] Squash leaks in t0000
  2021-09-18 17:28 ` [PATCH 0/2] Squash leaks in t0000 Carlo Arenas
@ 2021-09-19 15:38   ` Andrzej Hunt
  0 siblings, 0 replies; 15+ messages in thread
From: Andrzej Hunt @ 2021-09-19 15:38 UTC (permalink / raw)
  To: Carlo Arenas, Andrzej Hunt via GitGitGadget; +Cc: git



On 18/09/2021 19:28, Carlo Arenas wrote:
> 
>> Here's a series that I've sat on for a while, which adds some UNLEAK's to
>> "fix" this situation - see the individual patches for a justification of why
>> an UNLEAK seems appropriate.
> 
> While I see that UNLEAK in this specific case, might be an ok "fix", I
> have to admit that not finding a repo_clear_revisions() (or equivalent
> function) that could be used to clear revs seems like a problem worth
> fixing as well for the future.
> 
> Will reply with my WIP so we can see if it could work either as an
> alternative to this, or at least lay some foundations so that a long
> running process that needs to use a `struct revision` or some of this
> logic can in the future without having to deal with leaks.

Great - in this case I think it's best to ignore my patches since 
actually fixing the leaks is obviously a better solution :) !

ATB,

Andrzej


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] log: UNLEAK rev to silence a large number of leaks
  2021-09-18 20:06   ` Carlo Marcelo Arenas Belón
@ 2021-09-19 15:51     ` Andrzej Hunt
  2021-09-19 16:13     ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 15+ messages in thread
From: Andrzej Hunt @ 2021-09-19 15:51 UTC (permalink / raw)
  To: Carlo Marcelo Arenas Belón, Andrzej Hunt via GitGitGadget
  Cc: git, Andrzej Hunt



On 18/09/2021 22:06, Carlo Marcelo Arenas Belón wrote:
> My equivalent version for these fixes is obviously more verbose but IMHO
> not that ugly (and as safe)
> 
> It avoids the need to UNLEAK early by changing the program flow also for
> the early return so the cleanup could be centralized in one single
> function.
> 
> Both, the cmdline and mailmap arrays (and the objects they accumulate)
> are cleaned in a "reusable" way.
> 
> Note that the cleaning of the "name" in the cmdline item throws a warning
> as shown below which I intentionally didn't fix, as it would seem that
> either the use of const there or the need to strdup is wrong.  So hope
> someone that knows this code better could chime in.
> 
> Carlo
> ------ >8 ------
> Subject: [PATCH] builtin/log: leaks from `git show` in t0000
> 
> obviously not ready, since the following will need to be corrected:
> 
>    revision.c:1496:8: warning: passing 'const char *' to parameter of type 'void *' discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
>                    free(info->rev[i].name);
>                         ^~~~~~~~~~~~~~~~~
> 

Casting the pointer a la "free((void *) ...)" seems to be a common 
pattern in git, and seems like a reasonable option here. AFAIUI the 
const is still needed because clients  of rev_cmdline_info shouldn't be 
changing name. But since we own and created rev_cmdline_info, we also 
know it's safe to clean it up. For comparison, here's an example of 
submodule_entry being cleaned up - all members end up needing a cast:

static void free_one_config(struct submodule_entry *entry)
{
	free((void *) entry->config->path);
	free((void *) entry->config->name);
	free((void *) entry->config->branch);
	free((void *) entry->config->update_strategy.command);
	free(entry->config);
}

> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
> ---
>   builtin/log.c |  8 ++++++--
>   revision.c    | 20 ++++++++++++++++++++
>   revision.h    |  5 +++++
>   3 files changed, 31 insertions(+), 2 deletions(-)
> 
> diff --git a/builtin/log.c b/builtin/log.c
> index f75d87e8d7..1b1c1f53f4 100644
> --- a/builtin/log.c
> +++ b/builtin/log.c
> @@ -645,8 +645,10 @@ int cmd_show(int argc, const char **argv, const char *prefix)
>   	opt.tweak = show_setup_revisions_tweak;
>   	cmd_log_init(argc, argv, prefix, &rev, &opt);
>   
> -	if (!rev.no_walk)
> -		return cmd_log_walk(&rev);
> +	if (!rev.no_walk) {
> +		ret = cmd_log_walk(&rev);
> +		goto done;
> +	}
>   
>   	count = rev.pending.nr;
>   	objects = rev.pending.objects;
> @@ -702,6 +704,8 @@ int cmd_show(int argc, const char **argv, const char *prefix)
>   		}
>   	}
>   	free(objects);
> +done:
> +	repo_clear_revisions(&rev);
>   	return ret;
>   }
>   
> diff --git a/revision.c b/revision.c
> index 0dabb5a0bc..ce62192dd8 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -1487,6 +1487,18 @@ static void add_rev_cmdline(struct rev_info *revs,
>   	info->nr++;
>   }
>   
> +static void clear_rev_cmdline(struct rev_info *revs)
> +{
> +	struct rev_cmdline_info *info = &revs->cmdline;
> +	size_t i, nr = info->nr;
> +
> +	for (i = 0; i < nr; i++)
> +		free(info->rev[i].name);
> +
> +	FREE_AND_NULL(info->rev);
> +	info->nr = info->alloc = 0;
> +}
> +
>   static void add_rev_cmdline_list(struct rev_info *revs,
>   				 struct commit_list *commit_list,
>   				 int whence,
> @@ -1845,6 +1857,14 @@ void repo_init_revisions(struct repository *r,
>   	init_display_notes(&revs->notes_opt);
>   }
>   
> +void repo_clear_revisions(struct rev_info *revs)
> +{
> +	if (revs->mailmap)
> +		clear_mailmap(revs->mailmap);
> +	FREE_AND_NULL(revs->mailmap);
> +	clear_rev_cmdline(revs);
> +}
> +
>   static void add_pending_commit_list(struct rev_info *revs,
>   				    struct commit_list *commit_list,
>   				    unsigned int flags)
> diff --git a/revision.h b/revision.h
> index 0c65a760ee..f695c41cee 100644
> --- a/revision.h
> +++ b/revision.h
> @@ -358,6 +358,11 @@ void repo_init_revisions(struct repository *r,
>   			 struct rev_info *revs,
>   			 const char *prefix);
>   
> +/*
> + * Free all structures dynamically allocated for the provided rev_info
> + */
> +void repo_clear_revisions(struct rev_info *revs);
> +
>   /**
>    * Parse revision information, filling in the `rev_info` structure, and
>    * removing the used arguments from the argument list. Returns the number
> 

This patch looks good to me (modulo adding the cast as discussed above), 
and is obviously much better than my approach of using an UNLEAK!

ATB,

Andrzej

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] log: UNLEAK rev to silence a large number of leaks
  2021-09-18 20:06   ` Carlo Marcelo Arenas Belón
  2021-09-19 15:51     ` Andrzej Hunt
@ 2021-09-19 16:13     ` Ævar Arnfjörð Bjarmason
  2021-09-19 21:34       ` Carlo Marcelo Arenas Belón
  1 sibling, 1 reply; 15+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-19 16:13 UTC (permalink / raw)
  To: Carlo Marcelo Arenas Belón
  Cc: Andrzej Hunt via GitGitGadget, git, Andrzej Hunt, Andrzej Hunt


On Sat, Sep 18 2021, Carlo Marcelo Arenas Belón wrote:

> My equivalent version for these fixes is obviously more verbose but IMHO
> not that ugly (and as safe)
>
> It avoids the need to UNLEAK early by changing the program flow also for
> the early return so the cleanup could be centralized in one single
> function.
>
> Both, the cmdline and mailmap arrays (and the objects they accumulate)
> are cleaned in a "reusable" way.
>
> Note that the cleaning of the "name" in the cmdline item throws a warning
> as shown below which I intentionally didn't fix, as it would seem that
> either the use of const there or the need to strdup is wrong.  So hope
> someone that knows this code better could chime in.

It should just be a "char *", I got that wrong in my version posted in
the side-thread[1] & mentioned in the side-reply[2].

(I think I got it right in some earlier version days ago, it should be a
'char *' like anyting we malloc, but brainfart when re-doing/re-basing
those changes).

Yours here below has a bug where you free() rev_cmdline_info items, you
need to use release_revisions_cmdline_rev().

I should have said in [2], but thanks a lot to you and Andrzej for
following up on the mess in t0000-basic.sh addressed by my v7
re-roll. It'll be really nice to get some of these leaks fixed & tested
for.

I think I was rather curt in [2] after a long debugging session, just
saying I appreciate it. Hopefully we can figure out some plan for mostly
pulling in the same direction with regards to the way forward.

1. https://github.com/git/git/commit/06380cd4f56f4c542685eb7aa79e28285fe02c55
2. https://lore.kernel.org/git/87a6k8daeu.fsf@evledraar.gmail.com/

> Carlo
> ------ >8 ------
> Subject: [PATCH] builtin/log: leaks from `git show` in t0000
>
> obviously not ready, since the following will need to be corrected:
>
>   revision.c:1496:8: warning: passing 'const char *' to parameter of type 'void *' discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
>                   free(info->rev[i].name);
>                        ^~~~~~~~~~~~~~~~~
>
> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
> ---
>  builtin/log.c |  8 ++++++--
>  revision.c    | 20 ++++++++++++++++++++
>  revision.h    |  5 +++++
>  3 files changed, 31 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/log.c b/builtin/log.c
> index f75d87e8d7..1b1c1f53f4 100644
> --- a/builtin/log.c
> +++ b/builtin/log.c
> @@ -645,8 +645,10 @@ int cmd_show(int argc, const char **argv, const char *prefix)
>  	opt.tweak = show_setup_revisions_tweak;
>  	cmd_log_init(argc, argv, prefix, &rev, &opt);
>  
> -	if (!rev.no_walk)
> -		return cmd_log_walk(&rev);
> +	if (!rev.no_walk) {
> +		ret = cmd_log_walk(&rev);
> +		goto done;
> +	}
>  
>  	count = rev.pending.nr;
>  	objects = rev.pending.objects;
> @@ -702,6 +704,8 @@ int cmd_show(int argc, const char **argv, const char *prefix)
>  		}
>  	}
>  	free(objects);
> +done:
> +	repo_clear_revisions(&rev);
>  	return ret;
>  }
>  
> diff --git a/revision.c b/revision.c
> index 0dabb5a0bc..ce62192dd8 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -1487,6 +1487,18 @@ static void add_rev_cmdline(struct rev_info *revs,
>  	info->nr++;
>  }
>  
> +static void clear_rev_cmdline(struct rev_info *revs)
> +{
> +	struct rev_cmdline_info *info = &revs->cmdline;
> +	size_t i, nr = info->nr;
> +
> +	for (i = 0; i < nr; i++)
> +		free(info->rev[i].name);
> +
> +	FREE_AND_NULL(info->rev);
> +	info->nr = info->alloc = 0;
> +}
> +
>  static void add_rev_cmdline_list(struct rev_info *revs,
>  				 struct commit_list *commit_list,
>  				 int whence,
> @@ -1845,6 +1857,14 @@ void repo_init_revisions(struct repository *r,
>  	init_display_notes(&revs->notes_opt);
>  }
>  
> +void repo_clear_revisions(struct rev_info *revs)
> +{
> +	if (revs->mailmap)
> +		clear_mailmap(revs->mailmap);
> +	FREE_AND_NULL(revs->mailmap);
> +	clear_rev_cmdline(revs);
> +}
> +
>  static void add_pending_commit_list(struct rev_info *revs,
>  				    struct commit_list *commit_list,
>  				    unsigned int flags)
> diff --git a/revision.h b/revision.h
> index 0c65a760ee..f695c41cee 100644
> --- a/revision.h
> +++ b/revision.h
> @@ -358,6 +358,11 @@ void repo_init_revisions(struct repository *r,
>  			 struct rev_info *revs,
>  			 const char *prefix);
>  
> +/*
> + * Free all structures dynamically allocated for the provided rev_info
> + */
> +void repo_clear_revisions(struct rev_info *revs);
> +
>  /**
>   * Parse revision information, filling in the `rev_info` structure, and
>   * removing the used arguments from the argument list. Returns the number


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] log: UNLEAK rev to silence a large number of leaks
  2021-09-19 16:13     ` Ævar Arnfjörð Bjarmason
@ 2021-09-19 21:34       ` Carlo Marcelo Arenas Belón
  2021-09-20  6:06         ` Eric Sunshine
  0 siblings, 1 reply; 15+ messages in thread
From: Carlo Marcelo Arenas Belón @ 2021-09-19 21:34 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Andrzej Hunt via GitGitGadget, git, Andrzej Hunt, Andrzej Hunt,
	Michael Haggerty

On Sun, Sep 19, 2021 at 06:13:43PM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> On Sat, Sep 18 2021, Carlo Marcelo Arenas Belón wrote:
> 
> > Note that the cleaning of the "name" in the cmdline item throws a warning
> > as shown below which I intentionally didn't fix, as it would seem that
> > either the use of const there or the need to strdup is wrong.  So hope
> > someone that knows this code better could chime in.
> 
> It should just be a "char *", I got that wrong in my version posted in
> the side-thread[1] & mentioned in the side-reply[2].

I was instead leaning towards keeping it as a "const char *" and removing
the strdup as shown in the patch below (obviously the last hunk not relevant
to your series).

This object doesn't hold or even manipulate, the objects it contains, so
it might be also a cleaner API to ensure it only keeps references and
doesn't own any in the more CS sense (note I am not a CS guy, so maybe I
get the concept here wrong).

Ironically the original patch that added the strdup was because of leak
related work, but I think that in this case might had gotten it backwards.

Even if we start holding pointers to names that are not static, I would
expect whoever created those buffers to own the data anyway.

Carlo

CC Michael for advise as the original author
------ >8 ------
Subject: [PATCH] revision: remove dup() of name in add_rev_cmdline()

df835d3a0c (add_rev_cmdline(): make a copy of the name argument,
2013-05-25) adds it, probably introducing a leak.

All names we will ever get will either come from the commandline
or be pointers to a static buffer in hex.c, so it is safe not to
xstrdup and clean them up (just like the struct object *item).

Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
---
 revision.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/revision.c b/revision.c
index ce62192dd8..b20bc58ccd 100644
--- a/revision.c
+++ b/revision.c
@@ -1468,7 +1468,6 @@ static int limit_list(struct rev_info *revs)
 
 /*
  * Add an entry to refs->cmdline with the specified information.
- * *name is copied.
  */
 static void add_rev_cmdline(struct rev_info *revs,
 			    struct object *item,
@@ -1481,7 +1480,7 @@ static void add_rev_cmdline(struct rev_info *revs,
 
 	ALLOC_GROW(info->rev, nr + 1, info->alloc);
 	info->rev[nr].item = item;
-	info->rev[nr].name = xstrdup(name);
+	info->rev[nr].name = name;
 	info->rev[nr].whence = whence;
 	info->rev[nr].flags = flags;
 	info->nr++;
@@ -1490,10 +1489,6 @@ static void add_rev_cmdline(struct rev_info *revs,
 static void clear_rev_cmdline(struct rev_info *revs)
 {
 	struct rev_cmdline_info *info = &revs->cmdline;
-	size_t i, nr = info->nr;
-
-	for (i = 0; i < nr; i++)
-		free(info->rev[i].name);
 
 	FREE_AND_NULL(info->rev);
 	info->nr = info->alloc = 0;
-- 
2.33.0.911.gbe391d4e11


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] log: UNLEAK rev to silence a large number of leaks
  2021-09-19 21:34       ` Carlo Marcelo Arenas Belón
@ 2021-09-20  6:06         ` Eric Sunshine
  2021-09-20 21:39           ` Carlo Marcelo Arenas Belón
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Sunshine @ 2021-09-20  6:06 UTC (permalink / raw)
  To: Carlo Marcelo Arenas Belón
  Cc: Ævar Arnfjörð Bjarmason,
	Andrzej Hunt via GitGitGadget, Git List, Andrzej Hunt,
	Andrzej Hunt, Michael Haggerty

On Sun, Sep 19, 2021 at 5:34 PM Carlo Marcelo Arenas Belón
<carenas@gmail.com> wrote:
> Subject: [PATCH] revision: remove dup() of name in add_rev_cmdline()
>
> df835d3a0c (add_rev_cmdline(): make a copy of the name argument,
> 2013-05-25) adds it, probably introducing a leak.
>
> All names we will ever get will either come from the commandline
> or be pointers to a static buffer in hex.c, so it is safe not to
> xstrdup and clean them up (just like the struct object *item).

I haven't been following this thread closely, but the mention of the
static buffer in hex.c invalidates the premise of this patch, as far
as I can tell. The "static buffer" is actually a ring of four buffers
which oid_to_hex() uses, one after another, into which it formats an
OID as hex. This allows a caller to format up to -- and only up to --
four OIDs without worrying about allocating its own memory for the hex
result. Beyond four, the caller can't use oid_to_hex() without doing
some sort of memory management itself, whether that be duplicating the
result of oid_to_hex() or by allocating its own buffers and calling
oid_to_hex_r() instead.

In this particular case, one of the callers of add_rev_cmdline() is
add_rev_cmdline_list(), which does this:

    while (commit_list) {
        ...
        add_rev_cmdline(..., oid_to_hex(...), ...);
        ...
    }

which may call add_rev_cmdline() any number of times, quite possibly
more than four.

Therefore (if I'm reading this correctly), it is absolutely correct
for add_rev_cmdline() to be duplicating that string to ensure that the
hexified OID value remains valid, and incorrect for this patch to be
removing the call to xstrdup().

> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
> ---
> diff --git a/revision.c b/revision.c
> @@ -1481,7 +1480,7 @@ static void add_rev_cmdline(struct rev_info *revs,
>         info->rev[nr].item = item;
> -       info->rev[nr].name = xstrdup(name);
> +       info->rev[nr].name = name;
>         info->rev[nr].whence = whence;
> @@ -1490,10 +1489,6 @@ static void add_rev_cmdline(struct rev_info *revs,
>  static void clear_rev_cmdline(struct rev_info *revs)
>  {
>         struct rev_cmdline_info *info = &revs->cmdline;
> -       size_t i, nr = info->nr;
> -
> -       for (i = 0; i < nr; i++)
> -               free(info->rev[i].name);
>
>         FREE_AND_NULL(info->rev);

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] Squash leaks in t0000
  2021-09-18 13:49 [PATCH 0/2] Squash leaks in t0000 Andrzej Hunt via GitGitGadget
                   ` (3 preceding siblings ...)
  2021-09-19 10:58 ` Ævar Arnfjörð Bjarmason
@ 2021-09-20 17:55 ` Junio C Hamano
  2021-09-21 23:01   ` Ævar Arnfjörð Bjarmason
  4 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2021-09-20 17:55 UTC (permalink / raw)
  To: Andrzej Hunt via GitGitGadget; +Cc: git, Carlo Arenas, Andrzej Hunt

"Andrzej Hunt via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Carlo points out that t0000 currently doesn't pass with leak-checking
> enabled in:
> https://public-inbox.org/git/CAPUEsphMUNYRACmK-nksotP1RrMn09mNGFdEHLLuNEWH4AcU7Q@mail.gmail.com/T/#m7e40220195d98aee4be7e8593d30094b88a6ee71
>
> Here's a series that I've sat on for a while, which adds some UNLEAK's to
> "fix" this situation - see the individual patches for a justification of why
> an UNLEAK seems appropriate.

It seems that discussion on 1/2 seemed to be heading in an
improvement but has petered out?  

I think the simplest fix in these two patches are worth taking, even
if we plan to further improve either by refining the granularity of
UNLEAK application or by introducing repo_clear_revisions() as Carlo
mentions (which is a preferred way to do this if we can manage it),
on top.

Thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] log: UNLEAK rev to silence a large number of leaks
  2021-09-20  6:06         ` Eric Sunshine
@ 2021-09-20 21:39           ` Carlo Marcelo Arenas Belón
  2021-09-21  3:09             ` Jeff King
  0 siblings, 1 reply; 15+ messages in thread
From: Carlo Marcelo Arenas Belón @ 2021-09-20 21:39 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Ævar Arnfjörð Bjarmason,
	Andrzej Hunt via GitGitGadget, Git List, Andrzej Hunt,
	Andrzej Hunt, Michael Haggerty

On Mon, Sep 20, 2021 at 02:06:01AM -0400, Eric Sunshine wrote:
> On Sun, Sep 19, 2021 at 5:34 PM Carlo Marcelo Arenas Belón
> <carenas@gmail.com> wrote:
> > Subject: [PATCH] revision: remove dup() of name in add_rev_cmdline()
> >
> > df835d3a0c (add_rev_cmdline(): make a copy of the name argument,
> > 2013-05-25) adds it, probably introducing a leak.
> >
> > All names we will ever get will either come from the commandline
> > or be pointers to a static buffer in hex.c, so it is safe not to
> > xstrdup and clean them up (just like the struct object *item).
> 
> I haven't been following this thread closely, but the mention of the
> static buffer in hex.c invalidates the premise of this patch, as far
> as I can tell. The "static buffer" is actually a ring of four buffers
> which oid_to_hex() uses, one after another, into which it formats an
> OID as hex. This allows a caller to format up to -- and only up to --
> four OIDs without worrying about allocating its own memory for the hex
> result. Beyond four, the caller can't use oid_to_hex() without doing
> some sort of memory management itself, whether that be duplicating the
> result of oid_to_hex() or by allocating its own buffers and calling
> oid_to_hex_r() instead.

Thanks; this then explains why as I was suspecting add_rev_cmdline_list()
was indeed buggy, and might had even triggered the workaround of doing the
strdup.

> Therefore (if I'm reading this correctly), it is absolutely correct
> for add_rev_cmdline() to be duplicating that string to ensure that the
> hexified OID value remains valid, and incorrect for this patch to be
> removing the call to xstrdup().

Indeed, but the values that are being strdup were never used anyway, so
I suspect the original code might had just put it as a logical default.

We might do better instead as shown in the following patch (again, second
hunk not relevant for the current code); I suspect if we were to land this,
the last hunks probably should be done first in an independent patch, as
well.

Carlo
-------- >8 --------
Subject: [PATCH] revision: remove xstrdup() of name in add_rev_cmdline()

a765499a08 (revision.c: treat A...B merge bases as if manually
specified, 2013-05-13) adds calls to this function in a loop,
abusing oid_to_hex (at that time called sha1_to_hex).

df835d3a0c (add_rev_cmdline(): make a copy of the name argument,
2013-05-25) adds the strdup, introducing a leak.

All names we will ever get should come from the commandline or be
constant values, so it is safe not to xstrdup and clean them up.

Just like the struct object *item, that is referenced in the same
struct, name isn't owned or managed so correct both issues by making
sure all entries are indeed constant or valid global pointers (from
the real command line) and remove the leak.

Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
---
 revision.c | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/revision.c b/revision.c
index ce62192dd8..829af28658 100644
--- a/revision.c
+++ b/revision.c
@@ -1468,7 +1468,6 @@ static int limit_list(struct rev_info *revs)
 
 /*
  * Add an entry to refs->cmdline with the specified information.
- * *name is copied.
  */
 static void add_rev_cmdline(struct rev_info *revs,
 			    struct object *item,
@@ -1481,7 +1480,7 @@ static void add_rev_cmdline(struct rev_info *revs,
 
 	ALLOC_GROW(info->rev, nr + 1, info->alloc);
 	info->rev[nr].item = item;
-	info->rev[nr].name = xstrdup(name);
+	info->rev[nr].name = name;
 	info->rev[nr].whence = whence;
 	info->rev[nr].flags = flags;
 	info->nr++;
@@ -1490,10 +1489,6 @@ static void add_rev_cmdline(struct rev_info *revs,
 static void clear_rev_cmdline(struct rev_info *revs)
 {
 	struct rev_cmdline_info *info = &revs->cmdline;
-	size_t i, nr = info->nr;
-
-	for (i = 0; i < nr; i++)
-		free(info->rev[i].name);
 
 	FREE_AND_NULL(info->rev);
 	info->nr = info->alloc = 0;
@@ -1504,10 +1499,10 @@ static void add_rev_cmdline_list(struct rev_info *revs,
 				 int whence,
 				 unsigned flags)
 {
+	static const char *synthetic = ".synthetic";
 	while (commit_list) {
 		struct object *object = &commit_list->item->object;
-		add_rev_cmdline(revs, object, oid_to_hex(&object->oid),
-				whence, flags);
+		add_rev_cmdline(revs, object, synthetic, whence, flags);
 		commit_list = commit_list->next;
 	}
 }
@@ -1753,7 +1748,7 @@ struct add_alternate_refs_data {
 static void add_one_alternate_ref(const struct object_id *oid,
 				  void *vdata)
 {
-	const char *name = ".alternate";
+	static const char *name = ".alternate";
 	struct add_alternate_refs_data *data = vdata;
 	struct object *obj;
 
@@ -1940,7 +1935,7 @@ static int handle_dotdot_1(const char *arg, char *dotdot,
 			   struct object_context *a_oc,
 			   struct object_context *b_oc)
 {
-	const char *a_name, *b_name;
+	static const char *a_name, *b_name;
 	struct object_id a_oid, b_oid;
 	struct object *a_obj, *b_obj;
 	unsigned int a_flags, b_flags;
-- 
2.33.0.911.gbe391d4e11


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] log: UNLEAK rev to silence a large number of leaks
  2021-09-20 21:39           ` Carlo Marcelo Arenas Belón
@ 2021-09-21  3:09             ` Jeff King
  0 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2021-09-21  3:09 UTC (permalink / raw)
  To: Carlo Marcelo Arenas Belón
  Cc: Eric Sunshine, Ævar Arnfjörð Bjarmason,
	Andrzej Hunt via GitGitGadget, Git List, Andrzej Hunt,
	Andrzej Hunt, Michael Haggerty

On Mon, Sep 20, 2021 at 02:39:12PM -0700, Carlo Marcelo Arenas Belón wrote:

> > Therefore (if I'm reading this correctly), it is absolutely correct
> > for add_rev_cmdline() to be duplicating that string to ensure that the
> > hexified OID value remains valid, and incorrect for this patch to be
> > removing the call to xstrdup().
> 
> Indeed, but the values that are being strdup were never used anyway, so
> I suspect the original code might had just put it as a logical default.

We do look at them in a few cases, like "fast-export", but only if they
are not marked UNINTERESTING. And add_rev_cmdline_list(), the variant
that writes the hex values, only ever gets called with the UNINTERESTING
flag.

So I think you're right that these ones would never be seen. I did
wonder if we'd ever show them with "log --source" or similar, but that
pulls the name from object_array_entry, I think.

> -------- >8 --------
> Subject: [PATCH] revision: remove xstrdup() of name in add_rev_cmdline()
> 
> a765499a08 (revision.c: treat A...B merge bases as if manually
> specified, 2013-05-13) adds calls to this function in a loop,
> abusing oid_to_hex (at that time called sha1_to_hex).
> 
> df835d3a0c (add_rev_cmdline(): make a copy of the name argument,
> 2013-05-25) adds the strdup, introducing a leak.
> 
> All names we will ever get should come from the commandline or be
> constant values, so it is safe not to xstrdup and clean them up.

This last paragraph is questionable, I think. We feed the argv from
setup_revisions() here, but that is not always coming from the actual
command line. Most cases seem to finish with the traversal before what
they've passed in goes out of scope, but not all. The call in
bisect_rev_setup() intentionally leaks the strvec (even though it
doesn't need to do so with the current code). The one in
cmd__fast_rebase() does clear its strvec after setup_revisions() but
before the actual traversal. I wouldn't be surprised if your patch
triggered memory problems there.

> @@ -1504,10 +1499,10 @@ static void add_rev_cmdline_list(struct rev_info *revs,
>  				 int whence,
>  				 unsigned flags)
>  {
> +	static const char *synthetic = ".synthetic";

I don't think there's any point in making this static. It's not an
array, but rather a pointer to a string literal. That string literal
will remain valid regardless (the standard does not guarantee we get the
_same_ string literal every time, but that doesn't matter for our
purposes. And in practice it will be the same one).

> @@ -1753,7 +1748,7 @@ struct add_alternate_refs_data {
>  static void add_one_alternate_ref(const struct object_id *oid,
>  				  void *vdata)
>  {
> -	const char *name = ".alternate";
> +	static const char *name = ".alternate";
>  	struct add_alternate_refs_data *data = vdata;
>  	struct object *obj;

Ditto here.

> @@ -1940,7 +1935,7 @@ static int handle_dotdot_1(const char *arg, char *dotdot,
>  			   struct object_context *a_oc,
>  			   struct object_context *b_oc)
>  {
> -	const char *a_name, *b_name;
> +	static const char *a_name, *b_name;
>  	struct object_id a_oid, b_oid;
>  	struct object *a_obj, *b_obj;
>  	unsigned int a_flags, b_flags;

And I don't see how this changes anything at all. Our pointers will live
on, but the memory they point to is not affected.

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] Squash leaks in t0000
  2021-09-20 17:55 ` Junio C Hamano
@ 2021-09-21 23:01   ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 15+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-21 23:01 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Andrzej Hunt via GitGitGadget, git, Carlo Arenas, Andrzej Hunt


On Mon, Sep 20 2021, Junio C Hamano wrote:

> "Andrzej Hunt via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> Carlo points out that t0000 currently doesn't pass with leak-checking
>> enabled in:
>> https://public-inbox.org/git/CAPUEsphMUNYRACmK-nksotP1RrMn09mNGFdEHLLuNEWH4AcU7Q@mail.gmail.com/T/#m7e40220195d98aee4be7e8593d30094b88a6ee71
>>
>> Here's a series that I've sat on for a while, which adds some UNLEAK's to
>> "fix" this situation - see the individual patches for a justification of why
>> an UNLEAK seems appropriate.
>
> It seems that discussion on 1/2 seemed to be heading in an
> improvement but has petered out?  
>
> I think the simplest fix in these two patches are worth taking, even
> if we plan to further improve either by refining the granularity of
> UNLEAK application or by introducing repo_clear_revisions() as Carlo
> mentions (which is a preferred way to do this if we can manage it),
> on top.

I think per Andrzej's own [1] it's best to not pick up this series.

I've got a lot of memory leak fixes queued up locally, I'm just waiting
on the SANITIZE=leak CI mode to land on master so I can add new tests to
the whitelist as I fix the memory leaks, that includes "real" fixes for
the ones Andrzej's added "UNLEAK()"'s for here.

Hence my meniton of this sort of thing being counter-productive[2],
i.e. I'd need to monkeypatch revert this on top just to make sure I was
still finding leaks that are hidden by these new UNLEAK() (which hide
some really common ones).

1. https://lore.kernel.org/git/05754f9c-cd58-30f5-e2d3-58b9221d2770@ahunt.org/
2. https://lore.kernel.org/git/87a6k8daeu.fsf@evledraar.gmail.com/

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2021-09-21 23:06 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-18 13:49 [PATCH 0/2] Squash leaks in t0000 Andrzej Hunt via GitGitGadget
2021-09-18 13:49 ` [PATCH 1/2] log: UNLEAK rev to silence a large number of leaks Andrzej Hunt via GitGitGadget
2021-09-18 20:06   ` Carlo Marcelo Arenas Belón
2021-09-19 15:51     ` Andrzej Hunt
2021-09-19 16:13     ` Ævar Arnfjörð Bjarmason
2021-09-19 21:34       ` Carlo Marcelo Arenas Belón
2021-09-20  6:06         ` Eric Sunshine
2021-09-20 21:39           ` Carlo Marcelo Arenas Belón
2021-09-21  3:09             ` Jeff King
2021-09-18 13:49 ` [PATCH 2/2] log: UNLEAK original pending objects Andrzej Hunt via GitGitGadget
2021-09-18 17:28 ` [PATCH 0/2] Squash leaks in t0000 Carlo Arenas
2021-09-19 15:38   ` Andrzej Hunt
2021-09-19 10:58 ` Ævar Arnfjörð Bjarmason
2021-09-20 17:55 ` Junio C Hamano
2021-09-21 23:01   ` Ævar Arnfjörð Bjarmason

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.