* [PATCH v2 1/6] bisect.c: add missing "goto" for release_revisions()
2022-07-29 8:31 ` [PATCH v2 " Ævar Arnfjörð Bjarmason
@ 2022-07-29 8:31 ` Ævar Arnfjörð Bjarmason
2022-07-29 8:31 ` [PATCH v2 2/6] test-fast-rebase helper: use release_revisions() (again) Ævar Arnfjörð Bjarmason
` (6 subsequent siblings)
7 siblings, 0 replies; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-07-29 8:31 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Ævar Arnfjörð Bjarmason
Add a missing "goto cleanup", this fixes a bug in
f196c1e908d (revisions API users: use release_revisions() needing
REV_INFO_INIT, 2022-04-13).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
bisect.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/bisect.c b/bisect.c
index b63669cc9d7..421470bfa59 100644
--- a/bisect.c
+++ b/bisect.c
@@ -1054,7 +1054,7 @@ enum bisect_error bisect_next_all(struct repository *r, const char *prefix)
*/
res = error_if_skipped_commits(tried, NULL);
if (res < 0)
- return res;
+ goto cleanup;
printf(_("%s was both %s and %s\n"),
oid_to_hex(current_bad_oid),
term_good,
--
2.37.1.1196.g8af3636bc64
^ permalink raw reply related [flat|nested] 52+ messages in thread
* [PATCH v2 2/6] test-fast-rebase helper: use release_revisions() (again)
2022-07-29 8:31 ` [PATCH v2 " Ævar Arnfjörð Bjarmason
2022-07-29 8:31 ` [PATCH v2 1/6] bisect.c: add missing "goto" for release_revisions() Ævar Arnfjörð Bjarmason
@ 2022-07-29 8:31 ` Ævar Arnfjörð Bjarmason
2022-07-30 5:22 ` Eric Sunshine
2022-07-29 8:31 ` [PATCH v2 3/6] log: make the intent of cmd_show()'s "rev.pending" juggling clearer Ævar Arnfjörð Bjarmason
` (5 subsequent siblings)
7 siblings, 1 reply; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-07-29 8:31 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Ævar Arnfjörð Bjarmason
Fix a bug in 0139c58ab95 (revisions API users: add "goto cleanup" for
release_revisions(), 2022-04-13), in that commit a release_revisions()
call was added to this function, but it never did anything due to this
TODO memset() added in fe1a21d5267 (fast-rebase: demonstrate
merge-ort's API via new test-tool command, 2020-10-29).
Simply removing the memset() will fix the "cmdline" which can be seen
when running t5520-pull.sh.
This sort of thing could be detected automatically with a rule similar
to the unused.cocci merged in 7fa60d2a5b6 (Merge branch
'ab/cocci-unused' into next, 2022-07-11). The following rule on top
would catch the case being fixed here:
@@
type T;
identifier I;
identifier REL1 =~ "^[a-z_]*_(release|reset|clear|free)$";
identifier REL2 =~ "^(release|clear|free)_[a-z_]*$";
@@
- memset(\( I \| &I \), 0, ...);
... when != \( I \| &I \)
(
\( REL1 \| REL2 \)( \( I \| &I \), ...);
|
\( REL1 \| REL2 \)( \( &I \| I \) );
)
... when != \( I \| &I \)
That rule should arguably use only &I, not I (as we might be passed a
pointer). He distinction would matter if anyone cared about the
side-effects of a memset() followed by release() of a pointer to a
variable passed into the function.
As such a pattern would be at best very confusing, and most likely
point to buggy code as in this case, the above rule is probably fine
as-is.
But as this rule only found one such bug in the entire codebase let's
not add it to contrib/coccinelle/unused.cocci for now, we can always
dig it up in the future if it's deemed useful.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
t/helper/test-fast-rebase.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/t/helper/test-fast-rebase.c b/t/helper/test-fast-rebase.c
index 4e5553e2024..45665ec19a5 100644
--- a/t/helper/test-fast-rebase.c
+++ b/t/helper/test-fast-rebase.c
@@ -184,8 +184,6 @@ int cmd__fast_rebase(int argc, const char **argv)
last_picked_commit = commit;
last_commit = create_commit(result.tree, commit, last_commit);
}
- /* TODO: There should be some kind of rev_info_free(&revs) call... */
- memset(&revs, 0, sizeof(revs));
merge_switch_to_result(&merge_opt, head_tree, &result, 1, !result.clean);
--
2.37.1.1196.g8af3636bc64
^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: [PATCH v2 2/6] test-fast-rebase helper: use release_revisions() (again)
2022-07-29 8:31 ` [PATCH v2 2/6] test-fast-rebase helper: use release_revisions() (again) Ævar Arnfjörð Bjarmason
@ 2022-07-30 5:22 ` Eric Sunshine
0 siblings, 0 replies; 52+ messages in thread
From: Eric Sunshine @ 2022-07-30 5:22 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason
Cc: Git List, Junio C Hamano, Jeff King
On Fri, Jul 29, 2022 at 4:33 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> Fix a bug in 0139c58ab95 (revisions API users: add "goto cleanup" for
> release_revisions(), 2022-04-13), in that commit a release_revisions()
> call was added to this function, but it never did anything due to this
> TODO memset() added in fe1a21d5267 (fast-rebase: demonstrate
> merge-ort's API via new test-tool command, 2020-10-29).
>
> Simply removing the memset() will fix the "cmdline" which can be seen
> when running t5520-pull.sh.
>
> This sort of thing could be detected automatically with a rule similar
> to the unused.cocci merged in 7fa60d2a5b6 (Merge branch
> 'ab/cocci-unused' into next, 2022-07-11). The following rule on top
> would catch the case being fixed here:
>
> @@
> type T;
> identifier I;
> identifier REL1 =~ "^[a-z_]*_(release|reset|clear|free)$";
> identifier REL2 =~ "^(release|clear|free)_[a-z_]*$";
> @@
>
> - memset(\( I \| &I \), 0, ...);
> ... when != \( I \| &I \)
> (
> \( REL1 \| REL2 \)( \( I \| &I \), ...);
> |
> \( REL1 \| REL2 \)( \( &I \| I \) );
> )
> ... when != \( I \| &I \)
>
> That rule should arguably use only &I, not I (as we might be passed a
> pointer). He distinction would matter if anyone cared about the
s/He/The/
> side-effects of a memset() followed by release() of a pointer to a
> variable passed into the function.
>
> As such a pattern would be at best very confusing, and most likely
> point to buggy code as in this case, the above rule is probably fine
> as-is.
>
> But as this rule only found one such bug in the entire codebase let's
> not add it to contrib/coccinelle/unused.cocci for now, we can always
> dig it up in the future if it's deemed useful.
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
^ permalink raw reply [flat|nested] 52+ messages in thread
* [PATCH v2 3/6] log: make the intent of cmd_show()'s "rev.pending" juggling clearer
2022-07-29 8:31 ` [PATCH v2 " Ævar Arnfjörð Bjarmason
2022-07-29 8:31 ` [PATCH v2 1/6] bisect.c: add missing "goto" for release_revisions() Ævar Arnfjörð Bjarmason
2022-07-29 8:31 ` [PATCH v2 2/6] test-fast-rebase helper: use release_revisions() (again) Ævar Arnfjörð Bjarmason
@ 2022-07-29 8:31 ` Ævar Arnfjörð Bjarmason
2022-07-29 18:44 ` Junio C Hamano
2022-07-29 8:31 ` [PATCH v2 4/6] log: fix common "rev.pending" memory leak in "git show" Ævar Arnfjörð Bjarmason
` (4 subsequent siblings)
7 siblings, 1 reply; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-07-29 8:31 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Ævar Arnfjörð Bjarmason
Adjust code added in 5d7eeee2ac6 (git-show: grok blobs, trees and
tags, too, 2006-12-14) to use the "memcpy a &blank" idiom introduced
in 5726a6b4012 (*.c *_init(): define in terms of corresponding *_INIT
macro, 2021-07-01).
Now the types in play are guaranteed to correspond, i.e. we used "int"
here for the "count" before, but the corresponding "nr" is an
"unsigned int". By using a "blank" object we almost entirely bypass
that, we'll only need to declare our own "unsigned int i".
There are no functional changes here aside from potential overflow
guard rails, the structure only has these three members ("nr", "alloc"
and "objects"), but now we're obviously future-proof against assuming
that.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
builtin/log.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/builtin/log.c b/builtin/log.c
index 88a5e98875a..6135f8191a9 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -668,10 +668,12 @@ static void show_setup_revisions_tweak(struct rev_info *rev,
int cmd_show(int argc, const char **argv, const char *prefix)
{
struct rev_info rev;
- struct object_array_entry *objects;
+ struct object_array blank = OBJECT_ARRAY_INIT;
+ struct object_array pending = OBJECT_ARRAY_INIT;
+ unsigned int i;
struct setup_revision_opt opt;
struct pathspec match_all;
- int i, count, ret = 0;
+ int ret = 0;
init_log_defaults();
git_config(git_log_config, NULL);
@@ -698,12 +700,11 @@ int cmd_show(int argc, const char **argv, const char *prefix)
if (!rev.no_walk)
return cmd_log_deinit(cmd_log_walk(&rev), &rev);
- count = rev.pending.nr;
- objects = rev.pending.objects;
+ memcpy(&pending, &rev.pending, sizeof(rev.pending));
rev.diffopt.no_free = 1;
- for (i = 0; i < count && !ret; i++) {
- struct object *o = objects[i].item;
- const char *name = objects[i].name;
+ for (i = 0; i < pending.nr && !ret; i++) {
+ struct object *o = pending.objects[i].item;
+ const char *name = pending.objects[i].name;
switch (o->type) {
case OBJ_BLOB:
ret = show_blob_object(&o->oid, &rev, name);
@@ -726,7 +727,7 @@ int cmd_show(int argc, const char **argv, const char *prefix)
if (!o)
ret = error(_("could not read object %s"),
oid_to_hex(oid));
- objects[i].item = o;
+ pending.objects[i].item = o;
i--;
break;
}
@@ -743,8 +744,7 @@ int cmd_show(int argc, const char **argv, const char *prefix)
rev.shown_one = 1;
break;
case OBJ_COMMIT:
- rev.pending.nr = rev.pending.alloc = 0;
- rev.pending.objects = NULL;
+ memcpy(&rev.pending, &blank, sizeof(rev.pending));
add_object_array(o, name, &rev.pending);
ret = cmd_log_walk_no_free(&rev);
break;
--
2.37.1.1196.g8af3636bc64
^ permalink raw reply related [flat|nested] 52+ messages in thread
* [PATCH v2 4/6] log: fix common "rev.pending" memory leak in "git show"
2022-07-29 8:31 ` [PATCH v2 " Ævar Arnfjörð Bjarmason
` (2 preceding siblings ...)
2022-07-29 8:31 ` [PATCH v2 3/6] log: make the intent of cmd_show()'s "rev.pending" juggling clearer Ævar Arnfjörð Bjarmason
@ 2022-07-29 8:31 ` Ævar Arnfjörð Bjarmason
2022-07-29 18:55 ` Jeff King
2022-07-29 18:59 ` Jeff King
2022-07-29 8:31 ` [PATCH v2 5/6] bisect.c: partially fix bisect_rev_setup() memory leak Ævar Arnfjörð Bjarmason
` (3 subsequent siblings)
7 siblings, 2 replies; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-07-29 8:31 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Ævar Arnfjörð Bjarmason
Fix a very common memory leak introduced in 5d7eeee2ac6 (git-show: grok blobs,
trees and tags, too, 2006-12-14).
When "git show" displays commits it needs to temporarily clobbers the
"rev.pending" array, but by doing so we'll fail to
release_revisions(), as we have for most other uses of "struct
rev_info" since 2da81d1efb0 (Merge branch 'ab/plug-leak-in-revisions',
2022-06-07).
In the preceding commit this code was made to use a more extendable
pattern, which we can now complete. Once we've clobbered our
"rev.pending" and invoked "cmd_log_walk_no_free()" we need to
"object_array_clear()" our old "rev.pending".
So in addition to the now-populated &pending array, to avoid leaking
the memory related to the one member array we've created. The
&rev.pending was already being object_array_clear()'d by the
release_revisions() added in f6bfea0ad01 (revisions API users: use
release_revisions() in builtin/log.c, 2022-04-13), now we'll also
correctly free the previous data (f6bfea0ad01 noted this memory leak
as an outstanding TODO).
This way of doing it is making assumptions about the internals of
"struct rev_info", in particular that the "pending" member is
stand-alone, and that no other state is referring to it. But we've
been making those assumptions ever since 5d7eeee2ac6 (git-show: grok
blobs, trees and tags, too, 2006-12-14).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
builtin/log.c | 3 ++-
t/t0203-gettext-setlocale-sanity.sh | 1 +
t/t1020-subdirectory.sh | 1 +
t/t3307-notes-man.sh | 1 +
t/t3920-crlf-messages.sh | 2 ++
t/t4069-remerge-diff.sh | 1 +
t/t7007-show.sh | 1 +
t/t7503-pre-commit-and-pre-merge-commit-hooks.sh | 1 +
t/t9122-git-svn-author.sh | 1 -
t/t9162-git-svn-dcommit-interactive.sh | 1 -
10 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/builtin/log.c b/builtin/log.c
index 6135f8191a9..5c7e254e994 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -701,6 +701,7 @@ int cmd_show(int argc, const char **argv, const char *prefix)
return cmd_log_deinit(cmd_log_walk(&rev), &rev);
memcpy(&pending, &rev.pending, sizeof(rev.pending));
+ memcpy(&rev.pending, &blank, sizeof(rev.pending));
rev.diffopt.no_free = 1;
for (i = 0; i < pending.nr && !ret; i++) {
struct object *o = pending.objects[i].item;
@@ -744,7 +745,6 @@ int cmd_show(int argc, const char **argv, const char *prefix)
rev.shown_one = 1;
break;
case OBJ_COMMIT:
- memcpy(&rev.pending, &blank, sizeof(rev.pending));
add_object_array(o, name, &rev.pending);
ret = cmd_log_walk_no_free(&rev);
break;
@@ -752,6 +752,7 @@ int cmd_show(int argc, const char **argv, const char *prefix)
ret = error(_("unknown type: %d"), o->type);
}
}
+ object_array_clear(&pending);
rev.diffopt.no_free = 0;
diff_free(&rev.diffopt);
diff --git a/t/t0203-gettext-setlocale-sanity.sh b/t/t0203-gettext-setlocale-sanity.sh
index 0ce1f22eff6..86cff324ff1 100755
--- a/t/t0203-gettext-setlocale-sanity.sh
+++ b/t/t0203-gettext-setlocale-sanity.sh
@@ -5,6 +5,7 @@
test_description="The Git C functions aren't broken by setlocale(3)"
+TEST_PASSES_SANITIZE_LEAK=true
. ./lib-gettext.sh
test_expect_success 'git show a ISO-8859-1 commit under C locale' '
diff --git a/t/t1020-subdirectory.sh b/t/t1020-subdirectory.sh
index 9fdbb2af80e..45eef9457fe 100755
--- a/t/t1020-subdirectory.sh
+++ b/t/t1020-subdirectory.sh
@@ -6,6 +6,7 @@
test_description='Try various core-level commands in subdirectory.
'
+TEST_PASSES_SANITIZE_LEAK=true
. ./test-lib.sh
. "$TEST_DIRECTORY"/lib-read-tree.sh
diff --git a/t/t3307-notes-man.sh b/t/t3307-notes-man.sh
index 1aa366a410e..ae316502c45 100755
--- a/t/t3307-notes-man.sh
+++ b/t/t3307-notes-man.sh
@@ -4,6 +4,7 @@ test_description='Examples from the git-notes man page
Make sure the manual is not full of lies.'
+TEST_PASSES_SANITIZE_LEAK=true
. ./test-lib.sh
test_expect_success 'setup' '
diff --git a/t/t3920-crlf-messages.sh b/t/t3920-crlf-messages.sh
index 0276edbe3d3..4c661d4d54a 100755
--- a/t/t3920-crlf-messages.sh
+++ b/t/t3920-crlf-messages.sh
@@ -1,6 +1,8 @@
#!/bin/sh
test_description='Test ref-filter and pretty APIs for commit and tag messages using CRLF'
+
+TEST_PASSES_SANITIZE_LEAK=true
. ./test-lib.sh
LIB_CRLF_BRANCHES=""
diff --git a/t/t4069-remerge-diff.sh b/t/t4069-remerge-diff.sh
index 35f94957fce..9e7cac68b1c 100755
--- a/t/t4069-remerge-diff.sh
+++ b/t/t4069-remerge-diff.sh
@@ -2,6 +2,7 @@
test_description='remerge-diff handling'
+TEST_PASSES_SANITIZE_LEAK=true
. ./test-lib.sh
# This test is ort-specific
diff --git a/t/t7007-show.sh b/t/t7007-show.sh
index d6cc69e0f2c..f908a4d1abc 100755
--- a/t/t7007-show.sh
+++ b/t/t7007-show.sh
@@ -2,6 +2,7 @@
test_description='git show'
+TEST_PASSES_SANITIZE_LEAK=true
. ./test-lib.sh
test_expect_success setup '
diff --git a/t/t7503-pre-commit-and-pre-merge-commit-hooks.sh b/t/t7503-pre-commit-and-pre-merge-commit-hooks.sh
index ad1eb64ba0d..aa004b70a8d 100755
--- a/t/t7503-pre-commit-and-pre-merge-commit-hooks.sh
+++ b/t/t7503-pre-commit-and-pre-merge-commit-hooks.sh
@@ -5,6 +5,7 @@ test_description='pre-commit and pre-merge-commit hooks'
GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+TEST_PASSES_SANITIZE_LEAK=true
. ./test-lib.sh
test_expect_success 'root commit' '
diff --git a/t/t9122-git-svn-author.sh b/t/t9122-git-svn-author.sh
index 527ba3d2932..0fc289ae0f0 100755
--- a/t/t9122-git-svn-author.sh
+++ b/t/t9122-git-svn-author.sh
@@ -2,7 +2,6 @@
test_description='git svn authorship'
-TEST_FAILS_SANITIZE_LEAK=true
. ./lib-git-svn.sh
test_expect_success 'setup svn repository' '
diff --git a/t/t9162-git-svn-dcommit-interactive.sh b/t/t9162-git-svn-dcommit-interactive.sh
index e2aa8ed88a9..b3ce033a0d3 100755
--- a/t/t9162-git-svn-dcommit-interactive.sh
+++ b/t/t9162-git-svn-dcommit-interactive.sh
@@ -4,7 +4,6 @@
test_description='git svn dcommit --interactive series'
-TEST_FAILS_SANITIZE_LEAK=true
. ./lib-git-svn.sh
test_expect_success 'initialize repo' '
--
2.37.1.1196.g8af3636bc64
^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: [PATCH v2 4/6] log: fix common "rev.pending" memory leak in "git show"
2022-07-29 8:31 ` [PATCH v2 4/6] log: fix common "rev.pending" memory leak in "git show" Ævar Arnfjörð Bjarmason
@ 2022-07-29 18:55 ` Jeff King
2022-07-29 19:07 ` Jeff King
2022-07-29 18:59 ` Jeff King
1 sibling, 1 reply; 52+ messages in thread
From: Jeff King @ 2022-07-29 18:55 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano
On Fri, Jul 29, 2022 at 10:31:06AM +0200, Ævar Arnfjörð Bjarmason wrote:
> @@ -701,6 +701,7 @@ int cmd_show(int argc, const char **argv, const char *prefix)
> return cmd_log_deinit(cmd_log_walk(&rev), &rev);
>
> memcpy(&pending, &rev.pending, sizeof(rev.pending));
> + memcpy(&rev.pending, &blank, sizeof(rev.pending));
> rev.diffopt.no_free = 1;
> for (i = 0; i < pending.nr && !ret; i++) {
> struct object *o = pending.objects[i].item;
OK, so now "pending" is completely separate from what's in "rev". And
then in the later hunk we clear it:
> @@ -752,6 +752,7 @@ int cmd_show(int argc, const char **argv, const char *prefix)
> ret = error(_("unknown type: %d"), o->type);
> }
> }
> + object_array_clear(&pending);
Good. But in the middle hunk...
> @@ -744,7 +745,6 @@ int cmd_show(int argc, const char **argv, const char *prefix)
> rev.shown_one = 1;
> break;
> case OBJ_COMMIT:
> - memcpy(&rev.pending, &blank, sizeof(rev.pending));
> add_object_array(o, name, &rev.pending);
> ret = cmd_log_walk_no_free(&rev);
> break;
We now do not do anything to clean up rev.pending. On the first pass,
we'd see our blank pending array and add to it. But on a subsequent pass
(i.e., because we are showing two or more commits), what will we see?
My initial assumption was we'd have the last pass's commit in "pending"
here, so we'd be leaking it. But I think in practice it's OK. We end up
in prepare_revision_walk(), which blanks the list again, and then
processes each element. Non-commits _do_ end up back in the pending
list, which would be a leak. But we know that this code triggers only
for commits, which are placed only in the "commits" list (and that's
cleaned up as we walk it via get_revision()).
That might be worth mentioning in the commit message, or even including
a comment like:
/*
* No need to clean up rev.pending here; commits are removed from it
* as part of prepare_revision_walk().
*/
-Peff
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 4/6] log: fix common "rev.pending" memory leak in "git show"
2022-07-29 18:55 ` Jeff King
@ 2022-07-29 19:07 ` Jeff King
0 siblings, 0 replies; 52+ messages in thread
From: Jeff King @ 2022-07-29 19:07 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano
On Fri, Jul 29, 2022 at 02:55:35PM -0400, Jeff King wrote:
> > @@ -744,7 +745,6 @@ int cmd_show(int argc, const char **argv, const char *prefix)
> > rev.shown_one = 1;
> > break;
> > case OBJ_COMMIT:
> > - memcpy(&rev.pending, &blank, sizeof(rev.pending));
> > add_object_array(o, name, &rev.pending);
> > ret = cmd_log_walk_no_free(&rev);
> > break;
>
> We now do not do anything to clean up rev.pending. On the first pass,
> we'd see our blank pending array and add to it. But on a subsequent pass
> (i.e., because we are showing two or more commits), what will we see?
>
> My initial assumption was we'd have the last pass's commit in "pending"
> here, so we'd be leaking it. But I think in practice it's OK. We end up
> in prepare_revision_walk(), which blanks the list again, and then
> processes each element. Non-commits _do_ end up back in the pending
> list, which would be a leak. But we know that this code triggers only
> for commits, which are placed only in the "commits" list (and that's
> cleaned up as we walk it via get_revision()).
Sorry, just one more clarification here.
The "so we'd be leaking it" in the second paragraph is really: so before
this patch, we'd have been leaking it writing over it with blank. After
this patch, we'd be building up an ever-growing list of pending objects,
and showing a quadratic number of entries. ;)
-Peff
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 4/6] log: fix common "rev.pending" memory leak in "git show"
2022-07-29 8:31 ` [PATCH v2 4/6] log: fix common "rev.pending" memory leak in "git show" Ævar Arnfjörð Bjarmason
2022-07-29 18:55 ` Jeff King
@ 2022-07-29 18:59 ` Jeff King
1 sibling, 0 replies; 52+ messages in thread
From: Jeff King @ 2022-07-29 18:59 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano
On Fri, Jul 29, 2022 at 10:31:06AM +0200, Ævar Arnfjörð Bjarmason wrote:
> So in addition to the now-populated &pending array, to avoid leaking
> the memory related to the one member array we've created. The
> &rev.pending was already being object_array_clear()'d by the
> release_revisions() added in f6bfea0ad01 (revisions API users: use
> release_revisions() in builtin/log.c, 2022-04-13), now we'll also
> correctly free the previous data (f6bfea0ad01 noted this memory leak
> as an outstanding TODO).
I couldn't quite parse the first sentence. But I think this explanation
is proceeding along a wrong assumption (which is the same one I had when
reviewing v1) that we are leaking the one-member array. I think this
paragraph can be replaced with something along the lines of what I
mentioned in my other reply.
-Peff
^ permalink raw reply [flat|nested] 52+ messages in thread
* [PATCH v2 5/6] bisect.c: partially fix bisect_rev_setup() memory leak
2022-07-29 8:31 ` [PATCH v2 " Ævar Arnfjörð Bjarmason
` (3 preceding siblings ...)
2022-07-29 8:31 ` [PATCH v2 4/6] log: fix common "rev.pending" memory leak in "git show" Ævar Arnfjörð Bjarmason
@ 2022-07-29 8:31 ` Ævar Arnfjörð Bjarmason
2022-07-29 8:31 ` [PATCH v2 6/6] revisions API: don't leak memory on argv elements that need free()-ing Ævar Arnfjörð Bjarmason
` (2 subsequent siblings)
7 siblings, 0 replies; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-07-29 8:31 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Ævar Arnfjörð Bjarmason
Partially fix the memory leak noted in in 8a534b61241 (bisect: use
argv_array API, 2011-09-13), which added the "XXX" comment seen in the
context. We can partially fix it by having the bisect_rev_setup()
function take a "struct strvec", rather than constructing it.
As the comment notes we need to keep the construct "rev_argv" around
while the "struct rev_info" is around, which as seen in the newly
added "strvec_clear()" calls here we do after "release_revisions()".
This "partially" fixes the memory leak because we're leaking the "--"
added to the "rev_argv" here still, which will be addressed in a
subsequent commit.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
bisect.c | 22 +++++++++++++---------
1 file changed, 13 insertions(+), 9 deletions(-)
diff --git a/bisect.c b/bisect.c
index 421470bfa59..6afb98be7a1 100644
--- a/bisect.c
+++ b/bisect.c
@@ -648,11 +648,11 @@ static struct commit_list *managed_skipped(struct commit_list *list,
}
static void bisect_rev_setup(struct repository *r, struct rev_info *revs,
+ struct strvec *rev_argv,
const char *prefix,
const char *bad_format, const char *good_format,
int read_paths)
{
- struct strvec rev_argv = STRVEC_INIT;
int i;
repo_init_revisions(r, revs, prefix);
@@ -660,16 +660,16 @@ static void bisect_rev_setup(struct repository *r, struct rev_info *revs,
revs->commit_format = CMIT_FMT_UNSPECIFIED;
/* rev_argv.argv[0] will be ignored by setup_revisions */
- strvec_push(&rev_argv, "bisect_rev_setup");
- strvec_pushf(&rev_argv, bad_format, oid_to_hex(current_bad_oid));
+ strvec_push(rev_argv, "bisect_rev_setup");
+ strvec_pushf(rev_argv, bad_format, oid_to_hex(current_bad_oid));
for (i = 0; i < good_revs.nr; i++)
- strvec_pushf(&rev_argv, good_format,
+ strvec_pushf(rev_argv, good_format,
oid_to_hex(good_revs.oid + i));
- strvec_push(&rev_argv, "--");
+ strvec_push(rev_argv, "--");
if (read_paths)
- read_bisect_paths(&rev_argv);
+ read_bisect_paths(rev_argv);
- setup_revisions(rev_argv.nr, rev_argv.v, revs, NULL);
+ setup_revisions(rev_argv->nr, rev_argv->v, revs, NULL);
/* XXX leak rev_argv, as "revs" may still be pointing to it */
}
@@ -873,10 +873,11 @@ static enum bisect_error check_merge_bases(int rev_nr, struct commit **rev, int
static int check_ancestors(struct repository *r, int rev_nr,
struct commit **rev, const char *prefix)
{
+ struct strvec rev_argv = STRVEC_INIT;
struct rev_info revs;
int res;
- bisect_rev_setup(r, &revs, prefix, "^%s", "%s", 0);
+ bisect_rev_setup(r, &revs, &rev_argv, prefix, "^%s", "%s", 0);
bisect_common(&revs);
res = (revs.commits != NULL);
@@ -885,6 +886,7 @@ static int check_ancestors(struct repository *r, int rev_nr,
clear_commit_marks_many(rev_nr, rev, ALL_REV_FLAGS);
release_revisions(&revs);
+ strvec_clear(&rev_argv);
return res;
}
@@ -1010,6 +1012,7 @@ void read_bisect_terms(const char **read_bad, const char **read_good)
*/
enum bisect_error bisect_next_all(struct repository *r, const char *prefix)
{
+ struct strvec rev_argv = STRVEC_INIT;
struct rev_info revs = REV_INFO_INIT;
struct commit_list *tried;
int reaches = 0, all = 0, nr, steps;
@@ -1037,7 +1040,7 @@ enum bisect_error bisect_next_all(struct repository *r, const char *prefix)
if (res)
goto cleanup;
- bisect_rev_setup(r, &revs, prefix, "%s", "^%s", 1);
+ bisect_rev_setup(r, &revs, &rev_argv, prefix, "%s", "^%s", 1);
revs.first_parent_only = !!(bisect_flags & FIND_BISECTION_FIRST_PARENT_ONLY);
revs.limited = 1;
@@ -1112,6 +1115,7 @@ enum bisect_error bisect_next_all(struct repository *r, const char *prefix)
res = bisect_checkout(bisect_rev, no_checkout);
cleanup:
release_revisions(&revs);
+ strvec_clear(&rev_argv);
return res;
}
--
2.37.1.1196.g8af3636bc64
^ permalink raw reply related [flat|nested] 52+ messages in thread
* [PATCH v2 6/6] revisions API: don't leak memory on argv elements that need free()-ing
2022-07-29 8:31 ` [PATCH v2 " Ævar Arnfjörð Bjarmason
` (4 preceding siblings ...)
2022-07-29 8:31 ` [PATCH v2 5/6] bisect.c: partially fix bisect_rev_setup() memory leak Ævar Arnfjörð Bjarmason
@ 2022-07-29 8:31 ` Ævar Arnfjörð Bjarmason
2022-07-29 19:00 ` Jeff King
2022-07-29 19:02 ` [PATCH v2 0/6] revisions API: fix more memory leaks Jeff King
2022-08-02 15:33 ` [PATCH v3 " Ævar Arnfjörð Bjarmason
7 siblings, 1 reply; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-07-29 8:31 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Ævar Arnfjörð Bjarmason
Add a "free_removed_argv_elements" member to "struct
setup_revision_opt", and use it to fix several memory leaks, e.g. the
one with a "XXX" comment added in 8a534b61241 (bisect: use argv_array
API, 2011-09-13).
We have various memory leaks in APIs that take and munge "const
char **argv", e.g. parse_options(). Sometimes these APIs are given the
"argv" we get to the "main" function, in which case we don't leak
memory, but other times we're giving it the "v" member of a "struct
strvec" we created.
There's several potential ways to fix those sort of leaks, we could
add a "nodup" mode to "struct strvec", which would work for the cases
where we push constant strings to it. But that wouldn't work as soon
as we used strvec_pushf(), or otherwise needed to duplicate or create
a string for that "struct strvec".
Let's instead make it the responsibility of the revisions API. If it's
going to clobber elements of argv it can also free() them, which it
will now do if instructed to do so via "free_removed_argv_elements".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
bisect.c | 6 ++++--
builtin/submodule--helper.c | 5 ++++-
remote.c | 5 ++++-
revision.c | 2 ++
revision.h | 3 ++-
t/t2020-checkout-detach.sh | 1 +
6 files changed, 17 insertions(+), 5 deletions(-)
diff --git a/bisect.c b/bisect.c
index 6afb98be7a1..38b3891f3a6 100644
--- a/bisect.c
+++ b/bisect.c
@@ -653,6 +653,9 @@ static void bisect_rev_setup(struct repository *r, struct rev_info *revs,
const char *bad_format, const char *good_format,
int read_paths)
{
+ struct setup_revision_opt opt = {
+ .free_removed_argv_elements = 1,
+ };
int i;
repo_init_revisions(r, revs, prefix);
@@ -669,8 +672,7 @@ static void bisect_rev_setup(struct repository *r, struct rev_info *revs,
if (read_paths)
read_bisect_paths(rev_argv);
- setup_revisions(rev_argv->nr, rev_argv->v, revs, NULL);
- /* XXX leak rev_argv, as "revs" may still be pointing to it */
+ setup_revisions(rev_argv->nr, rev_argv->v, revs, &opt);
}
static void bisect_common(struct rev_info *revs)
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index fac52ade5e1..b63f420ecef 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -1104,6 +1104,9 @@ static int compute_summary_module_list(struct object_id *head_oid,
{
struct strvec diff_args = STRVEC_INIT;
struct rev_info rev;
+ struct setup_revision_opt opt = {
+ .free_removed_argv_elements = 1,
+ };
struct module_cb_list list = MODULE_CB_LIST_INIT;
int ret = 0;
@@ -1121,7 +1124,7 @@ static int compute_summary_module_list(struct object_id *head_oid,
init_revisions(&rev, info->prefix);
rev.abbrev = 0;
precompose_argv_prefix(diff_args.nr, diff_args.v, NULL);
- setup_revisions(diff_args.nr, diff_args.v, &rev, NULL);
+ setup_revisions(diff_args.nr, diff_args.v, &rev, &opt);
rev.diffopt.output_format = DIFF_FORMAT_NO_OUTPUT | DIFF_FORMAT_CALLBACK;
rev.diffopt.format_callback = submodule_summary_callback;
rev.diffopt.format_callback_data = &list;
diff --git a/remote.c b/remote.c
index 1ee2b145d07..d66064118cb 100644
--- a/remote.c
+++ b/remote.c
@@ -2169,6 +2169,9 @@ static int stat_branch_pair(const char *branch_name, const char *base,
struct object_id oid;
struct commit *ours, *theirs;
struct rev_info revs;
+ struct setup_revision_opt opt = {
+ .free_removed_argv_elements = 1,
+ };
struct strvec argv = STRVEC_INIT;
/* Cannot stat if what we used to build on no longer exists */
@@ -2203,7 +2206,7 @@ static int stat_branch_pair(const char *branch_name, const char *base,
strvec_push(&argv, "--");
repo_init_revisions(the_repository, &revs, NULL);
- setup_revisions(argv.nr, argv.v, &revs, NULL);
+ setup_revisions(argv.nr, argv.v, &revs, &opt);
if (prepare_revision_walk(&revs))
die(_("revision walk setup failed"));
diff --git a/revision.c b/revision.c
index 0c6e26cd9c8..35d24e4fd3e 100644
--- a/revision.c
+++ b/revision.c
@@ -2784,6 +2784,8 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
const char *arg = argv[i];
if (strcmp(arg, "--"))
continue;
+ if (opt && opt->free_removed_argv_elements)
+ free((char *)argv[i]);
argv[i] = NULL;
argc = i;
if (argv[i + 1])
diff --git a/revision.h b/revision.h
index e576845cdd1..bb91e7ed914 100644
--- a/revision.h
+++ b/revision.h
@@ -375,7 +375,8 @@ struct setup_revision_opt {
const char *def;
void (*tweak)(struct rev_info *, struct setup_revision_opt *);
unsigned int assume_dashdash:1,
- allow_exclude_promisor_objects:1;
+ allow_exclude_promisor_objects:1,
+ free_removed_argv_elements:1;
unsigned revarg_opt;
};
int setup_revisions(int argc, const char **argv, struct rev_info *revs,
diff --git a/t/t2020-checkout-detach.sh b/t/t2020-checkout-detach.sh
index bc46713a43e..2eab6474f8d 100755
--- a/t/t2020-checkout-detach.sh
+++ b/t/t2020-checkout-detach.sh
@@ -4,6 +4,7 @@ test_description='checkout into detached HEAD state'
GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+TEST_PASSES_SANITIZE_LEAK=true
. ./test-lib.sh
check_detached () {
--
2.37.1.1196.g8af3636bc64
^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: [PATCH v2 6/6] revisions API: don't leak memory on argv elements that need free()-ing
2022-07-29 8:31 ` [PATCH v2 6/6] revisions API: don't leak memory on argv elements that need free()-ing Ævar Arnfjörð Bjarmason
@ 2022-07-29 19:00 ` Jeff King
0 siblings, 0 replies; 52+ messages in thread
From: Jeff King @ 2022-07-29 19:00 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano
On Fri, Jul 29, 2022 at 10:31:08AM +0200, Ævar Arnfjörð Bjarmason wrote:
> Add a "free_removed_argv_elements" member to "struct
> setup_revision_opt", and use it to fix several memory leaks, e.g. the
> one with a "XXX" comment added in 8a534b61241 (bisect: use argv_array
> API, 2011-09-13).
I think the mention of this "XXX" comment is a little misleading. That
comment is not talking about leaking the "--" entry. It is talking about
leaking the _entire_ strvec, because of memory-lifetime issues. And we
fixed those already in 5.
It's a small point, and the code here is fine. But if you take my
suggested replacement for 5, then it becomes even more nonsensical. ;)
-Peff
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 0/6] revisions API: fix more memory leaks
2022-07-29 8:31 ` [PATCH v2 " Ævar Arnfjörð Bjarmason
` (5 preceding siblings ...)
2022-07-29 8:31 ` [PATCH v2 6/6] revisions API: don't leak memory on argv elements that need free()-ing Ævar Arnfjörð Bjarmason
@ 2022-07-29 19:02 ` Jeff King
2022-08-02 15:33 ` [PATCH v3 " Ævar Arnfjörð Bjarmason
7 siblings, 0 replies; 52+ messages in thread
From: Jeff King @ 2022-07-29 19:02 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano
On Fri, Jul 29, 2022 at 10:31:02AM +0200, Ævar Arnfjörð Bjarmason wrote:
> A late re-roll of this set of revisions API and API user memory leak
> fixes. I think the much simpler code here in 4/6 should address the
> points Jeff King brought up in the v1 review.
>
> Other than that I renamed the variable in 3/6 s/cp/pending/g, per
> Jeff's suggestion (FWIW "cp" = "copy").
Thanks (and for the explanation regarding t91xx; I did totally miss in
the v1 review that those were removing "FAIL" lines, not "PASS" ones).
I don't think there's anything incorrect here, but I did make a few
cosmetic / explanation suggestions.
-Peff
^ permalink raw reply [flat|nested] 52+ messages in thread
* [PATCH v3 0/6] revisions API: fix more memory leaks
2022-07-29 8:31 ` [PATCH v2 " Ævar Arnfjörð Bjarmason
` (6 preceding siblings ...)
2022-07-29 19:02 ` [PATCH v2 0/6] revisions API: fix more memory leaks Jeff King
@ 2022-08-02 15:33 ` Ævar Arnfjörð Bjarmason
2022-08-02 15:33 ` [PATCH v3 1/6] bisect.c: add missing "goto" for release_revisions() Ævar Arnfjörð Bjarmason
` (6 more replies)
7 siblings, 7 replies; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-08-02 15:33 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Eric Sunshine,
Ævar Arnfjörð Bjarmason
This series fixes various leaking users of the revisions API.
Changes since v2:
* I think the rewrite here of fixing the leak should address Jeff
King's concerns. I was somewhat mentally stuck on trying to change
the code to make the leak fix easier, and then having the leak fix.
But I think as the new 3/6 shows fixing the leak first is much
easier, and more straightforward to explain.
* A new 4/6 then follows-up and rewrites the variable clobbering that
was needed for the pre-image of 3/6.
* Don't reference the "XXX" comment in 6/6.
* A typo fix, pointed out by Eric Sunshine.
Thanks all for the review on the v2!
Ævar Arnfjörð Bjarmason (6):
bisect.c: add missing "goto" for release_revisions()
test-fast-rebase helper: use release_revisions() (again)
log: fix a memory leak in "git show <revision>..."
log: refactor "rev.pending" code in cmd_show()
bisect.c: partially fix bisect_rev_setup() memory leak
revisions API: don't leak memory on argv elements that need free()-ing
bisect.c | 28 ++++++++++-------
builtin/log.c | 31 +++++++++++++------
builtin/submodule--helper.c | 5 ++-
remote.c | 5 ++-
revision.c | 2 ++
revision.h | 3 +-
t/helper/test-fast-rebase.c | 2 --
t/t0203-gettext-setlocale-sanity.sh | 1 +
t/t1020-subdirectory.sh | 1 +
t/t2020-checkout-detach.sh | 1 +
t/t3307-notes-man.sh | 1 +
t/t3920-crlf-messages.sh | 2 ++
t/t4069-remerge-diff.sh | 1 +
t/t7007-show.sh | 1 +
...3-pre-commit-and-pre-merge-commit-hooks.sh | 1 +
t/t9122-git-svn-author.sh | 1 -
t/t9162-git-svn-dcommit-interactive.sh | 1 -
17 files changed, 59 insertions(+), 28 deletions(-)
Range-diff against v2:
1: 12a4a20c59f = 1: 12a4a20c59f bisect.c: add missing "goto" for release_revisions()
2: bbd3a7e5ecc ! 2: a04fade9d9d test-fast-rebase helper: use release_revisions() (again)
@@ Commit message
... when != \( I \| &I \)
That rule should arguably use only &I, not I (as we might be passed a
- pointer). He distinction would matter if anyone cared about the
+ pointer). The distinction would matter if anyone cared about the
side-effects of a memset() followed by release() of a pointer to a
variable passed into the function.
4: 54b632c1124 ! 3: 83fc1835fe5 log: fix common "rev.pending" memory leak in "git show"
@@ Metadata
Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
## Commit message ##
- log: fix common "rev.pending" memory leak in "git show"
-
- Fix a very common memory leak introduced in 5d7eeee2ac6 (git-show: grok blobs,
- trees and tags, too, 2006-12-14).
-
- When "git show" displays commits it needs to temporarily clobbers the
- "rev.pending" array, but by doing so we'll fail to
- release_revisions(), as we have for most other uses of "struct
- rev_info" since 2da81d1efb0 (Merge branch 'ab/plug-leak-in-revisions',
- 2022-06-07).
-
- In the preceding commit this code was made to use a more extendable
- pattern, which we can now complete. Once we've clobbered our
- "rev.pending" and invoked "cmd_log_walk_no_free()" we need to
- "object_array_clear()" our old "rev.pending".
-
- So in addition to the now-populated &pending array, to avoid leaking
- the memory related to the one member array we've created. The
- &rev.pending was already being object_array_clear()'d by the
- release_revisions() added in f6bfea0ad01 (revisions API users: use
- release_revisions() in builtin/log.c, 2022-04-13), now we'll also
- correctly free the previous data (f6bfea0ad01 noted this memory leak
- as an outstanding TODO).
-
- This way of doing it is making assumptions about the internals of
- "struct rev_info", in particular that the "pending" member is
- stand-alone, and that no other state is referring to it. But we've
- been making those assumptions ever since 5d7eeee2ac6 (git-show: grok
- blobs, trees and tags, too, 2006-12-14).
+ log: fix a memory leak in "git show <revision>..."
+
+ Fix a memory leak in code added in 5d7eeee2ac6 (git-show: grok blobs,
+ trees and tags, too, 2006-12-14). As we iterate over a "<revision>..."
+ command-line and encounter ad OBJ_COMMIT we want to use our "struct
+ rev_info", but with a "pending" array of one element: the one commit
+ we're showing in the loop.
+
+ To do this 5d7eeee2ac6 saved away a pointer to rev.pending.objects and
+ rev.pending.nr for its iteration. We'd then clobber those (and alloc)
+ when we needed to show an OBJ_COMMIT.
+
+ We'd therefore leak the "rev.pending" we started out with, and only
+ free the new "rev.pending" in the "OBJ_COMMIT" case arm as
+ prepare_revision_walk() would draw it down.
+
+ Let's fix this memory leak. Now when we encounter an OBJ_COMMIT we
+ save away the "rev.pending" before clearing it. We then add a single
+ commit to it, which our indirect invocation of prepare_revision_walk()
+ will remove. After that we restore the "rev.pending".
+
+ Our "rev.pending" will then get free'd by the release_revisions()
+ added in f6bfea0ad01 (revisions API users: use release_revisions() in
+ builtin/log.c, 2022-04-13)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
## builtin/log.c ##
-@@ builtin/log.c: int cmd_show(int argc, const char **argv, const char *prefix)
- return cmd_log_deinit(cmd_log_walk(&rev), &rev);
-
- memcpy(&pending, &rev.pending, sizeof(rev.pending));
-+ memcpy(&rev.pending, &blank, sizeof(rev.pending));
- rev.diffopt.no_free = 1;
- for (i = 0; i < pending.nr && !ret; i++) {
- struct object *o = pending.objects[i].item;
@@ builtin/log.c: int cmd_show(int argc, const char **argv, const char *prefix)
rev.shown_one = 1;
break;
case OBJ_COMMIT:
-- memcpy(&rev.pending, &blank, sizeof(rev.pending));
++ {
++ struct object_array old;
++
++ memcpy(&old, &rev.pending, sizeof(old));
+ rev.pending.nr = rev.pending.alloc = 0;
+ rev.pending.objects = NULL;
add_object_array(o, name, &rev.pending);
ret = cmd_log_walk_no_free(&rev);
++ memcpy(&rev.pending, &old, sizeof(rev.pending));
break;
-@@ builtin/log.c: int cmd_show(int argc, const char **argv, const char *prefix)
++ }
+ default:
ret = error(_("unknown type: %d"), o->type);
}
- }
-+ object_array_clear(&pending);
-
- rev.diffopt.no_free = 0;
- diff_free(&rev.diffopt);
## t/t0203-gettext-setlocale-sanity.sh ##
@@
3: 1629299f883 ! 4: fd474666e7c log: make the intent of cmd_show()'s "rev.pending" juggling clearer
@@ Metadata
Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
## Commit message ##
- log: make the intent of cmd_show()'s "rev.pending" juggling clearer
+ log: refactor "rev.pending" code in cmd_show()
- Adjust code added in 5d7eeee2ac6 (git-show: grok blobs, trees and
- tags, too, 2006-12-14) to use the "memcpy a &blank" idiom introduced
- in 5726a6b4012 (*.c *_init(): define in terms of corresponding *_INIT
- macro, 2021-07-01).
+ Refactor the juggling of "rev.pending" and our replacement for it
+ amended in the preceding commit so that:
- Now the types in play are guaranteed to correspond, i.e. we used "int"
- here for the "count" before, but the corresponding "nr" is an
- "unsigned int". By using a "blank" object we almost entirely bypass
- that, we'll only need to declare our own "unsigned int i".
+ * We use an "unsigned int" instead of an "int" for "i", this matches
+ the types of "struct rev_info" itself.
- There are no functional changes here aside from potential overflow
- guard rails, the structure only has these three members ("nr", "alloc"
- and "objects"), but now we're obviously future-proof against assuming
- that.
+ * We don't need the "count" and "objects" variables introduced in
+ 5d7eeee2ac6 (git-show: grok blobs, trees and tags, too, 2006-12-14).
+
+ They were originally added since we'd clobber rev.pending in the
+ loop without restoring it. Since the preceding commit we are
+ restoring it when we handle OBJ_COMMIT, so the main for-loop can
+ refer to "rev.pending" didrectly.
+
+ * We use the "memcpy a &blank" idiom introduced in
+ 5726a6b4012 (*.c *_init(): define in terms of corresponding *_INIT
+ macro, 2021-07-01).
+
+ This is more obvious than relying on us enumerating all of the
+ relevant members of the "struct object_array" that we need to
+ clear.
+
+ * We comment on why we don't need an object_array_clear() here, see
+ the analysis in [1].
+
+ 1. https://lore.kernel.org/git/YuQtJ2DxNKX%2Fy70N@coredump.intra.peff.net/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
+ Helped-by: Jeff King <peff@peff.net>
## builtin/log.c ##
@@ builtin/log.c: static void show_setup_revisions_tweak(struct rev_info *rev,
@@ builtin/log.c: static void show_setup_revisions_tweak(struct rev_info *rev,
{
struct rev_info rev;
- struct object_array_entry *objects;
-+ struct object_array blank = OBJECT_ARRAY_INIT;
-+ struct object_array pending = OBJECT_ARRAY_INIT;
+ unsigned int i;
struct setup_revision_opt opt;
struct pathspec match_all;
@@ builtin/log.c: int cmd_show(int argc, const char **argv, const char *prefix)
- count = rev.pending.nr;
- objects = rev.pending.objects;
-+ memcpy(&pending, &rev.pending, sizeof(rev.pending));
rev.diffopt.no_free = 1;
- for (i = 0; i < count && !ret; i++) {
- struct object *o = objects[i].item;
- const char *name = objects[i].name;
-+ for (i = 0; i < pending.nr && !ret; i++) {
-+ struct object *o = pending.objects[i].item;
-+ const char *name = pending.objects[i].name;
++ for (i = 0; i < rev.pending.nr && !ret; i++) {
++ struct object *o = rev.pending.objects[i].item;
++ const char *name = rev.pending.objects[i].name;
switch (o->type) {
case OBJ_BLOB:
ret = show_blob_object(&o->oid, &rev, name);
@@ builtin/log.c: int cmd_show(int argc, const char **argv, const char *prefix)
ret = error(_("could not read object %s"),
oid_to_hex(oid));
- objects[i].item = o;
-+ pending.objects[i].item = o;
++ rev.pending.objects[i].item = o;
i--;
break;
}
@@ builtin/log.c: int cmd_show(int argc, const char **argv, const char *prefix)
- rev.shown_one = 1;
- break;
case OBJ_COMMIT:
+ {
+ struct object_array old;
++ struct object_array blank = OBJECT_ARRAY_INIT;
+
+ memcpy(&old, &rev.pending, sizeof(old));
- rev.pending.nr = rev.pending.alloc = 0;
- rev.pending.objects = NULL;
+ memcpy(&rev.pending, &blank, sizeof(rev.pending));
++
add_object_array(o, name, &rev.pending);
ret = cmd_log_walk_no_free(&rev);
++
++ /*
++ * No need for
++ * object_array_clear(&pending). It was
++ * cleared already in prepare_revision_walk()
++ */
+ memcpy(&rev.pending, &old, sizeof(rev.pending));
break;
+ }
5: 2af2bce2d17 = 5: dae5872e722 bisect.c: partially fix bisect_rev_setup() memory leak
6: 3c57e126554 ! 6: 5e637f55ff1 revisions API: don't leak memory on argv elements that need free()-ing
@@ Commit message
revisions API: don't leak memory on argv elements that need free()-ing
Add a "free_removed_argv_elements" member to "struct
- setup_revision_opt", and use it to fix several memory leaks, e.g. the
- one with a "XXX" comment added in 8a534b61241 (bisect: use argv_array
- API, 2011-09-13).
+ setup_revision_opt", and use it to fix several memory leaks.
We have various memory leaks in APIs that take and munge "const
char **argv", e.g. parse_options(). Sometimes these APIs are given the
--
2.37.1.1233.ge8b09efaedc
^ permalink raw reply [flat|nested] 52+ messages in thread
* [PATCH v3 1/6] bisect.c: add missing "goto" for release_revisions()
2022-08-02 15:33 ` [PATCH v3 " Ævar Arnfjörð Bjarmason
@ 2022-08-02 15:33 ` Ævar Arnfjörð Bjarmason
2022-08-03 17:13 ` Junio C Hamano
2022-08-02 15:33 ` [PATCH v3 2/6] test-fast-rebase helper: use release_revisions() (again) Ævar Arnfjörð Bjarmason
` (5 subsequent siblings)
6 siblings, 1 reply; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-08-02 15:33 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Eric Sunshine,
Ævar Arnfjörð Bjarmason
Add a missing "goto cleanup", this fixes a bug in
f196c1e908d (revisions API users: use release_revisions() needing
REV_INFO_INIT, 2022-04-13).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
bisect.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/bisect.c b/bisect.c
index b63669cc9d7..421470bfa59 100644
--- a/bisect.c
+++ b/bisect.c
@@ -1054,7 +1054,7 @@ enum bisect_error bisect_next_all(struct repository *r, const char *prefix)
*/
res = error_if_skipped_commits(tried, NULL);
if (res < 0)
- return res;
+ goto cleanup;
printf(_("%s was both %s and %s\n"),
oid_to_hex(current_bad_oid),
term_good,
--
2.37.1.1233.ge8b09efaedc
^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: [PATCH v3 1/6] bisect.c: add missing "goto" for release_revisions()
2022-08-02 15:33 ` [PATCH v3 1/6] bisect.c: add missing "goto" for release_revisions() Ævar Arnfjörð Bjarmason
@ 2022-08-03 17:13 ` Junio C Hamano
0 siblings, 0 replies; 52+ messages in thread
From: Junio C Hamano @ 2022-08-03 17:13 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason; +Cc: git, Jeff King, Eric Sunshine
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
> Add a missing "goto cleanup", this fixes a bug in
> f196c1e908d (revisions API users: use release_revisions() needing
> REV_INFO_INIT, 2022-04-13).
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
> bisect.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/bisect.c b/bisect.c
> index b63669cc9d7..421470bfa59 100644
> --- a/bisect.c
> +++ b/bisect.c
> @@ -1054,7 +1054,7 @@ enum bisect_error bisect_next_all(struct repository *r, const char *prefix)
> */
> res = error_if_skipped_commits(tried, NULL);
> if (res < 0)
> - return res;
> + goto cleanup;
> printf(_("%s was both %s and %s\n"),
> oid_to_hex(current_bad_oid),
> term_good,
OK, from the cleanup: label we call release_revisions(&rev) which
was used in the "failed" attempt to find bisection. Looks good.
^ permalink raw reply [flat|nested] 52+ messages in thread
* [PATCH v3 2/6] test-fast-rebase helper: use release_revisions() (again)
2022-08-02 15:33 ` [PATCH v3 " Ævar Arnfjörð Bjarmason
2022-08-02 15:33 ` [PATCH v3 1/6] bisect.c: add missing "goto" for release_revisions() Ævar Arnfjörð Bjarmason
@ 2022-08-02 15:33 ` Ævar Arnfjörð Bjarmason
2022-08-02 15:33 ` [PATCH v3 3/6] log: fix a memory leak in "git show <revision>..." Ævar Arnfjörð Bjarmason
` (4 subsequent siblings)
6 siblings, 0 replies; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-08-02 15:33 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Eric Sunshine,
Ævar Arnfjörð Bjarmason
Fix a bug in 0139c58ab95 (revisions API users: add "goto cleanup" for
release_revisions(), 2022-04-13), in that commit a release_revisions()
call was added to this function, but it never did anything due to this
TODO memset() added in fe1a21d5267 (fast-rebase: demonstrate
merge-ort's API via new test-tool command, 2020-10-29).
Simply removing the memset() will fix the "cmdline" which can be seen
when running t5520-pull.sh.
This sort of thing could be detected automatically with a rule similar
to the unused.cocci merged in 7fa60d2a5b6 (Merge branch
'ab/cocci-unused' into next, 2022-07-11). The following rule on top
would catch the case being fixed here:
@@
type T;
identifier I;
identifier REL1 =~ "^[a-z_]*_(release|reset|clear|free)$";
identifier REL2 =~ "^(release|clear|free)_[a-z_]*$";
@@
- memset(\( I \| &I \), 0, ...);
... when != \( I \| &I \)
(
\( REL1 \| REL2 \)( \( I \| &I \), ...);
|
\( REL1 \| REL2 \)( \( &I \| I \) );
)
... when != \( I \| &I \)
That rule should arguably use only &I, not I (as we might be passed a
pointer). The distinction would matter if anyone cared about the
side-effects of a memset() followed by release() of a pointer to a
variable passed into the function.
As such a pattern would be at best very confusing, and most likely
point to buggy code as in this case, the above rule is probably fine
as-is.
But as this rule only found one such bug in the entire codebase let's
not add it to contrib/coccinelle/unused.cocci for now, we can always
dig it up in the future if it's deemed useful.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
t/helper/test-fast-rebase.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/t/helper/test-fast-rebase.c b/t/helper/test-fast-rebase.c
index 4e5553e2024..45665ec19a5 100644
--- a/t/helper/test-fast-rebase.c
+++ b/t/helper/test-fast-rebase.c
@@ -184,8 +184,6 @@ int cmd__fast_rebase(int argc, const char **argv)
last_picked_commit = commit;
last_commit = create_commit(result.tree, commit, last_commit);
}
- /* TODO: There should be some kind of rev_info_free(&revs) call... */
- memset(&revs, 0, sizeof(revs));
merge_switch_to_result(&merge_opt, head_tree, &result, 1, !result.clean);
--
2.37.1.1233.ge8b09efaedc
^ permalink raw reply related [flat|nested] 52+ messages in thread
* [PATCH v3 3/6] log: fix a memory leak in "git show <revision>..."
2022-08-02 15:33 ` [PATCH v3 " Ævar Arnfjörð Bjarmason
2022-08-02 15:33 ` [PATCH v3 1/6] bisect.c: add missing "goto" for release_revisions() Ævar Arnfjörð Bjarmason
2022-08-02 15:33 ` [PATCH v3 2/6] test-fast-rebase helper: use release_revisions() (again) Ævar Arnfjörð Bjarmason
@ 2022-08-02 15:33 ` Ævar Arnfjörð Bjarmason
2022-08-03 17:27 ` Jeff King
2022-08-03 17:54 ` Junio C Hamano
2022-08-02 15:33 ` [PATCH v3 4/6] log: refactor "rev.pending" code in cmd_show() Ævar Arnfjörð Bjarmason
` (3 subsequent siblings)
6 siblings, 2 replies; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-08-02 15:33 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Eric Sunshine,
Ævar Arnfjörð Bjarmason
Fix a memory leak in code added in 5d7eeee2ac6 (git-show: grok blobs,
trees and tags, too, 2006-12-14). As we iterate over a "<revision>..."
command-line and encounter ad OBJ_COMMIT we want to use our "struct
rev_info", but with a "pending" array of one element: the one commit
we're showing in the loop.
To do this 5d7eeee2ac6 saved away a pointer to rev.pending.objects and
rev.pending.nr for its iteration. We'd then clobber those (and alloc)
when we needed to show an OBJ_COMMIT.
We'd therefore leak the "rev.pending" we started out with, and only
free the new "rev.pending" in the "OBJ_COMMIT" case arm as
prepare_revision_walk() would draw it down.
Let's fix this memory leak. Now when we encounter an OBJ_COMMIT we
save away the "rev.pending" before clearing it. We then add a single
commit to it, which our indirect invocation of prepare_revision_walk()
will remove. After that we restore the "rev.pending".
Our "rev.pending" will then get free'd by the release_revisions()
added in f6bfea0ad01 (revisions API users: use release_revisions() in
builtin/log.c, 2022-04-13)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
builtin/log.c | 6 ++++++
t/t0203-gettext-setlocale-sanity.sh | 1 +
t/t1020-subdirectory.sh | 1 +
t/t3307-notes-man.sh | 1 +
t/t3920-crlf-messages.sh | 2 ++
t/t4069-remerge-diff.sh | 1 +
t/t7007-show.sh | 1 +
t/t7503-pre-commit-and-pre-merge-commit-hooks.sh | 1 +
t/t9122-git-svn-author.sh | 1 -
t/t9162-git-svn-dcommit-interactive.sh | 1 -
10 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/builtin/log.c b/builtin/log.c
index 88a5e98875a..b4b1d974617 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -743,11 +743,17 @@ int cmd_show(int argc, const char **argv, const char *prefix)
rev.shown_one = 1;
break;
case OBJ_COMMIT:
+ {
+ struct object_array old;
+
+ memcpy(&old, &rev.pending, sizeof(old));
rev.pending.nr = rev.pending.alloc = 0;
rev.pending.objects = NULL;
add_object_array(o, name, &rev.pending);
ret = cmd_log_walk_no_free(&rev);
+ memcpy(&rev.pending, &old, sizeof(rev.pending));
break;
+ }
default:
ret = error(_("unknown type: %d"), o->type);
}
diff --git a/t/t0203-gettext-setlocale-sanity.sh b/t/t0203-gettext-setlocale-sanity.sh
index 0ce1f22eff6..86cff324ff1 100755
--- a/t/t0203-gettext-setlocale-sanity.sh
+++ b/t/t0203-gettext-setlocale-sanity.sh
@@ -5,6 +5,7 @@
test_description="The Git C functions aren't broken by setlocale(3)"
+TEST_PASSES_SANITIZE_LEAK=true
. ./lib-gettext.sh
test_expect_success 'git show a ISO-8859-1 commit under C locale' '
diff --git a/t/t1020-subdirectory.sh b/t/t1020-subdirectory.sh
index 9fdbb2af80e..45eef9457fe 100755
--- a/t/t1020-subdirectory.sh
+++ b/t/t1020-subdirectory.sh
@@ -6,6 +6,7 @@
test_description='Try various core-level commands in subdirectory.
'
+TEST_PASSES_SANITIZE_LEAK=true
. ./test-lib.sh
. "$TEST_DIRECTORY"/lib-read-tree.sh
diff --git a/t/t3307-notes-man.sh b/t/t3307-notes-man.sh
index 1aa366a410e..ae316502c45 100755
--- a/t/t3307-notes-man.sh
+++ b/t/t3307-notes-man.sh
@@ -4,6 +4,7 @@ test_description='Examples from the git-notes man page
Make sure the manual is not full of lies.'
+TEST_PASSES_SANITIZE_LEAK=true
. ./test-lib.sh
test_expect_success 'setup' '
diff --git a/t/t3920-crlf-messages.sh b/t/t3920-crlf-messages.sh
index 0276edbe3d3..4c661d4d54a 100755
--- a/t/t3920-crlf-messages.sh
+++ b/t/t3920-crlf-messages.sh
@@ -1,6 +1,8 @@
#!/bin/sh
test_description='Test ref-filter and pretty APIs for commit and tag messages using CRLF'
+
+TEST_PASSES_SANITIZE_LEAK=true
. ./test-lib.sh
LIB_CRLF_BRANCHES=""
diff --git a/t/t4069-remerge-diff.sh b/t/t4069-remerge-diff.sh
index 35f94957fce..9e7cac68b1c 100755
--- a/t/t4069-remerge-diff.sh
+++ b/t/t4069-remerge-diff.sh
@@ -2,6 +2,7 @@
test_description='remerge-diff handling'
+TEST_PASSES_SANITIZE_LEAK=true
. ./test-lib.sh
# This test is ort-specific
diff --git a/t/t7007-show.sh b/t/t7007-show.sh
index d6cc69e0f2c..f908a4d1abc 100755
--- a/t/t7007-show.sh
+++ b/t/t7007-show.sh
@@ -2,6 +2,7 @@
test_description='git show'
+TEST_PASSES_SANITIZE_LEAK=true
. ./test-lib.sh
test_expect_success setup '
diff --git a/t/t7503-pre-commit-and-pre-merge-commit-hooks.sh b/t/t7503-pre-commit-and-pre-merge-commit-hooks.sh
index ad1eb64ba0d..aa004b70a8d 100755
--- a/t/t7503-pre-commit-and-pre-merge-commit-hooks.sh
+++ b/t/t7503-pre-commit-and-pre-merge-commit-hooks.sh
@@ -5,6 +5,7 @@ test_description='pre-commit and pre-merge-commit hooks'
GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+TEST_PASSES_SANITIZE_LEAK=true
. ./test-lib.sh
test_expect_success 'root commit' '
diff --git a/t/t9122-git-svn-author.sh b/t/t9122-git-svn-author.sh
index 527ba3d2932..0fc289ae0f0 100755
--- a/t/t9122-git-svn-author.sh
+++ b/t/t9122-git-svn-author.sh
@@ -2,7 +2,6 @@
test_description='git svn authorship'
-TEST_FAILS_SANITIZE_LEAK=true
. ./lib-git-svn.sh
test_expect_success 'setup svn repository' '
diff --git a/t/t9162-git-svn-dcommit-interactive.sh b/t/t9162-git-svn-dcommit-interactive.sh
index e2aa8ed88a9..b3ce033a0d3 100755
--- a/t/t9162-git-svn-dcommit-interactive.sh
+++ b/t/t9162-git-svn-dcommit-interactive.sh
@@ -4,7 +4,6 @@
test_description='git svn dcommit --interactive series'
-TEST_FAILS_SANITIZE_LEAK=true
. ./lib-git-svn.sh
test_expect_success 'initialize repo' '
--
2.37.1.1233.ge8b09efaedc
^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: [PATCH v3 3/6] log: fix a memory leak in "git show <revision>..."
2022-08-02 15:33 ` [PATCH v3 3/6] log: fix a memory leak in "git show <revision>..." Ævar Arnfjörð Bjarmason
@ 2022-08-03 17:27 ` Jeff King
2022-08-03 17:29 ` Jeff King
2022-08-03 17:54 ` Junio C Hamano
1 sibling, 1 reply; 52+ messages in thread
From: Jeff King @ 2022-08-03 17:27 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano, Eric Sunshine
On Tue, Aug 02, 2022 at 05:33:13PM +0200, Ævar Arnfjörð Bjarmason wrote:
> Let's fix this memory leak. Now when we encounter an OBJ_COMMIT we
> save away the "rev.pending" before clearing it. We then add a single
> commit to it, which our indirect invocation of prepare_revision_walk()
> will remove. After that we restore the "rev.pending".
>
> Our "rev.pending" will then get free'd by the release_revisions()
> added in f6bfea0ad01 (revisions API users: use release_revisions() in
> builtin/log.c, 2022-04-13)
OK, I agree this is doing the correct thing. I actually found the
earlier "let's dissociate rev.pending from rev entirely" approach easier
to reason about, though.
> diff --git a/builtin/log.c b/builtin/log.c
> index 88a5e98875a..b4b1d974617 100644
> --- a/builtin/log.c
> +++ b/builtin/log.c
> @@ -743,11 +743,17 @@ int cmd_show(int argc, const char **argv, const char *prefix)
> rev.shown_one = 1;
> break;
> case OBJ_COMMIT:
> + {
> + struct object_array old;
> +
> + memcpy(&old, &rev.pending, sizeof(old));
> rev.pending.nr = rev.pending.alloc = 0;
> rev.pending.objects = NULL;
> add_object_array(o, name, &rev.pending);
> ret = cmd_log_walk_no_free(&rev);
> + memcpy(&rev.pending, &old, sizeof(rev.pending));
> break;
> + }
Here we overwrite the one-item rev.pending without freeing it, but just
immediately after instead of before. It's a little subtle, but your
comment in the commit message:
[...] and only free the new "rev.pending" in the "OBJ_COMMIT" case arm
as prepare_revision_walk() would draw it down.
covers that. IMHO that could be spelled out a bit more (particularly
that this only works for OBJ_COMMIT, but that's OK because that's the
only type we're adding), but I can live with it.
-Peff
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v3 3/6] log: fix a memory leak in "git show <revision>..."
2022-08-03 17:27 ` Jeff King
@ 2022-08-03 17:29 ` Jeff King
0 siblings, 0 replies; 52+ messages in thread
From: Jeff King @ 2022-08-03 17:29 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano, Eric Sunshine
On Wed, Aug 03, 2022 at 01:27:36PM -0400, Jeff King wrote:
> > diff --git a/builtin/log.c b/builtin/log.c
> > index 88a5e98875a..b4b1d974617 100644
> > --- a/builtin/log.c
> > +++ b/builtin/log.c
> > @@ -743,11 +743,17 @@ int cmd_show(int argc, const char **argv, const char *prefix)
> > rev.shown_one = 1;
> > break;
> > case OBJ_COMMIT:
> > + {
> > + struct object_array old;
> > +
> > + memcpy(&old, &rev.pending, sizeof(old));
> > rev.pending.nr = rev.pending.alloc = 0;
> > rev.pending.objects = NULL;
> > add_object_array(o, name, &rev.pending);
> > ret = cmd_log_walk_no_free(&rev);
> > + memcpy(&rev.pending, &old, sizeof(rev.pending));
> > break;
> > + }
>
> Here we overwrite the one-item rev.pending without freeing it, but just
> immediately after instead of before. It's a little subtle, but your
> comment in the commit message:
>
> [...] and only free the new "rev.pending" in the "OBJ_COMMIT" case arm
> as prepare_revision_walk() would draw it down.
>
> covers that. IMHO that could be spelled out a bit more (particularly
> that this only works for OBJ_COMMIT, but that's OK because that's the
> only type we're adding), but I can live with it.
Ah, I see that you did add a comment in the next commit. That's
sufficient, though I really think it makes more sense here, where we're
actually dealing with leaks.
-Peff
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v3 3/6] log: fix a memory leak in "git show <revision>..."
2022-08-02 15:33 ` [PATCH v3 3/6] log: fix a memory leak in "git show <revision>..." Ævar Arnfjörð Bjarmason
2022-08-03 17:27 ` Jeff King
@ 2022-08-03 17:54 ` Junio C Hamano
1 sibling, 0 replies; 52+ messages in thread
From: Junio C Hamano @ 2022-08-03 17:54 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason; +Cc: git, Jeff King, Eric Sunshine
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
> Fix a memory leak in code added in 5d7eeee2ac6 (git-show: grok blobs,
> trees and tags, too, 2006-12-14). As we iterate over a "<revision>..."
> command-line and encounter ad OBJ_COMMIT we want to use our "struct
"... encounter an OBJ_COMMIT, we want to ..."?
> rev_info", but with a "pending" array of one element: the one commit
> we're showing in the loop.
>
> To do this 5d7eeee2ac6 saved away a pointer to rev.pending.objects and
> rev.pending.nr for its iteration. We'd then clobber those (and alloc)
> when we needed to show an OBJ_COMMIT.
>
> We'd therefore leak the "rev.pending" we started out with, and only
> free the new "rev.pending" in the "OBJ_COMMIT" case arm as
> prepare_revision_walk() would draw it down.
>
> Let's fix this memory leak. Now when we encounter an OBJ_COMMIT we
> save away the "rev.pending" before clearing it. We then add a single
> commit to it, which our indirect invocation of prepare_revision_walk()
> will remove. After that we restore the "rev.pending".
>
> Our "rev.pending" will then get free'd by the release_revisions()
> added in f6bfea0ad01 (revisions API users: use release_revisions() in
> builtin/log.c, 2022-04-13)
Hmph, this gives me a strange sense of deja-vu that I saw a better
solution in a separate patch from you, strange because I do not see
anything at the tip of 'seen' like what I thought you did elsewhere.
We do need to reuse "rev_info" we got from outside the loop so that
we will have to consistently apply diffopt and other things we got
in there from the configuration and the command line. But after we
decide to go to "each object is shown individually" mode, the
contents of the pending array (rather, what we got from the command
line in cmdline member) is only relevant to enumerate which
individual objects are shown in the loop, and should not even be
visible to the code that handles each individual object, e.g. even
we pass &rev to this code when we see a blob:
switch (o->type) {
case OBJ_BLOB:
ret = show_blob_object(&o->oid, &rev, name);
break;
there is no point in exposing the rev.pending to show_blob_object()
at al. The same is true for codepaths used to show a tag or a tree.
When showing a commit, we do not even want the codepath that shows a
single-commit range to touch and destroy the original rev.pending;
we instead want to give a single-element pending array.
So instead of keeping rev.pending when we enter the loop and saving
it away and restoring it, it feels a lot cleaner to invent/use an
interface to "clone" the settings in an existing rev_info by:
* Grab rev.pending into a separate "struct object_array" that is a
local variable in cmd_show() and clear rev.pending, immediately
after we decide to go to "show individually" mode.
* Iterate over the objects in that local variable. For each object
to be shown, prepare a rev_info, clone the setting from the
original rev_info, put that single object to the pending member
of the clone, use that cloned rev_info, and release the resources
held by the cloned rev_info once we are done.
> diff --git a/builtin/log.c b/builtin/log.c
> index 88a5e98875a..b4b1d974617 100644
> --- a/builtin/log.c
> +++ b/builtin/log.c
> @@ -743,11 +743,17 @@ int cmd_show(int argc, const char **argv, const char *prefix)
> rev.shown_one = 1;
> break;
> case OBJ_COMMIT:
> + {
> + struct object_array old;
> +
> + memcpy(&old, &rev.pending, sizeof(old));
> rev.pending.nr = rev.pending.alloc = 0;
> rev.pending.objects = NULL;
> add_object_array(o, name, &rev.pending);
> ret = cmd_log_walk_no_free(&rev);
> + memcpy(&rev.pending, &old, sizeof(rev.pending));
> break;
> + }
The reason why I find this approach a bit disturbing is that we
pretend to know for certain that pending is the only thing we need
to protect across multiple calls to the log_walk(), but I suspect
that such an overconfidence will come back and bite us. We are not
re-initializing other states reachable from the rev_info (e.g. the
diff_options struct has various members that records what happened
in an invocation, like needed_rename_limit, found_follow, and
found_changes, that would want to be reinitialized if we are
starting a new and totally independent traversal after we are done
one traversal).
But I do not mind at all to leave such a fundamental clean-up to a
totally separate topic, and keep this patch only about "we are
clobbering and leaking rev.pending, let's plug the leak". In fact,
I would prefer it that way. So take all of the above as material
for possible NEEDSWORK comment, food for further thought.
^ permalink raw reply [flat|nested] 52+ messages in thread
* [PATCH v3 4/6] log: refactor "rev.pending" code in cmd_show()
2022-08-02 15:33 ` [PATCH v3 " Ævar Arnfjörð Bjarmason
` (2 preceding siblings ...)
2022-08-02 15:33 ` [PATCH v3 3/6] log: fix a memory leak in "git show <revision>..." Ævar Arnfjörð Bjarmason
@ 2022-08-02 15:33 ` Ævar Arnfjörð Bjarmason
2022-08-03 17:32 ` Jeff King
2022-08-03 17:59 ` Junio C Hamano
2022-08-02 15:33 ` [PATCH v3 5/6] bisect.c: partially fix bisect_rev_setup() memory leak Ævar Arnfjörð Bjarmason
` (2 subsequent siblings)
6 siblings, 2 replies; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-08-02 15:33 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Eric Sunshine,
Ævar Arnfjörð Bjarmason
Refactor the juggling of "rev.pending" and our replacement for it
amended in the preceding commit so that:
* We use an "unsigned int" instead of an "int" for "i", this matches
the types of "struct rev_info" itself.
* We don't need the "count" and "objects" variables introduced in
5d7eeee2ac6 (git-show: grok blobs, trees and tags, too, 2006-12-14).
They were originally added since we'd clobber rev.pending in the
loop without restoring it. Since the preceding commit we are
restoring it when we handle OBJ_COMMIT, so the main for-loop can
refer to "rev.pending" didrectly.
* We use the "memcpy a &blank" idiom introduced in
5726a6b4012 (*.c *_init(): define in terms of corresponding *_INIT
macro, 2021-07-01).
This is more obvious than relying on us enumerating all of the
relevant members of the "struct object_array" that we need to
clear.
* We comment on why we don't need an object_array_clear() here, see
the analysis in [1].
1. https://lore.kernel.org/git/YuQtJ2DxNKX%2Fy70N@coredump.intra.peff.net/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Helped-by: Jeff King <peff@peff.net>
---
builtin/log.c | 25 +++++++++++++++----------
1 file changed, 15 insertions(+), 10 deletions(-)
diff --git a/builtin/log.c b/builtin/log.c
index b4b1d974617..9b937d59b83 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -668,10 +668,10 @@ static void show_setup_revisions_tweak(struct rev_info *rev,
int cmd_show(int argc, const char **argv, const char *prefix)
{
struct rev_info rev;
- struct object_array_entry *objects;
+ unsigned int i;
struct setup_revision_opt opt;
struct pathspec match_all;
- int i, count, ret = 0;
+ int ret = 0;
init_log_defaults();
git_config(git_log_config, NULL);
@@ -698,12 +698,10 @@ int cmd_show(int argc, const char **argv, const char *prefix)
if (!rev.no_walk)
return cmd_log_deinit(cmd_log_walk(&rev), &rev);
- count = rev.pending.nr;
- objects = rev.pending.objects;
rev.diffopt.no_free = 1;
- for (i = 0; i < count && !ret; i++) {
- struct object *o = objects[i].item;
- const char *name = objects[i].name;
+ for (i = 0; i < rev.pending.nr && !ret; i++) {
+ struct object *o = rev.pending.objects[i].item;
+ const char *name = rev.pending.objects[i].name;
switch (o->type) {
case OBJ_BLOB:
ret = show_blob_object(&o->oid, &rev, name);
@@ -726,7 +724,7 @@ int cmd_show(int argc, const char **argv, const char *prefix)
if (!o)
ret = error(_("could not read object %s"),
oid_to_hex(oid));
- objects[i].item = o;
+ rev.pending.objects[i].item = o;
i--;
break;
}
@@ -745,12 +743,19 @@ int cmd_show(int argc, const char **argv, const char *prefix)
case OBJ_COMMIT:
{
struct object_array old;
+ struct object_array blank = OBJECT_ARRAY_INIT;
memcpy(&old, &rev.pending, sizeof(old));
- rev.pending.nr = rev.pending.alloc = 0;
- rev.pending.objects = NULL;
+ memcpy(&rev.pending, &blank, sizeof(rev.pending));
+
add_object_array(o, name, &rev.pending);
ret = cmd_log_walk_no_free(&rev);
+
+ /*
+ * No need for
+ * object_array_clear(&pending). It was
+ * cleared already in prepare_revision_walk()
+ */
memcpy(&rev.pending, &old, sizeof(rev.pending));
break;
}
--
2.37.1.1233.ge8b09efaedc
^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: [PATCH v3 4/6] log: refactor "rev.pending" code in cmd_show()
2022-08-02 15:33 ` [PATCH v3 4/6] log: refactor "rev.pending" code in cmd_show() Ævar Arnfjörð Bjarmason
@ 2022-08-03 17:32 ` Jeff King
2022-08-03 17:59 ` Junio C Hamano
1 sibling, 0 replies; 52+ messages in thread
From: Jeff King @ 2022-08-03 17:32 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano, Eric Sunshine
On Tue, Aug 02, 2022 at 05:33:14PM +0200, Ævar Arnfjörð Bjarmason wrote:
> * We don't need the "count" and "objects" variables introduced in
> 5d7eeee2ac6 (git-show: grok blobs, trees and tags, too, 2006-12-14).
>
> They were originally added since we'd clobber rev.pending in the
> loop without restoring it. Since the preceding commit we are
> restoring it when we handle OBJ_COMMIT, so the main for-loop can
> refer to "rev.pending" didrectly.
I think this is accurate, though it does feel a bit weird that we are
iterating over rev.pending, and we clobber and restore it mid-loop.
It's correct because of the restore, but I think that's why my gut
feeling favored the earlier approach to completely dissociate the
iteration variables from "rev.pending" before the loop even starts.
That said, it seems like we're spending a lot more time going back and
forth on this topic than it is really worth, so I can live with any of
the versions.
-Peff
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v3 4/6] log: refactor "rev.pending" code in cmd_show()
2022-08-02 15:33 ` [PATCH v3 4/6] log: refactor "rev.pending" code in cmd_show() Ævar Arnfjörð Bjarmason
2022-08-03 17:32 ` Jeff King
@ 2022-08-03 17:59 ` Junio C Hamano
2022-08-04 7:51 ` Ævar Arnfjörð Bjarmason
1 sibling, 1 reply; 52+ messages in thread
From: Junio C Hamano @ 2022-08-03 17:59 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason; +Cc: git, Jeff King, Eric Sunshine
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
> Refactor the juggling of "rev.pending" and our replacement for it
> amended in the preceding commit so that:
> ...
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> Helped-by: Jeff King <peff@peff.net>
List trailer lines chronologically, please.
This does not look wrong per-se, but given that we might want to
rethink the way the original rev_info is reused inside the loop,
as I mentioned in my review of the previous step, this change may be
irrelevant in the longer term.
I may have said this earlier, but once we start using the "prepare a
blank one, copy it to clear another" pattern, we should stop using
memcpy() and use structure assignment, especially if we are trying to
make our intent clear.
> @@ -745,12 +743,19 @@ int cmd_show(int argc, const char **argv, const char *prefix)
> case OBJ_COMMIT:
> {
> struct object_array old;
> + struct object_array blank = OBJECT_ARRAY_INIT;
>
> memcpy(&old, &rev.pending, sizeof(old));
> - rev.pending.nr = rev.pending.alloc = 0;
> - rev.pending.objects = NULL;
> + memcpy(&rev.pending, &blank, sizeof(rev.pending));
> +
> add_object_array(o, name, &rev.pending);
> ret = cmd_log_walk_no_free(&rev);
> +
> + /*
> + * No need for
> + * object_array_clear(&pending). It was
> + * cleared already in prepare_revision_walk()
> + */
> memcpy(&rev.pending, &old, sizeof(rev.pending));
> break;
> }
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v3 4/6] log: refactor "rev.pending" code in cmd_show()
2022-08-03 17:59 ` Junio C Hamano
@ 2022-08-04 7:51 ` Ævar Arnfjörð Bjarmason
2022-08-04 15:59 ` Junio C Hamano
0 siblings, 1 reply; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-08-04 7:51 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Jeff King, Eric Sunshine
On Wed, Aug 03 2022, Junio C Hamano wrote:
> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> Refactor the juggling of "rev.pending" and our replacement for it
>> amended in the preceding commit so that:
>> ...
>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>> Helped-by: Jeff King <peff@peff.net>
>
> List trailer lines chronologically, please.
Willdo.
> I may have said this earlier, but once we start using the "prepare a
> blank one, copy it to clear another" pattern, we should stop using
> memcpy() and use structure assignment, especially if we are trying to
> make our intent clear.
Yeah, I saw that. I took it as we should consider changing this more
generally (e.g. with coccicheck etc.).
This was mentioned in one of the original threads about the memcpy()
idiom, but IIRC there was some reason to think that it wasn't as widely
supported, or in any case we'd want to re-rest that the compilers we
care about similarly optimize it.
So I think it's best to just use the more widely used pattern for now,
and if we'd like change them all in some follow-up change...
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v3 4/6] log: refactor "rev.pending" code in cmd_show()
2022-08-04 7:51 ` Ævar Arnfjörð Bjarmason
@ 2022-08-04 15:59 ` Junio C Hamano
2022-08-05 9:01 ` Ævar Arnfjörð Bjarmason
0 siblings, 1 reply; 52+ messages in thread
From: Junio C Hamano @ 2022-08-04 15:59 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason; +Cc: git, Jeff King, Eric Sunshine
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
> Yeah, I saw that. I took it as we should consider changing this more
> generally (e.g. with coccicheck etc.).
To make things easier in the future, for the record, I in general do
not suggest such a bulk rewrite for the sake of rewrite, whether
driven with Coccinelle or something else, and I did not in this
case.
> This was mentioned in one of the original threads about the memcpy()
> idiom, but IIRC there was some reason to think that it wasn't as widely
> supported, ...
I somehow thought that we had that stage too long ago; I recall we
spotted struct assignment in a patch post release and left it there
without reverting.
> ... or in any case we'd want to re-rest that the compilers we
> care about similarly optimize it.
Perhaps. Using struct assignment only when we feel an urge to use
memcpy() in a new code (or in the postimage of a newly rewritten
code), instead of doing a bulk update, would give us a chance to
start small and vet the result with compilers of such a small scale
rewrite carefully to build confidence, hopefully?
Thanks.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v3 4/6] log: refactor "rev.pending" code in cmd_show()
2022-08-04 15:59 ` Junio C Hamano
@ 2022-08-05 9:01 ` Ævar Arnfjörð Bjarmason
0 siblings, 0 replies; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-08-05 9:01 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Jeff King, Eric Sunshine
On Thu, Aug 04 2022, Junio C Hamano wrote:
> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
> [...]
>> ... or in any case we'd want to re-rest that the compilers we
>> care about similarly optimize it.
>
> Perhaps. Using struct assignment only when we feel an urge to use
> memcpy() in a new code (or in the postimage of a newly rewritten
> code), instead of doing a bulk update, would give us a chance to
> start small and vet the result with compilers of such a small scale
> rewrite carefully to build confidence, hopefully?
I think it would make sense to set out a baloon for it, but as some of
it (and I'll note this in a re-roll) involves some new C99 support I'd
prefer not to conflate that with this series.
^ permalink raw reply [flat|nested] 52+ messages in thread
* [PATCH v3 5/6] bisect.c: partially fix bisect_rev_setup() memory leak
2022-08-02 15:33 ` [PATCH v3 " Ævar Arnfjörð Bjarmason
` (3 preceding siblings ...)
2022-08-02 15:33 ` [PATCH v3 4/6] log: refactor "rev.pending" code in cmd_show() Ævar Arnfjörð Bjarmason
@ 2022-08-02 15:33 ` Ævar Arnfjörð Bjarmason
2022-08-02 15:33 ` [PATCH v3 6/6] revisions API: don't leak memory on argv elements that need free()-ing Ævar Arnfjörð Bjarmason
2022-08-03 17:33 ` [PATCH v3 0/6] revisions API: fix more memory leaks Jeff King
6 siblings, 0 replies; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-08-02 15:33 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Eric Sunshine,
Ævar Arnfjörð Bjarmason
Partially fix the memory leak noted in in 8a534b61241 (bisect: use
argv_array API, 2011-09-13), which added the "XXX" comment seen in the
context. We can partially fix it by having the bisect_rev_setup()
function take a "struct strvec", rather than constructing it.
As the comment notes we need to keep the construct "rev_argv" around
while the "struct rev_info" is around, which as seen in the newly
added "strvec_clear()" calls here we do after "release_revisions()".
This "partially" fixes the memory leak because we're leaking the "--"
added to the "rev_argv" here still, which will be addressed in a
subsequent commit.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
bisect.c | 22 +++++++++++++---------
1 file changed, 13 insertions(+), 9 deletions(-)
diff --git a/bisect.c b/bisect.c
index 421470bfa59..6afb98be7a1 100644
--- a/bisect.c
+++ b/bisect.c
@@ -648,11 +648,11 @@ static struct commit_list *managed_skipped(struct commit_list *list,
}
static void bisect_rev_setup(struct repository *r, struct rev_info *revs,
+ struct strvec *rev_argv,
const char *prefix,
const char *bad_format, const char *good_format,
int read_paths)
{
- struct strvec rev_argv = STRVEC_INIT;
int i;
repo_init_revisions(r, revs, prefix);
@@ -660,16 +660,16 @@ static void bisect_rev_setup(struct repository *r, struct rev_info *revs,
revs->commit_format = CMIT_FMT_UNSPECIFIED;
/* rev_argv.argv[0] will be ignored by setup_revisions */
- strvec_push(&rev_argv, "bisect_rev_setup");
- strvec_pushf(&rev_argv, bad_format, oid_to_hex(current_bad_oid));
+ strvec_push(rev_argv, "bisect_rev_setup");
+ strvec_pushf(rev_argv, bad_format, oid_to_hex(current_bad_oid));
for (i = 0; i < good_revs.nr; i++)
- strvec_pushf(&rev_argv, good_format,
+ strvec_pushf(rev_argv, good_format,
oid_to_hex(good_revs.oid + i));
- strvec_push(&rev_argv, "--");
+ strvec_push(rev_argv, "--");
if (read_paths)
- read_bisect_paths(&rev_argv);
+ read_bisect_paths(rev_argv);
- setup_revisions(rev_argv.nr, rev_argv.v, revs, NULL);
+ setup_revisions(rev_argv->nr, rev_argv->v, revs, NULL);
/* XXX leak rev_argv, as "revs" may still be pointing to it */
}
@@ -873,10 +873,11 @@ static enum bisect_error check_merge_bases(int rev_nr, struct commit **rev, int
static int check_ancestors(struct repository *r, int rev_nr,
struct commit **rev, const char *prefix)
{
+ struct strvec rev_argv = STRVEC_INIT;
struct rev_info revs;
int res;
- bisect_rev_setup(r, &revs, prefix, "^%s", "%s", 0);
+ bisect_rev_setup(r, &revs, &rev_argv, prefix, "^%s", "%s", 0);
bisect_common(&revs);
res = (revs.commits != NULL);
@@ -885,6 +886,7 @@ static int check_ancestors(struct repository *r, int rev_nr,
clear_commit_marks_many(rev_nr, rev, ALL_REV_FLAGS);
release_revisions(&revs);
+ strvec_clear(&rev_argv);
return res;
}
@@ -1010,6 +1012,7 @@ void read_bisect_terms(const char **read_bad, const char **read_good)
*/
enum bisect_error bisect_next_all(struct repository *r, const char *prefix)
{
+ struct strvec rev_argv = STRVEC_INIT;
struct rev_info revs = REV_INFO_INIT;
struct commit_list *tried;
int reaches = 0, all = 0, nr, steps;
@@ -1037,7 +1040,7 @@ enum bisect_error bisect_next_all(struct repository *r, const char *prefix)
if (res)
goto cleanup;
- bisect_rev_setup(r, &revs, prefix, "%s", "^%s", 1);
+ bisect_rev_setup(r, &revs, &rev_argv, prefix, "%s", "^%s", 1);
revs.first_parent_only = !!(bisect_flags & FIND_BISECTION_FIRST_PARENT_ONLY);
revs.limited = 1;
@@ -1112,6 +1115,7 @@ enum bisect_error bisect_next_all(struct repository *r, const char *prefix)
res = bisect_checkout(bisect_rev, no_checkout);
cleanup:
release_revisions(&revs);
+ strvec_clear(&rev_argv);
return res;
}
--
2.37.1.1233.ge8b09efaedc
^ permalink raw reply related [flat|nested] 52+ messages in thread
* [PATCH v3 6/6] revisions API: don't leak memory on argv elements that need free()-ing
2022-08-02 15:33 ` [PATCH v3 " Ævar Arnfjörð Bjarmason
` (4 preceding siblings ...)
2022-08-02 15:33 ` [PATCH v3 5/6] bisect.c: partially fix bisect_rev_setup() memory leak Ævar Arnfjörð Bjarmason
@ 2022-08-02 15:33 ` Ævar Arnfjörð Bjarmason
2022-08-03 18:30 ` Junio C Hamano
2022-08-03 17:33 ` [PATCH v3 0/6] revisions API: fix more memory leaks Jeff King
6 siblings, 1 reply; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-08-02 15:33 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Eric Sunshine,
Ævar Arnfjörð Bjarmason
Add a "free_removed_argv_elements" member to "struct
setup_revision_opt", and use it to fix several memory leaks.
We have various memory leaks in APIs that take and munge "const
char **argv", e.g. parse_options(). Sometimes these APIs are given the
"argv" we get to the "main" function, in which case we don't leak
memory, but other times we're giving it the "v" member of a "struct
strvec" we created.
There's several potential ways to fix those sort of leaks, we could
add a "nodup" mode to "struct strvec", which would work for the cases
where we push constant strings to it. But that wouldn't work as soon
as we used strvec_pushf(), or otherwise needed to duplicate or create
a string for that "struct strvec".
Let's instead make it the responsibility of the revisions API. If it's
going to clobber elements of argv it can also free() them, which it
will now do if instructed to do so via "free_removed_argv_elements".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
bisect.c | 6 ++++--
builtin/submodule--helper.c | 5 ++++-
remote.c | 5 ++++-
revision.c | 2 ++
revision.h | 3 ++-
t/t2020-checkout-detach.sh | 1 +
6 files changed, 17 insertions(+), 5 deletions(-)
diff --git a/bisect.c b/bisect.c
index 6afb98be7a1..38b3891f3a6 100644
--- a/bisect.c
+++ b/bisect.c
@@ -653,6 +653,9 @@ static void bisect_rev_setup(struct repository *r, struct rev_info *revs,
const char *bad_format, const char *good_format,
int read_paths)
{
+ struct setup_revision_opt opt = {
+ .free_removed_argv_elements = 1,
+ };
int i;
repo_init_revisions(r, revs, prefix);
@@ -669,8 +672,7 @@ static void bisect_rev_setup(struct repository *r, struct rev_info *revs,
if (read_paths)
read_bisect_paths(rev_argv);
- setup_revisions(rev_argv->nr, rev_argv->v, revs, NULL);
- /* XXX leak rev_argv, as "revs" may still be pointing to it */
+ setup_revisions(rev_argv->nr, rev_argv->v, revs, &opt);
}
static void bisect_common(struct rev_info *revs)
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index fac52ade5e1..b63f420ecef 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -1104,6 +1104,9 @@ static int compute_summary_module_list(struct object_id *head_oid,
{
struct strvec diff_args = STRVEC_INIT;
struct rev_info rev;
+ struct setup_revision_opt opt = {
+ .free_removed_argv_elements = 1,
+ };
struct module_cb_list list = MODULE_CB_LIST_INIT;
int ret = 0;
@@ -1121,7 +1124,7 @@ static int compute_summary_module_list(struct object_id *head_oid,
init_revisions(&rev, info->prefix);
rev.abbrev = 0;
precompose_argv_prefix(diff_args.nr, diff_args.v, NULL);
- setup_revisions(diff_args.nr, diff_args.v, &rev, NULL);
+ setup_revisions(diff_args.nr, diff_args.v, &rev, &opt);
rev.diffopt.output_format = DIFF_FORMAT_NO_OUTPUT | DIFF_FORMAT_CALLBACK;
rev.diffopt.format_callback = submodule_summary_callback;
rev.diffopt.format_callback_data = &list;
diff --git a/remote.c b/remote.c
index 1ee2b145d07..d66064118cb 100644
--- a/remote.c
+++ b/remote.c
@@ -2169,6 +2169,9 @@ static int stat_branch_pair(const char *branch_name, const char *base,
struct object_id oid;
struct commit *ours, *theirs;
struct rev_info revs;
+ struct setup_revision_opt opt = {
+ .free_removed_argv_elements = 1,
+ };
struct strvec argv = STRVEC_INIT;
/* Cannot stat if what we used to build on no longer exists */
@@ -2203,7 +2206,7 @@ static int stat_branch_pair(const char *branch_name, const char *base,
strvec_push(&argv, "--");
repo_init_revisions(the_repository, &revs, NULL);
- setup_revisions(argv.nr, argv.v, &revs, NULL);
+ setup_revisions(argv.nr, argv.v, &revs, &opt);
if (prepare_revision_walk(&revs))
die(_("revision walk setup failed"));
diff --git a/revision.c b/revision.c
index 0c6e26cd9c8..35d24e4fd3e 100644
--- a/revision.c
+++ b/revision.c
@@ -2784,6 +2784,8 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
const char *arg = argv[i];
if (strcmp(arg, "--"))
continue;
+ if (opt && opt->free_removed_argv_elements)
+ free((char *)argv[i]);
argv[i] = NULL;
argc = i;
if (argv[i + 1])
diff --git a/revision.h b/revision.h
index e576845cdd1..bb91e7ed914 100644
--- a/revision.h
+++ b/revision.h
@@ -375,7 +375,8 @@ struct setup_revision_opt {
const char *def;
void (*tweak)(struct rev_info *, struct setup_revision_opt *);
unsigned int assume_dashdash:1,
- allow_exclude_promisor_objects:1;
+ allow_exclude_promisor_objects:1,
+ free_removed_argv_elements:1;
unsigned revarg_opt;
};
int setup_revisions(int argc, const char **argv, struct rev_info *revs,
diff --git a/t/t2020-checkout-detach.sh b/t/t2020-checkout-detach.sh
index bc46713a43e..2eab6474f8d 100755
--- a/t/t2020-checkout-detach.sh
+++ b/t/t2020-checkout-detach.sh
@@ -4,6 +4,7 @@ test_description='checkout into detached HEAD state'
GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+TEST_PASSES_SANITIZE_LEAK=true
. ./test-lib.sh
check_detached () {
--
2.37.1.1233.ge8b09efaedc
^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: [PATCH v3 6/6] revisions API: don't leak memory on argv elements that need free()-ing
2022-08-02 15:33 ` [PATCH v3 6/6] revisions API: don't leak memory on argv elements that need free()-ing Ævar Arnfjörð Bjarmason
@ 2022-08-03 18:30 ` Junio C Hamano
0 siblings, 0 replies; 52+ messages in thread
From: Junio C Hamano @ 2022-08-03 18:30 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason; +Cc: git, Jeff King, Eric Sunshine
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
> @@ -2784,6 +2784,8 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
> const char *arg = argv[i];
> if (strcmp(arg, "--"))
> continue;
> + if (opt && opt->free_removed_argv_elements)
> + free((char *)argv[i]);
We have "arg", let's free that instead.
> argv[i] = NULL;
> argc = i;
> unsigned int assume_dashdash:1,
> - allow_exclude_promisor_objects:1;
> + allow_exclude_promisor_objects:1,
> + free_removed_argv_elements:1;
It would be nicer to pick a name that explains why we want to "free
removed argv[] elements" than "if you remove argv[] elements, then
please free them", because the explanation on the reason why we
request the API to free would help future developers to decide if
they need to free in some situations where they do not "remove" but
do something else in their updates to the revisions API.
If we cannot come up with a better name, at least we should add a
comment that says that the caller owns the **argv strings, and it
asks the API to free those if reference to them are lost in any way.
Thanks. Overall the series was a pleasant read.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v3 0/6] revisions API: fix more memory leaks
2022-08-02 15:33 ` [PATCH v3 " Ævar Arnfjörð Bjarmason
` (5 preceding siblings ...)
2022-08-02 15:33 ` [PATCH v3 6/6] revisions API: don't leak memory on argv elements that need free()-ing Ævar Arnfjörð Bjarmason
@ 2022-08-03 17:33 ` Jeff King
6 siblings, 0 replies; 52+ messages in thread
From: Jeff King @ 2022-08-03 17:33 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano, Eric Sunshine
On Tue, Aug 02, 2022 at 05:33:10PM +0200, Ævar Arnfjörð Bjarmason wrote:
> Changes since v2:
>
> * I think the rewrite here of fixing the leak should address Jeff
> King's concerns. I was somewhat mentally stuck on trying to change
> the code to make the leak fix easier, and then having the leak fix.
>
> But I think as the new 3/6 shows fixing the leak first is much
> easier, and more straightforward to explain.
Yeah, I think fixing the leak first is fine. I did prefer the end result
of the earlier round, but I can live with it either way.
> * A new 4/6 then follows-up and rewrites the variable clobbering that
> was needed for the pre-image of 3/6.
>
> * Don't reference the "XXX" comment in 6/6.
I'd hoped you'd take my replacement 5/6, but again, I can live with it
either way.
-Peff
^ permalink raw reply [flat|nested] 52+ messages in thread