From: Jeff King <peff@peff.net>
To: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Elijah Newren <newren@gmail.com>
Subject: Re: [PATCH 3/3] dir: fix problematic API to avoid memory leaks
Date: Sun, 16 Aug 2020 05:11:54 -0400 [thread overview]
Message-ID: <20200816091154.GC1221900@coredump.intra.peff.net> (raw)
In-Reply-To: <b9310e9941e91336258edd97a913e5908180720e.1597561152.git.gitgitgadget@gmail.com>
On Sun, Aug 16, 2020 at 06:59:11AM +0000, Elijah Newren via GitGitGadget wrote:
> Digging further, I found that despite the pretty clear documentation
> near the top of dir.h that folks were supposed to call clear_directory()
> when the user no longer needed the dir_struct, there were four callers
> that didn't bother doing that at all. However, two of them clearly
> thought about leaks since they had an UNLEAK(dir) directive, which to me
> suggests that the method to free the data was too unclear. I suspect
> the non-obviousness of the API and its holes led folks to avoid it,
> which then snowballed into further problems with the entries[],
> ignored[], parent_hashmap, and recursive_hashmap problems.
The UNLEAK() ones are sort-of my fault, and are a combination of:
- The commit adding them says "in some cases (e.g., dir_struct) we
don't even have a function which knows how to free all of the struct
members". I'm not sure if why I didn't fix clear_directory() as you
did. I may not have known about it, or I may have been lazy. Or more
likely I was simply holding the UNLEAK() hammer, so it looked like a
nail. ;)
- My focus was on making leak-checker output cleaner. And it wasn't
clear for cases where we're about to exit the program whether we
should be actually freeing (which has small but non-zero cost) or
just annotating (which is zero-cost, but clearly has confused some
people about how UNLEAK() was meant to be used). I think I'm leaning
these days towards "free if it is easy to do so".
So this definitely seems like a good direction to me.
> Rename clear_directory() to dir_free() to be more in line with other
> data structures in git, and introduce a dir_init() to handle the
> suggested memsetting of dir_struct to all zeroes. I hope that a name
> like "dir_free()" is more clear, and that the presence of dir_init()
> will provide a hint to those looking at the code that there may be a
> corresponding dir_free() that they need to call.
I think having a pair of matched calls is good. I _thought_ we had
established a pattern that we should prefer "clear" to "free" for cases
where the struct itself its not freed. But grepping around, I see there
are a few exceptions (hashmap_free() is the big one, and then
oidmap_free() which is built around it seems to have inherited it).
The rest seem to follow that pattern, though: attr_check_free,
cache_tree_free, and submodule_cache_free all actually free the pointer
passed in. And "git grep '_clear(' *.h" shows lots of
clear-but-don't-free examples.
Would dir_clear() make the pairing more obvious, but keep the clear
name?
(I also wouldn't be opposed to changing hashmap and oidmap to use the
name "clear", but that's obviously a separate patch).
> builtin/add.c | 4 ++--
> builtin/check-ignore.c | 4 ++--
> builtin/clean.c | 8 ++++----
> builtin/grep.c | 3 ++-
> builtin/ls-files.c | 4 ++--
> builtin/stash.c | 4 ++--
> dir.c | 7 ++++++-
> dir.h | 19 ++++++++++---------
> merge.c | 3 ++-
> wt-status.c | 4 ++--
> 10 files changed, 34 insertions(+), 26 deletions(-)
That patch itself looks good except for two minor points:
> /* Frees memory within dir which was allocated. Does not free dir itself. */
> -void clear_directory(struct dir_struct *dir)
> +void dir_free(struct dir_struct *dir)
> {
> int i, j;
> struct exclude_list_group *group;
As I mentioned in my previous email, I think it would be nice if this
called dir_init() at the end, so that the struct is always in a
consistent state.
> diff --git a/dir.h b/dir.h
> index 7d76d0644f..7c55c1a460 100644
> --- a/dir.h
> +++ b/dir.h
> [...]
> - * - Use `dir.entries[]`.
> + * - Use `dir.entries[]` and `dir.ignored[]`.
> *
> * - Call `clear_directory()` when the contained elements are no longer in use.
> *
This last line should become dir_free() / dir_clear().
-Peff
next prev parent reply other threads:[~2020-08-16 9:11 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-08-16 6:59 [PATCH 0/3] Clean up some memory leaks in and around dir.c Elijah Newren via GitGitGadget
2020-08-16 6:59 ` [PATCH 1/3] dir: fix leak of parent_hashmap and recursive_hashmap Elijah Newren via GitGitGadget
2020-08-16 8:43 ` Jeff King
2020-08-17 16:57 ` Elijah Newren
2020-08-16 6:59 ` [PATCH 2/3] dir: make clear_directory() free all relevant memory Elijah Newren via GitGitGadget
2020-08-16 8:54 ` Jeff King
2020-08-17 16:58 ` Elijah Newren
2020-08-16 6:59 ` [PATCH 3/3] dir: fix problematic API to avoid memory leaks Elijah Newren via GitGitGadget
2020-08-16 9:11 ` Jeff King [this message]
2020-08-17 17:19 ` Elijah Newren
2020-08-17 19:00 ` Jeff King
2020-08-18 22:58 ` [PATCH v2 0/2] Clean up some memory leaks in and around dir.c Elijah Newren via GitGitGadget
2020-08-18 22:58 ` [PATCH v2 1/2] dir: make clear_directory() free all relevant memory Elijah Newren via GitGitGadget
2020-08-18 22:58 ` [PATCH v2 2/2] dir: fix problematic API to avoid memory leaks Elijah Newren via GitGitGadget
2020-08-19 13:51 ` [PATCH v2 0/2] Clean up some memory leaks in and around dir.c Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200816091154.GC1221900@coredump.intra.peff.net \
--to=peff@peff.net \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=newren@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).