git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Elijah Newren <newren@gmail.com>
Subject: Re: [PATCH 3/3] dir: fix problematic API to avoid memory leaks
Date: Sun, 16 Aug 2020 05:11:54 -0400	[thread overview]
Message-ID: <20200816091154.GC1221900@coredump.intra.peff.net> (raw)
In-Reply-To: <b9310e9941e91336258edd97a913e5908180720e.1597561152.git.gitgitgadget@gmail.com>

On Sun, Aug 16, 2020 at 06:59:11AM +0000, Elijah Newren via GitGitGadget wrote:

> Digging further, I found that despite the pretty clear documentation
> near the top of dir.h that folks were supposed to call clear_directory()
> when the user no longer needed the dir_struct, there were four callers
> that didn't bother doing that at all.  However, two of them clearly
> thought about leaks since they had an UNLEAK(dir) directive, which to me
> suggests that the method to free the data was too unclear.  I suspect
> the non-obviousness of the API and its holes led folks to avoid it,
> which then snowballed into further problems with the entries[],
> ignored[], parent_hashmap, and recursive_hashmap problems.

The UNLEAK() ones are sort-of my fault, and are a combination of:

  - The commit adding them says "in some cases (e.g., dir_struct) we
    don't even have a function which knows how to free all of the struct
    members". I'm not sure if why I didn't fix clear_directory() as you
    did. I may not have known about it, or I may have been lazy. Or more
    likely I was simply holding the UNLEAK() hammer, so it looked like a
    nail. ;)

  - My focus was on making leak-checker output cleaner. And it wasn't
    clear for cases where we're about to exit the program whether we
    should be actually freeing (which has small but non-zero cost) or
    just annotating (which is zero-cost, but clearly has confused some
    people about how UNLEAK() was meant to be used). I think I'm leaning
    these days towards "free if it is easy to do so".

So this definitely seems like a good direction to me.

> Rename clear_directory() to dir_free() to be more in line with other
> data structures in git, and introduce a dir_init() to handle the
> suggested memsetting of dir_struct to all zeroes.  I hope that a name
> like "dir_free()" is more clear, and that the presence of dir_init()
> will provide a hint to those looking at the code that there may be a
> corresponding dir_free() that they need to call.

I think having a pair of matched calls is good. I _thought_ we had
established a pattern that we should prefer "clear" to "free" for cases
where the struct itself its not freed. But grepping around, I see there
are a few exceptions (hashmap_free() is the big one, and then
oidmap_free() which is built around it seems to have inherited it).

The rest seem to follow that pattern, though: attr_check_free,
cache_tree_free, and submodule_cache_free all actually free the pointer
passed in. And "git grep '_clear(' *.h" shows lots of
clear-but-don't-free examples.

Would dir_clear() make the pairing more obvious, but keep the clear
name?

(I also wouldn't be opposed to changing hashmap and oidmap to use the
name "clear", but that's obviously a separate patch).

>  builtin/add.c          |  4 ++--
>  builtin/check-ignore.c |  4 ++--
>  builtin/clean.c        |  8 ++++----
>  builtin/grep.c         |  3 ++-
>  builtin/ls-files.c     |  4 ++--
>  builtin/stash.c        |  4 ++--
>  dir.c                  |  7 ++++++-
>  dir.h                  | 19 ++++++++++---------
>  merge.c                |  3 ++-
>  wt-status.c            |  4 ++--
>  10 files changed, 34 insertions(+), 26 deletions(-)

That patch itself looks good except for two minor points:

>  /* Frees memory within dir which was allocated.  Does not free dir itself. */
> -void clear_directory(struct dir_struct *dir)
> +void dir_free(struct dir_struct *dir)
>  {
>  	int i, j;
>  	struct exclude_list_group *group;

As I mentioned in my previous email, I think it would be nice if this
called dir_init() at the end, so that the struct is always in a
consistent state.

> diff --git a/dir.h b/dir.h
> index 7d76d0644f..7c55c1a460 100644
> --- a/dir.h
> +++ b/dir.h
> [...]
> - * - Use `dir.entries[]`.
> + * - Use `dir.entries[]` and `dir.ignored[]`.
>   *
>   * - Call `clear_directory()` when the contained elements are no longer in use.
>   *

This last line should become dir_free() / dir_clear().

-Peff

  reply	other threads:[~2020-08-16  9:11 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-16  6:59 [PATCH 0/3] Clean up some memory leaks in and around dir.c Elijah Newren via GitGitGadget
2020-08-16  6:59 ` [PATCH 1/3] dir: fix leak of parent_hashmap and recursive_hashmap Elijah Newren via GitGitGadget
2020-08-16  8:43   ` Jeff King
2020-08-17 16:57     ` Elijah Newren
2020-08-16  6:59 ` [PATCH 2/3] dir: make clear_directory() free all relevant memory Elijah Newren via GitGitGadget
2020-08-16  8:54   ` Jeff King
2020-08-17 16:58     ` Elijah Newren
2020-08-16  6:59 ` [PATCH 3/3] dir: fix problematic API to avoid memory leaks Elijah Newren via GitGitGadget
2020-08-16  9:11   ` Jeff King [this message]
2020-08-17 17:19     ` Elijah Newren
2020-08-17 19:00       ` Jeff King
2020-08-18 22:58 ` [PATCH v2 0/2] Clean up some memory leaks in and around dir.c Elijah Newren via GitGitGadget
2020-08-18 22:58   ` [PATCH v2 1/2] dir: make clear_directory() free all relevant memory Elijah Newren via GitGitGadget
2020-08-18 22:58   ` [PATCH v2 2/2] dir: fix problematic API to avoid memory leaks Elijah Newren via GitGitGadget
2020-08-19 13:51   ` [PATCH v2 0/2] Clean up some memory leaks in and around dir.c Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200816091154.GC1221900@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=newren@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).